Link Rot and Digital Decay on Government, News and Other Webpages – Pew Research Center
Pew Research Center conducted the analysis to examine how often online content that once existed becomes inaccessible. One part of the study looks at a representative sample of webpages that existed over the past decade to see how many are still accessible today. For this analysis, we collected a sample of pages from the Common Crawl web repository for each year from 2013 to 2023. We then tried to access those pages to see how many still exist.
A second part of the study looks at the links on existing webpages to see how many of those links are still functional. We did this by collecting a large sample of pages from government websites, news websites and the online encyclopedia Wikipedia.
We identified relevant news domains using data from the audience metrics company comScore and relevant government domains (at multiple levels of government) using data from get.gov, the official administrator for the .gov domain. We collected the news and government pages via Common Crawl and the Wikipedia pages from an archive maintained by the Wikimedia Foundation. For each collection, we identified the links on those pages and followed them to their destination to see what share of those links point to sites that are no longer accessible.
A third part of the study looks at how often individual posts on social media sites are deleted or otherwise removed from public view. We did this by collecting a large sample of public tweets on the social media platform X (then known as Twitter) in real time using the Twitter Streaming API. We then tracked the status of those tweets for a period of three months using the Twitter Search API to monitor how many were still publicly available. Refer to the report methodology for more details.
The internet is an unimaginably vast repository of modern life, with hundreds of billions of indexed webpages. But even as users across the world rely on the web to access books, images, news articles and other resources, this content sometimes disappears from view.
A new Pew Research Center analysis shows just how fleeting online content actually is:
This digital decay occurs in many different online spaces. We examined the links that appear on government and news websites, as well as in the References section of Wikipedia pages as of spring 2023. This analysis found that:
To see how digital decay plays out on social media, we also collected a real-time sample of tweets during spring 2023 on the social media platform X (then known as Twitter) and followed them for three months. We found that:
There are many ways of defining whether something on the internet that used to exist is now inaccessible to people trying to reach it today. For instance, inaccessible could mean that:
For this report, we focused on the first of these: pages that no longer exist. The other definitions of accessibility are beyond the scope of this research.
Our approach is a straightforward way of measuring whether something online is accessible or not. But even so, there is some ambiguity.
First, there are dozens of status codes indicating a problem that a user might encounter when they try to access a page. Not all of them definitively indicate whether the page is permanently defunct or just temporarily unavailable. Second, for security reasons, many sites actively try to prevent the sort of automated data collection that we used to test our full list of links.
For these reasons, we used the most conservative estimate possible for deciding whether a site was actually accessible or not. We counted pages as inaccessible only if they returned one of nine error codes that definitively indicate that the page and/or its host server no longer exist or have become nonfunctional regardless of how they are being accessed, and by whom. The full list of error codes that we included in our definition is in the methodology.
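As a rough illustration of this conservative rule, the Python sketch below counts a page as inaccessible only when its outcome falls in a fixed set of "definitive" failures. The nine codes themselves appear only in the methodology, so the `DEFINITIVE_FAILURES` set here (404, 410 and a made-up `"dns_failure"` label) is a hypothetical placeholder, not the study's list.

```python
# Placeholder set: NOT the study's nine codes, which are listed only in the methodology.
DEFINITIVE_FAILURES = {404, 410, "dns_failure"}


def is_inaccessible(outcome):
    """Return True only when the outcome definitively signals a dead page.

    `outcome` is an HTTP status code (int) or a hypothetical string label
    for a network-level failure. Ambiguous outcomes, such as a 403 or 429
    from a site that blocks automated requests, count as accessible.
    """
    return outcome in DEFINITIVE_FAILURES


def inaccessible_share(outcomes):
    """Share of checked pages that count as inaccessible under the rule."""
    if not outcomes:
        return 0.0
    return sum(is_inaccessible(o) for o in outcomes) / len(outcomes)
```

Treating every ambiguous outcome as accessible biases the estimate downward, which is the point of a conservative definition.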
Here are some of the findings from our analysis of digital decay in various online spaces.
To conduct this part of our analysis, we collected a random sample of just under 1 million webpages from the archives of Common Crawl, an internet archive service that periodically collects snapshots of the internet as it exists at different points in time. We sampled pages collected by Common Crawl each year from 2013 through 2023 (approximately 90,000 pages per year) and checked to see if those pages still exist today.
We found that 25% of all the pages we collected from 2013 through 2023 were no longer accessible as of October 2023. This figure is the sum of two different types of broken pages: 16% of pages are individually inaccessible but come from an otherwise functional root-level domain; the other 9% are inaccessible because their entire root domain is no longer functional.
Not surprisingly, the older snapshots in our collection had the largest share of inaccessible links. Of the pages collected from the 2013 snapshot, 38% were no longer accessible in 2023. But even for pages collected in the 2021 snapshot, about one-in-five were no longer accessible just two years later.
We sampled around 500,000 pages from government websites using the Common Crawl March/April 2023 snapshot of the internet, including a mix of different levels of government (federal, state, local and others). We found every link on each page and followed a random selection of those links to their destination to see if the pages they refer to still exist.
Across the government websites we sampled, there were 42 million links. The vast majority of those links (86%) were internal, meaning they link to a different page on the same website. An explainer resource on the IRS website that links to other documents or forms on the IRS site would be an example of an internal link.
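The internal/external distinction can be sketched in Python as follows. The report does not spell out the exact matching rule, so this sketch assumes that "same website" means the same hostname once a leading `www.` is ignored; the function names are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def _host(url):
    """Lowercased hostname with a leading 'www.' removed (an assumption)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_internal(page_url, href):
    """True if `href`, resolved against `page_url`, stays on the same host."""
    target = urljoin(page_url, href)  # resolves relative links such as "/forms"
    return _host(target) == _host(page_url)
```

For example, `is_internal("https://www.irs.gov/help", "/forms/w4")` is true, while a link from that page to `https://example.com/` is external.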
Around three-quarters of government webpages we sampled contained at least one on-page link. The typical (median) page contains 50 links, but many pages contain far more. A page in the 90th percentile contains 190 links, and a page in the 99th percentile (that is, the top 1% of pages by number of links) has 740 links.
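Percentile figures like these can be reproduced from a list of per-page link counts. The sketch below uses Python's standard `statistics.quantiles`. The interpolation method, and whether pages with no links belong in the input, are assumptions, since the report does not specify either.

```python
from statistics import quantiles


def link_count_percentiles(link_counts):
    """Median, 90th and 99th percentile of per-page link counts.

    `quantiles(..., n=100)` returns the 99 cut points between percentiles,
    so index 49 is the 50th percentile, 89 the 90th and 98 the 99th.
    """
    cuts = quantiles(link_counts, n=100, method="inclusive")
    return {"median": cuts[49], "p90": cuts[89], "p99": cuts[98]}
```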
Other facts about government webpage links:
When we followed these links, we found that 6% point to pages that are no longer accessible. Similar shares of internal and external links are no longer functional.
Overall, 21% of all the government webpages we examined contained at least one broken link. Across every level of government we looked at, there were broken links on at least 14% of pages; city government pages had the highest rates of broken links.
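The two figures above measure different things: 6% is a share of links, while 21% is a share of pages. A short Python sketch, using a hypothetical input format, shows how both can be computed from the same link checks. It assumes pages with no links stay in the page-level denominator, which matches the phrase "all the government webpages we examined" but is not confirmed by the report.

```python
def broken_link_rates(pages):
    """Return (share of links broken, share of pages with a broken link).

    `pages` maps each page URL to a list of booleans, one per link checked
    on that page (True means the link was broken).
    """
    all_links = [broken for links in pages.values() for broken in links]
    link_rate = sum(all_links) / len(all_links) if all_links else 0.0
    pages_with_broken = sum(any(links) for links in pages.values())
    page_rate = pages_with_broken / len(pages) if pages else 0.0
    return link_rate, page_rate
```

Because a single broken link is enough to flag a whole page, the page-level share is typically much higher than the link-level share when pages carry dozens of links.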
For this analysis, we sampled 500,000 pages from 2,063 websites classified as News/Information by the audience metrics firm comScore. The pages were collected from the Common Crawl March/April 2023 snapshot of the internet.
Across the news sites sampled, this collection contained more than 14 million links pointing to an outside website. Some 94% of these pages contain at least one external-facing link. The median page contains 20 links, and pages in the top 10% by link count have 56 links.
Like government websites, the vast majority of these links go to secure HTTP pages (those with a URL beginning with https://). Around 12% of links on these news sites point to a static file, like a PDF document. And 32% of links on news sites redirected to a different URL than the one they originally pointed to, slightly less than the 39% of external links on government sites that redirect.
When we tracked these links to their destination, we found that 5% of all links on news site pages are no longer accessible. And 23% of all the pages we sampled contained at least one broken link.
Broken links are about as prevalent on the most-trafficked news websites as they are on the least-trafficked sites. Some 25% of pages on news websites in the top 20% by site traffic have at least one broken link. That is nearly identical to the 26% of sites in the bottom 20% by site traffic.
For this analysis, we collected a random sample of 50,000 English-language Wikipedia pages and examined the links in their References section. The vast majority of these pages (82%) contain at least one reference link, that is, one that directs the reader to a webpage other than Wikipedia itself.
In total, there are just over 1 million reference links across all the pages we collected. The typical page has four reference links.
The analysis indicates that 11% of all references linked on Wikipedia are no longer accessible. On about 2% of source pages containing reference links, every link on the page was broken or otherwise inaccessible, while another 53% of pages contained at least one broken link.
For this analysis, we collected nearly 5 million tweets posted from March 8 to April 27, 2023, on the social media platform X, which at the time was known as Twitter. We did this using Twitter's Streaming API, collecting 3,000 public tweets every 30 minutes in real time. This provided us with a representative sample of all tweets posted on the platform during that period. We monitored those tweets until June 15, 2023, and checked each day to see if they were still available on the site or not.
At the end of the observation period, we found that 18% of the tweets from our initial collection window were no longer publicly visible on the site. In a majority of cases, this was because the account that originally posted the tweet was made private, suspended or deleted entirely. For the remaining tweets, the account that posted the tweet was still visible on the site, but the individual tweet had been deleted.
Tweets were especially likely to be deleted or removed over the course of our collection period if they were:
We also found that removed or deleted tweets tended to come from newer accounts with relatively few followers and modest activity on the site. On average, tweets that were no longer visible on the site were posted by accounts around eight months younger than those whose tweets stayed on the site.
And when we analyzed the types of tweets that were no longer available, we found that retweets, quote tweets and original tweets did not differ much from the overall average. But replies were relatively unlikely to be removed: just 12% of replies were inaccessible at the end of our monitoring period.
Most tweets that are removed from the site tend to disappear soon after being posted. In addition to looking at how many tweets from our collection were still available at the end of our tracking period, we conducted a survival analysis to see how long these tweets tended to remain available. We found that:
Put another way: Half of tweets that are eventually removed from the platform are unavailable within the first six days of being posted. And 90% of these tweets are unavailable within 46 days.
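Statements of this kind ("half of removed tweets are gone within N days") can be illustrated with a simple calculation over time-to-removal values. The Python sketch below is a simplified descriptive version and not the survival analysis used in the study: it looks only at tweets that were eventually removed and ignores censoring. The function name and input format are hypothetical.

```python
import math


def days_until_share_removed(days_to_removal, share):
    """Smallest number of days by which at least `share` of the eventually
    removed tweets had disappeared.

    `days_to_removal` holds one entry per tweet: days from posting to
    removal, or None if the tweet was still visible at the end of tracking.
    """
    removed = sorted(d for d in days_to_removal if d is not None)
    if not removed:
        return None
    needed = math.ceil(share * len(removed))  # tweets needed to reach the share
    return removed[max(needed, 1) - 1]
```

Calling it with `share=0.5` gives the median time to removal among removed tweets, the quantity behind the "first six days" figure above.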
Tweets don't always disappear forever, though. Some 6% of the tweets we collected disappeared and then became available again at a later point. This could be due to an account going private and then returning to public status, or to the account being suspended and later reinstated. Of those reappeared tweets, the vast majority (90%) were still accessible on Twitter at the end of the monitoring period.