Link Rot and Digital Decay on Government, News and Other Webpages – Pew Research Center
Pew Research Center conducted this analysis to examine how often online content that once existed becomes inaccessible. One part of the study looks at a representative sample of webpages that existed over the past decade to see how many are still accessible today. For this analysis, we collected a sample of pages from the Common Crawl web repository for each year from 2013 to 2023. We then tried to access those pages to see how many still exist.
A second part of the study looks at the links on existing webpages to see how many of those links are still functional. We did this by collecting a large sample of pages from government websites, news websites and the online encyclopedia Wikipedia.
We identified relevant news domains using data from the audience metrics company comScore and relevant government domains (at multiple levels of government) using data from get.gov, the official administrator for the .gov domain. We collected the news and government pages via Common Crawl and the Wikipedia pages from an archive maintained by the Wikimedia Foundation. For each collection, we identified the links on those pages and followed them to their destination to see what share of those links point to sites that are no longer accessible.
A third part of the study looks at how often individual posts on social media sites are deleted or otherwise removed from public view. We did this by collecting a large sample of public tweets on the social media platform X (then known as Twitter) in real time using the Twitter Streaming API. We then tracked the status of those tweets for a period of three months using the Twitter Search API to monitor how many were still publicly available. Refer to the report methodology for more details.
The internet is an unimaginably vast repository of modern life, with hundreds of billions of indexed webpages. But even as users across the world rely on the web to access books, images, news articles and other resources, this content sometimes disappears from view.
A new Pew Research Center analysis shows just how fleeting online content actually is:
This digital decay occurs in many different online spaces. We examined the links that appear on government and news websites, as well as in the References section of Wikipedia pages as of spring 2023. This analysis found that:
To see how digital decay plays out on social media, we also collected a real-time sample of tweets during spring 2023 on the social media platform X (then known as Twitter) and followed them for three months. We found that:
There are many ways of defining whether something on the internet that used to exist is now inaccessible to people trying to reach it today. For instance, inaccessible could mean that:
For this report, we focused on the first of these: pages that no longer exist. The other definitions of accessibility are beyond the scope of this research.
Our approach is a straightforward way of measuring whether something online is accessible or not. But even so, there is some ambiguity.
First, there are dozens of status codes indicating a problem that a user might encounter when they try to access a page. Not all of them definitively indicate whether the page is permanently defunct or just temporarily unavailable. Second, for security reasons, many sites actively try to prevent the sort of automated data collection that we used to test our full list of links.
For these reasons, we used the most conservative estimate possible for deciding whether a site was actually accessible or not. We counted pages as inaccessible only if they returned one of nine error codes that definitively indicate that the page and/or its host server no longer exist or have become nonfunctional regardless of how they are being accessed, and by whom. The full list of error codes that we included in our definition is in the methodology.
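The conservative rule described above can be sketched in a few lines of Python. This is only an illustration, not the Center's code: the function name `is_inaccessible` and the example code set are hypothetical, and the actual nine error codes appear only in the methodology, so they are not reproduced here.

```python
from typing import Iterable

# Hypothetical stand-in for the report's nine definitive error codes.
# 404 (Not Found) and 410 (Gone) are used purely as examples; the real
# list is in the methodology.
EXAMPLE_DEFINITIVE_CODES = frozenset({404, 410})


def is_inaccessible(status_code: int,
                    definitive_codes: Iterable[int] = EXAMPLE_DEFINITIVE_CODES) -> bool:
    """Return True only when the code is in the definitive set.

    Every other response (success, redirects, rate limiting, bot
    blocking and so on) is treated as "not shown to be broken", which
    keeps the estimate conservative.
    """
    return status_code in set(definitive_codes)


# Ambiguous responses such as 403 or 429 may simply mean the site is
# blocking automated requests, so they do not count as broken here.
for code in (200, 403, 404, 429):
    print(code, is_inaccessible(code))
```

Passing the set of codes as a parameter keeps the counting rule separate from the question of which codes qualify, so the definition can be swapped without touching the rest of a pipeline.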
Here are some of the findings from our analysis of digital decay in various online spaces.
To conduct this part of our analysis, we collected a random sample of just under 1 million webpages from the archives of Common Crawl, an internet archive service that periodically collects snapshots of the internet as it exists at different points in time. We sampled pages collected by Common Crawl each year from 2013 through 2023 (approximately 90,000 pages per year) and checked to see if those pages still exist today.
We found that 25% of all the pages we collected from 2013 through 2023 were no longer accessible as of October 2023. This figure is the sum of two different types of broken pages: 16% of pages are individually inaccessible but come from an otherwise functional root-level domain; the other 9% are inaccessible because their entire root domain is no longer functional.
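As a rough illustration of how the two kinds of broken pages add up to the overall figure, the sketch below tallies per-page outcomes. The labels `ok`, `page_broken` and `domain_broken` and the function name `decay_shares` are assumptions made for this example, and the sample is constructed to mirror the reported percentages rather than drawn from the study's data.

```python
from collections import Counter


def decay_shares(outcomes):
    """Return percentage shares for each type of broken page.

    `outcomes` is an iterable of labels: "ok", "page_broken" (the page
    is gone but its root domain still works) or "domain_broken" (the
    whole root domain no longer functions). Any other label counts
    toward the total but not toward either broken category.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes supplied")
    page = 100 * counts["page_broken"] / total
    domain = 100 * counts["domain_broken"] / total
    return {
        "page_broken": page,
        "domain_broken": domain,
        "inaccessible": page + domain,
    }


# Made-up sample of 100 pages arranged to match the reported shares.
sample = ["ok"] * 75 + ["page_broken"] * 16 + ["domain_broken"] * 9
print(decay_shares(sample))
# {'page_broken': 16.0, 'domain_broken': 9.0, 'inaccessible': 25.0}
```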
Not surprisingly, the older snapshots in our collection had the largest share of inaccessible pages. Of the pages collected from the 2013 snapshot, 38% were no longer accessible in 2023. But even for pages collected in the 2021 snapshot, about one-in-five were no longer accessible just two years later.
We sampled around 500,000 pages from government websites using the Common Crawl March/April 2023 snapshot of the internet, including a mix of different levels of government (federal, state, local and others). We found every link on each page and followed a random selection of those links to their destination to see if the pages they refer to still exist.
Across the government websites we sampled, there were 42 million links. The vast majority of those links (86%) were internal, meaning they link to a different page on the same website. An explainer resource on the IRS website that links to other documents or forms on the IRS site would be an example of an internal link.
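The internal/external distinction can be illustrated with Python's standard `urllib.parse` module. This is a minimal sketch under one loud assumption: "same website" is taken to mean "same hostname, ignoring a leading www.", which the report does not specify. The function name `classify_link` and the example URLs are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def _host(url):
    """Lowercased hostname without a leading 'www.' ('' if absent)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def classify_link(page_url, href):
    """Label a link found on `page_url` as internal, external or other."""
    # Resolve relative links such as "/forms/w4" against the page URL.
    target = urljoin(page_url, href)
    if urlsplit(target).scheme not in ("http", "https"):
        return "other"  # e.g. mailto: or javascript: links
    # Assumption: matching hostnames means the same website.
    return "internal" if _host(target) == _host(page_url) else "external"


page = "https://www.irs.gov/help/explainer"
print(classify_link(page, "/forms/w4"))               # internal
print(classify_link(page, "https://example.com/doc")) # external
print(classify_link(page, "mailto:help@example.com")) # other
```

A stricter or looser definition (for instance, treating all subdomains of one registered domain as the same site) would shift the internal share, so the choice should be stated alongside any figure computed this way.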
Around three-quarters of government webpages we sampled contained at least one on-page link. The typical (median) page contains 50 links, but many pages contain far more. A page in the 90th percentile contains 190 links, and a page in the 99th percentile (that is, the top 1% of pages by number of links) has 740 links.
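For readers who want to reproduce this kind of summary on their own data, the following sketch computes the share of pages with links, the median and the 90th and 99th percentiles using Python's `statistics` module. The function name `link_count_summary`, the choice of the "inclusive" interpolation method and the sample counts are all assumptions; the report does not say how its percentiles were calculated or whether pages without links were included.

```python
import statistics


def link_count_summary(link_counts):
    """Summarize per-page link counts (one integer per page)."""
    if len(link_counts) < 2:
        raise ValueError("need at least two pages")
    # quantiles(n=100) returns 99 cut points; cuts[i - 1] is the i-th
    # percentile. "inclusive" keeps results within the observed range.
    cuts = statistics.quantiles(link_counts, n=100, method="inclusive")
    pages_with_links = sum(1 for count in link_counts if count > 0)
    return {
        "pct_pages_with_links": 100 * pages_with_links / len(link_counts),
        "median": statistics.median(link_counts),
        "p90": cuts[89],
        "p99": cuts[98],
    }


# Made-up counts for five pages, not data from the study.
print(link_count_summary([0, 10, 50, 190, 740]))
```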
Other facts about government webpage links:
When we followed these links, we found that 6% point to pages that are no longer accessible. Similar shares of internal and external links are no longer functional.
Overall, 21% of all the government webpages we examined contained at least one broken link. Across every level of government we looked at, there were broken links on at least 14% of pages; city government pages had the highest rates of broken links.
For this analysis, we sampled 500,000 pages from 2,063 websites classified as News/Information by the audience metrics firm comScore. The pages were collected from the Common Crawl March/April 2023 snapshot of the internet.
Across the news sites sampled, this collection contained more than 14 million links pointing to an outside website. Some 94% of these pages contain at least one external-facing link. The median page contains 20 links, and pages in the top 10% by link count have 56 links.
Like government websites, the vast majority of these links go to secure HTTP pages (those with a URL beginning with https://). Around 12% of links on these news sites point to a static file, like a PDF document. And 32% of links on news sites redirected to a different URL than the one they originally pointed to, slightly less than the 39% of external links on government sites that redirect.
When we tracked these links to their destination, we found that 5% of all links on news site pages are no longer accessible. And 23% of all the pages we sampled contained at least one broken link.
Broken links are about as prevalent on the most-trafficked news websites as they are on the least-trafficked sites. Some 25% of pages on news websites in the top 20% by site traffic have at least one broken link. That is nearly identical to the 26% of pages on sites in the bottom 20% by site traffic.
For this analysis, we collected a random sample of 50,000 English-language Wikipedia pages and examined the links in their References section. The vast majority of these pages (82%) contain at least one reference link, that is, one that directs the reader to a webpage other than Wikipedia itself.
In total, there are just over 1 million reference links across all the pages we collected. The typical page has four reference links.
The analysis indicates that 11% of all references linked on Wikipedia are no longer accessible. On about 2% of source pages containing reference links, every link on the page was broken or otherwise inaccessible, while another 53% of pages contained at least one broken link.
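The link-level and page-level figures above measure different things, and a small sketch may make the distinction concrete. It assumes each page is represented by a list of true/false flags (true meaning the link is broken) and reads "another 53%" as pages with some, but not all, links broken. That reading, the function name `broken_link_summary` and the sample data are assumptions.

```python
def broken_link_summary(pages):
    """Summarize broken links across pages.

    `pages` maps a page identifier to a list of booleans, one per
    reference link, where True means the link is broken. Pages with no
    links are excluded from the page-level shares.
    """
    with_links = [flags for flags in pages.values() if flags]
    total_links = sum(len(flags) for flags in with_links)
    if total_links == 0:
        raise ValueError("no links to summarize")
    broken_links = sum(sum(flags) for flags in with_links)
    all_broken = sum(1 for flags in with_links if all(flags))
    some_broken = sum(1 for flags in with_links
                      if any(flags) and not all(flags))
    n_pages = len(with_links)
    return {
        "pct_links_broken": 100 * broken_links / total_links,
        "pct_pages_all_broken": 100 * all_broken / n_pages,
        "pct_pages_some_broken": 100 * some_broken / n_pages,
    }


# Made-up example: page D has no reference links and is skipped.
example = {
    "A": [True, True],
    "B": [True, False, False],
    "C": [False],
    "D": [],
}
print(broken_link_summary(example))
```

In this toy example half of all links are broken, yet only one of the three pages with links has lost every reference, which shows why the two shares in the report can differ so widely.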
For this analysis, we collected nearly 5 million tweets posted from March 8 to April 27, 2023, on the social media platform X, which at the time was known as Twitter. We did this using Twitter's Streaming API, collecting 3,000 public tweets every 30 minutes in real time. This provided us with a representative sample of all tweets posted on the platform during that period. We monitored those tweets until June 15, 2023, and checked each day to see if they were still available on the site or not.
At the end of the observation period, we found that 18% of the tweets from our initial collection window were no longer publicly visible on the site. In a majority of cases, this was because the account that originally posted the tweet was made private, suspended or deleted entirely. For the remaining tweets, the account that posted the tweet was still visible on the site, but the individual tweet had been deleted.
Tweets were especially likely to be deleted or removed over the course of our collection period if they were:
We also found that removed or deleted tweets tended to come from newer accounts with relatively few followers and modest activity on the site. On average, tweets that were no longer visible on the site were posted by accounts around eight months younger than those whose tweets stayed on the site.
And when we analyzed the types of tweets that were no longer available, we found that retweets, quote tweets and original tweets did not differ much from the overall average. But replies were relatively unlikely to be removed: just 12% of replies were inaccessible at the end of our monitoring period.
Most tweets that are removed from the site tend to disappear soon after being posted. In addition to looking at how many tweets from our collection were still available at the end of our tracking period, we conducted a survival analysis to see how long these tweets tended to remain available. We found that:
Put another way: Half of tweets that are eventually removed from the platform are unavailable within the first six days of being posted. And 90% of these tweets are unavailable within 46 days.
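The "half within six days, 90% within 46 days" framing is a statement about the distribution of time to removal among tweets that were eventually removed. The sketch below computes that kind of figure as a plain empirical quantile. It is not necessarily the survival analysis the Center ran: it ignores tweets that were still available when monitoring ended, and the function name `days_until_share_removed` and the sample data are hypothetical.

```python
def days_until_share_removed(days_to_removal, percent):
    """Smallest number of days by which `percent` of removed tweets were gone.

    `days_to_removal` holds one value per removed tweet: the number of
    days between posting and first being found unavailable.
    `percent` is an integer from 1 to 100.
    """
    if not days_to_removal:
        raise ValueError("no removed tweets supplied")
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(days_to_removal)
    # Integer ceiling of percent * n / 100 avoids floating-point drift.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Made-up removal times for ten tweets, chosen to echo the reported figures.
days = [1, 1, 2, 3, 6, 6, 10, 20, 46, 60]
print(days_until_share_removed(days, 50))  # 6
print(days_until_share_removed(days, 90))  # 46
```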
Tweets don't always disappear forever, though. Some 6% of the tweets we collected disappeared and then became available again at a later point. This could be due to an account going private and then returning to public status, or to the account being suspended and later reinstated. Of those reappeared tweets, the vast majority (90%) were still accessible on Twitter at the end of the monitoring period.