Sitemaps Best Practices Including Large Web Sites

One of the key Search Engine Optimization (SEO) strategies for web sites is to have high-quality sitemaps that help search engines discover and access all relevant content posted on that web site. Sitemaps offer a really simple way for site owners to share information with every search engine about the content they have on their site, instead of having to rely solely on crawling algorithms (i.e., crawlers, robots) to find it.

The Sitemaps protocol, defined at www.sitemaps.org, is now widely supported. Many web sites and Content Management Systems (CMSs) offer sitemaps by default or as an option. Bing even offers an open source server-side technology, the Bing XML Sitemap Plugin, for websites running on Internet Information Services (IIS) for Windows Server as well as Apache HTTP Server.
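As a rough illustration of the format that protocol defines (and that tools like the plugin above generate for you), here is a minimal sketch in Python that writes a one-URL sitemap file; the URL, date, and output filename are placeholders, not taken from any real site or tool.

```
# Minimal sketch of the sitemap format defined at www.sitemaps.org:
# a <urlset> of <url> entries, each with a <loc> and an optional <lastmod>.
# The URL, date, and output filename below are placeholders.
MINIMAL_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2015-01-01</lastmod>
  </url>
</urlset>
"""

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(MINIMAL_SITEMAP)
```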

If you don't have a sitemap yet, we recommend that you first explore whether your web site or your CMS can manage this, or install a sitemap plugin.

If you have to, or want to, develop your own sitemaps, we suggest the following best practices:

Interestingly, some sites these days are large, really large, with millions to billions of URLs. Sitemap index files and sitemap files can each list up to 50,000 links, so with one sitemap index file you can list 50,000 x 50,000 = 2,500,000,000 links. If you have more than 2.5 billion links, think first about whether you really need so many links on your site; in general, search engines will not crawl and index all of that. It is highly preferable that you link only to the most relevant web pages, to make sure that at least these relevant pages are discovered, crawled, and indexed. Just in case you do have more than 2.5 billion links, you can use two sitemap index files, or you can use a sitemap index file linking to sitemap index files, which offers up to 125 trillion links: so far that's still definitely more than the number of fake profiles on some social sites, so you'll be covered.
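To make the 50,000 x 50,000 arithmetic concrete, here is a hedged Python sketch that splits a large URL list into sitemap files of at most 50,000 URLs each and writes a single sitemap index pointing to them. The base URL, file naming scheme, and helper names are assumptions for illustration only, not part of any particular sitemap generator.

```
# Sketch of the capacity arithmetic: one sitemap index can reference up to
# 50,000 sitemap files, and each sitemap file can list up to 50,000 URLs,
# i.e. 50,000 x 50,000 = 2,500,000,000 URLs behind a single index file.
# BASE_URL and the file naming scheme are placeholders for illustration.
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_PER_FILE = 50_000                  # protocol limit per sitemap and per index
BASE_URL = "https://www.example.com"   # placeholder

def write_xml(path, root, entry_tag, locs):
    """Write a <urlset> or <sitemapindex> document listing the given URLs."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<{root} xmlns="{SITEMAP_NS}">']
    for loc in locs:
        lines.append(f"  <{entry_tag}><loc>{escape(loc)}</loc></{entry_tag}>")
    lines.append(f"</{root}>")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))

def write_sitemaps(urls):
    """Chunk urls into sitemap files and write one index referencing them."""
    sitemap_locs = []
    for i in range(0, len(urls), MAX_PER_FILE):
        name = f"sitemap-{i // MAX_PER_FILE + 1:05d}.xml"
        write_xml(name, "urlset", "url", urls[i:i + MAX_PER_FILE])
        sitemap_locs.append(f"{BASE_URL}/{name}")
    if len(sitemap_locs) > MAX_PER_FILE:
        # Beyond 2.5 billion URLs: a second index (or an index of indexes) is needed.
        raise ValueError("more than 50,000 sitemap files; add another sitemap index")
    write_xml("sitemap-index.xml", "sitemapindex", "sitemap", sitemap_locs)

write_sitemaps([f"{BASE_URL}/page-{n}" for n in range(120_000)])  # 3 sitemap files
```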

The main problem with extra-large sitemaps is that search engines are often not able to discover all the links in them, as it takes time to download all these sitemaps each day. Search engines cannot download thousands of sitemaps in a few seconds or minutes without over-crawling web sites, and the total size of the sitemap XML files can reach more than 100 gigabytes. Between the time we download the sitemap index file to discover the sitemap file URLs and the time we download those sitemap files, the sitemaps may have expired or been overwritten. Additionally, search engines don't download sitemaps at a specific time of day, so they are often not in sync with a web site's sitemap generation process. Having fixed names for sitemap files does not usually solve the issue either, as the files, and so the URLs listed in them, can be overwritten during the download process.

To mitigate these issues, a best practice to help ensure that search engines discover all the links of your very large web site is to manage two sets of sitemap files: update sitemap set A on day one, update sitemap set B on day two, and continue alternating between A and B. Use a sitemap index file to link to both sitemap set A and sitemap set B, or have two sitemap index files, one for A and one for B. This method gives search engines enough time (24 hours) to download a set of sitemaps that is not being modified, and so helps ensure that search engines have discovered all of your site's URLs in the past 24 to 48 hours.
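Here is a minimal, hedged Python sketch of that rotation under the assumption of a daily generation job: it alternates between set A and set B based on the day, and writes one sitemap index that links to both sets. The set names, file counts, and paths are placeholders for illustration.

```
# Sketch of the alternating A/B sitemap sets described above: regenerate only
# one set per day so that the other set stays stable for at least 24 hours.
# The set names, file counts, and naming scheme are placeholders.
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE_URL = "https://www.example.com"   # placeholder

def todays_set(today=None):
    """Alternate daily between set 'a' and set 'b'."""
    today = today or date.today()
    return "a" if today.toordinal() % 2 == 0 else "b"

def write_index_for_both_sets(files_per_set, path="sitemap-index.xml"):
    """Write one sitemap index that links to the files of set A and set B."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<sitemapindex xmlns="{SITEMAP_NS}">']
    for set_name in ("a", "b"):
        for i in range(1, files_per_set + 1):
            loc = f"{BASE_URL}/sitemap-{set_name}-{i:05d}.xml"
            lines.append(f"  <sitemap><loc>{loc}</loc></sitemap>")
    lines.append("</sitemapindex>")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))

# Today's generation job would (re)write only the files of this set, e.g.
# sitemap-a-00001.xml onwards, while leaving the other set untouched.
print("Regenerate sitemap set:", todays_set())
write_index_for_both_sets(files_per_set=3)
```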

This post originally appeared on the Bing Webmaster Blog and is re-published with permission.
