sitemaps
March 13th, 2007 at 11:02 am by digital media works
What is a Sitemap file and why should I have one?
A Sitemap file lets you tell search engines about all the pages on your site and, optionally, give information about those pages, such as which are most important and how often they change. By submitting a Sitemap file, you can take control of the first part of the crawling and indexing process: the search engine's discovery of your pages.
This may be particularly helpful if your site has dynamic content or pages that aren’t easily discovered by following links, or if your site is new and has few links to it.
Sitemaps help speed up the discovery of your pages, which is an important first step in crawling and indexing them. A Sitemap also lets you tell search engines about your pages (which ones you think are most important, how often they change), so you can have a voice in the subsequent steps. Many other factors still influence crawling and indexing, including:
- how many sites link to you
- whether your content is unique and relevant
- whether the search engine can crawl the pages successfully.
A Sitemap provides an additional view into your site (just as your home page and HTML site map do). It does not replace the normal methods of crawling the web. Search engines still crawl and index your site the same way they have in the past, whether or not you use a Sitemap. Sites are never penalized for using one.
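To make the idea concrete, here is a minimal sketch of how a Sitemap with the optional hints could be generated. It assumes the XML format of the public Sitemap protocol (the `urlset`, `url`, `loc`, `changefreq` and `priority` elements and the sitemaps.org namespace); the `build_sitemap` function and the example pages are made up for illustration and are not part of any particular tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Sitemap protocol's XML format.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return Sitemap XML as a string.

    `pages` is a list of dicts (a hypothetical input shape). Each dict
    needs a 'loc' key and may also carry 'changefreq' and 'priority'.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # The hints are optional, so only emit the ones that were given.
        for field in ("changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "http://www.example.com/", "changefreq": "daily", "priority": 0.8},
        {"loc": "http://www.example.com/about/"},
    ]))
```

The second page carries no hints at all, which is still a valid entry: only the URL itself is required.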
What can a Sitemap contain?
Keep in mind the following for Sitemaps of any format:
- A Sitemap can contain a list of URLs or a list of Sitemaps.
- If your Sitemap contains a list of other Sitemaps, you should save it as a Sitemap index file and use the XML format provided for that file type. A Sitemap index file cannot list more than 1,000 Sitemaps.
- A Sitemap file can contain no more than 50,000 URLs and be no larger than 10MB when uncompressed. If your Sitemap is larger than this, break it into several smaller Sitemaps. These limits help ensure that your web server is not overloaded by serving large files to search engines.
- Specify all URLs using the same syntax. For instance, if you specify your site location as http://www.example.com/, your URL list should not contain URLs that begin with http://example.com/. And if you specify your site location as http://example.com/, your URL list should not contain URLs that begin with http://www.example.com/.
- Completely specify each URL. For instance, include the protocol (such as http://) and include all required trailing slashes.
- Do not include session IDs in URLs.
- The Sitemap URL must be encoded so that the web server on which it is located can read it. In addition, it can contain only ASCII characters. It can’t contain upper (extended) ASCII characters, certain control codes, or special characters such as * and {}.
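The size limits above suggest a simple workflow: split a long URL list into chunks of at most 50,000, write one Sitemap per chunk, and list those files in a Sitemap index. The sketch below shows only the splitting and the index, using the limits quoted in this post. The `sitemapindex` and `sitemap` element names come from the Sitemap protocol's XML format; the function names and the `sitemapN.xml` file names are hypothetical, and the 10MB size limit is not checked here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000    # URL limit quoted in the list above
MAX_SITEMAPS_PER_INDEX = 1_000   # index limit quoted in the list above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return Sitemap index XML listing the given Sitemap file URLs."""
    if len(sitemap_urls) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("too many Sitemaps for a single index file")
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
    return ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    urls = [f"http://www.example.com/page{i}" for i in range(120_000)]
    chunks = chunk_urls(urls)
    print([len(c) for c in chunks])
    # Hypothetical file names, one per chunk.
    files = [f"http://www.example.com/sitemap{n}.xml"
             for n in range(1, len(chunks) + 1)]
    print(build_sitemap_index(files))
```

A real script would also check the uncompressed size of each generated file and split further if any one exceeds the size limit.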
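Several of the URL rules above can be checked automatically before a Sitemap is submitted. The following is a rough sketch rather than a complete validator: `check_url` and the `SESSION_PARAMS` set are invented for this example, the list of session parameter names is only a guess at common ones, and rules such as required trailing slashes cannot be verified without knowing the site.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical set of query parameter names treated as session IDs.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def check_url(url, site_root):
    """Return a list of problems found with `url` (empty if none)."""
    problems = []
    if not url.isascii():
        problems.append("contains non-ASCII characters")
    parts = urlsplit(url)
    root = urlsplit(site_root)
    if not parts.scheme or not parts.netloc:
        problems.append("not fully specified (missing protocol or host)")
    elif (parts.scheme, parts.netloc.lower()) != (root.scheme, root.netloc.lower()):
        # Catches http://example.com/ vs http://www.example.com/ mismatches.
        problems.append("does not start with the site location")
    if any(ch in url for ch in "*{}"):
        problems.append("contains an unescaped special character")
    query_names = {name.lower() for name in parse_qs(parts.query)}
    if query_names & SESSION_PARAMS:
        problems.append("contains a session ID parameter")
    return problems


if __name__ == "__main__":
    root = "http://www.example.com/"
    for candidate in (
        "http://www.example.com/about/",
        "http://example.com/about/",
        "/about/",
        "http://www.example.com/cart?sid=abc123",
    ):
        print(candidate, check_url(candidate, root))
```

Returning a list of problems instead of a single true/false value makes it easier to report everything wrong with a URL in one pass.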