An XML sitemap lists the URLs on your website and signals to Google which pages you want crawled and indexed. It does not guarantee rankings but helps search engines discover new and updated content, especially on sites with weak internal linking or deep architecture. This guide explains what XML sitemaps do, what they should include, and the configuration mistakes that can create conflicting crawl signals.
An XML sitemap is a structured file, formatted in Extensible Markup Language, that lists the URLs on your website along with optional metadata about each URL such as when it was last modified, how frequently it changes, and its relative priority compared to other pages on the site.
The sitemap is submitted to Google through Google Search Console and acts as a direct communication channel between your site and Google’s crawlers. Rather than relying entirely on Google to discover every page by following links, the sitemap tells Google explicitly which URLs exist and which ones are worth crawling.
It is important to clarify what a sitemap does not do. It does not force Google to index pages. It does not improve rankings on its own. It does not replace the need for strong internal linking or quality content. A sitemap is a discovery and communication tool, not a ranking lever. Understanding this distinction prevents business owners from over-investing in sitemap optimisation while under-investing in the technical and content factors that actually drive rankings.
XML sitemaps are one component of a broader technical SEO foundation. A technical SEO audit examines sitemap health alongside crawl configuration, schema markup, and Core Web Vitals as part of a complete infrastructure assessment.
Google’s primary method of discovering pages is by following links, both internal links between pages on the same site and external links from other websites. For well-linked sites with clean internal architecture, Google will discover most pages through link-following alone.
The sitemap becomes more important in specific circumstances where link-following alone is insufficient. These include sites with pages that have few or no internal links pointing to them, newly launched sites with limited external links, large sites where some content is buried deep within the architecture, sites that update or publish content frequently and want Google to discover changes quickly, and sites with content that exists in isolation from the main navigation structure.
Google’s own documentation on sitemaps states that sitemaps are particularly useful for new sites and large sites, and that they help Google understand the structure and priority of your content. For small, well-linked sites that have been live for several years, a sitemap provides modest benefit. For growing sites publishing new content regularly, it is a straightforward mechanism for keeping Google informed of what is available to crawl.
When a sitemap URL is submitted to Google Search Console, Google logs when it last crawled the sitemap, how many URLs were discovered from it, and how many of those URLs were successfully indexed. This data is available in the Sitemaps report in Search Console and provides direct insight into how Google is processing the site’s URL inventory.
The most common sitemap mistake is including every URL on the site regardless of whether those URLs should be indexed. A sitemap that includes redirects, noindex pages, 404 error pages, duplicate content URLs, and parameter variations creates conflicting signals for Google and wastes the crawl allocation that the sitemap is intended to support.
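A well-configured XML sitemap includes only URLs that return a 200 status code, are the canonical version of their content, carry no noindex directive, and are not blocked in robots.txt.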
The sitemap should not include session parameters, filter variations, paginated versions of pages unless they contain unique content, internal search result pages, admin or account pages, or any URL that is blocked in robots.txt. Including these URL types adds noise to the sitemap and can slow the discovery of the pages that actually matter.
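The inclusion rules above can be sketched as a simple filter. This is a minimal illustration, not a definitive implementation: the Page record and its field names are hypothetical, and treating every URL with a query string as a parameter variation is an assumption that will be too strict for sites whose canonical URLs use query strings.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Page:
    """Hypothetical crawl record; the field names are illustrative only."""
    url: str
    status: int                      # HTTP status code returned for the URL
    canonical: Optional[str] = None  # target of the canonical tag, if any
    noindex: bool = False            # True if the page carries a noindex directive
    blocked_by_robots: bool = False  # True if robots.txt disallows the URL


def belongs_in_sitemap(page: Page) -> bool:
    """Return True only for indexable, canonical, parameter-free URLs."""
    if page.status != 200:
        return False  # redirects, 404s and server errors stay out
    if page.noindex or page.blocked_by_robots:
        return False  # these would conflict with the sitemap signal
    if page.canonical and page.canonical != page.url:
        return False  # the canonical tag points at a different URL
    if urlsplit(page.url).query:
        return False  # assumption: query strings are session/filter/search variations
    return True


pages = [
    Page("https://example.com/services/", 200, "https://example.com/services/"),
    Page("https://example.com/old-page", 301),
    Page("https://example.com/shop?color=red", 200),
    Page("https://example.com/account", 200, noindex=True),
]
sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
```

In this example only the services page survives the filter; the redirect, the filter variation, and the noindexed account page are all left out.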
For most small to medium business websites, the sitemap will include the homepage, all main service pages, all location pages, all published blog posts, and key informational or conversion pages. This is typically a clean list of 20 to 200 URLs depending on the size of the site.
A correctly formatted XML sitemap follows a specific structure defined by the Sitemaps protocol, which is supported by Google, Bing, and other major search engines. The basic format opens with an XML declaration and a urlset tag that specifies the schema namespace, followed by a url entry for each page.
Each url entry must contain a loc element with the full absolute URL of the page. Optional elements include lastmod, which specifies the date the page was last modified in W3C Datetime format (most simply YYYY-MM-DD), changefreq, which indicates how often the page content changes, and priority, which signals the relative importance of the page compared to others on the site on a scale from 0.0 to 1.0.
In practice, Google's documentation states that it ignores the changefreq and priority values and uses lastmod only when the value is consistently and verifiably accurate. Keeping lastmod values accurate and updated when pages change is therefore the most useful sitemap optimisation available. Setting lastmod to today’s date for pages that were not actually changed is counterproductive and reduces the signal value of the field.
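The structure described above can be sketched with Python's standard library. This is a minimal example, assuming the pages and their real modification dates are already known; the build_sitemap helper and the example.com URLs are illustrative. It writes only loc and lastmod and leaves out changefreq and priority.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML (as bytes) from (absolute_url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()  # YYYY-MM-DD
    # Encoding to UTF-8 bytes also emits the XML declaration.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("https://example.com/", date(2024, 5, 1)),
    ("https://example.com/services/", date(2024, 4, 12)),
])
```

The returned bytes can be written to sitemap.xml in binary mode. ElementTree escapes characters such as ampersands in URLs automatically, which is a common source of invalid hand-written sitemaps.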
For sites with more than 50,000 URLs or a sitemap file that exceeds 50MB uncompressed, the sitemap must be split into multiple sitemap files referenced by a sitemap index file. This is primarily a concern for large e-commerce or directory sites rather than typical small business websites.
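As a rough sketch of the split, the code below chunks a URL list at the 50,000 limit and builds a sitemap index that references one file per chunk. The file naming pattern and helper names are assumptions for illustration, and the sketch checks only the URL count, not the 50MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (as bytes) referencing each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical catalogue of 120,000 product URLs.
urls = [f"https://example.com/product/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
child_files = [f"https://example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))]
index_xml = build_sitemap_index(child_files)
```

Only the index file then needs to be submitted; the child sitemaps are discovered through it.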
In addition to the standard URL sitemap, Google supports extension sitemaps for specific content types. Image sitemaps let you list the images on each page so that Google can find images it might not otherwise discover, which can improve image search visibility. Google deprecated the image caption, geographic location, title, and license tags in 2022, so the image URL is the only image metadata it still reads. Video sitemaps provide structured data about video content, including thumbnails, duration, and description.
For most small business websites, the standard URL sitemap is sufficient. Image and video sitemaps provide incremental benefit for sites where image or video search visibility is a meaningful traffic source, such as photography businesses, real estate listings, or educational video content.
Most established business websites should have an XML sitemap. Creating and submitting a properly configured XML sitemap is a low-effort, low-risk step that provides real benefit for crawl efficiency and new content discovery. The work involved in generating and submitting a sitemap is small relative to the ongoing benefit of faster discovery of new and updated pages.
The more relevant question is whether your current sitemap is correctly configured. A sitemap that includes hundreds of redirect URLs, noindexed pages, and parameter variations is actively creating problems rather than helping. In this case, cleaning the sitemap is more important than whether the sitemap exists at all.
For businesses working through a full-service digital marketing engagement, sitemap configuration is reviewed and corrected as part of the technical foundation work that precedes content and link building. A clean sitemap is a prerequisite for efficient crawling of the content being produced.
Small sites of fewer than ten pages with strong internal linking and several years of indexed history will see minimal practical benefit from a sitemap. For every other category of business website, including any site actively publishing new content, maintaining location pages, or operating across multiple service areas, a correctly configured sitemap is a standard technical requirement.
Most content management systems generate XML sitemaps automatically through built-in functionality or plugins. WordPress sites using Yoast SEO, Rank Math, or All in One SEO generate sitemaps automatically. Shopify generates a sitemap automatically at yourdomain.com/sitemap.xml. Squarespace and Wix both generate sitemaps automatically for published pages.
For custom-built sites without a CMS, sitemaps can be generated using dedicated sitemap generator tools or by building the XML file manually and updating it programmatically when new content is published. The Screaming Frog SEO Spider can crawl a site and export a formatted XML sitemap as a starting point.
Once the sitemap is generated and accessible at a consistent URL, it should be submitted to Google Search Console through the Sitemaps section of the Search Console dashboard. Submitting the sitemap URL registers it with Google and allows you to monitor how many URLs are being discovered and indexed from it. Bing Webmaster Tools accepts the same sitemap URL for Bing discovery.
After submission, the sitemap should remain accessible at the same URL indefinitely. If the sitemap URL changes, the old URL should be redirected to the new one and the new URL submitted to Search Console. A sitemap that moves without a redirect or resubmission stops contributing crawl signals until Google re-discovers it.
The most frequently encountered sitemap problems in technical audits include: including non-canonical URLs in the sitemap where the canonical tag on the page points to a different URL, including paginated URLs that are not the canonical version of the content, including URLs blocked by robots.txt which creates a direct conflict between the sitemap instruction and the robots.txt instruction, failing to update lastmod values when content is genuinely updated, and including URLs that return 301 redirects rather than the destination URL.
Each of these errors dilutes the value of the sitemap and can create confusion in how Google interprets the relationship between the URLs it finds. The crawlability guide covers how sitemap errors interact with the broader crawl configuration of a site and how they should be sequenced in a technical remediation.
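One of these conflicts, sitemap URLs that robots.txt blocks, can be checked with Python's standard urllib.robotparser module. This is a minimal sketch: the robots_conflicts helper and the sample rules are illustrative, and the standard-library parser does not replicate every detail of how Googlebot matches rules (wildcard patterns in particular), so treat its results as a first pass.

```python
from urllib.robotparser import RobotFileParser


def robots_conflicts(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


# Illustrative robots.txt content and sitemap entries.
robots_txt = "User-agent: *\nDisallow: /account/\n"
sitemap_urls = [
    "https://example.com/services/",
    "https://example.com/account/login",
]
conflicts = robots_conflicts(robots_txt, sitemap_urls)
```

Here the account login URL is reported as a conflict and should be removed from the sitemap, or unblocked if it is meant to be indexed.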
For businesses that have never reviewed their sitemap configuration, checking the Sitemaps report in Google Search Console and comparing the number of submitted URLs to the number of indexed URLs is the fastest way to identify whether the sitemap is functioning correctly. A large gap between submitted and indexed URLs indicates either sitemap configuration problems or indexability issues with the pages themselves.
A correctly configured XML sitemap is one element of a technically sound website, not a standalone solution to ranking or crawl problems. It works best when the pages listed in the sitemap are genuinely indexable, internally linked, and contain content strong enough to rank for their target queries.
If you want to know whether your sitemap is correctly configured and contributing to efficient crawling of your most important pages, a technical SEO audit will identify the specific configuration errors and provide a prioritised remediation plan. Every Whissel Strategies engagement begins with this audit, backed by a 90-day performance guarantee. Book a free strategy call to get started.
Most CMS platforms place the sitemap at yourdomain.com/sitemap.xml or yourdomain.com/sitemap_index.xml. You can also check your robots.txt file at yourdomain.com/robots.txt, which often declares the sitemap URL on a line beginning with Sitemap:. Google Search Console shows any submitted sitemap URL in the Sitemaps report.
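The robots.txt check can be automated. The sketch below pulls sitemap URLs out of robots.txt content that has already been downloaded; the helper name is illustrative, and it looks for the sitemap declaration anywhere in the file rather than assuming a position.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs declared on `Sitemap:` lines, in file order."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


# Illustrative robots.txt content.
robots_txt = """\
User-agent: *
Disallow: /account/

Sitemap: https://example.com/sitemap_index.xml
"""
declared = sitemap_urls_from_robots(robots_txt)
```

An empty result means robots.txt declares no sitemap, in which case the common default paths and the Sitemaps report in Search Console are the next places to look.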
Your sitemap should update automatically whenever new content is published or existing pages are significantly updated. Most CMS plugins handle this automatically. If your sitemap is generated manually, it should be updated whenever new pages are added or page URLs change. The lastmod date for each URL should reflect the actual date of the most recent meaningful content change.
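For manually generated sitemaps, one way to keep lastmod honest is to store a hash of each page's content and move the date only when the hash changes. This is a sketch under stated assumptions: the updated_lastmod helper is hypothetical, and hashing treats any difference as a change, so the content passed in should be the main body of the page rather than the full template.

```python
import hashlib
from datetime import date


def updated_lastmod(old_hash, old_lastmod, content, today):
    """Return (hash, lastmod); lastmod moves only when the content changed."""
    new_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if new_hash == old_hash:
        return old_hash, old_lastmod  # unchanged page keeps its old date
    return new_hash, today


body = "Our plumbing services cover the whole city."
first_hash, first_mod = updated_lastmod("", date(2024, 1, 10), body, date(2024, 1, 10))

# Regenerating the sitemap later without editing the page keeps the date.
same_hash, same_mod = updated_lastmod(first_hash, first_mod, body, date(2024, 6, 1))

# An actual edit moves lastmod to the day of the change.
new_hash, new_mod = updated_lastmod(first_hash, first_mod, body + " Now open weekends.", date(2024, 6, 1))
```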
A sitemap does not guarantee that Google will index your pages. It tells Google which pages exist and are worth crawling. Google decides independently whether to index each page based on its quality signals, content, and the overall authority of the domain. Pages with thin content, significant duplicate content, or technical barriers may not be indexed even when listed in a correctly configured sitemap.
Pages with a noindex directive should not appear in the sitemap. Including noindexed pages creates a direct conflict: the sitemap tells Google the page is worth crawling, while the noindex directive tells Google not to include it in the index. Google typically respects the noindex directive, but the conflict wastes crawl allocation on pages that will not produce ranking results.
A site can have more than one sitemap. Large sites with different content types often use multiple sitemaps, such as one for pages, one for blog posts, one for images, and one for products, all referenced by a sitemap index file. This approach makes it easier to monitor indexation performance by content type in Google Search Console and to update specific sections of the sitemap without regenerating the entire file.
A correctly configured XML sitemap is one of the lower-effort technical SEO improvements available and one of the first things to check on any site that is not performing at its indexation potential. Clean it, submit it, monitor it in Search Console, and keep it updated as your content grows.