XML sitemap: 3 steps to improve SEO

SEO admin 27 May 2020

An XML sitemap can be described as a roadmap that helps search engines find their way around our website, so that they can discover (and analyze) its URLs more easily. Since we know how important it is for crawlers to find the pages that matter to us, we return to the XML sitemap and, in particular, to three basic checks that avoid problems and help improve SEO.

Three checks to perform on the XML sitemaps

These suggestions come from an article published by Search Engine Land, which offers a short checklist for the sitemaps we provide to search engine crawlers. It helps avoid errors such as leaving out important URLs (which might then not be indexed) or including the wrong ones.

1. Are important URLs missing?

The first step is to verify that the sitemap includes all the key URLs of the site, meaning those that form the cornerstone of our online strategy.

An XML sitemap can be static, representing a snapshot of the website at the time of creation (and never updated afterwards), or, more effectively, dynamic. A dynamic sitemap is preferable because it updates automatically, but its settings must be checked to make sure they do not exclude sections or URLs that are central to the site.
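To make the idea of a dynamic sitemap concrete, here is a minimal sketch in Python that rebuilds the XML from the current list of pages each time it is called. The page records, the "indexable" flag and the function name are assumptions made for illustration; a real CMS or plugin will expose its own settings, and those are exactly the ones to check so that central sections are not excluded.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the sitemaps.org protocol.
SITEMAP_NS_URI = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[dict]) -> str:
    """Build a sitemap XML string from page records.

    Each record is a dict with a "loc" key and optional "lastmod" and
    "indexable" keys (hypothetical field names for this sketch).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS_URI)
    for page in pages:
        # Skip pages explicitly flagged as not indexable.
        if not page.get("indexable", True):
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        if page.get("lastmod"):
            ET.SubElement(url, "lastmod").text = page["lastmod"]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page list, e.g. read from the site's database on each request.
PAGES = [
    {"loc": "https://www.example.com/", "lastmod": "2020-05-27"},
    {"loc": "https://www.example.com/services/"},
    {"loc": "https://www.example.com/draft/", "indexable": False},
]

sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml)
```

Because the output is regenerated from the page list, a newly published page appears in the sitemap without manual work, while a wrong exclusion rule silently removes it.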

To verify that the relevant pages are all included in the sitemap, we can also run a simple search with Google's site: operator to find out quickly whether our key URLs have been indexed. A more direct method is to use a crawling tool to compare the pages actually indexed with those listed in the sitemap submitted to the search engine.
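The comparison can also be scripted. The following is a minimal sketch in Python, assuming the sitemap is already available as a string and that we maintain our own list of key URLs; the example.com addresses and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Return the set of <loc> values found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_key_urls(xml_text: str, key_urls: set[str]) -> set[str]:
    """Return the key URLs that the sitemap does not list."""
    return key_urls - sitemap_urls(xml_text)


# Hypothetical sitemap and key-page list, for illustration only.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/services/</loc></url>
</urlset>"""

KEY_URLS = {
    "https://www.example.com/",
    "https://www.example.com/services/",
    "https://www.example.com/contact/",
}

missing = missing_key_urls(EXAMPLE_SITEMAP, KEY_URLS)
print(sorted(missing))
```

The comparison is an exact string match, so differences such as a trailing slash or http versus https will show up as missing URLs; that is often useful, since the sitemap should list the same form of the URL that the site actually serves.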

2. Are there URLs to remove?

The second check is the exact opposite: not every URL belongs in the XML sitemap, and it is better to leave out addresses with certain characteristics, such as:

URLs with HTTP status code 3xx / 4xx / 5xx
Canonicalized URLs
URLs blocked by robots.txt
URLs with noindex
Pagination URLs
Orphaned URLs

An XML sitemap should normally contain only indexable URLs that respond with status code 200 and are linked from within the website. Including other types of pages, such as those listed above, can waste crawl budget and potentially cause problems, such as the indexing of orphaned URLs.

Scanning the sitemap with a crawling tool highlights any resources that were included by mistake, so that we can remove them.
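The checks listed above can be expressed as a simple filter over crawl data. The sketch below is written in Python and assumes the crawl results have already been exported by whatever tool we use; the CrawlResult fields and the example.com URLs are hypothetical, not the format of any specific crawler. Pagination is left out because detecting it depends on each site's URL scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlResult:
    """One crawled sitemap URL; field names are assumptions for this sketch."""
    url: str
    status: int                # HTTP status code returned for the URL
    canonical: Optional[str]   # rel="canonical" target, if any
    noindex: bool              # True if a noindex directive was found
    blocked_by_robots: bool    # True if robots.txt disallows the URL
    inlinks: int               # number of internal links pointing to the URL


def removal_reasons(page: CrawlResult) -> list[str]:
    """Return the reasons why a URL should not be in the sitemap (empty if none)."""
    reasons = []
    if page.status != 200:
        reasons.append(f"status {page.status}")
    if page.canonical and page.canonical != page.url:
        reasons.append("canonicalized to another URL")
    if page.blocked_by_robots:
        reasons.append("blocked by robots.txt")
    if page.noindex:
        reasons.append("noindex")
    if page.inlinks == 0:
        reasons.append("orphaned")
    return reasons


# Hypothetical crawl export, for illustration only.
RESULTS = [
    CrawlResult("https://www.example.com/", 200, "https://www.example.com/", False, False, 10),
    CrawlResult("https://www.example.com/old/", 301, None, False, False, 3),
    CrawlResult("https://www.example.com/tag/", 200, "https://www.example.com/", True, False, 0),
]

for result in RESULTS:
    reasons = removal_reasons(result)
    if reasons:
        print(result.url, "->", ", ".join(reasons))
```

A URL that returns an empty list passes all the checks and can stay in the sitemap; any other URL is a candidate for removal or for a fix on the page itself.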

3. Has Google indexed all the XML sitemap URLs?

The last step concerns how Google has interpreted our sitemap. To get a better idea of which URLs were actually indexed, we must submit the sitemap in Search Console and use the Sitemaps report and the Index Coverage report, which show how much of the site the search engine has covered.

In particular, the Index Coverage report lets us check the Errors section (which highlights sitemap problems such as URLs that return a 404 error) and the Excluded section (pages that have not been indexed and do not appear on Google), along with the reasons for their absence.

If these pages are useful, that is, neither duplicated nor blocked, there could be a quality problem or a wrong status code. This applies especially to pages reported as "Crawled - currently not indexed" (Google has crawled the page but has chosen not to add it to the index for now) and "Discovered - currently not indexed" (Google found the URL but postponed the crawl, typically because it expected the site to be overloaded). In these cases it is necessary to intervene with appropriate on-site optimizations.