Noindex: how to use it to exclude pages from Google

The noindex tag works like a red light for search engines: a clear and direct command not to include a page in the search results index. It is one of the most powerful SEO directives at our disposal because, like a kind of switch, it can make a web page invisible in the eyes of search engines. To use it correctly, however, it is essential to fully understand its nature and implications, because the risks in the case of an error are very high.

What the noindex tag is

The noindex tag is a command that can be inserted in the HTML header of a web page to tell search engines that that page should not be included in their index and thus in search results.

More precisely, the noindex tag is a directive specified within the meta tag <meta name="robots" content="noindex"> placed in the HTML header of a web page. When Googlebot encounters this directive during its crawls, it will completely remove the page from Google Search results, regardless of whether other sites contain links pointing to the page.

Noindex is recognized and respected by most search engines: it is not a binding command, but in practice crawlers usually honor it.

It should be understood that noindex does not necessarily prevent the crawler from visiting and analyzing the page or from processing the links on it; the page simply will not be shown in search results. Basically, it’s like saying to Google, “Look, this page exists, but I’d rather you didn’t show it in the search results.”

What the noindex is for

The noindex tag is part of the broader meta robots tag directive, which major search engines began implementing around 2007.

Webmasters generally use the “noindex” directive to prevent the indexing of content not intended for search engines. Essentially, this command is used to prevent non-essential or in-process pages from ending up in the eyes of users searching for relevant content.

For example, if we are testing a new page or have duplicate content that we do not want to weigh on our ranking, noindex is the right tool to “hide” it temporarily or permanently. However, it is not the best solution for quickly removing a page from Google search results: in that situation it is preferable to use the removal tool right away.

In general, the noindex tag is a valuable ally in optimizing our website and ensuring its proper indexing by search engines. By using it judiciously, we can in fact hide unnecessary or harmful content, protecting the quality of our site and its online reputation.

When to use the noindex tag and why to implement it

The noindex tag is a precise SEO tool for managing the visibility of web pages in search engines. Its implementation needs to be thoughtful, as it can significantly affect a site’s online presence, and therefore it is important to know at least broadly when it is appropriate to use noindex and the rationale behind its implementation.

First, noindex is particularly useful for pages under construction or those that contain content not yet ready to be shown to the public. This allows webmasters to work on pages in the background, without them appearing in search results, thus avoiding presenting users with an experience that is incomplete or still under revision.

Another common scenario is the need to hide pages with sensitive content or content that you do not want to be easily accessible. Noindex can serve to maintain a certain level of privacy for content that, by its nature, is not intended for wide distribution.

Duplicate content is another challenge webmasters regularly face. Pages with identical or very similar content can be penalized by search engines because of their redundancy. By using noindex, we can prevent duplicate versions of a page from being indexed, thus focusing authority and ranking on a single, authoritative version, which should be flagged to search engines with the appropriate rel=canonical.
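As a minimal sketch (with example.com standing in for a real URL), the duplicate variants can point to the preferred version through a canonical link element in their <head>:

<link rel="canonical" href="https://www.example.com/preferred-page/">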

Then there are pages that, although integral to a site, do not add value from a search perspective, such as privacy policy pages, terms and conditions, shopping cart or payment pages on an e-commerce site, or post-conversion confirmation pages. These pages, while important from an informational or legal standpoint, are not what a user expects to find through a search. Noindex helps keep the focus on content that actually attracts qualified traffic.

As for outdated, seasonal content or limited-time offers, noindex can be an effective strategy to manage their visibility and the site’s crawl budget. Once an event has passed or a promotion has expired, it no longer makes sense for these pages to take up space in the search engine index, potentially confusing users with information that is no longer current.

A/B test pages or mobile-optimized versions can also benefit from the use of noindex, as it allows for experimentation and testing without affecting the ranking of the main pages.

On a broader level, the use of the noindex tag is useful when we do not have root access to the server and need to manage web page indexing on an individual level, without the ability (or specific expertise) to change server-level configurations. Root access to a server allows us to make deep, global changes, such as modifying the web server configuration file (e.g., .htaccess on Apache servers) or changing the robots.txt file settings, which affect the behavior of search engine crawlers across the site. However, not all website administrators have root access privilege, especially those using shared or managed hosting platforms, where such changes are restricted or managed by the hosting provider. In these cases, the noindex tag becomes a valuable tool because it can be implemented at the individual page level, directly within the HTML code, without the need to change server settings.

Interaction with search engines: how interpretation changes

As noted, the noindex tag is an effective tool for managing the visibility of web pages in search engines because it tells crawlers our preference not to have a particular page listed in search results.

This directive is critical for controlling the indexing of a website’s content: while its implementation is fairly standardized, its “interpretation” is handled slightly differently depending on the search engine in question.

There are in fact minor differences in how each search engine handles the noindex tag, although the general rule is that most search engines will respect the directive not to index marked pages.

In general, when they encounter a noindex tag, search engines should remove the page from the index if it is already there, and not include it in the future.

However, it is important to remember that the noindex tag does not prevent crawlers from accessing the page, but instructs them not to include it in their public indexes. Therefore, this does not mean that crawlers will stop visiting the page: Google, in particular, may continue to explore it, even if very infrequently, to better understand the structure of the site and to collect data on internal and external links.

In addition, we should be aware that even if a page is not indexed, it can still be discovered if it is linked from other indexed pages. In such cases, search engines may show the page URL in search results with a generic title or missing description, since they cannot show the content of the page itself.

In practical terms, the major search engines behave similarly toward noindex: for example, Yandex, the most widely used search engine in Russia, respects the tag (if its bot discovers noindex on a page, it will not include it in its search results) and also provides tools for webmasters to manage the indexing of their pages. Bing (and search engines based on its system, such as Yahoo and DuckDuckGo) does the same: it does not include pages labeled with noindex in its search results and also offers additional tools through Bing Webmaster Tools that allow webmasters to control indexing in a more granular way.

As for Google, the world’s most widely used search engine also makes available a guide to “Blocking Search Indexing with noindex,” in which it states plainly that it respects the noindex tag. When Googlebot crawls a page and detects the presence of the noindex tag within the <head> element, it proceeds to remove the page from the index if it was previously indexed. In addition, Google will not show the page in future search results. However, it is important to note that Google can still visit and analyze the page to gather information about inbound and outbound links, which can be useful for mapping the structure of the site and discovering new content.

Risks of noindex: the mistakes and woes for SEO

As we have noted, the noindex tag must be used carefully, because mistakes can lead to unintended consequences.

Accidentally marking important pages with noindex means excluding them from search results, and thus losing traffic and potential conversions. Put simply, we must use this directive carefully and strategically, making sure to mark only the pages we actually want to hide.

One of the most serious mistakes is precisely the accidental application of the noindex tag to important pages that should be indexed, which can happen during site updates, migrations, or CMS changes. If pages that should generate traffic and conversions are excluded from the search engine index, the site may suffer a significant loss of visibility and, consequently, organic traffic.

Another mistake is to use noindex on pages that have accumulated quality backlinks, which are a signal of trust and authority to search engines: noindex eliminates the possibility that the page can leverage this authority to improve its ranking in search results, also undermining trust in the site as a whole.

It is then essential to remove the noindex tag in a timely manner after completing work on a page under construction or finalizing its content, that is, when we are ready to publish it online: if this is not done, the page will continue to remain invisible in search results, losing opportunities to reach its audience.

The use of noindex on pages that are integral to site navigation can confuse search engines and users: the exclusion of categories or tags can negatively affect users’ ability to find related content and navigate the site effectively.

In the context of international SEO, then, the improper application of noindex to language or regional variants of a page can prevent users from other nations from finding relevant content, damaging the global presence of the site.

Finally, the combined use of noindex with other SEO directives, such as canonical or nofollow, can create conflict and confusion for search engines, leading to unintended results in indexing and ranking pages.

To avoid these negative consequences, it is critical to have a vetting and verification process in place before implementing noindex. That is, we should carefully examine the site architecture and importance of each page before deciding to exclude it from the index. In addition, we should regularly monitor the indexing of the site through tools such as Google Search Console to ensure that the correct pages are visible and that there are no errors.

Blocking indexing: difference between noindex and robots.txt file

We need to clear up a common misunderstanding that people fall into when talking about ways to influence web page indexing, namely the use of the noindex tag and blocking pages via the robots.txt file, which operate separately and, more importantly, have different effects and “weights.”

As mentioned, noindex explicitly tells search engines not to show the page in search results, even if it was previously indexed; crawlers can still visit and crawl the page, allowing the links it contains to be followed and to affect the ranking of other pages on the site.

On the other hand, the robots.txt file provides instructions to crawlers on which pages or sections of the site should not be explored: if a page is blocked by the robots.txt file, crawlers should not access it, which means that links on that page will not be followed. There is a big “but”: if a blocked page has already been indexed or receives links from accessible pages, it may in fact still appear in search results, usually with a title but no description, since search engines cannot parse its content.
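As a minimal sketch (with a hypothetical /private-folder/ path), this is what a robots.txt rule that blocks crawling of a section of the site looks like:

User-agent: *
Disallow: /private-folder/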

In addition, the combined use of noindex and robots.txt can lead to conflicts: if we block a page with robots.txt, crawlers will not be able to access the noindex tag on the page, potentially causing confusion about indexing signals.

As Google says, therefore, for the noindex rule to be effective “the page or resource must not be blocked by a robots.txt file and must otherwise be accessible to the crawler.” If the page is blocked by a robots.txt file or cannot otherwise be accessed, the crawler will never detect the noindex rule and the page may still show up in search results, for example, if other pages contain links back to the page.

Still with reference to Google, it is worth remembering which options are valid for blocking the indexing of a page:

  • Noindex in robots meta tags directly in the HTML code of the page.
  • HTTP 404 and 410 status codes.
  • Protecting pages with passwords.
  • Use of disallow in the robots.txt file.
  • Tool for removing URLs within Google Search Console.

How to use noindex and how to apply it

Let’s move on to the practical aspects.

There are two ways to implement noindex: in the form of the <meta> tag and in the form of the HTTP response header. The effect achieved will be identical, Google’s guide clarifies, and so we can choose the method that is most practical and most appropriate for the type of content we publish on the site.

Google also adds that it does not support the specification of the noindex rule in the robots.txt file. To be precise, this rule was never officially supported and was definitively deprecated on September 1, 2019, when the new rules for indexing pages by Googlebot and, in particular, for exclusion from the search engine index came into effect.

  • Implementing noindex in the HTML code

The most common method of marking a page as unavailable for indexing is to insert the noindex tag directly into the HTML code of the specific page. This can be done by adding the <meta name="robots" content="noindex"> tag within the <head> element of the page. This is a fairly simple operation that can be done manually or through CMSs that offer options to manage such settings.
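Purely as an illustrative sketch (with placeholder content), a page marked this way would look like:

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="robots" content="noindex">
<title>Page not intended for search results</title>
</head>
<body>
(…)
</body>
</html>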

We can also limit this directive to one crawler by specifying the user agent in the meta tag:

<meta name="googlebot" content="noindex">

In this case, we are only preventing Google web crawlers from indexing a page.

  • Implementing noindex with HTTP header

A possible alternative is to use an HTTP X-Robots-Tag header with a noindex or none value, which is usually used to handle non-HTML resources, such as PDFs, video files, and image files. This is an example of an HTTP response with an X-Robots-Tag header that tells search engines not to index a page:

HTTP/1.1 200 OK
(…)
X-Robots-Tag: noindex
(…)

  • Implementing noindex with CMSs

Google says it too: if we use a CMS, such as Wix, WordPress or Blogger, we may not be able to edit the HTML code directly, or it may be preferable not to. In these cases, to block indexing we can refer to the search engine settings page of the specific CMS or to some other mechanism for setting meta tags. More experienced users can choose, for example, configuration via .htaccess files on Apache servers, while less experienced or more cautious users can rely on the reassuring ease of a WordPress plugin and the like. Popular SEO plugins such as Yoast SEO, All in One SEO Pack or Rank Math offer this functionality, through which noindex can be applied to a single page, a group of pages or the entire site.
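As a sketch (assuming an Apache server with the mod_headers module enabled), an .htaccess rule could add the X-Robots-Tag header to every PDF file served by the site:

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>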

Monitoring and maintenance of directives: how to control noindex

After implementing the noindex tag, it is important to monitor the effect it has on site traffic and visibility in search engines, so as to verify that pages are indeed excluded from the index and to identify any implementation problems.

To check the directives set on a web page, there are several methods and tools that can be used to examine the instructions provided to search engines.

  • Examine the Source Code

The first step is to view the source code of the page. This can be done by simply navigating to the page with the browser, right-clicking on it and selecting “View Page Source” (the exact wording may vary depending on the browser). Once the source code is displayed, we can search the text, for example for “noindex”, to check for the presence of the directive.

  • Using SEO Tools

There are many SEO tools, both free and paid, that can help us check the directives set on a web page. For example, Google Search Console allows us to see how Google views the page, including noindex directives and blocking information via robots.txt. Tools such as Screaming Frog allow us to scan the site and gather information on directives such as noindex and canonical, as well as check the status of the robots.txt file. Similarly, SEOZoom’s SEO Spider also provides this crucial information for overall site management.

SEOZoom’s SEO Spider for checking noindex and other directives

  • Other systems

Finally, there are browser extensions that can quickly provide information about a page’s SEO directives as we browse it. Tools such as cURL or online services can also be used to view HTTP headers and check for directives such as X-Robots-Tag: noindex.
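As a quick sketch (with example.com standing in for a real address), a command like the following fetches only the response headers, where any X-Robots-Tag directive would appear:

curl -I https://www.example.com/document.pdf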

How to debug problems

To troubleshoot problems related to the noindex tag, it is first essential that the page is accessible and parsable by search engines, which otherwise cannot detect <meta> tags and HTTP headers.

If the page we blocked is still in the results, it is likely that Google has not yet crawled it since we added the noindex tag. In fact, depending on the importance of the page on the Internet, it may take Googlebot months to visit it again. To speed up the process, we can request a recrawl via the URL Inspection tool available in Google Search Console.

Another reason why Google is unable to see the tag could be that the robots.txt file is blocking this URL from web crawlers, and so we need to unblock the page by editing the robots.txt file.

Finally, as mentioned, it is important that the noindex rule is visible to Googlebot. To check whether the implementation of noindex is correct, we can again use the URL Inspection tool to examine the HTML code received by Googlebot when crawling the page. Alternatively, we can use the Index Coverage report in Search Console to monitor the pages on the site from which Googlebot has extracted a noindex rule.
