Orphan pages: the rooms without doors that cause problems for SEO

SEO admin 31 August 2023

The name is evocative, and it captures the main characteristic of these resources: orphan pages are those that have no incoming links from any other page on the site. They receive no internal links, are practically isolated from the site structure, and are therefore at great risk of being forgotten and ignored by both visitors and crawlers. Even from this summary, it is clear that a large number of such pages can be a problem for SEO, but finding and correcting orphan pages is not complicated, and various tools are available for the job.

The definition of orphan pages

In the SEO field, orphan pages are defined as those that are present and active on the site but have no link pointing to them from any other page. An orphan page can thus be a URL or subpage that is physically present but essentially invisible to browsing users because it is absent from the internal linking structure of the site.

They are called orphan pages because they have no parents (the pages from which a link originates) and are therefore isolated from the main pages of the site. Because of this, orphan pages have no connection to the outside world: a user can reach them only by knowing the direct URL, and even search engine crawlers often have difficulty finding them. These bots move across websites by following the links they find, compiling a list of site URLs to send to the index. Orphan pages sit outside the web (to borrow the classic metaphor of the Web as a spider’s web) and can be discovered by search engines only if they are included in the Sitemap file or receive external backlinks; more frequently they are not indexed at all.

The difference between orphan pages and dead-end pages

However, we must be careful not to confuse orphan pages with dead-end pages: both can create problems for the browsability of our site and its visibility in search engines, but they represent two distinct problems.

An orphan page, as we have just discussed, is a page that has no internal link connecting it to the rest of our site: it is a room without doors, inaccessible unless we know its exact location.

The dead-end page is in some ways the opposite concept: a page that has no outbound links. It is like a room whose doors only lead in, with none leading out. Once visitors or search engine crawlers arrive at a dead-end page, they have no way to continue exploring the rest of our site. In short, dead-end pages do have inbound links but lead nowhere else, and thus trap visitors and prevent them from continuing their exploration.
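The distinction can be made concrete by treating the site as a link graph. The following is a minimal sketch, assuming the internal links are already available as a dictionary that maps each page to the pages it links to; the function name `classify_pages` and the sample paths are hypothetical and serve only as illustration.

```python
def classify_pages(links, home):
    """Split pages into orphans (no inbound links) and dead ends (no outbound links).

    `links` maps each page to the list of internal pages it links to.
    Links from a page to itself are ignored in both directions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Every page that receives at least one link from a different page.
    linked_to = {t for source, targets in links.items() for t in targets if t != source}

    # The home page is the entry point, so it is never reported as an orphan.
    orphans = sorted(p for p in pages if p not in linked_to and p != home)
    dead_ends = sorted(p for p in pages if not [t for t in links.get(p, []) if t != p])
    return orphans, dead_ends


site = {
    "/": ["/blog", "/contact"],
    "/blog": ["/", "/blog/post-1"],
    "/blog/post-1": [],      # reachable, but links nowhere: dead end
    "/contact": ["/"],
    "/old-promo": ["/"],     # links out, but nothing links to it: orphan
}

orphans, dead_ends = classify_pages(site, home="/")
print(orphans)    # ['/old-promo']
print(dead_ends)  # ['/blog/post-1']
```

An orphan is a page with no incoming edge, a dead end is a page with no outgoing edge, and a page can in principle be both at once.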

How orphan pages are created: causes and practical examples

As a rule, orphan pages are not created intentionally, but rather are the result of a series of circumstances and actions that may be beyond our control, such as changes in the site or simple mistakes.

Several situations can produce these URLs: pages for products that are no longer in stock, old news content, or videos that we have removed from menus and internal links but that are still physically present on the server and reachable. Thank-you pages shown after a form submission or a purchase are another type of orphan page: they exist but are not linked from any other part of our site. They are not a real problem, however, because they serve a specific function.

Other causes of orphan pages include incorrect use of the CMS for page creation, a mishandled migration, categories taken offline without a redirect, and test pages that were never deleted (e.g., those used for A/B testing). Then there are two common technical causes that should be addressed immediately, because they essentially create duplicate pages that should automatically and consistently redirect to a single URL: inconsistent handling of the HTTPS/HTTP and www/non-www variants, and inconsistent handling of the trailing slash, the final slash of the path.

To recap, then, the frequent causes that lead to orphan pages on the site are:

Obsolete pages. Examples such as past event pages, products that are no longer available or no longer sold, test pages, or simply old articles and blog posts are all at risk of becoming orphan pages if we decide to remove the link to these pages from menus, archives, categories, and other updated pages without, however, physically removing the obsolete resource.
Changes in structure. For example, removing a page from our main menu or a side navigation bar while forgetting to add a link to that page from another part of our site.
Removal of linking pages. If we remove a page that contains links to other pages, and those links were the only way to reach them, those pages become orphaned.
Programming errors. For example, an error in our code that prevents some links from being displayed, making the pages to which those links should lead inaccessible.
Special cases: thank-you pages and landing pages. As mentioned, some orphan pages are created intentionally, such as thank-you pages that display after filling out a form or landing pages used for marketing campaigns. These pages are often isolated from the rest of the site to avoid distraction and guide the user to a specific action, but if not managed properly they add to a maze of inaccessible pages that only bloat the site.
Page variants. Ideally, every public page on the site should consistently use one protocol (preferably HTTPS) and, just as consistently, one host form, with or without www. To check for errors, we can do a simple test: type the four variants of the site’s home page into the browser, i.e.

– https://www.example.com
– http://www.example.com
– https://example.com
– http://example.com

and check that all four automatically redirect to the exact same URL, which, for consistency, should be set as canonical to itself. If one of these variants does not redirect correctly, similar problems may affect other pages on the site: check other URLs for the offending variant to see whether the error is widespread, then test a sample of pages and review the .htaccess file to make sure the redirects are set correctly.
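The same four-variant test can be scripted instead of typed into the browser. What follows is only a sketch under stated assumptions: the helper names (`home_variants`, `final_url`, `check_variants`) are hypothetical, the network lookup relies on the standard library's `urlopen`, which follows redirects by default, and the demonstration uses a stand-in resolver so that it runs offline.

```python
from urllib.request import urlopen


def home_variants(host):
    """Return the four scheme/www combinations for a host such as 'example.com'."""
    bare = host[4:] if host.startswith("www.") else host
    return [
        f"https://www.{bare}",
        f"http://www.{bare}",
        f"https://{bare}",
        f"http://{bare}",
    ]


def final_url(url, timeout=10):
    """Follow redirects and return the URL that finally answers (makes a network call)."""
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


def check_variants(host, resolve=final_url):
    """Map each variant to its final URL and report whether all four agree."""
    destinations = {variant: resolve(variant) for variant in home_variants(host)}
    consistent = len(set(destinations.values())) == 1
    return destinations, consistent


# Offline demonstration with a stand-in resolver (an assumption for the example);
# omit the `resolve` argument to query the real site.
def fake_resolve(url):
    return "https://www.example.com/"


destinations, consistent = check_variants("example.com", resolve=fake_resolve)
print(consistent)  # True
```

If `consistent` comes back `False`, the `destinations` dictionary shows which variant ends up somewhere else, and that is the variant to test on other pages of the site.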

Paths with trailing slashes. Another thing to pay attention to is the consistent use of trailing slashes. For example, these two URLs may produce the same content, but the URLs are not identical:
– https://example.com/page1/
– https://example.com/page1

To verify that the settings are correct, spot-check a few pages of the site, requesting each with and without the trailing slash, and confirm that one form automatically redirects to the other and that the choice is consistent across pages.
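For the spot check, it helps to generate both forms of each URL and then summarize where the requests ended up. This is a minimal sketch; `slash_variants` and `slash_policy` are hypothetical helper names, and the final URLs passed to `slash_policy` are assumed to have been collected separately, for example by following redirects for each variant.

```python
from urllib.parse import urlsplit, urlunsplit


def slash_variants(url):
    """Return (without_slash, with_slash) versions of a URL's path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    without = urlunsplit(parts._replace(path=path))
    with_slash = urlunsplit(parts._replace(path=path + "/"))
    return without, with_slash


def slash_policy(final_urls):
    """Summarise a sample of final URLs as 'slash', 'no-slash', or 'mixed'."""
    if not final_urls:
        raise ValueError("need at least one URL")
    endings = {urlsplit(u).path.endswith("/") for u in final_urls}
    if endings == {True}:
        return "slash"
    if endings == {False}:
        return "no-slash"
    return "mixed"


print(slash_variants("https://example.com/page1/"))
# ('https://example.com/page1', 'https://example.com/page1/')
print(slash_policy(["https://example.com/page1/", "https://example.com/page2"]))
# mixed
```

A result of `mixed` on a sample of ordinary pages suggests that the redirect rule is missing or applied inconsistently. URLs that point to files (such as a PDF) would need to be excluded from the sample.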

Orphan pages and SEO: why they are a problem

To understand why orphan pages are a problem we can use a simile and imagine our website as a big house: each page is a room and each link is a door connecting the rooms together. Visitors and search engines enter the house and start exploring, moving from one room to another through the doors.

However, if a room has no doors it remains unexplored, ignored, forgotten. This is exactly what happens with orphan pages: they physically exist in the structure of the house, but no one visits them because there are no doors connecting them and allowing entry.

This is why orphan pages are problematic: not only can visitors not find these pages, but search engines may also have difficulty indexing them. And if a page is not indexed, it cannot appear in search results, which means all our efforts to create quality content on that page will be wasted.

The negative effects for SEO

In general, the link structure of a website should be organized consistently to serve two goals: to channel internal link equity (link juice) toward important pages and to ensure a good user experience.

Left as they are, orphan pages are of no value to the site and indeed can become detrimental, especially if present in large numbers.

On the one hand, they create frustrating user experiences because users cannot reach those pages through the natural structure of the site; if there is important or useful information on those pages, it is therefore wasted.

On the other hand, they can affect crawl budget and the quality of visits and conversions: the crawler gathers little data about these pages and cannot build a favorable indexing profile for them, which in the long run can affect ranking and make the website appear to be of lower quality.

Moreover, since they have no internal links, they receive no link equity, and search engines have no semantic or structural context in which to evaluate the page: they have no way of understanding where the page fits into the site as a whole, which makes it more difficult to determine for which queries the page is relevant.

The implications for the site and for SEO

To recap, orphan pages can therefore have a number of negative consequences for the website, including:

Reduced visibility. Orphan pages are not accessible to users via internal site navigation and are often not indexed by search engines, so they have very limited visibility.
Deterioration of SEO. Orphan pages do not contribute to the ranking of the website; they can even harm it, as search engines may see them as low-quality pages.
Increased bounce rate. Users who visit an orphan page are more likely (practically forced) to abandon the website, which increases the bounce rate.
Management problems. These pages take up server space and waste part of the time Google spends crawling the site, taking crawl budget away from the most useful and relevant resources.

As we know, from the perspective of web page discovery, search engines like Google usually find new pages in two ways:

The crawler follows a link from another page.
The crawler finds the URL listed in the XML sitemap.

In order for Google to crawl and subsequently index a page, it must first be able to find it. With orphan pages, discovery through links is impossible, so these URLs often do not get indexed and may never show up in search results.

Even if listed in the XML sitemap, however, orphan pages remain a problem for SEO and one must try to detect and correct them.

How to find all orphan pages on the website

The first step in solving the orphan pages problem is to identify the crawlable pages, that is, to create a complete list of URLs that can currently be reached by following the links on the site.

It is important to have a list of all active URLs (those that can receive hits from crawlers) and then to exclude pages that search engines cannot index because they are marked noindex or blocked by a rule in robots.txt. Crawling should always start from the site’s home page and use the canonical URL, with the correct HTTPS or HTTP protocol and the correct www or non-www version.
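To illustrate what "reachable by following links" means, here is a minimal breadth-first crawl sketch built on the Python standard library. It is not a replacement for a dedicated crawling tool: the `crawl` function and its `fetch_html` parameter are hypothetical, the sample pages are invented for the demonstration, and the sketch deliberately omits noindex and robots.txt handling, which would have to be applied as a separate filtering step.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(home, fetch_html, max_pages=500):
    """Breadth-first crawl from `home`; return the internal URLs reached via links.

    `fetch_html(url)` must return the HTML of a page as a string.
    """
    host = urlsplit(home).netloc
    seen = {home}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        parser = LinkExtractor()
        parser.feed(fetch_html(page))
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments.
            url = urldefrag(urljoin(page, href)).url
            if urlsplit(url).netloc == host and url not in seen:
                if len(seen) >= max_pages:
                    return seen
                seen.add(url)
                queue.append(url)
    return seen


# Stand-in for real HTTP requests: a tiny in-memory site.
pages = {
    "https://example.com/": '<a href="/blog">Blog</a> <a href="https://other.example/">Ext</a>',
    "https://example.com/blog": '<a href="/">Home</a> <a href="post-1#top">Post</a>',
    "https://example.com/post-1": "",
}
reachable = crawl("https://example.com/", lambda url: pages.get(url, ""))
print(sorted(reachable))
# ['https://example.com/', 'https://example.com/blog', 'https://example.com/post-1']
```

Any page that exists on the server but never appears in `reachable` is a candidate orphan. Note that this sketch treats URLs that differ only by a trailing slash or a query string as distinct pages.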

Compare URL lists to discover gaps

Once the crawl is complete, export the list of URLs to an Excel spreadsheet, pasting them into a single column.

Next comes the gap analysis, which compares the crawl against data from other sources and looks for discrepancies: for example, Google Analytics data, Search Console data, Sitemap data, or data from the site’s server log files.
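Of these sources, the Sitemap is usually the easiest to read programmatically. The following sketch extracts the URLs from a standard XML sitemap using the Python standard library; the function name `sitemap_urls` and the sample document are assumptions made for the example, and sitemap index files (which point to other sitemaps) are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> values of a standard XML sitemap as a set."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text}


example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-promo</loc></url>
</urlset>"""

print(sorted(sitemap_urls(example)))
# ['https://example.com/', 'https://example.com/old-promo']
```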

What matters is having complete lists of URLs to compare, so that the “missing” resources reveal the gaps: a spreadsheet lookup such as the MATCH formula, for example, automatically flags which URLs appear in one list but not in the other, and the URLs known from other sources but absent from the crawl are the orphans.
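The same comparison can be expressed as a set difference. The sketch below is one possible way to do it, not a prescribed method: `find_orphans` is a hypothetical function, and the URL sets are invented sample data standing in for the crawl export and for the lists taken from the Sitemap and the server logs.

```python
def find_orphans(crawled, **sources):
    """Return URLs known to other sources but absent from the crawl.

    The result maps each orphan URL to the names of the sources that list it.
    """
    crawled = set(crawled)
    report = {}
    for name, urls in sources.items():
        for url in set(urls) - crawled:
            report.setdefault(url, []).append(name)
    return dict(sorted(report.items()))


crawled = {"https://example.com/", "https://example.com/blog"}
sitemap = {"https://example.com/", "https://example.com/old-promo"}
logs = {"https://example.com/blog", "https://example.com/thank-you"}

report = find_orphans(crawled, sitemap=sitemap, logs=logs)
print(report)
# {'https://example.com/old-promo': ['sitemap'], 'https://example.com/thank-you': ['logs']}
```

The comparison is only as reliable as the lists themselves: all sources should use the same URL form (protocol, www, trailing slash) before they are compared, otherwise identical pages will show up as false gaps.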

How to deal with and fix orphan pages

After performing these steps and finding all orphan pages, it is time to figure out what fate they should have based on some evaluations and considerations:

Is the page relevant?
Does it rank for certain keywords despite everything?
Does it generate visits?
Does it receive backlinks from authoritative external sources?
Does its existence make sense in the site’s taxonomy?
Is it optimized?

If the answers are positive, you should further enhance this page and place it within the internal link structure of the site, simply linking it from an existing regular page; to improve its performance, then, you can update and improve its content if necessary.

Conversely, if the page is useless and, moreover, has duplicate or nearly duplicate content, the best option is to remove it and have the server return an HTTP 404 or 410 status code for its URL, which can also improve crawl budget efficiency.