Case-sensitive URLs for Google: pay attention to upper and lower case

Case sensitivity is easy to overlook when building a site and, in particular, when planning the structure of its URLs. It is the distinction (or lack of it) between upper and lower case letters, which can open up different scenarios and have complicated consequences for our SEO efforts.

What case sensitive means in computing

Case sensitivity describes any text-processing operation in which upper and lower case letters are treated as completely different characters. Two apparently identical words, such as Sugar and sugar, are therefore actually different because of the capital letter: in this example, the capitalized form is the stage name of an Italian songwriter (Zucchero, “Sugar”), while the lowercase form is the common food product.
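To make the definition concrete, here is a minimal Python sketch comparing the two words from the example above, first case-sensitively and then case-insensitively. The function names are hypothetical and chosen only for illustration.

```python
def same_case_sensitive(a: str, b: str) -> bool:
    # Plain equality: "S" and "s" are different characters.
    return a == b


def same_case_insensitive(a: str, b: str) -> bool:
    # casefold() removes case distinctions before comparing.
    return a.casefold() == b.casefold()


print(same_case_sensitive("Sugar", "sugar"))    # False
print(same_case_insensitive("Sugar", "sugar"))  # True
```

A case-sensitive system behaves like the first function; a case-insensitive one behaves like the second.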

Examples of distinction between upper and lower case letters

In the computer world, some environments and languages are case sensitive (they distinguish between upper and lower case characters), while other systems are case insensitive.

Some programming languages, such as BASIC, Pascal and ASP, are case insensitive, while others, such as Java, C, C++ and Python, are case sensitive: in these languages, writing a word in uppercase or lowercase makes a difference.

Operating systems can also be case sensitive or not. MS-DOS and Microsoft Windows make no distinction: they treat the two forms as equivalent and accept commands in either upper or lower case. In contrast, Linux is sensitive to the difference between upper and lower case characters: since most web servers run on Unix-like systems, for many sites there may be a difference between two pages such as «index.html» and «INDEX.HTML».

This distinction also applies to file names: Microsoft Windows does not differentiate between upper and lower case when looking up a file (although most of its file systems preserve the case in which the name was written), while Unix operating systems treat file names in a case-sensitive way.

URLs are a different case again: the path, the query, the fragment and parts of the authority section may or may not distinguish between uppercase and lowercase, depending on the receiving web server; the scheme and the host, however, are case insensitive and by convention are written in lowercase.

Still on the subject of URLs, domain names or hosts are treated as lowercase by both browsers and DNS servers (and are therefore case insensitive in practice); paths (the text after the first slash following the host), on the contrary, are case sensitive, although many websites also normalize this part by automatically converting it to lowercase.
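The different treatment of the URL parts can be sketched in Python with the standard library's urllib.parse. This is only an illustration: the function name normalize_scheme_and_host and the example.com address are assumptions, not something a browser or Google exposes. The scheme and host are lowercased, while the path, query and fragment are left exactly as written because the server may treat them as case sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_scheme_and_host(url: str) -> str:
    """Lowercase only the case-insensitive parts of a URL (hypothetical helper)."""
    parts = urlsplit(url)  # urlsplit already returns the scheme in lowercase
    # Keep any "user:password@" prefix untouched; lowercase host (and port).
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    # Path, query and fragment are passed through unchanged.
    return urlunsplit((parts.scheme.lower(), netloc, parts.path,
                       parts.query, parts.fragment))


print(normalize_scheme_and_host("HTTPS://WWW.Example.COM/Blog/Post?ID=1#Top"))
# https://www.example.com/Blog/Post?ID=1#Top
```

Note that /Blog/Post survives unchanged: whether it is the same resource as /blog/post depends entirely on the server.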

Is Google case sensitive? Pay attention to characters

How case sensitivity is handled therefore also affects SEO and site optimization, especially if we want to avoid errors and be sure that users and search engine crawlers can properly reach our pages.

John Mueller introduced this topic and explained Google’s approach to case-sensitive elements, particularly in URLs. In a nutshell, the search engine is sensitive to the distinction between upper and lower case letters, and exact spelling matters even more for the addresses written in robots.txt files and in redirects, which are case sensitive: when we write redirect rules, in particular, we must not neglect the correct syntax.

Google and URLs: how upper and lower case letters are managed

It is no surprise that, for Google, a change of case (the use of uppercase or lowercase letters) can make one URL different from another, much as a URL with a trailing slash is different from a URL without one, and that this may cause SEO problems such as orphaned pages or duplicate content.

In practice, inserting a capital letter inside the path of a URL creates, in fact, a new URL.

Mueller therefore confirms that the use of uppercase or lowercase characters matters to Google, which is case sensitive: two URLs might look the same and even lead to the same content, but they can be treated as different URLs if one has an uppercase letter and the other does not.
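As a rough illustration of how such variants could be spotted on our own site, the Python sketch below groups a list of URLs by a lowercased copy of host and path and reports the groups that contain more than one spelling. The function name find_case_variants and the example.com addresses are hypothetical; the input list would come, for example, from a crawl or a log export.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_case_variants(urls):
    """Return groups of URLs that differ only by letter case in host or path."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # The key ignores case; the stored value keeps the original spelling.
        key = (parts.netloc.lower(), parts.path.lower(), parts.query)
        groups[key].add(url)
    return [sorted(group) for group in groups.values() if len(group) > 1]


crawled = [
    "https://example.com/Shoes/Red",
    "https://example.com/shoes/red",
    "https://example.com/about",
]
print(find_case_variants(crawled))
# [['https://example.com/Shoes/Red', 'https://example.com/shoes/red']]
```

Each reported group is a set of addresses that a case-sensitive crawler can treat as separate URLs.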

By definition, in fact, “URLs distinguish between upper and lower case” and therefore even such a seemingly trivial element “counts and can make URLs different”.

Canonicalization of the different versions of a URL

When faced with URLs that differ only in their use of uppercase and lowercase, search engines try to work out for themselves whether the pages refer to the same content, thus solving the problem.

However, even if it is handled automatically, this process is not ideal for the site, because Google could take longer to discover and index content: as the company’s Search Advocate explains, “Search engines will try to scan all variants of the URL they find”, and this can slow down the discovery of other useful content on the website.

When it encounters multiple distinct versions of a URL showing the same content, Google starts a process called canonicalization, through which it decides which URL to keep in the SERPs, consolidating all the signals of the other versions into that URL; the URL that ends up being displayed in the search results is known as the canonical URL.

Canonicalization is not exactly a “problem” for the site and its ranking, but it’s good to remember that Google’s systems might choose a different URL from the one we would have preferred, so it can have some impact on results, as well as affecting the crawl budget.

We can tell Google which version of a URL we want to be shown in search results in two ways, which can also be combined: using internal links consistently to point to that version, and adding the rel="canonical" link element, which helps confirm the choice and encourages search engines to focus on that version.
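As a hedged sketch of the second option, the Python function below builds the link element for a page. It assumes, purely for illustration, that the site's preferred version is the one with an all-lowercase host and path; the function name canonical_link_tag and the example.com address are hypothetical, and a real site would take the preferred URL from its own routing or CMS.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(url: str) -> str:
    """Build a rel="canonical" element, assuming the lowercase URL is preferred."""
    parts = urlsplit(url)
    # Assumption: the preferred version uses a lowercase host and path.
    # The query string is kept as written; the fragment is dropped.
    preferred = urlunsplit((parts.scheme, parts.netloc.lower(),
                            parts.path.lower(), parts.query, ""))
    return f'<link rel="canonical" href="{escape(preferred, quote=True)}">'


print(canonical_link_tag("https://Example.com/Blog/My-Post"))
# <link rel="canonical" href="https://example.com/blog/my-post">
```

The same preferred form should then be used in internal links, so that both signals point to one version.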

The robots.txt file is case sensitive

More problematic is a lack of care in the use of uppercase and lowercase letters inside the robots.txt file, where the exact URL plays a crucial role: this document, in which we can “report which parts of a website should not be scanned”, as Mueller reminds us, uses exact URLs.

This means that neglecting syntax and spelling is a serious mistake in the robots file, because if we enter only one of the versions of a URL, the instructions will not apply to the other versions of that URL. More generally, it is worth checking carefully that all paths (directories, subdirectories and file names) are written without mixing uppercase and lowercase inappropriately.
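The effect can be reproduced with the robots.txt parser in Python's standard library, urllib.robotparser, which also matches paths case-sensitively. This is a sketch of the general behavior, not of Google's own parser; the rule, the helper name is_allowed and the example.com addresses are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the rule is written with a capital "P".
ROBOTS_TXT = """\
User-agent: *
Disallow: /Private/
"""


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Check a URL against ROBOTS_TXT with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("https://example.com/Private/report.html"))  # False
print(is_allowed("https://example.com/private/report.html"))  # True
```

The lowercase variant is not covered by the rule, so blocking both spellings would require a separate Disallow line for each one.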

A solvable problem for SEO

In any case, it is Mueller himself who reassures us: at the end of the day, case sensitivity on Google is an aspect that “is not so fundamental for a website”, even if it is a best practice to be consistent in the way we use uppercase and lowercase letters in URLs.

There is a sigh of relief for the handling of URLs in robots files too, because the Search Advocate also reveals that “it is rare that we see that the case sensitivity causes problems”.
