Googlebot crawls sites over HTTP/2 (but there are no ranking benefits)


Starting in November, Googlebot will begin crawling “selected sites” over HTTP/2, with significant efficiency benefits. However, sites that adopt the new protocol will see no real ranking benefit. This is the gist of the announcement published a few hours ago on the Webmaster Central Blog, which also anticipates the effects of this step.

What is the HTTP/2 protocol

In short, the HTTP/2 or h2 protocol is “the next major version” of HTTP, the protocol that the Internet uses primarily for data transfer. The announcement describes it as “much more robust, efficient and faster than its predecessor, thanks to its architecture and the features it implements for clients (the browser, for instance) and servers”.

The benefits of the new version

In terms of speed alone, an increasingly high priority for Google and for the Web as a whole, some studies report that HTTP/2 improves page loading times by 5 to 15 percent on average.

As a guide on developers.google.com explains, the primary goals of HTTP/2 are to “reduce latency by enabling request and response multiplexing, minimize protocol overhead by efficiently compressing HTTP header fields, and add support for request prioritization and server push”.

Its main features are:

  • A single connection established between a browser and a site.
  • Multiplexing: several requests and responses exchanged in parallel over that connection.
  • Binary framing for the exchange of information.
  • Compression of HTTP headers.
  • Server push for responses the server can anticipate.
  • Prioritization of the most important page elements.

What HTTP means and the history of the protocol

A brief digression: HTTP stands for Hypertext Transfer Protocol. It is the system of rules that makes it possible to communicate and transfer files on the World Wide Web between clients (the browser and its applications) and servers (which host the requested site), and it is therefore one of the foundations of our Internet browsing experience.

The first version of this application-level protocol dates back to the late 1980s and is credited to Tim Berners-Lee and CERN in Geneva. HTTP/0.9, the first version actually available, debuted in 1991; it was followed by HTTP/1.0 in 1996 and then by version 1.1, the longest running (the current standard from 1997 to 2015).

HTTP/2 was developed by the Hypertext Transfer Protocol working group (httpbis) of the Internet Engineering Task Force and published as a standard in 2015; it represents the first major revision of the protocol (the most important so far). The latest version, HTTP/3, was announced in 2018 but is still being defined.

As early as 2009, however, Google presented SPDY, its alternative to HTTP/1.1, to address the most critical weakness of the older protocol: it needlessly slows down modern, complex sites. This experimental Google project already delivered great benefits, with page loading speed improving by almost 50 percent compared to HTTP.

The first draft of HTTP/2 used SPDY as its starting point but later diverged from it, choosing for example a fixed Huffman code-based header compression algorithm (instead of SPDY’s dynamic stream-based compression) to reduce the risk of compression attacks on the protocol. In 2015 Google therefore announced that it was abandoning SPDY in favor of HTTP/2, which Chrome supports from version 40 onward.

Differences between HTTP and HTTP/2

According to some sources, the web is currently split between HTTP/1.1 and HTTP/2: the new version is supported by all major browsers and used by about 45 percent of sites, partly because it is very easy to implement.

In fact, HTTP/2 extends, rather than replaces, the previous HTTP standards and does not modify the application semantics of HTTP in any way: all the fundamental concepts, such as methods, status codes, URLs and header fields, remain valid. What changes is the way the data is framed and transported between the client and the server, both of which manage the entire process and hide all the complexity from our applications within the new framing layer. As a result, all existing applications can be delivered without modification.
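To make the idea of framing more concrete, here is a minimal sketch of the fixed 9-byte header that precedes every HTTP/2 frame, following the layout in RFC 7540 (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit and a 31-bit stream identifier). The function names are our own and purely illustrative; a real client or server would rely on an HTTP/2 library rather than hand-built headers.

```python
import struct

# A small subset of the frame types and flags defined in RFC 7540.
FRAME_DATA = 0x0
FRAME_HEADERS = 0x1
FLAG_END_STREAM = 0x1
FLAG_END_HEADERS = 0x4


def pack_frame_header(length, frame_type, flags, stream_id):
    """Build the 9-byte header that precedes every HTTP/2 frame (illustrative helper)."""
    if not 0 <= length < 2 ** 24:
        raise ValueError("payload length must fit in 24 bits")
    if not 0 <= stream_id < 2 ** 31:
        raise ValueError("stream id must fit in 31 bits")
    # 3 bytes of length, then type (1 byte), flags (1 byte), stream id (4 bytes).
    return length.to_bytes(3, "big") + struct.pack(">BBI", frame_type, flags, stream_id)


def unpack_frame_header(header):
    """Split a 9-byte frame header back into (length, type, flags, stream_id)."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes long")
    length = int.from_bytes(header[:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", header[3:])
    # The highest bit of the last field is reserved, so it is masked out.
    return length, frame_type, flags, stream_id & 0x7FFFFFFF


header = pack_frame_header(16, FRAME_HEADERS, FLAG_END_HEADERS, 1)
print(header.hex())
print(unpack_frame_header(header))
```

Methods, status codes and header fields travel inside the payload of such frames, which is why applications written for HTTP/1.1 do not need to change.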

Given the growing focus on the issue, many asked Google when Googlebot would begin crawling on the updated and more modern version of the protocol, leading to today’s decision and announcement.

More efficient crawling for both Googlebot and the server

The reason that prompted Google to make this change is simple: crawling over h2 uses server resources more efficiently, because Googlebot can open a single TCP connection to the server and transfer several files in parallel over it, instead of requiring multiple connections. In practice, the fewer connections are open, the fewer resources the server and Googlebot have to spend on crawling.
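The following toy model illustrates how several responses can share one connection. It splits each response body into small chunks and interleaves them, tagged with a stream identifier, into a single ordered sequence. The round-robin order, the tiny chunk size and the function names are assumptions made for illustration only; real HTTP/2 implementations schedule frames according to priorities and flow control.

```python
from itertools import zip_longest


def chunk(body, size):
    """Split a response body into pieces of at most `size` bytes."""
    return [body[i:i + size] for i in range(0, len(body), size)]


def multiplex(responses, frame_size=4):
    """Interleave several bodies into one list of (stream_id, payload) frames."""
    per_stream = [
        [(stream_id, part) for part in chunk(body, frame_size)]
        for stream_id, body in responses.items()
    ]
    wire = []
    # Take one frame from each stream in turn until every stream is drained.
    for round_of_frames in zip_longest(*per_stream):
        wire.extend(frame for frame in round_of_frames if frame is not None)
    return wire


def demultiplex(wire):
    """Reassemble each body from the interleaved frames using the stream id."""
    bodies = {}
    for stream_id, payload in wire:
        bodies[stream_id] = bodies.get(stream_id, b"") + payload
    return bodies


# Three made-up responses travelling on streams 1, 3 and 5 of one connection.
responses = {1: b"aaaaaaaa", 3: b"bbbb", 5: b"cc"}
wire = multiplex(responses)
print(wire)
print(demultiplex(wire) == responses)
```

Because every frame carries its stream identifier, the receiver can rebuild each file even though the pieces arrive mixed together, which is what removes the need for one connection per parallel download.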

How testing works

Google then announced that an experimental testing phase for this new type of crawling will start in November: initially it will choose a small number of sites on h2 that could take advantage of the features supported at first, such as request multiplexing, and then it will gradually increase the sample.

Googlebot decides which sites to crawl over h2 based on two factors: whether the site supports h2 and whether the site and Googlebot would benefit from crawling over HTTP/2. According to Google, “if your server supports h2 and Googlebot already crawls a lot from your site, you may already be eligible for the connection upgrade and you don’t have to do anything”.

There is no need to worry if “your server still only speaks HTTP/1.1”, because that’s okay too: there is no explicit drawback to crawling over this protocol, and crawling will remain the same “in quality and quantity”.

Opting out of the test

Google’s preliminary tests showed no problems or negative impacts on indexing, but “we understand that, for various reasons, you may want to opt out of crawling your site over HTTP/2”.

To do this, you must instruct the server to respond with an HTTP 421 status code when Googlebot attempts to crawl the site over h2. If this solution is not currently feasible, you can send a message to the Googlebot team, but this is only a temporary option.
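As a rough sketch of the opt-out rule, the function below returns 421 when a request arrives over HTTP/2 and its User-Agent mentions Googlebot, and nothing otherwise. Identifying Googlebot by User-Agent substring is our assumption (the announcement does not say how to recognize it, and the header can be spoofed), and the function name is hypothetical. In practice this rule would normally live in the web server configuration rather than in application code.

```python
def opt_out_status(http_version, user_agent):
    """Return 421 for Googlebot requests over HTTP/2, or None to handle the request normally.

    Hypothetical helper: `http_version` is the protocol string of the request
    (for example "HTTP/2.0" or "HTTP/1.1") and `user_agent` is the raw header value.
    """
    is_h2 = http_version.upper().startswith("HTTP/2")
    # Assumption: Googlebot is recognized by a substring of its User-Agent.
    is_googlebot = "googlebot" in user_agent.lower()
    return 421 if (is_h2 and is_googlebot) else None


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(opt_out_status("HTTP/2.0", googlebot_ua))
print(opt_out_status("HTTP/1.1", googlebot_ua))
```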

The main questions on Googlebot and HTTP/2

At the bottom of the article, Google presents some FAQs that clarify a few points of this operation and help to better understand the benefits, including the practical ones, of the new crawling system.

  1. Why are you updating Googlebot now?

Because the software that allows Googlebot to crawl over h2 “has become mature enough to be used in production”.

  1. Do I need to update my server as soon as possible?

It is up to you. However, “we will only switch to crawling over h2 sites that support it and will clearly benefit from it,” Google says. So, if “there is no clear benefit to crawling over h2, Googlebot will still continue to crawl over h1”.

  1. How do I check if my site supports h2?

Google points to a Cloudflare blog post that describes “a myriad of different methods to test whether a site supports h2”.
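One possible check, sketched here with Python's standard library only, is to open a TLS connection, offer both h2 and http/1.1 through ALPN, and see which protocol the server selects. This is just one approach and not necessarily one of the methods described in the Cloudflare post; the function names are our own, and example.com is a placeholder host.

```python
import socket
import ssl


def negotiated_protocol(host, port=443, timeout=5.0):
    """Return the ALPN protocol the server selects ("h2", "http/1.1") or None."""
    context = ssl.create_default_context()
    # Offer HTTP/2 first and HTTP/1.1 as a fallback.
    context.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()


def supports_h2(selected):
    """True only when the server explicitly picked h2 during the TLS handshake."""
    return selected == "h2"


if __name__ == "__main__":
    # Placeholder host: replace it with the site you want to test.
    print(supports_h2(negotiated_protocol("example.com")))
```

The network call is kept inside `negotiated_protocol`, so the decision logic in `supports_h2` can be exercised without a connection.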

  1. How do I update my site to h2?

It depends on the server, so Google suggests that you “talk to your server administrator or hosting provider”.

  1. How do I get Googlebot to speak h2 with my site?

You cannot: Google reiterates that this only happens “if the site supports h2 and if this would be beneficial to the site and Googlebot”. Conversely, if crawling over h2 does not result in significant resource savings, for example, “we will simply continue to crawl the site over HTTP/1.1”.

  1. Why not crawl all h2-enabled sites over h2?

In preliminary evaluations, Google found little to no benefit for some sites (for example, those with very low QPS, that is, queries per second) when crawling over h2: therefore, “we decided to switch to crawling over h2 only when there is a clear benefit to the site”. However, “we will continue to assess performance improvements and we may change our criteria for the transition in the future”.

  1. How do I know if my site is being crawled over h2?

When a site becomes eligible for crawling over h2, the owners of that site registered in Search Console will receive a message informing them that part of the crawling traffic may be over h2 in the future. You can also check the server logs (for example, the access.log file if the site runs on Apache).
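A log check along these lines could be sketched as follows. The script assumes the Apache combined log format, assumes that HTTP/2 requests appear with "HTTP/2.0" in the request line, and recognizes Googlebot by a User-Agent substring; the sample lines are invented for the example and are not real traffic.

```python
import re
from collections import Counter

# Matches the request line, status, size, referrer and user agent of a combined-format entry.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_protocols(lines):
    """Count the protocol versions used by requests whose User-Agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            counts[match.group("protocol")] += 1
    return counts


# Invented sample entries, for illustration only.
SAMPLE_LINES = [
    '66.249.66.1 - - [12/Nov/2020:10:00:00 +0000] "GET / HTTP/2.0" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [12/Nov/2020:10:00:01 +0000] "GET /about HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [12/Nov/2020:10:00:02 +0000] "GET / HTTP/2.0" 200 5120 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
]

print(googlebot_protocols(SAMPLE_LINES))
```

To run it against a real file, pass an open file object instead of the sample list, for instance `googlebot_protocols(open("access.log"))`, adjusting the path to your server.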

  1. Which h2 features are supported by Googlebot?

Googlebot supports most of the features introduced by h2, but some features, such as server push, which may be useful for rendering, are still being evaluated.

  1. Does Googlebot support plaintext HTTP/2 (h2c)?

No. The website must use HTTPS and support HTTP/2 to be eligible for crawling over HTTP/2. According to Google, this is equivalent to the way modern browsers handle it.

  1. Will Googlebot use the ALPN extension to decide which version of the protocol to use for crawling?

Application-Layer Protocol Negotiation (ALPN) will be used only for sites that have been opted in to crawling over h2, and the only protocol accepted for responses will be h2. If the server responds during the TLS handshake with a version of the protocol other than h2, Googlebot will back off and come back later over HTTP/1.1.

  1. How will the different h2 features help crawling?

According to Google, the most important of the many advantages of h2 are:

  • Multiplexing and concurrency: fewer open TCP connections mean fewer resources spent.
  • Header compression: drastically reduced HTTP header sizes will save resources.
  • Server push: this feature is not yet enabled, as mentioned, but it could be useful for rendering.

  1. Will Googlebot crawl faster over h2?

The main advantage of h2 is the saving of resources, both on the server side and for Googlebot. But whether “we crawl using h1 or h2 does not affect the way your site is indexed, and therefore does not affect how much we plan to crawl from your site”.

  1. Is there any ranking advantage for a site in being crawled over h2?

No, the article answers bluntly.
