Quality raters guidelines: Google updates the evaluators document
After 314 days – that is, 10 months and 10 days – here comes a new update to the Google quality raters guidelines, the document the search engine provides to its external quality raters to guide their work of evaluating the SERPs. A first reading suggests no major interventions or changes, but it is still useful to find out which sections were updated, so as to get a general overview of what matters to Google in search results.
Quality raters guidelines updated on October 14th, 2020
The previous version of the guidelines for Google’s external evaluators dated back to December 5th, 2019 and, as we said in our article, it stressed the importance of setting aside personal biases when analyzing results, in order to provide users with objectively better answers.
Almost a year later – and what a year, this 2020! – Google revised the PDF document and, on October 14th, 2020, published the new updated and expanded version, which grows from 168 to 175 pages.
https://static.googleusercontent.com/media/guidelines.raterhub.com//en/searchqualityevaluatorguidelines.pdf
Among the most useful additions is the introduction of a change log, a register that records the main interventions and changes made over a period of two years, from October 2018 to the present.
What changes with the new version
Barry Schwartz, on Search Engine Land and Seroundtable, quickly studied the main innovations of the new guidelines, at least from a structural point of view. Comparing the two documents with a scanning tool, he found 281 content replacements, 233 content insertions and 209 content deletions, most of which seem stylistic.
More specifically, he noted that Google added a section explaining the role of examples in the guidelines, moved the section on the relationship between Page Quality and Needs Met, and added a section on evaluating dictionary and encyclopedia results for different queries.
What is new in the latest version of the quality raters guidelines
The change log gives us some more detail on the changes made to the document. Google has:
- Added a note to reiterate that evaluations do not directly affect the order and placement of search results.
- Separated “The role of examples in these Guidelines” into an independent section in the introduction.
- Added a clarification on Special Content Result Blocks, which may have links to a landing page, including an illustrative example.
- Updated the indications on how to evaluate pages with malware warnings and when to assign the Did Not Load flag, inserting illustrative examples.
- Reordered the Rating Flags and Relationship between Page Quality and Needs Met sections for greater clarity.
- Added the Rating Dictionary and Encyclopedia Results for Different Queries section, which emphasizes the importance of understanding the search intent behind the query when evaluating Needs Met, with illustrative examples.
- Made minor changes throughout the document, such as updating examples and explanations for consistency, simplifying the language explaining that raters represent people in their rating locale, correcting typos, etc.
The section on dictionary and encyclopedia ratings
As mentioned, the newest part concerns how to evaluate results for dictionary- and encyclopedia-related queries against the user’s intent and needs.
The guidelines say that “when assigning Needs Met ratings for dictionary and encyclopedia results, it is necessary to pay special attention to the user’s intent”.
As with all other queries and results, Google points out, “the usefulness of dictionary and encyclopedia results depends on the query and intention of the user”. In particular, “the results of dictionaries and encyclopedias may be topically relevant for many searches, but often these results are not useful for common words that most people in your local evaluation area already understand”.
Therefore, the instruction is to reserve high Needs Met ratings for dictionary and encyclopedia results only when the user’s likely intent for the query is “what is it” or “what does it mean”, and the result is useful for users looking for that type of information.
The importance of updated evaluations and statistics
As we know (and as Google is keen to point out at every turn), quality raters do not directly influence the ranking of results on the search engine, but their work helps improve the quality of SERPs and ensure that users find the answers they are really looking for.
Basically, they provide feedback that helps Google improve its algorithms. And, as we have noticed on various occasions, updates to the guidelines tend to slightly precede broad core updates; without hypothesizing a link between the two factors, we simply note the “coincidence”.
Recently, Google has also updated the statistics on the work of its quality raters, as noted by Jennifer Slegg on The SEM Post, to reflect changes made in 2019 to search results.
The numbers behind the quality raters’ work
In 2019, Google ran 464,065 experiments with its quality evaluators, which led to 3,620 “search improvements”. That is a significantly lower number of experiments than in 2018 (when 654,680 were run) but, curiously, more improvements to results (3,234 were reported in 2018); in 2017 there were about 200,000 experiments that led to 2,400 updates.
More specifically, Google revealed that it performed 383,605 quality tests in 2019 with its evaluators, again a significant reduction compared to 2018 (595,429 tests). These quality tests are generally performed to ensure that searches return the highest quality results, based both on the overall search results and on the sites or pages that appear in the top positions of the SERPs.
Side-by-side experiments and live traffic
One of the ways quality raters evaluate search results is a kind of A/B test: they are shown a double panel in the rating hub that displays, in random order, the current search results on one side and a potential update to those results on the other. The raters must rate both versions and assess which side they consider to provide the better search results, looking not only at the traditional “ten blue links” but also at the overall results, which include the several Google features that, as we analyzed a few days ago, are increasingly predominant and take space and visibility away from organic results.
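The side-by-side setup described above can be sketched in a few lines of Python. This is purely illustrative – it is not Google’s actual rating hub, and the function names are our own invention – but it shows the two key ideas: the result sets are presented in random order so raters cannot tell which side is the experiment, and each rater’s preference is recorded per comparison.

```python
import random

def present_side_by_side(current_results, experimental_results):
    """Randomly assign the two result sets to the left/right panels,
    so the rater cannot tell which side is the experiment."""
    pair = [("current", current_results), ("experiment", experimental_results)]
    random.shuffle(pair)
    return pair  # two (label, results) tuples in random order

def preferred_variant(left, right, left_rating, right_rating):
    """Return the label of the side the rater scored higher, or None on a tie."""
    if left_rating == right_rating:
        return None
    return left[0] if left_rating > right_rating else right[0]
```

Aggregated over many raters and many queries, preferences like these tell the search team whether the candidate update actually produces better-judged SERPs than the current system.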
Moreover, also in 2019, Google conducted 17,523 experiments on live traffic (this time an increase, compared to 15,096 the previous year): these are experiments tested with real searchers, in which Google tries “things like colors, borders and different font sizes used in search results on a very limited basis to understand how the general public reacts”. Google enables these types of features for a small percentage of users, “usually starting at 0.1 percent”, and in some cases these live tests make changes “so minor that you don’t even notice, like the 41 shades of blue tested by Google several years ago”, Slegg notes.
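A common way such limited rollouts are implemented in practice is deterministic hash-based bucketing – again an illustrative sketch, not Google’s actual mechanism: hashing a stable user identifier together with the experiment name yields a reproducible bucket value, and only users whose bucket falls below the rollout percentage (e.g. 0.1%) see the experimental change.

```python
import hashlib

def in_experiment(user_id: str, experiment_name: str, rollout_percent: float = 0.1) -> bool:
    """Deterministically decide whether a user is in an experiment.
    Hashing user_id + experiment_name gives a stable pseudo-random value
    in [0, 100); users below the rollout threshold see the change."""
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100000 / 1000  # value in [0, 100), 0.001 granularity
    return bucket < rollout_percent
```

Because the assignment is a pure function of the identifier, a given user consistently sees (or does not see) the variant across sessions, while different experiments slice the user base independently.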