Results quality: the tools Google uses to defend Search
For many people – that is, more than 92 percent of all Web users, according to the latest global statistics – Google Search is THE place to go to find information on any topic, “whether it’s learning more about a problem or verifying the claims of a friend quoting a statistic about your favorite team”.
Every day Google receives billions of queries – 15 percent of which are entirely new, never-before-searched queries, a consistently high figure – and one of the reasons people keep turning to this search engine “is that they know that they can often find relevant and reliable information they can trust”.
Google’s efforts to provide reliable results
So writes Danny Sullivan, Google’s Public Liaison for Search, in an article published on The Keyword describing the latest evolution of the systems with which Google analyzes and protects search results against manipulation, vandalism and, more generally, poor-quality responses, intensifying its efforts to provide relevant and reliable information.
A companion piece comes from Pandu Nayak, Google Fellow and Vice President of Search, who lists and summarizes the company’s latest investments in the quality of information in Search and News.
There are no particularly new or shocking revelations, but knowing the efforts Google makes in this area can serve as an incentive to constantly improve your site, so as to ensure the highest-quality content – relevant, reliable and accurate – and an overall positive user experience.
An increasingly better understanding of web content’s quality
Offering a high-quality search experience is part of what makes Google so useful, says Sullivan, and “since the earliest days, when we introduced the PageRank algorithm, understanding the quality of web content was what distinguished Google from other search engines”.
As users become more aware, however, there are growing questions about what “quality” means and what assurances there are that “the information people find on Google is reliable”.
The answer lies in the “three key elements in our approach to information quality”:
- First, “we design our ranking systems to identify information that people will presumably find useful and reliable”.
- As a complementary element, “we have also developed a number of features that not only help you make sense of all the information you see online, but also provide direct access to information from authorities, such as health organizations or government agencies”.
- Lastly, “we have policies for what can be displayed in Search features to make sure we show you useful and high quality content”.
These three approaches allow the system to keep on improving and increasing its quality level, always with the aim of providing a reliable experience for people around the world.
Ranking systems built on quality
To understand what results are most relevant to each query, Google uses a variety of language understanding systems “that aim to match the words and concepts in your query with the related information in our index,” Sullivan explains.
These range from “systems that understand things like spelling errors or synonyms to more advanced, AI-based ones, such as BERT, capable of understanding more complex, natural-language queries”. Such updates to language understanding systems “certainly make search results more relevant and improve the overall experience”, but a gap remains: even tools with advanced abilities to understand information cannot understand content the way humans do. And so, “we often cannot tell from words or images alone whether something is exaggerated, wrong, low quality or otherwise useless”.
How the search for quality works
What search engines can do, however, is broadly assess the quality of content through what are commonly called signals, which are “clues about the features of a page that align with what humans might interpret as high quality or reliable”. An example quoted in the article is “the number of quality pages that link to a given page”, which is a signal “that that page can be a reliable source of information on a topic”.
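The link-counting signal described here is, in spirit, the idea behind PageRank, which the article mentions earlier. As a purely illustrative sketch – the toy graph and the damping factor are invented for this example and have nothing to do with Google’s production systems – a minimal power-iteration version looks like this:

```python
# Minimal PageRank power iteration: an illustrative sketch of a
# link-based quality signal, NOT Google's actual implementation.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with uniform rank
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                     # dangling page: spread rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:                                # share rank across outgoing links
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# A page that several other pages link to accumulates a higher score.
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
print(max(scores, key=scores.get))  # "c" gathers links from both a and b
```

The point of the sketch is only the intuition: rank flows along links, so pages that many (well-ranked) pages point to end up with higher scores.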
The work of quality raters and the value of E-A-T
Google considers a variety of other quality signals and runs tests to check whether their combination works; in addition, it employs over 10,000 quality raters who “perform millions of sample searches and evaluate the quality of the results based on how well they measure up to what we call E-A-T – Expertise, Authoritativeness and Trustworthiness”.
The raters, following the instructions in the Search Quality Rater Guidelines – public and periodically updated, most recently in December 2019 – analyze the results for sample queries and evaluate whether, and how many of, the listed pages demonstrate these quality features.
A process in constant evolution
Sullivan emphasizes two points: quality raters help Google in the search evaluation process, but their assessments are not used directly in the ranking algorithms. These external collaborators “provide data that, when taken in aggregate form, help us measure the functioning of our systems to show quality content in line with the way people evaluate information in every part of the world, helping us to improve our systems and ensure high quality results”.
Special attention to YMYL topics
A specific focus is then dedicated to YMYL content, “topics in which quality information is particularly important, such as health, finance, civic information and crisis situations”, where Google places an even greater emphasis on factors of expertise and trustworthiness.
This decision stems from the finding that “sites that demonstrate authority and expertise on a topic are less likely to publish false or misleading information”; thus, by building “our systems to identify signals of those features, we can continue to provide reliable information”, and this work “is our greatest defense against low-quality content, including potential misinformation”.
Information from experts directly in Search
In most cases, Sullivan says with a hint of pride, “our ranking systems do an excellent job of making it simple to find relevant and reliable information from the open Web, especially for topics such as health or in times of crisis”. But in these areas Google is also developing features that surface information directly in Search from authoritative organizations such as local governments, health agencies and election commissions.
For example, “we have long had knowledge panels in Search with information on health conditions and symptoms, reviewed by medical experts”, and recently “we have seen a significant increase in people seeking information on unemployment benefits, so we have worked with government agencies to highlight eligibility details and how to access this service”; moreover, “for many years we have been offering features that help you find out how to vote and where your polling station is”. These features help ensure that people get essential guidance when they need it most.
Knowledge Graph to provide accurate information
Pandu Nayak provides some additional details on this topic in the other article, focusing in particular on the use of the Knowledge Graph to quickly surface facts from sources on the Web: to ensure accurate, high-quality information in these features and to protect against potential acts of vandalism, Google has established partnerships with government agencies, health organizations and Wikipedia.
For example, to meet information needs around COVID-19, Google has “partnered with healthcare organizations around the world to provide guidance and local information to keep people safe”, and is doing the same to respond to emerging information needs, “such as the surge of people seeking unemployment benefits, to whom we provide easy access to information directly from government agencies in the United States and other countries”, or to report on elections, “working with non-partisan civic organizations that provide authoritative information on voting methods, candidates, election results and more”.
The sources of the Knowledge Panel
Information in the knowledge panels “comes from hundreds of sources, and one of the most comprehensive knowledge bases is Wikipedia”, which has built robust systems to protect neutrality and accuracy through its community, using “machine learning tools combined with complex human supervision to detect and deal with vandalism”; today, “most vandalism on Wikipedia is reverted within minutes”.
To complement Wikipedia’s systems, Google has “added additional protections and detection systems to prevent potentially inaccurate information from being displayed in knowledge panels” and to intervene in the rare cases of Wikipedia vandalism that slip through. According to Nayak, “we have improved our systems to now detect 99 percent of those cases of potential vandalism, and when these problems occur, we have policies that allow us to act quickly to solve them”.
In addition, to further support the Wikipedia community, last year Google created the WikiLoop program, “which hosts several editor tools focused on content quality”, including WikiLoop DoubleCheck, “one of the many tools that Wikipedia editors and users can use to track changes on a page and report potential problems”. Google also provides “data from our tracking systems, which members of the community can use to discover new information”.
Helping people make sense of information
Google Search is also evolving into a tool for digging deeper into the news or learning more about a topic, and increasingly often “people use Search after hearing information elsewhere, with the aim of seeing what others say in order to form their own opinion”.
The goal of the search engine is therefore also to give users the right tools “to make sense of the information they are seeing online, to find reliable sources and explore the complete picture on a topic”, and this is achieved through various methods.
The most recent is the ability to easily spot verified news in Search, News and now even in Google Images through fact-check labels, which “come from publishers who use the ClaimReview markup to mark up the fact checks in the articles they have published”. For years, moreover, “we have offered Full Coverage in Google News and Search, helping people to explore and understand how stories have evolved and to explore different angles and perspectives”.
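For reference, ClaimReview is a schema.org structured-data type that publishers typically embed in fact-check articles as JSON-LD. A minimal sketch – the URL, claim and verdict below are placeholders invented for illustration, not a real fact check – might look like this:

```json
{
  "@context": "https://schema.org",
  "@type": "ClaimReview",
  "url": "https://example.com/fact-check/phone-voting",
  "claimReviewed": "You can vote by phone in the general election.",
  "itemReviewed": {
    "@type": "Claim",
    "datePublished": "2020-09-01"
  },
  "author": {
    "@type": "Organization",
    "name": "Example Fact-Check Desk"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "1",
    "bestRating": "5",
    "worstRating": "1",
    "alternateName": "False"
  }
}
```

Search can read the verdict in `reviewRating` (here “False”) and surface it as the fact-check label shown alongside the result.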
Google’s fact checking
On this topic, too, the article by Pandu Nayak offers additional detail, recalling how Search and News are designed to help users see the big picture and easily understand the context behind the information they find online.
This year alone, through September 10, users have seen fact checks on Search and News more than 4 billion times – more than in all of 2019. In addition, Google supports the ecosystem committed to exposing misleading information and recently donated an additional 6.5 million dollars to help fact-checking organizations and nonprofits focus on disinformation about the pandemic.
The use of BERT to improve the reliability of information
We then learn that an update has “just launched using our BERT language understanding models to improve the matching between news stories and available fact checks”.
These systems can better understand “whether a fact-check claim is related to the central topic of a story, and highlight those fact checks more prominently in Full Coverage, a News feature that provides a complete picture of how a story is reported by a variety of sources”.
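Nayak does not describe the matching mechanics, but the general idea – scoring how related a fact-check claim is to the central topic of a story – can be illustrated with a toy similarity measure. The sketch below uses simple bag-of-words cosine similarity as a crude stand-in for comparing BERT-style learned representations; the texts are invented for the example:

```python
# Toy claim-to-story matching: bag-of-words cosine similarity as a
# crude stand-in for BERT embeddings. Texts are invented examples.
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Relatedness score in [0, 1] based on shared word counts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

story = "new claims circulate about an unproven covid treatment"
related_check = "fact check: the claimed covid treatment is unproven"
unrelated_check = "fact check: the stadium photo is from 2014"

# The related fact check scores higher, so a system like this could
# rank it more prominently alongside the story.
assert cosine_similarity(story, related_check) > \
       cosine_similarity(story, unrelated_check)
```

A real system would use learned embeddings rather than word counts precisely because, as the article notes, surface words alone are a poor guide to meaning; the sketch only shows the shape of the ranking step.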
With a simple tap, Full Coverage “lets you see the headlines from different sources, videos, local news, FAQs, social commentary and a timeline for stories that have unfolded over time”.
The policies to defend all Search features
The last topic covered by the Public Liaison for Search concerns the policies applied to guarantee and defend the quality of the results shown in the broader features of Search, such as knowledge panels, featured snippets and Autocomplete, which “highlight and organize information in unique ways or predict the queries you might want to run”.
For these features Google applies very high quality standards and follows specific guidelines, which sites should consult to understand what content may appear in those spaces. In particular, automated ranking systems work to surface useful content but, since they are not always perfect, they are aided by human teams that work to prevent the display of content violating the rules and take action against those responsible for violations.
Safeguarding the Autocomplete feature
The Vice President of Search speaks in more detail about Autocomplete, recalling the long-standing policies that “protect against the display of offensive and inappropriate predictions in autocomplete”. Google’s systems are designed to enforce those rules automatically, and “we have improved our automated systems not to show predictions if we detect that the query may not lead to reliable content”.
In recent weeks, Google has “expanded our autocomplete policies related to elections, and we will remove predictions that could be interpreted as claims for or against any candidate or political party”. The removal will also apply to “predictions that could be interpreted as claims about participation in elections – for example, statements about voting methods, requirements or the status of polling locations – or about the integrity or legitimacy of electoral processes, such as election security”.
Basically, this means that predictions like “you can vote by phone”, “you cannot vote by phone” or “donate to” any party or candidate should not appear in autocomplete – but, of course, this does not prevent users from still searching for whatever they want and finding appropriate results.
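The behavior described – suppressing a prediction while leaving the query itself fully searchable – can be sketched as a policy filter applied after prefix matching. Everything below (the candidate predictions, the policy patterns) is invented for illustration and is not Google’s actual system:

```python
# Toy autocomplete: predictions are filtered by policy, but the user
# can still type and run any query they like. All data is invented.

POLICY_PATTERNS = ("you can vote by", "you cannot vote by", "donate to")

CANDIDATES = [
    "you can vote by phone",
    "you can vote early in person",
    "voter registration deadline",
    "volleyball rules",
]

def autocomplete(prefix, candidates=CANDIDATES):
    """Return predictions matching the prefix, minus policy violations."""
    matches = [c for c in candidates if c.startswith(prefix.lower())]
    # The filter drops predictions, not queries: typing the full query
    # and searching it still works normally.
    return [m for m in matches
            if not any(p in m for p in POLICY_PATTERNS)]

print(autocomplete("you"))  # ['you can vote early in person']
```

The design point is the asymmetry the article describes: the denylist acts on what the box suggests, never on what the user is allowed to search.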
Google’s investments for the quality of Search and News
The article by Pandu Nayak – published in the same hours as the previous one – also devotes a more specific paragraph to the investments made by the company to ensure quality results in Search and News.
“Providing high quality results is what has always distinguished Google from other search engines, even in our early days; over the years, as the product and user experience have evolved, our investments in quality have increased”, begins the Vice President of Search, who also provides numbers that quantify this effort.
Google performs “in-depth tests to ensure that Search is as useful as possible, from the quality of the information we provide to the overall experience: since 2017 we have run more than 1 million search quality tests, and we now average over 1,000 tests per day”.
Growing attention to the usefulness of information
This year more than ever, ensuring access to reliable information has been crucial, not only because of the Covid-19 pandemic – the biggest topic in the history of Google Trends, so much so that it completely disrupted search volumes, as also seen with our Covid Impact tool – but also for updates on natural disasters and for various moments of civic participation around the world.
For this reason, Google has made new improvements to continue to provide high-quality information, always in the name of a “long-standing commitment to quality, which remains at the heart of our mission to make information about the world both accessible and useful”.
New insights from the Intelligence Desk
The information landscape can change rapidly with the many new events that unfold around the world every day: to understand how its systems behave when news breaks, Google has set up an Intelligence Desk that is “actively monitoring and identifying potential information threats”.
Nayak explains that “this effort grew out of our Crisis Response team, which for years has monitored events around the world in real time, launching SOS alerts in Search and Maps to help people quickly get vital information”, and over the years has “tracked thousands of events and launched hundreds of alerts to help keep people safe”, as we can see in the picture below.
The Intelligence Desk is a global team of analysts that monitors news events 24 hours a day, 7 days a week, covering natural disasters and crises, breaking-news moments and the latest developments in ongoing topics such as the Coronavirus. In detail, “when events occur, our analysts collect data on how our systems are responding and compile reports on the narratives that are emerging, such as new claims about COVID treatments”. Google’s product teams then “use these data sets and reports from the Intelligence Desk to perform more reliable quality tests and ensure that our systems work as expected for the wide range of topics people search for”.
Improved systems for breaking news and crises
The speed at which situations evolve demands a change of gear from Google too: developing news can suddenly make even the most recently published information on the Web obsolete, or at least inaccurate and therefore unreliable, and more generally “people’s need for information can accelerate faster than facts can materialise”.
For this reason, in recent years Google has improved its systems “to automatically recognize the latest news on times of crisis such as natural disasters and make sure we return the most authoritative information available”.
Progress has also been made in the overall ability to identify breaking-news moments accurately and faster: “We have improved our detection time, which only a few years ago was 40 minutes, to just a few minutes from when news breaks”, writes Nayak, while continuing to ensure reliable, quality results, with analysis of topics that may be susceptible to hate speech and offensive or misleading information.