Google and spam: 25 billion pages with issues discovered every day


“Every search matters”: with this phrase, which sums up the company’s philosophy, Google opens the post presenting the results of its antispam team’s work during 2019. The most striking figure concerns the new spam pages the search engine discovers every day, more than 25 billion, but the report also offers useful indications of which areas are the most problematic and of the American giant’s overall commitment in this field.

Google’s battle against spam

The article on the official Big G blog is signed by Cherry Prommawin, Search Relations, and Duy Nguyen, Search Quality Analyst, who start by recalling the importance of Google’s antispam activity: “Whenever you come on Search to find useful and relevant information, it is our constant commitment to ensure that users receive the results of the highest possible quality,” they write.

Unfortunately, on the web “there are some disruptive behaviors and contents that we call webspam and can degrade the experience of people looking for useful information”. Google has therefore had to put in place “a number of teams working to prevent the webspam from appearing in search results”, because “staying ahead of spammers is a constant challenge”.

The definition of spam

The report is also accompanied by an article written by Danny Sullivan on The Keyword, which deals with the topic more generally and starts from Google’s definition of spam.

Google defines spam as the use of techniques that attempt to imitate the high-quality signals Google looks for without actually delivering on the promise of high-quality content, or the use of other tactics that could prove harmful to users. This covers various techniques commonly known as black hat SEO: page scraping, keyword stuffing, participation in link schemes and sneaky redirects.

The numbers of the spam battle on Google in 2019

The Google index is made up of “hundreds of billions of web pages serving billions of queries every day”, and therefore “perhaps it is not too surprising that there continue to be bad actors trying to manipulate the search ranking”. The spam threat is also one of the reasons why Google remains careful to reveal few details about how its systems work.

More surprising, perhaps, is the scale of the phenomenon: “more than 25 billion pages that we discover every day are spam,” Google says. “It’s a lot of spam” and this figure “demonstrates the scope, persistence and duration of the activities that spammers are willing to do”.

The results of the antispam work

Google is constantly working to “ensure that the chance to meet spam pages in Search is as low as possible”, and efforts over the years have “helped to ensure that over 99% of visits to our results lead to spam-free experiences”. At the same time, the company works with webmasters “to ensure that they follow the best practices and can be successful in Search, making great contents available on the open Web”.

The content identified as spam is often demoted or removed completely from search results.

The main forms of web spam in 2019

In the 2018 Google Webspam Report, the company reported that it had reduced user-generated spam by 80%; thanks to the work done in 2019, “this type of abuse has not grown,” says the new report.

Link spam remains a difficult problem, but here too the Google team is achieving good results and limiting its impact. Specifically, Google’s systems caught more than 90% of link spam, and “techniques such as paid links or link exchange have been made less effective,” explain Prommawin and Nguyen.

Hacked spam, which remains “still a commonly observed challenge”, has shown more stable numbers than in previous years. Google has continued working on solutions to detect the problem, to notify the affected webmasters and platforms more effectively, and to help them recover compromised websites with Search Console tools and the site reconsideration process.

Spam trends in the 2019 Google Webspam Report

One of Google’s top priorities in 2019 was “improving our ability to fight spam through machine learning systems”. Such systems are not perfect and need to be combined with human review: a team of spam analysts who check whether pages or sites violate the guidelines. This mix has been crucial in identifying spam results and preventing them from reaching users.

In recent years, spam sites with automatically generated and scraped content have increased, as have spammy sites that use “behaviors that annoy or harm users, such as fake buttons, huge and oppressive advertisements, suspicious redirects and malware”. These websites “are often deceptive and do not offer any real value to people,” says Google, but the team’s effort has allowed it to “reduce the impact on search users from this type of spam by more than 60% compared to 2018”.

Google’s antispam action

Prommawin and Nguyen also write that Google keeps improving its “ability and efficiency in identifying and capturing spam, continuously investing in reducing broader types of damage, such as scams and fraud”. Sites of this kind trick people into thinking they are visiting an official or authoritative website, and in many cases users may end up disclosing sensitive personal information, losing money or infecting their devices with malware. During 2019, Google’s antispam teams “paid close attention to scam and fraud-prone queries and worked to keep up with spam tactics in those spaces to protect users”.

Everyone’s contribution to defeating spam

Much of Google’s work against spam consists of using automated systems to detect spammy behavior, but even so “it is not possible to capture everything”, admits the American giant. This is why everyone involved in Search needs to contribute: for instance, as someone who uses the search engine, “you can help us to fight spam and other problems by reporting spam in results, phishing or malware”.

Over the course of 2019, users submitted nearly 230,000 search spam reports, and Google was able to act on 82% of the reports it processed. These reports help keep search results clean.

How Google’s antispam activity works

The article also summarizes how Google’s antispam activity proceeds when the system receives user reports or automatically identifies a problem: “an important part of what we do is to alert webmasters when we detect something wrong on their website”, say the Googlers, and in 2019 alone “we have generated over 90 million messages to website owners to inform them about issues and problems that may affect the visibility of their site in search results and potential improvements they can implement”.

Of all these messages, “about 4.3 million were related to manual actions”, the penalties applied to sites that openly violate Google’s webmaster guidelines.

More tools for webmasters

Another important front of Google’s antispam work is the ongoing “search for ways to better help site owners”. Among the efforts launched in 2019, the article recalls the “many initiatives aimed at improving communications, such as new Search Console messages, Site Kit for WordPress sites or Auto-DNS verification in the new Search Console”, which provide webmasters with “more convenient methods to verify their sites and will continue to be useful”. They also give faster access to news and, therefore, help webmasters discover and solve webspam or hacking problems more effectively and efficiently.

The reconsideration of nofollow links

The focus was not only on cleaning up spam: Google also kept its commitment to “keep up with the evolution of the web” by reconsidering the nofollow attribute, which officially became a hint for ranking on March 1, 2020.

Initially introduced as a means to fight comment spam and flag sponsored links, “the nofollow rel has come a long way” and continues to evolve, just as Google’s ability to fight spam has evolved. The post also recalls the introduction of new link attributes, rel="sponsored" and rel="ugc", which give webmasters other ways to tell Google Search the nature of certain links, and Prommawin and Nguyen reveal that “these new rel attributes have been well received and adopted by webmasters all around the world”.
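To make the three link attributes concrete, here is a minimal sketch of how a site owner might audit which qualifiers the links on a page carry. It uses only Python's standard `html.parser` module. The names `RelCollector` and `link_qualifiers`, the sample markup and the descriptions in `REL_MEANINGS` are illustrative assumptions, not part of any Google tool, and the script says nothing about how Google itself processes these values.

```python
from html.parser import HTMLParser

# rel values mentioned in the article; the descriptions are illustrative
# wording for this sketch, not official definitions.
REL_MEANINGS = {
    "nofollow": "no endorsement implied",
    "sponsored": "paid or sponsored link",
    "ugc": "user-generated content",
}


class RelCollector(HTMLParser):
    """Collects (href, qualifiers) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "ugc nofollow";
        # a valueless or missing rel is treated as empty.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        qualifiers = [t for t in rel_tokens if t in REL_MEANINGS]
        self.links.append((href, qualifiers))


def link_qualifiers(html):
    """Return a list of (href, [qualifier, ...]) for the given HTML string."""
    parser = RelCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup for demonstration only.
sample = (
    '<a href="https://example.com/ad" rel="sponsored">Partner</a>'
    '<a href="https://example.com/c" rel="ugc nofollow">Comment link</a>'
    '<a href="https://example.com/about">About</a>'
)

for href, qualifiers in link_qualifiers(sample):
    print(href, qualifiers or ["(no qualifier)"])
```

A list like this can help spot paid or user-submitted links that carry no qualifier at all, which is the case the new attributes are meant to cover.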

 
