Spam on Google is increasing: in 2020, 40 billion abusive pages were blocked every day

News admin 3 May 2021

2020 was an exceptional year in many ways. Looking only at the digital world, it led to a sharp increase in the use of the Web for many aspects of everyday life, from shopping to remote communication tools. The trend also affected search engines, whose traffic volume rose significantly, with some unwanted side effects: as the 2020 Google Webspam Report reveals, spam also increased sharply compared to the past, but Google’s detection systems have become better at recognizing and blocking suspicious pages.

Spam numbers in 2020

Let’s go straight to the numbers, which immediately show the big picture.

Last year more than 4.6 billion people, almost 60% of the world’s population, accessed the Internet, an increase of 7.3% over the previous figure (source: Digital 2021, which also notes that “the values could be even higher, due to problems related to the proper tracking of Internet users related to the COVID-19 pandemic”).

Another aspect is also interesting: two out of three people cite searching for information as one of the main reasons they go online, with traditional search engines remaining “a go-to substantially by default for 98% of the online population”. We can then imagine the volume of traffic on Google, which continues to be the leading search engine worldwide (over 92% of preferences globally, almost 96% in Italy, source: statcounter).

So it is no surprise that malicious attempts to take advantage of the expanded mass of users have also increased. According to the 2020 Google Webspam Report, Google “detected 40 billion spam pages every day, including sites compromised or created in a misleading way to steal your personal information, and we blocked their display in the results,” and in addition to traditional web spam it has “expanded efforts to protect against other types of abuse, such as scams and fraud”, the official post reads. This is an increase of 60% over the 25 billion daily pages reported for the previous year.
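The 60% figure can be checked with a quick calculation. The snippet below is only an arithmetic sketch: the function name is illustrative, and the two inputs are the daily totals quoted above.

```python
def percent_increase(old: int, new: int) -> float:
    """Relative growth from old to new, expressed as a percentage."""
    return (new - old) * 100 / old


# Daily spam pages quoted in the article (previous year vs. 2020).
previous_daily = 25_000_000_000
current_daily = 40_000_000_000

growth = percent_increase(previous_daily, current_daily)
print(f"{growth:.0f}%")  # prints "60%"
```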

2020 Google Webspam Report, the results of Google’s antispam work

The article by JK Kearns, Product Manager of Search, describes Google’s work to find and remove spam from search results and focuses on the five ways Google tries to provide “the safest way to search”.

The last thing a user needs to worry about “when he searches for pie recipes or does research on a work project is landing on a malicious website, where his identity could be stolen”, Kearns writes, adding that helping users protect themselves is Google’s job, one that “we take very seriously”.

Since 2018 Google’s fight against spam has intensified considerably and “we have been able to protect hundreds of millions of searches a year by detecting potentially fraudulent sites and preventing users from ending up on scam sites that try to fool with low quality sites with keyword stuffing, brand logos they’re imitating or a scam phone number they want you to call”.

Actions against spam

The Web hosts many extraordinary things, the Product Manager said, “but it is also a place where bad actors can try to take advantage of you or access your personal information”: for this reason, Google is always at work to keep users safe during search and to give them tools to take control of their search experience.

In particular, to nip spam in the bud, Google provides “website creators with resources to understand the potential vulnerabilities of the site and better protect their projects, as well as tools to check if their sites have been compromised”.

This work helps the entire Web to remain more secure and allows those who search “to access secure sites with extraordinary experiences more easily”.

AI’s support

The article by Cody Kwok, Principal Engineer at the company, outlines Google’s anti-spam work in more detail and focuses on the importance of artificial intelligence (AI), which offers unprecedented potential to revolutionize the approach to the problem.

By combining “our deep knowledge of spam with artificial intelligence, last year we were able to build our artificial intelligence to fight spam, which is incredibly effective in capturing both known and new spam trends”. A concrete example is the number of sites with automatically generated content, reduced by more than 80% compared to a couple of years ago.

In 2020 hacked spam was still rampant, as the number of vulnerable websites remained quite high, although “our detection capability has improved by over 50% and we have removed most of the hacked spam from search results”.

In addition, in view of last year’s “big events, including a global pandemic”, Google has “dedicated significant efforts to extend protection to the billions of searches we have received on such important topics”. And so, “if you’re looking for a COVID test site near you, you shouldn’t worry about ending up with meaningless spam that might redirect you to phishing sites”. The removal of spam content was accompanied by collaboration “with several other research teams to ensure that you receive the most up-to-date and top quality information when and where it is most important”.

The contribution of everyone is necessary

But the fight against spam “is not a problem that we can solve on our own”, writes Kwok, with particular reference to vulnerabilities on websites: “Even if we could detect and protect against all spam, hackers would not stop exploiting the loopholes until they are all closed”.

And so, Google asks for everyone’s contribution to keep the web safer together, and in particular website owners “can protect their sites by practicing good security hygiene: it is easier to prevent a site from being hacked than to restore it from a hack”.

Google’s intervention to prevent spam

If the concrete fight against spam in search results, by detecting and blocking malicious sites, is the first line of defense, Google’s interventions also extend to four other methods that aim to increase user safety.

The encryption of searches protects “even from something more than spam”, because it prevents hackers and unwanted third parties from seeing what users are looking for or accessing their information. All searches made on google.com or in the Google app are protected by encrypting the connection between the user’s device and Google, which keeps the information more secure.

Another way in which Google tries to protect users is by giving them “tools and context to learn more about search results”. The reference is to the recent “About this result” feature (not yet available in Italy), which lets users learn more about results from sources they do not know. Clicking the three dots next to a result opens a panel with a description of the site, the date it was first indexed by Google and the type of connection (secure or not); this additional context “allows you to make a more informed decision on the source before clicking the blue link”.

Google has also activated the Safe Browsing feature, which currently protects over four billion devices and keeps users from clicking “a link to a dangerous site without even realizing it in the enthusiasm of trying to learn more about a topic”. When enabled in Chrome, the feature shows warning messages informing users that the site they are trying to access may not be safe, “protecting you and your personal information from potential malware and phishing fraud”.

No less relevant is protection from bad ads: Google’s commitment to providing “access to reliable and high quality information on Search also extends to ads that appear when searching for products, services and content”, says Kearns, and to ensure that such ads are not fraudulent or misused “we constantly develop and enforce rules that put users first”. In numbers, across all of the group’s platforms (including Search), in 2020 Google blocked or removed about 3.1 billion ads for violating its rules and restricted an additional 6.4 billion ads.

How Google prevents spam

Search’s spam prevention work is analyzed in more detail in Kwok’s article, which explains that “every day Google discovers, crawls and indexes billions of web pages, but before providing a series of search results many things happen behind the scenes”; the aim is to prevent spam from getting in the way of the search for useful and functional information and, as we have said, the enemy is fierce.

Google’s diagram conceptualizes how the search engine defends itself (and its users) against spam.

First, there are systems that can detect spam when crawling pages or other content. Crawling occurs when automated systems visit the content and consider it for inclusion in the index used to provide search results. Some content detected as spam is not added to the index.

These systems also work for content discovered via sitemaps and Search Console: for example, “Search Console has a request indexing function that allows creators to communicate new pages that should be added quickly”. Google has discovered that “spammers hack vulnerable sites, pretending to be the owners of these sites, verifying themselves in the Search Console and using the tool to ask Google to crawl and index the numerous spam pages they have created”, but thanks to artificial intelligence “we were able to detect suspicious verifications and prevent spam URLs from entering our index in this way”.

Subsequently, other systems analyze the content already included in the index: they “work to recheck if the content that corresponds to your query could be spam”. In that case, the content will not be displayed among the top search results. This information is also used to “improve our systems, so that spam is not included in the index”.

Thanks to automated systems aided by artificial intelligence, the result is that “very little spam actually enters the top results that anyone sees for a search”. According to Google estimates, “these automated systems help to keep over 99% of visits from Search completely free of spam” and, as for the small percentage remaining, “our teams proceed with a manual action and use the learnings obtained to further improve our automated systems”.
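The two checkpoints described above, one at crawl time and one at query time, can be pictured with a toy sketch. Everything in it is a hypothetical illustration: the keyword-based `spam_score`, the two thresholds and the sample pages are assumptions made for the example and say nothing about how Google’s real classifiers work.

```python
from dataclasses import dataclass

# Hypothetical values, chosen only to make the example readable.
SUSPICIOUS_TERMS = ("free gift card", "call this number now")
CRAWL_THRESHOLD = 1.0   # at crawl time, drop only the most blatant spam
SERVE_THRESHOLD = 0.5   # at query time, recheck with a stricter limit


@dataclass
class Page:
    url: str
    text: str


def spam_score(page: Page) -> float:
    """Toy stand-in for a classifier: share of suspicious terms found."""
    text = page.text.lower()
    hits = sum(term in text for term in SUSPICIOUS_TERMS)
    return hits / len(SUSPICIOUS_TERMS)


def crawl_stage(pages: list[Page]) -> list[Page]:
    """Checkpoint 1: pages flagged at crawl time never enter the index."""
    return [p for p in pages if spam_score(p) < CRAWL_THRESHOLD]


def serve_stage(index: list[Page], query: str) -> list[Page]:
    """Checkpoint 2: recheck indexed pages matching the query before display."""
    matches = [p for p in index if query.lower() in p.text.lower()]
    return [p for p in matches if spam_score(p) < SERVE_THRESHOLD]


pages = [
    Page("a.example", "Apple pie recipe with cinnamon"),
    Page("b.example", "Pie recipe! Win a free gift card"),
    Page("c.example", "Free gift card, call this number now"),
]
index = crawl_stage(pages)
results = serve_stage(index, "pie")
print([p.url for p in index])
print([p.url for p in results])
```

In this sketch the third page is rejected at crawl time, while the second enters the index but is filtered out at the query-time recheck, leaving only the first page in the results.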

Protection from other online problems

But it is not just spam that threatens users, and Google’s commitment also extends to other problems and abuses, many of which “can cause significant financial and personal damage”.

Last year “we have made significant progress in improving our coverage and protecting more users from online scams and frauds, which have many forms and can hit you negatively in more ways than traditional web spam,” says Kwok.

For example, many scammers pretend to offer customer support phone numbers for popular services and products, only to induce users who call to pay them via bank transfers or gift cards. Commonly known as a “customer support scam” or “technical support scam”, this type of fraud has been reported by hundreds of thousands of users and can cost victims hundreds of dollars in each individual case.

Algorithmic solutions have reduced the presence of fraud and scams in search results, with the aim of “anticipating the challenges to provide the most reliable results”; at the same time, users can also protect themselves better “by staying informed and learning about scams”.

Another area in which advances in artificial intelligence have contributed greatly is the understanding of site content. An example is “how we have helped to improve the way we rank product reviews and purchasing and informational sites”. Google Search is “a great way to search and find products before making a purchase and we wanted to make sure you get the most useful information for your next purchase, rewarding content with more in-depth research and useful information”.

Still a lot of spam on the Web, what we can do

Google continues to grow and users seem to turn to it for more and more queries.

Despite the significant progress made by the search engine in fighting spam and its constant work to improve and protect people from new types of abuse, “spammers are strongly motivated to develop new techniques that can circumvent our detection”. Useful support can come from users themselves, who can report recent Search experiences in which they felt “misled, cheated or spammed” by sharing feedback through the spam report, indicating the query and any other information that might be useful.

Those who work in search marketing and operate cleanly can only welcome Google’s ability to keep spammy, low-quality and, above all, harmful sites from appearing ahead of their own, in the hope that these efforts will help legitimate sites rank better and, at the same time, protect all of us as users from malicious results.