Fighting spam: the results of Google’s SpamBrain activity in 2022

News admin 13 April 2023

More than 99 percent of visits from Google Search are spam-free. That is one of the figures, and probably the one Google is proudest of, in its annual Webspam Report. In this document the search engine reports the concrete results of its work against spam and other illicit or deceptive practices that can compromise users’ browsing experience. Thanks in particular to the machine learning system called SpamBrain, the Webspam Report 2022 paints an optimistic picture of Search’s health, although the work against spam never stops.

Google’s Webspam Report 2022: the numbers on anti-spam activity

The 2021 report described SpamBrain as “our most effective solution against spam.” SpamBrain is the name Google has given to its machine learning system. It is a platform that powers algorithms for detecting many forms of unwanted content, and, like any machine learning system, it learns from data to become increasingly proficient at the task it was designed for.

In 2022, Google made further improvements to extend its coverage of different types of spam, work in which “SpamBrain is critical.” According to the Report, SpamBrain detected five times more spam sites than in 2021 and 200 times more than when it was first launched. As mentioned at the outset, this system helped Google ensure “that more than 99 percent of visits from Search were spam-free.”

SpamBrain’s work: fighting spam links and compromised sites

Specifically, over the past year Google has worked to solidify SpamBrain “as a robust and versatile platform, launching multiple solutions to improve coverage of different types of abuse.” Link spam is one example. Google trained SpamBrain to detect both sites that buy links and sites used for the purpose of passing outgoing links to other sites. Thanks to the platform’s learning capabilities, Google detected 50 times more link spam sites than with the previous link spam update. Similarly, teaching SpamBrain more about hacked spam (spam injected into compromised sites) led to a tenfold improvement in detecting hacked sites.

SpamBrain has also helped Google detect spam during crawling. This means Google can recognize spam the first time it visits a page and avoid indexing that page at all. As a result, it can devote its resources more effectively to indexing useful pages. In a sense, one of SpamBrain’s many functions is to act as a gatekeeper that blocks spam before it can enter Google’s index.

More security for users

Spam is not the only concern for the Mountain View team, as the article by Duy Nguyen, Search Quality Analyst, points out. In recent months, Google has also rolled out new anti-fraud solutions to make Search safer for users. These solutions broadened coverage and, for the first time, extended fraud protections to all languages. In numbers, the result was a 50 percent reduction in clicks on scam sites compared with 2021.

Alongside actively countering spam, Google updated its spam policies as part of Google Search Essentials, which replaced the former Webmaster Guidelines. The updated policies focus on the most common types of spam and on illicit practices that can cause a site to rank lower or disappear from search results altogether. The goal is to help site owners avoid creating harmful content.

Finally, in response to the huge interest in AI-generated and AI-assisted content, Google published guidance on AI-generated content. It explains that artificial intelligence and automation can be useful tools for creating helpful content. However, using AI with the primary purpose of manipulating search rankings violates Google’s long-standing policy against automatically generated spam content.

As Nguyen concludes, Google cannot fight spam alone. People make a valuable contribution by creating helpful content and functional websites for users. They also help by sending feedback and detailed reports of spam and abuse. Their input is crucial to the ultimate goal of this work: detecting and removing spam so that users can find the most useful content through Google Search.