“There are those who still remember it with fear, and those who lie”, to paraphrase a well-known Web meme. On August 12, 2011, exactly ten years ago, the Google Panda algorithmic update arrived in Italy, after having already shaken up English-language SERPs since February, when the rollout began in the United States before reaching other markets. But what was this update, and why was it so revolutionary? Let’s retrace the story and uncover its most curious secrets.
What is the Google Panda Update
Still remembered today as one of Google’s most notable algorithmic updates, Panda was presented from the outset as an intervention meant to surface “more high quality sites in Search”, as Amit Singhal and Matt Cutts (at the time Google Fellow and Principal Engineer, respectively) wrote on the official blog in February 2011.
In practice, the update was designed to filter low-quality or “thin” content out of Google’s organic search results, so as to “provide people with the most relevant answers to their questions as quickly as possible”. According to the little information released, Panda applies scalable machine learning algorithms to the Google index to work out which sites offer useful content and which do not, making accurate predictions of how humans would assess the quality of content.
In fact, rather than a true search algorithm, Panda is described as a quality “filter”: when it encounters pages that do not meet its criteria (summarized in the famous 23 questions for building quality sites written by Singhal, most of which still hold today), it lowers their ranking, both to protect users from poor or irrelevant content and to encourage other sites to take better care of their content and avoid the same mistakes.
Why it is called Google Panda
One more curious detail: the name of the update does not refer to the animal (although the double meaning remains), but to the engineer Biswanath Panda, one of the main people responsible for the intervention (as revealed by Bill Slawski, who tracked down some of the research that led to the update).
The immediate impact of the Google Panda update
It was clear almost immediately that Panda was no ordinary algorithmic update: in a single day, and on Google USA alone, it affected almost 12 percent of queries, abruptly reshuffling the search results.
In fact, Google had been working on the underlying problems addressed by Panda for over a year and had focused for months on this specific change, whose objective was to demote (if not remove outright from the SERPs) the so-called content farms: sites, portals and news aggregators that published poor-quality content solely to attract visits and earn money from clicks on AdSense ads.
This means that the Panda update tried to limit not only duplicate content but, more generally, pages on topics of no use to users, pages overloaded with external links and advertising of various kinds, and even misleading content written with the sole intention of bringing visitors to the site, only to offer entirely different topics that did not match the search term.
Types of content and sites affected by Panda
This is a quick list of the problematic phenomena in Google SERPs that the Panda update tried to stem:
- Thin content
Weak pages, with very little relevant or substantial text and resources: for instance, a set of pages describing a variety of health conditions with only a few sentences on each page.
- Duplicate content
The classic copied content, which appears in more than one place on the Web. Problems of this type may also occur within a single site, if it has multiple pages with the same text and minimal or no variation.
- Low quality content
Pages that provide little value to readers since they lack in-depth information.
- Lack of authority/reliability
Content produced by sources that are not considered definitive or verified. According to Google, a quality site is one recognized as an authority on its subject, an entity that a human user would trust with their credit card details.
- Content farm
A collection of many low-quality pages, often aggregated from other websites. An example of a content farm is a website that employs a large number of cheap copywriters to create short articles covering a wide range of search queries, producing content that lacks authority and value for readers because its main purpose is simply to rank for every term imaginable.
- Low quality User Generated Content (UGC)
A frequent situation, for instance, on blogs open to short guest posts that are full of spelling and grammatical errors and devoid of authoritative information.
- High ad-to-content ratio
Pages consisting mainly of paid advertising rather than original content.
- Low quality content around affiliate links
Thin content around links that point to paid affiliate programs.
- Websites blocked by users
Sites that human users block directly in the search engine results or through a Chrome browser extension, thereby signaling low quality.
- Content that does not match the search query
Pages that “promise” relevant answers in the search results but fail to deliver them after the click.
Article marketing, the practice whereby SEOs published low-quality articles on content farm sites as a form of link building, was also hit by the Panda update. Moreover, the most affected sites tended to have less attractive designs, more intrusive ads, inflated word counts, low editorial standards, repetitive phrasing and sloppy research, and in general were neither useful nor reliable.
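Two of the signals listed above, thin content and duplicate content, lend themselves to a rough self-audit. The following is only a minimal sketch under stated assumptions: Google has never published how Panda measures these things, so the word-count threshold, the 3-word shingles, the Jaccard similarity cutoff and all function names here are illustrative choices, not Panda’s actual method.

```python
import re


def words(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, k=3):
    """Set of k-word sequences ("shingles") found in the text."""
    tokens = words(text)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(text_a, text_b, k=3):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def audit(pages, min_words=150, dup_threshold=0.8):
    """Flag thin pages and near-duplicate pairs.

    pages maps a URL to its visible text. Both thresholds are
    arbitrary assumptions and should be tuned per site.
    """
    thin = sorted(url for url, text in pages.items()
                  if len(words(text)) < min_words)
    urls = sorted(pages)
    duplicates = [(u, v)
                  for i, u in enumerate(urls)
                  for v in urls[i + 1:]
                  if jaccard(pages[u], pages[v]) >= dup_threshold]
    return thin, duplicates
```

Calling `audit` on a dictionary of page texts returns the list of URLs below the word-count threshold and the list of URL pairs whose text overlaps heavily. The pairwise comparison is quadratic, so this approach only suits small sites or samples.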
How to recover from Google Panda
Panda is often described as an update from which a site can recover only with difficulty, but that is not entirely accurate: the road is rather “simple”, in theory, because it consists of increasing the quality and uniqueness of the content and of the site as a whole.
And so, in practical terms, the corrective actions needed to recover rankings were:
- Abandonment of content farming practices.
- Review of the contents of the site in the light of criteria of quality, utility, relevance, reliability and authoritativeness.
- Review of the ad-to-content and affiliate-link-to-content ratios, so that pages are not dominated by ads or affiliate links.
- Ensuring that the content of a given page actually matches the user query it ranks for.
- Removal or revision of duplicate content.
- Careful moderation and editing of user-generated content, ensuring (where possible) that it is original, error-free and useful to readers.
- Use of the noindex or nofollow robots directives on internal content that is duplicated, near-duplicated or otherwise problematic, to keep it from being indexed.
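The last action in the list can be illustrated with the standard robots meta tag. This is a minimal sketch: the `<meta name="robots">` tag and its `noindex`/`nofollow` values are standard, while the helper names and the policy of keeping links followable on excluded pages are assumptions made for illustration.

```python
def robots_meta_tag(index=True, follow=True):
    """Build a robots meta tag for the <head> of a page."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    content = ", ".join(directives)
    return f'<meta name="robots" content="{content}">'


def tag_for_page(is_duplicate, is_problematic=False):
    """Hypothetical policy: keep duplicated or otherwise problematic
    pages out of the index, but still let crawlers follow their links."""
    if is_duplicate or is_problematic:
        return robots_meta_tag(index=False, follow=True)
    return robots_meta_tag()
```

For a duplicated page, `tag_for_page(True)` yields a `noindex, follow` tag; whether to add `nofollow` as well depends on how much the links on that page are trusted.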
In summary, sites that consistently published high-quality original content had little to fear from this update; sites involved in questionable practices, on the contrary, were probably hit by Panda. The only real way to avoid the filter was to develop a brand recognized as an authority in its field and to build a site that proved to be a reliable resource thanks to its excellent content.
The Panda update chronology
As mentioned, Google Panda was first launched in February 2011 to counter black hat SEO tactics and web spam.
At the time, user complaints about the growing influence of “content farms” had become rampant, and Google decided to put a stop to the situation with the new filter. It assigned pages a quality rating, used internally and modeled on quality assessments performed by human raters, which was then incorporated as a ranking factor.
The timeline of the Panda update was actually far more complex, as reconstructed by Danny Goodwin:
- Panda 1.0, 23 February 2011
Google introduced the first iteration of a then-unnamed algorithm update, shocking the SEO industry and many big players, and ending the “content farm” business model as it existed at the time.
- Panda 2.0, 11 April 2011
The first update to Panda’s core algorithm, which incorporated additional signals, such as sites that Google users had already blocked.
- Panda 2.1, 9 May 2011
Initially dubbed Panda 3.0, until Google made it clear that it was just a data refresh, as were all the subsequent 2.x updates.
- Panda 2.2, 21 June 2011
- Panda 2.3, 23 July 2011
- Panda 2.4, International, 12 August 2011
Panda is rolled out internationally to all countries, including non-English-speaking ones, except Japan, China and Korea. This is the anniversary we remember today, 10 years later.
- Panda 2.5 and Panda-Related Flux, September/October 2011
Series of consecutive minor updates.
- Panda 3.0, 19 October 2011
Added some new signals to the Panda algorithm and recalculated the algorithm’s impact on websites.
- Panda 3.1, 18 November 2011
A minor update, which affected less than 1% of searches.
- Panda 3.2, 18 January 2012
Data update
- Panda 3.3, 23 February 2012
- Panda 3.4, 23 March 2012
- Panda 3.5, 19 April 2012
- Panda 3.6, 27 April 2012
- Panda 3.7, 8 June 2012
- Panda 3.8, 25 June 2012
- Panda 3.9, 24 July 2012
- Panda 3.9.1, 20 August 2012
- Panda 3.9.2, 18 September 2012
- 27 September 2012
A relatively large Panda update, which marked the start of a new naming convention in the community (abandoning the 3.x numbering, which had dragged on almost embarrassingly).
- 5 November 2012
- 21 November 2012
- 21 December 2012
- 22 January 2013
- 14 March 2013
- Panda Dance, 11 June 2013
This is not the date of an update but the day Cutts clarified that Panda would not be folded directly into the algorithm; rather, it would be updated monthly through much slower rollouts instead of the sudden data refreshes of the past.
- Panda Recovery, 18 July 2013
An update that appears to have been a tweak to soften some of Panda’s excessively harsh effects.
- Panda 4.0, 19 May 2014
On this date a major Panda update took place, affecting 7.5 percent of queries.
- Panda 4.1, 23 September 2014
Another major update, which affected 3 to 5 percent of queries and included some changes to the Panda algorithm.
- Panda 4.2, 17 July 2015
The last confirmed Panda update, with a slow rollout that lasted months.
- Core Algorithm Incorporation, 11 January 2016
Nearly five years after the first release, Google confirmed that Panda had been incorporated into its core algorithm, probably at the conclusion of the slow rollout that began on July 17, 2015. In other words, as of this date Panda is no longer a filter applied after Google’s algorithm has done its work, but is embedded as one of its core ranking signals. Google made it clear, however, that this does not mean the Panda classifier acts in real time.
After Panda: Google’s work on the quality of results
Over time, Google’s work on quality has never stopped, and some of the principles behind the filter can also be found in later innovations, such as the E-A-T principles (Expertise, Authoritativeness and Trustworthiness), introduced in the search quality rater guidelines in 2014 and since become a central theme for all digital marketing professionals.
More generally, after Panda Google continued to release a series of core updates and algorithmic changes focused on content quality and user experience; the recent Page Experience update is enough to confirm it. The aim is always the same: to keep search results free of thin, uninformative content from non-authoritative sources, with unreliable information and questionable links.
And so, although the name Panda may no longer appear in communications or may not be remembered, its principles are still relevant to Google today: the final advice, which still holds 10 years later, is not to resort to black hat tactics or spam links, but to focus on creating quality content for the user and on a site that makes browsing and using its pages enjoyable.
After all, as we know, Google’s use of machine learning and other technology keeps evolving along these same principles, so the risk of losing rankings because of shortcomings in these areas remains high.