Misspelling, how Google recognizes and corrects errors in Search
Every cloud has a silver lining, says an old adage, and the maxim seems particularly fitting for misspellings: Google’s search system evolves partly thanks to typos, the product of imperfect knowledge or careless typing, which help the algorithm grow and improve.
Errors and misspellings in Google Search
Misspelled terms were already a focus of the announcements at Google Search On 2020, when the Senior Vice President of Search, Prabhakar Raghavan, revealed that “one search query in ten contains spelling errors”.
This figure, together with the steady stream of new words entering queries (which sometimes fail to return relevant results), made it necessary to develop a new algorithm dedicated to deciphering spelling errors, one able to understand and correct misspellings and to respond with the right results “in less than 3 milliseconds”.
Thanks to this algorithm, Google can understand the context of misspelled words better and faster and thus offer targeted suggestions to the user; simply put, it is the secret that lets Google seemingly know what we are looking for even when our search query contains typos.
Misspellings help Google
Pandu Nayak, Google Fellow and Vice President of Search, recently returned to the topic in an article on “the ABC of spelling in search”, in which he explains first of all that Google introduced a spell-checking system more than 20 years ago and still uses it, although spelling remains “a continuous challenge for the understanding of the language”.
Even before it can “start searching for relevant results for a search query”, Google must “know what a user is looking for, typed correctly”: but the high number of search queries with misspellings and the continued introduction of new words, “together with new ways to write them incorrectly”, call for constant, dedicated improvement work.
How Google classifies misspellings
The first thing Google’s artificial intelligence does when it comes across a word it considers misspelled is classify the error. There are two main categories of misspellings: conceptual errors and slip-of-finger errors.
Conceptual errors are those made “when we are not sure how to write something and we try to guess with our best hypothesis”.
Slip-of-finger errors occur when “we know how to write what we are looking for, but we accidentally type it incorrectly”.
Examples of conceptual errors on Google
Also known as best-effort spelling (a best attempt at the correct spelling), an error of this kind occurs when a user does not know how to write a word and types it in what they believe to be the most plausible way.
Nayak illustrates the situation with the term gobbledygook (which refers to incomprehensible words), which is also “a difficult word to pronounce and has two commonly accepted spellings, including gobbledegook”. If we want to look up the meaning of gobbledygook but do not know exactly how to write it, we are likely to type what seems to us the best (and closest) guess, such as “garbledygook”, “gobblydegook”, “gobbleygook”, “gobbly Gook” and more.
Examples of slip-of-finger errors on Google
The case of slip-of-finger misspellings is different. These can be considered classic typos: the user knows the term and knows how to write it, but types it incorrectly in the search box out of haste or distraction.
This is therefore an accidental error, rather frequent and increasingly common with the spread of smartphones, “but it also happens when we type on standard-sized keyboards”.
It is a situation that each of us has probably experienced at least once, and that is why “we see more than 10,000 variations of queries like YouTube, all generated by the accidental slip of a finger, such as ytoube, 7outub, yoitubd and tourube” (the mistyped letters or digits sit close to the correct ones on the keyboard).
Interventions on misspellings
Despite the frequency with which errors occur, many misspelled queries “only appear once, making spelling a unique challenge for Search” and, regardless of the type of misspelling, Google’s systems find ways to understand what we mean.
Google’s previous approach to search queries containing spelling errors it had never seen before was based simply on keyboard design: for instance, explains Pandu Nayak, “if you tried typing u but you made a mistake, our systems had learned that you were more likely to have pressed y rather than z, because on a standard English-language keyboard the y key is adjacent to u”.
Google’s models “applied the general concept to all new spelling errors, proceeding with nearby letter replacements until a popular replacement term was identified”. In practice, they would analyze the error in the query and start by replacing the typed letter with the one closest to it on the keyboard to see the result, then move on to the next adjacent letter, and so on until they found a letter that produced a correct word.
On the surface, this is an obvious way to solve slip-of-finger errors, but in practice it also proved a valid approach for correcting conceptual errors.
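The keyboard-based idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google’s implementation: the keyboard is approximated as a simple grid (real keys are staggered), the `POPULARITY` table is invented sample data, and the names `build_adjacency` and `correct` are hypothetical.

```python
# Toy sketch of correction by adjacent-key replacement.
# Assumptions: grid-shaped QWERTY layout, made-up popularity counts.

QWERTY_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Hypothetical popularity of known terms (illustrative numbers only).
POPULARITY = {"youtube": 1000, "house": 500, "mouse": 300}


def build_adjacency(rows):
    """Map each key to the keys next to it in the same row and the rows above/below."""
    adjacency = {}
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            neighbours = set()
            for dr in (-1, 0, 1):
                rr = r + dr
                if not 0 <= rr < len(rows):
                    continue
                for dc in (-1, 0, 1):
                    cc = c + dc
                    if (dr, dc) != (0, 0) and 0 <= cc < len(rows[rr]):
                        neighbours.add(rows[rr][cc])
            adjacency[key] = neighbours
    return adjacency


def correct(word, popularity=POPULARITY, max_replacements=2):
    """Replace letters with adjacent keys until a known, popular term appears."""
    if word in popularity:
        return word
    adjacency = build_adjacency(QWERTY_ROWS)
    frontier, seen = {word}, {word}
    for _ in range(max_replacements):
        next_frontier = set()
        for candidate in frontier:
            for i, ch in enumerate(candidate):
                for neighbour in adjacency.get(ch, ()):
                    variant = candidate[:i] + neighbour + candidate[i + 1:]
                    if variant not in seen:
                        seen.add(variant)
                        next_frontier.add(variant)
        known = [w for w in next_frontier if w in popularity]
        if known:
            # Prefer the most popular known term among the candidates found.
            return max(known, key=popularity.get)
        frontier = next_frontier
    return None  # nothing plausible within the replacement budget
```

With this sample data, `correct("yoitubd")` returns `"youtube"` after two replacements (i to u, d to e). The sketch handles only substitutions, so a transposition with a missing letter such as “ytoube” is out of its reach.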
The new system based on deep learning
Thanks to progress in deep learning, Google has adopted in recent months “a better way to understand spelling”, introducing “a new spelling algorithm that uses a deep neural network that better models and learns from less common and unique spelling errors”.
This advance “allows you to run a model with more than 680 million parameters in less than two milliseconds, so that people can search without being interrupted by their own spelling errors”.
The progress is evident: previously, the algorithm surfaced the results a user was looking for in less than three milliseconds, while today it runs a model with over 680 million parameters in less than two milliseconds, “a very large model that works faster than the wing beat of a hummingbird,” says Nayak.
How the Google algorithm for misspellings works
Instead of using the previous keyboard approach, the new algorithm uses context to understand what a user intended to type, thus managing “to know what someone is looking for, regardless of the type of error and whether we have ever seen the misspelling before”.
In detail, explains the VP of Search, Google’s natural language understanding models “examine a search in context, such as the relationship that words and letters have with each other within the query”, first trying to understand the entire search query. From there, “we generate the best substitutions for words with misspellings in the query based on our general understanding of what you are looking for”.
For example, by analyzing the other terms in the query “average home coast”, Google deduces that the user is probably looking for information on “average home cost”.
In summary, with the new approach the Google misspelling algorithm:
- Assesses the entire query, not just the wrong word.
- Searches for replacement words that fit the overall query.
- Provides search results based on the “best match”.
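The three steps above can be illustrated with a deliberately tiny Python sketch. It is not the deep neural network the article describes: as a stand-in, it scores each candidate rewrite of the whole query with invented word-pair counts. The `BIGRAM_COUNTS` table and the function names are hypothetical; only `difflib.get_close_matches` is a real standard-library call.

```python
# Toy context-aware correction: pick the replacement that best fits the whole query.
# Assumption: made-up word-pair (bigram) counts stand in for a learned language model.
from difflib import get_close_matches

BIGRAM_COUNTS = {
    ("average", "home"): 200,
    ("home", "cost"): 120,
    ("home", "coast"): 3,
    ("east", "coast"): 150,
}
VOCABULARY = sorted({word for pair in BIGRAM_COUNTS for word in pair})


def context_score(words):
    """Score a query by how often its adjacent word pairs occur together."""
    return sum(BIGRAM_COUNTS.get(pair, 0) for pair in zip(words, words[1:]))


def best_rewrite(query):
    """Try similar-looking replacements for each word; keep the best-fitting query."""
    words = query.lower().split()
    best, best_score = words, context_score(words)
    for i, word in enumerate(words):
        for candidate in get_close_matches(word, VOCABULARY, n=5, cutoff=0.7):
            rewritten = words[:i] + [candidate] + words[i + 1:]
            score = context_score(rewritten)
            if score > best_score:
                best, best_score = rewritten, score
    return " ".join(best)
```

Here `best_rewrite("average home coast")` returns `"average home cost"`, while `best_rewrite("east coast")` is left unchanged: “coast” is a correctly spelled word, so only the surrounding terms reveal whether it was the intended one.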
The correction of misspelled queries
We can see these spelling technologies appear in Google Search in different ways, correcting a possible misspelling with a probabilistic prediction of what we intended to look for.
When the algorithm is “quite sure” it knows what we are looking for, it may politely ask us “did you mean” another query and then show us the alternative it considers more likely to be what we were actually looking for.
When the algorithm is “very sure” that it has correctly identified the misspelling, it will automatically show the results for what it thinks we are looking for, “but we will always let you know and provide you with a way back to your original spelling”. In other words, a note below the search bar informs us that the query has been changed and gives us the option to see the search results for the original misspelled query.
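The two behaviours can be pictured as thresholds on the confidence of the predicted correction. The sketch below is purely illustrative: the threshold values, the `SpellingResponse` type and the `respond` function are assumptions made for the example, and the notice wording is a placeholder rather than Google’s actual interface text.

```python
# Toy decision logic: suggest when fairly confident, auto-correct when very confident.
# Assumption: the confidence value and both thresholds are hypothetical.
from dataclasses import dataclass
from typing import Optional

SUGGEST_THRESHOLD = 0.6       # "quite sure": offer a "did you mean" suggestion
AUTOCORRECT_THRESHOLD = 0.9   # "very sure": show corrected results directly


@dataclass
class SpellingResponse:
    results_for: str          # the query whose results are shown
    notice: Optional[str]     # note displayed below the search bar, if any


def respond(original, correction, confidence):
    """Decide which query to serve and what note to show the user."""
    if correction == original or confidence < SUGGEST_THRESHOLD:
        return SpellingResponse(original, None)
    if confidence < AUTOCORRECT_THRESHOLD:
        # Keep the original results but propose the alternative.
        return SpellingResponse(original, f"Did you mean: {correction}")
    # Serve the corrected query, always leaving a way back to the original.
    return SpellingResponse(
        correction,
        f"Showing results for {correction}. Search instead for {original}",
    )
```

In both non-trivial branches the original query remains reachable, which mirrors the “way back to your original spelling” mentioned above.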
Why errors are useful to Google
The way we respond to these notes and interact with the results provided directly affects the algorithm, as Google uses these signals to continue training its AI. As Pandu Nayak says, “whether you accept our suggestion or not, we constantly learn and improve our systems based on that feedback to make search more useful”.
And so even a misspelling in a Google search, which may seem merely annoying, becomes something useful and “bigger”, allowing the search engine to “keep improving our spelling so you can keep searching”.