Guide to structured data: what they are, how to use them and why they are useful
This is a topic that is becoming increasingly central and important, and it comes into play when we talk about semantic SEO and, in detail, features such as featured snippets, knowledge graphs, and multimedia results that appear in Google SERPs. Therefore, it is time to address the topic in depth and understand what structured data is, how to include it in the pages of our site, and why this information is important for SEO optimization and to communicate more accurately with Google.
What is structured data
Structured data is meta-information manually inserted into the HTML code of a page to provide additional data about sites and pages and to allow semantic search engines to better rank their content.
In the broadest sense, it is information (data) that is organized (structured) so that search engines can understand it. In more technical terms, it is a standard format that allows Google and other search engines to navigate a site more easily, understand the relationships between pages, and gather the information needed to evaluate the content and make it eligible for display as a rich result in search results.
In the words of Google’s guidance on this topic, updated in February 2023, “structured data is a standardized format for providing information about a page and classifying its content,” so that search engine algorithms can analyze the page itself from what they interpret as explicit clues about its meaning.
These small portions of code are called structured data because the information is organized according to a defined schema, i.e., the famous schema.org vocabulary, which since 2011 has provided the rules for organizing the information found on publishers’ websites, the markup (the computer language) used to define entities of each type and the relationships between them, turning content into data and, more precisely, metadata.
That is, information that is not directly viewed by visitors to the site, but is intended for search engines, which, thanks to this language, can more easily and without the need for algorithmic interpretation understand what the images and content on the marked-up page are about and accurately display such content in search results.
In practice, by correctly entering this meta-data, respecting the syntactic rules and common frame of reference, Google can understand the meaning of the information and, after analyzing it, return the best and most relevant results for users’ queries.
A win-win situation for the search engine and sites, in short.
The importance of structured data for search engines
And so, structured data is a common system of providing information about a page and its content, which uses schema.org vocabulary and generates different search features in Google.
Markup is useful because, as mentioned, it helps Google’s systems understand the content of a site and page more accurately. As a result, users get more relevant results for their queries and can better see how a page relates to their search. At the same time, a site that implements structured data might be chosen by Google for a better, enriched display in Search results (although, as Google itself states, using structured data does not guarantee that a page will actually appear that way in search results).
More generally, we need structured data so that machines can easily read, understand, and classify information, especially given the pace of technological change and the expanding size and complexity of the Internet. Even as Google and other search engines continue to get smarter and more advanced, their resources are still limited in terms of the time, processing power, and energy they can spread across all the activities required for their operation, so providing them with more direct information is certainly one way to facilitate their tasks.
In summary, then, search engines use structured data for three main purposes: to recognize the entities on the page, to understand the relationships among these entities, and ultimately to return the right answer to the desired query to the user.
Many sites (even today) do not use these tools and offer crawlers data retrieved only from database repositories, formatted in HTML code that can be difficult to interpret; in contrast, structured data makes life easier for crawlers, who use the information to better understand a site’s core business or main topic and thus improve search results for that business.
What is structured data: some examples
Examples of structured data are a product name, review content, ratings, and images: markups allow a publisher to tag a product name, review content, ratings, and images for search engines.
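As a rough illustration of the product example above, the following sketch builds a schema.org Product description as a Python dictionary and serializes it to JSON-LD. The product name, image URL, rating figures, and review text are all invented for the example; only the schema.org type and property names are real.

```python
import json

# Hypothetical product data: every value below is made up for illustration.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Espresso Machine",
    "image": "https://example.com/images/espresso-machine.jpg",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": 4.4,
        "reviewCount": 89,
    },
    "review": {
        "@type": "Review",
        "author": {"@type": "Person", "name": "Jane Doe"},
        "reviewBody": "Heats up quickly and is easy to clean.",
    },
}

# Serialize the dictionary to the JSON-LD text that would go on the page.
product_json_ld = json.dumps(product, indent=2)
print(product_json_ld)
```

Each tagged value (name, image, rating, review) corresponds to one of the items listed in the text.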
Structured data continues the evolution of the semantic Web and allows search engine crawlers to interpret without misunderstanding even the type of a document and Web page: that is, with structured data we can tell Google whether our site hosts news or feature articles, recipes or products, whether the date entered is that of creation or modification, what topics are covered, and so on.
In addition, this information makes it possible to immediately determine the basic details of a business, the so-called NAP: name, address, and phone number.
Structured and unstructured data
Still in the realm of “theory”, it is worth dwelling again on definitions, because not all computer data available on the Web are the same: alongside structured data there are other types called “unstructured data”, and the differences concern how the information they communicate is retrieved, collected and scaled, as well as the type of database in which it resides.
Thus, structured data refers to organized data, while unstructured data identifies unorganized data, not from the perspective of human understanding, but from the perspective of machine understanding. If we write, for example, “Gennaro is the author of this article and works at SEOZoom,” we are not providing search engines with organized data: while it is easy for readers to understand the sentence and the information it contains (Gennaro is a human being, a copywriter, works at SEOZoom, and SEOZoom is a brand), for search engines this information is not equally immediate. They can understand and organize it, but with an extra “effort” because the data provided is “ambiguous”: to exaggerate, Gennaro could be a brand name and SEOZoom a person’s name. Thanks to structured data, we can instead define information more precisely and clarify “concepts” and entities for search engines.
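To make the difference concrete, here is a minimal sketch of how the sentence about Gennaro and SEOZoom could be expressed as structured data. It assumes the page is marked up as a schema.org Article, and the headline is a placeholder; the Person, Organization, author, and worksFor names come from the schema.org vocabulary.

```python
import json

# The same facts as the sentence, with every entity given an explicit type.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Guide to structured data",  # placeholder headline (assumption)
    "author": {
        "@type": "Person",  # Gennaro is a person, not a brand
        "name": "Gennaro",
        "worksFor": {
            "@type": "Organization",  # SEOZoom is an organization, not a person
            "name": "SEOZoom",
        },
    },
}

print(json.dumps(article, indent=2))
```

With the types spelled out, the ambiguity described above disappears: a machine no longer has to infer which name is the person and which is the brand.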
Last digression. According to an IBM page, true structured data, also classified as quantitative data, are those that follow the structured query language (SQL) developed by the same company in 1974, and are highly organized and easily deciphered by machine learning algorithms. Using a relational database (SQL), business users can quickly enter, search and manipulate such structured data.
Unstructured data, on the other hand, is typically classified as qualitative data: it cannot be processed and analyzed by conventional data tools and methods because it does not have a predefined data model, and therefore can be managed in nonrelational databases (NoSQL) or through data lakes to store it in an unprocessed form.
In this understanding, structured SEO data would actually be semi-structured data (examples mentioned are JSON, which we will discuss shortly, but also CSV, XML) and represent a “bridge” between structured and unstructured data: they have no predefined data model and are more complex than structured data, but easier to store than unstructured data. Semi-structured data use “metadata” (e.g., tags and semantic markers) to identify specific characteristics of the data and scale the data into preset records and fields, and ultimately enable better cataloging, searching, and analysis of information than unstructured data.
How to put structured data on the site: learn more about formats and techniques
Implementing structured data on a site is not complicated and no special skills are needed, thanks in part to available tools that make the process even easier. As the aforementioned Google guide explains, there are three main languages for this information, each characterized by specific syntactic rules and schema.
One of the first languages used to schematize structured data is called RDF (Resource Description Framework) and allows knowledge to be implemented in properties, descriptions, and entities; the second is called microdata, employs HTML tags and attributes, and allows properties of objects in the document to be associated with entities, so that the nature and characteristics of the objects can be traced. The use of both declined after the introduction of the JSON-LD script language, which uses a JavaScript object in the HTML page and is also preferred by Google for flexibility and ease of use, since it can be inserted directly into the Head section of the HTML document.
More specifically:
- JSON-LD, short for JavaScript Object Notation for Linked Data. A JavaScript notation embedded in a <script> tag in either the <head> or the <body> element of an HTML page. The markup is not intertwined with user-visible text, a fact that makes it easier to express nested data elements, such as the Country element of Event’s MusicVenue’s PostalAddress. In addition, Google can read JSON-LD data when it is dynamically inserted into page content, such as through JavaScript code or widgets embedded in the content management system.
- Microdata. An open-community HTML specification used to nest structured data within HTML content. Like RDFa, it uses HTML tag attributes to name the properties we intend to expose as structured data. It is generally used in the <body> element, but can be used in the <head> element.
- RDFa. HTML5 extension that supports linked data by introducing HTML tag attributes that correspond to the user-visible content we intend to describe for search engines. RDFa is commonly used in both the <head> and <body> sections of the HTML page.
We reiterate that all three formats are equally good for Google, as long as the markup is valid and properly implemented according to the feature documentation, and the preference toward JSON-LD depends “only” on the fact that it is usually the easiest format to implement and manage, which reduces the chance of user error.
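The nested case mentioned for JSON-LD (the country of an event’s music venue’s postal address) can be sketched as follows. The event, venue, and date are invented, and to_script_tag is a hypothetical helper written for this example, not part of any library.

```python
import json

# Hypothetical event: names, date, and place are placeholders.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Jazz Night",
    "startDate": "2025-06-01T20:00:00+02:00",
    "location": {
        "@type": "MusicVenue",
        "name": "Example Hall",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "Naples",
            "addressCountry": "IT",
        },
    },
}


def to_script_tag(data: dict) -> str:
    """Wrap a dictionary in a JSON-LD <script> element (hypothetical helper)."""
    # Escape "</" so a string value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(to_script_tag(event))
```

Because the whole description lives in one script element, the nesting is expressed purely in the data and does not depend on how the visible HTML is laid out.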
In fact, in a (now old) episode of Google’s SEO Mythbusting series, focused on clearing the air of some of the false myths and urban legends surrounding the world of search engine optimization, Martin Splitt talked with Suz Hinton (Cloud Developer Advocate, Microsoft) about microformats, structured data and optimization tips for effective use on sites. Specifically, after defining microformats as “annotations of HTML code that add semantic information” to page content, Splitt says they are not the best resources available to a site that wants to offer rich content to its users, because “there are better tools“, namely more appropriate structured data. For the Googler, specifically, JSON-LD is a kind of evolution of microformats that uses a different language instead of the previous microdata attributes, the basis of which is the schema.org project, the open-source organization that seeks new ways to integrate semantic data into the Web.
And it is precisely the users of the Schema platform, Splitt tells us, who have built far more semantic data than Google and search engines can actually support in search results pages.
How to add structured data on pages
In general, structured data on the Web uses Schema.org as a reference vocabulary and can be incorporated into Web pages using various online tools, including Google’s Structured Data Markup Helper, or by adding code directly to Web pages.
To practically embed structured data on the site we have essentially two options:
- Manually add the code to the pages of the site.
- Use a dedicated plugin, which still requires manual choices.
We can choose the first method if we know how to work with code and are familiar with the process of creating and adding structured data to the site; plugins, on the other hand, do not require technical knowledge in code, but still require an understanding of SEO and how semantic markup works, as well as a general review of the content and management of mandatory fields. Many of these plugins automatically add the most important data for the type of site, and we can select and determine what type of content is present per page or article, and then describe the page in the most appropriate way for search engines with valid structured data.
Actually, there is then a third way, which is to rely completely on special tools that completely automate the process.
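For the manual route, a minimal sketch of the idea is shown below: it inserts a JSON-LD script element just before the closing head tag of an HTML string. The add_json_ld function is a hypothetical helper, and it assumes the page contains a lowercase </head> tag; a real site would more likely do this inside its template system or CMS.

```python
import json


def add_json_ld(html: str, data: dict) -> str:
    """Insert a JSON-LD script element before </head> (hypothetical helper)."""
    marker = "</head>"  # assumption: the tag is present and written in lowercase
    if marker not in html:
        raise ValueError("no </head> tag found in the page")
    payload = json.dumps(data).replace("</", "<\\/")
    script = f'<script type="application/ld+json">{payload}</script>'
    # Replace only the first occurrence so the rest of the page is untouched.
    return html.replace(marker, script + "\n" + marker, 1)


page = "<html><head><title>Example</title></head><body><h1>Example</h1></body></html>"
marked_up = add_json_ld(page, {"@context": "https://schema.org", "@type": "WebPage", "name": "Example"})
print(marked_up)
```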
How to verify structured data
Structured data can be added to a page using schema.org vocabulary or by tagging data on the page using Google’s Data Highlighter tool, and currently nine categories of data are supported, to define articles, events, local businesses, movies, products, restaurants, software applications, TV episodes, and books.
To verify that markup has been entered correctly and to improve the performance of on-page structured data, Google initially offered webmasters and developers the Structured Data Testing Tool, which represented Google-specific validation and was integrated into Search Console.
For some time now, however, Google has retired this tool, or rather migrated it to a new domain: the Schema Markup Validator, hosted on the Schema.org domain. It works in essentially the same way, allowing us to view the structured data detected for the site, along with any errors in the page markup that may adversely affect the display of rich snippets or other features.
The alternative is Google’s Rich Results Test, which lets us validate structured data and preview any features it triggers in Google Search. More precisely, this tool tells us whether the analyzed page is eligible for rich results: structured data is very generic and not tied to the goal a site intends to achieve by implementing it, which is why the Rich Results Test helps us check whether the markup works and whether Google can therefore potentially show the information as a rich result.
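Before turning to the online tools, a quick local sanity check can catch basic syntax problems. The sketch below uses only the Python standard library to pull JSON-LD blocks out of an HTML string and confirm that each one parses. JsonLdExtractor and check_page are hypothetical names, and the check says nothing about rich result eligibility, so it is a complement to the validators above rather than a substitute.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def check_page(html: str) -> list:
    """Return one short status line per JSON-LD block found in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    report = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            report.append(f"invalid JSON: {exc.msg} (line {exc.lineno})")
        else:
            kind = data.get("@type", "unknown") if isinstance(data, dict) else "list"
            report.append(f"ok: {kind}")
    return report


sample = '<html><head><script type="application/ld+json">{"@type": "Recipe"}</script></head></html>'
print(check_page(sample))
```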
Structured data and Google: how structured data works in Search
Google uses the structured data it finds on the Web to understand page content, but also to gather information about the Web and the world in general, particularly information about people, books, or companies included in markup, and concretely uses it to trigger its special features in SERPs, i.e., the thirty or so features that show users rich information about the query they have just made.
Typically, Googlebot can still recognize entities within Web pages, but the use of structured data ensures that the information is taken, interpreted, and categorized in the most accurate and safe way. When it takes in this data, Google first identifies our entity (Brand or Person) and then the relationships with other related entities (companies, products, services). From an SEO perspective, if our competitors have not yet implemented structured data, this could offer us a direct advantage, as we will also discuss later.
There are many visible results of Google’s use of structured data: a first case is rich snippets or rich results, the additional information that appears for certain queries. For example, a search for a restaurant might also offer users the average rating of the reviews obtained and the typical price range, while for a movie, data on awards won or starring actors may appear: in the case of The Sting (La Stangata), we can mark up the Oscar for best picture, and so on. Structured data is then the basis of the system by which Google creates the Knowledge Graph, the graph that gives an advantage to recognized entities because it is a sign of Google’s trust in the site and brand.
Perhaps the most classic example is that of a recipe page: with valid structured data (the ingredients, cooking time and temperature, calories…) the site can appear in a graphical search result, that is, in a rich result within a carousel, or reviews and other related information can be shown.
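The recipe case can be sketched along these lines. The dish, quantities, and calorie figure are invented, and iso_duration is a hypothetical helper; the property names (recipeIngredient, prepTime, cookTime, nutrition, recipeInstructions) come from the schema.org Recipe type, which expects times as ISO 8601 durations.

```python
import json


def iso_duration(minutes: int) -> str:
    """Format minutes as an ISO 8601 duration, e.g. 90 becomes PT1H30M."""
    hours, mins = divmod(minutes, 60)
    text = "PT"
    if hours:
        text += f"{hours}H"
    if mins or not hours:
        text += f"{mins}M"
    return text


# Hypothetical recipe: all values are placeholders.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Example Focaccia",
    "recipeIngredient": ["500 g flour", "350 ml water", "10 g salt"],
    "prepTime": iso_duration(90),
    "cookTime": iso_duration(25),
    "nutrition": {"@type": "NutritionInformation", "calories": "250 calories"},
    "recipeInstructions": [
        {"@type": "HowToStep", "text": "Bake at 220 degrees Celsius for 25 minutes."}
    ],
}

print(json.dumps(recipe, indent=2))
```

The oven temperature has no dedicated property in this sketch, so it is simply carried in the instruction text.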
From a technical perspective, Google explains that structured data is encoded through in-page markup on the page to which the information applies and whose content it describes. Compared to the evolution of Schema, however, Google Search may not support all attributes and objects (which may in any case be useful for other search engines, services, tools, and platforms), so if our goal is to enable Search features we need only refer to the Google Search Central documentation to understand which properties are mandatory, recommended, or optional for structured data on Google.
Why use structured data: the benefits for SEO
From all that has been written, we may have already guessed the importance of structured data for a site, which is a useful tool for “speaking” to search engines in a language they understand better.
If used well, structured data can support our SEO work because it makes it easier for Google to understand what pages, products, and the website are about: Google’s job is always to understand the content of a page in order to give answers to users, and using structured data is like communicating directly with Google by giving the algorithms explicit clues about the meaning of a page, which can potentially help us in terms of visibility as well.
When appropriate, in fact, structured data changes the appearance of our snippets in Search, showing more information, and more specific information, to users, who may then be more likely to click on our results. People appreciate rich snippets (that is, detailed information about a web page that Google learns from structured data) because they can find out immediately what ingredients are needed for a recipe, how difficult it is or how much preparation time it takes, and even how many calories the dish will contain; or, they can find out the price of products and what people who have bought them think about them.
In short: if Google understands page markup, it can use that information to add rich snippets and other features to our search result. Put another way, Google needs the structured data of a site and page, and it uses it to trigger search results that may be more engaging for users, who may then be encouraged to interact more with the website.
It is Google itself that reveals the successful results of some case studies of websites that have implemented structured data for their site:
- Rotten Tomatoes added structured data to 100,000 unique pages and saw a 25 percent higher click-through rate on pages with structured data, compared to pages without structured data.
- Food Network converted 80% of its pages to enable search result features and recorded a 35% increase in visits.
- Rakuten found that users spent 1.5 times more time on pages where structured data was implemented than on pages without structured data and a 3.6 times higher interaction rate on AMP pages with search results features than on AMP pages without features.
- Nestlé found that pages shown as rich results in Search had an 82 percent higher click-through rate than pages not shown as rich results.
The benefits of markups for organic visibility
Some time ago, Search Advocate Daniel Waisberg published an article to elaborate on this topic, pointing out that using structured data on a site allows for a richer search experience and can make the difference between positive and negative performance – even in light of new trends in user engagement on SERPs.
To explain the concrete benefits of using structured data, the Mountain View blog analyzes three concrete examples of sites that have benefited in terms of performance and ranking, namely Eventbrite, Jobrapido, and Rakuten. In the first case, the event management and ticketing site Eventbrite leveraged structured data for event coverage and saw a 100 percent increase in year-over-year traffic growth from search.
The other case study cited by Google involves the job search engine Jobrapido, which integrated with the job experience on Google Search and saw a 115 percent increase in organic traffic, a 270 percent increase in new user registrations from organic traffic, and a 15 percent reduction in bounce rate for Google job page visitors. Finally, the Japanese giant Rakuten, as mentioned above, used the recipe search experience and saw a 2.7 times increase in traffic from search engines and a 1.5 times increase in session duration.
Three major benefits for websites
Even more interesting are the tips that come from Google on how to use structured data to gain benefits for one’s online site, because the article lists some possible benefits that are generated, which can be summarized as increasing brand awareness, highlighting content, and highlighting product information.
Structured data to increase brand awareness
In relation to brand awareness, the use of structured data allows you to leverage features such as a search box with logo, local business information, and sitelinks; in addition to adding markup, you must verify the site for the Knowledge Panel and claim the business on Google Business Profile (formerly Google My Business).
Featured content
On the other hand, those who publish content online can take advantage of the many features that can promote articles and attract more users, depending on their industry; the list of rich results includes informative articles, breadcrumbs, events, job openings, recipes, reviews, and more.
Featured product information
The benefits can also apply to eCommerce businesses that sell products, because structured on-page data allows Google to immediately show information such as price, availability and review scores.
What mistakes to avoid for on-page markups
To sum up, we said that structured data is gaining more and more weight for Google, which devotes specific and rather constant attention to these markups: on the other hand, the Search system is based on the attempt to understand the content of a webpage in the best possible way, and structured data is a means to provide Google with explicit indications about the meaning of a webpage.
Being a rather technical subject, it is possible, however, to run into some mistakes that can damage the strategy or otherwise make the work performed futile, and below we try to list the main problems with structured data, delving particularly into some technical issues and one of a more theoretical nature, so to speak.
Basically, Google makes it clear that a structured data problem on our page may result in a manual action: unlike “general” manual actions, this one has no effect on the page’s ranking in Google Search, but the page loses its eligibility to appear as a rich result.
1. Not understanding the value of structured data.
The first mistake is the one we mentioned when talking about theory: it may indeed be easy to misinterpret the value and meaning of structured data implemented in a web page, and consequently misjudge its use.
It is Google that specifies a central point: “The use of structured data triggers the presence of a feature, but does not guarantee it“, and also “there is no guarantee that your page will show up in search results with the specified feature.”
It is still Google’s decision whether and when to show structured data
This means that search results will always be shown based on the interpretations of Big G’s algorithm and its attempt to give the user the “best search experience“, taking into consideration “many variables, including search history, location, and device type.” Thus, the algorithm may “in some cases determine that one feature is more appropriate than another, or even that a simple blue link is,” or it may favor a competing site’s page over our own.
2. Committing syntax errors.
The syntax for proper markup of structured data is quite rigid and complex, so it is not uncommon to make a mistake when writing it or to forget a required or recommended property. One of the most common mistakes is to omit a required comma, or to ignore case sensitivity, the distinction between uppercase and lowercase letters that JSON-LD enforces.
To avoid this mistake, you can use Google’s structured data testing tools, which allow you to check the correctness of the information during development, or the rich result status reports after deployment, which allow you to monitor the status of the pages.
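The two mistakes named above, a missing comma and the wrong letter case, behave differently, as this small sketch shows. The sample strings and the describe function are made up for the example: a missing comma stops the parser outright, while a wrongly capitalized key parses without complaint and silently fails to provide the expected property.

```python
import json

# Two deliberately broken snippets and one valid one (all illustrative).
missing_comma = '{"@context": "https://schema.org" "@type": "Recipe"}'
wrong_case = '{"@context": "https://schema.org", "@Type": "Recipe"}'
valid = '{"@context": "https://schema.org", "@type": "Recipe"}'


def describe(raw: str) -> str:
    """Report whether a JSON-LD string parses and declares a type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"syntax error at column {exc.colno}: {exc.msg}"
    if "@type" not in data:
        return "parsed, but '@type' is missing (keys are case-sensitive)"
    return f"parsed as {data['@type']}"


for sample in (missing_comma, wrong_case, valid):
    print(describe(sample))
```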
3. Misusing structured data.
As we have tried to explain, structured data should be “a faithful representation of page content,” markup that helps Google show more useful and accurate results to users. It follows that including structured data that “is not representative of the page’s main content or is potentially misleading” is a serious mistake, because it goes against the very principle of the tool.
Among instances of irrelevant data Google cites two off-limits examples, namely “a live streaming sports site that labels broadcasts as local events” or “a woodworking site that labels instructions as recipes.” Other wrong uses of markups are creating blank pages only to contain structured data or adding them “on information that is not visible to the user, even if that information is accurate.” However, it seems clear that acting in this way gives rise to distortions that do not serve onpage optimization.
4. Blocking Googlebot access.
The fourth point is also very technical: a common mistake, however naive, is to block Googlebot access to pages implemented by structured data using one of the control methods such as robots.txt, noindex or other systems. Doing so, as is clear, prevents proper crawling of content and, in practice, renders structured data useless, which cannot be used by Google for indexing or shown in SERPs.
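For the robots.txt part of this problem, Python’s standard urllib.robotparser module offers a quick way to test whether a given set of rules would block Googlebot from a URL. The rules and URLs below are invented, and the sketch covers robots.txt only: it does not detect noindex directives or other blocking methods.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that accidentally blocks the marked-up recipe pages.
robots_txt = """\
User-agent: Googlebot
Disallow: /recipes/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def googlebot_allowed(url: str) -> bool:
    """Return True if these rules let Googlebot fetch the URL."""
    return parser.can_fetch("Googlebot", url)


for url in ("https://example.com/recipes/tiramisu", "https://example.com/blog/"):
    status = "allowed" if googlebot_allowed(url) else "BLOCKED"
    print(f"{status}: {url}")
```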
5. Frequent errors by structured data category.
In the guidelines on the topic Google also presents a useful list of common structured data errors, problems and failures, broken down by markup category.
- For example, in the events category there are two types of mistakes: using in-page markup without actually having visible event-related content, or using text that appears more “aimed at promoting or selling the event than describing it.” The problem of markup that does not relate to the content also occurs in the recipe category, where on-page content may not match the markup.
- In the field of job postings, on the other hand, the errors are more numerous: in addition to the general one of non-concordance between markup and on-page content, Google also cites the impossibility for the user to apply for the position and the mismatch between the markup and the job description visible to the user. Also deemed serious are the presence of misleading job documentation on the page and the poor quality of the offer (i.e., if a “payment is required to apply or the job appears to be fake”).
- Problems with structured data for lists and products. For list items, it is incorrect to treat the various items as a single element when assigning object properties: in particular, “performing markup of only one category entity among those listed on the page goes against our guidelines”, Google points out. Therefore, you should avoid assigning a single review rating or position to a list of items, just as you should not treat lists as individual items. Also very specific is the guidance on products and their reviews. First, there is a set of precise rules for the indication of the product’s name, which should not be identified simply through the brand name of the manufacturing or selling company nor with a description of it: the indication Nexus 5X is valid, Google says, but not “Android Phones” or “Best-selling Nexus Phones.”
- Regarding the markup of reviews there are other stringent recommendations: a review written by the site or person providing the product or service is considered wrong, while those made by a customer or an independent, unpaid reviewer are acceptable. In addition, if a page displays reviews it must also offer users the opportunity to submit their own opinion, with the exception of a single, acknowledged author’s review.
The quality standards for structured data
Finally, there are more general rules on the proper approach to implementing structured data on our pages: for example, the markup must not violate the classic Google Search content guidelines, including those related to spam, and it must meet the quality standards listed below.
1. Content.
- Provide up-to-date information, because Google will not show rich results for outdated content that is no longer relevant.
- Provide original content generated by us or our users.
- Do not perform markup of content that is not visible to readers of the page: for example, if the JSON-LD markup describes an artist, the HTML body must describe the same artist.
- Do not perform markup of irrelevant or misleading content, such as fictitious reviews or content unrelated to the topic of a page.
- Do not use structured data to deceive or mislead users; do not steal the identity of people or organizations; and do not misrepresent our ownership, affiliation or main purpose.
- Content in structured data must also comply with additional content guidelines or rules as documented in the specific feature guide. For example, content in JobPosting structured data must comply with job posting content rules. Content in structured data of type Exercise must comply with the guidelines related to content of type Exercise.
2. Relevance.
- Structured data must be a faithful representation of the page content, and Google flags as examples of irrelevant data:
- A live sports streaming site that labels broadcasts as local events.
- A woodworking site that labels instructions as recipes.
3. Completeness.
- Specify all mandatory properties listed in the documentation for the specific rich result type; items that lack the mandatory properties are not eligible for rich results.
- The more recommended properties we provide, the higher the quality of results for users. For example, users prefer job postings with explicit salary information over those without it, and recipes with real user reviews and authentic star ratings (reviews or ratings not from real users may result in a manual action). Rich result ranking takes this additional information into account.
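A completeness check of this kind can be sketched as a simple set comparison. The REQUIRED and RECOMMENDED tables below are illustrative assumptions only, not Google’s actual lists, which should be copied from the documentation of each feature; audit is a hypothetical helper.

```python
# Illustrative property lists (assumptions): take the real ones from the
# documentation of the specific rich result type you are targeting.
REQUIRED = {"Recipe": {"name", "image"}}
RECOMMENDED = {"Recipe": {"cookTime", "recipeIngredient", "aggregateRating"}}


def audit(item: dict) -> dict:
    """List the required and recommended properties an item does not provide."""
    kind = item.get("@type")
    present = set(item)
    return {
        "missing_required": sorted(REQUIRED.get(kind, set()) - present),
        "missing_recommended": sorted(RECOMMENDED.get(kind, set()) - present),
    }


draft = {"@context": "https://schema.org", "@type": "Recipe", "name": "Example Focaccia"}
print(audit(draft))
```

An item with anything under missing_required would not be eligible, while entries under missing_recommended point to properties worth adding.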
4. Location.
- Enter structured data on the page they describe, unless otherwise specified by the documentation.
- If we have duplicate pages for the same content, we recommend placing the same structured data on all duplicate pages, not just the canonical one.
5. Specificity.
- Use, as much as possible, the most specific applicable type and property names defined by schema.org for markup.
- Follow all additional guidelines provided in the documentation for the specific rich result type.
6. Images.
- When specifying an image as a structured data property, it is important that the image be relevant to the page on which it is located; for example, if we define the image property of NewsArticle, the image must be relevant to the news article in question.
- All image URLs specified in the structured data must be able to be crawled and indexed; otherwise, Google Search cannot find and display them on the search results page.
7. Multiple elements on a page.
Multiple elements on a page means that there are multiple types of elements on the page: for example, a page might contain a recipe, a video showing how to make it, and breadcrumb information related to how users can find the recipe. All of this user-visible information can also undergo markup using structured data, allowing search engines such as Google Search to more easily understand the information on a page. If we add more elements to a page, Google Search can get a more complete picture of the topic it covers and display it in different search features.
Google Search can understand multiple elements on a page whether we nest them or specify each element individually. Specifically:
- Nesting. When there is only one main element and additional elements are grouped under that element. This is an especially useful solution for grouping related elements, such as a recipe with a video and reviews.
- Single elements. When each element is a separate block on the same page.
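The two approaches can be compared side by side in a short sketch. The recipe, video, rating, and breadcrumb values are invented, and script_tags is a hypothetical helper; the nested variant attaches the video and rating to the recipe, while the second variant emits each element as its own block on the same page.

```python
import json

CONTEXT = "https://schema.org"

# Hypothetical building blocks: all values are placeholders.
video = {"@type": "VideoObject", "name": "How to make tiramisu"}
breadcrumbs = {
    "@type": "BreadcrumbList",
    "itemListElement": [
        {"@type": "ListItem", "position": 1, "name": "Recipes"},
        {"@type": "ListItem", "position": 2, "name": "Desserts"},
    ],
}

# Nesting: one main element (the recipe) with related elements grouped under it.
nested = {
    "@context": CONTEXT,
    "@type": "Recipe",
    "name": "Tiramisu",
    "video": video,
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": 4.8, "ratingCount": 120},
}

# Single elements: each element is a separate block on the same page.
separate = [
    {"@context": CONTEXT, "@type": "Recipe", "name": "Tiramisu"},
    {"@context": CONTEXT, **video},
    {"@context": CONTEXT, **breadcrumbs},
]


def script_tags(items: list) -> str:
    """Render one JSON-LD script element per item (hypothetical helper)."""
    return "\n".join(
        '<script type="application/ld+json">' + json.dumps(item) + "</script>"
        for item in items
    )


print(script_tags([nested]))
print(script_tags(separate))
```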