Guide to Schema.org, the heart of semantic SEO
Launched eleven years ago, Schema.org has recently reached release 15 to keep pace with the continuous evolution of the Web and its technologies. Above all, however, it has become the vocabulary that crawlers use to interpret information about websites and their pages, an essential element for improving communication with search engines (among others) and the heart of semantic SEO. So let’s find out what this project is, how Schema.org works, and why marking up Web pages with its structured data, the well-known microdata, can be useful for SEO and for a site’s visibility in general.
What is Schema.org
Schema.org is a collaborative community activity, active since 2011, whose mission is to create, maintain and promote schemas for structured data on the Internet. The site has become the reference point for documentation and guidelines on the use of structured data markup on Web pages, commonly (if loosely) called microdata.
In a nutshell, Schema.org provides a shared set of agreed-upon definitions for microdata tags: its mission is to standardize the HTML tags that sites use to obtain rich results on specific topics. In that sense it has become central to the development of the Semantic Web, which aims to make documents more readable and meaningful to both humans and machines.
In particular, Schema.org was founded on the idea of making it easier and simpler for ordinary, everyday sites that make up the Web to use machine-readable data, with the understanding that such data underlies an ecosystem of applications used by millions of people.
The history of Schema.org
The project officially kicked off on June 2, 2011, when major search engines including Google, Yahoo and Bing (joined a few months later by Yandex, Russia’s leading search engine) launched a collaborative initiative to create, support and promote a common set of schemas for structured data markup on Web pages, in e-mail messages and beyond, repeating a cooperative approach pioneered a few years earlier with the XML sitemaps protocol.
The interesting part lies in the proposal of a common, participatory vocabulary (named schema.org itself), expressible in the Microdata, RDFa or JSON-LD formats, which lets Web pages mark up their own content with metadata that search engine spiders and other parsers can recognize more easily, thus giving them access to the meaning of the site.
Schema.org is a rapidly growing area of semantic SEO: over the past eleven years the vocabulary has been updated through many releases, thanks in part to the vital participation of a large online community that uses open, shared channels to simplify the work of webmasters and developers.
At present, the latest release is version 15.0, published last October 25, which contains bug fixes and improvements to previously implemented features, and the list of usable markups is constantly evolving.
An evolving project: the Coronavirus case
To understand how updates work, and how quickly they can arrive, consider what happened in March 2020: just days after the Coronavirus emergency was officially declared a pandemic, Schema.org responded to the new needs that were emerging by releasing specific structured data that let sites publish up-to-date information during the outbreak, supporting a digital community struggling against the invisible enemy that was disrupting even Google searches, as we recounted in this study.
Specifically, as early as March 16, 2020, Schema.org released version 7.0, designed to keep pace with what was happening worldwide, where many “special announcements” concerned changes to schedules and other aspects of daily life: not only the closure of facilities, businesses or schools and the rescheduling of events, but also the temporary use of medical facilities such as testing centers.
Thus, “a new accelerated vocabulary” was needed to aid the response of the entire Web ecosystem to the Coronavirus outbreak, the official announcement explained.
Structured data for the Coronavirus
Since then, new structured data have been introduced, starting with the SpecialAnnouncement type and the eventAttendanceMode property, which enable sites to inform users more accurately and to provide services and answers tailored to people’s needs.
The first type marks up “special announcements”: simple, dated text updates, with markup that associates the announcement with an exceptional situation such as the Coronavirus pandemic. For example, it can point to URLs for various types of updates, such as school closures, public transport shutdowns, quarantine guidelines, travel bans, and information on how to get tested for COVID-19.
And since new facilities for such tests were being set up around the world, Schema.org created the CovidTestingFacility type to represent them, whether they are part of permanent medical facilities or temporary sites adapted for emergency use.
The Schema.org team also explained that they were working to “make improvements to other areas and help smart working,” supporting the worldwide shift to working online from home. One example is EventMovedOnline, a new value of the eventStatus property that lets organizers signal that an event has moved from a physical location to an online format.
With eventAttendanceMode, on the other hand, organizers can indicate how people attend the event: offline (at the physical venue), fully online, or mixed (e.g., speakers physically present but no audience in the room, with attendees connected only via streaming).
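To make the SpecialAnnouncement idea concrete, here is a minimal Python sketch that builds the markup as a JSON-LD dictionary. The property names (schoolClosuresInfo, gettingTestedInfo, announcementLocation and so on) are real SpecialAnnouncement properties; the function name, the district, the dates and all example.com URLs are hypothetical placeholders, and only a subset of the available properties is covered.

```python
import json

# Real SpecialAnnouncement properties that take a URL (or WebContent) with
# topic-specific updates; only a subset, chosen for this example.
INFO_PROPERTIES = {
    "schoolClosuresInfo",
    "publicTransportClosuresInfo",
    "quarantineGuidelines",
    "travelBans",
    "gettingTestedInfo",
}


def special_announcement(name, text, date_posted, info_urls, testing_site=None):
    """Return a SpecialAnnouncement as a JSON-LD dictionary (hypothetical helper).

    info_urls maps one of INFO_PROPERTIES to the URL of the page with details.
    testing_site, if given, is a dict with "name" and "url" keys and is
    attached as a CovidTestingFacility through announcementLocation.
    """
    unknown = set(info_urls) - INFO_PROPERTIES
    if unknown:
        raise ValueError(f"unsupported properties: {sorted(unknown)}")
    data = {
        "@context": "https://schema.org",
        "@type": "SpecialAnnouncement",
        "name": name,
        "text": text,
        "datePosted": date_posted,
        **info_urls,
    }
    if testing_site is not None:
        data["announcementLocation"] = {
            "@type": "CovidTestingFacility",
            "name": testing_site["name"],
            "url": testing_site["url"],
        }
    return data


# Hypothetical school district; all URLs are placeholders.
announcement = special_announcement(
    name="Example District: schools closed until further notice",
    text="All schools in the district are closed; see the linked pages for details.",
    date_posted="2020-03-17T08:00:00+01:00",
    info_urls={
        "schoolClosuresInfo": "https://www.example.com/closures",
        "gettingTestedInfo": "https://www.example.com/testing",
    },
    testing_site={"name": "Example Drive-Through Test Site",
                  "url": "https://www.example.com/test-site"},
)
print(json.dumps(announcement, indent=2))
```

The printed JSON is what would go inside a `<script type="application/ld+json">` element on the announcement page.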
Google and Coronavirus, the information for event organizers
Google subsequently weighed in as well, publishing a blog post detailing how sites can manage structured data for scheduled events, along with other useful information.
Again, the goal was to show users the latest and most accurate information about events in a rapidly changing environment, as the worldwide COVID-19 emergency led to the cancellation, postponement or move online of many conferences and appointments.
Google knew something about this firsthand, having postponed all its scheduled Webmaster Conference events in 15 countries; it then added some new optional structured data properties to its developer documentation, applicable to all regions and languages.
The schema.org eventStatus property sets the status of the event, and at this stage is very useful for reporting whether the initiative has been canceled, postponed or rescheduled, because it allows Google to show its current and actual status to users, “instead of completely removing the event from the event search experience.”
Specifically:
- If the event has been cancelled, we set the eventStatus property to EventCancelled and keep the original date in the event’s startDate.
- If the event has been postponed but the new date is not yet known, we leave the original date in startDate and set eventStatus to EventPostponed. The startDate property, Google explains, helps identify the event as unique, so it is important not to change it until we know the new date. At that point, we can change eventStatus to EventRescheduled and update startDate and endDate with the new dates.
- If the event has been rescheduled for a later date, we simply update startDate and endDate with the new dates. Optionally, we can also set eventStatus to EventRescheduled and add the originally scheduled date in previousStartDate.
- If the event has moved online only, we can optionally update eventStatus to EventMovedOnline to indicate the change.
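The rules above can be sketched as small Python functions that return an updated copy of the event markup. This is a minimal illustration under stated assumptions: the function names and the conference are hypothetical, the markup is held as a plain dictionary, and the status values use the full https://schema.org/ URLs, as Google’s examples do.

```python
import copy

SCHEMA = "https://schema.org/"


def _with_status(event, status):
    """Return a copy of the event with eventStatus set; the input is not modified."""
    updated = copy.deepcopy(event)
    updated["eventStatus"] = SCHEMA + status
    return updated


def cancel(event):
    # Keep the original startDate: it helps identify the event.
    return _with_status(event, "EventCancelled")


def postpone(event):
    # New date not known yet: startDate is left unchanged.
    return _with_status(event, "EventPostponed")


def reschedule(event, new_start, new_end):
    updated = _with_status(event, "EventRescheduled")
    updated["previousStartDate"] = event["startDate"]  # optional property
    updated["startDate"] = new_start
    updated["endDate"] = new_end
    return updated


def move_online(event):
    return _with_status(event, "EventMovedOnline")


# Hypothetical conference used for illustration.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example SEO Conference",
    "startDate": "2020-04-20T09:00",
    "endDate": "2020-04-20T18:00",
}
later = reschedule(postpone(event), "2020-10-12T09:00", "2020-10-12T18:00")
print(later["eventStatus"], later["previousStartDate"], later["startDate"])
# prints: https://schema.org/EventRescheduled 2020-04-20T09:00 2020-10-12T09:00
```

Because the postponement never touches startDate, the later reschedule can still record the original date in previousStartDate.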
More specific directions cover all the events that inevitably went virtual, because the Mountain View team “is actively working to show this information to Google Search users.” So if we have organized such an event, we can communicate it to the search engine (and users) with two properties:
- We set the location to the VirtualLocation type.
- We set the eventAttendanceMode property to OnlineEventAttendanceMode.
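The two settings can be combined in a short Python sketch that also covers the mixed case described earlier. It assumes, as a simplification, that a mixed event lists a VirtualLocation and a physical Place together in location; the function name, event, venue and URLs are hypothetical.

```python
import json

SCHEMA = "https://schema.org/"


def virtual_event(name, start, end, stream_url, venue=None, moved_online=False):
    """Build Event markup for an online or mixed event (hypothetical helper).

    venue: optional dict with "name" and "address"; if present, the event is
    marked as mixed (physical venue plus streaming).
    """
    online = {"@type": "VirtualLocation", "url": stream_url}
    if venue is None:
        mode, location = "OnlineEventAttendanceMode", online
    else:
        place = {"@type": "Place", "name": venue["name"], "address": venue["address"]}
        mode, location = "MixedEventAttendanceMode", [online, place]
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start,
        "endDate": end,
        # EventMovedOnline only makes sense if the event was planned in person.
        "eventStatus": SCHEMA + ("EventMovedOnline" if moved_online else "EventScheduled"),
        "eventAttendanceMode": SCHEMA + mode,
        "location": location,
    }


webinar = virtual_event("Example webinar", "2020-05-05T15:00", "2020-05-05T16:00",
                        "https://www.example.com/live", moved_online=True)
print(json.dumps(webinar, indent=2))
```

Passing a venue, for example `{"name": "Example Hall", "address": "1 Example Street"}`, switches the markup to MixedEventAttendanceMode.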
In conclusion, the article reminded us to “update Google” after changing the event markup: the recommended method is to “make your Sitemap available automatically through your server,” described as the best way to ensure that new and updated content is highlighted in search engines as quickly as possible.
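The article does not say how the sitemap should be produced. One common approach, sketched here in Python under that assumption, is to have the server regenerate the XML sitemap whenever content changes and to give each URL a `<lastmod>` date (part of the sitemaps.org protocol) so crawlers can spot updated pages. The function name and the example.com pages are hypothetical, and serving the file over HTTP is left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for (url, lastmod) pairs; lastmod is a W3C date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; the event page was just updated with a new eventStatus.
xml = build_sitemap([
    ("https://www.example.com/", "2020-03-01"),
    ("https://www.example.com/events/seo-conference", "2020-03-20"),
])
print(xml)
```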
What are Schema.org markups: let’s get to know microdata
We have called Schema.org a semantic vocabulary, and its language is based on tags or microdata that we can add to the HTML of our pages to improve the way search engines read and represent the page in SERPs.
More specifically, the markup defined on Schema.org is a way of representing data: the data it describes is called structured data, and it organizes the content of the page so that Google and other crawlers can understand the information more easily. When schema markup is added to a Web page, search engines read it, recognize more precisely the meaning of and relationships between the entities, and can show rich results (rich snippets) in the SERPs to enhance the user experience.
The essential element is therefore microdata, an HTML5 specification used to nest metadata within existing content on Web pages: search engines, Web crawlers and browsers can extract and process microdata from a Web page and use it to provide a richer browsing experience for users.
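To show what “extracting microdata” means in practice, here is a deliberately simplified Python sketch built on the standard library’s HTMLParser. It assumes a single, non-nested item and reads values the way the HTML microdata rules do for a few common elements (content for meta, href for links, src for images, datetime for time, text otherwise); real crawlers also handle nested items, itemref and repeated properties. The class name and the sample person are hypothetical.

```python
from html.parser import HTMLParser


class MicrodataExtractor(HTMLParser):
    """Extract itemtype and itemprop values from a single, non-nested item.

    A simplified sketch: real microdata parsing (nested itemscope, itemref,
    multiple values per property) is considerably more involved.
    """

    # Elements whose microdata value comes from an attribute, not from text.
    VALUE_ATTR = {"meta": "content", "a": "href", "link": "href", "img": "src"}

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.props = {}
        self._pending = None  # itemprop currently collecting text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and "itemtype" in attrs:
            self.itemtype = attrs["itemtype"]
        prop = attrs.get("itemprop")
        if not prop:
            return
        if tag in self.VALUE_ATTR:
            self.props[prop] = attrs.get(self.VALUE_ATTR[tag])
        elif tag == "time" and "datetime" in attrs:
            self.props[prop] = attrs["datetime"]
        else:
            self._pending = prop
            self.props[prop] = ""

    def handle_data(self, data):
        if self._pending:
            self.props[self._pending] += data

    def handle_endtag(self, tag):
        # Any end tag closes the text value (no nested markup assumed).
        if self._pending:
            self.props[self._pending] = self.props[self._pending].strip()
            self._pending = None


page = """
<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <a itemprop="url" href="https://www.example.com/jane">Home page</a>
  <meta itemprop="jobTitle" content="SEO specialist">
</div>
"""
parser = MicrodataExtractor()
parser.feed(page)
parser.close()
print(parser.itemtype, parser.props)
```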
By using microdata, a site’s content becomes simpler and easier for search engines to interpret, which can improve its visibility in search result pages and may potentially affect its ranking as well.
Markup Schema.org and SEO: the usefulness and benefits
For marketers, schema markup is important because it simplifies and refines the way search engines crawl and understand our site, which can then improve the effectiveness of the results shown to users.
In essence, by adding this simple code to our pages we can provide vital information to search engines, which can improve the online visibility as well as click-through rates of our project; more importantly, microdata can provide context to an otherwise ambiguous Web page.
Schema.org markup covers many kinds of content, including ratings, reviews, e-commerce products, events and more; as we know, Google (and other search engines) use this markup to compose SERP features, such as the many rich results linked to structured data that appear on Google.
On the SEO side, schema.org markup has become increasingly important in recent years, to the point that many consider it a Google ranking factor (although Google has stated that structured data is not a direct ranking signal); it is also a key source for the Knowledge Graph and, last but not least, a factor to be exploited for voice searches. In practice, Google’s evolutions draw heavily on this form of markup.
Eleven years in, millions of sites use Schema.org to mark up their Web pages or email messages, and many applications from Google, Microsoft, Pinterest, Yandex and other big companies already use this system to deliver rich, valuable experiences.
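As an example of the ratings, reviews and product markup mentioned above, this Python sketch builds a Product with an Offer and an AggregateRating. The types and properties (offers, price, priceCurrency, availability, aggregateRating, ratingValue, reviewCount) are standard schema.org terms; the helper name, the product and every figure are hypothetical, and the values in real markup must reflect what the page actually shows.

```python
import json


def product_jsonld(name, sku, price, currency, rating_value, review_count):
    """Return Product markup with an Offer and an AggregateRating (JSON-LD)."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",  # string with two decimals
            "priceCurrency": currency,  # ISO 4217 currency code
            "availability": "https://schema.org/InStock",
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
        },
    }


# Hypothetical product and figures, for illustration only.
shoe = product_jsonld("Example trail running shoe", "TRS-001", 89.9, "EUR", 4.6, 128)
print(json.dumps(shoe, indent=2))
```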
Why use schema.org markup on a site?
Markup is especially useful in the age of Hummingbird and RankBrain: the way a crawler, and Googlebot specifically, interprets the context of a query determines the quality of a search result, and the right use of Schema.org can provide context to an otherwise ambiguous Web page. The various types help present content more clearly or more prominently in search results, so potentially every site and every online business can benefit from implementing Schema.org, whether to generate rich snippets or simply to offer specific, targeted information to users, especially for local searches.
After all, this stems from the very nature of the project, developed, as mentioned, with the ambitious idea of improving the Web through a structured data markup scheme supported by the major search engines (Google, Bing, Yandex and Yahoo!) to create advanced search functionality and help users find more relevant information in SERPs. Thanks to on-page markup, search engine crawlers understand Web page information better and more easily and can provide richer search results. A shared vocabulary, as mentioned above, also gives webmasters working reference templates that help maximize the results of their efforts.
Schema.org’s formats
From a technical standpoint, three main formats are available for Schema.org markup. Since 2015, Google has supported JSON-LD (JavaScript Object Notation for Linked Data), which is embedded in the page inside a script element, and as of September 2017 it recommended JSON-LD for structured data where possible; RDFa (short for Resource Description Framework in Attributes, which works well in different document types such as XML, HTML 4 and SVG) and microdata nevertheless remain accepted and used. Microdata was the format originally adopted by Schema.org, which over time extended its vocabulary to the other, more user-friendly formats.
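To compare the three formats side by side, the Python sketch below renders the same hypothetical Person in JSON-LD, microdata and RDFa Lite. The attribute names (itemscope, itemtype, itemprop for microdata; vocab, typeof, property for RDFa) are the standard ones; the function names and data are illustrative. Note the design difference: JSON-LD lives in its own script block, while microdata and RDFa annotate the visible HTML.

```python
import json
from html import escape


def as_jsonld(type_, props):
    """JSON-LD: a separate <script> block, independent of the visible HTML."""
    data = {"@context": "https://schema.org", "@type": type_, **props}
    # Escape "</" so a value cannot close the <script> element early.
    payload = json.dumps(data).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


def as_microdata(type_, props):
    """Microdata: itemscope/itemtype/itemprop attributes on visible elements."""
    spans = "".join(
        f'<span itemprop="{escape(k)}">{escape(str(v))}</span>' for k, v in props.items()
    )
    return f'<div itemscope itemtype="https://schema.org/{escape(type_)}">{spans}</div>'


def as_rdfa(type_, props):
    """RDFa Lite: vocab/typeof/property attributes on visible elements."""
    spans = "".join(
        f'<span property="{escape(k)}">{escape(str(v))}</span>' for k, v in props.items()
    )
    return f'<div vocab="https://schema.org/" typeof="{escape(type_)}">{spans}</div>'


person = {"name": "Jane Doe", "jobTitle": "SEO specialist"}  # hypothetical data
for render in (as_jsonld, as_microdata, as_rdfa):
    print(render("Person", person))
```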
What schemas can be used on sites
Schema.org’s core vocabulary consists of hierarchically organized “types,” each associated with a set of properties; currently there are 797 types, 1457 properties, 14 data types, 86 enumerations and 462 enumeration members that you can use on your site. The most generic type is Thing, the root of the hierarchy, which refers to “things” in general and is used by approximately 200 thousand domains, while more specific types include, for example, Action, Organization and Product.
In simpler words, Schema.org defines a dictionary of terms (types, properties and enumerated values) linked by a precise taxonomy: at the first level of the hierarchy are the types, or classes, that describe items, and each type has properties that describe it more precisely.
In short, the most commonly used types are those that refer to Creative Works (such as Books, Recipes, TV Series or Movies), embedded non-text objects (audio, video or images), Events, Organizations, People, Places, Restaurants, Local Businesses, Products, Offers, Aggregate Offers, Reviews, Actions and so on.
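The idea that each type inherits the properties of its parents can be sketched in a few lines of Python. The subset below is hand-copied and deliberately tiny: it follows single parents only (the real vocabulary allows multiple inheritance, e.g. LocalBusiness is both an Organization and a Place) and lists just a few properties per type. The complete definitions are published on schema.org; the dictionary and function names here are hypothetical.

```python
# A tiny, hand-copied subset of the schema.org hierarchy (single parents only).
PARENT = {
    "CreativeWork": "Thing",
    "Book": "CreativeWork",
    "Movie": "CreativeWork",
    "Event": "Thing",
    "Organization": "Thing",
    "Product": "Thing",
    "Person": "Thing",
}

# A few properties declared directly on each type (not the full lists).
OWN_PROPS = {
    "Thing": {"name", "description", "url", "image"},
    "CreativeWork": {"author", "datePublished"},
    "Book": {"isbn", "numberOfPages"},
    "Movie": {"director", "duration"},
    "Event": {"startDate", "endDate", "location"},
    "Organization": {"address", "logo"},
    "Product": {"sku", "brand", "offers"},
    "Person": {"birthDate", "jobTitle"},
}


def ancestors(type_name):
    """Return the type followed by its parents, up to Thing."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def allowed_props(type_name):
    """Union of the properties declared on the type and all its ancestors."""
    props = set()
    for t in ancestors(type_name):
        props |= OWN_PROPS.get(t, set())
    return props


print(ancestors("Book"), sorted(allowed_props("Book")))
```

So a Book can use name (from Thing) and author (from CreativeWork) as well as its own isbn, while a Movie cannot use isbn.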
Despite these technicalities, implementing markup and structured data on a site is quite simple, thanks in part to the many schema markup generators currently available. In addition, each page on Schema.org provides examples of how to apply the tags correctly, and there are dedicated validators (such as the Yandex Microformat validator or the Bing Markup Validator) that let you check whether the implementation was successful.
At Google, the reference tool is the Rich Results Test, which checks the validity of data marked up with schemas and microdata; Google Search Console also includes an Unparsable structured data report, which lists structured data that Google could not read because of errors in the code.
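Before running a page through the official tools, a quick local pre-check can catch the most basic mistakes. The Python sketch below is a hypothetical helper, not a substitute for the Rich Results Test or any validator: it only finds JSON-LD script blocks with a regular expression, checks that each one is valid JSON, and checks that every top-level item declares an @type and a schema.org @context. It ignores @graph, microdata and RDFa.

```python
import json
import re

# Finds the body of each <script type="application/ld+json"> element.
SCRIPT_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def precheck(html):
    """Return a list of basic problems found in the page's JSON-LD blocks."""
    problems = []
    blocks = SCRIPT_RE.findall(html)
    if not blocks:
        problems.append("no JSON-LD block found")
    for i, raw in enumerate(blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"block {i}: invalid JSON ({exc.msg})")
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                problems.append(f"block {i}: item is not a JSON object")
                continue
            if "@type" not in item:
                problems.append(f"block {i}: missing @type")
            if "schema.org" not in str(item.get("@context", "")):
                problems.append(f"block {i}: @context does not point to schema.org")
    return problems


good = ('<script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Person", "name": "Jane Doe"}'
        '</script>')
bad = '<script type="application/ld+json">{"@type": "Person",}</script>'
print(precheck(good), precheck(bad))
```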
Tips on Schema.org
Ultimately, using Schema.org markup is recommended for all sites, not least because it prepares them for new ways in which search engines may use such data to deliver more targeted and accurate answers to queries. However, it is worth remembering two things. First, not all information marked up with schema.org is actually displayed in search results, although it is easy to imagine further developments ahead.
Second, as a rule of thumb, you should only mark up content that is actually visible to people visiting the Web page, so do not use schema.org for content in hidden elements or hidden divs.
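The visibility rule can be approximated with a rough Python heuristic: collect the text a visitor would see and flag JSON-LD values that do not appear in it. This is a hypothetical sketch with clear limits: it treats script, style and template content, elements with the hidden attribute and elements with an inline display:none as invisible, but it does not evaluate external CSS or JavaScript, and it only compares a few text properties.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
INVISIBLE = {"script", "style", "template"}


class VisibleText(HTMLParser):
    """Collect text outside script/style/template and outside elements hidden
    via the hidden attribute or an inline display:none (CSS files are ignored)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, hidden) for each open element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        hidden = tag in INVISIBLE or "hidden" in attrs or "display:none" in style
        self.stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Close the most recent matching element (and anything left open inside it).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not any(hidden for _, hidden in self.stack):
            self.chunks.append(data)

    def text(self):
        return " ".join(" ".join(self.chunks).split())


def values_not_visible(html, data, keys=("name", "description")):
    """Return the keys whose JSON-LD value does not appear in the visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    visible = parser.text().lower()
    return [k for k in keys if k in data and str(data[k]).lower() not in visible]


# Hypothetical page: the description exists only in a hidden div.
page = ('<h1>Example trail running shoe</h1>'
        '<div style="display: none">Waterproof and ultralight</div>')
data = {"@type": "Product", "name": "Example trail running shoe",
        "description": "Waterproof and ultralight"}
print(values_not_visible(page, data))
```

Here the check reports description, because that text exists only inside the hidden div and should not be marked up.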