DEV Community

Scott Nath

Originally published at scottnath.com

How to Boost SEO by Enhancing HTML with Microdata

tl;dr

I've been rewriting the HTML of my site and adding structured data, in the form of microdata attributes that follow the Schema.org vocabulary. Structured data can be understood by search engines and other machines, giving your content structure and context.

The following is what I've learned so far, with some code examples.

Prerequisites

You understand HTML

What is all this technology?

There are a few different technologies involved, but in general this is about how to include structured data in your HTML markup. Structured data falls under the semantic web label, which rolls up into the new hotness of Web 3.0. This article is not about Web 3.0; I'm only at the what-I-read-on-Wikipedia level of knowledge about the next web. This article focuses on microdata, the Schema.org vocabulary and how to add them to your HTML.

Here are some short bits about each of those, with links to learn more.

What is structured data?

There is a vast amount of information out there about structured data and this article is more of a how than a what. Brevity is the soul of wit and all that, so I'll (try to) be brief and give you links to dive deeper. To shortcut "what is structured data", lemme just reference this article: Google explains structured data, which is an excellent primer on the subject. Low on time? Just watch the 1st video in the article for a great sum-up.

My sum-up: structured data is a way to include machine-readable key-value versions of the content on your page alongside (or integrated with) your HTML markup.

What is the semantic web?

In short - a machine-readable internet.

More details are out of scope for this article. This Smashing Mag article has a nice primer with charts and stuff.

What is Schema.org? What is their vocabulary?

The term "vocabulary" means the names and expected structure of each type of data.

Schema.org was founded by Google, Microsoft, Yahoo and Yandex, and its vocabularies are developed through an open community process. In other words, the largest search providers got together and came up with a shared way to document content on the web.

The idea is that a shared vocabulary makes it easier for webmasters and developers to decide on a schema and get the maximum benefit for their efforts: web developers can document their content with data structures that work across all the major search players.

huge caveat: this helps search engines understand your content accurately; it does not improve ranking. (This disclaimer is repeated across all the search engines' docs about structured data.)

Schema.org type inheritance

The Schema.org vocabulary is made up of "types" which build on each other, with each type adding more specific properties. The root type is Thing. Thing contains a set of generic properties like name, url and description. All other types generally derive from Thing: Person, Place, Action, etc. all fall under Thing, which means they automatically have the Thing properties (name, url, etc.) and add their own on top.

For instance, the Article type is a subtype of the Creative Work type, so it has every Creative Work property plus its own. Creative Work adds a ton of properties on top of Thing, such as author, about, dateCreated and headline. Article then adds more properties on top of Creative Work, such as articleBody, backstory and wordCount.
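To make the inheritance idea concrete, here's a small Python sketch (an illustration, not anything Schema.org publishes) that models just three types with a handful of their own properties each. The TYPES table and the helpers ancestry, all_properties and defined_on are hypothetical names; the real vocabulary has far more types and properties, and Creative Work is spelled CreativeWork in the actual type URLs.

```python
from typing import Optional

# HYPOTHETICAL, heavily trimmed model of three Schema.org types: each entry
# lists only a few of the type's *own* properties plus its parent type.
TYPES = {
    "Thing": {"parent": None, "props": {"name", "url", "description", "image"}},
    "CreativeWork": {"parent": "Thing",
                     "props": {"author", "about", "dateCreated", "headline"}},
    "Article": {"parent": "CreativeWork",
                "props": {"articleBody", "backstory", "wordCount"}},
}


def ancestry(type_name: str) -> list[str]:
    """Return the type followed by its ancestors, most specific first."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = TYPES[current]["parent"]
    return chain


def all_properties(type_name: str) -> set[str]:
    """Collect the type's own properties plus everything it inherits."""
    props: set[str] = set()
    for name in ancestry(type_name):
        props |= TYPES[name]["props"]
    return props


def defined_on(type_name: str, prop: str) -> Optional[str]:
    """Return which type in the chain declares `prop`, or None."""
    for name in ancestry(type_name):
        if prop in TYPES[name]["props"]:
            return name
    return None


print(ancestry("Article"))                # ['Article', 'CreativeWork', 'Thing']
print(defined_on("Article", "headline"))  # CreativeWork
print(defined_on("Article", "image"))     # Thing
```

Walking the parent chain is why an Article can use headline and image even though neither is declared on Article itself.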

How I chose microdata to document my content

To document your data in your HTML, you have four choices: microdata, JSON-LD, RDFa and microformats.

rejected: microformats, because of their limited structures and because they focus more on webmentions than on the whole of your content

rejected: RDFa, which had outdated docs and uses its own set of attributes (vocab, typeof, property) rather than the microdata attributes built into HTML

So that left microdata and JSON-LD.

JSON-LD

A fairly common argument for JSON-LD is that it's easier to maintain because it isn't tied directly to your HTML structure. You just create your data mapping, and when your system writes your pages, it also writes the JSON-LD.

Downside? That duplicates all your content into <script> tags. It also requires you to write or manage a system to generate them (since it is separate from your HTML). If you have unknown data or hand-written HTML, it's harder to add and maintain.
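Here's roughly what "your system also writes the JSON-LD" looks like, as a minimal Python sketch. The jsonld_script helper and the article dict are hypothetical; a real site would pull the data from its CMS or build step. Notice that the headline ends up on the page twice, once in the visible h1 and once in the script tag.

```python
import html
import json


def jsonld_script(data: dict) -> str:
    """Serialize a data mapping into a JSON-LD <script> tag."""
    payload = {"@context": "https://schema.org", **data}
    body = json.dumps(payload, indent=2)
    # A literal "</script>" inside a string would close the tag early;
    # "\/" is a valid JSON escape for "/", so the data itself is unchanged.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical data mapping, e.g. what a CMS or build step might hand over.
article = {
    "@type": "Article",
    "headline": "An important article about things",
    "image": "https://example.com/image.jpg",
}

# The headline now lives in two places: the visible <h1> and the JSON-LD.
page = f"<h1>{html.escape(article['headline'])}</h1>\n" + jsonld_script(article)
print(page)
```

That second copy is the duplication: every time the HTML changes, the generator has to produce matching data.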

Upside? There are lots of plugins and tools out there to automate adding JSON-LD. (That doesn't help me on scottnath.com, because I do a lot of one-off HTML and mostly don't use a CMS.)

Microdata (spoiler alert: I chose this)

Microdata is mostly added via attributes on your HTML elements.

I added microdata across scottnath.com where it made sense, since I was writing the HTML as bespoke little components. It was fairly simple to add to my pages because I already use semantic HTML, and microdata expects your HTML to reflect the structure of your content.

It also doesn't add a whole lot of new markup. For instance, the title on my articles went from this:

<h1>{title}</h1>

to this:

<h1 itemprop="headline">{title}</h1>

Pretty easy to add!

FYI: I initially started this work wanting to write HTML for JSON Resume that more strictly reflected a resume's hierarchical content. HTML is a programming language, after all. That work is coming in the next article: a semantic JSON Resume!

How to add microdata to HTML

Did you know that HTML already has global attributes specifically for including microdata?

These attributes can go on most HTML elements, and they follow a specific structure in their usage. The three you'll most often use are:

  • itemscope
  • itemtype
  • itemprop

There are also itemref and itemid, but those won't be used in this tutorial.

itemscope and itemtype

These two always appear together on a container element. Some examples use itemscope alone, but without itemtype, itemscope only says that everything inside that element belongs to the same item; it doesn't say what kind of item it is. Since we're documenting our content for SEO, we need itemtype to point to a schema.org type that says what the itemscope is documenting.

Scope, in this case, means a related chunk of content. When a parser reads microdata inside an element with the boolean itemscope attribute, every itemprop inside that element is expected to belong to that scope (unless a nested itemscope starts a new one).

itemtype is always a URL. In this case, a URL to a schema.org type. So, if you had an article with a title and summary, it would match the schema.org Article type and your HTML container would be like so:

<article itemscope itemtype="https://schema.org/Article">
  ...article contents
</article>
  • itemscope is boolean, so it is never itemscope="meow"
  • itemtype denotes what type of content is being described by pointing to the documentation about that type
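Those two rules are easy to check mechanically. This is a hedged sketch using Python's built-in html.parser; the ScopeChecker class is hypothetical, and it only enforces the conventions described in this article (a boolean itemscope, an itemtype pointing at schema.org), not the full HTML spec.

```python
from html.parser import HTMLParser


class ScopeChecker(HTMLParser):
    """Collect itemscope/itemtype usage that breaks the rules above."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # a bare boolean attribute arrives with value None
        if "itemscope" in attrs:
            value = attrs["itemscope"]
            # HTML boolean attributes may only be empty or repeat their own name.
            if value is not None and value.lower() not in ("", "itemscope"):
                self.problems.append(f'<{tag}>: itemscope="{value}" is not boolean')
            itemtype = attrs.get("itemtype")
            if not itemtype:
                # Valid HTML, but search engines won't know what the item is.
                self.problems.append(f"<{tag}>: itemscope without itemtype")
            elif not itemtype.startswith(("https://schema.org/", "http://schema.org/")):
                self.problems.append(f"<{tag}>: itemtype {itemtype!r} is not a schema.org URL")
        elif "itemtype" in attrs:
            self.problems.append(f"<{tag}>: itemtype without itemscope")


checker = ScopeChecker()
checker.feed(
    '<article itemscope itemtype="https://schema.org/Article"></article>'
    '<div itemscope="meow" itemtype="https://schema.org/Person"></div>'
)
print(checker.problems)
```

Only the div is flagged: the article's bare itemscope comes through as None, which is exactly what a boolean attribute should look like.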

itemprop

itemprop adds properties to a scope.
In other words, it adds data to its enclosing itemscope: if the itemscope is an object, each itemprop is a property on that object.

Assuming this is the main article on a page, the title should be the top heading, an <h1>. The properties of an article are detailed on the schema.org Article type page. From that type, we'll use headline, a property inherited from Creative Work, expected to be Text (plain text) and described as "Headline of the article." With itemprop, our HTML becomes:

<article itemscope itemtype="https://schema.org/Article">
  <h1 itemprop="headline">An important article about things</h1>
</article>

We can document a main article image as well, using image, a property that comes from Thing. Remember, the hierarchy Thing->Creative Work->Article determines which properties are available.

<article itemscope itemtype="https://schema.org/Article">
  <h1 itemprop="headline">An important article about things</h1>
  <img src="https://example.com/image.jpg" itemprop="image" />
</article>

What data do we know so far?

This would be the output when reading our microdata:

Article
  @type     Article
  headline  An important article about things
  image     https://example.com/image.jpg
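Why does image come out as a URL while headline comes out as text? In microdata, the element an itemprop sits on decides where its value comes from. Here's a rough Python sketch of those rules, based on the property-value rules in the HTML spec's microdata section; property_value is a hypothetical helper, and real parsers also resolve URLs to absolute form, which this skips.

```python
# Which attribute supplies an itemprop's value, by element name.
URL_ATTRIBUTE = {
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "a": "href", "area": "href", "link": "href",
    "object": "data",
}


def property_value(tag: str, attrs: dict, text: str) -> str:
    """Return an itemprop's value for an already-parsed element.

    `text` is the element's text content, used when no attribute applies.
    """
    if tag == "meta":
        return attrs.get("content", "")
    if tag in URL_ATTRIBUTE:
        return attrs.get(URL_ATTRIBUTE[tag], "")
    if tag in ("data", "meter"):
        return attrs.get("value", "")
    if tag == "time" and "datetime" in attrs:
        return attrs["datetime"]
    return text


print(property_value("h1", {"itemprop": "headline"}, "An important article about things"))
print(property_value("img", {"src": "https://example.com/image.jpg", "itemprop": "image"}, ""))
```

The same rule explains the author example: an itemprop on an a element takes its value from href, not from the link text.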

itemprop which is also an itemscope

This is the structure part of structured data.

Let's add an author! The Creative Work type has an author property (schema.org/author), and its value must be either a Person or an Organization. Because those are types rather than plain text, they are documented with their own itemscope. The author here will be a schema.org/Person, which means we can document lots of author-specific properties.

<article itemscope itemtype="https://schema.org/Article">
  <h1 itemprop="headline">An important article about things</h1>
  <img src="https://example.com/image.jpg" itemprop="image" />
  <p itemprop="author" itemscope itemtype="https://schema.org/Person">
    Written by 
    <a itemprop="url" href="https://example.com">
      <span itemprop="name">Scott Nath</span>,
      <span itemprop="jobTitle">Open Source Developer</span>
    </a>
  </p>
</article>

What do we know from the above HTML?

The author has been added as its own section, with sub-properties:

Article
  @type     Article
  headline  An important article about things
  image     https://example.com/image.jpg
  author
      @type     Person
      url       https://example.com/
      name      Scott Nath
      jobTitle  Open Source Developer
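The nesting is the interesting part for a parser: an element with both itemprop and itemscope becomes a property value on the outer item and, at the same time, the scope for everything inside it. Here's a minimal, hedged Python sketch of that using the standard library's html.parser. MicrodataReader is a hypothetical name; it handles only the common cases (no itemref, itemid or URL resolution) and collapses whitespace in text values.

```python
import json
from html.parser import HTMLParser

# Elements whose itemprop value comes from an attribute, not their text.
URL_ATTRIBUTE = {"a": "href", "area": "href", "link": "href", "img": "src",
                 "audio": "src", "video": "src", "source": "src", "iframe": "src",
                 "embed": "src", "track": "src", "object": "data"}
# Void elements never get an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataReader(HTMLParser):
    """Minimal sketch: nested itemscopes become nested dicts."""

    def __init__(self):
        super().__init__()
        self.items = []      # top-level items
        self.scopes = []     # items whose element is still open, innermost last
        self.open_tags = []  # (tag, item started here, text capture) per open element
        self.captures = []   # [item, prop, text parts] waiting for an end tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        parent = self.scopes[-1] if self.scopes else None
        item = capture = None
        if "itemscope" in attrs:
            # New item; keep only the type name from the itemtype URL.
            itemtype = attrs.get("itemtype") or ""
            item = {"@type": itemtype.rstrip("/").rsplit("/", 1)[-1]}
            if prop and parent is not None:
                parent[prop] = item      # e.g. author -> Person
            else:
                self.items.append(item)
        elif prop and parent is not None:
            if tag == "meta":
                parent[prop] = attrs.get("content", "")
            elif tag in URL_ATTRIBUTE:
                parent[prop] = attrs.get(URL_ATTRIBUTE[tag], "")
            elif tag in VOID:
                parent[prop] = ""        # e.g. <br>: no text content
            else:
                capture = [parent, prop, []]
                self.captures.append(capture)
        if tag in VOID:
            return
        if item is not None:
            self.scopes.append(item)
        self.open_tags.append((tag, item, capture))

    def handle_data(self, data):
        # Text counts toward every itemprop element it sits inside.
        for capture in self.captures:
            capture[2].append(data)

    def handle_endtag(self, tag):
        # Find the matching open element; ignore stray end tags like </img>.
        for index in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[index][0] == tag:
                break
        else:
            return
        # Close it, plus anything left unclosed inside it.
        while len(self.open_tags) > index:
            _, item, capture = self.open_tags.pop()
            if capture is not None:
                parent, prop, parts = capture
                parent[prop] = " ".join("".join(parts).split())
                self.captures = [c for c in self.captures if c is not capture]
            if item is not None:
                self.scopes.pop()


SNIPPET = """
<article itemscope itemtype="https://schema.org/Article">
  <h1 itemprop="headline">An important article about things</h1>
  <img src="https://example.com/image.jpg" itemprop="image" />
  <p itemprop="author" itemscope itemtype="https://schema.org/Person">
    Written by
    <a itemprop="url" href="https://example.com">
      <span itemprop="name">Scott Nath</span>,
      <span itemprop="jobTitle">Open Source Developer</span>
    </a>
  </p>
</article>
"""

reader = MicrodataReader()
reader.feed(SNIPPET)
reader.close()
print(json.dumps(reader.items, indent=2))
```

This sketch reports url as https://example.com without the trailing slash shown above; validators normalize URLs, which is likely where that slash comes from.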

How to parse-out and validate your microdata

Sure, you added microdata, but how do you check it?

There are two main validators, Schema.org's validator and Google's Rich Results Test.

They both accept a URL or a snippet of HTML, parse the HTML, then return whatever data they can find. The difference is that Google's tool only parses a subset of the Schema.org vocabulary; the rest of your microdata will be ignored by Google. Both returned the same results for our HTML snippet from above.

Google Rich Results Test

search.google.com/test/rich-results

Google recommends that you start with the Rich Results Test to see what Google rich results can be generated for your page. You'll need to reference Google's list of structured data types they support, which is a subset of the Schema.org types. If the type you use ain't on Google's list, it ain't gonna get read by Google (the other search engines will read it tho.)
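Since Google only reads some types, it can help to list which types your page actually contains before comparing against Google's list. This sketch walks the parsed data from the author example (written out by hand as a dict) and checks each type against a sample set. GOOGLE_SAMPLE is a hypothetical, hand-picked sample, not Google's list; look up Google's published list of supported structured data types for the real one.

```python
# HYPOTHETICAL sample of type names, for illustration only.
GOOGLE_SAMPLE = {"Article", "NewsArticle", "BlogPosting", "Event", "Recipe", "Product"}

# The microdata from the author example, as the validators report it.
parsed = {
    "@type": "Article",
    "headline": "An important article about things",
    "image": "https://example.com/image.jpg",
    "author": {
        "@type": "Person",
        "url": "https://example.com/",
        "name": "Scott Nath",
        "jobTitle": "Open Source Developer",
    },
}


def types_in(item: dict) -> list[str]:
    """List every @type in an item, including nested items such as author."""
    found = [item["@type"]]
    for value in item.values():
        if isinstance(value, dict):
            found.extend(types_in(value))
    return found


def coverage_report(item: dict, supported: set[str]) -> dict[str, bool]:
    """Map each type found to whether it appears in the supported set."""
    return {type_name: type_name in supported for type_name in types_in(item)}


print(coverage_report(parsed, GOOGLE_SAMPLE))  # {'Article': True, 'Person': False}
```

The check is by type name only. A nested item like the Person author is a property of the Article, so Person missing from the sample doesn't necessarily mean the author gets dropped.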

Screenshot image shows the Rich Results Test output for the Article HTML snippet

Schema.org validator

validator.schema.org

For generic schema validation, use the Schema Markup Validator to test all types of schema.org markup, without Google-specific validation.

Screenshot image shows the Schema.org validator output for the Article HTML snippet

Not The End

So that's a high-level overview of how and why to add microdata to your HTML. I added microdata across my site, but so far my findings on how it helps with SEO are inconclusive. It takes a while for crawlers to fully index a site, so hopefully after I've written the article on JSON Resume with microdata, I'll have some kinda noticeable outcome. Stay tuned!
