An extremely close up photograph of the face on a US ten dollar bill

For understandable reasons, people tend to get very confused by the new technologies used to add metadata to HTML5 pages: should you use microdata, microformats, metatags or RDFa? Right now, the best answer is “all four”, but that usually just adds to the confusion.

With a few caveats and reservations, it is possible to categorize the best use of each format:

  • If you are displaying contact, address, resume or event information, use microformats, which have better (and wider) support for such data.
  • If you’re tying your pages into Facebook’s Open Graph, wish to embed Dublin Core information in an HTML5 page, or are attempting to plot the relationships between pages, use RDFa.
  • For everything else, use microdata.
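To make the first two cases concrete, here is a minimal sketch of each. The person, organisation and email address are invented for illustration, and the Open Graph values simply reuse the example article discussed later in this piece; treat it as a rough outline rather than a complete implementation of either format.

```html
<!-- Microformats (hCard): contact details are expressed through class names -->
<div class="vcard">
	<span class="fn">Jane Example</span>,
	<span class="org">Example Press</span>
	<div class="adr">
		<span class="locality">Calgary</span>,
		<span class="country-name">Canada</span>
	</div>
	<a class="email" href="mailto:jane@example.com">jane@example.com</a>
</div>

<!-- RDFa (Open Graph): property attributes on meta elements placed in the <head> -->
<meta property="og:title" content="Mathematics in The Muslim Golden Age">
<meta property="og:type" content="article">
<meta property="og:url" content="http://goldenage.com/maths.html">
<meta property="og:image" content="http://goldenage.com/images/arabic-manuscript.jpg">
```

Note the difference in approach: microformats reuse the class attribute on visible content, while Open Graph data lives in otherwise invisible meta elements.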

A Brief Overview

Adding microdata to web pages allows search engines to better understand and categorize your content, and to associate it with related data such as thumbnail images, author and summary information. Some of this information will be immediately visible to the user; other microdata is invisible to the reader and exists only to enhance search listings and sharing on social networks.

There are a few things to remember about microdata schemas:

  • Looking at the possibilities of a microdata schema such as BlogPosting can be somewhat overwhelming. It’s important to understand that you don’t have to use every last microdata property on every page. While there are a few required properties in each schema, most are optional, and can be added to the page as content grows and changes and you become more familiar with the format.
  • Content must be inside an appropriate container in order for a microdata schema to work. If you’re using HTML5 well, you’ll usually have these containers already in your markup: blog postings should be wrapped in <article> tags, for example. On occasion you may need to add <div> and <span> elements, but these should be kept to an absolute minimum: whenever possible, use the existing semantic markup as the framework on which to build your microdata.
  • Content that does not appear on the page but needs to appear in microdata is marked up as metatags. Rather confusingly, these metatags are inside the associated content, and appear in the <body>, not the <head>, of the document.
  • Not every microdata schema is supported yet. You’ll find good search engine response to some (Recipes, BlogPostings) but not others, as Google hasn’t yet integrated them into its search algorithms. This will change over time.
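As a rough sketch of the container and metatag points above, the fragment below places a meta element inside the body of an article. The title, keywords and body text are placeholders invented for illustration; keywords is used here only as an example of a property with no visible counterpart on the page.

```html
<article itemscope itemtype="http://schema.org/BlogPosting">
	<h1 itemprop="name">A hypothetical post title</h1>
	<!-- Never displayed, but still part of the item because it sits inside the itemscope -->
	<meta itemprop="keywords" content="mathematics, history, algebra">
	<div itemprop="articleBody">
		<p>Visible body text goes here.</p>
	</div>
</article>
```

The existing article element does double duty as the microdata container, so the only markup added purely for microdata is the meta element and the wrapper around the body text.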

Let’s take a look at a typical blog post that uses HTML5 for basic markup:

<article>
	<header>
		<img src="arabic-manuscript.jpg" alt="Arabic manuscript">
	</header>
	<h1><a href="http://goldenage.com/maths.html">
Mathematics in The Muslim Golden Age</a></h1>
	<p>Dudley Storey / <time datetime="2012-11-28">28 November 2012
	</time></p>
	<p>During the faith’s rapid military expansion out of Arabia and Persia 
	in the 8th to 12th centuries, Muslim thinkers preserved and 
	expounded on the discoveries of the Greeks and Romans.</p>
</article>

The article already has good semantic structure, with the heading wrapped with a permalink to the article URL. However, we need to make that association concrete for search engines (“this isn’t a random link: this is the URL you should associate with this article”), as well as provide information on who the author is and the date of publication. Right now, that information is obvious to human readers, but not to search engines: the date in the article could be a reference to anything, and the author’s name is simply a string of characters.

Let’s start by creating a context for the microdata by adding itemscope and itemtype attributes to the parent article element:

<article itemscope itemtype="http://schema.org/BlogPosting">

All the microdata information we add must be within the context of this itemscope, i.e. within the article tag.

<article itemscope itemtype="http://schema.org/BlogPosting">
	<header>
		<img src="http://goldenage.com/images/arabic-manuscript.jpg" alt="Arabic mathematical manuscript" itemprop="image">
	</header>
	<h1 itemprop="name"><a href="http://goldenage.com/maths.html" itemprop="url">Mathematics in The Muslim Golden Age</a></h1>
	<meta itemprop="description" content="From the 8th to 13th centuries Muslim scholars made significant advancements in mathematics, including linear algebra">
	<p><span itemprop="author">Dudley Storey</span> / 
	<time datetime="2012-11-28" itemprop="datePublished">
	28 November 2012
	</time></p>
	<div itemprop="articleBody">
		<p>During the faith’s rapid expansion out of Arabia and Persia in the 
		8th to 12th centuries, Muslim thinkers preserved and expounded on the 
		discoveries of the Greeks and Romans.</p>
	</div>
</article>

There are a few important points to note:

  • First, any URL we provide in microdata must be absolute: it includes the scheme and domain, not just a relative path. This applies to the value of the image src attribute too.
  • Because the short description does not appear on the page itself, I have added it as a metatag.
  • I’ve only added to the markup in two places: once around the main content of the article itself, to demarcate the body text, and once around the author’s name. In both cases, this was necessary because no existing semantic element immediately surrounded the content for the itemprop attribute to hook into.

Passing the completed code through Google’s Structured Data Testing Tool (formerly Rich Snippets) shows just how much more information and context microdata provides.

There is much more that we could do here, but I would suggest that this is the minimum amount of microdata that should be added to a typical blog post.
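As one possible next step, the author could be described as a nested item of its own rather than a plain string, and a modification date could be added as a metatag. This is only a sketch: the about.html URL and the dateModified value are hypothetical, and the rest of the article markup is omitted for brevity.

```html
<article itemscope itemtype="http://schema.org/BlogPosting">
	<h1 itemprop="name"><a href="http://goldenage.com/maths.html" itemprop="url">Mathematics in The Muslim Golden Age</a></h1>
	<!-- Hypothetical modification date; not shown on the page, so added as a metatag -->
	<meta itemprop="dateModified" content="2012-12-01">
	<p>
		<!-- The author becomes a nested Person item with its own properties -->
		<span itemprop="author" itemscope itemtype="http://schema.org/Person">
			<a href="http://goldenage.com/about.html" itemprop="url"><span itemprop="name">Dudley Storey</span></a>
		</span> /
		<time datetime="2012-11-28" itemprop="datePublished">28 November 2012</time>
	</p>
</article>
```

Properties inside the inner itemscope belong to the Person, not the BlogPosting, which is why name and url can appear twice without conflict.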

Photograph by Eli Christman, used under a Creative Commons Attribution 2.0 Generic license
