When Google or another search engine does reach your page (using what is known as a “web spider” or “robot”), it will attempt to work out what the page is about. It does so not only by using the heuristics previously discussed, but also by looking at several areas of the page itself:
- The page `title`
- The content of the headings used on the page (`h1`, `h2`, etc.)
- The content of the first paragraph, which may function as an introduction
- Unique words used repeatedly throughout the document and their proximity to each other. The words “Africa” and “drums” used several times on a page in close proximity to each other make it more likely that the page is about African drums.
- Descriptive `alt` values placed on images
- Microformatted address content, to try to determine where you are
- General semantic markup: `dfn`, `abbr`, etc.
- Microdata, microformats, metatags and `rel`
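To make the list above a little more concrete, here is a rough Python sketch of how a program might pull a few of those signals (title, headings, first paragraph, and `alt` values) out of a page. Search engines do not publish their parsing code, so this is only an illustration of the idea, not a description of how any real spider works. The sample page, the `SignalCollector` class, and all of its attribute names are hypothetical; only the standard library's `html.parser` module is real.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the content is illustrative only.
SAMPLE_PAGE = """<!DOCTYPE html>
<html lang="en">
<head><title>African Drums: A Short Guide</title></head>
<body>
<h1>African Drums</h1>
<p>An introduction to the drums of West Africa.</p>
<h2>The Djembe</h2>
<p>The djembe is a goblet drum.
<img src="djembe.jpg" alt="A carved djembe drum"></p>
</body>
</html>"""


class SignalCollector(HTMLParser):
    """Collects the title, headings, first paragraph and image alt values."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []           # list of (tag, text) pairs
        self.first_paragraph = None  # text of the first <p> encountered
        self.alt_values = []
        self._current = None         # tag whose text is being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alt_values.append(alt)
        elif tag == "title" or tag in self.HEADINGS:
            self._current, self._buffer = tag, []
        elif tag == "p" and self.first_paragraph is None:
            self._current, self._buffer = tag, []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        # Collapse runs of whitespace so line breaks in the markup are ignored.
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        elif tag == "p":
            self.first_paragraph = text
        else:
            self.headings.append((tag, text))
        self._current = None


collector = SignalCollector()
collector.feed(SAMPLE_PAGE)
collector.close()

print("Title:", collector.title)
print("Headings:", collector.headings)
print("First paragraph:", collector.first_paragraph)
print("Alt values:", collector.alt_values)
```

Even this toy version shows why the items in the list matter: a page with an empty `title`, no headings, or images without `alt` text gives a parser like this almost nothing to work with.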
Following the practices and protocols we have talked about so far should create a good, well-structured, semantic page. It’s important to remember that any page that is linked to (either within your site or via external inbound links) may be reached directly, by both search engines and visitors, so crafting an introduction, or at least a comfortable context for the user to understand what is going on at the start of each page, is important.