Defining the language you are using on an HTML page makes it easier for browsers, speech synthesizers, search engines and online services to understand, present, and translate your content. Generally speaking, you should declare the language of a document by specifying the lang attribute on the <html> element:
<html lang="en">
(You’ll note that I use this approach in my HTML5 boilerplate example.)
The language code is usually a two-letter code from ISO 639-1, an international standard that assigns codes to most major human languages; languages without a two-letter code use a three-letter code from a later part of ISO 639. (You can find a complete list of 639-1 language codes at Wikipedia or IANA.) If you are using a specific dialect or regional variety of a language, you can also add a sub-code. Often, this sub-code is a country code from the ISO 3166-1 specification (which, interestingly, usually matches the country’s assigned TLD):
<html lang="en-US">
(“US English”)
<html lang="fr-CA">
(“Canadian French”)
But the sub-code can take other forms, such as a dialect variant or a private-use code prefixed with x-:
<html lang="en-cockney">
<html lang="X-klingon">
The W3C suggests that older methods, such as declaring the language in a pragma directive (a <meta> element with an http-equiv attribute) or specifying it in the site’s .htaccess file, should no longer be used, even though you will find many references to them online.
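As a rough sketch of the difference, the page below declares its language on the root element and keeps the older pragma form inside a comment, purely so you can recognize it in legacy markup. The title and paragraph text are placeholders, not taken from any real site.

```html
<!DOCTYPE html>
<!-- Preferred: declare the language on the root element -->
<html lang="en">
<head>
<meta charset="utf-8">
<title>Declaring the document language</title>
<!-- Older pragma form, shown only for recognition; avoid it in new pages:
     <meta http-equiv="Content-Language" content="en"> -->
</head>
<body>
<p>Hello, world.</p>
</body>
</html>
```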
Language declaration becomes interesting when you are mixing languages on a web page. The lang attribute accepts only a single language tag, so a comma-separated list such as lang="en,fr" is not valid (lists like that belong to the Content-Language HTTP header, which describes a page’s intended audience rather than the language of its text). Instead, define a single default language on the <html> element, then switch to another language for a particular element:
<figure>
<blockquote lang="la">Idcirco, quod non animadvertit ea, quae in alius animo aguntur, non facile quisquam repertus est infelix; illos autem, qui sui animi motus non percipiunt, necesse est infelices esse.</blockquote>
<p>Through not observing what is in the mind of another a man has seldom been seen to be unhappy; but those who do not observe the movements of their own minds must of necessity be unhappy.</p>
<figcaption>Marcus Aurelius, The Meditations</figcaption>
</figure>
(Note that in this scenario lang applies only to the element it is set on and to that element’s descendants. Once that element closes, the page goes back to assuming we are writing in the language specified on the <html> element. Also note that the directionality of language in HTML is a separate issue.)
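To make the scoping concrete, here is a minimal sketch with made-up sentence content. The nested <em> inherits French from its parent <q>, while the text before and after the quotation stays in the page’s default English.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Scoped language changes</title>
</head>
<body>
<!-- This paragraph inherits lang="en" from the html element -->
<p>
  She said
  <!-- Everything inside q, including the nested em, is treated as French -->
  <q lang="fr">c'est <em>très</em> bien</q>
  <!-- After q closes, the text is English again -->
  and walked away.
</p>
</body>
</html>
```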
In the past it’s been common for small sites with multiple region support to make a separate page for each language, significantly raising the amount of work required to create and maintain the site. Thankfully CSS has the :lang() pseudo-class selector, which allows web developers to easily switch the appearance and visibility of different languages written on the same page.
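One possible way to use :lang() for this is sketched below. The class names translations and show-fr are hypothetical, invented for this illustration, and the welcome text is placeholder content. Adding show-fr to the wrapper (by hand or with a script) would swap which language is visible.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Switching languages with :lang()</title>
<style>
/* Appearance: italicize anything marked as French (also matches fr-CA, fr-FR and so on) */
:lang(fr) { font-style: italic; }

/* Visibility: hide the French version by default */
.translations > :lang(fr) { display: none; }

/* Hypothetical switch: class "show-fr" on the wrapper flips the two versions */
.translations.show-fr > :lang(fr) { display: block; }
.translations.show-fr > :lang(en) { display: none; }
</style>
</head>
<body>
<div class="translations">
  <!-- Inherits lang="en" from the html element, so :lang(en) matches it -->
  <p>Welcome to our site.</p>
  <p lang="fr">Bienvenue sur notre site.</p>
</div>
</body>
</html>
```

Because :lang() matches on the element’s inherited language rather than on the attribute alone, the English paragraph needs no lang attribute of its own.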