WebVTT is usually associated with subtitling HTML5 video, but the spec contains many additional features that aren’t related to dialogue. One of the most useful of these is the ability to read and link to video cue points. Creating WebVTT chapter tracks deepens viewers’ engagement with more informative content, while adding invaluable accessibility and UI features. Best of all, they’re easy to create.

New Chapters For An Old Book

Anyone comfortable with a tool like iMovie can add chapter cue points to video in minutes. Created in an application like Subs Factory and exported as a .srt file, then converted to .vtt format, the cue points will look something like this:

WEBVTT

00:00:06.254 --> 00:00:11.758
Central downtown Los Angeles

00:00:11.882 --> 00:00:15.184
Rosslyn Hotel
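The .srt to .vtt conversion mentioned above is mostly mechanical, so it can be sketched in a few lines. This is a minimal sketch rather than a full converter: the helper name `srtToVtt` is hypothetical, and it assumes well-formed SRT input. It adds the `WEBVTT` header and swaps the comma before the milliseconds for a period. The numeric counters that SRT places above each cue are left in place, since WebVTT accepts them as cue identifiers.

```javascript
// Hypothetical helper: convert the text of an .srt file to WebVTT text.
// Assumes well-formed SRT input; styling tags and positioning are not handled.
function srtToVtt(srt) {
  var body = srt
    .replace(/\r\n?/g, "\n") // normalize Windows / old Mac line endings
    .trim()
    // SRT writes 00:00:06,254 while WebVTT requires 00:00:06.254
    .replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, "$1.$2");
  // A WebVTT file starts with the WEBVTT header followed by a blank line
  return "WEBVTT\n\n" + body + "\n";
}

var sampleSrt =
  "1\r\n00:00:06,254 --> 00:00:11,758\r\nCentral downtown Los Angeles\r\n\r\n" +
  "2\r\n00:00:11,882 --> 00:00:15,184\r\nRosslyn Hotel\r\n";

console.log(srtToVtt(sampleSrt));
```

Dedicated tools handle edge cases that this sketch ignores, so treat it as an illustration of the format difference rather than a replacement for them.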

WebVTT information can be “in-band” (wrapped up as part of the video file itself) or “out-of-band” (held as a separate file). Currently, all modern browsers support the latter.

Linking Time & Space

The video is added to the page with some supporting markup:

<figure>
	<video controls id="downtown-los-angeles">
		<source src="downtown-los-angeles.mp4">
		<source src="downtown-los-angeles.webm">
		<track kind="chapters" label="Locations" src="downtown-los-angeles-locations.vtt" srclang="en" default onload="displayChapters(this)">
	</video>
	<figcaption>
		<ol id="chapters"></ol>
	</figcaption>
</figure>

A more considerate approach would be to add this markup to the page via JavaScript only if the browser supports HTML5 video and WebVTT. Since the majority of browsers now support both, I’ve chosen to place the markup directly on the page.
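For anyone who does want the more considerate route, the check could look something like the following. This is a sketch under assumptions: the helper name `supportsChapterTracks` is hypothetical, and the two property tests are a common feature-detection pattern, not the only one. The document object is passed in as a parameter so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: `doc` is normally the page's `document`.
function supportsChapterTracks(doc) {
  var video = doc.createElement("video"),
      track = doc.createElement("track");
  // canPlayType is only present where <video> is implemented;
  // the `track` property is only present where <track> exposes a TextTrack
  return typeof video.canPlayType === "function" && "track" in track;
}

// In a browser, build the player only when both features are available
if (typeof document !== "undefined" && supportsChapterTracks(document)) {
  // Safe to insert the <video>, <track> and chapter list markup here
}
```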

The ordered list in the markup will be filled with list items containing the chapter descriptions from the .vtt file, with the description for the current scene highlighted. We’ll need some CSS for that, together with declarations for displaying the video and UI:

figure {
	font-size: 0;
	position: relative;
	background: #000;
}
figure video {
	width: 75%;
	height: auto;
	display: inline-block;
}
figure figcaption {
	position: absolute;
	right: 0; top: 0;
	background: #222;
	width: 25%;
	font-size: .8rem;
	color: #666;
	height: 100%;
	overflow-y: scroll;
}
figure figcaption ol {
	position: relative;
	list-style-type: none;
	margin: 0; padding: 0;
}
figure figcaption ol li {
	padding: .7rem 1rem;
	border-bottom: 1px dashed #000;
	transition: .3s;
}
.current {
	background: hsl(45,80%,50%);
	color: #000;
}

Eventually I expect browsers to develop a native UI for chapter navigation, but none has this feature yet. Currently Chrome will indicate that a video with chapter information has closed captioning, but only displays the chapter text as subtitles. So our first job is to hide this display, then create the UI for handling chapters. This is done in a single function, placed in a script at the bottom of the page and called when the .vtt file is loaded:

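var locationList = document.getElementById("chapters"),
	video = document.getElementById("downtown-los-angeles");

function displayChapters(trackElement) {
	var textTrack = trackElement && trackElement.track;
	if (!textTrack || textTrack.kind !== "chapters") { return; }
	// Hide the browser's own rendering; the cues still load and fire events
	textTrack.mode = "hidden";
	for (var i = 0; i < textTrack.cues.length; ++i) {
		var cue = textTrack.cues[i],
			newLocale = document.createElement("li");
		// The cue's start time doubles as the id, so the item can be found on cuechange
		newLocale.setAttribute("id", cue.startTime);
		newLocale.appendChild(document.createTextNode(cue.text));
		locationList.appendChild(newLocale);
		// Clicking a chapter moves the video to that cue's start time
		newLocale.addEventListener("click", function() {
			video.currentTime = parseFloat(this.id);
		}, false);
	}
	textTrack.addEventListener("cuechange", function() {
		// activeCues is empty between chapters, so check before reading it
		if (!this.activeCues.length) { return; }
		var chapter = document.getElementById(this.activeCues[0].startTime);
		if (chapter) {
			var locations = [].slice.call(document.querySelectorAll("#chapters li"));
			for (var i = 0; i < locations.length; ++i) {
				locations[i].classList.remove("current");
			}
			chapter.classList.add("current");
			// Shift the list up so the current chapter sits at the top
			locationList.style.top = "-" + chapter.offsetTop + "px";
		}
	}, false);
}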

In order, the script does the following:

  • Checks that the chapters track exists, and hides its default display in the browser. This is the presumptuous part, as IE doesn’t yet support the chapter display changes we’re about to make; all other modern browsers do, however.
  • Reads the cue times and descriptions from the file and adds them as list items to the page, with each list item taking the start time of the scene it corresponds to as its id value.
  • Supplies each item with a click handler that moves the video to the matching time.
  • Listens for the cuechange event, which the text track fires when playback moves forward or backward into a different cue’s time range. When this occurs, the list item that corresponds to the active cue is highlighted.
  • Moves the current cue to the top of the list.
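
The highlighting step comes down to one question: which cue's time range contains the current playback time? The browser answers it through `activeCues`, but the lookup is easy to sketch as a standalone function, which could also be called from a `timeupdate` handler if needed. The name `findActiveChapter` is hypothetical, and plain objects stand in for real cue objects here.

```javascript
// Hypothetical helper: return the index of the cue whose time range
// contains `time`, or -1 when playback is between chapters.
function findActiveChapter(cues, time) {
  for (var i = 0; i < cues.length; ++i) {
    // A cue is active from its startTime up to, but not including, its endTime
    if (time >= cues[i].startTime && time < cues[i].endTime) {
      return i;
    }
  }
  return -1;
}

// Plain objects standing in for cues; times match the sample .vtt cues
var sampleCues = [
  { startTime: 6.254, endTime: 11.758, text: "Central downtown Los Angeles" },
  { startTime: 11.882, endTime: 15.184, text: "Rosslyn Hotel" }
];

console.log(findActiveChapter(sampleCues, 12));   // 1
console.log(findActiveChapter(sampleCues, 11.8)); // -1
```

The second call returns -1 because 11.8 seconds falls in the short gap between the two sample cues, which is the same situation in which `activeCues` is empty.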

That’s the core of it: you can see more detail in the CodePen repo. The cues also highlight when playing or scrubbing through the video; by clicking on a description, you can also jump to any point.

Improvements & Conclusion

I think that rich native web video is one of the most exciting features in HTML5: not only does it make video accessible (as the descriptions can be read by a screen reader for the blind) but it has the power to make video far more engaging. I see so many future uses for chapter tracks, from pop-up video captions that provide fun facts to truly deep links. It would be relatively easy to link each of the descriptions to further resources, for example, or to add geographical information in the .vtt file that could drive a live map on the page. I’ll explore those possibilities, and more, in future articles.
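
As a rough illustration of the geographical idea, a cue's text could carry a small JSON object instead of a bare description. Everything in this sketch is an assumption: the helper name `parseChapterCue`, the field names, and the placeholder URL and coordinates are all invented for the example. JSON payloads are more commonly placed in a separate track with kind="metadata", which may be the better home for map data.

```javascript
// Hypothetical helper: read a cue's text as JSON metadata when possible,
// otherwise treat it as a plain chapter description.
function parseChapterCue(text) {
  try {
    var data = JSON.parse(text);
    if (data && typeof data === "object" && typeof data.title === "string") {
      return data;
    }
  } catch (e) {
    // Not JSON: fall through and use the text as the title
  }
  return { title: text };
}

// The url and coordinates below are placeholders, not real data
var rich = parseChapterCue(
  '{"title":"Rosslyn Hotel","url":"https://example.com/rosslyn","lat":0,"lng":0}'
);
var plain = parseChapterCue("Central downtown Los Angeles");

console.log(rich.title, rich.url);
console.log(plain.title);
```

A script could then use `title` for the list item, wrap it in a link when `url` is present, and pass `lat` and `lng` to a map on each cuechange.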

The stunning quadcopter movie of downtown LA is by Ian Wood, used with permission; locations crowd-sourced via Metafilter and Vimeo.

Check out the CodePen demo for this article at https://codepen.io/dudleystorey/pen/pvPoPq