I’ve talked about encouraging search engines to index your site, and the basics of search engine optimization, in past blog entries; in this article I’ll address the opposite: removing indexed pages from search engines, or avoiding being listed in the first place.

Why might you want to do such a thing? The most common reason is a site that has changed significantly since Google last visited it: some pages removed, others altered in content. After Google visits a site, it keeps a cached version of each page and image, and that cache does not reflect “live” changes. Over time, the actual site and Google’s “memory” of your pages may drift significantly out of sync. As a result, Google may begin to return search results that link to pages on your site that no longer exist, or that are very different from what Google believes them to be. In other cases, you might want to avoid having a page indexed at all.

First, try to avoid the situation entirely by creating a Sitemap, an XML index of your site. This is particularly useful if your site’s content changes regularly: Sitemaps include information about how often each page is expected to change, giving Google a guide as to how often it should return to index your site. (Note that Google is under no obligation to do so.)
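As a rough illustration of what a Sitemap contains, here is a minimal Python sketch that builds one with the standard library. The `build_sitemap` helper and the `example.com` URLs are hypothetical, invented for this example; the `urlset`, `url`, `loc` and `changefreq` elements come from the Sitemap protocol itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return Sitemap XML for an iterable of (url, changefreq) pairs.

    Hypothetical helper for illustration only. changefreq is a hint such as
    "daily", "weekly" or "yearly"; search engines may ignore it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages: a home page that changes often, an about page that rarely does.
sitemap_xml = build_sitemap([
    ("https://example.com/", "daily"),
    ("https://example.com/about", "yearly"),
])
print(sitemap_xml)
```

The output would typically be saved as `sitemap.xml` at the root of the site and submitted to the search engine; a real Sitemap can carry other optional elements that this sketch leaves out.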

If you don’t wish a page to be listed by Google, add a meta tag, a robots.txt file or an .htaccess directive to instruct “Googlebot” (the search engine’s indexing service) not to look at a certain page, or to avoid a particular folder. (Note that this only stops the page from being listed by Google: it doesn’t mean that the page can’t still be found on your site. Obscurity is not security.)
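To make the robots.txt option concrete, the sketch below defines a small set of rules and checks them with Python’s built-in `urllib.robotparser`, so you can see what Googlebot would be asked to skip before you upload anything. The folder, page and domain names are placeholders, and `googlebot_allowed` is a hypothetical helper; the meta tag and .htaccess alternatives appear only as comments, since they are configuration rather than code.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt rules (placeholder paths): ask Googlebot to stay out of
# one folder and away from one specific page.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
Disallow: /drafts/old-page.html
"""

# Per-page alternative, placed in the page's <head>:
#   <meta name="robots" content="noindex">
# Server-level alternative in .htaccess (assumes Apache with mod_headers enabled):
#   Header set X-Robots-Tag "noindex"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def googlebot_allowed(path):
    """Hypothetical helper: may Googlebot fetch this path under the rules above?"""
    return parser.can_fetch("Googlebot", "https://example.com" + path)


for path in ("/", "/private/notes.html", "/drafts/old-page.html"):
    print(path, "allowed" if googlebot_allowed(path) else "blocked")
```

This checks only how the rules parse. Well-behaved crawlers honour them voluntarily, and the pages stay reachable by anyone who has the URL.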

If Google has already listed content you wish removed, use the Google Content Removal Request Tool. The tool can remove images, pages and entire sites, usually within 24 hours. You can also remove indexed content from both Bing and Yahoo using Bing Webmaster Tools.

Enjoy this piece? I invite you to follow me at twitter.com/dudleystorey to learn more.