<meta name="robots" content="index,nofollow">
<meta name="robots" content="noindex,nofollow">
The major difference between robots.txt and robots meta tags is that with the meta tags you cannot
specify which crawlers you're targeting. It's an all-or-nothing tag: you either command all of the
crawlers to behave in a certain way, or you command none of them. It's not as precise as robots.txt,
but if you don't have access to your web server, it's a good alternative.
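To see the difference, compare the meta tags above with a robots.txt file, which can single out individual crawlers by name. The following is an illustrative sketch, not a rule set for any particular site:

```
# robots.txt: rules can target one crawler by name...
User-agent: Googlebot
Disallow: /private/

# ...while every other crawler gets a different rule
User-agent: *
Disallow: /
```

A robots meta tag offers no equivalent of the User-agent line; whatever it says applies to every crawler that reads the page.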
Unfortunately, not all search engines recognize the robots.txt file or the robots meta
tags. So in some cases, you have no control at all over what the crawler examines on
your site. However, a growing number of search engines honor these commands, which helps them
classify the Web more efficiently.
Search engine crawlers can help your site get indexed so that it appears in search results. But they
can also cause problems with your site if they don’t follow the guidelines outlined in the Robot
Exclusion Standard or if your site is not stable enough to support the way the crawler examines it.
Knowing how to control the way that search engines crawl your site can help to ensure that your
site is always at its shiny best (or at least appears to the search crawler to be). It won't necessarily
give you complete control of all the crawlers on the Web, but it will help with some of them.
Inclusion with XML Site Mapping
You may remember that back in Chapter 3 there was a brief mention of XML site mapping. It’s time
to revisit XML site mapping so that you understand how it can help your web site.
XML site mapping is actually the companion to the Robots Exclusion Protocol. It’s an inclusion
protocol — a method by which you can tell crawlers what is available for them to index. In its
most basic form, an XML site map is a file that lists all the URLs for a web site. This file allows
webmasters to include additional information about each URL such as the date the URL was last
updated, how often the URL changes, and how important the URL is in relation to the other
pages on the site.
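In its most basic form, a site map that carries all of the optional information just described might look like this (the URL and values are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <!-- Optional hints: last update, change frequency, relative importance -->
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```

Only the loc element is required; lastmod, changefreq, and priority are hints that a crawler may use or ignore.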
The XML site map is used to ensure that crawlers can find certain pages on your web site, like
dynamic pages. The site-map file can be referenced from your robots.txt file, or you can submit it
directly to a search engine. Doing either of these, however, is not guaranteed to get your site indexed
by the search engine, nor will it get the search engine to index your site any sooner.
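Pointing crawlers at the site map from robots.txt takes a single line; the URL here is, of course, a placeholder for your own:

```
Sitemap: http://www.example.com/sitemap.xml
```

Any crawler that reads the robots.txt file will then know where to look for the site map without you submitting it anywhere.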
An XML site map also does not guarantee that all of the pages on your site will be indexed. It's simply
a guide that the crawler can use to find pages it might otherwise miss.
Creating an XML site map is the first step to including it in your robots.txt file or to submitting it
to a search engine. There are many sites on the Internet offering applications that will help you
create your site map. For example, Google offers a site-map generator that will help you create
your site map, once you’ve downloaded and installed the required software. But Google isn’t the
only game in town. There are dozens of other site-map generators that work just as well.
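Under the hood, these generators do little more than collect your URLs and wrap them in the site-map XML format. The sketch below shows the idea, assuming a hand-written list of page records rather than an actual crawl of the site; the function name and page fields are illustrative, not part of any generator's real interface:

```python
# Minimal site-map builder: turns a list of page records into site-map XML.
from xml.sax.saxutils import escape

def build_sitemap(pages):
    """Build an XML site map from dicts with a required 'loc' key and
    optional 'lastmod', 'changefreq', and 'priority' keys."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for page in pages:
        lines.append("  <url>")
        # escape() guards against &, <, > in query-string URLs
        lines.append(f"    <loc>{escape(page['loc'])}</loc>")
        # Emit only the optional hints the caller actually supplied
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                lines.append(f"    <{tag}>{page[tag]}</{tag}>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)

pages = [
    {"loc": "http://www.example.com/", "lastmod": "2024-01-15",
     "changefreq": "weekly", "priority": "1.0"},
    {"loc": "http://www.example.com/catalog?item=42"},  # a dynamic page
]
print(build_sitemap(pages))
```

A real generator would discover the URLs by crawling the site or reading the web server's files, but the output format is the same.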
Robots, Spiders, and Crawlers