<meta name="robots" content="index,nofollow">
<meta name="robots" content="noindex,nofollow">
The major difference between robots.txt and robots meta tags is that with the meta tags you cannot
specify which crawlers you're targeting. It's an all-or-nothing tag: you either command all of the
crawlers to behave in a certain way, or you command none of them. It's not as precise as robots.txt,
but if you don't have access to your web server, it's a good alternative.
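To see the difference, compare the meta tags above with a robots.txt file, which can single out individual crawlers by name. The following is an illustrative sketch, not a rule set for any particular site:

```
# robots.txt: rules can target one crawler by name...
User-agent: Googlebot
Disallow: /private/

# ...while every other crawler gets a different rule
User-agent: *
Disallow: /
```

A robots meta tag offers no equivalent of the User-agent line; whatever it says applies to every crawler that reads the page.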
Unfortunately, not all search engines recognize the robots.txt file or the robots meta
tags. So in some cases, you have no control at all over what the crawler examines on
your site. However, a growing number of search engines honor these commands, which helps them
classify the Web more efficiently.
Search engine crawlers can help your site get indexed so that it appears in search results. But they
can also cause problems with your site if they don’t follow the guidelines outlined in the Robot
Exclusion Standard or if your site is not stable enough to support the way the crawler examines it.
Knowing how to control the way that search engines crawl your site can help to ensure that your
site is always at its shiny best (or at least appears to the search crawler to be). It won't necessarily
give you complete control of all the crawlers on the Web, but it will help with some of them.
Inclusion with XML Site Mapping
You may remember that back in Chapter 3 there was a brief mention of XML site mapping. It’s time
to revisit XML site mapping so that you understand how it can help your web site.
XML site mapping is actually the companion to the Robots Exclusion Protocol. It’s an inclusion
protocol — a method by which you can tell crawlers what is available for them to index. In its
most basic form, an XML site map is a file that lists all the URLs for a web site. This file allows
webmasters to include additional information about each URL such as the date the URL was last
updated, how often the URL changes, and how important the URL is in relation to the other
pages on the site.
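In its most basic form, a site map that carries all of the optional information just described might look like this (the URL and values are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <!-- Optional hints: last update, change frequency, relative importance -->
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```

Only the loc element is required; lastmod, changefreq, and priority are hints that a crawler may use or ignore.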
The XML site map is used to ensure that crawlers can find certain pages on your web site, like
dynamic pages. The site-map file can be referenced from your robots.txt file, or you can submit it
directly to a search engine. Doing either of these, however, is not guaranteed to get your site indexed
by the search engine, nor will it get the search engine to index your site any sooner.
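Pointing crawlers at the site map from robots.txt takes a single line; the URL here is, of course, a placeholder for your own:

```
Sitemap: http://www.example.com/sitemap.xml
```

Any crawler that reads the robots.txt file will then know where to look for the site map without you submitting it anywhere.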
An XML site map also does not guarantee that all of the pages on your site will be indexed. It's simply
a guide that the crawler can use to find pages it might otherwise miss.
Creating an XML site map is the first step to including it in your robots.txt file or to submitting it
to a search engine. There are many sites on the Internet offering applications that will help you
create your site map. For example, Google offers a site-map generator that will help you create
your site map, once you’ve downloaded and installed the required software. But Google isn’t the
only game in town. There are dozens of other site-map generators that work just as well.
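Under the hood, these generators do little more than collect your URLs and wrap them in the site-map XML format. The sketch below shows the idea, assuming a hand-written list of page records rather than an actual crawl of the site; the function name and page fields are illustrative, not part of any generator's real interface:

```python
# Minimal site-map builder: turns a list of page records into site-map XML.
from xml.sax.saxutils import escape

def build_sitemap(pages):
    """Build an XML site map from dicts with a required 'loc' key and
    optional 'lastmod', 'changefreq', and 'priority' keys."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for page in pages:
        lines.append("  <url>")
        # escape() guards against &, <, > in query-string URLs
        lines.append(f"    <loc>{escape(page['loc'])}</loc>")
        # Emit only the optional hints the caller actually supplied
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                lines.append(f"    <{tag}>{page[tag]}</{tag}>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)

pages = [
    {"loc": "http://www.example.com/", "lastmod": "2024-01-15",
     "changefreq": "weekly", "priority": "1.0"},
    {"loc": "http://www.example.com/catalog?item=42"},  # a dynamic page
]
print(build_sitemap(pages))
```

A real generator would discover the URLs by crawling the site or reading the web server's files, but the output format is the same.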
Robots, Spiders, and Crawlers