Search Engine Optimization Marketing (SEO)

User-agent: CrawlerName
Disallow: /tmp/
Disallow: /links/listing.html

This bit of text first tells all crawlers to ignore the temporary directory, so every crawler reading that file will skip the temporary files. It then tells a specific crawler (indicated by CrawlerName) to ignore both the temporary directory and the links on the listing page. The problem is that the specified crawler may never get that second message, because it has already read that all crawlers should ignore the temporary directory.
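To make the ordering problem concrete, here is a minimal Python sketch of a reader that acts on the first record that applies to it, which is the behavior the paragraph above describes. The function name `first_matching_rules` and the reader itself are hypothetical, written only for illustration, and the mis-ordered file is reconstructed from the description above. Real crawlers differ in how they choose a record, so treat this as a model of the failure, not of any particular search engine.

```python
# Mis-ordered file, reconstructed from the description above (an assumption):
# the record for all crawlers comes before the record for CrawlerName.
WRONG_ORDER = """\
User-agent: *
Disallow: /tmp/

User-agent: CrawlerName
Disallow: /tmp/
Disallow: /links/listing.html
"""


def first_matching_rules(robots_txt, agent):
    """Return the Disallow paths of the first record that applies to agent.

    Hypothetical naive reader: it walks the records top to bottom and stops
    at the first one that names "*" or the crawler itself.
    """
    records = []            # list of (agent names, disallowed paths)
    names, paths = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if paths:                         # rules were seen, so a new record starts
                records.append((names, paths))
                names, paths = [], []
            names.append(value.lower())
        elif field == "disallow":
            paths.append(value)
    if names:
        records.append((names, paths))

    for names, paths in records:
        if "*" in names or agent.lower() in names:
            return paths
    return []


print(first_matching_rules(WRONG_ORDER, "CrawlerName"))  # prints ['/tmp/']
```

Under this reading, CrawlerName stops at the record for all crawlers and never reaches the rule for /links/listing.html.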

If you want to give instructions to multiple crawlers, begin by naming the specific crawlers you want to control. Only after they’ve been named should you leave your instructions for all crawlers. Written properly, the text from the preceding code should look like this:

User-agent: CrawlerName
Disallow: /tmp/
Disallow: /links/listing.html

User-agent: *
Disallow: /tmp/
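One way to check what the corrected file actually permits is to feed it to a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser` module. The helper name `allowed`, the second crawler name `OtherBot`, and the sample paths are assumptions made up for this example. Note that this particular parser consults records that name a crawler before the `*` record regardless of their order, so it checks the rules themselves, not the order sensitivity discussed above.

```python
from urllib.robotparser import RobotFileParser

# The corrected file from above: the named crawler first, then all crawlers.
ROBOTS_TXT = """\
User-agent: CrawlerName
Disallow: /tmp/
Disallow: /links/listing.html

User-agent: *
Disallow: /tmp/
"""


def allowed(agent, path, robots_txt=ROBOTS_TXT):
    """Return True if agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# "OtherBot" and the sample file names are made up for this example.
for agent in ("CrawlerName", "OtherBot"):
    for path in ("/tmp/scratch.html", "/links/listing.html", "/index.html"):
        print(agent, path, allowed(agent, path))
```

With these rules, CrawlerName is kept out of both /tmp/ and the listing page, while any other crawler is kept out of /tmp/ only.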

NOTE

If you have certain pages or links that you want the crawler to ignore, you can accomplish this without making the crawler ignore a whole site or a whole directory, and without having to put a specific meta tag on each page.

Each search engine crawler goes by a different name, and if you look at your web server log, you’ll probably see that name. Here’s a quick list of some of the crawler names that you’re likely to see in that web server log:

Google: Googlebot

MSN: MSNbot

Yahoo! Web Search: Yahoo SLURP or just SLURP

Ask: Teoma

AltaVista: Scooter

LookSmart: MantraAgent

WebCrawler: WebCrawler

SearchHippo: Fluffy the Spider

These are just a few of the search engine crawlers that might crawl across your site. You can find a complete list, along with the text of the Robots Exclusion Standard document, on the Web Robots Pages (www.robotstxt.org). Take the time to read the Robots Exclusion Standard document. It’s not terribly long, and reading it will help you understand how search crawlers interact with your web site. That understanding can also help you learn how to control crawlers better when they come to visit.
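If you want to see which of the crawlers listed earlier have visited, a short script can scan the web server log for their names. The following is a minimal Python sketch under stated assumptions: the function name `count_crawler_hits` is hypothetical, the sample log lines are made up, and matching on a lowercased fragment of each name (such as "slurp" or "fluffy") is a guess about how the name appears in your log.

```python
from collections import Counter

# Lowercased name fragments taken from the crawler list above, mapped to
# the search engine each one belongs to. Matching on a fragment such as
# "slurp" or "fluffy" is an assumption about how the name appears in a log.
CRAWLERS = {
    "googlebot": "Google",
    "msnbot": "MSN",
    "slurp": "Yahoo! Web Search",
    "teoma": "Ask",
    "scooter": "AltaVista",
    "mantraagent": "LookSmart",
    "webcrawler": "WebCrawler",
    "fluffy": "SearchHippo",
}


def count_crawler_hits(log_lines):
    """Count log lines per search engine, based on crawler names in the line."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for fragment, engine in CRAWLERS.items():
            if fragment in lowered:
                hits[engine] += 1
                break  # count each log line at most once
    return hits


# Made-up sample lines in a common access-log layout, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2008:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '192.0.2.11 - - [01/Jan/2008:10:00:05 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "msnbot/1.0"',
    '192.0.2.12 - - [01/Jan/2008:10:00:09 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

print(count_crawler_hits(SAMPLE_LOG))
```

To run it against a real log, pass an open file object in place of `SAMPLE_LOG`, since the function only needs an iterable of lines.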


Robots, Spiders, and Crawlers
