Search Engine Optimization Marketing (SEO)

The file robots.txt is the actual element that you’ll work with. It’s a text-based document that should be included in the root of your domain, and it essentially contains instructions to any crawler that comes to your site about what it is and is not allowed to crawl.

To communicate with the crawler, you need a specific syntax that it can understand. In its most basic form, the text might look something like this:

User-agent: *
Disallow: /

These two parts of the text are essential. The first part, User-agent:, tells a crawler what user agent, or crawler, you’re commanding. The asterisk (*) indicates that all crawlers are covered, but you can specify a single crawler or even multiple crawlers.

The second part, Disallow:, tells the crawler what it is not allowed to access. The slash (/) indicates “all directories.” So in the preceding code example, the robots.txt file is essentially saying that “all crawlers are to ignore all directories.”
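
One way to see how a parser reads these two lines is Python's standard-library urllib.robotparser module. The sketch below is illustrative only: the crawler name ExampleBot and the page path are made up, and real crawlers apply their own parsing logic.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the text: every crawler, every directory.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "ExampleBot" is a hypothetical crawler name used only for illustration.
allowed = parser.can_fetch("ExampleBot", "/any/page.html")
print("allowed" if allowed else "blocked")
```

Because the asterisk covers every user agent and the slash covers every path, this parser reports any page as blocked for any crawler name you pass in.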

When you’re writing robots.txt, remember to include the colon (:) after the User-agent indicator and after the Disallow indicator. The colon indicates that important information follows to which the crawler should pay attention.
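
A missing colon is easy to overlook, so a quick check before publishing the file can help. The following is a minimal sketch, not a full validator: the helper name find_missing_colons is hypothetical, and it only flags non-empty, non-comment lines that contain no colon at all.

```python
def find_missing_colons(robots_text):
    """Return (line_number, line) pairs for directive lines with no colon."""
    problems = []
    for number, line in enumerate(robots_text.splitlines(), start=1):
        # Ignore anything after a '#' comment marker, then trim whitespace.
        content = line.split("#", 1)[0].strip()
        if not content:
            continue  # blank or comment-only line
        if ":" not in content:
            problems.append((number, line))
    return problems


# The first line below is deliberately missing its colon.
sample = "User-agent *\nDisallow: /tmp/\n"
print(find_missing_colons(sample))  # [(1, 'User-agent *')]
```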

You won’t usually want to tell all crawlers to ignore all directories. Instead, you can tell all crawlers to ignore your temporary directories by writing the text like this:

User-agent: *
Disallow: /tmp/

Or you can take it one step further and tell all crawlers to ignore multiple directories:

User-agent: *
Disallow: /tmp/
Disallow: /private/
Disallow: /links/listing.html

That piece of text tells the crawler to ignore the temporary directory, the private directory, and the web page (titled Listing) that contains links, so the crawler won’t be able to follow those links.
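
To confirm that a set of Disallow lines blocks what you expect and nothing more, you can test a few paths against it. This is a sketch using Python's standard-library urllib.robotparser; the crawler name ExampleBot and the test paths other than /links/listing.html are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The three-rule file from the text.
rules = """\
User-agent: *
Disallow: /tmp/
Disallow: /private/
Disallow: /links/listing.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical paths: three that should be blocked, one that should not.
paths = [
    "/tmp/cache.html",
    "/private/notes.html",
    "/links/listing.html",
    "/index.html",
]

# Map each path to True (crawlable) or False (disallowed).
results = {path: parser.can_fetch("ExampleBot", path) for path in paths}
for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Only /index.html falls outside the three Disallow rules, so it is the only path this parser reports as allowed.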

One thing to keep in mind about crawlers is that they read the robots.txt file from top to bottom, and as soon as they find a guideline that applies to them, they stop reading and begin crawling your site. So if you’re commanding multiple crawlers with your robots.txt file, you want to be careful how you write it.
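
The first-match behavior described above can be modeled in a few lines. This is only a sketch of that reading model: the function first_matching_group, the crawler name ExampleBot, and the second group's rules are all hypothetical. Actual crawlers may resolve groups differently, and some choose the most specific matching group regardless of order.

```python
def first_matching_group(groups, crawler_name):
    """Return the first (agent, disallowed_paths) group that applies.

    Groups are checked top to bottom; "*" applies to every crawler.
    """
    for agent, disallowed in groups:
        if agent == "*" or agent.lower() in crawler_name.lower():
            return agent, disallowed
    return None


# Wildcard group listed first, followed by a group for one named crawler.
wildcard_first = [
    ("*", ["/tmp/"]),
    ("ExampleBot", ["/tmp/", "/private/"]),
]

# Same groups with the named crawler listed first.
named_first = list(reversed(wildcard_first))

print(first_matching_group(wildcard_first, "ExampleBot"))
print(first_matching_group(named_first, "ExampleBot"))
```

Under this model, the wildcard group shadows the named group whenever it appears first, so the crawler never reaches the rules written specifically for it.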

This is the wrong way:

User-agent: *
Disallow: /tmp/
