Search Engine Optimization Marketing (SEO)

traffic to a web site considerably, and it’s even possible that the requests will just be fulfilled too

slowly and the crawler will give up and go away.

If the crawler does go away, it will eventually return to try the task again. And it might try several

times before it gives up entirely. But if your site doesn’t eventually begin to cooperate with the crawler,

it’s penalized for the failures and its search engine ranking will fall.

In addition, there are a few reasons you may not want a crawler indexing a page on your site:

Your page is under construction. If you can avoid it, you don’t want a crawler to index your

site while this is happening. If you can’t avoid it, however, be sure that any pages that are

being changed or worked on are excluded from the crawler’s territory. Later, when your

page is ready, you can allow the page to be indexed again.

Pages of links. Having links leading to and away from your site is an essential way to ensure

that crawlers find you. However, having pages of links seems suspicious to a search crawler,

and it may classify your site as a spam site. Instead of having pages that are all links, break

links up with descriptions and text. If that’s not possible, block the link pages from being

indexed by crawlers.

Pages of old content. Old content, like blog archives, doesn’t necessarily harm your

search engine rankings, but it also doesn’t help them much. One worrisome issue with

archives, however, is the number of times that archived content appears on your site.

With a blog, for example, you may have a post appear on the page where it was originally

displayed, and also have it displayed in archives, and possibly have it linked from

some other area of your site. Although this is all legitimate, crawlers might mistake multiple

instances of the same content for spam. Instead of risking it, place your archives off

limits to crawlers.

Private information. It really makes better sense not to have private information (or proprietary

information) on a web site. But if there is some reason that you must have it on

your site, then definitely block crawlers from access to it. Better yet, password-protect the

information so that no one can stumble on it accidentally.

There’s a whole host of reasons you may not want to allow a crawler to visit some of your web

pages. It’s just like allowing visitors into your home. You don’t mind if they see the living room,

dining room, den, and maybe the kitchen, but you don’t want them in your bedroom without

good reason. Crawlers are the guests in your Internet house. Be sure they understand the guidelines

under which they are welcome.

What’s the Robot Exclusion Standard?

Because crawlers have the potential to wreak havoc on a web site, there have to be some

guidelines to keep them in line. Those guidelines are called the Robot Exclusion Standard, Robots

Exclusion Protocol, or robots.txt.
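
As a rough illustration of how such guidelines might cover the kinds of pages described earlier, the sketch below feeds a small robots.txt file to Python’s standard urllib.robotparser module and asks which URLs a well-behaved crawler would be allowed to fetch. The directory names, the www.example.com host, the crawler name ExampleBot, and the helper allowed_urls are all assumptions made up for this example; your own site will be laid out differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. Every path below is an assumption for illustration,
# standing in for pages under construction, link pages, archives, and private
# material.
ROBOTS_TXT = """\
User-agent: *
Disallow: /under-construction/
Disallow: /links/
Disallow: /archives/
Disallow: /private/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return the URLs from `urls` that `robots_txt` permits `user_agent` to fetch."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    candidates = [
        "http://www.example.com/index.html",
        "http://www.example.com/archives/2007/01/post.html",
        "http://www.example.com/private/report.html",
    ]
    # "ExampleBot" is a made-up crawler name; the "*" rule applies to it.
    for url in allowed_urls(ROBOTS_TXT, "ExampleBot", candidates):
        print(url)
```

With these assumed rules, only the home page URL should be printed. Keep in mind that a file like this only asks crawlers to stay away; it does not enforce anything, which is why password protection remains the safer choice for private material.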


Robots, Spiders, and Crawlers
