Professional Search Engine Optimization (Seo). Developer’s Guide to SEO


file in Google (http://www.seroundtable.com/archives/003932.html), so if the list gets too long, it may be problematic.

Wildcard matching can be used to accomplish this, as mentioned earlier in this chapter, but its use is not standard.

However, in this case there is a solution. If you reverse the order of the parameters, such that the print-friendly URLs look like /products.php?print=1&product_id=<number>, you can easily exclude /products.php?print=1 in robots.txt.
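To see why the parameter order matters, the left-to-right prefix matching that robots.txt rules rely on can be sketched in a few lines. This is a simplified illustration rather than a full robots.txt parser; the function name is_excluded and the product ID 42 are made up for the example.

```python
def is_excluded(url, disallow_prefixes):
    """Return True if url starts with any of the Disallow prefixes.

    Simplified model of robots.txt matching: a rule applies only when
    it matches the URL from the left.
    """
    return any(url.startswith(prefix) for prefix in disallow_prefixes)


# The single rule suggested in the text.
rules = ["/products.php?print=1"]

# print comes first: the rule is a prefix of the URL, so it is excluded.
print(is_excluded("/products.php?print=1&product_id=42", rules))  # True

# Original parameter order: the rule is not a prefix, so nothing is excluded.
print(is_excluded("/products.php?product_id=42&print=1", rules))  # False

# The regular product page stays crawlable.
print(is_excluded("/products.php?product_id=42", rules))  # False
```

With the original parameter order, each print-friendly URL would need its own rule, which is what makes the list grow.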

In general, reordering parameters can make robots.txt more palatable for dynamic sites. However, for preexisting sites, it means changing your URLs and may require redirects, which may be undesirable for many reasons. This topic was covered in Chapter 4.
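For a preexisting site, the reordering step might look roughly like the following sketch, which computes the new URL that an old print-friendly URL would be redirected to. The helper name print_first is hypothetical, and the redirect itself is only hinted at in a comment, since how it is issued depends on the server setup.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def print_first(url):
    """Return url with any 'print' parameter moved to the front of the query."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Stable reordering: 'print' pairs first, all others in their original order.
    reordered = ([p for p in params if p[0] == "print"]
                 + [p for p in params if p[0] != "print"])
    return urlunsplit(parts._replace(query=urlencode(reordered)))


old_url = "/products.php?product_id=42&print=1"
new_url = print_first(old_url)
if new_url != old_url:
    # A live site would answer the old URL with a 301 redirect to new_url.
    print("301 ->", new_url)
```

URLs that already have print first, or that have no print parameter at all, come back unchanged, so no redirect is needed for them.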

When dealing with an entire directory, with static files, or, in general, with cases where many fully qualified file names share the same prefix, it is usually advisable to use robots.txt exclusion. Doing so is simpler and reduces stress on your server as well as the robot. In cases where the “left-pattern-matching” logic of a robots.txt exclusion will not work, a meta-exclusion will usually work. These methods can complement each other, so feel free to mix and match them as you see fit.
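Where prefix matching falls short, a meta-exclusion can be decided per request instead. The sketch below assumes the print parameter from the earlier example and uses a hypothetical helper, robots_meta_tag, that returns the tag to place in the page head; "noindex, nofollow" is one common choice of value, not the only one.

```python
from urllib.parse import urlsplit, parse_qs


def robots_meta_tag(url):
    """Return a robots meta tag for print-friendly URLs, else an empty string."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs finds 'print' wherever it appears in the query string,
    # so the parameter order does not matter here.
    if query.get("print") == ["1"]:
        return '<meta name="robots" content="noindex, nofollow" />'
    return ""


# Excluded even though 'print' is not the first parameter.
print(robots_meta_tag("/products.php?product_id=42&print=1"))

# The regular page gets no exclusion tag (prints an empty line).
print(robots_meta_tag("/products.php?product_id=42"))
```

Unlike a robots.txt rule, this approach requires the robot to fetch each page before it learns of the exclusion, which is the extra load on server and robot mentioned above.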

Solutions for Commonly Duplicated Pages

So you’ve got the tools. Now where can you use them, and when are they appropriate? Sometimes the solution is exclusion; other times there are more fundamental solutions that address web site architecture. And though there are an infinite number of causes for duplicate content, there are a number of common culprits worth mentioning. Some of the most frequently observed are the following:

Print-friendly pages

Navigation links and breadcrumb navigation

Affiliate pages

Pages with similar content

Pages with duplicate meta tag or title values

URL canonicalization problems

Pages with URL-based session IDs

Print-Friendly Pages

One of the most common sources of duplicate content is the “print-friendly” page. It is a throwback to the days when CSS offered no means of formatting a page for multiple media (print, screen, and so on), so many programmers simply provided two versions of every page: the standard one and the printable one.


Chapter 5: Duplicate Content
