file in Google (
http://www.seroundtable.com/archives/003932.html
), so if the list gets too
long, it may be problematic.
Wildcard matching can be used to accomplish this as mentioned earlier in this chapter, but its use is not
standard.
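For example, Google's crawler honors the nonstandard `*` wildcard extension to `robots.txt`, so a rule like the following could match print-friendly URLs regardless of where the parameter appears; treat this as a sketch, since other robots may not support wildcards at all:

```
User-agent: Googlebot
Disallow: /products.php?*print=1
```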
However, in this case there is a solution. If you reverse the order of the parameters, so that the print-friendly URLs look like /products.php?print=1&product_id=<number>, you can easily exclude /products.php?print=1 in robots.txt.
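With the parameters reordered, the exclusion needs only the standard left-prefix matching; a minimal robots.txt (using the products.php URL scheme from the text) might read:

```
User-agent: *
Disallow: /products.php?print=1
```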
In general, reordering parameters can make robots.txt exclusion more palatable for dynamic sites. For preexisting sites, however, it means changing your URLs, which may require redirects and may be undesirable for many reasons. This topic was covered in Chapter 4.
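The reordering idea can be sketched in a few lines. This is a language-agnostic illustration in Python, not code from the book; the helper name and the print/product_id URL scheme are assumptions taken from the example above. A server would normalize incoming URLs this way and then 301-redirect any non-canonical form to the result:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

def canonicalize_print_url(url):
    """Move any 'print' parameter to the front of the query string,
    so all print-friendly URLs share an excludable prefix."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query)
    # Partition the parameters, preserving the relative order of the rest.
    front = [p for p in params if p[0] == "print"]
    rest = [p for p in params if p[0] != "print"]
    query = urlencode(front + rest)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query,
                       parts.fragment))

# Example: the print parameter is moved ahead of product_id.
print(canonicalize_print_url("/products.php?product_id=42&print=1"))
# → /products.php?print=1&product_id=42
```

If the canonicalized URL differs from the one requested, the server would answer with a 301 redirect to it, as discussed in Chapter 4.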
When dealing with an entire directory, with static files, or, in general, with cases where many fully qualified file names share the same prefix, it is usually advisable to use robots.txt exclusion. Doing so is simpler and reduces stress on your server as well as on the robot. In cases where the "left-pattern-matching" logic of a robots.txt exclusion will not work, a meta-exclusion will usually work. These methods can complement each other, so feel free to mix and match them as you see fit.
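The meta-exclusion referred to here is the standard robots meta tag, placed in the head of each page you want kept out of the index:

```
<head>
  <!-- Ask robots not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

Unlike robots.txt, this requires the robot to fetch the page before discovering the exclusion, which is why the prefix-based robots.txt approach is preferred when it applies.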
Solutions for Commonly Duplicated Pages
So you’ve got the tools. Now where can you use them, and when are they appropriate? Sometimes the
solution is exclusion, other times there are more fundamental solutions addressing web site architecture.
And though there are an infinite number of causes for duplicate content, there are a number of common
culprits worth mentioning. Some of the most frequently observed are the following:
- Print-friendly pages
- Navigation links and breadcrumb navigation
- Affiliate pages
- Pages with similar content
- Pages with duplicate meta tag or title values
- URL canonicalization problems
- Pages with URL-based session IDs
Print-Friendly Pages
One of the most common sources of duplicate content is the "print-friendly" page. A throwback to the days when CSS did not provide a way to target multiple media types (print, screen, and so on), many programmers simply provided two versions of every page: the standard one and the printable one.
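Today the same effect is achievable with a single URL and a print stylesheet, which avoids the duplicate page entirely. A minimal sketch (the selectors are illustrative, not from the book):

```
/* Hide navigation and ad regions when the page is printed */
@media print {
  nav, .sidebar, .ads { display: none; }
}
```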
Chapter 5: Duplicate Content