file in Google, so if the list gets too long, it may be problematic.
Wildcard matching can be used to accomplish this, as mentioned earlier in this chapter, but it is not part of the original robots.txt standard and is not supported by all search engines.
However, in this case there is a solution. If you reverse the order of the parameters, so that the parameter identifying print-friendly URLs comes first, you can easily exclude those URLs with a single left-matching rule.
In general, reordering parameters can make exclusion more palatable for dynamic sites. For preexisting sites, however, it involves changing your URLs, which may require redirects and may be undesirable for many reasons. This topic was covered in Chapter 4.
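As a sketch of this idea, suppose the print-friendly version of a dynamic page is flagged by a print parameter (the product.php URL and parameter names below are hypothetical, not taken from this chapter). Placing that parameter first lets a single left-matching robots.txt rule cover every print-friendly URL:

```
# Hypothetical robots.txt entry: matches any URL beginning with
# /product.php?print, such as /product.php?print=1&id=42
User-agent: *
Disallow: /product.php?print
```

Had the parameters remained in the original order (id first, print last), no simple prefix rule could isolate the printable versions without also blocking the standard pages.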
When dealing with an entire directory, with static files, or, in general, with cases where many fully qualified file names share the same prefix, it is usually advisable to use robots.txt exclusion. Doing so is simpler and reduces stress on your server as well as on the robot. In cases where the left-pattern-matching logic of a robots.txt exclusion will not work, a meta-exclusion will usually work. These methods can complement each other, so feel free to mix and match them as you see fit.
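Where no robots.txt prefix can isolate the offending pages, the same effect can be achieved page-by-page with the standard robots meta tag, emitted in the head of each page you want excluded:

```html
<!-- Meta-exclusion: keeps this page out of the index
     while still letting the robot follow its links -->
<meta name="robots" content="noindex, follow">
```

Because this tag lives in the page itself, it works regardless of where the URL falls in the site's URL structure, which is exactly why it complements the prefix-based robots.txt approach.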
Solutions for Commonly Duplicated Pages
So you’ve got the tools. Now where can you use them, and when are they appropriate? Sometimes the
solution is exclusion, other times there are more fundamental solutions addressing web site architecture.
And though there are an infinite number of causes for duplicate content, there are a number of common
culprits worth mentioning. Some of the most frequently observed are the following:
Navigation links and breadcrumb navigation
Pages with similar content
Pages with duplicate titles or meta tags
URL canonicalization problems
Pages with URL-based session IDs
One of the most common sources of duplicate content is the “print-friendly” page. A throwback from the days before CSS provided a means to target multiple media types for formatting (print, screen, and so on), many programmers simply provided two versions of every page — the standard one and the printable one.
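Today the duplicate page is avoidable entirely: a single URL can carry a print stylesheet, so no separate printable version needs to exist. A minimal sketch (the class names here are illustrative):

```css
/* Print stylesheet: hide navigational chrome when the visitor
   prints, instead of serving a second, printable page */
@media print {
  nav, .sidebar, .ad-banner {
    display: none;
  }
}
```

The same rules can also be attached from the document head with a media attribute, for example a link element with media="print", keeping screen and print formatting in separate files.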
Chapter 5: Duplicate Content