Causes and Effects of Duplicate Content
You know duplicate content can have a negative effect on web site rankings. But how do you examine
whether a particular web site exhibits this problem, and how do you mitigate or avoid it?
To begin, you can divide duplicate content into two main categories:
Duplicate content as a result of site architecture
Duplicate content as a result of content theft
These categories are discussed separately because they are fundamentally different problems.
Duplicate Content as a Result of Site Architecture
Some examples of site architecture itself leading to duplicate content are as follows:
Pages with substantially similar content that can be accessed via different URLs
Pages with items that are extremely similar, such as a series of differently colored shirts in
an e-commerce catalog having similar descriptions
Pages that are part of an improperly configured affiliate program tracking application
Pages with duplicate title or meta tag values
Pages that use URL-based session IDs
All of these scenarios are discussed at length in this chapter.
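As a rough illustration of how several of these cases (session IDs, affiliate tracking parameters, and the same page reachable through different URLs) can be spotted, the following Python sketch reduces each URL to a normalized form and groups URLs that collapse to the same value. The parameter names in IGNORED_PARAMS and the example.com URLs are assumptions made for illustration only; a real site would need its own list.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not change page content.
# These are assumptions; replace them with the names your site uses.
IGNORED_PARAMS = {"phpsessid", "sid", "sessionid", "aff_id"}


def canonicalize(url):
    """Return a normalized URL with ignored parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # parameter order should not create a "new" URL
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Map each normalized URL to the original URLs that share it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    # Only groups with more than one member point to possible duplicates.
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    sample = [
        "http://www.example.com/shirts.php?color=red&PHPSESSID=abc123",
        "http://www.example.com/shirts.php?color=red",
        "http://www.example.com/shirts.php?aff_id=42&color=red",
        "http://www.example.com/about.php",
    ]
    for canonical, members in group_duplicates(sample).items():
        print(canonical, "<-", members)
```

Any group with more than one member is only a candidate for closer inspection; whether the pages really carry the same content still has to be checked by hand.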
To look for duplicate content as a result of site architecture, you can use a “site:” query
to examine the URLs of a web site that a search engine has indexed. All major search engines (Google,
Yahoo!, Microsoft Live Search) support this feature. This usually reveals quickly whether, for example,
“print-friendly” pages are being indexed.
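Once you have copied a list of indexed URLs out of the search results, a small script can help scan it. The Python sketch below is a heuristic only: the patterns in SUSPECT_PATTERNS, the helper names, and the example.com URLs are all assumptions for illustration, and the script does not query any search engine itself.

```python
import re

# Hypothetical patterns for URLs worth a closer look. They are rough
# heuristics and will produce false positives (for example "/printers/").
SUSPECT_PATTERNS = {
    "print-friendly page": re.compile(r"(^|[/?&_=-])print", re.IGNORECASE),
    "session ID in URL": re.compile(
        r"[?&;](phpsessid|sid|sessionid)=", re.IGNORECASE
    ),
}


def site_query(domain):
    """Build the query text to type into a search engine."""
    return "site:" + domain


def flag_suspect_urls(urls):
    """Return (url, label) pairs for URLs matching a suspect pattern."""
    flagged = []
    for url in urls:
        for label, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(url):
                flagged.append((url, label))
    return flagged


if __name__ == "__main__":
    print(site_query("www.example.com"))
    indexed = [
        "http://www.example.com/article.php?id=5",
        "http://www.example.com/article.php?id=5&print=1",
        "http://www.example.com/catalog.php?PHPSESSID=abc123",
    ]
    for url, label in flag_suspect_urls(indexed):
        print(label + ":", url)
```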
Google frequently places content it perceives as duplicate content in the “supplemental index.” This is
noted at the bottom of a search engine result with the phrase “supplemental result.” If your web site has
many pages in the supplemental index, it may mean that those pages are considered duplicate content —
at least by Google. Investigate several pages of URLs if possible, and look for the aforementioned cases.
Look especially at the later pages of results. It is extremely easy to create duplicate content problems
without realizing it, so viewing from the vantage point of a search engine may be useful.
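To get a rough feel for how similar two pages are, you can compare their text yourself. The Python sketch below measures the overlap of three-word sequences (often called shingles) using Jaccard similarity. This is one common textbook technique chosen for illustration; it is not a description of how Google or any other search engine actually detects duplicates, and the shirt descriptions are made-up sample data.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle is treated as a single unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Return shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    a = shingles(text_a, size)
    b = shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    red = "Soft cotton shirt with a classic fit, available in red."
    blue = "Soft cotton shirt with a classic fit, available in blue."
    print(round(jaccard_similarity(red, blue), 3))
```

A score close to 1.0 suggests two pages differ only in small details, as with the differently colored shirts mentioned earlier. Where to draw the line between "similar" and "duplicate" is a judgment call, not a value taken from any search engine.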
Duplicate Content as a Result of Content Theft
Content theft creates an entirely different problem. Just as thieves can steal tangible goods, they can also
steal content. This, unsurprisingly, is the reason why it is called content theft. It creates a similar problem
for search engines, because they strive to filter duplicate content from search results — across different web
Chapter 5: Duplicate Content