We humans often find it frustrating to listen to people repeat themselves. Likewise, search engines
are “frustrated” by web sites that do the same. This problem is called duplicate content, which is
defined as web content that is either exactly duplicated or substantially similar to content located
at different URLs. Duplicate content clearly does not contain anything original.
This is important to realize. Originality is an important factor in the human perception of value,
and search engines factor such human sentiments into their algorithms. Seeing several pages of
duplicated content would not please the user. Accordingly, search engines employ sophisticated
algorithms that detect such content and filter it out from search engine results.
Indexing and processing duplicate content also wastes the storage and computation time of a search
engine in the first place. Aaron Wall states that “if pages are too similar, then Google [or other
search engines] may assume that they offer little value or are of poor content quality.” A web site
may not get spidered as often or as comprehensively as a result. And though
it is an issue of contention in the search engine marketing community as to whether there is an
explicit penalty applied by the various search engines, everyone agrees that duplicate content can
be harmful.
Knowing this, it would be wise to eliminate as much duplicate content as possible from a web site.
This chapter documents the most common causes of duplicate content as a result of web site
architecture. It then proposes methods to eliminate or remove it from a search engine’s view. You will:
Understand the potential negative effects of duplicate content.
Examine the most common types of duplicate content.
Learn how to exclude duplicate content using
Use PHP code to properly implement an affiliate program.
A common question asked by search engine marketers is “how much duplicate content is too
much?” There is no good answer to that question, as you may have predicted. It is best to simply
take the conservative approach of eliminating as much of it as possible.