To prevent this, the following meta tag should be added to the <head> section of all cloaked documents:
<meta name="robots" content="noarchive" />
If you are cloaking only for a specific spider, you can use a tag like the following. (This can also be applied, as shown in Chapter 5.)
<meta name="googlebot" content="noarchive" />
This prevents the cache from being stored or displayed to users. The New York Times also notably uses this tag to prevent people from reading its content through the search engines' cache.
In this upcoming exercise you're implementing a simple cloaking library in the form of a class. This class will have two functions that you can access from your web applications:

- one updates your cloaking database with search engine IP and user agent data
- the other verifies whether the visitor is a search engine spider
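The class and method names are not preserved in this extract, so the sketch below is a hypothetical Python equivalent of the two-function design described above. The names `CloakingDatabase`, `update`, and `is_spider` are assumptions, not the book's actual API: one method refreshes a local store of known spider IPs and user agents, and the other checks a visitor against it.

```python
# Hypothetical sketch of a two-function cloaking library. Class and
# method names (CloakingDatabase, update, is_spider) are illustrative,
# not the book's actual API.

class CloakingDatabase:
    def __init__(self):
        # Known spider IP addresses and user-agent substrings.
        self.spider_ips = set()
        self.spider_agents = set()

    def update(self, records):
        """Refresh the database from (ip, user_agent) records,
        for example parsed from a downloaded spider list."""
        for ip, agent in records:
            self.spider_ips.add(ip)
            self.spider_agents.add(agent.lower())

    def is_spider(self, ip, user_agent):
        """Return True if the visitor matches a known spider,
        either by IP address or by user-agent substring."""
        if ip in self.spider_ips:
            return True
        ua = user_agent.lower()
        return any(agent in ua for agent in self.spider_agents)

# Example usage with made-up sample data:
db = CloakingDatabase()
db.update([("66.249.66.1", "Googlebot/2.1"), ("157.55.39.1", "bingbot/2.0")])
print(db.is_spider("66.249.66.1", "Mozilla/5.0"))             # True (known IP)
print(db.is_spider("10.0.0.5", "Mozilla/5.0 Googlebot/2.1"))  # True (UA match)
print(db.is_spider("10.0.0.5", "Mozilla/5.0 Firefox"))        # False
```

Checking both IP and user agent matters in practice: user agents are trivially spoofed, which is why the real library cross-references the visitor's IP against published spider data.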
The cloaking data is retrieved from Dan Kramer's database. Kudos to Dan for providing such a useful set of data for everyone to use!
To test the library, you'll create a script that displays the output shown in Figure 11-1 if read by a "normal" visitor, and the output shown in Figure 11-2 when read by a search engine.
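The test script's name and code are not preserved here, but its logic is simple: branch on the spider check and serve different content. The following is a minimal Python sketch under that assumption; the naive user-agent substring test stands in for the library's real IP-plus-user-agent verification.

```python
# Minimal sketch of the test page's logic: one message for regular
# visitors, another for search engine spiders. The substring check is
# a stand-in for the cloaking library's real verification.

KNOWN_SPIDER_AGENTS = ("googlebot", "bingbot", "slurp")

def render_page(user_agent):
    """Return the page body appropriate for this visitor."""
    if any(s in user_agent.lower() for s in KNOWN_SPIDER_AGENTS):
        return "Hello, spider! Here is the content meant for search engines."
    return "Hello, visitor! Here is the regular page content."

print(render_page("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"))
print(render_page("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

Running this with the two sample user agents reproduces the two-output behavior the figures illustrate: one page for human visitors, another for spiders.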
Chapter 11: Cloaking, Geo-Targeting, and IP Delivery