URL-Based Session IDs
URL-based session management causes major problems for search engines, because each time a search
engine spiders your web site, it will receive a different session ID and hence a new set of URLs with the
same content. Needless to say, this creates an enormous amount of duplicate content. The PHP feature
that automatically tracks user sessions using a query string parameter is named trans_sid. You can
disable this feature, and permit only cookie-based session support.
To turn off URL-based session IDs, you’d need to add these lines to your Apache configuration (for example, an .htaccess file):
php_value session.use_only_cookies 1
php_value session.use_trans_sid 0
The same effect can be achieved using this PHP code:
// store the session ID using cookies only (call before session_start())
@ini_set('session.use_only_cookies', 1);
// disable trans_sid
@ini_set('session.use_trans_sid', 0);
The URL factory you created in Chapter 3, together with the redirect library from Chapter 4, can be used
to redirect any URLs that contain the session ID to the “proper” versions of the URLs in case this feature
was inadvertently left on and such URLs are indexed by a search engine.
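As a rough illustration of that idea, the following sketch removes the session ID parameter from a URL so that the request can be redirected to the clean version. It does not use the URL factory or the redirect library from the earlier chapters; the function name strip_session_id is hypothetical, the parameter name PHPSESSID is simply PHP's default session name, and the code assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the book's libraries).
// Returns the URL without the session ID parameter, or null when the
// URL carries no session ID and therefore needs no redirect.
function strip_session_id(string $url, string $sessionParam = 'PHPSESSID'): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return null;
    }

    // Split the query string into an associative array
    parse_str($parts['query'], $params);
    if (!array_key_exists($sessionParam, $params)) {
        return null;
    }
    unset($params[$sessionParam]);

    // Rebuild the URL from the pieces that were present
    $clean = '';
    if (isset($parts['scheme'], $parts['host'])) {
        $clean = $parts['scheme'] . '://' . $parts['host'];
        if (isset($parts['port'])) {
            $clean .= ':' . $parts['port'];
        }
    }
    $clean .= $parts['path'] ?? '/';
    if ($params) {
        $clean .= '?' . http_build_query($params);
    }
    return $clean;
}

// Usage sketch, near the top of a page script and before any output:
// $target = strip_session_id($_SERVER['REQUEST_URI'], session_name());
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

A 301 (permanent) redirect is used in the usage sketch so that a search engine that has already indexed the session-laden URL is told to replace it with the clean one.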
Chapter 11 also discusses a method using cloaking that dynamically turns URL-based session IDs off for
search engines, but leaves it on for human users.
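The following is only a minimal sketch of the general idea, not the method presented in Chapter 11. It assumes that spiders can be recognized by a substring of their User-Agent header; the function name is_search_engine_spider and the short signature list are illustrative assumptions, and a real site would need a maintained list or a more robust check, because the User-Agent header is easy to spoof.

```php
<?php
// Hypothetical helper: guesses whether a request comes from a search
// engine spider by looking for known substrings in the User-Agent.
function is_search_engine_spider(string $userAgent): bool
{
    // Illustrative, deliberately incomplete list of spider signatures
    $signatures = ['googlebot', 'bingbot', 'slurp'];
    foreach ($signatures as $signature) {
        // stripos() performs a case-insensitive substring search
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Usage sketch, before session_start():
// $userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
// if (is_search_engine_spider($userAgent)) {
//     // spiders never receive a session ID in the URL
//     ini_set('session.use_only_cookies', '1');
//     ini_set('session.use_trans_sid', '0');
// }
// session_start();
```

Human visitors keep whatever session settings the server is configured with, so URL-based session IDs remain available to those who have cookies disabled.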
Other Navigational Link Parameters
In general, parameters in URLs such as those that indicate that a user came from a particular page can
create a large amount of duplicate content. Covering all examples would be impossible, but consider the
following imaginary URLs:
The list could get quite long depending on how many pages link to that product. In practice, whenever
possible, it may be advisable to track such things using session-based or HTTP_REFERER-based data,
even though these solutions are imperfect.
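One way the HTTP_REFERER approach might look is sketched below. The function name referring_page and the host name in the usage note are hypothetical, and the code assumes PHP 7.1 or later. It reports the path of the page a visitor came from, but only when that page belongs to the same site, so the product URL itself needs no extra parameter.

```php
<?php
// Hypothetical helper: returns the path of the referring page when the
// visitor arrived from a page on $ownHost, or null otherwise.
// $server is expected to have the same shape as $_SERVER.
function referring_page(array $server, string $ownHost): ?string
{
    // The header is optional; browsers and proxies may omit it
    if (empty($server['HTTP_REFERER'])) {
        return null;
    }
    $parts = parse_url($server['HTTP_REFERER']);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }
    // Ignore referrers that point to other web sites
    if (strcasecmp($parts['host'], $ownHost) !== 0) {
        return null;
    }
    return $parts['path'] ?? '/';
}

// Usage sketch:
// $from = referring_page($_SERVER, 'www.example.com');
// if ($from !== null) {
//     $_SESSION['came_from'] = $from; // keep the value in the session, not the URL
// }
```

Because the referrer can be missing or forged, the result is suitable for statistics and navigation hints rather than for anything security-related.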
URL-based sessions may be important on large e-commerce sites, or in certain demographics,
because a small number of users disable cookies in their browsers.
Chapter 5: Duplicate Content