To include the content from a different server using cURL, you could do something like this:
$ch = curl_init();
Removing Deleted Pages Using 404
Everyone with “fat fingers” has seen the 404 status code at some point. It means that the URL you
requested does not exist. However, there are a few technical details related to this status code.
First of all, it is less widely understood that, along with a 404 status code, the web server can also deliver any
HTML content — just like it does with the 200 status code. Indeed, people usually associate 404 with the
generic Apache error page; but this is not necessarily the case. Some web sites customize their 404 pages
to enhance the user experience. Advanced web sites may even try to give the visitors suggestions as to
what they might have meant based on the keywords in the invalid URL.
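As a rough illustration of the suggestion idea, the sketch below pulls keywords out of a requested URL and matches them against page titles. The function names `keywords_from_url` and `suggest_pages`, the sample URLs, and the title list are all hypothetical. A real site would more likely query its own database or search index.

```php
<?php
// Split the path of a requested URL into unique lowercase keywords.
function keywords_from_url(string $url): array
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    // Drop a trailing file extension such as ".html" or ".php".
    $path = preg_replace('/\.[a-z0-9]+$/i', '', $path);
    // Split on anything that is not a letter or digit.
    $words = preg_split('/[^a-z0-9]+/', strtolower($path), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

// Return the URLs of known pages whose title contains any of the keywords.
// $pages maps URL => title; here it is a hypothetical stand-in for a database.
function suggest_pages(array $keywords, array $pages): array
{
    $matches = [];
    foreach ($pages as $pageUrl => $title) {
        foreach ($keywords as $word) {
            if (stripos($title, $word) !== false) {
                $matches[] = $pageUrl;
                break; // one matching keyword is enough for this page
            }
        }
    }
    return $matches;
}
```

A custom 404 page could call `suggest_pages(keywords_from_url($requestedUrl), $pages)` and list the results under its error message, while still sending the 404 status code.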
Regardless of whether a 404 page is generic or custom, it always tells search engines that the page does not
exist and, if the page is already indexed, that it should be removed from the index.
For a static site, presenting a 404 error is automatic — simply delete the file. Unfortunately, many dynamic
sites abandon the concept of 404s, because it takes some extra effort to implement. Typically, when a product
is deleted from a database, the product’s page is no longer linked from the other pages of the web site.
The product’s page may, however, be linked from pages of external web sites, have acquired link equity,
and remain indexed by search engines.
The worst thing you can do is return a blank page with a 200 status code, as often happens when a
product ID no longer exists in a database. Over time, search engines will index a growing number of these
blank pages, which amounts to duplicate content. Instead, you should return a 404 status code,
perhaps with a friendly error message as well.
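The following is a minimal sketch of that advice, not a definitive implementation. It assumes a hypothetical in-memory `$products` array in place of a real database, a hypothetical `product_id` query parameter, and the illustrative helper names `find_product` and `render_product_page`. It uses PHP's `http_response_code()`, available in PHP 5.4 and later.

```php
<?php
// Hypothetical catalog standing in for a database table.
$products = [
    17 => 'Blue Widget',
    42 => 'Red Widget',
];

// Look up a product name; returns null when the ID is absent or deleted.
function find_product(array $products, int $id): ?string
{
    return $products[$id] ?? null;
}

// Decide the status code and HTML body for a product page request.
function render_product_page(array $products, int $id): array
{
    $name = find_product($products, $id);
    if ($name === null) {
        // Missing product: friendly message, but with a 404 status code.
        return [404, '<h1>Product not found</h1><p>This product is no longer available.</p>'];
    }
    return [200, '<h1>' . htmlspecialchars($name) . '</h1>'];
}

// In the page script: send the status code before any output, then the body.
[$status, $body] = render_product_page($products, (int) ($_GET['product_id'] ?? 0));
http_response_code($status);
echo $body;
```

Keeping the status decision in one function makes it harder to send the friendly message with a 200 status code by accident.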
A common mistake is to deliver a “page not found” message that is meant to handle
404 errors, but with a 200 status code instead. Web hosting services often allow setting a
custom 404 page, that is, the page served when a nonexistent URL is requested.
However, they may not set the 404 status code correctly. This can result in a
theoretically infinite number of duplicate pages in your web site. You can verify that the
correct headers are sent using the tools cited earlier in this chapter.
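If you prefer to check from a script, one possible approach is sketched below. It does not replace the tools mentioned above. The helper name `final_status_code` is hypothetical, and the example URL in the comments is a placeholder. The helper reads the kind of header lines returned by PHP's `get_headers()`.

```php
<?php
// Extract the last numeric status code from a list of raw response header
// lines, such as the array returned by get_headers(). Redirects produce
// several status lines; the last one belongs to the final response.
function final_status_code(array $headerLines): ?int
{
    $code = null;
    foreach ($headerLines as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found".
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Example with canned header lines, so no network access is needed.
$sample = ['HTTP/1.1 404 Not Found', 'Content-Type: text/html'];
echo final_status_code($sample), "\n"; // prints 404

// Against a live site, request a URL that should not exist (placeholder URL):
// $headers = get_headers('http://www.example.com/no-such-page.html');
// if ($headers !== false && final_status_code($headers) !== 404) {
//     echo "Warning: the custom error page is not sending a 404 status code\n";
// }
```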
Search engines never index a page that arrives with the 404 status code.
Chapter 4: Content Relocation and HTTP Status Codes