To sanitize user input, you simply call the
function on the user-provided input. It
will strip any tags that are not in the variable
, as well as common attributes that can
over the input HTML, the cleverly constructed HTML would
. The event
is executed upon an error. Because the
does not exist (which causes an error), it executes the
, causing the redirection.
is replaced with
, and nothing occurs.
function does not typically return valid HTML. In practice, this does not matter,
because this function is designed only as a stopgap measure against spam. The modified HTML code
is unlikely to cause problems in browsers or search engines, and the site owner would eventually
delete or edit the content anyway.
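The stripping behavior described above can be sketched roughly as follows. This is a minimal illustration and not the chapter's sanitize library: the function name sketchSanitize, the constant SKETCH_ALLOWED_TAGS, and the list of allowed tags are all assumptions made for the example. It relies only on PHP's built-in strip_tags, preg_replace_callback, and preg_replace.

```php
<?php
// Hypothetical sketch, not the chapter's implementation.
// Assumed allowlist: any tag not listed here is removed.
const SKETCH_ALLOWED_TAGS = '<p><b><i><em><strong><a><img>';

function sketchSanitize(string $html): string
{
    // Remove tags that are not in the allowlist (their text content stays).
    $html = strip_tags($html, SKETCH_ALLOWED_TAGS);

    // Inside each remaining tag, remove inline event-handler attributes
    // such as onerror="..." or onclick='...'.
    return preg_replace_callback(
        '/<[^>]*>/',
        static function (array $m): string {
            return preg_replace(
                '/\s+on\w+\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i',
                '',
                $m[0]
            );
        },
        $html
    );
}

// Usage: the image survives, but its event handler does not.
$inHTML = '<p>Sanitizing <img src="INVALID-IMAGE" '
        . 'onerror="window.location=\'http://example.com/\'"></p>';
echo sketchSanitize($inHTML), "\n";
```

Like the function discussed in the text, a regular-expression approach of this kind is a stopgap: it does not parse HTML, so it may leave malformed markup behind and will not catch every attack.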
Having such “black hat” content within a web site can damage how both human visitors and search
engines perceive it, and may result in penalties and web site bans. It is therefore of the utmost
importance to address and mitigate these concerns.
Note that the nofollow library was not used in this latest example, but you could combine nofollow with
sanitize to obtain a better result, like this:
// display third comment
$inHTML = '<p>Sanitizing <img src="INVALID-IMAGE"' .
$sanitized = noFollowLinks(sanitizeHTML($inHTML));
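To make the order of the two calls concrete, here is a rough, self-contained sketch of the same composition. It does not reproduce the chapter's nofollow or sanitize libraries: sketchNoFollow is a hypothetical helper written for this example, and PHP's built-in strip_tags stands in for the sanitizing step. The point it illustrates is that the input is cleaned first and the links in the cleaned result are marked afterward.

```php
<?php
// Hypothetical helper, not the chapter's nofollow library:
// adds rel="nofollow" to every <a> tag that has no rel attribute yet.
function sketchNoFollow(string $html): string
{
    return preg_replace_callback(
        '/<a\b[^>]*>/i',
        static function (array $m): string {
            // Leave tags that already declare a rel attribute untouched.
            if (preg_match('/\srel\s*=/i', $m[0])) {
                return $m[0];
            }
            // Insert the attribute directly after the leading "<a".
            return substr($m[0], 0, 2) . ' rel="nofollow"' . substr($m[0], 2);
        },
        $html
    );
}

// Assumed input for the example; strip_tags() is only a stand-in
// for the sanitizing step described in the text.
$inHTML = '<p>Visit <a href="http://example.com/">this site</a>'
        . '<script>alert(1)</script></p>';

// Inner call cleans the markup, outer call marks the surviving links.
$clean = sketchNoFollow(strip_tags($inHTML, '<p><a>'));
echo $clean, "\n";
```

Running the sanitizing step first means the nofollow step only has to deal with tags that are allowed to remain.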
Lastly, your implementations — both
— will not exhaustively
attack, or allow the flexibility some programmers require. They do, however, make a spammer’s
life much more difficult, and he or she will likely move on to an easier target. A project called
safehtml by Pixel-Apes is a more robust solution. It is open-source and written in PHP. You can find it at
Requesting Human Input
One common problem webmasters and developers need to consider is automatic spam robots,
which submit comments on unprotected blogs or other web sites that support comments.
The typical solution to this problem is a “CAPTCHA” image, which requires the visitor to read a
graphical version of text that has been obfuscated in some way. A typical human can read the
image, but an automated script cannot. Unfortunately, this approach presents usability problems,
because blind users can no longer access the protected functionality. For more information on this
type of CAPTCHA, visit
. An improvement on this
Chapter 8: Black Hat SEO