Note that we’ve formatted the HTML code in Figure 8-6 manually in the file for better clarity. Web
browsers read the HTML code in the same way regardless of how it’s formatted.
The script gracefully handles modifying the rel attribute if it already exists. Your white list should
include the host of the current site, or other sites that you’re happy to link to. This allows fully
qualified internal links to work as they should. The script also does not touch any link that does not
start with http://, because those links, by definition, are from the current site.
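The behavior described above can be sketched in PHP as follows. This is a simplified illustration, not the code of the nofollow library: the name sketchNoFollowLinks and its $whitelist parameter are hypothetical, the code assumes PHP 7 or later and quoted attribute values, and treating https:// links like http:// links is our own assumption.

```php
<?php
// Hypothetical sketch, NOT the library's noFollowLinks(): adds rel="nofollow"
// to fully qualified links whose host is not in the supplied white list.
function sketchNoFollowLinks(string $html, array $whitelist): string
{
    // Match each opening <a ...> tag; $m[1] holds its attributes.
    return preg_replace_callback(
        '/<a\b([^>]*)>/i',
        function (array $m) use ($whitelist): string {
            $attrs = $m[1];

            // Find a quoted href value; leave the tag alone if there is none.
            if (!preg_match('/\bhref\s*=\s*(["\'])(.*?)\1/i', $attrs, $href)) {
                return $m[0];
            }
            $url = $href[2];

            // Links without a scheme are relative, so they belong to this site.
            // (Accepting https:// here as well is an assumption.)
            if (!preg_match('#^https?://#i', $url)) {
                return $m[0];
            }

            // Fully qualified links to white-listed hosts are left untouched.
            $host = strtolower((string) parse_url($url, PHP_URL_HOST));
            if (in_array($host, array_map('strtolower', $whitelist), true)) {
                return $m[0];
            }

            // If a rel attribute already exists, extend it instead of adding a second one.
            if (preg_match('/\brel\s*=\s*(["\'])(.*?)\1/i', $attrs, $rel)) {
                if (preg_match('/\bnofollow\b/i', $rel[2])) {
                    return $m[0]; // already marked nofollow
                }
                $newRel = 'rel=' . $rel[1] . trim($rel[2] . ' nofollow') . $rel[1];
                return '<a' . str_replace($rel[0], $newRel, $attrs) . '>';
            }

            // Otherwise append a new rel attribute.
            return '<a' . $attrs . ' rel="nofollow">';
        },
        $html
    );
}
```

A regular expression keeps the sketch short, but it will miss unquoted attribute values and other unusual markup, so treat it as a reading aid rather than a replacement for the library code.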
Using the “nofollow library” is very simple. Instead of displaying content that may contain a link as-is,
you should filter it through the noFollowLinks function, as you did in the exercise:
// display first comment
echo noFollowLinks('<p>Hello! Take a look at <a
// display second comment
echo noFollowLinks('<p>We\'ve just released our new product, <a href="http://@@ta
For this to work properly, you need to define the “white list,” which is the list of allowed hosts.
The exercise defined only two hosts:
// define array of accepted links
$GLOBALS['whitelist'] = array('seophp.example.com', 'www.seoegghead.com');
The logic of the code in the function is pretty clear until you get into the details of the regular
expressions involved, which are more complex than those from the previous chapters. We leave understanding
the code to you as an exercise. If you haven’t already, you should read Chapter 3 for a practical
introduction to regular expressions. Appendix A is an even friendlier and more thorough introduction to
regular expressions.
Sanitizing User Input
A similar problem exists with regard to any user-provided content, such as blog comments, guest books,
and forum posts. In that case as well, you must take care to remove any potentially malicious content.
There are two approaches to achieving this.
You can entirely disable HTML by escaping it, as you did in the exercise.
Here’s an example:
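As a minimal sketch of the escaping approach, assuming PHP's built-in htmlspecialchars() does the escaping (the text above does not name the function), user content can be passed through a small helper before it is displayed. The helper name escapeUserContent, the UTF-8 charset, and the sample input are illustrative assumptions.

```php
<?php
// Hypothetical helper name; htmlspecialchars() is PHP's built-in HTML escaper.
function escapeUserContent(string $input): string
{
    // ENT_QUOTES escapes both double and single quotes; UTF-8 is an assumed charset.
    return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}

// Illustrative input only: the tags are shown as text instead of being rendered.
echo escapeUserContent('<script>alert("hi")</script>');
// prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

Because every angle bracket and quote is converted to an entity, any link or script a visitor submits is displayed literally and is never interpreted by the browser.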
Chapter 8: Black Hat SEO