
Every XML document begins with the XML prolog, which is the first line in the previous code, <?xml version="1.0"?>. This line alone tells parsers and browsers that this file should be parsed based on the XML rules discussed earlier. The second line, <books>, is the document element, which is the outermost start tag in the file (an element consists of a start tag, an end tag, and everything between them). All other tags must be contained within this one for the file to be well-formed XML. The second line of the XML file need not always contain the document element; it can come later if comments or other non-element content precede it.
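The document element rules above can be checked with a short script. This is a minimal sketch in Python using the standard xml.etree.ElementTree module; the sample document and the is_well_formed helper are made up for illustration and are not the sample file discussed in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the prolog, then a comment, then the document element.
SAMPLE = """<?xml version="1.0"?>
<!-- the document element comes after this comment -->
<books>
    <book>
        <title>Example Title</title>
    </book>
</books>"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(SAMPLE)
print(root.tag)                             # name of the document element
print(is_well_formed(SAMPLE))               # a comment before <books> is allowed
print(is_well_formed("<books/><books/>"))   # two outermost elements are rejected
```

The last call shows why every other tag has to sit inside the document element: a second top-level element makes the parser raise an error.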

The third line in this sample file is a comment, which you may recognize as the same style of comment used in HTML. This is one of the syntax elements XML inherited from SGML.

A little bit farther down the page you find a <desc> tag with some special syntax inside it. The <![CDATA[ ]]> code is used to indicate text that should not be parsed, allowing special characters such as less-than and greater-than to be included without fear of breaking the XML syntax. The text must appear between <![CDATA[ and ]]> to be properly shielded from parsing. This is called a Character Data Section, or CData Section for short.
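To see the shielding effect, the following sketch parses a made-up <desc> element with Python's standard xml.etree.ElementTree module. The element content is hypothetical; only the CDATA syntax comes from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical <desc> content containing characters that would normally be markup.
SHIELDED = "<desc><![CDATA[Use <b> for bold when a < b and b > c]]></desc>"
UNSHIELDED = "<desc>when a < b</desc>"

desc = ET.fromstring(SHIELDED)
print(desc.text)    # the text comes back verbatim, angle brackets included
print(len(desc))    # 0: the <b> inside the CDATA section was not parsed as a tag

try:
    ET.fromstring(UNSHIELDED)
    unshielded_ok = True
except ET.ParseError:
    # A bare less-than sign in ordinary text breaks the XML syntax.
    unshielded_ok = False
print(unshielded_ok)
```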

The following line is just before the second book definition:

<?page render multiple authors ?>

Even though this looks like the XML prolog, it is actually considered a different type of syntax called a processing instruction. The purpose of processing instructions (or PIs for short) is to provide extra information to programs that are processing the page, such as XML parsers. PIs are generally free form. Their only requirement is that a name beginning with a letter or underscore must follow the first question mark. After that point, a PI can contain any sequence of characters except the closing ?> sequence.

The most common PI is used to specify a style sheet for an XML file:

<?xml-stylesheet type="text/css" href="MyStyles.css" ?>

This PI is typically placed immediately after the XML prolog and is used by Web browsers to display the XML data using particular styles.
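A program that processes the page can read each PI as a name plus a free-form string. The sketch below uses Python's standard xml.dom.minidom module to list them; the surrounding document and the collect_pis helper are assumptions made for illustration, while the two PIs are the ones shown in the text.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical document wrapping the two PIs from the text.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="MyStyles.css" ?>
<books>
    <?page render multiple authors ?>
    <book/>
</books>"""


def collect_pis(node):
    """Return (target, data) pairs for every PI below node, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            # target is the name after "<?"; data is the rest, up to "?>".
            found.append((child.target, child.data.strip()))
        else:
            found.extend(collect_pis(child))
    return found


pis = collect_pis(parseString(SAMPLE))
for target, data in pis:
    print(target, "->", data)
```

Note that the XML prolog does not show up in the list: the parser treats it separately from processing instructions, even though the two look alike.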

If you’re interested in learning more about XML and its many uses, consider picking up Beginning XML, 3rd Edition (Wiley Publishing, Inc., ISBN 0-7645-7077-3).

An API for XML

After XML was defined as a language, the need arose for a way to both represent and manipulate XML code using common programming languages such as Java.

First came the Simple API for XML (SAX) project for Java. SAX provides an event-based API to parse XML. Essentially, SAX parsers start out at the beginning of the file and parse their way through the code in one straight pass, firing an event every time they encounter a start tag, end tag, attribute, text, or other XML syntax. It is up to the developer, then, to determine what to do when each of these events occurs.

SAX parsers are lightweight and fast because they just parse the text and continue on their way. Their main downside is the inability to stop, go backward, or access a specific part of the XML structure without starting from the beginning of the file.
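The event-based style described above can be sketched as follows. This example uses Python's standard xml.sax module rather than the original Java API, so the class and method names follow Python's binding; the EventLogger class, the log_events helper, and the sample document are hypothetical.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each parser event in the order it fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # In this binding, attributes arrive together with the start tag event.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Ignore whitespace-only text between tags.
        if content.strip():
            self.events.append(("text", content.strip()))


def log_events(xml_bytes):
    """Parse xml_bytes in a single forward pass and return the recorded events."""
    handler = EventLogger()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


# Hypothetical sample document.
SAMPLE = b'<?xml version="1.0"?><books><book id="1"><title>Example</title></book></books>'

events = log_events(SAMPLE)
for event in events:
    print(event)
```

The handler never sees the document as a whole. It only receives one event at a time, which is why reaching a particular element a second time means parsing again from the beginning.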


