Every XML document begins with the XML prolog, which is the first line in the previous code. This line
alone tells parsers and browsers that this file should be parsed based on the XML rules discussed
earlier. The second line is the document element, which is the outermost start tag in the file (an
element consists of a start tag, an end tag, and everything between them). All other tags must be
contained within this one in order to constitute a valid XML file. The second line of the XML file need
not always contain the document element; it can come later if comments or other content come first.
The third line in this sample file is a comment, which you may recognize as the same style comment
used in HTML. This is one of the syntax elements XML inherited from SGML.
A little bit farther down the page you find a tag with some special syntax inside it. The
<![CDATA[ ]]> code is used to indicate text that should not be parsed, allowing special characters such
as less-than and greater-than to be included without fear of breaking the XML syntax. The text must
appear between <![CDATA[ and ]]> to be properly shielded from parsing. This is called a Character Data
Section, or CDATA section for short.
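The shielding effect can be seen in a minimal sketch. It uses Python's standard xml.etree.ElementTree
module purely for illustration; the element names and the sample text are assumptions, not taken from
the sample file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <book> and <comments> names are assumptions.
xml_text = """<?xml version="1.0"?>
<book>
    <comments><![CDATA[Use <b> when x < y && y > z]]></comments>
</book>"""

root = ET.fromstring(xml_text)

# The parser hands back the CDATA content as plain text, markup characters included.
comment_text = root.find("comments").text
print(comment_text)

# The same kind of text without the CDATA wrapper is rejected by the parser.
try:
    ET.fromstring("<comments>x < y</comments>")
    unprotected_parses = True
except ET.ParseError:
    unprotected_parses = False
print(unprotected_parses)
```

Inside the section, the less-than, greater-than, and ampersand characters need no escaping; outside
it, the bare less-than sign makes the document unparseable.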
The following line is just before the second book definition:
<?page render multiple authors ?>
Even though this looks like the XML prolog, it is actually considered a different type of syntax called a
processing instruction. The purpose of processing instructions (or PIs for short) is to provide extra
information to programs that are processing the page, such as XML parsers. PIs are generally free form.
Their only requirement is that a letter must follow the first question mark. After that point, a PI can
contain any sequence of characters except the ?> sequence that closes it.
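As a rough illustration of how a program might read a PI, the sketch below uses Python's standard
xml.dom.minidom module. The surrounding books and book elements and the collect_pis helper are
hypothetical; only the PI itself comes from the sample above.

```python
from xml.dom import minidom

# Hypothetical wrapper document; only the <?page ... ?> PI is from the sample.
xml_text = (
    '<?xml version="1.0"?>'
    '<books><?page render multiple authors ?><book/></books>'
)
doc = minidom.parseString(xml_text)

def collect_pis(node):
    """Walk the tree and return (target, data) pairs for every PI found."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
            found.append((child.target, child.data.strip()))
        else:
            found.extend(collect_pis(child))
    return found

pis = collect_pis(doc)
print(pis)
```

The parser splits the PI into a target (the name after the question mark) and free-form data. The XML
prolog on the first line is not reported as a PI, even though it looks similar.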
The most common PI is used to specify a style sheet for an XML file:
<?xml-stylesheet type="text/css" href="MyStyles.css" ?>
This PI is typically placed immediately after the XML prolog and is used by Web browsers to display the
XML data using particular styles.
If you’re interested in learning more about XML and its many uses, consider picking up Beginning
XML, 3rd Edition (Wiley Publishing, Inc., ISBN 0-7645-7077-3).
An API for XML
After XML was defined as a language, the need arose for a way to both represent and manipulate XML
code using common programming languages such as Java.
First came the Simple API for XML (SAX) project for Java. SAX provides an event-based API to parse
XML. Essentially, SAX parsers start out at the beginning of the file and parse their way through the code
in one straight pass, firing an event every time they encounter a start tag, end tag, attribute, text, or
other XML syntax. It is up to the developer, then, to determine what to do when each of these events
occurs.
SAX parsers are lightweight and fast because they just parse the text and continue on their way. Their
main downside is the inability to stop, go backward, or access a specific part of the XML structure
without starting from the beginning of the file.
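The event-based model can be sketched with Python's standard xml.sax module, which follows the same
design as the Java SAX project but is not the Java API itself. The EventLogger class and the sample
document are hypothetical.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Records each event the parser fires, in the order it fires them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Attributes arrive together with the start tag.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only text to keep the log short.
        if content.strip():
            self.events.append(("text", content))

# Hypothetical sample document.
xml_text = b'<books><book isbn="123"><title>XML</title></book></books>'

handler = EventLogger()
xml.sax.parseString(xml_text, handler)

for event in handler.events:
    print(event)
```

The handler only ever sees the current event. Anything it needs later, such as the name of the
enclosing element, it has to store itself, which is the trade-off described above.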