XML, or the eXtensible Markup Language to give it its full title, is a system- and hardware-independent language for expressing data and its structure within an XML document. An XML document is a Unicode text file that contains the data together with markup that defines the structure of the data. Because an XML document is a text file, you can create XML using any plain text editor, although an editor designed for creating and editing XML will make things easier. The precise definition of XML is in the hands of the World Wide Web Consortium (W3C), and if you want to consult the XML 1.0 specification, you can find it at http://www.w3.org/XML.
The term 'markup' derives from a time when the paper draft of a document to be printed was marked up by hand to indicate to the typesetter how the printed form of the document should look. Indeed the ancestry of XML can be traced back to a system that was originally developed by IBM in the 1960s to automate and standardize markup for system reference manuals for IBM hardware and software products. XML markup looks similar to HTML in that it consists of tags and attributes added to the text in a file. However, the superficial appearance is where the similarity between XML and HTML ends. XML and HTML are profoundly different in purpose and capability.
Firstly, although an XML document can be created, read, and understood by a person, XML is primarily for communicating data from one computer to another. XML documents will therefore more typically be generated and processed by computer programs. An XML document defines the structure of the data it contains so that a program receiving it can interpret it properly. Thus XML is a tool for transferring information and its organization between computer programs. The purpose of HTML, on the other hand, is solely to describe how data should look when it is displayed or printed. The only structuring information that generally appears in an HTML document relates to the appearance of the data as a visible image. In short, HTML is for data presentation.
Secondly, HTML provides you with a set of tags that is essentially fixed and geared to the presentation of data. XML is a language in which you can define new sets of tags and attributes to suit different kinds of data, indeed to suit any kind of data including your particular data. Because XML is extensible, it is often described as a meta-language; in other words, a language for defining new languages. The first step in using XML to exchange data is to define, in XML, the language that you intend to use for that purpose.
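To make the idea of defining your own tags a little more concrete, here is a minimal sketch that parses a small document written in an invented markup. The element and attribute names (address, street, number, city) and the class name CustomMarkupSketch are assumptions made up purely for this illustration; only the JAXP and DOM calls are part of the standard Java library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CustomMarkupSketch {
    // Hypothetical markup invented for this sketch; XML itself does not
    // define any of these element or attribute names.
    static final String XML =
          "<?xml version=\"1.0\"?>"
        + "<address>"
        + "<street number=\"29\">South Lasalle Street</street>"
        + "<city>Chicago</city>"
        + "</address>";

    // Parse XML held in a String into a DOM Document using JAXP.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Element root = parse(XML).getDocumentElement();
        Element street = (Element) root.getElementsByTagName("street").item(0);
        Element city = (Element) root.getElementsByTagName("city").item(0);

        // The markup tells the program what each piece of data is.
        System.out.println("Root element: " + root.getTagName());
        System.out.println("Street: " + street.getAttribute("number")
                           + " " + street.getTextContent());
        System.out.println("City: " + city.getTextContent());
    }
}
```

The parser knows nothing about addresses. It simply recovers the structure that the tags describe, and it is up to the receiving program to know what an address element means.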
Of course, if I invent a set of XML markup to describe data of a particular kind, you will need to know the rules for creating XML documents of this type if you want to create, receive, or modify them. As we shall see, the definition of the markup that has been used within an XML document can be included as part of the document. It can also be provided as a separate entity, in a file identified by a URI for instance, that can be referenced within any document of that type. The use of XML has already been standardized for very diverse types of data. There are XML languages for describing the structures of chemical compounds and musical scores, as well as plain old text such as in this book.
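As a rough illustration of a markup definition carried inside the document itself, the sketch below embeds a small document type definition (DTD) in each document and asks a validating JAXP parser to check the content against it. The note, to, and body elements, and the class name InlineDtdSketch, are hypothetical names chosen only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class InlineDtdSketch {
    // Hypothetical rules: a note contains exactly one 'to' followed by one 'body'.
    static final String DTD =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (to, body)>\n"
        + "  <!ELEMENT to (#PCDATA)>\n"
        + "  <!ELEMENT body (#PCDATA)>\n"
        + "]>\n";

    static final String VALID = DTD + "<note><to>Reader</to><body>Hello</body></note>";
    static final String INVALID = DTD + "<note><body>Hello</body></note>"; // 'to' is missing

    // Returns true if the document obeys the DTD that it carries.
    static boolean isValid(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);              // request DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();

        final boolean[] valid = { true };
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { valid[0] = false; } // validity error
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return valid[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("First document valid:  " + isValid(VALID));
        System.out.println("Second document valid: " + isValid(INVALID));
    }
}
```

Validity problems are reported through the error handler rather than by stopping the parse, which is why the sketch records them in a flag. A document that is not even well-formed would cause the parse to fail with an exception.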
The Java API for XML Processing (JAXP) provides you with the means for reading, creating, and modifying XML documents from within your Java programs. In order to understand and use this API there are two basic topics you need to be reasonably familiar with:
What an XML document is for and what it consists of.
What a DTD is and how it relates to an XML document.
You also need to be aware of what an XML namespace is, if only because JAXP has methods for handling namespaces. You can find more information on JAXP at http://java.sun.com/xml/jaxp/.
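To give a first impression of the namespace-related methods, here is a small sketch that parses a document whose elements belong to a namespace. The namespace URI http://example.com/ns/invoice, the inv prefix, the element names, and the class name NamespaceSketch are all assumptions invented for the illustration. Note that a JAXP parser is not namespace aware unless you ask it to be.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceSketch {
    // Hypothetical document: the 'inv' prefix is bound to a made-up namespace URI.
    static final String XML =
          "<inv:invoice xmlns:inv=\"http://example.com/ns/invoice\">"
        + "<inv:total>42.50</inv:total>"
        + "</inv:invoice>";

    // Parse the document and return its root element.
    static Element parseRoot(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);   // off by default
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot(XML, true);
        System.out.println("Qualified name: " + root.getTagName());
        System.out.println("Local name:     " + root.getLocalName());
        System.out.println("Namespace URI:  " + root.getNamespaceURI());

        // Without namespace awareness the prefix is just part of the name.
        Element plain = parseRoot(XML, false);
        System.out.println("Namespace URI when not namespace aware: "
                           + plain.getNamespaceURI());
    }
}
```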
Just in case you are new to XML, we will briefly explore the basic characteristics of XML and DTDs before we start applying the classes and methods provided by JAXP to process XML documents. We will also briefly explore what XML namespaces are for. If you are already comfortable with these topics you can skip most of this chapter and pick up where we start talking about SAX. Let's start by looking into the general organization of an XML document.