DTDs for XML
SGML was used to define the Document Type Definition (DTD) for HTML, and it is still used to write
DTDs for XML. The problem with SGML is that it allows odd syntax, which makes creating parsers
for HTML difficult:
Some start tags specifically disallow end tags, such as the HTML <img>. Including an end tag
causes an error.
Some start tags have optional or implied end tags, such as the HTML <p>, which assumes a
closing tag when it meets another <p> or several other tags.
Some start tags require end tags, such as the HTML <script>.
Tags can be embedded in any order. For instance,
<b>This is a <i> sample </b>
is okay even though the end tags don’t occur in reverse order of the start tags.
Some attributes require values, such as
<img src="picture.jpg">
Some attributes don’t require values, such as the nowrap in <td nowrap>.
Attribute values can be defined with or without quotation marks surrounding them, so
<img src="picture.jpg"> and <img src=picture.jpg> are both allowed.
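The leniency described in the list above can be observed with a tolerant HTML parser. The following is a minimal sketch in Python, assuming the standard library's html.parser module; the TagLogger class and log_tags function are hypothetical names introduced here, and <td nowrap> is used purely as an illustrative valueless attribute.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every start and end tag the parser reports (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when omitted.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def log_tags(markup):
    """Feed markup to a fresh parser and return the tag events it saw."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# Misnested tags are reported without complaint; <i> is never closed.
print(log_tags("<b>This is a <i> sample </b>"))

# Quoted and unquoted attribute values produce the same result.
print(log_tags('<img src="picture.jpg">') == log_tags("<img src=picture.jpg>"))

# An attribute without a value is accepted and reported with value None.
print(log_tags("<td nowrap>"))
```

Note that the parser raises no error for any of these inputs. Deciding what each one means is left to the calling code, which is where much of the difficulty of handling such markup lies.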
All these issues make creating SGML language parsers a truly arduous task. The difficulty of knowing
when to apply the rules caused a stagnation in the definition of SGML languages. This is where XML
begins to fit in.
XML does away with all the optional syntax of SGML that caused so many developers heartache early
on. In XML, the following rules apply:
Every start tag must have a matching end tag.
An optional shorthand syntax represents both the start and end tags in one. This syntax uses a
forward slash (/) immediately before the greater-than symbol, such as <img src="picture.jpg" />.
An XML parser interprets this as being equal to <img src="picture.jpg"></img>.
Tags must be embedded in an appropriate order, so end tags must mirror start tags, such as
<b>this is a <i>sample</i> string</b>. It helps to think of start and end tags as similar
to open and close parentheses in math: You cannot close the outermost parenthesis without first
closing all the inner ones.
All attributes require values.
All attributes must use quotes around the values.
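The rules above can be checked against a real XML parser. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree module; the helper names is_well_formed and same_element are hypothetical, and the sample strings are illustrations chosen here to exercise each rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def same_element(first: str, second: str) -> bool:
    """Return True if two snippets parse to the same serialized element."""
    return ET.tostring(ET.fromstring(first)) == ET.tostring(ET.fromstring(second))


samples = [
    "<b>this is a <i>sample</i> string</b>",  # end tags mirror start tags
    '<img src="picture.jpg" />',              # shorthand for start plus end tag
    "<p>no end tag",                          # start tag without an end tag
    "<b>This is a <i> sample </b>",           # end tags out of order
    "<td nowrap>cell</td>",                   # attribute without a value
    "<img src=picture.jpg />",                # attribute value without quotes
]

for sample in samples:
    print(is_well_formed(sample), sample)

# The shorthand and the explicit start/end pair are equivalent to the parser.
print(same_element('<img src="picture.jpg" />', '<img src="picture.jpg"></img>'))
```

Only the first two samples are accepted. Each of the remaining four breaks exactly one of the rules listed above and is rejected outright rather than being interpreted on a best-guess basis.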
These rules make an XML parser much simpler to develop and also remove the guesswork of when and
where to apply odd syntax rules. Where SGML failed to gain mainstream acceptance, XML has made
tremendous inroads because of its simplicity. XML has spawned several languages in just the first six
years of its existence, including MathML, SVG, RDF, RSS, SOAP, XSLT, XSL-FO, and the reformulation
of HTML into XHTML.
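To illustrate why strict nesting simplifies parser development, the parentheses analogy can be turned into a few lines of code. The following Python sketch is an assumption-laden toy, not a real XML parser: the TAG pattern and the nesting_ok function are hypothetical, and the pattern ignores comments, CDATA sections, processing instructions, and attribute values that contain angle brackets.

```python
import re

# Matches <name ...>, </name>, and <name ... /> for simple tag names.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")


def nesting_ok(markup: str) -> bool:
    """Return True if every start tag is closed in mirror order."""
    stack = []
    for match in TAG.finditer(markup):
        closing, name, self_closing = match.groups()
        if closing:
            # An end tag must match the most recently opened start tag.
            if not stack or stack.pop() != name:
                return False
        elif not self_closing:
            # A plain start tag stays open until its end tag arrives.
            stack.append(name)
        # A shorthand tag such as <br /> opens and closes at once.
    return not stack


print(nesting_ok("<b>this is a <i>sample</i> string</b>"))  # True
print(nesting_ok("<b>This is a <i> sample </b>"))           # False
```

Because every start tag has exactly one possible end tag, a single stack is enough to track the open elements. No per-tag table of optional or forbidden end tags is needed.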
For a technical comparison of SGML and XML, please see the W3C’s note “Comparison of SGML and XML.”