Rules for a Well-Formed Document

Now that we know a bit more about XML elements and what goes into a DTD, we can formulate what you must do to ensure your XML document is well-formed. The rules for a document to be well-formed are quite simple:

  1. If the XML declaration appears in the prolog, it must include the XML version. Any other specifications in the XML declaration must follow the version in the prescribed sequence: character encoding first, then the standalone specification.

  2. If the document type declaration appears in the prolog, the DOCTYPE name must match the name of the root element, and the markup declarations in the DTD must follow the rules for writing markup declarations.

  3. The body of the document must contain at least one element, the root element, which contains all the other elements, and an instance of the root element must not appear in the content of another element. All elements must be properly nested.

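  4. Elements in the body of the document must be consistent with the markup declarations identified by the DOCTYPE declaration. Strictly speaking, the XML specification treats this requirement, like the DOCTYPE name matching the root element in rule 2, as a validity constraint rather than a well-formedness constraint, so only a validating parser checks it.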
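To see how a parser applies the rules above, here is a minimal sketch that uses the XML parser in Python's standard library. The sample documents and their element names (address, street) are invented for illustration. Note that this parser is non-validating, so it checks the XML declaration and the nesting of elements but does not check elements against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: bytes) -> bool:
    """Return True if the non-validating parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents; the element names are invented.
samples = {
    # Rule 1: version first, then encoding, then standalone.
    "complete declaration": (
        b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        b"<address><street>Main</street></address>"
    ),
    # Rule 1 broken: the declaration omits the version.
    "missing version": b'<?xml encoding="UTF-8"?><address/>',
    # Rule 1 broken: standalone appears before encoding.
    "wrong order": (
        b'<?xml version="1.0" standalone="yes" encoding="UTF-8"?><address/>'
    ),
    # Rule 3 broken: the body contains no root element.
    "no root element": b'<?xml version="1.0"?>',
    # Rule 3 broken: a second top-level element follows the root.
    "two roots": b"<address/><address/>",
    # Rule 3 broken: street is closed after its parent.
    "bad nesting": b"<address><street>Main</address></street>",
}

for label, document in samples.items():
    print(f"{label}: {is_well_formed(document)}")
```

Only the first sample should be reported as well-formed. Every other sample breaks a single rule, and that is enough for the parser to reject the whole document.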

The rules for writing an XML document are absolutely strict. Break one rule and your document is not well-formed, and an XML processor will refuse to process it. This strict application of the rules is essential because we are communicating data and its structure. If any laxity were permitted, it would open the door to uncertainty about how the data should be interpreted. HTML used to be quite different from XML in this respect. Until recently, the rules for writing HTML were only loosely applied by HTML readers such as web browsers.

For instance, even though a paragraph in HTML should be defined using a start tag, <p>, and an end tag, </p>, you can usually get away with omitting the end tag. You can also write the tag name as an uppercase or lowercase p, and even close an uppercase P paragraph with a lowercase p, or vice versa. You can often have overlapping tags in HTML and get away with that too. While it is not to be recommended, a loose application of the rules for HTML is not so harmful, since HTML is only concerned with data presentation. The worst that can happen is that the data does not display quite as you intended.
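The contrast can be sketched with two parsers from Python's standard library. The markup string below is an invented example of sloppy HTML, and TagCollector is a hypothetical helper class. Bear in mind that html.parser is a lenient tokenizer rather than a browser, so it only approximates how a browser tolerates such markup.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Hypothetical helper that records each start and end tag it sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# Invented sloppy markup: an unclosed uppercase P, then overlapping b and i.
sloppy = "<P>First paragraph<p>Second <b>bold <i>both</b> italic</i></p>"

# The HTML tokenizer reports the tags without complaint and
# normalizes their names to lowercase.
collector = TagCollector()
collector.feed(sloppy)
collector.close()
print(collector.events)

# The XML parser rejects the same text because the tags are not
# properly nested.
try:
    ET.fromstring(sloppy)
    xml_accepts = True
except ET.ParseError as error:
    xml_accepts = False
    print("XML parser rejected the markup:", error)
```

The HTML tokenizer simply hands over whatever tags it finds and leaves the recovery to its caller, whereas the XML parser stops at the first tag that is out of place.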

Recently, the W3C has released a number of specifications, under the name XHTML, that make HTML an XML language, and we can expect compliance within the next few years. The enduring problem is, of course, that the Internet holds many years' worth of material that is still very useful but will never be well-formed XML, so browsers may never be fully XML compliant.
