The W3C website, http://www.w3.org, has a huge number of standards in varying stages of creation. Not all of these standards concern us, and not all of the ones that concern us can be found at this website. However, the vast majority of standards that do concern us can be found there.
The HTML standard is maintained by W3C. This standard might seem fairly straightforward, given that each version should have introduced just a few new tags, but in reality the life of the standards body was vastly complicated by the browser wars. Versions 1.0 and 2.0 of HTML were simple, small documents, but when W3C came to debate HTML version 3.0, they found that much of the new functionality they were discussing had already been superseded by new additions to the version 3.0 browsers, such as the <applet> and <style> tags. Version 3.0 was discarded, and a new version 3.2 became the standard.
However, a lot of the features that went into HTML 3.2 had been introduced at the behest of the browser manufacturers and ran contrary to the spirit of HTML, which was intended solely to define structure. The new features, stemming right back to the <font> tag, just confused the issue and added unnecessary presentational features to HTML. These features really became redundant with the introduction of style sheets. So suddenly, in the version 3 browsers, there were three distinct ways to define the style of an item of text. Which was the correct way? And if all three ways were used, which style did the text ultimately assume? Version 4.0 of the HTML standard was left with the job of untangling this mess, and it marked many tags as deprecated (slated for removal in a later version of the standard). It was the largest version of the standard and included features that linked it to style sheets and the Document Object Model, and it also added accessibility facilities for visually impaired users and for other previously neglected groups. The current version of the HTML standard is 4.01.
Extensible Markup Language, or XML, is a standard for creating markup languages (such as HTML). XML itself has been designed to look as much like HTML as possible, but that's where the similarities end.
HTML is actually an application of the meta-language SGML, which is also a standard for generating markup languages. SGML has been used to create many markup languages, but HTML is the only one that enjoys universal familiarity and popularity. XML, on the other hand, is a direct subset of SGML. SGML is generally considered too complex to implement fully and accurately in software, so XML was designed as a simplified subset of it. XML is also much easier to read than SGML.
XML's main purpose is the creation of customized markup languages that are very similar in look and structure to HTML. One main use of XML is in the representation of data. Whereas a normal database can store information, databases don't allow individual stored items to contain information about their structure. XML can use the tag structure of markup languages to represent any kind of data, from mathematical and chemical notations to the entire works of Shakespeare, where information contained in the structure of the data might otherwise be lost. For instance, an XML document could be used to record that Mark Anthony doesn't appear until Act I, Scene II of Shakespeare's play Julius Caesar, while a relational database would struggle to do this without a lot of extra fields, as the following example shows:
<play>
  <act1>
    <scene1> ... </scene1>
    <scene2>
      <mark_anthony> Caesar, my lord? </mark_anthony>
    </scene2>
    <scene3> ... </scene3>
  </act1>
  <act2> ... </act2>
  <act3> ... </act3>
  <act4> ... </act4>
  <act5> ... </act5>
</play>
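To see how the position of a tag carries information, we can load a trimmed copy of the fragment above and ask where the <mark_anthony> element sits. This is only a minimal sketch in Python using the standard library's xml.etree.ElementTree module; the helper name find_first_appearance is hypothetical, and the "..." placeholders are kept as literal text.

```python
import xml.etree.ElementTree as ET

# The play fragment from the text, trimmed to two acts.
# "..." stands for content that the original example elides.
PLAY_XML = """
<play>
  <act1>
    <scene1> ... </scene1>
    <scene2>
      <mark_anthony> Caesar, my lord? </mark_anthony>
    </scene2>
    <scene3> ... </scene3>
  </act1>
  <act2> ... </act2>
</play>
"""

def find_first_appearance(root, speaker_tag):
    """Return (act_tag, scene_tag) for the first scene that contains
    an element named speaker_tag, or None if there is no such scene."""
    for act in root:          # children of <play> are the acts, in document order
        for scene in act:     # children of an act are its scenes
            if scene.find(speaker_tag) is not None:
                return act.tag, scene.tag
    return None

root = ET.fromstring(PLAY_XML)
print(find_first_appearance(root, "mark_anthony"))
```

Nothing in the data says "Act I, Scene II" explicitly. The answer comes purely from which elements enclose the speech, which is the structural information the text describes.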
XML is also completely cross-platform, because it contains just text. This means that an application on Windows can package up the data in this format, and a completely different application on Unix should be able to unravel it and read the data.
XML is more complex than HTML. Whereas a browser will take HTML code, interpret the relevant details, and display the corresponding web page without any intervention, interpreting XML requires several extra steps.
Because we're creating the markup language ourselves, we first need to create a set of rules that documents written in the language must follow. This can be done in one of two ways, either with an XML schema or with a Document Type Definition (DTD). Both of these are used to draw up rules, such as which tags we can use in our markup language, which attributes these tags take, and what kind of data these attributes are expecting.
Secondly, once we've written our XML document in our new language, it must be checked against both the syntax rules laid down for XML documents and the rules in the schema or the DTD to see if the code conforms. We'll be taking an in-depth look at XML in the next chapter.
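The two checks just described can be sketched roughly as follows. This is an illustration only: the parser in Python's xml.etree.ElementTree checks XML syntax but does not validate against a DTD or schema, so the ALLOWED dictionary below is a hypothetical stand-in for those rules, and the tag and attribute names (play, act, scene, line, number, speaker) are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set standing in for a DTD or schema:
# each allowed tag maps to the set of attributes it may carry.
ALLOWED = {
    "play": set(),
    "act": {"number"},
    "scene": {"number"},
    "line": {"speaker"},
}

def check_document(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    # Step 1: the syntax rules of XML itself. The parser rejects bad
    # nesting, unclosed tags, unquoted attribute values, and so on.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    # Step 2: the rules of our own language, checked against ALLOWED.
    problems = []
    for element in root.iter():
        if element.tag not in ALLOWED:
            problems.append(f"unknown tag <{element.tag}>")
            continue
        for name in element.attrib:
            if name not in ALLOWED[element.tag]:
                problems.append(
                    f"<{element.tag}> does not take attribute '{name}'"
                )
    return problems

good = ('<play><act number="1"><scene number="2">'
        '<line speaker="Antony">Caesar, my lord?</line>'
        '</scene></act></play>')
bad_syntax = '<play><act number="1"></play>'                # <act> is never closed
bad_rules = '<play><act number="1"><song/></act></play>'    # <song> is not in ALLOWED

for text in (good, bad_syntax, bad_rules):
    print(check_document(text))
```

A real DTD or schema can also constrain how elements nest and what their content looks like; this sketch checks only tag and attribute names.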
XHTML 1.0 is where the XML and HTML standards meet. XHTML is just a respecification of the HTML 4.01 standard as an XML application. This lets XHTML get around some of the problems caused by a browser's particular interpretation of HTML and, more importantly, provides a specification that allows the Web to be used by clients other than browsers, such as those provided on handheld computers, mobile phones, or any software device that might be connected to the Internet (perhaps even your refrigerator).
It also offers a common method for specifying our own tags, instead of just adding them randomly. We can specify new tags via a common method using an XML DTD and an XML namespace. (This is a way of identifying one set of tags uniquely from any other set of tags.) This is particularly useful for the new markup languages, such as Wireless Markup Language (WML), which are geared toward mobile technology and require a different set of tags to be able to display on the reduced interfaces.
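As a small illustration of how a namespace keeps one set of tags distinct from another, the sketch below parses a document in which two different elements are both called title. The XHTML namespace URI is the real one; the book namespace URI (http://example.com/ns/book) and the book prefix are made up for this example.

```python
import xml.etree.ElementTree as ET

# The default namespace is XHTML; the "book" prefix maps to a
# hypothetical namespace invented for this illustration.
DOC = """
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:book="http://example.com/ns/book">
  <head><title>Page title</title></head>
  <body><book:title>Book title</book:title></body>
</html>
"""

root = ET.fromstring(DOC)

# ElementTree reports each tag as "{namespace-uri}local-name", so the
# two title elements stay distinguishable even though they share a name.
titles = [el.tag for el in root.iter() if el.tag.endswith("}title")]
for tag in titles:
    print(tag)
```

The parser treats the two title elements as entirely different tags because their namespace URIs differ, which is what allows a new tag set to be mixed in without clashing with the XHTML one.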
Having said that, anyone familiar with HTML should be able to look at an XHTML page and understand what's going on. There are differences, but not ones that add new tags or attributes.
The following list points out the main differences between XHTML and HTML:
XHTML documents should begin with an XML declaration at the top of the file: <?xml version='1.0'?>. (The XHTML 1.0 specification strongly encourages the declaration rather than strictly requiring it in every case.)
We also have to provide a DOCTYPE declaration at the top of the file referencing the XHTML DTD we are using.
We have to include a reference to the XHTML namespace within the <html> tag.
We need to supply all XHTML tag names in lowercase, because XML is case-sensitive.
The <head> and <body> elements must always be included in an XHTML document.
Tags must always be closed and nested correctly. When an element consists of a single tag with no content, such as a line break, the tag is closed with a /, for example <br/>.
Attribute values must always be enclosed in quotation marks.
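Putting the points in this list together, a minimal XHTML 1.0 document might look like the one embedded in the sketch below. The page content is invented for illustration, and the Strict DTD is just one possible choice. The Python code around it uses the standard library's xml.etree.ElementTree parser to confirm that the document is well-formed XML and that a few HTML-style shortcuts are not. It does not validate the page against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

# A minimal XHTML 1.0 Strict page. The string must start with the XML
# declaration; nothing, not even whitespace, may come before it.
XHTML_DOC = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example page</title>
  </head>
  <body>
    <p class="intro">First line<br/>Second line</p>
  </body>
</html>
"""

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(XHTML_DOC)
print(root.tag)  # the tag name is reported together with its namespace

# Habits that HTML browsers tolerate but an XML parser rejects:
print(is_well_formed("<p>First line<br>Second line</p>"))  # <br> never closed
print(is_well_formed("<p class=intro>text</p>"))           # unquoted attribute value
print(is_well_formed("<P>text</p>"))                       # case mismatch between tags
```

Each of the three fragments at the end breaks one of the rules in the list, so the parser refuses all of them, whereas the full document parses cleanly.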
It's now time for us to consider the Document Object Model itself.