HTML: The Evolution of a Standard

HTML has come a long way since Tim Berners-Lee first proposed it. In many ways, the development of HTML resembles the layers of an onion: each further development of the HTML standard largely preserves the earlier standards. Part of the reason for this is the need for backward compatibility, which allows documents produced using earlier versions of HTML to remain fully readable and accessible in later browser versions.

Early Versions of HTML

The HTML standard evolved significantly while the Internet and the Web were still primarily academic concerns, before the commercial expansion of the Internet (which the Web and HTML were largely instrumental in spurring). These developments included the release of the first version of HTML (often loosely called Version 1.0); HTML+, a proposed standard that was never adopted but became the basis for future standardization; and HTML 2.0, the first fully developed standard for HTML.

Note

In the following sections, features and capabilities that have been incorporated into HTML are listed. Don't worry if you don't understand everything that is mentioned. The main intention is to give you an idea of how HTML has evolved over time and not to provide a definitive description of what each version of HTML comprised. As you get into this book's tutorial sessions, starting Saturday morning, you will be working with hands-on examples of many of these HTML features.

The First HTML

The initial version of HTML was exceedingly simple and lacked many essential features that were added later. Still, it was a fully functional hypertext system with the following main features:

An HTML element to designate the content of a document as HTML. (This was not included in Berners-Lee's original HTML proposal, but was included in the more formalized specification and DTD that was later produced by Dan Connolly.)
The capability to specify a title, a hierarchy of headings, and paragraph elements within an HTML document.
The inclusion of hypertext anchors within an HTML document, either defining a jump to another document or object on the Web, or defining both ends of a jump from one location to another location within the same or two different documents.
The specification of bulleted lists and glossary lists.
Insertion of non-keyboard character entities.
The denotation of an included section of text as an address block, in which authorship and contact information can be provided.
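
As a rough illustration of the features listed above, here is a minimal sketch of a document that uses only that early feature set. The file name, anchor name, and text are made up for the example, and the markup is written from the feature list rather than copied from the original specification, so treat it as an approximation. The comments are annotations for the reader only and were not part of that early feature set.

```html
<HTML>
<TITLE>A Sample Early Web Page</TITLE>

<!-- A named anchor marks the destination end of a jump -->
<H1><A NAME="top">Welcome</A></H1>

<!-- P is used here as a separator between paragraphs, not as a container -->
This text links to <A HREF="other.html">another document</A>.
<P>
A character entity inserts a non-keyboard character: caf&eacute;.

<H2>A Bulleted List</H2>
<UL>
<LI>First item
<LI>Second item
</UL>

<H2>A Glossary List</H2>
<DL>
<DT>Hypertext
<DD>Text that contains links to other text.
</DL>

<!-- This anchor jumps to the named anchor in the same document -->
<A HREF="#top">Return to the top</A>
<P>
<ADDRESS>Jane Doe, jane@example.com</ADDRESS>
</HTML>
```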

FIND IT ONLINE
The first version of HTML is often loosely referred to as HTML 1.0, although no official version number was used at the time. To read the initial definition for HTML, go to www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/MarkUp.html.

HTML+

HTML+ was an early, but failed, proposal to extend the capabilities of HTML. It provided for many features that ended up being incorporated in later versions of HTML, including HEAD and BODY elements, numbered lists, character codes and names, inline and floating images, line breaks, literal and logical text highlighting, typewriter and keyboard text, preformatted text, address blocks, acronyms and abbreviations, block quotes, strikeout, superscripts and subscripts, tables, and user-input forms. In addition, HTML+ proposed use of the P (Paragraph) element as a container, rather than as a separator, a feature later incorporated into HTML 2.0. HTML+ also included a proposal for the formatting of mathematical equations, a forerunner of the current MathML standard.

Other features proposed for HTML+, but never implemented in any later standard, include elements or attributes for presenting abstracts, bylines, figures, plain lists, notes and footnotes, margin notes, literal text (similar to preformatted text, preserving spaces and returns, but displayed in a proportional instead of a monospaced font), tabs, and indexing.

You might notice something very telling about the HTML+ proposal: many of its proposed features are focused on the presentation of academic papers and documents. The Web was largely an academic initiative in its beginnings, but with the expansion of access to the Web beyond academia and the introduction of the first commercial browsers, events in the marketplace rapidly overtook what were, by comparison, painfully slow movements toward standardizing the expansion of the initial version of HTML.

HTML 2.0

In 1995, Tim Berners-Lee and Dan Connolly proposed the HTML 2.0 standard. Part of the rationale behind the proposal was to standardize "the capabilities of HTML in common use prior to June 1994." HTML 2.0 incorporated much, although not all, of what had earlier been proposed in the failed HTML+ initiative, including elements to hierarchically structure online documents (the HTML, HEAD, and BODY elements), use of the P element as a container (but with an implied end tag), literal and logical highlighting (for displaying italics, bolding, and emphasis), typewriter and keyboard text (monospacing), numbered lists (HTML 1.0 included only bulleted and glossary lists), inline images, comments, preformatted text, block quotes, character entity names, address blocks, and input forms. HTML 2.0 also added support for displaying horizontal rules, which had not been included in the HTML+ proposal. Otherwise, very little was included in HTML 2.0 that had not previously been proposed as part of HTML+ (although quite a lot of HTML+ was not included in HTML 2.0). The HTML 2.0 standard was, in fact, an attempt to rope in and standardize just those features of HTML+ that had been commonly implemented in browsers up until that time. One of the reasons the HTML+ proposal failed was that browser vendors (primarily Netscape) never implemented many of its features.
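
The change in how the P element is used can be sketched as follows. The first fragment treats P as a separator between paragraphs; the second treats it as a container, with the end tag written out in some cases and left implied in another. The paragraph text is invented for the example.

```html
<!-- P as a separator: the tag sits between blocks of text -->
First paragraph of text.
<P>
Second paragraph of text.

<!-- P as a container: each paragraph begins with its own start tag -->
<P>First paragraph of text.</P>
<P>Second paragraph of text, with the end tag implied.
<P>Third paragraph of text.</P>
```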

Ad Hoc HTML: Netscape and Microsoft Extensions

The ad hoc extension of the capabilities of HTML by browser vendors actually began very early. The display of inline images, using the IMG element, for instance, was initially introduced in the Mosaic browser by Marc Andreessen, before it was proposed for HTML+ or included in HTML 2.0. Andreessen later left NCSA (the National Center for Supercomputing Applications), where the Mosaic browser had been created, to help form Netscape, which later introduced its Navigator browser, the first widely available commercial Web browser.

Netscape Extensions to HTML

Over time, Netscape introduced a significant number of unofficial extensions to HTML in its Navigator browser. Since virtually everybody surfing the Web at the time was using Navigator, Netscape's version of HTML became what amounted to an ad hoc standard. Some of Netscape's extensions were adoptions of features earlier proposed as part of HTML+ but not included in HTML 2.0, including superscripts and subscripts, strikethrough (the STRIKE element), floating images, and tables. Netscape, however, also struck out on its own and introduced a number of features, primarily focused on the visual presentation of Web pages, that had not been anticipated in any prior official HTML proposal or standard, including font colors, font sizes, base font sizes, blinking text, document text and link colors, background colors and images, and frames.

Many of these latter extensions to HTML by Netscape have been criticized for violating one of the original tenets of HTML, which was to separate structure from presentation (with HTML defining the structural elements of a document and browsers then being responsible for determining the presentation of those elements). The FONT and BASEFONT elements have been particularly criticized for introducing new elements that are purely focused on visual presentation and have no place in the hierarchical tree of document objects.

Berners-Lee's initial version of HTML was proposed as a subset of SGML (Standard Generalized Markup Language), but subsequent vendor extensions, such as the FONT and BASEFONT elements, have been criticized for violating both the letter and spirit of the SGML standard. The development of XHTML is an attempt to reformulate HTML as an XML-compliant, and therefore SGML-compliant, markup language.

In defense of Netscape, however, it should be stressed that they were simply responding to a very strong market demand for the ability to create visually richer Web page designs, spurred by the rapid popularization and commercialization of the Web. The commercial evolution of the Web simply outran the standardization process.
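
To make the structure-versus-presentation criticism concrete, compare the two fragments below. Both might look broadly similar in a browser, but only the second tells the browser (or any other program) that the text is a heading. The attribute values and the text are arbitrary choices for illustration.

```html
<!-- Presentational: says how the text should look, not what it is -->
<FONT SIZE="5" COLOR="#CC0000"><B>Spring Sale</B></FONT>
<BR>

<!-- Structural: says the text is a second-level heading and
     leaves the appearance to the browser -->
<H2>Spring Sale</H2>
```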

Microsoft Extensions to HTML

When Microsoft introduced its Internet Explorer browser, it also introduced a number of extensions to HTML, including the ability to change the font face, display scrolling marquees, and fix a background image relative to the browser window (rather than scrolling with the page).

Initiatives for Standardization

Within a very few years, the world of the Web had been transformed from a largely academic affair focused on the trading of ideas into an increasingly commercialized medium much more focused on the selling and promotion of products and services. The standard then current, HTML 2.0, reflected HTML's past rather than its future and had proven inadequate for meeting the challenges posed by an increasingly commercialized Web. Without a renewed emphasis on standardization, however, the Web could not progress further, other than as the province of a single vendor.

A Failed Initiative: HTML 3.0

The HTML 3.0 proposal was an attempt to extend the capabilities of HTML beyond the limitations of the HTML 2.0 standard. Many features that had previously been proposed as part of the failed HTML+ proposal were also included in HTML 3.0, such as tables, big and small fonts, superscripts and subscripts, underlining, and more. As happened with HTML+, however, HTML 3.0 proved to be overly ambitious and was never fully implemented by browser vendors (Netscape and Microsoft, principally). Ultimately, the World Wide Web Consortium (W3C) abandoned HTML 3.0 in favor of a much more modest proposal, HTML 3.2. Many tags and attributes that were proposed as part of HTML 3.0 were later incorporated into the HTML 3.2 and HTML 4.0 standards.

Note

FIND IT ONLINE
The World Wide Web Consortium (also known simply as the W3C) is the organization responsible for developing protocols and standards (including HTML) for the Web. It was founded in October 1994 by Tim Berners-Lee, the inventor of the Web, in collaboration with CERN, where the Web originated. Later, the W3C moved from CERN (near Geneva, Switzerland) to MIT (in Cambridge, Massachusetts), where Berners-Lee is currently a faculty member. You can find out more about the W3C's activities and mission at www.w3.org/.

HTML 3.2

HTML 3.2 was released in January 1997. A large part of the HTML 3.2 specification is a rubber-stamping of what originally were Netscape's unofficial and ad hoc extensions to HTML. The rest of the HTML 3.2 specification covers features of the previously proposed specification, HTML 3.0, which had already gained wide acceptance and implementation (tables, for instance) in Web browsers. HTML 3.2 offered little that hadn't already been widely implemented in browsers. Here are some of the primary features included in the HTML 3.2 standard:

Creation of tables.
Insertion of Java applets.
Use of background images, as well as definition of background, text, and link colors.
Specification of font sizes and colors.
Flowing of text around images.
Controlling (and turning off) of image borders.
Specification of the height and width of images (so browsers can allocate space for them and not hold up display of other elements).
Horizontal alignment (left, center, or right) of paragraphs, headings, and horizontal rules.
Insertion of superscripts, subscripts, and strikethroughs.
Specification of document divisions.
Inclusion of client-side image maps.
Provisions for style sheets (using the STYLE tag), left otherwise undefined.

HTML 3.2 should be fully supported by all current graphical Web browsers.

HTML 4.0

HTML 4.0 was released in December 1997. Like HTML 3.2, HTML 4.0 is a mix of the old and the new. Included in it are elements that were previously either Netscape or Microsoft extensions (frames and font-face changes), as well as a number of entirely new elements and capabilities. Here are some of HTML 4.0's primary features:

Frames, including inline frames.
Cascading style sheets.
New form elements, including the BUTTON element, which provides for the creation of graphical form buttons.
New table elements, including the capability to apply formatting to column and row groups.
New text-markup elements, including elements for making insertions and deletions, striking out text, adding quotations, and adding formatting to a "span" of text.
The capability to specify font faces for the display of text (formerly a Microsoft extension).
The capability to attach styles and actions to specific elements or a class of elements so that passing the mouse over or clicking on an element, for example, triggers an action executed by a script.
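
The last few features in this list can be sketched together in one short HTML 4.0-style document. The class name, colors, font, and script are hypothetical choices for illustration, and how a given browser renders the styles may differ.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Styles and Actions</TITLE>
<STYLE TYPE="text/css">
  /* Applies to every element with CLASS="note" */
  .note { color: navy; font-family: Arial, sans-serif; }
</STYLE>
</HEAD>
<BODY>
<P CLASS="note">This paragraph takes its color and font face from the style sheet.</P>
<P>Only <SPAN CLASS="note">this span of text</SPAN> is styled here.</P>
<!-- Clicking the paragraph triggers an action executed by a script -->
<P ONCLICK="alert('You clicked the paragraph.')">Click this paragraph.</P>
</BODY>
</HTML>
```

Keeping the font and color in the style sheet rather than in FONT elements means the same rule can be reused by any element that carries the class.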

Although current browsers support much of what is included in the HTML 4.0 standard, a number of features have yet to be supported. Some features, such as frames, had already been widely implemented in browsers before the standard was released, but others, such as the Q tag for formatting inline quotations, have yet to be supported by any browser.

Also, support for style sheets, the most important feature included in the HTML 4.0 standard, is still inconsistent and incomplete in current browsers. The result is that the same style sheet can have radically different results depending on whether it's displayed in Internet Explorer or Navigator, for instance. Style sheets can be used effectively to design Web pages for display in today's Web browsers, but they must be used wisely and with circumspection. (For further information on using style sheets, see Appendix E.)

Of course, newer browser versions will undoubtedly more fully support the use of style sheets, as well as other yet to be implemented, or yet to be fully implemented, features included in HTML 4.0.

HTML 4.01

The HTML 4.01 recommendation was released in December 1999. It primarily fixes some bugs and clarifies obscurities in the HTML 4.0 specification.

The W3C currently has no plans to continue the development of the HTML standard beyond version 4.01. This might actually be a good thing, in that it should help to stabilize the current feature set (elements and attributes), thus providing browsers with a single stable version of HTML that they can universally support. In the meantime, developments in the area of cascading style sheets, scripting technologies, and dynamic HTML (built on the Document Object Model, or DOM) should continue to extend the capabilities of HTML for many years to come.

Note

FIND IT ONLINE
After you have some practical experience working with HTML this weekend, you might want to check out the actual specifications for HTML at the W3C's HTML Web page at www.w3.org/MarkUp/. You can find links there to the HTML 4.01, 4.0, 3.2, and 2.0 specifications. Checking out the earlier specifications for HTML can give you an in-depth understanding of why and how HTML has become what it is today.

The W3C's future development efforts in the area of markup languages will be primarily focused on the XHTML and XML standards. It should be noted, however, that XHTML 1.0 depends entirely on HTML 4.01 for element and attribute definitions. The differences between the two versions of HTML are primarily syntactical. XHTML does not replace HTML 4.01, but is a parallel standard that relies upon HTML 4.01 for most of its substantial content. HTML 4.01 will remain as the substantial core of XHTML from this point on and will remain permanently as a current standard in its own right. In other words, in order to learn XHTML, you need to first learn HTML 4.01; Web authors, however, are under no constraint to migrate from HTML 4.01 to XHTML, but are free to do so if they can derive additional benefits by doing so. Once you have mastered HTML 4.01, learning how to create Web pages that are compliant with XHTML 1.0 is a fairly trivial matter. See Appendix A, "HTML/XHTML Reference," for guidelines on creating XHTML documents that are backward-compatible with HTML 4.01.

XHTML and XML

The original versions of HTML were intended as SGML-conforming languages, but proprietary browser extensions caused the development of HTML to diverge from its original SGML roots.

XML (Extensible Markup Language) was introduced as a kind of umbrella markup language under which all other markup languages for display over the Web could be grouped. XML is itself designed as a subset of SGML. Along with XHTML, other markup languages that conform to XML include MathML (Mathematical Markup Language) and SMIL (Synchronized Multimedia Integration Language). XML enables the creation of many additional, specialized markup languages. Academic groups, for instance, should be able to create and publish their own markup languages under XML for displaying academic and scientific papers and articles, including footnotes, citations, bibliographies, figure captions, and so on. In conjunction with XML, the W3C has also developed XSL (Extensible Stylesheet Language) for applying styles to XML-conforming documents.

XHTML (Extensible HyperText Markup Language) was introduced in an attempt to redefine HTML in conformance with XML, and thus also with SGML. XHTML 1.0 relies upon HTML 4.01 for the meanings of elements and attributes—the differences between XHTML 1.0 and HTML 4.01 are largely syntactical in nature. Both XHTML 1.0 and HTML 4.01 are current W3C standards. Web authors are free to choose which standard they want to use in coding documents for display on the Web.
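
The largely syntactical nature of the differences can be seen by writing the same fragment both ways. This is a sketch of the most visible differences only (lowercase names, quoted attribute values, explicit end tags, and closed empty elements), not a complete list. The form field and file names are made up for the example.

```html
<!-- HTML 4.01: uppercase names, omitted end tags, and unquoted or
     minimized attributes are all allowed -->
<P>Choose a size:<BR>
<SELECT NAME=size>
<OPTION SELECTED>Small
<OPTION>Large
</SELECT>
<P><IMG SRC="logo.gif" ALT="Logo">

<!-- XHTML 1.0: lowercase names, every element closed, and every
     attribute value quoted and written out in full -->
<p>Choose a size:<br />
<select name="size">
<option selected="selected">Small</option>
<option>Large</option>
</select></p>
<p><img src="logo.gif" alt="Logo" /></p>
```

The elements and attributes are the same in both versions; only the way they are written changes.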

The W3C has also announced that HTML 4.01 will be the last version of HTML—there will be no HTML 4.1 or 5.0. That is actually a good thing, in that it provides a stable standard that all Web authors can write to (in the form of HTML 4.01 or XHTML 1.0, which includes HTML 4.01's elements and attributes). It also means that elements and attributes that are deprecated in HTML 4.01 will never be declared obsolete in a future version of HTML 4, since there will be no future version of HTML 4. There need be no fear that valid HTML 4.01 Web pages will ever be rendered obsolete on the Web.

XHTML is currently composed of three modules: XHTML 1.0 (the original version of XHTML), XHTML 1.1, and XHTML Basic. XHTML 1.1 is a redefinition of XHTML 1.0 within the W3C's current focus on modularity. Unlike XHTML 1.0, XHTML 1.1 supports only strict conformance to the XHTML 1.1 definition and does not allow the use of any deprecated elements or attributes. XHTML 1.0 is redefined as a separate module designed to assure backward compatibility with HTML 4.01, whereas XHTML 1.1 is designed to ensure forward compatibility and interoperability with other XML-based modules that may be developed in the future.

XHTML Basic defines a reduced set of features to be used in coding Web pages for presentation by alternative Web clients, such as cell phones, PDAs, WebTV, or other devices with reduced display or input capabilities.

HTML 4.01 and XHTML 1.0 are like two parts of the same book. In order to understand and effectively use XHTML 1.0, you have to first understand HTML 4.01, since XHTML 1.0 relies upon HTML 4.01 for all element and attribute definitions. Many Web authors may never need to migrate from HTML 4.01 to XHTML 1.0 (or other future versions of XHTML). Even if you choose to migrate to XHTML in the future, you still need to learn how to use HTML 4.01 first, since most of XHTML 1.0 is still simply HTML 4.01. You need have no fear that HTML 4.01 will be outmoded in the future, since HTML 4.01 will continue to live on inside of XHTML. There is also no danger that future browsers will ever drop support for displaying HTML 4.01 documents, since that would mean dropping support for millions of Web pages—the vast majority of Web pages are still being coded in HTML 4.01, and not in XHTML 1.0. Migrating from HTML 4.01 to XHTML 1.0 is an option, in other words, not a requirement.

For those who choose to migrate, converting a valid HTML 4.01 document to XHTML 1.0 is not difficult. The W3C has developed a utility, HTML Tidy, which converts HTML 4.01 documents to XHTML 1.0 documents, and also cleans up markup errors and generally "prettifies" the document markup to facilitate easier maintenance and updating in the future.

In Appendix A, you will find a section, "XHTML Compatibility Guidelines," which details the differences between XHTML 1.0 and HTML 4.01 and provides guidance on how to code XHTML 1.0 documents that are backward-compatible with HTML 4.01.

Other HTML Developments

There's much more going on in the HTML area than just XML and XHTML. Here are some of the current initiatives afoot to expand and extend HTML:

Cascading Style Sheets, level 1 (CSS1), level 2 (CSS2), and level 3 (CSS3). Style sheets conforming to CSS2 enable you to specify fonts on the Web that can be downloaded with a Web page, create rectangular regions containing other elements that can overlap and be positioned anywhere on a Web page, and define multiple style sheets for a single Web page that can be used by different media types (such as speech synthesizers, Braille printers, handheld devices, and so on). CSS3 is still in the draft phase, but will further extend the capabilities of CSS when released. CSS3 is being organized into separate modules (including Web fonts, paged media, user interface, tables, math, SMIL, SVG, and other modules) to facilitate the extension of CSS' capabilities in specific areas. To find out more about CSS1, CSS2, and CSS3, see www.w3.org/Style/CSS/.
The Document Object Model (DOM). Development and agreement in the area of the DOM are keystones for the full implementation and development of dynamic HTML, allowing any object in a Web page to be addressed dynamically through scripts or programs. The W3C DOM standard is supported by all current browsers. Some earlier browsers, such as Internet Explorer 4 and Netscape Navigator 4, support only the proprietary DOMs created by Microsoft and Netscape. See Sunday Evening, "Creating Page Layouts," for examples of incorporating dynamic effects into Web page designs that are compatible with all current browsers. To find out more about the W3C's DOM standard, see www.w3.org/DOM/.
Mathematical Markup Language (MathML). Provides complex formatting capabilities for equations and formulas. To find out more about MathML, see www.w3.org/Math/.
Synchronized Multimedia Integration Language (SMIL, pronounced "smile"). Provides non-programmers with the ability to create their own multimedia presentations on the Web. To find out more about SMIL, see www.w3.org/AudioVideo/.
Scalable Vector Graphics (SVG). Provides for the inclusion of vector-based scalable graphics in Web pages. SVG graphics are currently supported by Adobe Illustrator 9 and Jasc Software's Trajectory Pro, with others sure to follow. To find out more about the SVG graphics format, see www.w3.org/Graphics/SVG/.
Mobile Access. A number of proposed specifications are in the works to facilitate the display of Web content on mobile and wireless devices, such as cellular phones and personal digital assistants (PDAs). These include the Wireless Application Protocol (WAP) and various "navigation" markup languages that propose additional HTML tags or attributes to facilitate browsing through mobile communication devices. To find out more about the W3C's initiative for facilitating mobile access on the Web, see www.w3.org/Mobile/.
Voice Browsers. The W3C is currently working on proposals for standardizing interaction with Web sites through spoken commands. To find out more about the W3C's Voice Browser activity, see www.w3.org/Voice/.
Platform for Internet Content Selection (PICS). Allows metadata labels to be associated with Internet content to help parents and teachers filter what children can access on the Web based on a site's PICS rating (or lack of a rating). To find more information on PICS, see www.w3.org/PICS/.
Web Accessibility Initiative (WAI). Includes the W3C's Web Content Accessibility Guidelines, which companies and organizations can subscribe to and require in the creation of Web pages and sites to ensure universal accessibility. Compliance with the Americans with Disabilities Act, which guarantees accessibility to anyone regardless of disability, is required for Web sites created for federal, state, and local governmental departments and agencies, as well as for any non-profit or for-profit organizations that are recipients of federal funding. To find out more about the Web Accessibility Initiative, see www.w3.org/WAI/.
The Semantic Web. An initiative, which includes the Web Ontology Language (OWL) and the Resource Description Framework (RDF), to enable the weaving of semantic meaning into the interrelations between resources, objects, and services on the Web. To find out more about the Semantic Web, OWL, and RDF, including a link to an article published in Scientific American on the topic, see www.w3.org/2001/sw/.