Validator and Debugger!
The World Wide Web (Web) is a network of information resources. The Web relies on three mechanisms to make these resources readily available to the widest possible audience:
The ties between the three mechanisms are apparent throughout this specification.
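Every resource available on the Web -- HTML document, image, video clip, program, etc. -- has an address that may be encoded by a Universal Resource Identifier, or "URI".
URIs typically consist of three pieces: the naming scheme of the mechanism used to access the resource, the name of the machine hosting the resource, and the name of the resource itself, given as a path.
Consider the URI that designates the W3C Technical Reports page:
http://www.w3.org/TR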
This URI may be read as follows: There is a document available via the HTTP protocol, residing on the machine www.w3.org, accessible via the path "/TR". Other schemes you may see in HTML documents include "mailto" for email and "ftp" for FTP.
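The reading above can be checked mechanically. The following is a minimal sketch using Python's standard `urllib.parse` module (the choice of Python and the variable names are assumptions made for illustration, not part of the specification):

```python
from urllib.parse import urlsplit

# Split the W3C Technical Reports URI into the pieces described above.
parts = urlsplit("http://www.w3.org/TR")
scheme = parts.scheme    # naming scheme: "http"
machine = parts.hostname # machine hosting the resource: "www.w3.org"
path = parts.path        # path to the resource: "/TR"

# A "mailto" URI has a scheme and a path but no machine name.
mailbox = urlsplit("mailto:firstname.lastname@example.org")

print(scheme, machine, path)
print(mailbox.scheme, mailbox.path)
```

Note that the "mailto" URI carries no host component, so only its scheme and path are populated.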
Here is another example of a URI. This one refers to a user's mailbox:
...this is text...
For all comments, please send email to <A href="mailto:firstname.lastname@example.org">Jean</A>.
Note. Most readers may be familiar with the term "URL" and not the term "URI". URLs form a subset of the more general URI naming scheme.
A relative URI doesn't contain any naming scheme information. Its path generally refers to a resource on the same machine as the current document. Relative URIs may contain relative path components (e.g., ".." means one level up in the hierarchy defined by the path), and may contain fragment identifiers.
Relative URIs are resolved to full URIs using a base URI. As an example of relative URI resolution, assume we have the base URI "http://www.acme.com/support/intro.html". The relative URI in the following markup for a hypertext link:
<A href="suppliers.html">Suppliers</A>
would expand to the full URI "http://www.acme.com/support/suppliers.html", while the relative URI in the following markup for an image:
<IMG src="../icons/logon.gif" alt="logon">
would expand to the full URI "http://www.acme.com/icons/logon.gif".
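The two resolutions above can be reproduced with a short sketch. It assumes Python's standard `urllib.parse.urljoin`; the helper name `resolve` and the constant `BASE_URI` are hypothetical and exist only for this illustration:

```python
from urllib.parse import urljoin

# Base URI taken from the example in the text.
BASE_URI = "http://www.acme.com/support/intro.html"

def resolve(relative_uri, base_uri=BASE_URI):
    """Resolve a relative URI against a base URI."""
    return urljoin(base_uri, relative_uri)

# A sibling document: replaces the last path segment of the base.
print(resolve("suppliers.html"))
# ".." climbs one level above /support/ before descending into /icons/.
print(resolve("../icons/logon.gif"))
```

Resolution drops the final segment of the base path ("intro.html") before the relative path is applied, which is why "suppliers.html" lands in "/support/".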
In HTML, URIs are used to:
To publish information for global distribution, one needs a universally understood language, a kind of publishing mother tongue that all computers may potentially understand. The publishing language used by the World Wide Web is HTML (from HyperText Markup Language).
HTML gives authors the means to:
HTML was originally developed by Tim Berners-Lee while at CERN, and popularized by the Mosaic browser developed at NCSA. During the course of the 1990s it has blossomed with the explosive growth of the Web. During this time, HTML has been extended in a number of ways. The Web depends on Web page authors and vendors sharing the same conventions for HTML. This has motivated joint work on specifications for HTML.
HTML 2.0 (November 1995) was developed under the aegis of the Internet Engineering Task Force (IETF) to codify common practice in late 1994. HTML+ (1993) and HTML 3.0 (1995) proposed much richer versions of HTML. Despite never receiving consensus in standards discussions, these drafts led to the adoption of a range of new features. The efforts of the World Wide Web Consortium's HTML Working Group to codify common practice in 1996 resulted in HTML 3.2 (January 1997).
Most people agree that HTML documents should work well across different browsers and platforms. Achieving interoperability lowers costs to content providers since they must develop only one version of a document. If the effort is not made, there is much greater risk that the Web will devolve into a proprietary world of incompatible formats, ultimately reducing the Web's commercial potential for all participants.
Each version of HTML has attempted to reflect greater consensus among industry players so that the investment made by content providers will not be wasted and that their documents will not become unreadable in a short period of time.
HTML has been developed with the vision that all manner of devices should be able to use information on the Web: PCs with graphics displays of varying resolution and color depths, cellular telephones, handheld devices, devices for speech output and input, computers with high or low bandwidth, and so on.
HTML 4 extends HTML with mechanisms for style sheets, scripting, frames, embedding objects, improved support for right to left and mixed direction text, richer tables, and enhancements to forms, offering improved accessibility for people with disabilities.
HTML 4.01 is a revision of HTML 4.0 that corrects errors and makes some changes since the previous revision.
This version of HTML has been designed with the help of experts in the field of internationalization, so that documents may be written in every language and be transported easily around the world.
One important step has been the adoption of the ISO/IEC 10646 standard as the document character set for HTML. This is the world's most inclusive standard dealing with issues of the representation of international characters, text direction, punctuation, and other world language issues.
HTML now offers greater support for diverse human languages within a document. This allows for more effective indexing of documents for search engines, higher-quality typography, better text-to-speech conversion, better hyphenation, etc.
As the Web community grows and its members diversify in their abilities and skills, it is crucial that the underlying technologies be appropriate to their specific needs. HTML has been designed to make Web pages more accessible to those with physical limitations. HTML 4 developments inspired by concerns for accessibility include:
Authors who design pages with accessibility issues in mind will not only receive the blessings of the accessibility community, but will benefit in other ways as well: well-designed HTML documents that distinguish structure and presentation will adapt more easily to new technologies.
Authors now have greater control over structure and layout (e.g., column groups). The ability of designers to recommend column widths allows user agents to display table data incrementally (as it arrives) rather than waiting for the entire table before rendering.
Note. At the time of writing, some HTML authoring tools rely extensively on tables for formatting, which may easily cause accessibility problems.
Style sheets simplify HTML markup and largely relieve HTML of the responsibilities of presentation. They give both authors and users control over the presentation of documents -- font information, alignment, colors, etc.
Style information can be specified for individual elements or groups of elements. Style information may be specified in an HTML document or in external style sheets.
The mechanisms for associating a style sheet with a document are independent of the style sheet language.
Before the advent of style sheets, authors had limited control over rendering. HTML 3.2 included a number of attributes and elements offering control over alignment, font size, and text color. Authors also exploited tables and images as a means for laying out pages. The relatively long time it takes for users to upgrade their browsers means that these features will continue to be used for some time. However, since style sheets offer more powerful presentation mechanisms, the World Wide Web Consortium will eventually phase out many of HTML's presentation elements and attributes. Throughout the specification, elements and attributes at risk are marked as "deprecated". They are accompanied by examples of how to achieve the same effects with other elements or style sheets.
Through scripts, authors may create dynamic Web pages (e.g., "smart forms" that react as users fill them out) and use HTML as a means to build networked applications.
The mechanisms provided to include scripts in an HTML document are independent of the scripting language.
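An HTML 4 document is composed of three parts:
1. a line containing HTML version information (the document type declaration),
2. a declarative header section (delimited by the HEAD element),
3. a body, which contains the document's actual content (the BODY element or, in a Frameset document, the FRAMESET element).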
White space (spaces, newlines, tabs, and comments) may appear before or after each section. Sections 2 and 3 should be delimited by the HTML element.
Here's an example of a simple HTML document:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD><TITLE>My first HTML document</TITLE></HEAD>
<BODY><P>Hello world!</BODY></HTML>
A valid HTML document declares what version of HTML is used in the document. The document type declaration names the document type definition (DTD) in use for the document.
HTML 4.01 specifies three DTDs, so authors must include one of the following document type declarations in their documents. The DTDs vary in the elements they support.
The URI in each document type declaration allows user agents to download the DTD and any entity sets that are needed.
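As an illustration of how a tool might tell the three DTDs apart, here is a minimal sketch in Python. The public identifiers in the table are those of the HTML 4.01 Strict, Transitional, and Frameset DTDs; the function name `dtd_flavor` and the regular expression are assumptions made for this example, not a mechanism defined by the specification:

```python
import re

# Public identifiers of the three HTML 4.01 DTDs, mapped to a short label.
HTML401_DTDS = {
    "-//W3C//DTD HTML 4.01//EN": "strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "frameset",
}

def dtd_flavor(doctype):
    """Return the label of the DTD named in a DOCTYPE declaration, or None."""
    match = re.search(r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]+)"', doctype, re.IGNORECASE)
    if match is None:
        return None
    # Unknown public identifiers also yield None.
    return HTML401_DTDS.get(match.group(1))

STRICT = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
          '"http://www.w3.org/TR/html4/strict.dtd">')
print(dtd_flavor(STRICT))
```

The sketch only inspects the public identifier; it does not download or apply the DTD.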
After the document type declaration, the remainder of an HTML document is contained by the HTML element. Thus, a typical HTML document has this structure:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
...The head, body, etc. goes here...
</HTML>
The HEAD element contains information about the current document, such as its title, keywords that may be useful to search engines, and other data that is not considered document content. User agents do not generally render elements that appear in the HEAD as content. They may, however, make information in the HEAD available to users through other mechanisms.
Every HTML document must have a TITLE element in the HEAD section.
Authors should use the TITLE element to identify the contents of a document. Since users often consult documents out of context, authors should provide context-rich titles. Thus, instead of a title such as "Introduction", which doesn't provide much contextual background, authors should supply a title such as "Introduction to Medieval Bee-Keeping".
For reasons of accessibility, user agents must always make the content of the TITLE element available to users (including TITLE elements that occur in frames). The mechanism for doing so depends on the user agent (e.g., as a caption, spoken).
Titles may contain character entities (for accented characters, special characters, etc.), but may not contain other markup (including comments). Here is a sample document title:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
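To illustrate how a user agent or indexing tool might make the title available, the following sketch reads the TITLE content with Python's standard `html.parser` module. The class `TitleExtractor` and the function `extract_title` are hypothetical names introduced for this example:

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text content of the TITLE element."""

    def __init__(self):
        # convert_charrefs=True turns character entities into characters.
        super().__init__(convert_charrefs=True)
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":  # tag names arrive lowercased
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def extract_title(markup):
    """Return the document title, or an empty string if there is none."""
    parser = TitleExtractor()
    parser.feed(markup)
    parser.close()
    return parser.title.strip()

print(extract_title("<TITLE>Introduction to Medieval Bee-Keeping</TITLE>"))
```

Because the parser lowercases tag names, `<TITLE>` and `<title>` are treated alike.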
Note. The W3C Resource Description Framework became a W3C Recommendation in February 1999. RDF allows authors to specify machine-readable metadata about HTML documents and other network-accessible resources.
HTML lets authors specify meta data -- information about a document rather than document content -- in a variety of ways.
For example, to specify the author of a document, one may use the META element as follows:
<META name="Author" content="Igor Smit">
The META element specifies a property (here "Author") and assigns a value to it (here "Igor Smit").
This specification does not define a set of legal meta data properties. The meaning of a property and the set of legal values for that property should be defined in a reference lexicon called a profile. For example, a profile designed to help search engines index documents might define properties such as "author", "copyright", "keywords", etc.
Start tag: required, End tag: forbidden
For the following attributes, the permitted values and their interpretation are profile dependent:
name = name [CS]: This attribute identifies a property name. This specification does not list legal values for this attribute.
content = cdata [CS]: This attribute specifies a property's value. This specification does not list legal values for this attribute.
scheme = cdata [CS]: This attribute names a scheme to be used to interpret the property's value (see the section on profiles for details).
http-equiv = name [CI]: This attribute may be used in place of the name attribute. HTTP servers use this attribute to gather information for HTTP response message headers.
A common use for META is to specify keywords that a search engine may use to improve the quality of search results. When several META elements provide language-dependent information about a document, search engines may filter on the lang attribute to display search results using the language preferences of the user. For example,
<!-- For speakers of US English -->
<META name="keywords" lang="en-us" content="vacation, Russia, sunshine">
<!-- For speakers of British English -->
<META name="keywords" lang="en" content="holiday, Russia, sunshine">
<!-- For speakers of French -->
<META name="keywords" lang="fr" content="vacances, Russie, soleil">
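The filtering described above might be sketched as follows. This is an illustrative assumption about how a search engine could behave, not a documented algorithm; the names `MetaKeywords`, `keywords_for`, and `SAMPLE` are hypothetical, and the fallback from a full language code to its primary subtag is a design choice of this sketch:

```python
from html.parser import HTMLParser

class MetaKeywords(HTMLParser):
    """Collect META keywords, grouped by the value of the lang attribute."""

    def __init__(self):
        super().__init__()
        self.by_lang = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") != "keywords":
            return
        lang = (attributes.get("lang") or "").lower()
        words = (attributes.get("content") or "").split(",")
        self.by_lang[lang] = [w.strip() for w in words if w.strip()]

def keywords_for(by_lang, preferred):
    """Pick keywords for a language, falling back to its primary subtag."""
    preferred = preferred.lower()
    if preferred in by_lang:
        return by_lang[preferred]
    return by_lang.get(preferred.split("-")[0], [])

SAMPLE = (
    '<META name="keywords" lang="en-us" content="vacation, Russia, sunshine">'
    '<META name="keywords" lang="en" content="holiday, Russia, sunshine">'
)
collector = MetaKeywords()
collector.feed(SAMPLE)
print(keywords_for(collector.by_lang, "en-US"))
print(keywords_for(collector.by_lang, "en-GB"))
```

A request for "en-GB" finds no exact match and falls back to the "en" keywords.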
The effectiveness of search engines can also be increased by using the LINK element to specify links to translations of the document in other languages, links to versions of the document in other media (e.g., PDF), and, when the document is part of a collection, links to an appropriate starting point for browsing the collection.
The META element may be used to specify the default information for a document in the following instances:
The following example specifies the character encoding for a document as being ISO-8859-1:
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
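The content value of such a META element has the same shape as an HTTP Content-Type header, so the declared encoding can be read with a header parser. A minimal sketch, assuming Python's standard `email.message` module (the helper name `charset_from_meta` is hypothetical):

```python
from email.message import Message

def charset_from_meta(content):
    """Return the charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = content
    # get_content_charset() returns the charset lowercased, or None.
    return message.get_content_charset()

print(charset_from_meta("text/html; charset=ISO-8859-1"))
```

Encoding names are case-insensitive, which is why the lowercased form returned here is equivalent to the one written in the markup.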
Elements are the structures that describe parts of an HTML document. For example, the P element represents a paragraph while the EM element gives emphasized content.
An element has three parts: a start tag, content, and an end tag. A tag is special text--"markup"--that is delimited by " < " and " > ". An end tag includes a " / " after the " < ". For example, the EM element has a start tag, <EM> , and an end tag, </EM> . The start and end tags surround the content of the EM element:
<EM>This is emphasized text</EM>
Element names are always case-insensitive, so <em>, <eM>, and <EM> are all the same.
Elements cannot overlap each other. If the start tag for an EM element appears within a P , the EM 's end tag must also appear within the same P element.
Some elements allow the start or end tag to be omitted. For example, the LI end tag is always optional since the element's end is implied by the next LI element or by the end of the list:
<UL>
<LI>First list item; no end tag
<LI>Second list item; optional end tag included</LI>
<LI>Third list item; no end tag
</UL>
Some elements have no end tag because they have no content. These elements, such as the BR element for line breaks, are represented only by a start tag and are said to be empty.
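To make the omitted end tags visible, the sketch below counts the tags that actually appear in the list markup. It assumes Python's standard `html.parser`, which reports only tags written in the source and does not infer implied end tags; the class name `TagCounter` is hypothetical:

```python
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Count the start and end tags that literally appear in the markup."""

    def __init__(self):
        super().__init__()
        self.starts = {}
        self.ends = {}

    def handle_starttag(self, tag, attrs):
        self.starts[tag] = self.starts.get(tag, 0) + 1

    def handle_endtag(self, tag):
        self.ends[tag] = self.ends.get(tag, 0) + 1

LIST_MARKUP = (
    "<UL> <LI>First list item; no end tag "
    "<LI>Second list item; optional end tag included</LI> "
    "<LI>Third list item; no end tag </UL>"
)
counter = TagCounter()
counter.feed(LIST_MARKUP)
print(counter.starts, counter.ends)
```

Three LI start tags but only one LI end tag are reported, yet the markup is still a valid three-item list because the missing end tags are implied.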
An element's attributes define various properties for the element. For example, the IMG element takes a SRC attribute to provide the location of the image and an ALT attribute to give alternate text for those not loading images:
<IMG SRC="wdglogon.gif" ALT="Web Group">
An attribute is included in the start tag only--never the end tag--and takes the form Attribute-name="Attribute-value". The attribute value is delimited by single or double quotes. The quotes are optional if the attribute value consists solely of letters in the range A-Z and a-z, digits (0-9), hyphens ("-"), and periods (".").
Attribute names are case-insensitive, but attribute values may be case-sensitive.
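The following sketch shows these rules in action with Python's standard `html.parser`, which lowercases element and attribute names but leaves attribute values untouched. The class `AttributeDump` and the function `attributes_of` are hypothetical names for this example:

```python
from html.parser import HTMLParser

class AttributeDump(HTMLParser):
    """Record each start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased.
        self.seen.append((tag, dict(attrs)))

def attributes_of(markup):
    """Return the (tag, attributes) pairs found in the markup."""
    parser = AttributeDump()
    parser.feed(markup)
    return parser.seen

print(attributes_of('<IMG SRC="wdglogon.gif" ALT="Web Group">'))
# Single quotes, and no quotes for a simple value, give the same result.
print(attributes_of("<img src=wdglogon.gif alt='Web Group'>"))
```

Both spellings yield the same tag name and attribute names, while the value "Web Group" keeps its capitalization.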
Certain characters in HTML are reserved for use as markup and must be escaped to appear literally. The "<" character may be represented with an entity, &lt;. Similarly, ">" is escaped as &gt;, and "&" is escaped as &amp;. If an attribute value contains a double quotation mark and is delimited by double quotation marks, then the quote should be escaped as &quot;.
Other entities exist for special characters that cannot easily be entered with some keyboards. For example, the copyright symbol ("©") may be represented with the entity &copy;. See the Entities section for a complete list of HTML 4.0 entities.
As an alternative to entities, authors may also use numeric character references. Any character may be represented by a numeric character reference based on its "code position" in Unicode. For example, one could use &#169; for the copyright symbol or &#1575; for the Arabic letter ALEF.
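Escaping and unescaping can be tried out with Python's standard `html` module. This is a minimal sketch; the helper `numeric_reference` is a hypothetical name introduced here to show how a decimal reference is derived from a character's code position:

```python
import html

# Escape the reserved characters, including the double quotation mark.
escaped = html.escape('<A href="a&b">')

# Named entities and numeric character references decode to the same character.
copyright_named = html.unescape("&copy;")
copyright_numeric = html.unescape("&#169;")
alef = html.unescape("&#1575;")

def numeric_reference(character):
    """Build a decimal numeric character reference from a code position."""
    return "&#%d;" % ord(character)

print(escaped)
print(numeric_reference("\u00a9"), numeric_reference("\u0627"))
```

The code position of the copyright symbol is 169 and that of ALEF is 1575, which is where the two references come from.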
Comments in HTML have a complicated syntax that can be simplified by following this rule: Begin a comment with " <!-- ", end it with " --> ", and do not use " -- " within the comment.
<!-- An example comment -->
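The simplified rule can be expressed as a small check. This sketch tests only the rule stated above, not the full comment syntax, and the function name `is_simple_comment` is hypothetical:

```python
def is_simple_comment(text):
    """Check the simplified rule: '<!--' ... '-->' with no '--' in between."""
    if len(text) < 7 or not (text.startswith("<!--") and text.endswith("-->")):
        return False
    inner = text[4:-3]
    # A trailing hyphen would join the closing delimiter to form '--->'.
    return "--" not in inner and not inner.endswith("-")

print(is_simple_comment("<!-- An example comment -->"))
print(is_simple_comment("<!-- not -- recommended -->"))
```

The first call accepts the example comment; the second rejects a comment that contains "--" in its body.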
An HTML 4.0 document begins with a DOCTYPE declaration that declares the version of HTML to which the document conforms. The HTML element follows and contains the HEAD and BODY. The HEAD contains information about the document, such as its title and keywords, while the BODY contains the actual content of the document, made up of block-level elements and inline elements. A basic HTML 4.0 document takes on the following form:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
In a Frameset document, the FRAMESET element replaces the BODY element.
Each HTML document should be validated to check for errors such as missing quotation marks ( <A HREF="oops.html>Oops</A> ), misspelled element or attribute names, and invalid structures. Such errors are not always apparent when viewing a document in a browser since browsers are designed to recover from an author's errors. However, different browsers recover in different ways, sometimes resulting in invisible text on one browser but not on others.
The W3C HTML Validation Service checks the validity of HTML 4.0 documents.
Note that some programs claim to be validators but really are not. A validator checks a document against a formal document type definition (DTD) while other programs such as lints warn about valid but unsafe HTML. Both kinds of programs are useful, but validation should never be forgotten.