Adding Characters from Outside the Encoding
If most of the characters on your page belong to one encoding and you just want to add a few characters from another, you can set the main encoding for the document (see page 330) and then use character references for characters outside of the main encoding.
A character reference can represent any character in Unicode by giving the character's unique code within that set. A character's code can be represented as either a regular (base 10) number or as a hexadecimal number. Some characters also have associated entities, that is, unique identifying words, that you can use instead of the number.
You can find a character's code, in hexadecimal form (which is the most common), at the Unicode site: http://www.unicode.org/charts/. You can find the complete list of characters that have associated entities in Appendix D or at my site: www.cookwood.com/entities/
To add characters from outside the encoding:
Type & (an ampersand).
Next, type #xn, where n is the hexadecimal number that represents the desired character (Figure 21.15).
Figure 21.15. A hexadecimal reference consists of an ampersand, a hash symbol (#), the letter x, the hexadecimal representation of the numeric code for the character, and a semicolon. You can use hexadecimal references to insert any character from the Universal Character Set. This particular character is an é.
Or type #n, where n is the base 10 number for your character (Figure 21.16).
Figure 21.16. A numeric reference consists of an ampersand, a hash symbol (#), the numeric code for the character, and a semicolon. You can use numeric references to insert any character from the Universal Character Set. This reference is also for an é.
Or type entity, where entity is the name of the entity that corresponds to your character (Figure 21.17).
Figure 21.17. An entity reference, also known as a character entity reference or named reference, is made up of an ampersand, the character's name, and a semicolon. There are 252 named references that you can use in your (X)HTML pages. They are case-sensitive. This reference is also for an é.
Finally, type ; (a semicolon).
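If you generate your pages with a script, the steps above can be sketched in a few lines. This is only an illustration, assuming Python and its standard html.entities module (whose codepoint2name table covers the HTML 4.01 named entities); the function name references_for is made up for this example.

```python
import html
from html.entities import codepoint2name

def references_for(char):
    """Return the hexadecimal, base 10, and named reference for one character.

    The named reference is None when the character has no associated entity.
    """
    code = ord(char)                      # the character's Unicode code
    hex_ref = f"&#x{code:X};"             # ampersand, #, x, hex code, semicolon
    dec_ref = f"&#{code};"                # ampersand, #, base 10 code, semicolon
    name = codepoint2name.get(code)       # entity name, if one exists
    named_ref = f"&{name};" if name else None
    return hex_ref, dec_ref, named_ref

hex_ref, dec_ref, named_ref = references_for("é")
print(hex_ref, dec_ref, named_ref)        # &#xE9; &#233; &eacute;

# All three forms decode back to the same character.
print([html.unescape(ref) for ref in (hex_ref, dec_ref, named_ref)])
```

All three references stand for the same é, so which one you type is largely a matter of taste.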
In general, you only need to use character references for characters that are not part of the document's character encoding.
The principal exception to the first tip is the & symbol. In XHTML documents, when used as text (as in AT&T), you must use its character reference (&amp;).
The greater than, less than, and double quotation mark symbols also have special meaning in (X)HTML. You should use their character references (&gt;, &lt;, and &quot;, respectively) when not using them in the markup code itself.
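You don't have to make these four substitutions by hand. As a rough sketch, assuming your text passes through a Python script before it reaches the page, the standard html.escape function replaces the ampersand, less than, greater than, and quotation mark symbols with their references (the sample sentence is invented for this example).

```python
import html

text = 'AT&T says "5 < 7 > 3"'

# html.escape replaces & first, then <, >, and (by default) quotation marks.
escaped = html.escape(text)
print(escaped)                  # AT&amp;T says &quot;5 &lt; 7 &gt; 3&quot;

# Decoding the references gives back the original text.
print(html.unescape(escaped) == text)
```

Note that html.escape also converts single quotation marks (to &#x27;) unless you pass quote=False.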
While using references for characters like é and £ is valid, using the proper encoding (e.g., utf-8) is much faster for large chunks of text.
The most common default encodings, including windows-1252 and x-mac-roman, lack several useful symbols. You can use character references to create these symbols without touching the default encoding.
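One way to apply this tip automatically, assuming you save your pages from a Python script, is the built-in "xmlcharrefreplace" error handler: characters the chosen encoding can't represent are written as base 10 references, and everything else is left alone. The function name with_fallback_references and the sample text are hypothetical.

```python
import html

def with_fallback_references(text, encoding="windows-1252"):
    """Encode text, writing unsupported characters as base 10 references."""
    return text.encode(encoding, errors="xmlcharrefreplace")

# The é exists in windows-1252; the not-equal sign and the smiley do not.
page_bytes = with_fallback_references("café ≠ ☺")
print(page_bytes)               # b'caf\xe9 &#8800; &#9786;'

# A browser reading the page as windows-1252 recovers the original text.
print(html.unescape(page_bytes.decode("windows-1252")))
```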
If you're using a hexadecimal or numeric reference, don't forget the # between the ampersand and the number. And if you're using a hexadecimal reference, don't forget the lowercase letter x, which indicates that a hexadecimal number follows.
While there are hex and numeric references for every character in Unicode, there are named entity references for only 252 of them. They are case-sensitive. See Appendix D for a complete listing.
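To see why case matters, here is a small check, assuming Python's html.entities.name2codepoint table of HTML 4.01 entity names: eacute and Eacute are two different entities, and a name typed in the wrong case may not exist at all.

```python
from html.entities import name2codepoint

lower = chr(name2codepoint["eacute"])   # lowercase e with acute accent
upper = chr(name2codepoint["Eacute"])   # uppercase E with acute accent
print(lower, upper)                     # é É

# An all-caps spelling is not an entity name in this table.
print("EACUTE" in name2codepoint)       # False
```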
Your visitors will only be able to view the characters for which they have adequate fonts installed. While you can specify a particular font (see page 152), it's not required; in its absence browsers should search the available fonts for one that includes the characters in question.
You may also insert small quantities of special characters by using GIF images (see page 90).
Figure 21.18. You can use any combination of named, numeric, or hexadecimal references in your document. It doesn't matter which encoding the document is in. Of course, it's better just to use an encoding like UTF-8 that supports the characters you need.
Figure 21.19. The characters display properly. Note that the visitor's browser must have an appropriate font for the characters.