Appendix B: Special Characters and Character Sets

A number of character sets can be used in displaying Web pages. ISO 8859-1 (or Latin-1) contains the characters used in most Western European languages and is the character set most commonly used in displaying Web pages. Other versions of ISO 8859 are available for displaying non-Western European languages requiring additional characters.

HTML 4 also allows the designation of characters from the Unicode character set, which contains many thousands more possible characters that can be displayed in Web pages. In the following sections, you'll find tables showing the reserved, unused, and special characters included in the ISO 8859-1 character set.

The ISO 8859-1 Character Set

The ISO 8859-1 character set is the official character set for Web pages, at least as far as Western languages (English, French, German, and so on) are concerned. It is an 8-bit character set, which allows for 256 code positions. Code positions 000 through 031 and 127 are assigned as control characters (line feed, carriage return, tab, and so forth). Positions 032 through 126 correspond to the US-ASCII characters that you can type in at the keyboard, starting with the space character at 032. Code positions 128 through 159 are designated as "unused" in ISO 8859-1, although both the Macintosh and Windows systems assign characters to many of these positions. Code positions 160 through 255 contain characters that can be used in Web pages and should be displayable in most Web browsers. Windows and UNIX systems display all these characters because their native character sets match ISO 8859-1 at these positions. However, 14 of the officially sanctioned characters are missing from the Macintosh native character set (which differs from ISO 8859-1). ISO 8859-1 was the standard character set for Web pages coded using HTML 2.0.
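The ranges described above can be summarized in a few lines of Python. This is only an illustrative sketch: the function name and the range labels are made up for this example and are not part of any standard.

```python
def classify_code_position(code: int) -> str:
    """Return the ISO 8859-1 range that a code position falls in."""
    if not 0 <= code <= 255:
        raise ValueError("ISO 8859-1 has only code positions 0 through 255")
    if code <= 31 or code == 127:
        return "control"        # 000-031 and 127
    if code <= 126:
        return "US-ASCII"       # 032-126, keyboard characters
    if code <= 159:
        return "unused"         # 128-159, platform-specific extras
    return "displayable"        # 160-255, safe for Web pages


for code in (10, 65, 127, 153, 233):
    print(code, classify_code_position(code))
```

Position 153, for example, lands in the "unused" range even though Windows assigns the trademark symbol to it.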

Reserved Characters

These characters are reserved for formatting HTML tags and codes. Each one has a numerical and a named entity code.

Number   Name     Description           Character
&#034;   &quot;   Double quotation      "
&#038;   &amp;    Ampersand             &
&#060;   &lt;     Left angle bracket    <
&#062;   &gt;     Right angle bracket   >

The < and > characters (left and right angle brackets) should not be typed directly into an HTML document, since they are used to designate the start and end of an HTML tag; to use these characters "as is," you need to type in the numerical or named entity code (&lt; and &gt;). Double quotation marks and ampersands, on the other hand, generally need only be replaced in an HTML file if they are part of an HTML code that you want to display "as is."
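If you generate HTML from a script, the replacement can be automated. The following sketch uses Python's standard html module; the sample string is invented for illustration.

```python
import html

# A line of text that contains all four reserved characters.
snippet = 'Use <b> for bold & "quotes" as is.'

# quote=True also replaces double quotation marks with &quot;.
escaped = html.escape(snippet, quote=True)
print(escaped)
# Use &lt;b&gt; for bold &amp; &quot;quotes&quot; as is.

# html.unescape reverses the replacement.
print(html.unescape(escaped) == snippet)
```

Note that html.escape always replaces ampersands and quotation marks, which is more conservative than strictly necessary but harmless.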

You should use character entity names with caution, because versions of Netscape Navigator prior to 4.0 only recognize the following character entity names:

  • All the reserved character entity names (&quot;, &amp;, &lt;, and &gt;).

  • The copyright (&copy;), registered (&reg;), and non-breakable space (&nbsp;) entity names.

  • All the accented characters (starting with &Agrave; and ending with &yuml;), with the exception of the Y-acute and y-acute characters (&Yacute; and &yacute;), which should be avoided altogether because they aren't included in the Macintosh character set.

For inserting all other characters, you should stick to using numerical entity codes.
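One way to follow this advice mechanically is sketched below in Python. The function name to_entities and the SAFE_NAMES set are hypothetical; the set holds only the reserved, copyright, registered, and non-breakable space names from the list above. As a deliberate simplification, accented characters are also written as numerical codes, even though their names are recognized.

```python
from html.entities import codepoint2name

# Entity names treated as safe in this sketch (a conservative subset).
SAFE_NAMES = {"quot", "amp", "lt", "gt", "copy", "reg", "nbsp"}


def to_entities(text: str) -> str:
    """Replace special characters with named or numerical entity codes."""
    out = []
    for ch in text:
        code = ord(ch)
        name = codepoint2name.get(code)
        if name in SAFE_NAMES:
            out.append(f"&{name};")      # widely recognized entity name
        elif code > 126:
            out.append(f"&#{code};")     # everything else: numerical code
        else:
            out.append(ch)               # plain US-ASCII passes through
    return "".join(out)


print(to_entities("© 2001 Café <T>™"))
# &copy; 2001 Caf&#233; &lt;T&gt;&#8482;
```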

Unused Characters

Both Windows and the Macintosh assign characters to many of the code positions that the ISO 8859-1 character set designates as unused, and 12 of these extra characters differ between the two systems.

There's no guarantee that any of these "unused" characters will display on other platforms. Generally, it is best to avoid using these characters in an HTML file. A possible exception is the trademark symbol (&#153;), which displays on Macintosh, Windows, and most UNIX systems; it is also available as a Unicode character (&#8482;). If for legal reasons you want to ensure that your trademark symbol displays on all platforms, however, you should use <sup>(TM)</sup> instead. (The trademark's entity name, &trade;, should be avoided because no version of Netscape Navigator supports it.)
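The relationship between &#153; and &#8482; can be checked with a short Python sketch. It assumes that Python's cp1252 codec is a fair stand-in for the Windows character assignments; the variable names are illustrative only.

```python
import unicodedata

raw = bytes([153])  # the single byte behind &#153;

as_latin1 = raw.decode("latin-1")   # strict ISO 8859-1 reading
as_windows = raw.decode("cp1252")   # Windows reading of the same byte

# In ISO 8859-1 the position holds no printable character.
print(unicodedata.category(as_latin1))   # Cc (control)

# Windows assigns the trademark symbol, whose Unicode position is 8482.
print(unicodedata.name(as_windows))      # TRADE MARK SIGN
print(f"&#{ord(as_windows)};")           # &#8482;
```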

Number   Name      Description          Character
&#128;             Unused
&#129;             Unused
&#130;             Single quote (low)   ‚
&#131;             Small Latin f        ƒ
&#132;             Double quote (low)   „
&#133;             Ellipsis             …
&#134;             Dagger               †
&#135;             Double dagger        ‡
&#136;             Circumflex           ˆ
&#137;             Per mille sign       ‰
&#138;             S-caron              Š (not Mac)
&#139;             Left angle quote     ‹
&#140;             OE ligature          Œ
&#141;             Unused
&#142;             Unused
&#143;             Unused
&#144;             Unused
&#145;             Left single quote    ‘
&#146;             Right single quote   ’
&#147;             Left double quote    “
&#148;             Right double quote   ”
&#149;             Bullet               •
&#150;   &ndash;   En dash              –
&#151;   &mdash;   Em dash              —
&#152;             Small tilde          ˜
&#153;   &trade;   Trademark            ™
&#154;             s-caron              š (not Mac)
&#155;             Right angle quote    ›
&#156;             oe ligature          œ
&#157;             Unused
&#158;             Unused
&#159;             Y-umlaut             Ÿ

Used Characters

The following characters, 160 through 255, are part of the ISO 8859-1 character set. These are the only characters that are officially designated for use in HTML documents. They should generally be available on any operating system that uses the ISO 8859-1 character set. A number of these characters, however, may not be displayed on the Macintosh, because they are not included in its default character set, and should probably be avoided (they are marked with "not Mac"). Internet Explorer 5 for the Macintosh, however, substitutes a font that does include the missing characters.
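The "not Mac" markers can be spot-checked with the Python sketch below. It assumes that Python's mac_roman codec approximates the Macintosh default character set, which may not match every Macintosh system version; the NOT_MAC list simply collects the 14 code positions marked "not Mac" in the table, and in_mac_roman is a hypothetical helper.

```python
# Code positions marked "not Mac" in the table of used characters.
NOT_MAC = [166, 178, 179, 185, 188, 189, 190,
           208, 215, 221, 222, 240, 253, 254]


def in_mac_roman(code: int) -> bool:
    """Return True if the character can be encoded with the mac_roman codec."""
    try:
        chr(code).encode("mac_roman")
        return True
    except UnicodeEncodeError:
        return False


for code in NOT_MAC:
    status = "present" if in_mac_roman(code) else "missing"
    print(f"&#{code}; {chr(code)} {status}")
```

Under this assumption, each of the 14 positions is reported as missing, while a common accented character such as &#233; is present.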

Number   Name       Description              Character
&#160;   &nbsp;     Non-breakable space      [ ] (brackets added)
&#161;   &iexcl;    Inverted exclamation     ¡
&#162;   &cent;     Cent sign                ¢
&#163;   &pound;    Pound sign               £
&#164;   &curren;   Currency sign            ¤
&#165;   &yen;      Yen sign                 ¥
&#166;   &brvbar;   Broken vertical bar      ¦ (not Mac)
&#167;   &sect;     Section sign             §
&#168;   &uml;      Umlaut                   ¨
&#169;   &copy;     Copyright                ©
&#170;   &ordf;     Feminine ordinal         ª
&#171;   &laquo;    Left guillemet           «
&#172;   &not;      Not sign                 ¬
&#173;   &shy;      Soft hyphen              -
&#174;   &reg;      Registered               ®
&#175;              Macron                   ¯
&#176;   &deg;      Degree                   °
&#177;   &plusmn;   Plus/minus sign          ±
&#178;   &sup2;     Superscripted 2          ² (not Mac)
&#179;   &sup3;     Superscripted 3          ³ (not Mac)
&#180;   &acute;    Acute accent             ´
&#181;   &micro;    Micro sign               µ
&#182;   &para;     Paragraph sign           ¶
&#183;   &middot;   Middle dot               ·
&#184;   &cedil;    Cedilla                  ¸
&#185;   &sup1;     Superscripted 1          ¹ (not Mac)
&#186;   &ordm;     Masculine ordinal        º
&#187;   &raquo;    Right guillemet          »
&#188;   &frac14;   1/4 fraction             ¼ (not Mac)
&#189;   &frac12;   1/2 fraction             ½ (not Mac)
&#190;   &frac34;   3/4 fraction             ¾ (not Mac)
&#191;   &iquest;   Inverted question mark   ¿
&#192;   &Agrave;   A-grave                  À
&#193;   &Aacute;   A-acute                  Á
&#194;   &Acirc;    A-circumflex             Â
&#195;   &Atilde;   A-tilde                  Ã
&#196;   &Auml;     A-umlaut                 Ä
&#197;   &Aring;    A-ring                   Å
&#198;   &AElig;    AE ligature              Æ
&#199;   &Ccedil;   C-cedilla                Ç
&#200;   &Egrave;   E-grave                  È
&#201;   &Eacute;   E-acute                  É
&#202;   &Ecirc;    E-circumflex             Ê
&#203;   &Euml;     E-umlaut                 Ë
&#204;   &Igrave;   I-grave                  Ì
&#205;   &Iacute;   I-acute                  Í
&#206;   &Icirc;    I-circumflex             Î
&#207;   &Iuml;     I-umlaut                 Ï
&#208;   &ETH;      Uppercase Eth            Ð (not Mac)
&#209;   &Ntilde;   N-tilde                  Ñ
&#210;   &Ograve;   O-grave                  Ò
&#211;   &Oacute;   O-acute                  Ó
&#212;   &Ocirc;    O-circumflex             Ô
&#213;   &Otilde;   O-tilde                  Õ
&#214;   &Ouml;     O-umlaut                 Ö
&#215;   &times;    Multiplication sign      × (not Mac)
&#216;   &Oslash;   O-slash                  Ø
&#217;   &Ugrave;   U-grave                  Ù
&#218;   &Uacute;   U-acute                  Ú
&#219;   &Ucirc;    U-circumflex             Û
&#220;   &Uuml;     U-umlaut                 Ü
&#221;   &Yacute;   Y-acute                  Ý (not Mac)
&#222;   &THORN;    Uppercase Thorn          Þ (not Mac)
&#223;   &szlig;    Sharp s (German)         ß
&#224;   &agrave;   a-grave                  à
&#225;   &aacute;   a-acute                  á
&#226;   &acirc;    a-circumflex             â
&#227;   &atilde;   a-tilde                  ã
&#228;   &auml;     a-umlaut                 ä
&#229;   &aring;    a-ring                   å
&#230;   &aelig;    ae ligature              æ
&#231;   &ccedil;   c-cedilla                ç
&#232;   &egrave;   e-grave                  è
&#233;   &eacute;   e-acute                  é
&#234;   &ecirc;    e-circumflex             ê
&#235;   &euml;     e-umlaut                 ë
&#236;   &igrave;   i-grave                  ì
&#237;   &iacute;   i-acute                  í
&#238;   &icirc;    i-circumflex             î
&#239;   &iuml;     i-umlaut                 ï
&#240;   &eth;      Lowercase Eth            ð (not Mac)
&#241;   &ntilde;   n-tilde                  ñ
&#242;   &ograve;   o-grave                  ò
&#243;   &oacute;   o-acute                  ó
&#244;   &ocirc;    o-circumflex             ô
&#245;   &otilde;   o-tilde                  õ
&#246;   &ouml;     o-umlaut                 ö
&#247;   &divide;   Division sign            ÷
&#248;   &oslash;   o-slash                  ø
&#249;   &ugrave;   u-grave                  ù
&#250;   &uacute;   u-acute                  ú
&#251;   &ucirc;    u-circumflex             û
&#252;   &uuml;     u-umlaut                 ü
&#253;   &yacute;   y-acute                  ý (not Mac)
&#254;   &thorn;    Lowercase Thorn          þ (not Mac)
&#255;   &yuml;     y-umlaut                 ÿ

FIND IT ONLINE
For additional information on the ISO 8859-1 character set, see A. J. Flavell's page on ISO 8859-1 at ppewww.ph.gla.ac.uk/~flavell/iso8859/.

