HTML Entity Notes
Taxonomic Lexicon
- Character Encoding – The method of numerically representing a code point within a given repertoire.
- Code Point / Code Position – Index of a character within a repertoire.
- Character Repertoire – A set of available characters.
- Font – A set of glyphs.
- Glyph – Visual representation of a character.
- Character – Smallest unit of data that conveys information.
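The difference between a character, its code point, and its encoded bytes can be sketched in Python. This is only an illustration, not part of the original notes: the helper name describe is hypothetical, and UTF-8 is picked here as one possible encoding among several.

```python
def describe(ch: str) -> dict:
    """Return the code point and UTF-8 bytes for a single character."""
    code_point = ord(ch)  # index of the character within the Unicode repertoire
    return {
        "character": ch,
        "code_point": code_point,
        "notation": f"U+{code_point:04X}",  # conventional U+ hexadecimal notation
        "utf8_bytes": ch.encode("utf-8"),   # one encoding of that code point
    }


# U+00E9 is LATIN SMALL LETTER E WITH ACUTE
info = describe("\u00e9")
print(info["notation"], info["code_point"], info["utf8_bytes"])
```

The glyph (how the character is drawn) does not appear anywhere in this data; it comes from whichever font renders the character.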
Character Encodings
- EBCDIC
- US-ASCII / ISO 646 – American Standard Code for Information Interchange. Code points 0-31 are the C0 control characters; 127 is DEL.
- ISO 8859 series
- ISO 8859-1 / ISO Latin-1 – Western European languages, such as French, Spanish, Portuguese, Italian, German, Swedish, Norwegian, Danish and Finnish.
- ISO 8859-15 – A revision of ISO 8859-1 that adds the Euro sign (€, &euro;) in place of a few rarely used symbols.
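- Windows-1252 – Similar to ISO 8859-1, but assigns printable characters (such as € and curly quotes) to positions 128-159, which ISO 8859-1 reserves for C1 control characters.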
- Unicode / ISO 10646 – Contains most of the characters used in the world. The code space has room for 1,114,112 code points (U+0000 to U+10FFFF), of which roughly 150,000 are currently assigned to characters.
- ISO 10646 – Character repertoire used by HTML. Encoded as UCS-2 or UCS-4, in big-endian (BE) or little-endian (LE) byte order.
- UTF-8
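As a rough illustration of how the encodings listed above differ, the following Python sketch encodes the Euro sign with several codecs. The helper name encode_in is hypothetical, and the codec names are the aliases Python's standard library uses, not names taken from these notes.

```python
EURO = "\u20ac"  # the Euro sign


def encode_in(ch: str, codec: str):
    """Return the bytes for ch in codec, or None if the codec cannot represent it."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


for codec in ("ascii", "iso-8859-1", "iso-8859-15", "cp1252", "utf-8", "utf-16-be"):
    print(f"{codec:12} {encode_in(EURO, codec)}")
```

US-ASCII and ISO 8859-1 have no position for the Euro sign, so the helper returns None for them. ISO 8859-15 and Windows-1252 each use a single byte but at different positions, while UTF-8 needs three bytes.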
XML Control Characters
Entity | Name | Decimal | Unicode | Description |
" | &quot; | &#34; | &#x22; | Quote |
& | &amp; | &#38; | &#x26; | Ampersand |
' | &apos; | &#39; | &#x27; | Apostrophe |
< | &lt; | &#60; | &#x3C; | Less than |
> | &gt; | &#62; | &#x3E; | Greater than |
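A minimal sketch of escaping these five characters, using the escape and unescape functions from Python's standard html module. The sample string is made up for illustration. Note that html.escape writes the apostrophe in its hexadecimal form (&#x27;) rather than as &apos;.

```python
import html

# Hypothetical sample text containing all five special characters
raw = '<a href="x">Tom & Jerry\'s</a>'

# quote=True (the default) also escapes " and '
escaped = html.escape(raw, quote=True)
print(escaped)

# Unescaping reverses the substitution, whichever reference form was used
restored = html.unescape(escaped)
print(restored == raw)
print(html.unescape("&apos; &#39; &#x27;"))
```

The ampersand has to be escaped before the other characters, otherwise the & that begins each generated reference would itself be escaped again; html.escape handles this ordering internally.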
Computer Commands
Entity | Name | Decimal | Unicode | Description |
↵ | &crarr; | &#8629; | __ | Carriage return |
White Space
Entity | Name | Decimal | Unicode | Description |
  | &nbsp; | &#160; | &#xA0; | Non-Breaking Space |
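The relationship between the Name, Decimal and Unicode (hexadecimal) columns can be sketched in Python with the standard html.entities.name2codepoint mapping. The helper name reference_forms is hypothetical and exists only for this illustration.

```python
import html
from html.entities import name2codepoint


def reference_forms(name: str) -> dict:
    """Build the named, decimal and hexadecimal references for an entity name."""
    code_point = name2codepoint[name]  # e.g. "nbsp" -> 160
    return {
        "named": f"&{name};",
        "decimal": f"&#{code_point};",
        "hex": f"&#x{code_point:X};",
    }


for name in ("crarr", "nbsp"):
    forms = reference_forms(name)
    # All three forms should decode to the same single character
    decoded = {html.unescape(ref) for ref in forms.values()}
    print(name, forms, len(decoded) == 1)
```

Because the non-breaking space is invisible when rendered, comparing code points as done here is usually more reliable than inspecting the output by eye.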