UTF-8
and unicode.org
Unicode, the Universal Coded Character Set, is a superset of ASCII with roughly 150 thousand characters. UTF-8, the 8-bit Unicode Transformation Format, maps sequences of one to four bytes to Unicode characters.
# Hardware
8 bits : 1 byte
# UTF-8
1 byte : 1 character # ASCII
2 bytes : 1 character
3 bytes : 1 character
4 bytes : 1 character
# Font
X characters : 1 glyph
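The UTF-8 byte counts above can be checked directly. This is a minimal sketch in Python using the built-in `str.encode`; the four sample characters and the helper name `utf8_length` are illustrative choices, not part of the original text.

```python
# One sample character for each UTF-8 length (illustrative choices).
samples = ["A", "é", "€", "𓏤"]


def utf8_length(character: str) -> int:
    """Return how many bytes UTF-8 uses to store one character."""
    return len(character.encode("utf-8"))


for character in samples:
    code_point = f"U+{ord(character):04X}"
    print(code_point, character, utf8_length(character), "byte(s)")
```

Only the first sample is ASCII, so only it fits in a single byte.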
The "code" in Unicode comes from encoding and is not related to programming. Unilang, from lang (language) and uni (unification or internationalization), would have been a more fitting name.
Characters are the elements of written language, plus invisible control characters.
- writing system characters: 1234 and abcd; 𓁨𓎆𓏤𓏤𓏤 (𓁨 𓎆 𓏤 𓏤 𓏤) is 1000013 in ancient Egyptian
- control: \n newline, \b backspace
- signs and symbols: ✝ latin cross, ︻デ═一 ASCII art
- Optical Character Recognition
- emoji

see: unicode characters that are not writing system characters
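The kinds of characters listed above can be told apart programmatically. This is a rough sketch using Python's standard `unicodedata` module, which reports each character's Unicode general category. The sample list is an illustrative pick from the examples above. In the output, `Cc` means control, `Nd` decimal digit, `Ll` lowercase letter, `Lo` other letter, and `So` other symbol.

```python
import unicodedata

# A few of the characters mentioned above (illustrative selection).
samples = ["1", "a", "𓏤", "\n", "\b", "✝"]

for character in samples:
    category = unicodedata.category(character)
    # Control characters have no name, so a fallback string is supplied.
    name = unicodedata.name(character, "<no name>")
    print(f"U+{ord(character):04X}", category, name)
```

The results depend on the Unicode version bundled with the Python interpreter, so very recent characters may be reported as unassigned on older installations.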
If some of these fail to render, or show up as mojibake, then either your system does not support Unicode 15 (2022) or later, or you don't have a supporting font loaded.
Since the 1970s, a byte, an eight-digit binary number, has been the smallest addressable unit of data in most computer hardware. A character is stored as one or more bytes. Character encodings are implemented as software libraries that encode (write characters into bytes) and decode (read bytes into characters).
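The encode and decode steps can be sketched in a few lines of Python with the built-in `str.encode` and `bytes.decode`. The sample string and variable names are illustrative. The last step deliberately decodes with the wrong encoding (Latin-1, chosen here only as an example) to show how mojibake arises.

```python
text = "a✝"

# Encode: write characters into bytes.
encoded = text.encode("utf-8")

# Show each byte as an eight-digit binary number.
bits = [format(byte, "08b") for byte in encoded]
print(bits)

# Decode: read the bytes back into characters.
decoded = encoded.decode("utf-8")
print(decoded == text)

# Decoding with a different encoding than the one used to encode
# produces garbled text instead of the original characters.
mojibake = encoded.decode("latin-1")
print(repr(mojibake))
```

The ASCII letter takes one byte and the cross takes three, so the four bytes decode back into two characters.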
1844 Morse code. Latin characters => Ternary interpretation => International Morse code
SOS
00021112000
... --- ...
# There is no lowercase in Morse
HELLO, WORLD!
000020201002010021112110011222011211120102010021002101011
.... . .-.. .-.. --- --..-- .-- --- .-. .-.. -.. -.-.--
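The mapping shown above can be reproduced with a small sketch. It assumes 0 stands for a dot, 1 for a dash, and 2 for the gap between letters, and that a gap between words is three letter gaps, which is what the `222` in the HELLO, WORLD! string suggests. The table is limited to the letters and the two punctuation marks used above, and the function names are hypothetical.

```python
# International Morse code for A-Z plus the two punctuation marks
# that appear in the example above.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..", ",": "--..--", "!": "-.-.--",
}

LETTER_GAP = " "
WORD_GAP = "   "  # assumption: one word gap equals three letter gaps

# Dot -> 0, dash -> 1, gap -> 2.
TERNARY = str.maketrans(".- ", "012")


def to_morse(text: str) -> str:
    """Encode text as dots and dashes; Morse has no lowercase."""
    words = text.upper().split()
    return WORD_GAP.join(
        LETTER_GAP.join(MORSE[character] for character in word)
        for word in words
    )


def to_ternary(morse: str) -> str:
    """Rewrite a dot-dash string using the digits 0, 1 and 2."""
    return morse.translate(TERNARY)


print(to_morse("SOS"))
print(to_ternary(to_morse("SOS")))
print(to_ternary(to_morse("Hello, world!")))
```

Characters missing from the table raise a `KeyError`, which keeps the sketch short; a fuller version would need the digits and the remaining punctuation.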
A font, font family, or typeface displays characters as glyphs.
A font can have multiple alternate glyphs for the same character. This is sometimes used in handwriting fonts. A font can also merge two or more characters into a single ligature glyph. This is used to create icons on the web and programming ligatures. Variable fonts are even capable of animations.
Unicode 15 (2022) added two writing systems, control characters for Egyptian hieroglyphs, 8 astrology symbols, about 4000 Chinese-Japanese-Korean characters, some emojis, and more.