Unicode is an ASCII superset of roughly 150 thousand characters, kept in sync with the ISO/IEC 10646 Universal Coded Character Set (UCS). UTF-8 - Unicode Transformation Format 8-bit, maps each Unicode character to a sequence of one to four bytes, and those bytes back to characters.

```
# Hardware
8 bits  : 1 byte

# UTF-8
1 byte  : 1 character  # ASCII, U+0000 to U+007F
2 bytes : 1 character  # U+0080 to U+07FF
3 bytes : 1 character  # U+0800 to U+FFFF
4 bytes : 1 character  # U+10000 to U+10FFFF

# Font
1 or more characters : 1 glyph
```
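Python's built-in `str.encode` makes the byte counts above easy to check. The sketch below uses one arbitrary sample character per UTF-8 length, and `utf8_bits` is a hypothetical helper name. It prints each character's code point, its byte count, and every byte written out as 8 bits.

```python
def utf8_bits(char: str) -> str:
    """Return the UTF-8 bytes of one character as space-separated 8-bit strings."""
    encoded = char.encode("utf-8")  # str -> bytes
    return " ".join(f"{byte:08b}" for byte in encoded)

# One sample character per UTF-8 length (picked only for illustration).
for char in ["A", "é", "€", "😀"]:
    size = len(char.encode("utf-8"))
    print(f"U+{ord(char):04X} {char}: {size} byte(s) -> {utf8_bits(char)}")
```

In the output, the leading bits of the first byte (`0`, `110`, `1110`, `11110`) tell a decoder how many bytes the character uses. Every continuation byte starts with `10`.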

The "code" in Unicode comes from en-cod-ing, not from programming code. Unilang, from uni - universal, unified and lang - language, would arguably have been a more fitting name.

## What's a character?

Characters are the elements of written language, plus invisible control characters.

See: Unicode characters that are not writing-system characters.

If some of these show up as empty boxes or mojibake, then either your system does not support Unicode 15.0 (2022) or later, or you don't have a font with the matching glyphs loaded.
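Python's standard `unicodedata` module can separate letters from invisible control and format characters. The sketch below uses three illustrative sample characters and a hypothetical helper, `describe`. Its last line prints `unicodedata.unidata_version`, the Unicode version the running Python knows about, which ties back to the version requirement mentioned above.

```python
import unicodedata

def describe(char: str) -> str:
    """Return the code point, general category, and name of one character."""
    name = unicodedata.name(char, "<no name>")  # control chars such as "\n" have no name
    return f"U+{ord(char):04X} {unicodedata.category(char)} {name}"

# A letter (Lu), a line feed (control, Cc), and a zero width joiner (format, Cf).
for char in ["A", "\n", "\u200d"]:
    print(describe(char))

print("Unicode data version:", unicodedata.unidata_version)
```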

## Character encoding

Since the 1970s, the byte, an eight-digit binary number, has been the smallest addressable unit of data in most computer hardware (a single bit is the smallest unit of information). In an encoding such as UTF-8, one character is stored as one or more bytes. Character encodings are schemes, implemented in software libraries, to encode - write characters into bytes, and decode - read bytes into characters.
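In Python, `str.encode` and `bytes.decode` are the two directions described above. The sample word and the choice of Latin-1 as the "wrong" encoding are assumptions for illustration. Decoding UTF-8 bytes as Latin-1 is one common way mojibake appears, because Latin-1 accepts every byte and never raises an error.

```python
text = "café"

data = text.encode("utf-8")        # encode: write characters into bytes
roundtrip = data.decode("utf-8")   # decode: read bytes back into characters
mojibake = data.decode("latin-1")  # wrong decoder: no error, just garbled text

print(data)       # b'caf\xc3\xa9'  (4 characters became 5 bytes)
print(roundtrip)  # café
print(mojibake)   # cafÃ©
```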

Morse code (1844) is an early character encoding. Each example below is written as Latin characters => a ternary interpretation (0 = dot, 1 = dash, 2 = gap) => International Morse code.

```
SOS
00021112000
... --- ...

# There is no lowercase in Morse
HELLO, WORLD!
000020201002010021112110011222011211120102010021002101011
.... . .-.. .-.. --- --..--   .-- --- .-. .-.. -.. -.-.--
```
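The Morse examples above can be reproduced as a tiny encoder and decoder. This is a minimal sketch with several assumptions. The digit mapping (dot = 0, dash = 1, gap = 2) and the three-space word gap are read off the examples rather than taken from a standard. The table covers only letters plus "," and "!". The names `MORSE`, `to_morse`, `to_ternary` and `from_ternary` are hypothetical.

```python
# International Morse code for letters plus "," and "!" (digits and other
# punctuation are left out of this sketch).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..", ",": "--..--", "!": "-.-.--",
}
FROM_MORSE = {code: char for char, code in MORSE.items()}

def to_morse(text: str) -> str:
    """Encode: one space between letters, three between words (no lowercase in Morse)."""
    words = text.upper().split(" ")
    return "   ".join(" ".join(MORSE[char] for char in word) for word in words)

def to_ternary(morse: str) -> str:
    """Assumed digit mapping from the examples: dot=0, dash=1, each gap space=2."""
    return morse.translate(str.maketrans(".- ", "012"))

def from_ternary(digits: str) -> str:
    """Decode: ternary digits -> dots and dashes -> characters."""
    morse = digits.translate(str.maketrans("012", ".- "))
    return " ".join(
        "".join(FROM_MORSE[code] for code in word.split(" "))
        for word in morse.split("   ")
    )

print(to_ternary(to_morse("SOS")))  # 00021112000
print(to_ternary(to_morse("Hello, World!")))
print(from_ternary("00021112000"))  # SOS
```

Decoding `to_ternary(to_morse("Hello, World!"))` returns `HELLO, WORLD!`. The lowercase letters are lost in the round trip.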

## Glyph rendering

A font, font family, or typeface displays characters as glyphs.

A font can have multiple alternate glyphs for the same character, which handwriting fonts sometimes use to look less uniform. A font can also merge two or more characters into a single ligature glyph. This is used for icon fonts on the web and for programming ligatures, such as drawing `->` as an arrow. Variable fonts go further: their design axes, such as weight, vary continuously and can even be animated.
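Emoji ZWJ sequences are a common case of several characters becoming one glyph. The encoding stores separate characters, and the font decides whether to merge them. The sketch below uses "woman technologist" as an arbitrary sample; whether it appears as one glyph depends on the font. It also shows one of Unicode's few legacy ligature characters, which NFKC normalization splits back into plain letters.

```python
import unicodedata

# WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER: three characters that a
# supporting emoji font draws as a single "woman technologist" glyph.
sequence = "\U0001F469\u200d\U0001F4BB"

print(len(sequence))                  # 3 characters (code points)
print(len(sequence.encode("utf-8")))  # 11 bytes: 4 + 3 + 4
for char in sequence:
    print(f"U+{ord(char):04X} {unicodedata.name(char)}")

# U+FB01 LATIN SMALL LIGATURE FI is a legacy ligature character;
# NFKC normalization replaces it with the two letters "f" and "i".
print(unicodedata.normalize("NFKC", "\ufb01"))  # fi
```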

## Patch notes

Unicode 15.0, released in 2022, adds two new writing systems, control characters for Egyptian hieroglyphs, 8 astrological symbols, about 4,000 Chinese-Japanese-Korean (CJK) characters, new emoji, and more.