Kaigai Blog living abroad in my twenties

【My Study Note】Character Encoding


Character encoding assigns binary values to characters so that we as humans can read them. We definitely wouldn’t want to see all the text in our emails and web pages rendered as long sequences of zeros and ones. This is where character encodings come in handy.

You can think of a character encoding as a dictionary. It’s the way your computer looks up which human-readable character a given binary value represents.

ASCII

One of the oldest character encoding standards still in use is ASCII. It represents the English alphabet, digits, punctuation marks, and a set of control characters.

Take one entry in the ASCII-to-binary table as an example: a lowercase a has the code 97, which is 01100001 in binary. Every letter of the English alphabet, every digit, and a number of special symbols gets its own value in the same way.
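The lookup for a lowercase a can be checked with a few lines of Python. This is only a small sketch: the helper name `to_binary` is made up for this note, while `ord`, `chr`, and `format` are Python built-ins.

```python
def to_binary(ch: str) -> str:
    """Return the 8-bit binary string for a single ASCII character.

    `to_binary` is a hypothetical helper written for this note.
    """
    code = ord(ch)  # numeric code of the character, e.g. 97 for "a"
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "08b")  # zero-padded to 8 binary digits


print(ord("a"))          # 97
print(to_binary("a"))    # 01100001
print(chr(0b01100001))   # a
```

Going the other way with `chr` is the "dictionary lookup" direction: the computer has the number and finds the character.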

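The great thing about ASCII was that it only needed 128 values (0 through 127, which fit in 7 bits) out of the 256 that a single byte can hold. It lasted for a very long time, but eventually even 256 possible values weren’t enough to cover all the characters used around the world.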
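To see where ASCII runs out, here is a minimal sketch that tries to encode some text as ASCII and reports whether it fits. The function name `fits_in_ascii` is an assumption made for this note; `str.encode` and `UnicodeEncodeError` are standard Python.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in `text` has an ASCII code (0 to 127).

    `fits_in_ascii` is a hypothetical helper written for this note.
    """
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # At least one character has no ASCII code.
        return False


print(fits_in_ascii("hello"))   # True
print(fits_in_ascii("café"))    # False
print(fits_in_ascii("日本語"))  # False
```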

UTF-8

Then came UTF-8, the most prevalent encoding standard used today. UTF-8 is built on the Unicode Standard, which assigns a number (a code point) to every character.

Along with keeping the same ASCII table, it also lets us use a variable number of bytes per character. What do I mean by that? Think of any emoji. A single byte only has 256 possible values, so there is no room to fit emojis into one byte. Instead, UTF-8 allows us to store a character in more than one byte (up to four), which means endless emoji fun.
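The variable byte count is easy to observe in Python. The sketch below encodes a few sample characters as UTF-8 and prints how many bytes each one takes; the helper name `utf8_length` and the sample characters are just choices made for this note.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes `ch` occupies when encoded as UTF-8.

    `utf8_length` is a hypothetical helper written for this note.
    """
    return len(ch.encode("utf-8"))


for ch in ["a", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(ch, utf8_length(ch), hex_bytes)

# a 1 61
# é 2 c3 a9
# € 3 e2 82 ac
# 😀 4 f0 9f 98 80
```

The plain letter still takes one byte, while the emoji takes four.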

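UTF-8 takes the ASCII characters and adds the characters used around the world. Characters outside ASCII are represented with 2 to 4 bytes (the original design allowed up to 6, but the current standard limits it to 4), and Japanese characters are generally represented with 3 bytes. Because it is highly compatible with ASCII, it is easy to handle on computers, and much of the world’s software supports UTF-8. It is no exaggeration to call it the “common language of the computer world.”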
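Both points, ASCII compatibility and 3 bytes per Japanese character, can be checked with a short sketch. The helper names `same_in_ascii_and_utf8` and `bytes_per_char` are assumptions made for this note, and the sample strings are arbitrary.

```python
def same_in_ascii_and_utf8(text: str) -> bool:
    """Return True if ASCII-only `text` encodes to identical bytes in both encodings.

    Hypothetical helper; it expects `text` to contain only ASCII characters.
    """
    return text.encode("ascii") == text.encode("utf-8")


def bytes_per_char(text: str) -> list[int]:
    """Return the UTF-8 byte count of each character in `text` (hypothetical helper)."""
    return [len(ch.encode("utf-8")) for ch in text]


print(same_in_ascii_and_utf8("Hello"))  # True
print(bytes_per_char("日本語"))          # [3, 3, 3]

# Decoding the bytes with the same encoding gives the original text back.
encoded = "日本語".encode("utf-8")
print(len(encoded))             # 9
print(encoded.decode("utf-8"))  # 日本語
```

Because ASCII text produces the same bytes either way, an old ASCII file is already a valid UTF-8 file.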