How are alphabetic characters programmed into a computer

communication, computer-architecture, computers, cpu

I'm not a CS student; I'm a programmer. I have a couple of questions and a few assumptions that I'll make here (please correct me if I'm wrong).

My understanding is that all the sequences of 1s and 0s that a computer executes are just representations of actual data, or of instructions that tell other hardware in the system what to do, like telling a graphics card to change a pixel's color on the monitor.

01100001 is the representation of the letter "a", and this sequence of bits was chosen by the fathers of computing/ASCII (or whoever) to represent that letter; it could just as well have been some other sequence, right?

All computer systems have agreed that those bits mean the letter "a" so that computers around the globe can interoperate; if there were no standards in place for this, the internet would be a mess.

What I want to know is: where is the information stored on a computer that tells it that this sequence of bits means/represents the character "a"? Is it in the OS, or directly on the motherboard, or am I just completely wrong?

Best Answer

Nothing says that 01100001 (61h) is the ASCII binary representation of the letter 'a'... except the context. In a computer, what a sequence of bits such as 01100001 represents depends on where it is found and how its container is structured.
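
As a quick sketch on a GNU/Linux shell, you can dump the very same byte with od (from coreutils) and have it shown as a hex value, as a decimal number, or as the character "a", purely depending on how you ask for it to be interpreted (xxd, if installed, can also show the raw bit pattern):

printf 'a' | od -An -tx1   # the byte in hex: 61
printf 'a' | od -An -tu1   # the same byte read as an unsigned integer: 97
printf 'a' | od -An -tc    # the same byte interpreted as a character: a
printf 'a' | xxd -b        # the same byte as bits: 01100001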

In a file organized in bytes (a text file, for instance), you'll mostly find alphabetic characters represented as a continuous flow of 8-bit characters. Whether 01100001 represents the letter "a" then depends on what standard the containing text file conforms to: "a" is represented as 61h in ASCII and as 81h in EBCDIC, to name only two.
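
As a rough illustration, converting the letter to EBCDIC with iconv and dumping the result shows the different byte value (this assumes your iconv build knows the EBCDIC-US code page; IBM037 is another common name for it):

printf 'a' | od -An -tx1                             # ASCII/UTF-8 byte for "a": 61
printf 'a' | iconv -f ASCII -t EBCDIC-US | od -An -tx1   # the same letter in EBCDIC: 81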

This is a simplistic explanation, as there are also code pages, which were invented because 256 positions are not enough to represent all international alphabets. For text files, operating systems support character encodings, each of which defines which characters can be represented and how each one is translated into a binary representation.

ASCII is one of them and uses only 128 positions for [some of] the English alphabetic and non-alphabetic characters, digits, [a limited set of] punctuation and [non-printable] control characters. ISO-8859-1 is an extension of ASCII that adds several European accented characters. UTF-8, another one, defines a variable-length byte representation that can encode virtually every character in every language.
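
To make that concrete, here is a small sketch (assuming a UTF-8 locale and GNU iconv) that dumps the bytes used for the accented letter "é" under two different encodings. ISO-8859-1 uses the single byte E9, while UTF-8 uses the two-byte sequence C3 A9; the lone byte E9 is not even valid UTF-8, which is why a reader has to know which encoding a file uses:

printf 'é' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1   # ISO-8859-1: e9
printf 'é' | od -An -tx1                                   # UTF-8 (locale default): c3 a9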

On a UNIX-like system such as GNU/Linux, the character encoding a text file uses is reflected in the MIME type and charset reported for it; see the GNU/Linux command file -i:

# file -i dead.letter
dead.letter:          text/plain; charset=us-ascii

This shows that the file dead.letter is a text file that uses ASCII for its content. In such a file, 01100001 (61h) represents the letter "a". The encoding of a text file is determined (read: chosen) by the editor that saved it, typically based on the locale (a.k.a. regional settings) under which the editor was started. The encoding may be recorded alongside the text, or it may have to be guessed at run time from the file's content when the file is read. The latter is especially true when a GNU/Linux system reads files that were saved on Windows, since Windows does not store anything like a MIME type with the file.
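
For example (a throwaway pair of files with placeholder names, assuming a UTF-8 locale, and with the output described only roughly), saving the same kind of text with and without a non-ASCII character changes what file -i guesses from the content:

printf 'plain text\n' > ascii.txt    # only 7-bit characters
printf 'café\n'       > utf8.txt     # contains a non-ASCII character
file -i ascii.txt utf8.txt           # reports charset=us-ascii and charset=utf-8 respectively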

Again, this is a summarized explanation, but that's the basic idea.