Architecture – Differences between Instruction set (architecture) and machine language

Tanenbaum's Structured Computer Organization says:

Every computer has an ISA (Instruction Set Architecture), which is
a set of registers, instructions, and other features visible to its
low-level programmers.

This ISA is commonly referred to as machine language, although the
term is not entirely accurate.

A program at this level of abstraction is a long list of binary
numbers, one per instruction, telling which instructions to execute
and what their operands are.

My questions are:

  1. An ISA includes instructions and registers. As far as I know, an instruction consists of an opcode (i.e. operation) and operand(s). Is an opcode too small to be a member of the ISA?

  2. As far as I know, a programming language is a set of some programs. So is a machine language also a set of some programs?

  3. What is the difference between an ISA and a machine language? Is it that a machine language is a set of programs, while an ISA is not a set of programs but a set of more basic units (e.g. registers, instructions) which together form programs?

Best Answer

It looks like you borrowed some vocabulary from theoretical computer science (regarding the words "set" and "language"), and tried to use that to interpret the textbook description of the lower level computer systems (CPU and hardware).

The word "set" as in "instruction set architecture" refers to the set of predefined opcodes that are valid for the given CPU architecture.

For example, the Wikipedia page "x86 instruction listings" lists the opcodes one can expect on various generations of the IA-32 architecture. Each opcode is a member of this "instruction set".

When the author says "... is not entirely accurate", my guess is that he is referring to the fact that there is no grammar one could expect from a sequence of machine instructions.

To clarify further, the CPU reads the instruction bytes, and if they encode a valid machine instruction, that instruction is executed. Otherwise, an "invalid instruction" hardware exception is typically generated, which stops the current execution and transfers control to a different program (possibly belonging to the operating system).
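The fetch-and-check behaviour described above can be sketched with a toy interpreter. This is only an illustration under stated assumptions: the four opcodes, their byte values, and the names `INSTRUCTION_SET`, `InvalidInstruction`, and `run` are all made up for this example and do not correspond to any real CPU.

```python
# Toy, hypothetical ISA (not a real CPU): one opcode byte, then 0 or 1 operand bytes.

class InvalidInstruction(Exception):
    """Stands in for the hardware 'invalid instruction' exception."""


# The "instruction set": opcode byte -> (mnemonic, number of operand bytes).
INSTRUCTION_SET = {
    0x01: ("LOAD", 1),  # acc = operand
    0x02: ("ADD", 1),   # acc = acc + operand (wraps at 8 bits)
    0x03: ("JMP", 1),   # pc = operand
    0xFF: ("HALT", 0),  # stop and report acc
}


def run(program: bytes, max_steps: int = 1000) -> int:
    """Fetch, decode, and execute until HALT; return the accumulator."""
    acc, pc = 0, 0
    for _ in range(max_steps):
        opcode = program[pc]
        if opcode not in INSTRUCTION_SET:
            # No grammar is consulted here: the only check is per instruction.
            raise InvalidInstruction(f"opcode {opcode:#04x} at address {pc}")
        mnemonic, n_operands = INSTRUCTION_SET[opcode]
        operand = program[pc + 1] if n_operands else None
        pc += 1 + n_operands
        if mnemonic == "LOAD":
            acc = operand
        elif mnemonic == "ADD":
            acc = (acc + operand) & 0xFF
        elif mnemonic == "JMP":
            pc = operand
        elif mnemonic == "HALT":
            return acc
    raise RuntimeError("step limit reached")
```

With these made-up encodings, `run(bytes([0x01, 5, 0x02, 7, 0xFF]))` loads 5, adds 7, and halts, while `run(bytes([0x01, 5, 0x7E]))` raises `InvalidInstruction` because `0x7E` is not in the set. Validity is decided one instruction at a time, never for the sequence as a whole.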

However, there is no grammar that defines what is a valid sequence of machine instructions. In some sense, there is no structure or hierarchy that one can expect all valid sequences of machine instructions to conform to.

The machine instructions found in most compiled software programs follow some structure; however, not all instruction sequences do. The CPU will happily execute a "spaghetti sequence of machine instructions", performing unconditional or conditional jumps as it encounters each; and until recently, it was commonly possible for a sequence of machine instructions to modify parts of itself.
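As a rough illustration of self-modification, here is a sketch of a toy machine whose program and data share one writable memory. Everything in it is an assumption made for the example: the opcode values, the `STORE` instruction, and the function name `run_self_modifying` are hypothetical and not taken from any real architecture.

```python
# Toy, hypothetical machine: code and data live in the same writable memory.
# Every instruction except HALT is two bytes: opcode, operand.
LOAD, ADD, JMP, STORE, HALT = 0x01, 0x02, 0x03, 0x04, 0xFF


def run_self_modifying(memory: bytearray, max_steps: int = 1000) -> int:
    """Execute from address 0 until HALT; return the accumulator."""
    acc, pc = 0, 0
    for _ in range(max_steps):
        opcode = memory[pc]
        if opcode == HALT:
            return acc
        operand = memory[pc + 1]
        pc += 2
        if opcode == LOAD:
            acc = operand
        elif opcode == ADD:
            acc = (acc + operand) & 0xFF
        elif opcode == JMP:
            pc = operand
        elif opcode == STORE:
            memory[operand] = acc  # may overwrite bytes of the program itself
        else:
            raise ValueError(f"invalid opcode {opcode:#04x}")
    raise RuntimeError("step limit reached")


program = bytearray([
    LOAD, 9,    # address 0: acc = 9
    STORE, 7,   # address 2: overwrite the operand byte of the ADD at address 6
    JMP, 6,     # address 4: jump to the (now modified) ADD
    ADD, 1,     # address 6: written as "ADD 1", executed as "ADD 9"
    HALT,       # address 8
])
result = run_self_modifying(program)
```

Here `result` is 18 rather than 10, because the `STORE` rewrote the operand of the later `ADD` before it ran. The toy machine accepts this without complaint, since each instruction it fetches is individually valid.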

Some CPU instructions are designed to facilitate structured programming:

  • the stack register and push / pop instructions
  • the return address register, and the call / return instructions for executing subroutines (also known as procedures or functions)
  • loop instructions (increment or decrement a register, check it against a limit, and then jump to a specified address if the limit has not been reached)
  • string copy instructions (a loop instruction that repeatedly copies bytes from one address range to another address range)

Even though some of these instructions appear to form pairs (push vs pop, call vs return), in reality a valid sequence of machine instructions does not need to match these pairs. That is, one could adjust the stack register and write the return address (the program counter value at which execution should resume) into the memory location it points to, simulating a push, and then perform a jump to the start of a subroutine (which, together with the previous push, becomes a simulation of a subroutine call). The subroutine could still use the regular return instruction.
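The simulated call can be sketched on a toy machine. The instruction names (`CALL`, `RET`, `DEC_SP`, `STORE_AT_SP`, and so on), the tuple-based program format, and the function `run_program` are hypothetical choices made for this illustration; addresses here are simply indices into the instruction list.

```python
# Toy, hypothetical machine: a program is a list of (mnemonic, operand) pairs,
# and an "address" is an index into that list.

def run_program(program, max_steps=1000):
    """Execute from index 0 until HALT; return the accumulator."""
    acc, pc = 0, 0
    stack = [0] * 16   # small memory region used as the stack
    sp = len(stack)    # stack grows downward; sp indexes the top entry
    for _ in range(max_steps):
        op, arg = program[pc]
        pc += 1        # pc now holds the address of the next instruction
        if op == "ADD":
            acc += arg
        elif op == "JMP":
            pc = arg
        elif op == "CALL":          # push return address, then jump
            sp -= 1
            stack[sp] = pc
            pc = arg
        elif op == "RET":           # pop return address into pc
            pc = stack[sp]
            sp += 1
        elif op == "DEC_SP":        # first half of a hand-made push
            sp -= 1
        elif op == "STORE_AT_SP":   # second half: write a value at the stack top
            stack[sp] = arg
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"invalid instruction {op!r}")
    raise RuntimeError("step limit reached")


# Version 1: a regular CALL paired with RET.
with_call = [
    ("CALL", 3),         # 0: call the subroutine at index 3
    ("ADD", 1),          # 1: resumes here after RET
    ("HALT", None),      # 2
    ("ADD", 10),         # 3: subroutine body
    ("RET", None),       # 4
]

# Version 2: no CALL at all; a manual push plus a plain jump.
manual_call = [
    ("DEC_SP", None),    # 0: make room on the stack
    ("STORE_AT_SP", 3),  # 1: store the return address (index 3)
    ("JMP", 5),          # 2: jump to the subroutine
    ("ADD", 1),          # 3: resumes here after RET
    ("HALT", None),      # 4
    ("ADD", 10),         # 5: subroutine body
    ("RET", None),       # 6: the regular return still works
]
```

Both `run_program(with_call)` and `run_program(manual_call)` return 11. The `RET` in the second program has no matching `CALL`, yet the toy machine cannot tell the difference: it only sees a return address at the top of the stack.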