It probably refers to pipelining, that is, parallel (or semi-parallel) execution of instructions. That's the only scenario I can think of where it does not really matter how long something takes, as long as you can have enough of them running in parallel.
So, the CPU may fetch one instruction, (step 1 in the table above,) and then as soon as it proceeds to step 2 for that instruction, it can at the same time (in parallel) start with step 1 for the next instruction, and so on.
Let's call our two consecutive instructions A and B. So, the CPU executes step 1 (fetch) for instruction A. Now, when the CPU proceeds to step 2 for instruction A, it cannot yet start with step 1 for instruction B, because the program counter has not advanced yet. So, it has to wait until it has reached step 3 for instruction A before it can get started with step 1 for instruction B. This is the time it takes to start another instruction, and we want to keep this at a minimum, (start instructions as quickly as possible,) so that we can be executing in parallel as many instructions as possible.
CISC architectures have instructions of varying lengths: some instructions are only one byte long, others are two bytes long, and yet others are several bytes long. This does not make it easy to increment the program counter immediately after fetching one instruction, because the instruction has to be decoded to a certain degree in order to figure out many bytes long it is. On the other hand, one of the primary characteristics of RISC architectures is that all instructions have the same length, so the program counter can be incremented immediately after fetching instruction A, meaning that the fetching of instruction B can begin immediately afterwards. That's what the author means by starting instructions quickly, and that's what increases the number of instructions that can be executed per second.
In the above table, step 2 says "Change the program counter to point to the following instruction" and step 3 says "Determine the type of instruction just fetched." These two steps can be in that order only on RISC machines. On CISC machines, you have to determine the type of instruction just fetched before you can change the program counter, so step 2 has to wait. This means that on CISC machines the next instruction cannot be started as quickly as it can be started on a RISC machine.
1)
An Instruction Set Architecture defines the interface that is used to program a processor. Note that this does not define the implementation. When a CPU designer goes about designing the processor, they may implement this ISA in various different ways. In fact, the various Intel processors have each implemented the x86 ISA in many different ways over the years.
Computer architecture is generally delineated around an ISA. Computer microarchitecture is very specific to a particular processor. For example, a Pentium vs. a Pentium Pro implement for the most part the exact same instruction set, i.e. the x86 ISA. But their microarchitectures were drastically different. (The Pro was out of order for example).
One of the methods often used to get pipelines to be very fast is to reduce the more complex instructions to microinstructions. These are used by the "back-end" of the processor, which is the part that has one or more execution units.
Therefore, the point of the two types of instructions ISA instruction vs. microinstruction has to do with whether you are discussing the ISA, i.e. the interface, or the microarchitecture, i.e. the implementation.
2)
As far as your second question, Tanenbaum was almost certainly talking about ISA level instructions because his books are about computer architecture. Microarchitecture is much more hardware designer centric. See John Stokes, "Inside the Machine" if you are interested in microarchitecture.
3)
For most real processors, fetch-decode-execute is entirely too simplistic to describe what really goes on inside. However, from an ISA point of view, it defines a good abstraction to the steps an instruction takes.
The instruction fetch is done by dedicated hardware, so it is different from the load instruction data fetch. Usually there are actually two bus masters, instruction and data. These buses often have their own caches. In some cases, i.e. harvard architectures, the data and instruction memory maps are entirely separate.
4)
RISC vs. CISC is entirely regarding the ISA, i.e. the interface. However, the RISC mindset of using a reduced instruction set is very much to simplify the microarchitecture. The effect is that the microarchitectural instructions are generally exactly the same as the ISA instructions in RISC processors.
Best Answer
Firmware doesn't really fit into that hierarchy. Firmware is really just a place where a library of machine code (in the sense of level 2) is stored. Often the code stored in firmware is for the management of the motherboard and IO channels, but it need not be. I've got an old Tandy laptop that has Multiplan, a text editor, a BASIC interpreter, and a calendar in firmware.