Executables do depend on both the OS and the CPU:
Instruction Set: The binary instructions in the executable are decoded by the CPU according to some instruction set. Most consumer CPUs support the x86 (“32-bit”) and/or AMD64 (“64-bit”) instruction sets. A program can be compiled for either of these instruction sets, but not for both at once. There are extensions to these instruction sets, such as the SSE and AVX SIMD extensions; whether the CPU supports a given extension can be queried at runtime (a sketch of such a check appears after this list). Optimizing compilers might try to take advantage of these extensions if they are present, but usually also offer a code path that works without any extensions.
Binary Format: The executable has to conform to a certain binary format, which allows the operating system to correctly load, initialize, and start the program. Windows mainly uses the Portable Executable format, while Linux uses ELF.
System APIs: The program may be using libraries, which have to be present on the executing system. If a program uses functions from Windows APIs, it can't be run on Linux. In the Unix world, the central operating system APIs have been standardized to POSIX: a program using only the POSIX functions will be able to run on any conformant Unix system, such as Mac OS X and Solaris.
So if two systems offer the same system APIs and libraries, run on the same instruction set, and use the same binary format, then a program compiled for one system will also run on the other.
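As a minimal sketch of querying instruction-set extensions at runtime, here is a small C program using the GCC/Clang builtin __builtin_cpu_supports on x86; the feature names checked are just examples:

    #include <stdio.h>

    int main(void) {
        /* __builtin_cpu_supports asks (via CPUID) whether the CPU that is
           actually running the program supports a given extension. */
        if (__builtin_cpu_supports("sse2"))
            printf("SSE2 is available\n");

        if (__builtin_cpu_supports("avx2"))
            printf("AVX2 is available\n");
        else
            printf("No AVX2, falling back to the plain code path\n");

        return 0;
    }

A real program would use the result to choose between a SIMD-optimized and a generic implementation of its hot loops.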
However, there are ways to achieve more compatibility:
Systems running on the AMD64 instruction set will commonly also run x86 executables. The binary format indicates which mode the program has to be run in. Supporting both 32-bit and 64-bit programs requires additional effort by the operating system.
Some binary formats allow a file to contain multiple versions of a program, compiled for different instruction sets. Such “fat binaries” were encouraged by Apple while it was transitioning from the PowerPC architecture to x86.
Some programs are not compiled to machine code, but to some intermediate representation. This is then translated on the fly into actual instructions, or it may be interpreted (a toy interpreter is sketched after this list). This makes a program independent of the specific architecture. Such a strategy was used on the UCSD p-System.
One operating system can support multiple binary formats. Windows is quite backwards compatible and still supports formats from the DOS era. On Linux, Wine allows the Windows formats to be loaded.
The APIs of one operating system can be reimplemented for another host OS. On Windows, Cygwin and the POSIX subsystem can be used to get a (mostly) POSIX-compliant environment. On Linux, Wine reimplements many of the Windows APIs.
Cross-platform libraries allow a program to be independent of the OS APIs. Many programming languages have standard libraries that try to achieve this, e.g. Java and C.
An emulator simulates a different system by parsing the foreign binary format, interpreting the instructions, and offering a reimplementation of all required APIs. Emulators are commonly used to run old Nintendo games on a modern PC.
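To illustrate the intermediate-representation idea mentioned above, here is a minimal sketch of a stack-based bytecode interpreter in C. The opcodes are made up for this example and have nothing to do with the UCSD p-System's actual format; the point is that the same bytecode array runs unchanged on any machine the interpreter has been compiled for:

    #include <stdio.h>

    /* A made-up bytecode: the byte sequence itself is independent of the CPU. */
    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    void run(const int *code) {
        int stack[16];
        int sp = 0;   /* stack pointer into the evaluation stack */
        int pc = 0;   /* program counter into the bytecode       */
        for (;;) {
            switch (code[pc++]) {
            case OP_PUSH:  stack[sp++] = code[pc++];          break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp];  break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]);     break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        /* "Program": push 2, push 3, add them, print the result (5). */
        int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        run(program);
        return 0;
    }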
It probably refers to pipelining, that is, parallel (or semi-parallel) execution of instructions. That's the only scenario I can think of where it does not really matter how long something takes, as long as you can have enough of them running in parallel.
So, the CPU may fetch one instruction (step 1 in the table above), and then, as soon as it proceeds to step 2 for that instruction, it can at the same time (in parallel) start step 1 for the next instruction, and so on.
Let's call our two consecutive instructions A and B. The CPU executes step 1 (fetch) for instruction A. Now, when the CPU proceeds to step 2 for instruction A, it cannot yet start step 1 for instruction B, because the program counter has not been advanced yet. So it has to wait until it has reached step 3 for instruction A before it can start step 1 for instruction B. That waiting time is the time it takes to start another instruction, and we want to keep it at a minimum (start instructions as quickly as possible) so that we can have as many instructions as possible executing in parallel.
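To put rough numbers on it, here is a toy model of my own (simplified to three one-cycle steps per instruction, which is not the table's exact step count) showing how much overlapping the steps buys:

    #include <stdio.h>

    int main(void) {
        const int stages = 3;   /* simplified: fetch, advance PC/decode, execute */
        const int n = 10;       /* number of instructions to "run"               */

        /* Without overlap, each instruction occupies the CPU for all of its
           steps before the next instruction may even be fetched. */
        int sequential = n * stages;

        /* With full overlap (a new instruction started every cycle), the first
           instruction needs `stages` cycles and each further instruction
           finishes one cycle later. */
        int pipelined = stages + (n - 1);

        printf("sequential: %d cycles, pipelined: %d cycles\n",
               sequential, pipelined);
        return 0;
    }

With n = 10 this prints 30 cycles versus 12, and the gap grows with every additional instruction that can be started early.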
CISC architectures have instructions of varying lengths: some instructions are only one byte long, others are two bytes long, and yet others are several bytes long. This makes it hard to increment the program counter immediately after fetching an instruction, because the instruction has to be decoded to a certain degree in order to figure out how many bytes long it is. On the other hand, one of the primary characteristics of RISC architectures is that all instructions have the same length, so the program counter can be incremented immediately after fetching instruction A, meaning that the fetching of instruction B can begin immediately afterwards. That's what the author means by starting instructions quickly, and that's what increases the number of instructions that can be executed per second.
In the above table, step 2 says "Change the program counter to point to the following instruction" and step 3 says "Determine the type of instruction just fetched." These two steps can be in that order only on RISC machines. On CISC machines, you have to determine the type of instruction just fetched before you can change the program counter, so step 2 has to wait. This means that on CISC machines the next instruction cannot be started as quickly as it can be started on a RISC machine.
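A small C sketch of that difference, with an invented variable-length encoding purely for illustration (real x86 decoding is far more involved):

    #include <stdint.h>
    #include <stdio.h>

    /* RISC-style: every instruction is 4 bytes long, so the next fetch
       address is known the moment the current fetch completes. */
    uint32_t next_pc_fixed(uint32_t pc) {
        return pc + 4;
    }

    /* CISC-style: the first byte has to be (partially) decoded to find out
       how long the instruction is before the program counter can advance. */
    uint32_t next_pc_variable(const uint8_t *memory, uint32_t pc) {
        uint8_t opcode = memory[pc];              /* part of the decode step */
        uint32_t length;
        if (opcode < 0x40)      length = 1;       /* invented encoding rules */
        else if (opcode < 0xC0) length = 2;
        else                    length = 4;
        return pc + length;
    }

    int main(void) {
        uint8_t memory[] = { 0xC3, 0x00, 0x00, 0x00 };  /* one invented instruction */
        printf("fixed-length:    next pc = %u\n", (unsigned)next_pc_fixed(0));
        printf("variable-length: next pc = %u\n", (unsigned)next_pc_variable(memory, 0));
        return 0;
    }

In the fixed-length case nothing about the instruction has to be inspected before the next fetch can begin; in the variable-length case the decode sits on the critical path.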
The lines of source code have nothing to do with how the CPU executes them. I'd recommend reading up on assembly language, because that will teach you a lot about how the hardware actually does things. You can also get assembly output from many compilers (for example, gcc -S emits the generated assembly instead of an object file).
The code in question isn't reproduced here, but a simple statement such as a = b + c; might compile into something like this (in a made-up assembly language):
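    LOAD  R1, b      ; read the current value of b into register R1
    LOAD  R2, c      ; read the current value of c into register R2
    ADD   R1, R2     ; add them; the result ends up in R1
    STORE a, R1      ; write the result back to the variable a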
However, if the compiler knows that a variable isn't used again, the store operation may not be emitted.
Now, for the debugger to know which machine code corresponds to which line of program source, annotations are added by the compiler that record which source line corresponds to which place in the machine code. This debug information is typically only emitted on request (for example, gcc -g stores it in DWARF line tables), and the debugger uses it to map addresses back to source lines.