I will take your question literally and discuss mostly microprocessors, not computers in general.
All computers have some sort of machine code. An instruction consists of an opcode and zero or more operands. For example, the ADD instruction for the Intel 4004 (the first commercially available microprocessor) is encoded as 1000RRRR, where 1000 is the opcode for ADD and RRRR is a register number.
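As a rough illustration of that encoding, here is a minimal Python sketch that packs and unpacks the 1000RRRR byte. The function names `encode_add` and `decode` are made up for this example; only the bit layout comes from the text above.

```python
ADD_OPCODE = 0b1000  # upper four bits, per the 1000RRRR encoding quoted above


def encode_add(reg):
    """Pack the ADD opcode and a 4-bit register number into one byte."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must fit in 4 bits")
    return (ADD_OPCODE << 4) | reg


def decode(byte):
    """Split an instruction byte into its (opcode, register) nibbles."""
    return byte >> 4, byte & 0x0F


print(format(encode_add(1), "08b"))  # 10000001
print(decode(0b10000101))            # (8, 5): opcode 1000, register 5
```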
The very first computer programs were written by hand, hand-encoding the 1s and 0s to create a program in machine language, which was then programmed into the chip. The first microprocessors used ROM (Read-Only Memory); this was later replaced by EPROM (Erasable Programmable ROM, which is erased with UV light); now programs are usually stored in EEPROM (Electrically Erasable Programmable ROM, which can be erased electrically, in place), or more specifically Flash memory.
Most microprocessors can now run programs out of RAM (this is pretty much standard for everything but microcontrollers), but there has to be a way of loading the program into RAM in the first place. As Joby Taffey pointed out in his answer, this was done with toggle switches on the Altair 8800, which was powered by an Intel 8080 (which followed the 4004 and 8008). In your PC, there is a bit of ROM called the BIOS which is used to start up the computer and load the OS into RAM.
Machine language gets tedious really fast, so assembler programs were developed that take a mnemonic assembly language and translate it into machine code, usually one line of assembly code per instruction. So instead of 10000001, one would write ADD R1.
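To make the one-line-per-instruction idea concrete, here is a toy assembler in Python. It is only a sketch: it handles single-register instructions and nothing else. The ADD opcode comes from the text above; the SUB and LD opcodes are from my recollection of the 4004 instruction set and should be checked against the datasheet. The names `assemble_line` and `assemble` are made up for this example.

```python
# Upper-nibble opcodes. ADD is from the text; SUB and LD are assumptions
# recalled from the 4004 instruction set, so verify them before relying on them.
OPCODES = {"ADD": 0b1000, "SUB": 0b1001, "LD": 0b1010}


def assemble_line(line):
    """Translate one line such as 'ADD R1' into a single instruction byte."""
    mnemonic, operand = line.upper().split()
    if mnemonic not in OPCODES:
        raise ValueError("unknown mnemonic: " + mnemonic)
    if not operand.startswith("R"):
        raise ValueError("expected a register operand like R1")
    reg = int(operand[1:])
    if not 0 <= reg <= 15:
        raise ValueError("register number must fit in 4 bits")
    return (OPCODES[mnemonic] << 4) | reg


def assemble(source):
    """Assemble a multi-line program, skipping blank lines."""
    lines = (ln.strip() for ln in source.splitlines())
    return bytes(assemble_line(ln) for ln in lines if ln)


program = assemble("ADD R1\nLD R2")
print([format(b, "08b") for b in program])  # ['10000001', '10100010']
```

A real assembler also deals with labels, immediates, and two-byte instructions, but the core job is this table lookup plus bit packing.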
But the very first assembler had to be written in machine code. It could then be rewritten in its own assembly language, with the machine-language version used to assemble it the first time. After that, the program could assemble itself. This is called bootstrapping and is done with compilers too -- they are typically first written in assembler (or in an existing high-level language), then rewritten in their own language and compiled with the original compiler until the compiler can compile itself.
Since the first microprocessor was developed long after mainframes and minicomputers were around, and the 4004 wasn't really suited to running an assembler anyway, Intel probably wrote a cross-assembler that ran on one of its larger computers and translated the assembly code for the 4004 into a binary image that could be programmed into the ROMs. Once again, this is a common technique used to port compilers to a new platform (called cross-compiling).
Remember that the result of ADDI is known by the end of cycle 2 -- specifically, our new value for R1 is going to be available:
- at cycle 3 (M stage) in E/M, and
- at cycle 4 (W stage) in M/W
(where E/M and M/W are pipeline registers).
It is available at both these places because we must continue to carry it along until we reach the write-back stage.
Then, looking at just the SW instruction by itself, there are two places we could forward the word data in time for SW:
- at cycle 4 (E stage) superseding the value of R1 that was in D/E
- at cycle 5 (M stage) superseding the value of R1 that was in E/M
Your problem constraints permit M->E forwarding and E->E forwarding. M->E works out here when you look at cycle 4. This is because cycle 4 is not too late to forward for the SW, and our R1 value is available at that moment from register M/W.
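The timing argument above can be checked mechanically. The Python sketch below is just a bookkeeping aid, not a pipeline simulator. The cycle numbers are taken from the text; the mapping of path names to pipeline registers (E->E reads E/M, M->E reads M/W) is my assumption, chosen to agree with the statement that M->E takes the value from M/W. The function name `usable_paths` is made up.

```python
# Cycle in which ADDI's new R1 value sits in each pipeline register (from the text).
VALUE_IN_REGISTER = {"E/M": 3, "M/W": 4}

# Cycle in which SW occupies each stage that could accept a forwarded value.
SW_STAGE_CYCLE = {"E": 4, "M": 5}

# Assumed naming: "X->Y" forwards from the pipeline register written by
# stage X into stage Y.
FORWARDING_PATHS = {"E->E": ("E/M", "E"), "M->E": ("M/W", "E")}


def usable_paths(paths=FORWARDING_PATHS):
    """Return the paths whose source register holds the value in the same
    cycle that SW is in the destination stage."""
    usable = []
    for name, (source_reg, dest_stage) in paths.items():
        if VALUE_IN_REGISTER[source_reg] == SW_STAGE_CYCLE[dest_stage]:
            usable.append(name)
    return usable


print(usable_paths())  # ['M->E']
```

E->E does not line up because the value is in E/M at cycle 3, one cycle before SW reaches its E stage.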
Processors have a lot of really neat math tricks that they can do to optimize things and reduce cycle times, but most of those depend on the next step being predictable. A processor, by itself, cannot examine an instruction without executing it, so only certain commands can be put into the pipeline - because the next steps are all completely predictable.
Conditional logic cannot be predicted. The processor just knows that it has been instructed to go from where it is to where you want it to be next. Remember that the pipeline has (or could have) unfinished business when it discovers this command. So, as a built-in feature, before the processor executes the conditional logic - in this case, the jump - it will allow the pipeline to empty, and detect internally that it is empty.
In some cases, a near jump compiled into machine code may be optimized into something that the processor doesn't treat as conditional - if that near jump is for a common purpose, the processor might actually be able to continue pipelining. This is why a far jump is recommended to make sure the processor actually flushes the pipeline.