Electronic – Why MIPS uses R0 as “zero” when you could just XOR two registers to produce 0

computer-architecture cpu mips

I think I am looking for the answer to a trivia question. I am trying to understand why the MIPS architecture dedicates a register to an explicit "zero" value when you can achieve the same thing by XORing any register with itself. One could say that the operation is already done for you; however, I cannot really imagine a situation where you would use a lot of "zero" values. I read Hennessy's original papers, and they simply assign a zero as a matter of fact, without any real justification.

Is there a logical reason to hard-wire a register to a binary value of zero?

update:
In 8k of an executable from xc32-gcc for the MIPS core in the PIC32MZ, I have a single instance of "zero".

add     t3,t1,zero

the actual answer:
I awarded the bounty to the person who had the information about MIPS and condition codes. The answer actually lies in how the MIPS architecture handles conditions. Although I initially did not want to spend time on this, I reviewed the architecture documents for OpenSPARC, RISC-V, MIPS-IV and OpenPOWER (this document was internal), and here is a summary of the findings. The R0 register is necessary for comparisons on branches because of the architecture of the pipeline.

integer compare against zero and branch (bgez,bgtz,blez,bltz)
integer compare two registers and branch (beq,bne)
integer compare two registers and trap (teq,tge,tlt,tne)
integer compare register and immediate and trap (teqi,tgei,tlti,tnei)

It simply comes down to how the hardware looks in implementation. The RISC-V manual has an unattributed quote on page 68:

The conditional branches were designed to include arithmetic comparison operations between
two registers (as also done in PA-RISC and Xtensa ISA), rather than use condition codes (x86,
ARM, SPARC, PowerPC), or to only compare one register against zero (Alpha, MIPS), or
two registers only for equality (MIPS). This design was motivated by the observation that a
combined compare-and-branch instruction fits into a regular pipeline, avoids additional condition
code state or use of a temporary register, and reduces static code size and dynamic instruction
fetch traffic. Another point is that comparisons against zero require non-trivial circuit delay
(especially after the move to static logic in advanced processes) and so are almost as expensive as
arithmetic magnitude compares. Another advantage of a fused compare-and-branch instruction
is that branches are observed earlier in the front-end instruction stream, and so can be predicted
earlier. There is perhaps an advantage to a design with condition codes in the case where multiple
branches can be taken based on the same condition codes, but we believe this case to be relatively
rare.

The RISC-V document does not hint at the author of the quoted section. I thank everyone for their time and consideration.

Best Answer

The zero-register on RISC CPUs is useful for two reasons:

It's a useful constant

Depending on the restrictions of the ISA, some instruction encodings cannot take a literal, but you can always use r0 to get 0.

It can be used to synthesize other instructions

This is perhaps the most important point. As an ISA designer, you can trade a general-purpose register for a zero-register in order to synthesize other useful instructions. Synthesizing instructions is good because, with fewer actual instructions, you need fewer bits to encode an operation in an opcode, which frees up space in the instruction encoding. You can use that space for, e.g., bigger address offsets and/or literals.

The semantics of the zero-register is like /dev/zero on *nix systems: everything written to it is discarded, and you always read back 0.
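
To make these semantics concrete, here is a minimal Python sketch of a register file whose register 0 is hardwired to zero. The class name `RegisterFile`, the 32-entry size and the 32-bit masking are assumptions chosen for illustration; they are not taken from any particular implementation.

```python
class RegisterFile:
    """Toy register file (hypothetical); register 0 is hardwired to zero."""

    def __init__(self, count=32):
        self._regs = [0] * count

    def read(self, index):
        # Register 0 always reads back 0, whatever was "written" to it.
        return 0 if index == 0 else self._regs[index]

    def write(self, index, value):
        # Writes to register 0 are silently discarded.
        if index != 0:
            self._regs[index] = value & 0xFFFFFFFF  # keep 32 bits


regs = RegisterFile()
regs.write(0, 1234)  # discarded, like writing to /dev/zero
regs.write(5, 42)
print(regs.read(0), regs.read(5))
```

In real hardware the special case would typically live in the register file's read/write logic rather than in software, but the observable behaviour is the same.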

Let's see a few examples of how we can make pseudo-instructions with the help of the r0 zero-register:

; ### Hypothetical CPU ###

; Assembler with syntax:
; op rd, rm, rn 
; => rd: destination, rm: 1st operand, rn: 2nd operand
; literal as #lit

; On a CPU architecture with a status register (which contains arithmetic status
; flags), `sub` can be used, with r0 as destination to discard result.
cmp rn, rm     ; => sub r0, rn, rm

; `add` instruction can be used as a `mov` instruction:
mov rd, rm     ; => add rd, rm, r0
mov rd, #lit   ; => add rd, r0, #lit

; Negate:
neg rd, rm     ; => sub rd, r0, rm

; On CPU without status flags,
nop            ; => add r0, r0, r0

; RISC-V's `jal` instruction -- Jump and Link: Jump to PC-relative instruction,
; save return address into rd; we can synthesize a `jmp` instruction out of it.
jmp dest       ; => jal r0, dest

; You can even load from an absolute (direct) address, for a usually small range
; of addresses by using a literal offset as an address.
ld rd, addr    ; => ld rd, [r0, #addr]
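
The mappings above are the kind of rewriting an assembler can do mechanically. As a rough sketch (the function name `expand` and the simple string-based parsing are assumptions, and only the pseudo-instructions listed above are handled), a pseudo-instruction expander for this hypothetical CPU might look like:

```python
def expand(line):
    """Expand one pseudo-instruction of the hypothetical CPU into a real one.

    Anything that is not a known pseudo-instruction is returned unchanged.
    """
    op, _, rest = line.strip().partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest else []

    if op == "cmp":                       # cmp rn, rm
        return f"sub r0, {args[0]}, {args[1]}"
    if op == "mov":                       # mov rd, rm  or  mov rd, #lit
        rd, src = args
        if src.startswith("#"):
            return f"add {rd}, r0, {src}"
        return f"add {rd}, {src}, r0"
    if op == "neg":                       # neg rd, rm
        return f"sub {args[0]}, r0, {args[1]}"
    if op == "nop":
        return "add r0, r0, r0"
    if op == "jmp":                       # jmp dest
        return f"jal r0, {args[0]}"
    if op == "ld" and len(args) == 2 and not args[1].startswith("["):
        return f"ld {args[0]}, [r0, #{args[1]}]"   # absolute address form
    return line.strip()


for pseudo in ["mov r1, #5", "neg r2, r3", "nop", "jmp dest"]:
    print(f"{pseudo:12} => {expand(pseudo)}")
```

Every expansion relies on r0 either supplying a constant 0 operand or swallowing an unwanted result, which is why a single hardwired register can replace several dedicated opcodes.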

The case of MIPS

I looked more closely at the MIPS instruction set. There are a handful of pseudo-instructions that use $zero; they are mainly used for branches. Here are some examples of what I've found:

move $rt, $rs          => add $rt, $rs, $zero

not $rt, $rs           => nor $rt, $rs, $zero

b Label                => beq $zero, $zero, Label ; a small relative branch

bgt $rs, $rt, Label    => slt $at, $rt, $rs
                          bne $at, $zero, Label

blt $rs, $rt, Label    => slt $at, $rs, $rt
                          bne $at, $zero, Label

bge $rs, $rt, Label    => slt $at, $rs, $rt
                          beq $at, $zero, Label

ble $rs, $rt, Label    => slt $at, $rt, $rs
                          beq $at, $zero, Label
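
As a sanity check on the expansions above, the following Python sketch models `slt` and `nor` as plain functions and brute-forces a small range of signed values to confirm that each two-instruction sequence branches exactly when the pseudo-instruction should. The helper names (`slt`, `nor`, `check_all`) and the value range are assumptions made for this illustration.

```python
import itertools

ZERO = 0  # value always read from $zero


def slt(a, b):
    """Model of MIPS slt: 1 if a < b (signed compare), else 0."""
    return 1 if a < b else 0


def nor(a, b):
    """Model of MIPS nor on 32-bit values."""
    return ~(a | b) & 0xFFFFFFFF


# pseudo-op -> (intended condition, branch decision made by the expansion)
expansions = {
    "bgt": (lambda rs, rt: rs > rt,  lambda rs, rt: slt(rt, rs) != ZERO),
    "blt": (lambda rs, rt: rs < rt,  lambda rs, rt: slt(rs, rt) != ZERO),
    "bge": (lambda rs, rt: rs >= rt, lambda rs, rt: slt(rs, rt) == ZERO),
    "ble": (lambda rs, rt: rs <= rt, lambda rs, rt: slt(rt, rs) == ZERO),
}


def check_all(values=range(-3, 4)):
    """Return True if every expansion agrees with its intended condition."""
    for expected, expanded in expansions.values():
        for rs, rt in itertools.product(values, repeat=2):
            if expected(rs, rt) != expanded(rs, rt):
                return False
    return True


print(check_all())
print(hex(nor(0x0F0F0F0F, ZERO)))  # not $rt, $rs  =>  nor $rt, $rs, $zero
```

Note that each branch expansion compares the `slt` result against `$zero`, so the zero register is what turns a set-on-less-than result into a branch condition.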

As for why you have found only one instance of the $zero register in your disassembly, perhaps it's your disassembler that is smart enough to transform known sequences of instructions into their equivalent pseudo-instruction.

Is the zero-register really useful?

Well, apparently ARM finds a zero-register useful enough that their (somewhat) new ARMv8-A architecture, which introduces AArch64, now has one in 64-bit mode; there wasn't a zero-register before. (The register is a bit special though: in some encoding contexts it is a zero-register, in others it instead designates the stack pointer.)

Related Solutions

Electronic – MIPS (PIC32): branch vs. branch likely

MIPS is one of several RISC (reduced instruction set computer) architectures that are designed to execute one instruction per clock cycle. In order to achieve this, the original MIPS processors had a five-stage pipeline:

[Figure: pipeline diagram showing successive instructions passing through the IF, RD, ALU, MEM and WB stages over time]

The abbreviations in the above figure are: IF (Instruction Fetch), RD (Read from register file), ALU (Execute instruction in Arithmetic Logic Unit), MEM (Read/write Memory access), WB (Write back to register file). The vertical axis is successive instructions; the horizontal axis is time.

Because the MEM stage occurs after the ALU stage, RISC machines like MIPS don't do arithmetic or logical operations on memory, but only on registers. For this reason they are also referred to as load/store architectures.

There are several hazard conditions where the pipeline can stall and cause a penalty in the overall instructions per cycle (IPC) value. A data hazard occurs, for example, when an instruction attempts to use data in one of the registers before it has been loaded into the register. For example:

lw $3, 100($2)
add $1, $2, $3

The data is not loaded until the MEM stage of the first instruction, which is too late for it to be available for the ALU (execute) stage of the second instruction.
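
A load-use hazard of this kind can be spotted by looking at adjacent instruction pairs. The Python sketch below assumes a simplified, hypothetical instruction representation, `(op, dest, sources)` tuples, and only checks the instruction immediately following a `lw`; it is an illustration of the idea, not a model of any real pipeline's interlock logic.

```python
def load_use_hazards(program):
    """Return the indices of instructions that read a register loaded by the
    immediately preceding lw.

    Each instruction is a hypothetical (op, dest, sources) tuple.
    """
    hazards = []
    for i in range(1, len(program)):
        prev_op, prev_dest, _ = program[i - 1]
        _, _, sources = program[i]
        if prev_op == "lw" and prev_dest in sources:
            hazards.append(i)
    return hazards


program = [
    ("lw",  "$3", ["$2"]),        # lw  $3, 100($2)
    ("add", "$1", ["$2", "$3"]),  # add $1, $2, $3  (needs $3 too early)
]
print(load_use_hazards(program))  # [1]
```

A compiler or assembler can use a check like this to decide where an independent instruction (or a NOP) has to be placed after a load.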

Control hazards occur because, on any taken branch, the instruction immediately after the branch has already been fetched from the instruction cache. If this instruction is ignored, there is a one-cycle IPC penalty per taken branch.

The solution for the MIPS architecture was the "Branch Delay Slot": always fetch the instruction after the branch, and always execute it, even if the branch is taken. This gets a little weird when writing MIPS assembly code, because when you are reading it, you have to take into account that the instruction after the branch is always going to be executed. The trick in writing efficient code is to put in an instruction that will be useful as part of the loop when the branch is taken, but will do no harm if the branch is not taken.

The MIPS designers were counting on compiler writers to write clever enough code generators to handle this efficiently. However, many do not (including Microchip's C32 compiler, based on GCC), and just put NOPs after every branch, wasting both code space and cycles.

So in the R4000 architecture, MIPS added Branch Likely instructions which still always fetch the instruction after the branch from the instruction cache, but only execute it if the branch is taken (opposite of what one might expect). Compilers can then always fill the branch delay slot on such a branch.

A loop like:

loop:
    first instruction
    second instruction
    ...
    blez t0, loop
    nop

can be turned into:

loop:
    first instruction
loop2:   
    second instruction
    ...
    blezl t0, loop2
    first instruction

The repeated "first instruction" after the branch is always executed if the branch is taken (and becomes part of the next go-around of the loop). This instruction is ignored if the branch is not taken (incurring a slight IPC penalty).
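
The difference between an ordinary delayed branch and a branch-likely instruction can be summarised in a few lines. This Python sketch is a behavioural model only; the function name `run_branch` and the string labels are assumptions for illustration, and it says nothing about how a real pipeline cancels the delay-slot instruction.

```python
def run_branch(taken, likely):
    """Return the sequence of instructions that actually execute.

    taken  -- whether the branch condition holds
    likely -- True for a branch-likely form, False for an ordinary branch
    """
    executed = ["branch"]
    # Ordinary branch: the delay slot always executes.
    # Branch likely: the delay slot executes only when the branch is taken.
    if taken or not likely:
        executed.append("delay slot")
    executed.append("target" if taken else "fall-through")
    return executed


for likely in (False, True):
    for taken in (True, False):
        print(f"likely={likely!s:5} taken={taken!s:5} -> {run_branch(taken, likely)}")
```

Because the delay slot of a branch-likely instruction only takes effect on the taken path, a compiler can always fill it with a copy of the first instruction at the branch target, as in the loop above.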

However as it turns out, trying to include this feature in high-performance designs has been a pain in the neck due to the complexity in getting rid of the result of the ignored instruction. Therefore it has been deprecated.

Electrical – Load/Store architecture

What you mention is only one of many differences between CISC and RISC.

One way that RISC tries to minimize memory stalls is by having the compiler schedule the memory accesses. With CISC, the compiler has little opportunity to optimize memory accesses, but an advantage of RISC's simpler, single-cycle instructions is that the compiler can rearrange them at compile time to optimize memory accesses. CISC instructions are too complex for the compiler to know when and where instructions can be rearranged. RISC's advantage partially depends on an optimizing compiler understanding how instruction flow can be manipulated.

There are other attributes of RISC that are meant to offer improvements. One is heavy pipelining, another is the potential for faster clock speeds, and a third is separate instruction and data caches. However, CISC architectures have adopted many of the techniques that were envisioned for RISC, and have tended to keep up with RISC in performance.