Data is stored in caches in the form of "lines". Each line could be just one byte or just one 32-bit word, but usually, it's a much larger amount, such as 32 words (128 bytes) — often matched to the maximum "burst" size available from the primary (DDR SDRAM) memory.
Also, most caches have a limited amount of "associativity", which is another way of saying that the data from a particular address can only be stored in a limited number of places in the cache. In modern CPUs, 4-way and 8-way associativity are common, as this gives a good balance between performance and complexity.
If you have a 1-MB cache that has 128-byte lines, that means you have 8K lines in the cache. If you have 8-way associativity, that means that the lines are broken up into 1K groups of 8. The group number is usually taken from the low-order bits of the line address, i.e., the address bits just above the ones that select a byte within the line. In this case, 10 bits would be required to select one of the 1K groups.
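As a quick check of that arithmetic, here is a minimal sketch. The function name `cache_geometry` and its parameters are made up for illustration, and it assumes every size is a power of two.

```python
def cache_geometry(cache_bytes, line_bytes, ways):
    """Return (lines, groups, index_bits) for a set-associative cache.

    Assumes all three arguments are powers of two.
    """
    lines = cache_bytes // line_bytes       # total lines in the cache
    groups = lines // ways                  # associativity groups ("sets")
    index_bits = groups.bit_length() - 1    # log2(groups) for a power of two
    return lines, groups, index_bits


# 1 MB cache, 128-byte lines, 8-way associative
lines, groups, index_bits = cache_geometry(1 << 20, 128, 8)
print(lines, groups, index_bits)  # 8192 1024 10
```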
The address bits stored with each line are called the "tag", and the tag is used to decide whether the address currently being requested by the CPU is being held in that line of the cache.
The number of bits required in the tag is simply the number of address bits coming out of the CPU (virtual address bits for a virtual cache, physical address bits for a physical cache), minus log2 of the cache line size (i.e., the number of bits required to address a byte within a line), minus log2 of the number of line groups in the cache (the number of bits required to select an associativity group).
For example, if you have a physical address space of 4 GB (32 address bits), and a 1 MB, 8-way cache with a line size of 128 B (7 address bits), the tags would need 32 - 7 - 10 = 15 bits.
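To make the split concrete, the sketch below carves a 32-bit address into tag, group index, and byte offset for the cache in this example. The name `split_address` and the sample address `0x12345678` are arbitrary choices for illustration, and the sizes are assumed to be powers of two.

```python
def split_address(addr, addr_bits=32, line_bytes=128, groups=1024):
    """Split an address into (tag_bits, tag, index, offset).

    Defaults match the example: 32 address bits, 128-byte lines,
    1K groups. Sizes are assumed to be powers of two.
    """
    offset_bits = line_bytes.bit_length() - 1          # 7 for 128-byte lines
    index_bits = groups.bit_length() - 1               # 10 for 1K groups
    tag_bits = addr_bits - offset_bits - index_bits    # 32 - 7 - 10 = 15

    offset = addr & (line_bytes - 1)                   # byte within the line
    index = (addr >> offset_bits) & (groups - 1)       # which group to search
    tag = addr >> (offset_bits + index_bits)           # compared with stored tags
    return tag_bits, tag, index, offset


tag_bits, tag, index, offset = split_address(0x12345678)
print(tag_bits, hex(tag), index, offset)
```

On a lookup, the cache would use `index` to pick a group and then compare `tag` against the tags stored with the 8 lines in that group.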
That instructable is kind of confusing. You'd be better off selecting an actual book from the big list.
Without getting into a discussion on various architectures which would just lead down the rabbit hole, I'll use the architecture described in the instructable and work through an example of a simple addition program.
Below is the RAM as described. On the left are the 16 addresses. Each address holds a byte. This byte may be data (demarcated as D) or an instruction consisting of an opcode (O) and an address (A).
1111 DDDDDDDD
1110 DDDDDDDD
1101 DDDDDDDD
1100 DDDDDDDD
1011 DDDDDDDD
1010 DDDDDDDD
1001 DDDDDDDD
1000 DDDDDDDD
0111 OOOOAAAA
0110 OOOOAAAA
0101 OOOOAAAA
0100 OOOOAAAA
0011 OOOOAAAA
0010 OOOOAAAA
0001 OOOOAAAA
0000 OOOOAAAA
The program counter (PC) starts off at zero. This tells the processor to fetch the byte at address 0000 from the RAM and treat it as an instruction. So the processor fetches the byte into the Instruction Register (IR). The top four bits of the data retrieved go to the "control matrix" and the bottom four to the MAR. This split happens each time an instruction is fetched.
Note: Those particular terms are not what I would consider typical (at least in my experience) but we'll go with them for this example.
The processor fetches the instruction at address 0000 since PC = 0000. Our first opcode is going to say, "move the data that is in address 1000 into the accumulator" (I'm going to use prose instead of confusing things by picking a particular flavor of assembly language).
So the processor fetches the data at address 1000 (let us say it is the number 2) and moves it into the accumulator (ACC). Now ACC = 2. The program counter gets automatically incremented so PC = 0001.
The next instruction at address 0001 says, "add the data that is in the accumulator to the data at address 1001 and store it back in the accumulator". So the processor takes what is in the accumulator and feeds it into one side of the Arithmetic Logic Unit (ALU). The processor takes the data that is at address 1001 (let us say it is the number 3) and feeds it into the other side of the ALU. The ALU performs the addition of the two numbers and the output (the number 5) is stored in the accumulator. Now ACC = 5. The program counter again gets automatically incremented so PC = 0010.
The last instruction of our little program at address 0010 says, "store what is in the accumulator at address 1010". The processor then takes what is in the accumulator and stores it at address 1010. So now RAM address 1010 = 5.
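If it helps to see the walkthrough run, here is a rough simulation of it. The opcode numbers (`LOAD`, `ADD`, `STORE`, `HALT`) are invented for this sketch and are not the instructable's actual encoding. Only the OOOOAAAA instruction layout and the 16-byte RAM come from the description above.

```python
# Hypothetical opcode values, chosen only for this sketch.
HALT, LOAD, ADD, STORE = 0b0000, 0b0001, 0b0010, 0b0011


def run(ram):
    """Run a program on a 16-byte RAM and return the final accumulator."""
    pc, acc = 0, 0
    while True:
        ir = ram[pc]                          # fetch the instruction byte
        opcode, addr = ir >> 4, ir & 0x0F     # top 4 bits / bottom 4 bits
        pc = (pc + 1) & 0x0F                  # PC increments automatically
        if opcode == LOAD:
            acc = ram[addr]                   # RAM -> ACC
        elif opcode == ADD:
            acc = (acc + ram[addr]) & 0xFF    # ALU result kept to 8 bits
        elif opcode == STORE:
            ram[addr] = acc                   # ACC -> RAM
        elif opcode == HALT:
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode:04b}")


ram = [0] * 16
ram[0b0000] = (LOAD << 4) | 0b1000    # move data at 1000 into ACC
ram[0b0001] = (ADD << 4) | 0b1001     # add data at 1001 to ACC
ram[0b0010] = (STORE << 4) | 0b1010   # store ACC at 1010
ram[0b1000] = 2
ram[0b1001] = 3

run(ram)
print(ram[0b1010])  # 5
```

Address 0011 is left as zero, which this sketch treats as `HALT`, so the program stops after the store.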
Hopefully that example gives a bit clearer picture of what is going on. Various architectures handle things in slightly different ways, but the basic flow is usually similar.
Below is a diagram of the basic registers and control circuits of most processors. There are a few more registers than we've been discussing. You can ignore those for the moment for the purposes of this discussion or read more about them at your leisure. Hopefully the visual aid will help make things a bit clearer.
Below is the flow of each step a processor takes. First it fetches an instruction and then that instruction tells it to fetch data to operate on from RAM.
Step 1. [Address] PC -> MAR -> RAM
Step 2. [Instruction] RAM -> MDR -> IR
Step 3. [Address] IR -> MAR -> RAM
Step 4. [Data] RAM -> MDR -> ACC (or R0, etc.)
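The four steps above can be sketched as plain register transfers. This assumes the OOOOAAAA layout from earlier (address in the low four bits) and an instruction that loads data into ACC. The function name `fetch_and_load` and the use of a dictionary for the registers are illustrative choices only.

```python
def fetch_and_load(ram, pc):
    """Trace one fetch followed by a data load, one transfer per line."""
    regs = {"PC": pc, "MAR": 0, "MDR": 0, "IR": 0, "ACC": 0}

    # Step 1. [Address] PC -> MAR -> RAM
    regs["MAR"] = regs["PC"]
    # Step 2. [Instruction] RAM -> MDR -> IR
    regs["MDR"] = ram[regs["MAR"]]
    regs["IR"] = regs["MDR"]
    # Step 3. [Address] IR -> MAR -> RAM (only the low 4 address bits)
    regs["MAR"] = regs["IR"] & 0x0F
    # Step 4. [Data] RAM -> MDR -> ACC
    regs["MDR"] = ram[regs["MAR"]]
    regs["ACC"] = regs["MDR"]
    return regs


memory = [0] * 16
memory[0b0000] = 0b00011000   # some opcode in the top bits, address 1000 below
memory[0b1000] = 2            # the data to be loaded

print(fetch_and_load(memory, 0))
```

The opcode bits are ignored here on purpose. In a real processor they would go to the control logic, which decides whether steps 3 and 4 happen at all and where the data ends up.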
You can't tell for sure just by looking at the size of the memory. Often the processor will have an addressing range that exceeds the amount of memory the computer is designed to take. Most processors have at least a 64k address range, so a computer with only 1k or 16k of memory is probably limited by other factors.
At the other end of the scale, having a large amount of memory does not mean that the processor must have equally large address registers. For example, the Intel 8086 achieved a 1 Megabyte address range by combining two 16-bit registers: a segment register is shifted left by four bits and added to a 16-bit offset, giving a 20-bit address. Another way to increase addressing range is to use an I/O port to set the upper address lines, thus switching in different banks of memory one at a time (this is commonly used with 8-bit CPUs to access more than 64k of memory).
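For the 8086 case, the two registers are not simply summed: the segment value is shifted left by four bits before the offset is added, which is how two 16-bit values reach a 20-bit address. A minimal sketch (the function name is made up for illustration):

```python
def physical_address_8086(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # The 8086 has only 20 address lines, so the sum wraps at 1 MB.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address_8086(0x1234, 0x0010)))  # 0x12350
```

Many different segment:offset pairs map to the same physical address. For example, 0x1235:0x0000 lands on the same byte as 0x1234:0x0010.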
The size of data and instruction registers may match the data bus width, but there are many exceptions. 16 and 32 bit processors often have 8 bit registers that can be used independently or concatenated to make 16 or 32 bits. Some are able to access single bytes on a 16 or 32 bit data bus. It is also possible to have internal registers that are wider than the buses. For example the Motorola MC68008 executes 16 bit instructions and has 32 bit data and address registers, but only has an 8 bit data bus and a 20 bit address bus.