Data is stored in caches in the form of "lines". A line could be as small as a single byte or a single 32-bit word, but it is usually much larger, such as 32 words (128 bytes), a size often matched to the maximum "burst" transfer available from the primary (DDR SDRAM) memory.
Also, most caches have a limited amount of "associativity", which is another way of saying that the data from a particular address can only be stored in a limited number of places in the cache. In modern CPUs, 4-way and 8-way associativity are common, as this gives a good balance between performance and complexity.
If you have a 1-MB cache with 128-byte lines, you have 8K lines in the cache. With 8-way associativity, those lines are divided into 1K groups ("sets") of 8. The group number is usually taken from the low-order address bits just above the byte-within-line offset. In this case, 10 bits are required to index the cache.
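The arithmetic in this example can be checked with a few lines of Python (the constants are the example's values; the variable names are mine):

```python
# Cache geometry for the example: 1 MB cache, 128-byte lines, 8-way associative.
CACHE_SIZE = 1 << 20   # 1 MB in bytes
LINE_SIZE = 128        # bytes per line
WAYS = 8               # associativity

num_lines = CACHE_SIZE // LINE_SIZE       # total lines in the cache
num_groups = num_lines // WAYS            # groups ("sets") of 8 lines each
index_bits = num_groups.bit_length() - 1  # bits needed to select one group

print(num_lines, num_groups, index_bits)  # 8192 1024 10
```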
The address bits stored with each line are called the "tag", and the tag is used to decide whether the address currently being requested by the CPU is being held in that line of the cache.
The number of bits required in the tag is simply the number of address bits coming out of the CPU (virtual address bits for a virtually tagged cache, physical address bits for a physically tagged cache), minus log2 of the cache line size (the number of bits required to address a byte within a line), minus log2 of the number of line groups in the cache (the number of bits required to select a group).
For example, if you have a physical address space of 4 GB (32 address bits), and a 1 MB, 8-way cache with a line size of 128 B (7 address bits), the tags would need 32 - 7 - 10 = 15 bits.
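That address split can be sketched as code. The bit counts below come from the example (32-bit physical address, 128 B lines, 1K groups); the function name is mine:

```python
ADDR_BITS = 32     # physical address bits
OFFSET_BITS = 7    # log2(128 B): byte within the line
INDEX_BITS = 10    # log2(1024): group (set) selection
TAG_BITS = ADDR_BITS - OFFSET_BITS - INDEX_BITS  # 32 - 7 - 10 = 15

def split_address(addr):
    """Break a 32-bit physical address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
# Reassembling the three fields gives the original address back.
assert (tag << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS) | offset == 0x12345678
print(TAG_BITS)  # 15
```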
Yes, that is essentially correct. The key question is, "For any particular memory address, how many different locations in the cache can hold that address?" Each one of those potential cache locations needs a tag comparator in order to determine whether that location does in fact currently contain that memory address.
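A minimal sketch of that lookup, assuming an 8-way group: the selected group holds eight stored tags, and each is compared against the tag of the requested address. Hardware runs the eight comparators in parallel; this Python loop just models the logic (all names here are illustrative):

```python
WAYS = 8  # 8-way associativity: 8 candidate locations per address

def hit_in_group(stored_tags, request_tag):
    """One tag comparison per way; a hit if any way's tag matches.
    In hardware all WAYS comparisons happen in parallel."""
    return any(t == request_tag for t in stored_tags)

group = [None] * WAYS     # an empty 8-way group
group[3] = 0x1ABC         # pretend way 3 currently holds tag 0x1ABC
print(hit_in_group(group, 0x1ABC))  # True  (hit)
print(hit_in_group(group, 0x0123))  # False (miss)
```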
Best Answer
From an architectural point of view, it depends on which policy your cache uses.
In the write-through policy, the result of every write is stored in the cache and in physical memory at the same time; the write therefore takes as long as a memory access, while the read time depends on whether the requested block is present in the cache.
In the write-back policy, the result is stored only in the cache, and it is copied to physical memory later, when the modified ("dirty") line is evicted to make room for another. Both reads and writes can therefore run at cache speed on a hit, at the cost of an extra memory write whenever a dirty line is replaced.
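The difference in memory traffic can be seen in a toy model: a single-line cache under each policy, counting writes that reach physical memory. This is a hypothetical sketch, not a real simulator:

```python
class TinyCache:
    """One-line cache; counts writes that reach physical memory."""
    def __init__(self, write_back):
        self.write_back = write_back
        self.tag = None          # which line is currently cached
        self.dirty = False       # modified since it was loaded?
        self.mem_writes = 0      # traffic to physical memory

    def write(self, tag):
        if self.tag != tag:                       # miss: evict the old line
            if self.write_back and self.dirty:
                self.mem_writes += 1              # dirty line copied back on eviction
            self.tag, self.dirty = tag, False
        if self.write_back:
            self.dirty = True                     # just mark; memory updated later
        else:
            self.mem_writes += 1                  # write-through: memory updated every time

wt, wb = TinyCache(write_back=False), TinyCache(write_back=True)
for t in [1, 1, 1, 2]:        # three writes to line 1, then one to line 2
    wt.write(t)
    wb.write(t)
print(wt.mem_writes, wb.mem_writes)  # 4 1
```

Write-through pays a memory write for all four stores; write-back pays only once, when the dirty line 1 is evicted by the write to line 2.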