A physical address is the hardware address of physical memory, and a virtual address is the one the processor sees; it has a tag and an offset. I understand this. Can anyone describe it with an example, like how the MMU does this operation (what it adds to the physical address), and what is memory mapping? And what do physically addressed/physically tagged and virtually addressed/virtually tagged mean?
Electronic – Physical address vs virtual address
addressing, cache, memory, sram
Related Solutions
When designing a modern computer / operating system combination one of the things we want to do is run multiple programs at the same time. One of the problems that you would run into designing this system is that all your programs want to assume they have access to all the memory they want, and they don't coordinate what addresses they use.
The solution to the problem is a system called virtual memory. The virtual address space is the address space the operating system makes available for a program to use. When a program tries to access virtual memory at say, address 1024, they don't get to access the physical memory address (the addresses that go out on the wires to the ram chips) 1024. Instead there is a mapping system.
The operating system handles all the mappings, so that two different programs can both access what they consider address 1024, but process 1 might have its virtual address 1024 mapped to physical address 2048, while process 2 might have its virtual address 1024 mapped to physical address 4096.
In order to keep the mapping information manageable, the operating system maps memory in "chunks" called pages. 4096 bytes is a very common page size. In the example you cite, a certain process has a single page, located at virtual address 4096, that is 4096 bytes in length (extending to virtual address 8191), mapped to physical address 0 (since the page is 4096 bytes long, the mapping extends to physical address 4095).
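The translation described above can be sketched in a few lines. This is a minimal model, not how any real MMU is implemented: the page-table dictionary and the `translate` helper are names I made up, and the mapping matches the example (virtual page 1 mapped to physical page 0).

```python
PAGE_SIZE = 4096  # bytes per page, as in the example above

# Hypothetical per-process page table: virtual page number -> physical page number.
# Here virtual page 1 (addresses 4096..8191) maps to physical page 0 (0..4095).
page_table_p1 = {1: 0}

def translate(page_table, vaddr):
    """Split a virtual address into page number and offset, then remap the page."""
    vpn = vaddr // PAGE_SIZE        # virtual page number
    offset = vaddr % PAGE_SIZE      # offset within the page (unchanged by mapping)
    if vpn not in page_table:
        # The unmapped-access case discussed below: the "suitably rude message".
        raise MemoryError("Nonexistent memory referenced")
    return page_table[vpn] * PAGE_SIZE + offset

print(translate(page_table_p1, 4096))   # -> 0
print(translate(page_table_p1, 5000))   # -> 904
```

Note that only the page number is remapped; the offset within the page passes through unchanged, which is why pages must be aligned on page-size boundaries.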
The actual size of the virtual address space is not specified (it must be at least 14 bits wide because the address 12287 is mentioned), but that hardly matters. One thing for sure, it is not a 12 bit addressing system. That's just the size of a virtual memory page, the smallest chunk of memory the operating system will manage. The addresses 8192 through 12287 are just other virtual addresses a process could access.
The author asks the question "what happens if there is an access to memory that is not mapped?"
In a computer without a virtual memory mapping system, the hardware notices that accesses to addresses not connected to physical ram are errors. The hardware signals the operating system of the offense. This process is called an error trap. The operating system would then print the message "Nonexistent memory referenced" and terminate the process. That's the suitably rude message.
In a computer with a virtual memory mapping system almost the same thing happens. Since most programs don't use all the memory that they could possibly address, the operating system doesn't map all of a process' virtual memory to physical memory (also, most computers have more virtual address space available than total physical ram installed in them). So when a process tries to access a virtual address in unmapped memory, the hardware notices there is no physical memory mapped to the virtual address in question. The operating system is signaled, it prints a rude message, and terminates the process.
This mapping and error trapping system not only allows multiple processes to have their own views of the address space, it also allows the operating system to contain and protect the running processes from each other. Even though they may be using the same virtual addresses, the operating system keeps different processes mapped to different physical addresses. That way it isn't possible for a process to (accidentally or on purpose) access or overwrite the memory of any other process. This keeps buggy programs from taking out your whole computer when they crash.
I agree that illustration is confusing.
The top half of the page is intended to describe the TLB. It sounds like you understand TLB stuff pretty well.
The entire bottom half of the page is intended to describe the data cache. (The label "cache" on the left is intended to apply to the entire bottom half of the page. How could it be redrawn to make it more obvious that it applies not only to the cache metadata valid+tag bits, but also all the data all the way to the right edge of the page?).
It suddenly splits up the physical address and uses it to index the cache, I guess.
Yes. The bottom half of that page, as you just said, and like most large caches, is a physically-indexed, physically-tagged data cache.
But why is it showing the cache and data separately?
That part of the illustration is unnecessarily confusing.
While in principle each word of memory could have its own valid+tag bits, most data caches share the valid+tag bits for a much larger block of data copied from main memory -- a block called a cache line. Loading more data than the program specifically asked for in a single instruction is often helpful, because practically all programs have some spatial locality.
The resulting cache entry structure looks something like
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
v tag w w w w w w w w w w w w w w w w
where the 'v' indicates the valid bit, and each 'w' represents a word of data.
Inexplicably, the book's illustration only shows one of the many blocks of data in the cache:
v tag
v tag
v tag
v tag
v tag
v tag w w w w w w w w w w w w w w w w  <-- hit on this cache line
v tag
v tag
and then the book's illustration inexplicably rotates the words in that cache line to show all the words of that one cache line stacked on top of each other.
When the data cache detects a hit -- when the cache tag matches the tag part of the desired address, and the valid bit is set -- then the "block offset" part of the address indicates one particular word of that one particular cache line.
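The hit condition just described can be sketched as follows. This is an illustrative model only; the `CacheLine` record and `lookup` helper are names I made up, not anything from the book.

```python
# A hypothetical cache-line record: a valid bit, a tag, and the line's data words.
class CacheLine:
    def __init__(self, valid=False, tag=0, words=None):
        self.valid = valid
        self.tag = tag
        self.words = words if words is not None else [0] * 16  # e.g. 16 words/line

def lookup(line, addr_tag, block_offset):
    """A hit requires the valid bit set AND a tag match; the block offset
    then selects one particular word within the line."""
    if line.valid and line.tag == addr_tag:
        return True, line.words[block_offset]
    return False, None  # miss: go to the next level / main memory

line = CacheLine(valid=True, tag=0x2A, words=list(range(16)))
print(lookup(line, 0x2A, 5))   # hit: (True, 5)
print(lookup(line, 0x2B, 5))   # tag mismatch, miss: (False, None)
```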
Perhaps the illustrator ran out of room drawing an extremely wide cache line, and arbitrarily decided to rotate that line to make it fit on the page without considering how confusing that would be?
The data cache’s block size is 128 Bytes.
So for any physical byte address, the bottom 7 bits indicate some particular byte within a cache line, and all the upper bits of that address are used to select some particular cache line.
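That bit split can be written out directly. A minimal sketch, assuming the 128-byte line size stated above (the `split_address` name is mine):

```python
LINE_SIZE = 128  # bytes per cache line -> bottom 7 bits are the byte-in-line

def split_address(paddr):
    """Bottom 7 bits select a byte within the line; all upper bits select the line."""
    byte_in_line = paddr & (LINE_SIZE - 1)   # bits [6:0]
    line_address = paddr >> 7                # bits [31:7] (for a 32-bit address)
    return line_address, byte_in_line

print(split_address(0x1234))  # -> (36, 52), i.e. line 0x24, byte 0x34
```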
why is the byte offset just left floating?
The byte offset is left floating in this illustration, because the byte offset is not used by the TLB or by the data cache. A typical TLB and the data cache, like the one illustrated, only deal with aligned 32-bit words. The 2 bits of the address that select one of the 4 bytes within a 32-bit word are handled elsewhere.
Some simple CPUs only have hardware for aligned whole-word access. (I call them "Neither Endian" in "DAV's Endian FAQ"). Compiler writers for such CPUs must add padding to ensure that every instruction is aligned and every data value is aligned. (The two-bit byte offset should always be zeros on these machines).
Many CPUs have a LOAD instruction that can load unaligned 32-bit values into a 32-bit register. Such CPUs have special hardware elsewhere (not part of the cache) that, for each LOAD instruction, sometimes does 2 reads from the data cache -- the unaligned 32-bit value can overlap 2 different cache lines, and either or both reads may cause a cache miss. The 2 bits of the address that select one of the 4 bytes within an (aligned) 32-bit word are used internally by the CPU to select the relevant bytes that the cache returns for those reads and re-assemble those bytes into the (unaligned) 32-bit value that the programmer expects. Even though such instructions give the correct results no matter how things are aligned or mis-aligned in memory, assembly language programmers, compiler writers, and other programmers obsessed with optimization sometimes add padding anyway to get (some) instructions aligned or (some) data aligned or both. ("How and when to align to cache line size?"; "Aligning to cache line and knowing the cache line size"; etc.) They try to justify this padding by claiming it "optimizes" the program to "run faster". Recent tests seem to indicate data alignment for speed is a myth.
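Whether an unaligned access triggers that second cache read depends on whether it straddles a line boundary, which is easy to check. A small sketch, assuming the 128-byte lines above (the `crosses_line` name is mine):

```python
LINE_SIZE = 128  # bytes per cache line, as in the illustration above

def crosses_line(addr, access_bytes=4):
    """True if an access of access_bytes starting at addr spans two cache lines,
    forcing the hardware described above to do 2 reads from the data cache."""
    first_line = addr // LINE_SIZE
    last_line = (addr + access_bytes - 1) // LINE_SIZE
    return first_line != last_line

print(crosses_line(0x7C))  # bytes 0x7C..0x7F fit in one line -> False
print(crosses_line(0x7D))  # bytes 0x7D..0x80 span two lines  -> True
```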
the relationship between a TLB and cache
Conceptually the only connection between the TLB and a (physically-indexed, physically-tagged) data cache is the bundle of wires carrying the physical-address output of the TLB to the physical-address input of the data cache.
One person can design a data cache for a simple CPU without virtual memory that caches physical addresses. Another person can design a TLB for a simple CPU that has no data cache (A CPU with a TLB but no data cache was once a common arrangement for mainframe computers).
In principle, a third person can splice that TLB and that data cache together, wiring the physical-address output of the TLB to the physical-address input of the data cache. The TLB neither knows nor cares that it is now connected to the data cache rather than the main memory address bus. The data cache neither knows nor cares that it is now connected to the TLB rather than directly to the CPU address register(s).
Best Answer
Address translation is handled through a translation lookaside buffer (TLB), which is just a cache of translation information (and some metadata like permissions, cacheability, etc.). The TLB works by substituting the physical page number (the address bits above those used to index within a page) for the provided virtual page number (i.e., the virtual page is mapped to the physical page). (Since virtual pages are aligned with physical pages at page granularity, the bits indexing within a page match for virtual and physical addresses of a given page.)
Typically, to reduce delay in retrieving data, the cache is indexed with the virtual address in parallel with the TLB lookup; this would be a virtually addressed cache, but if only index bits within a page are used then it is also a physically addressed cache (because those bits of the virtual address match the bits of the physical address). (A cache might be physically addressed at least partially in parallel with TLB access by predicting the extra non-virtual bits or by feeding in the extra bits after partially indexing the cache, but the tradeoffs seem to favor virtually addressed caches.)
(Using non-physical address bits in indexing the cache can introduce complexities since another mapping of the page might not use the same virtual indexing bits.)
Currently, physical tagging is preferred, where a cache hit is determined by comparing the tag at the appropriate index with the requested physical address. Coherence with other devices accessing memory (I/O devices or processors), which provide physical addresses to the system, is easier with physical tags (avoiding the need for a physical-address-to-virtual-address translation mechanism, though physical tags could be provided in addition to virtual tags by duplicating the tag storage or by using an inclusive L2 cache).
As an example, with an 8KiB, two-way set associative cache with 16 byte blocks using 4KiB pages in a 32-bit address space, there would be 256 sets (groups of cache blocks sharing the same index)--requiring 8 bits to index. A load of the 32-bit word at 0x00ab_1134 would index the sets with 8 bits (0x13), read the two tags for that set, and read the words at offset 0x4 in both data blocks for the set. (Reading both blocks reduces delay.) While indexing the cache, the page number, the top 20 bits of the address (0x00ab_1) is presented to the TLB (usually with an address space ID appended); assuming the information for that page is available in the TLB (a TLB hit), the translation is sent to be compared with both tags resulting in either a match against one of the tags (in which case the data corresponding to that tag is selected) or no match (in which case there is a cache miss). (The TLB will also check to see if the process has read permission for that page.)
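The bit arithmetic in that example can be checked with a short sketch (the `decompose` helper is a name I made up; the parameters come from the example: 16-byte blocks give 4 offset bits, 256 sets give 8 index bits, 4 KiB pages give a 12-bit page offset):

```python
BLOCK_BITS = 4    # 16-byte block  -> 4 byte-offset bits
INDEX_BITS = 8    # 256 sets       -> 8 index bits
PAGE_BITS  = 12   # 4 KiB pages    -> bottom 12 bits are the page offset

def decompose(paddr):
    offset = paddr & 0xF                          # bits [3:0]
    index  = (paddr >> BLOCK_BITS) & 0xFF         # bits [11:4]
    tag    = paddr >> (BLOCK_BITS + INDEX_BITS)   # bits [31:12]
    page_number = paddr >> PAGE_BITS              # also bits [31:12]
    return tag, index, offset, page_number

tag, index, offset, page = decompose(0x00ab_1134)
print(hex(tag), hex(index), hex(offset))  # 0xab1 0x13 0x4
```

Note that in this configuration the tag and the page number are the same bits, because offset + index bits (4 + 8) exactly equal the 12 page-offset bits; that is precisely why the cache can be indexed in parallel with the TLB lookup while remaining physically addressed.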
With a virtually tagged cache, the TLB can be taken out of the critical path (potentially reducing cache access delay with a larger TLB) since it is only needed for permission checks not for tag comparison. (Permission information could even be included with the cache tags.) Typically a system has a larger virtual address space than physical (cacheable) address space, so virtual address tags would require more storage space; this storage demand is increased by the addition of address space IDs to avoid having to flush the cache when a different process is loaded (a Single Address Space OS would not need such a flush).
The Wikipedia article on "CPU cache" might be helpful.