When designing a modern computer / operating system combination, one of the things we want is the ability to run multiple programs at the same time. One problem you run into designing this system is that every program wants to assume it has access to all the memory it could want, and programs don't coordinate which addresses they use.
The solution to this problem is a system called virtual memory. The virtual address space is the address space the operating system makes available for a program to use. When a program tries to access virtual memory at, say, address 1024, it doesn't get to access physical memory address 1024 (the addresses that go out on the wires to the RAM chips). Instead there is a mapping system.
The operating system handles all the mappings, so that two different programs can both access what they consider address 1024, but process 1 might have its virtual address 1024 mapped to physical address 2048, while process 2 might have its virtual address 1024 mapped to physical address 4096.
In order to keep the mapping information manageable, the operating system maps memory in "chunks" called pages. 4096 bytes is a very common page size. In the example you cite, a certain process has a single page, located at virtual address 4096, that is 4096 bytes in length (extending to virtual address 8191), mapped to physical address 0 (since the page is 4096 bytes long, the mapping extends to physical address 4095).
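The translation described above can be sketched in a few lines of Python. This is just an illustration of the arithmetic, assuming 4096-byte pages and the single mapping from the example (virtual page 1 mapped to physical frame 0); the `page_table` dictionary stands in for whatever structure a real OS and MMU would use.

```python
PAGE_SIZE = 4096

# Hypothetical page table for the example process:
# virtual page 1 (addresses 4096-8191) -> physical frame 0 (addresses 0-4095)
page_table = {1: 0}

def translate(virtual_addr):
    """Split a virtual address into a page number and an offset,
    then look up the physical frame that page is mapped to."""
    page_number = virtual_addr // PAGE_SIZE
    offset = virtual_addr % PAGE_SIZE
    frame = page_table[page_number]
    return frame * PAGE_SIZE + offset

print(translate(5000))  # virtual 5000 is offset 904 into page 1 -> physical 904
```

Note that the offset within the page is unchanged by translation; only the page-number part of the address is remapped.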
The actual size of the virtual address space is not specified (it must be at least 14 bits wide because the address 12287 is mentioned), but that hardly matters. One thing is for sure: it is not a 12-bit addressing system. That's just the size of a virtual memory page, the smallest chunk of memory the operating system will manage. The addresses 8192 through 12287 are just other virtual addresses a process could access.
The author asks the question "what happens if there is an access to memory that is not mapped?"
In a computer without a virtual memory mapping system, the hardware notices that accesses to addresses not connected to physical RAM are errors. The hardware signals the operating system of the offense; this is called an error trap. The operating system would then print the message "Nonexistent memory referenced" and terminate the process. That's the suitably rude message.
In a computer with a virtual memory mapping system, almost the same thing happens. Since most programs don't use all the memory that they could possibly address, the operating system doesn't map all of a process' virtual memory to physical memory (also, most computers have more virtual address space available than total physical RAM installed in them). So when a process tries to access a virtual address in unmapped memory, the hardware notices there is no physical memory mapped to the virtual address in question. The operating system is signaled, it prints a rude message, and terminates the process.
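Continuing the same kind of sketch, an access to an unmapped page can be modeled as a failed page-table lookup that the "operating system" turns into the rude message. The names here (`PageFault`, `access`) are purely illustrative, not any real OS API.

```python
PAGE_SIZE = 4096
page_table = {1: 0}  # only virtual page 1 is mapped

class PageFault(Exception):
    """Stands in for the hardware trap raised on an unmapped access."""
    pass

def access(virtual_addr):
    page_number = virtual_addr // PAGE_SIZE
    if page_number not in page_table:
        # No physical memory is mapped here: the hardware traps,
        # and the OS terminates the offending process.
        raise PageFault("Nonexistent memory referenced")
    return page_table[page_number] * PAGE_SIZE + (virtual_addr % PAGE_SIZE)

try:
    access(8192)  # virtual page 2 has no mapping
except PageFault as e:
    print(e)
```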
This mapping and error trapping system not only allows multiple processes to have their own views of the address space, it also allows the operating system to contain and protect the running processes from each other. Even though they may be using the same virtual addresses, the operating system keeps different processes mapped to different physical addresses. That way it isn't possible for a process to (accidentally or on purpose) access or overwrite the memory of any other process. This keeps buggy programs from taking out your whole computer when they crash.
An address in a cached system has up to three parts: tag, set and offset.
Since the given system is byte addressable, and a cache line is two words (eight bytes), the offset portion of the address requires 3 bits.
A direct mapped cache has no set association. Or, if you will, it may be regarded as a collection of sets, each of which holds only one block. So no set -> block associative lookup is required, and the set field of the address can be called a block field. This field directly determines the block to which the address maps, hence the "direct mapped" designation. The tag is used to determine whether a given block in the cache is a "hit" for the address, or holds data for some other address. (Whereas under set association, the tag is used to search through a set of blocks for a hit: the set elements are associated with addresses via the tag field.)
The cache has four blocks, because it holds eight words, but pairs of words are considered blocks. So the set/block part of the address requires two bits.
The remainder are tag bits. Since the memory space is 4 KB (let us assume there is no virtual memory), addresses are 12 bits wide, and so there are 12 - 3 - 2 = 7 tag bits.
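As a sanity check, the three fields can be pulled out of a 12-bit address with shifts and masks. This is a sketch assuming the 7/2/3 tag/block/offset split derived above; the sample address is made up for illustration.

```python
OFFSET_BITS = 3  # 8-byte blocks (two 4-byte words)
BLOCK_BITS = 2   # 4 blocks in the direct-mapped cache

def split_address(addr):
    """Decompose a 12-bit address into (tag, block, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    block = (addr >> OFFSET_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = addr >> (OFFSET_BITS + BLOCK_BITS)
    return tag, block, offset

# Address 0b0000101_10_011: tag = 5, block = 2, offset = 3
print(split_address(0b0000101_10_011))  # -> (5, 2, 3)
```

Two addresses with the same block field but different tags map to the same cache block, which is exactly when a direct-mapped cache must evict.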
Note that if the set size were 256 bits (the size of the entire cache), it would make the cache fully associative rather than direct mapped: a situation in which the entire cache is one big set of blocks, and so there is no set field in an address, only a tag and offset. The tag is used to search the entire cache for a hit. Under set association, the additional set field restricts the search to an indexed set, which holds just a single block in the direct-mapped case.
The data bus width has no correlation to the range of memory addresses. The address bus and the data bus are separate entities.
For example, if your data bus is 32 bits wide, and your address bus is 16 bits wide, you can have 2^16 memory addresses that are each 32 bits wide: 2^16 × 32 bits = 64K × 32 bits.
In my example, the lowest memory address is $0000, and the highest memory address is $FFFF. In your example, the lowest would be $00000000 and the highest would be $FFFFFFFF. Each memory address points to a group of bits (32 bits in both of our examples). If you changed your data bus width to 64 bits, and kept your address bus the same width, your memory address span would stay the same. Each address would simply point to 64 bits instead.
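The arithmetic above can be checked directly. This sketch uses the values from the 16-bit-address / 32-bit-data example:

```python
ADDRESS_BITS = 16
DATA_BITS = 32

num_addresses = 2 ** ADDRESS_BITS       # 65536 addressable locations
total_bits = num_addresses * DATA_BITS  # each location holds one 32-bit word

print(f"addresses: 0x0000 through 0x{num_addresses - 1:04X}")
print(f"capacity: {num_addresses} x {DATA_BITS} bits = {total_bits // 8} bytes")

# Widening the data bus to 64 bits doubles the capacity,
# but the address range stays exactly the same:
print(f"64-bit data bus: still {num_addresses} addresses, "
      f"{num_addresses * 64 // 8} bytes total")
```

The key point the numbers make: changing `DATA_BITS` changes how much each address points to, while only `ADDRESS_BITS` changes how many addresses exist.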