To answer your base question: to find the address corresponding to pin A10 of the memory, you need to look at the memory map for the ARM device.
In this case, looking at the memory map:
0x8000 0000 -> 0x8FFF FFFF is mapped to CSD0 (SDRAM/DDR)
and
0x9000 0000 -> 0x9FFF FFFF is mapped to CSD1 (SDRAM/DDR)
You'd need to know which chip select was used in the PCB design to determine which bank your DDR is attached to.
This implies that A10 is located at either 0x8000 0400 (as you mentioned) or 0x9000 0400.
As to why 0x8000 0F00 was used in place of 0x8000 0400: reading the datasheet for that memory implies, but does not state, that the other address pins A(n) are don't-cares for this operation, so the coder probably just tossed an F in there instead of working out that only a 4 was needed.
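As a rough illustration (a minimal C sketch, assuming the DDR sits on CSD0 and that the controller is in a special command mode where CPU address bits drive the DDR address pins directly; the macro and function names are made up for this example):

```c
#include <stdint.h>

#define CSD0_BASE  0x80000000u   /* DDR mapped through chip select 0 */
#define DDR_A10    (1u << 10)    /* CPU address bit 10 drives DDR pin A10 */

/* Perform the dummy access that carries the command on the address pins.
 * Only A10 needs to be high; the remaining address bits are don't-cares,
 * which is why 0x8000 0F00 works just as well as 0x8000 0400.
 * The value written is typically ignored for this kind of command. */
static inline void ddr_command_a10_high(void)
{
    *(volatile uint32_t *)(CSD0_BASE | DDR_A10) = 0;
}
```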
I also don't find those two sections of the datasheet to contradict each other. The first is basically just saying that you need to reference the device's memory map to locate the real address where the memory is mapped, so you can use that address as a base.
The second quote tells you that bit 0 of the address corresponds to address pin 0 on the memory in this mode, which may not always be the case in normal operation; it may depend on the data/address width of the memory combined with alignment issues for the core.
Address translation is handled through a translation lookaside buffer (TLB), which is just a cache of translation information (and some metadata like permissions, cacheability, etc.). The TLB works by substituting the physical page number (the address bits above those used to index within a page) for the provided virtual page number (i.e., the virtual page is mapped to the physical page). (Since virtual pages are aligned with physical pages at page granularity, the bits indexing within a page match for virtual and physical addresses of a given page.)
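To make that substitution concrete, here is a minimal C sketch (4 KiB pages assumed; the tiny table standing in for the TLB and the mapping in it are made up):

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12                        /* 4 KiB pages -> 12 offset bits */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

/* Toy stand-in for the TLB: a few cached VPN -> PPN mappings. */
static const struct { uint32_t vpn, ppn; } tlb[] = {
    { 0x00ab1, 0x3c015 },                    /* made-up mapping */
};

static uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;   /* virtual page number */
    uint32_t offset = vaddr & PAGE_MASK;     /* within-page bits, unchanged */

    for (unsigned i = 0; i < sizeof tlb / sizeof tlb[0]; i++)
        if (tlb[i].vpn == vpn)               /* TLB hit: substitute the PPN */
            return (tlb[i].ppn << PAGE_SHIFT) | offset;

    return 0;                                /* a miss would start a page-table walk */
}

int main(void)
{
    printf("0x%08x -> 0x%08x\n", 0x00ab1134u, translate(0x00ab1134u));
    return 0;
}
```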
Typically, to reduce delay in retrieving data, the cache is indexed with the virtual address in parallel with the TLB lookup; this would be a virtually addressed cache, but if only index bits within a page are used then it is also a physically addressed cache (because those bits of the virtual address match the bits of the physical address). (A cache might be physically addressed at least partially in parallel with TLB access by predicting the extra non-virtual bits or by feeding in the extra bits after partially indexing the cache, but the tradeoffs seem to favor virtually addressed caches.)
(Using non-physical address bits in indexing the cache can introduce complexities since another mapping of the page might not use the same virtual indexing bits.)
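A hedged sketch of how that can happen, using a hypothetical configuration whose index reaches above the page offset (32 KiB, two-way, 16-byte blocks gives 10 index bits, while 4 KiB pages only guarantee the low 12 bits match; the addresses are invented):

```c
#include <stdio.h>
#include <stdint.h>

#define BLOCK_BITS 4                         /* 16-byte blocks */
#define SET_BITS   10                        /* 32 KiB / 16 B / 2 ways = 1024 sets */

static uint32_t set_index(uint32_t vaddr)
{
    return (vaddr >> BLOCK_BITS) & ((1u << SET_BITS) - 1);
}

int main(void)
{
    /* Two virtual mappings of the same physical page: they share the low
     * 12 bits (the page offset) but differ above bit 12. */
    uint32_t va1 = 0x00100134;
    uint32_t va2 = 0x00201134;

    printf("index via mapping 1: 0x%03x\n", set_index(va1));   /* 0x013 */
    printf("index via mapping 2: 0x%03x\n", set_index(va2));   /* 0x113 */
    /* The same physical line could end up cached in two different sets. */
    return 0;
}
```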
Currently, physical tagging is preferred, where a cache hit is determined by comparing the tag at the appropriate index with the requested physical address. Coherence with other devices accessing memory (I/O devices or processors), which provide physical addresses to the system, is easier with physical tags (avoiding the need for a physical-to-virtual address translation mechanism, though physical tags could be provided in addition to virtual tags by duplicating the tag storage or by using an inclusive L2 cache).
As an example, with an 8KiB, two-way set associative cache with 16 byte blocks using 4KiB pages in a 32-bit address space, there would be 256 sets (groups of cache blocks sharing the same index)--requiring 8 bits to index. A load of the 32-bit word at 0x00ab_1134 would index the sets with 8 bits (0x13), read the two tags for that set, and read the words at offset 0x4 in both data blocks for the set. (Reading both blocks reduces delay.) While indexing the cache, the page number, the top 20 bits of the address (0x00ab_1) is presented to the TLB (usually with an address space ID appended); assuming the information for that page is available in the TLB (a TLB hit), the translation is sent to be compared with both tags resulting in either a match against one of the tags (in which case the data corresponding to that tag is selected) or no match (in which case there is a cache miss). (The TLB will also check to see if the process has read permission for that page.)
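For what it's worth, the field extraction in that example can be written out directly (same parameters as above; C used purely for illustration):

```c
#include <stdio.h>
#include <stdint.h>

#define BLOCK_BITS 4     /* 16-byte blocks  -> 4 offset bits      */
#define SET_BITS   8     /* 8 KiB / 16 B / 2 ways = 256 sets      */
#define PAGE_BITS  12    /* 4 KiB pages     -> 20-bit page number */

int main(void)
{
    uint32_t addr   = 0x00ab1134u;
    uint32_t offset = addr & ((1u << BLOCK_BITS) - 1);               /* 0x4     */
    uint32_t index  = (addr >> BLOCK_BITS) & ((1u << SET_BITS) - 1); /* 0x13    */
    uint32_t vpn    = addr >> PAGE_BITS;                             /* 0x00ab1 */

    printf("block offset 0x%x, set index 0x%02x, page number 0x%05x\n",
           offset, index, vpn);
    /* BLOCK_BITS + SET_BITS = 12 = PAGE_BITS, so every index bit lies
     * within the page offset and matches between virtual and physical. */
    return 0;
}
```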
With a virtually tagged cache, the TLB can be taken out of the critical path (potentially reducing cache access delay with a larger TLB) since it is only needed for permission checks, not for tag comparison. (Permission information could even be included with the cache tags.) Typically a system has a larger virtual address space than physical (cacheable) address space, so virtual address tags would require more storage; this demand is increased by the addition of address space IDs to avoid having to flush the cache when a different process is loaded (a Single Address Space OS would not need such a flush).
The Wikipedia article on "CPU cache" might be helpful.
Virtual address translation is needed for several reasons:
More memory can be addressed than is physically available. For instance, the CPU in my laptop can address 256 TB of memory, whereas it only has 8 GB of RAM. This extra address space lets the kernel allocate far more memory than is physically present, and it can swap pages to disk for applications that aren't being used.
Virtual address translation prevents memory fragmentation. Imagine a program that frequently allocates and deallocates large objects, each the size of a memory page. If the addresses were physical, the memory space would quickly become fragmented, with no large contiguous areas free. However, the kernel can remap virtual to physical addresses so that there's always a large section of address space free, and if part of the address space is fragmented, so what? Only the pages of memory that are in use need to be backed by physical memory, and there's plenty of address space to put new allocations.
Virtual addresses give better security. Remember those CPU bugs from earlier this year, Spectre and Meltdown? They rely on knowing some information about the mapping of virtual addresses for different processes and the kernel. If you turn off virtual addressing, these and other attacks become a lot easier because you then know the addresses of the kernel and other processes.
As to why you would turn off the MMU, I can only guess that it's referring to the state of the MMU during startup of the processor, before the kernel has set up the page table. You wouldn't turn the MMU off during normal operation of an operating system.