It's been a while since this was asked, but I hate orphaned questions :)
First, let's over-simplify a modern x86 platform and pretend it has 32 bits of address space from 0x00000000 to 0xFFFFFFFF. We'll ignore all the special / reserved areas, TOLUD (top of lower usable DRAM, Intel parlance) holes, etc. We'll call this the system memory map.
Second, PCI Express extends PCI. From a software point of view, they are very, very similar.
I'll jump to your 3rd one -- configuration space -- first. Any addresses that point to configuration space are allocated from the system memory map. A PCI device has a 256-byte configuration space -- this is extended to 4KB for PCI Express. This 4KB space consumes memory addresses from the system memory map, but the actual values / bits / contents are generally implemented in registers on the peripheral device. For instance, when you read the Vendor ID or Device ID, the target peripheral device will return the data even though the memory address being used is from the system memory map.
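To get a feel for what this looks like in software, here's a minimal sketch that pulls the Vendor ID and Device ID out of a raw config-space dump (on Linux you can read such a dump from `/sys/bus/pci/devices/<bdf>/config`). The 256-byte buffer and the 0x1522 Device ID are made-up example values; only the field offsets and 0x8086 (Intel's well-known Vendor ID) are real.

```python
import struct

def parse_ids(cfg: bytes):
    """Extract Vendor ID (offset 0) and Device ID (offset 2) from a
    PCI configuration-space dump. Both are little-endian 16-bit fields."""
    vendor_id, device_id = struct.unpack_from("<HH", cfg, 0)
    return vendor_id, device_id

# Hypothetical 256-byte dump starting with Vendor ID 0x8086, Device ID 0x1522.
cfg = bytes([0x86, 0x80, 0x22, 0x15]) + bytes(252)
print(hex(parse_ids(cfg)[0]))  # 0x8086
```

The key point: the bytes come from the device's registers, but your software addresses them through the system memory map (or, on older x86, through the 0xCF8/0xCFC port mechanism).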
You stated these are "allocated into RAM" -- not true, the actual bits / stateful elements are in the peripheral device. However, they are mapped into the system memory map. Next, you asked if it was a common set of registers across all PCIe devices -- yes and no. The way PCI config space works, each capability structure contains a pointer to the next, indicating whether there is more "stuff" to be read. There's a bare minimum that all PCIe devices have to implement, and then the more advanced devices can implement more. As for how useful it is for functional operation, well, it's mandatory and heavily utilized. :)
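That chain of pointers is the capability list: offset 0x34 of config space holds a pointer to the first capability, and each capability's second byte points to the next one (0 terminates the list). A small sketch walking a fabricated dump -- the 0x40/0x50 offsets are my own choice, but the 0x34 pointer and the [ID, next] layout come from the PCI spec:

```python
def walk_capabilities(cfg: bytes):
    """Follow the capability linked list in PCI config space.
    Offset 0x34 holds the pointer to the first capability; each
    capability starts with [ID byte, next-pointer byte]."""
    caps = []
    ptr = cfg[0x34]
    while ptr != 0:
        caps.append((ptr, cfg[ptr]))  # (offset, capability ID)
        ptr = cfg[ptr + 1]
    return caps

# Fabricated dump: Power Management cap (ID 0x01) at 0x40 chains to
# an MSI cap (ID 0x05) at 0x50, which ends the list.
cfg = bytearray(256)
cfg[0x34] = 0x40
cfg[0x40], cfg[0x41] = 0x01, 0x50
cfg[0x50], cfg[0x51] = 0x05, 0x00
print(walk_capabilities(bytes(cfg)))  # [(64, 1), (80, 5)]
```

This is exactly the "yes and no": every device implements the standard header, and the capability list is where the optional extras hang off.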
Now, your question about BARs (base address registers) is a good place to segue into memory space and I/O space. Being somewhat x86-centric, the specification lets a device declare both the size and the type of each BAR. This allows a device to request a regular memory-mapped BAR, or an I/O space BAR, which eats into the 4K of I/O space an x86 machine has. You'll notice that on PowerPC machines, I/O space BARs are worthless.
A BAR is basically the device's way to tell the host how much memory it needs, and of what type (discussed above). If I ask for say 1MB of memory-mapped space, the BIOS may assign me address 0x10000000 to 0x10100000. This is not consuming physical RAM, just address space (do you see now why 32-bit systems run into issues with expansion cards like high-end GPUs that have GB of RAM?). Now a memory write / read to say 0x10000004 will be sent to the PCI Express device, and that may be a byte-wide register that connects to LEDs. So if I write 0xFF to physical memory address 0x10000004, that will turn on 8 LEDs. This is the basic premise of memory-mapped I/O.
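The way a BAR "asks" for space is a sizing handshake: the host writes all 1s to the BAR and reads back what sticks. The device hard-wires the low address bits to zero, so the read-back encodes the size as a mask. Here's a sketch of decoding that read-back; the 0xFFF00000 value is what a hypothetical 1MB memory BAR (like the example above) would return:

```python
def decode_bar(readback: int) -> int:
    """Decode the size of a 32-bit BAR after the host wrote all 1s.
    Bit 0 distinguishes I/O (1) from memory (0) BARs; the remaining
    low bits are flags and must be masked before taking two's complement."""
    if readback & 0x1:
        mask = readback & ~0x3   # I/O BAR: bits 1:0 are flags
    else:
        mask = readback & ~0xF   # memory BAR: bits 3:0 are flags/type
    return (~mask + 1) & 0xFFFFFFFF

print(hex(decode_bar(0xFFF00000)))  # 0x100000 -> a 1 MB memory BAR
print(hex(decode_bar(0xFFFFFFF1)))  # 0x10     -> a 16-byte I/O BAR
```

Once the BIOS/OS knows the size, it picks a free range in the system memory map (like the 0x10000000 example above) and writes that base address back into the BAR.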
I/O space behaves similarly, except it operates in a separate address space, the x86 I/O space. Address 0x3F8 (COM1) exists in both I/O space and memory space, and the two are entirely different things.
Your last question: messages refer to a new interrupt mechanism, message signaled interrupts, or MSI for short. Legacy PCI devices had four interrupt pins: INTA, INTB, INTC, INTD. These were generally swizzled among slots, such that a device's INTA went to INTA on Slot 0, INTB on Slot 1, INTC on Slot 2, INTD on Slot 3, and then back to INTA on Slot 4. The reason for this is that most PCI devices implemented only INTA, and by swizzling it, with say three devices, each would end up with its own interrupt signal to the interrupt controller. MSI is simply a way of signaling interrupts using the PCI Express protocol layer, and the PCIe root complex (the host) takes care of interrupting the CPU.
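The swizzle described above is conventionally computed as (pin + slot) mod 4, which is why several INTA-only devices in adjacent slots land on distinct interrupt lines. A toy sketch -- the pin numbering INTA=0 .. INTD=3 is just my convention here:

```python
def swizzle(slot: int, pin: int) -> int:
    """Classic PCI interrupt swizzle: a device's INTx pin in a given
    slot is routed to host interrupt line (pin + slot) mod 4.
    Pins are numbered INTA=0, INTB=1, INTC=2, INTD=3."""
    return (pin + slot) % 4

# Five slots, every device using only INTA (pin 0):
print([swizzle(slot, 0) for slot in range(5)])  # [0, 1, 2, 3, 0]
```

With MSI, none of this board-level routing exists: the interrupt is just a posted memory write on the link, so the number of interrupt "vectors" is no longer limited to four shared wires.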
This answer might be too late to help you, but maybe it will help some future Googler / Binger.
Finally, I recommend reading this book from Intel to get a good, detailed introduction to PCIe before you go any further. Another reference is Linux Device Drivers, Third Edition, which is available free online via LWN.
Best Answer
This sounds a lot like how high-throughput routers are designed. One of the main problems in router design is exactly the "memory matrix" or "cross-bar backplane" that you describe. The large companies in the space (Juniper, Cisco, F5, etc) do not use PCIe for this purpose.
I think a better interconnect might be HyperTransport (AMD's solution to the same problem, which Intel later addressed with QuickPath Interconnect).
Note that in a PC, the PCIe bus sits "behind" a HyperTransport link, so HT has lower latency and higher throughput to RAM than PCIe can achieve (RAM itself hangs off the CPU these days).