It's been a while since this was asked, but I hate orphaned questions :)
First, let's over-simplify a modern x86 platform and pretend it has 32 bits of address space, from 0x00000000 to 0xFFFFFFFF. We'll ignore all the special / reserved areas, TOLUD (top of lower usable DRAM, in Intel parlance) holes, etc. We'll call this the system memory map.
Second, PCI Express extends PCI. From a software point of view, they are very, very similar.
I'll jump to your third one -- configuration space -- first. Any addresses that point to configuration space are allocated from the system memory map. A PCI device had a 256-byte configuration space -- this is extended to 4KB for PCI Express. This 4KB space consumes memory addresses from the system memory map, but the actual values / bits / contents are generally implemented in registers on the peripheral device. For instance, when you read the Vendor ID or Device ID, the target peripheral device will return the data even though the memory address being used is from the system memory map.
You stated these are "allocated into RAM" -- not true, the actual bits / stateful elements are in the peripheral device. However, they are mapped into the system memory map. Next, you asked if it was a common set of registers across all PCIe devices -- yes and no. The way PCI config space works, each capability structure contains a pointer to the next one, which tells you whether there is more "stuff" to be read. There's a bare minimum that all PCIe devices have to implement, and then the more advanced devices can implement more. As for how useful it is for functional operation, well, it's mandatory and heavily utilized. :)
Now, your question about BARs (base address registers) is a good place to segue into memory space and I/O space. Being somewhat x86-centric, the specification lets a device declare a BAR's size as well as its type. This allows a device to request either a regular memory-mapped BAR or an I/O space BAR, which eats some of the 64K of I/O space an x86 machine has. You'll notice that on PowerPC machines, I/O space BARs are worthless.
A BAR is basically the device's way to tell the host how much address space it needs, and of what type (discussed above). If I ask for say 1MB of memory-mapped space, the BIOS may assign me addresses 0x10000000 through 0x100FFFFF. This is not consuming physical RAM, just address space (do you see now why 32-bit systems run into issues with expansion cards like high-end GPUs that have GBs of RAM?). Now a memory write / read to say 0x10000004 will be sent to the PCI Express device, and that may be a byte-wide register that connects to LEDs. So if I write 0xFF to physical memory address 0x10000004, that will turn on 8 LEDs. This is the basic premise of memory-mapped I/O.
I/O space behaves similarly, except it operates in a separate address space, the x86 I/O space. Address 0x3F8 (COM1) exists in both I/O space and memory space, and they are two different things.
Your last question: messages refer to a new type of interrupt mechanism, message signaled interrupts, or MSI for short. Legacy PCI devices had four interrupt pins: INTA, INTB, INTC, INTD. These were generally swizzled among slots such that INTA went to INTA on slot 0, then INTB on slot 1, then INTC on slot 2, INTD on slot 3, and then back to INTA on slot 4. The reason for this is that most PCI devices implemented only INTA, and by swizzling it, with say three devices, each would end up with its own interrupt signal to the interrupt controller. MSI is simply a way of signaling interrupts using the PCI Express protocol layer, and the PCIe root complex (the host) takes care of interrupting the CPU.
This answer might be too late to help you, but maybe it will help some future Googler / Binger.
Finally, I recommend reading this book from Intel to get a good, detailed introduction to PCIe before you go any further. Another good reference is Linux Device Drivers, a free online book available from LWN.
Best Answer
The card is effectively a Serializer/Deserializer (SerDes for short). One possible implementation is there's a clock for the parallel bus running at PCIe 3.0 speed, which latches the incoming 8 bits of data. Then internally there is a shift register running at a faster clock (10Gb ethernet speeds) which just shifts that out. The process can be performed in reverse to convert ethernet data into PCIe signals.
There is a fixed correspondence between which PCIe pin carries which bit of the parallel buffer, which is what guarantees the correct ordering in both transmit and receive mode (e.g. pin 0 is always bit 0, pin 1 is always bit 1, etc.). Parallelism is exposed at the bit level.
There are more complications when dealing with the actual PCIe / 10Gb ethernet protocols (10Gb ethernet uses a 64b/66b encoding), but this basic picture sums up the broad idea.