The interface to an AC97 codec is a bit more complicated than straight digital audio. The serial data consists of 256-bit frames; each frame contains several channels of 20-bit samples. The bit clock runs at 12.288 MHz; dividing by the 256 bits per frame gives the 48 kHz sample rate. Part of each frame is dedicated to control messages, e.g. to set mixer registers. You may need to do this once after power-up/reset to set the volume.
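As a sanity check on those numbers: a frame is a 16-bit tag slot followed by twelve 20-bit slots (16 + 12 × 20 = 256), so one frame per 256 bit-clocks at 12.288 MHz gives 48 kHz. A quick sketch of the arithmetic (plain Python, nothing board-specific):

```python
# AC97 frame arithmetic: 16-bit tag slot + 12 x 20-bit data slots = 256 bits
TAG_BITS = 16
SLOT_BITS = 20
NUM_SLOTS = 12
FRAME_BITS = TAG_BITS + NUM_SLOTS * SLOT_BITS

BIT_CLOCK_HZ = 12_288_000                    # AC97 BIT_CLK
FRAME_RATE_HZ = BIT_CLOCK_HZ // FRAME_BITS   # one frame per 256 bit clocks

print(FRAME_BITS)      # 256
print(FRAME_RATE_HZ)   # 48000
```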
The AC97 spec is available from Intel. Writing your own master is not unreasonable but it will take some time. You may also be able to find one you can reuse. OpenCores has one. There's a very barebones AC97 controller and some general information about the protocol here.
I can't remember how many of the AC97 registers are standardized. The manual I found online for your board says it has an LM4550 codec. Assuming that's right, you may want to refer to the LM4550 datasheet for a complete list of configuration registers.
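For instance, setting the master volume means writing AC97 register 0x02 through the command slots of an outgoing frame: slot 1 carries the read/write flag and register address, slot 2 the 16-bit data. A sketch of how those two 20-bit slots get packed, assuming the standard AC97 slot layout (check it against the spec and the LM4550 datasheet before relying on it):

```python
# Pack an AC97 register write into command slots 1 and 2 (20 bits each).
# Slot 1: bit 19 = read(1)/write(0), bits 18:12 = register address.
# Slot 2: bits 19:4 = 16-bit register data (used for writes only).
def ac97_write_slots(reg_addr, data):
    slot1 = (0 << 19) | ((reg_addr & 0x7F) << 12)
    slot2 = (data & 0xFFFF) << 4
    return slot1, slot2

# Register 0x02 is Master Volume in the standard AC97 register set;
# 0x0000 means both channels at 0 dB attenuation, unmuted.
s1, s2 = ac97_write_slots(0x02, 0x0000)
print(f"{s1:05x} {s2:05x}")   # 02000 00000
```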
The Cat6A specifications are for 100 m and 10 Gb Ethernet (so that's fine?)
I think what you're trying to say with this is that if 10G Ethernet can transmit 100 m over Cat6A cable, then it should be possible to transmit 3.2 Gb/s over 50 m with the same cable.
The difference between what it sounds like you want to do and how 10GbE does things is that the Xilinx serial IO, if I recall correctly, outputs a single 3.2 Gb/s serial data stream over a single pair of wires.
10GbE uses several tricks to get the maximum data rate through the longest copper cable.
First, they use all 4 pairs in the Cat6A cable to transmit the 10 Gb/s. That means that each pair is only transmitting 2.5 Gb/s.
Second, they use pre-emphasis to maximize the usable bandwidth of the cable. Basically, they boost the high-frequency portion of the transmitted signal, so the transmitted waveform doesn't look like a clean data signal. But as it travels through the cable, the high-frequency portion is attenuated, and the received signal ends up closer to the ideal wave shape.
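A toy illustration of the idea (not the actual 10GBASE-T transmit filter): a first-order high-boost FIR, y[n] = x[n] − a·x[n−1], amplifies fast transitions relative to slowly varying content, pre-compensating for the cable's high-frequency roll-off.

```python
# Toy pre-emphasis: first-order high-boost FIR, y[n] = x[n] - a*x[n-1].
# (Illustrative only -- real 10GBASE-T transmit shaping is far more involved.)
def pre_emphasize(samples, a=0.75):
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - a * prev)
        prev = x
    return out

dc = [1.0] * 8                  # slowly varying (low-frequency) content
alternating = [1.0, -1.0] * 4   # fastest-changing (high-frequency) content

# The alternating pattern comes out larger than the steady run:
# high frequencies are boosted before hitting the cable.
print(max(abs(y) for y in pre_emphasize(dc)))           # 1.0
print(max(abs(y) for y in pre_emphasize(alternating)))  # 1.75
```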
Third, they use error detecting and error correcting codes to allow error-free data transmission even when the cable degrades the signal enough to cause some errors in the raw bit stream.
Fourth, they use 16-level pulse-amplitude modulation (PAM-16), instead of simple on-off signaling, to send 4 bits of data with every symbol transmitted over the wire.
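The 4-bits-per-symbol figure is just log2(16). A minimal sketch of packing bits into 16 amplitude levels — note a real transceiver uses a carefully designed constellation mapping, not this naive binary one:

```python
import math

LEVELS = 16
BITS_PER_SYMBOL = int(math.log2(LEVELS))   # 4 bits per PAM-16 symbol

def bits_to_symbols(bits):
    # Pack each group of 4 bits into one of 16 amplitude levels (0..15).
    # Direct binary mapping, purely for illustration.
    assert len(bits) % BITS_PER_SYMBOL == 0
    symbols = []
    for i in range(0, len(bits), BITS_PER_SYMBOL):
        level = 0
        for b in bits[i:i + BITS_PER_SYMBOL]:
            level = (level << 1) | b
        symbols.append(level)
    return symbols

print(BITS_PER_SYMBOL)                          # 4
print(bits_to_symbols([1,0,1,1, 0,0,0,1]))      # [11, 1]
```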
These last two methods can improve the data rate thanks to the Shannon–Hartley theorem, which says that the maximum possible data transmission rate through a channel is determined by both the bandwidth of the channel and its signal-to-noise ratio.
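To put rough numbers on that, C = B · log2(1 + SNR). Cat6A is characterized out to 500 MHz; the SNR values below are made up purely to show how capacity scales with them:

```python
import math

def shannon_capacity(bandwidth_hz, snr_db):
    # Shannon-Hartley: C = B * log2(1 + SNR), with SNR as a linear power ratio.
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

# 500 MHz is the Cat6A characterization bandwidth; SNR figures are
# hypothetical, chosen only to illustrate the scaling.
for snr_db in (10, 20, 30):
    c = shannon_capacity(500e6, snr_db)
    print(f"{snr_db} dB SNR -> {c / 1e9:.2f} Gb/s per pair")
```

At a hypothetical 20–30 dB SNR this comes out above the 2.5 Gb/s per pair that 10GBASE-T needs, which is the whole point of spending effort on coding and modulation.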
I don't think any of this means that what you're proposing is utterly impossible. For example, the 2.5 Gb/s per-pair data rate of 10 GbE actually becomes something like 3.125 Gb/s per pair when you include encoding overhead. But doing the PAM encoding to follow the 10GbE model is likely to require a specialized chip for both the transmitter and receiver, and some detailed design work to get it working.
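The 2.5 → 3.125 figure is consistent with a 25% line-coding overhead (the same ratio as 8b/10b); that's an illustrative assumption here, since 10GBASE-T's actual coding scheme is different:

```python
# If every 8 data bits are carried as 10 line bits (8b/10b-style, 25%
# overhead), a 2.5 Gb/s per-pair payload rate needs a 3.125 Gb/s line rate.
payload_per_pair = 10e9 / 4             # 10 Gb/s split across 4 pairs
line_rate = payload_per_pair * 10 / 8   # assumed 25% coding overhead
print(line_rate / 1e9)                  # 3.125
```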
One possibility: can you simply packetize this data and actually send it over a 10 GbE link? That would allow you to use mostly commodity hardware to keep costs down, and a proven solution to reduce your risk. Some Xilinx FPGAs include a full Ethernet MAC that should enable this solution, but I don't know if it's available at the price point you're trying to work at.
Is there any documentation that is generic or covers the Spartan-6 LX9 board?
The DCM is fully contained within the FPGA chip, so it doesn't matter what board you have; the documentation depends on which chip you have.
For the Spartan-6 family, see the Spartan-6 FPGA Clocking Resources User Guide (UG382).
Is using this module absolutely necessary in this example? Is it possible to do the same thing easily without using that module?
If you're just blinking LEDs, you probably don't need the DCM. They likely used it because it's a very complicated block, and they specifically wanted to provide an example of the DCM without much other complex logic to confuse things.
How do people usually go about vendor-specific things like clocks? For example, if one wants to target both Altera and Xilinx FPGAs, what is the process to achieve that? Is it similar to ordinary C preprocessor typedefs, etc.?
Unfortunately, complex hard cores like clock managers are one of the areas where there's not much compatibility between vendors. You'll likely have to make significant code changes to port a design using these features between two device families, whether they're from the same or different vendors.
If you can simplify your interface so that you just have one input clock coming from off-chip, and a number of derived clocks that are distributed to the rest of your logic, the cleanest solution is probably to push all of the vendor-specific code into a single module that you can instantiate in your top-level code. Then you can just swap in a different "clock management" module depending on which device you're targeting.
If you want to do something like dynamically changing the phase relationship between various clocks, or switch between several choices of reference clock frequency, you probably have no choice but to re-code substantial chunks of logic for different target devices.
Remember that in addition to the Verilog code, there will probably also be numerous constraints that need to be defined, and these will also need to be different for different devices.