Relays have the best specs: (almost) zero resistance when on, infinite resistance when off. Use reed relays, not power relays.
They're better suited to low currents, and often have a lifetime as high as 100 \$\times\$ 10\$^6\$ operations, which is effectively forever. Reed relays won't give you an audible click either. If you use SPDT or DPDT relays you can switch the input between signal and ground, so that it doesn't pick up noise when off.
If you want to go electronic there's the 74HC4066, like Pentium100 suggests, but you'll easily find switches with better specs. Analog Devices has a wide offering; the dual SPDT ADG1636 has very good figures: 1\$\Omega\$ on-resistance and 0.007% THD+N.
On the other hand, the 74HC4066 has 0.12% THD (at 4V\$_{PP}\$) and 50\$\Omega\$ on-resistance. With a 47k\$\Omega\$ series resistor the switch accounts for only about 0.1% of the total resistance, which brings its contribution down to roughly 0.00012% THD. The same resistor brings the ADG1636 down to 0.00000015% :-). Keep in mind that the 4066 is SPST.
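The two percentages above can be reproduced with a few lines of Python. This is only a sketch: it assumes the figures come from scaling the datasheet THD by the switch's share of the total resistance, \$R_{on} / (R_{on} + R_{series})\$, which matches the quoted numbers but is not spelled out in the answer. The function name is made up for illustration.

```python
def diluted_thd_percent(switch_thd_percent, r_on_ohms, r_series_ohms):
    """Scale a switch's datasheet THD by its share of the total resistance.

    Assumption: the distortion contribution is proportional to the fraction
    of the signal path resistance that the switch itself represents.
    """
    return switch_thd_percent * r_on_ohms / (r_on_ohms + r_series_ohms)


# Datasheet figures quoted in the answer, with a 47 kOhm series resistor.
hc4066 = diluted_thd_percent(0.12, 50.0, 47e3)
adg1636 = diluted_thd_percent(0.007, 1.0, 47e3)

print(f"74HC4066: {hc4066:.2e} %")
print(f"ADG1636:  {adg1636:.2e} %")
```

Under this model the 74HC4066 lands a little above 0.0001% and the ADG1636 around 0.00000015%, so both are far below anything audible.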
If you're a purist you go for the AD, otherwise the 4066 will do if you don't need the double throw feature.
The electronic switches can be controlled by a logic voltage and hardly need power (the ADG1636 consumes less than 1\$\mu\$A).
You know that shopping questions are not allowed here? Fortunately, I have never been one to follow the rules...
Sending large amounts of audio over Ethernet is neither easy nor cheap. I've been doing this professionally for the past 14 years, and I still have not gotten the price down to what I would consider cheap.
I would not recommend a DIY approach to this. Building the PCBs, writing the software, testing, etc. is difficult for this type of project. That's fine if you want to start a new career in audio over Ethernet, but this is probably too much for someone who just does this as a hobby.
For commercial products, the cheapest that I know of are the boxes by Atterotech. They follow the CobraNet protocol standard and so will interoperate with other CobraNet devices. But while I said this is the cheapest, it is not cheap! Also, this is pro-audio gear, with pro-audio performance. Other companies that make similar products are QSC Audio, Rane, Whirlwind, Peavey, Biamp, and many others.
There are not many modules that include both the networking interface and the ADC/DAC circuits. In a former life I designed the Cirrus Logic CM-1 and CM-2 modules, which will do up to 32x32 channels of networked audio, but they do not include the ADCs and DACs. Connecting converters to these modules is not difficult, but might still be beyond what you want to do.
There are other modules similar to the CM-1/2 from Audinate, Lab X, and others. But I do not think that these will be any easier or cheaper for your uses.
Best Answer
If your word select is out of sync with the data, you don't have the hardware interface set up correctly. You need to fix that problem first. We can't help you with that since you didn't share the code you're using. But once you get that straightened out, here are some general guidelines regarding setting the sample rate on the receiver.
On the transmit side, you put the incoming audio into a FIFO buffer. When that buffer fills to a certain level, or once a certain amount of time has passed, you take a set of audio samples out of that buffer and transmit them in a UDP packet.
UDP packets can get lost or arrive out of order, so you include a sequence number in each packet so that the receiving side can detect either of these occurrences. The packets also experience random delays over some range that is generally bounded.
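The transmit side described above could look roughly like the following Python sketch. The packet layout (a 32-bit sequence number followed by 16-bit signed samples), the tiny packet size, and the names `Packetizer` and `parse_packet` are all assumptions made for illustration. The time-based flush mentioned above is left out to keep it short.

```python
import struct
from collections import deque

SAMPLES_PER_PACKET = 4          # hypothetical; real packets carry far more
HEADER = struct.Struct("!I")    # 32-bit sequence number, network byte order


class Packetizer:
    """Collects samples in a FIFO and emits fixed-size, numbered packets."""

    def __init__(self):
        self.fifo = deque()
        self.seq = 0

    def push(self, samples):
        """Queue 16-bit samples; return any packets that are now ready."""
        self.fifo.extend(samples)
        packets = []
        while len(self.fifo) >= SAMPLES_PER_PACKET:
            chunk = [self.fifo.popleft() for _ in range(SAMPLES_PER_PACKET)]
            payload = struct.pack(f"!{SAMPLES_PER_PACKET}h", *chunk)
            packets.append(HEADER.pack(self.seq) + payload)
            self.seq = (self.seq + 1) & 0xFFFFFFFF  # wrap at 32 bits
        return packets


def parse_packet(packet):
    """Split a packet back into (sequence number, list of samples)."""
    (seq,) = HEADER.unpack_from(packet)
    count = (len(packet) - HEADER.size) // 2
    samples = list(struct.unpack_from(f"!{count}h", packet, HEADER.size))
    return seq, samples
```

Each packet returned by `push` would then be handed to `sendto` on a UDP socket; the receiver uses the sequence number recovered by `parse_packet` to spot gaps and reordering.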
On the receive side, you take the audio sample data out of the packet, verify the sequence number, and put the data into another FIFO. When this FIFO fills to a level that represents the range of typical packet delays, you start reading the audio samples out and sending them to your audio DAC at the nominal sample rate. If the FIFO ever "runs dry", set the (re-)starting threshold higher.
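A minimal sketch of that receive-side FIFO in Python, under some assumptions that are not in the answer: late or duplicate packets are simply dropped, sequence-number wraparound is ignored, silence (0) is output while the buffer refills, and the restart threshold is doubled after an underrun. The class name `JitterBuffer` is hypothetical.

```python
from collections import deque


class JitterBuffer:
    """Receive-side FIFO that starts playback once enough audio is queued."""

    def __init__(self, start_threshold):
        self.fifo = deque()
        self.start_threshold = start_threshold  # samples needed to (re)start
        self.playing = False
        self.expected_seq = None
        self.lost = 0

    def on_packet(self, seq, samples):
        """Verify the sequence number and queue the packet's samples."""
        if self.expected_seq is not None:
            if seq < self.expected_seq:
                return  # late or duplicate packet: drop it (assumption)
            self.lost += seq - self.expected_seq  # count skipped packets
        self.expected_seq = seq + 1
        self.fifo.extend(samples)
        if not self.playing and len(self.fifo) >= self.start_threshold:
            self.playing = True

    def next_sample(self):
        """Called once per DAC sample period; returns the sample to play."""
        if not self.playing:
            return 0  # silence while the FIFO fills
        if not self.fifo:
            # The FIFO ran dry: stop and restart with a higher threshold.
            self.playing = False
            self.start_threshold *= 2  # doubling is an arbitrary choice
            return 0
        return self.fifo.popleft()
```

In a real receiver `next_sample` would be driven by the DAC's sample clock, for example from an I2S interrupt or DMA callback, rather than called in a loop.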
However, the transmit and receive sample clocks will not be perfectly synchronized. This means that the average amount of data in the receive-side FIFO will start to trend upward or downward over time. If the FIFO depth is increasing, it is necessary to increase the output audio sample rate slightly to match. Similarly, if it is decreasing, it is necessary to decrease the sample rate. These adjustments will cause the long-term average sample rate of the receiver to match that of the transmitter exactly.
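One way to picture the adjustment is a simple proportional loop on the smoothed FIFO depth, sketched below in Python. Everything here is an assumption for illustration: the class name `RateTracker`, the gain, the smoothing factor, and the limit in parts per million. How the resulting rate is actually applied (a tunable clock, a PLL, or a sample-rate converter) depends on the hardware and is not shown.

```python
class RateTracker:
    """Nudges the output sample rate so the FIFO depth stays near a target."""

    def __init__(self, nominal_hz, target_depth,
                 gain_ppm_per_sample=1.0, smoothing=0.01, max_ppm=500.0):
        self.nominal_hz = nominal_hz
        self.target_depth = target_depth
        self.gain = gain_ppm_per_sample   # hypothetical loop gain
        self.smoothing = smoothing        # low-pass factor for the depth
        self.max_ppm = max_ppm            # clamp on the correction
        self.avg_depth = float(target_depth)

    def update(self, fifo_depth):
        """Feed in the current FIFO depth; returns the sample rate to use."""
        # Average out packet-to-packet jitter so only the trend remains.
        self.avg_depth += self.smoothing * (fifo_depth - self.avg_depth)
        error = self.avg_depth - self.target_depth
        # FIFO filling up -> positive error -> play slightly faster.
        ppm = max(-self.max_ppm, min(self.max_ppm, self.gain * error))
        return self.nominal_hz * (1.0 + ppm * 1e-6)
```

With a proportional-only loop like this, the depth settles at some offset from the target when the two clocks differ by a constant amount; adding an integral term would remove that offset at the cost of more tuning.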
(Note that there is a patent on this technique, but that doesn't mean you can't use it in a personal project.)