The fact that theremins use heterodyne mixers has nothing to do with RF. The 'antennae' are not antennae in the classical, RF sense. The capacitance explanation is correct.
Capacitors and Theremin 'Antennae'
The simplest type of capacitor is a parallel-plate capacitor: two metal plates separated by an insulating material called the dielectric. Its capacitance is C=εA/d, where A is the area of each plate, d is the distance between them, and ε is the permittivity of the dielectric (for air, ε is very close to the vacuum permittivity ε0≈8.854×10^−12 F/m).
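A rough numerical sketch of C=εA/d, assuming (hypothetically) that a flat hand behaves like a 10 cm × 10 cm plate and ignoring fringing fields, which a real hand near a rod antenna certainly has. It only shows the order of magnitude (fractions of a picofarad up to a couple of picofarads) and that halving the distance doubles the capacitance.

```python
# Parallel-plate approximation C = eps * A / d (fringing fields ignored).
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m (CODATA 2018)
EPS_R_AIR = 1.0006        # approximate relative permittivity of dry air

def plate_capacitance(area_m2: float, gap_m: float, eps_r: float = EPS_R_AIR) -> float:
    """Return the capacitance in farads of two parallel plates with the given area and gap."""
    if area_m2 <= 0 or gap_m <= 0:
        raise ValueError("area and gap must be positive")
    return eps_r * EPS0 * area_m2 / gap_m

if __name__ == "__main__":
    hand_area = 0.01  # hypothetical: a flat hand treated as a 10 cm x 10 cm plate
    for gap_cm in (40, 20, 10, 5):
        c = plate_capacitance(hand_area, gap_cm / 100)
        print(f"gap {gap_cm:2d} cm -> {c * 1e12:.2f} pF")
```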
When you are playing a theremin, your hand is one plate (your body is effectively grounded), the antenna is the other, and the air between them is the dielectric. As you move your hand, you vary the capacitance between ground and the antenna. Both hands affect both antennae: the two hands act like two plates in parallel, so their contributions add.
The two antennae are at right angles because that reduces the effect your left hand has on the right antenna and vice versa. For example, as you move your left hand up and down above the volume antenna, it stays at a roughly constant distance from the pitch antenna, so its contribution to the pitch antenna's capacitance is constant (and small).
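The next sketch illustrates the last two paragraphs: capacitances in parallel simply add, so the pitch antenna sees the sum of both hands' contributions. It reuses the crude parallel-plate model (hypothetical 100 cm² hand area) and assumes the left hand stays a fixed 60 cm from the pitch antenna, which is what the right-angle layout aims for.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m (air is within ~0.06 % of this)

def plate_capacitance(area_m2: float, gap_m: float) -> float:
    """Crude parallel-plate estimate of one hand's capacitance to an antenna."""
    return EPS0 * area_m2 / gap_m

def pitch_antenna_capacitance(right_gap_m: float, left_gap_m: float,
                              hand_area_m2: float = 0.01) -> float:
    """Capacitances in parallel add: C_total = C_right + C_left."""
    return (plate_capacitance(hand_area_m2, right_gap_m)
            + plate_capacitance(hand_area_m2, left_gap_m))

if __name__ == "__main__":
    left_gap = 0.60  # hypothetical: left hand stays ~60 cm from the pitch antenna
    left_part = plate_capacitance(0.01, left_gap)
    for right_gap_cm in (30, 15, 5):
        total = pitch_antenna_capacitance(right_gap_cm / 100, left_gap)
        print(f"right hand {right_gap_cm:2d} cm: total {total * 1e12:.3f} pF "
              f"(left-hand share {left_part * 1e12:.3f} pF, unchanged)")
```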
Theory of Operation
Note/Update: Please refer to FredM's Answer for a more detailed description of the oscillator.
Each antenna capacitance is part of its own active LC oscillator, and both oscillators are fairly complex circuits. The 'L' refers to inductors, which store energy in a magnetic field; the 'C' refers to capacitors, which store energy in an electric field. In an LC oscillator, energy constantly flows back and forth between the two, alternating between the capacitor's electric field and the inductor's magnetic field.
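The frequency of an ideal LC tank is f = 1/(2π√(LC)). The sketch below uses hypothetical component values (a 1 mH coil and 330 pF of fixed capacitance, which lands near 277 kHz) to show that a hand adding a picofarad or so shifts the frequency by a few hundred hertz. Real theremin oscillators are more complicated than an ideal tank.

```python
import math

def lc_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency f = 1 / (2*pi*sqrt(L*C)) of an ideal LC tank, in hertz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

if __name__ == "__main__":
    L = 1e-3          # hypothetical 1 mH coil
    C_tank = 330e-12  # hypothetical fixed tank capacitance
    for c_hand_pf in (0.0, 0.5, 1.0, 2.0):
        f = lc_frequency(L, C_tank + c_hand_pf * 1e-12)
        print(f"hand adds {c_hand_pf:.1f} pF -> {f / 1e3:.2f} kHz")
```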
The frequency of the pitch oscillator is above the audio range, so it can't be used directly. The theremin therefore has a third oscillator that runs at a fixed frequency. The outputs of the pitch oscillator and the fixed oscillator are fed into a heterodyne mixer, whose output contains the sum and difference frequencies of the two inputs. The sum frequency is even higher than either input, so it is useless and is filtered out. What remains is the difference frequency: a single tone (plus harmonics) in the audio range.
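A minimal sketch of the mixing step, assuming an ideal multiplying mixer: multiplying two sinusoids yields exactly the sum and difference frequencies, by the identity sin a · sin b = ½[cos(a−b) − cos(a+b)]. The oscillator frequencies are hypothetical; the code checks the identity numerically and prints which product is kept.

```python
import math

def mixer_products(f_variable_hz: float, f_fixed_hz: float) -> tuple[float, float]:
    """Frequencies at the output of an ideal multiplying mixer: (sum, difference)."""
    return f_variable_hz + f_fixed_hz, abs(f_variable_hz - f_fixed_hz)

def identity_error(f1: float, f2: float, t: float) -> float:
    """|sin(w1 t) sin(w2 t) - 0.5*[cos((w1-w2) t) - cos((w1+w2) t)]|, which should be ~0."""
    lhs = math.sin(2 * math.pi * f1 * t) * math.sin(2 * math.pi * f2 * t)
    rhs = 0.5 * (math.cos(2 * math.pi * (f1 - f2) * t) - math.cos(2 * math.pi * (f1 + f2) * t))
    return abs(lhs - rhs)

if __name__ == "__main__":
    f_fixed = 277_000.0  # hypothetical fixed oscillator
    for f_pitch in (277_000.0, 276_560.0, 275_000.0):  # hypothetical pitch-oscillator values
        total, diff = mixer_products(f_pitch, f_fixed)
        print(f"pitch osc {f_pitch / 1e3:.2f} kHz -> sum {total / 1e3:.1f} kHz (filtered out), "
              f"difference {diff:.0f} Hz (audio)")
```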
The frequency of the volume oscillator is used to control how much the audio signal is amplified. As you move your hand, the frequency changes, so the amplifier's gain changes, and thus the output volume changes.
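The text doesn't say how the frequency is turned into a gain. One way to do it, assumed here purely for illustration, is to feed the volume oscillator into a fixed tuned (band-pass) circuit, so that the output amplitude, and hence the gain, falls as the hand detunes the oscillator away from the circuit's centre frequency. The centre frequency and Q below are hypothetical.

```python
import math

def bandpass_gain(f_hz: float, f_center_hz: float, q: float) -> float:
    """Normalized magnitude of a second-order band-pass response:
    |H| = 1 / sqrt(1 + Q^2 * (f/f0 - f0/f)^2), which equals 1.0 at f == f0."""
    detune = f_hz / f_center_hz - f_center_hz / f_hz
    return 1.0 / math.sqrt(1.0 + (q * detune) ** 2)

if __name__ == "__main__":
    f0 = 450_000.0  # hypothetical centre frequency of the tuned circuit
    Q = 20.0        # hypothetical quality factor
    for f_vol in (450_000.0, 445_000.0, 435_000.0, 420_000.0):
        print(f"volume osc {f_vol / 1e3:.0f} kHz -> relative gain {bandpass_gain(f_vol, f0, Q):.2f}")
```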
The big question is: what distance do you want to cover? The data sheet of the transmitter quotes a maximum range of 50 metres [about 165 ft]. Will you use that, or will the receiver be closer?
Any oscillating signal will radiate to some extent: one of the main jobs of the USA's FCC is to limit the amount of annoying [or dangerous] EM radiation coming from devices. Depending on the range, and on the presence of bulky metallic objects between transmitter and receiver, Your Mileage May Vary.
Amateur radio enthusiasts, or groups like the ARRL, can teach you antenna theory and design in depth. For starters, a simple piece of wire about 20 cm long can act as a "whip" antenna: keep it clear of grounded cases and it should be enough to get you started. A second piece of the same length can act as your receive antenna. Start with the two circuits next to each other, make sure they work, then separate them. If they stop working before you reach the separation you want, THEN [and only then!] consider delving into antenna theory...
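The "about 20 cm" figure is in the same ballpark as a quarter wavelength, λ/4 = c/(4f), in the UHF bands used by small transmitter modules. The operating frequency is not stated; 433.92 MHz is an assumption based on the "yagi433" link below, and the other two frequencies are only for comparison.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def quarter_wave_length_m(freq_hz: float, shortening: float = 1.0) -> float:
    """Quarter-wave whip length lambda/4 = c / (4 f), optionally scaled by an
    empirical shortening factor (1.0 = ideal free-space value)."""
    return shortening * C / (4.0 * freq_hz)

if __name__ == "__main__":
    # 433.92 MHz is assumed for this module; 315 and 868 MHz are shown for comparison.
    for f_mhz in (315.0, 433.92, 868.0):
        print(f"{f_mhz:7.2f} MHz -> quarter wave {quarter_wave_length_m(f_mhz * 1e6) * 100:.1f} cm")
```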
If you need more, I'd suggest TI's Application Note #AN058: http://www.ti.com/lit/an/swra161b/swra161b.pdf
And: http://www.picaxe.orconhosting.net.nz/yagi433.jpg
This is a mechanical simplification.
Imagine you are holding a stick 1 metre long, made from a flexible material such as er... rubber or even thin metal. If you moved the stick slowly with your hand, it would barely flex at all - the end of the stick would follow your hand movement almost perfectly and hardly any bending force would be transmitted down its length.
The above describes an antenna/wire where the frequency from the transmitter is much too low.
Now consider the situation where you could "shake" (or oscillate) that stick really rapidly - your hand would be moving quickly from point A to point B and back to point A (and repeating), but the end of the stick would be virtually at a standstill.
The above describes the scenario where the transmitter frequency is too high. (The experts amongst us will of course point out that higher-order modes of oscillation may occur, but for now bear with this simplification.)
And now, the goldilocks frequency. At the right frequency you can make the end of the stick move a lot with hardly any movement of the hand - the only constraint is that your hand must oscillate at (very nearly) the frequency of the stick's own basic mechanical oscillation, i.e. its resonant frequency.
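The three stick scenarios can be reproduced with a single-mode spring-mass model driven at its base (the hand), which is exactly the simplification the answer asks us to accept. The ratio of tip movement to hand movement is the standard base-excitation transmissibility; the damping ratio below is a hypothetical value.

```python
import math

def tip_to_hand_ratio(freq_ratio: float, damping_ratio: float) -> float:
    """Steady-state amplitude ratio (tip / hand) for a single-mode, base-driven
    spring-mass model: T(r) = sqrt(1 + (2 z r)^2) / sqrt((1 - r^2)^2 + (2 z r)^2),
    where r = drive frequency / natural frequency and z (> 0) is the damping ratio."""
    r, z = freq_ratio, damping_ratio
    return math.sqrt(1 + (2 * z * r) ** 2) / math.sqrt((1 - r * r) ** 2 + (2 * z * r) ** 2)

if __name__ == "__main__":
    zeta = 0.02  # hypothetical light damping for a springy stick
    cases = ((0.05, "slow shake (too low)"),
             (1.0, "resonance (goldilocks)"),
             (10.0, "fast shake (too high)"))
    for r, label in cases:
        print(f"r = {r:5.2f}  {label:24s} tip/hand = {tip_to_hand_ratio(r, zeta):.3f}")
```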
This is what happens with an antenna. With the stick at mechanical resonance (see also cantilever for the maths), significant forces are transmitted through the hand even though its movement is quite small, while the end of the stick moves a lot. In your mind, relate the forces to current flow and the movement to voltage, and you have an antenna.
What makes this happen in the stick is the energy input from your hand, plus mass and springiness. What makes this happen in an antenna wire is the energy input (the transmitter feed), plus capacitance and inductance.
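Mapping force to current and movement (velocity) to voltage is known as the mobility, or force-current, analogy. Under it, mass corresponds to capacitance and spring compliance (1/stiffness) to inductance, so the spring-mass resonance √(k/m)/(2π) and the LC resonance 1/(2π√(LC)) coincide. The mass and stiffness below are hypothetical.

```python
import math

def mechanical_resonance_hz(mass_kg: float, stiffness_n_per_m: float) -> float:
    """Natural frequency of a spring-mass system: f = sqrt(k/m) / (2*pi)."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2 * math.pi)

def electrical_resonance_hz(inductance_h: float, capacitance_f: float) -> float:
    """Natural frequency of an LC tank: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))

def mobility_analogue(mass_kg: float, stiffness_n_per_m: float) -> tuple[float, float]:
    """Force<->current, velocity<->voltage analogy: returns (L, C) with
    L = 1/k (compliance) and C = m (mass), numerically equal values, different units."""
    return 1.0 / stiffness_n_per_m, mass_kg

if __name__ == "__main__":
    m, k = 0.05, 200.0  # hypothetical effective mass and stiffness of the stick
    L, C = mobility_analogue(m, k)
    print(f"stick: {mechanical_resonance_hz(m, k):.3f} Hz")
    print(f"analogue tank (L={L:.4g} H, C={C:.4g} F): {electrical_resonance_hz(L, C):.3f} Hz")
```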
So the "resonant stick" takes a significant force from the hand with very little movement - this is like the 50 ohm feed point of an antenna, which is quite low impedance. At the end of the stick there is movement amplification, but the force there is small. The stick converts one impedance into another and, for an antenna, hopefully matches the feed impedance to the impedance of free space (about 377 ohms).
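The 377 ohms figure is the impedance of free space, Z0 = √(μ0/ε0) ≈ 376.73 Ω. The sketch computes it from the CODATA 2018 constants and compares it with a 50 ohm feed; the ratio only hints at the size of the impedance transformation described above and is not a model of a real antenna.

```python
import math

MU0 = 1.25663706212e-6   # vacuum permeability, H/m (CODATA 2018)
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m (CODATA 2018)

def free_space_impedance() -> float:
    """Characteristic impedance of free space, Z0 = sqrt(mu0 / eps0), in ohms."""
    return math.sqrt(MU0 / EPS0)

if __name__ == "__main__":
    z0 = free_space_impedance()
    feed = 50.0  # the 50 ohm feed impedance mentioned in the text
    print(f"Z0 = {z0:.2f} ohm, about {z0 / feed:.1f}x a {feed:.0f} ohm feed")
```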
I hope this helps.