The fact that theremins use heterodyne mixers has nothing to do with RF. The 'antennae' are not antennae in the classical, RF sense. The capacitance explanation is correct.
Capacitors and Theremin 'Antennae'
The simplest type of capacitor is a parallel-plate capacitor: two metal plates separated by an insulating material called the dielectric. The capacitance of such a capacitor is C=εA/d, where A is the area of each plate, d is the distance between them, and ε is the permittivity of the dielectric (ε≈8.854×10^−12 F/m for air, essentially the same as the permittivity of free space).
When you are playing a theremin, your hand, which is effectively grounded, is one plate, the antenna is the other, and the air between them is the dielectric. As you move your hand, you change d and so vary the capacitance between the antenna and ground. Both hands affect both antennae, because they act like two plates in parallel, increasing the total area.
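To get a feel for the magnitudes involved, here is a rough Python sketch that plugs numbers into C=εA/d. It assumes, purely for illustration, that the hand behaves like a flat plate of about 0.01 m² (10 cm × 10 cm) facing the antenna. A real hand and antenna are nowhere near an ideal parallel-plate geometry, so treat the results as order-of-magnitude only.

```python
# Rough parallel-plate estimate of hand-to-antenna capacitance, C = eps * A / d.
# Assumption (hypothetical): the hand acts like a flat 10 cm x 10 cm plate.

EPS_AIR = 8.854e-12  # F/m; air's permittivity is within about 0.06% of vacuum's
HAND_AREA_M2 = 0.01  # hypothetical effective hand area, m^2


def plate_capacitance(area_m2: float, distance_m: float, eps: float = EPS_AIR) -> float:
    """Capacitance in farads of an ideal parallel-plate capacitor."""
    return eps * area_m2 / distance_m


for d_cm in (5, 10, 20, 40):
    c = plate_capacitance(HAND_AREA_M2, d_cm / 100)
    print(f"hand at {d_cm:2d} cm: {c * 1e12:.2f} pF")
```

Under these assumptions the capacitance runs from roughly 1.8 pF at 5 cm down to about 0.2 pF at 40 cm. Because C ∝ 1/d, halving the distance doubles the capacitance, so the change per centimetre is largest close to the antenna.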
The two antennae are at right angles because that reduces the effect your left hand has on the right antenna and vice versa. For example, as you move your hand up and down above the volume antenna, it stays at a roughly constant distance from the pitch antenna, so its contribution to the pitch antenna's capacitance is constant (and small).
Theory of Operation
Note/Update: Please refer to FredM's Answer for a more detailed description of the oscillator.
Each antenna capacitance forms part of a separate, fairly complex active LC oscillator. The 'L' refers to inductors, which store energy in a magnetic field; the 'C' refers to capacitors, which store energy in an electric field. In an LC oscillator, energy constantly flows back and forth between the two, alternating between the capacitor's electric field and the inductor's magnetic field.
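The energy exchange in an LC tank can be illustrated with a small numerical model of an ideal, lossless circuit. This Python sketch is only an illustration: the component values (1 mH, 1 nF, 1 V starting voltage) are hypothetical and not taken from any real theremin, and it steps the circuit equations with a simple semi-implicit Euler method rather than simulating an actual active oscillator.

```python
import math


def resonant_frequency(L: float, C: float) -> float:
    """Natural frequency of an ideal LC tank in Hz: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(L * C))


def simulate_lc(L: float, C: float, v0: float, dt: float, t_end: float):
    """Step a lossless LC tank with semi-implicit Euler.

    Returns a list of (time, capacitor energy, inductor energy) samples.
    The capacitor starts charged to v0 volts with no inductor current.
    """
    q = C * v0  # charge on the capacitor, coulombs
    i = 0.0     # current through the inductor, amperes
    samples = []
    t = 0.0
    while t < t_end:
        e_c = q * q / (2.0 * C)  # energy in the capacitor's electric field
        e_l = 0.5 * L * i * i    # energy in the inductor's magnetic field
        samples.append((t, e_c, e_l))
        i -= q / (L * C) * dt    # di/dt = -q / (L*C) with this sign convention
        q += i * dt              # dq/dt = i, using the updated current
        t += dt
    return samples


L_H, C_F = 1e-3, 1e-9              # hypothetical component values
f0 = resonant_frequency(L_H, C_F)  # about 159 kHz for these values
samples = simulate_lc(L_H, C_F, v0=1.0, dt=1e-9, t_end=1.0 / f0)
e_total = 0.5 * C_F * 1.0 ** 2     # initial energy, all in the capacitor
peak_inductor_share = max(e_l for _, _, e_l in samples) / e_total
print(f"f0 = {f0 / 1e3:.1f} kHz, peak inductor share of energy = {peak_inductor_share:.3f}")
```

Over one period the energy moves from the capacitor into the inductor and back while the total stays nearly constant. In a real oscillator, the active circuitry replaces the energy lost to resistance on each cycle.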
The pitch oscillator runs well above audio frequencies, so its output can't be used directly. The theremin therefore has a third oscillator that runs at a fixed frequency. The outputs of the pitch oscillator and the fixed oscillator are fed into a heterodyne mixer, whose output contains the sum and difference of the two input frequencies. The sum frequency is even higher than either input, so it is useless and is filtered out. What remains is the difference frequency (plus harmonics), which falls in the audio range.
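The sum-and-difference behaviour follows from the identity cos a · cos b = ½[cos(a−b) + cos(a+b)] for an ideal multiplier. The Python sketch below checks this numerically. Because sampling RF signals directly would need very large arrays, it uses scaled-down, hypothetical stand-in frequencies (1030 Hz for the variable oscillator, 1000 Hz for the fixed one), and it models the mixer as a perfect multiplier, which real heterodyne mixers only approximate.

```python
import cmath
import math


def mix(f1_hz: float, f2_hz: float, fs_hz: float, n: int) -> list[float]:
    """Ideal multiplier: sample-by-sample product of two unit-amplitude cosines."""
    return [
        math.cos(2 * math.pi * f1_hz * k / fs_hz) * math.cos(2 * math.pi * f2_hz * k / fs_hz)
        for k in range(n)
    ]


def dft_magnitude(x: list[float], freq_hz: float, fs_hz: float) -> float:
    """Magnitude of the DFT of x evaluated at freq_hz.

    Only meaningful when freq_hz lands exactly on a DFT bin (spacing fs_hz / len(x)).
    """
    return abs(sum(v * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz) for k, v in enumerate(x)))


FS = 8000          # sample rate, Hz
N = 8000           # one second of samples, so bins are spaced 1 Hz apart
F_VARIABLE = 1030  # stand-in for the pitch oscillator (hypothetical)
F_FIXED = 1000     # stand-in for the fixed oscillator (hypothetical)

x = mix(F_VARIABLE, F_FIXED, FS, N)
for f in (F_FIXED, F_VARIABLE, F_VARIABLE - F_FIXED, F_VARIABLE + F_FIXED):
    print(f"{f:5d} Hz: {dft_magnitude(x, f, FS):8.1f}")
```

The product has energy only at 30 Hz (the difference) and 2030 Hz (the sum), with nothing left at either input frequency; a low-pass filter after the mixer would keep just the 30 Hz component.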
The frequency of the volume oscillator is used to control how much the audio signal is amplified. As you move your hand, the frequency changes, so the amplifier's gain changes, and thus the output volume changes.
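How the frequency is turned into a gain isn't spelled out here, so the following Python sketch uses a deliberately simple, hypothetical linear mapping just to show the idea: a frequency between two calibration points is mapped onto a gain between 0 and 1. The calibration frequencies and the direction of the mapping (lower frequency, meaning more hand capacitance, gives less gain) are assumptions, not values from any real instrument.

```python
def volume_gain(f_hz: float, f_silent_hz: float, f_full_hz: float) -> float:
    """Hypothetical linear map from volume-oscillator frequency to gain in [0, 1].

    f_silent_hz maps to gain 0 and f_full_hz maps to gain 1; frequencies
    outside that span are clamped.
    """
    g = (f_hz - f_silent_hz) / (f_full_hz - f_silent_hz)
    return min(1.0, max(0.0, g))


# Hypothetical calibration: hand close (more capacitance, lower frequency) = silent.
F_SILENT, F_FULL = 440e3, 460e3

for f in (430e3, 440e3, 450e3, 460e3, 470e3):
    print(f"{f / 1e3:.0f} kHz -> gain {volume_gain(f, F_SILENT, F_FULL):.2f}")
```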
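The capacitance between two parallel plates is:
$$C = \frac{\varepsilon_0 A}{d}$$
in which \$d\$ is the distance between the plates, \$A\$ is the area of each plate and \$\varepsilon_0\$ is the permittivity of free space (the Coulomb constant is a different quantity, \$k = 1/(4\pi\varepsilon_0)\$):
$$\varepsilon_0 \approx 8.85 \times 10^{-12}\text{ F/m}$$
Distance from the Earth to the Moon (rounded up from the mean of about \$3.84 \times 10^8\$ m):
$$d \approx 4 \times 10^8\text{ m}$$
Approximate equivalent plate area, taking a square whose side equals the Earth's diameter (about \$1.28 \times 10^7\$ m):
$$A = (1.28 \times 10^7\text{ m})^2 \approx 1.64 \times 10^{14}\text{ m}^2$$
Therefore,
$$C = \frac{8.85 \times 10^{-12} \times 1.64 \times 10^{14}}{4 \times 10^8} \approx 3.63 \times 10^{-6}\text{ F} \approx 3.6\ \mu\text{F}$$
The numbers are rounded to three significant figures. Note that the parallel-plate formula assumes the plate separation is small compared with the plate size, which is far from true here, so this is only a rough illustration of the formula.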
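Unit slips are easy to make in estimates like this, so it can help to redo the arithmetic with every quantity in SI units (metres, square metres, F/m). This Python sketch assumes a square-plate model whose side equals the Earth's diameter, taken as about 1.28×10^7 m, and a rounded Earth-Moon distance of 4×10^8 m.

```python
EPS0 = 8.85e-12            # permittivity of free space, F/m
EARTH_DIAMETER_M = 1.28e7  # about 12 800 km
EARTH_MOON_M = 4e8         # rounded Earth-Moon distance, m

area_m2 = EARTH_DIAMETER_M ** 2  # square "plate", about 1.64e14 m^2
capacitance_f = EPS0 * area_m2 / EARTH_MOON_M
print(f"C = {capacitance_f:.3g} F = {capacitance_f * 1e6:.2f} uF")
```

With the area in square metres the result is roughly 3.6 µF.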
Probably not. Heterodyning is an important part of how a Theremin works.
Keep in mind that the capacitance change caused by waving your hand near the antenna is a tiny fraction of the fixed/parasitic capacitance of the circuit overall, on the order of 0.2%.
The period of a 555 timer is directly proportional to capacitance, so a 0.2% change in capacitance results in a 0.2% change in period (or frequency). This would be difficult to detect and/or convert to a control voltage. If your 555 has a nominal period of 1 ms, you'll need to detect changes that cover a span of just 2 µs.
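The 555 arithmetic can be checked with the standard astable timing formula, T = ln 2 · (R1 + 2·R2) · C ≈ 0.693 · (R1 + 2·R2) · C. In this Python sketch the component values (R1 = 10 kΩ, R2 = 67 kΩ, C = 10 nF, giving a period just under 1 ms) are hypothetical, chosen only to match the 1 ms example, and the whole timing capacitance is assumed to grow by 0.2%.

```python
import math


def astable_555_period(r1_ohm: float, r2_ohm: float, c_farad: float) -> float:
    """Period of a 555 in the standard astable configuration: ln(2) * (R1 + 2*R2) * C."""
    return math.log(2) * (r1_ohm + 2 * r2_ohm) * c_farad


R1, R2, C_NOMINAL = 10e3, 67e3, 10e-9  # hypothetical values; T is just under 1 ms

t_nominal = astable_555_period(R1, R2, C_NOMINAL)
t_hand = astable_555_period(R1, R2, C_NOMINAL * 1.002)  # 0.2% more capacitance
print(f"nominal period: {t_nominal * 1e3:.4f} ms")
print(f"period shift:   {(t_hand - t_nominal) * 1e6:.3f} us")
```

The shift comes out at about 2 µs out of roughly 1 ms, which is the resolution problem described above.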
On the other hand, if you have a 1 MHz LC-tuned RF oscillator, whose frequency is proportional to 1/√C, a 0.2% change in capacitance will shift its frequency by about 0.1%, or 1000 Hz. If you then mix this oscillator with a second oscillator that's fixed at 1 MHz, you get a beat frequency that varies from 0 to 1000 Hz, a range that's much easier to work with.
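The LC figures can be checked numerically. Since f ∝ 1/√C, a fractional capacitance increase ΔC/C gives f = f0/√(1 + ΔC/C). This Python sketch assumes an ideal mixer followed by a low-pass filter, so the output is simply the difference between the two oscillator frequencies; the list of ΔC/C steps is illustrative.

```python
import math


def shifted_lc_frequency(f0_hz: float, rel_dc: float) -> float:
    """LC oscillator frequency after its capacitance grows by the fraction rel_dc.

    Uses f proportional to 1/sqrt(C), so f = f0 / sqrt(1 + rel_dc).
    """
    return f0_hz / math.sqrt(1.0 + rel_dc)


def beat_frequency(f_a_hz: float, f_b_hz: float) -> float:
    """Difference frequency left after ideal mixing and low-pass filtering."""
    return abs(f_a_hz - f_b_hz)


F_FIXED = 1e6  # fixed reference oscillator, Hz
for rel_dc in (0.0, 0.0005, 0.001, 0.002):  # hand moving closer to the antenna
    f_var = shifted_lc_frequency(1e6, rel_dc)
    print(f"dC/C = {rel_dc:.2%}: beat = {beat_frequency(F_FIXED, f_var):7.1f} Hz")
```

At a 0.2% capacitance change the exact beat is roughly 998.5 Hz, in line with the approximate 1000 Hz figure.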