Smaller packages tend to be more expensive for the same spec. There are also often differences in the frequency ranges, tempco, accuracy, etc. available in different sizes.
Most likely this is a crystal intended for parallel resonant drive, and Cl is therefore the total load capacitance.
If the crystal were driven with a zero-impedance sine wave, this is the capacitance on the other side that would result in the desired phase shift at the rated frequency. Parallel resonant drive circuits rely on this phase shift to make the feedback positive, so that the loop gain can exceed 1 and the circuit oscillates. This load capacitance is specified for crystals intended for parallel resonant applications. For normal crystals in the 1-25 MHz range, it's usually around 16-22 pF. For special low frequency watch crystals it is typically less, like 8-12 pF.
What the equation is trying to tell you is to split up this load capacitance equally on each side of the crystal, but to consider stray capacitance in the total. Stray capacitance is the unavoidable capacitance between nearby conductors or on the microcontroller pins to ground. It is impossible to know precisely, but 3-8 pF is a good guess, assuming reasonable layout.
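The equation itself isn't reproduced here, so as a rough sketch assume it is the common textbook form, C_load = (C1 * C2) / (C1 + C2) + C_stray, with equal capacitors on both pins. The function name and the 18 pF load figure below are only illustrative assumptions:

```python
def per_side_capacitor(c_load_pf, c_stray_pf):
    """Capacitor to ground on each crystal pin, in pF, for two equal caps.

    Assumed model (not taken from a datasheet):
        C_load = (C1 * C2) / (C1 + C2) + C_stray
    With C1 == C2 == C this rearranges to C = 2 * (C_load - C_stray).
    """
    if c_stray_pf >= c_load_pf:
        raise ValueError("stray capacitance already meets or exceeds the load spec")
    return 2.0 * (c_load_pf - c_stray_pf)


# Sweep the 3-8 pF stray guess for a hypothetical 18 pF load spec.
for stray_pf in (3.0, 5.0, 8.0):
    cap_pf = per_side_capacitor(18.0, stray_pf)
    print(f"stray {stray_pf:.0f} pF -> {cap_pf:.0f} pF per side")
```

The spread in the result shows how much the stray guess matters: the same crystal calls for anything from about 20 pF to 30 pF per side depending on what you assume for the layout.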
While we see this very simplistic view of crystal load capacitance a lot, it is not all that good a model. This is because it ignores the output impedance of the circuit driving the signal into the crystal. At your frequency, an 18 pF capacitor would have an impedance of 620 Ω. The drive circuit can easily be substantially less than that, especially a drive circuit intended for that frequency. Think of the limiting case where the drive circuit has 0 impedance. Any capacitance added to that side of the crystal would be irrelevant.
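For reference, the impedance figure comes from the usual capacitor reactance formula, |Z| = 1 / (2πfC). The sketch below uses hypothetical helper names, and since the crystal frequency isn't restated here, it works backwards from the 620 Ω and 18 pF figures to see what frequency they imply:

```python
import math


def capacitor_reactance_ohms(freq_hz, cap_farads):
    """Magnitude of an ideal capacitor's impedance: 1 / (2 * pi * f * C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * cap_farads)


def frequency_for_reactance_hz(reactance_ohms, cap_farads):
    """Frequency at which an ideal capacitor has the given reactance."""
    return 1.0 / (2.0 * math.pi * reactance_ohms * cap_farads)


# Back out the frequency implied by 620 ohms across 18 pF.
implied_hz = frequency_for_reactance_hz(620.0, 18e-12)
print(f"implied frequency: {implied_hz / 1e6:.1f} MHz")

# The same capacitor at a few other frequencies, for comparison.
for freq_mhz in (4.0, 8.0, 16.0):
    z_ohms = capacitor_reactance_ohms(freq_mhz * 1e6, 18e-12)
    print(f"{freq_mhz:.0f} MHz -> {z_ohms:.0f} ohms")
```

The implied frequency works out to a bit over 14 MHz. Whatever the exact value, the point is that a few hundred ohms is not large compared to the output impedance of a typical logic gate.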
When the drive circuit has high impedance, then the total capacitance seen by the crystal across its leads is the series combination of the capacitance on each lead to ground. That's where the 2 in your formula comes from. If both these capacitances are equal, then their series combination will be half of the individual values.
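The series combination is easy to check numerically. This is only a sketch of the high-impedance-drive case described above; the function name is made up, and the capacitor values and the 5 pF stray figure are example assumptions:

```python
def series_capacitance(c1, c2):
    """Series combination of two capacitances (any consistent unit)."""
    return (c1 * c2) / (c1 + c2)


# Two equal capacitors in series give half of either one.
equal_pf = series_capacitance(22.0, 22.0)
print(f"22 pF + 22 pF in series: {equal_pf:.1f} pF")

# Unequal capacitors: the result is always below the smaller of the two.
unequal_pf = series_capacitance(15.0, 33.0)
print(f"15 pF + 33 pF in series: {unequal_pf:.2f} pF")

# Adding an assumed 5 pF of stray capacitance across the crystal.
assumed_stray_pf = 5.0
print(f"load seen by crystal: {equal_pf + assumed_stray_pf:.1f} pF")
```

With two 22 pF capacitors and a few pF of stray, the crystal sees something in the mid to high teens of pF, which is in the neighborhood of what typical MHz-range crystals are specified for.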
In summary, this equation is naive and simplistic. However, the capacitive load spec of a crystal is reasonably forgiving (one advantage of the parallel drive method), so it works well enough most of the time. If in doubt, put 22 pF to ground on each side of the crystal, and it will most likely resonate nicely at very close to the rated frequency. Unless you need it to be super accurate, there is little reason to get into more detail.
Sounds like an interesting project.
All the microcontroller datasheets I've read so far have the same crystal oscillator configuration -- a single inverter in the Pierce oscillator configuration.
It will make no difference to the precision of a Pierce oscillator whether you use such a microcontroller or an SSI discrete logic chip to drive the crystal -- it's a simple inverter either way.
While most of the Pierce oscillators I've seen use exactly 2 capacitors, one per crystal pin, some people insist that the right way to build a Pierce oscillator is with 4 capacitors, one to ground and one to VCC at each crystal pin.
Sometimes I wonder if a "gentle" sine-wave oscillator such as a Wien bridge oscillator would be better at driving a crystal than the digital on/off of a Pierce oscillator. Perhaps you could build a couple of each kind of circuit and compare them.
Wikipedia claims that thermal noise influences the stability of crystal oscillators. Would putting one of your crystals in a little Peltier cooler at a constant cold temperature work any better than the more common approach, putting the crystal in a little oven at a constant hot temperature?
The Spark Fun Wall Clock has a few pointers on getting a clock to read GPS time.