Autocorrelation is one way to help find the dominant frequency of a signal, but I don't see what an FFT has to do with that. The autocorrelation will produce peaks with the period of any strong frequency components. If you then take the FFT of that to find the frequency of those peaks, you might as well take the FFT of the original signal in the first place.
Instead of showing us code, show us the data at various stages of your process. The details of the code are your issue, and are separate from the conceptual processes of going through the various convolutions, filters, or whatever.
You say your signal only has a single pitch, meaning it's a pure sine wave. In that case, I really don't see the advantage of an autocorrelation pass. You can find the period directly by looking at the time between zero crossings.
In the past I've had to find the fundamental frequency of mostly repeating signals with significant noise on them. What I usually did was apply several stages of low-pass filtering. One big advantage of digital filtering is that a smaller signal out of a filter doesn't mean a lower signal-to-noise ratio, as long as you keep adding the necessary bits at the low end. Using floating point, for example, does this automatically. You can then aggressively low-pass filter a signal such that it would be only µV in analog, but still have the same meaningful bits left at the end.
Each LPF pass attenuates the harmonics relative to the fundamental. After enough passes, you are left with mostly the fundamental. Once you have attenuated the harmonics enough to guarantee only two zero crossings per cycle, you measure the zero crossing period, perhaps apply a little low-pass filtering to successive period measurements, and infer the frequency from there.
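A minimal sketch of that filter-then-count-crossings approach, in plain Python. The single-pole filter, the cutoff, the number of passes, and all function names here are assumptions chosen for illustration, not the code I actually used; averaging the period over all crossings stands in for the "little low-pass filtering" of successive periods.

```python
import math

def one_pole_lowpass(samples, alpha):
    """Single-pole IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out = []
    y = samples[0]  # start at the first sample to shorten the startup transient
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def rising_zero_crossings(samples, sample_rate):
    """Times (seconds) of negative-to-positive crossings, linearly interpolated."""
    times = []
    for n in range(1, len(samples)):
        a, b = samples[n - 1], samples[n]
        if a < 0.0 <= b:
            frac = -a / (b - a)  # fraction of a sample past index n-1
            times.append((n - 1 + frac) / sample_rate)
    return times

def estimate_fundamental(samples, sample_rate, cutoff_hz, passes=4):
    """Apply several low-pass passes, then average the zero-crossing period."""
    # Assumed mapping from cutoff frequency to the one-pole coefficient.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    filtered = samples
    for _ in range(passes):
        filtered = one_pole_lowpass(filtered, alpha)
    # Discard the first quarter so the filter transient has died out.
    settled = filtered[len(filtered) // 4:]
    crossings = rising_zero_crossings(settled, sample_rate)
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 1.0 / mean_period
```

The cutoff is deliberately put well below the fundamental. The output gets tiny, but in floating point that costs nothing, and each pass knocks the harmonics down further relative to the fundamental.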
Added:
Now that you have provided some data we can see what is really going on:

It looks like you have about a 440 Hz signal, but this is clearly far from a "single pitch" as the shape is far from a sine. Just from inspection we can see that the second harmonic is particularly strong. It may be so strong that this "note" is perceived to be 880 Hz instead of the fundamental of 440 Hz.
In this case, what is it you want the answer to be, 440 Hz or 880 Hz? With enough low pass filtering, eventually you get mostly the fundamental and measuring 440 Hz shouldn't be that hard. If you want the answer to be the possibly perceptual tone of 880 Hz, then things get a lot more complicated. One possibility would be to identify the fundamental in all cases. Once you have that, it's easy to find the relative amplitude of the first few harmonics. Then you can decide based on the strength of those harmonics whether you want to report one of them or the fundamental.
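Once the fundamental is known, measuring the harmonics could look something like the sketch below. It correlates the signal with a sine and cosine at each harmonic (a single-bin DFT) over a whole number of fundamental periods. The function names are hypothetical, and "report the strongest of the first few harmonics" is just one possible decision rule, assumed here for illustration.

```python
import math

def harmonic_amplitudes(samples, sample_rate, fundamental_hz, count=3):
    """Return the amplitudes of harmonics 1..count of fundamental_hz."""
    # Use a whole number of fundamental periods to limit spectral leakage.
    cycles = math.floor(len(samples) * fundamental_hz / sample_rate)
    if cycles < 1:
        raise ValueError("need at least one full period of the fundamental")
    n_used = min(len(samples), round(cycles * sample_rate / fundamental_hz))
    amplitudes = []
    for k in range(1, count + 1):
        step = 2.0 * math.pi * k * fundamental_hz / sample_rate
        re = sum(samples[n] * math.cos(step * n) for n in range(n_used))
        im = sum(samples[n] * math.sin(step * n) for n in range(n_used))
        amplitudes.append(2.0 * math.hypot(re, im) / n_used)
    return amplitudes

def strongest_harmonic_frequency(fundamental_hz, amplitudes):
    """Assumed rule: report whichever measured harmonic is strongest."""
    k = max(range(len(amplitudes)), key=lambda i: amplitudes[i]) + 1
    return k * fundamental_hz
```

Whether the strongest harmonic is really what a listener perceives is a separate question; this only gives you the numbers to base that decision on.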
With FM you get much more than two sideband frequencies. The sidebands you describe sound more like AM (amplitude modulation) sidebands. You may want to supply a link to where you got this information, or re-check it.
Next, in your final paragraph you might be getting confused. At any one moment in time the FM signal you describe will have a frequency that is somewhere between two limits. The distance of those limits from the centre frequency is called the deviation: it is how far the carrier deviates (+/-) from its nominal centre frequency.
That deviation is nothing to do with the modulating signal's frequency but has everything to do with the modulating signal's amplitude. The bigger the peak amplitude the bigger the deviation from the nominal centre frequency.
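To make that concrete, here is a small sketch assuming the usual textbook model in which the instantaneous frequency is the carrier plus a sensitivity constant times the modulating signal. The sensitivity parameter and the function names are assumptions for illustration only.

```python
import math

def instantaneous_frequency(t, carrier_hz, sensitivity_hz_per_unit, mod_amplitude, mod_hz):
    """Assumed FM model: f(t) = f_c + k * m(t), with m(t) a cosine tone."""
    m = mod_amplitude * math.cos(2.0 * math.pi * mod_hz * t)
    return carrier_hz + sensitivity_hz_per_unit * m

def peak_deviation(carrier_hz, sensitivity_hz_per_unit, mod_amplitude, mod_hz, steps=1000):
    """Largest excursion from the carrier over one period of the modulating tone."""
    period = 1.0 / mod_hz
    return max(
        abs(instantaneous_frequency(i * period / steps, carrier_hz,
                                    sensitivity_hz_per_unit, mod_amplitude, mod_hz)
            - carrier_hz)
        for i in range(steps)
    )
```

Calling `peak_deviation` with the same amplitude but different `mod_hz` values returns the same deviation, while doubling `mod_amplitude` doubles it.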
If the PLL is designed correctly, and its low-pass filter permits the VCO to track the carrier within the deviation limits, the low-pass filtered signal (that feeds the VCO) will represent the signal that caused the original modulation i.e. the PLL is an FM demodulator.
You also mention "keeps re-locking" - this is not something that should happen in this type of circuit - the PLL will remain locked to the modulated carrier. There will be a slight error in the instantaneous lock, because you need an error to drive the mechanism that tries to maintain lock, but this error will be small and is not regarded as the PLL losing lock.
You might also be getting confused with what happens when the spectral content of FM is analyzed. Yes it has several components of frequency but these do not occur together at any one instant - the spectral content is a time averaged evaluation of what the carrier is doing - moving about following the amplitude of the input modulating signal.
Best Answer
You may think that the bandwidth of an FM signal is \$2\Delta f\$, where \$\Delta f\$ is the frequency deviation (the maximum difference between the instantaneous frequency \$f(t)\$ and the carrier frequency \$f_c\$).
However, the frequency of a signal cannot be changed in an instant, therefore when frequency modulating a carrier, you will introduce additional frequencies below \$f_c-\Delta f\$ and above \$f_c+\Delta f\$.
The bandwidth of a frequency modulated signal is theoretically infinite, but it can be approximated with the help of Carson's bandwidth rule (http://en.wikipedia.org/wiki/Carson_bandwidth_rule). It's an approximation based on the highest frequency in the modulating signal (\$f_m\$) and the frequency deviation. In your case, the deviation is 1 Hz and \$f_m\$ is 4 Hz, so the approximated bandwidth around the carrier is 10 Hz.
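As a worked check of that figure, Carson's rule in its usual form (the formula is not quoted above, so it is taken here from the linked article) gives:

\$\$B \approx 2(\Delta f + f_m) = 2(1\text{ Hz} + 4\text{ Hz}) = 10\text{ Hz}\$\$

That is roughly \$f_c \pm 5\$ Hz, which is considerably wider than the \$2\Delta f = 2\$ Hz you would expect from the deviation alone.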