The "Goertzel algorithm" requires fewer resources (RAM, code space) and fewer CPU cycles than an FFT when you only need to detect a single specific frequency.
Wikipedia has a good set of links to delve through.
I have used Goertzel on ATMega328P and P8X32A with great results. Good luck with your project.
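For reference, here is a minimal sketch of the Goertzel recurrence in Python. It is not the code I ran on those chips; the function name and the choice to round the target frequency to the nearest DFT bin are my own assumptions for illustration. It only needs two state variables and one multiply per sample, which is why it fits on small microcontrollers.

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Squared magnitude of `samples` at the DFT bin nearest to target_freq.

    Hypothetical helper for illustration; rounding to the nearest bin
    is an assumption, not a requirement of the algorithm.
    """
    n = len(samples)
    k = round(n * target_freq / sample_rate)  # nearest DFT bin index
    omega = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(omega)

    s_prev = 0.0   # s[n-1]
    s_prev2 = 0.0  # s[n-2]
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s

    # |X[k]|^2 computed from the last two state values
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


# Usage: compare the power at the frequency of interest with another one.
fs = 8000
tone = [math.sin(2 * math.pi * 440 * i / fs) for i in range(200)]
print(goertzel_power(tone, fs, 440), goertzel_power(tone, fs, 1000))
```

In practice you would compare the returned power against a threshold, or against the power at a few neighbouring frequencies.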
Autocorrelation is one way to help find the dominant frequency of a signal, but I don't see what an FFT has to do with that. The autocorrelation will produce peaks at lags equal to the period of any strong frequency components. If you then take the FFT of that to find the frequency of those peaks, you might as well take the FFT of the original signal in the first place.
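To make the autocorrelation idea concrete, here is a rough sketch that searches for the lag with the largest autocorrelation and reads the period straight from it, with no FFT involved. The function name and the lag search range are assumptions of mine; a real implementation would pick the range from the expected pitch range.

```python
import math

def autocorr_period(samples, min_lag, max_lag):
    """Lag (in samples) within [min_lag, max_lag] with the largest autocorrelation.

    Hypothetical helper: a plain O(N * lags) time-domain sum.
    """
    n = len(samples)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        acc = 0.0
        for i in range(n - lag):
            acc += samples[i] * samples[i + lag]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


# Usage: 400 Hz fundamental with a strong second harmonic, sampled at 8 kHz.
fs = 8000
sig = [math.sin(2 * math.pi * 400 * i / fs)
       + 0.8 * math.sin(2 * math.pi * 800 * i / fs) for i in range(400)]
lag = autocorr_period(sig, 5, 50)
print("period:", lag, "samples, frequency:", fs / lag, "Hz")
```

Because the sum runs over fewer terms at larger lags, the first period tends to win over its multiples here, but that is a property of this unnormalized sketch and not something to rely on with noisy data.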
Instead of showing us code, show us the data at various stages of your process. The details of the code are your issue, and are separate from the conceptual processes of going through the various convolutions, filters, or whatever.
You say your signal only has a single pitch, meaning it's a pure sine wave. In that case, I really don't see the advantage of an autocorrelation pass. You can find the period directly by looking at the time between zero crossings.
In the past I've had to find the fundamental frequency of mostly repeating signals with significant noise on them. What I usually did was apply several stages of low pass filtering. One big advantage of digital filtering is that a smaller signal out of a filter doesn't mean a lower signal to noise ratio, as long as you keep adding the necessary bits at the low end. Using floating point, for example, does this automatically. You can then aggressively low pass filter a signal such that it would be only µV in analog, but still have the same meaningful bits left at the end.
Each LPF pass attenuates the harmonics relative to the fundamental. After enough passes, you are left with mostly the fundamental. Once you have attenuated the harmonics enough to guarantee only two zero crossings per cycle, you look at the zero crossing period, perhaps apply a little low pass filtering to successive ones, and infer the frequency from there.
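Here is a sketch of that process in Python. The single-pole filter, the value of alpha, the number of passes, and the number of samples skipped for settling are all assumptions I picked for the example, not values from a real design. Note that the filtered signal ends up a few thousand times smaller than the input, which floating point handles without trouble.

```python
import math

def low_pass(samples, alpha):
    """Single-pole low pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def zero_crossing_freq(samples, sample_rate):
    """Frequency from the mean spacing of rising zero crossings, or None."""
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            # linear interpolation of the crossing between samples i-1 and i
            crossings.append(i - 1 + a / (a - b))
    if len(crossings) < 2:
        return None
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period


# Usage: 440 Hz fundamental with an even stronger second harmonic.
fs = 8000
raw = [math.sin(2 * math.pi * 440 * i / fs)
       + 1.2 * math.sin(2 * math.pi * 880 * i / fs) for i in range(fs)]

filtered = raw
for _ in range(4):            # four cascaded passes (assumed count)
    filtered = low_pass(filtered, 0.05)
settled = filtered[1000:]     # skip the filter start-up transient

print("raw estimate:     ", zero_crossing_freq(raw, fs))
print("filtered estimate:", zero_crossing_freq(settled, fs))
```

The raw signal has four zero crossings per cycle, so measuring it directly reports about twice the fundamental. After the filter passes only two crossings per cycle remain and the estimate lands on the fundamental.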
Added:
Now that you have provided some data we can see what is really going on:
It looks like you have about a 440 Hz signal, but this is clearly far from a "single pitch" as the shape is far from a sine. Just from inspection we can see that the second harmonic is particularly strong. It may be so strong that this "note" is perceived to be 880 Hz instead of the fundamental of 440 Hz.
In this case, what is it you want the answer to be, 440 Hz or 880 Hz? With enough low pass filtering, eventually you get mostly the fundamental and measuring 440 Hz shouldn't be that hard. If you want the answer to be the possibly perceptual tone of 880 Hz, then things get a lot more complicated. One possibility would be to identify the fundamental in all cases. Once you have that, it's easy to find the relative amplitude of the first few harmonics. Then you can decide based on the strength of those harmonics whether you want to report one of them or the fundamental.
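As a sketch of that last step: once the fundamental is known, the amplitude of each harmonic can be measured by correlating the signal with a sine and cosine at that harmonic's frequency. The function names and the decision rule below (report the second harmonic if it is at least as strong as the fundamental) are purely hypothetical; what rule you actually want depends on what answer you are after.

```python
import math

def harmonic_amplitudes(samples, sample_rate, fundamental, count=4):
    """Amplitudes of harmonics 1..count of `fundamental`.

    Works best when `samples` spans a whole number of fundamental cycles.
    """
    n = len(samples)
    amps = []
    for h in range(1, count + 1):
        w = 2.0 * math.pi * h * fundamental / sample_rate
        re = sum(x * math.cos(w * i) for i, x in enumerate(samples))
        im = sum(x * math.sin(w * i) for i, x in enumerate(samples))
        amps.append(2.0 * math.hypot(re, im) / n)
    return amps

def pick_reported_pitch(amps, fundamental, ratio=1.0):
    """Hypothetical rule: report the 2nd harmonic if it is at least
    `ratio` times as strong as the fundamental, else the fundamental."""
    if len(amps) > 1 and amps[1] >= ratio * amps[0]:
        return 2.0 * fundamental
    return fundamental


# Usage: 440 Hz fundamental, stronger 2nd harmonic, weak 3rd harmonic.
fs = 8000
sig = [1.0 * math.sin(2 * math.pi * 440 * i / fs)
       + 1.5 * math.sin(2 * math.pi * 880 * i / fs)
       + 0.2 * math.sin(2 * math.pi * 1320 * i / fs) for i in range(2000)]
amps = harmonic_amplitudes(sig, fs, 440.0)
print(amps, pick_reported_pitch(amps, 440.0))
```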
Best Answer
This might fit far better on signals.stackexchange.com, if you rephrased it as a signal processing question.
Anyway, don't start with "I want to make a dedicated chip". Start with, I want to understand how something like that can be done, and then I will pick the tools, and pick implementations.
However, the question "how to best detect the pitch of human singing" is a very complicated one and far from easy to answer. Even from a purely music-theoretical point of view, a voice doesn't have one fundamental frequency, unless sung for the effect of producing the perfect tone.
You can, of course, try to detect the dominant tone in a song – and that's a pretty common question on signals.stackexchange.com, so I can only encourage you to search that – but it's still a pretty good question whether what your algorithm detects as dominant represents what a human might perceive as the tone of singing – humans are far from uniform, and that doesn't stop at the perception of music.
A small Cortex-M4F like the STM microcontroller you mention might be suitable for many of the algorithms that you will have to take into consideration, but many others won't work on it.
So, one of the important rules of engineering applies: First understand your problem, then pick the tools. That applies to things like the FFT just as much as to the compute platform you'll be running this on.
Any reasonable approach for this will consist of first designing the DSP on a PC-style computer, trying it against recorded digital signals (thus, audio files), refining it, then porting it to whatever platform you choose to put on your PCB.
That's one of the strengths of DSP: it's really just math. You can do it on a PC just as well as you can do it on a microcontroller, as long as the math you do is the same.