Part of your problem is the FFT.
Since the bins of the FFT won't line up perfectly with the DTMF frequencies, some of the tones won't be detected properly: their energy will be "smeared" across two or more bins. (You don't mention your sampling rate or the size of your FFT, so I can't tell for sure what bin widths you'll have.)
You can improve the frequency resolution by using longer FFT blocks, but then the delay mounts up. Longer blocks will help with the noise, too, but if the blocks are too long, short DTMF bursts won't be detected. You can get around that by overlapping your blocks.
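Here's a rough sketch of the bin arithmetic, assuming an 8 kHz telephony sample rate and a 256-point FFT (both are assumptions, since your post gives neither):

```python
import numpy as np

fs = 8000            # assumed sample rate in Hz
N = 256              # assumed FFT length
bin_width = fs / N   # 31.25 Hz per bin with these numbers

# The eight DTMF tone frequencies in Hz.
dtmf = [697, 770, 852, 941, 1209, 1336, 1477, 1633]

for f in dtmf:
    exact_bin = f / bin_width
    nearest = round(exact_bin)
    print(f"{f} Hz falls at bin {exact_bin:6.2f}; nearest bin {nearest} "
          f"is {abs(exact_bin - nearest) * bin_width:5.2f} Hz away")
```

With numbers like these, none of the DTMF tones land exactly on a bin center, so each tone leaks into its neighbours; that's the smearing described above.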
There's a lot of other things you could do, too.
Since DTMF decoding is an old and well researched field, you could start by seeing how it has been done in the past.
You should also devise a reliable way to measure how well your decoder actually performs. Look up the term "signal to noise ratio."
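One simple way to put a number on it, sketched in Python; the function name and the idea of comparing a clean reference tone against the noisy capture are only illustrative, not anything from your post:

```python
import numpy as np

def snr_db(clean, noisy):
    """SNR in dB, given a clean reference signal and an aligned noisy copy of it."""
    noise = noisy - clean
    return 10.0 * np.log10(np.sum(clean**2) / np.sum(noise**2))
```

Sweep the SNR of a synthetic DTMF tone and note where your decoder starts missing digits; that gives you a repeatable figure of merit.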
Old microprocessor-based DTMF decoders used the Goertzel algorithm, which is a method for calculating the Fourier transform at a single specified frequency. Run a handful of them, one per DTMF frequency, and see what kind of performance you get.
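A minimal sketch of the Goertzel recurrence, again assuming an 8 kHz sample rate; the function and variable names are mine, not from any particular library:

```python
import numpy as np

def goertzel_power(samples, target_freq, fs):
    """Squared magnitude of the DFT of `samples` at `target_freq` (Hz),
    computed with the Goertzel recurrence instead of a full FFT."""
    n = len(samples)
    k = round(n * target_freq / fs)          # nearest DFT bin to the target frequency
    w = 2.0 * np.pi * k / n
    coeff = 2.0 * np.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2**2 + s_prev**2 - coeff * s_prev * s_prev2

fs = 8000                                     # assumed sample rate
dtmf = [697, 770, 852, 941, 1209, 1336, 1477, 1633]
# powers = {f: goertzel_power(block, f, fs) for f in dtmf}
```

Run one of these per DTMF frequency on each block, then pick the strongest row tone and the strongest column tone and apply a threshold.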
The older DTMF decoders without microprocessors used a bank of filters, then detected the amplitude at the output of each filter.
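The same idea translates directly to software. Here's a sketch using SciPy band-pass filters; the ±2% passbands and the 4th-order Butterworth design are arbitrary choices of mine, not anything standard:

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 8000                                    # assumed sample rate
dtmf = [697, 770, 852, 941, 1209, 1336, 1477, 1633]

def tone_amplitudes(x):
    """Pass the block through one narrow band-pass filter per DTMF tone
    and return the RMS amplitude at each filter output."""
    amps = {}
    for f in dtmf:
        sos = butter(4, [f * 0.98, f * 1.02], btype="bandpass", fs=fs, output="sos")
        y = sosfilt(sos, x)
        amps[f] = np.sqrt(np.mean(y**2))
    return amps
```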
At any rate, your first step should be to hit the library (or google) and see what has been done in the past.
I'm glad to see you said the noise was 'kinda Gaussian'. If you had asserted that the noise was Gaussian, I would have been busy warning you that the map is not the territory and that the noise can do whatever it likes; still, Gaussian is a good approximation to many noise processes.
As you so correctly point out, you'll need to wait until the universe cools to see all the samples. So you need to investigate the signal, and hope that it's stationary within the time of your analysis, and stationary into the future if you want to use your analysis to set a maximum gain.
The thing to do is calculate a CCDF (google, Wikipedia: complementary cumulative distribution function) of your input signal, recorded with a wide enough range that there's no observed overload during your observation time. That will very quickly give you a good estimate of how many samples you would fail to catch for any given maximum range, and hence by how much you would underestimate the signal power if you measured with that new range.
You will be able to do calculations like 'if I want to measure the power to 0.01% accuracy with 95% probability, then I have to allow for an x-sigma peak signal'. Of course, that's based on signal you've already seen. If you want to estimate the effect of signals you haven't seen, that will always be an extrapolation, which is always uncertain.
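A minimal sketch of the empirical CCDF calculation, assuming your recording is just a NumPy array of samples; the 1e-4 exceedance target below is only an example figure:

```python
import numpy as np

def ccdf(x):
    """Empirical CCDF of |x|: for each sorted magnitude, the fraction of
    samples that exceed it."""
    levels = np.sort(np.abs(x))
    prob_exceed = 1.0 - np.arange(1, len(levels) + 1) / len(levels)
    return levels, prob_exceed

# Usage sketch (x = your recorded signal):
# levels, p = ccdf(x)
# idx = np.argmax(p <= 1e-4)     # first level exceeded by at most 0.01% of samples,
# clip_level = levels[idx]       # assuming such a level exists in the data seen so far
```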
As a hint: if the noise looks Gaussian(ish) and you can identify a probable sigma, three sigma is often large enough for some people, though demonstrably too small for others; six sigma is only twice as large and will satisfy all but the theorists and the 'one in 10^12' disk-drive error researchers.
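For a quick feel of what those numbers mean for Gaussian(ish) noise, the two-sided tail probabilities are easy to check:

```python
from scipy.stats import norm

# Probability that a zero-mean Gaussian sample lands outside +/- k sigma.
for k in (3, 6):
    p = 2 * norm.sf(k)
    print(f"|x| > {k} sigma: probability {p:.2e}, about 1 sample in {1/p:,.0f}")
```

Roughly 1 sample in 370 exceeds three sigma, and about 1 in 5x10^8 exceeds six sigma, which is where the 'one in 10^12' crowd starts to object.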
Best Answer
No, they are completely orthogonal concepts. The probability distribution says nothing about the frequency content, and the power distribution across frequency says nothing about the sample probability distribution. You have to specify both.
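A small demonstration of that point, as a sketch: filter white Gaussian noise through a one-pole low-pass (coefficients chosen arbitrarily) and rescale it. The sample distribution stays Gaussian while the spectrum changes completely, so knowing one tells you nothing about the other.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
white = rng.standard_normal(100_000)            # Gaussian samples, flat spectrum
shaped = lfilter([0.1], [1.0, -0.9], white)     # one-pole low-pass: still Gaussian samples
shaped /= shaped.std()                          # rescale to unit variance for a fair comparison

# Nearly identical sample distributions...
print(np.percentile(white,  [1, 50, 99]))
print(np.percentile(shaped, [1, 50, 99]))

# ...but very different power spectra: compare average power below and above fs/4.
def band_power(x):
    spec = np.abs(np.fft.rfft(x))**2
    half = len(spec) // 2
    return spec[:half].mean(), spec[half:].mean()

print(band_power(white))    # roughly equal power in both halves
print(band_power(shaped))   # almost all power in the lower half
```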