This is a good illustration of the trouble you run into when using canned libraries.
If you originally had 10% noise on the signal, then maybe the problem is with the circuit. What kind of signal are you measuring? What is its upper frequency? How often are you sampling? It would be useful to show the schematic.
You seem to be allowing 5 µs for acquisition, but how do you know that is enough? What impedance is the signal being fed to the A/D? One possible explanation of your symptoms is that the signal impedance is too high, and the previous code used a longer acquisition time, which allowed the sample and hold to settle more closely to the correct value.
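To put rough numbers on the acquisition-time question, here is a minimal sketch that treats the sample and hold as a single RC being charged from a step. The 8 kΩ internal resistance, 25 pF hold capacitor, and 10-bit resolution are placeholder assumptions, not values from any particular datasheet, and `min_acquisition_time` is just an illustrative name.

```python
import math

def min_acquisition_time(r_source_ohm, r_internal_ohm, c_hold_farad, bits):
    """Time for the hold capacitor to settle within 1/2 LSB of the input.

    Models the sample and hold as one RC charged from a step, so the
    remaining error after time t is exp(-t / RC). All component values
    passed in are placeholders; take the real ones from the datasheet.
    """
    tau = (r_source_ohm + r_internal_ohm) * c_hold_farad
    # The error must fall below 1/2 LSB, i.e. 1 / 2**(bits + 1) of full scale.
    return tau * math.log(2 ** (bits + 1))

# Hypothetical numbers: 8 kΩ internal path, 25 pF hold capacitor, 10-bit A/D.
for r_source in (1e3, 10e3, 100e3):
    t = min_acquisition_time(r_source, 8e3, 25e-12, 10)
    print(f"source {r_source / 1e3:5.0f} kΩ -> {t * 1e6:5.2f} µs")
```

Under these assumed values a 1 kΩ or 10 kΩ source settles inside 5 µs, while a 100 kΩ source needs roughly 20 µs, which is the kind of shortfall that would show up as readings that never quite reach the right value.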
However, a better answer is probably to re-think the whole A/D strategy in the first place. Don't base it on what library routines happen to be available or some similar nonsense. Read the datasheet section on the A/D and think about how best to use it, possibly in conjunction with other hardware, like timers, to solve the overall measurement problem.
A strategy I often use is to take A/D readings in a periodic interrupt. You run this as fast as the A/D will let you acquire and convert the signal, within the limit of leaving enough foreground cycles for the rest of the system. This A/D interrupt samples much faster than you really need, then low-pass filters the results. When the main code wants the latest reading, it doesn't go out and take a single A/D reading. It just uses the current value of the low-pass filter output. I have done this sort of thing many times. In fact, it's my normal way to take A/D readings unless there is a specific reason to do it differently.
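The arithmetic of that interrupt-plus-filter scheme can be sketched as below. This is a Python model of the integer math only, not firmware. The class name `FilteredAdc`, its methods, and the filter fraction of 1/16 are all assumptions picked for illustration.

```python
FILTER_SHIFT = 4  # filter fraction is 1/16 per sample (assumed value)

class FilteredAdc:
    """Python model of the state an A/D interrupt routine would keep."""

    def __init__(self):
        # The accumulator holds the filtered value scaled by 2**FILTER_SHIFT
        # so that fraction bits are not lost to integer truncation.
        self.acc = 0

    def on_sample(self, raw):
        """What the interrupt would do with each new conversion result."""
        # Equivalent to filt += (raw - filt) / 16, done on the scaled value.
        self.acc += raw - (self.acc >> FILTER_SHIFT)

    def latest(self):
        """What the foreground code calls: no conversion, just a read."""
        return self.acc >> FILTER_SHIFT

adc = FilteredAdc()
for _ in range(1000):
    adc.on_sample(600)   # pretend the A/D keeps returning 600 counts
print(adc.latest())
```

A shift-based fraction like this keeps the interrupt down to a subtract, a shift, and an add, which matters when it runs at the full A/D rate.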
In a small microcontroller, there is no substitute for actually understanding the hardware and being aware of how exactly it is being used, whether it is thru your own code or "library" routines.
Added about periodic A/D interrupts
You now say you are using periodic interrupts to read the A/D, but then your code makes even less sense than it did before. If you are only reading a single channel, then you'd leave the A/D set to that channel and not change it during normal operation. If you are reading multiple channels, the interrupt would cycle thru them. In that case, the A/D conversion done interrupt would switch the hardware to the new channel right after grabbing the conversion results. That way most of the time that isn't conversion is used to acquire the next signal. Even if you were to run the A/D quite fast, like 100 kHz for example, that still leaves most of 10 µs for the acquisition.
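The multi-channel case might look roughly like this. It is a PC-side model where `ChannelScanner`, `FakeAdc`, `select`, and `read_result` are hypothetical names standing in for whatever registers your part actually has.

```python
class ChannelScanner:
    """Model of a conversion-done interrupt that cycles through channels."""

    def __init__(self, adc, channels):
        self.adc = adc                      # hypothetical hardware wrapper
        self.channels = list(channels)
        self.index = 0
        self.results = {ch: None for ch in self.channels}
        self.adc.select(self.channels[0])

    def on_conversion_done(self):
        # Grab the result for the channel that was just converted...
        done = self.channels[self.index]
        self.results[done] = self.adc.read_result()
        # ...and switch the input right away, so the sample and hold spends
        # the rest of the period acquiring the next channel.
        self.index = (self.index + 1) % len(self.channels)
        self.adc.select(self.channels[self.index])


class FakeAdc:
    """Stand-in for the hardware so the sketch runs on a PC."""

    def __init__(self, levels):
        self.levels = levels
        self.selected = None

    def select(self, channel):
        self.selected = channel

    def read_result(self):
        return self.levels[self.selected]


fake = FakeAdc({0: 100, 3: 750})
scanner = ChannelScanner(fake, [0, 3])
scanner.on_conversion_done()
scanner.on_conversion_done()
print(scanner.results)
```

Note that nothing in the handler waits. The channel switch happens once per interrupt and the acquisition runs in hardware while the foreground code has the processor.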
You wouldn't generally call an A/D conversion routine from an interrupt anyway, and even if you did, it wouldn't be changing the channel and then waiting around for the acquisition and conversion while the foreground code is locked out of the processor. That makes no sense and is unnecessary.
Then there is the issue of returning the unsigned 10 bit value left justified into a signed integer. It makes no sense that values of 512 and above should be interpreted as negative.
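The sign problem is easy to demonstrate on a PC. This sketch assumes the 10-bit result sits in the top bits of a 16-bit word, with the function names made up for illustration.

```python
import struct

def left_justify_10bit(value):
    """Place a 10-bit result in the top bits of a 16-bit word."""
    return (value & 0x3FF) << 6

def as_signed16(word):
    """Reinterpret a 16-bit pattern as a two's-complement signed integer."""
    return struct.unpack("<h", struct.pack("<H", word))[0]

for raw in (0, 511, 512, 1023):
    word = left_justify_10bit(raw)
    print(raw, hex(word), as_signed16(word))
```

A reading of 511 comes out as 32704, but 512 becomes -32768 and full scale 1023 becomes -64. Treating the word as unsigned, or right-justifying the result, avoids the wraparound.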
Once again, what is the upper frequency content of the signal? How often does the firmware need its value? I ask these questions for good reasons, and I expect you to answer them whether you think they are relevant or understand the reasons or not. After all, this is your problem. We are volunteers trying to help, but if you refuse to cooperate then we will find other places to spend our limited time here.
You should take full advantage of your bridge to directly drive a difference amplifier. Trying to create the separate reference of 2.47 V using an LM324 is going to be a source of error that will likely forever kill the performance of your circuit. After all, there is a good reason that full bridges are designed into load cells.
I've not looked at any data sheets for the parts that you are using but it also seems like your use of the ADxxx parts for part of the circuit and the LM324 for the references will be another source of error. The LM324 and the ADxxx parts are likely to be in completely different leagues when it comes to error parameters such as the offset voltages.
Another thing to think about: when you get your circuit all polished, there will still be an error where zero load does not give a zero result. The best you can hope for is that the total circuit is as linear as possible across its usable range. Then you take whatever reading you get at "no load" and subtract that value from the readings you make at weighing time. It is often also necessary to scale the readings to actual weight in software. This entails taking a reading at a known full weight and storing it to scale subsequent readings.
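The zero and full-scale correction amounts to a two-point linear fit. Here is a minimal sketch, where the counts (37 at no load, 912 at 10.0 kg) are invented calibration values and `make_two_point_scale` is an illustrative name.

```python
def make_two_point_scale(zero_reading, full_reading, full_weight):
    """Return a function that converts a raw reading to weight.

    zero_reading: A/D reading stored with no load on the cell
    full_reading: A/D reading stored with the known full weight applied
    full_weight:  that known weight, in whatever unit you want out
    """
    span = full_reading - zero_reading
    if span == 0:
        raise ValueError("zero and full-scale readings must differ")

    def to_weight(reading):
        # Remove the no-load offset, then scale by the stored span.
        return (reading - zero_reading) * full_weight / span

    return to_weight

# Hypothetical calibration: 37 counts at no load, 912 counts at 10.0 kg.
to_kg = make_two_point_scale(37, 912, 10.0)
print(to_kg(37), to_kg(912), to_kg(300))
```

Both stored readings would normally live in nonvolatile memory so the calibration survives power cycles.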
If you cannot achieve good linearity of the analog circuitry over the full weight range, it may be necessary to calibrate at additional points, such as mid range or 25%, 50% and 75%. Readings can then be scaled to actual weight by linear interpolation over shorter sections of the input range.
Lastly, do not discount the effect that temperature variation will have on your system.
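Multi-point calibration with interpolation between neighbouring points could be sketched like this. The five (reading, weight) pairs are invented, and readings outside the table are extrapolated from the nearest end segment, which is one possible choice rather than the only sensible one.

```python
from bisect import bisect_right

def make_piecewise_scale(points):
    """Build a reading-to-weight function from (reading, weight) pairs."""
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two calibration points")
    readings = [r for r, _ in pts]

    def to_weight(reading):
        # Find the segment whose lower end is at or below the reading,
        # clamped so out-of-range readings use the first or last segment.
        i = bisect_right(readings, reading) - 1
        i = max(0, min(i, len(pts) - 2))
        (r0, w0), (r1, w1) = pts[i], pts[i + 1]
        return w0 + (reading - r0) * (w1 - w0) / (r1 - r0)

    return to_weight

# Hypothetical calibration at 0%, 25%, 50%, 75% and 100% of 10 kg.
table = [(40, 0.0), (260, 2.5), (475, 5.0), (695, 7.5), (910, 10.0)]
to_kg = make_piecewise_scale(table)
print(to_kg(150), to_kg(475), to_kg(910))
```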
Best Answer
Since you have an example of the signal on your scope, the best thing to do is capture the data and transfer it to a PC. Then use a tool like Matlab or Octave to simulate the effect of different filters.
You are looking for a filter, defined just in terms of poles (and maybe zeros), that minimizes the noise without disturbing the desired features of the signal.
When you have a filter definition, then worry about how to build it.
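If Matlab or Octave isn't handy, the same kind of experiment can be roughed out in plain Python. In this sketch the "capture" is synthetic (a constant 1.0 level plus Gaussian noise), standing in for data exported from the scope, and the candidate filter is a single discrete-time pole. The coefficient values tried are arbitrary.

```python
import random
import statistics

def single_pole(samples, a):
    """Apply y[n] = a*x[n] + (1-a)*y[n-1], starting from the first sample."""
    y = samples[0]
    out = []
    for x in samples:
        y = a * x + (1 - a) * y
        out.append(y)
    return out

# Synthetic stand-in for a scope capture; replace with the real samples.
random.seed(1)
captured = [1.0 + random.gauss(0, 0.1) for _ in range(5000)]

print("raw noise:", statistics.pstdev(captured))
for a in (0.5, 0.1, 0.025):
    filtered = single_pole(captured, a)
    settled = filtered[500:]   # skip the start-up transient
    print("a =", a, "residual noise:", statistics.pstdev(settled))
```

With a real capture you would also look at how each candidate treats the edges or other features you care about, not only at the residual noise.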
If a single-pole filter is adequate, a simple RC circuit solves your problem.
For a two-pole filter, the Sallen-Key op-amp circuit is known for having relatively good tolerance for changes in the component values. An LC combination is also possible.
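For the Sallen-Key case, component values for the common unity-gain low-pass with two equal resistors can be estimated as below. This assumes the textbook relations for that topology, f_c = 1/(2πR√(C_fb·C_gnd)) and Q = ½√(C_fb/C_gnd), where C_fb is the capacitor to the op-amp output and C_gnd the one to ground. The 1 kHz corner and 10 kΩ resistors are placeholders, not values for your circuit.

```python
import math

def sallen_key_equal_r(fc_hz, r_ohm, q=1 / math.sqrt(2)):
    """Capacitors for a unity-gain Sallen-Key low-pass with R1 = R2 = r_ohm.

    Returns (c_feedback, c_ground) in farads. The default Q gives a
    Butterworth response.
    """
    c_ground = 1.0 / (2.0 * q * r_ohm * 2.0 * math.pi * fc_hz)
    c_feedback = 4.0 * q * q * c_ground
    return c_feedback, c_ground

def check(r_ohm, c_feedback, c_ground):
    """Recompute corner frequency and Q from the chosen components."""
    fc = 1.0 / (2.0 * math.pi * r_ohm * math.sqrt(c_feedback * c_ground))
    q = 0.5 * math.sqrt(c_feedback / c_ground)
    return fc, q

# Hypothetical target: 1 kHz Butterworth using 10 kΩ resistors.
c_fb, c_gnd = sallen_key_equal_r(1000.0, 10e3)
print(f"C_fb = {c_fb * 1e9:.2f} nF, C_gnd = {c_gnd * 1e9:.2f} nF")
print(check(10e3, c_fb, c_gnd))
```

Running `check` again with the nearest standard capacitor values shows how far the corner and Q move once real parts are substituted.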
For higher-order filters (which I doubt you need), a cascade of Sallen-Key filters is preferable to a ladder of LC stages, because the op-amp provides buffering that prevents component value shifts in one stage from affecting the characteristics of other stages.
Edit: In reply to your comment, I'm not a DSP guy, but here's how I'd work out the equivalent continuous time filter:
Your filter function in discrete time is
\$y_n = a x_n + (1-a) y_{n-1}\$
Given an impulse input, the time constant is the time it takes to decay to \$e^{-1}\$ of the value of \$y_0\$.
With \$a = 0.025\$, this is given by
\$(1-a)^n = (1-0.025)^n = e^{-1}\$
Solving this, \$n = -1/\ln(0.975)\$, which is about 39 samples, or 156 µs at 4 µs per sample.
So you want to choose R low enough that the input impedance of the ADC doesn't affect the filter performance much, then choose C to give RC = 156 µs.
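The same calculation as a short script, in case you want to try other coefficients. The 4 µs sample period is an assumption inferred from 39 samples corresponding to 156 µs, and the 1 kΩ resistor is just a placeholder for "low compared to the ADC input impedance".

```python
import math

def time_constant_samples(a):
    """Samples for the impulse response of y[n] = a*x[n] + (1-a)*y[n-1]
    to decay to 1/e of its initial value, from (1-a)**n = exp(-1)."""
    return -1.0 / math.log(1.0 - a)

a = 0.025
sample_period = 4e-6          # assumed: 156 µs / 39 samples
n = time_constant_samples(a)
tau = n * sample_period

r = 1e3                       # hypothetical choice of R
c = tau / r
print(f"n = {n:.1f} samples, tau = {tau * 1e6:.0f} µs, C = {c * 1e9:.0f} nF")
```

Without rounding n down to 39 the time constant comes out a couple of microseconds longer than 156 µs, which is well inside ordinary capacitor tolerance.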