Proper bypassing and grounding are unfortunately subjects that seem to be poorly taught and poorly understood. They are actually two separate issues. You are asking about the bypassing, but have also implicitly gotten into grounding.
For most signal problems, and this case is no exception, it helps to consider them both in the time domain and the frequency domain. Theoretically you can analyse in either and convert mathematically to the other, but they each give different insights to the human brain.
Decoupling provides a near reservoir of energy to smooth out the voltage from very short term changes in current draw. The lines back to the power supply have some inductance, and the power supply takes a little time to respond to a voltage drop before it produces more current. On a single board it can catch up usually within a few microseconds (us) or tens of us. However, digital chips can change their current draw a large amount in only a few nanoseconds (ns). The decoupling cap has to be close to the digital chip power and ground leads to do its job, else the inductance in those leads gets in the way of it delivering the extra current quickly before the main power feed can catch up.
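To put rough numbers on the time domain view, here is a minimal sketch. The inductances, current step, and edge time are assumptions picked for illustration, not values from any particular board. It uses the two textbook relations V = L·di/dt for the lead inductance and ΔV = I·t/C for the charge drawn out of the cap.

```python
def inductive_drop(inductance_h, delta_i_a, delta_t_s):
    """Voltage developed across an inductance for a current step (V = L * di/dt)."""
    return inductance_h * delta_i_a / delta_t_s


def cap_droop(capacitance_f, current_a, duration_s):
    """Voltage droop of a capacitor supplying a constant current (dV = I * t / C)."""
    return current_a * duration_s / capacitance_f


# Assumed, illustrative values (hypothetical board, not from the post)
LONG_FEED_L = 20e-9    # inductance of the long run back to the supply, 20 nH
SHORT_LEAD_L = 1e-9    # inductance of short leads to a nearby bypass cap, 1 nH
STEP_I = 0.1           # the chip suddenly asks for 100 mA more
STEP_T = 2e-9          # ... within 2 ns
BYPASS_C = 100e-9      # 100 nF bypass cap

print("drop across long feed :", inductive_drop(LONG_FEED_L, STEP_I, STEP_T), "V")
print("drop across short lead:", inductive_drop(SHORT_LEAD_L, STEP_I, STEP_T), "V")
print("cap droop over 100 ns :", cap_droop(BYPASS_C, STEP_I, 100e-9), "V")
```

With these assumed numbers the long feed drops about 1 V during the edge while the short lead drops about 50 mV, which is the whole argument for keeping the cap close.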
That was the time domain view. In the frequency domain digital chips are AC current sources between their power and ground pins. At DC power comes from the main power supply and all is fine, so we're going to ignore DC. This current source generates a wide range of frequencies. Some of the frequencies are so high that the little inductance in the relatively long leads to the main power supply starts to become a significant impedance. That means those high frequencies will cause local voltage fluctuations unless they are dealt with. The bypass cap is the low impedance shunt for those high frequencies. Again, the leads to the bypass cap must be short, else their inductance will be too high and get in the way of the capacitor shorting out the high frequency current generated by the chip.
In this view, all your layouts look fine. The cap is close to the power and ground pins in each case. However, I don't like any of them for a different reason, and that reason is grounding.
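The same point can be sketched in the frequency domain. The snippet below compares the impedance of an assumed 20 nH supply feed with that of an ideal 100 nF cap. Both values are hypothetical, and the cap's own parasitic inductance and resistance are deliberately ignored here.

```python
import math


def inductor_impedance(inductance_h, freq_hz):
    """Magnitude of an ideal inductor's impedance, |Z| = 2*pi*f*L."""
    return 2 * math.pi * freq_hz * inductance_h


def capacitor_impedance(capacitance_f, freq_hz):
    """Magnitude of an ideal capacitor's impedance, |Z| = 1 / (2*pi*f*C)."""
    return 1 / (2 * math.pi * freq_hz * capacitance_f)


FEED_L = 20e-9     # assumed inductance of the run back to the supply
BYPASS_C = 100e-9  # assumed bypass cap value, treated as ideal

for freq in (1e3, 1e6, 10e6, 100e6):
    z_feed = inductor_impedance(FEED_L, freq)
    z_cap = capacitor_impedance(BYPASS_C, freq)
    print(f"{freq:>12.0f} Hz  feed {z_feed:10.4f} ohm   cap {z_cap:10.4f} ohm")
```

At low frequencies the feed is the easy path, and somewhere in the MHz range the cap takes over. A real cap stops looking like a cap above its self-resonant frequency, so treat this as the optimistic picture.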
Good grounding is harder to explain than bypassing. It would take a whole book to really get into this issue, so I'm only going to mention pieces. The first job of grounding is to supply a universal voltage reference, which we usually consider 0V since everything else is considered relative to the ground net. However, think what happens as you run current thru the ground net. Its resistance isn't zero, so that causes a small voltage difference between different points of the ground. The DC resistance of a copper plane on a PCB is usually low enough that this is not too much of an issue for most circuits. A purely digital circuit has 100s of mV noise margins at least, so a few 10s or 100s of μV ground offset isn't a big deal. In some analog circuits it is, but that's not the issue I'm trying to get at here.
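As a rough check on the size of those DC ground offsets, here is a small sketch using the sheet resistance of copper. The 35 μm thickness (about 1 oz copper), the current, and the path length in "squares" are assumptions, and the current is treated as flowing uniformly along a strip, which a real plane only approximates.

```python
COPPER_RESISTIVITY = 1.68e-8  # ohm*m, copper near room temperature


def sheet_resistance(thickness_m):
    """Resistance of one square of copper foil, in ohms per square."""
    return COPPER_RESISTIVITY / thickness_m


def ground_offset(current_a, squares, thickness_m=35e-6):
    """DC voltage between two points separated by `squares` squares of foil."""
    return current_a * squares * sheet_resistance(thickness_m)


# Assumed example: 100 mA returning across a path two squares long
offset_v = ground_offset(current_a=0.1, squares=2)
print(f"sheet resistance: {sheet_resistance(35e-6) * 1e3:.2f} mohm/square")
print(f"ground offset   : {offset_v * 1e6:.0f} uV")
```

This lands in the tens to hundreds of μV range, consistent with the claim that DC offsets rarely matter for digital logic.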
Think what happens as the frequency of the current running across the ground plane gets higher and higher. At some point the whole ground plane is only 1/2 wavelength across. Now you don't have a ground plane anymore but a patch antenna. Now remember that a microcontroller is a broad band current source with high frequency components. If you run its immediate ground current across the ground plane for even a little bit, you have a center-fed patch antenna.
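To get a feel for where that happens, this sketch estimates the frequency at which a board dimension equals half a wavelength. The 100 mm board size and the relative permittivity of 4.4 (a commonly quoted figure for FR-4) are assumptions. The effective permittivity of a real board depends on its construction, so the true number falls somewhere between the two results.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def half_wave_frequency(length_m, rel_permittivity=1.0):
    """Frequency at which `length_m` is half a wavelength in the given medium."""
    velocity = SPEED_OF_LIGHT / math.sqrt(rel_permittivity)
    return velocity / (2 * length_m)


BOARD_SIZE = 0.1  # assumed 100 mm across

print(f"in air      : {half_wave_frequency(BOARD_SIZE) / 1e6:.0f} MHz")
print(f"in FR-4-like: {half_wave_frequency(BOARD_SIZE, 4.4) / 1e6:.0f} MHz")
```

Edge rates of a few ns have plenty of energy at several hundred MHz, so an ordinary board is well within reach of this effect.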
The solution I usually use, and for which I have quantitative proof it works well, is to keep the local high frequency currents off the ground plane. You want to make a local net of the microcontroller power and ground connections, bypass them locally, then have only one connection to each net to the main system power and ground nets. The high frequency currents generated by the microcontroller go out the power pins, thru the bypass caps, and back into the ground pins. There can be lots of nasty high frequency current running around that loop, but if that loop has only a single connection to the board power and ground nets, then those currents will largely stay off them.
So to bring this back to your layout, what I don't like is that each bypass cap seems to have a separate via to power and ground. If these are the main power and ground planes of the board, then that's bad. If you have enough layers and the vias are really going to local power and ground planes, then that's OK as long as those local planes are connected to the main planes at only one point.
It doesn't take local planes to do this. I routinely use the local power and ground nets technique even on 2 layer boards. I manually connect all the ground pins and all the power pins, then the bypass caps, then the crystal circuit before routing anything else. These local nets can be a star or whatever right under the microcontroller and still allow other signals to be routed around them as required. However, once again, these local nets must have exactly one connection to the main board power and ground nets. If you have a board level ground plane, then there will be one via some place to connect the local ground net to the ground plane.
I usually go a little further if I can. I put 100 nF or 1 μF ceramic bypass caps as close to the power and ground pins as possible, then route the two local nets (power and ground) to a feed point and put a larger (10μF usually) cap across them and make the single connections to the board ground and power nets right at the other side of the cap. This secondary cap provides another shunt to the high frequency currents that escaped being shunted by the individual bypass caps. From the point of view of the rest of the board, the power/ground feed to the microcontroller is nicely behaved without lots of nasty high frequencies.
So now to finally address your question of whether the layout you have matters compared to what you think best practices are. I think you have bypassed the power/ground pins of the chip well enough. That means it should operate fine. However, if each has a separate via to the main ground plane then you might have EMI problems later. Your circuit will run fine, but you might not be able to legally sell it. Keep in mind that RF transmission and reception are reciprocal. A circuit that can emit RF from its signals is likewise susceptible to having those signals pick up external RF and have that be noise on top of the signal, so it's not just all someone else's problem. Your device may work fine until a nearby compressor is started up, for example. This is not just a theoretical scenario. I've seen cases exactly like that, and I expect many others here have too.
Here's an anecdote that shows how this stuff can make a real difference. A company was making little gizmos that cost them $120 to produce. I was hired to update the design and get production cost below $100 if possible. The previous engineer didn't really understand RF emissions and grounding. He had a microprocessor that was emitting lots of RF crap. His solution to pass FCC testing was to enclose the whole mess in a can. He made a 6 layer board with the bottom layer ground, then had a custom piece of sheet metal soldered over the nasty section at production time. He thought that just by enclosing everything in metal that it wouldn't radiate. That's wrong, but somewhat of an aside I'm not going to get into now. The can did reduce emissions so that they just squeaked by FCC testing with 1/2 dB to spare (that's not a lot).
My design used only 4 layers, a single board-wide ground plane, no power planes, but local ground planes for a few of the choice ICs with single point connections for these local ground planes and the local power nets as I described. To make a long story shorter, this beat the FCC limit by 15 dB (that's a lot). A side advantage was that this device was also in part a radio receiver, and the much quieter circuitry fed less noise into the radio and effectively doubled its range (that's a lot too). The final production cost was $87. The other engineer never worked for that company again.
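To put the two dB margins in perspective, here is a quick conversion sketch. It only turns the quoted 1/2 dB and 15 dB figures into linear ratios; it says nothing about the actual measurements behind them.

```python
def db_to_power_ratio(db):
    """Convert decibels to a power ratio."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Convert decibels to a voltage or field-strength ratio."""
    return 10 ** (db / 20)


for margin_db in (0.5, 15.0):
    print(
        f"{margin_db:5.1f} dB under the limit: "
        f"power x{db_to_power_ratio(margin_db):.2f}, "
        f"field strength x{db_to_amplitude_ratio(margin_db):.2f}"
    )
```

A 1/2 dB margin means the emitted power was only about 12% below the limit, while 15 dB means it was more than 30 times lower.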
So, proper bypassing, grounding, visualizing and dealing with the high frequency loop currents really matters. In this case it contributed to make the product better and cheaper at the same time, and the engineer that didn't get it lost his job. No, this really is a true story.
So you could fill a book with the answer to this question; in fact, I think I have some on my shelf.
Let’s run through your questions.
Should you use a 4 layer board instead of a 2 layer? I say absolutely yes; the cost argument for going 2 layer is a weak one at best compared to the advantages. Obviously it can be done, and is done, and in this device’s case I see they placed VCC and GND right next to each other to make this easier to accomplish. So while I would go 4 layer, you can probably get away with 2 if you want.
Why decouple?
Now without going too deep consider the goal of decoupling your processor. You are trying to supply a stable voltage to it despite the fact that it has dynamic current demands. When your processor is active for instance and its transistors switch they are requesting more current. This current is a change, an increase to the current draw at steady state. Now you have a changing current but where are you going to get that current from?
Well, first there’s a little decoupling on the die, but then it tries to pull it through the package power and GND pins. It wants to get at that capacitor you placed outside of your device, but before it gets there it has to travel through the bond wires and/or package substrate, out the pins, and down your traces. All of this contributes to the inductance, and ultimately the impedance, of the path from the die inside the chip to the capacitor.
Why does this matter? Well, because an inductor “resists” changes in current, its impedance increases as frequency increases. That’s a simplification, but what happens when you try to drag that change in current through your package and routing is that the inductance limits how quickly you can get the current.
So your goal when placing your decoupling capacitors should always be to minimize the inductance, and thus the impedance, from the pin to your cap. Now with a QFP package like this you may find the shortest possible connection is right at the pins. With a 4 layer board and a BGA it might be directly underneath, but in practice you can achieve even lower impedance on top layers as well.
Don’t ignore GND either. Current flows in a loop, it does you no good to have a super short path to VCC and long winding path to GND. So if you’re going 2 layer I would put the caps parallel, as close as possible to GND and VCC, route directly to the pins, and then bring power and gnd into the caps. Your goal is to minimize the loop size.
More 4 layer arguments and selection
The goal of what we call power distribution network design is to minimize the impedance across the range of frequencies that your chip will request. To that end, having a nice fat GND and VCC plane leading from your caps/part to your regulator will be a much lower impedance path for your lower frequencies down to DC. Short of that, fat wide traces are recommended if you can.
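One common rule of thumb for how low that impedance needs to be is a "target impedance": the allowed ripple voltage divided by the worst-case current step. This rule is not from the discussion above, and the 3.3 V rail, 5% ripple, and 0.5 A step below are assumptions. A minimal sketch:

```python
def target_impedance(vcc_v, ripple_fraction, delta_i_a):
    """Rule-of-thumb PDN target: allowed ripple voltage / transient current."""
    return vcc_v * ripple_fraction / delta_i_a


# Assumed example: 3.3 V rail, 5 % allowed ripple, 0.5 A worst-case step
z_target = target_impedance(vcc_v=3.3, ripple_fraction=0.05, delta_i_a=0.5)
print(f"target impedance: {z_target * 1e3:.0f} mohm")
```

The idea is then to keep the impedance seen by the chip below that figure over the frequency range it actually draws current in.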
Cap selection
For this processor and your board I think 0402 0.1uF and 0805 10uF caps are a good choice. The smaller package size helps you have a smaller loop size. I can do 0201 by hand, never bought an 01005, but it is easier with a microscope. For more complex designs we select a range of decoupling capacitors to cover the range of frequencies that the part might demand from us. Blindly doing this, as in just using 0.1uF, 0.01uF, and 0.001uF as is often suggested, can lead to nasty anti-resonance peaks giving you high impedance at certain frequencies. Again this is a simplification, but I don’t think digging into that here will help you. Interesting to note that placing the 10uF capacitors further away is OK, as their role in this design is for the lower frequencies where the impedance caused by the trace inductance will be lower. Also, the frequency range you can effectively decouple to is limited by the impedance of the package we discussed earlier.
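The anti-resonance effect can be shown with a toy model. The sketch below treats each capacitor as a series R-L-C and puts a 100 nF and a 1 nF cap in parallel. The ESL and ESR values are assumptions chosen for illustration; real parts and their mounting inductance will differ.

```python
import math


def cap_impedance(freq_hz, c_f, esl_h, esr_ohm):
    """Complex impedance of a capacitor modelled as series R-L-C."""
    w = 2 * math.pi * freq_hz
    return complex(esr_ohm, w * esl_h - 1 / (w * c_f))


def parallel(*impedances):
    """Impedance of several branches connected in parallel."""
    return 1 / sum(1 / z for z in impedances)


def self_resonance(c_f, esl_h):
    """Self-resonant frequency of a single series L-C, f = 1 / (2*pi*sqrt(L*C))."""
    return 1 / (2 * math.pi * math.sqrt(esl_h * c_f))


def find_antiresonance(caps, points=2000):
    """Log-sweep between the lowest and highest self-resonance; return (f, |Z|) at the peak."""
    srfs = sorted(self_resonance(c, esl) for c, esl, _ in caps)
    lo, hi = srfs[0], srfs[-1]
    best_f, best_z = lo, 0.0
    for i in range(points + 1):
        f = lo * (hi / lo) ** (i / points)
        z = abs(parallel(*(cap_impedance(f, *cap) for cap in caps)))
        if z > best_z:
            best_f, best_z = f, z
    return best_f, best_z


# Assumed parts: (capacitance, ESL, ESR), hypothetical values
CAPS = [(100e-9, 1e-9, 0.02), (1e-9, 1e-9, 0.1)]

peak_f, peak_z = find_antiresonance(CAPS)
alone = abs(cap_impedance(peak_f, *CAPS[0]))
print(f"anti-resonance near {peak_f / 1e6:.0f} MHz")
print(f"|Z| with both caps  : {peak_z:.2f} ohm")
print(f"|Z| with 100 nF only: {alone:.2f} ohm")
```

In this model the pair is worse at the peak than the 100 nF cap on its own, because the larger cap already looks inductive there and resonates against the smaller one.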
Actual part selection
There are tons of capacitors out there, and usually we don’t make specific part recommendations. But I would look for an 0402 0.1uF ceramic capacitor with maybe an X7R temperature coefficient, and a voltage rating double your VCC. Here’s an example of one I have on a BOM
Your questions
OK long winded response I guess but sometimes if you get why something is done it makes it easier to decide how to do it.
So you say:
2 layer board: Seems OK for this, though I always prefer 4 as explained above. There are other benefits such as controlled impedance of traces, less noise, and passing EMI more easily. I don’t know what your board will do, but without reference planes your traces’ return currents will be forced to follow whatever GND wires they can find. Gets a little messy.
GND pours: Meh, it will help balance the copper on top and bottom layers for etching and reflow, but really you’ll carve it up so much with traces it won’t do you that much good. Better to concentrate on getting power to that chip with as low an impedance as possible. Maybe you can figure out how to run VCC and GND as two copper pours?
Components on top: OK doesn’t really matter, in this case better to have decoupling on top than to go through vias to the bottom. If you are hand assembling it doesn’t really matter, but it would be cheaper to manufacture.
Traces on top and bottom: Definitely; you probably won’t get away without this.
Decoupling: I talked about this at length.
Ah, what else? Oh, the ferrite: I didn’t see that in the app note. I’m assuming maybe it’s used to isolate one of the more sensitive VCC pins, maybe a PLL or an ADC. And it actually goes VCC supply -> Ferrite -> VCC Pin, with the cap from VCC pin to GND? If so that makes sense; it’s probably just a little filter.
Got any questions? Just ask, it's hard to put everything you need to know about decoupling in one answer but hopefully this helps.
Best Answer
All active components should have decoupling capacitors. The PCB traces between your part and your power source act like parasitic resistors and inductors, and if you don't decouple your ICs, then when their power requirements change quickly - for instance, when they try to change their output signal in response to something - the changing power requirements will cause voltage drop and overshoot due to the long PCB traces. A nearby decoupling capacitor eliminates that high frequency noise by providing for those short-term spikes locally.
0.1uF capacitors are a reasonable default for decoupling; if your device may have particularly large power draw requirements, you should add a 1uF or larger capacitor in parallel.
Regarding grounding, you will find a lot of conflicting advice on this on the Internet. Split ground planes are often less simple and more problematic than you might suppose, because return currents prefer to flow on the reference plane underneath the signal trace. If any of your traces cross the split in the ground plane, you will force that current to deviate around the split in the ground plane, causing a lot more noise than you might have eliminated by splitting the ground plane in the first place.
This article provides an excellent description of why you should consider using a single ground plane combined with careful routing.