I highly recommend that the first thing you do is buy High Speed Digital Design: A Handbook of Black Magic. Read it twice, then read it again :)
One important point: the crystal frequency doesn't matter here. You need to know the speed of the signals on the lines in question, which can be many times the crystal frequency. Moreover, it's actually the rise/fall times that drive almost all signal integrity issues, not the digital frequency of the signal.
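To put rough numbers on the rise-time point, the book above uses a "knee frequency" of about \$F_{knee} = 0.5 / T_r\$, and treats a trace as a transmission line once it is longer than roughly 1/6 of the distance an edge travels during its rise time. The sketch below assumes those two rules of thumb; the 500 ps edge, the 25 MHz crystal and the effective dielectric constant of 3.3 are hypothetical example values, and the 1/6 fraction is a parameter you can change.

```python
"""Rise-time rules of thumb (a rough sketch, not a design tool)."""
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per picosecond


def knee_frequency_ghz(rise_time_ps):
    """F_knee = 0.5 / Tr, with Tr the 10-90 % rise time (1 / ps = 1000 GHz)."""
    return 0.5 / rise_time_ps * 1000.0


def delay_ps_per_mm(er_eff):
    """Propagation delay per mm of trace for a given effective dielectric constant."""
    return math.sqrt(er_eff) / C_MM_PER_PS


def critical_length_mm(rise_time_ps, er_eff, fraction=1 / 6):
    """Trace length above which the line should be treated as a transmission line."""
    return fraction * rise_time_ps / delay_ps_per_mm(er_eff)


if __name__ == "__main__":
    # Hypothetical: a 25 MHz crystal, but I/O edges of 500 ps on FR-4 (er_eff ~ 3.3).
    tr = 500.0
    print(f"knee frequency ~ {knee_frequency_ghz(tr):.1f} GHz (crystal is 0.025 GHz)")
    print(f"critical length ~ {critical_length_mm(tr, 3.3):.1f} mm")
```

With 500 ps edges the knee frequency is about 1 GHz and traces longer than about 14 mm already need transmission-line treatment, no matter how slow the crystal is.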
Designing for DDR isn't really that simple. High-speed design can be a bit of a 'voodoo' art, even if you have $10,000+ simulation software. In other words, don't expect to nail the design the first time without putting in the work to understand the issues involved; a checklist really won't cut it.
What I mean is, you really should start by reading the book I linked. It will give you enough background that the information in AN2582 will make sense (side note: you linked the wrong PDF in the OP). It will also help you understand the design trade-offs you'll likely have to make when laying out the PCB.
That being said, here are my thoughts:
Routing Guidelines:
High level things to consider / avoid:
1) Route on a single layer with a solid ground plane under it. Avoid vias like the plague. If this isn't possible, route the DQ and ADDR groups first, since they are the most critical, and try to move signals to a different layer only as whole groups.
2) Make sure you impedance-match the traces: 50-60 ohms, whatever comes out to the 'nicest' trace width for the design. Note the difference between differential and single-ended lines and match the impedance appropriately.
3) Maintain proper signal spacing (I think 3× the trace width is preferred). This will help limit crosstalk between signals.
4) Match the trace lengths of all related signals/groups (differential pairs, data bus, address bus, etc.). Try to keep all traces roughly the same length; that is, you don't want the address group to be 1 cm longer than the data group if you can avoid it.
5) Use source termination. You probably don't need parallel termination or a VTT supply, given your board size and use of a single RAM IC.
6) Pay special attention to Vref; it needs to be stable: well decoupled, with fat traces. With a single RAM chip you can generate it with a simple resistor divider.
7) Don't use resistor banks for the termination, use individual resistors.
8) Expect that you'll need to 'play' with the source termination resistor values on the first prototype. Basically, put a scope on the signal and try various values in the region of \$R \approx Z_{trace} - Z_{driver}\$ (trace impedance minus driver output impedance). Look for the value that results in the cleanest signal (read up on eye patterns).
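For items 2 and 4 above, a quick estimate of trace width and length-mismatch skew can help before you open your fab's impedance calculator. This sketch swaps in the IPC-2141 closed-form surface-microstrip approximation (my choice, not something from the app notes), which is only roughly accurate for about 0.1 < w/h < 2 and does not cover stripline or differential pairs. The stack-up numbers (0.2 mm FR-4 with \$\epsilon_r = 4.3\$, 1 oz copper, effective \$\epsilon_r\$ of 3.3) are hypothetical.

```python
"""Trace-geometry estimates (a sketch, not a field solver)."""
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per picosecond


def microstrip_z0(w_mm, h_mm, t_mm, er):
    """Surface-microstrip impedance, IPC-2141 closed-form approximation.

    Only roughly accurate for about 0.1 < w/h < 2.0.
    """
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h_mm / (0.8 * w_mm + t_mm))


def width_for_z0(target_ohms, h_mm, t_mm, er, lo=0.01, hi=2.0):
    """Bisect for the width giving target_ohms; Z0 falls as the trace widens."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if microstrip_z0(mid, h_mm, t_mm, er) > target_ohms:
            lo = mid  # impedance still too high: the trace must be wider
        else:
            hi = mid
    return (lo + hi) / 2


def skew_ps(mismatch_mm, er_eff):
    """Arrival-time difference caused by a length mismatch (delay = sqrt(er_eff) / c)."""
    return mismatch_mm * math.sqrt(er_eff) / C_MM_PER_PS


if __name__ == "__main__":
    # Hypothetical stack-up: 0.2 mm FR-4 (er = 4.3) to the plane, 1 oz (0.035 mm) copper.
    w = width_for_z0(50, 0.2, 0.035, 4.3)
    print(f"50 ohm microstrip width ~ {w:.3f} mm")
    # Assumed effective er of 3.3 for that microstrip.
    print(f"1 cm mismatch ~ {skew_ps(10, 3.3):.0f} ps of skew")
```

For these assumptions the width comes out around 0.33 mm, and a 1 cm length mismatch costs roughly 60 ps, which is a noticeable slice of a DDR bit time.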
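Items 5, 6 and 8 above come down to simple resistor arithmetic: a starting series value \$R \approx Z_{trace} - Z_{driver}\$ with a few neighbouring standard values to try on the scope, and a Vref divider set to VDDQ/2. In this sketch the 50 ohm trace and the ~18 ohm driver output impedance are hypothetical (look up the real output impedance for your drive-strength setting), as are the 1 kohm divider resistors; VDDQ = 1.8 V is the DDR2 supply.

```python
"""Series-termination and Vref-divider arithmetic (a sketch for prototyping)."""

# E24 preferred-value mantissas (5 % resistors).
E24 = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4, 2.7, 3.0,
       3.3, 3.6, 3.9, 4.3, 4.7, 5.1, 5.6, 6.2, 6.8, 7.5, 8.2, 9.1]


def e24_series(max_ohms=1e6):
    """All E24 values from 1 ohm up to max_ohms, in ascending order."""
    values = []
    decade = 1.0
    while decade <= max_ohms:
        values += [round(m * decade, 3) for m in E24 if m * decade <= max_ohms]
        decade *= 10
    return values


def series_termination(z0_ohms, z_out_ohms, spread=2):
    """Nominal R = Z0 - Z_out, plus neighbouring E24 values to try on the scope."""
    ideal = z0_ohms - z_out_ohms
    series = e24_series()
    i = min(range(len(series)), key=lambda k: abs(series[k] - ideal))
    return ideal, series[max(0, i - spread): i + spread + 1]


def vref_divider(vddq, r_top, r_bottom):
    """Divider output voltage and the standing current it draws (mA)."""
    vref = vddq * r_bottom / (r_top + r_bottom)
    current_ma = vddq / (r_top + r_bottom) * 1e3
    return vref, current_ma


if __name__ == "__main__":
    # Hypothetical numbers: 50 ohm traces, ~18 ohm driver output impedance.
    ideal, to_try = series_termination(50, 18)
    print(f"ideal {ideal} ohm, try {to_try}")
    # DDR2 example: VDDQ = 1.8 V, two equal (hypothetical) 1 kohm resistors.
    print(vref_divider(1.8, 1000, 1000))
```

Here the ideal value is 32 ohms, so 33 ohms is the natural first part to fit, with 27 to 39 ohms as the sweep range. Smaller divider resistors make Vref stiffer at the cost of more standing current.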
Signal Groups:
They are (NOTE: taken from AN2910; this is for a 64-bit + 8-bit ECC module, so you won't have all of these pins):
Data Group: \$MDQS(8:0), \overline{MDQS}(8:0), MDM(8:0), MDQ(63:0), MECC(7:0)\$
Address/CMD Group: \$MBA(2:0), MA(15:0), \overline{MRAS}, \overline{MCAS}, \overline{MWE}\$
Control Group: \$\overline{MCS}(3:0), MCKE(3:0), MODT(3:0)\$
Clock Group: \$MCK(5:0)\$ and \$\overline{MCK}(5:0)\$
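If your layout tool takes length-matching rules per net class, it can help to expand the bus notation above into individual nets and tag each one with its group. This is a minimal sketch; the `_N` suffix for overlined signals and the resulting net names are a hypothetical naming convention, so adapt them to your schematic and drop the pins you don't have.

```python
"""Expand the DDR signal groups into per-net length-matching classes (sketch)."""
import re
from collections import Counter

# Groups for the 64-bit + 8-bit ECC case. "_N" marks an overlined signal;
# this naming convention is hypothetical.
GROUPS = {
    "data": ["MDQS(8:0)", "MDQS_N(8:0)", "MDM(8:0)", "MDQ(63:0)", "MECC(7:0)"],
    "addr_cmd": ["MBA(2:0)", "MA(15:0)", "MRAS_N", "MCAS_N", "MWE_N"],
    "control": ["MCS_N(3:0)", "MCKE(3:0)", "MODT(3:0)"],
    "clock": ["MCK(5:0)", "MCK_N(5:0)"],
}

BUS = re.compile(r"^(?P<name>[A-Z_]+)\((?P<hi>\d+):(?P<lo>\d+)\)$")


def expand(spec):
    """'MDQ(63:0)' -> ['MDQ63', ..., 'MDQ0']; single nets pass through unchanged."""
    m = BUS.match(spec)
    if m is None:
        return [spec]
    hi, lo = int(m["hi"]), int(m["lo"])
    return [f"{m['name']}{i}" for i in range(hi, lo - 1, -1)]


def net_to_group():
    """Map every expanded net name to its length-matching group."""
    return {net: group
            for group, specs in GROUPS.items()
            for spec in specs
            for net in expand(spec)}


if __name__ == "__main__":
    print(dict(Counter(net_to_group().values())))
```

For the full module this yields 99 data, 22 address/command, 12 control and 12 clock nets.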
Stack Up:
There are lots of ways to do this. Micron gives its recommendations for 6-layer stack-ups with 3 or 4 signal layers in app note TN-46-14.
Really, stack-up is an entire topic of its own, but if your board has the 'standard' assortment of devices on it, these recommendations should work fine.
Other Stuff:
I think the rest of your questions are answered in the linked pdfs or AN2582. There is another checklist available in AN2910.
Unless your microcontroller has direct bus support for interfacing to DDR/DDR2/DDR3-type RAM, or it is interfaced through an FPGA that has been programmed to provide the RAM interface, futzing around with DIMMs is likely not a useful exercise. There are several strong reasons for this:
1) DDR memory chips may be operating at lower voltages than your microcontroller.
2) The interface to DDR memory is multiplexed and requires precise clocking, with the multiplexed lines changing state in sync with that clock.
3) Modern DIMMs are designed to operate at very high data rates of 800, 1066, 1333, or 1600 MT/s (I/O clocks of 400 to 800 MHz). Signal integrity is extremely important when designing the circuit connections to the DIMM. It is not a trivial exercise, and the memory chips can be extremely sensitive to noise as a result.
4) DDR memories require constant refresh to keep the memory cells' data valid. Without refresh, the memory contents fade away over a period ranging from milliseconds to seconds.
5) The command structure for operating modern DDR RAM is complex. The most complicated part is getting the initialization sequence right, which consists of some 13 to 20 individual steps.
6) Modern DIMMs are designed to feed data to modern PC-type computers very fast. The typical DIMM has a 64-bit data path. Multi-rank DIMMs also require multiple clocks and chip-select signals to access all of the memory chips on the module. It is unlikely that a typical small microcontroller can make effective use of this wide data format without an excessive amount of external circuitry.
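Points 3, 4 and 6 can be sanity-checked with a little arithmetic. DDR3 speed grades such as 1600 are transfer rates in MT/s, with data moving on both clock edges, so the I/O clock is half that figure; a 64-bit DIMM then moves 8 bytes per transfer. For DDR2/DDR3 in the normal temperature range, the refresh requirement works out to 8192 REFRESH commands every 64 ms. The 260 ns tRFC below is an assumed example value (roughly what a 4 Gb DDR3 die specifies); check the datasheet for your part.

```python
"""Back-of-the-envelope DDR numbers (a sketch; check the JEDEC spec / datasheet)."""


def io_clock_mhz(data_rate_mts):
    """DDR transfers on both clock edges, so the clock is half the data rate."""
    return data_rate_mts / 2


def peak_bandwidth_gb_s(data_rate_mts, bus_bits=64):
    """Peak transfer rate of a bus_bits-wide DIMM, in GB/s (1 GB = 1e9 bytes)."""
    return data_rate_mts * 1e6 * bus_bits / 8 / 1e9


def refresh_interval_us(retention_ms=64.0, refresh_commands=8192):
    """Average spacing between REFRESH commands (tREFI)."""
    return retention_ms * 1e3 / refresh_commands


def refresh_overhead(t_rfc_ns, t_refi_us):
    """Fraction of time the DRAM is busy refreshing instead of serving accesses."""
    return t_rfc_ns / (t_refi_us * 1e3)


if __name__ == "__main__":
    for rate in (800, 1066, 1333, 1600):
        print(rate, "MT/s:", io_clock_mhz(rate), "MHz clock,",
              round(peak_bandwidth_gb_s(rate), 2), "GB/s peak")
    t_refi = refresh_interval_us()
    # tRFC = 260 ns is an assumed example value, not taken from a specific part.
    print(f"tREFI = {t_refi} us, refresh overhead ~ {refresh_overhead(260, t_refi):.1%}")
```

DDR3-1600 on a 64-bit DIMM peaks at 12.8 GB/s, and the controller must issue a REFRESH about every 7.8 µs, forever, which is a lot to ask of a small microcontroller.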
Keep this in mind too: companies that make PC-style processors that use DIMMs have onboard memory controllers to interface to the memory sockets. There is an engineering specialty within the BIOS field called MRC (memory reference code). This is the code module that initializes the DDR controller and all the attached DIMMs. This specialty employs some of the best and most senior BIOS programmers, who do nothing but MRC coding as a full-time job.
The most typical "soft" error in DRAM is the loss of charge by the bit capacitor. Capacitors that were not charged to begin with rarely gain charge out of thin air. Charged capacitors lose their charge naturally over time, and this process is accelerated by faulty gate transistors, cosmic rays switching such transistors into a conductive state, and dielectric imperfections.
Whether a charged capacitor represents logical 0 or 1 is defined by the DRAM implementation. The first generations of DRAM had the capacitor tied to ground and represented logical 1 with a charged capacitor and logical 0 with a discharged one, so typical errors manifested as ones turning into zeroes:
[Schematic (CircuitLab): DRAM cell with the storage capacitor tied to ground]
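A toy model makes the asymmetry of the ground-referenced cell concrete: if a charged capacitor simply leaks toward ground with an RC time constant, a stored 1 eventually falls below the sense threshold and reads back as 0, while a stored 0 has nowhere to go. This is only an illustrative sketch; the 1 V full charge, the 100 ms time constant and the VCC/2 sense threshold are all hypothetical, and real cells leak through several mechanisms, not one clean exponential.

```python
"""Toy model of charge leakage in a ground-referenced DRAM cell (illustrative only)."""
import math


def cell_voltage(v_initial, t_ms, tau_ms):
    """Stored voltage after t_ms, assuming simple RC decay toward ground."""
    return v_initial * math.exp(-t_ms / tau_ms)


def read_bit(v_cell, v_sense):
    """Sense amplifier: above the threshold reads as 1, otherwise 0."""
    return 1 if v_cell > v_sense else 0


def retention_ms(v_initial, v_sense, tau_ms):
    """Time until a stored 1 decays to the sense threshold: tau * ln(V0 / Vs)."""
    return tau_ms * math.log(v_initial / v_sense)


if __name__ == "__main__":
    VCC, TAU = 1.0, 100.0  # hypothetical: 1 V full charge, 100 ms leak time constant
    V_SENSE = VCC / 2
    for stored in (1, 0):
        v0 = VCC if stored else 0.0
        reads = [read_bit(cell_voltage(v0, t, TAU), V_SENSE) for t in (0, 50, 100)]
        print(f"stored {stored}: read back {reads}")
    print(f"a 1 survives ~{retention_ms(VCC, V_SENSE, TAU):.1f} ms without refresh")
```

With these numbers a stored 1 reads back as 1, 1, 0 at 0, 50 and 100 ms (it lasts about 69 ms), while a stored 0 never changes, which is exactly the 1-to-0 error pattern described above.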
Modern DRAM is a bit different, with the capacitor tied to a VCC/2 potential instead of ground.
This makes both 1->0 and 0->1 errors equally probable, at least when averaged over many DRAM chips. A particular DRAM chip can still have different probabilities for these errors if the VCC/2 potential is consistently interpreted as either 0 or 1.
Earlier DRAM implementations used more exotic bit-representation schemes (like encoding even and odd bits differently), but I can't find any references.