Looks good and you may just get lucky with that layout.
As an engineer, luck is not usually a method I rely on :-) So let me show you what I would do:
1) Define the PCB stackup. Looks like you are on a 4-layer stackup, but we need to know material and thickness of laminate/prepreg etc.
2) Calculate trace widths that give you 50R on all layers. Your traces look wide, but since you didn't give your stackup they may be okay. I would worry a bit about crosstalk though if those wide traces really are 50R, because that would tell me they are far from your reference plane, which increases crosstalk.
3) Engineer a great low-impedance power delivery network (PDN). I read between the lines that you have two planes for power and ground - which is a really good idea. I would use my tool at pdntool.com to select the right capacitor combination, and use the knowledge that bypass capacitor location is fairly unimportant. So the caps would be placed last, so they don't interfere with the routing.
4) Repeat this for your Vtt supply. The termination voltage is constantly being pulled in both directions, so it needs a low impedance as well. With DDR1 on a low layer-count board, Vtt ripple is a common problem (and make sure Vref is not connected to Vtt!!!). This usually requires a Vtt island with sufficient bypass. Remember that about half of the ripple on Vtt will be present as noise on top of any input signal terminated to Vtt.
5) Do some quick IBIS simulations to find a trace separation that gives you acceptable crosstalk. Use Hyperlynx, SigXplorer or some such tool for this. Or get someone to do it for you.
6) Do your timing analysis to find the acceptable tolerance on trace length matching (don't overdo length matching - just keep within your calculated tolerance).
7) Document the above in a nice document and call a peer review - this is a great time to find errors. You could also post that here and ask for problems in your reasoning.
8) Enter everything as routing rules in your CAD tool and do that layout. Remember: with a well-engineered PDN and 50R on all layers, your via count is irrelevant. Also, if you just route your differential clock as two 50R traces of the same length (within half a rise-time or so), you need not treat them specially.
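To give a feel for steps 6 and 8, here is a minimal sketch of turning a timing budget into a length-matching rule. The propagation delay figure is an assumption (a typical ballpark for FR4 striplines); use the value from your own stackup and field solver.

```python
# Sketch: convert a timing budget into a trace length-matching tolerance.
# ~6.7 ps/mm (~170 ps/inch) is an assumed typical FR4 stripline figure,
# not a value from the original answer - check your actual stackup.

PROP_DELAY_PS_PER_MM = 6.7  # assumed FR4 stripline propagation delay

def length_tolerance_mm(timing_budget_ps: float) -> float:
    """Maximum allowed trace length mismatch for a given timing budget."""
    return timing_budget_ps / PROP_DELAY_PS_PER_MM

# Example: the answer suggests matching the clock pair within about
# half a rise time. For an assumed 500 ps rise time:
rise_time_ps = 500.0
budget_ps = rise_time_ps / 2
print(f"Allowed mismatch: {length_tolerance_mm(budget_ps):.1f} mm")
```

The point of doing the arithmetic first is that the resulting tolerance is usually tens of millimetres, which shows why obsessive micrometre-level length matching is wasted effort.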
For inspiration you can also look at the layout examples on the JEDEC website.
Hope this helps - feel free to ask more questions.
A DDR memory device actually consists of two distinct components:
1: A series of memory arrays composed mostly of capacitors, which are written to and read from using a very wide bank of differential amplifiers. This is fundamentally an analogue circuit, surprisingly enough.
2: An interface buffer, which allows the hundreds or thousands of individual bits produced by a single memory-array read cycle to be interfaced to a reasonable number of data lines to the Northbridge or CPU. Several cycles on the external interface are needed to transmit the data in the buffer.
In general, the feature size of semiconductor technology decreases over time as manufacturing technology is refined. This has different effects in the above two components.
For the memory array, the differential amplifiers become more sensitive and the individual capacitors become smaller. This allows a larger array to be constructed in the same die area, reading out more bits per cycle. The speed of the array remains roughly the same, however.
For the interface buffer, some of the data paths become shorter and therefore faster, required voltage swings reduce, and there is now space for better skew-correction, clock recovery, etc. This permits higher external signalling speeds within a reasonable power and area budget. The original DDR RAM simply transmitted data on both the rising and falling edges of the clock signal, instead of only on the rising edge as SDRAM did. More recent versions effectively multiply the basic clock signal as well.
This "basic clock signal" usually works out to around 200MHz in mainstream products of each generation, though faster and slower devices are also available. In original DDR, a 200MHz clock meant 400 MT/s, and was often described as 400MHz (or DDR-400) though the highest frequency signal is actually 200MHz. In DDR2, the basic clock is doubled using a PLL at both ends of the interface, so the actual clock rate is 400MHz and there are 800 MT/s. In DDR3 the clock is quadrupled and in DDR4 it is octupled, giving typically 3200 MT/s today. As you can imagine, the timing relative to the clock edges has to be controlled very carefully.
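The clock arithmetic above can be sketched in a few lines. The 200 MHz base clock and per-generation multipliers are taken from the paragraph; the "2x" is the double-data-rate factor (both clock edges).

```python
# Sketch of the clock multiplication described above.
# Base clock of 200 MHz is the "mainstream" figure from the text.

BASE_CLOCK_MHZ = 200

CLOCK_MULTIPLIER = {  # bus clock = base clock * multiplier
    "DDR":  1,  # bus clock equals the basic clock
    "DDR2": 2,  # PLL doubles the clock
    "DDR3": 4,  # quadrupled
    "DDR4": 8,  # octupled
}

for gen, mult in CLOCK_MULTIPLIER.items():
    bus_clock = BASE_CLOCK_MHZ * mult
    transfers = bus_clock * 2  # double data rate: both clock edges
    print(f"{gen}: bus clock {bus_clock} MHz -> {transfers} MT/s")
```

Running this reproduces the figures in the text: DDR-400 from a 200 MHz clock, DDR2-800 from 400 MHz, and 3200 MT/s for DDR4.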
Since the memory arrays themselves haven't changed much in speed, these higher interface speeds come with increased CAS latency (CL, for "column address strobe") figures. These describe how many bus clock cycles elapse between providing the column address and receiving the data, and accommodate the limited speed of the memory arrays relative to the interface bus.
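A quick calculation makes the point concrete. CL is counted in bus clock cycles, so absolute latency is CL divided by the bus clock. The two example modules below (DDR-400 CL3 and DDR4-3200 CL22) are typical pairings assumed for illustration, not taken from the original answer.

```python
# Sketch: convert a CL figure to absolute latency in nanoseconds.
# Example CL/clock pairings are assumed typical values, for illustration.

def cas_latency_ns(cl_cycles: int, bus_clock_mhz: float) -> float:
    """Absolute CAS latency: CL cycles times the bus clock period."""
    return cl_cycles * 1000.0 / bus_clock_mhz

print(cas_latency_ns(3, 200.0))    # DDR-400 CL3   -> 15.0 ns
print(cas_latency_ns(22, 1600.0))  # DDR4-3200 CL22 -> 13.75 ns
```

Both come out around 14-15 ns, which illustrates the paragraph's point: the arrays themselves have barely sped up, so CL in cycles grows roughly in step with the interface clock.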
One of the things that the basic clock controls more-or-less directly, rather than through a PLL, is the self-refresh cycle of the memory arrays. Using capacitors to store bits is very space-efficient, but the charge leaks out of them rather easily and the stored value decays within a few tens of milliseconds, so the memory arrays must constantly cycle through their contents, reading and re-writing them to ensure they remain valid.
The I/O bus clock is always half of the bus data rate.
Example: DDR2-800: the bus data rate is 800 MT/s, so the I/O clock is 400 MHz.
The memory clock is the clock the memory controller core is synchronized to:
DDR1: 1/2 of the bus data rate, because of 2n prefetch
DDR2: 1/4 of the bus data rate, because of 4n prefetch
DDR3: 1/8 of the bus data rate, because of 8n prefetch
There are two different clocks in DDR between the MC (memory controller) and the PHY:
DFI clock: equal to the memory clock
DFI PHY clock: equal to the I/O clock
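The relationships above can be sketched as a small calculation, using the prefetch depths listed in the answer:

```python
# Sketch of the clock relationships in this answer:
# I/O clock = data rate / 2 (two transfers per I/O clock),
# memory clock = data rate / prefetch depth.

PREFETCH = {"DDR1": 2, "DDR2": 4, "DDR3": 8}  # n-bit prefetch depth

def ddr_clocks(generation: str, data_rate_mts: float) -> dict:
    return {
        "io_clock_mhz": data_rate_mts / 2,                     # = DFI PHY clock
        "memory_clock_mhz": data_rate_mts / PREFETCH[generation],  # = DFI clock
    }

print(ddr_clocks("DDR2", 800))  # I/O clock 400 MHz, memory clock 200 MHz
```

For the DDR2-800 example above this gives a 400 MHz I/O clock (the DFI PHY clock) and a 200 MHz memory clock (the DFI clock).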