Looks good and you may just get lucky with that layout.
Being an engineer, luck is usually not a method I rely on :-) So let me show you what I would do:
1) Define the PCB stackup. Looks like you are on a 4-layer stackup, but we need to know material and thickness of laminate/prepreg etc.
2) Calculate trace widths to give you 50R on all layers. Your traces look wide, but you didn't give your stackup so they may be okay. I would worry a bit about crosstalk though if those traces really are 50R (because I then know that they are far from your reference plane, which increases crosstalk).
3) Engineer a great low impedance power delivery network (PDN). I read between the lines that you have two planes for power and ground - which is a really good idea. I would use my tool at pdntool.com to select the right capacitor combination. And use the knowledge that bypass capacitor location is fairly unimportant. So the caps would be placed last so they don't interfere with the routing.
4) Repeat this for your Vtt supply. The termination voltage is being constantly pulled in both directions, so it needs a low impedance as well. With DDR1 on a low layer count board, Vtt ripple is a common problem (and make sure Vref is not connected to Vtt!!!). This would usually require a Vtt island with sufficient bypass. Remember about half the ripple on Vtt will be present as noise on top of any input signal terminated to Vtt.
5) Do some quick IBIS simulations to find a trace separation that gives you acceptable crosstalk. Use Hyperlynx, SigXplorer or some such tool for this. Or get someone to do it for you.
6) Do your timing analysis to find the acceptable tolerance on trace length matching (don't overdo length matching - just keep within your calculated tolerance).
7) Document the above in a nice document and call a peer review - this is a great time to find errors. You could also post that here and ask for problems in your reasoning.
8) Enter everything as routing rules in your CAD tool and do that layout. Remember with a well engineered PDN and 50R on all layers your via count is irrelevant. Also if you just route your differential clock as two 50R traces of same length (within half a rise-time or so), you need not treat them special.
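As a rough first pass at step 2, here is a minimal sketch of a trace width estimate for an outer-layer trace. It uses the common IPC-2141 style closed-form approximation for surface microstrip, which is only reasonable for roughly 0.1 < w/h < 2, and the stackup numbers are made up for illustration. Treat it as a sanity check and use a field solver or your fab's calculator for the real numbers.

```python
import math

def microstrip_z0(w, h, t, er):
    """Approximate surface-microstrip impedance in ohms.

    w = trace width, h = dielectric height to the reference plane,
    t = copper thickness (all in the same length unit), er = relative permittivity.
    Closed-form approximation, not a field solver.
    """
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h / (0.8 * w + t))

def width_for_z0(z0, h, t, er):
    """Invert the same approximation to get the width for a target impedance."""
    w = (5.98 * h / math.exp(z0 * math.sqrt(er + 1.41) / 87.0) - t) / 0.8
    if w <= 0:
        raise ValueError("target impedance not reachable with this stackup")
    return w

# Hypothetical stackup: 0.2 mm prepreg, 35 um copper, er = 4.3 (assumed values).
w_thin = width_for_z0(50.0, h=0.2, t=0.035, er=4.3)
# Same target with the reference plane twice as far away.
w_thick = width_for_z0(50.0, h=0.4, t=0.035, er=4.3)
print(f"50R width at h=0.2 mm: {w_thin:.3f} mm")
print(f"50R width at h=0.4 mm: {w_thick:.3f} mm")
```

The second case shows the point made in step 2: a wide 50R trace implies a reference plane that is far away, and that is what hurts crosstalk.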
For inspiration you can also look at the layout examples on the JEDEC website.
Hope this helps - feel free to ask more questions.
Trace length matching is important in DDR, DDR2 and DDR3, but the most important question is how closely the traces need to be matched.
For DDR1, 2 and 3, each byte should be matched to the strobe, and the strobe needs to be matched to the clock. Address and control likewise have a relationship to the clock. Just how tight the matching is depends on your specific implementation and requires you to analyse the link timing budget (See Micron note TN4611). How tightly each byte group must be matched to the other byte groups depends on whether you have multiple clocks available. Once more, this is part of the timing budget analysis.
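Once the timing budget gives you an allowable skew, turning it into a length tolerance is simple arithmetic. A minimal sketch, assuming the skew budget and the effective permittivity are numbers you have already worked out (the values below are placeholders, not taken from any datasheet):

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per picosecond

def delay_ps_per_mm(er_eff):
    """Propagation delay of a trace with effective relative permittivity er_eff."""
    return math.sqrt(er_eff) / C_MM_PER_PS

def max_mismatch_mm(skew_budget_ps, er_eff):
    """Largest length difference that stays inside the given skew budget."""
    return skew_budget_ps / delay_ps_per_mm(er_eff)

# Assumed example: 20 ps of skew allowed within a byte lane, er_eff = 4.0.
print(f"{delay_ps_per_mm(4.0):.2f} ps/mm")
print(f"{max_mismatch_mm(20.0, 4.0):.2f} mm allowed mismatch")
```

For an inner-layer stripline er_eff is close to the laminate value; for an outer-layer microstrip it is lower, so the same length difference costs less time there.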
DDR3 is somewhat easier to route than the earlier versions (due to a feature known as Write Levelling), but that does not mean you do not need to do a timing analysis.
If you are using an FPGA for the memory controller, keep in mind that the effective length from the pins or balls to the die can be measured in inches rather than a few thou, and this needs to be accounted for in the timing budget. Some FPGA tools allow timing closure to take care of this internally, but you need to enable the feature, and not all tools can do this.
So - can you ignore the track length match for DDR interfaces? My answer is no; you can, however, skip any length matching beyond a shortest-path route, provided that does not violate the timing margin for the interface.
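To see why the package matters, here is a small sketch that adds a per-pin package delay to the board delay before comparing each data line with its strobe. The net names, package delays and the 7 ps/mm figure are all invented for the example; the real package delays would come from the FPGA vendor's pin or package report.

```python
def skew_vs_reference(nets, reference, ps_per_mm):
    """Return each net's total delay minus the reference net's total delay, in ps.

    nets maps a net name to (package_delay_ps, board_length_mm).
    """
    def total_ps(name):
        pkg_ps, board_mm = nets[name]
        return pkg_ps + board_mm * ps_per_mm

    ref_ps = total_ps(reference)
    return {name: total_ps(name) - ref_ps for name in nets if name != reference}

# Hypothetical byte lane: DQ0 is length-matched on the board but not in the package.
lane = {
    "DQS0": (60.0, 40.0),
    "DQ0": (95.0, 40.0),
    "DQ1": (60.0, 42.0),
}
for net, skew in skew_vs_reference(lane, "DQS0", ps_per_mm=7.0).items():
    print(f"{net}: {skew:+.1f} ps relative to DQS0")
```

In this made-up case the trace that looks perfectly matched on the board (DQ0) ends up with more skew than the one that is 2 mm long (DQ1).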
I will note that the timing budget becomes easier to meet the shorter the interface; the timing budget is dominated by read timing which has both an outbound and inbound component. The shorter the interface, the lower the cumulative timing offsets.
Best Answer
The minimum length of the DQ lines does not matter because you will just change how you terminate your lines. The number of sockets you have will change the termination load. You need impedance matched boards for this reason. I haven't seen the more recent development guidelines, but board layouts are pretty much given to you.
The control groups are longer due to "cross talk", and this is why you have the WR_DATA_DELAY value in the control registers. You make the control lines as long as you need to, and then you add some time for the setup.