A DDR memory device actually consists of two distinct components:
1: A series of memory arrays composed mostly of capacitors, which are written to and read from using a very wide bank of differential amplifiers. This is fundamentally an analogue circuit, surprisingly enough.
2: An interface buffer, which allows the hundreds or thousands of individual bits produced by a single memory-array read cycle to be interfaced to a reasonable number of data lines to the Northbridge or CPU. Several cycles on the external interface are needed to transmit the data in the buffer.
In general, the feature size of semiconductor technology decreases over time as manufacturing technology is refined. This has different effects in the above two components.
For the memory array, the differential amplifiers become more sensitive and the individual capacitors become smaller. This allows a larger array to be constructed in the same die area, reading out more bits per cycle. The speed of the array remains roughly the same, however.
For the interface buffer, some of the data paths become shorter and therefore faster, required voltage swings reduce, and there is now space for better skew-correction, clock recovery, etc. This permits higher external signalling speeds within a reasonable power and area budget. The original DDR RAM simply transmitted data on both the rising and falling edges of the clock signal, instead of only on the rising edge as SDRAM did. More recent versions effectively multiply the basic clock signal as well.
This "basic clock signal" usually works out to around 200MHz in mainstream products of each generation, though faster and slower devices are also available. In original DDR, a 200MHz clock meant 400 MT/s, and was often described as 400MHz (or DDR-400) though the highest frequency signal is actually 200MHz. In DDR2, the basic clock is doubled using a PLL at both ends of the interface, so the actual clock rate is 400MHz and there are 800 MT/s. In DDR3 the clock is quadrupled and in DDR4 it is octupled, giving typically 3200 MT/s today. As you can imagine, the timing relative to the clock edges has to be controlled very carefully.
Since the memory arrays themselves haven't changed much in speed, these higher interface speeds come with increased "column strobe latency" (CL) figures. These describe how many transfer cycles elapse between providing the address and receiving the data, and are used to accommodate the limited speed of the memory arrays relative to the interface bus.
One of the things that the basic clock controls more-or-less directly, rather than through a PLL, is the self-refresh cycle of the memory arrays. Using capacitors to store bits is very space-efficient, but the charge leaks out of them rather easily and weakens the indication within a few tens of milliseconds, so the memory arrays must constantly cycle through their contents, reading and re-writing them to ensure they remain valid.
Best Answer
The Q is just some ancient notation. Data signals are called DQ and data strobe is DQS
Data strobe is the clock signal for the data lines. Each data byte has their own strobe
It is bidirectional signal. It is transmitted by the same component as the data signals. By the memory controller on write and the by the memory on read commands.
Control and address signals are unidirectional and clocked by the CLK signal. DQS runs the same speed as CLK but they are not synchronized.
Let's imagine time of flight for all signals is 1ns.
Situation with only one clk that is transmitted by the controller:
-During write there is no problem. Data signals can be clocked to the CLK signal and everything is fine. If traces are length matched you can use timing tolerances tighter than the time of flight.
-During read there is a problem. The controller must first transmit the clock to memory, where it arrives 1 ns later. Then the memory sends data bits to the controller and this takes another nanosecond. There is 2 ns skew, which limits how fast you can transmit.
When the same component that sends the data sends the clock, it is all synchronized. Data can be transmitted even faster than what is the time of flight