First of all, "saturation" in MOSFETs means that a change in VDS will not produce a significant change in ID (the drain current). You can think of a MOSFET in saturation as a current source: regardless of the voltage across drain and source (within limits, of course), the current through the device will be (almost) constant.
Now going back to the question:
According to Wikipedia, the MOSFET is in saturation when V(GS) > V(TH) and V(DS) > V(GS) - V(TH).
That is correct.
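The two conditions quoted above, plus the cutoff case, can be written as a small classifier. This is only a sketch of the textbook long-channel rule; the function name `mosfet_region` and the example threshold of 2.5 V are illustrative assumptions, not datasheet values.

```python
def mosfet_region(v_gs: float, v_ds: float, v_th: float) -> str:
    """Classify an NMOS operating region with the long-channel conditions:
    on when V_GS > V_TH, saturated when additionally V_DS > V_GS - V_TH."""
    if v_gs <= v_th:
        return "cutoff"       # no channel formed
    if v_ds > v_gs - v_th:
        return "saturation"   # channel pinched off: current roughly independent of V_DS
    return "triode"           # linear/triode: behaves like a V_GS-controlled resistor

# Hypothetical threshold of 2.5 V, roughly where the question's LED starts to glow.
print(mosfet_region(1.0, 5.0, 2.5))   # gate below threshold
print(mosfet_region(3.0, 8.0, 2.5))   # barely on, large V_DS
print(mosfet_region(12.0, 0.1, 2.5))  # fully driven, V_DS collapsed
```

With these assumed numbers, the fully driven case with V_DS near 0 V (the LED situation in the question) lands in the triode region.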
If I slowly increase the gate voltage starting from 0, the MOSFET remains off. The LED starts conducting a small amount of current when the gate voltage is around 2.5V or so.
You increased VGS above the VTH of the NMOS, so the channel formed and the device started to conduct.
The brightness stops increasing when the gate voltage reaches around 4V. There is no change in the brightness of the LED when the gate voltage is greater than 4V. Even if I increase the voltage rapidly from 4 to 12, the brightness of the LED remains unchanged.
Increasing VGS further made the device conduct more current. At VGS = 4 V, the thing limiting the current is no longer the transistor but the resistor you have in series with it.
I also monitor the Drain to Source voltage while I'm increasing the gate voltage. The drain to source voltage drops from 12V to close to 0V when the gate voltage is 4V or so. This is easy to understand: since R1 and R(DS) form a voltage divider and R1 is much larger than R(DS), most of the voltage is dropped on R1. In my measurements, around 10V is being dropped on R1 and the rest on the red LED (2V).
Everything looks in order here.
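Both observations (brightness levelling off near 4 V, and V_DS collapsing toward 0 V) can be reproduced with a minimal operating-point sketch. It assumes a long-channel square-law MOSFET (only a rough model for a power MOSFET), a red LED modelled as a fixed 2 V drop, and hypothetical values for R1, V_TH and k; only the 12 V supply and the roughly 2 V LED drop come from the question. It finds V_DS by bisection where the MOSFET current equals the current the resistor allows.

```python
VDD = 12.0    # supply voltage from the question (V)
V_LED = 2.0   # assumed constant red-LED forward drop (V)
R1 = 500.0    # hypothetical series resistor (ohm): at most (12 - 2)/500 = 20 mA
V_TH = 2.5    # hypothetical threshold voltage (V)
K = 0.02      # hypothetical square-law parameter k = mu_n*Cox*W/L (A/V^2)

def drain_current(v_gs, v_ds, v_th=V_TH, k=K):
    """Long-channel square-law NMOS model (no channel-length modulation)."""
    v_ov = v_gs - v_th                               # overdrive voltage
    if v_ov <= 0:
        return 0.0                                   # cutoff
    if v_ds < v_ov:
        return k * (v_ov - v_ds / 2) * v_ds          # triode
    return 0.5 * k * v_ov ** 2                       # saturation

def operating_point(v_gs, iterations=100):
    """Return (V_DS, I_D) where MOSFET current equals resistor current.
    mismatch() increases monotonically with V_DS, so bisection converges."""
    def mismatch(v_ds):
        i_load = (VDD - V_LED - v_ds) / R1           # current R1 would allow
        return drain_current(v_gs, v_ds) - i_load
    lo, hi = 0.0, VDD - V_LED
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) < 0:
            lo = mid
        else:
            hi = mid
    v_ds = 0.5 * (lo + hi)
    return v_ds, (VDD - V_LED - v_ds) / R1

for v_gs in (2.0, 3.0, 3.5, 4.0, 6.0, 12.0):
    v_ds, i_d = operating_point(v_gs)
    print(f"VGS={v_gs:4.1f} V  VDS={v_ds:5.2f} V  ID={i_d * 1e3:5.1f} mA")
```

With these made-up values the current rises steeply up to about 4 V of gate drive and then levels off just under 20 mA, because R1 (not the MOSFET) now sets the current, while V_DS drops to a fraction of a volt.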
However, since V(DS) is now approximately 0, the condition V(DS) > V(GS) - V(TH) is not satisfied. Is the MOSFET therefore not in saturation?
No, it is not. It is in the linear (triode) region, where it behaves as a resistor: increasing VDS will increase ID.
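For reference, the textbook long-channel (square-law) model makes this distinction explicit. It is an assumption here and only an approximation for power MOSFETs. With $k = \mu_n C_{ox} W/L$:

$$ I_D = k\left[(V_{GS} - V_{TH})\,V_{DS} - \tfrac{1}{2}V_{DS}^2\right], \qquad V_{DS} < V_{GS} - V_{TH} \ \text{(triode)} $$

$$ I_D = \tfrac{k}{2}\,(V_{GS} - V_{TH})^2, \qquad V_{DS} \ge V_{GS} - V_{TH} \ \text{(saturation)} $$

When $V_{DS} \ll V_{GS} - V_{TH}$, the quadratic term is negligible, so $I_D \approx k(V_{GS} - V_{TH})V_{DS}$ and the device looks like a gate-controlled resistor:

$$ R_{DS} = \frac{V_{DS}}{I_D} \approx \frac{1}{k\,(V_{GS} - V_{TH})} $$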
If this is the case, how would one design a circuit in which the MOSFET is in saturation?
You already have. You just need to take care of the operating point (make sure that the conditions you mentioned are met).
A) In the linear region you can observe the following: when you increase the SUPPLY voltage, the LED gets brighter, because the current through the resistor and transistor rises, and that same current flows through the LED.
B) In the saturation region something different happens: when you increase the SUPPLY voltage, the LED brightness does not change. The extra voltage applied to the SUPPLY does not translate into a bigger current. Instead, it appears across the MOSFET, so the DRAIN voltage rises together with the supply voltage (increasing the supply by 2 V raises the drain voltage by almost 2 V).
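Cases A and B can be compared with a short sketch. It reuses the same assumptions as before: long-channel square-law MOSFET, LED as a fixed 2 V drop, and hypothetical R1, V_TH and k. It also assumes the gate voltage is held fixed while only the supply changes. The ideal model has no channel-length modulation, so in saturation the drain voltage rises by exactly 2 V rather than "almost 2 V" as in a real device.

```python
V_LED = 2.0   # assumed constant LED forward drop (V)
R1 = 500.0    # hypothetical series resistor (ohm)
V_TH = 2.5    # hypothetical threshold voltage (V)
K = 0.02      # hypothetical square-law parameter (A/V^2)

def drain_current(v_gs, v_ds):
    """Long-channel square-law NMOS model without channel-length modulation."""
    v_ov = v_gs - V_TH
    if v_ov <= 0:
        return 0.0
    if v_ds < v_ov:
        return K * (v_ov - v_ds / 2) * v_ds     # triode
    return 0.5 * K * v_ov ** 2                  # saturation

def solve(v_dd, v_gs, iterations=100):
    """Bisection for V_DS; returns (V_DS, I_D) for a given supply and gate drive."""
    lo, hi = 0.0, max(v_dd - V_LED, 0.0)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if drain_current(v_gs, mid) < (v_dd - V_LED - mid) / R1:
            lo = mid                            # MOSFET passes less than R1 allows
        else:
            hi = mid
    v_ds = 0.5 * (lo + hi)
    return v_ds, (v_dd - V_LED - v_ds) / R1

for v_gs, label in ((3.5, "saturation"), (12.0, "linear/triode")):
    for v_dd in (12.0, 14.0):
        v_ds, i_d = solve(v_dd, v_gs)
        print(f"{label:14s} VDD={v_dd:4.1f} V  VDS={v_ds:5.2f} V  ID={i_d * 1e3:5.2f} mA")
```

With a 3.5 V gate the current stays at 10 mA while V_DS goes from 5 V to 7 V; with a 12 V gate V_DS stays near 0.1 V and the current grows from roughly 19.8 mA to 23.7 mA.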
Consider the equivalent BJT circuit, which may be more familiar:
[Schematic: equivalent BJT circuit, created using CircuitLab]
The output sits at roughly 0.65 V, provided the input voltage is above 0.65 V, with of course some variation due to temperature, output current, manufacturing variation, etc. As a first approximation, though, this circuit outputs a constant 0.65 V.
The MOSFET circuit is no different, except that instead of the 0.65 V of a forward-biased silicon PN junction, we get the threshold voltage of the MOSFET. This parameter varies between MOSFET models but is typically a few volts. If the output voltage, which is also the gate voltage, rises above the threshold voltage, the MOSFET turns on more and shunts more current to ground. That increases the current through the resistor, which lowers the output/gate voltage, until an equilibrium is reached:
$$ V_{GS} = V_{out} \approx V_{GS(th)} $$
This sort of circuit would be useful as a voltage reference, for example to implement a voltage regulator, because the output voltage is relatively unaffected by the input voltage. A single transistor like this isn't necessarily a good voltage regulator on its own, but it could be the basis for something better. A good regulator starts with a reference such as this one, which may vary with other parameters (output current, supply voltage, temperature), and then isolates the reference from those parameters or compensates for them.
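How weakly the output depends on the input can be estimated with a small sketch. It assumes a resistor from the input to the drain, gate tied to drain, no external load, and the long-channel square-law model; R, k and V_TH are hypothetical values. Because the gate is tied to the drain, V_DS = V_GS, so the device is always in saturation, and the equilibrium (V_in - V_out)/R = (k/2)(V_out - V_TH)^2 is a quadratic with a closed-form solution.

```python
import math

V_TH = 2.5     # hypothetical threshold voltage (V)
K = 0.02       # hypothetical square-law parameter (A/V^2)
R = 10_000.0   # hypothetical resistor from input to drain/gate (ohm)

def reference_output(v_in, r=R, k=K, v_th=V_TH):
    """Output of a resistor-fed, diode-connected NMOS with no external load.
    Solves (v_in - v_out)/r = (k/2)*(v_out - v_th)**2 for v_out >= v_th."""
    if v_in <= v_th:
        return v_in                 # MOSFET off: no current, no drop across r
    kr = k * r
    # Positive root of (kr/2)*x**2 + x - (v_in - v_th) = 0, with x = v_out - v_th
    overdrive = (math.sqrt(1 + 2 * kr * (v_in - v_th)) - 1) / kr
    return v_th + overdrive

for v_in in (5.0, 8.0, 12.0):
    print(f"Vin={v_in:5.1f} V  Vout={reference_output(v_in):.3f} V")
```

With these assumed values, raising the input from 5 V to 12 V moves the output only from about 2.65 V to about 2.80 V, i.e. it stays just above V_TH, as the equation above suggests.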
I am not a specialist in solid-state physics, but recall that the "Early effect" was first identified for BJTs, as a good approximation of the I-V characteristics for different base currents. MOSFETs show some resemblance (but versus VGS), so the shapes of MOSFET I-V curves are also sometimes characterized by an "Early voltage". However, this is still an approximation, and it doesn't work well for MOSFETs.
Identifying the "linear" section of the I-V curves is somewhat subjective, and the negative ("Early") voltages obtained by extrapolating the I-V curves may differ between the curves for different VGS. That's why you are having difficulties.
In other words, the dynamic impedance (RdsON) of a MOSFET depends not only on VGS but also on VDS, which is why RdsON is specified so loosely in MOSFET datasheets.
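To see why picking the "linear" section is subjective, here is a minimal sketch of the extrapolation itself: fit a least-squares straight line to (VDS, ID) points from the saturation region and take its VDS-axis intercept as -VA. The function name `early_voltage` and the slightly curved test curve are hypothetical, made up for illustration rather than measured.

```python
def early_voltage(v_ds, i_d):
    """Least-squares line through (V_DS, I_D) points taken from the saturation
    region; returns V_A such that the line crosses I_D = 0 at V_DS = -V_A."""
    n = len(v_ds)
    mean_v = sum(v_ds) / n
    mean_i = sum(i_d) / n
    slope = (sum((v - mean_v) * (i - mean_i) for v, i in zip(v_ds, i_d))
             / sum((v - mean_v) ** 2 for v in v_ds))
    intercept = mean_i - slope * mean_v
    return intercept / slope    # x-intercept is -intercept/slope = -V_A

def curve(v):
    """Hypothetical saturation-region curve with a slight upward bend (A)."""
    return 0.01 * (1 + v / 50) + 1e-5 * v ** 2

for window in ([2, 3, 4, 5], [8, 9, 10, 11, 12]):
    va = early_voltage(window, [curve(v) for v in window])
    print(f"fit over VDS={window[0]}..{window[-1]} V -> V_A = {va:.1f} V")
```

On this made-up curve, fitting 2 to 5 V gives an Early voltage of about 36.6 V, while fitting 8 to 12 V gives about 22.6 V. The answer depends on which window you call "linear".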