First of all, "saturation" in MOSFETs means that a change in VDS will not produce a significant change in Id (the drain current). You can think of a MOSFET in saturation as a current source: regardless of the voltage VDS across it (within limits, of course), the current through the device will be (almost) constant.
Now going back to the question:
According to wikipedia, the MOSFET is in saturation when V(GS) > V(TH) and V(DS) > V(GS) - V(TH).
That is correct.
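As a quick sanity check, the two conditions can be written as a small function. This is only a sketch for an idealized NMOS; the function name and the example voltages are made up for illustration, and a real device does not switch regions this sharply.

```python
def mosfet_region(vgs, vds, vth):
    """Classify an idealized NMOS operating point using the textbook conditions."""
    if vgs <= vth:
        return "cutoff"        # no channel, device is off
    if vds > vgs - vth:
        return "saturation"    # V(GS) > V(TH) and V(DS) > V(GS) - V(TH)
    return "triode"            # channel exists but V(DS) is too small

# Hypothetical readings with an assumed threshold of 2.5 V
print(mosfet_region(vgs=1.0, vds=12.0, vth=2.5))
print(mosfet_region(vgs=3.0, vds=5.0, vth=2.5))
print(mosfet_region(vgs=4.0, vds=0.1, vth=2.5))
```

The last call corresponds roughly to the situation in the question: gate at 4V, drain close to 0V.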
If I slowly increase the gate voltage starting from 0, the MOSFET remains off. The LED starts conducting a small amount of current when the gate voltage is around 2.5V or so.
You increased Vgs above the Vth of the NMOS, so the channel formed and the device started to conduct.
The brightness stops increasing when the gate voltage reaches around 4V. There is no change in the brightness of the LED when the gate voltage is greater than 4V. Even if I increase the voltage rapidly from 4 to 12, the brightness of the LED remains unchanged.
You increased Vgs, making the device conduct more current. At Vgs = 4V the thing limiting the current is no longer the transistor but the resistor that you have in series with it.
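To see this with numbers, here is a rough sketch that finds the operating point of the circuit with the simple square-law MOSFET model. The values are assumptions, not taken from the question: R1 = 1 kΩ, Vth = 2.5 V, a transconductance parameter k = 0.05 A/V², and an LED modelled as a fixed 2 V drop (which is not accurate near zero current).

```python
def drain_current(vgs, vds, vth=2.5, k=0.05):
    """Square-law NMOS drain current in amps (vth and k are assumed values)."""
    vov = vgs - vth                      # overdrive voltage
    if vov <= 0:
        return 0.0                       # cutoff
    if vds >= vov:
        return 0.5 * k * vov ** 2        # saturation: independent of vds
    return k * (vov * vds - 0.5 * vds ** 2)  # triode: depends on vds

def operating_point(vgs, v_supply=12.0, v_led=2.0, r1=1000.0):
    """Bisect for the vds where the MOSFET current equals the R1 + LED load line."""
    v_max = v_supply - v_led
    lo, hi = 0.0, v_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if drain_current(vgs, mid) > (v_max - mid) / r1:
            hi = mid                     # MOSFET wants more current than R1 allows
        else:
            lo = mid
    vds = 0.5 * (lo + hi)
    return vds, (v_max - vds) / r1

for vgs in (2.0, 2.7, 4.0, 12.0):
    vds, i_d = operating_point(vgs)
    print(f"Vgs={vgs:5.1f} V  Vds={vds:6.3f} V  Id={i_d * 1000:6.2f} mA")
```

With these assumed values the current at Vgs = 4V is already within a few percent of the (12V - 2V) / R1 = 10 mA ceiling set by the resistor, so raising the gate to 12V barely changes it.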
I also monitor the Drain to Source voltage while I'm increasing the gate voltage. The drain to source voltage drops from 12V to close to 0V when the gate voltage is 4V or so. This is easy to understand: since R1 and R(DS) form a voltage divider and R1 is much larger than R(DS), most of the voltage is dropped on R1. In my measurements, around 10V is being dropped on R1 and the rest on the red LED (2V).
Everything looks in order here.
However, since V(DS) is now approximately 0, the condition V(DS) > V(GS) - V(TH) is not satisfied, is the MOSFET not in saturation?
No, it is not. It is in the linear (triode) region, where it behaves as a resistor: increasing Vds will increase Id.
If this is the case, how would one design a circuit in which the MOSFET is in saturation?
You already have. You just need to take care of the operating point (make sure that the conditions you mentioned are met).
A) In the linear region you can observe the following: when you increase the SUPPLY voltage, the LED gets brighter, because the current through the resistor and transistor rises and so more flows through the LED.
B) In the saturation region something different happens: when you increase the SUPPLY voltage, the LED brightness does not change. The extra voltage that you apply on the SUPPLY does not translate into a bigger current. Instead it appears across the MOSFET, so the DRAIN voltage rises together with the supply voltage (increasing the supply by 2V means increasing the drain voltage by almost 2V).
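Cases A) and B) can be compared with a small sketch based on the ideal square-law model. Everything numeric here is an assumption for illustration (R1 = 1 kΩ, Vth = 2.5 V, k = 0.05 A/V², LED as a fixed 2 V drop), and the model ignores channel-length modulation, which is why it shows exactly 2V where a real device would show "almost 2V".

```python
def drain_current(vgs, vds, vth=2.5, k=0.05):
    """Square-law NMOS drain current in amps (vth and k are assumed values)."""
    vov = vgs - vth
    if vov <= 0:
        return 0.0
    if vds >= vov:
        return 0.5 * k * vov ** 2
    return k * (vov * vds - 0.5 * vds ** 2)

def solve(v_supply, vgs, v_led=2.0, r1=1000.0):
    """Return (vds, id) where the MOSFET curve meets the R1 + LED load line."""
    v_max = v_supply - v_led
    lo, hi = 0.0, v_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if drain_current(vgs, mid) > (v_max - mid) / r1:
            hi = mid
        else:
            lo = mid
    vds = 0.5 * (lo + hi)
    return vds, (v_max - vds) / r1

# Vgs = 4.0 V lands in the linear region, Vgs = 2.7 V in saturation (for these values)
for label, vgs in (("A) linear", 4.0), ("B) saturation", 2.7)):
    for v_supply in (12.0, 14.0):
        vds, i_d = solve(v_supply, vgs)
        print(f"{label:14s} supply={v_supply:4.1f} V  Vds={vds:6.3f} V  Id={i_d * 1000:6.2f} mA")
```

In case A) the extra 2V of supply shows up as extra LED current; in case B) the current stays put and the drain voltage takes up the difference.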
"Linear region" in the answers you quote is used somewhat loosely. Often we say "linear region" or "linear operation" in electronics when we mean in-between operation, where a voltage is kept somewhere between the power supply rails (as opposed to clamped near one of them) or a device like a transistor is kept in the middle region where it is not fully on or fully off. Often devices aren't all that linear in this "linear region", but it's a name that stuck from long ago, when linear operation was contrasted with switching operation or the clipped region.
It is this middle "linear" region where the device will dissipate significant power. If the device is an ideal switch, then it can't dissipate power when open since the current is zero, or when closed since the voltage is zero.
This is different from "linear region" when talking about the device physics or detailed characteristics of a MOSFET. There "linear" can mean "roughly linear current with applied voltage", which also means the MOSFET is acting like a resistor as opposed to more like a current source. That's different from "linear region" from the overall circuit perspective.
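A back-of-the-envelope sketch of that point, assuming a purely resistive load for simplicity. The 12 V supply and 1 kΩ load are illustrative values only, and any LED drop is ignored.

```python
def device_power(v_supply, r_load, v_ds):
    """Power dissipated in the pass device (watts) with a resistive load in series."""
    current = (v_supply - v_ds) / r_load   # load current set by the remaining voltage
    return v_ds * current

# Fully on (0 V across the device), halfway, and fully off (all 12 V across it)
for v_ds in (0.0, 3.0, 6.0, 9.0, 12.0):
    print(f"Vds={v_ds:4.1f} V  P={device_power(12.0, 1000.0, v_ds) * 1000:5.1f} mW")
```

The dissipation is zero at both extremes and peaks when the device drops half the supply, which is exactly the in-between operation described above.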
Yes, it's context-dependent and can be confusing. If you need to be precise, use real numbers.
Best Answer
When your article says this (wrongly): -
It's because it's written by someone who thinks that the name of the equivalent section of the BJT's characteristic is 100% transferable to MOSFETs.
To clear this up: -
Answer: the linear/ohmic/triode region