As LvW says in his answer, what we call Rds is not a physical resistor present in the MOSFET; it is a phenomenon which is represented by a resistor called Rds in the small-signal model of the MOSFET.
You take a MOSFET and apply DC voltages and currents to it so that it has a certain operating point. For example, an operating point where the drain current Ids = 1 mA and Vds = 3 V. For this imaginary NFET Vt = 1 V, so this NMOS is in saturation (as long as Vgs - Vt is smaller than Vds, that is, Vgs < 4 V).
Now that we know the operating point of this NMOS, we can calculate values for some small-signal parameters of this NMOS at this operating point. These parameters are all derivatives. For example:
$$gm = dId / dVgs$$
and
$$Rds = dVds / dId$$
Note that Rds is the derivative of Vds with respect to Id, not the ratio Vds/Id!
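To make the "derivative at an operating point" idea concrete, here is a minimal Python sketch. It assumes the simple square-law model with channel-length modulation, Id = k(Vgs - Vt)^2 (1 + lambda Vds). The values Vt = 1 V, Vds = 3 V and Id = 1 mA come from the example above; Vgs = 2 V and lambda = 0.01 1/V are my own assumptions, and k is simply chosen so that the model gives 1 mA at that bias point.

```python
# Assumed square-law model with channel-length modulation (illustrative only).
VT = 1.0      # threshold voltage in V (from the example above)
VDS_Q = 3.0   # drain-source bias in V (from the example above)
ID_Q = 1e-3   # drain current in A (from the example above)
VGS_Q = 2.0   # gate-source bias in V (assumed; not given in the text)
LAM = 0.01    # channel-length modulation in 1/V (assumed)

# Choose k so that the model reproduces Id = 1 mA at the operating point.
K = ID_Q / ((VGS_Q - VT) ** 2 * (1 + LAM * VDS_Q))


def drain_current(vgs, vds):
    """Saturation drain current; only valid for vds > vgs - VT > 0."""
    return K * (vgs - VT) ** 2 * (1 + LAM * vds)


def small_signal(vgs, vds, delta=1e-6):
    """Return (gm, rds) from central differences around the bias point."""
    gm = (drain_current(vgs + delta, vds)
          - drain_current(vgs - delta, vds)) / (2 * delta)
    gds = (drain_current(vgs, vds + delta)
           - drain_current(vgs, vds - delta)) / (2 * delta)
    return gm, 1.0 / gds  # Rds = dVds/dId = 1 / (dId/dVds)


gm, rds = small_signal(VGS_Q, VDS_Q)
print(f"gm  = {gm * 1e3:.3f} mS")
print(f"Rds = {rds / 1e3:.1f} kOhm")
print(f"Vds/Id = {VDS_Q / ID_Q / 1e3:.1f} kOhm (the DC ratio, not Rds)")
```

With these assumed numbers the DC ratio Vds/Id is 3 kOhm, while the small-signal Rds comes out around 100 kOhm, which is why the two must not be confused.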
The values of gm and Rds result from the physical properties of the MOSFET. So for a different MOSFET (for example, one with a longer channel) these values will be different. In general, Rds will be larger for a MOSFET with a longer channel.
But this does not explain yet why this is so.
What does explain it is the channel-length modulation effect.
For MOSFETs with very short channels, the drain is physically close to the part of the MOSFET's channel which determines the drain current when it is in saturation. As the voltage on the drain increases, the depletion layer around the drain also increases in size. In the worst case this depletion region can even touch the channel. This results in a low-ohmic path between drain and source, and Rds will be very low.
If the drain is physically further away from the source, that depletion region cannot get anywhere near the channel, so the channel will determine the current without the drain and its depletion region interfering. This results in a more ideal current-source behavior of the channel. For a high Rds, this is what is needed: it means dId will be very small (only small drain current variations due to changes in Vds).
A MOSFET acts like a variable resistor in the linear region and a VCCS in the saturation region. In saturation, the VCCS has a small-signal output resistance due to channel length modulation, but it's not the same behavior as the linear region.
In saturation, the drain current is given by the equation:
$$I_D = k(V_{GS} - V_{T})^2(1 + \lambda V_{DS})$$
We can calculate the drain-source conductance (1/resistance) as:
$$G_{sat} = \frac {dI_D}{dV_{DS}} = \lambda k (V_{GS} - V_T)^2$$
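As a rough worked example (a sketch only, with \$\lambda = 0.01\ \mathrm{V^{-1}}\$ and \$I_D = 1\ \mathrm{mA}\$ taken as illustrative values, and \$r_o\$ being my notation for \$1/G_{sat}\$), substituting the drain current equation back into \$G_{sat}\$ gives:
$$r_o = \frac{1}{G_{sat}} = \frac{1 + \lambda V_{DS}}{\lambda I_D} \approx \frac{1}{\lambda I_D} = \frac{1}{0.01\ \mathrm{V^{-1}} \cdot 1\ \mathrm{mA}} = 100\ \mathrm{k\Omega}$$
The approximation holds as long as \$\lambda V_{DS} \ll 1\$.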
In the linear region, the drain current is given by:
$$I_D = k[2(V_{GS} - V_T)V_{DS} - V_{DS}^2]$$
If \$V_{DS} \ll V_{GS} - V_T\$, this reduces to:
$$I_D \approx 2k(V_{GS} - V_T)V_{DS}$$
and the conductance is:
$$G_{lin} = \frac {dI_D}{dV_{DS}} = 2k(V_{GS} - V_T)$$
By themselves, \$G_{sat}\$ and \$G_{lin}\$ don't look all that different. To my mind, the main differences are:
Saturation assumes that \$V_{DS} > V_{GS} - V_T\$, and you need a current source to model that.
The linear region is used for switching. A resistor model is useful for estimating power dissipation when the switch is on.
\$\lambda\$ is typically very small (~0.01 V⁻¹), so the saturation output conductance is a second-order effect on the drain current. In the linear region, \$V_{DS}\$ is just as important as \$V_{GS}\$.
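To put numbers on the difference, here is a small Python sketch that evaluates both conductance expressions for the same device. The values of k, Vt and lambda are assumptions chosen for illustration, not data for any real part.

```python
# Illustrative device parameters (all assumed, not from a datasheet).
K = 1e-3    # transconductance parameter in A/V^2
VT = 1.0    # threshold voltage in V
LAM = 0.01  # channel-length modulation in 1/V


def g_sat(vgs):
    """Small-signal output conductance in saturation: lambda*k*(Vgs-Vt)^2."""
    return LAM * K * (vgs - VT) ** 2


def g_lin(vgs):
    """Channel conductance in the linear region for Vds << Vgs - Vt."""
    return 2 * K * (vgs - VT)


for vgs in (1.5, 2.0, 3.0):
    ratio = g_lin(vgs) / g_sat(vgs)  # equals 2 / (lambda * (Vgs - Vt))
    print(f"Vgs = {vgs:.1f} V: "
          f"R_lin = {1 / g_lin(vgs):8.1f} Ohm, "
          f"R_sat = {1 / g_sat(vgs) / 1e3:8.1f} kOhm, "
          f"G_lin/G_sat = {ratio:.0f}")
```

The ratio works out to 2/(lambda (Vgs - Vt)), so with a small lambda the linear-region channel is a far better conductor than the saturation output resistance would suggest.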
Best Answer
Here they have defined it as the internal resistance of the current source.
Since gmb and r0 are already included in the circuit to capture the parasitic and body-bias effects, this interpretation is plausible. Still, I am not sure whether these two documents follow the same notation.