I'm sorry, but I cannot answer your particular question. I can, however, give you a solution to what I assume is your underlying question:
How can I perform cascade-by-cascade calculations and end up with H(s)?
Well, I've used ABCD matrices for this. I am not 100% sure I've used them correctly, but I arrive at the same answer as when I use KCL, so the method does work.
I will continue assuming you've clicked the link and seen how every impedance is translated into a 2x2 matrix for series/parallel connections.
For clarity I will show you how I would apply the ABCD-parameters for your particular case.
$$
Z_{s} =
\begin{pmatrix}
1 & R_s \\
0 & 1 \\
\end{pmatrix}
~~~
Z_f =
\begin{pmatrix}
1 & R_f \\
0 & 1 \\
\end{pmatrix}
~~~
Z_c =
\begin{pmatrix}
1 & 0 \\
sC & 1 \\
\end{pmatrix}
~~~
Z_l =
\begin{pmatrix}
1 & 0 \\
\frac{1}{R_l} & 1 \\
\end{pmatrix}
$$
These are the ABCD matrices of your impedances (series elements for \$R_s\$ and \$R_f\$, shunt elements for \$C\$ and \$R_l\$). Placing the voltages and currents at the appropriate ports with the appropriate signs, we get the following:
$$
\begin{pmatrix}
V_1(s) \\
I_1(s) \\
\end{pmatrix}
=Z_sZ_fZ_cZ_l
\begin{pmatrix}
V_2(s) \\
-I_2(s)\\
\end{pmatrix}
$$
Multiplying the four matrices out gives:
$$
\begin{pmatrix}
V_1(s) \\
I_1(s) \\
\end{pmatrix}
=
\begin{pmatrix}
1+(R_s+R_f)(sC+\frac{1}{R_l}) & R_s+R_f \\
sC+\frac{1}{R_l} & 1 \\
\end{pmatrix}
\begin{pmatrix}
V_2(s) \\
-I_2(s)\\
\end{pmatrix}
$$
The equation above shows how the voltages and currents at the input and output are related. We don’t care about the input current, so only the top row matters. For simplicity, let's assume that you are driving a high-impedance load, meaning that \$I_2 \approx 0\$. With the output current equal to zero, the right column drops out as well, and the only element left is the top-left one, which relates \$V_1\$ to \$V_2\$.
In other words, this is the important equation we can extract from the matrix equation above:
$$
\begin{align}
V_1(s)&=(1+(R_s+R_f)(sC+\frac{1}{R_l}))V_2(s)\\
\\
\frac{V_2(s)}{V_1(s)}=H(s)&=\frac{1}{1+(R_s+R_f)(sC+\frac{1}{R_l})}
\end{align}
$$
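As a quick numerical cross-check of the cascade above, here is a minimal Python sketch. The helper names (`series`, `shunt`, `matmul2`, `h_cascade`, `h_closed_form`) and the component values are my own assumptions for illustration only; they are not taken from your circuit. It multiplies the four ABCD matrices at one frequency and compares the result with the closed-form \$H(s)\$.

```python
import math

def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def series(z):
    """ABCD matrix of a series impedance z."""
    return ((1, z), (0, 1))

def shunt(y):
    """ABCD matrix of a shunt admittance y."""
    return ((1, 0), (y, 1))

def h_cascade(s, r_s, r_f, c, r_l):
    """V2/V1 from the product Z_s Z_f Z_c Z_l, assuming I_2 = 0."""
    chain = series(r_s)
    for stage in (series(r_f), shunt(s * c), shunt(1 / r_l)):
        chain = matmul2(chain, stage)
    return 1 / chain[0][0]  # with I_2 = 0, V_1 = A * V_2

def h_closed_form(s, r_s, r_f, c, r_l):
    """The H(s) derived above."""
    return 1 / (1 + (r_s + r_f) * (s * c + 1 / r_l))

# Assumed example values, for illustration only.
R_S, R_F, C, R_L = 50.0, 1e3, 100e-9, 10e3
s = 2j * math.pi * 1e3  # evaluate at 1 kHz

print(h_cascade(s, R_S, R_F, C, R_L))
print(h_closed_form(s, R_S, R_F, C, R_L))
```

At \$s = 0\$ the capacitor drops out and the result should reduce to the plain voltage divider \$R_l/(R_s+R_f+R_l)\$, which is an easy sanity check on the matrix ordering.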
I have not verified whether the equation I came up with matches the one you have, so I will let you check that.
I'm sorry for not answering your specific question, but I believe this answers your underlying question/problem.
Best Answer
The conclusion of the author is incorrect. It is in fact possible to use step response data to measure a system's transfer function; it's just a bit more complicated. This is done all the time when evaluating time domain reflectometry and time domain transmission measurements.
The main issue is that the author defines a step to be "a sequence consisting entirely of 1's", which is not a universal definition. A step function should be 0 for t < 0 and 1 for t > 0. For some reason the author of your web page seems to think t=0 is the beginning of time and there's no way we can get data for t<0, which is not generally true.
However, if you use a dataset that starts at some t0 < 0, you will have to account for the resulting delay factor in your data analysis when calculating the transfer function (if you care about getting the phase right).
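To illustrate the delay factor, here is a minimal Python sketch. It assumes an idealized circular delay of a whole number of samples, and the names `dft`, `delay`, `N` and `D` are mine, chosen for illustration. A delay of D samples multiplies bin k of the DFT by exp(-j2πkD/N), so multiplying by the conjugate phase ramp undoes it.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for short records)."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]

def delay(x, d):
    """Circularly delay the sequence x by d samples."""
    return x[-d:] + x[:-d] if d else list(x)

N, D = 16, 3                                 # record length and delay (assumed values)
x = [math.exp(-m / 2.0) for m in range(N)]   # arbitrary test record

X = dft(x)
X_delayed = dft(delay(x, D))

# Remove the linear phase ramp introduced by the delay.
X_comp = [X_delayed[k] * cmath.exp(2j * math.pi * k * D / N) for k in range(N)]
```

The magnitude is unaffected by the delay; only the phase picks up the linear ramp, which is why this step matters only if you care about phase.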
Also, you will need to understand how to apply a window function to your data before doing the numerical transform, because the DFT (usually computed with an FFT) assumes a periodic signal. This means it treats the sample after your final sample as a repeat of your first sample, and so on. This will often create an artificial second step (with very high frequency content) at the boundary between the last and first samples. Applying a window function (for example a Hamming window) will minimize the effect of this discontinuity, but will also introduce some "smearing" in the computed frequency-domain transfer function. See a textbook on DSP for further details on this issue.
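The wrap-around discontinuity can be made concrete with a small Python sketch. The record length and the position of the step edge are assumed values, and `hamming` is a hand-written helper using the standard Hamming formula. It compares the jump between the last and first samples before and after windowing.

```python
import math

def hamming(n):
    """Hamming window of length n: 0.54 - 0.46*cos(2*pi*i/(n-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

N = 64
# Step record that starts before the edge: 0 for the first quarter, then 1.
step = [0.0] * (N // 4) + [1.0] * (N - N // 4)

w = hamming(N)
windowed = [u * wi for u, wi in zip(step, w)]

# The DFT treats the record as periodic, so the last sample is
# followed by the first one again.
wrap_jump_raw = abs(step[0] - step[-1])
wrap_jump_windowed = abs(windowed[0] - windowed[-1])

print(wrap_jump_raw, wrap_jump_windowed)
```

The window reduces the artificial jump at the record boundary, but it also reshapes the rest of the record, which is the source of the "smearing" mentioned above.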
Finally, as alluded to in the web page, you will run into a problem at high frequencies. Since the spectral amplitude of the step stimulus falls off like 1/f, eventually the response signal will be below the noise floor of the measurement system, and the measurement will not be meaningful. Said another way, when you compensate for the 1/f fall-off of the stimulus in calculating the transfer function, you will enhance the noise at high frequencies.
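A rough way to see this, assuming an ideal unit step as the stimulus and additive measurement noise (the symbols \$U(f)\$ for the stimulus spectrum, \$Y(f)\$ for the noise-free response, \$N(f)\$ for the noise and \$\hat{H}(f)\$ for the estimate are my own notation, not from the web page):

$$
\hat{H}(f)=\frac{Y(f)+N(f)}{U(f)}=H(f)+\frac{N(f)}{U(f)},
\qquad
|U(f)|=\frac{1}{2\pi f}\ (f\neq 0)
\;\Rightarrow\;
\left|\frac{N(f)}{U(f)}\right|=2\pi f\,|N(f)|
$$

So even for a flat noise spectrum, the error term in the estimated transfer function grows in proportion to frequency.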