Ethernet – RS_FEC OVERHEADS ON 25GBASE-R

I'd like to estimate the effective capacity of a 25 GbE link (25GBASE-R) to transfer an Ethernet payload of a particular size when Reed-Solomon RS(528,514) forward error correction (RS-FEC) is enabled. However, I'm having trouble coming to conclusions about the overhead associated with RS.

To keep things simple, let's assume that all Ethernet frames are exactly 1000 bytes, each with a MAC-header of 14 bytes, a payload of 982 bytes, and a frame check sequence of 4 bytes.

Furthermore, let's assume that there are no additional overheads on the link other than those required to transfer the 1000-byte frames. That means no flow control, pause frames, etc.

25e9 BITS PER SECOND
I presume that everything to be counted is sent onto the link at 25e9 bits per second (25 Gbps). I arrive at this by multiplying the link signaling rate of 25.78125e9 baud by the 64-bit/66-bit encoding ratio.

That's a base efficiency of 96.97%.

One can also think of this as 3.125e9 bytes per second, or simply 3.125 GBps.
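As a quick sanity check of that arithmetic, here is a minimal Python sketch. The variable names are my own, and it assumes nothing beyond the signaling rate and the 64b/66b ratio quoted above:

```python
from fractions import Fraction

# Signaling rate of 25GBASE-R in baud, kept exact with Fraction.
SIGNALING_RATE_BAUD = Fraction("25.78125") * 10**9
# 64b/66b line code: 64 data bits carried in every 66 transmitted bits.
ENCODING_RATIO = Fraction(64, 66)

mac_rate_bps = SIGNALING_RATE_BAUD * ENCODING_RATIO
mac_rate_bytes_per_s = mac_rate_bps / 8

print(f"Rate after 64b/66b: {float(mac_rate_bps):.4e} bit/s")
print(f"Rate after 64b/66b: {float(mac_rate_bytes_per_s):.4e} byte/s")
print(f"64b/66b efficiency: {float(ENCODING_RATIO):.2%}")
```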

NON-FEC CASE
To estimate the effective Ethernet payload capacity of the non-FEC case it seems to me that we only have to consider the following transfers on the link:

12B interpacket gap
7B preamble
1B start of frame delimiter
14B Ethernet header
982B Ethernet payload
4B frame check sequence
1020B total

With the Ethernet payload being 982 bytes out of the total 1020 bytes, the link, with respect to the payload, is thus 96.27% (982B/1020B) efficient.

If all items listed above are transmitted at 25e9 bits per second (25 Gbps), then the maximum Ethernet payload that could possibly be achieved would be 3.0086e9 bytes per second (3.01 GBps).
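The non-FEC estimate above can be reproduced with a short Python sketch. The dictionary keys and variable names are just labels I chose for illustration, and the 25e9 bit/s rate is the assumption stated earlier:

```python
# Bytes sent on the link per frame, for the 1000-byte frame assumed above.
wire_bytes = {
    "interpacket gap": 12,
    "preamble": 7,
    "start of frame delimiter": 1,
    "Ethernet header": 14,
    "Ethernet payload": 982,
    "frame check sequence": 4,
}

LINK_RATE_BYTES_PER_S = 25e9 / 8  # assumed 25e9 bit/s, i.e. 3.125e9 byte/s

total = sum(wire_bytes.values())
efficiency = wire_bytes["Ethernet payload"] / total
payload_rate = LINK_RATE_BYTES_PER_S * efficiency

print(f"Bytes on the link per frame: {total}")
print(f"Payload efficiency: {efficiency:.2%}")
print(f"Max payload rate: {payload_rate:.4e} byte/s")
```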

REED-SOLOMON FEC OVERHEADS?
The IEEE 802.3by document says that 25GBASE-R RS_FEC uses an RS(528, 514) encoding scheme. (It also says it operates over a Galois Field GF(2^10) where the symbol size is 10 bits, but I'm guessing that's not germane to my question.)

If I understand it correctly, RS(528, 514) takes in a maximum of eight 64-bit chunks of data, reorders the chunks and then blocks them into two groups of four, each being given 1 bit of overhead. Thus, what started as 512 bits of data (64 bytes of data) is now represented by 514 bits.

RS_FEC also generates a 14-bit parity value to be sent onto the link as well.

The original 512 bits of data is thus transmitted onto the link at 25 Gbps as 528 bits of RS_FEC data. That's 2 bytes of overhead for every 66 bytes. This is also 96.97% efficient.

Presuming that the data fed into the RS_FEC algorithm is already encoded at 64B/66B, that would reduce the link efficiency from the base 96.97% that I first calculated to 94.03% (0.9697 * 0.9697).

QUESTIONS
(1) Is the data that is fed into the RS_FEC algorithm already encoded as 64B/66B such that there is a drop in total efficiency to 94.03% after RS_FEC is applied?

(2) Are there any overheads that I'm not considering?

(3) What portion of the transmission is manipulated by RS_FEC?

For instance is the preamble, SFD, and FCS included?

How about the idles during the interpacket gap?

(4) IEEE Standard for Ethernet Amendment 2 (IEEE Std 802.3by-2016) indicates that the "25GBASE-R RS-FEC sublayer employs the Reed-Solomon code RS(528,514) operating over the Galois Field GF(2^10) where the symbol size is 10 bits." I presume that the symbol size is not germane to my original question.

The document also says that the "encoder described in 91.5.2.7 shall be used." This is a reference to SECTION 6 of the IEEE Standard for Ethernet (IEEE Std 802.3-2018) document, which is really about 40 GbE and 100 GbE.

How do I translate that to 25GBASE-R?

(5) SECTION 5 of that same document has a picture showing a FEC coded Ethernet frame (figure 65-7):
S_FEC(5B) + PREAMBLE/SLD + FRAME+FCS + T_FEC(6B) + PARITY(14bits) + T_FEC(6B)

Does this apply to 25GBASE-R as well? If so, then I have more questions:

(6a) Does the RS algorithm only apply to the PREAMBLE/SLD + FRAME + FCS?

Does it ever include the interpacket gap characters?

(6b) If the Ethernet frame were longer, would there be an S_FEC + T_FEC + PARITY + T_FEC for each 64-byte chunk of input data?

Or, would we only see the S_FEC and final T_FEC once per Ethernet frame?

(6c) Are S_FEC and T_FEC really 5 and 6 bytes respectively? If so, then doesn't that make the RS_FEC algorithm extremely inefficient?

At those sizes, the overhead per 64 bytes of original data would thus be 19 bytes (5B + 2bits + 6B + 14bits + 6B). That's only an efficiency of 77.11% ( 64 / (64 + 19) ).

However, the efficiency would be higher if for every 64-byte chunk of the frame there were only one T_FEC+PARITY, and for the entire frame there was only one S_FEC, at the start, and one T_FEC, at the end.

Thanks!
Tom


follow-on

It is my understanding that during the transcoding process the 66b block representing the original 64b data block is fed into the RS algorithm, which then creates two 257b blocks, each with four of the 66b blocks plus one bit of overhead. Thus we have 514b representing the 512b of original data. That's 2b of overhead.

That is then used in calculating a 14b parity value which is transmitted after the 514b block. Thus we now have 528b representing the 512b of original data.

Thus the capacity of the link should be reduced from 25e9 bps to 24.24e9 bps (25e9 bps * 512/528).

Are there any other RS-FEC overhead bits or bytes sent onto the link? How about S_FEC and T_FEC, are they inserted into the Ethernet frame?

Does RS-FEC only operate on frames or is every character that goes across the link fed through RS-FEC?

Thanks,
Tom

Best Answer

Regardless of which Ethernet PHY you use, the nominal speed is always the effective bandwidth at the top of the physical layer. Whether or not FEC is used, which PCS code is used, etc. doesn't matter. The symbol rate on the physical wire/fiber is correspondingly higher to accommodate the overhead.

So, a maximum-sized Ethernet frame over 25GE takes (1538 * 8 / 25,000,000,000) s ≈ 0.492 μs, or about 492 ns (1500 bytes payload, 18 bytes L2 overhead, 20 bytes L1 overhead including IPG), regardless of the PHY and its options.

For instance, the maximum effective throughput for TCP over IPv4 over 25GE is (1460 / 1538) * 25,000,000,000 / 8 ≈ 2.967 GB/s. (The window scaling option is almost always required.)
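The per-frame serialization time and the TCP figure can be checked with a small Python sketch. The constant names are illustrative, and the 40 bytes of IPv4 + TCP headers assume no IP or TCP options on data segments:

```python
LINK_RATE_BPS = 25_000_000_000  # nominal 25GE rate at the top of the PHY

MTU = 1500            # maximum Ethernet payload in bytes
L2_OVERHEAD = 18      # 14-byte MAC header + 4-byte FCS
L1_OVERHEAD = 20      # 7-byte preamble + 1-byte SFD + 12-byte IPG
IP_TCP_HEADERS = 40   # assumption: 20-byte IPv4 + 20-byte TCP, no options

wire_bytes = MTU + L2_OVERHEAD + L1_OVERHEAD
frame_time_s = wire_bytes * 8 / LINK_RATE_BPS

tcp_payload = MTU - IP_TCP_HEADERS
tcp_throughput = tcp_payload / wire_bytes * LINK_RATE_BPS / 8  # bytes per second

print(f"Bytes on the wire per frame: {wire_bytes}")
print(f"Serialization time: {frame_time_s * 1e9:.2f} ns")
print(f"Max TCP payload throughput: {tcp_throughput / 1e9:.3f} GB/s")
```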

As JFL has pointed out, the real-world throughput depends on your hardware. Not all 25GE hardware is non-blocking. For instance, a 25G NIC requires at least PCIe 3.0 x4, PCIe 2.0 x8 or equivalent slot (and appropriate backend) for non-blocking transfer.

RS-FEC is applied between the PCS (64b/66b encoding) and the PMA sublayers (check IEEE 802.3 Clause 108). It transcodes 64b/66b to 256b/257b (108.5.2.3) to allow headroom for RS, so it doesn't require a higher baud rate.
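To see why the baud rate stays the same, compare the expansion ratios. The following Python sketch models only the transcoding and parity ratios, with names I made up for illustration; it ignores anything else the RS-FEC sublayer does:

```python
from fractions import Fraction

# Expansion of plain 64b/66b: 66 bits sent per 64 data bits.
pcs_expansion = Fraction(66, 64)

# With RS-FEC: 256b/257b transcoding, then RS(528,514), where
# 528 symbols are sent for every 514 message symbols.
transcode_expansion = Fraction(257, 256)
rs_expansion = Fraction(528, 514)
fec_expansion = transcode_expansion * rs_expansion

# Line rate needed to carry 25e9 bit/s of data with RS-FEC enabled.
line_rate_baud = 25 * 10**9 * fec_expansion

print(f"64b/66b expansion:        {pcs_expansion}")
print(f"256b/257b + RS expansion: {fec_expansion}")
print(f"Ratios match: {fec_expansion == pcs_expansion}")
print(f"Line rate with RS-FEC: {float(line_rate_baud):.5e} baud")
```

Both ratios reduce to 33/32, so the transcoded and FEC-protected stream fits in the same 25.78125 GBd that plain 64b/66b would use.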

Re follow-on: PCS line code, RS-FEC transcoding etc are applied in the physical layer. They don't change the data link layer frame in any way. Generally, PCS, RS-FEC etc only matter within the lower PHY. They have absolutely no effect on the upper PHY and above.