Electrical – Source synchronous vs Common clock methodology in Physical design

Tags: clock, flip-flop, physical-design, synchronous

The common clock and source synchronous clock scheme is explained here: http://referencedesigner.com/books/si/common-vs-source-sync.php.

Question is:
1) How does source synchronous clocking achieve a higher maximum frequency than the common clock scheme? The maximum frequency depends on the data path delay, which remains the same for both clocking techniques.
We have the following setup-time constraint for a flip-flop:
Tclk2q + Tcomb < Tperiod + Tskew - Tsetup
Thus:
Tperiod > Tclk2q + Tcomb - Tskew + Tsetup
From this we see that Tperiod (1/max_frequency) depends on the data path delay, which is constant for both common clock and source synchronous schemes.
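To make the arithmetic concrete, here is a quick numeric sketch of the constraint above; all the delay values are assumed, purely illustrative numbers, not figures for any particular technology:

```python
# Illustrative setup-constraint calculation (all values in nanoseconds, assumed).
t_clk2q = 0.10   # clock-to-Q delay of the launching flop
t_comb  = 2.00   # combinational (data path) delay
t_setup = 0.05   # setup time of the capturing flop
t_skew  = 0.00   # clock skew between launch and capture clocks

# Tperiod > Tclk2q + Tcomb - Tskew + Tsetup
t_period_min = t_clk2q + t_comb - t_skew + t_setup
f_max_ghz = 1.0 / t_period_min

print(f"Minimum clock period: {t_period_min:.3f} ns")   # 2.150 ns
print(f"Maximum frequency:    {f_max_ghz:.3f} GHz")     # ~0.465 GHz
```

With these assumed numbers the path supports roughly 465 MHz, and the result is governed by the data path terms, which is exactly the point the question is making.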

2) If source synchronous clocking is better than the common clock scheme, why don't we use source synchronous clocking for all flops in the design? Instead of doing CTS, why not just clock every flop source synchronously?

Best Answer

Provided a particular data bit is not associated with a particular clock edge, there is no reason both schemes cannot achieve the same speed; this is the case in quite a few serial standards. Note that the reference clocks used are normally much slower than the interface speed; each data link endpoint multiplies its reference clock up to the interface rate with a PLL.
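As a small illustration of that ratio, the sketch below uses the PCI Express Gen1 numbers (100 MHz reference clock, 2.5 GT/s line rate); treating the line rate as the PLL output is a simplification, since real SerDes PLLs often run at a fraction of the bit rate and use multi-phase or double-data-rate sampling:

```python
# Sketch: how far a PLL must multiply a slow reference clock to reach
# the interface rate. Values are the PCIe Gen1 figures, used illustratively.
ref_clk_mhz = 100.0      # reference clock distributed to the endpoints
bit_rate_gbps = 2.5      # PCIe Gen1 line rate, 2.5 GT/s

multiplier = (bit_rate_gbps * 1000.0) / ref_clk_mhz
print(f"PLL multiplication factor: {multiplier:.0f}x")   # 25x
```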

Consider the reality of the clocking schemes:

Here are the two schemes, but with an important addition - the local reference clock used in source synchronous schemes:

[Figure: typical data link clock schemes]

The local reference clock is required in source synchronous schemes because we need a local clock to clock data out in the first place. This also means the clocks at the two ends of the link will be slightly different (no two oscillators run at precisely the same frequency).

Some serial standards implement this, InfiniBand and PCI Express to name but two.

Because each end of the link has its own local clock, and those clocks will differ (the specifications permit this, quite reasonably), the transmit and receive sides effectively run at slightly different speeds. This adds a new requirement: the receiver must be protected from buffer overrun, which introduces the concept of the Skip ordered set; i.e. an elastic buffer is required (ordered sets are important for many reasons). That adds a bit of complexity, but no clock needs to be distributed for proper operation, which is very important in board-to-board or even box-to-box links.
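To make the elastic-buffer idea concrete, here is a minimal, purely illustrative sketch (not any standard's actual state machine): the receiver drains symbols with its own clock, and when the buffer drifts toward empty or full it inserts or drops SKP symbols, which the protocol defines as padding carrying no payload data.

```python
from collections import deque

SKP = "SKP"   # placeholder for a skip symbol; carries no payload data

class ElasticBuffer:
    """Toy elastic buffer bridging two slightly different clock domains.

    The write side runs at the recovered (transmitter) rate, the read side
    at the local (receiver) rate. SKP symbols are dropped when the buffer
    fills up and inserted when it runs low, so payload symbols are never
    lost or duplicated despite the small frequency offset between clocks."""

    def __init__(self, low=4, high=12):
        self.buf = deque()
        self.low, self.high = low, high

    def write(self, symbol):
        # Recovered-clock domain: discard padding if we are getting too full.
        if len(self.buf) >= self.high and symbol == SKP:
            return
        self.buf.append(symbol)

    def read(self):
        # Local-clock domain: emit padding if we are about to underrun.
        if len(self.buf) <= self.low:
            return SKP
        return self.buf.popleft()
```

Real links only add or remove SKPs inside dedicated Skip ordered sets, but the watermark idea is the same.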

The distributed clock architecture usually has no length-matching requirement on the common clock, so although the clock frequency is the same at every point, the relative phase is unknown; the receiver must therefore dynamically determine which edge of the clock to use to clock data in, which adds complexity of its own.
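One way to picture that edge selection, as a conceptual sketch only and not any particular PHY's training procedure: sample a known training pattern at each candidate clock phase and keep the phase that reproduces the pattern with the fewest errors.

```python
def best_sampling_phase(sampled_by_phase, expected_pattern):
    """Pick the candidate phase (e.g. 'rising' or 'falling') whose captured
    bit sequence best matches the expected training pattern."""
    def errors(bits):
        return sum(a != b for a, b in zip(bits, expected_pattern))
    return min(sampled_by_phase, key=lambda phase: errors(sampled_by_phase[phase]))

# Hypothetical captures of a 10101010 training pattern at two phases:
captures = {
    "rising":  [1, 0, 1, 0, 1, 0, 1, 0],   # clean capture
    "falling": [1, 1, 0, 1, 0, 1, 0, 0],   # sampled near the data transitions
}
print(best_sampling_phase(captures, [1, 0, 1, 0, 1, 0, 1, 0]))  # -> "rising"
```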

In addition, board to board and box to box implementations of this type would need this distributed clock, adding wires and buffers which themselves add complexity to the overall system.

Interestingly, PCI Express supports this common clock mode as well, as does HyperTransport.

So which scheme is better? Neither - they both have pros and cons; the specific application determines which is appropriate.