Wow, your question isn't terribly focused, and it's not obvious what you are really asking for. But let me give this one a try. Sorry if I didn't get it quite right.
Ripple counter vs. normal synchronous counter: Who says that people don't use ripple counters? People use whatever they have available that works best. In FPGAs, nobody uses a ripple counter because the logic blocks implement a sync counter so much better than a ripple counter. But if you're designing a custom chip, a ripple counter can have the advantage in power consumption and logic size. It would not surprise me at all if some people use ripple counters in their ASICs. Sync counters would still be better for speed and simplicity of timing.
Gray Counter vs. Binary Counter: People do use gray counters in ASICs and custom chips. In FPGAs, where binary counters are faster, people still use Gray counters when the count value has to go across clock domains, such as in FIFOs.
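The reason Gray code helps across clock domains is that only one bit changes per count, so a sample taken mid-transition is off by at most one count. Here is a minimal Python sketch of that property; the function names and the 4-bit width are my own choices for illustration, not taken from any particular FIFO design.

```python
def binary_to_gray(n):
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

WIDTH = 4  # assumed counter width, for illustration only
COUNT = 1 << WIDTH

gray_codes = [binary_to_gray(i) for i in range(COUNT)]

# Bits that flip on each increment, including the wrap from the last value to 0.
gray_steps = [bits_changed(gray_codes[i], gray_codes[(i + 1) % COUNT])
              for i in range(COUNT)]
binary_steps = [bits_changed(i, (i + 1) % COUNT) for i in range(COUNT)]

print("max bits changing per step, Gray:  ", max(gray_steps))
print("max bits changing per step, binary:", max(binary_steps))
```

In a plain binary counter the 7 to 8 step flips all four bits at once, which is exactly the case that can be sampled as garbage in another clock domain.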
Multi-phase clocks: These are certainly used in real designs. There are reasons why the PLLs in FPGAs can often output 0, 90, 180, and 270 deg phase-shifted versions of the original clocks. But as the clock frequencies go up, using multiple clocks gets harder due to clock skew and clock distribution issues. It's not impossible at high frequencies, but it just isn't done as much.
Sync vs. Async: Sync circuits are not just easier to simulate but easier to design and easier to guarantee that they work correctly. Verification and timing analysis tools are difficult-to-impossible to use with async circuits.
MCU Counter Circuit: Do you KNOW that there are no MCUs that do it that way? If one did, how could you tell? Maybe the prescalers on the timer are ripple counters. Maybe the timer itself is a Gray-coded counter and reading/writing the registers automatically converts it to/from binary. My point is this: the guys who design super-low power MCUs (like the MSP430) do every trick in the book to reduce power consumption. Many of those tricks, like using ripple counters and Gray code where appropriate, are completely invisible to people like you and me. They can use, and probably are using, those tricks plus a couple of hundred other tricks that you haven't thought of yet.
One thing that you haven't mentioned is the use of completely async circuits. This is where all of your talk about clocks eventually goes when taken to its logical conclusion. There have been companies that have tried to build large-scale CPUs that are completely async, including one group that tried to bring an async ARM to market. The benefits are amazing: super-low power, faster processing, and less EMI among them. But the disadvantages are more amazing yet. The main one is that the complexity of designing such a chip is huge and is not economically viable today. A secondary problem is that the number of transistors about doubles when compared to an equivalent sync chip.
Even so, there are CPUs on the market today that use async logic in some of their blocks, like the FPU, but nobody uses it on a large scale.
Here is a simple generate statement, which generates 8 TFFs and connects the clock input of each TFF to the q output of the previous FF. Because you are using index calculations (i+1 or i-1), you either need a wider range for tff_clocks or you must shorten the generate loop.
I'm using a loop from 0 to 7 so I extended tff_clocks by 1. Index 0 is connected to the original system clock.
architecture ....
  signal tff_clocks : std_logic_vector(8 downto 0);
  -- assumes a component named tff (ports clk, t, q) is declared or otherwise visible here
begin
  tff_clocks(0) <= clk;  -- first TFF is clocked by the main clock

  genTFF : for i in 0 to 7 generate
    tff_inst : tff
      port map (
        clk => tff_clocks(i),
        t   => '1',
        q   => tff_clocks(i + 1)  -- q of stage i clocks stage i + 1
      );
  end generate;

  async_counter_result <= tff_clocks(8 downto 1);
end;
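To see what this chain does before you simulate the VHDL, here is a small behavioral sketch in Python. The tff entity itself isn't shown above, so I am assuming it toggles on the rising edge of its clock input; the names ripple_step and counter_value are just for this illustration.

```python
def ripple_step(q):
    """Apply one rising edge of the system clock to the TFF chain.

    q[i] is the output of stage i. Every stage has T tied to '1', so it
    toggles whenever its clock input rises (assumed rising-edge triggered).
    """
    q = list(q)
    for i in range(len(q)):
        q[i] ^= 1      # this stage saw a rising edge, so it toggles
        if q[i] == 0:  # its output fell: no rising edge for the next stage
            break
    return q

def counter_value(q):
    """Read the chain as an unsigned number with q[0] as the LSB."""
    return sum(bit << i for i, bit in enumerate(q))

state = [0] * 8  # 8 stages, all outputs reset to 0
sequence = [counter_value(state)]
for _ in range(4):
    state = ripple_step(state)
    sequence.append(counter_value(state))

print(sequence)
```

Under that rising-edge assumption the chain counts down rather than up, because each stage toggles when the previous output goes from 0 to 1. If you want an up counter, the usual options are to clock each stage from the inverted output of the previous one or to use falling-edge flip-flops.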
Best Answer
I think what you are looking for is a phase-frequency detector (PFD). These circuits are quite often used in phase-locked loops (PLLs) for producing a "clean" reference signal that is locked to a fairly noisy (or modulated) input signal. The heart of the PFD is shown in the logic diagram below (extract taken from here):
The two outputs labelled "U" and "D" stand for up and down respectively. They can be combined with equal-value resistors to produce an output voltage that represents the frequency/phase difference between the two signals.
Wikipedia also provides some information about this type of circuit.
This Maxim article also shows how one can be implemented using D-type flip-flops:
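If it helps to see the behavior in numbers, below is a rough Python model of the textbook two-flip-flop PFD: each D-type flip-flop has D tied high, one is clocked by the reference and one by the other signal, and an AND of U and D resets both. This is an idealized sketch with zero reset delay, not a transcription of the circuit in the article; edges and pfd_average are hypothetical helper names.

```python
def edges(period, offset, t_end):
    """Rising-edge times of a square wave with the given period and offset."""
    times = []
    k = 0
    while offset + k * period < t_end:
        times.append(offset + k * period)
        k += 1
    return times

def pfd_average(ref_edges, sig_edges, t_end):
    """Time-average of (U - D) over [0, t_end] for an ideal PFD."""
    events = sorted([(t, "ref") for t in ref_edges] +
                    [(t, "sig") for t in sig_edges])
    u = d = 0
    last_t = 0.0
    integral = 0.0
    for t, source in events:
        if t > t_end:
            break
        integral += (u - d) * (t - last_t)  # accumulate output since last event
        last_t = t
        if source == "ref":
            u = 1  # reference edge sets U
        else:
            d = 1  # signal edge sets D
        if u and d:
            u = d = 0  # AND gate resets both flip-flops (ideal, no delay)
    integral += (u - d) * (t_end - last_t)
    return integral / t_end

# Same frequency, quarter-period phase offset in each direction.
ref_leads = pfd_average(edges(1.0, 0.0, 10.0), edges(1.0, 0.25, 10.0), 10.0)
ref_lags = pfd_average(edges(1.0, 0.25, 10.0), edges(1.0, 0.0, 10.0), 10.0)
print(ref_leads, ref_lags)
```

The averaged U minus D comes out as plus or minus a quarter, matching the quarter-period offset, which is what the resistor-summed output would settle to after low-pass filtering.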
This is a difficult question to answer directly, but I will say that PFD circuits, when used as the heart of a PLL, can achieve sub-millihertz accuracy when locking an oscillator to an unknown input frequency of hundreds if not thousands of megahertz. Your two signals will begin in phase at some point and, due to the frequency difference between them, will drift to being completely out of phase. This results in a cyclical output from the PFD, which should be easy to see on an oscilloscope. Given that you then know how often the phases align, it is a trivial matter to compute the average phase (or time delay) difference per cycle.
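As a worked example of that last step, here is a short Python sketch. It assumes two ideal, constant-frequency signals; the 10 MHz reference and the 1 Hz offset are made-up numbers, and the function names are mine.

```python
def beat_period(f_ref, f_other):
    """Time between successive phase alignments of the two signals, in seconds."""
    return 1.0 / abs(f_ref - f_other)

def phase_slip_per_cycle_deg(f_ref, t_beat):
    """Phase drift per reference cycle, in degrees, from a measured beat period.

    One full 360 degree slip is spread over f_ref * t_beat reference cycles.
    """
    return 360.0 / (f_ref * t_beat)

def time_slip_per_cycle(f_ref, f_other):
    """Difference between the two periods, in seconds.

    Written as |f_ref - f_other| / (f_ref * f_other) rather than
    |1/f_ref - 1/f_other| to avoid subtracting two nearly equal numbers.
    """
    return abs(f_ref - f_other) / (f_ref * f_other)

f_ref = 10e6            # assumed 10 MHz reference
f_other = 10_000_001.0  # assumed signal that is 1 Hz higher

t_beat = beat_period(f_ref, f_other)
slip_deg = phase_slip_per_cycle_deg(f_ref, t_beat)
slip_s = time_slip_per_cycle(f_ref, f_other)

print("phases align every", t_beat, "s")
print("phase slip per reference cycle:", slip_deg, "degrees")
print("time slip per cycle:", slip_s, "s")
```

With these numbers the phases line up once per second, so the slip per cycle is tiny: a few hundred-thousandths of a degree, or roughly 10 femtoseconds.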