A pipelined CPU implies a multi-cycle datapath, precisely because it takes five clock cycles for an instruction to go from Fetch to Writeback.
Where I'm getting confused is here: "unlike the multi-cycle CPU, the pipelined datapath requires that every instruction use all five stages of execution."
You should finish reading the paragraph you're quoting: that requirement exists precisely to prevent two instructions from finishing at the same time.
Suppose we use the latencies from our multi-cycle CPU and try to run a load instruction followed by an add instruction. The load requires five cycles to execute, and the add requires four. If we start the load on cycle 1, it finishes on cycle 5. Because we are pipelining, we can start the add on cycle 2, and it also finishes on cycle 5. That is the problem: two instructions finish on cycle 5, so both try to write to the register file on the same cycle, and our register file has only one write port.
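To make the conflict concrete, here is a minimal Python sketch (the latencies are the illustrative ones from above: five cycles for a load, four for an add; `finish_cycles` is just a helper name for this example):

```python
# Minimal sketch of the writeback conflict (illustrative latencies:
# a load takes 5 cycles, an add takes 4; one instruction issued per cycle).
def finish_cycles(program, latencies):
    """Cycle on which each instruction finishes, issuing one per cycle."""
    return [issue + latencies[op] - 1
            for issue, op in enumerate(program, start=1)]

lat = {"load": 5, "add": 4}
print(finish_cycles(["load", "add"], lat))  # [5, 5]: both write back on cycle 5

# Forcing every instruction through all five stages removes the conflict:
lat_5stage = {"load": 5, "add": 5}
print(finish_cycles(["load", "add"], lat_5stage))  # [5, 6]: distinct cycles
```

With uniform five-stage latencies, each instruction writes back exactly one cycle after its predecessor, so the single write port is never contended.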
You are mixing two independent (orthogonal) ideas in digital design: asynchronous circuits and multi-core processors.
Asynchronous circuits: circuits with more than one clock, where the clocks are asynchronous (i.e., they have a non-constant, unpredictable phase relationship).
Some circuits may use two clocks (for example), but one is simply a divide-by-2 of the other. Such circuits are not asynchronous, because there is a known phase relationship between the two clocks, even though their frequencies differ.
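A toy software model (not a hardware description, just an illustration) of a divide-by-2 clock makes the fixed phase relationship visible: the derived clock toggles on every rising edge of the master, so its edges always line up with the master's.

```python
# Toy model: a divide-by-2 clock derived from a master clock.
# The derived clock toggles on each master rising edge, so its phase
# relative to the master is fixed and known -- not asynchronous.
def derived_clock(master_edges):
    state = 0
    out = []
    for _ in master_edges:
        state ^= 1          # toggle on each master rising edge
        out.append(state)
    return out

master = list(range(8))       # eight rising edges of the master clock
print(derived_clock(master))  # [1, 0, 1, 0, 1, 0, 1, 0]: half the frequency
```

A genuinely asynchronous pair of clocks could not be modeled this way, because no deterministic function of one clock's edges predicts the other's.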
You may have a single-core CPU with a few mutually asynchronous clocks, and a multi-core CPU with all its cores running on the same clock (the latter is purely imaginary: all real multi-core CPUs have many clocks, forming several mutually asynchronous clock sets).
Asynchronous circuit design is a major topic in digital design; the explanation above covers only the basics.
Multi-core CPUs: several microprocessors (cores) operating in parallel, employing sophisticated hardware and software to achieve high performance.
The usual practice is to make the cores as independent as possible in terms of clocks, power, execution, etc. This allows dynamic (run-time) adjustment of each core's activity, and hence its power consumption, to the actual needs of the system.
My impression is that what you're looking for is an explanation about multi-core CPUs, not asynchronous circuits.
This topic is much, much bigger than anything one can fit into an answer.
The answers to your questions, though:
- The clocks used by different cores (to the best of my knowledge) have the same sources (there can be more than one: crystal, VCO, ...). Each core usually has a few mutually asynchronous clock sets, plus dedicated clock-gating and throttling logic that allows the clock to be turned off or slowed down independently for each core. Again, if you're interested only in the algorithmic aspect of core parallelism, forget about clocks (for now).
- You have just identified the main aspect of core parallelism: how to run multiple cores in parallel efficiently. This topic is huge and involves both hardware and software solutions. From the hardware perspective, cores both modify a common memory and exchange control and status signals, with sequencing logic and between themselves. The picture is complicated considerably by the existence of caches; I'd suggest you start by reading about caches, then cache coherency, and only then caches in multi-core systems.
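To give a taste of why caches complicate shared memory, here is a toy sketch (not any real coherence protocol; the `Core` class and the write-back-without-invalidation behavior are invented for illustration): if each core keeps a private cached copy and writes are not propagated, one core can read a stale value.

```python
# Toy illustration of the coherence problem: two cores, each with a private
# cache over one shared memory location, and no invalidation on write.
memory = {"x": 0}

class Core:
    def __init__(self):
        self.cache = {}

    def read(self, addr):
        if addr not in self.cache:        # cache miss: fetch from memory
            self.cache[addr] = memory[addr]
        return self.cache[addr]           # cache hit: memory is not consulted

    def write(self, addr, value):         # write stays in the cache only
        self.cache[addr] = value

a, b = Core(), Core()
a.read("x")
b.read("x")
a.write("x", 42)
print(a.read("x"), b.read("x"))   # 42 0 -> core b sees a stale value
```

Real coherence protocols (e.g., MESI-family snooping or directory-based schemes) exist precisely to make core `b` observe core `a`'s write.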
Hope this helps.
Best Answer
Almost all CPU cores on the market today are designed with multiple pipeline stages (more than one); see a classic CPU example here. A 1-stage pipeline simply means that all the major stages of a typical CPU, such as fetch, decode, execute, memory, and write-back, are done in one cycle; in other words, the CPU core itself is not pipelined (a non-pipelined architecture).
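To see why nearly every core is pipelined, a back-of-the-envelope comparison helps (the numbers are illustrative, not from any real CPU: five stages of 1 ns each, 1000 instructions, and an ideal hazard-free pipeline):

```python
# Illustrative throughput comparison: non-pipelined vs. 5-stage pipeline.
stage_latency_ns = 1.0      # assumed latency of each stage
stages = 5
n_instructions = 1000

# Non-pipelined (1-stage): every instruction pays all five stage latencies
# in one long cycle, back to back.
non_pipelined_ns = n_instructions * stages * stage_latency_ns

# Pipelined (ideal, no hazards): the first instruction fills the pipeline,
# then one instruction completes every cycle.
pipelined_ns = (stages + n_instructions - 1) * stage_latency_ns

print(non_pipelined_ns, pipelined_ns)   # 5000.0 1004.0
```

For long instruction streams the ideal speedup approaches the number of stages (here, nearly 5x), which is why non-pipelined architectures survive mostly in teaching examples and tiny microcontrollers.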