I've started designing an 8088 implementation from scratch, with the goal of making it cycle-exact. I understand the reasoning behind the clock cycle counts for most instructions, but I must say I'm quite puzzled by the Effective Address (EA) calculation time.
More specifically, why does computing BP + DI or BX + SI take 7 cycles, while computing BP + SI or BX + DI takes 8 cycles? Note that this is the number of cycles for the whole EA calculation, which includes a shift plus an add with a segment register (presumably this is spread over a couple of cycles to keep combinational delays as low as possible).
I could just wait for a given number of cycles in my design, but I'm really interested in knowing why there's this 1-cycle difference (and, more generally, why any EA calculation takes so many cycles when an ADD between registers takes just 3).
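To make the "just wait for a given number of cycles" fallback concrete, here is a minimal Python sketch of the lookup I would otherwise hard-code. The 7- and 8-cycle entries are the ones discussed above. The remaining values (displacement-only, single register, displacement variants, segment override penalty) are what I understand Intel's published EA timing table to list, so treat them as an assumption to verify against the manual. The function name `ea_cycles` and its parameters are hypothetical, chosen only for illustration.

```python
# Sketch of a cycle-count lookup for 8086/8088 effective address calculation.
# Assumption: values follow Intel's documented EA timing table; names are made up.

BASE_REGS = {"bx", "bp"}
INDEX_REGS = {"si", "di"}

# The two base+index pairs that take 7 cycles; the other two take 8.
FAST_PAIRS = {("bp", "di"), ("bx", "si")}


def ea_cycles(base=None, index=None, has_disp=False, seg_override=False):
    """Return the documented EA calculation time in clock cycles."""
    if base is not None and base not in BASE_REGS:
        raise ValueError(f"invalid base register: {base}")
    if index is not None and index not in INDEX_REGS:
        raise ValueError(f"invalid index register: {index}")

    if base and index:
        cycles = 7 if (base, index) in FAST_PAIRS else 8
        if has_disp:
            cycles += 4  # base + index + displacement: 11 or 12
    elif base or index:
        cycles = 9 if has_disp else 5  # single register, with or without disp
    elif has_disp:
        cycles = 6  # direct address (displacement only)
    else:
        raise ValueError("no memory operand to compute an address for")

    if seg_override:
        cycles += 2  # segment override prefix penalty
    return cycles


if __name__ == "__main__":
    for base, index in [("bp", "di"), ("bx", "si"), ("bp", "si"), ("bx", "di")]:
        print(f"[{base}+{index}] -> {ea_cycles(base, index)} cycles")
```

This reproduces the documented numbers, but it says nothing about where the extra cycle for BP + SI and BX + DI comes from, which is the part I would like to understand.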
Best Answer
Gracious reply from Stephen Morse (designer of the 8086)...
A definitive answer may have to wait for someone to reverse engineer the silicon...