One reason to divide a clock by two is to obtain an even 50% duty cycle square wave. It may be that the 8085 internally uses both clock edges, and wouldn't function if one half of the cycle happened to be much shorter than the other.
In the days when the 8085 was new, those nice canned oscillators weren't common, and people often cobbled together clock circuits out of discrete crystals, capacitors, and logic gates. Dividing by two ensures that you have equally spaced rising and falling edges.
As for 6.144 MHz, it divides by an integer to give each of the common baud rates, at least up to 38400 (for example, 6,144,000 / 38,400 = 160).
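The divisibility claim is easy to check numerically. Here is a minimal sketch; the list of baud rates is my own choice for illustration, and it ignores any extra oversampling factor a particular UART might want.

```python
CRYSTAL_HZ = 6_144_000  # the 6.144 MHz crystal discussed above

# Common baud rates; this particular list is an assumption for illustration.
BAUD_RATES = [300, 1200, 2400, 4800, 9600, 19200, 38400, 57600, 115200]


def integer_divisor(clock_hz, baud):
    """Return clock_hz / baud if it divides exactly, otherwise None."""
    quotient, remainder = divmod(clock_hz, baud)
    return quotient if remainder == 0 else None


divisors = {baud: integer_divisor(CRYSTAL_HZ, baud) for baud in BAUD_RATES}

for baud, div in divisors.items():
    note = f"divide by {div}" if div is not None else "no integer divisor"
    print(f"{baud:>6} baud: {note}")
```

Every rate up to 38400 comes out with an integer divisor, while 57600 and 115200 do not, which matches the "at least up to 38400" wording.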
follow up ...
Looking at an Intel data sheet for the 8085, there are three interesting statements:
The 8085 incorporates all of the features that the 8224 clock generator and 8228 system controller provided for the 8080A
X1 and X2: Are connected to a crystal, LC or RC network to drive the internal clock generator. The input frequency is divided by 2 to give the processor's internal operating frequency.
CLK: Clock output for use as a system clock. The period of CLK is twice the X1, X2 input period.
So, speculations about using the odd edges of the clock to move stuff around internally aside, it becomes apparent that when they designed the 8085, Intel was replacing the need for a special clock controller by integrating that feature into the chip. Dividing the X1-X2 timebase in half before outputting it as CLK ensures that the system gets a nice even duty cycle, if nothing else.
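To put numbers on the divide-by-two, here is a rough sketch that turns an X1/X2 input frequency into the CLK frequency and period. The function name is made up, and the 6.144 MHz input is just the crystal mentioned earlier, not a datasheet requirement.

```python
def clk_from_crystal(crystal_hz):
    """Return (CLK frequency in Hz, CLK period in ns) for an X1/X2 input frequency."""
    clk_hz = crystal_hz / 2        # datasheet: the input frequency is divided by 2
    clk_period_ns = 1e9 / clk_hz   # equivalently, twice the X1/X2 input period
    return clk_hz, clk_period_ns


clk_hz, clk_period_ns = clk_from_crystal(6_144_000)
print(f"CLK = {clk_hz / 1e6:.3f} MHz, period = {clk_period_ns:.2f} ns")
```

With a 6.144 MHz crystal this gives a 3.072 MHz CLK with a period of roughly 325.5 ns.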
The \$3\$ machine cycles are:
- Opcode Fetch Cycle
- Memory Read Cycle
- Memory Write Cycle
Internally, depending on the opcode, each machine cycle takes from \$3\$ to \$6\$ T-cycles (or T-states) to complete.
T-states are one clock period long, and the instruction length is measured in T-states.
For example, a typical Opcode Fetch has \$4\$ T-states: the first \$3\$ (T\$1\$-T\$3\$) are used to fetch the instruction, and T\$4\$ is used to decode it.
Instruction cycles take from \$1\$ to \$6\$ machine cycles.
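As a rough illustration of how T-states add up to an instruction time, here is a small sketch for DCR M. The 4-T-state opcode fetch comes from the text above; the 3-T-state read and write cycles and the 3.072 MHz clock are assumptions on my part, so check them against the data sheet.

```python
# T-states per machine cycle for DCR M.
# Opcode fetch = 4 is from the text; 3 for read and write is an assumed typical value.
DCR_M_CYCLES = [
    ("Opcode Fetch", 4),
    ("Memory Read", 3),
    ("Memory Write", 3),
]

CLK_HZ = 3_072_000  # assumed clock: a 6.144 MHz crystal divided by 2


def total_t_states(cycles):
    """Sum the T-states over all machine cycles of one instruction."""
    return sum(t_states for _, t_states in cycles)


def execution_time_us(cycles, clk_hz):
    """Instruction time in microseconds; one T-state is one clock period."""
    return total_t_states(cycles) / clk_hz * 1e6


total = total_t_states(DCR_M_CYCLES)
time_us = execution_time_us(DCR_M_CYCLES, CLK_HZ)
print(f"DCR M: {total} T-states, about {time_us:.2f} us at {CLK_HZ / 1e6:.3f} MHz")
```

Under those assumptions the instruction takes 10 T-states, or a little over 3 microseconds.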
The 8085 also has some external status pins that can be used to identify which machine cycle it is currently in. These are the \$\mathrm{IO/\overline{M}}\$, \$\mathrm{S0}\$, and \$\mathrm{S1}\$ signals.
Opcode Fetch: \$\mathrm{IO/\overline{M}} = 0,\$ \$\mathrm{S0} = 1\$ and \$\mathrm{S1} = 1\$
Memory Read: \$\mathrm{IO/\overline{M}} = 0,\$ \$\mathrm{S0} = 0\$ and \$\mathrm{S1} = 1\$
Memory Write: \$\mathrm{IO/\overline{M}} = 0,\$ \$\mathrm{S0} = 1\$ and \$\mathrm{S1} = 0\$
There are also I/O read and write cycles, which are not part of this DCR M instruction, but when those cycles are active in other opcodes the control/status pin \$\mathrm{IO/\overline{M}} = 1\$.
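If you were sampling those pins with a logic analyzer, the decoding could be sketched like this. The function and table names are made up, and only the three memory cases listed above are decoded in detail; anything with \$\mathrm{IO/\overline{M}} = 1\$ is lumped together rather than guessed at.

```python
# (IO/M-bar, S1, S0) -> machine cycle, using only the three cases listed above.
MEMORY_CYCLES = {
    (0, 1, 1): "Opcode Fetch",
    (0, 1, 0): "Memory Read",
    (0, 0, 1): "Memory Write",
}


def decode_machine_cycle(io_m, s1, s0):
    """Name the machine cycle for one sample of the status pins."""
    if io_m == 1:
        # I/O read/write and other non-memory cycles are not broken down here.
        return "I/O or other non-memory cycle"
    return MEMORY_CYCLES.get((io_m, s1, s0), "unknown")


# DCR M as seen on the pins: fetch, then read, then write.
dcr_m_trace = [(0, 1, 1), (0, 1, 0), (0, 0, 1)]
decoded = [decode_machine_cycle(*sample) for sample in dcr_m_trace]
print(decoded)
```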
CALL is a 3-byte instruction.
If the call is going to be taken, obviously all three bytes of the instruction need to be fetched so that the PC can be updated. Additional machine cycles are required to write the old PC to the stack.
If the call is not going to be taken, the PC still needs to end up pointing to the next instruction after the CALL. The easiest way to do that is to go ahead and fetch all three bytes, incrementing the PC once during each machine cycle.
I may have answered too hastily. I can't find any reference that shows the cycle-by-cycle execution of a taken/not-taken conditional call, but I did find that the execution time is either 9 or 18 clock cycles.
All of the references show that the first machine cycle of a call is 6 clock cycles, and if the call is taken, an additional 4 × 3-clock machine cycles (two to fetch the target address, two to write the PC to the stack) would indeed add up to 18 clocks total.
But if the call is not taken, there is only one additional 3-clock machine cycle, which suggests that the CPU does not fetch both bytes of the instruction, but rather updates the PC internally without executing the third memory cycle.
Curiously, the timing on the original 8080 (the chip on which the 8085 is based) is different: 11 clocks if not taken, 17 clocks if taken.[1]
This suggests that the initial cycle is only 5 clocks, rather than 6, and when the call is taken, there are four more cycles of 3 clocks each: 5 + 3 + 3 + 3 + 3 = 17 clocks. But in the not-taken case, the other two instruction bytes ARE fetched: 5 + 3 + 3 = 11 clocks.
This is probably what I was remembering when I wrote the initial part of this answer above.
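For what it's worth, the cycle breakdowns above can be tabulated and summed as a sanity check. This is only a restatement of the arithmetic in this answer, with the per-cycle splits being my inference from the published totals rather than something taken from a cycle-by-cycle reference.

```python
# Clocks per machine cycle for a conditional CALL, as inferred in this answer.
CALL_TIMING = {
    ("8085", "taken"):     [6, 3, 3, 3, 3],  # fetch, 2 address reads, 2 stack writes
    ("8085", "not taken"): [6, 3],           # fetch, one more read, third byte skipped
    ("8080", "taken"):     [5, 3, 3, 3, 3],
    ("8080", "not taken"): [5, 3, 3],        # both address bytes are still fetched
}

totals = {key: sum(cycles) for key, cycles in CALL_TIMING.items()}

for (cpu, outcome), clocks in totals.items():
    print(f"{cpu} CALL {outcome}: {clocks} clocks")
```

The sums come out to 18 and 9 for the 8085 and 17 and 11 for the 8080, matching the published figures.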
[1] From my copy of the 1977 Intel Data Catalog.