The only way I thought of to counteract this is to AND the flip flop's clock input with a clock that is faster than the main clock... that way data will be guaranteed to be clocked in at the end of that cycle.
This sounds to me like an architecture choice that will eventually limit the performance (maximum clock speed) you can achieve with your design. If your registers are able to function at the faster clock speed, you'll eventually want to try to get the whole system running as close to that clock speed as you can, but then you won't be able to have a "slow" clock and a "fast" clock to do this with.
In order to do this, I'm fetching data from memory, placing it on the data bus, then clocking it into a register all in a single operation. I'm worried that the rising edge of the main clock will happen at the register before the data is fetched from memory... a sort of propagation delay / race condition.
First solution
One way that leaps to mind to solve this is to clock data out of the memory on the rising edge of the clock, and clock it into the register on the falling edge. Since your register doesn't have a configuration bit for which edge it responds to (like it would if you were designing in an FPGA), you would have to generate the appropriate signal by using an inverter (NOT gate) between the "main" clock signal and the register.
More generally, it's possible to distribute several phases of your clock (e.g., 0, 90, 180, and 270 degrees) instead of just clock and inverted clock, and use these different phases to execute different actions at different times. Of course, you have to do a fairly careful analysis of each interface where data is transferred from one phase to another to be sure setup and hold times are met.
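As a rough illustration of that kind of analysis, here is a minimal Python sketch of the setup check for one interface. It assumes data is launched at phase 0 and captured at a later phase of the same clock, and it ignores clock skew and the inverter's own delay. The function name `setup_slack_ns` and all the numbers are made up for the example, not taken from any datasheet.

```python
def setup_slack_ns(period_ns, capture_phase_deg, t_access_ns, t_setup_ns):
    """Setup slack for data launched at phase 0 and captured at capture_phase_deg.

    Assumes an ideal clock: no skew, no jitter, no inverter delay.
    A negative result means the data arrives too late for the capturing register.
    """
    if not 0 < capture_phase_deg <= 360:
        raise ValueError("capture phase must be in (0, 360] degrees")
    # Time available between the launching edge and the capturing edge.
    launch_to_capture_ns = period_ns * capture_phase_deg / 360.0
    return launch_to_capture_ns - t_access_ns - t_setup_ns


# Hypothetical numbers: 100 ns clock, 30 ns memory access, 5 ns register setup.
falling_edge = setup_slack_ns(100.0, 180, 30.0, 5.0)  # inverted-clock capture
quarter_phase = setup_slack_ns(100.0, 90, 30.0, 5.0)  # 90-degree capture
print(falling_edge, quarter_phase)
```

With these made-up figures the inverted clock leaves positive slack, while a 90-degree capture phase would be too early. A real analysis would also check hold time at each interface.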
To the best of my understanding (possibly out-of-date) multiphase clock designs were fairly common in the discrete logic design era, and were also common (and may still be common) in ASICs and custom chip designs. But they are fairly uncommon in FPGA design due to the complexity of the timing analysis.
Second solution
Another option is to create a controller state machine that enables and disables different elements on different clock cycles as needed. For example, you'd enable the memory output on cycle 1 and enable the register to latch in the data on cycle 2. Since your register apparently doesn't have a clock enable input, you might need to do this by ANDing a state machine output with the clock input to the register.
This type of design was fairly common in the era of discrete logic CPUs, and it's what was taught in undergraduate digital logic courses in the early 1990s. An elaborate version of this scheme is called a microcoded architecture.
Of course this architecture means that you need more than one clock cycle to complete each instruction. But it would be multiple cycles of your fast clock, not your original "slow" clock that would be used, and you are already using more than one cycle of the fast clock per instruction in your design.
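To make the two-cycle idea concrete, here is a small Python sketch of such a controller. It is only a behavioural model under simple assumptions: one state enables the memory output onto the bus, the next enables the register's (gated) clock. The state names `FETCH` and `LATCH` and the helper functions are invented for this example.

```python
FETCH, LATCH = "FETCH", "LATCH"


def step(state, addr, memory, bus, register):
    """Advance the controller by one clock cycle.

    Returns (next_state, bus, register).
    """
    if state == FETCH:
        # Memory output enabled; the register's gated clock is held off.
        return LATCH, memory[addr], register
    # Register clock enabled: the now-stable bus value is captured.
    return FETCH, bus, bus


def run(memory, addresses):
    """Transfer memory[addr] into the register, two cycles per address."""
    state, bus, register = FETCH, None, None
    trace = []
    for addr in addresses:
        for _ in range(2):  # one FETCH cycle, then one LATCH cycle
            state, bus, register = step(state, addr, memory, bus, register)
            trace.append((bus, register))
    return trace


trace = run([0x10, 0x20, 0x30], [2, 0])
print(trace)
```

Each entry in `trace` is the `(bus, register)` pair after one clock cycle, so the register only ever changes one cycle after the bus does.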
I believe you are missing some basic concepts about sequential circuits. First of all, while combinational circuits are stateless, sequential circuits are defined by the fact of having some kind of inner state that can be changed either in precise instants of time (synchronous circuits) or when a certain condition is true (asynchronous circuits).
The cool thing about synchronous sequential circuits is that they can be realized by combining only two different ingredients:
- combinational logic, which might be the 2-level logic that can be realized with Karnaugh maps or more complex multi-level logic;
- flip flops, similar to the one you asked about (though most of the time you use a single-edge-triggered flip flop).
So basically, to design a generic synchronous circuit you divide it into a combinational part and registers (flip flops). To do this you must have a model of what you're doing; an example of a simple and useful one is that of Moore finite state machines, in which you have a state \$S\$, an input \$x\$ and an output \$y\$. A combinational circuit \$C_s\$ is used to compute the new state as \$S'=f_{C_s}(S, x)\$, a second combinational circuit \$C_y\$ is used to compute the output from the current state as \$y=g_{C_y}(S)\$, and the state is memorized in flip flops.
Many other models exist apart from this one (e.g. Mealy finite state machines), but the constant is that your problem is always decomposed into designing/synthesizing a set of combinational circuits and using flip flops. This can be done very efficiently by automatic synthesis tools from an RTL input such as Verilog, SystemVerilog or VHDL code.
But one problem remains: how to design flip flops then? Flip-flops themselves are neither synchronous circuits nor combinational circuits. They are the most famous representative of the category of asynchronous circuits. The most famous type of flip flop, the master-slave edge-triggered one, is a relatively complex circuit composed of a sequence of two set-reset latches that are transparent on opposite phases of the clock. Each latch is composed of two simple gates in a feedback chain (see Wikipedia for details). In any case, the flip flop has to be designed very carefully so that it behaves as the ideal edge-triggered flip flop, sampling the input exactly at the clock edge (real flip flops have a setup time constraint, during which the input datum must be stable before sampling, and a hold time constraint, during which the datum must be kept stable after sampling).
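Here is a tiny Python sketch of the Moore model, just to show the split between the two combinational functions and the stored state. The machine itself (it flags when the last two inputs were both 1) is an arbitrary example chosen for illustration; `next_state` plays the role of \$f_{C_s}\$ and `output` the role of \$g_{C_y}\$.

```python
def next_state(s, x):
    """Combinational next-state logic: S' = f(S, x).

    The state counts consecutive 1s seen so far, saturating at 2.
    """
    return min(s + 1, 2) if x else 0


def output(s):
    """Combinational output logic: y = g(S). Depends on the state only."""
    return int(s == 2)


def run_moore(inputs, s=0):
    """Clock the machine once per input; returns (outputs, final_state)."""
    ys = []
    for x in inputs:
        ys.append(output(s))   # output reflects the current state
        s = next_state(s, x)   # flip flops load S' at the clock edge
    return ys, s


ys, final = run_moore([1, 1, 0, 1, 1, 1])
print(ys, final)
```

Note that the output only changes after the state has been updated, which is the defining property of a Moore machine as opposed to a Mealy one.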
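A behavioural sketch of the master-slave idea in Python may help here. It ignores gate delays and setup/hold times entirely, and it models each latch as a simple level-sensitive storage element rather than at the gate level. The class names `Latch` and `MasterSlaveFlipFlop` are made up for illustration.

```python
class Latch:
    """Level-sensitive latch: follows d while enabled, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, enable, d):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFlipFlop:
    """Two latches transparent on opposite clock phases."""

    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def update(self, clk, d):
        m = self.master.update(not clk, d)  # master transparent while clk is low
        return self.slave.update(clk, m)    # slave transparent while clk is high


ff = MasterSlaveFlipFlop()
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]  # (clk, d) pairs
print([ff.update(clk, d) for clk, d in stimulus])
```

In the stimulus above, `d` drops to 0 while the clock is still high, but the output does not follow until the next rising edge. That is the edge-triggered behaviour the two-latch arrangement is meant to produce.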
Unfortunately, there are no simple general methods for designing asynchronous circuits; in fact, such methods are a somewhat active field of research in the electronic design automation community.
Best Answer
Part1
A mod-3 counter whose output is high for only one state will work as a divide-by-3 system, but its duty cycle will be 1/3. Its state table can be written as:
This system needs two flip flops for implementation. We need to find out what should be connected to the inputs (D) of these flip flops. This is where a K-map is needed. We have the table. Just translate it to a K-map and solve for \$A_d\$ and \$B_d\$. (You actually don't need a K-map to solve 2-variable logic.)
Part2
To make the duty-cycle 50%, the output should be high for 1.5 clock cycles instead of 1. If we can make a circuit that can shift the input signal by half a clock period (as \$B_Q\$ and \$C_Q\$ in the 2nd figure), then ORing the input and output of such a circuit can give the required 50% duty-cycle.
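Both parts can be checked with a quick Python simulation in half-clock steps. This is only a sketch under assumptions that are mine, not taken from the figures: the counter cycles through states 00, 01, 10 (A is the high bit), which gives D inputs of A_d = B and B_d = NOR(A, B), and the half-period shift is done by a third flip flop C that samples B on the falling edge.

```python
def simulate(n_half_cycles):
    """Simulate the divider in half-clock steps.

    Even steps are rising edges, odd steps are falling edges.
    Returns (b_trace, out_trace), one sample per half cycle.
    """
    a = b = c = 0
    b_trace, out_trace = [], []
    for t in range(n_half_cycles):
        if t % 2 == 0:
            # Rising edge: mod-3 counter, assumed encoding 00 -> 01 -> 10 -> 00.
            # A_d = B, B_d = NOR(A, B); both use the old values of a and b.
            a, b = b, int(not (a or b))
        else:
            # Falling edge: C samples B, i.e. B delayed by half a clock period.
            c = b
        b_trace.append(b)
        out_trace.append(b | c)  # OR of the signal and its shifted copy
    return b_trace, out_trace


b_trace, out_trace = simulate(12)
print(b_trace)
print(out_trace)
```

Over one output period (6 half cycles), `b_trace` is high for 2 half cycles, which is the 1/3 duty cycle of Part1, while `out_trace` is high for 3, which is the 50% duty cycle of Part2.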