Your design will not function reliably if you clock it at 100 MHz but the tools only spec it to run at 50 MHz. If it does happen to work, it's a one-off miracle that may well stop working when you make a change and rerun the tools. Don't do it. Don't even do it if your clock is 100 MHz and the tools tell you the design can run at 99.5 MHz.
To solve your problem, you can either write a simple 'divide by power of 2' clock divider to reduce the clock frequency (something like this in Verilog):
    // Free-running counter clocked at 100 MHz
    reg [n:0] count = 0;
    always @(posedge CLK_100) begin
        count <= count + 1'b1;
    end
    // Put the divided clock on a global clock net
    BUFG bg_0 (.I(count[m]), .O(CLK_DIV));
(where m <= n, so that count[m] toggles at 100 MHz / 2^(m+1), e.g. 50 MHz for m = 0; BUFG is a global clock buffer, which must be used if the divided clock drives a synchronous design) or use a Digital Clock Manager (DCM).
Hopefully that solves your pipelining issues as well, unless you absolutely have to run the entire design at 100 MHz. Other than pipelining, you can consider using FIFOs if part of the design runs at 50 MHz and the rest at 100 MHz, but you'll have to say a bit more about what you're doing to get more meaningful help here.
Doh! Facepalm time. I completely missed the fact that your circuit is an FPGA, so ALL of my timing analysis was wrong. Well, OK. Scratch the timing. What remains is correct, so here is the new, improved, and maybe to-the-point version.
The simplest answer is that this is not going to work as you think.
The first problem is that your preset is wrong. Instead of calculating 4095 - 1 - 752, you should have calculated 4095 + 1 - 752 = 3344 (0xD10). You had the right idea (essentially recognizing that 0 is a state), but you got the sign wrong. That is, you were trying to calculate 4095 - (752 - 1).
Another problem is that you are using the last ripple carry to reset your counters. This is wrong on two counts. First, what you want to do is load the presets which you calculated. Second, the counter will reset anyway, since the next count after FFF is 000. The most elegant way to load your preset is to change your preset to 1000 1000 1011 (which reads as 0xD11 = 3345 if the bits are listed LSB first), and use the QC output to drive your preset pins. Essentially you are presetting your counter to one count more than previously, then letting the rollover from FFF to 000 provide the active-low signal you need to preset the counters. This eliminates the inverter you used.
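As a rough behavioral sketch of the scheme above: assuming three cascaded 4-bit counters with a synchronous, active-low load (74163-style behavior, which is my assumption since the actual parts aren't shown), the 12-bit chain could be modeled in Verilog something like this. The module and signal names are made up for illustration, and QC is taken to be bit 2 of the most significant stage.

```verilog
// Behavioral model only; names are hypothetical, not from the original circuit.
module preset_counter_sketch (
    input  wire        clk,
    output reg  [11:0] q = 12'hD11,   // start at the preset so simulation has no X
    output wire        load_n         // active-low load, fed back to the preset pins
);
    // Original preset: 4095 + 1 - 752 = 3344 (12'hD10).
    // Bumped by one because the 000 state is now part of the cycle.
    localparam [11:0] PRESET = 4096 - 752 + 1;   // 3345 = 12'hD11

    // Assumed QC of the top 4-bit stage: high for D, E and F, low only at 000.
    assign load_n = q[10];

    always @(posedge clk) begin
        if (!load_n)
            q <= PRESET;        // synchronous load during the 000 state
        else
            q <= q + 1'b1;      // FFF rolls over to 000 on its own
    end
endmodule
```

With these assumptions the cycle is 0xD11 through 0xFFF (751 states) plus 000, so load_n goes low for one clock out of every 752.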
In the absence of activity on the load lines, this circuit will produce on phase_three a 50 ns pulse (one clock period) at a frequency of 4.88 kHz (20 MHz / 4096). That is apparently not what happens, since you say you're getting good outputs for a different preload. If you are not sending pulses on the preload line, I haven't the faintest idea why the preload setting would make a difference.
Also, be aware that RCOs are not clean. They will show spikes at intermediate counts. This is true for discrete logic, and in some respects even more so for FPGA logic.
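One common way around a glitchy combinational carry in an FPGA is to decode the count one state early and register the result, so that downstream logic only ever sees a flip-flop output. A minimal sketch, assuming a 12-bit count bus `q` clocked by `clk` (both names are hypothetical, as is the module name):

```verilog
// Registered terminal-count flag; a sketch, not part of the original circuit.
module tc_registered_sketch (
    input  wire        clk,
    input  wire [11:0] q,          // counter outputs (assumed 12 bits wide)
    output reg         tc = 1'b0   // glitch-free: driven directly by a flip-flop
);
    always @(posedge clk)
        tc <= (q == 12'hFFE);      // decode one count early, so tc is high while q == FFF
endmodule
```

The comparison itself may still glitch between clock edges, but the flip-flop only samples it at the edge, after the counter bits have settled (provided timing is met).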
Finally, a note of caution: if you are going to use an external preload as shown, you will occasionally get weird results. This is caused by the preset releasing too close to the rising clock edge, so that some counters will (occasionally) respond in a flaky manner. The term for this is metastability, and if you are going to synchronise any sort of clocked logic to external events, you need to do a little studying.
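A common way to reduce (not eliminate) this risk is to pass the external signal through two flip-flops clocked by the counter clock before it gets anywhere near the load logic. A minimal Verilog sketch, with hypothetical signal names, and assuming the external pulse is wider than one clock period so it can't be missed:

```verilog
// Two-flop synchronizer for an active-low external preload; names are hypothetical.
module load_sync_sketch (
    input  wire clk,          // counter clock
    input  wire load_n_ext,   // asynchronous, active-low external preload
    output wire load_n_sync   // version that is safe to use in the clk domain
);
    reg [1:0] sync = 2'b11;   // idle (inactive) level for an active-low signal

    always @(posedge clk)
        sync <= {sync[0], load_n_ext};   // first flop may go metastable, second gives it time to settle

    assign load_n_sync = sync[1];
endmodule
```

The price is two clocks of latency on the preload, and every counter stage should be driven from the same synchronized signal rather than from the raw input.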
Best Answer
A PLL is generally required to achieve what you want to do. Trying to do this with logic alone requires adding some extra delays via R/C time constants to bring the 2x pulses up to near 50% duty cycle. However, that will not generally happen inside an FPGA without bringing some signals out to pins on the part where the R/C can be connected, and then feeding them back into other pins. Another limitation is that such a scheme will not land exactly on 50% duty cycle, and for a given set of R/C values it will only be useful over a narrow range of input frequencies.