Yes, power supplies need a little time to stabilize their output voltage after turn-on, and some need a minimum load to regulate properly. It depends on the specific power supply, so check its documentation to see whether that is the case. You should also try adding the delay you mention in the question and see whether it helps.
Another thing to consider is that motors generally draw a large current during the start-up transient. If possible, add a soft start to the motor so that it increases its speed gradually.
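A minimal soft-start sketch, assuming the firmware updates the motor's speed command (in steps per second) once per fixed update tick. The function name and parameters are hypothetical. Instead of jumping straight to the target speed, it ramps linearly from a starting speed, which spreads the start-up current demand over time.

```c
#include <stdint.h>

/* Hypothetical linear soft start: returns the speed command (steps/s)
   after `tick` update periods. Starts at start_sps, rises by
   accel_per_tick each tick, and is clamped at target_sps. */
uint32_t soft_start_speed(uint32_t tick, uint32_t start_sps,
                          uint32_t target_sps, uint32_t accel_per_tick)
{
    /* 64-bit arithmetic so a long ramp cannot overflow */
    uint64_t speed = (uint64_t)start_sps + (uint64_t)accel_per_tick * tick;
    return speed > target_sps ? target_sps : (uint32_t)speed;
}
```

For example, `soft_start_speed(10, 100, 1000, 50)` gives 600 steps/s; after enough ticks the result settles at 1000.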
I know of three ways to accomplish what you need (and I have used all three). You have mentioned the first two in your comment.
Having an ISR count the step-pulses is the simplest. The ISR need only increment or decrement a position counter. In the 8-bit micros that I use, such an ISR would take less than a microsecond (although I code in assembly language, not C, on that MCU). It shouldn't be much overhead on any MCU.
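A minimal sketch of such an ISR in C, assuming the STEP line drives an external interrupt and the DIR pin level is passed in as `dir_forward`. The names are hypothetical; on real hardware the handler would be registered with the MCU's interrupt vector and would read DIR from a port register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed step position; volatile because it is shared with the ISR. */
static volatile int32_t position = 0;

/* Hypothetical STEP-edge handler: one increment or decrement per pulse,
   depending on the DIR level sampled at the step edge. */
void step_isr(bool dir_forward)
{
    if (dir_forward)
        position++;
    else
        position--;
}
```

Because a 32-bit counter is not updated atomically on an 8-bit MCU, the main loop should briefly disable interrupts while it copies `position`.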
The second way is to bring the step pulses into a hardware counter. That can be difficult to manage if your motor runs in both directions, since you need to increment sometimes and decrement at other times (or at least keep track of which direction each batch of counts belongs to). I used this method back in the 1980s, when counter/timer chips were typically used for motion control.
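One way to handle both directions with an up-only counter is to read the counter just before every direction change and fold the counts accumulated so far into a signed position. This is a sketch under stated assumptions: a free-running 16-bit hardware counter clocked by STEP, whose value the firmware passes in as `hw_count`. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software wrapper around a 16-bit up-counter clocked by STEP. */
typedef struct {
    uint16_t last_count; /* counter value at the last sync */
    int32_t  position;   /* signed position in steps */
    bool     forward;    /* direction in effect since the last sync */
} counter_tracker;

/* Add the counts since the last sync, signed by the current direction.
   Unsigned 16-bit subtraction handles counter wraparound, provided fewer
   than 65536 steps occur between syncs. */
void tracker_sync(counter_tracker *t, uint16_t hw_count)
{
    uint16_t delta = (uint16_t)(hw_count - t->last_count);
    t->last_count = hw_count;
    if (t->forward)
        t->position += delta;
    else
        t->position -= delta;
}

/* Call BEFORE changing the DIR pin, so counts already taken are
   attributed to the old direction. */
void tracker_set_direction(counter_tracker *t, uint16_t hw_count, bool forward)
{
    tracker_sync(t, hw_count);
    t->forward = forward;
}
```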
The most efficient way to control a stepper is with a separate rate-generator circuit, controlled by the MCU. A simple way to build one is to use the 7497 rate-multiplier chip. Each 7497 is a 6-bit rate multiplier, and you cascade them to get your desired resolution. However, their output pulse stream is not very even, which can cause instability in some applications (although it can be filtered). A better technique is the adder/accumulator method, which gives a very clean output pulse stream and is easily multiplexed to drive multiple motors (if you need that). I've had some 32-axis systems that used this approach. The adder/accumulator (and the mux) fits very nicely in an FPGA.
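The adder/accumulator idea can be modeled in software to see how it behaves. This sketch assumes a 16-bit accumulator; each call to `rate_gen_tick` stands for one reference-clock cycle, and the carry out of the addition is the step pulse, so the step frequency is f_clk × rate / 2^16. The names and the 16-bit width are illustrative choices, not taken from any particular design; in an FPGA this would be a register and an adder rather than C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of a 16-bit adder/accumulator rate generator. */
typedef struct {
    uint16_t acc;  /* accumulator register */
    uint16_t rate; /* steps emitted per 65536 clock cycles */
} rate_gen;

/* One reference-clock cycle: add rate to the accumulator.
   A carry out of bit 15 means "emit one step pulse". */
bool rate_gen_tick(rate_gen *g)
{
    uint32_t sum = (uint32_t)g->acc + g->rate;
    g->acc = (uint16_t)sum;  /* keep the low 16 bits */
    return sum > 0xFFFFu;    /* carry out */
}

/* Count the pulses produced over a number of clock cycles. */
uint32_t count_pulses(rate_gen *g, uint32_t ticks)
{
    uint32_t pulses = 0;
    for (uint32_t i = 0; i < ticks; i++)
        if (rate_gen_tick(g))
            pulses++;
    return pulses;
}
```

Over exactly 65536 clocks the accumulator returns to its starting value, so it emits exactly `rate` pulses. The carries are spread as evenly as integer arithmetic allows: with `rate = 16384`, a pulse appears on every fourth clock.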
The big advantage of a rate-generator is the simplicity of the software. The rate-generator gives you an interrupt at a fixed rate, which is your update period. In that ISR you simply load the number of steps you want executed in the next period. The update interrupts can be relatively infrequent, so overhead is low. The position is easy to maintain - you just add the value you load into the rate generator to your position counter. The velocity is easily controlled because it is in direct proportion to the number of steps you load into the rate-generator. Acceleration is easy to control as well - just add/subtract a fixed value on each update. If you have multiple motors, you would update them all in the same ISR.
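The update ISR described above could look roughly like this. It is a sketch under assumptions: signed step counts encode direction, `rate_gen_load` is a hypothetical stand-in for whatever register write loads the rate generator (stubbed here so the code runs), and velocity is ramped toward a target by a fixed amount per update.

```c
#include <stdint.h>

typedef struct {
    int32_t position;      /* commanded position (steps) */
    int32_t steps_per_upd; /* current velocity: steps per update period */
    int32_t target;        /* desired steps per update period */
    int32_t accel;         /* fixed velocity change per update (> 0) */
} axis_state;

/* Hypothetical hardware hook, stubbed: records the value that would be
   written to the rate generator for the next period. */
static int32_t last_loaded;
static void rate_gen_load(int32_t steps) { last_loaded = steps; }

/* Called once per fixed update period. */
void update_isr(axis_state *a)
{
    /* Acceleration: move velocity toward target by a fixed step, clamped. */
    if (a->steps_per_upd < a->target) {
        a->steps_per_upd += a->accel;
        if (a->steps_per_upd > a->target)
            a->steps_per_upd = a->target;
    } else if (a->steps_per_upd > a->target) {
        a->steps_per_upd -= a->accel;
        if (a->steps_per_upd < a->target)
            a->steps_per_upd = a->target;
    }
    /* Velocity is simply the number of steps loaded for the next period. */
    rate_gen_load(a->steps_per_upd);
    /* Position: add what was loaded. */
    a->position += a->steps_per_upd;
}
```

With several motors, the same ISR would just loop over an array of `axis_state` entries.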
(whew) I'm sorry if that was too long-winded.
For the Allegro A4988 I would refer you to figure 3 in the data sheet: -
Following activation of a "step" command, it looks to me like the load current starts changing on the leading edge, i.e. pretty much instantly. Other pictorial examples in that device's data sheet confirm this.
For the DRV8825 it's a similar story: -