As others have mentioned, this is best done in assembly. Here is my original attempt at coding this, when I thought the jump instructions took either 2 or 4 cycles (see Edit below for the revised version).
void delay_sub(unsigned char i)
{
// convert 20, 21, 22 etc to count in R7 of 1, 2, 3 (extra cycle added if i is odd)
; cycles
rrc A ; 1 C = 1 if odd (assumes C = 0 on entry, since RRC shifts the old carry into bit 7)
jnc even ; 2 or 4 extra 2 cycles if branch taken (spoils cache)
nop ; 1 delete if using lcall's instead of acall's
nop ; 1 same
clr C ; 1 in either case carry is clear prior to subb
even:
subb A,#9 ; 1
mov R7,A ; 1 R7 now = (i / 2) - 9
//while (i--);
loop:
djnz R7, loop ; 2 loop address should be in cache, so no extra cycles needed
ret ; 6
}
timing calculation (assuming acall's)
if i even:
5+7+R7*2+6 = minimum of 20 22 24 ... => R7 = 1, 2, 3 ...
if i odd:
5+8+R7*2+6 = minimum of 21 23 25 ... => R7 = 1, 2, 3 ...
It assumes the routine is called with ACALL and that the parameter n is a constant or a byte variable, so that it can be passed using a one-cycle instruction such as MOV A,#n. The minimum delay you can time this way is 20 clocks, as you asked for.
mov A,#n ; 1
acall delay_sub ; 4
There is no check that the parameter is greater than or equal to 20; any value less than 20 will give incorrect timing.
The mov instruction and acall will take 5 cycles. First off, the count (i) is divided by two to account for the DJNZ instruction taking two cycles. Then the count is adjusted to add a cycle if i is odd. Finally a fixed value is subtracted so the value in the register to be decremented (R7) is in the range 1, 2, 3 ... R7 is then decremented in a tight loop (two cycles per count). There is a fixed cycle count of 6 for the return.
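To sanity-check the arithmetic above without hardware, here is a small host-side model in C. It is only a sketch under the cycle counts assumed in this answer (ACALL variant, two-cycle DJNZ); the function name `delay_sub_cycles` and the constants are mine, not part of any toolchain.

```c
#include <stdint.h>

/* Cycle costs assumed in the answer above (ACALL variant). */
enum {
    CALL_OVERHEAD = 5,  /* mov A,#n (1) + acall (4) */
    RET_CYCLES    = 6,  /* ret */
    DJNZ_CYCLES   = 2   /* per loop iteration, as assumed in the text */
};

/* Hypothetical helper: modelled total cycle count for delay_sub(i).
   Returns -1 for i < 20, which the assembly routine does not handle. */
int delay_sub_cycles(uint8_t i)
{
    if (i < 20)
        return -1;

    int odd   = i & 1;           /* carry after rrc A */
    int r7    = (i >> 1) - 9;    /* rrc A, then subb A,#9 */
    int setup = odd ? 8 : 7;     /* odd: 1+2+1+1+1+1+1, even: 1+4+1+1 */

    return CALL_OVERHEAD + setup + DJNZ_CYCLES * r7 + RET_CYCLES;
}
```

Under these assumptions the model returns exactly i for every i from 20 to 255, which is the property the routine is meant to have.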
If you have to use an LCALL instead of an ACALL, the minimum timing you can do will be 21 clocks instead of 20, and you will need to delete the two nops after the jnc instruction. You have to use either all ACALLs or all LCALLs; you can't mix them.
I would avoid using C to call the function unless you can guarantee the compiler doesn't add extra overhead. Also, I'm using R7 as a scratch register; your compiler manual will tell you what registers can be used inside an assembler function without having to save them (if any).
This also doesn't account for disabling and re-enabling interrupts, if necessary, to guarantee the timing routine will not be interrupted.
The behavior of the jump instructions is based on the datasheet for the C8051F38x as I understand it (in terms of when the instruction cache is spoiled or not). This may be different for other versions of the 8051.
Finally, I haven't shown the syntax for jumping into in-line assembly and back out again. The subroutine could also be put into a separate file and assembled.
Edit
Since I wrote the original code, the OP has informed me that the number of clock cycles for a jump in his 8051 is 5 or 6, not the 2 or 4 stated in the datasheet I read. So I have rewritten the routine to take this into account. Unfortunately, this bumps the minimum cycle count that can be timed to 32 instead of 20. So if counts between 20 and 31 absolutely need to be handled, some special-purpose code will have to be written for that case (see below).
void delay_sub(unsigned char i)
{
// minimum value of i is 32
; cycles
clr C ; 1
subb A,#32 ; 1 adjust for overhead of call and this routine
; a branch could be added here in case the result is negative
mov B,#6 ; 1
div AB ; 4 quotient in A, remainder in B
mov DPTR,#adjustcycles ; 1
mov R7,B ; 1
mov B,A ; 1 save quotient in B as temp
mov A,#6 ; 1
clr C ; 1
subb A,R7 ; 1 A now has 6 - remainder (offset into the table below)
mov R7,#0 ; 1
jmp @A+DPTR ; 6 jump into table to add clocks based on remainder
adjustcycles: ; execute additional cycles based on remainder
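inc R7 ; 1 never reached (offset 0 would need a remainder of 6)
inc R7 ; 1 entry point for remainder of 5
inc R7 ; 1 entry point for remainder of 4
inc R7 ; 1 entry point for remainder of 3
inc R7 ; 1 entry point for remainder of 2
nop ; 1 entry point for remainder of 1
mov A,B ; 1 entry point for remainder of 0; A now has (i - 32) / 6, have already adjusted for remainder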
loop:
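djnz ACC,loop ; 6 per iteration; DJNZ needs an operand and the count is in A (a count of 0 wraps and loops 256 times)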
ret ; 6
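timing in clock cycles is: 5 (call) + 21 (fixed overhead) + 6*(n/6) + (n%6) + 6 (ret), where n = i - 32
if n = 0, 5 + 21 + 6 = 32 therefore that is the minimum count
(this assumes the loop adds nothing when n/6 is 0; DJNZ always runs at least once, so counts of 32 to 37 would need the loop to be skipped)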
Instead of dividing the parameter i by 2 as in the previous example, I now have to divide it by 6 because I am assuming the DJNZ instruction takes 6 cycles. So we need to loop i / 6 times, and also add 0 to 5 cycles for the remainder (i % 6).
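The divide-by-6 scheme can be checked the same way with a host-side model in C. This is a sketch under the cycle counts I assumed above (6-cycle DJNZ, 6-cycle table jump, one cycle per table instruction); `table_cycles` and `delay_sub6_cycles` are names I made up for the illustration. It also models the fact that DJNZ decrements before testing, so a quotient of 0 wraps around to 256 iterations.

```c
#include <stdint.h>

/* Cycles spent in the adjustcycles table, including the final mov A,B.
   With A = 6 - remainder, execution enters the seven-instruction table
   at byte offset A and runs to its end, one cycle per instruction. */
int table_cycles(int remainder)
{
    int offset = 6 - remainder;   /* result of subb A,R7 with A = 6 */
    return 7 - offset;
}

/* Hypothetical helper: modelled total cycle count for the revised routine.
   Returns -1 for i < 32, which the routine does not handle. */
int delay_sub6_cycles(uint8_t i)
{
    if (i < 32)
        return -1;

    int n     = i - 32;               /* subb A,#32 */
    int q     = n / 6;                /* div AB: quotient */
    int r     = n % 6;                /* div AB: remainder */
    int fixed = 20;                   /* clr C through jmp @A+DPTR */
    int loops = (q == 0) ? 256 : q;   /* DJNZ decrements first, so 0 wraps */

    return 5 + fixed + table_cycles(r) + 6 * loops + 6;
}
```

In this model the total equals i for every i from 38 to 255. For 32 to 37 the quotient is 0 and the loop runs 256 times, so those counts would need an extra branch around the loop, with its cost folded into the fixed overhead.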
The remainder of my comments above pretty well apply to this example. I am leaving the original code, in case anyone actually has an 8051 with a two-cycle DJNZ instruction.
For counts of 20-31, you could create a subroutine with just one nop, that takes 12 cycles including the call and return:
void delay12(void)
{
nop
}
For counts of 20-23, you would call it once and add 8 to 11 nops after the call (or a dummy jump to the next instruction, which eats up 6 cycles, plus 2 to 5 nops; delaying 20 cycles would then cost just four instructions plus the subroutine, which is assumed to be used more than once). For counts of 24-31, you would call delay12 twice and add 0 to 5 nops and/or a jump instruction as needed.
So to delay 20 cycles:
acall delay12 ; 12 including call and return
sjmp next ; 6 dummy jump to the next instruction
next:
nop
nop
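The same bookkeeping for any count from 20 to 31 can be written as a small C helper. Again this is just a sketch using the figures above (12 cycles per delay12 call, 6 per dummy jump, 1 per nop); the `delay_plan` type and both function names are hypothetical.

```c
/* How many of each building block to emit for a short delay. */
typedef struct {
    int calls;  /* acall delay12, 12 cycles each */
    int jumps;  /* dummy jump to the next instruction, 6 cycles each */
    int nops;   /* 1 cycle each */
} delay_plan;

/* Hypothetical helper: plan a delay of 20 to 31 cycles.
   Returns an all-zero plan for counts outside that range. */
delay_plan plan_short_delay(int cycles)
{
    delay_plan p = {0, 0, 0};
    if (cycles < 20 || cycles > 31)
        return p;

    p.calls = cycles / 12;              /* 1 for 20-23, 2 for 24-31 */
    int rest = cycles - 12 * p.calls;   /* 8-11 or 0-7 cycles left over */
    p.jumps = rest / 6;                 /* at most one dummy jump */
    p.nops  = rest % 6;                 /* 0 to 5 nops */
    return p;
}

/* Total cycles a plan accounts for. */
int plan_cycles(delay_plan p)
{
    return 12 * p.calls + 6 * p.jumps + p.nops;
}
```

For 20 cycles this gives one call, one jump and two nops, which is the sequence shown above.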
Best Answer
You could use a DS1341/2 to generate a 1 Hz signal; in addition, it could be your main crystal-controlled timepiece. It draws less than 500 nA @ 3 V.
You could even use the new DS2417 which draws less than 200 nA @ 3 V and the 1-Wire interface is much lower power than I2C.
Since I assume you are using an ATMega328P, the 1 Hz signal can be a wakeup interrupt along with any button presses. If you are doing something like Chronio, you should make sure to read Gammon and Morissey on deep sleep with the ATMega328(P); both are very useful research.