I agree with zklapow, but check the projects section on AVR Freaks too. That's how I learnt back in the day, before Arduinos. Also, you will almost certainly have to read the datasheet for your chip to get anything useful done. There are no ready-made functions that, for example, read an analog pin, because there are so many parameters that the Arduino environment hides from you. Let me give you an example:
    int ledPin = 13; // LED connected to digital pin 13

    // The setup() method runs once, when the sketch starts
    void setup() {
        // initialize the digital pin as an output:
        pinMode(ledPin, OUTPUT);
    }

    // the loop() method runs over and over again,
    // as long as the Arduino has power
    void loop()
    {
        digitalWrite(ledPin, HIGH); // set the LED on
        delay(1000); // wait for a second
        digitalWrite(ledPin, LOW); // set the LED off
        delay(1000); // wait for a second
    }
is roughly similar to the following C (not tested):
    #include <avr/io.h> // defines DDRB and PORTB for the selected chip

    int main(void) {
        // I just randomly picked pin 6 on PORTB, this may not be the same on an Arduino
        DDRB |= (1 << 6); // set the pin to output in the data direction register
        // declared outside the for loop for C89; volatile keeps the compiler
        // from optimizing the empty delay loops away
        volatile unsigned int i;
        while (1) {
            PORTB |= (1 << 6); // send pin 6 high and leave other pins in their current state
            for (i = 0; i < 10000; i++); // crude delay, far shorter than the 1 s above
            PORTB &= ~(1 << 6); // send pin 6 low
            for (i = 0; i < 10000; i++); // crude delay
        }
    }
Sure, there are delay functions in some libraries and there may be output functions in others, but get used to writing code like this. The only way you can possibly know what all this PORTB and DDRB stuff means is to read the datasheet. I'm sorry to drone on about this, but not enough people realize that the datasheet is a goldmine of information.
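If the `|=` and `&= ~` idiom is new to you, you can try it on a PC first. This is only a sketch: an ordinary `uint8_t` stands in for the hardware register, and `set_bit` and `clear_bit` are names I made up for illustration, not part of avr-libc.

```c
#include <stdint.h>

/* Set one bit and leave the other seven untouched.
   Same operation as PORTB |= (1 << bit) on the real chip. */
void set_bit(uint8_t *reg, uint8_t bit) {
    *reg |= (uint8_t)(1u << bit);
}

/* Clear one bit and leave the other seven untouched.
   Same operation as PORTB &= ~(1 << bit) on the real chip. */
void clear_bit(uint8_t *reg, uint8_t bit) {
    *reg &= (uint8_t)~(1u << bit);
}
```

Starting from a register value of 0x01, `set_bit(&reg, 6)` gives 0x41 and `clear_bit(&reg, 6)` brings it back to 0x01, so bit 0 is never disturbed. A plain `reg = (1 << 6)` would have wiped bit 0.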
The Arduino boards can be programmed in assembly. All you need is an ICSP (In-Circuit Serial Programming) cable and the AVR toolchain (free from Atmel) to write to the board. Depending on the programmer, you then also get the advantage of on-chip debugging.
As you suggested, you can just slap an Atmel chip on a breadboard and go to town.
The kit you referenced looks like a great starting point. You can take the chip right off the board and stick it on your own breadboard (as long as it has correctly regulated power and you account for the clock).
EDIT: Apparently you don't need an ICSP to load assembly programs. See comment below for details.
Best Answer
You have a couple of options:
1) Use an interrupt. The setup is slightly complicated, but it frees up your device to do other things while it is waiting. Refer to your AVR datasheet for instructions on how to set up a timer interrupt. For delays longer than the timer can count in one period, you can use a prescaler or another variable to count interrupts until your desired wait has elapsed.
2) Use a NOP in a for loop to perform your wait. According to this page (http://playground.arduino.cc/Main/AVR), a NOP takes 1 clock cycle, and 1 clock cycle = 1/frequency. At 16 MHz a NOP takes 62.5 ns to execute. Use an unsigned long variable for your loop counter so it does not roll over.
Your loop counter will look like this (volatile ensures that the compiler will not optimize out the code):
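A minimal sketch of such a loop, assuming a 16 MHz clock and a GCC-style compiler such as avr-gcc. `F_CPU_HZ`, `nops_for_us`, and `delay_nops` are hypothetical names I chose for illustration, not library functions.

```c
/* Assumed clock speed; at 16 MHz one NOP takes 62.5 ns. */
#define F_CPU_HZ 16000000UL

/* Number of 1-cycle NOPs that fill `us` microseconds,
   ignoring the overhead of the loop itself. */
unsigned long nops_for_us(unsigned long us) {
    return (F_CPU_HZ / 1000000UL) * us;
}

/* Busy-wait by executing `count` NOPs. The volatile counter and the
   volatile asm statement keep the compiler from deleting the loop.
   Returns the number of iterations that actually ran. */
unsigned long delay_nops(unsigned long count) {
    volatile unsigned long i;
    for (i = 0; i < count; i++) {
        __asm__ __volatile__("nop");
    }
    return i;
}
```

For example, `nops_for_us(1000)` is 16000, so `delay_nops(nops_for_us(1000))` waits at least 1 ms. The real wait is longer, because each pass also spends cycles on incrementing and comparing the counter.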
Edit: There will be some overhead from the for loop. You can determine this experimentally (easy) or by counting the instructions (hard).
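For option 1, the arithmetic behind the prescaler and the interrupt count can be sketched as below. This is only the calculation, not the register setup. It assumes the timer runs in a compare-match mode where it counts from 0 up to a compare value and then fires, so check your datasheet for the mode and register names on your chip. `timer_compare_value` and `interrupts_for_delay` are hypothetical helper names.

```c
#include <stdint.h>

/* Compare value that makes a timer fire every `period_us` microseconds
   when it is clocked at f_cpu_hz / prescaler. The timer counts from 0
   up to the compare value inclusive, hence the -1. */
uint32_t timer_compare_value(uint32_t f_cpu_hz, uint32_t prescaler,
                             uint32_t period_us) {
    uint64_t ticks = ((uint64_t)f_cpu_hz * period_us) /
                     ((uint64_t)prescaler * 1000000u);
    if (ticks == 0) {
        return 0; /* period is shorter than a single timer tick */
    }
    return (uint32_t)(ticks - 1u);
}

/* Number of interrupts to count in a variable before a delay of
   `delay_ms` has passed, given one interrupt every `period_ms`.
   Rounds up so the wait is never too short. */
uint32_t interrupts_for_delay(uint32_t delay_ms, uint32_t period_ms) {
    return (delay_ms + period_ms - 1u) / period_ms;
}
```

At 16 MHz with a prescaler of 64, a 1 ms period needs a compare value of 249, which fits in an 8-bit timer. A 1 second wait is then 1000 interrupts counted in a variable.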