void (*app_start)(void) = 0x0000;
This isn't a NULL pointer. It really is the address of the start of the application code, which the bootloader jumps to. The linker arranges for your application code to start at address 0. See table 26-6 in the ATmega168 datasheet.
The bootloader code starts higher up in flash. Exactly where depends on the boot size fuse settings (BOOTSZ).
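To make the jump concrete, here is a minimal sketch of how a bootloader might hand control to the application through such a pointer. The names `app_entry_t`, `APP_ENTRY`, `leave_bootloader`, `fake_app` and the `upload_requested` flag are my own assumptions for illustration, not taken from any particular bootloader. Off-target, a stand-in function replaces address 0 so the logic can be tried on a PC.

```c
/* Signature of the application entry point: no arguments, no return value. */
typedef void (*app_entry_t)(void);

#ifdef __AVR__
/* On the target, the application starts at flash address 0x0000. */
#define APP_ENTRY ((app_entry_t)0x0000)
#else
/* Host stand-in (assumption for illustration): records that the "app" ran. */
int app_ran = 0;
static void fake_app(void) { app_ran = 1; }
#define APP_ENTRY (fake_app)
#endif

/* Hypothetical bootloader exit: jump only when no upload was requested.
   Returns 0 if we stay in the bootloader, 1 if the jump was made. */
int leave_bootloader(int upload_requested)
{
    if (upload_requested)
        return 0;      /* stay in the bootloader and wait for new code */
    APP_ENTRY();       /* on the AVR this call never returns */
    return 1;          /* reached only with the host stand-in */
}
```

On the real part the call is effectively a jump to the reset vector, so nothing after it runs; the return value only exists to make the host version observable.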
I learned on a 68HC11 in college. They are very simple to work with, but honestly most low-power microcontrollers will be similar (AVR, 8051, PIC, MSP430). The biggest thing that adds complexity to ASM programming for microcontrollers is the number and type of supported memory addressing modes. You should avoid more complicated devices at first, such as higher-end ARM processors.
I'd probably recommend the MSP430 as a good starting point. Maybe write a program in C and learn by replacing various functions with inline assembly. Start simple, x + y = z, etc.
After you've replaced a function or algorithm with assembly, compare and contrast how you coded it with what the C compiler generated. In my opinion this is one of the better ways to learn assembly, and at the same time you learn how a compiler works, which is incredibly valuable as an embedded programmer. Just make sure you turn off optimizations in the C compiler at first, or you'll likely be very confused by the compiler's generated code. Gradually turn on optimizations and note what the compiler does.
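As a rough illustration of the "x + y = z" exercise, here is a sketch using GCC-style extended inline assembly. I'm assuming a GCC-compatible compiler and, purely so it can be tried on a PC, an x86 host; on an MSP430 or AVR the mnemonic and operand syntax would differ. The function names `add_c` and `add_asm` are made up for the example.

```c
/* Plain C version: the reference the compiler turns into assembly for you. */
int add_c(int x, int y)
{
    return x + y;
}

/* Same function with the addition hand-written in inline assembly. */
int add_asm(int x, int y)
{
    int z = x;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* AT&T syntax: add the register holding y into the register holding z.
       "+r"(z) = read/write register operand, "r"(y) = input register,
       "cc" tells the compiler the flags are modified. */
    __asm__ ("addl %1, %0" : "+r"(z) : "r"(y) : "cc");
#else
    /* Other targets: fall back to C until you write the asm for your part. */
    z = x + y;
#endif
    return z;
}
```

Compiling with something like `gcc -O0 -S` lets you put the compiler's output for `add_c` next to your own instruction, which is the comparison described above.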
RISC vs CISC
RISC means 'Reduced Instruction Set Computing'. It doesn't refer to a particular instruction set, just a design strategy in which the CPU has a minimal instruction set: few instructions that each do something basic. There is no strict technical definition of what it takes 'to be RISC'. CISC architectures, on the other hand, have lots of instructions, but each 'does more'.
The purported advantages of RISC are that your CPU design needs fewer transistors, which means less power usage (big for microcontrollers), cheaper manufacturing, and higher clock rates leading to greater performance. Lower power usage and cheaper manufacturing are generally true; greater performance hasn't really lived up to the goal, as a result of design improvements in CISC architectures.
Almost all CPU cores today are RISC or 'middle ground' designs. That holds even for the most famous (or infamous) CISC architecture, x86: modern x86 CPUs are internally RISC-like cores with a decoder bolted on the front end that breaks x86 instructions down into multiple RISC-like instructions. I think Intel calls these 'micro-ops'.
As to which (RISC vs CISC) is easier to learn in assembly, I think it's a toss-up. Doing something with a RISC instruction set generally requires more lines of assembly than doing the same thing with a CISC instruction set. On the other hand, CISC instruction sets are more complicated to learn because of the greater number of available instructions.
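To show what "more lines of assembly" looks like, here is a small sketch in C that increments a value in memory two ways. The instruction sequences in the comments are typical hand-written equivalents I'm assuming for illustration, not actual compiler output, and the function names are made up.

```c
/* CISC style: a single read-modify-write on memory.
   x86 can encode this as one instruction, e.g.
       add dword ptr [counter], 1                                  */
void increment_cisc_style(int *counter)
{
    *counter += 1;
}

/* Load-store (RISC) style: memory is only touched by loads and stores,
   so the same job takes three steps, e.g. on ARM:
       ldr r1, [r0]      ; load the value into a register
       add r1, r1, #1    ; do the arithmetic in the register
       str r1, [r0]      ; store the result back                   */
void increment_risc_style(int *counter)
{
    int reg = *counter;   /* load  */
    reg = reg + 1;        /* add   */
    *counter = reg;       /* store */
}
```

Both functions do the same thing; the second just spells out the steps a load-store machine has to take, which is why RISC listings tend to be longer but made of simpler pieces.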
Most of the reason CISC gets a bad name is that x86 is by far the most common example and is a bit of a mess to work with. I think that's mostly a result of the x86 instruction set being very old and having been expanded half a dozen or more times while maintaining backward compatibility. Even your 4.5 GHz Core i7 can run in 16-bit real mode (and does at boot).
As for ARM being a RISC architecture, I'd consider that moderately debatable. It's certainly a load-store architecture. The base instruction set is RISC-like, but in recent revisions the instruction set has grown quite a bit, to the point where I'd personally consider it more of a middle ground between RISC and CISC. The Thumb instruction set is really the most 'RISCish' of the ARM instruction sets.
Best Answer
You can use AVRA (AVR Assembler) to develop your assembly language program on Linux. I'm on Linux, and AVRA is free, both as in beer and as in freedom of use.
Plan to use AVRDUDE (also free in both senses) to burn in your object code. I have used it on Linux.
Plan to use another cheap, open-source Arduino-like board as a programmer. I have used one here and documented it. In my article I download a bootloader to a new ATmegaxx; you can burn in your own program instead. I did it all through USB (as you asked).
You definitely can use a USB-based Arduino, or an RBBB plus USB cable, to program your target Arduino over USB. The RBBB is a utility Arduino that I have used in that role.
Overall, to use my solution you would need another programmer or another cheap Arduino-like board such as the RBBB. I have also reviewed the RBBB in my other post.
Another way to program your Arduino with another Arduino.
Programming your Arduino using a third-party USB programmer.