I learned on a 68HC11 in college. They are very simple to work with, but honestly most low-power microcontrollers (AVR, 8051, PIC, MSP430) will be similar. The biggest thing that adds complexity to ASM programming for microcontrollers is the number and type of supported memory addressing modes. At first, you should avoid more complicated devices such as higher-end ARM processors.
I'd probably recommend the MSP430 as a good starting point. Maybe write a program in C and learn by replacing various functions with inline assembly. Start simple: z = x + y, etc.
After you've replaced a function or algorithm with assembly, compare and contrast how you coded it with what the C compiler generated. In my opinion this is one of the better ways to learn assembly, and at the same time you learn how a compiler works, which is incredibly valuable as an embedded programmer. Just make sure you turn off optimizations in the C compiler at first, or you'll likely be very confused by the compiler's generated code. Gradually turn on optimizations and note what the compiler does.
RISC vs CISC
RISC means 'Reduced Instruction Set Computing'. It doesn't refer to a particular instruction set, just a design strategy that says the CPU should have a minimal instruction set: few instructions, each doing something basic. There is no strict technical definition of what it takes 'to be RISC'. CISC architectures, on the other hand, have lots of instructions, but each 'does more'.
The purported advantages of RISC are that your CPU design needs fewer transistors, which means lower power usage (big for microcontrollers), cheaper manufacturing, and higher clock rates leading to greater performance. Lower power usage and cheaper manufacturing are generally true; greater performance hasn't really lived up to the goal, as a result of design improvements in CISC architectures.
Almost all CPU cores today are RISC or 'middle ground' designs, even the most famous (or infamous) CISC architecture, x86. Modern x86 CPUs are internally RISC-like cores with a decoder bolted onto the front end that breaks each x86 instruction down into multiple RISC-like instructions, which Intel calls 'micro-ops'.
As to which (RISC vs CISC) is easier to learn in assembly, I think it's a toss-up. Doing something with a RISC instruction set generally requires more lines of assembly than doing the same thing with a CISC instruction set. On the other hand, CISC instruction sets are more complicated to learn due to the greater number of available instructions.
Most of the reason CISC gets a bad name is that x86 is by far the most common example and is a bit of a mess to work with. I think that's mostly a result of the x86 instruction set being very old and having been expanded half a dozen or more times while maintaining backward compatibility. Even your 4.5 GHz Core i7 can run in 16-bit real mode (and does at boot).
As for ARM being a RISC architecture, I'd consider that moderately debatable. It's certainly a load-store architecture, and the base instruction set is RISC-like, but in recent revisions the instruction set has grown quite a bit, to the point where I'd personally consider it more of a middle ground between RISC and CISC. The Thumb instruction set is really the most 'RISCish' of the ARM instruction sets.
Best Answer
There is no formal or 'only right' answer to this, but sensible guidelines can be suggested. These are essentially the same as a designer would seek to comply with when developing a product intended to have a long lifetime, international multi-country manufacture, and no design changes.
I've gone beyond just silicon, as if you want to make your product buildable anywhere on Earth, long term, there are more issues than just the silicon.
In all of this, a good working crystal ball and perfect 20/20 hindsight help.
There is some redundancy and even contradiction here. The aim is guidelines that are each considered on their merits, rather than a set of hard-and-fast rules. For example, the first two suggestions have purposely been chosen to somewhat contradict each other. These are all "out of my head" and on the fly; if I think of more I'll add them. Criticism welcome via the comment system.
Use only multiple-vendor-sourced parts.
This would tend to eliminate anything from eg Maxim, who make superb products but are frequently the sole source. They are also far from reliable in supplying anyone other than larger customers.
If using single-sourced parts, choose only vendors who have a solid history of maintaining old parts in stock for decades and/or who provide 100% backwards compatibility between newer and legacy parts. This applies more to eg microcontrollers than AND gates.
A supplier who does very well at this is Microchip. You can (probably) still buy 16C16s.
A supplier who does badly at this is Atmel. This does not make them bad per se, just bad for this purpose. Old parts are taken out of production quite rapidly, and backwards compatibility is apparently not a priority.
Consider avoiding parts which are or are likely to be restricted in availability by regulations. A notable example is USA's ITAR which effectively classifies various components as if they were munitions and regulates or prevents their export from the US. I am informed that some manufacturers of certain niche products (eg European satellite systems) make every effort to avoid componentry which is or may be ITAR entangled. If a consistent internationally produced product is required then this may involve using parts from eg European sources in the US.
Use parts which have potentially long-lifetime families and which are part of formal or informal standards. eg the ARM processors offer a generic processor base which would allow a subset of parts to be chosen that have a good chance of being widely available, and for a significant period.
Choose logic families which are likely to remain mainstream or to offer backwards-compatible families.
For systems where it is not utterly essential that they be leading edge, highest performance, highest density, ... do not make them so. Today's leading edge gee whizz system can be tomorrow's bad idea. Cooling, reliability, longevity, quality, ... are not necessarily comfortable companions of the latest and greatest. But, they may be. Choose sensibly.
Use industry standards where applicable.
The following is more about longevity than long term buildability, but ...:
Do due diligence diligently.
Use reputable manufacturers who know their stuff, invariably produce quality product, and provide good data sheets with accurate and detailed information.
Identify component groups that have longevity issues and design accordingly. eg wet aluminum electrolytic caps.
If you don't use Panasonic caps, know why not.
If you don't use LEDs made by, or licensed via, the 6 or so major makers, know that you've got it wrong.
Don't use tantalum capacitors (yes, I know).
Be aware of the potential issues of technologies. eg LiIon cells vent with flame and have limited calendar life. Beryllium ceramics die in pain. Tin whiskering.
When designing, understand maximum versus typical ratings (or minimum versus typical), and realise that what usually works is not guaranteed to always work. Understand absolute maximum versus recommended maximum operating ratings, and realise that what is guaranteed to survive is not guaranteed to operate. Design for the manufacturer's worst-case specs. And then some.
Never allow protection diodes to conduct more than a few microamps (if that) under normal operating conditions. Realise that allowing protection diodes to conduct may produce no observed bad results ever - but may cause heartache and problems beyond imagining just as easily.
Buy batteries of any technology only from manufacturers and vendors whose capabilities, integrity and bona-fides are known beyond question. Test them anyway.
Understand which aspects of electrostatic protection are marketing hype and which are necessary. Know that unprotected LEDs care about ESD more than almost anything else.
Understand solder.