I understand the basic principles of how a computer works: a program can be written in a "high" level language like C# or C, and it is then broken down into object code and then binary for the processor to understand. However, I really want to learn about assembly, and how it's used in modern day applications.
I know processors have different instruction sets above the basic x86 instruction set. Do all assembly languages support all instruction sets?
How many assembly languages are there? How many work well with other languages?
How would someone go about writing a routine in assembly, and then assembling it into object/binary code?
How would someone then reference the functions/routines within that assembly code from a language like C or C++?
How do we know the code we've written in assembly is the fastest it possibly can be?
Are there any recommended books on assembly languages/using them with modern programs?
Sorry for the quantity of questions, I do hope they're general enough to be useful for other people as well as simple enough for others to answer!
Best Answer
On "normal" PCs it's used just for time-critical processing; I'd say that realtime multimedia processing can still benefit quite a bit from hand-forged assembly. On embedded systems, where there's a lot less horsepower, it may have more areas of use.
However, keep in mind that it's not just "hey, this code is slow, I'll rewrite it in assembly and by magic it will go fast": it must be carefully written assembly, written knowing what is fast and what is slow on your specific architecture, and keeping in mind all the intricacies of modern processors (branch mispredictions, out-of-order execution, ...). Often, the assembly written by a beginner-to-medium assembly programmer will be slower than the final machine code generated by a good, modern optimizing compiler. Performance stuff on x86 is often really complicated, and should be left to people who know what they're doing, and most of them are compiler writers. :) Have a look at this, for example. "C++ code for testing the Collatz conjecture faster than hand-written assembly - why?" gets into some of the specific x86 details for that case, which you have to understand to match or beat a compiler with optimization enabled, even for a single small loop.
I think you're confusing some things here. Many (=all modern) x86 processors support additional instructions and instruction sets that were introduced after the original x86 instruction set was defined. Actually, almost all x86 software now is compiled to exploit post-Pentium features like `cmovcc`; you can query the processor to see if it supports some features using the CPUID instruction. Obviously, if you want to use a mnemonic for an instruction from some newer instruction set, your assembler (i.e. the software which translates mnemonics into actual machine code) must be aware of it.

Most C compilers have intrinsics like `_mm_popcnt_u32`, and/or command line options like `-mpopcnt` to enable them, that let you take advantage of new instructions without hand-written asm. The x86 BMI1 / BMI2 extensions (`-mbmi` / `-mbmi2`) have several instructions that compilers know how to use when optimizing ordinary C like `x << y` (`shlx` instead of the more clunky `shl`) or `x &= x-1;` (`blsr` / `_blsr_u32()`). GCC has a `-march=native` option to enable all the instruction sets your CPU supports, and to set the `-mtune=` option to optimize for your CPU in terms of how much loop unrolling is a good idea, or which instructions or sequences are faster on one CPU, slower on another.

If, instead, you're talking about other (non-x86) instruction sets for other families of processors, well, each assembler should support the instructions that the target processor can run. Not all the instructions of an assembly language have direct replacements in others, and porting assembly code from one architecture to another is usually hard work.
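To make the point about intrinsics and compiler options a bit more concrete, here is a minimal C sketch. The function names are made up for illustration, `__builtin_popcount` and `__builtin_cpu_supports` are GCC/Clang builtins (an assumption about your toolchain, not something every compiler offers), and whether the compiler really emits `blsr` or `popcnt` depends on the flags and the compiler version.

```c
#include <stdint.h>

/* Counts set bits by repeatedly clearing the lowest one.
   `x &= x - 1` is the pattern a compiler may turn into a single
   blsr instruction when BMI1 is enabled (e.g. with -mbmi). */
unsigned count_bits_portable(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Same result via a compiler builtin. With GCC or Clang this can
   compile to one popcnt instruction when -mpopcnt (or -march=native
   on a CPU that has it) is given. */
unsigned count_bits_builtin(uint32_t x)
{
#if defined(__GNUC__)
    return (unsigned)__builtin_popcount(x);
#else
    return count_bits_portable(x);   /* fallback for other compilers */
#endif
}

/* Runtime feature query: 1 if the CPU reports popcnt support,
   0 if it does not, -1 if this build cannot tell.
   Assumes the GCC/Clang x86 builtins, which rely on CPUID. */
int cpu_reports_popcnt(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("popcnt") ? 1 : 0;
#else
    return -1;
#endif
}
```

Both counting functions return the same value for any input; only the generated machine code differs. Comparing the output of `gcc -O3 -S` with and without `-mpopcnt` or `-mbmi` is a simple way to see what the compiler actually did.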
Theoretically, at least one dialect for each processor family. Keep in mind that there are also different notations for the same assembly language; for example, the following two instructions are the same x86 stuff written in AT&T and Intel notation:
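As a hedged illustration of the two notations, here is a small C sketch that assumes GCC or Clang on x86, where inline asm uses AT&T notation by default. The function name `load_four` and the choice of instruction are only examples; on other compilers or architectures the sketch falls back to plain C so it still builds.

```c
/* Returns 4. On x86 with GCC/Clang the value is loaded by one
   inline-asm instruction; elsewhere plain C is used instead. */
int load_four(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int value;
    /* AT&T notation: source first, destination last,
       '$' marks an immediate and '%' marks a register.
       With a concrete register this would read:  movl $4, %eax
       The same instruction in Intel notation:    mov  eax, 4
       Here %0 lets the compiler pick the register. */
    __asm__ ("movl $4, %0" : "=r"(value));
    return value;
#else
    return 4;   /* non-x86 or non-GNU compiler: no inline asm */
#endif
}
```

The machine code is identical whichever notation the source is written in; the difference is purely in how the assembler expects the text to look.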
If you want to embed a routine in an application written in another language, you should use the tools that the language provides you; in C/C++ you'd use the `asm` blocks.

You can instead make stand-alone `.s` or `.asm` files using the same syntax a C compiler would output, for example `gcc -O3 -S` will compile to a `.s` file that you can assemble with `gcc -c`. Separate files are a good idea if you want to write whole functions in asm instead of wrapping one or a couple instructions. A few open source projects like x264 and x265 (video encoders) have extensive amounts of NASM source code for different versions of functions for different versions of SSE or AVX available.

If you, instead, wanted to write a whole application in assembly, you'd have to write just in assembly, following the syntactic rules of the assembler you'd like to use.
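For the embedded-routine case, a minimal sketch might look like the following. It assumes GCC-style extended `asm` on x86 (other compilers spell inline assembly differently), and the names `add_with_asm` and `add_from_file` are hypothetical. The comment at the end outlines how a routine from a separate `.s` file would be referenced from C; it is left as a comment because it needs that second file to link.

```c
/* Adds two integers with a single x86 add instruction when built with
   GCC or Clang on x86; falls back to plain C everywhere else. */
int add_with_asm(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int sum = a;
    /* AT&T operand order: addl source, destination.
       %0 is sum (read and written), %1 is b (read only).
       "cc" tells the compiler the flags register is modified. */
    __asm__ ("addl %1, %0" : "+r"(sum) : "r"(b) : "cc");
    return sum;
#else
    return a + b;
#endif
}

/* Stand-alone variant (not compiled here): put the routine in its own
   file, say add.s, assemble it with `gcc -c add.s`, and declare it in C
   so callers can reference it like any other function:

       extern int add_from_file(int a, int b);

   The routine must follow the platform's calling convention. A handy
   starting point is the .s file that `gcc -O3 -S` produces for an
   equivalent C function. */
```

From the caller's side `add_with_asm(2, 3)` is an ordinary function call; nothing in the calling code reveals that the body contains assembly.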
In theory, because it is the nearest to the bare metal, so you can make the machine do exactly what you want, without having the compiler account for language features that in some specific case do not matter. In practice, since the machine is often much more complicated than what the assembly language exposes, as I said, assembly language will often be slower than compiler-generated machine code, which takes into account many subtleties that the average programmer does not know.
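Since intuition about speed is unreliable here, the practical way to find out is to measure. The sketch below is only an illustration: the function names are invented, the "hand-tuned" version is an unrolled C loop standing in for a hand-written routine, and `clock()` is a coarse timer. A serious benchmark would also need to stop the compiler from hoisting work out of the timing loop, and would use larger inputs and several runs.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Straightforward version: leaves all optimization to the compiler. */
uint64_t sum_simple(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* "Hand-tuned" version: unrolled by four with separate accumulators. */
uint64_t sum_unrolled(const uint32_t *data, size_t n)
{
    uint64_t t0 = 0, t1 = 0, t2 = 0, t3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        t0 += data[i];
        t1 += data[i + 1];
        t2 += data[i + 2];
        t3 += data[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        t0 += data[i];
    return t0 + t1 + t2 + t3;
}

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Calls fn `repeats` times, stores the last result in *result, and
   returns the elapsed CPU time in seconds. */
double time_sum(sum_fn fn, const uint32_t *data, size_t n,
                int repeats, uint64_t *result)
{
    volatile uint64_t sink = 0;   /* keeps the stores from being dropped */
    clock_t start = clock();
    for (int r = 0; r < repeats; r++)
        sink = fn(data, n);
    clock_t end = clock();
    *result = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Timing `sum_simple` and `sum_unrolled` on the same data at `-O3` will often show little difference, or even show the simple loop winning because the compiler vectorizes it, which is the point made above.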
Addendum
I was forgetting: knowing how to read assembly, at least a little bit, can be very useful in debugging strange issues that come up when the optimizer is broken, when a bug shows up only in the release build, when you have to deal with heisenbugs, when source-level debugging is not available, or other stuff like that; have a look at the comments here.