No, it does not. In C, each variable occupies a fixed-size region of memory. If you are working on a system with 4-byte ints, and you set an int variable to 2,147,483,647 and then add 1, the variable will usually contain -2147483648. (On most systems, anyway; signed overflow is actually undefined behavior.) No other memory locations will be modified.
In essence, the compiler will typically warn you when you assign a constant that is too big for the type. If you force the conversion with a cast, the value will be truncated.
Looked at in a bitwise way, if the type can only store 8 bits and you try to force the value 1010101010101 into it with a cast, you will end up with just the bottom 8 bits: 01010101.
In your example, regardless of what you do to myArray[2], myArray[3] will still contain 4. There is no "spill over". If you try to store something wider than 4 bytes, the conversion just lops off everything on the high end, leaving the bottom 4 bytes. On most systems, that truncation is what produces a value like -2147483648.
From a practical standpoint, you want to make sure this never, ever happens. These sorts of overflows often result in hard-to-find defects. In other words, if you think there is any chance at all your values will be in the billions, don't use int.
You wrote: "Unlike with Java or C# I can't just use google as well, since Assembly just isn't used by many anymore."
I don't think this is accurate: I found dozens of helpful articles and presentations by searching "understanding assembly language".
Further, you will find the search terms x64 and "instruction set" helpful. The following describes additional search terms you might use to dig deeper.
There are many different kinds of CPUs. Each implements an instruction set architecture (ISA).
The instruction set architecture describes the various instructions that the CPU can execute. These instructions have encodings and so are stored as bit patterns. A program consists of sequences of instructions that a given CPU can execute. The language of a program using these bit patterns is called machine code.
Assembly language refers to a human-readable version of machine code. Instructions are specified using mnemonic instruction names and operands that can be read and edited as text. Assembly language is compiled (assembled) into machine code by a program called an assembler. Assembly language has many features that make source code more readable and maintainable than machine code. For example, assembly language uses labels, whereas machine code uses numeric offsets. Inserting a new instruction into an assembly-language program is easy, but doing so in machine code is hard, as it throws off other offsets in use nearby. So assembly language is much preferred.
The Application Binary Interface (ABI) for a given ISA determines conventions of register and stack usage, such that a function written by one author can "call" a function written by another (provided they both adhere to the convention). Another useful term here is "calling convention", the part of an ABI that specifically describes how parameters are passed from one function to another. Also relevant is the term "stack frame".
It is useful to understand the difference between what is allowed or supported by the hardware ISA and what is merely a software convention imposed by the ABI.
This Wikipedia article lists the registers on x64 (https://en.wikipedia.org/wiki/X86-64#Architectural_features), and this article illustrates the x86's overlapping register names (https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/x64-architecture).
If you don't want to adhere to the standard conventions (ABI), you can create your own, which is often done in simple programs by students learning to write assembly language.
Beyond these search terms, you should consider writing some simple C programs that illustrate your questions, compile them (with and/or without optimization) and look at the compiler output as disassembly or in a debugger to see how the instruction set is used to manipulate data.
NASM is a specific assembler for x86/x64 architecture, but certainly not the only one. Questions specifically about NASM would go to the different syntaxes and expressions you can write in that assembler.
Your question about registers and the stack should be directed more to the instruction set architecture and the calling convention than to any specific assembler. While the instruction set allows certain operations regardless of the operating system, the ABI differs somewhat between Linux and Windows, so there are some differences in register and stack usage.
A stack frame can use a single stack pointer or both a stack pointer and a frame pointer. The stack pointer can move during the execution of a function, so the offset of stack allocated variables relative to the stack pointer can change. A frame pointer remains fixed during the execution of a function, and thus variables located in the stack can be referred to by a fixed offset from the frame pointer even as the stack pointer moves from pushing and popping.
The frame pointer approach is easier to use, supports easier debugging, and may also support stack unwinding and exception handling. However, it is somewhat less efficient, as it ties up a second register and needs a few extra instructions to save, establish, and restore the frame pointer.
The x86 architecture has a long lineage. If you see RAX, RBX, RSP, or RBP, these are names of registers in the 64-bit extension of this architecture. EAX, EBX, and so on are the 32-bit register names, and you may see these in either 32-bit or 64-bit code. Any given program should target either the 64-bit (x64) architecture or the 32-bit (x86) architecture, but not both mixed together. You can therefore look at which forms of the stack and frame pointers appear (RSP/RBP for 64-bit, ESP/EBP for 32-bit) to tell which architecture a listing targets.
In the original 16-bit 8086, AX was a favored register, since encodings that target it are shorter than other instructions. Further, multiplies and divides implicitly target the DX:AX register pair. Many of these special register uses were removed, in favor of the registers being more general purpose, as the architecture evolved to 32 bits and then 64 bits. This evolution is friendlier toward compilers and hence high-level languages. These architectures still have a dedicated stack pointer and instructions that implicitly target that register, but the other registers today are general-purpose registers. Once again I bring up the calling convention, which will tell you which register is used, for example, to pass the first argument or to return a value.
Best Answer
It depends upon the C compiler, as the specification leaves this up to the compiler implementer. (There are specific requirements, though -- such as if you take the address of a parameter -- which may further constrain the compiler implementer.)
A typical C function generates some "prologue" code, the "body" code, and some "epilogue" code. The prologue will allocate local variable space. In general, a compiler will NOT go through the process you outlined. So let's say we are talking about 16-bit x86 code and about this function in C:
Note that in the context I mentioned, the variables will be 2 bytes in size. The compiler will count the number of bytes required for all of the local variables. In this case, you have four of them and since they require 2 bytes each, the compiler "computes" that 8 bytes are required for all of this. (Also note that even if you include variable definitions inside of additional code blocks within the above C function, the C compiler will count ALL OF THEM AT ONCE. It doesn't just count those defined at the outer level.) So the compiler might generate the following prologue in assembly:
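A sketch of that prologue in 16-bit Intel syntax (the exact mnemonics and spelling vary by assembler; this is illustrative):

```nasm
push bp        ; save the caller's frame pointer
mov  bp, sp    ; establish this call's frame pointer
sub  sp, 8     ; allocate 8 bytes for the four 2-byte locals
```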
Usually, the BP register is used as a "frame pointer" for the current context of the function. BP is also known as the pointer to the "activation frame." So the C compiler needs two instructions to save the old activation frame pointer on the stack and to then initialize it to point to the current one for this invocation of the function.
The third instruction is simple and allocates ALL of the necessary space for the 8 bytes needed for ALL of the local variables. It simply moves the stack pointer over the needed 8 bytes. BP will still point at the base of this activation frame. But the SP now is at the other side of that, so additional pushes and calls won't write over the local variables.
The C compiler will also internally assign an offset to each of the local variables, perhaps something like IVAR=0, JVAR=2, KVAR=4, and AVAR=6. Now the C compiler needs to create the BODY code. Because you set these local variable values, it will be something like this (completely un-optimized):
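A sketch of such a body in 16-bit Intel syntax. The specific values (1, 2, 4, 8) are assumptions chosen to sum to the 15 computed later, and one plausible mapping places the locals area at bp-8, so a variable with internal offset N lives at [bp-8+N]:

```nasm
mov word [bp-8], 1    ; i  (internal offset IVAR=0)
mov word [bp-6], 2    ; j  (JVAR=2)
mov word [bp-4], 4    ; k  (KVAR=4)
mov word [bp-2], 8    ; a  (AVAR=6)
mov ax, [bp-8]        ; AX = i
add ax, [bp-6]        ; AX += j
add ax, [bp-4]        ; AX += k
add ax, [bp-2]        ; AX += a, giving 15
```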
Note that in this 16-bit context, it's also common for the return value of a function to be placed into AX, if it fits there. In this case, it does. So the answer is already in the right place at this time. So now an epilogue is required:
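Following the description above, the epilogue might look like this (again 16-bit Intel syntax, illustrative):

```nasm
mov sp, bp    ; release the local variable space
pop bp        ; restore the caller's frame pointer
ret           ; return; the result is already in AX
```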
And that's it.
Now, an optimizer will do a GREAT DEAL to improve the above code in this case. It will probably completely remove ALL of the local variables, as they aren't needed at all. It can pre-compute the entire result as 15 and just do the following:
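For instance, the entire optimized function could collapse to something like:

```nasm
mov ax, 15    ; the whole computation, folded at compile time
ret
```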
No need to manage the activation frame, at all. Just return the value. (But then you wouldn't get any idea about how the local variables might be managed.)
Hope that helps a little.