Microcontroller – Global Variable Memory Allocation in STM32

flash, microcontroller, ram, rom, stm32

I am generally interested in how the compiler and linker handle global variables. Here it is explained that additional ROM is needed in case a variable is initialized and not 0.

So I am wondering why that is. Excuse the newbie question, but why do they use the word "ROM" here? Or are they referring to the flash memory, which acts like ROM?

As you see, I am having a bit of confusion. So any help would be appreciated.

Best regards

Best Answer

Memory

Memory systems come in two key varieties: volatile and non-volatile:

  • Volatile memory is treated as though it powers up in a random state, though it may power up as all zeroes or all ones, too. Volatile memory has to be writable, or else it's not very useful. When you say "volatile" you mean at least these two things: uninitialized values at power-on and writable.

  • Non-volatile memory is considered to have specific known values when it powers up; values that were earlier programmed into the device at another time. These values may include code or data or both. When you say "non-volatile" you mean just that: known, pre-initialized values at power-on.

    However, in this case, the values may or may not be writable. For example, FRAM (aka FeRAM) can be written to at full memory write speeds, just like SRAM. And it's non-volatile, as well. Core memory (magnetic memory made from toroids with special properties and used mostly in the late 1960s and early 1970s) is another example. Also, some non-volatile memory such as flash or EEPROM can be over-written and will retain the values stored there. But there usually are various conditions that limit its usefulness. And I don't know of any cases where flash or EEPROM are reasonably used in the same way that SRAM variables may be.

Note to C, C++, Java, Fortran 2003, and C# programmers lacking a sufficient hardware background:

The use of volatile above (and non-volatile) has nothing whatever to do with its use in the languages you are familiar with, except for the history of why it came to arise within C in the first place. For some short discussion of that history and a link to a very old post (circa 1990) on the ancient 'newsgroup' (NNTP) system of the earlier internet (of which I was a small part) see: Nine ways to break your systems code using volatile.

I learned C when I was coding on the Unix v6 kernel in 1978. So my life crosses over the time when the qualifier volatile came into eventual use as part of the C language standard, about 10 years after my own Unix O/S kernel period. You can read this short history of C to see its first appearance. Its semantics were added to C in order to address a need with respect to memory-mapped devices.

(Memory-mapped floating point and I/O systems go well back into the early 1970's. And they certainly were common with the Altair 8800 and IMSAI 8080, circa 1975. So they existed in very expensive as well as rather pedestrian computer systems by the mid-1970's. It took quite some time for computer languages to catch up.)

So it's a hardware term whose usage long precedes that in any computer language. Language designers eventually tumbled to some of those problems in writing code for device drivers. After a few decades of requiring assembly code to deal with it, new language semantics finally overcame conservative resistance within language design circles and arrived to address common hardware requirements. The invention of volatile as a qualifier in C follows from earlier hardware usage and was borrowed and re-purposed in C. My meaning above well predates its use by compiler languages. At least by two decades and probably much more. (I remember seeing the term in 1971. But in a context that tells me it existed many years earlier.)

There was a time before it was a twinkle in the eye of any programming language designer. It was, in fact, borrowed from hardware usage as a convenience. Not invented out of whole cloth by language specialists.

I mean it in this earlier way with respect to electronic memory systems.

Programming Toolchain

There is a dizzying array of available MCUs today. Some of them are fully pipelined and, to some degree, even superscalar. But if everything had to be covered here, it would be another book. So that's off the table.

Keeping to basic MCUs, they come in two basic flavors: von Neumann or Harvard. I may give a nod to Harvard, later. But von Neumann is easiest and gets the point across.

Modern toolchains include multiple units of compilation (compile-time). These may be in any language form and can include assembly code as well. The input at compile-time is a source file. The output at compile-time is usually an object file, which is often just a bunch of various types of records. Modified source files must be recompiled to produce their associated object files.

A linker is then used to bring these separately compiled object files together. Linkers also often accept a separate source file that directs their operations in combining those object files. This separate source file is sometimes called a linker control file. It may indicate how to combine (order and position, for example) what the linker gathers up from the various object files that it must process. The output of the linker step (link-time) is an executable file. That is usually composed mostly of the binary records that describe everything needed for a "unit of execution" in the final target MCU. But it may also include "patch records" (as with older x86 programs, for example) that help the loader when it reads up the executable and maps it into memory.
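As a concrete illustration of a linker control file, here is a GNU-ld-style fragment. The memory names, addresses, and sizes are made-up assumptions for a small MCU, not taken from any specific part:

```
/* Illustrative GNU ld linker script fragment; addresses/sizes are invented. */
MEMORY
{
  FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 128K
  SRAM  (rwx) : ORIGIN = 0x20000000, LENGTH = 20K
}

SECTIONS
{
  .text   : { *(.text*) }   > FLASH           /* CODE */
  .rodata : { *(.rodata*) } > FLASH           /* CONST */
  .data   : { *(.data*) }   > SRAM AT> FLASH  /* initialized data: load image
                                                 in flash, runs in SRAM */
  .bss    : { *(.bss*) }    > SRAM            /* uninitialized data */
}
```

The `AT> FLASH` on `.data` is the key trick: the section's initial values are stored in flash, but the section's run-time addresses are in SRAM, which is exactly why startup code has to copy them over.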

There is, sometimes, a loader. In Windows, there definitely is one. But in MCUs, the executable file is just an exact image of the non-volatile portions (literal text) of the execution unit. It may include some details, such as where to place different segments. But often it is little else and the loading process is then just called "programming a device" or "burning a device" and is part of the built-in services of the IDE being used.

Program Model for a Unit of Execution

A unit of execution is the complete in-memory specification of the program being run. This includes not only its non-volatile portions but also all of the required RAM (almost always volatile.)

Below, I've borrowed and modified an image I drew up years ago for another purpose:

[Diagram: program model for a unit of execution, with a von Neumann column on the left and a Harvard column on the right]

There are two colored columns. The left-side one is for von Neumann and the right-side one is for Harvard. For the basic von Neumann architecture, all three sections, CODE, CONST, and crt0, can be placed into a single ROM (flash.) As Harvard architectures have a separate memory system for code and data, there are two such ROMs required unless there are special instructions added in order to access the code memory system as data. The lighter blue is the same for both: SRAM/DRAM.

In the above diagram, I've used ROM loosely. In systems with core memory, for example, it's actually persistent RAM. Some decades ago, MCUs frequently used OTP (one-time programmable) memory and it was truly ROM. Today's systems usually use flash and in many cases it can be written to many, many times. In some cases, where the flash is broken up into multiple sections, writing is possible even while the program is running. (Though not well enough for many purposes.)

The key idea here is that ROM as I intend it above stands for non-volatile memory that may be, but does not need to be, writable. (It obviously does have to be readable, though.) There are no other requirements.

Likewise, the key idea here is that RAM as I intend it above stands for memory that must be writable. It may be volatile, but doesn't need to be.

This is what all of the tools, the compilers, assemblers, and linkers have as their basic concept. There are missing details. Compilers generate code and the code is placed into code segments that the linker collects together in certain ways. I've left out such abstractions as code and data segmentation (this is what the linker processes) and details about how that works. What you see above is just what finally results after all the segments have been organized and placed by the linker.

I've used the word persist where I meant non-volatile memory. These sections must power up with the right values and must be in some kind of persistent memory system(s). The sections listed as volatile can be taken as if they are SRAM or at least some kind of work-alike as with DRAM. They could also be FRAM (which is non-volatile.) But the main point is that they are fast memory and writable. (You need fast memory for stacks and heaps and variables.)

Program Model Comments

You almost never see all of the code in a program you write. This is particularly true with C and languages other than assembly code. This is because there is a start-up process required. In C compilers, this is usually hidden inside something just called "crt0." (C, run-time, code section 0.) That's the piece that makes sure your stack is set up, the heap space is properly initialized, and that any necessary initial values have been taken care of for your initialized static lifetime variables.

In some languages, such as C, even the uninitialized static lifetime variables have defined initial values (int is 0, float is 0.0, etc.) However, not all languages have that requirement. So a linker cannot assume that this is always the case. If you are mixing languages as well as assembly, then there can be uninitialized static lifetime variables that do NOT require initialization. And so there's no need to waste precious CPU cycles initializing them.

This fact is why I also included an uninitialized data section. C does not use it. But other languages do. So you cannot assume that it is not present. It may be or it may not be. It all depends. But the model above is very general and will apply to almost any "standard" program model. (Obviously, much more complex arrangements can be and have been designed. But this is the primary one to learn about.)

Section Descriptions

--CODE--

Looking back to the diagram above, there is a CODE section. On power-up, this must be valid and workable, immediately. That means it must be in non-volatile memory. (This section includes the crt0 code that is placed so that it starts up when the MCU powers up.)

On some machines, CODE may reside behind a protection barrier so that it cannot be read or written, but only executed. (The modern x86 is an example.) On other machines, it's readable but cannot be written into.

But in the strict sense, the only requirement is that it may be addressed and executed. There is no necessary requirement that it can be read or written. (That doesn't mean that a relaxed system may allow reading or writing code. Many do. It's just that the only necessary and sufficient requirement is that it can be executed.)

For MCUs today, this is usually implemented with flash memory.

--CONST--

There is also a CONST (constant data) section. This section must also be in non-volatile memory.

Strictly speaking, this includes all constant values needed by a program. Examples would be error message strings and the value of \$\pi\$. You don't ever need to write to these values, directly. (Though you may copy them somewhere and modify them.)

--crt0--

This section includes all of the necessary values used to initialize the static lifetime variables stored in volatile memory (SRAM, for example), whether or not they are writable, before the program starts executing. (That initialization is handled, for example, by the hidden crt0 code for the C language.) This section must also be in non-volatile memory.

For MCUs today, this section is usually implemented with flash memory.

For a clarifying example, in C you can write these four ways of saying similar things:

char * h1= "Hello there";               /* case 1 */
char * const h2= "Hello there";         /* case 2 */
const char h3[]= "Hello there";         /* case 3 */
char const * const h4= "Hello there";   /* case 4 */

There are distinct semantics to all four of the above cases, though.

The literal string "Hello there" must be placed into the CONST section. You will never actually write onto it. So it's fine if that string is placed directly into flash memory. (Optimizing C compilers will, of course, notice that these four literals are the same and won't store duplicate copies of the string.)

Assume the MCU has non-volatile flash and volatile SRAM for its memory.

  1. Case 1 requires crt0 to copy the string from flash to an SRAM buffer large enough to hold it and to also initialize the SRAM-located variable h1 with the address of that SRAM buffer. This is because the declaration says two things: the literal string itself is writable (you can change it if you want) and also the pointer to that literal string can also be changed (you can make h1 point somewhere else, if you want.)

    So both the pointer variable as well as all of the contents of the string it points at must be located in SRAM and not in flash. This means crt0 has to initialize both the buffer and the variable. And to do that, it needs correct values located in flash (CONST) that it can use to perform that function.

  2. Case 2 only requires crt0 to initialize the SRAM-located variable h2 with the address of the string. Since the string itself is not writable (by declaration), it can reside in flash. So there is no necessary need to allocate and initialize an SRAM buffer for the literal string. (Of course, it's not harmful to do that. It's just not required.)

    The declaration does say that the pointer to that literal string can be changed (you can make h2 point somewhere else, if you want.) So that's why h2 must be located in SRAM.

  3. In case 3, h3 isn't really a variable. It's a compile-time constant. Its value points to the literal string. Since h3 isn't a variable, it doesn't require any memory. So only the literal string exists and it can be located in flash. No SRAM required here. crt0 doesn't need to do anything in this case.

  4. Case 4 is a little interesting. Technically, this also only uses flash and has no requirement for SRAM. That's because h4 is a constant pointer and you are not allowed to modify it and also because the string it points at is also constant and you are also not allowed to modify that, either. That said, h4 does appear to say that there must be a pointer variable. So h4 probably will require room in the flash, along with the literal string.

Optimizing compilers do a lot more, though. It's possible that an optimizing C compiler will remove the storage required for h4, since the pointer is constant and cannot be changed. So there's no need to actually allocate space for the pointer. (Though you still may wish it did.) It can simply use what h4 points to whenever h4 is used, when generating code.

However, that same optimizing compiler facing this also in the same program:

foo(&h4);

Would now be forced into allocating space for h4. That's because an address was taken and, for there to actually be an address for h4, it needs an actual address -- it must exist in memory somewhere.

This is not an error and the compiler doesn't need to inform you about it. It's just a case where the compiler at first may want to optimize out h4, but then later finds out it cannot do that because of something else you wrote in your code.

Note that in the face of separate compilation units, the definition of h4 may exist in one file while the call exists in a different file. So there is no possible way the compiler can see both at the same time. This means that the linker is responsible for figuring out this particular optimization detail. This requires the compiler to generate enough information in the object files so that the linker can do its job. And the linker must have enough of the compilation job pushed onto it that it can succeed.

--INITIALIZED DATA--

This is where all of the initialized writable static lifetime variables go. Their initialized values either come from the crt0 section or else must be defined by the language. (In C, a semantic 'zero' is usually applied when the initializer is missing from the static lifetime variable definition.)

Every time the program is re-started, these variables must be re-initialized by crt0 code.

For MCUs today, this is usually implemented with SRAM memory.

--UNINITIALIZED DATA--

Some languages (not C) allow static lifetime variable definitions which truly specify no initialized value for the variable. Assembly code is a classic case for this. But it's not the only case.

For those languages, there is no need for crt0 to do anything. (And I'm still using C's crt0 as a metaphor for other languages which may call it something else.) The values will be set up by the program sometime after it starts running, so there's no need.

Since these variables are, by definition, uninitialized, it's a given that they will be written into. So they must be writable.

For MCUs today, this is usually implemented with SRAM memory.

--HEAP--

This is usually set up right at the very end of the static parts of the program (the above listed sections.) Those all have link-time known sizes and therefore are known before the program starts running.

Heap is usually allocated "upwards" in memory, towards the stack. It is always writable (necessary) and is therefore typically in SRAM.

--STACK--

This is usually set up right at the very end of the memory system's SRAM addressing. It allocates (usually) in a "downwards" direction, towards the heap. It is also always writable (necessary) and is therefore typically in SRAM.

By setting up the heap and the stack to work towards each other, a benefit is that the total memory footprint is known at the time the program starts (operating systems love this, and it's a necessity for memory-limited MCUs, too). Another is that although the required stack and heap are of unknown size before the program starts up, they at least are arranged to minimize a conflict later on. (Of course, for stand-alone instrumentation code you really do NOT want any conflict, ever.)

Summary

Hopefully that gives you enough of a picture to help you think about the work that compilers, linkers, and loaders perform for you.

Oh, and while I'm on a history jag today and just to break some younger minds here, keep also in mind that there was a time when the very idea of a hardware stack didn't even exist with computers.

For example, the Hewlett-Packard 21xx processor family [I worked on the 2114, 2116, and 21MX] didn't have the concept. Calling a subroutine on these machines caused the first word of the subroutine to be written by the address following the call instruction. The subroutine would then return to its caller by doing an indirect-JMP through that word.

It took some time -- decades again -- for the idea of hardware support for even one stack, let alone more, to gel and get implemented in newer computer systems.

Good ideas take time to develop and precipitate. Not everything was as you see it today. There were lots of poorer ideas that also worked. But sweat and tears bred innovation and eventual acceptance of new organizing approaches.