Electronic – How does avr-gcc linker know to put the `.data` section at `0x800100` rather than `0x800060`?

gcc linker linker-script

Looking at the linker script for my part (ATMEGA168PD), the data region is defined with an origin of 0x800060:

MEMORY
{
  text   (rx)   : ORIGIN = 0, LENGTH = __TEXT_REGION_LENGTH__
  data   (rw!x) : ORIGIN = 0x800060, LENGTH = __DATA_REGION_LENGTH__
  eeprom (rw!x) : ORIGIN = 0x810000, LENGTH = __EEPROM_REGION_LENGTH__
  fuse      (rw!x) : ORIGIN = 0x820000, LENGTH = __FUSE_REGION_LENGTH__
  lock      (rw!x) : ORIGIN = 0x830000, LENGTH = __LOCK_REGION_LENGTH__
  signature (rw!x) : ORIGIN = 0x840000, LENGTH = __SIGNATURE_REGION_LENGTH__
  user_signatures (rw!x) : ORIGIN = 0x850000, LENGTH = __USER_SIGNATURE_REGION_LENGTH__
}

The .data output section does not explicitly define an address, but instead only refers to the data region …

  .data          :
  {
     PROVIDE (__data_start = .) ;
    *(.data)
     *(.data*)
    *(.gnu.linkonce.d*)
    *(.rodata)  /* We need to include .rodata here if gcc is used */
     *(.rodata*) /* with -fdata-sections.  */
    *(.gnu.linkonce.r*)
    . = ALIGN(2);
     _edata = . ;
     PROVIDE (__data_end = .) ;
  }  > data AT> text

…yet the .data output section ends up at the address 0x800100 rather than 0x800060

00800100 l    d  .data  00000000 .data

This turns out to be the correct address for the start of the .data segment on this part, because 0x800060-0x8000ff is reserved for the extended I/O registers and is not general-purpose RAM.

There is no origin in the linker script that specifies the 0x800100 address, and there are no command-line parameters to the linker that would tell it this magic, part-specific number. There are no other output sections in the data region that could be pushing the .data output section higher in memory to the 0x800100 location (.bss and .noinit both follow .data and are relative to it).

I've also noticed that if I rename the .data output section to anything else, it lands at the beginning of the data region at 0x800060, as expected. Somehow the linker knows that an output section specifically named .data must start at 0x800100 rather than at the start of the region specified in the linker script.

It is as if there is a .data = 0x800100; or -Tdata=0x800100 somewhere, but I cannot find it anywhere!

My question is: How does the linker know to start the .data addresses at 0x800100 for this part?

Best Answer

I was not able to figure out where the 0x800100 origin of .data was coming from, so as a workaround I commented out the standard data region and instead defined a new region called dataX that starts at the beginning of usable RAM...

MEMORY
{
  text   (rx)   : ORIGIN = 0, LENGTH = __TEXT_REGION_LENGTH__
 /* data   (rw!x) : ORIGIN = 0x800060, LENGTH = __DATA_REGION_LENGTH__ */
  dataX   (rw!x) : ORIGIN = 0x800100, LENGTH = __DATA_REGION_LENGTH__ 
  eeprom (rw!x) : ORIGIN = 0x810000, LENGTH = __EEPROM_REGION_LENGTH__
  fuse      (rw!x) : ORIGIN = 0x820000, LENGTH = __FUSE_REGION_LENGTH__
  lock      (rw!x) : ORIGIN = 0x830000, LENGTH = __LOCK_REGION_LENGTH__
  signature (rw!x) : ORIGIN = 0x840000, LENGTH = __SIGNATURE_REGION_LENGTH__
  user_signatures (rw!x) : ORIGIN = 0x850000, LENGTH = __USER_SIGNATURE_REGION_LENGTH__
}

Then I changed all of the sections that used to refer to the data region to refer to this new region instead, and renamed the .data output section to .dataX so it would not collide with the predefined value...

  .dataX    : 
  {
     PROVIDE (__data_start = .) ;
    *(.data)
     *(.data*)
    *(.gnu.linkonce.d*)
    *(.rodata)  /* We need to include .rodata here if gcc is used */
     *(.rodata*) /* with -fdata-sections.  */
    *(.gnu.linkonce.r*)
    . = ALIGN(2);
     _edata = . ;
     PROVIDE (__data_end = .) ;

  }  > dataX AT> text


  __data_load_start = LOADADDR(.dataX);
   __data_load_end = __data_load_start + SIZEOF(.dataX);

  /* Global data not cleared after reset.  */

  .noinit    :  
  {
     PROVIDE (__noinit_start = .) ;
    *(.noinit*)
     PROVIDE (__noinit_end = .) ;
     _end = . ;
     PROVIDE (__heap_start = .) ;
  }  > dataX

I noticed that the ld documentation says...

The load address of the section is set to the next free address in the region, aligned to the section’s alignment requirements.

...so I also took out the explicit address calculations on all of these output sections, which looked like this...

.noinit ADDR(.bss) + SIZEOF (.bss) : AT (ADDR (.noinit))

...since these calculations are unnecessary (and in fact break when the order of the sections is changed). Instead now the script just lets the linker automatically assign each section to the next aligned address in the requested region.
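As an illustration of the simplified form, here is a minimal sketch of how a .bss output section might look once the address arithmetic is removed. It assumes the dataX region defined above, assumes the symbol names (__bss_start, __bss_end) match the stock avr script, and assumes .bss sits between .dataX and .noinit in the script. The load address is left to the linker's defaults, which should not matter for a section that occupies no space in the flash image.

```
  /* Sketch only: no ADDR()/SIZEOF() expression and no AT() in the header.
     The linker places .bss at the next free, aligned address in dataX,
     i.e. directly after whatever section precedes it in that region. */
  .bss    :
  {
     PROVIDE (__bss_start = .) ;
    *(.bss)
     *(.bss*)
    *(COMMON)
     PROVIDE (__bss_end = .) ;
  }  > dataX
```

Because nothing in the header names a neighbouring section, moving this block up or down in the script changes only where it lands in dataX, not whether the script links.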

Everything now works correctly: the sections in the data region always start at the real beginning of RAM, and the script does not break when the sections are changed.

That said, out of curiosity I would still love to understand where that origin of 0x800100 for .data is coming from if you know!
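One lead that fits the symptoms: the ld manual describes -Tdata=org as the same as --section-start with .data as the section name, and --section-start places the named output section at an absolute address. That would explain why only a section literally named .data moves while a renamed one falls back to the region origin. Whether such an option is actually being passed here is not confirmed; a place worth checking is the full linker command that the avr-gcc driver builds on its own (gcc's -v option prints it), since driver-added options never appear in your own command line. The sketch below only shows a rough in-script equivalent, assuming the same 0x800100 value and the stock data region.

```
  /* Sketch only: an explicit address in the output section header.
     This is roughly what --section-start=.data=0x800100 would do from
     the command line; the address overrides the region's origin. */
  .data  0x800100  :
  {
     PROVIDE (__data_start = .) ;
    *(.data)
     *(.data*)
    *(.gnu.linkonce.d*)
    *(.rodata)
     *(.rodata*)
    *(.gnu.linkonce.r*)
    . = ALIGN(2);
     _edata = . ;
     PROVIDE (__data_end = .) ;
  }  > data AT> text
```

If this is the mechanism, supplying your own -Tdata value or keeping the section renamed, as in the workaround above, would both sidestep it.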