bssdata/README


See the top level README for information on where to find the
schematic and programmers reference manual for the ARM processor
on the raspberry pi.  Also find information on how to load and run
these programs.

Based on uart02, the purpose of this example is to demonstrate what
you would need to do if you assume .bss is zeros or use .data.  And
you are asking, what does that even mean?  As stated in the top level
README I personally write code so I dont have to mess with what I am
about to show you.

Although not carved in stone, many toolchains (assembler, compiler (C),
linker) use terms .text and .data and .bss to describe where things
go in a binary created by the toolchain.  Now I have seen other names
for segments, you will just have to translate.

These are all chunks of memory if you want to think of it that way, esp
with bare metal embedded you will eventually find a system where some
of the memory range is a rom, and your program is there and some of the
memory range is ram and you want your read/write variables there, etc.

The toolchain keeps these segments of memory separate from each other
and the linker places these items in the binary depending on what the
linker is told to do through a configuration file, script, command line,
whatever mechanism.  The kind of binary output file you ask for affects
this as well.  How can there be more than one kind of binary output file?
Most of the "binary" files we run today are more than just the machine
code and data that makes up our program.  The files tend to have some
sort of header so we can detect what they are.  If you look at
a windows/microsoft .exe file the first two letters are, or at least used
to be, MZ; an elf file, popular with linux, starts with the byte 0x7F
followed by the letters ELF.
Then depending on the file format there are lots of things that might
be in the file, for example a bunch of stuff related to debugging, if
you compile for debugging or compile with debugging symbols the file
can have extra info to help the debugger find things in the code to show
you on the debugger gui to allow you to understand where you are in
the higher level source code (the binary file contains machine code).
You will see in the disassembly below of a .elf file, there are some
global names like _start and fun, etc.  These strings are in the
elf binary just in case we want to do things like disassemble.  Otherwise
without those symbols in the .elf file all we would see are some
hex numbers, no ascii names.  Depending on the binary format and how
you linked things each segment may be in separate parts of the binary
file, and the binary file would have information for the loader to
place these things at the right addresses so that the code will run
properly, or at least it puts it where you told it, right or wrong.
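
To make the header idea concrete, here is a minimal sketch in C that
guesses a file format from its first few bytes.  The function name
guess_format is made up for this illustration and is not part of this
repo, and it only knows the two signatures mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guess a file format from its leading bytes. */
const char *guess_format(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    /* ELF files start with 0x7F followed by 'E' 'L' 'F' */
    if (len >= 4 && buf[0] == 0x7F && memcmp(buf + 1, "ELF", 3) == 0)
        return "elf";
    /* DOS/Windows executables start with 'M' 'Z' */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "mz";
    /* a raw .bin memory image has no header, so there is nothing to detect */
    return "unknown";
}
```

A raw memory image falls through to "unknown", which is the point: it
carries no description of itself, only the bits and bytes.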

.text refers to the code itself, the machine code that is your program.
note that your program, the machine code, is considered read-only.

.bss is used for storage of global stuff (variables, structs, etc)
that were not initialized in the program (this will be explained).

.data is used for storage of global stuff that was initialized in the
program.

.rodata is read only data, this is global stuff that was declared to
be variables or whatever but declared to be read only (const).  Depending
on the flavor and version of toolchain or linker script you are using
.rodata might be combined in the .text segment since both are read-only
segments as far as the toolchain is concerned, bugs in your code may
say otherwise.

so we take the fun.c program in this directory.  Note the fun.c part
of this example is non-functional, dont load it, dont run it.

unsigned int fun2 ( unsigned int );
const unsigned int x=2;
unsigned int y;
unsigned int z=7;
void fun ( unsigned int a )
{
    unsigned int n;

    n=5;
    fun2(a);
    fun2(x);
    fun2(y);
    fun2(z);
    fun2(n);
}

Here is the linker script used.

MEMORY
{
    calvin : ORIGIN = 0x1000, LENGTH = 0x1000
    hobbes : ORIGIN = 0x2000, LENGTH = 0x1000
    susie : ORIGIN = 0x3000, LENGTH = 0x1000
    rosalyn : ORIGIN = 0x4000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > calvin
    .bss : { *(.bss*) } > hobbes
    .rodata : { *(.rodata*) } > susie
    .data : { *(.data*) } > rosalyn
}


When compiled, linked with the simple linker script and disassembled it
looks like this

Disassembly of section .text:

00001000 <_start>:
    1000:   eb000001    bl  100c <fun>
    1004:   eafffffe    b   1004 <_start+0x4>

00001008 <fun2>:
    1008:   e12fff1e    bx  lr

0000100c <fun>:
    100c:   e92d4008    push    {r3, lr}
    1010:   ebfffffc    bl  1008 <fun2>
    1014:   e3a00002    mov r0, #2
    1018:   ebfffffa    bl  1008 <fun2>
    101c:   e59f3020    ldr r3, [pc, #32]   ; 1044 <fun+0x38>
    1020:   e5930000    ldr r0, [r3]
    1024:   ebfffff7    bl  1008 <fun2>
    1028:   e59f3018    ldr r3, [pc, #24]   ; 1048 <fun+0x3c>
    102c:   e5930000    ldr r0, [r3]
    1030:   ebfffff4    bl  1008 <fun2>
    1034:   e3a00005    mov r0, #5
    1038:   ebfffff2    bl  1008 <fun2>
    103c:   e8bd4008    pop {r3, lr}
    1040:   e12fff1e    bx  lr
    1044:   00002000    andeq   r2, r0, r0
    1048:   00004000    andeq   r4, r0, r0

Disassembly of section .bss:

00002000 <y>:
    2000:   00000000    andeq   r0, r0, r0

Disassembly of section .rodata:

00003000 <x>:
    3000:   00000002    andeq   r0, r0, r2

Disassembly of section .data:

00004000 <z>:
    4000:   00000007    andeq   r0, r0, r7

I have all the types represented

const unsigned int x=2;
unsigned int y;
unsigned int z=7;
void fun ( unsigned int a )
{
    unsigned int n;

the variable x is declared using const, this tells the compiler that
this is a variable, it has this name, I want it initialized to some
value before my program starts, but I will only ever read from it I
will never change this variable's contents.  You will find this variable
ends up in either .rodata or .text.

The variable y, is a global variable, that has not been initialized.  We
are supposed to be able to assume that when our program starts this
variable will be initialized to zero.  This variable will be found in
the .bss segment.

The variable z is a global variable as well, but it is initialized. We
expect it to be this value when our program starts.

Variable a is a parameter, it is passed in based on the compiler rules
for that processor, etc.  typically it lives in a register or on the
stack.

Lastly variable n is a local variable, it also does not have a named
segment, but typically lives on the stack or in registers or both.  In
this case with such a simple program the optimizer completely removed
the variable from having a home: instead of loading the variable with
a constant and then passing the variable to the function, the constant
is fed right into the register used to send the parameter to a function.

    unsigned int n;
    n=5;
    ...
    fun2(n);

Was optimized to a simple mov 5 and call the function:

    1034:   e3a00005    mov r0, #5
    1038:   ebfffff2    bl  1008 <fun2>

The variable n's home if you will is embedded in the bits in the
instruction itself (note the lower bits of that instruction).
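
If you want to see those lower bits pulled apart, here is a small sketch
in C that decodes the immediate of an ARM (A32) data-processing
instruction like the mov above.  The function name arm_dp_immediate is
made up for this illustration.  The encoding is an 8 bit value rotated
right by twice a 4 bit rotate field.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the immediate operand of an A32 data-processing instruction
   (mov rd,#imm and friends).  Hypothetical helper for illustration. */
uint32_t arm_dp_immediate(uint32_t instr)
{
    assert(instr & (1u << 25));               /* I bit set: operand is an immediate */
    uint32_t imm8 = instr & 0xFFu;            /* bits 7:0 hold the 8 bit value */
    uint32_t rot = ((instr >> 8) & 0xFu) * 2u; /* bits 11:8 hold half the rotation */
    if (rot == 0)
        return imm8;                          /* avoid shifting by 32 */
    return (imm8 >> rot) | (imm8 << (32u - rot));
}
```

Feeding it 0xe3a00005 from the disassembly gives 5, the value of n.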

The simple linker script defined four separate memory regions, and then
associated those regions with the various segment definitions.  Many times
in a linker script you will see the words rom and ram and flash and eeprom
to define the memory regions.  I intentionally used non-computer like
names both in simple and in memmap, to get you over this idea that those
names have any special meaning to the linker tool.  This would be a
mistake to think the linker knows eeprom from ram and does something
for you as a result.

Because of the linker script we saw that our variables landed where we
told the toolchain to put them.


    calvin : ORIGIN = 0x1000, LENGTH = 0x1000
    ...
    .text : { *(.text*) } > calvin

results in .text starting at address 0x1000

Disassembly of section .text:

00001000 <_start>:
    1000:   eb000001    bl  100c <fun>


    hobbes : ORIGIN = 0x2000, LENGTH = 0x1000

    .bss : { *(.bss*) } > hobbes

results in .bss starting at address 0x2000

Disassembly of section .bss:

00002000 <y>:
    2000:   00000000    andeq   r0, r0, r0


and x is in .rodata in the disassembly I created above

and z is in .data.

Note that in the .text, .data and .rodata segments the binary is
filled with non-zero values.  That doesnt mean you cant have some zeros
there, the point being the z variable is shown as a 7 in the binary as we
wanted.

If you can read the assembly you will also note that even though the
compiler knows that we initialized z to a 7, the code reads y and z
from their proper memory locations and does not optimize them away
like it did with the x and n variables.
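
If you cant read the assembly yet, here is a sketch in C of the two
address calculations the disassembler is doing for you.  The function
names are made up for this illustration.  In ARM (A32) mode the pc reads
as the address of the instruction plus 8, which is why both helpers
add 8.

```c
#include <assert.h>
#include <stdint.h>

/* Address read by an "ldr rd,[pc,#imm]" instruction located at addr. */
uint32_t ldr_literal_address(uint32_t addr, uint32_t instr)
{
    assert((instr & 0x0F7F0000u) == 0x051F0000u); /* ldr with pc as the base */
    uint32_t imm12 = instr & 0xFFFu;
    uint32_t pc = addr + 8u;
    /* the U bit (bit 23) says whether the offset is added or subtracted */
    return (instr & (1u << 23)) ? pc + imm12 : pc - imm12;
}

/* Destination of a b or bl instruction located at addr. */
uint32_t branch_target(uint32_t addr, uint32_t instr)
{
    assert((instr & 0x0E000000u) == 0x0A000000u); /* b or bl */
    uint32_t imm24 = instr & 0x00FFFFFFu;
    /* sign extend the 24 bit word offset to 32 bits */
    uint32_t offset = (imm24 ^ 0x00800000u) - 0x00800000u;
    return addr + 8u + (offset << 2);
}
```

For the ldr at 0x101c (0xe59f3020) the first helper gives 0x1044, the
word in .text that holds the address of y.  For the bl at 0x1010
(0xebfffffc) the second gives 0x1008, which is fun2.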

So it appears that the .elf file we created has all the parts defined
to be in all the right places.  For this to work though there needs
to be a program that reads the .elf file and places these items in
memory at the right places, before allowing the program to run.   This
comes in many forms but can be called a loader.  When running a .exe
file in windows or a .elf file or other file format in linux, there is
a loader in the operating system that reads this extra info in the
binary file and places the bits and bytes in the right place in ram.

We dont have a loader, we are running bare metal embedded here, we have
to do these things ourselves.  How this is solved on the raspberry pi
for example is that you use the toolchain to convert your program
from a .elf to a .bin file.  A .bin file is for the most part or commonly
assumed to be just the bits and bytes of your program, a literal image
of your memory.  Now to clarify, the kernel.img file for the raspberry
pi may not represent memory starting at ARM's address zero, in fact these
programs are compiled as .bin files to be loaded at address 0x8000, and
the gpu that boots the raspberry pi does that unless told otherwise
using a script file that it looks for.  if you have dabbled in these
things before you may have found this can be dangerous.  For example
what if you defined one segment to be at address 0x10000000 and another
at address 0x70000000.  Lets say you have 0x100 bytes at 0x10000000
and only two bytes at 0x70000000, if you want to make a single file
that holds the memory image of these two segments that file will need
to be 0x70000002 - 0x10000000 = 0x60000002 bytes in size, that is a huge
file.  all to hold 0x102 bytes.  Maybe you can see why most of the time
our operating systems, etc dont actually use memory images of the
programs but these hybrid files which are part machine code, part raw
data and part descriptions of where things are to go.
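
The arithmetic above generalizes to any number of segments.  Here is a
minimal sketch in C, with a made up struct and function name, that
computes how big a single flat memory image has to be to cover them all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one segment: where it starts, how big it is. */
struct segment { uint32_t start; uint32_t size; };

/* Bytes needed for one flat image that spans every segment,
   gaps between segments included. */
uint64_t flat_image_size(const struct segment *seg, size_t count)
{
    assert(seg != NULL && count > 0);
    uint64_t lo = seg[0].start;
    uint64_t hi = (uint64_t)seg[0].start + seg[0].size;
    for (size_t i = 1; i < count; i++) {
        uint64_t end = (uint64_t)seg[i].start + seg[i].size;
        if (seg[i].start < lo) lo = seg[i].start;
        if (end > hi) hi = end;
    }
    return hi - lo;
}
```

With 0x100 bytes at 0x10000000 and 2 bytes at 0x70000000 this returns
0x60000002, all to hold 0x102 bytes of real content.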

Imagine the typical bare metal embedded situation.  your processor
powers up and boots off of a rom of some flavor (rom, prom, eprom, eeprom,
flash) something that is non-volatile and as a result read only or at
least for practical purposes of booting your processor that memory
space is read only.  The memory in your bare metal system comes up filled
with random garbage because that is what the transistors that store that
ram do, they have not been initialized, and there is no rule that says
they have to be or have to be initialized to a specific value.  Many
systems use dram, which often has to be initialized in some form or
fashion and as a result you might end up filling that memory with some
value, or leave it with the last value you used during initialization.
So we have our rom and that really needs to have .text in it, we have
some ram that will no doubt be the home for .bss and .data.  But we have
a problem.  how is the ram at the .bss and .data addresses going to
get loaded with the zeros or non-zero values we are expecting?  We dont
have an operating system here.  The answer is a bit complicated, at some
point before we start using any of the .bss or .data variables in our
program (which we as C programmers assumed would be zero or whatever
value we initialized them to) we need to prepare that memory to meet
those expectations.  And we need to do it in a way that doesnt require
any .bss or .data variables.  even worse, the non-zero items in .data
need to be saved somewhere in the non-volatile memory so that we dont
lose that information, we need to somehow get that data saved in the
rom and get it copied to ram in the right place.

The typical solution is to have the bootstrap or startup code do this,
this isnt necessarily the boot code for the processor.  Even when running
a program on an operating system, the solution may be to have .bss
initialized by the first bit of asm in your program before main()
is called.  The toolchain often supplies this startup code, if you dont
tell it not to the toolchain will use its default linker script and
default startup code (which as we complicate things have an intimate
relationship).  Assume we dont want to compile everything, count up how
many bytes are in each segment, hardcode those numbers by hand into
some asm, re-build, make sure the sizes and offsets have not changed,
repeat until they dont, and end up with startup code that is custom to
this program.  Add or remove a variable somewhere or re-arrange
them in the code and you would have to then re-touch your startup code.
Possible but not wise, the better answer is generic startup code.  But
to have generic startup code we need to know where all these segments
are and what size they are etc.  The gnu solution has two parts.  First
you use the linker script language to define some variables, these
variables are filled in by the linker and will ultimately contain the
starting address for a segment like .bss or .data and the size or
ending address or both.  Second, the startup code uses those variables
to zero .bss and copy .data into place.  In the case of .data we also
need to tell the linker script two things: here is the non-volatile memory
space we want the .data to live in when the power is off, and here
is the ram address space where we want it to live when we are running.
Our variables are read-write, they just happen to be initialized to
some number on start, then we can change them in our program later.
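
Here is a host-side sketch in C of the two loops that generic startup
code performs.  The names zero_words and copy_words are made up for this
illustration.  On real hardware the pointers would come from the linker
script variables and the loops are usually written in asm, here plain
arrays can stand in for rom and ram.

```c
#include <assert.h>
#include <stdint.h>

/* Zero every word in [start, end): what startup code does for .bss. */
void zero_words(uint32_t *start, const uint32_t *end)
{
    assert(start <= end);
    while (start < end)
        *start++ = 0;
}

/* Copy the saved initial values (src, in rom) to where the program
   expects them at run time ([dst, dst_end), in ram): the .data copy. */
void copy_words(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    assert(dst <= dst_end);
    while (dst < dst_end)
        *dst++ = *src++;
}
```

Neither loop touches a global variable of its own, which matters because
they run before .bss and .data are usable.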

So if you look at the real example in this directory and the memmap
file


MEMORY
{
    bob : ORIGIN = 0x8000, LENGTH = 0x1000
    ted : ORIGIN = 0xA000, LENGTH = 0x1000
}

SECTIONS
{
   .text : { *(.text*) } > bob
   __data_rom_start__ = .;
   .data : {
    __data_start__ = .;
    *(.data*)
   } > ted AT > bob
   __data_end__ = .;
   __data_size__ = __data_end__ - __data_start__;
   .bss  : {
   __bss_start__ = .;
   *(.bss*)
   } > ted
   __bss_end__ = .;
   __bss_size__ = __bss_end__ - __bss_start__;
}

You see there is more stuff in the SECTIONS section of the linker script.
I dont really want to explain all of it, it is fairly straightforward,
including the ted AT bob thing: we are pretending here that bob is rom
or flash (where .text lives) and ted is ram, sram, dram, whatever, so
.data runs in ted but its initial values are stored in bob.  You
have to be super careful to place these variables inside or outside
of the right brackets to have it all work, this takes practice and some
iterations to get right.  I may not have it right, but the above works
today.  As mentioned way above, I intentionally did not use memory segment
names like rom and ram to demonstrate that the linker script sees those
as ascii labels and for the most part doesnt care what you call them.

Now this example is strange because I wanted to try to show the problems
you will face with a single program, not having to have you load more
than one program, etc.  Notice how above I carefully stated that you need
to initialize .bss and .data at some point before you use them, without
using them to get to that point.  Most of the time you are going to see
some sort of assembly solution in the assembly code that is used before
your main() C function is called.  This assembly code works in harmony with
the linker script variables.  The __bss_start__ and such variables are
addresses as far as the toolchain is concerned, not values.  When developing
I tried to have the program display the value of __bss_start__ by
declaring it an external global variable.  What the compiler did was take
the address __bss_start__ read that memory location and print that
value.  So in vectors.s I made some other global variables and then
initialized them to the other variables.  These are in the .text section
so they are filled in for us and .bss and .data are not required to
find these values and use them to prepare .bss and .data.  Where my
weird solution comes in is that I dont have asm code that zeros .bss
and copies .data from point a to point b.  I do this in the C code late
in my program.  As mentioned the reason why is I want to show you that
when you display these global variables before preparing memory they
well, as you now expect, have the wrong value.  Then once we copy
and zero things they then have the right values.
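
The address versus value mixup is easy to reproduce on any machine.
Here is a sketch in C where an ordinary variable named fake_bss_start
(a made up stand-in, a real linker symbol needs the linker script)
plays the role of __bss_start__.  With the real thing you would declare
extern unsigned int __bss_start__; and the number you want is
&__bss_start__, not __bss_start__.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a linker-defined symbol; the contents are arbitrary. */
unsigned int fake_bss_start = 0x11111111u;

/* What I did at first: this reads the memory AT the symbol. */
unsigned int symbol_contents(void)
{
    return fake_bss_start;
}

/* What I actually wanted: the address the symbol stands for. */
uintptr_t symbol_address(void)
{
    uintptr_t addr = (uintptr_t)&fake_bss_start;
    assert(addr % sizeof(unsigned int) == 0); /* word aligned on typical targets */
    return addr;
}
```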

//display before initialized
    hexstring(x);
    hexstring(y);
    hexstring(z);
//zero out .bss
    for(ra=bss_start;ra<bss_end;ra+=4) PUT32(ra,0);
//copy .data from non-volatile .text to its home where the code expects it
//to be.
    for(ra=data_start,rb=data_rom_start;ra<data_end;ra+=4,rb+=4) PUT32(ra,GET32(rb));
//display the variables again now that ram is prepped.
    hexstring(x);
    hexstring(y);
    hexstring(z);

I used my serial bootloader, xmodemed the program over and ran it.
The last part, the interesting part, of the output is:

12345678
0000A008
000082EC
0000A000
00000000
00000000
00000000
00000000
00000002
00000007

here again with comments

12345678
0000A008 this is basically __bss_start__
000082EC __data_rom_start__
0000A000 __data_start__
00000000 display of the x variable before memory prep
00000000 display of the y variable before memory prep
00000000 display of the z variable before memory prep
00000000 display of x after memory prep
00000002 display of y after memory prep
00000007 display of z after memory prep

In this case apparently memory was zeroed by someone, so the .bss
data actually looks right even though that was just dumb luck.  you could
easily modify my bootloader (or I should have) to make that memory
random or non-zero further demonstrating the problem.
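
A sketch of what such a modification could look like, in C.  The name
poison_words and the choice of pattern are assumptions for this
illustration, my bootloader does not contain this code.  It fills a range
with values that are never zero for small ranges and differ word to word,
so reading an unprepared variable shows obvious garbage.

```c
#include <assert.h>
#include <stdint.h>

/* Fill [start, end) with a recognizable, changing pattern so that
   unprepared .bss or .data reads back as garbage instead of zero. */
void poison_words(uint32_t *start, const uint32_t *end, uint32_t seed)
{
    assert(start <= end);
    uint32_t index = 0;
    while (start < end) {
        *start++ = seed ^ index;  /* seed xor word index */
        index++;
    }
}
```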


So after all of that, I repeat, I dont do this with my code.  Why dont
I do this?  First and foremost, these days I try to write portable code.
This code is not portable if you do this.  You have to start messing with
linker scripts that are gnu toolchain specific and, even worse, sometimes
binutils version specific.  Then your startup code that comes before the
first call to a C function relies on gnu linker and linker version specific
linker script variables.  The linker script goes from pretty to very
ugly very fast, and warrants extra explaining as to what it is doing.
it is just not portable, and it is ugly.  (remember beauty is in the
eye of the beholder, you may find all of my code ugly, but then you
probably wouldnt be reading this far down into this file if that were
the case).  Instead of this


unsigned int fun2 ( unsigned int );
const unsigned int x=2;
unsigned int y;
unsigned int z=7;
void fun ( unsigned int a )
{
    unsigned int n;

    n=5;
    fun2(a);
    fun2(x);
    fun2(y);
    fun2(z);
    fun2(n);
}

write your code like this:

unsigned int fun2 ( unsigned int );
const unsigned int x=2;
unsigned int y;
unsigned int z;
void fun ( unsigned int a )
{
    unsigned int n;

    n=5;
    y=0;
    z=7;
    fun2(a);
    fun2(x);
    fun2(y);
    fun2(z);
    fun2(n);
}

and guess what, you dont have a .data segment anymore, you can remove
that from the linker script and all the baggage that goes with it.  Now
you do need .bss but you dont need to zero it out, you just need to
have it accurately defined in the linker script to an address range that
is actually ram.  As for .rodata, if your toolchain needs it (the example
was demonstrating things), I simply make .rodata part of the
same space as .text.  So after changing those few lines of C code I would
then go from this


MEMORY
{
    bob : ORIGIN = 0x8000, LENGTH = 0x1000
    ted : ORIGIN = 0xA000, LENGTH = 0x1000
}

SECTIONS
{
   .text : { *(.text*) } > bob
   .rodata : { *(.rodata*) } > bob
   __data_rom_start__ = .;
   .data : {
    __data_start__ = .;
    *(.data*)
   } > ted AT > bob
   __data_end__ = .;
   __data_size__ = __data_end__ - __data_start__;
   .bss  : {
   __bss_start__ = .;
   *(.bss*)
   } > ted
   __bss_end__ = .;
   __bss_size__ = __bss_end__ - __bss_start__;
}


to this

MEMORY
{
    bob : ORIGIN = 0x8000, LENGTH = 0x1000
    ted : ORIGIN = 0xA000, LENGTH = 0x1000
}

SECTIONS
{
   .text : { *(.text*) } > bob
   .rodata : { *(.rodata*) } > bob
   .bss : { *(.bss*) } > ted
}

and painfully simple startup code

    mov sp,#0x8000
    mov r0,pc
    bl notmain

yes there is a cost.  Some of those initializations that are not in
.text can take up more room than they used to.  Worst case for these
32 bit or smaller variables is you have one instruction that gets the
value from .text, one instruction that gets the address for it in ram,
an instruction that writes the value to ram.  Plus a location in .text
to hold the address in ram for that variable and a location to hold
the constant we want to write to it, kind of like this

    1010:   e59f503c    ldr r5, [pc, #60]   ; 1054 <fun+0x48>
    1014:   e59f403c    ldr r4, [pc, #60]   ; 1058 <fun+0x4c>
    1018:   e3a03000    mov r3, #0
    101c:   e5853000    str r3, [r5]
    1020:   e3a03007    mov r3, #7
    1024:   e5843000    str r3, [r4]

    1054:   00002004    andeq   r2, r0, r4
    1058:   00002000    andeq   r2, r0, r0

because this example used small variables the mov r3,#0 for example was
capable of holding the constant in the instruction encoding itself.
Same for the #7 but had it been some other number say z = 0x1234;

    1010:   e59f503c    ldr r5, [pc, #60]   ; 1054 <fun+0x48>
    1014:   e3a03000    mov r3, #0
    1018:   e59f4038    ldr r4, [pc, #56]   ; 1058 <fun+0x4c>
    101c:   e5853000    str r3, [r5]
    1020:   e59f3034    ldr r3, [pc, #52]   ; 105c <fun+0x50>
    1024:   e5843000    str r3, [r4]

    1054:   00002004    andeq   r2, r0, r4
    1058:   00002000    andeq   r2, r0, r0
    105c:   00001234    andeq   r1, r0, r4, lsr r2

That is how it works for this particular processor family, other
processors like x86 manage constants differently...
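
Whether a constant fits in the instruction or needs its own word in
.text comes down to the ARM (A32) immediate encoding: an 8 bit value
rotated right by an even amount.  Here is a sketch in C that tries every
rotation.  The function name arm_encode_imm is made up for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Returns the 12 bit operand field (rotate:imm8) that encodes value,
   or -1 if no 8 bit value and even rotation can produce it. */
int arm_encode_imm(uint32_t value)
{
    for (unsigned rot = 0; rot < 16; rot++) {
        /* rotating left by 2*rot undoes the rotate right the cpu applies */
        uint32_t imm8 = ror32(value, 32u - 2u * rot);
        if (imm8 <= 0xFFu) {
            assert(ror32(imm8, 2u * rot) == value); /* round trip check */
            return (int)((rot << 8) | imm8);
        }
    }
    return -1;
}
```

arm_encode_imm(7) returns 0x007, matching the low bits of e3a03007, while
arm_encode_imm(0x1234) returns -1, which is why the compiler had to add
the extra word at 105c and load it with an ldr.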

Now the two locations in .text for example

    1054:   00002004    andeq   r2, r0, r4
    1058:   00002000    andeq   r2, r0, r0

Are not additional costs because those would have been used by the code
that reads the variables as well (I have .bss and .data separate here)


    101c:   e59f3020    ldr r3, [pc, #32]   ; 1044 <fun+0x38>
    1020:   e5930000    ldr r0, [r3]
    1024:   ebfffff7    bl  1008 <fun2>

    1028:   e59f3018    ldr r3, [pc, #24]   ; 1048 <fun+0x3c>
    102c:   e5930000    ldr r0, [r3]
    1030:   ebfffff4    bl  1008 <fun2>


    1044:   00002000    andeq   r2, r0, r0
    1048:   00004000    andeq   r4, r0, r0

The point here is that the address to each of these variables still took
up the same amount of .text space.  What we didnt have when we used
a .data and assumed .bss was zeroed for us, is the code to initialize
each variable one at a time.  Instead there would have been a small loop
for .bss and a small loop for .data, so if .bss and/or .data were of any
decent size then there is a lot less waste.

Another thing that may be gnawing at you is that this whole thing is
about global variables.  Raise your hand if you use global variables.
Many folks go out of their way not to.  I happen to use them from time
to time, used to always and only use them.  But now it is a bit of
a mixture.  Local variables you have to initialize inline one at a time
and that is as costly as the solution I am proposing, so you are already
likely programming using that one at a time solution.  So you are already
in tune with my solution to this .bss and .data problem.

The most important thing though is that when you use local variables,
do those initializations locally, and manage the size of your functions,
the optimizer (if you use it) will remove a lot of this extra code and
memory.

for example:

unsigned int fun2 ( unsigned int );
const unsigned int x=2;
unsigned int y;
unsigned int z=7;
void fun ( unsigned int a )
{
    unsigned int n;

    n=5;
    fun2(a);
    fun2(x);
    fun2(y);
    fun2(z);
    fun2(n);
}

the variable x is a read-only variable.  variable n is local and only
used to feed the fun2() function.

    1014:   e3a00002    mov r0, #2
    1018:   ebfffffa    bl  1008 <fun2>

    1034:   e3a00005    mov r0, #5
    1038:   ebfffff2    bl  1008 <fun2>

The compiler did not waste the .text space and clock cycles to fetch
x from rom, it simply encoded it inline.  Likewise the local variable
n did not consume stack space, there was no stack frame created at all
in fact, the value was encoded directly in the instruction as well.
When you use globals you can see that it has to get the address then
read the contents of that address then it can do something with your
variable.  If you change the variable it can go through those steps
to save the variable.


This whole example and lengthy README is here to hopefully help you
to realize when you take one of my examples:

unsigned int fun2 ( unsigned int );
const unsigned int x=2;
unsigned int y;
unsigned int z;
void fun ( unsigned int a )
{
    unsigned int n;

    y=0;
    z=2;
    n=5;
    fun2(a);
    fun2(x);
    fun2(y);
    fun2(z);
    fun2(n);
}

And start adding things or changing things:


unsigned int fun2 ( unsigned int );
const unsigned int x=2;
unsigned int y;
unsigned int z;
unsigned int m=12;
void fun ( unsigned int a )
{
    unsigned int n;

    y=0;
    z=2;
    n=5;
    fun2(a);
    fun2(x);
    fun2(y);
    fun2(z);
    fun2(n);
    fun2(m);
}

And then spend a sleepless night or weekend struggling to understand
why m is not 12 when used in the code...Well now you know.  And now
you know why I dont do it (not all the reasons but some), you are
welcome to do your own thing.  And now you know what my statement in
the top level readme is all about.


UPDATE:

Since the ARM runs completely out of ram on the raspberry pi and usually
there is no reason to split the different segments around we can pack
them all up, here is a simple solution for this platform.

bootstrap.s

.globl _start
_start:
    mov sp,#0x00010000
    bl notmain
hang: b hang

notmain.c

const unsigned int readonly=7;
unsigned int dotdata=9;
unsigned int dotbss[16];
void notmain ( void )
{
    dotbss[3]+=readonly;
}

lscript

MEMORY
{
    ram : ORIGIN = 0x8000, LENGTH = 0x18000
}

SECTIONS
{
    .text : { *(.text*) } > ram
    .bss : { *(.bss*) } > ram
    .rodata : { *(.rodata*) } > ram
    .data : { *(.data*) } > ram
}

> arm-none-eabi-as bootstrap.s -o bootstrap.o
> arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
> arm-none-eabi-ld -T lscript bootstrap.o notmain.o -o hello.elf
> arm-none-eabi-objdump -D hello.elf

hello.elf:     file format elf32-littlearm


Disassembly of section .text:

00008000 <_start>:
    8000:   e3a0d801    mov sp, #65536  ; 0x10000
    8004:   eb000000    bl  800c <notmain>

00008008 <hang>:
    8008:   eafffffe    b   8008 <hang>

0000800c <notmain>:
    800c:   e59f300c    ldr r3, [pc, #12]   ; 8020 <notmain+0x14>
    8010:   e593200c    ldr r2, [r3, #12]
    8014:   e2822007    add r2, r2, #7
    8018:   e583200c    str r2, [r3, #12]
    801c:   e12fff1e    bx  lr
    8020:   00008024    andeq   r8, r0, r4, lsr #32

Disassembly of section .bss:

00008024 <dotbss>:
    ...

Disassembly of section .rodata:

00008064 <readonly>:
    8064:   00000007    andeq   r0, r0, r7

Disassembly of section .data:

00008068 <dotdata>:
    8068:   00000009    andeq   r0, r0, r9


> arm-none-eabi-objcopy hello.elf -O binary kernel.img
> ls -al kernel.img
-rwxr-xr-x 1 root root 108 Sep 23 20:47 kernel.img
> hexdump -C kernel.img
00000000  01 d8 a0 e3 00 00 00 eb  fe ff ff ea 0c 30 9f e5  |.............0..|
00000010  0c 20 93 e5 07 20 82 e2  0c 20 83 e5 1e ff 2f e1  |. ... ... ..../.|
00000020  24 80 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |$...............|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000060  00 00 00 00 07 00 00 00  09 00 00 00              |............|
0000006c


This:

MEMORY
{
    ram : ORIGIN = 0x8000, LENGTH = 0x18000
}

SECTIONS
{
    .text : { *(.text*) } > ram
    .bss : { *(.bss*) } > ram
    .rodata : { *(.rodata*) } > ram
    .data : { *(.data*) } > ram
}

Is not nearly as ugly as this:

SECTIONS
{
   .text : { *(.text*) } > bob
   __data_rom_start__ = .;
   .data : {
    __data_start__ = .;
    *(.data*)
   } > ted AT > bob
   __data_end__ = .;
   __data_size__ = __data_end__ - __data_start__;
   .bss  : {
   __bss_start__ = .;
   *(.bss*)
   } > ted
   __bss_end__ = .;
   __bss_size__ = __bss_end__ - __bss_start__;
}

Both are using compiler/linker tricks to reach a goal.  The less
ugly one gives you everything you want, you get your .bss already
zeroed, you get .data where you can use it.   With that simpler
linker script "all you have to do is" make sure that you have at least
one .data item or .rodata item so that objcopy is forced to place them
after .bss in the image and forced to pad .bss with zeros in the image
in order to place .data and/or .rodata in the right place.
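
You can check this against the hexdump above without a Raspberry Pi.
Here is a sketch in C that holds the 108 bytes of kernel.img in an array
and reads back the word that will sit at a given ARM address once the
image is loaded at 0x8000.  The names LOAD_ADDRESS and word_at are made
up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOAD_ADDRESS 0x8000u  /* ORIGIN of ram in lscript */

/* The 108 bytes from the hexdump; bytes not listed are zero. */
static const uint8_t kernel_img[108] = {
    0x01,0xd8,0xa0,0xe3, 0x00,0x00,0x00,0xeb, 0xfe,0xff,0xff,0xea, 0x0c,0x30,0x9f,0xe5,
    0x0c,0x20,0x93,0xe5, 0x07,0x20,0x82,0xe2, 0x0c,0x20,0x83,0xe5, 0x1e,0xff,0x2f,0xe1,
    0x24,0x80,0x00,0x00,
    [0x64] = 0x07,  /* readonly, address 0x8064 */
    [0x68] = 0x09,  /* dotdata,  address 0x8068 */
};

/* Little-endian word found at ARM address addr after loading the image. */
uint32_t word_at(uint32_t addr)
{
    assert(addr >= LOAD_ADDRESS && addr - LOAD_ADDRESS + 4u <= sizeof kernel_img);
    size_t off = addr - LOAD_ADDRESS;
    return (uint32_t)kernel_img[off]
         | ((uint32_t)kernel_img[off + 1] << 8)
         | ((uint32_t)kernel_img[off + 2] << 16)
         | ((uint32_t)kernel_img[off + 3] << 24);
}
```

The 64 bytes from 0x8024 to 0x8063 are the zero padding that stands in
for dotbss, followed by the 7 and the 9.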

You can use this on the Raspberry Pi and it will work just fine.  On other
embedded platforms where you have non-volatile memory (rom/flash) for
booting the code and a separate place for ram and you want to keep your
code in rom and data in ram, you have to use the more ugly solutions or
do as I do and simply not have .data and not care if .bss is zeroed.