Blog on embedded programming

The trip to the dark side

Assembly Hello World Code for Stm32f4discovery

Hello everyone! First of all: why use assembly language? In my opinion, to know something well you should remove every intermediary you can and examine the target directly. When programming for a PC it’s common not to think about which processor is going to execute the code, or about how many bytes are generated for execution. I’m just talking about the corresponding level of abstraction: the tools should match the task.

The tasks for this post are:

  • to understand the stm32f4 memory layout
  • to know what goes on after the microcontroller (or MC) is powered up
  • to get familiar with the stm32f4 IO port by writing a basic blinking program

Knowing how to work with an IO port already gives you a real possibility to create a variety of MC applications. So, let’s begin.

Memory layout and linker script

After the code is compiled and assembled, the linking stage begins. When the code is split into several source files, assembling produces several so-called object files. These are the input files for the linker, and its output is a single object file or an executable.

All input files have, for example, code and data segments. The output file should then have both of them too, each holding the combined contents of the corresponding input segments: all input data segments are concatenated into one output data segment, and so on. That’s one of the linker’s tasks. Another one is to decide where to place each segment within the executable and to manage the segments’ addresses. In other words, the linker’s job is a complicated one and there’s no need to dig deeper right now.

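The stm32f4 memory map can be found in its datasheet, which is available on the ST site (direct link, page 69). It describes a single address space that serves as an abstraction over all the memories and peripherals inside the MC. Note that this is not virtual memory in the PC sense: there is no address translation, and every address refers directly to a piece of hardware. The core uses this address space to deal with everything around it in one uniform manner.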

From the memory map we can see where the flash memory and the SRAM are located. Their address boundaries are exactly what we need now. We have to give the linker a way to manage addresses within our executable and to place segments in the right places. The way to do that is to create a linker script. The linker has a default one, which can be viewed like this:

$ arm-none-eabi-ld --verbose

It’s huge ) And it won’t help us. We need a new one for our MC, and it will be much simpler. The ld documentation on scripts is VERY useful reading for understanding object files and linker scripting, so please refer to it and let me be less verbose ).

Here is the script:

(basic.ld) download
INCLUDE stm32f407.memory

ENTRY(_reset)

SECTIONS {
  .int_vector_table : {
    *(.int_vector_table)
  } > REGION_INT_VECTORS
  .text : {
    *(.text)
  } > REGION_TEXT
}

and here is the stm32f4 specific part included at the top:

(stm32f407.memory) download
MEMORY {
  FLASH (rx) : ORIGIN = 0x8000000,  LENGTH = 1024K
  RAM (! rx) : ORIGIN = 0x20000000, LENGTH = 128K  /* main SRAM only; the other 64K of the 192K total is CCM RAM at 0x10000000 */
}

_estack = 0x20020000;
REGION_ALIAS("REGION_TEXT", FLASH);
REGION_ALIAS("REGION_INT_VECTORS", FLASH);

In the stm32f407.memory file two regions of the MC’s memory map are described (one note here: the space character in RAM’s attributes “! rx” is necessary because of some bug. Without the space this file cannot be included via the INCLUDE command and has to be pasted directly into basic.ld; then linking goes fine). This file is included in the basic.ld script. In the script itself we just describe the regions and the order in which input sections should be placed in the output object file. You may also just download these files.

There’s one more section that is generated automatically. It’s called .ARM.attributes. I didn’t mention it in the script as I’m not sure what it is for at the moment, and we don’t need it in the current code. The linker automatically places unmapped sections right after all mapped sections in the output file.

From this moment we can write assembly code using only the .text section and the linker will correctly form an executable. But.. there’s one more section, called .int_vector_table, which is placed at the beginning of the executable…

The MC power on

Actually there’s no need right now for the ENTRY command in our linker script, which sets the entry point. After the system is powered on, the MC core reads the first word of flash memory. That’s the stack top address, which is loaded into the sp register. So the core needs no entry point from us: it always starts from the beginning of the flash memory. The second word in the flash is assumed to be the reset handler address, and the second step is to jump to that address. The rest of the execution depends on the loaded code or on hardware interrupts, which are not covered in this post. Reset is also an interrupt, and when it occurs (for example, when we push the reset button) the core just reads the reset handler address (or “vector”) again and jumps to it. The rest of the interrupts act the same way. The locations where the interrupt handlers’ addresses are stored are fixed (the interrupt vector table). That’s how the core is able to jump to the right place on a particular interrupt event.
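
As a sketch of how the fixed layout described above continues past the first two words, the table could be extended with entries for the next two Cortex-M exceptions, NMI and HardFault. The fragment below is meant as a possible replacement for the .int_vector_table section at the end of the main.S shown later in this post, and it relies on the directives at the top of that file. It reuses _estack and _reset from the post; _default_handler is a hypothetical name introduced only for this illustration. The blinking program itself needs only the first two words.

```asm
.section    .text
    .type   _default_handler, %function
_default_handler:                       /* hypothetical catch-all handler, not in the original code */
    b       _default_handler            /* loop forever so a debugger can find the core here */
    .size   _default_handler, . - _default_handler

.section    .int_vector_table, "a", %progbits
    .type   basic_vectors, %object
basic_vectors:
    .word   _estack                     /* offset 0x00: initial stack pointer value */
    .word   _reset                      /* offset 0x04: reset handler */
    .word   _default_handler            /* offset 0x08: NMI handler */
    .word   _default_handler            /* offset 0x0C: HardFault handler */
    .size   basic_vectors, . - basic_vectors
```

Each entry is just a 32-bit address, and its position in the table, not its name, decides which event it serves.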

The blinking program

Here is the code:

(main.S) download
    .syntax unified
    .cpu cortex-m4
    .fpu softvfp
    .thumb

#define RCC_BASE        0x40023800
#define RCC_AHB1ENR     RCC_BASE + 0x30 
#define GPIOD_BASE      0x40020C00
#define GPIOD_MODER     GPIOD_BASE
#define GPIOD_ODR       GPIOD_BASE + 0x14

#define GPIODEN         1 << 3
#define MODER15_OUT     1 << 30
#define MODER14_OUT     1 << 28
#define MODER13_OUT     1 << 26
#define MODER12_OUT     1 << 24
#define LED_BLUE        1 << 15
#define LED_RED         1 << 14
#define LED_ORANGE      1 << 13
#define LED_GREEN       1 << 12
#define DELAY           0x000F

.section    .text
  .weak     _reset              /* that's because of declaring it as an entry point */
  .type     _reset, %function
_reset:
    ldr     r0, =RCC_AHB1ENR    /* firstly, to use any peripheral, the clock should be */
    ldr     r1, [r0]            /* enabled for it. Different registers are responsible for that */
                                /* (see reference manual's "Reset and clock control" section on page 210). */
                                /* For use of GPIOD its clock is enabled in RCC_AHB1ENR register */
    orr     r1, GPIODEN         /* by setting GPIODEN bit */
    str     r1, [r0]
    ldr     r0, =GPIOD_MODER    /* port mode configuration register (input, output, etc) */
    ldr     r1, =(MODER15_OUT | MODER14_OUT | MODER13_OUT | MODER12_OUT)
    str     r1, [r0]            /* configuring GPIOD pins 12-15 as outputs where LEDs are connected to */
    mov     r1, 0              /* clearing for further LED selection    */
    ldr     r2, =GPIOD_ODR      /* GPIOD output data register  */
                                /* writing to it controls pins' voltage */
.Lblink:
    movw    r1, LED_GREEN      
    str     r1, [r2]            /* setting LED_GREEN bit on GPIOD_ODR */
    bl      .Ldelay             /* pause */
    movw    r1, LED_BLUE
    str     r1, [r2]            /* setting LED_BLUE bit on GPIOD_ODR  */
    bl      .Ldelay             /* pause */
    movw    r1, LED_RED
    str     r1, [r2]            /* etc  */
    bl      .Ldelay
    movw    r1, LED_ORANGE
    str     r1, [r2]
    bl      .Ldelay
    b       .Lblink

.Ldelay:
    movw    r0, 0               /* clear r0 first so the low halfword does not depend on its old value */
    movt    r0, DELAY           /* moving DELAY value into high halfword of the register  */
1:                              /* to make a big number */
    subs    r0, r0, 1           /* and just spend time subtracting */
    bne     1b
    bx      lr                  /* return */

    .size   _reset, . - _reset


.section    .int_vector_table, "a", %progbits   /* interrupt table */
                                                /* "a" - tells that section is allocatable  */
                                                /* (see ld manual) */
                                                /* %progbits - tells that section contains data */
                                                /* (see gas manual) */
    .type   basic_vectors, %object
basic_vectors:
    .word   _estack             /* stack top address (declared in stm32f407.memory) - the address just past the end of SRAM */
    .word   _reset              /* the address of a reset handler */

    .size   basic_vectors, . - basic_vectors
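
One detail of the listing above is worth a sketch. It stores a complete new value into GPIOD_MODER, which also puts pins 0-11 into input mode. That is harmless here, since the program uses nothing else on port D, but a read-modify-write sequence would leave the other pins' configuration untouched. The fragment below is a possible replacement for the two instructions in _reset that load and store the MODER value. It is only an illustration under that assumption, not what the program above does, and it writes the constants as literals instead of using the MODERxx_OUT macros.

```asm
    ldr     r0, =0x40020C00         /* GPIOD_MODER, same address as in main.S */
    ldr     r1, [r0]                /* read the current mode of all 16 pins */
    bic     r1, r1, 0xFF000000      /* clear the 2-bit mode fields of pins 12-15 */
    orr     r1, r1, 0x55000000      /* 01 in each field: general purpose output */
    str     r1, [r0]                /* write back; pins 0-11 keep their modes */
```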

I tried to write self-explanatory code with comments, so I won’t dwell on its contents. If you have any questions, feel free to send me a letter. Now let me describe the build process. This code has to be built with gcc rather than with as directly. That’s because it relies on the preprocessing stage to expand the values in the set of #define lines, and gcc runs the C preprocessor on files with the .S extension. The following commands need to be run:

$ arm-none-eabi-gcc -Wall -Wextra -o main.elf -nostdlib -mcpu=cortex-m4 -mthumb -Tbasic.ld main.S
$ arm-none-eabi-objcopy -O binary main.elf main.bin

The first one runs the preprocessing, assembling and linking stages and silently generates an ELF-format executable if there are no errors. Note some new parameters. The -nostdlib parameter tells the linker not to link the standard C library or the startup files, and -Tbasic.ld passes the linker script to ld. The script file should be present in the current directory together with the source.

The second one just converts the ELF-format executable, which carries headers and section tables in addition to the code, into a raw binary image suitable for writing into the MC’s flash.
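
To make the preprocessing stage of the first command a bit more concrete, here is a sketch of how a few lines of main.S look to the assembler once the #define names have been substituted. The values in the comments are worked out by hand from the definitions at the top of main.S, and the comments themselves are added only for reading, so treat this as an illustration rather than a dump of real tool output.

```asm
    .syntax unified
    .cpu cortex-m4
    .thumb

    ldr     r0, =0x40023800 + 0x30      /* RCC_AHB1ENR, i.e. 0x40023830 */
    orr     r1, 1 << 3                  /* GPIODEN, i.e. 0x8 */
    ldr     r1, =(1 << 30 | 1 << 28 | 1 << 26 | 1 << 24)
                                        /* the four MODERxx_OUT bits, i.e. 0x55000000 */
    movw    r1, 1 << 12                 /* LED_GREEN, i.e. 0x1000 */
    movt    r0, 0x000F                  /* DELAY goes into the high halfword of r0 */
```

Since the macros are substituted as plain text, expressions like the MODER one rely on the assembler evaluating << before |. Wrapping each definition in parentheses would make that independent of operator precedence.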

Huuuuh! The final step is to burn our binary file into the discovery board. Connect the board and run the following command:

$ st-flash write main.bin 0x8000000

Read the docs on stlink carefully. If everything is OK, the LEDs are already blinking. Here is a compiled executable - download