Bare Metal C Programming the Blue Pill - Part 2

In part 1, we came up with a minimal program and a linker script to compile a binary we could load onto the Blue Pill. We used OpenOCD to program the binary into the chip's embedded flash memory. OpenOCD can however do much more than what we did so far.

Programming Flash

To change things up a little bit, we're going to use an ST-LINK V2 adapter this time. Compared to the the Sipeed JTAG adapter used in part 1, it has less features; there is no embedded serial port and it can't do JTAG.

On the other hand, the ST-LINK provides power for our Blue Pill and uses SWD to talk to the chip - the necessary SWDIO and SWCLK pins are conveniently accessible on the end of the Blue Pill. For a smooth experience, you also want to connect the RST pin of the ST-LINK with the reset pin on the Blue Pill (next to the USB connector, marked "R" or "B2").

Since we're using a different adapter (or "interface" in OpenOCD terminology), we need a new config file. OpenOCD ships with one for the ST-LINK, so our config can be as simple as this (stlink_bluepill.cfg):

source [find interface/stlink-v2.cfg]
transport select hla_swd
source [find target/stm32f1x.cfg]

We're just using the shipped interface config, the SWD transport and the STM32F1x target (remember, the Blue Pill has an STM32F103 chip). We can again program our ELF binary from part 1:

openocd -f stlink_bluepill.cfg -c "program main.elf verify reset exit"


OCD stands for "On-Chip Debugger", so programming could be seen more as a side business than the core purpose of OpenOCD. ARM cores contain a debug port (DP) that can be access via JTAG or SWD. Through it, we can read and manipulate the CPU's state and memory.

Let's start OpenOCD again, this time without programming anything:

$ openocd -f stlink_bluepill.cfg
Open On-Chip Debugger 0.10.0
Licensed under GNU GPL v2
For bug reports, read
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
adapter speed: 1000 kHz
adapter_nsrst_delay: 100
none separate
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : Unable to match requested speed 1000 kHz, using 950 kHz
Info : clock speed 950 kHz
Info : STLINK v2 JTAG v29 API v2 SWIM v7 VID 0x0483 PID 0x3748
Info : using stlink api v2
Info : Target voltage: 3.145419
Info : stm32f1x.cpu: hardware has 6 breakpoints, 4 watchpoints

In this mode, OpenOCD will be listening on ports 3333 and 4444. Let's try telnet on port 3333:

$ telnet localhost 4444
Trying ::1...
Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger

We get a prompt, where we can for example halt the CPU. If we do this, our blinking LED should stop:

> halt
target halted due to debug-request, current mode: Thread
xPSR: 0x21000000 pc: 0x08000132 msp: 0x20004ff0

You can resume the CPU (and the blinking) with resume. To disconnect, simply use exit.


On port 3333, OpenOCD runs a GDB server. We can run GDB and connect to it with target remote :3333 like this:

$ arm-none-eabi-gdb main.elf
GNU gdb (7.10-1+9) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=arm-linux-gnueabihf --target=arm-none-eabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
Find the GDB manual and other documentation resources online at:
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from main.elf...done.
(gdb) target remote :3333
Remote debugging using :3333
wait () at main.c:22
22        for (unsigned int i = 0; i < 2000000; ++i) __asm__ volatile ("nop");

We compiled the binary with -ggdb, so GDB was able to load debug symbols and after connecting to the target can directly tell us what code the CPU was executing. Since we're waiting most of the time, it's highly likely that we see the for loop in the wait() function doing NOPs.

Let's reset the CPU and single-step through our code (you can just press enter to repeat the previous command):

(gdb) monitor reset halt
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08000140 msp: 0x20005000
(gdb) s
25      void main() {
36          *((volatile unsigned int *)0x40011010) = (1U << 13);
39          *((volatile unsigned int *)0x40011010) = (1U << 29);
27        *((volatile unsigned int *)0x40021018) |= (1 << 4);
30        *((volatile unsigned int *)0x40011004) = ((0x44444444 // The reset value
36          *((volatile unsigned int *)0x40011010) = (1U << 13);

The LED should now be turned on - but what follows is a boring 2 million NOP loops. If we want to go straight to where we turn the LED off, we can use a break point. Resetting the LED is on line 39, so:

(gdb) break main.c:39
Breakpoint 1 at 0x8000146: file main.c, line 39.
(gdb) c

Unfortunately, at least with my GCC and GDB versions, this did not work.

Interlude: Why is the breakpoint not working?

We can disassemble the code of our main function:

(gdb) disass
Dump of assembler code for function main:
0x08000140 <+0>:     movs    r1, #16
0x08000142 <+2>:     push    {r4, r5, r6, lr}
0x08000144 <+4>:     movs    r6, #128        ; 0x80
0x08000146 <+6>:     movs    r5, #128        ; 0x80
0x08000148 <+8>:     ldr     r2, [pc, #32]   ; (0x800016c <main+44>)
0x0800014a <+10>:    ldr     r3, [r2, #0]
0x0800014c <+12>:    orrs    r3, r1
0x0800014e <+14>:    str     r3, [r2, #0]
0x08000150 <+16>:    ldr     r2, [pc, #28]   ; (0x8000170 <main+48>)
0x08000152 <+18>:    ldr     r3, [pc, #32]   ; (0x8000174 <main+52>)
0x08000154 <+20>:    str     r2, [r3, #0]
0x08000156 <+22>:    lsls    r6, r6, #6
0x08000158 <+24>:    lsls    r5, r5, #22
0x0800015a <+26>:    ldr     r4, [pc, #28]   ; (0x8000178 <main+56>)
0x0800015c <+28>:    str     r6, [r4, #0]
0x0800015e <+30>:    bl      0x8000130 <wait>
=> 0x08000162 <+34>:    str     r5, [r4, #0]
0x08000164 <+36>:    bl      0x8000130 <wait>
0x08000168 <+40>:    b.n     0x800015a <main+26>
0x0800016a <+42>:    nop                     ; (mov r8, r8)
0x0800016c <+44>:    asrs    r0, r3, #32
0x0800016e <+46>:    ands    r2, r0
0x08000170 <+48>:    add     r4, r8
0x08000172 <+50>:    add     r4, r6
0x08000174 <+52>:    asrs    r4, r0, #32
0x08000176 <+54>:    ands    r1, r0
0x08000178 <+56>:    asrs    r0, r2, #32
0x0800017a <+58>:    ands    r1, r0
End of assembler dump.

Looking at the code, we wanted to break between the two call to main - this would be at address 0x08000162. However, the breakpoint was set at 0x8000146. If we stare at the code a bit, we can see that GCC optimized it for speed quite a bit.

Instead of computing the bitshift in each loop iteration, it's only doing str r5, [r4, #0], so just storing the value from register r5. And if we look at the instruction where GDB set the breakpoint, movs r5, #128, this is actually the first step in computing the shifted (1U << 29) value. The second part is at 0x08000158 where the already set 0x80 is shifted by 22.

So it looks like GDB set the breakpoint on the first instruction belonging to the line of code we wanted to break on - unfortunately this is outside the loop and we won't hit it.

We can set a breakpoint on the correct address manually:

(gdb) break *0x08000162
Breakpoint 7 at 0x8000162: file main.c, line 39.
(gdb) c

Breakpoint 2, main () at main.c:39
39          *((volatile unsigned int *)0x40011010) = (1U << 29);

Looking at CPU state

After hitting the (fixed) breakpoint from above, we can take a look at the CPU registers:

(gdb) info registers
r0             0x4a5dead2       1247668946
r1             0x10     16
r2             0x44344444       1144276036
r3             0x0      0
r4             0x40011010       1073811472
r5             0x20000000       536870912
r6             0x2000   8192
r7             0xf23d2b64       -230872220
r8             0xff3ffffe       -12582914
r9             0xefbf9ddd       -272654883
r10            0x2ad17842       718370882
r11            0x63dac8c9       1675282633
r12            0xf9ffeffb       -100667397
sp             0x20004ff0       0x20004ff0
lr             0x8000163        134218083
pc             0x8000162        0x8000162 <main+34>
xPSR           0x61000000       1627389952

Most interesting are r5 and r6, where we figured out our shifted values for turning on/off the LED are. pc is the program counter - which unsurprisingly points to the instruction we set the breakpoint at.

sp is the stack pointer. If we compare it to where we started, we didn't get too far:

(gdb) print &_end_of_ram
$1 = (unsigned long *) 0x20005000

But since our program is not that big, that's to be expected. We could also print memory, but again, due to the simplicity of our minimal program, everything ends up in registers anyway.