Bare Metal C Programming the Blue Pill - Part 1

These days, there are many easy-to-use environments for microcontroller programming. You have the popular Arduino platform, but also the IDEs and tools provided by the different chip manufacturers, as well as several open source efforts.

While these are certainly great for getting your project going, they often gloss over the actual "bare metal" details of the chip you work with. So let's explore a bit and peek under the hood.


For this post, I'm going to use a little STM32 dev board that is generally advertised as the "Blue Pill" and can be found on eBay, Aliexpress or your other favorite source of cheap gadgets from China.

The board is based around an STM32F103C8T6 chip. It contains a 32-bit ARM Cortex-M3 core and lots of peripherals. You can typically get the boards for less than $2 apiece. While the identifier "STM32F103C8T6" might look scary at first, it's actually quite simple: it's a part from STMicroelectronics, has a 32-bit CPU, and is part of the F1 family, specifically the F103 model. C8T6 encodes things like which chip package is used and how much flash memory is included (in this case 64kB).

Since the chip features a 32-bit ARM CPU, we will need a compiler that can target that architecture. I'll use GCC - so the compiler we want is gcc-arm-none-eabi (targeting ARM, no OS, embedded ABI). If you're using Ubuntu, Debian or Raspbian, you can install the compiler via apt-get install gcc-arm-none-eabi.

How do you program a microcontroller?

Contrary to your typical $FAVORITE_OS program, you do not (generally) have any libraries or environment around your code. Instead, the instructions your code is compiled into just run directly, one after another.

Another important aspect is that you do not have virtual memory: there is no large private address space you can use as you like, and no kernel to talk to when you want to do anything with hardware. Instead, you are directly accessing the physical memory (RAM).

The physical memory you can use for your program, however, does not start at address 0. It is actually a relatively small chunk among other things that are made to look like they are part of memory. So what else do we have there? Let's have a look at the datasheet. In section 4, we can see how memory is organized. The interesting bits are:

| First Byte  | Last Byte   | Length      | Contents |
|-------------|-------------|-------------|----------|
| 0x0000 0000 | Depends     | 2kB or 64kB | Either Flash memory or hardwired system memory depending on boot pins |
| 0x0800 0000 | 0x0800 FFFF | 64kB        | Flash memory (the C8T6 variant has 64kB. The datasheet shows 128kB) |
| 0x1FFF F000 | 0x1FFF F7FF | 2kB         | System memory with the hardwired bootloader from ST |
| 0x2000 0000 | 0x2000 4FFF | 20kB        | SRAM - our main RAM |
| 0x4000 0000 | 0x4002 33FF | 141kB       | Peripherals (your ADCs, GPIOs, SPIs, UARTs, ... ) |

So we can't just write our data / variables to arbitrary memory locations - they might not be backed by RAM. Or even worse, they might be backed by some memory mapped hardware and we could get interesting side effects.
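To make the table a bit more tangible, here is a small host-side sketch (not something you would flash) that classifies an address by region. The type and function names are made up for this illustration, the regions are taken from the base addresses and lengths in the table, and the entry at address 0 assumes the boot pins select flash.

```c
#include <stdint.h>

/* Illustrative region names, not taken from any ST header. */
typedef enum {
    REGION_BOOT_ALIAS,
    REGION_FLASH,
    REGION_SYSTEM_MEMORY,
    REGION_SRAM,
    REGION_PERIPHERALS,
    REGION_NOT_LISTED   /* not covered by the table above */
} region_t;

typedef struct {
    uint32_t base;
    uint32_t length;
    region_t region;
} range_t;

static const range_t memory_map[] = {
    { 0x00000000u,  64u * 1024u, REGION_BOOT_ALIAS },   /* assumes boot from flash */
    { 0x08000000u,  64u * 1024u, REGION_FLASH },
    { 0x1FFFF000u,   2u * 1024u, REGION_SYSTEM_MEMORY },
    { 0x20000000u,  20u * 1024u, REGION_SRAM },
    { 0x40000000u, 141u * 1024u, REGION_PERIPHERALS },
};

/* Return the region an address falls into, or REGION_NOT_LISTED. */
region_t classify_address(uint32_t addr) {
    for (unsigned i = 0; i < sizeof memory_map / sizeof memory_map[0]; ++i) {
        if (addr >= memory_map[i].base &&
            addr - memory_map[i].base < memory_map[i].length) {
            return memory_map[i].region;
        }
    }
    return REGION_NOT_LISTED;
}
```

For example, classify_address(0x20005000) returns REGION_NOT_LISTED: that is the first byte after the 20kB of SRAM, so a variable placed there would not be backed by RAM.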

We now know that address 0 actually either maps to the embedded flash memory or the hardwired system memory. Which one we get is determined by the boot pins (BOOT0 and BOOT1). If we select flash, whatever we put in there appears starting at address 0.

The Linker

While the compiler takes care of translating our C code to machine instructions (for ARM in our case), the linker is responsible for resolving references between functions, globals or in more generic terms "symbols". It also lays out the final binary image.

We can control this process with linker scripts. Here's a minimal example (linker.ld):

MEMORY
{
    FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 64K
    RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 20K
}

SECTIONS
{
    .text : {
        /* Start at 0 for the code in flash */
        . = 0;

        /* At the very beginning, we need the interrupt vectors.
        * We need to mark this KEEP to make sure it doesn't get garbage collected.
        * The syntax *(.foo) means take the .foo sections from all files. */
        KEEP(*(.interrupt_vectors))

        /* The whole interrupt table is 304 bytes long. Advance to that position in case
        * the supplied table was shorter. */
        . = 304;

        /* And here comes the rest of the code, ie. all symbols starting with .text */
        *(.text*)
    } > FLASH = 0xFF /* Put this in the flash memory region, default to 0xFF */

    _end_of_ram = ORIGIN(RAM) + LENGTH(RAM); /* Define a symbol for the end of ram */
}

MEMORY defines the location and size of flash and RAM. We start laying out the text section at position 0, first putting the interrupt_vectors section, ensuring it is at least 304 bytes long, and then we collect all .text sections with the code. A special symbol _end_of_ram is defined to point at the byte just after the last usable RAM byte - we will need this to initialize the stack pointer.

The flash memory's default state is 0xFF and not 0x00, so we'll set this as the fill value. Also note that there would be two options for the flash ORIGIN address: we could use either 0x0000 0000 or 0x0800 0000, but remember that flash only appears at 0x0000 0000 if we set the boot configuration as such. So if we choose to boot differently, it will not be there.
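The numbers in the linker script are easy to double check by hand. As a rough host-side sketch (the macro and function names here are my own, not something the linker provides), this is the arithmetic behind _end_of_ram and the 304-byte table:

```c
#include <stdint.h>

#define RAM_ORIGIN          0x20000000u
#define RAM_LENGTH          (20u * 1024u)
#define VECTOR_TABLE_BYTES  304u

/* Mirrors `_end_of_ram = ORIGIN(RAM) + LENGTH(RAM)` from the linker script. */
static inline uint32_t end_of_ram(void) {
    return RAM_ORIGIN + RAM_LENGTH;
}

/* Every vector table entry is one 32-bit word. */
static inline uint32_t vector_table_entries(void) {
    return VECTOR_TABLE_BYTES / (uint32_t)sizeof(uint32_t);
}
```

end_of_ram() gives 0x20005000, one byte past the last SRAM address. That is fine as an initial stack pointer because the Cortex-M3 stack grows downwards and the pointer is decremented before anything is stored. vector_table_entries() gives 76 word-sized slots, the first of which holds that stack pointer.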

Writing some code

With the linker script prepared, let's write some code. The hello world of microcontrollers is to blink an LED. The Blue Pill has one on board which is connected to pin PC13.

So far, we've only looked at the datasheet. In order to write code that accesses peripherals like the GPIO module to blink the LED, we'll need more information. We can find the necessary details in the document RM0008: The reference manual for the STM32F1xx parts.

Don't get scared by the more than 1100 pages of text. We will reference specific sections to figure out what we're after. A good place to start is section 3.1: System architecture. Here we can see that there is one CPU core and two DMA engines accessing flash, RAM and peripherals through a bus matrix. The peripherals sit behind the AHB on two peripheral buses: APB1 and APB2. GPIOC, which we need for controlling our LED at PC13, is on APB2.

Clock gating

One thing that might not immediately be obvious is that the clock for these peripherals can be turned off to conserve power. And they do default to off. So we first need to turn the clock for port C on. Where we do that is described in section 7.3.7: APB2 peripheral clock enable register (RCC_APB2ENR)

So we want to set bit 4 of this register to 1 to enable port C, but where is this register in memory? We're only told that it is at offset 0x18 within the memory for RCC configuration. So let's go back to section 3.3 to consult the memory map. There, "Reset and clock control RCC" is listed at 0x4002 1000. Adding 0x18 we get 0x4002 1018.
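Written down as C, the address calculation looks like the sketch below. The macro names are my own choice for this post (IOPCEN is what the reference manual calls bit 4), and the helper just models the read-modify-write on a plain integer so it can be tried on a PC.

```c
#include <stdint.h>

#define RCC_BASE            0x40021000u  /* from the memory map in section 3.3 */
#define RCC_APB2ENR_OFFSET  0x18u        /* from section 7.3.7 */
#define RCC_APB2ENR_ADDR    (RCC_BASE + RCC_APB2ENR_OFFSET)
#define RCC_APB2ENR_IOPCEN  (1u << 4)    /* port C clock enable */

/* Given the current register value, return it with the port C clock
 * enabled and every other bit left untouched. */
static inline uint32_t enable_port_c_clock(uint32_t apb2enr) {
    return apb2enr | RCC_APB2ENR_IOPCEN;
}
```

On the chip, the same thing is a single |= on a volatile pointer to RCC_APB2ENR_ADDR.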

GPIO config

So what else do we need to do? Section 9 covers GPIO. Looks like we have quite a few options. Let's just do a simple push-pull output. Tables 20 and 21 tell us that we need CNF=00 and MODE=11 for a push-pull output with a max. speed of 50 MHz.

Section 9.2.2 documents the register we need. The bits for pin 13 are 20-23. Note the reset value 0x4444 4444: We'll have to leave the other bits with those values to avoid changing the config of any other pins. The section specifies an address offset of 0x04, but what is the base for this? Our trusty memory map in section 3.3 points us at 0x4001 1000 for GPIO Port C, so we get 0x4001 1004.
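The bit fiddling for a single pin can be wrapped in a small helper. This is only a sketch with a function name I made up; it works on a plain integer so the result can be checked on a PC before it ever touches the register.

```c
#include <assert.h>
#include <stdint.h>

/* The high configuration register covers pins 8..15 with four bits per
 * pin: CNF[1:0] in the upper two bits, MODE[1:0] in the lower two. */
uint32_t crh_configure_pin(uint32_t crh, unsigned pin, uint32_t cnf, uint32_t mode) {
    assert(pin >= 8 && pin <= 15);
    assert(cnf <= 3 && mode <= 3);

    unsigned shift = (pin - 8u) * 4u;      /* pin 13 -> bit 20 */
    uint32_t field = (cnf << 2) | mode;    /* CNF=00, MODE=11 -> 0x3 */

    /* Clear the pin's four bits, then insert the new field. */
    return (crh & ~(0xFu << shift)) | (field << shift);
}
```

Starting from the reset value, crh_configure_pin(0x44444444, 13, 0, 3) yields 0x44344444: only the nibble for pin 13 changes.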

Toggling the GPIO pin

Finally, we need to toggle the pin between high and low to blink the LED. We can use the trick described in section 9.1.2 to atomically set/reset only the pin we're interested in. The BSRR register for that purpose is at offset 0x10 as described in section 9.2.5. We need to set bit 13 to make PC13 high and bit 29 to make it low respectively.
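To see why this avoids a read-modify-write, here is a host-side model of the register. The function names are hypothetical, and apply_bsrr is only my simplified simulation of what the hardware does with a written value: the lower 16 bits set outputs, the upper 16 bits reset them.

```c
#include <assert.h>
#include <stdint.h>

#define GPIOC_BASE        0x40011000u
#define GPIO_BSRR_OFFSET  0x10u
#define GPIOC_BSRR_ADDR   (GPIOC_BASE + GPIO_BSRR_OFFSET)

/* Bits 0..15 of the written value set the matching output bit. */
uint32_t bsrr_set_mask(unsigned pin) {
    assert(pin < 16);
    return 1u << pin;
}

/* Bits 16..31 of the written value reset output bit (bit - 16). */
uint32_t bsrr_reset_mask(unsigned pin) {
    assert(pin < 16);
    return 1u << (pin + 16u);
}

/* Simplified model of the effect of one write on the 16 output bits.
 * Zero bits in the written value leave the corresponding pin alone. */
uint32_t apply_bsrr(uint32_t outputs, uint32_t written) {
    uint32_t set   = written & 0xFFFFu;
    uint32_t reset = (written >> 16) & 0xFFFFu;
    return ((outputs & ~reset) | set) & 0xFFFFu;
}
```

bsrr_set_mask(13) is 0x00002000 and bsrr_reset_mask(13) is 0x20000000, matching bits 13 and 29. Because zero bits have no effect, a single store changes PC13 without disturbing any other pin on the port.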

Putting it all together

With this information, we can put together our main.c:

// This is the symbol defined in the linker script.
extern unsigned long _end_of_ram;

// We'll implement main a bit further down but we need the symbol for
// the initial program counter / start address.
void main();

// This is our interrupt vector table that goes in the dedicated section.
__attribute__ ((section(".interrupt_vectors"), used))
void (* const interrupt_vectors[])(void) = {
    // The first 32bit value is actually the initial stack pointer.
    // We have to cast it to a function pointer since the rest of the array
    // would all be function pointers to interrupt handlers.
    (void (*)(void))((unsigned long)&_end_of_ram),

    // The second 32bit value is the initial program counter aka. the
    // function you want to run first.
    main,
};

void wait() {
    // Do some NOPs for a while to pass some time.
    for (unsigned int i = 0; i < 2000000; ++i) __asm__ volatile ("nop");
}

void main() {
    // Enable port C clock gate.
    *((volatile unsigned int *)0x40021018) |= (1 << 4);

    // Configure GPIO C pin 13 as output.
    *((volatile unsigned int *)0x40011004) = ((0x44444444 // The reset value
        & ~(0xfU << 20))  // Clear out the bits for pin 13
        |  (0x3U << 20)); // Set both MODE bits, leave CNF at 0

    while (1) {
        // Set the output bit.
        *((volatile unsigned int *)0x40011010) = (1U << 13);
        wait();
        // Reset it again.
        *((volatile unsigned int *)0x40011010) = (1U << 29);
        wait();
    }
}

This program essentially only consists of two parts:

- The interrupt vector table that specifies the initial stack pointer and our entrypoint
- A super tiny main function that sets up the GPIO pin and toggles it in a loop

If you're wondering how I came up with 2000000 loop iterations: By default, the CPU runs off an 8 MHz internal oscillator. Each loop iteration takes a few instructions, so counting to 2 million takes roughly a second.
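As a back-of-the-envelope check, the estimate can be written as a tiny function. The figure of four cycles per iteration used below is purely an assumption for illustration; the real number depends on the code the compiler generates and on flash timing.

```c
#include <stdint.h>

/* Rough duration of a busy-wait loop in milliseconds.
 * cycles_per_iteration is a guess, not a measured value. */
uint32_t estimate_delay_ms(uint32_t iterations,
                           uint32_t cycles_per_iteration,
                           uint32_t clock_hz) {
    /* Use 64 bits so the intermediate product cannot overflow. */
    uint64_t cycles = (uint64_t)iterations * cycles_per_iteration;
    return (uint32_t)(cycles * 1000u / clock_hz);
}
```

With 2000000 iterations, the assumed four cycles each and an 8000000 Hz clock, this comes out at 1000 ms. If the LED blinks noticeably faster or slower on your board, the cycles-per-iteration guess is the first thing to revisit.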

We can compile:

arm-none-eabi-gcc -ggdb -Wall -Os -mthumb -nostdlib -o main.o -c main.c

link:

arm-none-eabi-gcc -ggdb -Os -Wl,--gc-sections -mthumb -mcpu=cortex-m3 -Tlinker.ld -o main.elf main.o

and flash (using the JTAG adapter described in this post):

openocd -f sipeed.cfg -c "program main.elf verify reset exit"

and you should see the onboard LED start blinking.

Check out part 2 to do more with OpenOCD.