a technical blog

ARM Assembler for iOS: Part 2 – First Steps


As this series is mainly a way to write down and recap my own learning experience, I assume you have done your homework from Part 1 as well as I have: mainly, reading the Whirlwind Tour of ARM Assembly (Sections 23.1 – 23.3 should be enough for now). From here, we will start to write our own first assembly functions and get familiar with the toolchain. In detail, we will learn the following in this part:

  • Writing a simple assembler function that uses basic instructions like mov and add
  • Compiling an assembler file and linking it with C code
  • Using gdb to debug your assembly code


Let’s start with a quick summary of what you should have learned from the Whirlwind Tour of ARM Assembly:

  • There are basically three types of assembler instructions: data instructions (e.g. add, sub, mov, cmp), memory instructions (e.g. ldr, str, stmfd) and branch instructions (e.g. b, bl, bx)
  • All instructions are 32 bits in size (except in THUMB mode, which uses 16-bit instructions). As a consequence, an immediate value to be mov’ed into a register only has 12 bits available (the other 20 bits encode the rest of the instruction): 8 bits for a number n and 4 bits for a right-rotation r. The represented number is n ror (2*r). For example, mov r0, #0x4 is allowed because 0x4 = 0x4 ror (2*0), whereas mov r0, #0xfff0 is not: 0xfff0 is 0xfff shifted left by 4, and 0xfff does not even fit into the 8 bits of n.
  • This limitation can be worked around by calculating the required value (e.g. 0xfff0) with multiple assembler instructions or by loading it from memory. ldr r0, =0xfff0 (note the “=”) does this implicitly: if the value is representable as an immediate, the assembler emits a plain mov; otherwise, as in this example, it is converted into a load from memory.
  • There are registers r0 to r15, of which r13 to r15 have special purposes and an alias that can be used in assembly (r13/sp: Stack Pointer, r14/lr: Link Register, r15/pc: Program Counter). Strictly speaking, r11 and r12 have special roles too, but we omit them here. r0-r3 and r12 are scratch registers that a called function may change freely; everything else (r4-r11) has to be restored correctly by the called function in its epilogue.
  • All assembler instructions can be executed conditionally. E.g. in cmp r0, r1 followed by movlt r3, #4, the movlt is only executed when r0 < r1 (lt = less than). The cmp instruction sets the status flags, which are then read by the following conditional instruction. In general, every instruction (where it makes sense) can also update the status flags by appending an “s”; e.g. subs r0, r1, r2 (r0 = r1 - r2 and update the status flags). Thus, the general form of a data instruction is op{cond}{status}
  • The barrel shifter can be applied to the last operand of an instruction to shift/rotate its bits by a fixed amount or by an amount held in a register. E.g. mov r0, r1, lsl #3 (in C syntax: r0 = r1 << 3). The shift is basically free: it executes as part of the instruction in one processor cycle and is preferable over multiplication whenever possible.
  • Data instructions (mov, add, etc.) can only operate on registers and immediate values, not on memory addresses; values in memory have to be loaded into a register with a memory instruction first. Also, there is no division instruction, and multiplications are only possible on registers (not on immediate values).
  • Memory instructions (e.g. ldr r0, [r1]: load into a register; in C syntax: r0 = *r1). The memory address given by the bracketed expression ([]) can consist of a register, a register plus an index register, or a register plus an immediate value, in either case with optional barrel shifting. E.g. ldr r0, [r1, r2, lsl #4] (r0 = *(r1 + (r2 << 4))). Memory instructions can also load half-words (2 bytes) or a single byte only; append the following to the instruction for this: h (half-word), sh (signed half-word), b (byte), sb (signed byte).
  • Memory addresses do not fit into an ldr instruction (for the same reason as the above limitation on immediate values: only 12 bits are available). However, the assembler transparently converts referenced memory labels into so-called PC-relative addresses (the address is calculated relative to the current program counter). An alternative is the literal pool, which is created when using ldr r0, =labelname (note the “=”); note: this only loads the address of labelname into r0, not the value!!!
  • Bulk memory operations like ldm* and stm* can be used to load a vector (an array of memory values) from memory into a list of registers and to store it back, respectively. These operations are also used for pushing/popping values to/from the stack; there even exist aliases for different stack implementations. Generally, a full descending stack is used on the ARM architecture: the stack pointer (sp) points to the last used stack address, and the stack grows towards lower addresses. You can see in the figure below what an stmfd instruction does internally when storing r0 on the stack. Note also that the exclamation mark “!” enables the write-back (auto-indexing) feature of these memory operations: the updated address is written back to sp, so sp is decremented automatically.

  • Branch instructions are the GOTOs of assembly: they change/redirect the flow of your assembly code. Mainly, we have “b LABELNAME” to jump to a labeled position in your assembly, and bl and bx to call subroutines and return from them. “bl LABELNAME” jumps to a labeled point in your assembly code and sets the lr register to the return address (the address of the instruction following the bl). To return from our subroutine (like a “return” in C code), we use “bx lr”, which basically behaves like “mov pc, lr”: it restores the program counter to the saved value and thus continues with the next instruction after the subroutine call. The difference is that “bx lr” also switches between ARM and THUMB mode depending on bit 0 of lr, which is required for interoperability between normal ARM code and THUMB code.
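To make the immediate-value rule concrete, here is a small C sketch (not part of any toolchain; the function names are hypothetical) that checks whether a 32-bit constant can be encoded as n ror (2*r) with an 8-bit n and a 4-bit r. It simply tries all 16 rotations and undoes each one with a left-rotation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return n == 0 ? x : (x << n) | (x >> (32u - n));
}

/* Returns true if `value` can be encoded as an ARM data-processing
 * immediate, i.e. value == n ror (2*r) with an 8-bit n and a 4-bit r.
 * On success, n and r are written to *out_n and *out_r. */
static bool arm_immediate(uint32_t value, uint32_t *out_n, unsigned *out_r) {
    for (unsigned r = 0; r < 16; r++) {
        /* Undo the right-rotation by 2*r by rotating left by 2*r. */
        uint32_t n = rotl32(value, 2u * r);
        if (n <= 0xffu) {          /* does it fit into the 8 bits of n? */
            *out_n = n;
            *out_r = r;
            return true;
        }
    }
    return false;                  /* needs ldr r0, =value or several instructions */
}
```

For example, 0x4 is found with n = 0x4 and r = 0, 0xff000000 with n = 0xff and r = 4, while 0xfff0 is rejected because its 12 significant bits never fit into 8 bits under any rotation.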
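The conditional-execution example (cmp r0, r1 followed by movlt r3, #4) can also be modeled in C. The following is a simplified sketch with made-up names: it assumes cmp computes a - b, discards the result and sets the N, Z, C and V flags, and that the lt (signed less than) condition holds when N != V:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four ARM status flags. */
typedef struct { bool n, z, c, v; } arm_flags;

/* cmp a, b: compute a - b, throw the result away, keep only the flags. */
static arm_flags arm_cmp(uint32_t a, uint32_t b) {
    uint32_t res = a - b;
    arm_flags f;
    f.n = (res >> 31) & 1u;                     /* negative: top bit of result */
    f.z = (res == 0);                           /* zero */
    f.c = (a >= b);                             /* carry = NOT borrow for a subtraction */
    f.v = (((a ^ b) & (a ^ res)) >> 31) & 1u;   /* signed overflow */
    return f;
}

/* Condition code "lt" (signed less than): N != V. */
static bool cond_lt(arm_flags f) { return f.n != f.v; }

/* C equivalent of:  cmp r0, r1 ; movlt r3, #4  (returns the new r3). */
static uint32_t cmp_movlt(uint32_t r0, uint32_t r1, uint32_t r3) {
    if (cond_lt(arm_cmp(r0, r1)))
        r3 = 4;
    return r3;
}
```

Checking N != V instead of just N is what makes lt correct even when the subtraction overflows, e.g. when comparing INT_MIN with 1.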
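The barrel-shifter examples translate directly into C shifts. This minimal sketch (hypothetical function names; the add and rsb variants are extra illustrations, not taken from the tour) shows how multiplications by small constants can be done with one shifted operand instead of a multiplication:

```c
#include <stdint.h>

/* mov r0, r1, lsl #3        ->  r0 = r1 << 3          (r1 * 8) */
static uint32_t mov_lsl3(uint32_t r1) { return r1 << 3; }

/* add r0, r1, r1, lsl #2    ->  r0 = r1 + (r1 << 2)   (r1 * 5) */
static uint32_t times5(uint32_t r1) { return r1 + (r1 << 2); }

/* rsb r0, r1, r1, lsl #3    ->  r0 = (r1 << 3) - r1   (r1 * 7)
 * rsb is "reverse subtract": the shifted operand minus r1. */
static uint32_t times7(uint32_t r1) { return (r1 << 3) - r1; }
```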
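Similarly, the addressing modes and load sizes from the memory-instruction bullet correspond to byte-offset pointer arithmetic in C. In this sketch the function names merely mirror the mnemonics for illustration; memcpy keeps the C code well-defined regardless of alignment:

```c
#include <stdint.h>
#include <string.h>

/* ldr r0, [r1, r2, lsl #4]: ARM offsets count bytes, so r1 is a byte
 * pointer and the 32-bit word is read from r1 + (r2 << 4). */
static uint32_t ldr_reg_lsl4(const uint8_t *r1, uint32_t r2) {
    uint32_t r0;
    memcpy(&r0, r1 + (r2 << 4), sizeof r0);
    return r0;
}

/* ldrb zero-extends the loaded byte, ldrsb sign-extends it
 * (the int8_t conversion assumes the usual two's complement behavior). */
static uint32_t ldrb(const uint8_t *p)  { return *p; }
static int32_t  ldrsb(const uint8_t *p) { return (int8_t)*p; }

/* ldrh / ldrsh do the same for 16-bit half-words. */
static uint32_t ldrh(const uint8_t *p)  { uint16_t h; memcpy(&h, p, sizeof h); return h; }
static int32_t  ldrsh(const uint8_t *p) { int16_t h;  memcpy(&h, p, sizeof h); return h; }
```

The difference between the b and sb (or h and sh) variants only shows up for values with the top bit set: the byte 0xff loads as 255 with ldrb but as -1 with ldrsb.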
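The full descending stack and the write-back “!” of stmfd/ldmfd can be sketched in C with an array as memory and sp as an index into it. This is a simplified model under the assumption that sp counts 32-bit words instead of bytes, and it performs no overflow checks; the type and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Memory as an array of words; sp is the index of the last used slot.
 * An empty stack starts with sp one past the end of the array. */
typedef struct {
    uint32_t mem[64];
    size_t sp;
} stack_model;

/* stmfd sp!, {regs...}: decrement first ("full" stack), then store;
 * the lowest-numbered register ends up at the lowest address. */
static void stmfd(stack_model *s, const uint32_t *regs, size_t count) {
    s->sp -= count;                 /* "!" writes the new sp back */
    for (size_t i = 0; i < count; i++)
        s->mem[s->sp + i] = regs[i];
}

/* ldmfd sp!, {regs...}: load in the same order, then move sp back up. */
static void ldmfd(stack_model *s, uint32_t *regs, size_t count) {
    for (size_t i = 0; i < count; i++)
        regs[i] = s->mem[s->sp + i];
    s->sp += count;                 /* "!" writes the new sp back */
}
```

Usage: starting with sp = 64, pushing three registers leaves sp at 61 with the first register stored at index 61; popping them restores the values in the same order and sets sp back to 64.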

The first assembler function

Let us write a first simple assembler function based on the knowledge gained so far. It will be located in its own assembler file and will be called from the main function of our C code.

Create a file asmlib.s with the following assembly-code:

@ ARM Assembler Test Library

@ int asm_sum(int a, int b)
	.align 2				@ Align to word boundary
	.arm					@ This is ARM code
	.global asm_sum			@ This makes it a real symbol
asm_sum:					@ Start of function definition
	add     r2, r0, r1		@ Add up a (r0) and b (r1) and store result in r2
	mov		r0, r2			@ Store sum (r2) in r0 which stores return-value
	mov		pc, lr			@ Set program counter to lr (was set by caller)

We have defined a very simple assembler function that adds the two arguments a and b (which are actually passed to the function in registers r0 and r1) and returns the sum to the caller in r0. We will get into the details of the calling conventions in the next part of the series; for now, just take for granted that the arguments are passed this way and that the return value goes into r0. You should know by now that “asm_sum:” is a label for this instruction block: when jumped to, it first executes the “add”, then the “mov”, and finally sets the program counter (pc) to the next instruction of the caller’s code (which was stored in lr by the caller).

Some ARM assembler directives are used at the beginning of the file to align the function label to the next word boundary (.align x means align to a 2^x byte boundary, so .align 2 gives 4-byte alignment). In general, the ARM processor should be able to handle unaligned data access as well, but the Apple documents specifically state that functions have to be aligned. For details on why aligned access is important, read this post. “.arm” marks the following code as ARM (not THUMB) code, and “.global” exposes the label as a global symbol. This is important so that we can call the function from our C code.
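As a small illustration of what .align 2 guarantees, the following C sketch (hypothetical helpers, not part of the toolchain) rounds an address up to the next 2^x byte boundary, which is the padding the directive introduces, and checks whether an address is already aligned:

```c
#include <stdint.h>

/* Round addr up to the next multiple of 2^x (what ".align x" pads to). */
static uintptr_t align_up(uintptr_t addr, unsigned x) {
    uintptr_t mask = ((uintptr_t)1 << x) - 1;
    return (addr + mask) & ~mask;
}

/* An address is 2^x-aligned if its lowest x bits are all zero. */
static int is_aligned(uintptr_t addr, unsigned x) {
    return (addr & (((uintptr_t)1 << x) - 1)) == 0;
}
```

With x = 2, an address like 0x8226 is padded up to 0x8228, while 0x8228 is already word-aligned and stays unchanged.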

Some more information is given in the comments, which start with “@”. One additional note: we could actually remove the “mov r0, r2” instruction if we had used “add r0, r0, r1” in the first place, but while learning assembly it does not hurt to have some more explicit code.

Next, we call our function, which returns the sum of its two arguments, from the main function of our C code. We use the following main.c for this:

#include <stdio.h>

extern int asm_sum(int a, int b);	/* defined in asmlib.s */

int main(int argc, char *argv[])
{
	printf("== sum ==\n");
	int a = 71;
	int b = 29;
	printf("%d + %d = %d\n", a, b, asm_sum(a, b));

	return 0;
}

The only thing worth noting is that we have to declare asm_sum as an external symbol (extern), as it is not defined in our C code.

You can now compile both files and link them into an executable with the following commands (the last line runs the executable):

arm-elf-gcc -mcpu=arm7 -O2 -g -c asmlib.s  -o asmlib.o
arm-elf-gcc -mcpu=arm7 -O2 -g -c main.c  -o main.o
arm-elf-gcc -mcpu=arm7 -o armtest *.o -lc
arm-elf-run armtest

Running the program should show the expected result, i.e. the output of the printf statements on stdout. Good job!


Let’s step through our assembly code with the debugger. First, create a file named “.gdbinit” in the same folder as the other two files and put in the following lines:

file armtest
target sim

This file is loaded when gdb starts: it sets our executable, selects the ARM simulator as the target, and loads the code into memory. We can now do some simple debugging:

macbook:ARMAssembly_Part2 daniel$ arm-elf-gdb
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "--host=powerpc-apple-darwin6.8 --target=arm-elf".
Connected to the simulator.
Loading section .init, size 0x1c vma 0x8000
Loading section .text, size 0x8a3c vma 0x801c
Loading section .fini, size 0x18 vma 0x10a58
Loading section .rodata, size 0x248 vma 0x10a70
Loading section .data, size 0x8bc vma 0x10db8
Loading section .eh_frame, size 0x4 vma 0x11674
Loading section .ctors, size 0x8 vma 0x11678
Loading section .dtors, size 0x8 vma 0x11680
Loading section .jcr, size 0x4 vma 0x11688
Start address 0x811c
Transfer rate: 306272 bits in <1 sec.
(gdb) break asmlib.s:9
Breakpoint 1 at 0x8228: file asmlib.s, line 9.
(gdb) run
Starting program: /Users/daniel/Dev/Test/ARMAssembly_Part2/armtest
== sum ==

Breakpoint 1, asm_sum () at asmlib.s:9
9		mov		r0, r2			@ Store sum (r2) in r0 which stores return-value
Current language:  auto; currently asm
(gdb) print $r0
$1 = 71
(gdb) print $r1
$2 = 29
(gdb) print $r2
$3 = 100
(gdb) stepi
10		mov		pc, lr			@ Set program counter to lr (was set by caller)
(gdb) continue
71 + 29 = 100

Program exited normally.
[Switching to process 0]
Current language:  auto; currently c

You can see that we first set a breakpoint at line 9 of asmlib.s (the “mov r0, r2” instruction) and then “run” the executable. The simulator stops at the breakpoint. “print $regname” prints the content of a register. As expected, the first and second function parameters are stored in r0 and r1 (compare with the values set for a and b in main.c), and r2 already holds their sum because the add on line 8 was executed before the breakpoint was hit. After that, we use “stepi” to step to the next instruction and then “continue”, so that normal execution proceeds and the program finally ends. These few gdb commands should already allow you to do some simple debugging. For more details on gdb, Google is your friend 🙂

Some more instructions in a nutshell

Let’s write one more assembly function to get familiar with a few more instructions. How about a multiply routine that returns a*b for two parameters a and b:

@ int asm_mul(int a, int b)
	.align 2
	.arm
	.global asm_mul
asm_mul:
	stmfd   sp!, {r4-r11}   @ In case we need more than registers r0-r3, we have to
	                        @ save r4-r11 on the stack first (only r0-r3 and r12 are
	                        @ scratch registers). Here, we don't actually need them...

	mov     r3, #0          @ Initialize register holding result of multiplication

	movs    r2, r0          @ Move "a" into r2 and set status-flags (mov"s")
	beq     asm_mul_return  @ Immediately return if a==0

	movs    r2, r1          @ Move "b" into r2 and set status-flags (mov"s")
	beq     asm_mul_return  @ Immediately return if b==0

asm_mul_loop:
	add     r3, r3, r0      @ r3 = r3 + r0
	subs    r1, r1, #1      @ r1 = r1 - 1 (decrement)
	bne     asm_mul_loop    @ If the zero-flag is not set (r1 != 0), loop once more

asm_mul_return:
	ldmfd   sp!, {r4-r11}   @ Restore the registers
	mov     r0, r3          @ Store result in r0 (return register)
	mov     pc, lr          @ Return to the caller

Please note that we could simply have used the mul instruction to do the multiplication for us, but this way we can more easily recap some of the instructions we have learned.

The implemented algorithm is basically: result = 0; if (a == 0 or b == 0) return result; else while (b != 0) { result = result + a; b = b - 1; }. It should be quite straightforward. Here are some points to take away, though:

  • “subs” not only does a subtraction; because we append the “s”, it also sets the status register, in particular the zero flag. If the result is not zero (ne = not equal: a little confusing, but think of it as “if the result is non-zero, the two operands of the sub must have been not equal”; or have a look at the table in Section 23.3.4 of the well-known tutorial guiding this series), we branch to the label asm_mul_loop.
  • You can see that the status register can be set by almost any data instruction; “mov” can also be appended with “s” to set it. Here, we again read the zero flag.
  • We use stmfd to store registers r4-r11 on the stack and restore them at the end of the routine via ldmfd. You would generally always do this if you need more than registers r0-r3 in your routine: as r4-r11 are not scratch registers, the convention is that they hold the same values after a function call as before.
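For comparison, here is a C sketch of the same repeated-addition algorithm as asm_mul (the function name is hypothetical), with comments mapping each step to the instructions above. It uses unsigned arithmetic so that it wraps around modulo 2^32 like a 32-bit register does; it is an illustration, not a replacement for the assembly version:

```c
#include <stdint.h>

/* Repeated addition, mirroring asm_mul step by step. */
static int32_t mul_by_addition(int32_t a, int32_t b) {
    uint32_t result = 0;                /* mov  r3, #0 */
    uint32_t count = (uint32_t)b;       /* r1 is used as the loop counter */

    if (a == 0 || b == 0)               /* movs/beq pairs: early return */
        return 0;

    do {
        result += (uint32_t)a;          /* add  r3, r3, r0 */
        count -= 1;                     /* subs r1, r1, #1 */
    } while (count != 0);               /* bne  asm_mul_loop */

    return (int32_t)result;             /* mov  r0, r3 */
}
```

Like the assembly loop, a negative b only terminates after the counter wraps around (roughly 4 billion iterations), although the final result is still correct modulo 2^32.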

To test your routine, add a declaration for asm_mul to main.c and call it within your main function. You can find the full sources and a Makefile for our examples on github.

Further Reading

As we will be covering function-call conventions in iOS in the next part, I recommend the following reading as preparation:

Written by 38leinad

April 13, 2011 at 6:56 pm

Posted in arm, tutorial


2 Responses


  1. Love your ARM asm tutorials! Very well written and extremely helpful. Look forward to the iOS integration Part III. 🙂


    July 29, 2011 at 9:16 am

