ARM Assembler for iOS: Part 2 – First Steps
As this series is mainly a way to write down my own learning experience and recap on it, I assume you have done your homework from Part 1 as well as I have: mainly, reading the Whirlwind tour of ARM Assembly (Sections 23.1 – 23.3 should be enough for now). From here, we will start to write our first assembly functions and get familiar with the toolchain. In detail, we will learn the following in this part:
- Writing a simple assembler function that uses simple instructions like mov and add
- Compiling/linking an assembler-file to C-code
- Using gdb to debug your assembly code
Recap
Let’s start with a quick summary of what you should have learned from the Whirlwind tour of ARM Assembly:
- There are basically three types of assembler instructions: data instructions (e.g. add, sub, mov, cmp), memory instructions (e.g. ldr, str, stmfd) and branch instructions (e.g. b, bl, bx)
- All instructions are 32 bits in size (except in THUMB mode, which has 16-bit instructions). This leads to the problem that an immediate value to be mov’ed into a register has only 12 bits available (20 bits are used to encode the rest of the instruction): 8 bits for a number n and 4 bits for a right-rotation r. Based on this encoding, the represented number is n ror (2*r). This allows a call like mov r0, #0x4, because 0x4 = 0x4 ror (2*0), but not mov r0, #0xfff0, because 0xfff0 has twelve significant bits and no rotation makes it fit into the 8 bits of n.
- The above limitation can be worked around by calculating the required value 0xfff0 with multiple assembler instructions or loading it from memory. ldr r0, =0xfff0 (note the “=”) does this implicitly: If representable by a so-called immediate value, it is a direct mov, but in the example it will be converted to a memory-load
- There are registers r0 to r15, where r13 to r15 have special purposes and also an alias that can be used in assembly (r13/sp: Stack Pointer, r14/lr: Link Register, r15/pc: Program Counter). Actually, r11 and r12 have special roles too, but we omit that here. r0-r3 and r12 are scratch registers that can be changed within a function call; everything else (r4-r11) has to be restored correctly in the epilogue of a called function.
- All assembler instructions can be executed conditionally. E.g. cmp r0, r1 followed by movlt r3, #4: movlt is only executed when r0 < r1 (less than). The cmp instruction sets the status flags that are read by the next instruction. Generally, every instruction (where it makes sense) can also update the status flags itself by appending an “s”; e.g. subs r0, r1, r2 (r0=r1-r2 and update status flags). Thus, the general form of a data statement is op{cond}{status}
- The Barrel Shifter can be applied to the last operand of a statement to shift/rotate its bits by a fixed amount or an amount given within a register. E.g. mov r0, r1, lsl #3 (in C-Syntax r1 << 3). This is basically free and is executed with the statement in one processor cycle and is preferable over multiplication operations whenever possible.
- Data instructions (mov, add, etc.) can only work on registers and immediate values, not on memory addresses; values in memory need to be loaded into a register via a memory operation first. Also, there is no operation for division, and multiplications are only possible on registers (not immediate values)
- Memory instructions (e.g. ldr r0, [r1]: load into register; in C-syntax: r0=*r1). The memory address defined by the bracketed statement ([]) can consist of a register, register + index register or register + immediate value, in either case with additional barrel-shifting. E.g. ldr r0, [r1, r2, lsl #4] (r0=*(r1 + (r2 <<4))). It can also be used to load half-words (2 bytes) or a single byte only. Append the following to the instruction for this: h (half-word), sh (signed half-word), b (byte), sb (signed byte).
- Memory addresses in an ldr-instruction do not fit in the instruction (same reason as for the above limitation on immediate values (only 12 bits)). But, assembler will transparently convert used memory labels to so-called PC-relative addresses (address is calculated based on the current program-counter by the assembler). Alternatives are memory-pools which are created when using ldr r0, =labelname (note the “=”); note: this only loads the address of labelname into r0; not the value!!!
- Bulk-memory operations like stm* and ldm* can be used to load a vector (array of memory values) from memory into a list of registers and to store it back, respectively. These operations are also used for pushing/popping parameters to/from the stack. There actually exist aliases for different stack implementations. Generally, the full-descending stack is used on the ARM architecture: i.e. the stack pointer (sp) points to the last used stack address and the stack grows toward lower memory addresses. You can see in the figure below what an stmfd statement is doing internally when storing r0 on the stack. Note also that the exclamation mark “!” is the auto-indexing feature of these memory operations: sp is automatically updated to the new top of the stack.
- Branch instructions are the GOTOs of assembly. With them you can change/redirect the flow of your assembly code. Mainly, we have “b LABELNAME” to jump to a labeled position in your assembly, and bl and bx to call subroutines and return again. “bl LABELNAME” will jump to a labeled point in your assembly code and set the lr register to the address of the instruction following the bl; when we want to return from our subroutine (like a “return” in C-code), we call “bx lr”, which is basically equivalent to “mov pc, lr”; meaning: restore the program counter to the saved value and thus continue with the next assembly instruction after the subroutine call. The only difference between “bx lr” and “mov pc, lr” is that the former is required for interoperability between normal ARM assembly and THUMB mode.
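To make the immediate-value rule from the recap a bit more tangible, here is a minimal sketch in C that checks whether a 32-bit constant can be written as n ror (2*r) with an 8-bit n and a 4-bit r. The function names rotate_left and is_arm_immediate are made up for this illustration and are not part of any toolchain; the assembler performs an equivalent check internally when it sees mov with an immediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by `amount` bits. */
static uint32_t rotate_left(uint32_t value, unsigned amount)
{
    amount &= 31u;
    if (amount == 0u) {
        return value;            /* avoid an undefined shift by 32 */
    }
    return (value << amount) | (value >> (32u - amount));
}

/*
 * Returns true if `value` equals n ror (2*r) for some 8-bit n and
 * some r in 0..15, i.e. if it fits the 12-bit immediate field.
 * Hypothetical helper, named for this example only.
 */
bool is_arm_immediate(uint32_t value)
{
    for (unsigned r = 0; r < 16; ++r) {
        /* Undo the right-rotation by 2*r; the rest must fit in 8 bits. */
        uint32_t n = rotate_left(value, 2u * r);
        if (n <= 0xFFu) {
            return true;
        }
    }
    return false;
}
```

With this sketch, is_arm_immediate(0x4) is true while is_arm_immediate(0xfff0) is false, which matches the mov example in the recap.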
The first assembler function
Let us write a first simple assembler function based on our already gained knowledge. It will be located in an own assembler-file and will be called from our C-code’s main-method.
Create a file asmlib.s with the following assembly-code:
@ ARM Assembler Test Library

@ int asm_sum(int a, int b)
.align 2                    @ Align to word boundary
.arm                        @ This is ARM code
.global asm_sum             @ This makes it a real symbol
asm_sum:                    @ Start of function definition
    add r2, r0, r1          @ Add up a (r0) and b (r1) and store result in r2
    mov r0, r2              @ Store sum (r2) in r0 which stores return-value
    mov pc, lr              @ Set program counter to lr (was set by caller)
We have defined a very simple assembler function to add the two arguments a and b (which are actually handed to the function in registers r0 and r1) and return the sum to the caller in r0. We will get into the details of the calling conventions in the next part of the series. For now, just take it for granted that the arguments are handed over in this way and the result is returned in r0. You should know by now that “asm_sum:” is a label for this instruction block; when jumped to, it will first execute the “add”, then the “mov”, and last reset the program counter (pc) to the next statement of the caller’s code (which was stored in lr by the caller).
There are some ARM assembler directives used at the beginning of the file to align the function label to the next word boundary (.align x means align to a 2^x byte boundary). In general, the ARM processor should be able to handle unaligned access as well, but the Apple documents specifically state that functions have to be aligned. To know the details on why aligned access is important, read this post. “.arm” defines this as ARM code and “.global” exposes the label as a global symbol. This is important so we can call this function from our C-code.
Some more information is given in the comments starting with “@”. One additional note: We could actually remove the “mov r0, r2” call if we had called “add r0, r0, r1” in the first place, but to learn assembly, it is not bad to have some more explicit code.
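As a small illustration of what “align to a 2^x byte boundary” means numerically, the following C sketch rounds an address up to the next multiple of 2^x. The helper name align_up is an assumption made for this example; it only mimics the arithmetic and is not how the assembler is implemented.

```c
#include <stdint.h>

/*
 * Round `address` up to the next multiple of 2^power.
 * Hypothetical helper for illustration; assumes power < 32.
 */
uint32_t align_up(uint32_t address, unsigned power)
{
    uint32_t mask = (UINT32_C(1) << power) - 1u;  /* e.g. power 2 gives 0x3 */
    return (address + mask) & ~mask;              /* clear the low bits */
}
```

For power = 2 (as in “.align 2”), an address like 0x8226 would be moved up to 0x8228, while an already aligned address stays unchanged.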
Next, we will call our defined function that returns the sum of its two arguments from our main-method in C-code. We use the following main.c for this:
#include <stdio.h>

extern int asm_sum(int a, int b);

int main(int argc, char *argv[])
{
    printf("== sum ==\n");
    int a = 71;
    int b = 29;
    printf("%d + %d = %d\n", a, b, asm_sum(a, b));
    return 0;
}
The only thing that you will notice is that we have to declare the asm_sum function as an external symbol, as we don’t define it here in our C-code.
You can now try to compile and link both files to an executable with the following calls (the last line is already for running our executable):
arm-elf-gcc -mcpu=arm7 -O2 -g -c asmlib.s -o asmlib.o
arm-elf-gcc -mcpu=arm7 -O2 -g -c main.c -o main.o
arm-elf-gcc -mcpu=arm7 -o armtest *.o -lc
arm-elf-run armtest
Running the program should show you the expected result; i.e. display the printf-statement in the stdout. Good job!
Debugging
Let’s step through our assembly code with the debugger. First, create a file named “.gdbinit” in the folder where the other two files are located and put in the following lines:
file armtest
target sim
load
This file will be loaded on startup of gdb and already set our executable, set the target architecture to the ARM simulator and load the code into memory. We can now do some simple debugging:
macbook:ARMAssembly_Part2 daniel$ arm-elf-gdb
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "--host=powerpc-apple-darwin6.8 --target=arm-elf".
Connected to the simulator.
Loading section .init, size 0x1c vma 0x8000
Loading section .text, size 0x8a3c vma 0x801c
Loading section .fini, size 0x18 vma 0x10a58
Loading section .rodata, size 0x248 vma 0x10a70
Loading section .data, size 0x8bc vma 0x10db8
Loading section .eh_frame, size 0x4 vma 0x11674
Loading section .ctors, size 0x8 vma 0x11678
Loading section .dtors, size 0x8 vma 0x11680
Loading section .jcr, size 0x4 vma 0x11688
Start address 0x811c
Transfer rate: 306272 bits in <1 sec.
(gdb) break asmlib.s:9
Breakpoint 1 at 0x8228: file asmlib.s, line 9.
(gdb) run
Starting program: /Users/daniel/Dev/Test/ARMAssembly_Part2/armtest
== sum ==

Breakpoint 1, asm_sum () at asmlib.s:9
9 mov r0, r2 @ Store sum (r2) in r0 which stores return-value
Current language: auto; currently asm
(gdb) print $r0
$1 = 71
(gdb) print $r1
$2 = 29
(gdb) print $r2
$3 = 100
(gdb) stepi
10 mov pc, lr @ Set program counter to lr (was set by caller)
(gdb) continue
Continuing.
71 + 29 = 100

Program exited normally.
[Switching to process 0]
Current language: auto; currently c
(gdb)
You can see that we first define a breakpoint at line 9 of the file asmlib.s, then we “run” the executable. The simulator stops at the breakpoint. “print $regname” allows us to print the content of a register. As expected, the function parameters one and two are stored in r0 and r1 (compare with the values set for a and b in main.c), and r2 already holds the sum computed by the “add” on the previous line. After that, we use “stepi” to step to the next instruction and then call “continue” so that normal execution proceeds and the program finally ends. These few gdb commands should already allow you to do some simple debugging. For more details on gdb, google is your friend 🙂
Some more instructions in a nutshell
Let’s write one more assembly function to get familiar with some more instructions. How about a multiply routine that returns the value a*b for two parameters a and b:
@ int asm_mul(int a, int b)
.align 2
.arm
.global asm_mul
asm_mul:
    stmfd sp!, {r4-r11}     @ in case we needed to work with more than registers r0-r3,
                            @ have to save the first on the stack (only r0-r3 and r12 are scratch registers)
                            @ Here, actually don't need them...
    mov r3, #0              @ Initialize register holding result of multiplication
    movs r2, r0             @ Move "a" into r2 and set status-flags (mov"s")
    beq asm_mul_return      @ Immediately return if a==0
    movs r2, r1             @ Move "b" into r2 and set status-flags (mov"s")
    beq asm_mul_return      @ Immediately return if b==0
asm_mul_loop:
    add r3, r3, r0          @ r3 = r3 + r0
    subs r1, r1, #1         @ r1 = r1 - 1 (decrement)
    bne asm_mul_loop        @ If the zero-flag is not set (r1 > 0), loop once more
asm_mul_return:
    ldmfd sp!, {r4-r11}     @ Restore the registers
    mov r0, r3              @ Store result in r0 (return register)
    mov pc, lr
Please note that we could have used assembly instructions to do the multiplication for us, but this way we can recap on some instructions we have learned more easily.
The algorithm implemented is basically: result = 0; if (a == 0 or b == 0) return result; else while (b > 0) { result = result + a; b = b - 1 } and should be quite straightforward. Here are some points to take away though:
- “subs” not only does a subtraction, but because we append the “s”, the status register will be set; specifically, the zero flag. If the result is not zero (ne = not equal: a little confusing, but think of it as “if the result is non-zero, the two operands to sub must be not equal”; or have a look at the table in Section 23.3.4 of the well-known tutorial guiding this series), we branch to the label asm_mul_loop.
- You see, that the status-register can be set by almost any data-instruction; also “mov” can be appended with “s” to set it. Here, we read out again the zero-flag.
- We use stmfd to store registers r4-r11 on the stack and restore them at the end of the routine via ldmfd. In general, you would always do this if you need to use more than registers r0-r3 in your routine. As r4-r11 are not scratch registers, the convention is that they hold the same values after a function call as before.
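The save/restore pair from the last point can be pictured with a toy model in C. This is only a sketch under simplifying assumptions: the stack is a small array of words, sp is an array index instead of a real address, and all names (toy_stack, toy_stmfd, toy_ldmfd) are invented for this illustration. It mirrors the full-descending behaviour: sp moves down before the registers are stored, and moves back up after they are loaded.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy model of a full-descending stack; not real hardware behaviour. */
struct toy_stack {
    uint32_t memory[STACK_WORDS];
    size_t sp;                 /* index of the last word that was pushed */
};

void toy_stack_init(struct toy_stack *s)
{
    s->sp = STACK_WORDS;       /* empty stack: sp sits just past the top */
}

/* Like "stmfd sp!, {...}": decrement sp, then store the registers.
 * The lowest-numbered register ends up at the lowest address.
 * No bounds checking; the caller must stay within STACK_WORDS. */
void toy_stmfd(struct toy_stack *s, const uint32_t *regs, size_t count)
{
    s->sp -= count;            /* the "!" write-back: sp is updated */
    for (size_t i = 0; i < count; ++i) {
        s->memory[s->sp + i] = regs[i];
    }
}

/* Like "ldmfd sp!, {...}": load the registers, then increment sp. */
void toy_ldmfd(struct toy_stack *s, uint32_t *regs, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        regs[i] = s->memory[s->sp + i];
    }
    s->sp += count;
}
```

Pushing eight values (standing in for r4-r11) moves sp down by eight words; popping them restores both the values and the original sp, which is exactly the property a called function relies on.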
To test your routine, you will have to add a method-declaration for asm_mul to main.c and call it within your main-method. You can find the full sources and a Makefile of our examples on github.
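One possible way to check asm_mul is to compare it against a plain C version of the same loop. The sketch below follows the control flow of the assembly (early return on zero, then add and decrement until the counter reaches zero). The name ref_mul is an assumption for this example, and unsigned arithmetic is used so that wrap-around is well defined in C.

```c
/*
 * C reference with the same control flow as asm_mul.
 * Hypothetical helper, intended only for comparison in a test.
 */
int ref_mul(int a, int b)
{
    unsigned int result = 0;                /* plays the role of r3 */
    unsigned int count = (unsigned int)b;   /* plays the role of r1 */

    if (a == 0 || b == 0) {
        return 0;                           /* the two beq early exits */
    }
    do {
        result += (unsigned int)a;          /* add r3, r3, r0 */
        count -= 1u;                        /* subs r1, r1, #1 */
    } while (count != 0u);                  /* bne asm_mul_loop */

    return (int)result;
}
```

In main.c you could then declare “extern int asm_mul(int a, int b);” and print both asm_mul(a, b) and ref_mul(a, b) side by side. Note that with this loop a negative b makes the counter wrap around, so it takes roughly four billion iterations before it reaches zero; small positive values of b are the sensible test inputs.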
Further Reading
As we will be covering function-call conventions in iOS within the next part, I recommend the following read as preparation:
- iOS ABI Function Call Guide by Apple


Love your ARM asm tutorials! Very well written and extremely helpful. Look forward to the iOS integration Part III. 🙂
jpap
July 29, 2011 at 9:16 am