This project rewrites the following functions in assembly language:
- strlen(3)
- strcpy(3)
- strcmp(3)
- strdup(3)
- read(2)
- write(2)
- and essential linked list functions and sort for fun.
A Makefile is required to build a library consisting of these functions.
A main function is required to test these functions.
- Conforms to System V AMD64 ABI calling convention.
- Must be able to generate a position-independent executable.
- Requires system call error handling (i.e. errno)
- x86 instructions
- UNIX system call interface (read and write)
- Intel assembly language syntax
- file format (Mach-O, ELF) to understand the structure of the machine code
- Apple M3, macOS 15
- Intel CPU, WSL2 Ubuntu 24.04
// simple.c
int main(int argc, char** argv)
{
return argc;
}
Compile it with `clang simple.c -S -O0 --target=x86_64-apple-darwin-macho`; this generates simple.s:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 14, 5 sdk_version 14, 5
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
movl $0, -4(%rbp)
movl %edi, -8(%rbp)
movq %rsi, -16(%rbp)
movl -8(%rbp), %eax
popq %rbp
retq
.cfi_endproc
## -- End function
.subsections_via_symbols
Things I didn't know about:
- `.cfi_startproc`
- `.cfi_def_cfa_offset 16`
- `.cfi_def_cfa_register %rbp`
- `.cfi_endproc`
- `.subsections_via_symbols`

Things I partially knew or could relate to something I already knew:
- `.section ...`
- `.build_version ...`
- `.globl _main`
- `.p2align ...`
reference
https://sourceware.org/binutils/docs/as/CFI-directives.html
blog
https://www.imperialviolet.org/2017/01/18/cfi.html
CFI directives are enabled by the `-fasynchronous-unwind-tables` compiler option and disabled by `-fno-asynchronous-unwind-tables`.
[OS X Assembler Reference - Directives for Dead-Code Stripping](https://developer.apple.com/library/archive/documentation/DeveloperTools/Reference/Assembler/040-Assembler_Directives/asm_directives.html)\
`.subsections_via_symbols` tells the static link editor that the sections of the object file can be divided into individual blocks.
[OS X Assembler Reference - Directives for Designating the Current Section](https://developer.apple.com/library/archive/documentation/DeveloperTools/Reference/Assembler/040-Assembler_Directives/asm_directives.html)\
`.section segname, sectname [[[, type], attribute], sizeof_stub]`
segname = \_\_TEXT\
sectname = \_\_text\
type = regular\
attribute = pure_instructions
[OS X Assembler Reference - Section Types and Attributes](https://developer.apple.com/library/archive/documentation/DeveloperTools/Reference/Assembler/040-Assembler_Directives/asm_directives.html#//apple_ref/doc/uid/TP30000823-CJBIFBJG)\
A `regular` section may contain any kind of data.
`pure_instructions` means the section contains nothing but machine instructions.
https://forums.developer.apple.com/forums/thread/736942
I could not find any documentation for the `.build_version` directive.
I assume it is a miscellaneous directive, since the generated assembly code already targets a specific architecture.
[OS X Assembler Reference - Directives for Dealing With Symbol](https://developer.apple.com/library/archive/documentation/DeveloperTools/Reference/Assembler/040-Assembler_Directives/asm_directives.html)\
`.globl symbol_name`\
This directive makes symbol_name external.
[OS X Assembler Reference - Directives for Moving the Location Counter](https://developer.apple.com/library/archive/documentation/DeveloperTools/Reference/Assembler/040-Assembler_Directives/asm_directives.html)\
`.p2align align_expression [, 1_byte_fill_expression [, max_bytes_to_fill]]`\
Aligns the location counter to 2^{align_expression} bytes and fills the space between the current and the next location counter with {1_byte_fill_expression}.
If the space to fill is larger than {max_bytes_to_fill}, the directive does nothing.
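
The alignment arithmetic can be sketched in C. This only illustrates the rule as described here; it is not how any assembler implements it. The helper name `p2align_padding` and its parameters are hypothetical, and treating `max_fill == 0` as "no limit" is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: number of fill bytes that `.p2align power` needs
 * when the location counter is at `location`. Returns 0 when the counter
 * is already aligned or when the padding would exceed `max_fill`.
 * Assumption of this sketch: max_fill == 0 means "no limit". */
uint64_t p2align_padding(uint64_t location, unsigned power, uint64_t max_fill)
{
    uint64_t alignment = (uint64_t)1 << power;   /* 2^power bytes */
    uint64_t padding = (alignment - (location & (alignment - 1))) & (alignment - 1);

    if (max_fill != 0 && padding > max_fill)
        return 0;                                /* the directive does nothing */
    return padding;
}
```

With `.p2align 4, 0x90` and the location counter at 0x1003, this gives 13 fill bytes of `0x90`, the x86 `nop` opcode.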
NASM - Assembler Directives - Section
NASM - Output Formats - Mach Object
GAS - Sections and Relocation
GAS - .section name
Compiled simple.c with
clang simple.c -S -mllvm --x86-asm-syntax=intel -O3 --target=x86_64-apple-darwin-macho -fno-asynchronous-unwind-tables
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 14, 5 sdk_version 14, 5
.intel_syntax noprefix
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
## %bb.0:
push rbp
mov rbp, rsp
mov eax, edi
pop rbp
ret
## -- End function
.subsections_via_symbols
Some assembler directives are still unclear. Instead of digging further into what they are, I moved on to NASM directives, because I must write assembly code that NASM can assemble.
- `.section` (text section): `section .text`
- `.globl` (external symbol): `global symbol_name`
- `.p2align power_of_2` (alignment directive): `align bytes, optional` or `alignb bytes`
section .text
global _main
align 16
_main:
push rbp
mov rbp, rsp
mov eax, edi
pop rbp
ret
- Function parameters: `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`
- Callee saved: `rbx`, `rsp`, `rbp`, `r12`, `r13`, `r14`, `r15`
- Caller saved: function parameters + `r10`, `r11`
- Return: `rax`
x86_64 cheatsheet in AT&T syntax
Intel assembly instruction syntax: inst [reg0, reg1, ...]
If there is more than one operand, reg0 is the destination and the remaining operands are sources.
- `push rbp`: `rbp` is the base pointer and a callee-saved register. `push` subtracts the size of `rbp` (8 bytes) from `rsp` and stores `rbp` at the new top of the stack pointed to by `rsp`.
- `mov rbp, rsp`: Sets `rbp` to `rsp`. The new base pointer is the current stack pointer.
- `mov eax, edi`: `rdi` contains the value of `int argc` according to the calling convention. This sets the return value register, `eax`, to `int argc` before returning.
- `pop rbp`: Before returning, restores the callee-saved `rbp` from the top of the stack and increments `rsp`.
- `ret`: Near return. Jumps to the address located at the top of the stack and pops it from the stack.
- Assemble command: `nasm -f macho64 simple.s`
- Link command with `clang`: `clang simple.o -target=x86_64-apple-darwin-macho`
  `ld: warning: no platform load command found in '/Users/hseong/assembly_practice/lib/simple_nasm.o', assuming: macOS`
  The warning disappears if `-Wl,-ld_classic` is appended to the command. `ld-classic` can be found in the macOS man page.
- Link commands with `ld`:
  - `ld simple.o`
    error: `Missing -platform_version option`
  - `ld simple.o -platform_version macos 14.5 14.5`
    warning: `no platform load command found in '/Users/hseong/assembly_practice/lib/simple_nasm.o', assuming: macOS`
  - `ld simple.o -platform_version macos 14.5 14.5 -ld_classic`
    error: `dynamic executables or dylibs must link with libSystem.dylib for architecture x86_64`
    Check the implicit linker options that the working command `clang simple.o -target=x86_64-apple-darwin-macho -v` passes to `ld`, using the `-v` option of `clang`.
  - `ld simple.o -platform_version macos 14.5 14.5 -ld_classic -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -lSystem`
    This command works without warnings or errors.
  - `ld simple.o -ld_classic`
    This command works fine.

  Using `-ld_classic` is necessary to avoid the warning.
- Every executable, generated with or without the warning, works as expected.
- There are mandatory targets: `LIBRARY_NAME`, `all`, `clean`, `fclean`, `re`.
- Use the `uname` command to determine the build environment (e.g. OS and architecture) and the assemble and link options.
- Add a tester target that compiles the main function and the test functions and links them with the library.
- Examine the final machine code by running the `objdump --disassemble --disassembler-options=intel --syms FILENAME > FILENAME.s` command.
- clang on macOS automatically prepends an underscore to global symbols.
- The code could not be assembled for both systems because the symbol names were inconsistent: macOS requires a leading underscore on global symbols, but Linux does not.
- Used a NASM predefined macro and a multi-line macro to resolve the issue.
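
On the C side, the same platform difference is visible through the `__USER_LABEL_PREFIX__` macro that GCC and clang predefine. The sketch below only reports the prefix; the function name `user_label_prefix` and the two stringification macros are hypothetical names chosen for this example, and this is not the NASM macro used in this project.

```c
/* Two-step stringification so the macro is expanded before `#` is applied. */
#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

/* Returns the prefix the C compiler puts in front of global symbols:
 * "_" for Mach-O targets such as macOS, "" for ELF targets such as Linux. */
const char *user_label_prefix(void)
{
    return STRINGIFY(__USER_LABEL_PREFIX__);
}
```

Printing the result on each build machine is a quick way to confirm which symbol name the assembly side has to export.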
Function signature: size_t strlen(const char *s);
Returns the number of bytes between s and the address of the first null terminator.
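
As a reference to compare the assembly version against, the pointer-difference definition can be sketched in C. The name `ref_strlen` is hypothetical and chosen to avoid clashing with the libc symbol.

```c
#include <stddef.h>

/* Reference sketch: the length is the distance between `s` and its terminator. */
size_t ref_strlen(const char *s)
{
    const char *end = s;

    while (*end != '\0')        /* walk to the first null terminator */
        end++;
    return (size_t)(end - s);   /* pointer difference, in bytes */
}
```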
- `size_t` is defined as `unsigned long` according to `machine/types.h` of the macOS SDK.
  - The C implementation on this machine uses LP64, which means `long int` and `unsigned long int` are 64-bit types.
  - The return register will be `rax`.
- `global strlen` to make it an external symbol.
- `const` is a type qualifier provided by the compiler.
  - Need to test how `const` affects the generated assembly code.
- `char*` is a pointer type, which should be 64 bits wide because `ptrdiff_t` is `long int`.
  - The register will be `rdi`, according to this.
- minuend - subtrahend
- example
mov al, byte [rsp]
- For C source code, the symbol `strlen` is automatically prefixed with `_` when compiled.
  - Use `_strlen` as the assembly symbol name.
  - Why does this happen?
    - GCC has the option `-fleading-underscore` and its counterpart `-fno-leading-underscore`. clang does not have this option and seems to use a leading underscore by default.
    - C compilers prepend an underscore to generated identifiers to avoid name collisions between assembly code and C code that use the same symbol. Stack overflow - What is the reason function names are prefixed with an underscore by the compiler
    - NASM has preprocessor support for global and local symbol prefixes, which is useful for mangling symbols.
- The `lea` instruction loads the effective address of the source into the destination. The width of the destination should be larger than or equal to that of the source.
  - It seems that the instruction is used when the source is a memory indirection.
Function signature: char * strcpy(char * dst, const char * src);
Returns dst after copying the bytes in [src, src_end] to [dst, dst + (src_end - src)], where src_end is the address of the first null terminator, so the terminator is copied as well.
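
A C reference for the copy, including the terminating null byte, might look like the sketch below. The name `ref_strcpy` is hypothetical.

```c
/* Reference sketch: copy bytes up to and including the null terminator. */
char *ref_strcpy(char *dst, const char *src)
{
    char *out = dst;

    while ((*out++ = *src++) != '\0')   /* the terminator is copied before the loop stops */
        ;
    return dst;                         /* the original destination pointer */
}
```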
- The `ret` instruction in strcpy behaved as if it were the `ret` of the main function, which means the process exited when strcpy returned.
  - Fix: `mov rsp, rbp` -> `mov rbp, rsp`
  - The top of the stack pointed to the stack frame of the main function.
Function signature: int strcmp(const char * s1, const char * s2);
Returns the difference between the first mismatched characters of the two null-terminated strings.
loop:
...
cmp dl, 0
setz al
; int a = (dl == 0);
cmp cl, 0
setz al
; a = (cl == 0);
cmp dl, cl
setne al
; a = (dl != cl);
cmp al, 1
jne loop
The value of al depends only on dl != cl, because each set instruction overwrites the previous result. The assignments to a must be bitwise OR assignments.
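
The intended loop condition, with the three tests combined by bitwise OR, can be sketched in C. The name `ref_strcmp` is hypothetical, and the variable names `dl` and `cl` only mirror the registers used in the snippet.

```c
/* Reference sketch: stop at the first terminator or the first mismatch. */
int ref_strcmp(const char *s1, const char *s2)
{
    unsigned char dl;
    unsigned char cl;
    int stop;

    do {
        dl = (unsigned char)*s1++;
        cl = (unsigned char)*s2++;
        stop = (dl == 0);        /* first test initializes the flag */
        stop |= (cl == 0);       /* later tests are OR-ed in, not assigned */
        stop |= (dl != cl);
    } while (stop != 1);
    return (int)dl - (int)cl;    /* difference of the first mismatched bytes */
}
```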
Function signature: char * strdup(const char * s);
Returns a newly allocated null-terminated string that is a copy of the parameter s.
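
A C reference for this behavior is sketched below. The name `ref_strdup` is hypothetical; using `strlen` and `memcpy` from libc is a shortcut for this sketch, whereas the assembly version would call its own routines.

```c
#include <stdlib.h>
#include <string.h>

/* Reference sketch: allocate strlen(s) + 1 bytes and copy the string into them. */
char *ref_strdup(const char *s)
{
    size_t size = strlen(s) + 1;   /* include the null terminator */
    char *copy = malloc(size);

    if (copy == NULL)
        return NULL;               /* on POSIX systems malloc sets errno to ENOMEM */
    memcpy(copy, s, size);
    return copy;                   /* the caller must free() the result */
}
```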
System call functions like read(2) are libc functions that wrap assembly instructions. I need to rewrite them with the special instructions that enter kernel space at a higher privilege level.
- macOS system libraries are linked with the linker options `-lSystem -syslibroot PATH`.
- The `find PATH -name 'libSystem*'` command lists `usr/lib/libSystem.tbd` and similar names.
- `.tbd` files are text-based stub libraries, according to a stack overflow post.
  - It contains the actual path `/usr/lib/system/` where the binary libraries are found.
  - There are `libsystem_kernel.dylib`, `libsystem_platform.dylib`, `libsystem_pthread.dylib`. The platform and pthread libraries are related to concurrency, and the kernel library contains the rest, including system calls such as read and write.
- `libsystem_kernel.dylib` contains user-level assembly code written in Apple's custom ISA based on ARMv8-A (until M3).
  - The user-space `read` implementation is obtained with the command `objdump --disassemble -t libsystem_kernel.dylib`
_read:
1da0: 70 00 80 d2 mov x16, #3
; system call number
1da4: 01 10 00 d4 svc #0x80
; SuperVisor Call
1da8: e3 00 00 54 b.lo 0x1dc4
; branch unsigned lower: branch if Carry = 0 (unsigned lower) in status register
1dac: fd 7b bf a9 stp x29, x30, [sp, #-16]!
; store pair of registers, x29 at [sp - 16] and x30 at [sp - 8] and set sp = sp - 16.
1db0: fd 03 00 91 mov x29, sp
; store sp
1db4: 00 03 00 94 bl _cerror
; branch and link, similar to x86 call
1db8: bf 03 00 91 mov sp, x29
; restore sp
1dbc: fd 7b c1 a8 ldp x29, x30, [sp], #16
; load pair of register
1dc0: c0 03 5f d6 ret
1dc4: c0 03 5f d6 ret
; duplicate ret instructions?
Exceptions on macOS
Arm A-profile A64 Instruction Set Architecture
Mac OS X and iOS Internals\
syscall table creation script - xnu/bsd/kern/makesyscalls.sh
syscall master file - xnu/kern/syscalls.master
interrupt(), kernel_trap(), user_trap() - xnu/osfmk/i386/trap.c
trap types - xnu/osfmk/i386/trap.h
system call classes - xnu/osfmk/mach/i386/syscall_sw.
unix_syscall() - xnu/bsd/dev/i386/systemcalls.c
read() - xnu/bsd/kern/sys_generic.c
- TSD: Thread Specific Data
- FLEH: First Level Exception Handler
- SLEH: Second Level Exception Handler
A system call trap is triggered with the syscall instruction, and the trap handler uses the system call number stored in rax to determine which system call to execute.
Function signature: ssize_t read(int fd, void* buf, size_t nbyte);
Returns the number of bytes read into the buffer, or -1 if an error occurs, in which case the global variable errno is set to the error number.
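
The error contract can be observed from C with the libc `read`. The helper name `errno_from_bad_read` is hypothetical; it passes an invalid file descriptor, for which `read` fails with `EBADF`.

```c
#include <errno.h>
#include <unistd.h>

/* Calls read(2) on an invalid descriptor and reports the errno it sets.
 * Returns 0 if the call unexpectedly succeeds. */
int errno_from_bad_read(void)
{
    char buf[1];

    errno = 0;
    if (read(-1, buf, sizeof buf) == -1)   /* -1 is never a valid descriptor */
        return errno;
    return 0;
}
```

Running the same check against the assembly implementation is a simple way to test that it sets errno the way libc does.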
NASM 'rel' keyword and position independent code
extern int * __error(void);
#define errno (*__error())
___error:
1bf0: 68 d0 3b d5 mrs x8, TPIDRRO_EL0
; mrs: Move to Register from Special register
; TPIDRRO_EL0 register contains pointer to thread-local storage
1bf4: 08 05 40 f9 ldr x8, [x8, #8]
; load thread-local variable
1bf8: 09 02 00 f0 adrp x9, 67 ; 0x44000
; set x9 to address(0x44000) of page the label belongs to
1bfc: 29 01 02 91 add x9, x9, #128
1c00: 1f 01 00 f1 cmp x8, #0
1c04: 20 01 88 9a csel x0, x9, x8, eq
; csel: Conditional Select
; x0 = eq ? x9 : x8
1c08: c0 03 5f d6 ret
According to the syscall functions in xnu/bsd/dev/i386/systemcalls.c, the carry bit in the FLAGS register is set if an error occurred during the system call. Check the carry bit to decide whether to return the number of bytes read or to set errno and return -1.
Linux kernel x86_64 system call table - linux/arch/x86/entry/syscalls/syscall_64.tbl
On Linux x86_64, the system call return value is stored in rax. If it is negative, the system call failed and the value equals -errno.
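
Both error conventions can be written down as plain C, which may help when checking the assembly wrappers. This is a sketch under assumptions: the function names are hypothetical, the raw register value and the carry flag are passed in as ordinary parameters, and the Linux check uses the range -4095..-1 that the kernel reserves for error codes.

```c
#include <errno.h>

/* Linux x86_64: an error comes back as a negative errno value in rax. */
long linux_syscall_result(long rax)
{
    if (rax < 0 && rax >= -4095) {
        errno = (int)-rax;
        return -1;
    }
    return rax;
}

/* macOS x86_64: the carry flag signals an error and rax holds the errno value. */
long darwin_syscall_result(long rax, int carry_flag)
{
    if (carry_flag) {
        errno = (int)rax;
        return -1;
    }
    return rax;
}
```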
Function signature: int atoi_base(char* str, char* base);
Converts str to an integer, where base lists the digit characters of the number system and its length is the radix.
With base = "01", str will be interpreted as binary.
With base = "0123456789abcdef", str will be interpreted as hexadecimal number.
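
A C reference for this conversion is sketched below. The names `ref_atoi_base` and `digit_value` are hypothetical. Skipping leading whitespace, accepting a single sign, and returning 0 for a base shorter than two characters are assumptions of this sketch, not documented behavior of the assembly version.

```c
/* Index of `c` in `base`, or -1 if `c` is not a digit of that base. */
static int digit_value(char c, const char *base)
{
    for (int i = 0; base[i] != '\0'; i++)
        if (base[i] == c)
            return i;
    return -1;
}

int ref_atoi_base(const char *str, const char *base)
{
    int radix = 0;
    int sign = 1;
    int sum = 0;
    int digit;

    while (base[radix] != '\0')                       /* radix = strlen(base) */
        radix++;
    if (radix < 2)
        return 0;                                     /* assumption: invalid base yields 0 */
    while ((*str >= 9 && *str <= 13) || *str == 32)   /* skip whitespace */
        str++;
    if (*str == '-' || *str == '+') {                 /* assumption: one optional sign */
        if (*str == '-')
            sign = -1;
        str++;
    }
    while ((digit = digit_value(*str, base)) >= 0) {  /* stops at '\0' or a non-digit */
        sum = sum * radix + digit;
        str++;
    }
    return sign * sum;
}
```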
Implementing complex logic requires many registers and function calls. If persistent variables live in caller-saved registers, they have to be saved to the stack around every call. To avoid this, use callee-saved registers for persistent variables such as count, sum, etc.
push rbp
mov rbp, rsp
sub rsp, 16 ; reserve a slot for rbx; 16 instead of 8 keeps rsp 16-byte aligned for calls
mov [rsp], rbx
xor rbx, rbx ; use rbx to store count
...
mov rax, rbx ; return count
mov rbx, [rsp]
mov rsp, rbp
pop rbp
ret
Minimize the use of cmp and jump instructions.
// is_space.c
int is_space(char c)
{
return (c >= 9 && c <= 13 || c == 32);
}
// is_space.s
...
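mov rcx, rdx ; a variable shift count must be in cl (c is assumed to be in rdx)
mov rax, 1 ; a = 1;
shl rax, cl ; a <<= c; (the count is taken modulo 64, so c must be below 64)
mov rdx, 0x100003e00 ; d = (1 << 32) | (0b11111 << 9);
and rax, rdx ; a &= d; nonzero if c is a space character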
...
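
The bitmask idea can be sketched in C as follows. The name `is_space_mask` is hypothetical. The range guard is part of this sketch: shifting by 64 or more is undefined in C, and x86 takes the shift count modulo 64, so characters such as `'I'` (73) would otherwise alias bit 9.

```c
#include <stdint.h>

/* Branch-free whitespace test: bit n of the mask is set when n is 9..13 or 32. */
int is_space_mask(char c)
{
    const uint64_t mask = ((uint64_t)1 << 32) | ((uint64_t)0x1f << 9);
    unsigned char u = (unsigned char)c;

    /* (u & 63) keeps the shift defined; (u < 64) rejects aliased values. */
    return (u < 64) & (int)((mask >> (u & 63)) & 1);
}
```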
Function signature: void list_sort(t_list** head, int (*cmp)(void*, void*));
Sorts the linked list pointed to by head, using cmp to compare elements.
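
A C merge sort with the same signature is sketched below as a reference to compare against. The `t_list` layout (a `data` pointer followed by `next`) is an assumption, since the real definition is not shown here, and `ref_list_sort`, `merge`, and `cmp_int` are hypothetical names.

```c
#include <stddef.h>

/* Assumed node layout; the real t_list definition may differ. */
typedef struct s_list {
    void *data;
    struct s_list *next;
} t_list;

/* Merge two sorted lists, keeping equal elements in their original order. */
static t_list *merge(t_list *a, t_list *b, int (*cmp)(void *, void *))
{
    t_list head;
    t_list *tail = &head;

    head.next = NULL;
    while (a != NULL && b != NULL) {
        if (cmp(a->data, b->data) <= 0) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;   /* append whatever is left */
    return head.next;
}

void ref_list_sort(t_list **head, int (*cmp)(void *, void *))
{
    t_list *slow;
    t_list *fast;
    t_list *second;

    if (head == NULL || *head == NULL || (*head)->next == NULL)
        return;                         /* zero or one node: already sorted */
    slow = *head;
    fast = (*head)->next;
    while (fast != NULL && fast->next != NULL) {   /* slow stops at the midpoint */
        slow = slow->next;
        fast = fast->next->next;
    }
    second = slow->next;
    slow->next = NULL;                  /* split the list into two halves */
    ref_list_sort(head, cmp);
    ref_list_sort(&second, cmp);
    *head = merge(*head, second, cmp);
}

/* Example comparator for nodes whose data points to an int. */
int cmp_int(void *a, void *b)
{
    int x = *(int *)a;
    int y = *(int *)b;

    return (x > y) - (x < y);
}
```

Note that `cmp` returns `int`, so only the low 32 bits of the return register are meaningful to the caller.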
- instruction step over failed (Could not create return address breakpoint. Return address did not point to executable memory).
- executable stack Gentoo Wiki - GNU stack Stack overflow post
- PLT (Procedure Linkage Table) for PIC
- NASM - ELF
A warning occurs when the -Wall option is applied:
warning: 32-bit relative section-crossing relocation [-w+reloc-rel-dword]
It appears when NASM assembles files that call external functions such as malloc, free, ___errno.
- NASM appendix A Relative relocation that could not be resolved at assembly time was generated in the output format.
- Optimization is needed when compiling the linked list merge sort written in C, in order to compare it with the assembly version.
- But list_sort does not work if optimization option `-O1` or higher is applied.
  - Found a bug: `rax` was used where `eax` must be.