Skip to content

librasan: Support patching Thumb functions #3176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 3, 2025
Merged

Conversation

wfdewith
Copy link
Contributor

Description

ARMv7 has two instruction sets: Thumb2 and ARM. ARM CPUs switch to Thumb state when they jump to a target address if the least significant bit of the address is set to 1. While libc can be compiled with ARM or Thumb instructions entirely, mixed instruction sets are also possible. This can happen when libc is compiled for ARM but has some handwritten Thumb assembly implementations for certain functions for performance reasons.

The following command shows a libc.so.6 where two symbols have an address with the least significant bit set to 1.

$ readelf --syms --wide libc.so.6 | awk '$2~/.*[13579bde]$/'
Symbol table '.dynsym' contains 2208 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
   523: 0007bb21   156 FUNC    GLOBAL DEFAULT   12 memchr@@GLIBC_2.4
   769: 0007a641   220 FUNC    GLOBAL DEFAULT   12 strlen@@GLIBC_2.4

This poses a problem with the current live-patching implementation, which assumes that all libc functions are always ARM functions. In the current situation, the live patcher writes the assembly trampoline consisting of ARM instructions into a function where Thumb instructions are expected, offset by a single byte. This obviously leads to incorrect behavior once such a function is called.

This PR fixes that by checking whether the target address points to a Thumb functions and generates a Thumb trampoline which it writes to the correct address (i.e. address - 1).

The ARMv7-A manual also says the following.

A processor in ARM state executes ARM instructions, and a processor in Thumb state executes Thumb instructions. A processor in Thumb state can enter ARM state by executing any of the following instructions: BX, BLX, or an LDR or LDM that loads the PC.

A processor in ARM state can enter Thumb state by executing any of the same instructions.

In ARMv7, a processor in ARM state can also enter Thumb state by executing an ADC, ADD, AND, ASR, BIC, EOR, LSL, LSR, MOV, MVN, ORR, ROR, RRX, RSB, RSC, SBC, or SUB instruction that has the PC as destination register and does not set the condition flags

This permits calls and returns between ARM code written for ARMv4 processors and Thumb code running on ARMv7 processors to function correctly. ARM recommends that new software uses BX or BLX instructions instead. In particular, ARM recommends that software uses BX LR to return from a procedure, not MOV PC, LR.

Which means we have to use bx ip instead of mov pc, ip in Thumb state. For consistency and following the recommendation, I've also changed the ARM trampoline to use bx ip.

Checklist

  • I have run ./scripts/precommit.sh and addressed all comments

@wfdewith
Copy link
Contributor Author

I made an attempt to write tests for this, but unfortunately, this is probably not possible, since calling a Thumb function with invalid instructions leads to a SIGABRT or SIGILL, which cannot easily be caught in a unit test. Compiling a function with Thumb instructions requires a nightly feature in Rust, but that on its own isn't a problem because we're already using nightly for tests.

#![cfg_attr(target_arch = "arm", feature(arm_target_feature))]
#[cfg(test)]
#[cfg(feature = "libc")]
#[cfg(target_arch = "arm")]
mod tests {
    use asan::{
        GuestAddr, expect_panic,
        mmap::{Mmap, MmapProt, linux::LinuxMmap},
        patch::{Patch, raw::RawPatch},
    };
    use log::info;

    #[unsafe(no_mangle)]
    #[target_feature(enable = "thumb-mode")]
    extern "C" fn test1(a1: usize, a2: usize, a3: usize, a4: usize, a5: usize, a6: usize) -> usize {
        assert_eq!(a1, 1);
        assert_eq!(a2, 2);
        assert_eq!(a3, 3);
        assert_eq!(a4, 4);
        assert_eq!(a5, 5);
        assert_eq!(a6, 6);
        return 0xdeadface;
    }

    #[unsafe(no_mangle)]
    extern "C" fn test2(a1: usize, a2: usize, a3: usize, a4: usize, a5: usize, a6: usize) -> usize {
        assert_eq!(a1, 1);
        assert_eq!(a2, 2);
        assert_eq!(a3, 3);
        assert_eq!(a4, 4);
        assert_eq!(a5, 5);
        assert_eq!(a6, 6);
        return 0xd00df00d;
    }

    #[test]
    fn test_patch() {
        let ret1 = unsafe { test1(1, 2, 3, 4, 5, 6) };
        assert_eq!(ret1, 0xdeadface);

        let ret2 = test2(1, 2, 3, 4, 5, 6);
        assert_eq!(ret2, 0xd00df00d);

        let ptest1 = test1 as *const () as GuestAddr;
        let ptest2 = test2 as *const () as GuestAddr;
        info!("pfn: {:#x}", ptest1);
        let aligned_pfn = ptest1 & !0xfff;
        info!("aligned_pfn: {:#x}", aligned_pfn);
        LinuxMmap::protect(
            aligned_pfn,
            0x4096,
            MmapProt::READ | MmapProt::WRITE | MmapProt::EXEC,
        )
        .unwrap();

        RawPatch::patch(ptest1, ptest2).unwrap();
        // This will raise SIGABRT or SIGILL
        unsafe { test1(1, 2, 3, 4, 5, 6) };
    }
}

@WorksButNotTested
Copy link
Collaborator

Cool. Sounds good. Let's ignore the negative tests (ones we expect to crash) and just make sure we have tests to show we can hook both ARM and THUMB functions successfully.

@wfdewith wfdewith force-pushed the thumb branch 2 times, most recently from 81946e1 to 42a4660 Compare April 28, 2025 17:51
@wfdewith
Copy link
Contributor Author

I was so focused on getting a failing test case that I completely forgot about actually writing a test case for the more important working cases. I have used some macros to avoid repetitiveness.

@domenukk
Copy link
Member

domenukk commented May 2, 2025

@WorksButNotTested looks good?

@WorksButNotTested
Copy link
Collaborator

I think the changes in raw.rs can be simplified a bit to...

fn patch(target: GuestAddr, destination: GuestAddr) -> Result<(), Self::Error> {
    debug!("patch - addr: {:#x}, target: {:#x}", target, destination);
    if target == destination {
        Err(RawPatchError::IdentityPatch(target))?;
    }

    let patch = Self::get_patch(destination)?;
    trace!("patch: {:02x?}", patch);

    trace!("patch: {:02x?}", patch);

    /* Mask the thumb mode indicator bit */
    #[cfg(target_arch = "arm")]
    let target = target & !1;

    let dest = unsafe { from_raw_parts_mut(target as *mut u8, patch.len()) };
    dest.copy_from_slice(&patch);
    Ok(())
}

#[cfg(target_arch = "arm")]
fn get_patch(destination: GuestAddr) -> Result<Vec<u8>, RawPatchError> {
    // ldr ip, [pc, #2]
    // bx ip
    // .long 0xdeadface

    /* If our target is in thumb mode */
    if target & 1 == 1 {
        let insns = [
            [0xdf, 0xf8, 0x02, 0xc0].to_vec(),
            [0x60, 0x47].to_vec(),
            [0xce, 0xfa, 0xad, 0xde].to_vec(),
        ];
        let addr = destination.to_ne_bytes().to_vec();
        let insns_mod = [&insns[0], &insns[1], &addr];
        Ok(insns_mod.into_iter().flatten().cloned().collect())
    } else {
        let insns = [
            [0x00, 0xc0, 0x9f, 0xe5].to_vec(),
            [0x1c, 0xff, 0x2f, 0xe1].to_vec(),
            [0xce, 0xfa, 0xad, 0xde].to_vec(),
        ];
        let addr = destination.to_ne_bytes().to_vec();
        let insns_mod = [&insns[0], &insns[1], &addr];
        Ok(insns_mod.into_iter().flatten().cloned().collect())
    }
}

@WorksButNotTested
Copy link
Collaborator

Think I've got my head around the macros in the tests now. They look good. Only thing I'd change on that is that its probably worth resetting the page permissions after and probably worth modifying two pages rather than one in case the function ends up spanning a page boundary. But I made those some mistakes in my original implementation, so I can understand why they got copied!

@wfdewith
Copy link
Contributor Author

wfdewith commented May 2, 2025

Think I've got my head around the macros in the tests now. They look good. Only thing I'd change on that is that its probably worth resetting the page permissions after and probably worth modifying two pages rather than one in case the function ends up spanning a page boundary. But I made those some mistakes in my original implementation, so I can understand why they got copied!

This makes sense. I'll also update the test_arm[123] function naming to improve clarity on what their purpose is like for example: arm_patched_to_arm, thumb_patch_target etc.

@wfdewith
Copy link
Contributor Author

wfdewith commented May 2, 2025

Should be good to go now

@wfdewith
Copy link
Contributor Author

wfdewith commented May 2, 2025

Also, the page size was set to 0x4096, while it is usually 0x1000 or 4096 😉

@WorksButNotTested
Copy link
Collaborator

Are you happy to make these changes too?
#3176 (comment)

@wfdewith
Copy link
Contributor Author

wfdewith commented May 2, 2025

I like it better, but it does require passing the target parameter to get_patch for all architectures. It is a bit more uniform like this. Note that the disassembly is slightly different for ARM and Thumb mode: Thumb mode requires an offset for the PC.

@WorksButNotTested
Copy link
Collaborator

@domenukk, @rmalmain this is good to land when the CI goes green.

// bx ip
// .long 0xdeadface
let insns = [
[0xdf, 0xf8, 0x02, 0xc0].to_vec(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these things have to be vecs? I feel like it could be some static values

@domenukk domenukk merged commit 0ddc5f1 into AFLplusplus:main May 3, 2025
104 of 108 checks passed
@wfdewith wfdewith deleted the thumb branch May 3, 2025 04:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants