Allocate PHT & SHT at the end of the *.elf file #544
173b14f to da4b0b0
Btw, I've got another merge request in the works, one that fixes #75 and hopefully will be able to handle #520 as well, but I've ultimately decided to move it out of this pull request, since it's technically a separate changeset (and less important, at least from my point of view) -- I'll prepare it in the upcoming weeks.
Another issue is that Definitely seems like a
5f982f1 to 8dad294
@Patryk27 rebased CI and it looks like this causes several regressions on non-x86 platforms.
It should be possible to reproduce this locally using
Status: got patch v2.0 in the works - a bit different, simpler approach. Hopefully will submit it in a couple of hours.
Nope, still fails on other architectures (I'm testing using binfmt) - currently hitting
Got it! -- but the situation is... messy. It seems that my fix is actually correct for all of the platforms - the tests fail due to a bug in qemu-user that stems from a bug in Linux. You see, the bug mentioned here, which is the reason why we try to keep PHT at the beginning: https://github.com/NixOS/patchelf/blob/769337c227799aa60911562b6940530f4a86eb3c/BUGS ... got fixed relatively recently, just in 2022-04: https://lore.kernel.org/lkml/[email protected]/ qemu-user has similar code, but they haven't backported this fix yet:
... and so patchelf with my fix generates correct *.elfs for native (or fully emulated) platforms - those *.elfs just seem broken for qemu-user and kernels before 5.15. I have confirmed this suspicion by running a fully emulated arm64 Ubuntu - with my fix, patchelf's tests fail on qemu-user (aka binfmt), but they pass on the fully emulated machine. I think the most sane way out of this mess is to alter the logic so that patchelf tries to keep PHT at the beginning of the file by relocating other sections, as it does right now, and only upon finding an unmovable section (one that would overlap PHT) would it relocate PHT instead. If this PHT relocation happens, the output *.elf won't work in qemu-user or kernels before 5.15, but I think:
What do you think, @Mic92?
If we can skip tests when they run under qemu emulation, that would be nice. Patchelf is full of platform assumptions, and we would degrade it very quickly without proper testing. I will try to add some native arm64 CI here, so we can test that part. We should cover at least amd64/arm64 with native builds in this case.
f0963eb to f901e39
Alright, @Mic92 - v3.0 is ready! I've adjusted the logic so that we keep PHDR at the beginning of the file for as long as possible - only if we detect that the new PHDR would overflow into another section do we relocate it to the end of the file. This makes the ELFs generated by patchelf compatible with both the older and newer kernels (unless we have to fall back to the new logic, e.g. for pCloud's NodeJS library - those ELFs would be incompatible with the older kernels; but there's no other solution here). I still need to write an extra test, though - for now I've just manually checked that pCloud launches when patchelf-ed with my approach here.
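The v3.0 decision described above can be sketched roughly as follows. This is a simplified illustration, not the code from the pull request: the `Section` record, the `movable` flag, and `choosePhtPlacement` are hypothetical names, and the sketch assumes the PHT sits directly after the ELF header, so any non-empty section starting before the end of the grown table is treated as overlapping it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified section record; not patchelf's real data model.
struct Section {
    std::uint64_t offset;  // file offset of the section
    std::uint64_t size;    // size in bytes (assumed to occupy file space)
    bool movable;          // assumption: decided by the caller, e.g. false for SHT_PROGBITS
};

enum class PhtPlacement { KeepAtStart, MoveToEnd };

// Decide where the grown PHT goes. `phtEnd` is the file offset one past the
// last byte the enlarged table would occupy if it stayed at the start.
inline PhtPlacement choosePhtPlacement(const std::vector<Section>& sections,
                                       std::uint64_t phtEnd) {
    for (const Section& s : sections) {
        const bool overlaps = s.size != 0 && s.offset < phtEnd;
        if (overlaps && !s.movable) {
            // An unmovable section is in the way: relocate the table instead.
            return PhtPlacement::MoveToEnd;
        }
    }
    // Every overlapping section (if any) can be moved out of the way.
    return PhtPlacement::KeepAtStart;
}
```

Keeping `KeepAtStart` as the default mirrors the compatibility argument in the comment: the table only leaves the start of the file when no other option is left.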
This from the test suite on x86_64.
Huh, interesting - all tests pass on my machine. I’ll take a look.
@Patryk27 And you say that cherry-picking the qemu patch would fix the other tests? I could potentially cross-compile patchelf in that case. I think we now have a basic toolchain for all these targets.
Drive-by comment here, thanks for making this change! There's another important reason to try to keep the PHDRs at the beginning of the file: by default, Linux only saves the first page of an executable or shared library in core dumps, and debuggers need the PHDRs and build ID from a core dump to identify the exact binary that was in use. So if the PHDRs aren't in the first page, debuggers won't be able to find what they need. That was the motivation for the binutils patch (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=bf6d7087de0a7351fd1dfd5f41522a7f4f576180) that caused #568 and NixOS/nixpkgs#362211, which I assume this also fixes?
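The first-page constraint mentioned above boils down to a small range check. The sketch below is illustrative only: `phdrsWithinFirstPage` and `kAssumedPageSize` are hypothetical names, the 4 KiB page size is an assumption (some systems use larger pages), and the check covers only the program headers, not the build ID note.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4 KiB pages. Pass a different value for systems with larger pages.
constexpr std::uint64_t kAssumedPageSize = 4096;

// True if the whole program header table ends within the first page of the file.
// Arguments correspond to the ELF header fields e_phoff, e_phentsize and e_phnum.
inline bool phdrsWithinFirstPage(std::uint64_t phoff,
                                 std::uint16_t phentsize,
                                 std::uint16_t phnum,
                                 std::uint64_t pageSize = kAssumedPageSize) {
    const std::uint64_t end =
        phoff + static_cast<std::uint64_t>(phentsize) * phnum;
    return end <= pageSize;
}
```

For example, `phdrsWithinFirstPage(64, 56, 9)` holds for a typical ELF64 layout, while a table relocated to the end of a large file would fail the check.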
With my current approach the cherry-pick is not required, since we generate ELFs compatible with both kernels (and after I add a test covering the new behavior, I'll enable it just for x86_64, to avoid going through qemu-user).
@Patryk27 OK, but why do all emulated tests fail, rather than just a couple of tests each?
Ha, seems I got it this time! cc @Mic92
Closes #531.
Closes #482.
Closes #244.
Upstream-wise, affects NixOS/nixpkgs#226339.
(didn't want to write `close` so that merging this merge request doesn't close that issue at once)

Abstract
Patching an *.elf file might require extending its program header table, which can then cause it to run out of its originally allocated space (both in terms of file offset and virtual memory).
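As a rough illustration of the file-offset half of this problem, the sketch below checks whether a table that grows by a few entries still ends before the first section begins. The `PhtLayout` struct and both function names are hypothetical and not taken from patchelf's sources; the virtual-memory side is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of the ELF header fields that matter here.
struct PhtLayout {
    std::uint64_t phoff;      // file offset of the program header table (e_phoff)
    std::uint16_t phentsize;  // size of one entry (e_phentsize), 56 for ELF64
    std::uint16_t phnum;      // number of entries (e_phnum)
};

// Byte size of the table after adding `extra` entries.
inline std::uint64_t phtSize(const PhtLayout& l, std::uint16_t extra) {
    return static_cast<std::uint64_t>(l.phentsize) * (l.phnum + extra);
}

// True if the grown table still ends before the first section starts.
inline bool phtFitsInPlace(const PhtLayout& l, std::uint16_t extra,
                           std::uint64_t firstSectionOffset) {
    return l.phoff + phtSize(l, extra) <= firstSectionOffset;
}
```

With nine 56-byte entries starting at offset 64, the table ends at offset 568; a single extra entry pushes the end to 624, so a section that starts at 568 would then be overlapped.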
Currently patchelf solves this problem by finding which sections would overlap PHT and then by moving them to the end of the *.elf file:
patchelf/src/patchelf.cc, line 832 in 7c2f768
As compared to similar logic for binaries:
patchelf/src/patchelf.cc, line 964 in 7c2f768
... the logic for libraries is missing a crucial check: it doesn't take into account whether that particular section can be shuffled around - in particular, sections with `SHT_PROGBITS` can't!

As luck would have it, `libnode.so` (e.g. shipped together with `pcloud`) does have `.rodata` (a section with `SHT_PROGBITS` active) right at the beginning of the file:

... which patchelf happily moves around, breaking RIP-relative addressing in the assembly (which, after patching, tries to access the `ZZZZ`-ed memory).

This commit fixes the issue by changing the logic from:
... to, perhaps a bit more wasteful in terms of storage:
As far as I've checked, the reason why PHT was so strictly kept at the beginning was an old Linux bug:
patchelf/src/patchelf.cc, line 857 in 7c2f768
patchelf/BUGS, line 1 in 7c2f768
... which is not present anymore (not sure when precisely it was fixed, though - the original entry in the BUGS file is dated 2005).
Seizing the day, I'm also including another fix (for binaries), so that merging this pull request will solve all pcloud-related problems.