Generic solution for native extensions referencing other native libraries or data files in memfs #72

Open
maxirmx opened this issue Jan 21, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

maxirmx commented Jan 21, 2022

Implement a generic solution that supports the cases described in the last bullet below the diagram.
This solution shall:

  • intercept dlopen, stat, open, openat and similar library and system calls made by native extensions and, recursively, by the libraries those extensions load
  • extract the requested files to a host temp folder
  • reroute the calls from memfs to the temporarily extracted files
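
The extract-and-reroute steps above could look roughly like the sketch below. It is only an illustration under stated assumptions: struct memfs_entry, extract_to_host, reroute and the tebako-extract- file name are hypothetical names made up for this example, not tebako's actual data structures or API. Interception itself (the first bullet) is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical description of one file stored in the in-memory image. */
struct memfs_entry {
    const char *memfs_path;      /* path as seen inside the package */
    const unsigned char *data;   /* file contents */
    size_t size;                 /* length of data in bytes */
};

/* Copy one entry to a fresh temp file on the host.
 * On success returns 0 and leaves the host path in out. */
static int extract_to_host(const struct memfs_entry *e, char *out, size_t out_len)
{
    const char *tmpdir = getenv("TMPDIR");
    size_t done = 0;
    int fd, n;

    if (tmpdir == NULL || *tmpdir == '\0')
        tmpdir = "/tmp";
    n = snprintf(out, out_len, "%s/tebako-extract-XXXXXX", tmpdir);
    if (n < 0 || (size_t)n >= out_len)
        return -1;               /* buffer too small for the template */

    fd = mkstemp(out);           /* replaces XXXXXX and creates the file */
    if (fd < 0)
        return -1;

    while (done < e->size) {
        ssize_t w = write(fd, e->data + done, e->size - done);
        if (w <= 0) {            /* simplified: any failed write aborts */
            close(fd);
            unlink(out);
            return -1;
        }
        done += (size_t)w;
    }
    return close(fd) == 0 ? 0 : -1;
}

/* Decide which path a call should really use.
 * Returns buf (holding the extracted copy's path) for a memfs file,
 * the original path for anything else, or NULL if extraction failed. */
static const char *reroute(const struct memfs_entry *table, size_t count,
                           const char *path, char *buf, size_t buf_len)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(table[i].memfs_path, path) == 0)
            return extract_to_host(&table[i], buf, buf_len) == 0 ? buf : NULL;
    }
    return path;                 /* not in memfs: leave the call untouched */
}
```

An interceptor would pass the path returned by reroute to the real dlopen or open. The sketch extracts on every call and never deletes the copy; a real implementation would presumably cache extracted files and clean them up on exit.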

[Diagram: tebako]

  • The packaged application (patched Ruby, tebako image) references a limited set of shared objects on the host system. Line 1 on the diagram.

  • Inside the tebako image we may have "native extensions".
    If a gem loads a native extension using rubygems features, the call is intercepted, a copy of the extension's shared object is placed in a host temp folder, and all further calls to the extension are routed to that copy. Line 2 on the diagram.

  • The extension and its copy may be dynamic and reference host libraries and/or Ruby entries exported by the tebako image. Lines 3 and 4 on the diagram. Building the tebako image statically has a side effect: no symbols are exported, so the extension cannot link to Ruby entries (i.e. line 4 is broken). So we have to build the tebako image itself as a shared object (~dynamically).

  • If some Ruby code uses an external library through ffi, or a data file through system calls like open (line 6 on the diagram), it does not work with ruby-packer but works with ocra. However, we have a set of hacks to support specific gems (ffi, seven_zip_ruby, sassc, ...). These hacks extract shared libraries and data files to a host temp folder and subclass the supported gems in order to route calls to the copies of the shared libraries and/or data files. This is similar to point 2 above.

Originally posted by @maxirmx in #42 (comment)

@maxirmx maxirmx added the enhancement New feature or request label Jan 21, 2022

maxirmx commented Feb 15, 2022

A Ruby package (native or extension) can call other code that reads files using system calls.
Some live examples:

  • if Ruby loads an extension, it is actually a call to dlload that is passed to dl.so (or dl.a), which calls open and read
  • the sassc gem uses a native extension that opens template files from C++ code
  • metanorma uses some jars, so it calls java, which of course does not know anything about tebako
  • (there are more cases)

Following the ruby-packer approach, tebako intercepts file IO at compile level. Simply speaking, we put directives like
#define open(...) tebako_open(__VA_ARGS__)
into the appropriate files. This approach does not chain, of course: if a patched file loads unpatched code, that unpatched code does not know, and cannot know, anything about tebako.
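
To make the "does not chain" point concrete, here is a minimal sketch. The tebako_open below is a simplified stand-in written for this example (the real function's signature and behaviour are assumed, not taken from the tebako sources), and MEMFS_PREFIX, memfs_hits, unpatched_caller and patched_caller are hypothetical names.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical memfs mount prefix, chosen for this sketch only. */
#define MEMFS_PREFIX "/__tebako_memfs__"

/* Counts the calls for memfs paths that actually reached the wrapper. */
static int memfs_hits = 0;

/* Simplified stand-in for tebako_open (assumed signature). */
static int tebako_open(const char *path, int flags, ...)
{
    mode_t mode = 0;

    if (strncmp(path, MEMFS_PREFIX, strlen(MEMFS_PREFIX)) == 0) {
        memfs_hits++;
        return -1;   /* placeholder: this sketch has no in-memory filesystem */
    }
    if (flags & O_CREAT) {   /* open() takes a mode only together with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }
    return open(path, flags, mode);   /* still the real open(): macro not defined yet */
}

/* Stands for code built WITHOUT the directive, e.g. a third-party library.
 * Its open() goes straight to the host and never sees memfs. */
static int unpatched_caller(const char *path)
{
    return open(path, O_RDONLY);
}

/* The source-level patch: from here on, open() in this file means tebako_open(). */
#define open(...) tebako_open(__VA_ARGS__)

/* Stands for patched Ruby code: the same call is now rewritten by the preprocessor. */
static int patched_caller(const char *path)
{
    return open(path, O_RDONLY);
}
```

Calling patched_caller with a path under MEMFS_PREFIX reaches the wrapper, while unpatched_caller with the same path asks the host and fails because the file exists only in memory. That is the situation with dl.so, the sassc C++ code and java described above.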

If we used a FUSE filesystem (and the similar Windows and macOS features), the picture would be different. All file IO would be intercepted by the operating system and routed to our handlers.

You also mentioned that we can intercept system calls without fuse using the approach implemented by retrace. This is probably possible but you have to consider the implications.

  1. It is a project of a very different scale. With source-level patching we need to take care only of the functions that are actually used by Ruby. I believe that is ~25% of the Linux IO API. With syscall interception we would also have to intercept low-level APIs like syscall and handle implementation-specific details.
  2. There is a very high risk of triggering more alerts from anti-virus software. I believe that a program that intercepts all file IO looks very suspicious.
  3. Source-level patching becomes redundant. Basically, whatever we did before can be thrown away and we would start from scratch.

Projects
Status: 📋 Backlog