`mrun` - MongoDB / Module Runner

⚠️⚠️⚠️DISCLAIMER⚠️⚠️⚠️

This is not an official MongoDB product. There is no support or warranty of any kind, including for any security-related issues, and this software is excluded from the MongoDB Bug Bounty program. Starting MongoDB with mrun is not supported under any circumstances.

Overview

mrun is a simple and generic tool for running modular software - including modularizing and extending legacy monolithic software.

Each module is a shared library (.so file) that receives its own set of command line arguments from mrun, and runs cooperatively inside the host program process. Modules that want to have their own stream of execution (independent from the host program) can simply start their own thread.

This approach is useful for combining multiple modules without needing to actually link them into a binary. Being able to experiment with different combinations of runtime modules is especially useful when the "glue code" between modules can also be expressed as a generic module. It also helps when you want to quickly investigate different ways of joining modules, different combinations of modules, or "injecting" modules into legacy monolithic programs.

One possible use case is writing a module that implements a custom MongoDB command or aggregation stage (or anything else that uses a static initialization-based registration system), and then using mrun to launch a stock/release mongod with this additional module added on (and possibly other modules as well).

Currently mrun is in beta, and only supports Linux. macOS and Windows support may come later.

Installation

Download the latest binary from the Releases page, and then put it anywhere in your $PATH, or just run it directly. It should work on any currently supported Linux distro (requires glibc >= 2.11, which was released in 2009, roughly corresponding to Ubuntu 10.04, Debian 7, or RHEL/CentOS 6).

Alternatively, compile the source as below.

Compilation

To compile the mrun binary, just run make. The build (and code) are deliberately extremely simple, with no real dependencies. Any normal C compiler supporting C11 (eg. gcc or clang) should be fine, along with any standard make.

To build the tests, run make inside the test subdir.

Alternatively, to build the portable "universal" Linux binary (inside an Ubuntu 18.04 docker container), run the build-in-docker command.

Usage

Usage:
  mrun [global options] --- <program spec> [--- <module1 spec> --- <module2 spec> --- ...]

  where <program spec> takes the form:
    <program> [program args]

  and each <module spec> takes the form:
    [module options] <modulename> [module args]

Use "---" to separate the global options, program, and modules.
The [module args] will be passed to the module's configure (or main) function.
The <modulename>:
  - Can be an absolute or relative path to the .so for the module.
  - Is subject to the normal .so resolution rules (use $LD_LIBRARY_PATH to add search paths).
  - The trailing .so can be omitted.
  - If not a path to a .so, then any leading `lib` prefix can be omitted.
  - Can be an executable, if it exports `main` (and/or configure/run functions).
  - Must not start with '-'.

Global options:
  --help       Output this help text
  --version    Output the mrun version
  --debug      Enable verbose debugging output

Module options:
  --inhibit-configure Do not call the configure fn for this module
  --configure-fn      Name of the configure function for this module (default: mrun_configure)
  --inhibit-run       Do not call the run (or main) fn for this module
  --run-fn            Name of the run function for this module (default: mrun_run)
  --main-fn           Name of the main function for this module (default: mrun_main, fallback: main)

Running

The mrun command line is structured as follows.

First come the global options. These are listed immediately after the mrun command on the command line, and apply to the entire mrun process being launched. Commonly there are no global options, but even in this case the --- is still necessary to denote the end of the global options.

Next comes the program specification, which simply specifies the program to be run, along with any command-line arguments for it.

This is followed by zero or more module specifications, each consisting of these parts:

Separator: Each module spec starts with the argument ---, which indicates the end of the previous module spec (or program spec).
Module options: These are options for mrun, which control various aspects of how the module is loaded. Each option starts with -, and some take an additional argument after the option, eg. --configure-fn special_config_fn. (Note that the argument must be separated from the option by a space, ie. --configure-fn=special_config_fn will not work.) It's common to not need any of these options.
Module name/path: This is the name of the module. Much like the way that commands are specified on the command line, it can be any of:
- (a) an absolute path to a dynamic library or binary file.
- (b) a relative path to a dynamic library or binary file.
- (c) a dynamic library or binary filename without path, to be searched for according to the normal rules (ie. $LD_LIBRARY_PATH, rpath, system library paths).
- (d) As a special case of (c), any leading lib and/or trailing .so can be omitted, ie. libfoobar.so can more simply be referred to as just foobar.
Module arguments: Any command line arguments following the module name/path, up to --- (or the end of the command line), are passed to the module, to be used by the module however it likes. They directly correspond to the argc/argv passed to any regular program at exec() time.
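The module-name rules above can be sketched as a small C function that lists the filenames a given name may refer to. This is an illustration of the documented rules only, not mrun's actual code: the helper name module_candidates, the NAME_MAX_LEN buffer size, and the order in which candidates would be tried are all assumptions. Each candidate would then be handed to dlopen(), which treats a name containing a / as a path and searches the normal library paths otherwise.

```c
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 256  /* assumption: enough for typical module names */

/* Hypothetical helper: fill out[] with the filenames a module name may
 * refer to under the rules above, and return how many were written. */
static int module_candidates(const char *name, char out[][NAME_MAX_LEN], int max)
{
    int n = 0;
    size_t len = strlen(name);
    int has_so = len >= 3 && strcmp(name + len - 3, ".so") == 0;
    int is_path = strchr(name, '/') != NULL;

    /* (a)-(c): the name exactly as given is always a candidate. */
    if (n < max)
        snprintf(out[n++], NAME_MAX_LEN, "%s", name);

    /* The trailing ".so" may be omitted. */
    if (!has_so && n < max)
        snprintf(out[n++], NAME_MAX_LEN, "%s.so", name);

    /* (d): for bare filenames only, a leading "lib" may also be omitted. */
    if (!is_path && strncmp(name, "lib", 3) != 0 && n < max)
        snprintf(out[n++], NAME_MAX_LEN, "lib%s%s", name, has_so ? "" : ".so");

    return n;
}
```

For example, foobar yields foobar, foobar.so and libfoobar.so, while a path such as ./a-module1.so is used unchanged.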

After mrun has parsed the command line and determined the requested module specifications, it loads each module in the order specified on the command line. The loading itself has 3 distinct stages:

Open (static initialization): This calls dlopen() to load the shared library into the process. Any initializer functions (ie. those declared with __attribute__((constructor))) will be run at this time. All listed modules are loaded in order before moving on to the next stage. Any dlopen() failure is considered an error, and mrun will abort.
Configure (explicit initialization): This calls the module's "configure function" (which defaults to being mrun_configure()), passing the module's arguments as argc and argv. This stage can be skipped for a module by specifying the module option of --inhibit-configure. The name of the configure function can be customized for a module by specifying module option --configure-fn name_of_desired_configure_function. All listed modules are configured in order before moving on to the next stage. It is not an error for a module to have no configure function. However, if a module returns a non-zero value from its configure function, then that is considered an error and mrun will abort.
Run (dynamic initialization): This calls the module's "run function" (which defaults to being mrun_run()), if present. If it's not present, then it will instead attempt to call the module's "main function" (which defaults to being mrun_main()), if present, passing the module arguments as argc and argv. If neither of these is present in the module, then it will try to run the "fallback" main function, which is main(). This stage can be skipped for a module by specifying the module option of --inhibit-run. The name of the run function can be customized for a module by specifying module option --run-fn name_of_desired_run_function, and the name of the main function can be customized with the module option --main-fn name_of_desired_main_function. mrun attempts to run each listed module in order. It is not an error for a module to have no run and/or main function. However, if a module's run or main function returns a non-zero value, then that is considered an error and mrun will abort.
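The configure and run stages can be sketched in C as follows. This is a minimal sketch, not mrun's implementation: the struct module layout, run_stages() and the error handling are assumptions (mrun aborts rather than returning an error code). In mrun the function pointers would come from dlsym() on the handle returned by dlopen(); here a small name-to-function table (table_lookup()) stands in for dlsym() so that the sketch is self-contained, and demo_configure()/demo_main() are hypothetical module functions.

```c
#include <stdio.h>
#include <string.h>

/* Generic function pointer type; cast back to the real type before calling. */
typedef int (*any_fn)(void);
typedef int (*configure_fn)(int argc, char *argv[]);
typedef int (*run_fn)(void);
typedef int (*main_fn)(int argc, char *argv[]);

/* Stand-in for dlsym(): returns NULL when the symbol is absent. */
typedef any_fn (*lookup_fn)(void *handle, const char *name);

struct module {
    void *handle;               /* from dlopen() in the open stage */
    int argc;
    char **argv;
    int inhibit_configure;      /* --inhibit-configure */
    int inhibit_run;            /* --inhibit-run */
    const char *configure_name; /* --configure-fn (default "mrun_configure") */
    const char *run_name;       /* --run-fn (default "mrun_run") */
    const char *main_name;      /* --main-fn (default "mrun_main") */
};

/* Configure every module, then run every module; stop at the first failure. */
static int run_stages(struct module *mods, int n, lookup_fn lookup)
{
    for (int i = 0; i < n; i++) {
        struct module *m = &mods[i];
        any_fn f = m->inhibit_configure ? NULL : lookup(m->handle, m->configure_name);
        if (f != NULL && ((configure_fn)f)(m->argc, m->argv) != 0) {
            fprintf(stderr, "configure failed: %s\n", m->argv[0]);
            return 1;
        }
    }
    for (int i = 0; i < n; i++) {
        struct module *m = &mods[i];
        if (m->inhibit_run)
            continue;
        any_fn f = lookup(m->handle, m->run_name);
        int rc = 0;
        if (f != NULL) {
            rc = ((run_fn)f)();  /* run fn takes no arguments */
        } else if ((f = lookup(m->handle, m->main_name)) != NULL ||
                   (f = lookup(m->handle, "main")) != NULL) {
            rc = ((main_fn)f)(m->argc, m->argv);  /* main fn gets argc/argv */
        }
        if (rc != 0) {
            fprintf(stderr, "run failed: %s\n", m->argv[0]);
            return 1;
        }
    }
    return 0;
}

/* Table-based stand-in for dlsym(); handle points at a NULL-terminated table. */
struct symbol { const char *name; any_fn fn; };

static any_fn table_lookup(void *handle, const char *name)
{
    for (const struct symbol *s = handle; s->name != NULL; s++)
        if (strcmp(s->name, name) == 0)
            return s->fn;
    return NULL;
}

/* Hypothetical module functions that record the order they are called in.
 * demo_configure() fails when given more than one argument. */
static char trace[64];
static int demo_configure(int argc, char *argv[]) { (void)argv; strcat(trace, "C"); return argc > 2; }
static int demo_main(int argc, char *argv[]) { (void)argc; (void)argv; strcat(trace, "M"); return 0; }
```

The configure loop completes for every module before the run loop starts, which is why a slow configure function holds up every later module.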

This approach means that normal dynamic library resolution and symbol resolution works the same as for ordinary binaries and libraries. However, if a module expects certain symbols to be automatically provided - eg. it's intended as a plugin for some other program or library (ie. meaning that it's not appropriate to make it have a DT_NEEDED dependency on that library) - then the program or library providing those symbols must be listed earlier on the mrun command line. Otherwise, when mrun attempts to dlopen() the library, the missing symbols will cause it to fail.

Examples

Trivial case

The simplest possible use of mrun is to trivially launch an existing binary and pass arguments to it. ie. running

mrun --- foobar --arg1 arg2 --arg3

is equivalent to

foobar --arg1 arg2 --arg3

Simple case

Consider this simple program:

// a-program.c
#include <stdio.h>

void doTheThing(char* s) {
    printf("The thing is \"%s\"\n", s);
}

int main(int argc, char* argv[]) {
    printf("main()\n");
    doTheThing("main");
    return 0;
}

$ ./a-program
main()
The thing is "main"
$

A simple module for this program could look like this:

// a-module.c
void doTheThing(char* s);

int mrun_main(int argc, char* argv[]) {
    if (argc <= 1) {
        doTheThing("module");
    } else {
        for (int i = 1; i < argc; i++) {
            doTheThing(argv[i]);
        }
    }
    return 0;
}

$ mrun --- ./a-program --- ./a-module1.so
The thing is "module"
main()
The thing is "main"
$ mrun --- ./a-program --- ./a-module1.so foo bar baz
The thing is "foo"
The thing is "bar"
The thing is "baz"
main()
The thing is "main"
$

More sophisticated

A more sophisticated example of how a module can extend the behaviour of a (suitably designed) program can be found in b-program.c and b-module.c.

The program has a function register_handler(int num, handler h) which is used to associate a callback function (typedef void (*handler)(char*)) with a "handler number". The module can use this to register its own handlers. The base program registers two handlers:

void plain(char* s) {
    printf("%s\n", s);
}
void quoted(char* s) {
    printf("'%s'\n", s);
}
int main(int argc, char* argv[]) {
    register_handler(0, plain);
    register_handler(1, quoted);

Whereas the module defines handler 2:

char* prefix = "(";
char* suffix = ")";

void fancy(char* s) {
    printf("%s%s%s\n", prefix, s, suffix);
}

It also sets prefix and suffix depending on the module's command line args.

$ ./b-program 0 foobar
foobar
$ ./b-program 1 foobar
'foobar'
$ ./b-program 2 foobar
no such handler 2
$ mrun --- ./b-program 2 foobar --- ./b-module.so
(foobar)
$ mrun --- ./b-program 2 foobar --- ./b-module.so [ ]
[foobar]
$ mrun --- ./b-program 2 foobar --- ./b-module.so { }
{foobar}
$ mrun --- ./b-program 2 foobar --- ./b-module.so \|
|foobar|
$ mrun --- ./b-program 2 foobar --- ./b-module.so foo bar
foofoobarbar
$
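For illustration, here is a hypothetical sketch of both halves of this example in a single file. The fixed-size table, MAX_HANDLERS, the dispatch() helper, and registering handler 2 from mrun_run() are assumptions rather than the contents of b-program.c and b-module.c. The argument handling is inferred from the session above: one argument sets both delimiters, and two arguments set the prefix and suffix separately. Registering from mrun_run() is plausible because, as the a-program output suggests, modules are run before the program's own main() does anything.

```c
#include <stdio.h>

/* ---- Program side (hypothetical sketch of b-program.c) ---- */
typedef void (*handler)(char *);

#define MAX_HANDLERS 16  /* assumption: size of the handler table */
static handler handlers[MAX_HANDLERS];

void register_handler(int num, handler h)
{
    if (num >= 0 && num < MAX_HANDLERS)
        handlers[num] = h;
}

/* Hypothetical helper: call handler num, or report that it is missing. */
int dispatch(int num, char *s)
{
    if (num < 0 || num >= MAX_HANDLERS || handlers[num] == NULL) {
        printf("no such handler %d\n", num);
        return -1;
    }
    handlers[num](s);
    return 0;
}

/* ---- Module side (hypothetical sketch of b-module.c) ---- */
char *prefix = "(";
char *suffix = ")";

void fancy(char *s)
{
    printf("%s%s%s\n", prefix, s, suffix);
}

int mrun_configure(int argc, char *argv[])
{
    if (argc == 2) {            /* eg. \|      -> |foobar|     */
        prefix = suffix = argv[1];
    } else if (argc >= 3) {     /* eg. foo bar -> foofoobarbar */
        prefix = argv[1];
        suffix = argv[2];
    }
    return 0;                   /* no args: keep the "(" ")" defaults */
}

int mrun_run(void)
{
    register_handler(2, fancy);
    return 0;
}
```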

Modules only

Sometimes you might just want to combine several cooperating modules (eg. c-module1.c and c-module2.c), without any need for a base program. In this case, you can just use the /bin/true "no-op" program:

// c-module1.c
#include <stdio.h>

int mrun_run() {
    printf("module1\n");
    return 0;
}

// c-module2.c
#include <stdio.h>

int mrun_run() {
    printf("module2\n");
    return 0;
}

$ ../mrun --- /bin/true --- ./c-module1.so --- ./c-module2.so
module1
module2
$ ../mrun --- /bin/true --- ./c-module2.so --- ./c-module1.so
module2
module1
$

Module-provided specific implementations

d-greet.c calls say_hello(), which is provided at runtime by d-english.c or d-french.c:

$ ../mrun --- true --- ./d-greet.so
(mrun): Error: ./d-greet.so: undefined symbol: say_hello
$ ../mrun --- true --- ./d-english.so --- ./d-greet.so
[english] init
[english] configure: 1 ./d-english.so
[english] run
[greet] main: 1 ./d-greet.so
Hello there
[english] fini
$ ../mrun --- true --- ./d-french.so --- ./d-greet.so
[french] init
[french] configure: 1 ./d-french.so
[french] run
[greet] main: 1 ./d-greet.so
Bonjour
[french] fini
$
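The log lines above suggest that d-english.so uses every hook mrun offers. A hypothetical reconstruction (not the actual example source) might look like the following. It is shown in one file together with d-greet's entry point so that it compiles on its own, whereas in the example they are separate .so files; using mrun_main() in d-greet is an assumption. Because d-greet.so has an undefined reference to say_hello(), dlopen() can only succeed if a module defining it has been loaded first, which is why the order on the command line matters.

```c
#include <stdio.h>

/* ---- d-english.c (hypothetical reconstruction) ---- */

/* Runs during dlopen(), ie. the open stage. */
__attribute__((constructor))
static void english_init(void)
{
    printf("[english] init\n");
}

/* Runs at process exit (or when the library is unloaded). */
__attribute__((destructor))
static void english_fini(void)
{
    printf("[english] fini\n");
}

int mrun_configure(int argc, char *argv[])
{
    printf("[english] configure: %d %s\n", argc, argv[0]);
    return 0;
}

int mrun_run(void)
{
    printf("[english] run\n");
    return 0;
}

/* The symbol d-greet.so needs in order to be loaded at all. */
void say_hello(void)
{
    printf("Hello there\n");
}

/* ---- d-greet.c (hypothetical; would live in its own .so) ---- */

void say_hello(void);  /* undefined in d-greet.so; provided by an earlier module */

int mrun_main(int argc, char *argv[])
{
    printf("[greet] main: %d %s\n", argc, argv[0]);
    say_hello();
    return 0;
}
```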

Basic mongod plugins

We can apply this to a simple real example of a mongod plugin module. Here we consider the plugged_cmd sample module, which merely returns { plugged: 1 } to the db.adminCommand({ plugged_cmd: 1 }) command, and the flush_hostname_cache_cmd sample module, which provides a command to flush the internal gethostname() cache (see SERVER-14893).

If we ordinarily run mongod as:

mongod --dbpath /data/db --port 12345 --fork

then we can instead run

mrun --- mongod --dbpath /data/db --port 12345 --fork --- plugged_cmd --- flush_hostname_cache_cmd

Now the mongod supports the plugged_cmd command:

$ mongosh --port 12345 --quiet --eval 'db.adminCommand({ plugged: 1 })'
{ plugged: 1, ok: 1 }

and the flushHostnameCache command:

$ hostname
myhost
$ mongosh --port 12345 --quiet --eval 'db.hostInfo().system.hostname'
myhost:12345
$ sudo hostname newhostname
$ hostname
newhostname
$ mongosh --port 12345 --quiet --eval 'db.hostInfo().system.hostname'
myhost:12345
$ mongosh --port 12345 --quiet --eval 'db.adminCommand({ flushHostnameCache: 1 })'
{ ok: 1 }
$ mongosh --port 12345 --quiet --eval 'db.hostInfo().system.hostname'
newhostname:12345

One nice benefit of implementing these commands as modules (rather than being baked into mongod) is that they will work equally well in mongos (or anything else that contains a compatible BasicCommand class interface):

mrun --- mongos --configdb ... --port 12345 --fork --- plugged_cmd --- flush_hostname_cache_cmd

Also note that the mongod or mongos binary can be a development build, or from a release package/tarball. All that's required is for any internal ABIs used by loaded modules to be compatible with those provided by the binary.

Writing modules

Basic use - static initialization

In many cases, internal registration uses static initialization, meaning that declaring an object of a certain type is all that is necessary. For example, MongoDB core server commands, aggregation stages, and MONGO_INITIALIZER make use of such a system. This means that the module's implementation can be quite simple - it just needs to declare the registration object. When the module gets loaded by mrun, the static initialization (ie. constructor) for the object will be run, causing the object to register itself. Nothing mrun-specific is required.
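The pattern can be sketched in plain C, where a constructor function plays the role of a C++ global object's constructor. The registry, MAX_COMMANDS, register_command(), find_command() and the plugged command here are hypothetical stand-ins, not MongoDB's actual registration API. In a real setup the registry half would live in the host binary and only the registering half in the module.

```c
#include <string.h>

/* ---- Host side (hypothetical): a registry modules add themselves to ---- */
struct command {
    const char *name;
    int (*fn)(void);
};

#define MAX_COMMANDS 32
static const struct command *registry[MAX_COMMANDS];
static int registry_count;  /* zero-initialized before any constructor runs */

void register_command(const struct command *c)
{
    if (registry_count < MAX_COMMANDS)
        registry[registry_count++] = c;
}

const struct command *find_command(const char *name)
{
    for (int i = 0; i < registry_count; i++)
        if (strcmp(registry[i]->name, name) == 0)
            return registry[i];
    return NULL;
}

/* ---- Module side: declare the object and register it at load time ---- */
static int plugged(void) { return 1; }
static const struct command plugged_command = { "plugged", plugged };

/* Runs during dlopen(), before any mrun configure/run hook. */
__attribute__((constructor))
static void register_plugged(void)
{
    register_command(&plugged_command);
}
```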

Non-static initialization

If you need your module to be able to run some initialization code when it gets loaded by mrun, this can be achieved by declaring an initializer function with __attribute__((constructor)), eg:

__attribute__((constructor))
static void my_module_init(void) {
    // initialization code goes here
}

Alternatively, it may make more sense to initialize your module during the "run" stage of mrun. In this case, you can put the initialization code inside the mrun_run() function. Note that this code will run after all modules have been statically initialized. eg:

int mrun_run() {
    // initialization code goes here
    return 0;  // non-zero indicates an error
}

Note that if you're using a C++ compiler, you need to define mrun_run() with C linkage (to inhibit C++ identifier mangling), like so:

extern "C"
int mrun_run() {
    // initialization code goes here
    return 0;  // non-zero indicates an error
}

Passing arguments to modules

If you need your module to accept arguments from the command line (to configure its behavior), this is achieved by defining a mrun_configure() function. This takes argc and argv parameters, which point to the command line arguments for the module and work the same as the similarly named traditional arguments to the main() function (with the small exception that you must rely on argc, and cannot rely on the argv array being terminated by a NULL value). As above, this function needs C linkage. eg:

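#include <iostream>
#include <string>

void doStuff(bool verbose);  // defined elsewhere in the module

static bool verbose = false;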

extern "C"
int mrun_configure(int argc, char* argv[]) {
    // process argc/argv as necessary, eg:
    if (argc > 1) {
        if (std::string(argv[1]) == "--verbose") {
            verbose = true;
        } else {
            std::cerr << "Unknown arg" << std::endl;
            return 1;  // non-zero indicates an error
        }
    }
    return 0;
}

extern "C"
int mrun_run() {
    if (verbose) {
        std::cerr << "starting running now" << std::endl;
    }
    doStuff(verbose);
    return 0;  // non-zero indicates an error
}

However, unlike main(), you should not do extensive processing or operations during this function call, ie. this is not where the "main" part of your module should run. The reason is that all modules are configured before any module is run, so doing processing during mrun_configure() can hold up the configuration of any later modules. So your module's "main" program code, if any, should be put into mrun_run() instead, and mrun_configure() should only do command line argument parsing. If your mrun_configure() would just stash argc/argv for later processing by mrun_run(), then you can instead use mrun_main(), which takes argc/argv but is called during the run stage (and only if there is no mrun_run() present). eg:

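#include <iostream>
#include <string>

void doStuff(bool verbose);  // defined elsewhere in the module

extern "C"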
int mrun_main(int argc, char* argv[]) {
    bool verbose = false;

    if (argc > 1) {
        if (std::string(argv[1]) == "--verbose") {
            verbose = true;
        } else {
            std::cerr << "Unknown arg" << std::endl;
            return 1;  // non-zero indicates an error
        }
    }

    if (verbose) {
        std::cerr << "starting running now" << std::endl;
    }
    doStuff(verbose);

    return 0;
}

Independent thread of execution

If you need your module to run independently of any other modules and/or any main program, then you can use normal methods to start a thread in mrun_run(). eg. a module which provides an alternative socket entry point would need a thread that listens for and responds to incoming connections.
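A minimal sketch of this, assuming POSIX threads: mrun_run() starts a detached worker thread and returns straight away, so later modules and the main program are not held up. The worker() body and the worker_started flag are placeholders for illustration. With glibc older than 2.34, this may need to be built with -pthread.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int worker_started;

/* The module's independent stream of execution. A real module (eg. one
 * accepting socket connections) would loop here for the process lifetime. */
static void *worker(void *arg)
{
    (void)arg;
    atomic_store(&worker_started, 1);
    /* ... long-running work goes here ... */
    return NULL;
}

/* Start the worker and return immediately. */
int mrun_run(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);
    if (rc != 0) {
        fprintf(stderr, "pthread_create failed: %d\n", rc);
        return 1;  /* non-zero makes mrun abort */
    }
    pthread_detach(tid);  /* nobody will join this thread */
    return 0;
}
```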

Building MongoDB modules

NOTE

Please note that this section of documentation refers to the old SCons-based build system used by MongoDB, and has not been updated for the newer Bazel-based build system. However, the general principles still apply.

Inside the main repo

To create a module inside the mongodb/mongo repo, you just need to define a library with some special flags:

env.Library(
    target='mymodule',
    source=[
        'source_for_my_module.cpp',
    ],
    LIBDEPS=[
    ],
    LIBDEPS_PRIVATE=[
    ],
    LIBDEPS_TAGS=[
        'illegal_cyclic_or_unresolved_dependencies_allowlisted',
    ],
    LIBDEPS_GLOBAL=[
    ],
)

This will build a libmymodule.so that gets installed into build/install/lib. The absolute path to this file can then be given to mrun, or it can be copied into mrun's lib dir so that it can be referenced without any path.

There is usually no need to list any LIBDEPS or LIBDEPS_PRIVATE, and so those can be empty. You should not list base MongoDB-specific libraries (eg. '$BUILD_DIR/mongo/db/commands'), because when using mrun with statically-linked release binaries, those libraries cannot be found - but the code will still be present in the binary. This is the reason for the 'illegal_cyclic_or_unresolved_dependencies_allowlisted' tag. (The LIBDEPS_GLOBAL is forced to be empty for a similar reason - without this, the module will have a dependency on libtcmalloc_minimal.so, which isn't present and is already baked into the static release binaries.)

Inside a compile-time build module

Rather than putting the module's source in the main repo and modifying the SConscript files, it's also very easy to do so inside a compile-time build module, ie. under src/mongo/db/modules/<foo>. Usually, each directory inside db/modules is from another location (eg. another git repo, a git submodule, copied from somewhere, or a symlink to somewhere else). However, this isn't actually necessary, and so it's possible to just create a directory under db/modules as a way of keeping everything self-contained, and avoiding the need to modify other files (eg. even just to add a new sub-directory to a SConscript file).

To do this, you need a build.py file in the base module directory. For a self-contained mrun module, which only adds statically-initialised objects, and isn't used (at compile time) by any other modules/code, this file can be very simple:

def configure(conf, env):
    pass

The only thing required in the SConscript file, apart from the actual library definition (shown above), is to import and clone the env, and tell it to use the normal server include paths:

Import([
    'env',
])

env = env.Clone()

env.InjectMongoIncludePaths()
