This is not an official MongoDB product. There is no support or warranty of any kind, including for any security-related issues, and this software is excluded from the MongoDB Bug Bounty program. Starting MongoDB with mrun is not supported under any circumstances.
mrun is a simple and generic tool for running modular software - including modularizing and extending legacy monolithic software.
Each module is a shared library (.so file) that receives its own set of command line arguments from mrun, and runs cooperatively inside the host program process. Modules that want to have their own stream of execution (independent from the host program) can simply start their own thread.
This approach is useful for combining multiple modules without needing to actually link them into a binary. Being able to experiment with different combinations of runtime modules is especially useful when the "glue code" between modules can also be expressed as a generic module. Or if you want to quickly investigate different ways of joining modules, different combinations of modules, or "injecting" modules into legacy monolithic programs.
One possible use case is writing a module that implements a custom MongoDB command or aggregation stage (or anything else that uses a static initialization-based registration system), and then using mrun to launch a stock/release mongod with this additional module added on (and possibly other modules as well).
Currently mrun is beta, and only supports Linux. macOS and Windows support may come later.
Download the latest binary from the Releases page, and then put it anywhere in your $PATH, or just run it directly. It should work on any currently supported Linux distro (requires glibc >= 2.11, which was released in 2009, roughly corresponding to Ubuntu 10.04, Debian 7, or RHEL/CentOS 6).
Alternatively, compile the source as below.
To compile the mrun binary, just run make. The build (and code) are deliberately extremely simple, with no real dependencies. Any normal C compiler (supporting C11) should be fine, eg. gcc or clang, and any standard make.
To build the tests, run make inside the test subdir.
Alternatively, to build the portable "universal" Linux binary (inside an Ubuntu 18.04 docker container), run the build-in-docker command.
Usage:
mrun [global options] --- <program spec> [--- <module1 spec> --- <module2 spec> --- ...]
where <program spec> takes the form:
<program> [program args]
and each <module spec> takes the form:
[module options] <modulename> [module args]
Use "---" to separate the global options, program, and modules.
The [module args] will be passed to the module's configure (or main) function.
The <modulename>:
- Can be an absolute or relative path to the .so for the module.
- Is subject to the normal .so resolution rules (use $LD_LIBRARY_PATH to add search paths).
- The trailing .so can be omitted.
- If not a path to a .so, then any leading `lib` prefix can be omitted.
- Can be an executable, if it exports `main` (and/or configure/run functions).
- Must not start with '-'.
Global options:
--help Output this help text
--version Output the mrun version
--debug Enable verbose debugging output
Module options:
--inhibit-configure Do not call the configure fn for this module
--configure-fn Name of the configure function for this module (default: mrun_configure)
--inhibit-run Do not call the run (or main) fn for this module
--run-fn Name of the run function for this module (default: mrun_run)
--main-fn Name of the main function for this module (default: mrun_main, fallback: main)
The mrun command line is composed as follows.
First come the global options. These are listed immediately after the mrun command on the command line, and apply to the entire mrun process being launched. Commonly there are no global options, but even in this case the --- is still necessary to denote the end of the global options.
Next comes the program specification, which simply specifies the program to be run, along with any command-line arguments for it.
This is followed by zero or more module specifications, each made up of these parts:
- Separator: Each module spec starts with the argument ---, which indicates the end of the previous module spec (or program spec).
- Module options: These are options for mrun, which control various aspects of how the module is loaded. Each option starts with -, but some take an additional argument after the option, eg. --configure-fn special_config_fn. (Note that a space must be used before the argument, ie. --configure-fn=special_config_fn will not work.) It's common to not need to specify any of these options.
- Module name/path: This is the name of the module. Much like the way that commands are specified on the command line, it can be any of:
  - (a) an absolute path to a dynamic library or binary file.
  - (b) a relative path to a dynamic library or binary file.
  - (c) a dynamic library or binary filename without path, to be searched for according to the normal rules (ie. $LD_LIBRARY_PATH, rpath, system library paths).
  - (d) As a special case of (c), any leading lib and/or trailing .so can be omitted, ie. libfoobar.so can more simply be referred to as just foobar.
- Module arguments: Any command line arguments following the module name/path, up to --- (or the end of the command line), are passed to the module, to be used by the module however it likes. They directly correspond to the argc/argv passed to any regular program at exec() time.
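The splitting rule described above (everything is grouped by the literal argument ---) can be sketched in a few lines of C. This is only an illustration of the command line layout, not code taken from the mrun source; the arg_group type and split_on_separator() function are hypothetical names.

```c
#include <string.h>

/* Hypothetical helper type (not from the mrun source): records where one
 * "---"-delimited group starts in argv and how many arguments it holds. */
typedef struct {
    int start; /* index into argv of the group's first argument */
    int count; /* number of arguments in the group */
} arg_group;

/* Splits argv[1..argc-1] on the literal argument "---".
 * Group 0 holds the global options, group 1 the program spec,
 * and groups 2 and up hold the module specs.
 * Returns the number of groups stored (at most max_groups). */
int split_on_separator(int argc, char* argv[], arg_group* groups, int max_groups) {
    int n = 0;
    int start = 1; /* skip argv[0], the mrun command itself */
    for (int i = 1; i <= argc; i++) {
        /* The end of the command line closes the final group. */
        if (i == argc || strcmp(argv[i], "---") == 0) {
            if (n < max_groups) {
                groups[n].start = start;
                groups[n].count = i - start;
                n++;
            }
            start = i + 1;
        }
    }
    return n;
}
```

For the command line `mrun --- ./a-program --- ./a-module.so foo bar` this yields three groups: an empty global options group, a one-argument program spec, and a three-argument module spec.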
After mrun has parsed the command line and determined the requested module specifications, it loads each module in the order specified on the command line. The loading itself has 3 distinct stages:
- Open (static initialization): This calls dlopen() to load the shared library into the process. Any initializer functions (ie. declared with __attribute__((constructor))) will be run at this time. All listed modules are loaded in order before moving on to the next stage. Any dlopen() failure is considered an error, and mrun will abort.
- Configure (explicit initialization): This calls the module's "configure function" (which defaults to being mrun_configure()), passing the module's arguments as argc and argv. This stage can be skipped for a module by specifying the module option --inhibit-configure. The name of the configure function can be customized for a module by specifying the module option --configure-fn name_of_desired_configure_function. All listed modules are configured in order before moving on to the next stage. It is not an error for a module to have no configure function. However, if a module returns a non-zero value from its configure function, then that is considered an error and mrun will abort.
- Run (dynamic initialization): This calls the module's "run function" (which defaults to being mrun_run()), if present. If it's not present, then it will instead attempt to call the module's "main function" (which defaults to being mrun_main()), if present, passing the module arguments as argc and argv. If neither of these is present in the module, then it will try to run the "fallback" main function, which is main(). This stage can be skipped for a module by specifying the module option --inhibit-run. The name of the run function can be customized for a module by specifying the module option --run-fn name_of_desired_run_function, and the name of the main function can be customized with the module option --main-fn name_of_desired_main_function. mrun attempts to run all listed modules in order. It is not an error for a module to have no run and/or main function. However, if a module's run or main function returns a non-zero value, then that is considered an error and mrun will abort.
This approach means that normal dynamic library resolution and symbol resolution works the same as for ordinary binaries and libraries. However, if a module expects certain symbols to be automatically provided - eg. it's intended as a plugin for some other program or library (ie. meaning that it's not appropriate to make it have a DT_NEEDED dependency on that library) - then the program or library providing those symbols must be listed earlier on the mrun command line. Otherwise, when mrun attempts to dlopen() the library, the missing symbols will cause it to fail.
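The stage ordering described above (every module is configured before any module is run, and a non-zero return aborts) can be modelled with a small C sketch. It assumes the entry points have already been looked up (eg. with dlsym()) and stored as function pointers. The loaded_module type, configure_then_run() and the demo_* functions are hypothetical names for illustration, not mrun's actual implementation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical view of a module after the "open" stage: each entry point is
 * either a resolved function pointer or NULL if the module lacks it. */
typedef struct {
    const char* name;
    int argc;
    char** argv;
    int (*configure_fn)(int, char**);
    int (*run_fn)(void);
    int (*main_fn)(int, char**);
} loaded_module;

/* Configures all modules in order, then runs all modules in order.
 * Returns 0 on success, or the first non-zero value a module returns. */
int configure_then_run(loaded_module* mods, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (mods[i].configure_fn == NULL) continue; /* not an error */
        int rc = mods[i].configure_fn(mods[i].argc, mods[i].argv);
        if (rc != 0) {
            fprintf(stderr, "%s: configure failed (%d)\n", mods[i].name, rc);
            return rc;
        }
    }
    for (size_t i = 0; i < n; i++) {
        int rc = 0;
        if (mods[i].run_fn != NULL) {
            rc = mods[i].run_fn(); /* the run function takes precedence */
        } else if (mods[i].main_fn != NULL) {
            rc = mods[i].main_fn(mods[i].argc, mods[i].argv);
        }
        if (rc != 0) {
            fprintf(stderr, "%s: run failed (%d)\n", mods[i].name, rc);
            return rc;
        }
    }
    return 0;
}

/* Demo entry points that record the order in which they are called. */
char trace[16];
size_t trace_len;
static void note(char c) {
    if (trace_len + 1 < sizeof trace) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}
int demo_a_configure(int argc, char** argv) { (void)argc; (void)argv; note('a'); return 0; }
int demo_a_run(void) { note('A'); return 0; }
int demo_b_configure(int argc, char** argv) { (void)argc; (void)argv; note('b'); return 0; }
int demo_b_main(int argc, char** argv) { (void)argc; (void)argv; note('B'); return 0; }
```

With module a (configure and run functions) listed before module b (configure and main functions), the recorded trace is "abAB": both configure calls happen before either module is run.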
Trivial case
The simplest possible use of mrun is to trivially launch an existing binary and pass arguments to it. ie. running
mrun --- foobar --arg1 arg2 --arg3
is equivalent to
foobar --arg1 arg2 --arg3
Simple case
Consider this simple program:
// a-program.c
#include <stdio.h>
void doTheThing(char* s) {
printf("The thing is \"%s\"\n", s);
}
int main(int argc, char* argv[]) {
printf("main()\n");
doTheThing("main");
return 0;
}
$ ./a-program
main()
The thing is "main"
$
A simple module for this program could look like this:
// a-module.c
void doTheThing(char* s);
int mrun_main(int argc, char* argv[]) {
if (argc <= 1) {
doTheThing("module");
} else {
for (int i = 1; i < argc; i++) {
doTheThing(argv[i]);
}
}
return 0;
}
$ mrun --- ./a-program --- ./a-module.so
The thing is "module"
main()
The thing is "main"
$ mrun --- ./a-program --- ./a-module.so foo bar baz
The thing is "foo"
The thing is "bar"
The thing is "baz"
main()
The thing is "main"
$
More sophisticated
A more sophisticated example of how a module can extend the behaviour of a (suitably designed) program can be found in b-program.c and b-module.c.
The program has a function register_handler(int num, handler h) which is used to associate a callback function (typedef void (*handler)(char*)) with a "handler number". The module can use this to register its own handlers. The base program registers two handlers:
void plain(char* s) {
printf("%s\n", s);
}
void quoted(char* s) {
printf("'%s'\n", s);
}
int main(int argc, char* argv[]) {
register_handler(0, plain);
register_handler(1, quoted);
Whereas the module defines handler 2:
char* prefix = "(";
char* suffix = ")";
void fancy(char* s) {
printf("%s%s%s\n", prefix, s, suffix);
}
It also sets prefix and suffix depending on the module's command line args.
$ ./b-program 0 foobar
foobar
$ ./b-program 1 foobar
'foobar'
$ ./b-program 2 foobar
no such handler 2
$ mrun --- ./b-program 2 foobar --- ./b-module.so
(foobar)
$ mrun --- ./b-program 2 foobar --- ./b-module.so [ ]
[foobar]
$ mrun --- ./b-program 2 foobar --- ./b-module.so { }
{foobar}
$ mrun --- ./b-program 2 foobar --- ./b-module.so \|
|foobar|
$ mrun --- ./b-program 2 foobar --- ./b-module.so foo bar
foofoobarbar
$
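The argument handling seen in the session above could be sketched as follows. This is a guess reconstructed from the output, not the contents of b-module.c: it assumes the module registers its handler from mrun_configure(), that argv[0] is the module name, and that a single argument is used as both prefix and suffix. The register_handler() stand-in and its handlers table exist only so the sketch compiles on its own; in a real module only the declaration would appear, with the definition coming from b-program.

```c
#include <stdio.h>

typedef void (*handler)(char*);

/* Stand-in for the program's register_handler(), included only to keep this
 * sketch self-contained. The table size is an arbitrary assumption. */
#define MAX_HANDLERS 8
static handler handlers[MAX_HANDLERS];
void register_handler(int num, handler h) {
    if (num >= 0 && num < MAX_HANDLERS) {
        handlers[num] = h;
    }
}

char* prefix = "(";
char* suffix = ")";

void fancy(char* s) {
    printf("%s%s%s\n", prefix, s, suffix);
}

/* One argument sets both prefix and suffix; a second argument overrides
 * the suffix. With no arguments the "(" and ")" defaults are kept. */
int mrun_configure(int argc, char* argv[]) {
    if (argc > 1) {
        prefix = argv[1];
        suffix = argv[1];
    }
    if (argc > 2) {
        suffix = argv[2];
    }
    register_handler(2, fancy);
    return 0;
}
```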
Modules only
Sometimes you might just want to combine several cooperating modules (eg. c-module1.c and c-module2.c), without any need for a base program. In this case, you can just use the /bin/true "no-op" program:
// c-module1.c
#include <stdio.h>
int mrun_run() {
printf("module1\n");
return 0;
}
// c-module2.c
#include <stdio.h>
int mrun_run() {
printf("module2\n");
return 0;
}
$ ../mrun --- /bin/true --- ./c-module1.so --- ./c-module2.so
module1
module2
$ ../mrun --- /bin/true --- ./c-module2.so --- ./c-module1.so
module2
module1
$
Module-provided specific implementations
d-greet.c calls say_hello(), which is provided at runtime by d-english.c or d-french.c:
$ ../mrun --- true --- ./d-greet.so
(mrun): Error: ./d-greet.so: undefined symbol: say_hello
$ ../mrun --- true --- ./d-english.so --- ./d-greet.so
[english] init
[english] configure: 1 ./d-english.so
[english] run
[greet] main: 1 ./d-greet.so
Hello there
[english] fini
$ ../mrun --- true --- ./d-french.so --- ./d-greet.so
[french] init
[french] configure: 1 ./d-french.so
[french] run
[greet] main: 1 ./d-greet.so
Bonjour
[french] fini
$
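A module that produces the "[english]" lines in the session above might look roughly like the sketch below. It is reconstructed from the printed output only, so the function bodies, the say_hello() signature, and the use of a destructor for the "fini" line are assumptions rather than the actual contents of d-english.c.

```c
#include <stdio.h>

/* Runs when the library is loaded (the "open" stage). */
__attribute__((constructor))
static void english_init(void) {
    printf("[english] init\n");
}

/* Runs when the library is unloaded or the process exits. */
__attribute__((destructor))
static void english_fini(void) {
    printf("[english] fini\n");
}

/* The "configure" stage: prints argc followed by each argument. */
int mrun_configure(int argc, char* argv[]) {
    printf("[english] configure: %d", argc);
    for (int i = 0; i < argc; i++) {
        printf(" %s", argv[i]);
    }
    printf("\n");
    return 0;
}

/* The "run" stage. */
int mrun_run(void) {
    printf("[english] run\n");
    return 0;
}

/* The symbol that d-greet expects some earlier module to provide. */
void say_hello(void) {
    printf("Hello there\n");
}
```

Because d-greet.so needs say_hello() at dlopen() time, a module like this has to appear before it on the mrun command line.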
Basic mongod plugins
We can apply this to a simple real example of a mongod plugin module. Here we consider the plugged_cmd sample module, which merely returns { plugged: 1 } to the db.adminCommand({ plugged_cmd: 1 }) command, and the flush_hostname_cache_cmd sample module which provides a command to flush the internal gethostname() cache (see SERVER-14893).
If we ordinarily run mongod as:
mongod --dbpath /data/db --port 12345 --fork
then we can instead run
mrun --- mongod --dbpath /data/db --port 12345 --fork --- plugged_cmd --- flush_hostname_cache_cmd
Now the mongod supports the plugged_cmd command:
$ mongosh --port 12345 --quiet --eval 'db.adminCommand({ plugged: 1 })'
{ plugged: 1, ok: 1 }
and the flushHostnameCache command:
$ hostname
myhost
$ mongosh --port 12345 --quiet --eval 'db.hostInfo().system.hostname'
myhost:12345
$ sudo hostname newhostname
$ hostname
newhostname
$ mongosh --port 12345 --quiet --eval 'db.hostInfo().system.hostname'
myhost:12345
$ mongosh --port 12345 --quiet --eval 'db.adminCommand({ flushHostnameCache: 1 })'
{ ok: 1 }
$ mongosh --port 12345 --quiet --eval 'db.hostInfo().system.hostname'
newhostname:12345
One nice benefit of implementing these commands as modules (rather than being baked into mongod) is that they will work equally well in mongos (or anything else that contains a compatible BasicCommand class interface):
mrun --- mongos --configdb ... --port 12345 --fork --- plugged_cmd --- flush_hostname_cache_cmd
Also note that the mongod or mongos binary can be a development build, or from a release package/tarball. All that's required is for any internal ABIs used by loaded modules to be compatible with those provided by the binary.
Basic use - static initialization
In many cases, internal registration uses static initialization, meaning that declaring an object of a certain type is all that is necessary. For example, MongoDB core server commands, aggregation stages, and MONGO_INITIALIZER make use of such a system. This means that the module's implementation can be quite simple - it just needs to declare the registration object. When the module gets loaded by mrun, the static initialization (ie. constructor) for the object will be run, causing the object to register itself. Nothing mrun-specific is required.
Non-static initialization
If you need your module to be able to run some initialization code when it gets loaded by mrun, this can be achieved by declaring an initializer function with __attribute__((constructor)), eg:
__attribute__((constructor))
static void my_module_init() {
    // initialization code goes here
}
Alternatively, it may make more sense to initialize your module during the "run" stage of mrun. In this case, you can put the initialization code inside the mrun_run() function. Note that this code will run after all modules have been statically initialized. eg:
int mrun_run() {
// initialization code goes here
return 0; // non-zero indicates an error
}
Note that if you're using a C++ compiler, you need to define mrun_run() with C linkage (to inhibit C++ identifier mangling), like so:
extern "C"
int mrun_run() {
// initialization code goes here
return 0; // non-zero indicates an error
}
Passing arguments to modules
If you need your module to accept arguments from the command-line (to configure its behavior), this is achieved by defining a mrun_configure() function. This takes argc and argv parameters which point to the command line arguments for the module, and works the same as the similarly named traditional arguments to the main() function (with the small exception that you must rely on argc, and cannot rely on the argv array being terminated by a NULL value). As above, this function needs C linkage. eg:
#include <iostream>
#include <string>

void doStuff(bool verbose);  // defined elsewhere in the module

static bool verbose = false;

extern "C"
int mrun_configure(int argc, char* argv[]) {
    // process argc/argv as necessary, eg:
    if (argc > 1) {
        if (std::string(argv[1]) == "--verbose") {
            verbose = true;
        } else {
            std::cerr << "Unknown arg" << std::endl;
            return 1; // non-zero indicates an error
        }
    }
    return 0;
}

extern "C"
int mrun_run() {
    if (verbose) {
        std::cerr << "starting running now" << std::endl;
    }
    doStuff(verbose);
    return 0; // non-zero indicates an error
}
However, unlike main(), you should not do extensive processing or operations during this function call, ie. this is not where the "main" part of your module should run. The reason is that each module gets "configured" before each module gets "run", and so doing processing during mrun_configure() can hold up the configuration of any later modules. So your module's "main" program code, if any, should be put into mrun_run() instead. mrun_configure() should only do command line argument parsing. If your mrun_configure() would just stash the argc/argv for later processing by mrun_run(), then you can instead use mrun_main(), which takes argc/argv but runs at mrun_run time (but only if there is no mrun_run present). eg:
extern "C"
int mrun_main(int argc, char* argv[]) {
bool verbose = false;
if (argc > 1) {
if (std::string(argv[1]) == "--verbose") {
verbose = true;
} else {
std::cerr << "Unknown arg" << std::endl;
return 1; // non-zero indicates an error
}
}
if (verbose) {
std::cerr << "starting running now" << std::endl;
}
doStuff(verbose);
return 0;
}
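As a plain-C counterpart to the C++ examples in this section, the sketch below parses several options while relying only on argc, since argv is not guaranteed to be NULL-terminated. The --label option and the label variable are hypothetical, added purely to show an option that takes a value.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool verbose = false;
static const char* label = "default"; /* hypothetical option value */

int mrun_configure(int argc, char* argv[]) {
    /* argv[0] is the module name, so options start at index 1. */
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--verbose") == 0) {
            verbose = true;
        } else if (strcmp(argv[i], "--label") == 0) {
            /* Bounds-check against argc rather than testing argv[i + 1]
             * for NULL, because the array may not be NULL-terminated. */
            if (i + 1 >= argc) {
                fprintf(stderr, "--label requires a value\n");
                return 1; /* non-zero indicates an error */
            }
            label = argv[++i];
        } else {
            fprintf(stderr, "Unknown arg: %s\n", argv[i]);
            return 1; /* non-zero indicates an error */
        }
    }
    return 0;
}
```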
Independent thread of execution
If you need your module to run independently of any other modules and/or any main program, then you can use normal methods to start a thread in mrun_run(). eg. a module which provides an alternative socket entry point would need a thread that listens for and responds to incoming connections.
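A minimal sketch of this pattern using POSIX threads is shown below. The worker here does one trivial piece of work and exits; a real module would typically loop (eg. accepting connections). The worker_body(), module_stop() and worker_done names are hypothetical, and whether and how a module joins its thread at shutdown is a design choice that mrun does not dictate.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

static pthread_t worker;
static atomic_int worker_done;

/* The module's independent stream of execution. */
static void* worker_body(void* arg) {
    (void)arg;
    /* A real module would do its ongoing work here. */
    atomic_store(&worker_done, 1);
    return NULL;
}

/* Starts the thread and returns promptly so that later modules
 * (and the host program) are not held up. */
int mrun_run(void) {
    int rc = pthread_create(&worker, NULL, worker_body, NULL);
    if (rc != 0) {
        /* pthread_create() returns the error number directly. */
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        return 1; /* non-zero indicates an error */
    }
    return 0;
}

/* Hypothetical shutdown helper: waits for the worker to finish. */
void module_stop(void) {
    pthread_join(worker, NULL);
}
```

Building a module like this generally needs the compiler's thread support enabled (eg. the -pthread flag with gcc or clang).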
NOTE
Please note that this section of documentation refers to the old SCons-based build system used by MongoDB, and has not been updated for the newer Bazel-based build system. However, the general principles still apply.
Inside the main repo
To create a module inside the mongodb/mongo repo, you just need to define a library with some special flags:
env.Library(
target='mymodule',
source=[
'source_for_my_module.cpp',
],
LIBDEPS=[
],
LIBDEPS_PRIVATE=[
],
LIBDEPS_TAGS=[
'illegal_cyclic_or_unresolved_dependencies_allowlisted',
],
LIBDEPS_GLOBAL=[
],
)
This will build a libmymodule.so that gets installed into build/install/lib. The absolute path to this file can then be given to mrun, or it can be copied into mrun's lib dir so that it can be referenced without any path.
There is usually no need to list any LIBDEPS or LIBDEPS_PRIVATE, and so those can be empty. You should not list base MongoDB-specific libraries (eg. '$BUILD_DIR/mongo/db/commands'), because when using mrun with statically-linked release binaries, those libraries will not be able to be found - but the code will still be present in the binary. This is the reason for the 'illegal_cyclic_or_unresolved_dependencies_allowlisted' tag. (The LIBDEPS_GLOBAL is forced to be empty for a similar reason - without this, the module will have a dependency on libtcmalloc_minimal.so, which isn't present and is already baked into the static release binaries.)
Inside a compile-time build module
Rather than putting the module's source in the main repo and modifying the SConscript files, it's also very easy to do so inside a compile-time build module, ie. under src/mongo/db/modules/<foo>. Usually, each directory inside db/modules is from another location (eg. another git repo, a git submodule, copied from somewhere, or a symlink to somewhere else). However, this isn't actually necessary, and so it's possible to just create a directory under db/modules as a way of keeping everything self-contained, and avoiding the need to modify other files (eg. even if merely to add a new sub-directory to a SConscript file).
To do this, you need a build.py file in the base module directory. For a self-contained mrun module, which only adds static-initialised objects, and isn't used (at compile time) by any other modules/code, this file can be very simple:
def configure(conf, env):
    pass
The only thing required in the SConscript file, apart from the actual library definition (shown above), is to import and clone the env, and tell it to use the normal server include paths:
Import([
'env',
])
env = env.Clone()
env.InjectMongoIncludePaths()