init: optionally load the system SELinux policy #400

Draft
wants to merge 16 commits into
base: master

WavyEbuilder

Implements #399. Currently a draft PR for some of the reasons noted in that issue. Another thing to add:

  • I saw mentioned here, Replace use of iostream-based classes #240 (comment), that you don't want to be falling back to C interfaces. For now, I'm just calling fprintf if the SELinux policy fails to load (as /dev/console is likely unable to be accessed at that point). Would you prefer std::cout to be used for now in this case?

@WavyEbuilder WavyEbuilder force-pushed the master branch 2 times, most recently from 5952f94 to 35df218 Compare October 15, 2024 20:53
@WavyEbuilder
Author

Current TODO (just to note I am aware of it) is to link against libselinux in the Makefile.

@WavyEbuilder
Author

I've added an --enable--selinux option to the configure script, however I am not really sure how best to go about linking against libselinux as it is an optional dependency. Any advice in this regard?

@WavyEbuilder
Author

Think I've got that sorted by passing a linker flag.

@davmac314
Owner

I will review properly when I get a chance. I don't know a heap about SELinux so bear with me. A couple of things I will point out now though:

  • Definitely avoid using fprintf, it's best if you can keep to the conventions already used throughout. cerr would normally be the right option for a definite error message, however, output should generally go via the log interface instead of directly via cout/cerr. Eg log(loglevel_t::ERROR, "(error message here)") possibly followed by flush_log() if this is before a call to exit().
  • I'm not a fan of the re-exec approach either. If the correct label can be assigned to the already-running executable (i.e. what systemd apparently does), it seems like we should just do that.
  • I'm inclined to think that there should either be a command-line option to disable loading the policy, or that should be the default and there should be a command-line option to enable it.
  • Probably the whole setup should be moved to its own separate function

@WavyEbuilder
Author

WavyEbuilder commented Oct 16, 2024

Hey! Appreciate the fast response.

Definitely avoid using fprintf, it's best if you can keep to the conventions already used throughout. cerr would normally be the right option for a definite error message, however, output should generally go via the log interface instead of directly via cout/cerr. Eg log(loglevel_t::ERROR, "(error message here)") possibly followed by flush_log() if this is before a call to exit().

A lot of the other init systems mentioned /dev/console not being available at that point in time (which is a reasonable assumption since, if the policy fails to load, there is a good chance SELinux would block access to /dev/console). I had a quick glance through dinit-log.cc and it appears that it just uses stdout - would this be correct?

I'm not a fan of the re-exec approach either. If the correct label can be assigned to the already-running executable (i.e. what systemd apparently does), it seems like we should just do that.

I'll make sure to change that to use setcon_raw(3)

I'm inclined to think that there should either be a command-line option to disable loading the policy, or that should be the default and there should be a command-line option to enable it.

Generally the command-line option is provided on the kernel cmdline (which selinux_init_load_policy parses and handles for us). The main options are enforcing=0 (boot SELinux in permissive mode regardless of /etc/selinux/config) and selinux=0 (disable SELinux altogether). So as far as I can tell, all of that should be handled for us.
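
To make the meaning of those two kernel parameters concrete, here is a minimal sketch of how a command line could be scanned for them. It is illustrative only: per the comment above, libselinux is expected to handle this itself, so dinit would not need such code. The names `selinux_boot_opts` and `parse_selinux_cmdline` are hypothetical, and the "last occurrence wins" rule is an assumption of this sketch.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type: which SELinux boot parameters were seen.
struct selinux_boot_opts {
    bool disabled = false;    // "selinux=0": SELinux turned off entirely
    bool permissive = false;  // "enforcing=0": boot in permissive mode
};

// Scan a kernel command line (e.g. the contents of /proc/cmdline) for the
// two parameters. Assumption of this sketch: if a parameter is repeated,
// the last occurrence wins.
selinux_boot_opts parse_selinux_cmdline(const std::string &cmdline)
{
    selinux_boot_opts opts;
    std::istringstream words(cmdline);
    std::string word;
    while (words >> word) {
        if (word == "selinux=0") opts.disabled = true;
        else if (word == "selinux=1") opts.disabled = false;
        else if (word == "enforcing=0") opts.permissive = true;
        else if (word == "enforcing=1") opts.permissive = false;
    }
    return opts;
}
```

For example, `parse_selinux_cmdline("root=/dev/sda1 enforcing=0")` reports permissive mode with SELinux still enabled.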

Probably the whole setup should be moved to its own separate function

I'll make sure to do that

@WavyEbuilder
Author

Also, would you like me to commit any changes as separate commits until you are happy with it so you can see the diffs between changes a bit easier, or would rebasing be preferred?

@WavyEbuilder
Author

Alright took a look at this a bit more thoroughly and I have a few design questions to raise quickly, notably regarding Probably the whole setup should be moved to its own separate function:

  1. Shall we plan ahead now for other security frameworks? I'm only really an SELinux guy as I mentioned in Load security policies for LSMs #399 , but it might be worth doing similar to systemd, i.e.:
static int initialize_security(
                bool *loaded_policy,
                dual_timestamp *security_start_timestamp,
                dual_timestamp *security_finish_timestamp,
                const char **ret_error_message);

Then we could just call our implementation of a similar function once in dinit_main (ideally as early as possible, being security frameworks it makes sense to attempt to load them as early as possible). Now given that we have C++ to hand here, I was wondering how to go about propagating errors back. Being C++11, we don't have anything like std::optional, but taking a look at some functions that can return failure in dinit's source, they appear to return a bool. My original idea was something like std::optional<std::string> (and returning an error message on failure, otherwise std::nullopt), but of course that is only available in C++17. Would something like char * work here instead then maybe (returning nullptr on success... though that's a bit strange granted)? Or is the specific failure message relevant at all here and should we just return a bool to indicate success status?

  2. Regarding headers for this, I was thinking maybe something along the lines of:
    src/includes/mac-util.h // contains initialize_security, might be worth just putting it in dinit-util.h?
    src/includes/dinit-mac/selinux.h // SELinux helper functions so we could (like systemd again) just quickly call a generic setup function for each MAC framework.

I think that'd be a reasonable way of structuring things, but being my first PR to dinit and my first real work on a C++ project targeting a standard older than C++14, your input would be greatly appreciated.

Thanks!

@davmac314
Owner

taking a look at some functions that can return failure in dinit's source, they appear to return a bool ... Would something like char * work here instead then maybe (returning nullptr on success... though that's a bit strange granted)? Or is the specific failure message relevant at all here and should we just return a bool to indicate success status?

Log any relevant message (via log(...) functions), then return a bool (usually false for failure).
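
A minimal sketch of that convention, written for C++11 (no `std::optional`). The `loglevel_t` enum and `log()` function below are simplified stand-ins so the example compiles on its own; dinit's real logging interface lives in its own headers and may differ in signature. `selinux_setup` and its `load_ok` parameter are hypothetical.

```cpp
#include <iostream>
#include <string>

// Stand-ins for dinit's logging interface (assumed shape, not the real one).
enum class loglevel_t { WARN, ERROR };

void log(loglevel_t lvl, const std::string &msg)
{
    const char *prefix = (lvl == loglevel_t::ERROR) ? "[ERROR] " : "[WARN] ";
    std::cerr << prefix << msg << '\n';
}

// Hypothetical setup step. 'load_ok' stands in for the outcome of the real
// policy-loading call. The failure detail goes to the log; the caller only
// sees a bool (false for failure).
bool selinux_setup(bool load_ok)
{
    if (!load_ok) {
        log(loglevel_t::ERROR, "Could not load SELinux policy");
        return false;
    }
    return true;
}
```

The caller then only needs `if (!selinux_setup(...)) { /* handle failure */ }`, and no error string has to be passed back.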

Shall we plan ahead now for other security frameworks?

If we're just talking about adding a single method, I don't see any advantage in doing that now, as it can easily be done if and when support for other security frameworks is added. Let's just keep it simple.

Generally the command-line option is provided on the kernel cmdline (which selinux_init_load_policy parses and handles for us). The main options are enforcing=0 (boot SELinux in permissive mode regardless of /etc/selinux/config) and selinux=0 (disable SELinux altogether). So as far as I can tell, all of that should be handled for us.

Ok, that sounds fine.

@davmac314
Owner

Generally the command-line option is provided to the kernel cmdline (which the selinux_init_load_policy parses and handles for us). The main options are

Actually, thinking about this more, there may still be cases where SELinux is enabled but the loading of the policy should not be performed. One example is given by the Systemd code here.

Also, can you clarify what this might mean? I want to know what file descriptors SELinux would keep open and in what circumstances.

Are you doing this work for a distribution or is it a more personal endeavour?

@davmac314
Owner

(Finally: make sure you have read CONTRIBUTING and CODE-STYLE documents, if you haven't already. Thanks!).

@WavyEbuilder
Author

WavyEbuilder commented Oct 17, 2024

Log any relevant message (via log(...) functions), then return a bool (usually false for failure).

Got it, thanks :)

If we're just talking about adding a single method, I don't see any advantage in doing that now, as it can easily be done if and when support for other security frameworks is added. Let's just keep it simple.

Yup, makes sense, should be fairly easy to deal with in the future. Will do!

Actually, thinking about this more, there may still be cases where SELinux is enabled but the loading of the policy should not be performed. One example is given by the Systemd code here.

That's a good example. For that specific initrd case, I had a look at systemd's in_initrd(void) function and found this comment:

/* If /etc/initrd-release exists, we're in an initrd.
 * This can be overridden by setting SYSTEMD_IN_INITRD=0|1.
 */ 

Would you like to do the same with an override here? (maybe something more like DINIT_IN_INITRD)
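
For illustration, the logic described in that systemd comment could be sketched as follows. This is not systemd's actual code; `DINIT_IN_INITRD` is only the variable name floated above, and how unrecognised override values are treated is an assumption of this sketch.

```cpp
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Returns true if we appear to be running inside an initrd. 'marker' is the
// file whose presence signals an initrd; 'override_val' is the value of the
// override environment variable (nullptr if it is unset).
bool in_initrd(const char *marker, const char *override_val)
{
    if (override_val != nullptr) {
        if (std::strcmp(override_val, "1") == 0) return true;
        if (std::strcmp(override_val, "0") == 0) return false;
        // Assumption: any other value is ignored and the file check decides.
    }
    return access(marker, F_OK) == 0;
}

// Convenience wrapper using the marker path from the systemd comment and
// the hypothetical DINIT_IN_INITRD variable.
bool in_initrd()
{
    return in_initrd("/etc/initrd-release", std::getenv("DINIT_IN_INITRD"));
}
```

Taking the path and override as parameters keeps the check easy to exercise without touching the real environment.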

If you'd still like to add a flag override of some sorts to tell dinit to not load the selinux policy, I feel like that should be an opt in sort of thing, because while not loading the selinux policy won't load any of the user's policy, it doesn't mean selinux won't be loaded. In that case, everything runs in the kernel context, and taking my distro (Gentoo) as an example, Portage fails to operate if I just run with the system booted in the kernel context, so that is very much an edge case and almost never intended to be a reality. But the initrd case makes sense.

Also, can you clarify what this might mean? I want to know what file descriptors SELinux would keep open and in what circumstances.

Afaik this is generally in the case of when it needs to audit something related to that fd so it keeps it open for a bit? I remember seeing something similar in the past, but I'm not exactly too happy with that answer as I can't answer it with 100% confidence, so if that's okay I'll do a bit of digging in the libselinux docs and get back to you on that.

Are you doing this work for a distribution or is it a more personal endeavour?

I'm currently using Gentoo and dinit on basically all of my systems (dinit doesn't have upstream Gentoo support currently), however I am hoping to try improving the support (and maybe possibly getting official support) for dinit for both Adelie and Gentoo (nothing official, just me on my own there, though I have spoken to some developers of the respective distros about that, but absolutely nothing really official yet). This specific piece of work was started when I hit a few weird bugs with my SELinux policy while dealing with a few ebuilds for dinit related to SELinux (policy for it), so in the future as Gentoo has official SELinux support it might be useful, but for now really just take it as a personal endeavour, I'm not affiliated with any distro officially :)

(Finally: make sure you have read CONTRIBUTING and CODE-STYLE documents, if you haven't already. Thanks!).

Had a read over them already, but I'll make sure to reread CODE-STYLE

@WavyEbuilder
Author

Did a bit of digging and found this commit systemd/systemd@a3dff21 which seems to explain it quite nicely.

@WavyEbuilder
Author

Hmm reading the setcon(3) man page:

sets the current security context of the process to a new value. Note that use of this function requires that the entire application be trusted to maintain any desired separation between the old and new security contexts, unlike exec-based transitions performed via setexeccon(3). When possible, decompose your application and use setexeccon(3) and execve(3) instead.

Still seems fine to transition with setcon_raw(3), but it probably makes sense to make sure to do it as early as possible, before we open any other file descriptors. I'll push a new commit using setcon_raw(3) shortly.

@WavyEbuilder
Author

I think it'll be best to consider how SELinux aware we want dinit to be at this stage. After reading some more man pages and systemd code, it seems like systemd transitioning itself to the new context is an okay option as they already make heavy use of selinux throughout (i.e. in transient units which have an SELinuxContext= option). However, if our only goal (at least in the short-term) is to load the policy, then I think it makes sense to stick with the execve(3) solution.

If we want to be a little bit more SELinux aware (i.e. if it is not unforeseeable to make more use of libselinux throughout dinit), then I would probably start off by creating an SELinux utils header of some sort, as there is quite a bit of setup, etc. that'll need to be done.

My personal opinion would be (for the sake of simplicity) to stick with the execve(3) solution as it'd be a fair amount simpler for now. In the future, if something like the dinit-run and transient units mentioned in #253 (comment) gets implemented, we could add support for launching said units with a specific selinux context (I'd be happy to work on something like that), and at that point as that'd require a decent amount of setup anyway, it'd make sense to just use setcon_raw(3) at that point. I'm guessing the (relatively) unneeded complexity of correctly transitioning ourself to the right domain after loading the policy is why the other simpler init systems (openrc, sysvinit, etc) that don't really make use of a lot of SELinux's features just load the policy and relaunch themselves.

However, if you'd still like to continue with setting our own context, I can go down that route. It'd require a bit more design though, so it might be worth working on getting some helper functions stubbed out firstly.

@davmac314
Owner

for now really just take it as a personal endeavour, I'm not affiliated with any distro officially :)

That's fine but it will need a commitment from you that you will support it going forward, or otherwise make it clear that it's experimental/unsupported in relevant documentation. (Incidentally I think you missed updating the build instructions - that's something that would need to be added).

To be honest I'm not sure I'm following your reasoning in a few ways:

it seems like systemd transitioning itself to the new context is an okay option as they already make heavy use of selinux throughout (i.e. in transient units which have an SELinuxContext= option)

I don't really understand why that makes a difference. What is the reason why this would not be a good option for dinit as well? (I get that Dinit doesn't provide specific support for SELinux features when executing service processes, but why does that make a difference as to the mechanics of how the policy is loaded?)

I'm fine with the policy loading happening very early in dinit's execution. But:

My personal opinion would be (for the sake of simplicity) to stick with the execve(3) solution

If we are going to call execve anyway, I'm not sure I see any point having the functionality in dinit itself. It could just as well be a wrapper (called init) that loads the policy and then execves dinit. Or am I missing something? Having dinit re-exec itself on each boot doesn't seem right to me at all.

I know you had a few other questions for me and I can go back to those, but I really need some clarification on these points. I don't want to be discouraging but it seems like there are a few details you're not really sure of yourself, and that gives me some pause. I'm hesitant to incorporate something where I really don't understand why things have been done the way they have. If there's open questions that you need to sort out, please feel free to take whatever time you need to do that, but let's get them sorted first and talk details of the code then.

@WavyEbuilder
Author

Hey,

That's fine but it will need a commitment from you that you will support it going forward, or otherwise make it clear that it's experimental/unsupported in relevant documentation. (Incidentally I think you missed updating the build instructions - that's something that would need to be added).

I can commit to that, but would also be happy mentioning it is experimental.

I don't really understand why that makes a difference. What is the reason why this would not be a good option for dinit as well? (I get that Dinit doesn't provide specific support for SELinux features when executing service processes, but why does that make a difference as to the mechanics of how the policy is loaded?)

I think I phrased that badly earlier, I'll rephrase it a little here now. Systemd makes use of SELinux a lot inside it being quite SELinux aware so it already has a lot of boilerplate that will be used elsewhere. The reason why transitioning to the new context is a little more complex is because it requires a bit more setup, we are in a privileged domain at that point and we are sort of entrusting ourselves to do it right, so it'd require a fair amount more code I would think.

If we are going to call execve anyway, I'm not sure I see any point having the functionality in dinit itself. It could just as well be a wrapper (called init) that loads the policy and then execves dinit. Or am I missing something? Having dinit re-exec itself on each boot doesn't seem right to me at all.

An initramfs can work fine for this (and often is!) or some simpler pid 1 whose only job is to launch dinit properly with the right context, but (at least to me) that feels a little unnecessary.

I know you had a few other questions for me and I can go back to those, but I really need some clarification on these points. I don't want to be discouraging but it seems like there are a few details you're not really sure of yourself, and that gives me some pause.

That's fair, I did phrase it quite badly above. My main reasoning was based off the Let's just keep it simple. comment, so basically the only "problem" is that it's just a bit more work that we need to get right for (at least in my opinion) minimal gain. However there's nothing preventing us from doing that if you wish :) It's just a bit more code, so I thought I'd mention that.

I'm hesitant to incorporate something where I really don't understand why things have been done the way they have. If there's open questions that you need to sort out, please feel free to take whatever time you need to do that, but let's get them sorted first and talk details of the code then.

That makes sense. For now I'll just presume we're going down the route of transitioning ourselves to a new context and I'll push in a bit with an example of that and let you compare.

Incidentally I think you missed updating the build instructions - that's something that would need to be added.

I did miss that, my bad, I'll make sure to update that as well.

@davmac314
Owner

davmac314 commented Oct 17, 2024

I think I phrased that badly earlier, I'll rephrase it a little here now. Systemd makes use of SELinux a lot inside it being quite SELinux aware so it already has a lot of boilerplate that will be used elsewhere. The reason why transitioning to the new context is a little more complex is because it requires a bit more setup, we are in a privileged domain at that point and we are sort of entrusting ourselves to do it right, so it'd require a fair amount more code I would think.

This, I guess, is what I don't understand. I can see that opening file descriptors before loading the policy might give access to things that the policy will then disallow but, if we are loading the policy quite early and we have opened file descriptors then in fact we do need those file descriptors. If the policy disallowed that access then that would be a broken policy anyway. Dinit doesn't go around just casually opening files. Likewise any other resource it has accessed, it probably needs. And anyway, as far as I can tell, applying the security label will enforce access against file descriptors that were opened previously anyway.

Eg from https://www.systutorials.com/docs/linux/man/3-setcon_raw/ -

Since access to file descriptors is revalidated upon use by SELinux, the new context must be explicitly authorized in the policy to use the descriptors opened by the old context if that is desired.

If the process was being ptraced at the time of the setcon() operation, ptrace permission will be revalidated against the new context and the setcon() will fail if it is not allowed by policy.

Given those are taken care of (and ptrace shouldn't be an issue anyway), and given that we'd be loading the policy early (before doing just about anything anyway), what would be the concerns in regards to "entrusting ourselves to do it right?" I'm after concrete examples.

An initramfs can work fine for this (and often is!) or some simpler pid 1 whose only job is to launch dinit properly with the right context, but (at least to me) that feels a little unnecessary.

To me it feels unnecessary to add specific support (including a library dependency) for something in Dinit which can be handled just as well from outside, and re-executing our own process right after we start honestly just feels like a hack. At the moment my position on that is a "no", I would need to be given a good, concrete reason for why that should change.

Applying the security context within the already-running process without re-execing would be acceptable, though.

My main reasoning was based off the Let's just keep it simple. comment, so basically [...] It's just a bit more code, so I thought I'd mention that.

A little bit more code isn't an issue, if we have already gone as far as adding a dependency and providing support for SELinux then we may as well do it properly. My bad for the "keep it simple" comment which caused confusion - I meant, keep it restricted to the specific functionality that you are wanting to implement; we don't need abstraction layers for handling other security frameworks, etc.

But before you said:

so it'd require a fair amount more code I would think.

Is it a bit, or is it a fair amount? (or am I conflating two different things?)
(Edit: what I'm really asking is: is it a matter of say 3-4x the amount of code currently in the PR, all confined to a single function? Or are we talking sweeping changes throughout the codebase? If it's the former, that's totally fine).

@WavyEbuilder
Author

WavyEbuilder commented Oct 18, 2024

As for entrusting ourselves to do it right, the main concern surrounds file descriptors. That is why I'm a little worried about the log interface. We don't want to leave any of our file descriptors open when we transition. Otherwise we would in effect have a "context mismatch", as the file descriptor inherits the context of the process it was opened by, so we would have access-control issues with those existing descriptors at that point. That's why I'm thinking we should load the policy and transition as early as possible. If we do that, we should be fine.

A good example is this. Imagine we start out with kernel context (what dinit starts out with on my system before the policy is loaded). We have some fd's opened, and the kernel context has permission to use them. Now we load the policy, transition ourselves, and the loaded policy executes us as init_t. In the Gentoo refpolicy, then that kernel context becomes kernel_t. Now SELinux will prevent us from using those open file descriptors.

(Edit: what I'm really asking is: is it a matter of say 3-4x the amount of code currently in the PR, all confined to a single function? Or are we talking sweeping changes throughout the codebase? If it's the former, that's totally fine).

Oh I see what you meant by keeping it simple now :) I was a bit confused with what you were after, but it shouldn't really require any sweeping changes for the codebase, it should all be confined to that function we'll make to load the policy, which will be the thing that's a bit longer. That's all good then, I can make the transition work.

I'll get to work on that, and if there are any concerns please let me know. Thanks for all the time you've given this so far, appreciated a lot.

@WavyEbuilder WavyEbuilder force-pushed the master branch 2 times, most recently from 5ddfc4d to d5295d9 Compare October 18, 2024 04:01
@WavyEbuilder
Author

Alright I think I've got this working as desired now. I've just made a function selinux_transition that is above dinit_main for now so you can take a look, if you'd like me to move it let me know.

If anything is unclear/you feel any comments are needed, please let me know, and I'll make sure to add them.

@WavyEbuilder
Author

(Just updated an error message as I'm no longer using setexeccon_raw(3), and removed a close as I am no longer opening something else.) Should be ready for review now.

@WavyEbuilder
Author

Just fixed another silly mistake, forgot to change 0 and 1 to true and false as I changed the return type to a bool as you suggested. My apologies, actually ready for review now.

@davmac314
Owner

Could you remove the "Draft:" status if it's ready? I'll get to it when I can.

A couple of things I noticed when looking quickly now:

  • could you add a command-line option to inhibit loading the security policy?
  • there is still no build documentation
  • "I can commit to that, but would also be happy mentioning it is experimental." - I don't see any mention, did you change your mind about that? Like I said, it's fine if you can commit to supporting this going forward, but otherwise it should be clearly tagged as experimental and unsupported.

Can I ask you to please go through our discussion and double-check you've addressed the things we did discuss.

One other thing I noticed now:

  • selinux_transition function should be static

Answers to some of your earlier questions:

Also, would you like me to commit any changes as separate commits until you are happy with it so you can see the diffs between changes a bit easier, or would rebasing be preferred?

You can squash commits but don't rebase onto any changes in master (if there are any), thanks.

I had a look at systemd's in_initrd(void) function and found this comment: [...] Would you like to do the same with an override here?

No, just a command line option.
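
A command-line check of that kind can be very small. The sketch below assumes the option is spelled `--disable-selinux-policy`, as in the man page excerpt later in this thread; the function name `should_load_selinux_policy` is hypothetical, and dinit's real argument parsing is more involved than a plain scan.

```cpp
#include <cstring>

// Returns false if the option that inhibits policy loading appears anywhere
// in argv (argv[0], the program name, is skipped).
bool should_load_selinux_policy(int argc, const char * const *argv)
{
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--disable-selinux-policy") == 0) {
            return false;
        }
    }
    return true;
}
```

With this shape the policy is loaded by default and the option opts out, which matches the behaviour described in the documentation under review.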

@davmac314
Owner

Hmm, I see you've re-based onto current master. Please don't do that until ready for final merge (i.e. once the PR has basically been approved). Now I can't compare the changes between the previous incarnation of the PR and the current incarnation.

Owner

@davmac314 davmac314 left a comment

Some more comments.

Please take care to address the specific comments, but also re-review your own PR to make sure there aren't other changes that should apply in more than one place. Also check correctness of documentation and check terminology used in new documentation for consistency with existing documentation.

@@ -298,6 +302,16 @@ There are several ways to work around this.
Service names following the \fB\-\-container\fR (\fB\-o\fR) or \fB\-\-system\-mgr\fR (\fB\-m\fR) options are not ignored.
Also, the \fB\-\-service\fR (\fB\-t\fR) option can be used to force a service name to be recognised regardless of operating mode.
.\"
.SH SELINUX SUPPORT
.LP
When running as PID 1 on a SELinux enabled machine, \fBdinit\fR will by default load the system's SELinux policy.
Owner

When I look at the code, it looks to me like the SELinux policy will be loaded if dinit is running as system manager and system init, but this says "when running as PID 1"? (I already pointed out a similar issue in the previous review; you should address all cases).

Isn't it the case that this happens only if Dinit has been built with SELinux support enabled?

What happens in case of various failures? Eg failure to load the policy.

.SH SELINUX SUPPORT
.LP
When running as PID 1 on a SELinux enabled machine, \fBdinit\fR will by default load the system's SELinux policy.
This behaviour can be disabled by passing the \fB\-\-disable\-selinux\-policy\fR option to dinit.
Owner

You don't really need to mention this, that option is already documented (also "dinit" lacks formatting).

Owner

(Referring to line 308)

Comment on lines 310 to 313
When loading the SELinux policy, dinit will automatically mount a few special filesystems needed to successfully load the policy.
\fBsysfs\fR will be mounted at \fB/sys\fR, and \fBselinuxfs\fR will be mounted at \fB/sys/fs/selinux\fR.
\fBdinit\fR will not unmount either.
\fBprocfs\fR will also be mounted at \fB/proc\fR, but \fBdinit\fR will unmount it after loading the SELinux policy.
Owner

I find this whole section problematic. First, other than /proc, it's not really dinit that mounts the filesystems; that happens inside the SELinux library and is prone to change if the library does, so I would prefer that the documentation doesn't claim that dinit does this itself but instead makes it clear that the SELinux framework may mount filesystems (and specify /sys/ etc as examples).

If you're going to mention /proc being temporarily mounted (and I guess that it's probably a good idea to mention it) then at least be clear on the significance of this. Rather than talk about when it's mounted and unmounted just say that it is temporarily mounted in order to load the policy, that's much more concise and less confusing. But also, point out that the /proc directory must exist for this to be successful, and that the temporary mount will overmount any previously-mounted /proc until the temporary mount is removed (I guess you perhaps didn't realise this), and what the likely outcome of being unable to mount /proc will be.

Author

Hi Davin, I've just pushed a change for this, does the new wording seem okay? Just to clarify on my choices on a few things:

First, other than /proc, it's not really dinit that mounts the filesystems; that happens inside the SELinux library and is prone to change if the library does, so I would prefer that the documentation doesn't claim that dinit does this itself but instead makes it clear that the SELinux framework may mount filesystems (and specify /sys/ etc as examples).

I think I've managed to achieve this, but as the dinit library is linked with libselinux and hence technically the binary is mounting /sys, etc, when later specifying that Dinit mounts /proc, I opted to not use dinit but use Dinit (to refer to the project) as I think that gets across that it is not Dinit-written code mounting that - is that expressing things okay?

If you're going to mention /proc being temporarily mounted (and I guess that it's probably a good idea to mention it) then at least be clear on the significance of this. Rather than talk about when it's mounted and unmounted just say that it is temporarily mounted in order to load the policy

With regard to this, we mount /proc to transition ourselves so the later calls to getcon_raw(3), etc work, but I suppose we also do it for selinux_init_load_policy(3), but that's a weird one because that function will attempt to mount it in some codepaths, so for now I've just mentioned it mounts it in order to transition - would you like me to include loading the policy in that?

Thanks

Owner

I think I've managed to achieve this, but as the dinit library is linked with libselinux and hence technically the binary is mounting /sys, etc, when later specifying that Dinit mounts /proc, I opted to not use dinit but use Dinit

That's a bit of a non-sequitur isn't it? But yes, it is fine (and perhaps preferable) to refer to actions by Dinit as opposed to dinit. "dinit" is just the executable (file/command).

(However, I think the paragraph is currently worded somewhat confusingly. It starts by saying that a few filesystems will be mounted "If they are not mounted already", but concludes with a sentence saying that any previously-mounted proc will be mounted over.)

As I said earlier:

I don't really want to be pulled in to make a judgement on every thing that comes up. Please get the PR to the state where you are satisfied with it and ask for review only at that stage (remember to self-review first).

I will repeat that now because it seems the message didn't get through: please hold off on asking questions one-at-a-time, complete the PR and ask for feedback at that stage. (Only ask a question in the meantime if it raises a concern about something that's going to be a lot of work to change afterwards). I'm sorry, but this is your personal itch you are scratching, you need to place more value on my time and less on your own.

Also, if you are pushing commits which do not represent the final state of the PR (ready for review), then please convert the PR to draft status (until it is actually ready). Otherwise each push triggers a request for review in my notifications, and it is not clear if the PR is ready for review or not.

src/dinit.cc Outdated
@@ -6,6 +6,7 @@
#include <cstddef>
#include <cstdlib>

#include <sys/mount.h>
Owner

This isn't a header specified by POSIX, we cannot rely on it being unconditionally available.
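
One possible way to address this is to guard the include behind a build-time check, so that builds for other platforms never see the header. The following is only a sketch under assumptions: the macro names `DEMO_SUPPORT_SELINUX` and `DEMO_HAVE_MOUNT` and the function `mount_support_compiled_in` are hypothetical and do not come from this PR.

```cpp
// Sketch only: DEMO_SUPPORT_SELINUX is a hypothetical macro that a build
// system could define when SELinux support is requested on Linux.
#if defined(__linux__) && defined(DEMO_SUPPORT_SELINUX)
#include <sys/mount.h>   // Linux-specific header, not specified by POSIX
#define DEMO_HAVE_MOUNT 1
#else
#define DEMO_HAVE_MOUNT 0
#endif

// Reports whether the mount-dependent code path was compiled in.
// Callers can use this to skip SELinux setup where it is unavailable.
constexpr bool mount_support_compiled_in() noexcept
{
    return DEMO_HAVE_MOUNT != 0;
}
```

With a guard of this kind, any code that calls `mount` or `umount` would need to sit inside the same conditional block.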

mentioned in this document.

## Loading the system SELinux policy
When booted as the system init system, dinit by default will attempt to load the
Owner

"system init system" sounds weird. Say "system init" just as we do elsewhere.

When booted as the system init system, dinit by default will attempt to load the
system's SELinux policy and transition itself to a context specified by that policy
if not already done so in earlier boot (e.g. by an initramfs). This behaviour may be
disabled by passing dinit the `--disable-selinux-policy` flag.
Owner

"flag" -> command-line argument.

Comment on lines 17 to 19
If not already mounted in earlier boot (e.g. by an initramfs), dinit will mount `/sys`,
and selinuxfs (typically `/sys/fs/selinux`). This occurs before any services are started,
as loading the SELinux policy is the first thing dinit does.
Owner

See comments elsewhere about assigning responsibility for these actions.

It seems odd to mention this here but not also mention the temporary mounting of /proc, especially as it appears in the chart below.

src/dinit.cc Outdated
Comment on lines 488 to 492
// This function will attempt to mount /sys and /proc unconditionally, but will not bail if it
// fails to do so. /sys will remain mounted after returning, and it is possible for /sys to still
// remain mounted despite returning false. This function will attempt to unmount /proc if it was
// responsible for mounting it, but lazily unmounts it using MNT_DETACH so while /proc will be
// unavailable for new accesses, it is not guaranteed to be unmounted.
Owner

Make clear where the responsibility for mounting /sys lies (and why does this mention only /sys and not /sys/fs/selinux?)

What is the point of the lazy unmount using MNT_DETACH? (either explain why it's necessary to do it that way, or don't do it that way).

@WavyEbuilder
Author

Hmm, I see you've re-based onto current master. Please don't do that until ready for final merge (i.e. once the PR has basically been approved). Now I can't compare the changes between the previous incarnation of the PR and the current incarnation.

My apologies - would you like me to keep making commits like before and then to squash them?

@davmac314
Owner

davmac314 commented Mar 2, 2025

My apologies - would you like me to keep making commits like before and then to squash them?

Yes, just as I said earlier:

Also, would you like me to commit any changes as separate commits until you are happy with it so you can see the diffs between changes a bit easier, or would rebasing be preferred?

You can squash commits but don't rebase onto any changes in master (if there are any), thanks.

Note that I requested even then that you don't rebase onto master.

It is trivial to see diffs between successive versions of the PR changes even if you squash the commits. It becomes difficult when you also rebase onto master.

@WavyEbuilder
Author

Hi Davin, should be all ready for review. I think I've addressed everything mentioned in the last two reviews in all places I can find instances of them, as well as cleaned up some general formatting and a few other things (should be mentioned in commits). I've tested on a system:

  • with an initramfs
  • without an initramfs
  • missing policy (to cause policy load to fail)

and with enforcing/permissive supplied via various methods (i.e. enforcing=0 on kernel cmdline, /etc/selinux/config).

Thanks a lot

Owner

@davmac314 davmac314 left a comment

You didn't address this comment from the previous review: #400 (comment)
I'd really appreciate if you could thoroughly check that everything has been addressed before you request another review.

Regarding the documentation/comments about mounting directories, let me lay out my concerns:

  • I don't want users to be surprised that some directories get mounted automatically (seemingly by dinit)
  • However, I don't really want to suggest that Dinit is doing the mounts (other than the single case which it temporarily does): it is really SELinux that does
  • I also don't want to be in a position where Dinit documentation is actually documenting SELinux more completely/precisely than SELinux documentation itself does (especially but not solely because details of what SELinux does may change without any change in Dinit).

Some more general concerns:

  • I don't want user documentation to be developer centric (don't use developer terminology, and don't provide details about implementation that are irrelevant to users).
  • Nevertheless, the documentation needs to be clear and unambiguous, and consistent.
  • It also needs to be correct.

So for instance, where you have got:

If not already mounted in earlier boot (e.g. by an initramfs), libselinux will mount /sys, and selinuxfs (typically /sys/fs/selinux) in order to load the policy. Should the mounting of either fail, the policy load will fail. dinit will also attempt to temporarily mount /proc, and the newly mounted procfs will be mounted over an existing procfs. In order for this, and as such the initial setup of SELinux, to succeed, the /proc directory will need to exist. This occurs before any services are started, as loading the SELinux policy is the first thing dinit does.

The problems I have are:

  • it talks about an action by libselinux - that's the SELinux library, it's a developer-centric concept. Say "SELinux" or "SELinux framework" (even "SELinux library", but I would prefer one of the former two).
  • it's implying that libselinux loads the policy, but the previous paragraph says that dinit loads the policy (dinit by default will attempt to load the system's SELinux policy) - i.e. it's inconsistent. You could clear up the confusion somewhat by specifically stating that Dinit instructs the SELinux framework to load the policy, rather than saying that Dinit loads the policy itself, or by clarifying that Dinit makes use of the SELinux framework to load the policy.
  • In "dinit will also attempt to temporarily mount /proc", there's a similar confusion. Saying that dinit "will also mount" something implies that dinit already mounted something else. But it was SELinux that mounted /sys etc, as per the previous sentence - I know that's arguably an action of dinit, but there's no reason for the inconsistency at this point in the documentation. (So for example you could say "In addition, dinit itself will mount ...").
  • It says that SELinux will do things but that's not documented by SELinux documentation (is it?) and so can't necessarily be 100% relied on to always be the case. So, say "may" instead ("is known to" would've been ok, but I prefer "may"). It's SELinux's job to specify the details, not ours; if they don't, we shouldn't be any more precise than we need to be (we are not documenting SELinux, beyond what is really necessary).
  • "This occurs before any services are started, as loading the SELinux policy is the first thing dinit does." -- No, the first thing dinit does is parse its arguments. Even if loading the SELinux policy was the first thing, why say that here? It's just an unnecessary detail (that may later become incorrect). Just leave it as "This occurs before any services are started".

Some lesser issues (that I might have let go if it weren't for the other things I've highlighted above):

  • It says "If not already mounted in earlier boot" that libselinux will mount /sys, and selinuxfs (typically /sys/fs/selinux). Specifying one filesystem only by its mountpoint but the other by both type and typical mountpoint is inconsistent. Be consistent.
  • "earlier boot" isn't a thing (at least not in the way it's used here). "Earlier in the boot process" makes sense here.
  • It says dinit a lot when talking about Dinit. As we discussed recently, it's preferable to say Dinit unless there's a specific reason to say dinit.

I'm not going to go through all the other documentation/comments - please go through them yourself and apply the above concerns. There really is a high bar for the documentation quality.

(I understand that's a lot to absorb. But please give it a go, and I will give more detailed feedback/suggestions in the next iteration, if necessary).

@WavyEbuilder
Author

Hi Davin,

Thanks for the review. Could you elaborate on what's missing for #400 (comment) please? I thought 3d04808 covered it, but I'm guessing not, just a bit unclear on what else you'd like added for that.

As for everything else, I'll try to get to that shortly - I've got a few things to work on for the next couple days so might take a little while before anything is done (and given it's a lot to digest I want to spend a bit of time on it). Also, to avoid email spam, would you like me to not push any commits until I'm ready for review or would changing the title of the PR to include draft suffice?

Thanks a lot

@davmac314
Owner

Hi Davin,

Thanks for the review. Could you elaborate on what's missing for #400 (comment) please? I thought 3d04808 covered it, but I'm guessing not, just a bit unclear on what else you'd like added for that.

Ok, I see you added a comment which says:

// A procfs that has already been mounted on /proc (before we mounted it) may be in use.

... just above the call to umount2.

But what has the procfs that was mounted before we mounted (over) it got to do with the one that we mounted and that we are now unmounting? If the original proc is busy, we can still mount over the top of it and then unmount that new mount without doing anything special - why is MNT_DETACH being used here? That question wasn't answered.

@davmac314
Owner

I should add:

// A procfs that has already been mounted on /proc (before we mounted it) may be in use.

The other problem with this (other than it not making sense) is the "may be in use" part. That doesn't explain the problem. What may be using it? What is the real problem you are trying to solve here? What do you believe MNT_DETACH is doing and why is that better (than not using it)?

@davmac314
Owner

And finally! I see that you've acknowledged the prior comments, but still not actually answered the question. I'm concerned that you are going to try to answer by updating the comment in the code. What I'm wanting is to get answers to the questions so that we can establish that the code as written is correct and desirable (or not), I'm not just wanting an attempt at justification in a comment - go that route and I may still find it unsatisfactory at next review.

I forgot to answer this:

Also, to avoid email spam, would you like me to not push any commits until I'm ready for review or would changing the title of the PR to include draft suffice?

Changing the PR status to draft is fine.

@WavyEbuilder
Author

So my logic for the MNT_DETACH is: I'm concerned that once /proc becomes available, we'll get EBUSY back when trying to umount(2) it.

MNT_DETACH (since Linux 2.4.11)
              Perform a lazy unmount: make the mount unavailable for new accesses, immediately disconnect the filesystem and all filesystems mounted below it from each other and from the mount table, and actually perform the unmount when the mount ceases to be busy.

Given MNT_DETACH umounts when the mount is no longer busy, that makes sense to me; especially as procfs can be mounted on top, so despite our successful mounting of it, another procfs may exist below it, which existing things may be trying to access.

@WavyEbuilder WavyEbuilder marked this pull request as draft April 7, 2025 18:06
@davmac314
Owner

davmac314 commented Apr 7, 2025

Thanks for answering.

So my logic for the MNT_DETACH is: I'm concerned that once /proc becomes available, we'll get EBUSY back when trying to umount(2) it.
[...]

Given MNT_DETACH umounts when the mount is no longer busy, that makes sense to me; especially as procfs can be mounted on top, so despite our successful mounting of it, another procfs may exist below it, which existing things may be trying to access.

What existing things? Is this even a real problem currently?

Anyway, suppose that this scenario was valid - that something before Dinit started both mounted /proc and spawned some program or daemon that it then ran in parallel with Dinit itself (even though Dinit as system manager is really meant to be responsible for starting everything), and that this other program needs to access the proc filesystem.

Now there is a problem: we don't know what options proc was mounted with, and how they might affect what this secondary daemon should see when it accesses /proc.

If Dinit goes ahead and mounts proc on /proc with the default options (i.e. what is currently coded), and during the time that it is mounted some other daemon accesses /proc, it might be seeing the wrong instance of the proc filesystem (mounted with the wrong options). What's arguably worse is there's almost no indication that this will be happening, especially because the unmount will succeed even if this is happening (due to the use of MNT_DETACH).

Currently the proc mount options include hidepid=.... which would allow other process IDs to be hidden from another daemon if it were running as a non-root user. Allowing an unprivileged daemon access to a proc instance where the process IDs weren't hidden would then be a security concern (probably a minor one, but still, a concern). We also don't know that future mount options might be added that raise additional issues.

So, if it's a real concern that other processes are already running in parallel with Dinit, and that's something that we want to support, then we have a problem. And MNT_DETACH isn't solving that problem, it's just making it harder to detect. In this case the best solution is probably either to not have Dinit mount on /proc at all (and require that it's done by the initramfs prior to running dinit), or at least have a switch to disable the mounting behaviour. (You might be tempted to think it's possible to check whether /proc is already mounted and avoid overmounting it in that case, but that's also prone to a race condition: what if this other daemon that's running also wants to mount /proc?). But thinking it through, in this case where you've got any other processes running alongside Dinit when it starts, I'd argue that SELinux policy loading should be handled outside of Dinit anyway. Otherwise you've got a secondary daemon, which Dinit isn't aware of, which is going to find that a security policy has been loaded behind its back while it's running, with all the problems that entails (file descriptors becoming impossible to use, etc). Right?

On the other hand, if other processes running in parallel with Dinit are a purely theoretical concern and not something we need to support, it's ok to mount and unmount proc as is currently done (I guess, though it makes me distinctly uncomfortable regardless), but only without the use of MNT_DETACH - because it shouldn't be needed and only runs the risk of masking a situation that really should be flagged as an error.

In neither case, as I see it, should MNT_DETACH be used.
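
The alternative argued for above, a plain unmount whose failure is reported rather than hidden, could look roughly like the sketch below. This is not code from the PR: the helper names `describe_umount_failure` and `unmount_temporary_proc` are hypothetical, and in Dinit itself the message would presumably go through the log interface rather than being returned as a string.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/mount.h>  // Linux-specific; provides umount()

// Hypothetical helper: build a message explaining why unmounting the
// temporary /proc failed. EBUSY is called out explicitly because it would
// indicate that something else is using the mount, which is exactly the
// situation that MNT_DETACH would have hidden.
std::string describe_umount_failure(int err)
{
    if (err == EBUSY) {
        return "temporary /proc mount is busy: another process may be using it";
    }
    return std::string("could not unmount temporary /proc: ") + std::strerror(err);
}

// Hypothetical helper: remove the temporary /proc mount with a plain
// umount(), i.e. without MNT_DETACH. Returns true on success; on failure
// it returns false and stores an explanation in 'msg'.
bool unmount_temporary_proc(std::string &msg)
{
    if (umount("/proc") == 0) {
        return true;
    }
    msg = describe_umount_failure(errno);
    return false;
}
```

The design point is that a busy mount becomes a visible error condition instead of a silently deferred unmount.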

Labels
A-Importance: Normal C-dinit Things about the main parts of dinit Enhancement/New Feature Improving things or introduce new feature make Things about Dinit's Make build process meson Things about Dinit's Meson build process P-Linux Things related to Linux
5 participants