Skip to content

Limit stack walking in the java agent to frames before the call to AccessController.doPrivileged #17894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

cwperks
Copy link
Member

@cwperks cwperks commented Apr 10, 2025

Description

This PR seeks to resolve the issue described here. After the merge of the Java agent (replacement for JSM), its been necessary to add new grants to different policy files (for example this) and one possible reason is that the java agent is walking more of the stack then the Java Security Manager did previously.

The purpose of AccessController.doPrivileged() is to limit the stack for the java security manager when traversing the ProtectionDomains to see if all callers in the stack are granted permission to do the privileged action.

I verified the changes in this PR by using the job-scheduler and tested with FileChannel.open interception.

Testing steps:

  1. Publish the changes in this PR with ./gradlew publishToMavenLocal -Dbuild.snapshot=true -Dbuild.version_qualifier=beta1
  2. In job scheduler, modify createComponents with the following:
String readAFile() throws IOException {
    Path path = Paths.get("/etc/zshrc");
    try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
        // Allocate a buffer with the size of the file
        ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
        channel.read(buffer);
        buffer.flip();

        // Convert buffer contents to string
        return StandardCharsets.UTF_8.decode(buffer).toString();
    }
}

@Override
public Collection<Object> createComponents(
    Client client,
    ClusterService clusterService,
    ThreadPool threadPool,
    ResourceWatcherService resourceWatcherService,
    ScriptService scriptService,
    NamedXContentRegistry xContentRegistry,
    Environment environment,
    NodeEnvironment nodeEnvironment,
    NamedWriteableRegistry namedWriteableRegistry,
    IndexNameExpressionResolver indexNameExpressionResolver,
    Supplier<RepositoriesService> repositoriesServiceSupplier
) {
    this.lockService = new LockService(client, clusterService);
    this.jobDetailsService = new JobDetailsService(client, clusterService, this.indicesToListen, this.indexToJobProviders);
    this.scheduler = new JobScheduler(threadPool, this.lockService);
    this.sweeper = initSweeper(
        environment.settings(),
        client,
        clusterService,
        threadPool,
        xContentRegistry,
        this.scheduler,
        this.lockService,
        this.jobDetailsService
    );
    clusterService.addListener(this.sweeper);
    clusterService.addLifecycleListener(this.sweeper);

    AccessController.doPrivileged(new PrivilegedAction<Object>() {
        @Override
        public Object run() {
            try {
                System.out.println("readAFile: " + readAFile());
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            return null;
        }
    });

    return List.of(this.lockService);
}
  1. Run an integ test. For example ./gradlew integTest --tests JobSchedulerPluginIT.testPluginsAreInstalled -i

The test will fail without the policy grant. Test this with the call to readAFile both 1) wrapped by AccessController.doPrivileged()` and 2) without wrapping.

In that case for 2) without this change in core you will see that it fails with the following:

»  java.lang.SecurityException: Denied OPEN (read) access to file: /etc/zshrc, domain: ProtectionDomain  (file:/Users/cwperx/Projects/opensearch/job-scheduler/build/testclusters/integTest-0/distro/3.0.0-INTEG_TEST/lib/opensearch-3.0.0-beta1-SNAPSHOT.jar <no signer certificates>)

Its clearly walking up the stack to check for a grant in a different ProtectionDomain, whereas when wrapping in AccessController.doPrivileged it will not walk up to that point in the stack.

Related Issues

Resolves #17884 (comment)

Separate Issue

Separately, there is an issue with the FileInterceptor.

If I change the implementation of readAFile to this:

String readAFile() throws IOException {
    return Files.readAllLines(Paths.get("/etc/zshrc")).getFirst();
}

Then the interceptor is not properly evaluating permissions because behind the scenes its calling newByteChannel and not open.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for 92d83d2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@cwperks
Copy link
Member Author

cwperks commented Apr 10, 2025

@reta @andrross @kumargu this completely explains the entries that have needed to be added to policy files since the merge of the agent. If we proceed with walking the entire stack then there's going to need to be a lot of new entries for the agent. We need to visit this logic to determine which frames to examine.

@reta
Copy link
Collaborator

reta commented Apr 10, 2025

@reta @andrross @kumargu this completely explains the entries that have needed to be added to policy files since the merge of the agent. If we proceed with walking the entire stack then there's going to need to be a lot of new entries for the agent. We need to visit this logic to determine which frames to examine.

Thanks @cwperks , I also see the issue but AccessController & similar APIs are high risks, they are about to be removed completely from JDK. Other options to consider is to augment (at runtime) the OpenSearch core with the plugin permissions, I will explore this route.

@cwperks
Copy link
Member Author

cwperks commented Apr 10, 2025

Should we consider making a replacement in OpenSearch and driving a campaign across plugins to replace AccessController.doPrivileged calls with the replacement?

@reta
Copy link
Collaborator

reta commented Apr 10, 2025

Should we consider making a replacement in OpenSearch and driving a campaign across plugins to replace AccessController.doPrivileged calls with the replacement?

That could be an option, yes, but it really depends on how far we would like to go, the k-nn plugin clearly violates the OpenSearch sandbox rules by reading outside the conf/ directory.

@kumargu
Copy link
Contributor

kumargu commented Apr 11, 2025

Should we consider making a replacement in OpenSearch and driving a campaign across plugins to replace AccessController.doPrivileged calls

that would be a huge campaign right now considering Beta-1 timelines. Our plan was to remove usage of AccessController.doPrivileged in 3.1.

Also we should reconsider making replacements: we should probably not end up writing our own security manager with all sorts of replacements (like PolicyFile, PolicyParser, AccessController, SocketPermissions, FilePermissions etc..)

@cwperks
Copy link
Member Author

cwperks commented Apr 11, 2025

Another marker we could use is SpecialPermission.check(), but this has not been consistently added before AccessController.doPrivileged() in all cases

@cwperks
Copy link
Member Author

cwperks commented Apr 11, 2025

that would be a huge campaign right now considering Beta-1 timelines. Our plan was to remove usage of AccessController.doPrivileged in 3.1.

I don't think we need to immediately switch over the imports from java.security.AccessController to its replacement (in OpenSearch) if we go this route. Replacing the imports could be done in 3.1.

I can see AccessController in JDK24 (marked for removal, but still there).

I think writing a replacement would be fairly trivial because it just wraps a callable and executes the callable. Its exact purpose is to limit what frames are examined on the stack when iterating through the ProtectionDomains to ensure that the code executing the privileged action is explicitly authorized with grants.

@cwperks
Copy link
Member Author

cwperks commented Apr 11, 2025

I will take some time to study how the AccessController relates to JSM.

I thought this comment from @rmuir on the initial contribution of SpecialPermission was insightful: 6d8c035

Add SpecialPermission to guard exceptions to security policy.

In some cases (e.g. buggy cloud libraries, scripting engines), we must
grant dangerous permissions to contained cases. Those AccessController blocks
are dangerous, since they truncate the stack, and can allow privilege escalation.

This PR adds a simple permission to check before each one, so that unprivileged code
like groovy scripts, can't do anything they shouldn't be allowed to do otherwise.

SpecialPermission.check() was always supposed to be called before AccessController.doPrivileged() to take into account the callstack before AccessController.doPrivileged to ensure the original code came from OpenSearch or one of its deps (i.e. if a painless script were to run privileged code then SpecialPermission.check() would fail)

Once inside AccessController.doPrivileged it wouldn't consider previous frames

@rmuir
Copy link
Contributor

rmuir commented Apr 11, 2025

Yeah that's right. It's a hack, from the times before painless where there was untrusted groovy code. Stuff was setup for this untrusted groovy code to run with zero permissions, and the intent of this check was to try to prevent access to doPrivileged completely, as it could allow escalation.

@cwperks
Copy link
Member Author

cwperks commented Apr 11, 2025

I think I found where JSM is assembling the list of ProtectionDomains: https://github.com/openjdk/jdk17u/blob/305512ccc6bc87afe972781e6953c4517a365f86/src/hotspot/share/prims/jvm.cpp#L1267-L1269

This PR seems to be in line with the AccessControllerContext code here:

// stop at the first privileged frame

@cwperks
Copy link
Member Author

cwperks commented Apr 11, 2025

Thank you for confirming @rmuir !

@rmuir
Copy link
Contributor

rmuir commented Apr 11, 2025

Yeah, I feel like java made it really hard to "drop permissions" explicitly. At the time the default permissions were quite generous, which made it worse. So that's why you see strange stuff here. I feel like at some point code was changed to use a new accesscontroller method, that kinda sorta allowed you to "drop privs". Classloader filtering and painless came later IIRC.

So you might have another way to accomplish the goals, without the hack? This SpecialPermission is not a good one to keep around because it always confused everyone. I guess my best analogy is, being in sudoers file.

Copy link
Contributor

❕ Gradle check result for 92d83d2: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Apr 11, 2025

Codecov Report

Attention: Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.

Project coverage is 72.35%. Comparing base (1628152) to head (92d83d2).
Report is 16 commits behind head on main.

Files with missing lines Patch % Lines
...ent/StackCallerProtectionDomainChainExtractor.java 0.00% 4 Missing ⚠️
...opensearch/javaagent/SocketChannelInterceptor.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17894      +/-   ##
============================================
- Coverage     72.43%   72.35%   -0.08%     
+ Complexity    66789    66759      -30     
============================================
  Files          5449     5451       +2     
  Lines        309085   309148      +63     
  Branches      44979    44988       +9     
============================================
- Hits         223899   223698     -201     
- Misses        66906    67100     +194     
- Partials      18280    18350      +70     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andrross
Copy link
Member

Other options to consider is to augment (at runtime) the OpenSearch core with the plugin permissions, I will explore this route.

@cwperks @reta Seems like the two options are either to truncate the stack, or give all the core protection domains a superset of all plugin permissions, right?

If we go with the option to truncate the stack, I think we can rely on the JDK access controller for the time being and later introduce a replacement, and then support both until the JDK version is removed.

@reta
Copy link
Collaborator

reta commented Apr 11, 2025

@cwperks @reta Seems like the two options are either to truncate the stack, or give all the core protection domains a superset of all plugin permissions, right?

@andrross No, not necessarily, please check https://github.com/opensearch-project/k-NN/pull/2657/files#diff-9650701a1206ba7746d46a213c115232a9c39b597cfb85d009fe6d9b7c695b1fR180 for the option(s) that should work without any changes (in the nutshell - we should be going on case by case basis).

Personally, I think we should be prepared that everything security related will be gone (including the permissions, see please https://bugs.openjdk.org/browse/JDK-8353642), so we should not be building on these APIs (or tempting to replicate them, it will get very complex), thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants