The flow looks like
** browser is sent an HTTP redirect to that URL, so the download happens directly between the external store and the browser


This change adds more methods to the http://javadoc.jenkins.io/jenkins/util/VirtualFile.html[`VirtualFile`] API to support this flow.

* `toExternalURL()` Optionally obtains a URL which may be used to retrieve the file contents from any process on any node (see the sketch below).
* `mode()` Gets the file's Unix mode, if meaningful.
* `readLink()` If this file is a symlink, returns the link target.
* `list(String, String, boolean)` Recursively lists files in this directory, with include/exclude pattern matching.
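
A minimal sketch of how serving code might use `toExternalURL()`; the surrounding names (`build`, `rsp`, the commons-io fallback) are assumptions for illustration, not part of the proposal:

[source,java]
----
import java.io.InputStream;
import java.net.URL;
import jenkins.util.VirtualFile;

// Locate an artifact through the build's ArtifactManager (assumed context).
VirtualFile artifact = build.getArtifactManager().root().child("target/app.jar");

URL external = artifact.toExternalURL();
if (external != null) {
    // Redirect the browser so the bytes flow straight from the blob store.
    rsp.sendRedirect2(external.toExternalForm());
} else {
    // Implementations without external URLs fall back to streaming via Jenkins.
    try (InputStream in = artifact.open()) {
        org.apache.commons.io.IOUtils.copy(in, rsp.getOutputStream());
    }
}
----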

http://javadoc.jenkins.io/hudson/FilePath.html[`FilePath`]

* `copyFromRemotely(URL)` Copies the content of a URL to a remote file without transferring the content over a Remoting channel (see the sketch below).
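
A sketch of the intended usage, assuming the artifact manager has already minted a pre-signed URL (`workspace` and `presignedUrl` are illustrative names):

[source,java]
----
import java.net.URL;
import hudson.FilePath;

// `workspace` is a FilePath rooted on the agent; `presignedUrl` points at
// the external blob store.
FilePath destination = workspace.child("unstash/app.jar");

// The agent opens the URL itself, so the content never crosses the Remoting
// channel (the existing copyFrom(URL), by contrast, reads the URL on the
// calling side and streams the bytes across).
destination.copyFromRemotely(presignedUrl);
----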

In https://github.com/jenkinsci/workflow-api-plugin[workflow-api-plugin] a new https://github.com/jglick/workflow-api-plugin/blob/44bc9985b0aa1fb9407b5c78be6b429ad29af2d8/src/main/java/org/jenkinsci/plugins/workflow/flow/StashManager.java#L292-L315[`StashAwareArtifactManager`] is also added: a mixin interface for an `ArtifactManager` that also supports specialized stash behavior.
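
In rough outline (the authoritative method signatures are in the linked source), an artifact manager opts in simply by also implementing the mixin; `S3ArtifactManager` is an illustrative name:

[source,java]
----
// Illustrative skeleton only; the stash method signatures are elided. See
// the linked StashAwareArtifactManager source for the real ones.
public class S3ArtifactManager extends ArtifactManager
        implements StashManager.StashAwareArtifactManager {
    // ...archive/root/etc. from ArtifactManager, plus the stash-related
    // overrides declared by the mixin...
}
----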


The `VirtualFile.run(Callable)` method, which already existed to optimize workspace browsing on agents, is now also used in artifact-related code in Jenkins core.
It was important to implement this in the S3 plugin in order to ensure that Classic or Blue Ocean UI flows relating to artifacts, including simply opening the index page of a build, do not make a large number of network calls.
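
A hedged sketch of the batching idiom, assuming `root` is the artifact root `VirtualFile` and that the implementation supports shipping the callable to wherever the data lives:

[source,java]
----
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import jenkins.security.MasterToSlaveCallable;
import jenkins.util.VirtualFile;

// One round trip instead of one network call per file.
Map<String, Long> sizes = root.run(new MasterToSlaveCallable<Map<String, Long>, IOException>() {
    @Override public Map<String, Long> call() throws IOException {
        Map<String, Long> result = new HashMap<>();
        for (VirtualFile child : root.list()) {
            result.put(child.getName(), child.length()); // cheap local calls here
        }
        return result;
    }
});
----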

Equivalent features to S3 exist in other cloud providers and artifact repositories.

The S3 implementation also uses http://jclouds.apache.org[Apache JClouds], which abstracts most of the implementation from the underlying blob store.

=== Container (Bucket) and Path References

The current implementation for S3 uses a master-wide configuration option to set the name of the container (S3 bucket) and the path inside it.
Currently this is an environment variable, but it could become a system property or a configuration section in the UI.

This means that different runs cannot store their artifacts in different buckets or paths; we do not expect that to be a common use case.

It would be more common to move all the artifacts from one location to another, which could easily be achieved by moving the blobs in S3 and changing the master-wide configuration parameters.
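
Purely as an illustration of that resolution order; neither of the names below is defined by this JEP:

[source,java]
----
// Hypothetical configuration keys, for illustration only.
String container = System.getProperty("jenkins.artifacts.s3.bucket",
        System.getenv("S3_BUCKET"));   // container (bucket) name
String prefix = System.getProperty("jenkins.artifacts.s3.prefix",
        System.getenv("S3_DIR"));      // path (prefix) inside the bucket
----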

=== Interruptions

The API does not support `InterruptedException`, but we see no evidence that this causes integration issues with other plugins such as Build Timeout.
Follow-up work as part of https://issues.jenkins-ci.org/browse/JENKINS-50597[JENKINS-50597] will verify this.

=== Security

Security considerations require that agents be restricted to accessing only the artifacts they need.
Having access to the blob store would mean access to other jobs' artifacts.
Two possible implementations were considered:

==== Agents only need upload/download permissions

If agents only perform upload/download operations, we can use pre-signed URLs so that they cannot access other jobs' artifacts.
Other operations (list, create, delete, ...) would run on the master, which would be a performance hit for builds with many artifacts.
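
A minimal sketch, assuming a jclouds `BlobStoreContext` configured on the master, of minting a pre-signed download URL that an agent or browser can use without holding blob-store credentials (the bucket and key names are illustrative):

[source,java]
----
import java.net.URL;
import org.jclouds.ContextBuilder;
import org.jclouds.blobstore.BlobStoreContext;
import org.jclouds.http.HttpRequest;

BlobStoreContext context = ContextBuilder.newBuilder("aws-s3")
        .credentials(accessKeyId, secretKey) // assumed master-side configuration
        .buildView(BlobStoreContext.class);

// Sign a GET for one blob; the resulting URL grants access to that blob only.
HttpRequest signed = context.getSigner()
        .signGetBlob("my-bucket", "jobs/my-job/42/artifacts/app.jar");
URL presignedUrl = signed.getEndpoint().toURL();
----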

==== Passing limited credentials to each agent

Masters need to run with elevated permissions to be able to create new roles and permissions on the fly for each job (`AssumeRole` in AWS).
Those limited credentials would be passed on to the agent, which would use them to talk to the external store.
This temporary role creation does not exist in all clouds, nor in other artifact repositories.

We opted for the first, simpler option.

== Backwards Compatibility

Existing plugins using the `ArtifactManager` API will continue to work with the newly selected implementation.

However:

* Various plugins call `Run.getArtifactsDir` and similar deprecated APIs.
These would already have been broken for users of the Compress Artifacts plugin, but that plugin is rarely used, whereas we are proposing that many people run the S3 artifact manager.
Calls to the deprecated APIs will behave as if the build had no artifacts.
We could add telemetry so that such calls produce a warning in the system log, at least when the build actually has a custom artifact manager selected.

* Some plugins using `VirtualFile` may still call `open` and then pass the stream to an agent.
This will work, but it will be very expensive when using S3 storage; they need to be updated to call `VirtualFile.toExternalURL`, as sketched below.
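
A hedged before/after sketch of that migration (`artifact` is a `VirtualFile`; `workspace` is a `FilePath` on the agent; the file names are illustrative):

[source,java]
----
import java.io.InputStream;
import java.net.URL;

// Before: content is opened on the master and streamed to the agent over
// the Remoting channel, which is very expensive against S3.
try (InputStream in = artifact.open()) {
    workspace.child("input.bin").copyFrom(in);
}

// After: the agent fetches directly from the external store when possible.
URL url = artifact.toExternalURL();
if (url != null) {
    workspace.child("input.bin").copyFromRemotely(url);
} else {
    try (InputStream in = artifact.open()) { // fallback for other implementations
        workspace.child("input.bin").copyFrom(in);
    }
}
----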

== Security

Security considerations require that agents be restricted to accessing only the artifacts they need.
Having access to the blob store would mean access to other jobs' artifacts.

Agents only perform URL-based upload/download operations, and they obtain the correct URL to do so from the master.

Extra care needs to be taken so that agents do not have any direct access to the blob store.
In the common case where the VM instances are assigned roles (`IAM role` in AWS), the instance where the master runs should have access to the blob store, while the agents should run on different instances whose roles do not allow that access.