Skip to content

Add support for version 2.1 of the IMDS credentials provider #4109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ysaito1001
Copy link
Contributor

@ysaito1001 ysaito1001 commented Apr 30, 2025

DO NOT MERGE THIS PR UNTIL READY TO RELEASE

Description

This PR adds support for version 2.1 of the IMDS credentials provider. With this update, if the underlying EC2 instance's IMDS supports it, the provider can now retrieve credentials that include an account ID. This account ID can then be used by account-based endpoints.

To support this functionality, the credentials provider first attempts to access the extended API endpoint, which ends with -extended (e.g., http://169.254.169.254/latest/meta-data/iam/security-credentials-extended). If this endpoint returns a 404, the provider falls back to the legacy API endpoint—the same one used in the IMDS credentials provider v2.0.

This "try extended API, then fall back to legacy API" pattern is applied to both retrieving the IMDS instance profile name and fetching credentials. Importantly:

  • Once the provider determines that a specific API version (extended or legacy) is supported, it will stick to that version.
  • Only the extended API can return credentials that include an account ID.

The PR also made the following IMDS credentials providers options configurable:

  • Disable IMDS credentials fetching
  • IMDS instance profile name

Both options can be configured via environment variables or a shared config file, with environment variables taking precedence over the config file.

Note: Feature tracking for this functionality will be handled in a separate PR.

Testing

  • Updated connection recordings in aws-config/test-data/ to reflect changes in the execution flow within .retrieve_credentials (Note: This update is not intended to add new test coverage, but to align recordings with the updated flow.)
  • Added unit tests to verify the behavior of the extended API, including falling back to the legacy API when necessary.
  • Added unit tests to verify the configuration options for disabling IMDS credentials fetching and specifying the IMDS instance profile name.

Checklist

  • For changes to the AWS SDK, generated SDK code, or SDK runtime crates, I have created a changelog entry Markdown file in the .changelog directory, specifying "aws-sdk-rust" in the applies_to key.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link

A new generated diff is ready to view.

  • AWS SDK (ignoring whitespace)
  • No codegen difference in the Client Test
  • No codegen difference in the Server Test
  • No codegen difference in the Server Test Python
  • No codegen difference in the Server Test Typescript

A new doc preview is ready to view.

@@ -89,7 +89,7 @@
"action": {
"Request": {
"request": {
"uri": "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
"uri": "http://169.254.169.254/latest/meta-data/iam/security-credentials-extended/",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this test is updated to hit extended but the response isn't updated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point, not strictly necessary since extended API can still return credentials without account ID. Admittedly, this is what happens when we manually tweak connection recording files for a feature that hasn't been supported in the real world.

{
Ok(credentials) => {
let state = &mut self.provider_state.write().unwrap();
state.resolved_profile = Some(profile.to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this set in resolve_profile_name() already?

Ok(credentials)
}
Err(ImdsError::ErrorResponse(raw)) if raw.response().status().as_u16() == 404 => {
// A block is required to drop the `RwLockWriteGuard`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this or the other similar comments like it, this would be self evident to anyone trying to remove the block.

}

#[tokio::test]
async fn should_return_valid_credentials_when_profile_is_unstable() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(comment unrelated to this particular code I chose it at random): The SEP defines a bunch of test cases are we covering all of them? Is there anyway to turn the JSON provided into test cases such that new test cases or changes to the SEP are easy? I don't care how we do it but some level of traceability to the SEP would be good here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also unrelated, but related to SEP JSON tests in general. We don't really seem to use them, when I have implemented a SEP with tests I've had translated them into Rust/Kotlin because that seemed to be the standard, but it would be nice if we could start using the JSON tests directly where possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care if we use the JSON directly or not (though in this case the JSON seems workable) but at the very least translating them such that we can match test cases seems desirable. SEP test cases probably ought to be required to have an identifier 🤔

Copy link
Contributor Author

@ysaito1001 ysaito1001 May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unit tests have covered all the test cases except for ones I deemed redundant from an equivalence testing point of view, which test scenarios where the extended API returns credentials without account ID; I only included one test case for that using profile-name my-profile-0005.

SEP test cases probably ought to be required to have an identifier

Agreed. The closest thing to the test case ID in the SEP is a profile name, e.g. my-profile-0001. If we used JSON directly, that would be the best traceability. We sort of have the test infrastructure as shown under aws-config/test-data. They work well with connection recording, but for the feature like this that hasn't been supported in the real world we cannot generate recording, so it's easier to write the tests in unit tests.

Is there anyway to turn the JSON provided into test cases such that new test cases or changes to the SEP are easy?

That's invent and simplify :) Requires some level of design for the to-be-built infrastructure to be useful not just in one SEP, but in many others in the future.

Copy link
Contributor

@landonxjames landonxjames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question about the RwLock, but otherwise LGTM!


match self.client.get(self.uri_base()).await {
Ok(profile) => {
let state = &mut self.provider_state.write().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.read() and .write() on a RwLock can panic, should we use .try_read() and .try_write() instead or do you not expect it to be an issue in practice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I don't expect it to be an issue in practice. read and write can panic upon unwrapping a LockResult whenever a writer panics while holding an exclusive lock. If we look at write critical sections in our code (where we call self.provider_state.write().), they either update the field of ProviderState or return a value from the method. Therefore, the lock will not be poisoned.

In terms of .read vs. .try_read (or .write vs. try_write), the difference is whether the caller blocks or not (note that the notion of a poisoned lock equally exists for .try_* variants). I think we can use blocking ones because

  • read critical sections and write critical sections are short (reading a field value or assigning a new value to the field), which is necessarily so since we are already in an async context and don't want to block for a long time anyway.
  • If try_* variants returns Err, we need to loop back and try acquiring the lock again, which I assume is better handled by blocking variants.

At least I switched to .expect with explanatory comment in bff4f26.

}

#[tokio::test]
async fn should_return_valid_credentials_when_profile_is_unstable() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also unrelated, but related to SEP JSON tests in general. We don't really seem to use them, when I have implemented a SEP with tests I've had translated them into Rust/Kotlin because that seemed to be the standard, but it would be nice if we could start using the JSON tests directly where possible.

Copy link

github-actions bot commented May 1, 2025

A new generated diff is ready to view.

  • AWS SDK (ignoring whitespace)
  • No codegen difference in the Client Test
  • No codegen difference in the Server Test
  • No codegen difference in the Server Test Python
  • No codegen difference in the Server Test Typescript

A new doc preview is ready to view.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants