Split repository traits, Preserve metadata hashes #274
Conversation
A TUF client has no need to store metadata in a remote repository, and some repository implementations cannot implement the store_* methods, so this change redefines the Repository trait as a supertrait of RepositoryProvider (the read half) and RepositoryStorage (the write half). Other than requiring clients to import additional traits, this change should not affect any existing functionality.
Since the Repository traits can be implemented outside this crate, it is not safe to assume that those implementations will verify the maximum length or hash of metadata/targets before succeeding a fetch request. In preparation for moving the length check and hash verification into the TUF client, this change splits the SafeReader into two types:

* EnforceMinimumBitrate, which enforces a minimum transfer rate and is currently used only by the HTTP repository implementation
* SafeReader, which retains the logic to enforce a maximum file size and verify a hash

This change also defines an extension trait on AsyncRead to easily wrap an AsyncRead in these types. A future change will move hash and length enforcement out of the Repository implementations.
The current definitions of the fetch_metadata and store_metadata repository trait methods require the repository trait implementations to parse the metadata, and the SignedMetadata structs and DataInterchange traits cannot guarantee that parsing and re-serializing metadata will produce byte-for-byte identical output. This is an issue for TUF because, during the metadata update workflow, clients fetch metadata from a remote repository and store it in the local repository for later use, and for metadata referenced by snapshot, the hash of the data must match in order to use the local version. This change:

* modifies the fetch_metadata and store_metadata trait methods to interact with AsyncRead instead of SignedMetadata, so that the original unparsed bytes can be stored as-is in the local repository once the metadata is determined to be valid
* defines the RawSignedMetadata type, a simple wrapper around raw bytes that carries type information identifying its inner Metadata type and serialization format
* introduces a new type that wraps an instance of the Repository traits and provides an API surface for storing raw metadata and fetching both raw and parsed metadata, to simplify interacting with the now less ergonomic trait methods
* moves the metadata max length checks and hash checks out of the repository trait implementations and into the RepositoryClient, ensuring invalid metadata provided by external RepositoryProviders will not be trusted (for metadata that is hash-checked)
This change modifies the local/remote parameters of Client's constructors to be `impl Into<Repository<R, D>>`, which both preserves the existing behavior, where implementations of RepositoryProvider and/or RepositoryStorage are accepted, and allows callers to provide a Repository directly. As long as one of the parameters is a Repository instance, there is no need to turbofish or annotate the Client type to include the DataInterchange parameter.
src/client.rs
Outdated
pub async fn with_trusted_local(config: Config<T>, local: L, remote: R) -> Result<Self> {
pub async fn with_trusted_local(
    config: Config<T>,
    local: impl Into<Repository<L, D>>,
Since this is part of the public interface, can you switch to using where clauses? Using `impl Into` prevents users from turbofishing this type, and since this is a fairly generic type, it's possible we might need a turbofish for some complex situations.
Removed Repository from the public interface, so these are all back to L and R.
src/client.rs
Outdated
let root_path = MetadataPath::from_role(&Role::Root);
let (local, remote) = (local.into(), remote.into());
Small nit: I recommend doing the `.into()` first, since logically the `root_path` and `root_version` are more closely related than this `(local, remote)`.
src/client.rs
Outdated
@@ -356,6 +371,10 @@ where
    )
    .await?;

    // FIXME why does with_trusted_root_keyids store using the metadata version but
    // with_trusted_root_keys use the version from the caller?
    let root_version = MetadataVersion::Number(trusted_root.version());
I think I just forgot to copy this case into `with_trusted_root_keys`. We don't yet have any checks that the metadata returned is the version we requested (see #253), so the safest thing is to use the metadata version, since the whole system could get rather confused if we stored it with the wrong version.
Added that to with_trusted_root_keys then, thanks for clarifying.
let raw_signed_meta = self
    .fetch_raw_metadata(meta_path, version, max_length, hash_data)
    .await?;
let signed_meta = raw_signed_meta.parse()?;
Might it be better to move this to the caller, to make it easier to see where the parsing happens?
The match statements in `lookup_target_description` get rather nasty if there isn't a single method I can call to get both.
I've got a little concern that if we ever have two `fetch_metadata` calls in the same scope, it'd be a little too easy to hash-check one metadata but use the other when trying to update the tuf database. This probably isn't a large problem in practice, since we still go through signature verification and the rest of the validation, but it'd be nice to avoid. We can defer addressing this to a future patch, though.
src/util.rs
Outdated
}

/// Wraps an `AsyncRead` to detect and fail transfers slower than a minimum bitrate.
pub(crate) struct EnforceMinimumBitrate<R> {
This needs a test.
src/repository/http.rs
Outdated
let components = target_path.components();
let resp = self.get(&self.targets_prefix, &components).await?;

let stream = resp
    // TODO check content length if known and fail early if the payload is too large.
Can you file a ticket for this, and link to it here?
#278 (also added to the TODO)
src/repository.rs
Outdated
{
    /// Store signed metadata.
/// A readable TUF repository.
pub trait RepositoryProvider {
Why did you move the `DataInterchange` parameter onto the methods? This has the side effect of us not being able to create a `Box<dyn RepositoryProvider>`. I suppose this would allow us to interact with a repository that supports multiple interchanges, but I'm not sure that's worth trying to support just yet, since we (and the TUF spec) haven't actually figured out how alternative interchanges would work.
Because `Repository<EphemeralRepository<Json>, Json>` was redundant, and the repository provider is where the metadata and targets are, in whatever form they take. Yeah, it is a bit weird to allow a provider to provide different data interchanges when we don't have a story for what that looks like yet. I'll plumb the type parameter back through the trait itself.
tests/simple_example.rs
Outdated
@@ -105,7 +105,7 @@ where
}

async fn init_server<'a, T>(
    remote: &'a mut EphemeralRepository<Json>,
    remote: &'a mut Repository<EphemeralRepository, Json>,
What do you think about just passing in `EphemeralRepository` and not wrapping it in `Repository`? The only thing that buys us is that we call `Repository::check` to see if the metadata path matches the role, but I'm not sure that check actually buys us much. If a server messed up and wrote the root metadata as `timestamp.json`, the client would fetch that file and it should fail to deserialize into `TimestampMetadata`. So really, this check is more for making sure we don't generate a malformed repository, and given how low-level the current implementation is, the whole thing probably needs to be rearchitected anyway (see #232).
Not quite. The reason I used `Repository` in these test binaries was that its method signatures were nicer to use (I could store `RawMetadata` instead of having to convert to bytes first).
src/repository.rs
Outdated
/// to a specific [`DataInterchange`](crate::interchange::DataInterchange) that will enforce
/// provided length limits and hash checks.
#[derive(Debug, Clone)]
pub struct Repository<R, D> {
I think we could make this `pub(crate)`. It's mainly about making sure the client doesn't accidentally access the repositories directly. I'm not sure external users would need this.
Fine with me. It was a little more convenient to use in the test binaries in this crate, but it isn't too terrible to use the repository implementations directly, and I suppose I could add some similar methods to EphemeralRepository to simplify testing if we find ourselves doing a lot of repository manipulation from external tests.
src/repository.rs
Outdated
fn fetch_metadata<'a, M>(
/// A writable TUF repository. Most implementors of this trait should also implement
/// `RepositoryProvider`.
pub trait RepositoryStorage {
As above, I think this should take `D` as a generic parameter.

Now that it is generic on `D` again, it doesn't need to support storing more than one format of metadata.
@@ -206,10 +207,16 @@ async fn extract_keys(dir: &Path) -> Vec<KeyId> {
let remote = init_remote(dir).unwrap();

let root_path = MetadataPath::from_role(&Role::Root);
let metadata: SignedMetadata<_, RootMetadata> = remote

let mut buf = Vec::new();
This block is a good example of why I exported `Repository`.
Got a bit overzealous with Sync.
Merged in develop and responded to the comments. Please take another look.
This series of changes:
* splits the `Repository` trait into local and remote components (#119)
* moves length and hash verification into the client, so metadata from external `RepositoryProvider`s is not blindly trusted