Skip to content

Conversation

@thomasschafer
Copy link
Owner

@thomasschafer thomasschafer commented Sep 22, 2025

To do:

  • Update docs (see here)
  • Recreate PR with base as astral-sh/uv when ready for review, and link the original issue

@thomasschafer thomasschafer force-pushed the sbom-export-support branch 4 times, most recently from 5e494c7 to 8c3a41b Compare September 24, 2025 16:21
@thomasschafer thomasschafer force-pushed the sbom-export-support branch 2 times, most recently from 2853f6a to 3b592f9 Compare September 24, 2025 16:46

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I think it's be nice to keep this in the export_format.rs module, since it's conceptually still a set of export formats, just limited to pip compile 🙂

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, done!

@woodruffw
Copy link

  • Do we need to update docs?

Yes, please 🙂

@zanieb
Copy link

zanieb commented Sep 24, 2025

I'm not sure if we talk about export much in the documentation right now. The most helpful thing would be to write a section explaining the format, how you'd generate it, and then how you'd consume it. I can figure out how to fit that content into the larger documentation story.

@zanieb
Copy link

zanieb commented Sep 24, 2025

Did you explore just making this format an enum member that's banned in uv pip compile? What motivates splitting the enum up? (I think it's probably fine, I'd just explore both options myself if I were making the change)

@thomasschafer
Copy link
Owner Author

Did you explore just making this format an enum member that's banned in uv pip compile? What motivates splitting the enum up? (I think it's probably fine, I'd just explore both options myself if I were making the change)

Are you meaning something like adding a runtime error when users attempt to use the SBOM format in uv pip compile? If so, we did consider that but had a couple of concerns. First, the auto-generated help text from clap would show cyclonedx1.5 as a possible option i.e.

$ uv pip compile --help | rg format
      --format <FORMAT>                                  The format in which the resolution should be output [possible values: requirements.txt, pylock.toml, cyclonedx1.5]

Also, we'd then need to add error handling to reject that format option when parsing the CLI args, and then add either errors or unreachable!() in the pip compile match statements, which didn't seem ideal.

Let me know if I'm missing something though or if you have other thoughts!

@thomasschafer thomasschafer force-pushed the sbom-export-support branch 2 times, most recently from feb3c90 to 41d8858 Compare September 25, 2025 09:24
@zanieb
Copy link

zanieb commented Sep 25, 2025

Those seem like reasonable reasons to avoid it.

CycloneDX1_5,
}

/// The format to use when compiling to a `requirements.txt` or `pylock.toml` file.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be

Suggested change
/// The format to use when compiling to a `requirements.txt` or `pylock.toml` file.
/// The output format to use in `uv pip compile`.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 129 to 131
Source::Git(url, _) | Source::Direct(url, _) => {
("generic", format!("?download_url={}", url.as_ref()))
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised to see Git dependencies are just "generic"

Copy link
Owner Author

@thomasschafer thomasschafer Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging - for GitHub URLs specifically there is a github purl type which we should be using here, although for other Git sources I'm not aware of anything more specific. We'll revisit this to handle that case!

package.id.name.to_string()
}

/// Generate a Package URL (purl) from a package
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should indicate when this will return None

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


Component {
component_type: classification,
name: NormalizedString::new(&name),
Copy link

@zanieb zanieb Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little weird that you use get_package_name to get a String then use &str here to construct a new NormalizedString owned type. I would probably omit the get_package_name and get_version_string helpers or at least avoid returning owned values since the caller needs borrowed ones anyway.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I've updated get_package_name to return a &str, although for get_version_string it looks like Version implements Display (here) to get a String but no way to get an &str (at least that I can see)

}

#[test]
fn cyclonedx_export() -> Result<()> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also test a direct URL, e.g., by copying a wheel URL from PyPI

Copy link

@jkowalleck jkowalleck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add tests? maybe snapshot-tests on prepared test beds.?

Since you are crafting CycloneDX based on "untrusted" data, I'd encourage to also validate the SBOM results before returning it.
If you don't want to validate the SBOM in the production code, I would still validate them in tests, to show that you are producing valid SBOMS.

PS: I am not a rust export. but I understand that bom.validate() would run certain validations - for example that bomRefs are unique, and other schema- and plausibility-checks.

let version_specifier = version.map_or_else(String::new, |v| format!("@{v}"));

Some(format!(
"pkg:{purl_type}/{name}{version_specifier}{qualifiers}"
Copy link

@jkowalleck jkowalleck Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do this yourself?
there is an official PURL library to craft proper package URL strings: https://github.com/package-url/packageurl.rs

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zanieb / @woodruffw any objections to me pulling that additional crate in?

Copy link

@jkowalleck jkowalleck Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "official" library is mere a backup/fork of another lib's source code:
https://github.com/scm-rs/packageurl.rs

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're just constructing PURLs not parsing them, does a library have much to offer here?

I'm generally wary of more dependencies, especially for something straight-forward. I'm not sure what to think of there being a library that's just a fork of another here.

Copy link

@jkowalleck jkowalleck Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my hint with the fork is jost to give a heads up to use the right library, which appears to be https://github.com/scm-rs/packageurl.rs.

anyway, PURL is not just concatenating strings the correct way. Especially when using entrusted user input, crafting a malformed PURL is just so easy.
i mean, are you even aware of all the characters that are not allowed in a PURL, and the characters that must be percent-encoded, or the characters that must NOT be percent-encoded? (there is no sanitation or handing of these cases in the current code yet)
I encourage using a dedicated library for crafting valid PURL strings - otherwise you will just implement a thing that has been solved already....

PS: see the wrong implementation in action, where a library should have done the job -- #4 (comment)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is being constructed from untrusted user input, this is being constructed from the output of a uv resolution and each of these types is validated, e.g., the package name and version specifiers are already parsed and normalized. This isn't to say we shouldn't use a package to construct the PURL, but we're not concatenating arbitrary strings here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transitive deps from that crate don't look too bad (and are already in our transitive closure), but I also agree with @zanieb that we don't have any arbitrary inputs in this context -- we know by construction that our name and version are well-formed. I'm also similarly hesitant to bring in another sidecar dep (and I similarly find the "official" fork thing confusing -- which one is authoritative if they diverge?). Overall I'd lean towards doing it ourselves here, unless there's some hidden complexity in the small subset of purls we emit.

(We do probably need to percent-encode the values in the qualifier, however, if I'm reading the purl spec correctly.)

@thomasschafer
Copy link
Owner Author

thomasschafer commented Sep 26, 2025

could you add tests? maybe snapshot-tests on prepared test beds.?

@jkowalleck there are already some tests here and we're planning on adding some more - is this what you were thinking of or did you have other ideas?

"bom-ref": "[email protected]",
"name": "flask",
"version": "2.3.3",
"purl": "pkg:generic/[email protected]?download_url=https://github.com/pallets/flask.git?rev=2.3.3#3205b53c7cf69d17fee49cac6b84978175b7dd73"
Copy link

@jkowalleck jkowalleck Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

broken result. the download_url was not properly escaped, so that the hash and the query string are propagated to the purl ...
expected outcome: pkg:generic/[email protected]?vcs_url=https://github.com/pallets/flask.git%3Frev%3D2.3.3%233205b53c7cf69d17fee49cac6b84978175b7dd73

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed

@thomasschafer thomasschafer force-pushed the sbom-export-support branch 3 times, most recently from ae20eeb to 80084c1 Compare September 29, 2025 16:31
@thomasschafer thomasschafer force-pushed the sbom-export-support branch 2 times, most recently from 16fb577 to 4ddba58 Compare October 29, 2025 15:41
@thomasschafer
Copy link
Owner Author

Closing in favour of this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants