License Scanning does not work on license name containing url to license #117

Open
LeonKolataAtGov opened this issue Jan 21, 2025 · 1 comment
Labels
working as designed The description indicates the tool is working as designed

Comments

@LeonKolataAtGov

LeonKolataAtGov commented Jan 21, 2025

We are using the license policy functionality to scan the licenses inside the SBOM. The correct license is not resolved when the license name is a URL.

Is there a possible fix for that? Adjusting the license.json was not successful: I tried putting the URL in as name, family, alias, or url, but sbom-utility couldn't find any suitable license choice.

Is this a known issue? I've seen that the IBM implementation notes a similar issue with license expressions:

// NOTE: we have found some SBOM authors have placed license expressions
// within the "name" field. This prevents us from assigning policy
// return
https://github.com/IBM/sbom-utility/blob/main/cmd/license_policy.go#L154C1-L156C11

I'm a novice in Go, so I cannot fix it on my own :(

Example output:

[TRACE] [ENTER] [2025-01-17 14:52:02.889267424] license.go(228) cmd.hashComponentLicense() 
[TRACE] [2025-01-17 14:52:02.889278504] license.go(236) cmd.hashComponentLicense() hashing license for component=`tomcat-embed-core`
[TRACE] [ENTER] [2025-01-17 14:52:02.889283874] license.go(321) cmd.hashLicenseInfoByLicenseType() 
[TRACE] [ENTER] [2025-01-17 14:52:02.889288603] license_policy_config.go(365) schema.(*LicensePolicyConfig).FindPolicy() 
[TRACE] [ENTER] [2025-01-17 14:52:02.889298552] license_policy_config.go(438) schema.(*LicensePolicyConfig).FindPolicyByFamilyName() ((string):name:, (string):https://www.apache.org/licenses/LICENSE-2.0.txt)
[TRACE] [ENTER] [2025-01-17 14:52:02.889312518] license_policy_config.go(496) schema.(*LicensePolicyConfig).searchForLicenseFamilyName() 
[TRACE] [EXIT ] [2025-01-17 14:52:02.889325222] license_policy_config.go(518) schema.(*LicensePolicyConfig).searchForLicenseFamilyName() 
[TRACE] [2025-01-17 14:52:02.889331082] license_policy_config.go(486) schema.(*LicensePolicyConfig).FindPolicyByFamilyName() No policy match found for license family name=`https://www.apache.org/licenses/LICENSE-2.0.txt` 
[TRACE] [EXIT ] [2025-01-17 14:52:02.889342314] license_policy_config.go(490) schema.(*LicensePolicyConfig).FindPolicyByFamilyName() 
[TRACE] [EXIT ] [2025-01-17 14:52:02.889346872] license_policy_config.go(397) schema.(*LicensePolicyConfig).FindPolicy() 
[TRACE] [2025-01-17 14:52:02.889373171] bom_hash.go(252) schema.(*BOM).HashmapLicenseInfo() Hashmap Put() licenseInfo: {UsagePolicy:UNDEFINED LicenseChoiceTypeValue:2 LicenseChoiceType:name License:https://www.apache.org/licenses/LICENSE-2.0.txt ResourceName:tomcat-embed-core BOMRef:pkg:maven/org.apache.tomcat.embed/tomcat-embed-core@10.1.34?package-id=1c2ad5ed5dfa7568 BOMLocationValue:3 BOMLocation:components LicenseChoice:{License:0xc00060b2c0 CDXLicenseExpression:{Expression: BOMRef:<nil> Acknowledgement:}} Policy:{Id: Reference: IsOsiApproved:false IsFsfLibre:false IsDeprecated:false Family: Name: UsagePolicy:UNDEFINED Aliases:[] Children:[] Notes:[] Urls:[] AnnotationRefs:[] AltUsagePolicy: AltAnnotationRefs: AltSPDXId:} Component:{Primary:false Type:library Name:tomcat-embed-core Version:10.1.34 Description: Group: BOMRef:pkg:maven/org.apache.tomcat.embed/tomcat-embed-core@10.1.34?package-id=1c2ad5ed5dfa7568 MimeType: Supplier:<nil> Publisher: Scope: Hashes:<nil> Licenses:0xc000013db8 Copyright: Cpe:cpe:2.3:a:apache:tomcat-embed-core:10.1.34:*:*:*:*:*:*:* Purl:pkg:maven/org.apache.tomcat.embed/tomcat-embed-core@10.1.34 Swid:<nil> Pedigree:<nil> ExternalReferences:0xc000013d88 Components:<nil> Evidence:<nil> Properties:0xc000013dd0 ReleaseNotes:<nil> Signature:<nil> Modified:false ModelCard:<nil> Data:<nil> Authors:<nil> OmniborId:<nil> Swhid:<nil> CryptoProperties:<nil> Tags:<nil> Manufacturer:<nil> Author:} Service:{Name: Version: Description: Group: BOMRef:<nil> Endpoints:<nil> Authenticated:false XTrustBoundary:false Provider:<nil> Data:<nil> Licenses:<nil> ExternalReferences:<nil> Services:<nil> Properties:<nil> ReleaseNotes:<nil> Signature:<nil> TrustZone: Tags:<nil>} ExtendedLicenseInfo:{LicenseId: LicenseName:https://www.apache.org/licenses/LICENSE-2.0.txt LicenseExpression: LicenseUrl: LicenseTextEncoding: LicenseTextContentType: LicenseTextContent:}}
[TRACE] [EXIT ] [2025-01-17 14:52:02.889382108] license.go(355) cmd.hashLicenseInfoByLicenseType() ((<nil>): <nil>)
[TRACE] [EXIT ] [2025-01-17 14:52:02.889387788] license.go(266) cmd.hashComponentLicense() ((<nil>): <nil>)

Extract from the SBOM:

{
      "bom-ref": "pkg:maven/org.apache.tomcat.embed/tomcat-embed-core@10.1.34?package-id=1c2ad5ed5dfa7568",
      "type": "library",
      "name": "tomcat-embed-core",
      "version": "10.1.34",
      "licenses": [
        {
          "license": {
            "name": "https://www.apache.org/licenses/LICENSE-2.0.txt"
          }
        }
      ],
      "cpe": "cpe:2.3:a:apache:tomcat-embed-core:10.1.34:*:*:*:*:*:*:*",
      "purl": "pkg:maven/org.apache.tomcat.embed/tomcat-embed-core@10.1.34",
      "externalReferences": [
        {
          "url": "",
          "hashes": [
            {
              "alg": "SHA-1",
              "content": "f610f84be607fbc82e393cc220f0ad45f92afc91"
            }
          ],
          "type": "build-meta"
        }
      ],
      "properties": [
        {
          "name": "syft:package:foundBy",
          "value": "java-archive-cataloger"
        },
        {
          "name": "syft:package:language",
          "value": "java"
        },
        {
          "name": "syft:package:type",
          "value": "java-archive"
        },
        {
          "name": "syft:package:metadataType",
          "value": "java-archive"
        },
        {
          "name": "syft:cpe23",
          "value": "cpe:2.3:a:apache:tomcat_embed_core:10.1.34:*:*:*:*:*:*:*"
        },
        {
          "name": "syft:cpe23",
          "value": "cpe:2.3:a:apache:tomcat:10.1.34:*:*:*:*:*:*:*"
        },
        {
          "name": "syft:cpe23",
          "value": "cpe:2.3:a:apache:embed:10.1.34:*:*:*:*:*:*:*"
        },
        {
          "name": "syft:location:0:path",
          "value": "/workflow-0.0.1-SNAPSHOT.jar"
        },
        {
          "name": "syft:metadata:virtualPath",
          "value": "/workflow-0.0.1-SNAPSHOT.jar:BOOT-INF/lib/tomcat-embed-core-10.1.34.jar"
        }
      ]
    },
@mrutkows
Contributor

mrutkows commented Feb 19, 2025

@LeonKolataAtGov the value of the name field (of a licenseChoice) is simply a free-form string and is not treated as an identifier. That is, you can put anything you wish in the name field, but it will not be (and, for obvious reasons, cannot be) used as an identifier for the purposes of policy lookup/assertion.
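
To make the name/id distinction concrete, here is a minimal Python sketch (not part of sbom-utility) that lists the license entries in a CycloneDX JSON document that carry only a free-form name and no id. The function name `find_name_only_licenses` and the sample data are hypothetical, and the sketch assumes the usual `components[].licenses[].license` layout without descending into nested components.

```python
def find_name_only_licenses(bom: dict) -> list[tuple[str, str]]:
    """Return (component name, license name) pairs for licenses that have a name but no id."""
    findings = []
    for component in bom.get("components", []):
        for choice in component.get("licenses", []):
            lic = choice.get("license")
            if lic is None:
                continue  # this choice is a license expression, not a license object
            if "name" in lic and "id" not in lic:
                findings.append((component.get("name", ""), lic["name"]))
    return findings


# Hypothetical sample that mirrors the SBOM extract in this issue.
sample_bom = {
    "components": [
        {
            "name": "tomcat-embed-core",
            "version": "10.1.34",
            "licenses": [
                {"license": {"name": "https://www.apache.org/licenses/LICENSE-2.0.txt"}}
            ],
        },
        {
            "name": "example-lib",  # made-up component that uses id correctly
            "licenses": [{"license": {"id": "Apache-2.0"}}],
        },
    ]
}

print(find_name_only_licenses(sample_bom))
```

A report like this only shows which entries a policy lookup by identifier cannot resolve; it does not decide what the license actually is.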

Syft simply chose to place the local filename of a license file in the name field (which they are free to do, given that it is just a string), but it cannot be used for deterministic processing (such as license assertion and risk assessment).

Ironically, one of the core reasons I created this utility was so that SBOM generation tooling providers would use the fields properly and adhere to the schema. The field to use is id, which SHOULD be a valid SPDX identifier. As a generation tool, Syft should read the contents of the LICENSE file it identified (and added to the licenses array), assert that its contents are Apache 2.0 (or other), and set the id value to Apache-2.0 (which, I should mention, is also endorsed by the ASF: https://www.apache.org/licenses/LICENSE-2.0).

To be clear, other BOM generation tools likely fill in the name value with other information (not local file paths). Therefore, although code could be added to account specifically for Syft's use of the field and interpret the string "https://www.apache.org/licenses/LICENSE-2.0.txt" as SPDX ID "Apache-2.0", this would still be a "guess" that would then be presented as an assertion to the policy evaluation code (which companies use to determine risk). That is not a good security/compliance practice.
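
For readers who still need a stopgap, one option is to do the mapping outside the utility, where it is visibly the operator's own assertion rather than something the tool inferred. The following Python sketch is an assumption-laden illustration, not a feature of sbom-utility: the table `NAME_TO_SPDX_ID` and the function `rewrite_known_names` are hypothetical names, and every entry in the table is a claim the person maintaining it must be prepared to stand behind.

```python
import copy

# Hypothetical, user-maintained table. Each entry is the operator's own assertion.
NAME_TO_SPDX_ID = {
    "https://www.apache.org/licenses/LICENSE-2.0.txt": "Apache-2.0",
}


def rewrite_known_names(bom: dict, mapping: dict[str, str]) -> tuple[dict, list[str]]:
    """Return a copy of bom in which mapped license names are replaced by ids, plus a change log."""
    result = copy.deepcopy(bom)  # leave the original SBOM untouched
    changes = []
    for component in result.get("components", []):
        for choice in component.get("licenses", []):
            lic = choice.get("license")
            if not lic or "id" in lic:
                continue  # expression entries and entries that already have an id are skipped
            spdx_id = mapping.get(lic.get("name", ""))
            if spdx_id is None:
                continue  # unknown names stay as they are, no guessing
            original = lic.pop("name")
            lic["id"] = spdx_id
            changes.append(f"{component.get('name', '')}: {original} -> {spdx_id}")
    return result, changes
```

Only exact matches are rewritten, and the returned change log records each substitution so it can be reviewed or audited before the rewritten SBOM is passed to policy evaluation.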

IMO, Syft should fix their generation tool to use the id field when possible (or, of course, the license expression field, which also uses SPDX IDs). In addition, if they choose not to make a determination for the id value, then they should include the actual license text, which downstream viewers can read to make their own determination. That is, take advantage of the license text object.
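
As a rough illustration of the license text object, the Python sketch below builds a license entry that keeps a free-form name but attaches the full text as a base64-encoded attachment. The helper `license_with_text` is hypothetical; the `text` object with `contentType`, `encoding`, and `content` follows the CycloneDX attachment layout as I understand it, so check it against the schema version you target.

```python
import base64


def license_with_text(name: str, license_text: str) -> dict:
    """Build a license choice that carries the full license text for downstream review."""
    encoded = base64.b64encode(license_text.encode("utf-8")).decode("ascii")
    return {
        "license": {
            "name": name,  # still a free-form string, not an identifier
            "text": {
                "contentType": "text/plain",
                "encoding": "base64",
                "content": encoded,
            },
        }
    }
```

A downstream reader can decode `content` and inspect the text to make their own determination, which is the point of including it when no id is asserted.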

@mrutkows mrutkows added the working as designed The description indicates the tool is working as designed label Feb 19, 2025