ORT does not find/analyse all files in a repo #9406

Closed
kikofernandez opened this issue Nov 11, 2024 · 6 comments
Labels

  • enhancement: Issues that are considered to be enhancements
  • question: An issue that is actually a question
  • reporter: About the reporter tool

Comments

@kikofernandez

Describe the bug

In an unmanaged project, I was expecting all files to be included in the scan results, but this is not the case.

To Reproduce

Steps to reproduce the behavior:

  1. git clone git@github.com:erlang/otp.git
  2. bin/ort analyze -i ~/otp -o ~/otp -f JSON
  3. bin/ort scan -f JSON -i ~/otp/analyzer-result.json -o ~/otp (takes ~40 min)
  4. See that licenses for *.css files are not found in the scan results

Expected behavior

I expected all files in an unmanaged repo to be categorised with a license.
The *.css files do not appear inside the ort path scan_results.summary.licenses (scan-result.json),
but they do appear in the section that mentions:

    "files" : [ {
      "path" : "lib/common_test/priv/ct_default.css",
      "sha1" : "f4487fc4908e382cafb4137f4c4b02447939f2eb"
    }, { ... } ]

File lib/common_test/priv/ct_default.css cannot be found anywhere in the results of the scan (scan-result.json), except for the sha1.
"lib/common_test/priv/jquery.tablesorter.min.js" does not appear in the scan results either, except for its sha1.

  • I think this is a bug; otherwise, how could one be sure that all files have been scanned and have a license?
  • Is there an option that forces all files to be included in the scan with a license, even if the license is UNKNOWN?

Console / log output

    "config" : {
      "excludes" : {
        "paths" : [ {
          "pattern" : "lib/compiler/scripts/smoke-build",
          "reason" : "TEST_OF",
          "comment" : "This directory contains test projects which are not relevant."
        }, {
          "pattern" : ".ort/**",
          "reason" : "TEST_OF",
          "comment" : "This directory is not relevant."
        }, {
          "pattern" : ".ort.yml",
          "reason" : "OTHER",
          "comment" : "This directory is not relevant."
        }, {
          "pattern" : ".ort/evaluator.rules.kts",
          "reason" : "TEST_OF",
          "comment" : "This directory is not relevant."
        }, {
          "pattern" : "HOWTO/**",
          "reason" : "OTHER",
          "comment" : "This directory is not relevant."
        }, {
          "pattern" : "system/COPYRIGHT",
          "reason" : "OTHER",
          "comment" : "This file contains all licenses used in Erlang/OTP. Not relevant"
        }, {
          "pattern" : "CONTRIBUTING.md",
          "reason" : "OTHER",
          "comment" : "This file contains needs to specify its license"
        } ]
      },
...
"scanners" : {
      "Unmanaged::otp:8bf3d2eb3de18341150d43b7e7b6dd60eaafc080" : [ "ScanCode" ]
    },
    "files" : [ {
      "provenance" : {
        "vcs_info" : {
          "type" : "Git",
          "url" : "git@github.com:kikofernandez/otp.git",
          "revision" : "8bf3d2eb3de18341150d43b7e7b6dd60eaafc080",
          "path" : ""
        },
        "resolved_revision" : "8bf3d2eb3de18341150d43b7e7b6dd60eaafc080"
      },
      "files" : [ {
        "path" : "lib/common_test/priv/ct_default.css",
        "sha1" : "f4487fc4908e382cafb4137f4c4b02447939f2eb"
      }, {
        "path" : "lib/common_test/priv/jquery-latest.js",
        "sha1" : "ee48592d1fff952fcf06ce0b666ed4785493afdc"
      }, {
        "path" : "lib/common_test/priv/jquery.tablesorter.min.js",
        "sha1" : "89ca6c6f8d67f4c339daa6081daf7b4ce2b5308c"
      },...]    

Environment

Output of the ort requirements -l commands command:

Hoplite is configured to infer which sealed type to choose by inspecting the config values at runtime. This behaviour is now deprecated in favour of explicitly specifying the type through a discriminator field. In 3.0 this new behavior will become the default. To enable this behavior now (and disable this warning), invoke withExplicitSealedTypes() on the ConfigLoaderBuilder.
 ______________________________                                                        
/        \_______   \__    ___/ The OSS Review Toolkit, version 37.0.0,                
|    |   | |       _/ |    |    built with JDK 21.0.4+7-LTS, running under Java 21.0.5.
|    |   | |    |   \ |    |    Executing 'requirements' as 'xxxx' on Linux         
\________/ |____|___/ |____|    with 8 CPUs and a maximum of 7948 MiB of memory.       
                                                                                       
Environment variables:                                                                
ORT_CONFIG_DIR = /home/xxxx/.ort/config                                            
ORT_DATA_DIR = /home/xxxx/.ort                                                     
HOME = /home/xxxx                                                              
SHELL = /bin/bash                                                                     
TERM = xterm-256color                                                                 
JAVA_HOME = /usr/lib/jvm/jdk-21.0.5-oracle-x64                                        
                                                                                      
Looking for ORT configuration in the following file:
        /home/xxxx/.ort/config/config.yml

Scanners:
        - Askalono: Requires 'askalono' in no specific version. Tool not found.
        - BoyterLc: Requires 'lc' in no specific version. Tool not found.
        - Licensee: Requires 'licensee' in no specific version. Tool not found.
        * ScanCode: Requires 'scancode' in version >=3.0.0. Found version 32.2.1.

PackageManagers:
        - Bazel: Requires 'bazel' in version >=7.0.0. Tool not found.
        - Bower: Requires 'bower' in version >=1.8.8. Tool not found.
        - Cargo: Requires 'cargo' in no specific version. Tool not found.
        - CocoaPods: Requires 'pod' in version >=1.11.0. Tool not found.
        - Composer: Requires 'composer' in version >=1.5.0. Tool not found.
        - Conan: Requires 'conan' in version >=1.44.0 and <2.0.0. Tool not found.
        - GoMod: Requires 'go' in version >=1.21.1. Tool not found.
        * Npm: Requires 'npm' in version >=6.0.0 and <11.0.0. Found version 10.5.0.
        - NuGetInspector: Requires 'nuget-inspector' in no specific version. Tool not found.
        - Pipenv: Requires 'pipenv' in version >=2018.10.9. Tool not found.
        - Pnpm: Requires 'pnpm' in version >=5.0.0 and <10.0.0. Tool not found.
        - Poetry: Requires 'poetry' in no specific version. Tool not found.
        - Pub: Requires 'dart' in version >=2.10.0. Tool not found.
        * PythonInspector: Requires 'python-inspector' in version >=0.9.2. Found version 0.12.0.
        - Sbt: Requires 'sbt' in no specific version. Tool not found.
        - Stack: Requires 'stack' in version >=2.1.1. Tool not found.
        - SwiftPm: Requires 'swift' in no specific version. Tool not found.
        - Yarn: Requires 'yarn' in version >=1.3.0 and <1.23.0. Tool not found.

VersionControlSystems:
        * Git: Requires 'git' in version >=2.29.0. Found version 2.34.1.
        - GitRepo: Requires 'repo' in no specific version. Tool not found.
        - Mercurial: Requires 'hg' in no specific version. Tool not found.

Prefix legend:
        - The tool was not found in the PATH environment.
        + The tool was found in the PATH environment, but not in the required version.
        * The tool was found in the PATH environment in the required version.

ScanCode license texts found in '/home/xxxx/Code/venv/lib64/python3.10/site-packages/licensedcode/data/licenses'.

Not all tools requirements were satisfied:
        ! Some tools were not found at all.

And specify (relevant parts of) your ORT configuration (config.yml):

ort:
  enableRepositoryPackageCurations: true
  forceOverwrite: true

  advisor:
    osv:
      serverUrl: "https://api-staging.osv.dev"

  scanner:
    skipConcluded: false

  analyzer:
    allowDynamicVersions: true
    # enabledPackageManagers: [Unmanaged]
  #   # A flag to control whether excluded scopes and paths should be skipped during the analysis.
    skipExcluded: true

    config:
      # A map from scanner plugin types to the plugin configuration.
      ScanCode:
        options:
          # Command line options that affect the ScanCode output. If changed, stored scan results that were created with
          # different options are not reused.
          # commandLine: '--copyright --license --info --strip-root --timeout 300'

          # Command line options that do not affect the ScanCode output.
          commandLineNonConfig: '--processes 8'

          # Use per-file license findings instead of per-line ones.
          preferFileLicense: true

Additional context

None

@kikofernandez added the "bug" and "to triage" labels on Nov 11, 2024
@sschuberth
Member

I expected all files in an unmanaged repo to be categorised with a license.

That expectation might be wrong, depending on the exact understanding. ORT does not generally store license information for all files, but only stores files with license / copyright findings. Any file not listed as part of the findings implicitly has an SPDX license of NONE.

However, main license files, like a LICENSE file in the root of a repository, can be regarded to implicitly declare licenses for files that do not contain license information themselves. While ORT is able to infer such license information, it's not stored explicitly in its data model.

The *.css files do not appear inside the ort path scan_results.summary.licenses (scan-result.json)

This means (if there is no bug) that the scanner in use, here ScanCode, did not find any license information in those files.

but they do appear in the section that mentions:

That's expected. While not every file contains license information, you can calculate the SHA1 of every file.

I think this is a bug, otherwise, how could one be sure that all files have been scan and have a license?

It's not a bug AFAICT. You could for example write a rule that uses the file list to check that every file for which a SHA1 was calculated also appears in the list of license findings. Of course, that only makes sense if your policy demands that each and every file must contain license information on its own.
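
As a rough illustration of such a check (a sketch outside ORT, not an ORT evaluator rule), the Python snippet below compares the paths in the file list against the paths that carry license findings. It assumes each file entry has a `path` key as in the excerpt above and that each license finding exposes its path under `location.path`. That layout, the function name `files_without_findings`, and the sample finding are assumptions for illustration, not actual scan output.

```python
def files_without_findings(file_entries, license_findings):
    """Return the paths that have a SHA1 entry but no license finding."""
    scanned = {entry["path"] for entry in file_entries}
    with_findings = {finding["location"]["path"] for finding in license_findings}
    return sorted(scanned - with_findings)


# File list entries as shown in the issue (path plus SHA1).
file_entries = [
    {"path": "lib/common_test/priv/ct_default.css",
     "sha1": "f4487fc4908e382cafb4137f4c4b02447939f2eb"},
    {"path": "lib/common_test/priv/jquery-latest.js",
     "sha1": "ee48592d1fff952fcf06ce0b666ed4785493afdc"},
]

# Hypothetical finding, invented for this example; the key layout is assumed.
license_findings = [
    {"license": "MIT",
     "location": {"path": "lib/common_test/priv/jquery-latest.js",
                  "start_line": 1, "end_line": 1}},
]

missing = files_without_findings(file_entries, license_findings)
print(missing)
```

Whether an empty result should be required depends on the policy: it only makes sense to enforce if every file must carry its own license information.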

Is there an option that forces all files to be included in the scan with a license, even if the license is UNKNOWN?

Not really. This kind of mapping would be the responsibility of a reporter. You could write one, or introduce an option to an existing one, that explicitly lists files with no findings with a license of NONE instead of omitting them.

This might be a valid addition to the SPDX reporter.
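
A minimal sketch of what such a reporter option might compute, written in plain Python rather than against ORT's reporter API: every path from the file list is emitted, and paths without findings are listed with NONE instead of being omitted. The function name `license_per_file`, the flattened finding shape (`path`, `license`), and the sample finding are hypothetical.

```python
def license_per_file(file_paths, license_findings, placeholder="NONE"):
    """Map every known file path to its detected licenses, or to the placeholder."""
    by_path = {path: set() for path in file_paths}
    for finding in license_findings:
        by_path.setdefault(finding["path"], set()).add(finding["license"])
    # An empty set becomes [placeholder] so the file is still listed explicitly.
    return {path: sorted(licenses) or [placeholder]
            for path, licenses in sorted(by_path.items())}


# Paths taken from the file list in the issue.
file_paths = [
    "lib/common_test/priv/ct_default.css",
    "lib/common_test/priv/jquery-latest.js",
    "lib/common_test/priv/jquery.tablesorter.min.js",
]

# Hypothetical finding, invented for this example.
license_findings = [
    {"path": "lib/common_test/priv/jquery-latest.js", "license": "MIT"},
]

listing = license_per_file(file_paths, license_findings)
for path, licenses in listing.items():
    print(path, licenses)
```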

@sschuberth added the "enhancement" and "reporter" labels and removed the "bug" and "to triage" labels on Nov 11, 2024
@kikofernandez
Author

That expectation might be wrong, depending on the exact understanding. ORT does not generally store license information for all files, but only stores files with license / copyright findings. Any file not listed as part of the findings implicitly has an SPDX license of NONE.

Ok, I fully understand. What about the following:

If I set a curation for a whole folder in my .ort.yml file, for the files that do not have a license and should, e.g.,

"curations" : {
        "license_findings" : [ {
          "path" : "lib/common_test/priv/**",
          "concluded_license" : "MIT",
          "reason" : "INCORRECT",
          "comment" : "The scanner incorrectly categorises the folder license"
        }, {

when I run the report, is the report going to include all files under that path to be set with the license I said? (these files did not have a license and now I curated them).

So far, I think those files are still not shown in the report even when I have curated them, so the curation only works for the files whose license was detected to be something, and the report skips the files with license NONE (even when it has been curated to not be NONE).

@sschuberth
Member

sschuberth commented Nov 11, 2024

is the report going to include all files under that path to be set with the license I said?

No. License finding curations only correct the license findings that are explicitly contained in the ORT result file.

those files are still not shown in the report even when I have curated them

That's expected for files that were not originally in the ORT result file.

the curation only works for the files whose license was detected to be something

Correct.
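
The behaviour described in this exchange can be pictured with a small Python sketch. It is not ORT's implementation: `fnmatchcase` only approximates ORT's glob matching, and `apply_curations`, the flattened finding shape, and the placeholder license `LicenseRef-example-detected` are made up for illustration. The point is that a curation rewrites findings that already exist and never creates entries for files that had none.

```python
from fnmatch import fnmatchcase


def apply_curations(findings, curations):
    """Rewrite licenses of existing findings; never add findings for other files."""
    curated = []
    for finding in findings:
        license_id = finding["license"]
        for curation in curations:
            # Approximate glob match; the first matching curation wins.
            if fnmatchcase(finding["path"], curation["path"]):
                license_id = curation["concluded_license"]
                break
        curated.append({"path": finding["path"], "license": license_id})
    return curated


# Curation taken from the comment above.
curations = [{"path": "lib/common_test/priv/**", "concluded_license": "MIT"}]

# Hypothetical input: only one file under the curated path has a finding.
findings = [
    {"path": "lib/common_test/priv/jquery-latest.js",
     "license": "LicenseRef-example-detected"},
]

curated = apply_curations(findings, curations)
curated_paths = [entry["path"] for entry in curated]
print(curated)
```

In this sketch `lib/common_test/priv/ct_default.css` matches the curation pattern but still does not show up in the output, because it had no finding to correct.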

@sschuberth added the "question" label on Nov 11, 2024
@kikofernandez
Author

kikofernandez commented Nov 12, 2024

Thanks for the information

@sschuberth
Member

sschuberth commented Nov 12, 2024

Can we somehow put the outcome of the discussion into a concrete feature request (as a separate, clean issue)? If not, are we good to close the issue (or move it to a discussion for reference)?

@kikofernandez
Author

Great idea. New ticket as feature enhancement:
