Conversation

VarshaUN
Collaborator

As discussed with mentors, I have added the following:

  • archiving.py
    Added DownloadStore abstract base class and implementations (LocalFilesystemProvider, S3LikeProvider, SftpProvider) for storing downloads with SHA256-based deduplication and metadata.

  • settings.py
    Initialized download_store based on DOWNLOAD_ARCHIVING_PROVIDER (localstorage, s3, sftp) with configuration validation and error logging.

  • input.py
Added add_input_from_url and add_input_from_upload to archive URL downloads and uploaded files using download_store, with a fallback to the project input directory when archiving is disabled. Integrated with the InputSource model for metadata storage.

Enhances input handling for pipelines, supporting deduplicated storage and retrieval of inputs across local, S3, and SFTP backends.

Still in progress.

Signed-off-by: Varsha U N [email protected]

@VarshaUN VarshaUN marked this pull request as draft August 18, 2025 12:25
@AyanSinhaMahapatra
Member

@VarshaUN thanks for the PR. You need to address a few overall issues before we can start reviewing the code in more detail; see the comments below:

A couple of issues with the general direction of the PR, as discussed in #1685 (comment):

We could have a simple base class to get/put files in the archive and a local filesystem implementation for now, enabled with a global setting.

  • we asked for a working implementation of the feature with the local filesystem only, and only then to go for S3 (it's fine to create data/abstract classes), as S3 is much harder to test. Functionality that cannot be tested is very unlikely to be merged.

CORE: have a feature in the base Pipeline class and settings to enable that

  • Your code does not interact with any pipelines at all, and no unit tests or other tests are added, so there is no way for us to test the functions/features.

The local storage looks like this:

  • We need some tests to show this is being stored correctly.
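A self-contained test along these lines could demonstrate that storage and deduplication behave correctly. `MinimalLocalStore` here is a hypothetical stand-in defined inline for illustration, not the PR's actual provider class:

```python
import hashlib
import shutil
import tempfile
import unittest
from pathlib import Path


class MinimalLocalStore:
    """Hypothetical minimal local archive provider, for illustration only."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, path):
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        target = self.root / digest
        if not target.exists():
            shutil.copy2(path, target)
        return digest


class LocalStoreTest(unittest.TestCase):
    def test_put_is_deduplicated(self):
        tmp = Path(tempfile.mkdtemp())
        source = tmp / "input.txt"
        source.write_text("content")
        store = MinimalLocalStore(tmp / "archive")
        digest1 = store.put(source)
        digest2 = store.put(source)
        self.assertEqual(digest1, digest2)
        self.assertTrue((store.root / digest1).exists())
        # Only one archived file despite two put() calls.
        self.assertEqual(len(list(store.root.iterdir())), 1)


if __name__ == "__main__":
    unittest.main()
```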

Presently, input archives are downloaded with `download_missing_inputs(self)` and stored in the `/input/` directory for each project, as specified in `WORK_DIRECTORIES = ["input", "output", "codebase", "tmp"]`. This needs to move into a central location for each instance of scancode.io, and there is a lot of code change required to do that. This is not addressed yet in the PR.
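As a rough illustration of the central-location idea, the sketch below fans archived files out under a single instance-wide root keyed by SHA256, alongside the legacy per-project layout. The archive root, the fan-out scheme, and the helper names are all hypothetical, not the project's actual configuration:

```python
from pathlib import Path

# Hypothetical setting: one archive root shared by the whole scancode.io
# instance, instead of each project keeping its own input/ directory.
CENTRAL_ARCHIVE_ROOT = Path("/var/scancodeio/downloads")

# Per-project work directories as they exist today.
WORK_DIRECTORIES = ["input", "output", "codebase", "tmp"]


def archived_path(sha256):
    """Return the instance-wide path for a download, fanned out by a
    two-character digest prefix to avoid one huge flat directory."""
    return CENTRAL_ARCHIVE_ROOT / sha256[:2] / sha256


def project_input_dir(project_work_dir):
    """Return the legacy per-project input directory."""
    return Path(project_work_dir) / "input"
```

With a layout like this, projects would reference archived downloads by digest rather than each holding its own copy, which is the part requiring the larger code change mentioned above.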

@VarshaUN VarshaUN changed the title Add download archiving system with LocalFilesystem, S3, and SFTP providers Add download archiving system with LocalFilesystem provider Sep 15, 2025
Signed-off-by: Varsha U N <[email protected]>
This reverts commit cd04f3f1062f3ac8c78af3a7b0ed042633f5b375.
This reverts commit b6d2342873168e53865e8f39185a9602de191b7f.
This reverts commit ca2f49f505bd5c951b5f270d4b218a69848a6de9.