@netomi netomi commented Dec 1, 2025

This PR adds support for downloading and analysing Amazon CloudFront access logs that are stored in an AWS S3 bucket.

Additionally, the following changes are included:

  • remove unused Shedlock classes
  • rename AzureDownloadCountProcessedItem entity to DownloadCountProcessedItem and include a storageType field
  • update other code to accommodate above changes

I decided to drop the AzureDownloadCountProcessedItem table instead of renaming and altering it, as this feels simpler and we do not lose any data that is important to keep around, imho.
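
To illustrate the kind of access log analysis described above, here is a minimal sketch, not the code from this PR. It assumes the CloudFront standard log format, where lines starting with `#` are headers, the `#Fields:` header names the columns, and data lines are tab-separated. The class and method names are hypothetical, and downloading and decompressing the files from S3 is left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: the class and method names are hypothetical, not taken from the PR.
final class CloudFrontLogSketch {

    /**
     * Counts successful GET requests per URI path in the lines of one
     * CloudFront standard log file. Column positions are taken from the
     * "#Fields:" header instead of being hard-coded.
     */
    static Map<String, Integer> countDownloads(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        int method = -1, uri = -1, status = -1;
        for (String line : lines) {
            if (line.startsWith("#Fields:")) {
                // Field names in the header are separated by spaces.
                List<String> fields = Arrays.asList(
                        line.substring("#Fields:".length()).trim().split("\\s+"));
                method = fields.indexOf("cs-method");
                uri = fields.indexOf("cs-uri-stem");
                status = fields.indexOf("sc-status");
                continue;
            }
            // Skip other header lines, blank lines, and data seen before a header.
            if (line.startsWith("#") || line.isBlank()) continue;
            if (method < 0 || uri < 0 || status < 0) continue;

            // Data lines are tab-separated.
            String[] cols = line.split("\t");
            if (cols.length <= Math.max(method, Math.max(uri, status))) continue;

            if ("GET".equals(cols[method]) && "200".equals(cols[status])) {
                counts.merge(cols[uri], 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Mapping a URI path back to an extension and filtering for the relevant file types depends on the storage layout, so that step is deliberately omitted here.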

@netomi netomi marked this pull request as draft December 1, 2025 21:08

netomi commented Dec 1, 2025

A version of this change is currently running on staging.open-vsx.org. It updates the counts every minute to make testing easier.

Also, processed log files are not yet deleted, but marked in the database as being processed.
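
Marking files as processed rather than deleting them could look roughly like the sketch below. Everything in it is an assumption for illustration: the `ProcessedItem` record only stands in for the `DownloadCountProcessedItem` entity with its `storageType` field, the in-memory set stands in for the database table, and the storage type values are made up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: an in-memory stand-in for the processed-item table.
final class ProcessedLogSketch {

    // Hypothetical stand-in for the DownloadCountProcessedItem entity.
    record ProcessedItem(String storageType, String name) {}

    private final Set<ProcessedItem> processed = new HashSet<>();

    /** Returns the log file names that were not yet processed for this storage type. */
    List<String> selectUnprocessed(String storageType, List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (!processed.contains(new ProcessedItem(storageType, name))) {
                result.add(name);
            }
        }
        return result;
    }

    /** Records a log file as processed; the file itself is left untouched. */
    void markProcessed(String storageType, String name) {
        processed.add(new ProcessedItem(storageType, name));
    }
}
```

Keying the records by both storage type and file name keeps files with the same name in different storage backends apart.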

@netomi netomi marked this pull request as ready for review December 2, 2025 07:43

netomi commented Dec 2, 2025

@autumnfound can you take a look at this PR?

@amvanbaren it is quite important to get this reviewed and integrated so that we can switch open-vsx.org over to an official openvsx release asap. We currently run a custom version due to the storage migration. I will also add another PR to utilize a CDN in front of the actual cloud storage provider.


netomi commented Dec 2, 2025

Something that I could not figure out yet:

The existing AzureDownloadCountService has a recurring job defined whose name I wanted to update to better reflect that there are now multiple jobs, but changing the job name in the annotation does not update the job in the database.

Adding that to the migration script makes the tests fail, so for now I have been updating the respective db table manually. Do you have any ideas how this could be made more robust? Is something like that supposed to be in the migration script?

@autumnfound autumnfound left a comment

Overall LGTM, but the lack of comments made this code a bit harder to follow/digest.


netomi commented Dec 4, 2025

This PR also contains an optimization for evicting the extension json cache for an extension, see #1392.

Currently, eviction needs to get all version / platform combinations for an extension and evict the key for each combination in the cache, which is pretty slow when you have lots of extensions.

If a Redis cache is used, which supports that kind of operation, the optimization instead uses a key pattern to clear all keys for the extension at once.
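
The idea of evicting by pattern instead of per key can be sketched as follows. This is not the PR's implementation: the cache is a plain in-memory map, the key layout `namespace.extension:version:platform` is a hypothetical one chosen for the example, and a simple prefix match stands in for a pattern. With Redis, the analogous operation would typically be a `SCAN` with a `MATCH` pattern followed by deleting the returned keys. Whether the PR does it exactly this way is not stated here.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: an in-memory cache with a hypothetical key layout.
final class PatternEvictionSketch {

    private final Map<String, String> cache = new HashMap<>();

    // Hypothetical key layout: "<namespace>.<extension>:<version>:<platform>".
    static String key(String namespace, String extension, String version, String platform) {
        return namespace + "." + extension + ":" + version + ":" + platform;
    }

    void put(String namespace, String extension, String version, String platform, String json) {
        cache.put(key(namespace, extension, version, platform), json);
    }

    int size() {
        return cache.size();
    }

    /**
     * Removes every cached entry of one extension in a single pass, without
     * first looking up its version / platform combinations.
     * Returns the number of removed entries.
     */
    int evictByPrefix(String namespace, String extension) {
        // The trailing ':' keeps "ns.ext" from also matching "ns.ext2".
        String prefix = namespace + "." + extension + ":";
        int[] removed = {0};
        cache.keySet().removeIf(k -> {
            boolean matches = k.startsWith(prefix);
            if (matches) {
                removed[0]++;
            }
            return matches;
        });
        return removed[0];
    }
}
```

The gain comes from skipping the lookup of all version / platform combinations: the cache itself finds the matching keys.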

@netomi netomi marked this pull request as draft December 6, 2025 11:16

netomi commented Dec 6, 2025

This needs some more updates from the version that is currently running in production to speed up the analysis of download counts (evicting the cache is so slow that we need to do it in bulk).

@netomi netomi marked this pull request as ready for review December 11, 2025 20:00
@netomi netomi merged commit e9908e2 into eclipse:master Dec 11, 2025
4 checks passed
@netomi netomi deleted the aws-download-logs branch December 11, 2025 20:31