-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CUMULUS-3757: Update granule upsert logic to allow updating collection info #3887
Closed
Closed
Changes from 7 commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
5a7fd51
updtaing granules and testing
etcart 82cd1aa
3757 changes - endpoint, PG, and tests
Nnaga1 86738c8
turn on es update
etcart 6bf4880
pull bare wrap for updateGranule
etcart af612f8
slightly clearer name
etcart b6365cf
Merge branch 'es_for_move_collections_backport' of https://github.com…
Nnaga1 95c8617
linting
Nnaga1 e6822c8
api client for this endpoint
etcart File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
tl;dr.
We should probably move the concurrency logic to the API package, handle bounding/configuring concurrency and address validation via something like zod.
Details/concerns:
We should consider implementing a concurrency or size bound somehow the API is limited by design to 2 DB connections >by design<, if you throw a large enough request at this endpoint I'd expect it will either timeout or overwhelm the knex connection pool, or both. That's probably not appropriate to the db package (and why we wound up with this sort of logic in the
api/lib
write 'stuff' as an approach.Users aren't going to just use this for the intended approach for 3757, if it's part of the public API we likely should use zod to validate inputs/schemas/etc before passing a payload to a @cumulus/db method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this should probably just update one granule and its file, while the api/lib or something else, would do the promise.all/concurrency stuff,
I'm all for changing the approach, I was really hoping in the review it would be to make it better than what I had bc theres stuff I was sure about (concurrency, api calls, the api timing out, how to call this.......), so whatever would be better, this has also changed a lot in the way it should be done so that has also thrown me off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so should I do it like this:
make this endpoint just update one granule and its files in PG and ES -> make an api-client call for that -> elswhere, call the api-client fn concurrently with the list of target_granules (the changes to make)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to make a bulk update feature like this utilize a single granule call - that's going to force scaling of lambda invocations per granule, and is super inefficient.
There's room for various approaches here, but particularly, we need to put configurations/limits on concurrency around
await Promise.all(granules.map(async (granule) => {
so that we don't overwhelm the limited knex connection pool or make an exception for this bulk call and increase it, however that has user monitoring/tuning concerns.As an opinionated take: I'd expect that best looks like having the DB package implement a single granule + all files update and associated tests, with an api lib method that does it for a bunch of granules and has concurrency controls in it. That allows for solid testing around a single granule update,and decouples concurrency concerns from the DB package. If the team feels strongly we should support bulk updates as part of the domain of the DB package, we could easily do that same approach entirely within the DB package.