feat: add cache to terrakit data fetch #48
Open
Beldine-Moturi wants to merge 10 commits into main from
7c172b7 to
9feba28
EugeneGene
previously approved these changes
Apr 27, 2026
5b7f2fc to
493dbd7
fMurugi
reviewed
May 5, 2026
Contributor
- Can we make the cache writes atomic?
With N pods reading and writing the same cache, a reader can otherwise pick up a partially written file:
Time →
[ create file ][ write... ][ write... ][ write... ]
↑
another pod reads here → 💥 corrupted/partial file
**Sol:**
Time →
[ write tmp file completely ]
[ atomically rename ]
Other pods:
- before rename → file doesn't exist
- after rename → file is 100% complete
- The cache should be indexed per day. When we fetch data for the 18th-20th, we cache the data for each day separately, so that if a later request expands or shrinks the date range we fetch only the additional dates instead of all of them again. E.g. a request for the 18th-21st will fetch only the missing date (the 21st).
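
To make both suggestions concrete, here is a minimal sketch of what I have in mind, not the PR's actual code. All names are hypothetical: `atomic_write_bytes`, `fetch_with_daily_cache`, the `request_key` directory, the one-`.bin`-file-per-day layout, and the `fetch_day` callback that stands in for the real terrakit call. It also assumes the cache volume gives POSIX rename semantics, which is worth verifying for whatever shared storage backs `/data/cache`.

```python
import os
import tempfile
from datetime import date, timedelta
from pathlib import Path
from typing import Callable, Dict, List


def atomic_write_bytes(target: Path, data: bytes) -> None:
    """Write data so readers see either no file or the complete file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file sits in the same directory as the target so the final
    # rename never crosses a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(target.parent), prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before publishing
        os.replace(tmp_name, target)  # the atomic "publish" step
    except BaseException:
        # Never leave a half-written temp file behind.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def days_in_range(start: date, end: date) -> List[date]:
    """All days from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def fetch_with_daily_cache(
    cache_dir: Path,
    request_key: str,  # hypothetical: identifies bbox/collection/etc.
    start: date,
    end: date,
    fetch_day: Callable[[date], bytes],  # placeholder for the real fetch
) -> Dict[date, bytes]:
    """Return data per day, fetching only the days missing from the cache."""
    results: Dict[date, bytes] = {}
    for day in days_in_range(start, end):
        path = cache_dir / request_key / f"{day.isoformat()}.bin"
        if path.exists():
            results[day] = path.read_bytes()  # cache hit
        else:
            data = fetch_day(day)  # cache miss: fetch just this day
            atomic_write_bytes(path, data)
            results[day] = data
    return results
```

With this layout, a request for the 18th-20th followed by one for the 18th-21st calls `fetch_day` only once on the second request. Two pods racing on the same day would at worst fetch it twice; whichever rename lands last wins, and neither reader sees a partial file.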
Summary
Related Issue (optional)
How to test this PR?
- Update/redeploy the studio with the changes in this branch.
- Use this image in the pipelines deployment, specifically for the `terrakit-data-fetch` pipeline pod: `quay.io/geospatial-studio/geostudio-pipelines:cache-inference-images`
- Submit a sample inference request with the bbox option.
- Check that `/data/cache` is created and that sample files are saved in it.
- Submit a second request with the exact same payload.
- Confirm that the second submission does not fetch data from terrakit in the `terrakit-data-fetch` pipeline step, but instead uses the saved files in the cache. (Sample logs for both requests are attached below.)

Sample logs for new request:
a1ab0320-65d0-49f6-abb7-c3c9a2bc5a16-task_0-terrakit-data-fetch-stdout (1).log
Sample logs for duplicate request:
b314a45b-d266-4561-8a9d-1c5c51f41c99-task_0-terrakit-data-fetch-stdout.log
Screenshots / Logs (optional)
Checklist