MariaDB data offloading #598

Open
MartinKolarik opened this issue Jan 12, 2025 · 4 comments

@MartinKolarik
Member

As a follow-up to #269 (comment), we may want to offload data older than 1 hour from Redis to another storage so that we can keep the measurement results stored for longer. S3 and similar alternatives would be very expensive because of request-based pricing, so MariaDB seems like the best option. All queries would be single-row lookups by primary key, so a server with 16–32 GB RAM and 2 TB+ of fast storage should be sufficient.
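
For illustration, a minimal sketch of what the read path could look like, using the `mariadb` Node.js driver. The table name, columns, and connection details are made up for this example, not decided:

```ts
import mariadb from 'mariadb';

// Hypothetical schema for the offloaded data; every read is a
// single-row lookup by primary key:
//
//   CREATE TABLE measurements (
//     id         VARCHAR(32) NOT NULL, -- measurement id
//     created_at DATETIME    NOT NULL,
//     result     LONGBLOB    NOT NULL, -- serialized measurement result
//     PRIMARY KEY (id)
//   );

const pool = mariadb.createPool({ host: 'localhost', database: 'globalping' });

async function getMeasurement (id: string): Promise<Buffer | null> {
	const rows = await pool.query('SELECT result FROM measurements WHERE id = ?', [ id ]);
	return rows.length ? rows[0].result : null;
}
```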

@jimaek
Member

jimaek commented Jan 12, 2025

How would purging work? Also, can we have different TTLs per user type?

@MartinKolarik
Member Author

MartinKolarik commented Jan 12, 2025

A cron job that checks the disk usage and drops the oldest part of the data whenever usage is above a threshold seems most reliable. In theory, we don't even need to define a storage duration; it just stores as much as possible.
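
A rough sketch of that job, assuming Node's `statfs()` (available in recent versions) and the `node-cron` package for scheduling; the path, threshold, and the `dropOldestChunk()` helper are illustrative:

```ts
import { statfs } from 'node:fs/promises';
import cron from 'node-cron';

const DATA_PATH = '/var/lib/mysql'; // illustrative data volume
const MAX_USAGE = 0.85;             // start purging above 85% disk usage

async function diskUsage (path: string): Promise<number> {
	const stats = await statfs(path);
	return 1 - stats.bavail / stats.blocks;
}

async function dropOldestChunk (): Promise<void> {
	// Hypothetical helper: drop the oldest day of data. With day-based
	// partitioning this becomes a fast DROP PARTITION (see the sketch
	// further down in this thread).
}

// Check hourly; drop the oldest chunk whenever we're over the threshold.
cron.schedule('0 * * * *', async () => {
	if (await diskUsage(DATA_PATH) > MAX_USAGE) {
		await dropOldestChunk();
	}
});
```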

If we wanted different TTLs per user type, we'd combine this with a predetermined expiration (again enforced by a scheduled function), and the disk-check job would become just a fallback.
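
That scheduled function could look roughly like this, assuming a hypothetical `expires_at` column set at insert time based on the user's type; the column name and batch size are illustrative:

```ts
import mariadb from 'mariadb';

const pool = mariadb.createPool({ host: 'localhost', database: 'globalping' });

// Delete expired rows in small batches to avoid long-running
// statements; runs on a schedule, e.g., every few minutes.
async function purgeExpired (): Promise<void> {
	let affectedRows = 0;

	do {
		({ affectedRows } = await pool.query('DELETE FROM measurements WHERE expires_at < NOW() LIMIT 1000'));
	} while (affectedRows > 0);
}
```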

As a rough estimate, 1 month of data at the current usage level ≈ 2 TB (MariaDB's compression might reduce that somewhat).

@jimaek
Member

jimaek commented Jan 12, 2025

Feels icky doing it with cron. Wouldn't it also lock the tables and create issues when it runs?

@MartinKolarik
Member Author

Partitioning by days or weeks solves the delete problem, as dropping partitions is fast (unlike DELETE ... WHERE `date` < x, which needs to scan the rows). The API will also have enough time to retry if needed: I imagine we'd try to move the data ASAP after the measurement is finished, but still serve it from Redis for 1 hour anyway.
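
A sketch of the day-based variant (DDL and names are illustrative). One caveat: MariaDB requires the partitioning column to be part of every unique key, so the primary key would become `(id, created_at)`:

```ts
import mariadb from 'mariadb';

const pool = mariadb.createPool({ host: 'localhost', database: 'globalping' });

// Illustrative partitioned schema:
//
//   CREATE TABLE measurements (
//     id         VARCHAR(32) NOT NULL,
//     created_at DATETIME    NOT NULL,
//     result     LONGBLOB    NOT NULL,
//     PRIMARY KEY (id, created_at)
//   )
//   PARTITION BY RANGE (TO_DAYS(created_at)) (
//     PARTITION p20250112 VALUES LESS THAN (TO_DAYS('2025-01-13')),
//     PARTITION p20250113 VALUES LESS THAN (TO_DAYS('2025-01-14'))
//   );

// Dropping a whole partition is a fast metadata operation, unlike a
// row-by-row DELETE. The partition name comes from our own scheduler,
// not user input, so interpolating it into the DDL is safe here.
async function dropDay (partitionName: string): Promise<void> {
	await pool.query(`ALTER TABLE measurements DROP PARTITION ${partitionName}`);
}
```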
