
Enable autotrimming #11

Merged: 2 commits into main from autotrim, Sep 12, 2024

Conversation

npezza93 (Collaborator)

Fixes #10

@npezza93 (Collaborator, Author)

Testing this on prod and it's working like a charm.

npezza93 merged commit dfb0165 into main on Sep 12, 2024 (2 checks passed).
npezza93 deleted the autotrim branch on September 12, 2024 at 01:26.
@dhh (Member) commented Sep 12, 2024

Awesome. Wonder if we should gel the domain language around trimming vs pruning?

@dhh (Member) commented Sep 12, 2024

@djmb Could you have a look?

@npezza93 (Collaborator, Author)

> Awesome. Wonder if we should gel the domain language around trimming vs pruning?

I've renamed everything from prune to trim. Is that what you mean?

@dhh (Member) commented Sep 12, 2024

Ah great. Yes 👍

@kevinmcconnell

@npezza93 what do you think about triggering the trimming according to send activity, rather than unsubscribes? Could trigger a trim every n messages (by keeping a counter, or just using a random check on each write that's weighted according to that n, which will average out to the same thing).

That way the trimming workload would scale with the write workload, rather than depending on how often clients unsubscribe, which I think better matches the work trimming has to do: the more messages you send, the more of them you'll have to trim.

@djmb commented Sep 12, 2024

I'd recommend a random check rather than a counter - you don't need to store the counter state, and you avoid a thundering herd from a bunch of processes booted together.
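
As a minimal sketch of the random check being proposed (not code from the PR itself: `write_message`, `trim_later`, and the value of `TRIM_EVERY_N_MESSAGES` are all hypothetical placeholders for the real write path and trim job):

```ruby
# Hypothetical sketch of the weighted random check described above;
# write_message and trim_later stand in for the real write path and
# trim job, and TRIM_EVERY_N_MESSAGES is a made-up value of "n".
TRIM_EVERY_N_MESSAGES = 250 # trim roughly once per 250 broadcasts

def broadcast(channel, payload)
  write_message(channel, payload)

  # Fires with probability 1/n on each write, so trims average out to
  # one per n messages with no counter state to store, and processes
  # booted together won't all trim at the same moment.
  trim_later if rand < 1.0 / TRIM_EVERY_N_MESSAGES
end
```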

```diff
 def perform
-  ::SolidCable::Message.prunable.delete_all
+  ::SolidCable::Message.trimmable.delete_all
```

While this should work pretty well with SQLite, I have some worries about how it would behave on MySQL or PostgreSQL.

It deletes an unbounded number of messages, so it could hold locks for a fair amount of time. If the database is being replicated, it could also cause replication lag as the deletes are processed.

There could also be locking issues with concurrent jobs attempting to run the query.

The approach Solid Cache takes is to delete small amounts of data, but to do it often:

  • Every N/2 writes, we trigger an expiration task (a job, or just a thread).
  • The task tries to expire up to N records.

Expiring up to N records while triggering only every N/2 inserts keeps downward pressure on the cache size when it is too large, but we don't try to clear everything out at once, as that could be millions and millions of records.

Solid Cache then has a slightly complicated process for deleting records in a concurrency-safe manner, but I think we could maybe just rely on SKIP LOCKED here instead. That requires at least MySQL 8.0, but Solid Queue already requires that, so I don't think it would be an issue for Solid Cable to do the same.
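
For concreteness, here's a minimal sketch (not Solid Cable's actual implementation) of what a batched, SKIP LOCKED-based trim could look like, reusing the trimmable scope from this PR; the job class and `TRIM_BATCH_SIZE` are hypothetical:

```ruby
# Hypothetical sketch, not Solid Cable's actual implementation.
# Assumes the trimmable scope from this PR; the job class and
# TRIM_BATCH_SIZE are made up for illustration.
class TrimJob < ActiveJob::Base
  TRIM_BATCH_SIZE = 100 # "N": upper bound on rows deleted per run

  def perform
    ::SolidCable::Message.transaction do
      # SKIP LOCKED lets concurrent trim jobs pass over rows another
      # job has already locked for deletion instead of blocking.
      ids = ::SolidCable::Message.trimmable
                                 .limit(TRIM_BATCH_SIZE)
                                 .lock("FOR UPDATE SKIP LOCKED")
                                 .pluck(:id)
      ::SolidCable::Message.where(id: ids).delete_all
    end
  end
end
```

Each run would then delete at most one small batch inside a short transaction, so a large backlog gets worked off incrementally rather than in one unbounded delete.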

@npezza93 (Collaborator, Author)

Put up #15, which should address this. Let me know what you think!
