Duralite: cache directory listing #135

Draft
tkellogg wants to merge 6 commits into master from duralite

tkellogg (Owner) commented Dec 2, 2022

Reduce disk I/O by moving most directory listing to a probabilistic
cache approach. For one, I don't want to wear out disks by constantly
accessing them. Spinning disks are especially problematic because
latency can be quite high.

This introduces `CachedDirIter`, which replaces usages of `fs::ReadDir`.
It's an enum that can either wrap a `fs::ReadDir` or represent a cache
hit by iterating a vector of paths. The cache itself is a Trie, so it
shouldn't take much memory to hold all paths under `$HOME`, for
example. The cache lives at program scope and is passed down to
wherever it's needed.
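
To make the shape of that enum concrete, here is a minimal sketch. It is not the code in this PR: the variant names, the `open` constructor, and the choice of `io::Result<PathBuf>` as the item type are all assumptions made for illustration, and the Trie lookup is reduced to an `Option<Vec<PathBuf>>` supplied by the caller.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::vec;

/// Either a live directory listing or a replay of cached paths.
/// Variant names are hypothetical.
pub enum CachedDirIter {
    /// Cache miss: entries are read from disk.
    Disk(fs::ReadDir),
    /// Cache hit: iterate paths remembered from an earlier listing.
    Cached(vec::IntoIter<PathBuf>),
}

impl CachedDirIter {
    /// Hypothetical constructor: replay `cached` if the caller found an
    /// entry in the cache, otherwise fall back to `fs::read_dir`.
    pub fn open(dir: &Path, cached: Option<Vec<PathBuf>>) -> io::Result<Self> {
        match cached {
            Some(paths) => Ok(CachedDirIter::Cached(paths.into_iter())),
            None => Ok(CachedDirIter::Disk(fs::read_dir(dir)?)),
        }
    }
}

impl Iterator for CachedDirIter {
    type Item = io::Result<PathBuf>;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            // Disk entries can fail individually, so keep the io::Result.
            CachedDirIter::Disk(rd) => rd.next().map(|res| res.map(|entry| entry.path())),
            // Cached paths never touch the disk, so they are always Ok.
            CachedDirIter::Cached(it) => it.next().map(Ok),
        }
    }
}
```

With this shape, callers that used to loop over `fs::read_dir(dir)?` can loop over `CachedDirIter::open(dir, cached)?` and do not need to know whether the entries came from disk or from the cache.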

Cache invalidation is a problem, though. I don't want to refresh the
entire Trie all at once, but I also want guarantees that a new directory
will be recognized within a certain time limit, e.g. within 10 minutes.
Here I take a probabilistic approach. Each individual directory entry is
invalidated independently. On any given pass, there's an `X%` chance
that a single directory will be invalidated. `X` is calculated such that
caches will be invalidated within some maximum time bound (10 minutes)
95% of the time. In the remaining 5% of cases, they're
force-invalidated at the 10 minute mark.
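
One way to derive `X` is sketched below. It assumes one pass per 5-second polling interval (so 10 minutes is 120 passes) and independent draws on each pass; under those assumptions an entry survives `n` passes with probability `(1 - p)^n`, and setting that to 0.05 gives `p = 1 - 0.05^(1/n)`. The function names are hypothetical, and the random draw is passed in as `roll` so the sketch needs no external crate.

```rust
/// Per-pass invalidation probability `p` such that an entry is invalidated
/// at least once within `passes` passes with probability `confidence`.
/// Solves (1 - p)^passes = 1 - confidence for p.
pub fn per_pass_probability(passes: u32, confidence: f64) -> f64 {
    1.0 - (1.0 - confidence).powf(1.0 / passes as f64)
}

/// Decide whether to invalidate one cached directory on this pass.
/// `roll` is a uniform random number in [0, 1) supplied by the caller.
/// Entries that have survived `max_passes` are force-invalidated.
pub fn should_invalidate(passes_since_refresh: u32, max_passes: u32, p: f64, roll: f64) -> bool {
    passes_since_refresh >= max_passes || roll < p
}
```

For 120 passes and a 95% target this works out to roughly a 2.5% chance per directory per pass, so only a small fraction of the cached directories is re-read from disk on any single pass.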

The duralite sub-project has been about reducing dura's presence on host
machines by utilizing as few resources as possible. I don't want people
to not use dura because "my computer runs slow with it". Something I've
been observing is that, as I reduce these I/O-intensive bottlenecks,
dura uses even more CPU. In a follow-up PR I want to add some
strategically placed `thread::sleep` calls to spread the CPU usage
evenly throughout the entire 5-second polling interval.
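
As a rough illustration of that follow-up idea (nothing like this exists in the PR yet), the sketch below divides the polling interval evenly across the items processed in one pass and sleeps between them. Both function names are hypothetical, and the sketch ignores the time the work itself takes, so a real version would subtract elapsed time from each pause.

```rust
use std::thread;
use std::time::Duration;

/// Pause to insert after each item so that `items` units of work are
/// spread across one polling interval. Returns the whole interval when
/// there is nothing to process, which avoids dividing by zero.
pub fn pause_between_items(interval: Duration, items: u32) -> Duration {
    if items == 0 {
        interval
    } else {
        interval / items
    }
}

/// Run `work` on every item, sleeping between items instead of doing
/// all the work in one burst at the start of the interval.
pub fn process_spread<T>(items: &[T], interval: Duration, mut work: impl FnMut(&T)) {
    let pause = pause_between_items(interval, items.len() as u32);
    for item in items {
        work(item);
        thread::sleep(pause);
    }
}
```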

tkellogg changed the title from "broken code" to "Duralite: cache directory listing" on Dec 2, 2022

tkellogg (Owner, Author) commented Dec 2, 2022

TODO: this needs more tests before it can be released into the wild
