fs-cache: Add Cache Struct #95
base: main
Conversation
Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
Benchmark for 603c1db
Signed-off-by: Pushkar Mishra <[email protected]>
Benchmark for 20fe5a6
fs-cache/src/cache.rs
Outdated
/// Load most recent cached items into memory based on timestamps
pub fn load_recent(&mut self) -> Result<()> {
    self.storage.load_fs()
}
Actually, we don't need to expose this function. Only a `set`/`get` API is needed. The rest should happen under the hood (a sketch of the `set` path follows this list):
- any `set` should write both to memory and disk
- one-way sync from disk to memory is needed when users `get` values
- if we hit our own limit for bytes stored in the in-memory mapping, we erase the oldest entries from it
- but entries are always stored on disk, so there's no need to sync from memory to disk explicitly
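A minimal sketch of that write-through `set`, with illustrative method and helper names (`write_value_to_disk` appears in this PR; `cache_in_memory` is assumed):

```rust
// Hypothetical sketch: every `set` writes through to disk and
// mirrors the value in the in-memory mapping.
pub fn set(&mut self, key: K, value: V) -> Result<()> {
    // Entries always persist on disk...
    self.write_value_to_disk(&key, &value)?;
    // ...and are cached in memory, evicting the oldest entries
    // if the byte budget is exceeded.
    self.cache_in_memory(&key, &value)?;
    Ok(())
}
```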
Primary usage scenario: keys are of type `ResourceId`
- App indexes a folder.
- App may populate the cache before using it, but it's not required.
- App will query caches by key:
  - if the entry is in memory already, that's great, we just return the value
  - otherwise, we check disk for an entry with the requested key:
    - if it is on disk, we add it to in-memory storage and return the value
    - otherwise, we return `None`
- Index can notify the app about recently discovered resources. Corresponding values can be in the cache already, but this is not required. App can initialize values for new resources.

Secondary usage scenario: keys are of arbitrary type, mapped to any deterministic computation.
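A minimal sketch of that `get` path, assuming `serde`-serializable values (`cache_in_memory` is an illustrative helper name, not from this PR):

```rust
// Hypothetical sketch of the lookup order described above.
pub fn get(&mut self, key: &K) -> Result<Option<V>> {
    // 1. In-memory hit: return the value right away.
    if let Some(entry) = self.memory_cache.get(key) {
        return Ok(Some(entry.value.clone()));
    }
    // 2. Otherwise, check disk for the requested key.
    let file_path = self.path.join(format!("{}.json", key));
    if !file_path.exists() {
        // 3. Not on disk either: return None.
        return Ok(None);
    }
    // 4. On disk: promote into the in-memory cache, then return it.
    let file = std::fs::File::open(&file_path)?;
    let value: V = serde_json::from_reader(file)?;
    self.cache_in_memory(key, &value)?;
    Ok(Some(value))
}
```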
Signed-off-by: Pushkar Mishra <[email protected]>
Benchmark for 840a337
Signed-off-by: Pushkar Mishra <[email protected]>
Benchmark for 50fe163
Thank you for the review.
// Try to load from disk
let file_path = self.path.join(format!("{}.json", key));
if file_path.exists() {
    // Doubt: update the file's modified time (on disk) on read, to preserve LRU across app restarts?
Let's track this feature and work on it later. Better to keep the implementation simple for the moment and avoid redundant state. Btw, we could also simply write cached keys into a file and apply atomic versioning to it, so all peers would have the same view of the LRU order.
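For when we pick this up, a minimal sketch of the touch-on-read idea, assuming the `filetime` crate (not currently a dependency of this PR):

```rust
use filetime::FileTime;
use std::path::Path;

// Hypothetical sketch: refresh the file's modified time on read,
// so LRU ordering derived from mtimes survives app restarts.
fn touch_on_read(path: &Path) -> std::io::Result<()> {
    filetime::set_file_mtime(path, FileTime::now())
}
```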
// Write a single value to disk
fn write_value_to_disk(&mut self, key: &K, value: &V) -> Result<()> {
    let file_path = self.path.join(format!("{}.json", key));
    let mut file = File::create(&file_path)?;
Let's add a `debug_assert` that the file doesn't exist.
Also, we should use lightweight atomic writing to avoid dirty writes. Keep in mind the scenario where several ARK apps on the same device use the same folder and write to the cache in parallel.
I believe that atomic versions would be excessive here, but I'm not 100% sure yet.
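A common lightweight pattern here is write-to-temp-then-rename; a minimal sketch (the per-process temp suffix is an assumption to keep parallel writers from colliding):

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

// Hypothetical sketch: write the payload to a temp file first, then
// atomically rename it over the target. Readers never observe a
// half-written file; concurrent writers each use their own temp file.
fn atomic_write(path: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension(format!("tmp.{}", std::process::id()));
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush to disk before the rename becomes visible
    fs::rename(&tmp, path) // rename is atomic on POSIX filesystems
}
```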
Signed-off-by: Pushkar Mishra <[email protected]>
Benchmark for 42bb74c
Signed-off-by: Pushkar Mishra <[email protected]>
Benchmark for c7341e8
Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
Benchmark for d698fcf
fs-cache/src/cache.rs
Outdated
// TODO: NEED FIX
memory_cache: LruCache::new(
    NonZeroUsize::new(max_memory_bytes)
        .expect("Capacity can't be zero"),
),
`LruCache` requires the capacity (number of items) to be specified during initialization. However, our Cache is designed to be limited by `max_memory_bytes`. So, my question is: what would be the best way to initialize the `LruCache`?
Note: in all other functions, we are already comparing based on the number of bytes, not the number of items.
I think we can create another parameter (`max_items`) of type `Option<usize>`, with a default of 100.
I think the number of items should be left up to the developer calling the function. Instead of taking `max_memory_bytes` as an argument, we could take `max_memory_items`. This would require redesigning the implementation to focus on the number of items rather than memory size, but it would give developers the flexibility to decide based on the average size of the items they store.
If prioritizing memory size over the number of items is a hard requirement, then I can think of two options (see also the sketch below):
- We could implement our own version of `LruCache`
- Or, `LruCache` has a `resize()` method, and we could use it to resize the cache based on other metadata we track

Also, I looked into `uluru`, and it uses the number of items to initialize the cache as well. Just mentioning this in case you were considering it.
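A third possibility, sketched here as an assumption (it stays on the `lru` crate): construct the cache with `LruCache::unbounded()` so the item-count capacity never kicks in, and let the byte accounting we already do drive eviction. Names are illustrative:

```rust
use lru::LruCache;
use std::hash::Hash;

// Hypothetical sketch: no item cap at all; eviction is driven purely
// by the byte budget tracked elsewhere in the cache.
struct ByteBudgetCache<K: Hash + Eq, V> {
    inner: LruCache<K, V>,
    current_bytes: usize,
    max_bytes: usize,
}

impl<K: Hash + Eq, V> ByteBudgetCache<K, V> {
    fn new(max_bytes: usize) -> Self {
        Self {
            inner: LruCache::unbounded(),
            current_bytes: 0,
            max_bytes,
        }
    }
}
```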
Guys, what about this? https://docs.rs/lru-mem/latest/lru_mem/
But it has only 3 stars on GitHub..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually an interesting option and would have been a perfect fit 😃
I wouldn't recommend it though, because if we find any issues in the crate later, we'd have to fork it and fix the problem ourselves, and we're not familiar with the code. Plus, since it's not actively maintained or used, there wouldn't be anyone around to help us either.
Benchmark for a683a1e
Benchmark for 92357c6
fs-cache/src/cache.rs
Outdated
// Remove oldest entries until we have space for the new value
while self.current_memory_bytes + size > self.max_memory_bytes {
    let (_, old_entry) = self
        .memory_cache
        .pop_lru()
        .expect("Cache should have entries to evict");
    debug_assert!(
        self.current_memory_bytes >= old_entry.size,
        "Memory tracking inconsistency detected"
    );
    self.current_memory_bytes = self
        .current_memory_bytes
        .saturating_sub(old_entry.size);
}
But yeah, I think we should remove this code at all costs. It's currently undermining the purpose of using the external LRU cache crate. If there's absolutely no other way around this, then we may need to implement our own LRU cache solution.
This operation should be O(1).
Tried replacing the existing logic with `resize()`. However, this operation is still not O(1), because `lru`'s `resize` has O(N) complexity.
Additionally, using `resize` requires extra calculations to determine the target size for the LRU cache. Considering this overhead, I think the existing approach is better.
I will explore the pros & cons of implementing our own LRU cache.
struct CacheEntry<V> {
    value: V,
    size: usize,
}
Why do we need to store the size of the value? Can’t we just read it from fs when needed? I don’t see it being read often.
If it’s for convenience to avoid I/O calls…
We need to track memory consumption in bytes to be precise about when to offload values, and to support large values too. We probably can't avoid keeping value sizes in memory; otherwise, when we hit the limit, we can't know how many values we need to offload.
However, that's where we could split the crate into 2 flavours:
- dynamically-sized values, e.g. byte vectors and text strings
- statically-sized values, e.g. integers

For the 2nd flavour we could utilize some standard Rust trait. Is there a way to have these 2 flavours combined nicely in a single crate?
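One possible shape, sketched as an assumption: there's no standard trait that reports a value's heap footprint, so this uses a small custom trait; fixed-size types lean on `std::mem::size_of`, while dynamically-sized ones add their heap payload:

```rust
// Hypothetical sketch: one trait covering both flavours.
trait MemSize {
    fn mem_size(&self) -> usize;
}

// Statically-sized flavour: the compiler already knows the size.
impl MemSize for u64 {
    fn mem_size(&self) -> usize {
        std::mem::size_of::<u64>()
    }
}

// Dynamically-sized flavour: stack part plus heap payload.
impl MemSize for String {
    fn mem_size(&self) -> usize {
        std::mem::size_of::<String>() + self.len()
    }
}

impl MemSize for Vec<u8> {
    fn mem_size(&self) -> usize {
        std::mem::size_of::<Vec<u8>>() + self.len()
    }
}
```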
> We need to track memory consumption in bytes to be precise about when to offload values...

That's fine, but we're actually reading the file size from the disk again in the `get_file_size` method, even though we already have it stored in the metadata. That seems a bit wasteful. Check out the next comment for more on this.

> dynamically-sized values e.g. byte vectors and text strings

That's a good point. I completely missed the dynamic types aspect when I looked at this. Now it makes a lot more sense why we need to track the data size instead of just the number of items.

> Is there a way to have these 2 flavours combined nicely in a single crate?

If we're dealing with types that have a fixed size, like `usize`, it doesn't really matter whether we count how many items there are or the total size they take up. But this completely breaks with the second flavour you mentioned.
The only solution I can think of right now is to treat all types of data as if they were the second type – basically, keep track of how much memory they use instead of how many there are. But that brings up the question of how to do this in a clean way.
Main thread: #95 (comment)
log::debug!("cache/{}: caching in memory for key {}", self.label, key);
let size = self.get_file_size(key)?;
... then we would essentially be defeating the purpose here, as we're reading the file size from disk again
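One way to avoid the second read, sketched with illustrative changes to the PR's own helper: capture the serialized size once at write time and hand it back to the caller, instead of calling `get_file_size` later:

```rust
// Hypothetical sketch: serialize once, record the byte count, and let
// the caller store it in CacheEntry, avoiding a later fs::metadata call.
fn write_value_to_disk(&mut self, key: &K, value: &V) -> Result<usize> {
    let bytes = serde_json::to_vec(value)?;
    let file_path = self.path.join(format!("{}.json", key));
    std::fs::write(&file_path, &bytes)?;
    Ok(bytes.len())
}
```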
    }
}

// Sort by size before loading
file_metadata.sort_by(|a, b| b.1.cmp(&a.1));
// Sort by modified time (most recent first)
Actually, I'm not sure that pre-loading the most recently modified values would really be beneficial.
We could implement a more sophisticated approach that gathers query statistics and records them somewhere on disk for future pre-loading. But I would do that in a separate PR, not right now.
Signed-off-by: Pushkar Mishra <[email protected]>
Signed-off-by: Pushkar Mishra <[email protected]>
Benchmark for df8396c
Benchmark for c50cdbb
Signed-off-by: Pushkar Mishra <[email protected]>
Benchmark for 252b2ff
`MemoryLimitedStorage` under the hood.