
refactor(mercury): Optimize decoding time by skipping renaming #681

Closed
wants to merge 3 commits

Conversation

el-ev (Contributor) commented Nov 12, 2024

In the save_to_file function of cache_object.rs, we previously wrote data to a temporary file and then renamed it to the final path. This process introduced additional I/O overhead due to the creation and renaming of temporary files (see the flamegraph).

flamegraph

This change modifies the function to write directly to the target file. By doing so, we eliminate the need for temporary files and the renaming operation, reducing I/O operations and improving performance.

The decoding time of the test test_pack_decode_with_large_file_with_delta_without_ref dropped from ~26s to ~21s (a roughly 20% improvement) on a MacBook Air M2.

@el-ev el-ev changed the title [mercury] refactor: Optimize decoding time by skipping renaming refactor(mercury): Optimize decoding time by skipping renaming Nov 12, 2024
Hou-Xiaoxuan (Contributor) commented:
Thank you for your contribution. In fact, the previous version was deliberately designed around a technique known as "atomic writes" to prevent partially written files from being read. (On most Unix systems, mv/rename is an atomic operation.) By first writing to a temporary file and then renaming it to the target path, we ensure that readers only ever access fully written files, avoiding incomplete reads and potential race conditions.

Additionally, the execution time of test_pack_decode_with_large_file_with_delta_without_ref is actually quite variable due to the complex concurrent calls and scheduling at higher levels. In reality, the rename operation itself doesn’t significantly impact performance.

We appreciate your insights and analysis. If you have further suggestions or alternative ideas for optimizing performance while maintaining atomicity, we’d be glad to discuss them.

el-ev (Contributor, Author) commented Nov 12, 2024

Thank you for the clarification on atomic writes. I now see that data races may occur if the file is read before it is completely written. I will close this PR and propose a new one when I have a better idea for optimization; perhaps in-memory atomic data structures or file locks could be used.


Besides, I'm currently working on a student project related to the mega project, more specifically the mercury crate. However, the instructions for contributing to this crate are not clear to me (for instance, the README file is empty and there are few to no related issues). @Hou-Xiaoxuan @genedna Could you please provide me with some guidance on how to contribute to this crate?

Thanks in advance.

Hou-Xiaoxuan (Contributor) commented:

@el-ev Mercury serves as a foundational module within the project rather than an entry point. Broadly speaking, it contains basic data structures and functionality related to Git, such as encoding and decoding for Git index files. There are currently some performance issues with the decoding function, as noted in Issue #600, for which there is a temporary but suboptimal solution. If you're interested in optimization, you could look into implementing a more efficient diff algorithm, though this is quite challenging.

Alternatively, if you’re open to contributing beyond the Mercury module, the Libra module might be a good option. Libra is a Rust-based implementation of Git, with some commands yet to be implemented. Additionally, it has more comprehensive documentation and contribution guidelines.

@el-ev el-ev closed this Nov 12, 2024
@el-ev el-ev deleted the decode_fileio branch December 18, 2024 15:56