refactor(mercury): Optimize decoding time by skipping renaming #681
…y file creation and renameing
Thank you for your contribution. The previous version was deliberately designed to use a technique known as "atomic writes" to prevent partially written files from being read. (On most Unix systems, mv/rename is an atomic operation.) By first writing to a temporary file and then renaming it to the target path, we ensure that readers only ever see fully written files, which avoids incomplete reads and potential race conditions. Additionally, the execution time of … We appreciate your insights and analysis. If you have further suggestions or alternative ideas for optimizing performance while maintaining atomicity, we'd be glad to discuss them.
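For readers unfamiliar with the pattern, here is a minimal sketch of a write-then-rename save using only the Rust standard library. It is not Mercury's actual save_to_file: the function name write_atomically and the ".tmp" suffix are hypothetical, chosen only for illustration, and the atomicity of the rename is assumed to hold as described above (same filesystem, Unix-like system).

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper (not the project's real `save_to_file`):
/// writes `data` to a temporary sibling file, then renames it onto `target`.
pub fn write_atomically(target: &Path, data: &[u8]) -> io::Result<()> {
    // Build "<target>.tmp" next to the target so the rename stays on the
    // same filesystem (assumed suffix; a real implementation may differ).
    let mut tmp_name = target.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    // Write the full contents to the temporary file first.
    let mut file = File::create(&tmp_path)?;
    file.write_all(data)?;
    // Flush to disk before the file becomes visible under its final name.
    file.sync_all()?;
    drop(file);

    // Readers of `target` see either the old file or the complete new one.
    fs::rename(&tmp_path, target)
}
```

The extra file creation and rename are exactly the I/O overhead the PR tries to remove; writing straight to the target would skip them but lets a concurrent reader observe a half-written file.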
Thank you for the clarification on atomic writes. Now I see that data races may occur if a file is read before it is completely written. I will close this PR and propose a new one when I have a better idea for optimization. I thought in-memory atomic data structures or file locks could be used. Besides, I'm currently working on a student project related to the mega project, more specifically, the … Thanks in advance.
@el-ev Mercury serves as a foundational module within the project rather than an entry point. Broadly speaking, it contains basic data structures and functionality related to Git, such as the encoding and decoding functions for the Git index file. The decoding function currently has some performance issues, as noted in Issue #600, for which there is a temporary but suboptimal solution. If you're interested in optimization, you could look into implementing a more efficient diff algorithm, though this is quite challenging. Alternatively, if you're open to contributing beyond the Mercury module, the Libra module might be a good option. Libra is a Rust-based implementation of Git, with some commands yet to be implemented. It also has more comprehensive documentation and contribution guidelines.
In the save_to_file function of cache_object.rs, we previously wrote data to a temporary file and then renamed it to the final path. This introduced additional I/O overhead from creating and renaming temporary files (see the flamegraph). This change modifies the function to write directly to the target file, which eliminates the temporary files and the rename, reducing I/O operations and improving performance.
The decoding time of the test test_pack_decode_with_large_file_with_delta_without_ref dropped from ~26s to ~21s (~20% improvement) on a MacBook Air M2.