Use zip archive format for exporting & importing #1727
Comments
@csb0730 this should also reduce memory consumption further, as there is no need to load blobs into memory at all.
I would suggest using brotli, which has a nice implementation in Rust here: https://crates.io/crates/brotli. It was developed specifically for the web, which means it compresses text-based documents like HTML considerably better than gzip (https://medium.com/oyotech/how-brotli-compression-gave-us-37-latency-improvement-14d41e50fee4). This will also become useful if we decide to store full-content HTML emails, as we can reuse the dependency for compressing those.
This is not correct: we don't load all blobs into memory, only the file currently being copied! As for the whole import/export process: nothing changes regardless of which file format you use, sqlar or zip.
Comparison of gzip vs. brotli on an existing dc backup
I don't see why zip should be faster. Maybe you can explain? Please see my comment here in former issue: And maybe a general consideration: the development around the backup issue is very welcome, and I'm very happy that this discussion has started and is evolving so quickly. But please consider all aspects thoroughly before committing to one approach.
For comparison, I also added lz4
Note that brotli is more expensive than gzip in compression, but cheaper in decompression.
Sorry, I don't understand. ZIP is an archive format and brotli is a compression format.
Can you use brotli with zip? If not, then I fail to see how it relates to the issue.
Even the final compression ratio is not the key element/feature/benefit of a backup. The technical possibilities (and needs) themselves are very important IMHO.
Oh, I thought you were talking about compression, and just forgot to add the
Note that zip is not just an archive format; it does both compression and archival.
Compression does not matter much, because most of the backup size is taken up by blobs, which are already compressed.
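To make the point about already-compressed blobs concrete, here is a minimal sketch. It assumes random bytes are a fair stand-in for a compressed blob such as a JPEG, and it uses zlib from the Python standard library rather than any of the compressors compared in this thread. The sample data and the `ratio` helper are hypothetical.

```python
import os
import zlib

# Assumption: random bytes behave like an already-compressed blob (JPEG, MP4, ...).
blob = os.urandom(100_000)
# Assumption: a repetitive HTML snippet stands in for text-like data.
text = b"<p>Hello, this is a fairly repetitive HTML mail body.</p>\n" * 2000


def ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


blob_ratio = ratio(blob)
text_ratio = ratio(text)
print(f"blob-like data: {blob_ratio:.3f}")
print(f"text-like data: {text_ratio:.3f}")
```

The blob-like input stays at roughly its original size, while the text-like input shrinks to a small fraction. Whether a real backup behaves the same depends on its actual mix of blobs.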
@dignifiedquire: I changed the title and the initial issue description to clarify that this is about a "zip archive". Sorry for the confusion. We could also consider tar+brotli, I guess, but I feel that at least on Windows and phones there will not be much support for dealing with it. People know ".zip" files and usually have an idea of how to handle them if they want to.
A last comment: maybe I'm wrong, but I'm quite sure that once all the needs and the traits of the file formats are on the table, the sqlite database will appear in a different light.
ZIP files store the central directory at the end, so you can't start the import until you have the whole file. ZIP is only streamable in the sense that you can start the export immediately; the import can only start once the download has finished.
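A small sketch of this behaviour, using Python's standard `zipfile` module (not zip-rs, so this only illustrates the file layout, not the Rust API). The member names and the `can_open` helper are hypothetical.

```python
import io
import zipfile

# Build a small archive in memory (stored, i.e. uncompressed, by default).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("dc.sqlite", b"fake database bytes")
    zf.writestr("blobs/image.jpg", b"fake blob bytes")
data = buf.getvalue()

# The end-of-central-directory record starts with the signature PK\x05\x06.
eocd_offset = data.rfind(b"PK\x05\x06")
# Without an archive comment, that record is exactly 22 bytes long.
eocd_is_last = eocd_offset == len(data) - 22


def can_open(raw: bytes) -> bool:
    """Return True if zipfile can locate the central directory in raw."""
    try:
        with zipfile.ZipFile(io.BytesIO(raw)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


complete_ok = can_open(data)
# Simulate a download that is still missing its last 30 bytes.
partial_ok = can_open(data[:-30])
print(eocd_is_last, complete_ok, partial_ok)
```

A reader that relies on the central directory, as `zipfile` does, cannot list or extract anything until the tail of the file has arrived.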
They are? Compressing them all together could yield quite a benefit compared to individual compression.
@hpk42 but users are supposed to import this file, not work with it manually, so I am not seeing the benefit of having good Windows support.
Streaming import sounds like a serious attack vector: you would start ingesting data without being able to verify that it is uncompromised.
We can also PGP-encrypt whatever format we use. That also works if the key has already been transferred via an Autocrypt Setup Message. ZIP AES encryption uses a password, which is not that useful to us.
I agree, we don't have a trusted channel if we are transferring a backup over the LAN. The simplest way is to transfer a checksum/key over a second channel (QR code) and simply transmit the backup file over an unencrypted TCP connection. If we want to stream, we need to authenticate each packet, maybe by using TLS. That is more complicated, and since we can't read a ZIP file until we receive the end of central directory record, it does not matter much.
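The checksum idea could look roughly like this. It is a sketch only: SHA-256 is an assumption (the thread does not name a hash), and `backup_digest` and `verify` are hypothetical helpers. A plain digest protects integrity only if the QR code channel itself is trusted.

```python
import hashlib
import hmac


def backup_digest(chunks) -> str:
    """Hash the backup incrementally so it never has to fit in memory."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def verify(chunks, expected_hex: str) -> bool:
    """Compare the received data's digest with the one from the QR code."""
    return hmac.compare_digest(backup_digest(chunks), expected_hex)


# Sender side: this digest would be shown in the QR code (assumption).
sent = [b"first part of backup", b"second part of backup"]
expected = backup_digest(sent)

# Receiver side: hash what arrived over TCP and compare.
intact = verify(sent, expected)
tampered = verify([b"first part of backup", b"second part of BACKUP"], expected)
print(intact, tampered)
```

Note that the verdict is only available after the last chunk has been hashed, which matches the point that a non-streaming import loses little here.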
Overall, I don't see many advantages now compared to sqlar, and there is the added dependency on zip-rs. Compatibility is also easier with sqlar, because both formats are SQLite databases: just open the file and check whether it is in sqlar format or not. If we go with zip, we also need to change how we search for the file.
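As an illustration of the "open it and check" approach, here is a minimal sketch that tells the candidate formats apart. `detect_backup_format` and the file names are hypothetical; the `sqlar` table definition follows the SQLite Archive documentation, and nothing here reflects how the core actually locates backup files.

```python
import os
import sqlite3
import tempfile
import zipfile


def detect_backup_format(path: str) -> str:
    """Classify a file as 'sqlar', 'sqlite', 'zip' or 'unknown'."""
    with open(path, "rb") as f:
        header = f.read(16)
    if header == b"SQLite format 3\x00":
        conn = sqlite3.connect(path)
        try:
            row = conn.execute(
                "SELECT 1 FROM sqlite_master WHERE type='table' AND name='sqlar'"
            ).fetchone()
        finally:
            conn.close()
        # A plain SQLite file without an sqlar table is reported as 'sqlite'.
        return "sqlar" if row else "sqlite"
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    sqlar_path = os.path.join(tmp, "backup.sqlar")
    conn = sqlite3.connect(sqlar_path)
    conn.execute(
        "CREATE TABLE sqlar(name TEXT PRIMARY KEY, mode INT, mtime INT, sz INT, data BLOB)"
    )
    conn.commit()
    conn.close()

    zip_path = os.path.join(tmp, "backup.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("dc.sqlite", b"placeholder")

    detected = (detect_backup_format(sqlar_path), detect_backup_format(zip_path))

print(detected)
```

With zip in the mix, the importer needs the extra branch; with sqlar, both the old and the new format go through the same SQLite code path.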
Tar is streamable both on export and import and has an async library by @dignifiedquire: https://github.com/dignifiedquire/async-tar
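To show what streamable in both directions means, here is a sketch with Python's standard `tarfile` module in its stream modes (`"w|"` and `"r|"`). This is not the async-tar API, and the member names are hypothetical.

```python
import io
import tarfile

# Export: "w|" writes a forward-only stream and never seeks back.
sink = io.BytesIO()
with tarfile.open(fileobj=sink, mode="w|") as tar:
    for name, payload in [("dc.sqlite", b"fake db"), ("blobs/a.jpg", b"fake blob")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Import: "r|" reads strictly forward, one member at a time, so each member
# can be processed as soon as its bytes have arrived.
imported = {}
with tarfile.open(fileobj=io.BytesIO(sink.getvalue()), mode="r|") as tar:
    for member in tar:
        extracted = tar.extractfile(member)
        imported[member.name] = extracted.read()

print(sorted(imported))
```

Each tar member carries its own header directly in front of its data, which is why no trailing directory is needed.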
Thanks all for the discussion. I moved things to a new issue, addressing various comments here.
A zip archive export would contain the sqlite db as is, plus all the blob files from the blob dir. Advantages/reasons for this approach:
When introducing zip files for export/import we will need to continue importing from the old format for a longer time.
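A rough sketch of the export described in the issue text, using Python's standard `zipfile` module. `export_backup`, the archive layout (`dc.sqlite`, `blobs/`) and the choice of `ZIP_STORED` are assumptions for illustration, not the project's actual design.

```python
import os
import tempfile
import zipfile


def export_backup(db_path: str, blob_dir: str, out_path: str) -> None:
    """Write the database file as is, plus every blob, into one zip archive."""
    # ZIP_STORED skips recompression; this assumes the blobs are mostly
    # already-compressed media, as discussed in the thread.
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_STORED) as zf:
        # ZipFile.write copies from disk in chunks instead of reading the
        # whole file into memory first.
        zf.write(db_path, arcname="dc.sqlite")
        for name in sorted(os.listdir(blob_dir)):
            zf.write(os.path.join(blob_dir, name), arcname=f"blobs/{name}")


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "dc.sqlite")
    blob_dir = os.path.join(tmp, "blobs")
    os.mkdir(blob_dir)
    with open(db_path, "wb") as f:
        f.write(b"fake database bytes")
    for name in ("a.jpg", "b.png"):
        with open(os.path.join(blob_dir, name), "wb") as f:
            f.write(b"fake blob bytes")

    out_path = os.path.join(tmp, "backup.zip")
    export_backup(db_path, blob_dir, out_path)
    with zipfile.ZipFile(out_path) as zf:
        names = zf.namelist()

print(names)
```

An importer would still need a separate path for the old format, as the issue text notes.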