Use zip archive format for exporting & importing #1727

Closed
hpk42 opened this issue Jul 16, 2020 · 22 comments
Labels
enhancement New feature or request

Comments

@hpk42
Contributor

hpk42 commented Jul 16, 2020

A zip archive export would contain the sqlite db as is, plus all the blob files from the blob dir. Advantages/reasons for this approach:

  • faster and streamable: export can start immediately, there is no need to copy blob files into the db (storage problems)
  • when implementing network-based multi-device setup (export to network & import from network), the import can directly stream the zip file into the app state, requiring only the size of the zip on the phone/device.
  • zip files allow AES encrypting the contents which is not perfect but better than nothing
  • standards: many users know how to deal with and post-process zip files
  • Rust has documented and maintained zip crates, see e.g. https://mvdnes.github.io/rust-docs/zip-rs/zip/write/struct.ZipWriter.html

When introducing zip files for export/import, we will need to keep supporting imports from the old format for a long time.
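
A minimal sketch of the export described above, written in Python purely for illustration (the project itself would use a Rust zip crate). The function name `export_backup` and the `blobs/` prefix inside the archive are hypothetical, and the sketch assumes the database is not being written to while it is copied:

```python
import zipfile
from pathlib import Path


def export_backup(db_path: Path, blob_dir: Path, zip_path: Path) -> list[str]:
    """Write the database file and every blob file into one zip archive.

    Hypothetical layout: the db keeps its file name, blobs go under "blobs/".
    Returns the archive member names in the order they were written.
    """
    # ZIP_STORED skips compression; each file is read from disk in turn,
    # so nothing has to be copied into the database first.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.write(db_path, arcname=db_path.name)
        for blob in sorted(blob_dir.iterdir()):
            if blob.is_file():
                archive.write(blob, arcname=f"blobs/{blob.name}")
        return archive.namelist()
```

The database goes in first so that an importer reading the entries in order sees it before any blob.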

@link2xt
Collaborator

link2xt commented Jul 16, 2020

@csb0730 this should also reduce memory consumption further, as there is no need to load blobs into memory at all.

@dignifiedquire
Collaborator

I would suggest using brotli, which has a nice implementation in Rust: https://crates.io/crates/brotli. It was developed specifically for the web, which means it compresses various text-based documents like HTML considerably better than gzip (https://medium.com/oyotech/how-brotli-compression-gave-us-37-latency-improvement-14d41e50fee4). This will also become useful if we decide to store full-content HTML emails, as we can reuse the dependency for compressing those.

@csb0730

csb0730 commented Jul 17, 2020

> @csb0730 this should also reduce memory consumption further, as there is no need to load blobs into memory at all.

This is not correct: we don't load all blobs into memory, only the file currently being copied!

Regarding the whole import/export process: nothing changes regardless of which file format you use, sqlar or zip.

@dignifiedquire
Collaborator

Comparison of gzip vs brotli on an existing dc backup:

  • original: 194M
  • brotli (best compression): 162M
  • gzip (best lvl9): 167M

@csb0730

csb0730 commented Jul 17, 2020

> A zip export would contain the sqlite db as is, plus all the blob files from the blob dir. Advantages/reasons for this approach:
>
> * faster and streamable: export can start immediately, there is no need to copy blob files into the db (storage problems)

I don't see why zip should be faster. Maybe you can explain?

Please see my comment in the earlier issue:
#1724 (comment)

And maybe a general consideration:

The development around the backup issue is very welcome, and I'm very happy that this discussion has started and is evolving so quickly. But please consider all aspects thoroughly before committing to one approach.

@dignifiedquire
Collaborator

For comparison, I also added lz4:

  • lz4 (--best): 167M

@dignifiedquire
Collaborator

Note brotli is more expensive than gzip in compression, but cheaper in decompression.

@hpk42
Contributor Author

hpk42 commented Jul 17, 2020 via email

@csb0730

csb0730 commented Jul 17, 2020

Even the final compression ratio is not the key element/feature/benefit of a backup. The technical possibilities (and needs) themselves are very important IMHO.

@dignifiedquire
Collaborator

> ZIP is an archive format

Oh, I thought you were talking about compression, and just forgot to add the g for gzip. Not sure why there is so much discussion about the archive format /me confused

hpk42 changed the title from "Use zip format for exporting & importing" to "Use zip archive format for exporting & importing" on Jul 17, 2020
@dignifiedquire
Collaborator

Note: zip is not just an archive format; it does both compression and archival. tar is a pure archival format. Ref: https://superuser.com/a/1257441/147739

@link2xt
Collaborator

link2xt commented Jul 17, 2020

Compression does not matter much, because most of the backup size is taken by blobs and they are already compressed.
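
A rough way to check this claim, sketched in Python with zlib only. The word list and sizes are made up for the demonstration, and deflate output stands in for an already compressed blob such as a JPEG; real attachments may behave differently:

```python
import random
import zlib


def compressed_ratio(payload: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(payload, level)) / len(payload)


# Text-like data as a stand-in for message text in the database.
rng = random.Random(0)
words = ["chat", "message", "backup", "export", "import", "blob", "sqlite"]
text = " ".join(rng.choice(words) for _ in range(20000)).encode()

# Stand-in for an already compressed blob: the deflate output of that text.
already_compressed = zlib.compress(text, 9)

print(f"text:               {compressed_ratio(text):.2f}")
print(f"already compressed: {compressed_ratio(already_compressed):.2f}")
```

A ratio close to 1 for the second line means a second compression pass buys almost nothing for that kind of data.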

@hpk42
Contributor Author

hpk42 commented Jul 17, 2020

@dignifiedquire -- i changed the title and the initial issue description to clarify this is about "zip archive". Sorry for the confusion. We could also consider tar+brotli i guess, but i feel at least on windows and phones there will not be much support for dealing with it -- people know ".zip" files and usually have an idea how to handle them if they want to.

@csb0730

csb0730 commented Jul 17, 2020

A last comment: Maybe I'm wrong, but I'm quite sure that once all the needs and traits of the file formats are on the table, the sqlite database will appear in a different light.

@link2xt
Collaborator

link2xt commented Jul 17, 2020

> when implementing network-based multi-device setup (export to network & import from network), the import can directly stream the zip file into the app state, requiring only the size of the zip on the phone/device.

ZIP files store their directory at the end, so you can't start the import until you have the whole file. ZIP is only streamable in the sense that you can start the export immediately; the import can only start once the download has finished.
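
A small Python illustration of that point, using the standard `zipfile` module as one example of a reader that locates entries through the directory at the end of the file. The helper names and file contents are made up, and this shows the behaviour of one reader rather than of every ZIP library:

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # marks the end-of-central-directory record


def build_zip(files: dict[str, bytes]) -> bytes:
    """Build a small in-memory zip archive from name -> content pairs."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def can_open(data: bytes) -> bool:
    """Report whether zipfile can list the archive from these bytes."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            archive.namelist()
        return True
    except zipfile.BadZipFile:
        return False


archive_bytes = build_zip({"dc.db": b"x" * 1000, "blobs/a.jpg": b"y" * 1000})
# The directory record is found in the final bytes of the archive.
eocd_offset = archive_bytes.rfind(EOCD_SIGNATURE)
# Simulate a download that has only delivered the first half so far.
partial_download = archive_bytes[: len(archive_bytes) // 2]

print("complete archive opens:", can_open(archive_bytes))
print("partial download opens:", can_open(partial_download))
```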

@dignifiedquire
Collaborator

> Compression does not matter much, because most of the backup size is taken by blobs and they are already compressed.

They are? Compressing them all together could yield quite a benefit compared with individual compression.

@hpk42 but users are supposed to import this file, not work with it manually, so I don't see the benefit of having good Windows support.

@dignifiedquire
Collaborator

Streaming import sounds like a serious attack vector, you would start ingesting data without being able to verify that it is uncompromised.

@link2xt
Collaborator

link2xt commented Jul 17, 2020

> zip files allow AES encrypting the contents which is not perfect but better than nothing

We can also PGP-encrypt whatever format we use; this also works if the key has already been transferred via an Autocrypt Setup Message. ZIP AES encryption uses a password, which is not that useful to us.

@link2xt
Collaborator

link2xt commented Jul 17, 2020

> Streaming import sounds like a serious attack vector, you would start ingesting data without being able to verify that it is uncompromised.

I agree, we don't have a trusted channel if we are transferring a backup over the LAN. The simplest way is to transfer a checksum/key over a second channel (QR code) and simply transmit the backup file over an unencrypted TCP connection. If we want to stream, we need to authenticate each packet, perhaps by using TLS. That is more complicated, and since we can't read a ZIP file until we receive the end of central directory, it does not matter much.
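
The checksum variant could look roughly like this in Python. SHA-256 and the function names are assumptions made for the sketch; nothing here is decided. The sender would put the digest into the QR code, and the receiver would compare it after the download has finished:

```python
import hashlib
import hmac
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Hash the file in chunks so it never has to fit into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_authentic(path: Path, expected_hex: str) -> bool:
    """Compare the received file against the digest from the second channel."""
    # compare_digest avoids leaking the position of the first mismatch.
    return hmac.compare_digest(file_sha256(path), expected_hex.lower())
```

This only verifies the file once it has arrived completely, which matches the non-streaming import discussed here.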

@link2xt
Collaborator

link2xt commented Jul 17, 2020

Overall, I don't see many advantages now compared to sqlar. And there is an added dependency on zip-rs.

Also, compatibility is easier with sqlar, because both formats are SQLite databases. Just open it and then check whether it is in sqlar format or not. If we go with zip, we also need to change how we search for the file.
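
The "open it and check" step might be sketched like this in Python. It relies on the sqlar convention that an archive is an SQLite database containing a table named `sqlar`; the function name `is_sqlar` is hypothetical:

```python
import sqlite3
from pathlib import Path


def is_sqlar(path: Path) -> bool:
    """Return True if the SQLite file contains a table named "sqlar"."""
    # Open read-only through a file: URI so that probing never creates
    # or modifies the file.
    conn = sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'sqlar'"
        ).fetchone()
        return row is not None
    finally:
        conn.close()
```

An importer could then branch on the result and fall back to the old import path for databases without that table.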

link2xt changed the title from "Use zip archive format for exporting & importing" to "Use tar archive format for exporting & importing" on Jul 17, 2020
@link2xt
Collaborator

link2xt commented Jul 17, 2020

Tar is streamable both on export and import and has an async library by @dignifiedquire: https://github.com/dignifiedquire/async-tar
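
To illustrate streaming in both directions, here is a Python sketch using the standard `tarfile` module in its pipe modes (`w|` and `r|`), which never seek. It is not the async-tar API; the `ForwardOnly` class and the function names are made up and stand in for a socket-like source:

```python
import io
import tarfile


def write_tar_stream(out, files: dict[str, bytes]) -> None:
    """Write entries one after another to a write-only stream."""
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


class ForwardOnly:
    """Expose only read(), like data arriving over a network connection."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)


def read_tar_stream(src) -> dict[str, bytes]:
    """Consume entries in arrival order without seeking backwards."""
    result = {}
    with tarfile.open(fileobj=src, mode="r|") as tar:
        for member in tar:
            extracted = tar.extractfile(member)
            if extracted is not None:
                result[member.name] = extracted.read()
    return result
```

Because each entry is preceded by its own header, the reader can act on a file as soon as its bytes have arrived.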

link2xt changed the title from "Use tar archive format for exporting & importing" to "Use zip archive format for exporting & importing" on Jul 17, 2020
@hpk42
Contributor Author

hpk42 commented Jul 17, 2020

thanks all for the discussion, i moved things to a new issue, addressing various comments here.
#1729

hpk42 closed this as completed on Jul 17, 2020
4 participants