Commit 65a9c4b
File deduplication (#6332)
When receiving messages, blobs will be deduplicated with the new
function `create_and_deduplicate_from_bytes()`. For sending files, this
adds a new function `set_file_and_deduplicate()` instead of
deduplicating by default.
This is for
#6265; read the
issue description there for more details.
TODO:
- [x] Set files as read-only
- [x] Don't do a write when the file is already identical
- [x] The first 32 chars or so of the 64-character hash are enough. I
calculated that if 10b people (i.e. all of humanity) use DC, and each of
them has 200k distinct blob files (I have 4k in my day-to-day account),
and we used 20 chars, then the expected value for the number of name
collisions would be ~0.0002 (and the probability that there is a least
one name collision is lower than that) [^1]. I added 12 more characters
to be on the super safe side, but this wouldn't be necessary and I could
also make it 20 instead of 32.
- Not 100% sure whether that's necessary at all - it would mainly be
necessary if we might hit a length limit on some file systems (the
blobdir is usually sth like
`accounts/2ff9fc096d2f46b6832b24a1ed99c0d6/dc.db-blobs` (53 chars), plus
64 chars for the filename would be 117).
- [x] "touch" the files to prevent them from being deleted
- [x] TODOs in the code
For later PRs:
- Replace `BlobObject::create(…)` with
`BlobObject::create_and_deduplicate(…)` in order to deduplicate
everytime core creates a file
- Modify JsonRPC to deduplicate blob files
- Possibly rename BlobObject.name to BlobObject.file in order to prevent
confusion (because `name` usually means "user-visible-name", not "name
of the file on disk").
[^1]: Calculated with both https://printfn.github.io/fend/ and
https://www.geogebra.org/calculator, both of which came to the same
result
([1](https://github.com/user-attachments/assets/bbb62550-3781-48b5-88b1-ba0e29c28c0d),
[2](https://github.com/user-attachments/assets/82171212-b797-4117-a39f-0e132eac7252))
---------
Co-authored-by: l <[email protected]>1 parent 22a7cfe commit 65a9c4b
File tree
23 files changed
+582
-239
lines changed- deltachat-ffi
- src
- python
- src/deltachat
- tests
- src
- chat
- imex
- mimeparser
- receive_imf
- webxdc
23 files changed
+582
-239
lines changedSome generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
110 | 110 | | |
111 | 111 | | |
112 | 112 | | |
| 113 | + | |
113 | 114 | | |
114 | 115 | | |
115 | 116 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
4756 | 4756 | | |
4757 | 4757 | | |
4758 | 4758 | | |
| 4759 | + | |
| 4760 | + | |
| 4761 | + | |
| 4762 | + | |
| 4763 | + | |
| 4764 | + | |
| 4765 | + | |
| 4766 | + | |
| 4767 | + | |
| 4768 | + | |
| 4769 | + | |
| 4770 | + | |
| 4771 | + | |
| 4772 | + | |
| 4773 | + | |
| 4774 | + | |
| 4775 | + | |
| 4776 | + | |
| 4777 | + | |
| 4778 | + | |
| 4779 | + | |
| 4780 | + | |
| 4781 | + | |
| 4782 | + | |
| 4783 | + | |
4759 | 4784 | | |
4760 | 4785 | | |
4761 | 4786 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3835 | 3835 | | |
3836 | 3836 | | |
3837 | 3837 | | |
| 3838 | + | |
| 3839 | + | |
| 3840 | + | |
| 3841 | + | |
| 3842 | + | |
| 3843 | + | |
| 3844 | + | |
| 3845 | + | |
| 3846 | + | |
| 3847 | + | |
| 3848 | + | |
| 3849 | + | |
| 3850 | + | |
| 3851 | + | |
| 3852 | + | |
| 3853 | + | |
| 3854 | + | |
| 3855 | + | |
| 3856 | + | |
| 3857 | + | |
| 3858 | + | |
| 3859 | + | |
| 3860 | + | |
| 3861 | + | |
| 3862 | + | |
| 3863 | + | |
| 3864 | + | |
3838 | 3865 | | |
3839 | 3866 | | |
3840 | 3867 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
108 | 108 | | |
109 | 109 | | |
110 | 110 | | |
111 | | - | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
112 | 114 | | |
113 | 115 | | |
114 | 116 | | |
| |||
120 | 122 | | |
121 | 123 | | |
122 | 124 | | |
123 | | - | |
| 125 | + | |
| 126 | + | |
124 | 127 | | |
125 | 128 | | |
126 | 129 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
181 | 181 | | |
182 | 182 | | |
183 | 183 | | |
184 | | - | |
185 | | - | |
| 184 | + | |
| 185 | + | |
186 | 186 | | |
187 | 187 | | |
188 | 188 | | |
189 | 189 | | |
190 | | - | |
191 | | - | |
192 | | - | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
193 | 194 | | |
194 | 195 | | |
195 | 196 | | |
| |||
214 | 215 | | |
215 | 216 | | |
216 | 217 | | |
217 | | - | |
218 | | - | |
| 218 | + | |
| 219 | + | |
219 | 220 | | |
220 | 221 | | |
221 | 222 | | |
| |||
0 commit comments