Faster BoolReader #124
That said, lossless WebP is already plenty fast specifically due to optimizations to transforms. We actually beat |
Regarding bit reading: libwebp has a dedicated codepath for reading with probability 128 that is distinct from the general-purpose one. Is that something that you've explored? If you haven't attempted it, it doesn't have to be a part of this PR. I just wanted to know if this has been attempted or not. I would expect this not to matter if the hot variant of |
Huh are you sure? I only mentioned it because Not denying that it's already plenty fast, just that I'm certain it showed up in my call graphs inside |
Yes that's the But indeed, that's something that can be revisited in a separate PR. |
It might be worth renaming "bool reader" to "arithmetic decoder" or something to that effect, because it is doing boolean arithmetic coding rather than simply reading bits.
FWIW there is no change on end-to-end benchmarks for the large image on my machine from the `FastReader::read_flag` optimization. It's possible that it helps other machines, just not mine.
I can confirm this didn't break anything 🎉 No behavioral changes before and after on my corpus of 7,500 images scraped from the web.
- Move `BoolReader` to its own file.
- Speed up `read_bool` and `read_with_tree` by assuming none of them reach the end of the buffer and returning a transparent `BitResult`, then validating at the end.
- Speed up `read_bool` and `read_with_tree` by assuming each bit can be read from the 4-byte chunks (in `FastReader`), and retrying with the slow approach if this fails.

Final performance results are a 1.3x speedup compared to image-rs 0.2.0 (`--use-reference`), although it is still 1.3x slower than libwebp:

(I ran `dwebp` as the first and the last candidate to negate any effects from my poor laptop's CPU overheating.)

This uses `as_flattened_mut()`, which was stabilized in Rust 1.80.0, so merging this probably requires raising the MSRV. I don't know your policy on that, but the alternative was adding `unsafe` or adding another dependency (that itself uses `unsafe`), so I left it as is.
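To illustrate the two ideas in the list above (deferred end-of-buffer validation, and a 4-byte fast path with a slow fallback), here is a minimal sketch on a plain MSB-first bit reader. It is not the code in this PR and does no arithmetic decoding: `BitReader`, `Unchecked`, `UnexpectedEof`, and `read_bits` are hypothetical names, with `Unchecked` standing in for the role `BitResult` plays.

```rust
/// Hypothetical error type for running past the end of the input.
#[derive(Debug, PartialEq)]
struct UnexpectedEof;

/// Hypothetical stand-in for a `BitResult`-like wrapper: the value may be
/// garbage if the reader overran, so it must be validated later.
#[must_use]
struct Unchecked<T>(T);

impl<T> Unchecked<T> {
    /// Unwrap without checking; the caller promises to call `check` later.
    fn assume_ok(self) -> T {
        self.0
    }
}

struct BitReader<'a> {
    data: &'a [u8],
    pos: usize,    // index of the next unread byte
    window: u64,   // buffered bits; the low `bits` bits are still unread
    bits: u32,     // number of unread bits in `window`
    overrun: bool, // sticky flag: a read needed bytes past the end
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        BitReader { data, pos: 0, window: 0, bits: 0, overrun: false }
    }

    /// Fast path: refill with a whole 4-byte chunk, or report failure.
    fn refill_fast(&mut self) -> bool {
        match self.data.get(self.pos..self.pos + 4) {
            Some(c) => {
                let word = u32::from_be_bytes([c[0], c[1], c[2], c[3]]);
                self.window = (self.window << 32) | u64::from(word);
                self.bits += 32;
                self.pos += 4;
                true
            }
            None => false,
        }
    }

    /// Slow path: one byte at a time. Past the end it feeds zeros and
    /// records the overrun instead of returning an error.
    fn refill_slow(&mut self) {
        let byte = match self.data.get(self.pos) {
            Some(&b) => {
                self.pos += 1;
                b
            }
            None => {
                self.overrun = true;
                0
            }
        };
        self.window = (self.window << 8) | u64::from(byte);
        self.bits += 8;
    }

    /// Read one bit. Never fails; validity is decided by `check`.
    fn read_bit(&mut self) -> Unchecked<bool> {
        if self.bits == 0 && !self.refill_fast() {
            self.refill_slow();
        }
        self.bits -= 1;
        Unchecked((self.window >> self.bits) & 1 == 1)
    }

    /// Validate everything read so far in a single branch.
    fn check<T>(&self, value: T) -> Result<T, UnexpectedEof> {
        if self.overrun { Err(UnexpectedEof) } else { Ok(value) }
    }
}

/// Read `n` bits (n <= 32) as an integer, validating once at the end.
fn read_bits(reader: &mut BitReader, n: u32) -> Result<u32, UnexpectedEof> {
    let mut value = 0u32;
    for _ in 0..n {
        value = (value << 1) | u32::from(reader.read_bit().assume_ok());
    }
    reader.check(value)
}
```

The point of the pattern is that the per-bit loop carries no error branch: an overrun only sets a flag, and the caller pays for one check per multi-bit read instead of one per bit.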
PS:

- `read_literal` has some obvious optimizations but it doesn't seem part of the latency critical path.
- `read_flag`'s `1 + (((range - 1) * 128) >> 8)` but it seems hard to measure.
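For reference, with the probability fixed at 128 the split expression quoted for `read_flag` reduces algebraically to a single shift, since multiplying by 128 and then shifting right by 8 is the same as shifting right by 1. The sketch below only checks that identity. The function names are hypothetical, and whether this is the optimization meant above is an assumption.

```rust
/// Split computation for an arbitrary 8-bit probability
/// (hypothetical helper mirroring the quoted expression).
fn split_general(range: u32, probability: u32) -> u32 {
    1 + (((range - 1) * probability) >> 8)
}

/// Same computation specialised for probability 128:
/// ((range - 1) * 128) >> 8 == (range - 1) >> 1 for unsigned integers.
fn split_half(range: u32) -> u32 {
    1 + ((range - 1) >> 1)
}
```

Comparing `split_general(range, 128)` with `split_half(range)` for every `range` in `1..=255` confirms the two agree. A compiler may well perform this simplification on its own once the constant is visible, which would be consistent with the effect being hard to measure.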