perf: optimize image processing with larger pixel chunks #27

Merged
doprz merged 3 commits into doprz:main from CordlessCoder:main
Apr 25, 2025

CordlessCoder (Contributor) commented Apr 22, 2025

I'm seeing a ~2x speedup thanks to this :)

CordlessCoder (Contributor, Author) commented Apr 22, 2025

Bumped dependency versions and updated image features to work with the new defaults.

Didn't find any performance regressions when running with .jpeg inputs.

doprz self-requested a review April 22, 2025 23:30
doprz added the enhancement label Apr 23, 2025
doprz (Owner) commented Apr 23, 2025

Thank you for another PR, @CordlessCoder. It is very much appreciated!

doprz added the dependencies and optimization labels and removed the enhancement label Apr 23, 2025
CordlessCoder (Contributor, Author) commented

The PR is ready to merge btw.

doprz changed the title from "Optimize multithreading by chunking pixels" to "perf: optimize image processing with larger pixel chunks" Apr 25, 2025
doprz (Owner) left a comment

LGTM!

Thank you once again for another PR, @CordlessCoder!

Here are some of my thoughts:

  • Increasing the chunk size from 4 to 4096 should significantly improve performance by:
    • Enhancing cache locality
    • Reducing threading overhead per pixel
    • Allowing better SIMD optimization opportunities (in the future)
  • How does this handle images whose dimensions aren't evenly divisible by the chunk size? The chunks_exact_mut will process complete chunks, but we might need to handle remaining pixels.
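
To put rough numbers on the threading-overhead point, here is a minimal sketch. It assumes the chunk size counts elements of the buffer being split, and it uses a hypothetical 1920x1080 buffer with one element per pixel; neither detail is confirmed by this PR. `chunk_count` is an illustrative helper, not part of the project.

```rust
/// Number of chunks produced when a buffer of `len` elements is split into
/// chunks of at most `chunk_size` elements (the last chunk may be shorter).
/// Hypothetical helper, for illustration only.
fn chunk_count(len: usize, chunk_size: usize) -> usize {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    // Ceiling division: a partial trailing chunk still counts as one chunk.
    (len + chunk_size - 1) / chunk_size
}

fn main() {
    // Assumed example size: one element per pixel of a 1920x1080 image.
    let len = 1920 * 1080;
    for chunk_size in [4, 4096] {
        println!(
            "chunk size {:>4}: {} work items",
            chunk_size,
            chunk_count(len, chunk_size)
        );
    }
}
```

Under these assumptions the number of work items handed to the scheduler drops from 518,400 to 507, which is where the reduced per-pixel overhead would come from. The actual speedup depends on the per-chunk work and is not something this sketch measures.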

doprz merged commit a046ef2 into doprz:main Apr 25, 2025
CordlessCoder (Contributor, Author) commented

> LGTM!
>
> Thank you once again for another PR, @CordlessCoder!
>
> Here are some of my thoughts:
>
> * Increasing the chunk size from 4 to 4096 should significantly improve performance by:
>   * Enhancing cache locality
>   * Reducing threading overhead per pixel
>   * Allowing better SIMD optimization opportunities (in the future)
> * How does this handle images whose dimensions aren't evenly divisible by the chunk size? The `chunks_exact_mut` will process complete chunks, but we might need to handle remaining pixels.

There are no remaining pixels: it uses `.par_chunks_mut`, not `.par_chunks_exact_mut`, so the last chunk may simply be shorter than the rest.
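
The distinction can be illustrated with the standard library's sequential `chunks_mut` and `chunks_exact_mut`, which split a slice the same way as their rayon `par_` counterparts. This is a minimal sketch on a toy 10-element buffer, not the project's code; `fill_with_chunks` and `fill_with_exact_chunks` are hypothetical helper names.

```rust
/// Sets every element reachable through `chunks_mut` to 1 and returns the
/// length of each chunk. The final chunk is shorter when `buf.len()` is not
/// a multiple of `size`. Hypothetical helper, for illustration only.
fn fill_with_chunks(buf: &mut [u8], size: usize) -> Vec<usize> {
    let mut lens = Vec::new();
    for chunk in buf.chunks_mut(size) {
        lens.push(chunk.len());
        chunk.fill(1);
    }
    lens
}

/// Same, but through `chunks_exact_mut`. Returns the chunk lengths and the
/// number of trailing elements that were left out of every chunk.
fn fill_with_exact_chunks(buf: &mut [u8], size: usize) -> (Vec<usize>, usize) {
    let mut lens = Vec::new();
    let mut iter = buf.chunks_exact_mut(size);
    for chunk in iter.by_ref() {
        lens.push(chunk.len());
        chunk.fill(1);
    }
    // Elements that did not fit into a full chunk are only available here.
    let remainder = iter.into_remainder().len();
    (lens, remainder)
}

fn main() {
    let mut a = [0u8; 10];
    let lens = fill_with_chunks(&mut a, 4);
    println!("chunks_mut:       lens {:?}, buffer {:?}", lens, a);

    let mut b = [0u8; 10];
    let (lens, rem) = fill_with_exact_chunks(&mut b, 4);
    println!("chunks_exact_mut: lens {:?}, {} left over, buffer {:?}", lens, rem, b);
}
```

With `chunks_mut` the trailing two elements arrive as a shorter final chunk and are processed like any other; with `chunks_exact_mut` they are skipped by the loop and would have to be handled separately through the remainder.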

doprz (Owner) commented Apr 25, 2025

Thank you for clarifying that; I will be updating the docs and the nix flake, and possibly finishing #25, for the v1.1.0 release soon.
