This is a comparison of approaches to performing the transformation `Raster p r c PixelRGBA8 -> Image PixelRGBA8`. The built-in `encodePalettedPng` function provides a similar transformation of `Raster p r c Word8 -> Image Pixel8`, and is quite fast, but it wouldn't let us use the alpha channel, so we can't consider it here.
The manual indexing approach uses the `generateImage` function from JuicyPixels, indexing through every element of the `Raster`. This is not parallelizable.
```haskell
-- | Manual indexing method (no memory sharing).
indexing :: Raster p r c PixelRGBA8 -> Image PixelRGBA8
indexing (Raster a) = generateImage f w h
  where (Z :. h :. w) = R.extent a
        -- generateImage supplies the column (x) first, then the row (y).
        f c r = R.unsafeIndex a (Z :. r :. c)
```
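For context, this composes directly with JuicyPixels' writers. The sketch below assumes a hypothetical helper name (`savePng`) and uses `writePng` from `Codec.Picture`:

```haskell
import Codec.Picture (writePng)

-- Hypothetical helper (not part of the library): convert a Raster with the
-- manual-indexing approach and write the result out as a PNG.
savePng :: FilePath -> Raster p r c PixelRGBA8 -> IO ()
savePng path = writePng path . indexing
```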
The shared memory approach borrows some code from JuicyPixels-repa to construct an `Image` from a fully computed `Array`. The `Array` is given the `F` type hint ("Foreign"), so we just need to pass a pointer to it in order to build the `Image` (since the internal data in `Image` is a `Data.Vector.Storable`).
```haskell
-- | Memory sharing approach.
shared :: Raster p r c PixelRGBA8 -> Image PixelRGBA8
shared (Raster a) = Image w h $ S.unsafeFromForeignPtr0 (R.toForeignPtr arr) (h * w * z)
  where -- z is the channel count (4 for RGBA). The Foreign array's pointer
        -- directly backs the Image's Storable Vector, so no copy is made.
        (Z :. h :. w :. z) = R.extent arr
        arr = runIdentity . R.computeP $ toRGBA a
```
This uses `computeP` as well, assuming that all input Rasters will be large enough to make the parallelism worth it. I've benchmarked elsewhere that `computeP` pays off even for 256x256 rasters.
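If that assumption ever needed relaxing, the choice could be made per raster. Here is a sketch (the `computeSmart` name and the cutoff are assumptions, not library code) of a size-based switch between `computeS` and `computeP` on a plain Repa array:

```haskell
import Data.Array.Repa (Array, D, DIM2, U, Z(..), (:.)(..))
import qualified Data.Array.Repa as R
import Data.Functor.Identity (runIdentity)

-- Hypothetical: evaluate small delayed arrays sequentially and larger ones
-- in parallel. The benchmarks mentioned above suggest computeP already wins
-- at 256x256, so a cutoff would only matter for rasters smaller than that.
computeSmart :: Array D DIM2 Double -> Array U DIM2 Double
computeSmart arr
  | h * w < 256 * 256 = R.computeS arr
  | otherwise         = runIdentity (R.computeP arr)
  where (Z :. h :. w) = R.extent arr
```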
All times are in milliseconds. The first two extra ops were simple local additions, and the third op is a focal addition, which seems to have much more overhead.
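As a rough idea of what a local op involves, here is a sketch (an assumption, not the library's actual operation) of cell-wise addition of two 2D Repa arrays:

```haskell
import Data.Array.Repa (Array, D, DIM2)
import qualified Data.Array.Repa as R

-- Hypothetical local op: add two arrays cell-by-cell. The result stays
-- delayed (D), so it fuses with whatever computeS/computeP call follows.
localAdd :: (Num e, R.Source r1 e, R.Source r2 e)
         => Array r1 DIM2 e -> Array r2 DIM2 e -> Array D DIM2 e
localAdd = R.zipWith (+)
```

A focal op, by contrast, reads a neighbourhood of cells for every output cell, which likely accounts for the much larger times in the last column.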
Manual Indexing
Cores | Classify | 1 op | 2 ops | 3 ops |
---|---|---|---|---|
1 | 8.450 | 9.790 | 13.79 | 269.4 |
2 | 9.16 | 10.47 | 14.85 | 291.7 |
4 | 9.58 | 11.08 | 15.21 | 309.8 |
8 | 11.17 | 13.12 | 17.7 | 407.7 |
Shared Memory
Cores | Classify | 1 op | 2 ops | 3 ops |
---|---|---|---|---|
1 | 31.57 | 36.53 | 48.13 | 1064 |
2 | 17.12 | 19.69 | 25.29 | 539.3 |
4 | 14.08 | 10.41 | 22.16 | 350.5 |
8 | 11.12 | 12.59 | 15.56 | 219.1 |
Since the shared memory approach uses `computeP`, its performance improves as more cores are added. This is the environment we'd be using the library in anyway (say, 16 or 32 cores), so the shared memory approach should be used here.
All benchmarks were run with `+RTS -N4`.
Trial | generateImage (μs) | 256x256 (ms) | 1024x1024 (ms) |
---|---|---|---|
LLVM - traverse | 42 | 8.25 | 143.2 |
LLVM - unsafeTraverse | 44 | 9.8 | 168 |
Native - traverse | 110 | 10.77 | 175.6 |
Native - unsafeTraverse | 112 | 12.81 | 202.7 |
Takeaways:
- LLVM is good.
- `traverse` is mysteriously faster, at least for my `ForeignPtr` approach to image conversion (sketched below). Is there a way that would use `U`? Repa claims that `U` is best for numerical operations.
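For reference, here is a rough sketch of a `traverse`-based RGBA expansion like the one being measured. The `toRGBA'` name, the channel ordering, and the shape-expansion details are assumptions; the library's actual `toRGBA` may differ:

```haskell
import Data.Word (Word8)
import Codec.Picture (PixelRGBA8(..))
import Data.Array.Repa (Array, D, DIM2, DIM3, Z(..), (:.)(..))
import qualified Data.Array.Repa as R

-- Hypothetical traverse-based expansion of a 2D pixel array into the 3D
-- Word8 array that `shared` hands to the ForeignPtr machinery. Each output
-- index (row, column, channel) looks up one source pixel and picks a channel.
toRGBA' :: R.Source r PixelRGBA8 => Array r DIM2 PixelRGBA8 -> Array D DIM3 Word8
toRGBA' a = R.traverse a (\(Z :. h :. w) -> Z :. h :. w :. 4) f
  where
    f look (Z :. r :. c :. chan) =
      let PixelRGBA8 red grn blu alp = look (Z :. r :. c)
      in  case chan of
            0 -> red
            1 -> grn
            2 -> blu
            _ -> alp
```

Swapping `R.traverse` for `R.unsafeTraverse` in a definition like this would be the analogue of the unsafeTraverse rows in the table above.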