Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add finalizer to ZStream and make TranscodingStreams.finalize a noop #97

Merged
merged 1 commit into from
Feb 11, 2025

Conversation

nhz2
Copy link
Member

@nhz2 nhz2 commented Feb 3, 2025

Fixes #48 and fixes #88 by porting JuliaIO/CodecBzip2.jl#43 to this package.

According to https://www.zlib.net/manual.html#Basic

If zlib is used in a multi-threaded application, zalloc and zfree must be thread safe. In that case, zlib is thread-safe.

So, there shouldn't be any thread safety issues with the finalizer here.

Copy link

codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 96.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 84.84%. Comparing base (f29c86f) to head (f50e6f9).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/libz.jl 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #97      +/-   ##
==========================================
+ Coverage   83.51%   84.84%   +1.33%     
==========================================
  Files           4        4              
  Lines         188      198      +10     
==========================================
+ Hits          157      168      +11     
+ Misses         31       30       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

if f
C_NULL
else
ccall(:jl_malloc, Ptr{Cvoid}, (Csize_t,), s%Csize_t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we use ccall here rather than Libc.malloc ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to make sure the memory being used is visible to the GC https://discourse.julialang.org/t/c-struct-garbage-collection-not-run-frequently-enough/116919/14

If I replace this with Libc.malloc Julia uses more memory in the memory leak test, because GC isn't run frequently enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some reference within the C code suggesting that calloc may be warranted here.

https://github.com/madler/zlib/blob/develop/deflate.c#L397

The comments here suggest that malloc may be an appropriate optimization on modern systems.

madler/zlib#517

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks like calloc is only needed if pointers are 16 bits.

@mkitti
Copy link
Member

mkitti commented Feb 6, 2025

Per my Slack comment, consider adding and encouraging a do syntax to explicitly finalize the compressor rather than relying on garbage collection to work. GC will only invoke the finalizers under memory pressure.

@nhz2
Copy link
Member Author

nhz2 commented Feb 6, 2025

Unlike with files, or GPU arrays, the only resource being used here is CPU memory, so unless I am missing something it is fine if the freeing is delayed until there is memory pressure.

The issue with the do syntax is that it may be hard to prove that the compressor doesn't escape the block, and there are some use cases where the compressor needs to escape the block. For example, the memory could be reused by another compressor with the same options: Arrow.jl has some pools to do this https://github.com/apache/arrow-julia/blob/main/src/Arrow.jl#L88-L92. Also, it would also be interesting to try and use https://github.com/MasonProtter/Bumper.jl if there are multiple compressors that all have the same lifetime, so each compressor doesn't need to be individually freed.

@mkitti
Copy link
Member

mkitti commented Feb 7, 2025

There is a long thread about this by Jeff here:
JuliaLang/julia#11207

The point of the do syntax is not to prevent leakage. The point is to syntactically pair construction and destruction.

I think the use of the finalizer here is fine as a last resort measure, but we should not depend on it. Memory will appear to leak until something actually activates the GC, which may be never.

@@ -99,6 +118,11 @@ function inflate_end!(zstream::ZStream)
return ccall((:inflateEnd, libz), Cint, (Ref{ZStream},), zstream)
end

function decompress_finalizer!(zstream::ZStream)
inflate_end!(zstream)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this gets called twice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing.

InflateEnd starts by checking if zstream.state is null and returns Z_STREAM_ERROR if it is https://github.com/madler/zlib/blob/8a844d434f0eef87d972ae6406b00968f7c52944/inflate.c#L1268-L1269

InflateEnd also ends by setting zstream.state to null. https://github.com/madler/zlib/blob/8a844d434f0eef87d972ae6406b00968f7c52944/inflate.c#L1273

The same happens in DeflateEnd https://github.com/madler/zlib/blob/8a844d434f0eef87d972ae6406b00968f7c52944/deflate.c#L1266-L1283

Also, since the ZStream constructor initializes .state to C_NULL, the finalizer can run on uninitialized streams without doing anything.

ccall(:jl_malloc, Ptr{Cvoid}, (Csize_t,), s%Csize_t)
end
end
zfree(::Ptr{Cvoid}, p::Ptr{Cvoid}) = ccall(:jl_free, Cvoid, (Ptr{Cvoid},), p)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since zfree is just a pass through to jl_free perhaps you should just provide the function pointer directly:

cglobal(:jl_free).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have different signatures. zfree takes a pointer to an allocator context and a pointer to free, while jl_free just takes a pointer to free.

@nhz2
Copy link
Member Author

nhz2 commented Feb 7, 2025

IIUC, the problem in JuliaLang/julia#11207 that is relevant here is that finalizers can slow down when there are many different objects with finalizers.

I can try and benchmark with ZipArchives.jl where I create new compressors for each new file that needs to be compressed.

When this PR is merged, I can safely reuse the compressor if the new file has the same compression level as the previous file without worrying about memory leaks or use after free, so I will benchmark that as well.

GC never running sounds like a good thing, no need to waste cycles freeing memory if a process is short-lived and never uses much memory.

@nhz2
Copy link
Member Author

nhz2 commented Feb 9, 2025

I've done the following as a benchmark:

using Chairmarks
using ZipArchives

function bench(N)
    sink = IOBuffer()
    ZipWriter(sink) do w
        for i in 1:N
            zip_newfile(w, "$i"; compress=true)
            write(w, zeros(UInt8, 10000))
        end
    end
end
bench(400000)
display(@be bench(400000) seconds=60)

With currently released versions of CodecZlib and ZipArchives I get:

min    6.277 s (10000149 allocs: 16.795 GiB, 9.50% gc time)
median 6.415 s (10000149 allocs: 16.795 GiB, 10.43% gc time)
mean   6.454 s (10000149 allocs: 16.795 GiB, 10.49% gc time)
max    6.724 s (10000149 allocs: 16.795 GiB, 12.38% gc time)

With the currently released version of ZipArchives and this PR I get:

 min    7.348 s (12000149 allocs: 116.698 GiB, 14.83% gc time)
 median 7.436 s (12000149 allocs: 116.698 GiB, 16.09% gc time)
 mean   7.441 s (12000149 allocs: 116.698 GiB, 16.23% gc time)
 max    7.560 s (12000149 allocs: 116.698 GiB, 18.05% gc time)

With JuliaIO/ZipArchives.jl#86 and this PR I get:

 min    6.008 s (9600156 allocs: 16.747 GiB, 9.75% gc time)
 median 6.065 s (9600156 allocs: 16.747 GiB, 10.60% gc time)
 mean   6.090 s (9600156 allocs: 16.747 GiB, 10.80% gc time)
 max    6.222 s (9600156 allocs: 16.747 GiB, 12.94% gc time)

So this PR hurts performance by about 13% in this test, without the changes to ZipArchives.

Looking at a similar benchmark in ZipStreams:

using ZipStreams

function bench(N)
    sink = IOBuffer()
    zipsink(sink) do w
        for i in 1:N
            open(w, "$i") do f
                write(f, zeros(UInt8, 10000))
            end
        end
    end
end

Without this PR I run out of memory due to the memory leak, with this PR I get:

 min    9.401 s (61600112 allocs: 118.280 GiB, 12.57% gc time)
 median 9.555 s (61600112 allocs: 118.280 GiB, 14.11% gc time)
 mean   9.541 s (61600112 allocs: 118.280 GiB, 13.89% gc time)
 max    9.614 s (61600112 allocs: 118.280 GiB, 14.72% gc time)

@mkitti mkitti self-requested a review February 11, 2025 17:38
@nhz2 nhz2 merged commit 7516678 into master Feb 11, 2025
26 checks passed
@nhz2 nhz2 deleted the nz/finalizer branch February 11, 2025 18:30
@nhz2 nhz2 mentioned this pull request Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leak if decompressed stream is not explicitly closed transcode failing if codec is not initialized
2 participants