@twz123 twz123 commented Oct 21, 2025

Three goroutines could outlive a call to ClientConn.close(). Add mechanics to cancel them and wait for them to complete when closing a client connection.

Fixes #8655.

RELEASE NOTES:

  • Closing a client connection will cancel all pending goroutines and block until they complete.
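
For illustration, the "cancel and wait" mechanics described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `goroutineTracker`, `newGoroutineTracker` and `closeAndWait` are hypothetical names, and only the `goFunc(func(shutdown <-chan struct{}))` shape is taken from the diff excerpts in the review below.

```go
package main

import (
	"fmt"
	"sync"
)

// goroutineTracker is a hypothetical helper (not a grpc-go type) that
// tracks goroutines so they can be cancelled and awaited on close.
type goroutineTracker struct {
	mu         sync.Mutex
	shutdownCh chan struct{}
	closed     bool
	wg         sync.WaitGroup
}

func newGoroutineTracker() *goroutineTracker {
	return &goroutineTracker{shutdownCh: make(chan struct{})}
}

// goFunc runs f in a tracked goroutine and passes it the shutdown channel.
// It reports false, without starting f, if the tracker is already closed.
func (t *goroutineTracker) goFunc(f func(shutdown <-chan struct{})) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.closed {
		return false
	}
	// Add under the mutex so it cannot race with Wait in closeAndWait.
	t.wg.Add(1)
	go func() {
		defer t.wg.Done()
		f(t.shutdownCh)
	}()
	return true
}

// closeAndWait signals shutdown to all tracked goroutines and blocks
// until every one of them has returned. It is safe to call repeatedly.
func (t *goroutineTracker) closeAndWait() {
	t.mu.Lock()
	if !t.closed {
		t.closed = true
		close(t.shutdownCh)
	}
	t.mu.Unlock()
	t.wg.Wait()
}

func main() {
	t := newGoroutineTracker()
	stopped := false
	t.goFunc(func(shutdown <-chan struct{}) {
		<-shutdown // block until the tracker is closed
		stopped = true
	})
	t.closeAndWait()
	fmt.Println("goroutine observed shutdown:", stopped)
}
```

The write to `stopped` is safe to read after `closeAndWait` returns, because `wg.Done` happens before `wg.Wait` unblocks.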

codecov bot commented Oct 21, 2025

Codecov Report

❌ Patch coverage is 91.30435% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (master@7472d57). Learn more about missing BASE report.

Files with missing lines Patch % Lines
clientconn.go 84.21% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master    #8666   +/-   ##
=========================================
  Coverage          ?   83.22%           
=========================================
  Files             ?      416           
  Lines             ?    32344           
  Branches          ?        0           
=========================================
  Hits              ?    26919           
  Misses            ?     4038           
  Partials          ?     1387           
Files with missing lines Coverage Δ
balancer_wrapper.go 85.47% <100.00%> (ø)
internal/balancer/gracefulswitch/gracefulswitch.go 87.86% <100.00%> (ø)
internal/testutils/pipe_listener.go 85.71% <100.00%> (ø)
clientconn.go 90.06% <84.21%> (ø)
healthData *healthData

shutdownMu sync.Mutex
shutdown chan struct{}
Member

Could you change this to shutdownCh to indicate it is a channel?

Author

Done!


func (acbw *acBalancerWrapper) UpdateAddresses(addrs []resolver.Address) {
-	acbw.ac.updateAddrs(addrs)
+	acbw.goFunc(func(shutdown <-chan struct{}) {
Member

Is there a reason we changed this to run the acbw.ac.updateAddrs(addrs) function in a goroutine?

Author

It's basically a "bubbled-up" goroutine. Previously, the goroutine was spawned in updateAddrs itself (line 1021). But since we now need to track those goroutines, I figured it would be most appropriate to spawn it here. Another option would be to push this down into updateAddrs itself, by passing the acBalancerWrapper pointer, a function pointer to acbw.goFunc, or something along those lines, and then use that to spawn the goroutine there:

Suggested change
-	acbw.goFunc(func(shutdown <-chan struct{}) {
+	acbw.ac.updateAddrs(acbw, addrs)

Then we could write line 1021 of updateAddrs like so:

	acbw.goFunc(ac.resetTransportAndUnlock)

-	ac.mu.Lock()
-	defer ac.mu.Unlock()
+	acMu := &ac.mu
+	acMu.Lock()
Member

Is there a reason the original code did not work? This seems unnecessarily complicated and does the same thing. Or am I missing something?

Author

It's a way to make the defer do the unlock conditionally, as the code might need to unlock the mutex before returning (see lines 1441 and 1442). We could achieve the same with, e.g., a boolean variable, if you prefer.

I think using a pointer for this is good because, if you forget the nil check, it will panic with an easy-to-understand stack trace. Whereas if you forget to check a boolean, you'd do a double unlock, and there's a chance that you get weirder problems.
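
A minimal sketch of the pointer-based conditional unlock described here, assuming a toy `counter` type and an `addThenNotify` method that are purely hypothetical and unrelated to the actual function in clientconn.go:

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a toy type used only to demonstrate the locking pattern.
type counter struct {
	mu sync.Mutex
	n  int
}

// addThenNotify increments n under the lock. If notify is non-nil, the
// lock is released early so that notify may take c.mu itself.
func (c *counter) addThenNotify(notify func()) {
	mu := &c.mu
	mu.Lock()
	defer func() {
		// Unlock only if no code path has released the lock already.
		if mu != nil {
			mu.Unlock()
		}
	}()

	c.n++
	if notify == nil {
		return // the deferred function unlocks
	}

	mu.Unlock()
	mu = nil // tells the deferred function the lock is already released
	notify()
}

// value returns the current count.
func (c *counter) value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

func main() {
	c := &counter{}
	c.addThenNotify(nil)
	c.addThenNotify(func() {
		// Safe: the lock was released before this callback ran.
		fmt.Println("value seen in callback:", c.value())
	})
	fmt.Println("final value:", c.value())
}
```

Note that the deferred call has to be a closure: a plain `defer mu.Unlock()` evaluates the receiver when the defer statement executes, so setting `mu = nil` afterwards would have no effect. Dropping the nil check in the closure would make the early-unlock path dereference a nil `*sync.Mutex` and panic, which is the failure mode the comment above argues is easier to diagnose than a silent double unlock.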

Three goroutines could outlive a call to ClientConn.close(). Add
mechanics to cancel them and wait for them to complete when closing a
client connection.

RELEASE NOTES:
- Closing a client connection will cancel all pending goroutines and
  block until they complete.

Signed-off-by: Tom Wieczorek <[email protected]>
@twz123 twz123 force-pushed the clientconn-close-waitfor-goroutines branch from e45e9d7 to e8bb2b4 Compare November 5, 2025 12:28
Development

Successfully merging this pull request may close these issues.

Allow waiting for all goroutines to exit on client connection close

4 participants