
Conversation


@twz123 twz123 commented Oct 16, 2025

Don't emit a warning when the connection is closing. A warning implies that something is not as it should be, but it's expected that the connection attempt is interrupted in this case.

RELEASE NOTES: N/A

Signed-off-by: Tom Wieczorek <[email protected]>

codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.83%. Comparing base (ae62635) to head (4edf517).
⚠️ Report is 13 commits behind head on master.

Files with missing lines | Patch % | Lines
clientconn.go | 57.14% | 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8654      +/-   ##
==========================================
- Coverage   81.21%   80.83%   -0.38%     
==========================================
  Files         416      416              
  Lines       41002    41096      +94     
==========================================
- Hits        33298    33220      -78     
- Misses       6226     6331     +105     
- Partials     1478     1545      +67     
Files with missing lines | Coverage Δ
clientconn.go | 78.61% <57.14%> (-10.83%) ⬇️

... and 26 files with indirect coverage changes



easwars commented Oct 16, 2025

@twz123 : Could you please describe the problem you are seeing in a little bit more detail and say how your change helps fix the problem? Thanks.


twz123 commented Oct 17, 2025

When connecting to etcd using its client, I'm seeing warnings like this during normal operations:

2025/10/15 12:55:29 WARNING: [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"

or this:

2025/10/15 12:56:24 WARNING: [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: authentication handshake failed: context canceled"

This broke something on our side (k0sproject/k0sctl#953), as we weren't prepared for the additional output. While that was an oversight and easy to fix in k0sctl, we were wondering why we're seeing the warning in the first place. It seems to happen when/after the gRPC client connection gets established. I don't have a clue about gRPC internals, but it seems that a subchannel gets cancelled while it's still creating its transport. This seems like "normal operations" to me. A warning implies that something is not as it should be, so I figured the warning would be inappropriate in this situation.
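
For context, the test app boils down to roughly the following. This is a minimal sketch assuming the etcd clientv3 package and an unauthenticated local etcd; the actual code differs (the real setup uses TLS, hence the handshake errors above).

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	log.Println("Connecting.")
	// Connect to a local etcd. The real app configures TLS credentials here.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("connect: %v", err)
	}

	log.Println("Connected, Sleeping one sec.")
	time.Sleep(time.Second)

	log.Println("Listing members.")
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	resp, err := cli.MemberList(ctx)
	cancel()
	if err != nil {
		log.Fatalf("member list: %v", err)
	}
	for _, m := range resp.Members {
		log.Printf("%s: %v", m.Name, m.PeerURLs)
	}

	log.Println("Disconnecting.")
	if err := cli.Close(); err != nil {
		log.Fatalf("close: %v", err)
	}
	log.Println("Disconnected.")
}
```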

Below is example output of my test app, which connects to etcd, executes a command, and disconnects again:

Oct 17 12:58:48.922 Connecting.
2025/10/17 12:58:48 INFO: [core] original dial target is: "etcd-endpoints://0xc000334960/127.0.0.1:2379"
2025/10/17 12:58:48 INFO: [core] [Channel #1] Channel created for target "etcd-endpoints://0xc000334960/127.0.0.1:2379"
2025/10/17 12:58:48 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"etcd-endpoints", Opaque:"", User:(*url.Userinfo)(nil), Host:"0xc000334960", Path:"/127.0.0.1:2379", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
2025/10/17 12:58:48 INFO: [core] [Channel #1] Channel authority set to "127.0.0.1:2379"
2025/10/17 12:58:48 INFO: [core] [Channel #1] Channel switches to new LB policy "pick_first"
2025/10/17 12:58:48 INFO: [core] [Channel #1] Channel Connectivity change to CONNECTING
2025/10/17 12:58:48 INFO: [core] [Channel #1 SubChannel #2] Subchannel created
2025/10/17 12:58:48 INFO: [core] [Channel #1] Channel switches to new LB policy "round_robin"
2025/10/17 12:58:48 INFO: [roundrobin] [0xc000368bd0] Created
2025/10/17 12:58:48 INFO: [core] [Channel #1 SubChannel #3] Subchannel created
2025/10/17 12:58:48 INFO: [core] [Channel #1] Channel exiting idle mode
Oct 17 12:58:48.923 Connected, Sleeping one sec.
2025/10/17 12:58:48 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING
2025/10/17 12:58:48 INFO: [core] [Channel #1 SubChannel #2] Subchannel picks a new address "127.0.0.1:2379" to connect
2025/10/17 12:58:48 INFO: [core] [Channel #1 SubChannel #3] Subchannel Connectivity change to CONNECTING
2025/10/17 12:58:48 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to SHUTDOWN
2025/10/17 12:58:48 INFO: [core] [Channel #1 SubChannel #2] Subchannel deleted
2025/10/17 12:58:48 INFO: [core] [Channel #1 SubChannel #3] Subchannel picks a new address "127.0.0.1:2379" to connect
2025/10/17 12:58:48 WARNING: [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: authentication handshake failed: context canceled"
2025/10/17 12:58:48 WARNING: [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: authentication handshake failed: context canceled"
2025/10/17 12:58:48 INFO: [core] [Channel #1 SubChannel #3] Subchannel Connectivity change to READY
2025/10/17 12:58:48 INFO: [core] [Channel #1] Channel Connectivity change to READY
Oct 17 12:58:49.923 Listing members.
{"members":{"controller-0":"https://172.29.182.39:2380"}}
Oct 17 12:58:49.924 Disconnecting.
2025/10/17 12:58:49 INFO: [core] [Channel #1] Channel Connectivity change to SHUTDOWN
2025/10/17 12:58:49 INFO: [core] [Channel #1] Closing the name resolver
2025/10/17 12:58:49 INFO: [core] [Channel #1] ccBalancerWrapper: closing
2025/10/17 12:58:49 INFO: [core] [Channel #1 SubChannel #3] Subchannel Connectivity change to SHUTDOWN
2025/10/17 12:58:49 INFO: [core] [Channel #1 SubChannel #3] Subchannel deleted
2025/10/17 12:58:49 INFO: [core] [Channel #1] Channel deleted
Oct 17 12:58:49.924 Disconnected.
Done with warnings.


easwars commented Oct 29, 2025

@twz123 Thank you for sharing logs.

From the output I do see that the load balancing policy on the gRPC Channel is changing from pick_first to round_robin. So, while you don't see any connectivity problems on the client, internally the gRPC Channel (which is a virtual connection from the client to the server) is changing its load balancing policy, and the latter is what creates subchannels (these are actual connections to real backend servers). When the pick_first policy is closed, it closes the subchannels that it created, and then round_robin comes along and creates a new subchannel. Although, in your case, both these subchannels are for the same backend address 127.0.0.1:2379.
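
For illustration, the following is a minimal sketch (not the etcd client's actual setup; the target and options are assumptions) of how a channel's load balancing policy can be selected via the default service config, which is the kind of policy choice behind the pick_first to round_robin switch seen in the logs above:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Ask the channel to use round_robin instead of the default pick_first.
	// When a channel's policy changes (as in the logs above), the old
	// policy's subchannels are shut down and the new policy creates its own.
	conn, err := grpc.NewClient("dns:///127.0.0.1:2379",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatalf("failed to create channel: %v", err)
	}
	defer conn.Close()
}
```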

We would like to keep throwing this warning log since a subchannel not being able to connect to the backend is an important event that we care about. This is in fact specifically mentioned in one of our docs that we use as a guideline for log statements: https://github.com/grpc/grpc-go/blob/master/Documentation/log_levels.md

I'd like to close this and keep the existing behavior, since it looks like this can be, and has been, easily fixed on your side.

Please feel free to reopen if you think otherwise.

@easwars easwars closed this Oct 29, 2025

twz123 commented Oct 29, 2025

From the output I do see that the load balancing policy on the gRPC Channel is changing from pick_first to round_robin. So, while you don't see any connectivity problems on the client, internally the gRPC Channel (which is a virtual connection from the client to the server) is changing its load balancing policy, and the latter is what creates subchannels (these are actual connections to real backend servers). When the pick_first policy is closed, it closes the subchannels that it created, and then round_robin comes along and creates a new subchannel. Although, in your case, both these subchannels are for the same backend address 127.0.0.1:2379.

Thanks for the explanation @easwars!

We would like to keep throwing this warning log since a subchannel not being able to connect to the backend is an important event that we care about. This is in fact specifically mentioned in one of our docs that we use as a guideline for log statements: https://github.com/grpc/grpc-go/blob/master/Documentation/log_levels.md

I agree that this warning is important and useful. I didn't remove it; I just silenced it in one specific case where I believe it is inappropriate: when the connection has been closed by the user. This is a normal operation, and subchannels are expected to close. They simply didn't finish connecting and were interrupted along the way. The code used to issue the warning in either case; now it only issues the warning if the connection is not closing.

The PR diff might make it look like the warning has been removed completely, but it was actually just moved from createTransport to tryAllAddrs. In createTransport, it was the last statement before returning an error, and tryAllAddrs is the only caller of createTransport. In tryAllAddrs, the warning is logged as before, except when the connection is closing.
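
As a standalone illustration of the pattern (not the actual grpc-go code; all names here are made up for the example), the idea is to emit the warning only when the attempt was not interrupted by the caller's own shutdown:

```go
package main

import (
	"context"
	"errors"
	"log"
	"time"
)

// dialBackend stands in for a connection attempt that can be interrupted
// when its context is canceled.
func dialBackend(ctx context.Context, addr string) error {
	select {
	case <-time.After(50 * time.Millisecond):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// connect warns about a failed attempt only if the caller did not cancel it.
func connect(ctx context.Context, addr string) error {
	err := dialBackend(ctx, addr)
	if err == nil {
		return nil
	}
	if ctx.Err() == nil {
		// A genuine connectivity problem: the backend is unreachable even
		// though nobody asked us to stop. Worth a warning.
		log.Printf("WARNING: failed to connect to %q: %v", addr, err)
	}
	// During shutdown the interrupted attempt is expected; just return the error.
	return err
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate the channel being closed while still connecting
	if err := connect(ctx, "127.0.0.1:2379"); err != nil && !errors.Is(err, context.Canceled) {
		log.Fatalf("unexpected error: %v", err)
	}
	log.Println("shutdown interrupted the dial; no warning was emitted")
}
```

Checking the context after the failed attempt is what distinguishes a genuine connectivity problem from a dial that was deliberately interrupted.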

I'd like to close this and keep the existing behavior, since it looks like this can be, and has been, easily fixed on your side.

The hard error has been fixed in k0sctl, yes. However, the warning remains, not only in k0sctl, but every time something interacts with etcd on localhost. It's confusing for users and developers alike. Users see it every time they query the etcd member status via k0s, and the kube-apiserver logs are full of these warnings, too. We've already received bug reports claiming that etcd clusters were broken because users misinterpreted the logs.


twz123 commented Oct 29, 2025

Please feel free to reopen if you think otherwise.

PS: I don't have permission to reopen this.

