OSAC-420: harden console proxy websocket transport#692
Conversation
|
@sk-ilya: This pull request references OSAC-420 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThe PR migrates console WebSocket handling from ChangesConsole WebSocket Migration and Hardening
Sequence Diagram(s)sequenceDiagram
rect rgba(173, 216, 230, 0.5)
Note over Client,ConsoleProxyCore: WebSocket Upgrade & Relay (Origin-based gating)
Client->>ConsoleProxyWSHandler: HTTP Upgrade (Origin, Authorization)
ConsoleProxyWSHandler->>ConsoleProxyWSHandler: cookie auth + empty Origin → reject 403
ConsoleProxyWSHandler->>websocket.Accept: Accept(w, r, {OriginPatterns, BinarySubprotocol, PingPongHooks})
alt Accept succeeds (101)
websocket.Accept-->>ConsoleProxyWSHandler: *websocket.Conn
ConsoleProxyWSHandler->>StartPing: StartPing(ctx, conn, logger, pingConfig)
ConsoleProxyWSHandler->>websocket.NetConn: wrap as io.ReadWriteCloser
ConsoleProxyWSHandler->>ConsoleProxyCore: Relay(ctx, NetConn, backend)
ConsoleProxyCore->>ConsoleProxyCore: io.CopyBuffer(64KB) bidirectional
else Accept fails
websocket.Accept-->>ConsoleProxyWSHandler: error (403)
ConsoleProxyWSHandler->>backend: backend.Close()
end
end
rect rgba(144, 238, 144, 0.5)
Note over kubeVirtBackend,KubeVirt: Backend Dial (TLS 1.3 + Bearer auth)
kubeVirtBackend->>websocket.Dial: Dial(ctx, url, DialOptions{TLS 1.3, Bearer, Subprotocols})
websocket.Dial-->>kubeVirtBackend: *websocket.Conn
kubeVirtBackend->>StartPing: StartPing(ctx, conn, logger, pingConfig)
kubeVirtBackend->>websocket.NetConn: NetConn(ctx, conn, BinaryMessage)
websocket.NetConn-->>kubeVirtBackend: io.ReadWriteCloser
end
rect rgba(255, 218, 185, 0.5)
Note over websocket.Conn,Relay: Keep-alive ping loop
loop every PingInterval
StartPing->>websocket.Conn: conn.Ping(pongTimeout)
websocket.Conn-->>StartPing: pong or timeout
end
alt Ping fails & ctx active
StartPing->>websocket.Conn: CloseNow()
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🔕 Pre-merge checks override appliedThe pre-merge checks have been overridden successfully. You can now proceed with the merge. Overridden by ❌ Failed checks (1 error)
✅ Passed checks (10 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/cmd/cli/console/serial/serial_cmd.go`:
- Around line 156-164: The code at line 156 ignores the error returned by
stdinReader.Close() in the defer statement, and at line 163 ignores the return
value from stdinReader.Cancel(), which can cause shutdown hangs if cancellation
fails. Update the defer statement to capture and log any error from
stdinReader.Close(), and update the InterruptRead function to capture and check
the return value from stdinReader.Cancel(), ensuring any failures are properly
handled rather than silently ignored. This prevents indefinite blocking of the
cleanup goroutine in connect.Proxy when cancellation fails on platform-specific
edge cases.
In `@internal/console/mock_ws_server_test.go`:
- Line 115: The WebSocket test code is ignoring error returns which can hide
handshake and write failures, weakening test diagnostics. In
internal/console/mock_ws_server_test.go at lines 115-115, check the error
returned from conn.Write() when writing the banner and return immediately if an
error occurs to prevent silent failures. In
internal/servers/console_proxy_ws_test.go at lines 124-126, capture the error
returned from websocket.Accept() and explicitly verify that it fails as
expected—if Accept unexpectedly succeeds, close the connection and fail the test
with a clear assertion to catch this regression. Following the coding guideline
that error returns must never be ignored, these changes will improve test
failure fidelity and prevent regressions from slipping through.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: osac-project/coderabbit/.coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: b099fa11-72a1-4090-9cfe-36589fc5904a
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (13)
charts/service/templates/ingress-proxy/_config.yaml.tplgo.modinternal/cmd/cli/console/serial/serial_cmd.gointernal/console/kubevirt_backend.gointernal/console/kubevirt_backend_integration_test.gointernal/console/mock_ws_server_test.gointernal/console/ping.gointernal/network/listener_test.gointernal/servers/console_http_middleware.gointernal/servers/console_http_middleware_test.gointernal/servers/console_proxy_core.gointernal/servers/console_proxy_ws.gointernal/servers/console_proxy_ws_test.go
bef42a6 to
b9ad9fa
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
internal/cmd/cli/console/serial/serial_cmd.go (1)
166-170:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Cancel()failure path can still hang proxy shutdown.At Line 167, a failed
stdinReader.Cancel()only logs a warning and returns a no-op restore function. Sinceconnect.ProxycallsInterruptRead()and then waits onwg.Wait(), a non-interrupted blocking read can deadlock disconnect/cleanup.Suggested fix
InterruptRead: func() func() { - if !stdinReader.Cancel() { - c.logger.WarnContext(ctx, "Failed to cancel stdin reader, proxy cleanup may block") - } - return func() {} + if stdinReader.Cancel() { + return func() {} + } + + c.logger.WarnContext(ctx, "Failed to cancel stdin reader, forcing stdin read deadline to unblock") + if err := os.Stdin.SetReadDeadline(time.Now()); err != nil { + c.logger.WarnContext(ctx, "Failed to force-unblock stdin read", "error", err) + return func() {} + } + return func() { + if err := os.Stdin.SetReadDeadline(time.Time{}); err != nil { + c.logger.DebugContext(ctx, "Failed to clear stdin read deadline", "error", err) + } + } },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/cmd/cli/console/serial/serial_cmd.go` around lines 166 - 170, The InterruptRead function returns a no-op restore function when stdinReader.Cancel() fails, which means any blocking read on stdin will remain uninterrupted. This causes the proxy shutdown to hang when waiting on wg.Wait() after calling InterruptRead(). Instead of returning an empty function when Cancel() fails, the returned function should implement an alternative mechanism to forcefully interrupt the blocking read operation, such as directly closing the stdin reader or using a separate interrupt signal that the blocking read respects, ensuring the read is unblocked regardless of whether the initial Cancel() call succeeded.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/console/kubevirt_backend_integration_test.go`:
- Around line 181-210: The conn.Write call in the HTTP handler function within
newMockWSServerWithAuthCapture is ignoring the error it returns, which reduces
test diagnostic capability. Capture the error returned by the conn.Write
operation and handle it appropriately by either logging it to a test logger or
returning the error so the test can detect unexpected connection failures.
In `@internal/servers/console_proxy_ws_test.go`:
- Around line 106-113: The test uses a broad defer recover() block around the
handler.ServeHTTP call to swallow any panic, then only asserts the response code
is not 403. This can mask unrelated regressions in security-sensitive code.
Instead of using recover() to suppress arbitrary panics, refactor the test to
use controlled stubs (such as providing a mock Opener) to prevent the panic from
occurring in the first place, then assert a specific expected outcome (such as a
particular response code or behavior) rather than just checking it is not
forbidden. Apply this same fix pattern at all locations where this generic panic
recovery exists in the test.
- Around line 124-126: The websocket.Accept call in the test handler is ignoring
its error return value using blank identifiers (_). Replace the `_, _ =
websocket.Accept(...)` pattern with explicit error handling that captures the
error returned by websocket.Accept and checks whether it is nil. If an error
occurs, fail the test using an appropriate test assertion method (such as
t.Fatalf) to ensure unexpected upgrade behavior is caught and the security-path
test remains trustworthy.
---
Duplicate comments:
In `@internal/cmd/cli/console/serial/serial_cmd.go`:
- Around line 166-170: The InterruptRead function returns a no-op restore
function when stdinReader.Cancel() fails, which means any blocking read on stdin
will remain uninterrupted. This causes the proxy shutdown to hang when waiting
on wg.Wait() after calling InterruptRead(). Instead of returning an empty
function when Cancel() fails, the returned function should implement an
alternative mechanism to forcefully interrupt the blocking read operation, such
as directly closing the stdin reader or using a separate interrupt signal that
the blocking read respects, ensuring the read is unblocked regardless of whether
the initial Cancel() call succeeded.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: osac-project/coderabbit/.coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 15fe442b-adbe-4f95-91fd-8adf0eaf801c
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (13)
charts/service/templates/ingress-proxy/_config.yaml.tplgo.modinternal/cmd/cli/console/serial/serial_cmd.gointernal/console/kubevirt_backend.gointernal/console/kubevirt_backend_integration_test.gointernal/console/mock_ws_server_test.gointernal/console/ping.gointernal/network/listener_test.gointernal/servers/console_http_middleware.gointernal/servers/console_http_middleware_test.gointernal/servers/console_proxy_core.gointernal/servers/console_proxy_ws.gointernal/servers/console_proxy_ws_test.go
| // newMockWSServerWithAuthCapture creates a mock WS server that captures the | ||
| // Authorization header from the HTTP upgrade request into a channel. | ||
| func newMockWSServerWithAuthCapture(authCh chan<- string) (*mockWSServer, error) { | ||
| listener, err := net.Listen("tcp", "127.0.0.1:0") | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to listen: %w", err) | ||
| } | ||
|
|
||
| m := &mockWSServer{listener: listener} | ||
|
|
||
| mux := http.NewServeMux() | ||
| mux.HandleFunc("/apis/console.osac.openshift.io/v1alpha1/namespaces/", func(w http.ResponseWriter, r *http.Request) { | ||
| authCh <- r.Header.Get("Authorization") | ||
| conn, err := websocket.Accept(w, r, &websocket.AcceptOptions{ | ||
| InsecureSkipVerify: true, | ||
| }) | ||
| if err != nil { | ||
| http.Error(w, err.Error(), http.StatusInternalServerError) | ||
| return | ||
| } | ||
| conn.SetReadLimit(-1) | ||
| conn.Write(r.Context(), websocket.MessageBinary, []byte("authenticated\r\n")) | ||
| conn.Close(websocket.StatusNormalClosure, "") | ||
| }) | ||
|
|
||
| m.server = &http.Server{Handler: mux} | ||
| go m.server.Serve(listener) | ||
|
|
||
| return m, nil | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Low-severity: Test helper ignores conn.Write error.
In test code, ignoring the write error at line 202 reduces diagnostic fidelity if the connection unexpectedly fails. The risk is low since this is test infrastructure only.
🔧 Optional improvement
- conn.Write(r.Context(), websocket.MessageBinary, []byte("authenticated\r\n"))
+ if err := conn.Write(r.Context(), websocket.MessageBinary, []byte("authenticated\r\n")); err != nil {
+ return
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/console/kubevirt_backend_integration_test.go` around lines 181 -
210, The conn.Write call in the HTTP handler function within
newMockWSServerWithAuthCapture is ignoring the error it returns, which reduces
test diagnostic capability. Capture the error returned by the conn.Write
operation and handle it appropriately by either logging it to a test logger or
returning the error so the test can detect unexpected connection failures.
| _, _ = websocket.Accept(w, r, &websocket.AcceptOptions{ | ||
| OriginPatterns: []string{"https://good.com"}, | ||
| }) |
There was a problem hiding this comment.
Handle websocket.Accept errors explicitly in the test handler.
At Line 124, _, _ = websocket.Accept(...) drops the returned error, which can mask unexpected upgrade behavior and make this security-path test less trustworthy.
As per coding guidelines, **/*.go: “Never ignore error returns.”
Suggested fix
- _, _ = websocket.Accept(w, r, &websocket.AcceptOptions{
+ _, err := websocket.Accept(w, r, &websocket.AcceptOptions{
OriginPatterns: []string{"https://good.com"},
})
+ if err != nil {
+ return
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/servers/console_proxy_ws_test.go` around lines 124 - 126, The
websocket.Accept call in the test handler is ignoring its error return value
using blank identifiers (_). Replace the `_, _ = websocket.Accept(...)` pattern
with explicit error handling that captures the error returned by
websocket.Accept and checks whether it is nil. If an error occurs, fail the test
using an appropriate test assertion method (such as t.Fatalf) to ensure
unexpected upgrade behavior is caught and the security-path test remains
trustworthy.
Source: Coding guidelines
zszabo-rh
left a comment
There was a problem hiding this comment.
Couple of nice-to-have comments, but otherwise the library swap is a clear improvement and the code is even simpler than before. LGTM!
|
|
||
| // Ping sends periodic WebSocket pings to detect dead connections. It blocks | ||
| // until the context is cancelled or a ping fails, returning the cause. | ||
| func Ping(ctx context.Context, conn *websocket.Conn) error { |
There was a problem hiding this comment.
Any chance we could add a few tests for the new ping.go? Things like verifying the timeout fires when pongs stop arriving and that cancelling the context shuts things down cleanly. It's straightforward code, but it's also the piece that decides when to tear down connections. Would be nice to have that covered.
| const ( | ||
| pingInterval = 30 * time.Second | ||
| pongTimeout = 10 * time.Second | ||
| ) |
There was a problem hiding this comment.
Would it make sense to make these configurable? With many concurrent console sessions, a 30s ping interval could add up. Even just wiring them through the existing console proxy config would give operators a knob to tune if needed.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sk-ilya, zszabo-rh The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
New changes are detected. LGTM label has been removed. |
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
internal/console/kubevirt_backend_integration_test.go (1)
86-90:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMinor reliability risk: ignored WebSocket write/close errors can hide integration-test regressions.
The handlers drop
Write/Closeerrors, so auth/close-path failures can be masked instead of surfacing as test failures.🔧 Suggested fix
closeServer, err := newMockWSServerWithHandler(func(ctx context.Context, conn *websocket.Conn) { - conn.Write(ctx, websocket.MessageBinary, []byte("goodbye\r\n")) + if err := conn.Write(ctx, websocket.MessageBinary, []byte("goodbye\r\n")); err != nil { + return + } // Close the WebSocket connection immediately after banner. - conn.Close(websocket.StatusNormalClosure, "") + if err := conn.Close(websocket.StatusNormalClosure, ""); err != nil { + return + } }) @@ mux.HandleFunc("/apis/console.osac.openshift.io/v1alpha1/namespaces/", func(w http.ResponseWriter, r *http.Request) { authCh <- r.Header.Get("Authorization") @@ - conn.Write(r.Context(), websocket.MessageBinary, []byte("authenticated\r\n")) - conn.Close(websocket.StatusNormalClosure, "") + if err := conn.Write(r.Context(), websocket.MessageBinary, []byte("authenticated\r\n")); err != nil { + return + } + if err := conn.Close(websocket.StatusNormalClosure, ""); err != nil { + return + } })As per coding guidelines,
**/*.go: never ignore error returns.Also applies to: 194-204
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/console/kubevirt_backend_integration_test.go` around lines 86 - 90, The WebSocket message handler in newMockWSServerWithHandler ignores error returns from conn.Write() and conn.Close() calls, which can mask test failures when authentication or connection closure fails. Capture the error returns from both the conn.Write() call (which sends the banner message) and the conn.Close() call, and handle them appropriately (e.g., by logging or failing the test). Apply this same error-checking pattern to the other handler location at lines 194-204 in the same file where similar WebSocket Write/Close operations occur without error checking.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/console/kubevirt_backend.go`:
- Around line 67-75: The Build() method in KubeVirtBackendBuilder does not
validate the PingConfig field before creating the backend instance, allowing
invalid PingInterval values (such as negative durations) to propagate to the
Ping() method where they cause a panic. Add validation logic in Build() similar
to what exists in PingConfigFromFlags to check if b.pingConfig has a negative
PingInterval, and return an appropriate error if validation fails, ensuring the
validation happens at the API boundary before the kubeVirtBackend is
instantiated.
In `@internal/console/ping.go`:
- Around line 84-100: The PingReceivedHandler and PongReceivedHandler functions
are logging the full peer-controlled payload bytes, which poses a security risk
as untrusted data could contain sensitive information or control characters.
Replace the `slog.String("payload", string(payload))` calls in both functions
with a safer alternative such as logging only the payload length using
`slog.Int()` with an appropriate key name, or use a key prefixed with `!` to
indicate redaction as per the repository's coding guidelines. This ensures
control-frame payloads are not exposed in logs while still providing useful
debugging information.
In `@internal/network/listener_test.go`:
- Around line 424-426: The tls.Config struct in TLSClientConfig is missing the
MinVersion field, which allows insecure TLS protocol downgrades. Add MinVersion:
tls.VersionTLS13 to the tls.Config initialization to enforce TLS 1.3 as the
minimum protocol version. This ensures the test code models secure TLS
configuration practices and prevents negotiation of deprecated protocol versions
with known vulnerabilities.
---
Duplicate comments:
In `@internal/console/kubevirt_backend_integration_test.go`:
- Around line 86-90: The WebSocket message handler in newMockWSServerWithHandler
ignores error returns from conn.Write() and conn.Close() calls, which can mask
test failures when authentication or connection closure fails. Capture the error
returns from both the conn.Write() call (which sends the banner message) and the
conn.Close() call, and handle them appropriately (e.g., by logging or failing
the test). Apply this same error-checking pattern to the other handler location
at lines 194-204 in the same file where similar WebSocket Write/Close operations
occur without error checking.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: osac-project/coderabbit/.coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: e669bee2-9307-47af-9d25-28d596873b28
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (21)
charts/service/templates/ingress-proxy/_config.yaml.tplgo.modinternal/cmd/cli/console/serial/serial_cmd.gointernal/cmd/service/start/consoleproxy/start_console_proxy_cmd.gointernal/cmd/service/start/grpcserver/start_grpc_server_cmd.gointernal/console/kubevirt_backend.gointernal/console/kubevirt_backend_integration_test.gointernal/console/manager.gointernal/console/mock_ws_server_test.gointernal/console/ping.gointernal/console/ping_flags.gointernal/console/ping_test.gointernal/console/session_flags.gointernal/network/grpc_keepalive_flags.gointernal/network/listener_test.gointernal/servers/console_http_middleware.gointernal/servers/console_http_middleware_test.gointernal/servers/console_proxy_core.gointernal/servers/console_proxy_ws.gointernal/servers/console_proxy_ws_test.gomanifests/overlays/openshift/route.yaml
Assisted-by: Claude Code <noreply@anthropic.com>
migrate console websocket handling from golang.org/x/net/websocket to github.com/coder/websocket
Summary by CodeRabbit