[Management/Client] Trigger debug bundle runs from API/Dashboard (#4592) #4832
base: main
Conversation
Walkthrough
Adds a per-peer Job subsystem (streaming RPC, persistence, manager, client executor), refactors the debug bundle flow (removes `DebugBundleRequest.status`, renames `LogFile` → `LogPath`, centralizes upload), and propagates `logPath`/profile config into the client connect/engine flows.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Server as Management gRPC Server
participant JobMgr as Job Manager
participant Store as Store/DB
participant Client as Peer Client (mgmt stream)
participant Engine as Client Engine / Executor
participant DebugGen as Debug Bundle Generator
participant Upload as Upload service
Server->>JobMgr: CreatePeerJob / SendJob(accountID, peerID, JobRequest)
JobMgr->>Store: Persist job record (CreatePeerJob)
Server->>Client: Stream JobRequest to peer (Job RPC)
Client->>Engine: Executor.BundleJob(params, wait)
Engine->>DebugGen: Generate bundle (uses LogPath, StatusRecorder)
DebugGen-->>Engine: bundle file path
Engine->>Upload: UploadDebugBundle(mgmURL, bundlePath)
Upload-->>Engine: uploadKey
Engine->>Client: Return JobResponse (uploadKey)
Client->>Server: Stream JobResponse back
Server->>JobMgr: HandleResponse(JobResponse)
JobMgr->>Store: CompletePeerJob (mark succeeded/failed)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
client/status/status.go (1)
234-315: `ToProtoFullStatus` mapping is good; add nil‑safety when reading latency/handshake

The new `ToProtoFullStatus` helper correctly maps:
- Management and signal state (URL, connected, error).
- Local peer state (IP, pub key, kernel, FQDN, Rosenpass flags, networks from route keys).
- Per‑peer fields (status, timestamps, transfer stats, Rosenpass, networks, latency, SSH host key).
- Relays and DNS group state, including error strings.

Downstream, `mapPeers` assumes `Latency` and `LastWireguardHandshake` are always set:

    lastHandshake := pbPeerState.GetLastWireguardHandshake().AsTime().Local()
    ...
    Latency: pbPeerState.GetLatency().AsDuration(),

That's safe for `FullStatus` objects produced by `ToProtoFullStatus`, but will panic if a `PeerState` arrives with these fields unset (e.g., older daemon or other producer). To harden this:

    var lastHandshake time.Time
    if ts := pbPeerState.GetLastWireguardHandshake(); ts != nil {
        lastHandshake = ts.AsTime().Local()
    }
    var latency time.Duration
    if d := pbPeerState.GetLatency(); d != nil {
        latency = d.AsDuration()
    }

and then use those locals when building `PeerStateDetailOutput`.

Also applies to: 549-635
client/internal/engine.go (2)
34-53: Engine config/struct wiring for jobs is fine, but `c *profilemanager.Config` is unused

The new debug/job additions to `EngineConfig` (`ProfileConfig`, `LogPath`) and `Engine` (`jobExecutor`, `jobExecutorWG`) are wired in cleanly, and creating `jobExecutor` in `NewEngine` is reasonable.

However, the extra parameter `c *profilemanager.Config` on `NewEngine` is never used in the function body. Unused parameters don't fail compilation in Go, but this one is dead weight and misleading for callers. Either remove the parameter or actually thread it into the engine configuration, e.g.:

    config.ProfileConfig = c
    // or
    engine.config.ProfileConfig = c

depending on where you intend to own that reference.

Also applies to: 83-142, 145-222, 235-276
278-366: Job stream consumption and bundle handling are generally solid; watch for restart behavior and nil profile config

The new job flow on the client side looks good overall:
- `receiveJobEvents` hooks into `mgmClient.Job` with a per‑message handler that defaults responses to `JobStatus_failed` and branches on `WorkloadParameters`; currently only `Bundle` is handled, everything else returns `ErrJobNotImplemented`.
- `jobExecutorWG` ensures `Stop()` waits for the Job stream goroutine to exit before tearing down the engine.
- `handleBundle` builds `debug.GeneratorDependencies` from engine state, calls `jobExecutor.BundleJob`, and wraps the resulting upload key in a `JobResponse_Bundle`.

Two points to be aware of:
- When `mgmClient.Job` returns any error (including when `e.ctx` is canceled during `Stop()`), you treat it as a hard failure and call `CtxGetState(e.ctx).Wrap(ErrResetConnection)` + `e.clientCancel()`, which triggers a full client restart. That mirrors how Sync/Signal errors are handled, but it also means a graceful "down" or engine stop will take the restart path instead of a quiet shutdown. If that's not intended, you may want to distinguish context cancellation from other errors.
- `handleBundle` assumes `e.config.ProfileConfig` and `e.config.ProfileConfig.ManagementURL` are non‑nil:

    InternalConfig: e.config.ProfileConfig,
    ...
    uploadKey, err := e.jobExecutor.BundleJob(..., e.config.ProfileConfig.ManagementURL.String())

If `ProfileConfig` or its `ManagementURL` can be unset (e.g., in tests or on some platforms), this will panic. A defensive check that returns a failed `JobResponse` with a clear reason would make the behavior safer; a sketch follows below.

Also applies to: 942-1012
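A minimal sketch of such a guard, assuming `handleBundle` returns a `*proto.JobResponse` and the field names referenced above; adjust to the actual signature and request variable:

```go
// Sketch only: fail the job early instead of panicking when the profile
// config or management URL is missing (field/type names as referenced above).
if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
	return &proto.JobResponse{
		ID:     jobRequest.ID, // assumed name of the incoming request variable
		Status: proto.JobStatus_failed,
		Reason: []byte("profile config or management URL is not set"),
	}
}
```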
🧹 Nitpick comments (38)
shared/management/client/grpc.go (5)
112-160: Centralized withMgmtStream helper looks good; double‑check backoff choice and final logging

The `withMgmtStream` wrapper nicely unifies readiness checks, server key fetching, and retry for both `Sync` and `Job`. One thing to double‑check is whether you explicitly want streaming calls to use `defaultBackoff(ctx)` while other RPCs still use `nbgrpc.Backoff(...)`; if not, consider reusing the same backoff helper for consistency, or documenting the intentional difference.

Also, `withMgmtStream` logs "unrecoverable error" for any non‑nil `err` after `backoff.Retry`, including `context.Canceled`/`DeadlineExceeded`. If you expect normal shutdowns via context cancellation, you may want to special‑case those errors before logging at warn level to avoid noisy logs (see the sketch below).
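One possible shape for the quieter shutdown path (a sketch; `operation` and `defaultBackoff` stand in for the existing retry wiring in this function):

```go
// Sketch: treat context cancellation as an expected shutdown rather than an
// unrecoverable error before logging at warn level.
err := backoff.Retry(operation, defaultBackoff(ctx))
if err != nil {
	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
		log.Debugf("management stream stopped: %v", err)
		return err
	}
	log.Warnf("unrecoverable error on management stream: %v", err)
	return err
}
return nil
```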
162-216: Job stream error handling is solid; refine connection notifications for better state reporting

The overall `handleJobStream` flow (open stream → handshake → recv/process/respond loop) and error classification by gRPC code look good.

A couple of refinements around connection state notifications:
- `notifyDisconnected(err)` is called for every receive error, including `codes.Unimplemented`. In that case, the server is reachable but doesn't support `Job`; marking management as "disconnected" can mislead consumers of `ConnStateNotifier`. Consider moving `notifyDisconnected` into the `switch` and skipping it for `codes.Unimplemented` (and possibly `codes.Canceled`, which usually indicates local shutdown); see the sketch after this list.
- The Job path never calls `notifyConnected()`, so this stream can only move the state toward "disconnected". If `ConnStateNotifier` is used for user‑visible connectivity, you might want to call `notifyConnected()` once the stream and handshake succeed (or when the first job is successfully received) to keep the state transitions balanced.
- On `sendJobResponse` failure you return the error but don't notify disconnection; if this error typically indicates a broken stream, it may be worth also invoking `notifyDisconnected(err)` there.

These are behavioral tweaks rather than correctness issues but would make connection state reporting more accurate.
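One way to express the first two points (illustrative only; the method names follow this review's description of the client, not a verified API):

```go
// Sketch: decide per error code whether the management connection should be
// reported as disconnected after a Job stream receive error.
switch s, _ := gstatus.FromError(err); s.Code() {
case codes.Unimplemented:
	// Server is reachable but has no Job RPC: don't flip the conn state.
	log.Debugf("management server does not implement the Job API")
	return nil
case codes.Canceled:
	// Local shutdown: exit quietly.
	return err
default:
	c.notifyDisconnected(err)
	return err
}
```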
218-252: Job helpers mirror existing encryption patterns; consider cleaning up unused ctx parameters

`sendHandshake` and `receiveJobRequest` correctly follow the existing `EncryptMessage`/`DecryptMessage` pattern and use the WireGuard public key in the `EncryptedMessage` as elsewhere in this file.

Right now the `ctx` parameter passed into these helpers isn't used inside them; the only effective cancellation point is the stream's own context from `realClient.Job(ctx)`. That's fine behaviorally, but for clarity you could either:
- Drop `ctx` from these helper signatures, or
- Start using it explicitly (e.g., check `ctx.Err()` before/after blocking operations, or for future timeouts / per‑job cancellation).

Given this is internal code, I'd treat it as a readability/maintenance cleanup to do when convenient.
254-295: Ensure JobResponse is always well‑formed for correlation and logging

`processJobRequest` nicely guards against a `nil` handler result by synthesizing a `JobResponse` with `Status: JobStatus_failed` and a reason. Two minor robustness tweaks you might consider:
- If the handler returns a non‑nil `JobResponse` but forgets to populate `ID`, you could default it to `jobReq.ID` so the management server can always correlate responses:

    jobResp := msgHandler(jobReq)
    if jobResp == nil {
        jobResp = &proto.JobResponse{
            ID:     jobReq.ID,
            Status: proto.JobStatus_failed,
            Reason: []byte("handler returned nil response"),
        }
    } else if len(jobResp.ID) == 0 {
        jobResp.ID = jobReq.ID
    }

- For logging, `string(jobReq.ID)`/`string(resp.ID)` assumes the IDs are valid UTF‑8. If these are ever changed to non‑textual bytes, consider logging them as `%x` or via `hex.EncodeToString` to avoid odd output.

Not blockers, but they make the job channel a bit more defensive and easier to debug.
297-395: Sync refactor via connectToSyncStream/receiveUpdatesEvents looks consistent with existing patterns

The split into `connectToSyncStream` and `receiveUpdatesEvents` reads clean and matches the encryption/decryption contract used elsewhere (`EncryptMessage(serverPubKey, c.key, req)` and `DecryptMessage(serverPubKey, c.key, update.Body, resp)`).

A small optional improvement: `GetNetworkMap` now reimplements the "single `Recv` + decrypt `SyncResponse`" logic that is very similar to what `receiveUpdatesEvents` does in a loop. If you find yourself touching this code again, you might consider a tiny shared helper (e.g., `decryptSyncUpdate(serverPubKey, update *proto.EncryptedMessage) (*proto.SyncResponse, error)`) to keep the decryption path fully DRY; a possible shape is sketched below.

Functionally, the new sync connection flow looks correct.
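If you go for the helper, a possible shape (a sketch built from the calls quoted above; the receiver type and `wgtypes.Key` parameter are assumptions):

```go
// Sketch of the shared decryption helper suggested above.
func (c *GrpcClient) decryptSyncUpdate(serverPubKey wgtypes.Key, update *proto.EncryptedMessage) (*proto.SyncResponse, error) {
	resp := &proto.SyncResponse{}
	if err := encryption.DecryptMessage(serverPubKey, c.key, update.Body, resp); err != nil {
		return nil, fmt.Errorf("decrypt sync update: %w", err)
	}
	return resp, nil
}
```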
client/internal/debug/upload_test.go (1)
1-1: Test now exercises the exported UploadDebugBundle correctly

Switching to `package debug` and calling `UploadDebugBundle(context.Background(), testURL+types.GetURLPath, testURL, file)` matches the new API and keeps the expectations (`getURLHash(testURL)` prefix, stored file content) intact.

You might consider using a context with timeout here (and/or a simple readiness wait on `srv.Start`) to make this integration-style test more robust under slow CI environments (sketched below).

Also applies to: 41-48
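For example (a sketch; it assumes the test lives in `package debug` as described above and reuses the existing `testURL`, `file`, and `getURLHash` helpers):

```go
// Sketch: bound the upload with a per-test timeout so slow CI fails fast.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

key, err := UploadDebugBundle(ctx, testURL+types.GetURLPath, testURL, file)
require.NoError(t, err)
require.True(t, strings.HasPrefix(key, getURLHash(testURL)))
```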
client/cmd/debug.go (1)
338-360: Debug bundle generator now uses LogPath and shared default upload URL

Passing `LogPath: logFilePath` into `debug.NewBundleGenerator` matches the updated `GeneratorDependencies` and ensures the bundle generator knows where to look for logs. Reusing `types.DefaultBundleURL` as the default for `--upload-bundle-url` in both `debug bundle` and `debug for` keeps the CLI consistent with the upload-server configuration.

Consider clarifying in the flag help text that the default is the NetBird-hosted debug upload service, so self‑hosters know they should override it if needed.

Also applies to: 369-379
client/server/server_test.go (1)
21-21: JobManager wiring into test management server is correct

Creating a `jobManager := job.NewJobManager(nil, store)` and passing it into both `server.BuildManager(...)` and `nbgrpc.NewServer(...)` matches the updated constructor signatures and ensures job flows are exercised in this test setup.

The `if err != nil { return nil, "", err }` right after `eventStore := &activity.InMemoryEventStore{}` still refers to the earlier `err` from `NewTestStoreFromSQL`, which has already been handled and is guaranteed nil here. It can be removed to avoid confusion.

Also applies to: 298-329
client/internal/debug/debug.go (1)
30-34: Status seeding via FullStatus/nbstatus is coherent with anonymization strategy

Using `statusRecorder.GetFullStatus()` → `nbstatus.ToProtoFullStatus` → `ConvertToStatusOutputOverview` → `ParseToFullDetailSummary` aligns the `status.txt` contents with the regular `netbird status` output and correctly injects event history and profile name. Seeding the bundle anonymizer via `seedFromStatus(g.anonymizer, &fullStatus)` leverages this richer status snapshot to anonymize later artifacts more consistently.

One minor nit: `seedFromStatus` runs even when `g.anonymize` is false, which is harmless but unnecessary work; you could optionally guard it with `if g.anonymize { ... }` for clarity.

Also applies to: 382-405, 852-880
shared/management/http/api/generate.sh (1)
14-16: oapi-codegen v2 install path looks correct; consider pinning a version

The switch to `github.com/oapi-codegen/oapi-codegen/v2/cmd/oapi-codegen@latest` matches the v2 module layout and should work fine with recent Go toolchains. For more reproducible codegen in CI, you might eventually want to pin a specific version instead of `@latest`.

client/embed/embed.go (1)
175-185: Recorder integration and updated Run call look consistent

Wiring a `peer.NewRecorder` into `NewConnectClient` and updating `Run` to `Run(run, "")` matches the new client API and should keep embedded startup behavior intact. If you later surface a log/debug path in `Options`, this is the right place to thread it through instead of an empty string.

management/server/http/testing/testing_tools/channel/channel.go (1)
19-39: JobManager wiring into BuildManager is correct; reuse metrics instead of nil

Injecting a concrete `job.Manager` into `BuildManager` here is a nice improvement in test realism. Given you already construct `metrics` in this helper, you can pass it into the job manager instead of `nil` so job‑related metrics behavior is also covered in tests:

    - jobManager := job.NewJobManager(nil, store)
    + jobManager := job.NewJobManager(metrics, store)

The surrounding `peersUpdateManager` and `server.BuildManager(...)` calls stay as they are.

Also applies to: 53-79
management/server/account_test.go (1)
35-50: Account test manager now wires a JobManager; prefer passing metrics instead of nil

Importing `job` and passing a concrete `JobManager` into `BuildManager` in `createManager` matches the new account manager wiring and lets tests exercise job‑related paths. Since you already compute `metrics` in this helper, you can avoid a nil metrics field on the JobManager and better mirror production setup by doing:

    - manager, err := BuildManager(ctx, nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
    + manager, err := BuildManager(ctx, nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 2931-2969
management/server/route_test.go (1)
22-34: Route tests correctly inject a JobManager; reuse metrics for consistency

The additional `job` import and passing `job.NewJobManager(nil, store)` into `BuildManager` in `createRouterManager` are aligned with the new constructor and ensure route tests run with job support enabled. Since this helper already initializes `metrics`, you can tighten the fidelity of the test setup by wiring it into the JobManager too:

    - am, err := BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
    + am, err := BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 1258-1302
management/server/peer_test.go (1)
35-35: Prefer initializing `job.Manager` with real metrics in tests

The wiring of `BuildManager` looks correct (argument order matches the updated signature), but all these test setups construct the job manager with `job.NewJobManager(nil, s)` while `metrics` is already available. This leaves `job.Manager.metrics` nil in tests, which can (a) hide metric-related bugs and (b) potentially panic later if the manager starts using metrics without nil checks.

Consider passing the same metrics instance you already create:

    - am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(nil, s), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
    + am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(metrics, s), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Apply the same pattern to the other `BuildManager` invocations in this file for consistency.

Also applies to: 1296-1296, 1381-1381, 1534-1534, 1614-1614
management/server/nameserver_test.go (1)
19-19: Use the existing metrics instance when creating `JobManager`

`createNSManager` correctly wires the new `jobManager` dependency into `BuildManager`, but you're passing `job.NewJobManager(nil, store)` even though a real `metrics` instance is already created and used for `update_channel.NewPeersUpdateManager`.

To avoid a nil `metrics` inside `job.Manager` in tests (and potential surprises if metrics are used later), prefer:

    - return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
    + return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 798-799
management/server/dns_test.go (1)
17-17: Align test `JobManager` construction with real metrics

`createDNSManager` correctly injects a `JobManager` into `BuildManager`, but uses `job.NewJobManager(nil, store)` despite having a `metrics` instance already.

For more realistic tests and to avoid a nil metrics field inside `job.Manager`, consider:

    - return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
    + return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 229-230
management/server/management_proto_test.go (1)
32-32: JobManager wiring in tests looks correct; consider reusing metrics

The test correctly instantiates `job.Manager` and threads it through `BuildManager` and `nbgrpc.NewServer`, matching the new signatures. To exercise metrics inside the job manager during tests, you might consider passing the `metrics` instance instead of `nil` when creating `jobManager`, but that's optional.

Also applies to: 342-343, 369-371, 380-381
client/cmd/testutil_test.go (1)
19-19: Consistent JobManager setup in test helper; minor polish possible

The JobManager is correctly created and passed into `BuildManager` and `nbgrpc.NewServer`, aligning with the new signatures. Two small nits you could optionally address later:
- Pass the `metrics` instance to `job.NewJobManager` instead of `nil` if you want job metrics in tests.
- The `if err != nil { return nil, nil }` check after `eventStore := &activity.InMemoryEventStore{}` is dead code and can be removed.

Also applies to: 92-96, 123-124, 129-130
management/server/account.go (1)
18-18: JobManager injection is clean; clarify non‑nil contract

Adding `jobManager *job.Manager` to `DefaultAccountManager` and threading it via `BuildManager` is consistent and keeps wiring explicit. However, methods like `CreatePeerJob` dereference `am.jobManager` without nil checks, so `BuildManager` effectively requires a non‑nil JobManager at all call sites. Consider either:
- Documenting that `jobManager` must be non‑nil (and enforcing via tests), or
- Adding a defensive nil check that returns a clear `status.Internal`/`status.PreconditionFailed` error instead of panicking if it's miswired.

Would you like a small `ast-grep`/`rg` script to verify that all `BuildManager` call sites now pass a non‑nil JobManager?

Also applies to: 68-76, 179-197, 203-224
management/server/management_test.go (1)
31-31: Test server wiring with JobManager is correct

The test server now correctly constructs a JobManager and passes it through to `BuildManager` and `nbgrpc.NewServer`, matching the new APIs. As with other tests, you could optionally pass `metrics` instead of `nil` into `job.NewJobManager` if you want job metrics exercised, but the current setup is functionally fine.

Also applies to: 183-185, 212-228, 235-247
management/server/http/handlers/peers/peers_handler.go (1)
31-41: New peer job HTTP endpoints are well‑structured; tighten validation and rely on fixed ownership check

The new `/peers/{peerId}/jobs` and `/peers/{peerId}/jobs/{jobId}` handlers are consistent with existing patterns (auth from context, `util.WriteError`/`WriteJSONObject`, and conversion helpers). A few small points:
- Once `GetPeerJobByID` is fixed to assert `job.PeerID == peerID` (see the account manager comment), these handlers will correctly enforce per‑peer job ownership.
- For consistency with other handlers in this file (e.g. `HandlePeer`, `GetAccessiblePeers`), you may want to add an explicit empty‑`peerId` check in `CreateJob`, `ListJobs`, and `GetJob` to return a clear `InvalidArgument` instead of silently passing `""` through; a sketch follows below.
- `toSingleJobResponse` cleanly maps the domain `Job` to the HTTP `JobResponse`, including optional `FailedReason` and typed `Workload`; that looks good.

Overall, the HTTP surface for jobs is in good shape once the backend ownership check is tightened.

Also applies to: 51-142, 618-638
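A sketch of that guard, mirroring the router-variable + `util.WriteError` pattern the file already uses (exact helper signatures are assumptions):

```go
// Sketch: reject requests with an empty peerId before hitting the account manager.
peerID := mux.Vars(r)["peerId"]
if peerID == "" {
	util.WriteError(r.Context(), status.Errorf(status.InvalidArgument, "invalid peer ID"), w)
	return
}
```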
shared/management/proto/management.proto (1)
52-53: Refine Job proto types for clarity and forward compatibility

The Job RPC and related messages look structurally sound and align with the existing EncryptedMessage pattern. A few tweaks would make them easier to use:
- `JobResponse.Reason` as `bytes` is unusual for a human-readable explanation; a `string` would be more ergonomic unless you explicitly plan to ship arbitrary binary blobs.
- `JobStatus` only has `unknown_status`, `succeeded`, `failed`. If you intend to reflect the persistent job lifecycle (pending/running vs. completed), it may be worth adding an explicit in‑progress state instead of overloading `unknown_status` as a placeholder.
- Consider documenting the expected encoding/format for `JobRequest.ID`/`JobResponse.ID` (stringified UUID vs. raw bytes) so store/`types.Job` integration remains consistent.

Functionally this is fine; these are just protocol polish suggestions.

Also applies to: 66-89
client/internal/debug/upload.go (1)
15-101: Upload helper is solid; consider small robustness improvements

The end-to-end flow (get presigned URL → size check → PUT upload) looks good and side‑effect free. A few minor refinements:
- In `upload`, prefer `%w` when wrapping the HTTP error to preserve the error chain: `return fmt.Errorf("upload failed: %w", err)`
- `getUploadURL` builds the query as `url+"?id="+id`. If `url` might ever include its own query string, using `net/url` to append `id` would be safer (see the sketch after this list).
- Both `getUploadURL` and `upload` strictly require `StatusOK`. If the upload service ever returns another 2xx (e.g. 201/204), this will be treated as failure. If you control that service and guarantee 200, the current code is fine; otherwise consider `if putResp.StatusCode/100 != 2`.

None of these are blockers; the current implementation should work as intended.
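A sketch of the `net/url`-based variant (parameter and variable names follow the description above and are otherwise assumptions):

```go
// Sketch: build the presigned-URL request with net/url instead of string
// concatenation so an existing query string in getURL is preserved.
u, err := url.Parse(getURL)
if err != nil {
	return "", fmt.Errorf("parse upload URL: %w", err)
}
q := u.Query()
q.Set("id", id)
u.RawQuery = q.Encode()

req, err := http.NewRequestWithContext(ctx, http.MethodGet, u.String(), nil)
if err != nil {
	return "", fmt.Errorf("create upload URL request: %w", err)
}
```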
client/server/debug.go (1)
27-39: Use the incoming context (and possibly avoid holding the mutex) during upload

The switch to `debug.UploadDebugBundle` and `LogPath: s.logFile` looks correct.

Two follow‑up improvements worth considering:
- `DebugBundle` ignores the incoming context and calls `UploadDebugBundle` with `context.Background()`. That means client cancellation/timeouts won't stop the upload. Renaming the parameter to `ctx context.Context` and passing `ctx` through would make this RPC better behaved under cancellation.
- The upload is done while `s.mutex` is held. For large bundles or slow networks this can block other server operations behind this lock. If state safety allows it, you might generate the bundle under the lock, then release the mutex before starting the network upload.

Behavior is fine for a rarely used debug path, but these tweaks would make it more responsive.

Also applies to: 49-57
management/server/account/manager.go (1)
127-129: Job APIs on Manager look good; consider pagination/filters

The new job methods are consistent with the rest of the Manager interface (accountID/userID/peerID ordering, `types.Job` usage).

Two design points to keep in mind for implementations:
- `GetAllPeerJobs` can grow unbounded over time; if jobs are expected to accumulate, consider adding pagination and/or server‑side filtering (e.g. by status or time range) at some point.
- Ensure implementations consistently enforce `userID` authorization, since these methods expose per‑peer operational history.

No changes required here, just points to watch in the concrete manager.
client/internal/engine_test.go (1)
1591-1592: JobManager wiring in startManagement test mirrors production; consider reusing metrics

The new `jobManager := job.NewJobManager(nil, store)` and its injection into `BuildManager` and `nbgrpc.NewServer` properly mirror the production wiring, so tests now exercise the job pipeline end‑to‑end.

Since you already create `metrics, err := telemetry.NewDefaultAppMetrics(...)` a few lines later, you may want to:
- Construct `metrics` first, then
- Call `job.NewJobManager(metrics, store)`

This would keep metrics behavior consistent between the JobManager and the rest of the management stack even in tests. Not required for correctness, but a bit cleaner.

Also applies to: 1622-1623, 1628-1629
shared/management/http/api/openapi.yml (2)
41-121: Align bundle/workload schema with actual bundle generator config

The Bundle/Workload schemas look reasonable, but there are a couple of potential contract mismatches to double‑check:
- `BundleParameters` exposes `bundle_for` and `bundle_for_time` while the current `debug.BundleConfig` only has `Anonymize`, `IncludeSystemInfo`, and `LogFileCount`. There is no way for callers to control `IncludeSystemInfo`, and the semantics of `bundle_for`/`bundle_for_time` vs. what the client actually does (currently just a wait + regular bundle) should be clarified or adjusted so the API is not misleading.
- You're using a discriminator on `WorkloadRequest`/`WorkloadResponse` with `propertyName: type`. Make sure the generated clients you care about support this pattern where the discriminator property lives only in the concrete types (`BundleWorkloadRequest`/`Response`) and not also declared at the base level; some generators are picky here.

I'd suggest either:
- Extending `BundleConfig` + implementation to truly honor all declared parameters, or
- Tightening the OpenAPI description and fields to only what is actually used today, to avoid surprising API consumers.

35-38: Clarify experimental nature and pagination/filtering for Jobs endpoints

The `Jobs` tag is marked `x-experimental: true`, but the individual operations are not. If you rely on tooling that inspects operation/vendor extensions rather than tags, you may want to duplicate `x-experimental: true` on the new `/api/peers/{peerId}/jobs` and `/api/peers/{peerId}/jobs/{jobId}` operations.

Additionally, listing jobs currently returns an unbounded array with no filtering or pagination parameters. If job counts can become large per peer, it's worth at least documenting ordering (e.g. newest first) and planning for `page`/`page_size`/`status` filters, even if you defer implementing them now.

Also applies to: 2353-2456
shared/management/client/client_test.go (1)
25-26: JobManager wiring in tests looks correct; consider reusing metrics if you need job metrics

The new `jobManager := job.NewJobManager(nil, store)` and its injection into both `BuildManager` and `nbgrpc.NewServer` align with the updated constructor signatures and keep the tests realistic.

If you ever want to exercise job‑related metrics in tests, you could pass the already‑created `metrics` instead of `nil` to `NewJobManager`; otherwise this setup is fine as‑is for functional tests.

Also applies to: 76-77, 123-124, 130-132
management/server/store/sql_store.go (1)
136-207: Job CRUD helpers mostly fine; consider tightening error handling and update scope

Functionality looks correct overall (ID scoping, account/peer filters, status transitions), but a few details could be improved:
- `CompletePeerJob` updates by `id` only; since `ID` is the primary key this is safe, but for consistency with `GetPeerJobByID` you may want to include `account_id` in the `WHERE` as well.
- `CreatePeerJob`, `CompletePeerJob`, and `MarkPendingJobsAsFailed` correctly wrap errors with `status.Internal`, but `GetPeerJobByID`/`GetPeerJobs` return raw GORM errors on internal failure. Consider wrapping those in `status.Errorf(status.Internal, ...)` as done elsewhere in this store so callers don't depend on storage-specific errors.
- If `types.Job.ApplyResponse` only populates a subset of fields, `CompletePeerJob`'s `Updates(job)` call is fine; if it ever starts zeroing other fields, you may want to switch to an explicit field list/map to avoid unintended overwrites.

management/internals/shared/grpc/server.go (1)
184-219: Job stream handling is generally sound; consider surfacing send failures

The new `Job` RPC and helpers hold up well:
- Handshake reads a single encrypted `JobRequest` and reuses the existing `parseRequest` path, so message framing and crypto stay consistent with Sync/Login.
- Account/peer lookup mirrors the Sync flow and correctly maps missing peers to `PermissionDenied`/`Unauthenticated`.
- `CreateJobChannel` + deferred `CloseChannel` provide a clear per-peer lifecycle, and `startResponseReceiver` cleanly demultiplexes `JobResponse` messages into the job manager.

One point to consider tightening:
- In `sendJobsLoop`, any failure in `sendJob` is logged and then the method returns `nil`, so the gRPC stream terminates with an OK status even if job delivery to the client failed. For parity with `handleUpdates`/`sendUpdate`, you may want to return the error (or a mapped status error) so callers can observe transport failures on the Job stream; a sketch follows below.

Also applies to: 333-397, 450-467
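A rough shape of that change (a sketch only; the channel accessor and `sendJob` helper names are assumptions based on the description above):

```go
// Sketch: propagate send failures instead of ending the Job stream with OK.
for {
	select {
	case <-srv.Context().Done():
		return srv.Context().Err()
	case event, ok := <-ch.Events():
		if !ok {
			return nil // channel closed by the job manager
		}
		if err := s.sendJob(ctx, srv, event); err != nil {
			return status.Errorf(codes.Internal, "failed to send job to peer: %v", err)
		}
	}
}
```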
management/server/job/manager.go (2)
44-151: Pending‑job cleanup semantics are coarse; minor concurrency detail worth double‑checking

Overall behavior is coherent (fail stuck jobs, enqueue new ones, complete on response), but a few aspects could be sharpened:
- `CreateJobChannel` calls `MarkPendingJobsAsFailed` for `(accountID, peerID)` before taking the lock. That's fine functionally, but it means "stuck too long" in the DB is actually "any pending job when a fresh channel is created"; if you later add explicit TTL semantics, you may want a more targeted query.
- In `SendJob`, you look up the `Channel` under `RLock` and then call `ch.AddEvent` after releasing the lock. If `CreateJobChannel` or `CloseChannel` can call `ch.Close()` concurrently, make sure `Channel.AddEvent` safely detects a closed channel and returns `ErrJobChannelClosed` rather than panicking on a send to a closed chan.
- Both `cleanup` and `CloseChannel` use `MarkPendingJobsAsFailed(accountID, peerID, reason)`, which marks all pending DB jobs for that peer, not just the specific `jobID`. That's acceptable if you only ever have a single pending job per peer, but if you later support multiple in‑flight jobs, you'll likely want a job‑scoped failure path in the store.
- In `CloseChannel`, the loop over `jm.pending` calls `MarkPendingJobsAsFailed` once per event for the same peer; a minor optimization is to call it once per peerID, then delete all relevant `pending` entries.

These are mostly behavioral clarifications and small optimizations rather than blockers.
90-117: HandleResponse flow is correct but could simplify error‑response coupling

`HandleResponse` correctly:
- Looks up the pending event by `jobID`.
- Builds a `types.Job` from the response via `ApplyResponse`.
- Calls `Store.CompletePeerJob` and always removes the entry from `pending`.

Given that the `event.Response` field is only set when `CompletePeerJob` succeeds and is then immediately dropped from `pending`, you could omit `event.Response` entirely or set it before calling the store to simplify reasoning. Current behavior is valid; this is mainly an internal API cleanup opportunity.

client/status/status.go (1)
122-169: Overview mapping from FullStatus looks correct; consider a nil guard

Switching `ConvertToStatusOutputOverview` to use `*proto.FullStatus` directly and threading in `daemonVersion` matches how the proto is structured and keeps the output fields aligned with the underlying status (management, signal, local peer, relays, DNS, events, SSH).

One small robustness improvement: if `pbFullStatus` can ever be nil (e.g., from an older daemon or an error path), the `pbFullStatus.Get...()` calls will panic. A quick upfront check like `if pbFullStatus == nil { return OutputOverview{} }` would make the function safer against unexpected inputs.
management/server/types/job.go (2)
105-138: Minor: preserve wrapped error instead of using `err.Error()`

In `BuildWorkloadResponse`, you wrap bundle build errors as:

    if err := j.buildBundleResponse(&wl); err != nil {
        return nil, status.Errorf(status.InvalidArgument, err.Error())
    }

Using `err.Error()` as the format string loses the original error as a distinct argument and makes further wrapping/debugging slightly harder. Consider:

    - return nil, status.Errorf(status.InvalidArgument, err.Error())
    + return nil, status.Errorf(status.InvalidArgument, "%v", err)

to keep the original error value intact and consistent with the rest of the file.
140-162: Bundle parameter validation is clear; consider tightening semantics only if needed

The validation for `BundleForTime` (1–5 minutes when `BundleFor` is true) and `LogFileCount` (1–1000) is straightforward and matches the API docs. You also normalize stored parameters to the `BundleParameters` JSON and initialize `Result` to `{}`, which keeps `BuildWorkloadResponse` happy for pending/failed jobs.

If you ever want to hard‑fail obviously nonsensical inputs when `BundleFor` is false (e.g., a negative `BundleForTime`), you could extend the check here, but that's optional and not strictly required by the current contract.
2112-2169: WorkloadRequest helpers match server usage; consider using the WorkloadType constantThe
As*/From*/Merge*helpers andDiscriminator/ValueByDiscriminatorforWorkloadRequestare consistent with howNewJobandvalidateAndBuildBundleParamsconsume the workload. One small optional improvement would be to useWorkloadTypeBundleinstead of the literal"bundle"to avoid drift if the constant ever changes:-func (t *WorkloadRequest) FromBundleWorkloadRequest(v BundleWorkloadRequest) error { - v.Type = "bundle" +func (t *WorkloadRequest) FromBundleWorkloadRequest(v BundleWorkloadRequest) error { + v.Type = WorkloadTypeBundleSame for
MergeBundleWorkloadRequestand the switch inValueByDiscriminator.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- `client/proto/daemon.pb.go` is excluded by `!**/*.pb.go`
- `go.sum` is excluded by `!**/*.sum`
- `shared/management/proto/management.pb.go` is excluded by `!**/*.pb.go`
- `shared/management/proto/management_grpc.pb.go` is excluded by `!**/*.pb.go`
📒 Files selected for processing (52)
- `.github/workflows/wasm-build-validation.yml` (1 hunks)
- `client/cmd/debug.go` (1 hunks)
- `client/cmd/status.go` (1 hunks)
- `client/cmd/testutil_test.go` (3 hunks)
- `client/cmd/up.go` (1 hunks)
- `client/embed/embed.go` (1 hunks)
- `client/internal/connect.go` (7 hunks)
- `client/internal/debug/debug.go` (7 hunks)
- `client/internal/debug/upload.go` (1 hunks)
- `client/internal/debug/upload_test.go` (2 hunks)
- `client/internal/engine.go` (13 hunks)
- `client/internal/engine_test.go` (9 hunks)
- `client/jobexec/executor.go` (1 hunks)
- `client/proto/daemon.proto` (0 hunks)
- `client/server/debug.go` (2 hunks)
- `client/server/server.go` (4 hunks)
- `client/server/server_test.go` (3 hunks)
- `client/status/status.go` (4 hunks)
- `client/status/status_test.go` (1 hunks)
- `client/ui/debug.go` (2 hunks)
- `go.mod` (2 hunks)
- `management/internals/server/boot.go` (1 hunks)
- `management/internals/server/controllers.go` (3 hunks)
- `management/internals/server/modules.go` (1 hunks)
- `management/internals/shared/grpc/server.go` (11 hunks)
- `management/server/account.go` (4 hunks)
- `management/server/account/manager.go` (1 hunks)
- `management/server/account_test.go` (2 hunks)
- `management/server/activity/codes.go` (2 hunks)
- `management/server/dns_test.go` (2 hunks)
- `management/server/http/handlers/peers/peers_handler.go` (3 hunks)
- `management/server/http/testing/testing_tools/channel/channel.go` (3 hunks)
- `management/server/job/channel.go` (1 hunks)
- `management/server/job/manager.go` (1 hunks)
- `management/server/management_proto_test.go` (4 hunks)
- `management/server/management_test.go` (4 hunks)
- `management/server/mock_server/account_mock.go` (2 hunks)
- `management/server/nameserver_test.go` (2 hunks)
- `management/server/peer.go` (1 hunks)
- `management/server/peer_test.go` (5 hunks)
- `management/server/route_test.go` (2 hunks)
- `management/server/store/sql_store.go` (3 hunks)
- `management/server/store/store.go` (1 hunks)
- `management/server/types/job.go` (1 hunks)
- `shared/management/client/client.go` (1 hunks)
- `shared/management/client/client_test.go` (3 hunks)
- `shared/management/client/grpc.go` (8 hunks)
- `shared/management/client/mock.go` (2 hunks)
- `shared/management/http/api/generate.sh` (1 hunks)
- `shared/management/http/api/openapi.yml` (2 hunks)
- `shared/management/http/api/types.gen.go` (8 hunks)
- `shared/management/proto/management.proto` (2 hunks)
💤 Files with no reviewable changes (1)
- client/proto/daemon.proto
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.
Applied to files:
client/jobexec/executor.go
📚 Learning: 2025-11-14T13:05:31.729Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.729Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.
Applied to files:
client/jobexec/executor.go
🧬 Code graph analysis (42)
shared/management/client/client.go (3)
management/server/types/job.go (1)
Job(34-58)shared/management/http/api/types.gen.go (2)
JobRequest(708-710)JobResponse(713-721)shared/management/proto/management.pb.go (6)
JobRequest(388-398)JobRequest(413-413)JobRequest(428-430)JobResponse(463-475)JobResponse(490-490)JobResponse(505-507)
management/internals/server/controllers.go (3)
management/internals/server/server.go (1)
BaseServer(45-68)management/server/job/manager.go (2)
Manager(23-30)NewJobManager(32-42)management/internals/server/container.go (1)
Create(6-10)
client/status/status_test.go (1)
client/status/status.go (1)
ConvertToStatusOutputOverview(122-169)
client/internal/debug/upload_test.go (2)
client/internal/debug/upload.go (1)
UploadDebugBundle(17-28)upload-server/types/upload.go (1)
GetURLPath(9-9)
management/server/store/store.go (1)
management/server/types/job.go (1)
Job(34-58)
management/internals/server/boot.go (1)
management/internals/shared/grpc/server.go (1)
NewServer(84-147)
client/internal/debug/upload.go (1)
upload-server/types/upload.go (3)
GetURLResponse(15-18)ClientHeader(5-5)ClientHeaderValue(7-7)
client/cmd/up.go (1)
util/log.go (1)
FindFirstLogPath(77-84)
management/server/account_test.go (2)
management/server/account.go (1)
BuildManager(180-268)management/server/job/manager.go (1)
NewJobManager(32-42)
management/server/peer.go (7)
management/server/account.go (1)
DefaultAccountManager(68-113)management/server/types/job.go (2)
Job(34-58)Workload(60-64)management/server/permissions/modules/module.go (1)
Peers(7-7)management/server/permissions/operations/operation.go (1)
Delete(9-9)shared/management/status/error.go (5)
NewPermissionValidationError(213-215)NewPermissionDeniedError(209-211)NewPeerNotPartOfAccountError(105-107)Errorf(70-75)Type(46-46)management/server/store/store.go (3)
Store(50-211)LockingStrengthNone(47-47)LockingStrengthUpdate(43-43)management/server/activity/codes.go (1)
JobCreatedByUser(183-183)
management/server/http/testing/testing_tools/channel/channel.go (2)
management/server/job/manager.go (1)
NewJobManager(32-42)management/server/account.go (1)
BuildManager(180-268)
management/internals/server/modules.go (1)
management/server/account.go (1)
BuildManager(180-268)
shared/management/client/mock.go (3)
shared/management/http/api/types.gen.go (2)
JobRequest(708-710)JobResponse(713-721)shared/management/proto/management.pb.go (6)
JobRequest(388-398)JobRequest(413-413)JobRequest(428-430)JobResponse(463-475)JobResponse(490-490)JobResponse(505-507)management/server/types/job.go (1)
Job(34-58)
management/server/account/manager.go (1)
management/server/types/job.go (1)
Job(34-58)
client/embed/embed.go (1)
client/internal/connect.go (1)
NewConnectClient(51-62)
client/cmd/status.go (1)
client/status/status.go (1)
ConvertToStatusOutputOverview(122-169)
management/server/management_test.go (1)
management/server/job/manager.go (1)
NewJobManager(32-42)
management/server/peer_test.go (2)
management/server/account.go (1)
BuildManager(180-268)management/server/job/manager.go (1)
NewJobManager(32-42)
client/server/server.go (1)
client/status/status.go (1)
ToProtoFullStatus(549-635)
management/server/mock_server/account_mock.go (1)
management/server/types/job.go (1)
Job(34-58)
management/server/nameserver_test.go (2)
management/server/account.go (1)
BuildManager(180-268)management/server/job/manager.go (1)
NewJobManager(32-42)
management/server/job/channel.go (1)
management/server/job/manager.go (1)
Event(17-21)
management/server/http/handlers/peers/peers_handler.go (4)
management/server/context/auth.go (1)
GetUserAuthFromContext(25-30)shared/management/http/util/util.go (3)
WriteError(84-120)WriteErrorResponse(70-80)WriteJSONObject(27-35)shared/management/http/api/types.gen.go (3)
JobRequest(708-710)JobResponse(713-721)JobResponseStatus(724-724)management/server/types/job.go (3)
NewJob(67-103)Job(34-58)Workload(60-64)
client/jobexec/executor.go (3)
client/internal/debug/debug.go (3)
GeneratorDependencies(238-243)BundleConfig(232-236)NewBundleGenerator(245-264)client/internal/debug/upload.go (1)
UploadDebugBundle(17-28)upload-server/types/upload.go (1)
DefaultBundleURL(11-11)
client/server/debug.go (1)
client/internal/debug/upload.go (1)
UploadDebugBundle(17-28)
client/internal/engine.go (6)
client/internal/profilemanager/config.go (1)
Config(89-160)client/jobexec/executor.go (3)
Executor(23-24)NewExecutor(26-28)ErrJobNotImplemented(20-20)shared/management/client/client.go (1)
Client(14-27)shared/management/proto/management.pb.go (21)
JobRequest(388-398)JobRequest(413-413)JobRequest(428-430)JobResponse(463-475)JobResponse(490-490)JobResponse(505-507)JobStatus_failed(30-30)JobRequest_Bundle(457-459)JobRequest_Bundle(461-461)JobStatus_succeeded(29-29)BundleParameters(554-563)BundleParameters(578-578)BundleParameters(593-595)JobResponse_Bundle(548-550)JobResponse_Bundle(552-552)SyncResponse(721-738)SyncResponse(753-753)SyncResponse(768-770)BundleResult(625-631)BundleResult(646-646)BundleResult(661-663)client/internal/state.go (1)
CtxGetState(31-33)client/internal/debug/debug.go (2)
GeneratorDependencies(238-243)BundleConfig(232-236)
management/internals/shared/grpc/server.go (6)
management/server/account/manager.go (1)
Manager(27-131)management/server/job/manager.go (2)
Manager(23-30)Event(17-21)shared/management/proto/management_grpc.pb.go (1)
ManagementService_JobServer(427-431)shared/management/proto/management.pb.go (9)
JobRequest(388-398)JobRequest(413-413)JobRequest(428-430)JobResponse(463-475)JobResponse(490-490)JobResponse(505-507)EncryptedMessage(322-333)EncryptedMessage(348-348)EncryptedMessage(363-365)management/server/job/channel.go (2)
Channel(18-21)ErrJobChannelClosed(15-15)encryption/message.go (1)
EncryptMessage(10-24)
management/server/store/sql_store.go (1)
management/server/types/job.go (3)
Job(34-58)JobStatusPending(18-18)JobStatusFailed(20-20)
management/server/job/manager.go (3)
shared/management/proto/management.pb.go (6)
JobRequest(388-398)JobRequest(413-413)JobRequest(428-430)JobResponse(463-475)JobResponse(490-490)JobResponse(505-507)management/server/job/channel.go (2)
Channel(18-21)NewChannel(23-29)management/server/types/job.go (1)
Job(34-58)
shared/management/client/client_test.go (3)
management/server/job/manager.go (1)
NewJobManager(32-42)management/server/account.go (1)
BuildManager(180-268)management/internals/shared/grpc/server.go (1)
NewServer(84-147)
client/internal/connect.go (2)
client/internal/engine.go (2)
NewEngine(235-276)EngineConfig(83-142)client/internal/profilemanager/config.go (1)
Config(89-160)
management/server/dns_test.go (2)
management/server/account.go (1)
BuildManager(180-268)management/server/job/manager.go (1)
NewJobManager(32-42)
shared/management/client/grpc.go (3)
shared/management/proto/management.pb.go (13)
JobRequest(388-398)JobRequest(413-413)JobRequest(428-430)JobResponse(463-475)JobResponse(490-490)JobResponse(505-507)EncryptedMessage(322-333)EncryptedMessage(348-348)EncryptedMessage(363-365)JobStatus_failed(30-30)SyncResponse(721-738)SyncResponse(753-753)SyncResponse(768-770)shared/management/proto/management_grpc.pb.go (2)
ManagementService_JobClient(169-173)ManagementService_SyncClient(89-92)encryption/message.go (2)
EncryptMessage(10-24)DecryptMessage(27-40)
management/server/types/job.go (3)
shared/management/proto/management.pb.go (22)
JobStatus(25-25)JobStatus(57-59)JobStatus(61-63)JobStatus(70-72)JobRequest(388-398)JobRequest(413-413)JobRequest(428-430)BundleParameters(554-563)BundleParameters(578-578)BundleParameters(593-595)BundleResult(625-631)BundleResult(646-646)BundleResult(661-663)JobResponse(463-475)JobResponse(490-490)JobResponse(505-507)JobStatus_succeeded(29-29)JobStatus_failed(30-30)JobResponse_Bundle(548-550)JobResponse_Bundle(552-552)JobRequest_Bundle(457-459)JobRequest_Bundle(461-461)shared/management/http/api/types.gen.go (8)
JobRequest(708-710)WorkloadResponse(1948-1950)BundleParameters(361-373)BundleResult(376-378)BundleWorkloadResponse(391-399)WorkloadTypeBundle(195-195)WorkloadRequest(1943-1945)JobResponse(713-721)shared/management/status/error.go (4)
Errorf(70-75)BadRequest(36-36)InvalidArgument(27-27)Error(54-57)
client/server/server_test.go (3)
management/server/job/manager.go (1)
NewJobManager(32-42)management/server/account.go (1)
BuildManager(180-268)management/internals/shared/grpc/server.go (1)
NewServer(84-147)
client/internal/debug/debug.go (4)
util/log.go (1)
SpecialLogs(25-28)client/internal/profilemanager/profilemanager.go (1)
NewProfileManager(56-58)client/status/status.go (3)
ToProtoFullStatus(549-635)ConvertToStatusOutputOverview(122-169)ParseToFullDetailSummary(532-547)version/version.go (1)
NetbirdVersion(18-20)
client/internal/engine_test.go (5)
client/internal/engine.go (2)
NewEngine(235-276)EngineConfig(83-142)shared/management/client/mock.go (1)
MockClient(13-24)management/server/job/manager.go (1)
NewJobManager(32-42)management/server/account.go (1)
BuildManager(180-268)management/internals/shared/grpc/server.go (1)
NewServer(84-147)
shared/management/http/api/types.gen.go (2)
shared/management/proto/management.pb.go (9)
BundleParameters(554-563)BundleParameters(578-578)BundleParameters(593-595)BundleResult(625-631)BundleResult(646-646)BundleResult(661-663)JobRequest(388-398)JobRequest(413-413)JobRequest(428-430)management/server/types/job.go (1)
Workload(60-64)
management/server/route_test.go (2)
management/server/account.go (1)
BuildManager(180-268)management/server/job/manager.go (1)
NewJobManager(32-42)
client/status/status.go (2)
client/proto/daemon.pb.go (21)
FullStatus(1994-2008)FullStatus(2021-2021)FullStatus(2036-2038)ManagementState(1682-1689)ManagementState(1702-1702)ManagementState(1717-1719)SignalState(1621-1628)SignalState(1641-1641)SignalState(1656-1658)LocalPeerState(1528-1539)LocalPeerState(1552-1552)LocalPeerState(1567-1569)PeerState(1347-1369)PeerState(1382-1382)PeerState(1397-1399)RelayState(1743-1750)RelayState(1763-1763)RelayState(1778-1780)NSGroupState(1803-1811)NSGroupState(1824-1824)NSGroupState(1839-1841)client/internal/peer/status.go (6)
FullStatus(151-161)ManagementState(128-132)SignalState(121-125)LocalPeerState(106-112)RosenpassState(135-138)NSGroupState(142-148)
management/server/management_proto_test.go (3)
management/server/job/manager.go (1)
NewJobManager(32-42)management/server/account.go (1)
BuildManager(180-268)management/internals/shared/grpc/server.go (1)
NewServer(84-147)
client/cmd/testutil_test.go (3)
management/server/job/manager.go (1)
NewJobManager(32-42)management/server/account.go (1)
BuildManager(180-268)management/internals/shared/grpc/server.go (1)
NewServer(84-147)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: release_ui
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
- GitHub Check: JS / Lint
- GitHub Check: Windows
- GitHub Check: Linux
- GitHub Check: Darwin
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: Build Cache
- GitHub Check: Check External GPL/AGPL Licenses
- GitHub Check: Client / Unit
The code under review (management/server/job/channel.go excerpt):

    // todo consider the channel buffer size when we allow to run multiple jobs
    const jobChannelBuffer = 1

    var (
        ErrJobChannelClosed = errors.New("job channel closed")
    )

    type Channel struct {
        events chan *Event
        once   sync.Once
    }
🧩 Analysis chain
Guard against send‑on‑closed channel and reconsider where timeout is enforced

Two concerns around Channel's concurrency and semantics:

- Potential panic on send after Close
  `Close()` closes `jc.events`, and `AddEvent` always does `jc.events <- event` in the `select` case without checking for closure. If `Close` can race with `AddEvent` (which is likely when job streams disconnect while new jobs are being queued), a send on a closed channel will panic the process.
  Consider adding internal state to mark the channel as closed and have `AddEvent` short‑circuit with `ErrJobChannelClosed` (or similar) instead of ever sending to a potentially closed channel, or ensure by design that only the sender closes and that no further `AddEvent` calls happen after `Close`.

- Timeout at enqueue vs. job response (already noted by your TODO)
  The current `responseWait` timeout only limits how long you wait to enqueue into the buffered channel, not how long you wait for the peer's job response. With `jobChannelBuffer = 1`, this effectively prevents a second job from being queued while the first one is still pending, but does not clean up `pending` jobs when a peer never responds. Moving timeout handling to the job lifecycle (pending → failed) in the manager would better match the comment.

I'd treat the send‑on‑closed‑channel risk as something to fix before merging; the timeout refactor can follow.

Also applies to: 31-41, 43-59
I'll examine the full implementation to validate the concurrency concerns.
---
🏁 Script executed:
```shell
#!/bin/bash
# Retrieve the full channel.go file to examine Close(), AddEvent(), and Event() implementations
find . -name "channel.go" -path "*/job/*" -type f | head -5 | xargs -I {} cat {}
```
Length of output: 1183
Guard against send-on-closed channel and reconsider where timeout is enforced

Two concerns around Channel's concurrency and semantics:

- Potential panic on send after Close
  `Close()` uses `sync.Once` to close `jc.events` exactly once, but this only prevents double-close, not send-on-closed-channel panics. If `AddEvent()` is called concurrently with `Close()`, the `select` case that sends to `jc.events` can execute after the channel is closed, causing a panic. The `sync.Once` guard does not prevent this race.
  Add an internal closed flag (guarded by a mutex) that `AddEvent` checks before entering the select, or ensure by documented contract and design that `AddEvent` cannot be called after `Close()`. This must be fixed before merging (a Go sketch of one approach follows after the AI-agent prompt below).

- Timeout at enqueue vs. job response (confirmed by TODO)
  The `responseWait` timeout only limits how long to wait to enqueue into the buffered channel, not how long to wait for the peer's job response. With `jobChannelBuffer = 1`, this prevents a second job from being queued while the first is pending, but does not clean up pending jobs when a peer never responds. The TODO comment (lines 34–35) confirms this is a known issue. Move timeout handling to the job lifecycle (pending → failed) in the manager.
🤖 Prompt for AI Agents
In management/server/job/channel.go around lines 11–21, AddEvent can panic if it
races with Close because closing the channel is guarded only by sync.Once; add
an internal boolean closed guarded by a mutex (or RWMutex) that Close sets and
AddEvent checks before attempting to send, returning ErrJobChannelClosed if
closed to avoid send-on-closed panics; also make the send non-blocking/select
with a default path to return an error if the channel is closed or full.
Separately, move the timeout logic out of the enqueue path into the job
lifecycle in the manager: stop using responseWait to gate channel enqueueing
(keep enqueue quick or increase buffer), and implement a manager-side
pending→failed transition with a timer for each job so unresponsive peers are
timed out and cleaned up rather than relying on the channel buffer to enforce
timeouts.
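One possible implementation of the suggested guard (a sketch only; method signatures and the full-buffer error are assumptions). Because both `AddEvent` and `Close` serialize on the same mutex, a send can never hit a closed channel:

```go
// Sketch: guard the events channel with a mutex-protected closed flag so
// AddEvent can never send on a closed channel.
type Channel struct {
	mu     sync.Mutex
	closed bool
	events chan *Event
}

func (jc *Channel) AddEvent(event *Event) error {
	jc.mu.Lock()
	defer jc.mu.Unlock()
	if jc.closed {
		return ErrJobChannelClosed
	}
	select {
	case jc.events <- event:
		return nil
	default:
		// Buffer full (jobChannelBuffer = 1): surface it instead of blocking
		// while holding the lock; the caller decides how to handle it.
		return errors.New("job channel is full")
	}
}

func (jc *Channel) Close() {
	jc.mu.Lock()
	defer jc.mu.Unlock()
	if jc.closed {
		return
	}
	jc.closed = true
	close(jc.events)
}
```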
The code under review (`CreatePeerJob`):

    func (am *DefaultAccountManager) CreatePeerJob(ctx context.Context, accountID, peerID, userID string, job *types.Job) error {
        // todo: Create permissions for job
        allowed, err := am.permissionsManager.ValidateUserPermissions(ctx, accountID, userID, modules.Peers, operations.Delete)
        if err != nil {
            return status.NewPermissionValidationError(err)
        }
        if !allowed {
            return status.NewPermissionDeniedError()
        }

        peerAccountID, err := am.Store.GetAccountIDByPeerID(ctx, store.LockingStrengthNone, peerID)
        if err != nil {
            return err
        }

        if peerAccountID != accountID {
            return status.NewPeerNotPartOfAccountError()
        }

        // check if peer connected
        if !am.jobManager.IsPeerConnected(peerID) {
            return status.Errorf(status.BadRequest, "peer not connected")
        }

        // check if already has pending jobs
        // todo: The job checks here are not protected. The user can run this function from multiple threads,
        // and each thread can think there is no job yet. This means entries in the pending job map will be overwritten,
        // and only one will be kept, but potentially another one will overwrite it in the queue.
        if am.jobManager.IsPeerHasPendingJobs(peerID) {
            return status.Errorf(status.BadRequest, "peer already has pending job")
        }

        jobStream, err := job.ToStreamJobRequest()
        if err != nil {
            return status.Errorf(status.BadRequest, "invalid job request %v", err)
        }

        // try sending job first
        if err := am.jobManager.SendJob(ctx, accountID, peerID, jobStream); err != nil {
            return status.Errorf(status.Internal, "failed to send job: %v", err)
        }

        var peer *nbpeer.Peer
        var eventsToStore func()

        // persist job in DB only if send succeeded
        err = am.Store.ExecuteInTransaction(ctx, func(transaction store.Store) error {
            peer, err = transaction.GetPeerByID(ctx, store.LockingStrengthUpdate, accountID, peerID)
            if err != nil {
                return err
            }
            if err := transaction.CreatePeerJob(ctx, job); err != nil {
                return err
            }

            jobMeta := map[string]any{
                "for_peer_name": peer.Name,
                "job_type":      job.Workload.Type,
            }

            eventsToStore = func() {
                am.StoreEvent(ctx, userID, peer.ID, accountID, activity.JobCreatedByUser, jobMeta)
            }
            return nil
        })
        if err != nil {
            return err
        }
        eventsToStore()
        return nil
    }
Peer job management: ownership check bug, nil‑safety, and ordering concerns
The new peer‑job methods are a good start, but there are a few issues worth addressing:
- Job ownership validation bug (path vs data mismatch)
  - GetPeerJobByID ignores the peerID argument when returning the job:
        job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID) // no check that job.PeerID == peerID
  - The HTTP handler GET /peers/{peerId}/jobs/{jobId} passes both peerID and jobID, but the account manager currently only checks that the peer belongs to the account, not that the job belongs to that peer.
  - Result: a caller can request /peers/{peerA}/jobs/{jobOfPeerB} (within the same account) and see the job for peerB.
  - Fix: after loading the job, assert job.PeerID == peerID and return status.NotFound or status.PermissionDenied otherwise:
        func (am *DefaultAccountManager) GetPeerJobByID(ctx context.Context, accountID, userID, peerID, jobID string) (*types.Job, error) {
            // ... existing permission + peerAccountID checks ...
            job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
            if err != nil {
                return nil, err
            }
            if job.PeerID != peerID {
                return nil, status.NewPeerNotPartOfAccountError()
            }
            return job, nil
        }
- Potential nil‑pointer on am.jobManager
  - CreatePeerJob assumes am.jobManager is non‑nil:
        if !am.jobManager.IsPeerConnected(peerID) { ... }
        if am.jobManager.IsPeerHasPendingJobs(peerID) { ... }
        if err := am.jobManager.SendJob(...); err != nil { ... }
  - If BuildManager is ever called with a nil JobManager (e.g., in older tests or misconfigured environments), this will panic on first use of the job API.
  - Consider a defensive check up front:
        if am.jobManager == nil {
            return status.Errorf(status.Internal, "job manager is not configured")
        }
  - Or make the non‑nil requirement explicit and enforced in construction/tests.
- Send‑before‑persist ordering can desync DB and in‑memory job state
  - The method currently:
    - Checks for pending jobs.
    - Builds jobStream and calls SendJob, which will enqueue/send work to the peer.
    - Only then writes the job to the DB via CreatePeerJob inside a transaction.
  - If step (3) fails (DB error, constraint, etc.), CreatePeerJob returns an error but the job has already been dispatched to the peer, potentially leaving a job executing on the client but absent from persistent storage and from /jobs listings.
  - Consider flipping the order: persist the job first (status = pending), then send it to the peer, and if sending fails, update the job to failed with an appropriate FailedReason. That keeps DB and runtime state consistent at the cost of occasionally having "failed to send" jobs in the DB (a sketch follows below).
- Permissions reused from Peers/Delete
  - Using modules.Peers + operations.Delete as the gate for creating and viewing jobs is strict but workable short‑term. The TODO about introducing dedicated job permissions is valid; when you add those, these sites will be the key ones to update.
Overall structure is solid; tightening ownership checks and jobManager assumptions will make the new job API much safer.
Also applies to: 389-441, 618-638
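A rough sketch of the persist-first ordering discussed in the third point; the direct am.Store.CreatePeerJob call, the job.ID field, and the MarkJobFailed helper are assumptions made for illustration, not the PR's actual API:

```go
// Sketch: persist the job as pending before dispatching it, then flip it to
// failed if the send never leaves the management server.
func (am *DefaultAccountManager) createPeerJobPersistFirst(ctx context.Context, accountID, peerID string, job *types.Job) error {
	// 1. Persist first so the job is always visible in /jobs listings.
	if err := am.Store.CreatePeerJob(ctx, job); err != nil {
		return err
	}

	jobStream, err := job.ToStreamJobRequest()
	if err != nil {
		return err
	}

	// 2. Only then hand it to the connected peer.
	if sendErr := am.jobManager.SendJob(ctx, accountID, peerID, jobStream); sendErr != nil {
		// 3. Hypothetical helper: record why the job never reached the peer.
		if err := am.Store.MarkJobFailed(ctx, accountID, job.ID, "failed to send: "+sendErr.Error()); err != nil {
			return err
		}
		return sendErr
	}
	return nil
}
```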
// ApplyResponse validates and maps a proto.JobResponse into the Job fields.
func (j *Job) ApplyResponse(resp *proto.JobResponse) error {
    if resp == nil {
        return nil
    }

    j.ID = string(resp.ID)
    now := time.Now().UTC()
    j.CompletedAt = &now
    switch resp.Status {
    case proto.JobStatus_succeeded:
        j.Status = JobStatusSucceeded
    case proto.JobStatus_failed:
        j.Status = JobStatusFailed
        if len(resp.Reason) > 0 {
            reason := string(resp.Reason)
            if len(resp.Reason) > MaxJobReasonLength {
                reason = string(resp.Reason[:MaxJobReasonLength]) + "... (truncated)"
            }
            j.FailedReason = fmt.Sprintf("Client error: '%s'", reason)
        }
        return nil
    default:
        return fmt.Errorf("unexpected job status: %v", resp.Status)
    }

    // Handle workload results (oneof)
    var err error
    switch r := resp.WorkloadResults.(type) {
    case *proto.JobResponse_Bundle:
        if j.Workload.Result, err = json.Marshal(r.Bundle); err != nil {
            return fmt.Errorf("failed to marshal workload results: %w", err)
        }
    default:
        return fmt.Errorf("unsupported workload response type: %T", r)
    }
    return nil
}
Likely JSON shape mismatch between proto.BundleResult and api.BundleResult
In ApplyResponse, for a succeeded bundle job you serialize the proto result directly:
case *proto.JobResponse_Bundle:
if j.Workload.Result, err = json.Marshal(r.Bundle); err != nil {
return fmt.Errorf("failed to marshal workload results: %w", err)
}Later, buildBundleResponse does:
var r api.BundleResult
if err := json.Unmarshal(j.Workload.Result, &r); err != nil {
return fmt.Errorf("invalid result for bundle job: %w", err)
}However, the types differ:
- proto.BundleResult has UploadKey String (a protobuf message type, not a plain string).
- api.BundleResult has UploadKey *string.
Encoding a protobuf message with encoding/json will typically produce an object like:
{ "upload_key": { "value": "..." } }whereas api.BundleResult expects:
{ "upload_key": "..." }This will cause json.Unmarshal in buildBundleResponse to fail (cannot unmarshal object into Go value of type string), meaning succeeded jobs will not be convertible to HTTP WorkloadResponse.
I’d strongly recommend normalizing here by mapping the proto result into the HTTP shape instead of serializing the proto struct directly, e.g.:
case *proto.JobResponse_Bundle:
- if j.Workload.Result, err = json.Marshal(r.Bundle); err != nil {
- return fmt.Errorf("failed to marshal workload results: %w", err)
- }
+ apiResult := api.BundleResult{}
+ if r.Bundle != nil {
+ // Assuming proto.String carries a `Value` or similar; adapt field access accordingly.
+ key := r.Bundle.UploadKey // map to string as appropriate
+ apiResult.UploadKey = &key
+ }
+ if j.Workload.Result, err = json.Marshal(apiResult); err != nil {
+ return fmt.Errorf("failed to marshal workload results: %w", err)
+ }(Adjust field access/unwrap logic to whatever String actually exposes.)
This keeps the stored JSON shape aligned with api.BundleResult and avoids runtime errors in BuildWorkloadResponse.
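To see why these two shapes cannot round-trip, here is a small self-contained example with stand-in types (not the generated ones) showing how encoding/json renders a wrapper struct versus a plain string pointer:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Stand-ins for the generated types: a protobuf-style wrapper vs. the HTTP API type.
type wrapperString struct {
	Value string `json:"value"`
}

type protoBundleResult struct {
	UploadKey wrapperString `json:"upload_key"`
}

type apiBundleResult struct {
	UploadKey *string `json:"upload_key,omitempty"`
}

func main() {
	raw, _ := json.Marshal(protoBundleResult{UploadKey: wrapperString{Value: "abc123"}})
	fmt.Println(string(raw)) // {"upload_key":{"value":"abc123"}}

	var out apiBundleResult
	// Fails: an object cannot be unmarshalled into a string pointer.
	if err := json.Unmarshal(raw, &out); err != nil {
		fmt.Println("unmarshal error:", err)
	}
}
```

Running it prints the nested object first and then an unmarshal error, which is exactly the failure buildBundleResponse would hit on a succeeded job.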
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- client/internal/connect.go (7 hunks)
- client/internal/engine.go (13 hunks)
- client/internal/engine_test.go (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
client/internal/engine_test.go (5)
client/internal/engine.go (2)
NewEngine(235-276)EngineConfig(83-142)shared/management/client/mock.go (1)
MockClient(13-24)management/server/job/manager.go (1)
NewJobManager(32-42)management/server/account.go (1)
BuildManager(180-268)management/internals/shared/grpc/server.go (1)
NewServer(84-147)
client/internal/engine.go (7)
client/internal/profilemanager/config.go (1)
Config(89-160)client/jobexec/executor.go (3)
Executor(23-24)NewExecutor(26-28)ErrJobNotImplemented(20-20)shared/management/client/client.go (1)
Client(14-27)shared/management/proto/management.pb.go (19)
JobRequest(388-398)JobRequest(413-413)JobRequest(428-430)JobResponse(463-475)JobResponse(490-490)JobResponse(505-507)JobStatus_failed(30-30)JobRequest_Bundle(457-459)JobRequest_Bundle(461-461)JobStatus_succeeded(29-29)BundleParameters(554-563)BundleParameters(578-578)BundleParameters(593-595)SyncResponse(721-738)SyncResponse(753-753)SyncResponse(768-770)BundleResult(625-631)BundleResult(646-646)BundleResult(661-663)client/internal/state.go (1)
CtxGetState(31-33)client/internal/debug/debug.go (2)
GeneratorDependencies(238-243)BundleConfig(232-236)client/server/server.go (1)
New(96-107)
client/internal/connect.go (3)
client/internal/engine.go (2)
NewEngine(235-276)EngineConfig(83-142)client/internal/profilemanager/config.go (1)
Config(89-160)shared/management/proto/management.pb.go (3)
PeerConfig(2082-2098)PeerConfig(2113-2113)PeerConfig(2128-2130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Management / Integration (amd64, postgres)
- GitHub Check: Management / Benchmark (API) (amd64, postgres)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Management / Benchmark (API) (amd64, sqlite)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Management / Benchmark (amd64, sqlite)
- GitHub Check: Management / Benchmark (amd64, postgres)
- GitHub Check: Client / Unit (386)
- GitHub Check: Client / Unit
- GitHub Check: release
- GitHub Check: release_ui_darwin
- GitHub Check: Android / Build
- GitHub Check: Linux
- GitHub Check: Client / Unit
- GitHub Check: Darwin
- GitHub Check: Windows
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
🔇 Additional comments (16)
client/internal/connect.go (5)
65-66: LGTM! logPath parameter threading is clean.The public Run signature now accepts logPath and forwards it correctly. Mobile variants appropriately default to empty string, which is sensible given platform-specific logging constraints.
Also applies to: 87-87, 105-105
108-108: LGTM!Internal run signature correctly updated to receive logPath.
251-251: LGTM!Call to createEngineConfig correctly passes logPath parameter.
275-275: LGTM!NewEngine call correctly updated with profile config parameter, enabling debug bundle generation support.
414-414: LGTM! EngineConfig enriched for debug bundle support.The function signature and field assignments correctly thread logPath and ProfileConfig through to EngineConfig. Note that the entire profilemanager.Config is stored in ProfileConfig—this appears intentional for debug bundle generation, which may need access to ManagementURL and other config details.
Also applies to: 449-452
client/internal/engine_test.go (3)
28-28: LGTM!Import added to support JobManager usage in test setup.
256-256: LGTM! Test updates are consistent.All NewEngine calls correctly updated with two additional nil parameters (checks, profileConfig). Using nil is appropriate for tests that don't exercise debug bundle or checks functionality.
Also applies to: 418-424, 637-643, 802-808, 1004-1010, 1536-1536
1595-1595: LGTM! Test infrastructure correctly wired.JobManager created and threaded through BuildManager and NewServer, enabling job subsystem testing. Using nil for metrics is appropriate for test context.
Also applies to: 1626-1626, 1632-1632
client/internal/engine.go (8)
34-34: LGTM!Imports added to support debug bundle generation and job execution.
Also applies to: 52-52
138-141: LGTM!EngineConfig fields added to support debug bundle generation. Clear comment documents the purpose.
206-207: LGTM! Proper concurrency primitives added.syncRespMux provides dedicated locking for sync response persistence, separate from syncMsgMux. jobExecutorWG ensures clean shutdown of job-related goroutines.
Also applies to: 220-221
234-234: LGTM!NewEngine signature correctly extended with profile config parameter. jobExecutor initialized during construction for handling management-service job requests.
Also applies to: 254-254
335-335: LGTM! Proper shutdown ordering.jobExecutorWG.Wait() ensures job-related goroutines complete before resource cleanup. The preceding e.cancel() will signal job goroutines to exit via context cancellation.
523-523: LGTM! receiveJobEvents follows established patterns.The function correctly:
- Increments jobExecutorWG before spawning goroutine
- Handles job requests synchronously via callback
- Triggers client restart on Job stream errors (consistent with Management/Signal event handlers)
- Returns proper error codes for unimplemented job types
Also applies to: 942-977
803-814: LGTM! Proper lock separation for sync response persistence.The code correctly uses syncRespMux (not syncMsgMux) to protect sync response persistence state. RLock for reading the flag, then separate write lock for storing the response—proper lock upgrade pattern.
1874-1886: LGTM! Thread-safe sync response persistence.Both functions correctly use syncRespMux for protection. GetLatestSyncResponse properly:
- Reads state under RLock
- Releases lock before expensive clone operation
- Returns a clone to prevent external mutation
Also applies to: 1889-1911
func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
    log.Infof("handle remote debug bundle request: %s", params.String())
    syncResponse, err := e.GetLatestSyncResponse()
    if err != nil {
        log.Warnf("get latest sync response: %v", err)
    }

    bundleDeps := debug.GeneratorDependencies{
        InternalConfig: e.config.ProfileConfig,
        StatusRecorder: e.statusRecorder,
        SyncResponse:   syncResponse,
        LogPath:        e.config.LogPath,
    }

    bundleJobParams := debug.BundleConfig{
        Anonymize:         params.Anonymize,
        IncludeSystemInfo: true,
        LogFileCount:      uint32(params.LogFileCount),
    }

    waitFor := time.Duration(params.BundleForTime) * time.Minute

    uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
    if err != nil {
        return nil, err
    }

    response := &mgmProto.JobResponse_Bundle{
        Bundle: &mgmProto.BundleResult{
            UploadKey: uploadKey,
        },
    }
    return response, nil
}
Potential nil pointer dereference on Line 1001.
The code accesses e.config.ProfileConfig.ManagementURL.String() without checking if ProfileConfig is nil. While tests pass nil, they don't exercise this path. For robustness, consider adding a nil check.
Apply this defensive check:
+ if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+ return nil, errors.New("profile config or management URL not available for bundle generation")
+ }
+
uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
    log.Infof("handle remote debug bundle request: %s", params.String())
    syncResponse, err := e.GetLatestSyncResponse()
    if err != nil {
        log.Warnf("get latest sync response: %v", err)
    }
    bundleDeps := debug.GeneratorDependencies{
        InternalConfig: e.config.ProfileConfig,
        StatusRecorder: e.statusRecorder,
        SyncResponse:   syncResponse,
        LogPath:        e.config.LogPath,
    }
    bundleJobParams := debug.BundleConfig{
        Anonymize:         params.Anonymize,
        IncludeSystemInfo: true,
        LogFileCount:      uint32(params.LogFileCount),
    }
    waitFor := time.Duration(params.BundleForTime) * time.Minute
    if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
        return nil, errors.New("profile config or management URL not available for bundle generation")
    }
    uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
    if err != nil {
        return nil, err
    }
    response := &mgmProto.JobResponse_Bundle{
        Bundle: &mgmProto.BundleResult{
            UploadKey: uploadKey,
        },
    }
    return response, nil
}
🤖 Prompt for AI Agents
In client/internal/engine.go around lines 979 to 1012, the call to
e.config.ProfileConfig.ManagementURL.String() can panic if ProfileConfig or its
ManagementURL is nil; add a defensive nil-check before calling String() and pass
a safe fallback (e.g., empty string or configured default) into
e.jobExecutor.BundleJob. Specifically, ensure e.config and
e.config.ProfileConfig are non-nil and that ProfileConfig.ManagementURL is
non-nil, compute a local managementURL string variable accordingly, and use that
variable in the BundleJob call so the function never dereferences a nil pointer.
}

// GetJobByID fetches job by ID
func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, jobID string) (*types.Job, error) {
This method should also verify the peer this job belongs to
func (s *Server) startResponseReceiver(ctx context.Context, srv proto.ManagementService_JobServer) {
    go func() {
        for {
Ideally we should also check if the ctx is done here or Err() != nil
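A hedged sketch of what that check could look like, using a stand-in stream interface since only Recv matters here; the real receiver also decrypts the message and forwards it to the job manager:

```go
package server

import "context"

// jobStream stands in for proto.ManagementService_JobServer; only Recv matters here.
type jobStream interface {
	Recv() (any, error)
}

// startResponseReceiver sketch: stop looping once the stream context is done
// instead of relying solely on Recv returning an error.
func startResponseReceiver(ctx context.Context, srv jobStream, handle func(any)) {
	go func() {
		for {
			if ctx.Err() != nil {
				return // context cancelled or deadline exceeded
			}
			msg, err := srv.Recv() // Recv still blocks; it unblocks with an error when the stream ends
			if err != nil {
				return
			}
			handle(msg)
		}
	}()
}
```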
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/internal/engine.go (1)
1295-1326: Fix compile error:zone.GetSearchDomainDisabledis not defined onmgmProto.CustomZone
toDNSConfigcurrently does:dnsZone := nbdns.CustomZone{ Domain: zone.GetDomain(), SearchDomainDisabled: zone.GetSearchDomainDisabled(), SkipPTRProcess: zone.GetSkipPTRProcess(), }The pipeline error indicates
GetSearchDomainDisableddoes not exist on*mgmProto.CustomZone, so this code does not compile. You need to either:
- Add a corresponding
search_domain_disabled(or similarly named) field to theCustomZonemessage inshared/management/proto/management.protoand regenerate the Go bindings, or- Drop this mapping for now and rely on the default value in
nbdns.CustomZone.If you just want the current code to build without changing the proto, adjust as:
- dnsZone := nbdns.CustomZone{ - Domain: zone.GetDomain(), - SearchDomainDisabled: zone.GetSearchDomainDisabled(), - SkipPTRProcess: zone.GetSkipPTRProcess(), - } + dnsZone := nbdns.CustomZone{ + Domain: zone.GetDomain(), + SkipPTRProcess: zone.GetSkipPTRProcess(), + }and reintroduce the mapping once the proto field exists.
♻️ Duplicate comments (1)
client/internal/engine.go (1)
958-993: Nil pointer risk when usingProfileConfig.ManagementURLin bundle job handler
handleBundlecalls:uploadKey, err := e.jobExecutor.BundleJob( e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String(), )without checking whether
e.config.ProfileConfigorProfileConfig.ManagementURLare nil. Either being nil will panic when a bundle job arrives.Add a defensive check before constructing
bundleDepsand the URL, and return a failed job response if the configuration is missing. For example:func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) { @@ - syncResponse, err := e.GetLatestSyncResponse() + syncResponse, err := e.GetLatestSyncResponse() @@ - bundleDeps := debug.GeneratorDependencies{ - InternalConfig: e.config.ProfileConfig, + if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil { + return nil, errors.New("profile config or management URL not available for bundle generation") + } + + bundleDeps := debug.GeneratorDependencies{ + InternalConfig: e.config.ProfileConfig, @@ - waitFor := time.Duration(params.BundleForTime) * time.Minute - - uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String()) + waitFor := time.Duration(params.BundleForTime) * time.Minute + managementURL := e.config.ProfileConfig.ManagementURL.String() + + uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, managementURL)This ensures the job handler fails gracefully instead of panicking the client.
Also applies to: 995-1028
🧹 Nitpick comments (7)
shared/management/client/client_test.go (1)
76-76: Consider passing metrics to JobManager instead of nil.The
JobManageris created withnilfor the metrics parameter, but a validmetricsobject is created just below on lines 100-101. Consider reordering the initialization to create metrics first, then pass it toNewJobManager, unless nil metrics is intentionally acceptable for test scenarios.Apply this diff to use the actual metrics object:
+ metrics, err := telemetry.NewDefaultAppMetrics(context.Background()) + require.NoError(t, err) + - jobManager := job.NewJobManager(nil, store) + jobManager := job.NewJobManager(metrics, store) eventStore := &activity.InMemoryEventStore{} ctrl := gomock.NewController(t) t.Cleanup(ctrl.Finish) permissionsManagerMock := permissions.NewMockManager(ctrl) permissionsManagerMock. EXPECT(). ValidateUserPermissions( gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), ). Return(true, nil). AnyTimes() peersManger := peers.NewManager(store, permissionsManagerMock) settingsManagerMock := settings.NewMockManager(ctrl) ia, _ := integrations.NewIntegratedValidator(context.Background(), peersManger, settingsManagerMock, eventStore) - metrics, err := telemetry.NewDefaultAppMetrics(context.Background()) - require.NoError(t, err) -shared/management/proto/management.proto (1)
91-96: Consider adding documentation forBundleParametersfields.The field names could benefit from proto comments for clarity:
bundle_foris ambiguous—it's unclear what this flag enables without context from the HTTP API docsbundle_for_timeshould document its unit (minutes, based on the HTTP API)message BundleParameters { - bool bundle_for = 1; - int64 bundle_for_time = 2; + // Whether to generate a bundle for a specific timeframe. + bool bundle_for = 1; + // Time period in minutes for which to generate the bundle. + int64 bundle_for_time = 2; int32 log_file_count = 3; + // Whether sensitive data should be anonymized in the bundle. bool anonymize = 4; }management/server/account/manager.go (1)
126-128: Normalize parameter ordering for peer job Manager methods
CreatePeerJobtakes(accountID, peerID, userID string, ...), whileGetAllPeerJobsandGetPeerJobByIDuse(accountID, userID, peerID string, ...). This asymmetry is easy to trip over in implementations and mocks.Consider standardizing on one ordering (e.g.
ctx, accountID, userID, peerID, ...) for all three methods and updating implementations accordingly.client/internal/engine_test.go (1)
239-257: Consider centralizing NewEngine test construction to avoid duplicatednil, nilargumentsAll the
NewEngine(...)calls now end with..., peer.NewRecorder("https://mgm"), nil, nil). This is correct for the current signature, but the repeated trailingnil, nilacross multiple tests makes future signature changes easy to miss.Consider a small helper (e.g.
newTestEngine(t, ctx, cfg, mgmtClientOpts...)) that sets up the common arguments (including the last two) so only that helper needs updating ifNewEngineevolves again.Also applies to: 417-425, 637-644, 801-809, 1003-1011, 1536-1537
management/internals/shared/grpc/server.go (2)
55-77: Consider guarding against a niljobManagerin server construction
Servernow depends onjobManager(CreateJobChannel,CloseChannel,HandleResponse), butNewServeraccepts a*job.Managerwithout validating it. If any caller passesnil, theJobRPC will panic. Consider either:
- Requiring non-nil and returning an error from
NewServerifjobManager == nil, or- Making the
Jobhandler explicitly reject calls whenjobManageris nil with a gRPCInternal/Unavailableerror.This keeps failures explicit instead of panicking.
Also applies to: 79-132
175-210: Job streaming flow is coherent; error handling strategy is acceptable but could be tightenedThe
JobRPC handshake (decrypt first message, resolve account/peer, set context keys), background response receiver (startResponseReceiver), andsendJobsLoop/sendJobpipeline all look logically consistent and reuse existing encryption and context patterns fromSync. The goroutine cleanly exits onio.EOF/context.Canceled, and the job channel is closed viadefer CloseChannel.The only notable tradeoff is that
sendJobsLooplogs and returnsnilonsendJobfailures, so the client sees a successful stream close rather than an error status; you already left a TODO to refine this. Given that, the current implementation is reasonable for an initial version.Also applies to: 327-390, 450-473
client/internal/engine.go (1)
235-255: Unusedc *profilemanager.Configparameter inNewEngine
NewEngineaccepts ac *profilemanager.Configargument but doesn’t use it; all configuration is read fromconfig *EngineConfiginstead. Unless future work will usec, consider removing this parameter to avoid confusion or, alternatively, wiringcintoconfig.ProfileConfighere.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- client/proto/daemon.pb.go is excluded by !**/*.pb.go
- go.sum is excluded by !**/*.sum
📒 Files selected for processing (29)
- client/cmd/testutil_test.go (4 hunks)
- client/internal/engine.go (13 hunks)
- client/internal/engine_test.go (10 hunks)
- client/proto/daemon.proto (0 hunks)
- client/server/server.go (4 hunks)
- client/server/server_test.go (4 hunks)
- go.mod (2 hunks)
- management/internals/server/boot.go (1 hunks)
- management/internals/server/controllers.go (3 hunks)
- management/internals/server/modules.go (1 hunks)
- management/internals/shared/grpc/server.go (10 hunks)
- management/server/account.go (4 hunks)
- management/server/account/manager.go (1 hunks)
- management/server/account_test.go (2 hunks)
- management/server/activity/codes.go (2 hunks)
- management/server/dns_test.go (2 hunks)
- management/server/http/handlers/peers/peers_handler.go (3 hunks)
- management/server/http/testing/testing_tools/channel/channel.go (3 hunks)
- management/server/management_proto_test.go (4 hunks)
- management/server/management_test.go (3 hunks)
- management/server/mock_server/account_mock.go (2 hunks)
- management/server/nameserver_test.go (2 hunks)
- management/server/peer.go (1 hunks)
- management/server/peer_test.go (5 hunks)
- management/server/route_test.go (2 hunks)
- shared/management/client/client_test.go (4 hunks)
- shared/management/http/api/openapi.yml (2 hunks)
- shared/management/http/api/types.gen.go (8 hunks)
- shared/management/proto/management.proto (2 hunks)
💤 Files with no reviewable changes (1)
- client/proto/daemon.proto
🚧 Files skipped from review as they are similar to previous changes (12)
- management/server/account_test.go
- management/server/dns_test.go
- management/server/account.go
- management/internals/server/modules.go
- management/server/route_test.go
- management/server/http/testing/testing_tools/channel/channel.go
- management/server/peer_test.go
- client/cmd/testutil_test.go
- management/server/activity/codes.go
- management/server/management_test.go
- management/server/peer.go
- management/server/nameserver_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.
Applied to files:
shared/management/client/client_test.go
🧬 Code graph analysis (10)
management/server/account/manager.go (1)
management/server/types/job.go (1)
Job(34-58)
client/server/server_test.go (3)
management/server/job/manager.go (1)
NewJobManager(32-42)management/server/account.go (1)
BuildManager(178-266)management/internals/shared/grpc/server.go (1)
NewServer(80-132)
management/internals/server/boot.go (1)
management/internals/shared/grpc/server.go (1)
NewServer(80-132)
management/server/mock_server/account_mock.go (1)
management/server/types/job.go (1)
Job(34-58)
management/server/management_proto_test.go (3)
management/server/job/manager.go (1)
NewJobManager(32-42)management/server/account.go (1)
BuildManager(178-266)management/internals/shared/grpc/server.go (1)
NewServer(80-132)
shared/management/client/client_test.go (3)
management/server/job/manager.go (1)
NewJobManager(32-42)management/server/account.go (1)
BuildManager(178-266)management/internals/shared/grpc/server.go (1)
NewServer(80-132)
client/server/server.go (1)
client/status/status.go (1)
ToProtoFullStatus(549-635)
management/internals/server/controllers.go (3)
management/internals/server/server.go (1)
BaseServer(45-68)management/server/job/manager.go (2)
Manager(23-30)NewJobManager(32-42)management/internals/server/container.go (1)
Create(6-10)
management/server/http/handlers/peers/peers_handler.go (4)
management/server/context/auth.go (1)
GetUserAuthFromContext(25-30)shared/management/http/util/util.go (3)
WriteError(84-120)WriteErrorResponse(70-80)WriteJSONObject(27-35)shared/management/http/api/types.gen.go (3)
JobRequest(708-710)JobResponse(713-721)JobResponseStatus(724-724)management/server/types/job.go (3)
NewJob(67-103)Job(34-58)Workload(60-64)
client/internal/engine_test.go (3)
client/internal/engine.go (2)
NewEngine(235-276)EngineConfig(83-142)management/server/job/manager.go (1)
NewJobManager(32-42)management/server/account.go (1)
BuildManager(178-266)
🪛 GitHub Actions: Linux
client/internal/engine.go
[error] 1312-1312: zone.GetSearchDomainDisabled undefined (type *"github.com/netbirdio/netbird/shared/management/proto".CustomZone has no field or method GetSearchDomainDisabled)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Android / Build
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: release_ui
- GitHub Check: JS / Lint
- GitHub Check: iOS / Build
- GitHub Check: Linux
- GitHub Check: Darwin
- GitHub Check: Windows
- GitHub Check: Client / Unit
- GitHub Check: Check External GPL/AGPL Licenses
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
🔇 Additional comments (21)
shared/management/client/client_test.go (2)
27-27: LGTM!The import addition is necessary to support JobManager instantiation in the test setup.
123-123: LGTM!The
BuildManagercall correctly includes thejobManagerparameter in the expected position (5th parameter).management/internals/server/controllers.go (1)
29-33: LGTM!The
JobManager()accessor correctly follows the established lazy initialization pattern using theCreatehelper, consistent with other accessors likePeersUpdateManager()andIntegratedValidator().shared/management/proto/management.proto (2)
51-53: LGTM!The bidirectional streaming RPC correctly follows the encrypted message pattern used by other management service endpoints, maintaining consistency in the API design.
75-89: Potential inconsistency between proto and HTTP API status values.The
JobStatusenum here definesunknown_status,succeeded, andfailed, but the HTTP API types (JobResponseStatusintypes.gen.go) include an additionalpendingstatus. If jobs can be in a pending state before completion, consider adding it to the proto enum for consistency:enum JobStatus { unknown_status = 0; //placeholder - succeeded = 1; - failed = 2; + pending = 1; + succeeded = 2; + failed = 3; }Also,
Reasonon line 84 uses PascalCase while proto convention typically uses snake_case (reason).shared/management/http/api/types.gen.go (1)
1-4: This is auto-generated code.This file is generated by
oapi-codegen. Any issues with the types or structure should be addressed in the source OpenAPI specification rather than this file.management/internals/server/boot.go (1)
148-148: LGTM!The
JobManageris correctly wired into the gRPC server construction, following the established dependency injection pattern. The parameter position matches theNewServerfunction signature inmanagement/internals/shared/grpc/server.go.management/server/management_proto_test.go (2)
343-343: LGTM!The
JobManagerinstantiation withnilmetrics is acceptable for test purposes—the implementation handles nil metrics gracefully.
372-373: LGTM!The test correctly wires
jobManagerinto bothBuildManagerandnbgrpc.NewServer, matching the updated function signatures and maintaining consistency with the production code inboot.go.Also applies to: 386-386
client/server/server.go (2)
31-33: Status conversion now correctly uses shared helperSwitching to
nbstatus.ToProtoFullStatus(fullStatus)centralizes the in‑memory→proto mapping and keeps this server aligned with the status package; the subsequent mutation ofEventsandSshServerStatestill fits the previous behaviour.Also applies to: 1080-1088
1537-1545: ConnectClient.Run wiring matches new signaturePassing
s.logFileintos.connectClient.Run(runningChan, s.logFile)correctly adapts to the new Run signature and ensures the engine has access to the daemon’s log path without altering existing retry semantics.shared/management/http/api/openapi.yml (3)
35-38: Jobs tag fits existing API categorisationAdding the
Jobstag (markedx-experimental) matches the tagging style of other feature areas and clearly scopes the new endpoints.
41-156: Workload and Job schemas are consistent and extensibleThe
WorkloadType/Bundle*schemas plusWorkloadRequest/ResponseandJobRequest/JobResponseare structured cleanly:
- Discriminator on
typewithbundlemapping is valid and forward‑extensible.JobResponserequired fields avoid nullable properties (completed_at,failed_reason) while guaranteeingid,status,triggered_by, andworkload.- Bundle parameter/result fields match the described bundle use case.
No structural issues from an OpenAPI standpoint.
2353-2456: Peer Job endpoints integrate cleanly into peers API
/api/peers/{peerId}/jobs(GET/POST) and/api/peers/{peerId}/jobs/{jobId}(GET):
- Use the new
JobRequest/JobResponseschemas correctly.- Reuse
BearerAuthandTokenAuthsecurity consistent with other peer operations.- Return an array for list, 201 with a single
JobResponsefor create, and 200 for get.The shape and placement of these endpoints look correct and consistent with the rest of the spec.
client/server/server_test.go (1)
23-24: JobManager wiring in management test server matches production setupCreating a
jobManager := job.NewJobManager(nil, store)and threading it intoserver.BuildManager(...)andnbgrpc.NewServer(...)keeps the test server aligned with the new job‑enabled construction path without altering existing behaviour of this test.Also applies to: 298-332
client/internal/engine_test.go (1)
27-29: Engine tests correctly integrate JobManager into management server startupImporting
management/server/job, constructingjobManager := job.NewJobManager(nil, store), and passing it through toserver.BuildManager(...)andnbgrpc.NewServer(...)keeps this test’s management server in sync with the new job‑aware construction path and should let engine tests run against the same wiring as production.Also applies to: 1595-1636
management/server/http/handlers/peers/peers_handler.go (2)
31-41: Peer job HTTP endpoints and handlers look consistent with existing patternsThe new
/peers/{peerId}/jobsand/peers/{peerId}/jobs/{jobId}routes plusCreateJob,ListJobs, andGetJobfollow the existing handler conventions: user auth from context,mux.Varsfor IDs, JSON decoding with proper 400 on parse failure, and error propagation viautil.WriteError. Delegation toaccountManagerfor authorization and data access is appropriate. I don’t see correctness or security issues in this flow.Also applies to: 51-142
594-614: Job→API response mapping is correct and defensive
toSingleJobResponsebuildsapi.JobResponsefromtypes.Jobcleanly, including building the workload viaBuildWorkloadResponse, handling optionalFailedReason, and preserving timestamps and status. This matches the shared API types and is safe as long asBuildWorkloadResponsereturns a non-nil workload whenerr == nil, which it’s already checked for.management/server/mock_server/account_mock.go (1)
132-155: Job-related mock hooks are wired correctlyThe new
CreatePeerJobFunc,GetAllPeerJobsFunc, andGetPeerJobByIDFuncfields and their methods mirror the existing mock style (delegate if set, otherwise returncodes.Unimplemented). This should integrate cleanly with the expandedaccount.Managerinterface and make job behavior easily configurable in tests.client/internal/engine.go (2)
205-212: Sync response persistence refactor withsyncRespMuxlooks correctThe move from using
syncMsgMuxto a dedicatedsyncRespMuxforpersistSyncResponse/latestSyncResponseremoves the race between sync handling and debug bundle reads:
handleSyncnow checkspersistSyncResponseunderRLockand conditionally writeslatestSyncResponseunderLock.SetSyncResponsePersistenceandGetLatestSyncResponseboth operate entirely undersyncRespMux, withGetLatestSyncResponsecloning before returning.The lock usage avoids deadlocks (no upgrade while holding
RLock) and keeps sync message processing serialized viasyncMsgMuxas before.Also applies to: 770-841, 1891-1929
958-993: Job event client wiring matches existing management/signal patterns
receiveJobEventsmirrors the existingreceiveManagementEvents/receiveSignalEventspatterns: it uses a dedicated wait group, runsmgmClient.Jobwith the engine context, maps terminal errors to a global client restart (ErrResetConnection+clientCancel), and defaults job status to failed unless the handler succeeds. That’s a good, conservative default and makes job-stream failures consistent with other control-plane failures.
Actionable comments posted: 0
🧹 Nitpick comments (1)
shared/management/client/client_test.go (1)
76-135: ConstructjobManagerwith real metrics instead ofnilto match production wiring.Right now the job manager is created with a
nilmetrics handle even thoughmetricsis initialized later viatelemetry.NewDefaultAppMetrics. To avoid potential nil dereferences inside the job manager and to keep the test server wiring closer to production, it’s better to build the job manager with the real metrics instance after it’s created. The rest of the wiring (NewController,BuildManager,nbgrpc.NewServer) matches the signatures shown in the referenced snippets. Based on relevant_code_snippets frommanagement/server/job/manager.go,management/server/account.go, andmanagement/internals/shared/grpc/server.go.You could adjust as follows:
- jobManager := job.NewJobManager(nil, store) - eventStore := &activity.InMemoryEventStore{} + eventStore := &activity.InMemoryEventStore{} @@ - metrics, err := telemetry.NewDefaultAppMetrics(context.Background()) - require.NoError(t, err) + metrics, err := telemetry.NewDefaultAppMetrics(context.Background()) + require.NoError(t, err) + jobManager := job.NewJobManager(metrics, store)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- shared/management/proto/management.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (1)
- shared/management/client/client_test.go (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.
Applied to files:
shared/management/client/client_test.go
🧬 Code graph analysis (1)
shared/management/client/client_test.go (4)
management/server/job/manager.go (1)
NewJobManager(32-42)management/internals/controllers/network_map/controller/controller.go (1)
NewController(75-111)management/internals/modules/peers/ephemeral/manager/ephemeral.go (1)
NewEphemeralManager(53-61)management/server/account.go (1)
BuildManager(178-266)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: release_ui
- GitHub Check: release
- GitHub Check: release_ui_darwin
- GitHub Check: Build Cache
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
- GitHub Check: Darwin
- GitHub Check: Linux
- GitHub Check: Windows
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
🔇 Additional comments (1)
shared/management/client/client_test.go (1)
20-28: New imports for ephemeral manager and job manager look correct and are properly used.The
ephemeral_manageralias andjobpackage are both referenced later instartManagement(forNewEphemeralManagerandNewJobManagerrespectively), so the import changes are consistent and compile-time safe.
Actionable comments posted: 7
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- management/server/job/manager.go (1 hunks)
- management/server/management_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- management/server/management_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
management/server/job/manager.go (3)
management/server/job/channel.go (2)
Channel(18-21)NewChannel(23-29)management/server/store/store.go (1)
Store(50-211)management/server/types/job.go (1)
Job(34-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Management / Benchmark (amd64, postgres)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Management / Integration (amd64, sqlite)
- GitHub Check: Management / Benchmark (API) (amd64, postgres)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Integration (amd64, postgres)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Signal / Unit (386)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Client / Unit (386)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Management / Benchmark (API) (amd64, sqlite)
- GitHub Check: Management / Benchmark (amd64, sqlite)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: Windows
- GitHub Check: Darwin
- GitHub Check: Linux
- GitHub Check: Android / Build
- GitHub Check: Client / Unit
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: release_ui
- GitHub Check: JS / Lint
🔇 Additional comments (5)
management/server/job/manager.go (5)
17-21: LGTM!The Event struct is a clean container for tracking job request/response pairs.
23-30: LGTM!The Manager struct is properly designed with appropriate fields for concurrent job management.
32-42: LGTM!The constructor properly initializes all Manager fields with sensible defaults.
153-159: LGTM!The method correctly checks peer connection status with appropriate read lock usage.
161-171: LGTM!The method correctly checks for pending jobs with appropriate read lock usage. The O(n) scan is acceptable for typical workloads.
// CreateJobChannel creates or replaces a channel for a peer
func (jm *Manager) CreateJobChannel(ctx context.Context, accountID, peerID string) *Channel {
    // all pending jobs stored in db for this peer should be failed
    if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Pending job cleanup: marked as failed automatically due to being stuck too long"); err != nil {
        log.WithContext(ctx).Error(err.Error())
    }

    jm.mu.Lock()
    defer jm.mu.Unlock()

    if ch, ok := jm.jobChannels[peerID]; ok {
        ch.Close()
        delete(jm.jobChannels, peerID)
    }

    ch := NewChannel()
    jm.jobChannels[peerID] = ch
    return ch
}
Race condition: Store operation before lock acquisition.
Lines 46-49 call MarkPendingJobsAsFailed before acquiring the lock at line 51. This creates a race with other methods (e.g., SendJob, HandleResponse) that may be modifying the pending map or the peer's state concurrently. The store operation and in-memory cleanup should be atomic.
Consider acquiring the lock before the store operation, or ensure the store call and subsequent in-memory cleanup are properly coordinated:
func (jm *Manager) CreateJobChannel(ctx context.Context, accountID, peerID string) *Channel {
+ jm.mu.Lock()
+ defer jm.mu.Unlock()
+
// all pending jobs stored in db for this peer should be failed
if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Pending job cleanup: marked as failed automatically due to being stuck too long"); err != nil {
log.WithContext(ctx).Error(err.Error())
}
- jm.mu.Lock()
- defer jm.mu.Unlock()
-
if ch, ok := jm.jobChannels[peerID]; ok {
ch.Close()
delete(jm.jobChannels, peerID)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// CreateJobChannel creates or replaces a channel for a peer
func (jm *Manager) CreateJobChannel(ctx context.Context, accountID, peerID string) *Channel {
    jm.mu.Lock()
    defer jm.mu.Unlock()

    // all pending jobs stored in db for this peer should be failed
    if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Pending job cleanup: marked as failed automatically due to being stuck too long"); err != nil {
        log.WithContext(ctx).Error(err.Error())
    }

    if ch, ok := jm.jobChannels[peerID]; ok {
        ch.Close()
        delete(jm.jobChannels, peerID)
    }

    ch := NewChannel()
    jm.jobChannels[peerID] = ch
    return ch
}
🤖 Prompt for AI Agents
management/server/job/manager.go lines 44-62: the call to
jm.Store.MarkPendingJobsAsFailed runs before jm.mu is acquired, causing a race
with in-memory maps; move the critical section so the mutex is locked before
coordinating the store update and in-memory cleanup: acquire jm.mu.Lock() at the
top of the function (or at least before any access/modification of
jm.jobChannels/pending state), defer jm.mu.Unlock(), then call
jm.Store.MarkPendingJobsAsFailed while holding the lock and proceed with
closing/deleting the old channel and creating the new one; if holding the lock
during the store call is unacceptable for latency, introduce a dedicated
finer-grained lock or perform an in-memory state flip under the lock and persist
the change afterwards in a safe, coordinated way.
// SendJob sends a job to a peer and tracks it as pending
func (jm *Manager) SendJob(ctx context.Context, accountID, peerID string, req *proto.JobRequest) error {
    jm.mu.RLock()
    ch, ok := jm.jobChannels[peerID]
    jm.mu.RUnlock()
    if !ok {
        return fmt.Errorf("peer %s has no channel", peerID)
    }

    event := &Event{
        PeerID:  peerID,
        Request: req,
    }

    jm.mu.Lock()
    jm.pending[string(req.ID)] = event
    jm.mu.Unlock()

    if err := ch.AddEvent(ctx, jm.responseWait, event); err != nil {
        jm.cleanup(ctx, accountID, string(req.ID), err.Error())
        return err
    }

    return nil
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, find the exact location and check file size
find . -name "manager.go" -path "*/job/*" | head -5Repository: netbirdio/netbird
Length of output: 95
🏁 Script executed:
#!/bin/bash
# Get the file to understand the context
wc -l management/server/job/manager.goRepository: netbirdio/netbird
Length of output: 97
🏁 Script executed:
#!/bin/bash
# Read the SendJob method and surrounding code
head -150 management/server/job/manager.go | tail -100Repository: netbirdio/netbird
Length of output: 2890
🏁 Script executed:
#!/bin/bash
# Read the full manager.go file to understand Channel type and AddEvent
cat -n management/server/job/manager.goRepository: netbirdio/netbird
Length of output: 5836
🏁 Script executed:
#!/bin/bash
# Search for the Channel type definition and AddEvent method
fd "channel" --type f -e go | xargs grep -l "func.*AddEvent" | head -5Repository: netbirdio/netbird
Length of output: 93
🏁 Script executed:
#!/bin/bash
# Search in common locations for Channel implementation
find . -name "*.go" -path "*/job/*" -exec grep -l "func.*AddEvent" {} \;Repository: netbirdio/netbird
Length of output: 95
🏁 Script executed:
#!/bin/bash
# Read the Channel implementation
cat -n management/server/job/channel.goRepository: netbirdio/netbird
Length of output: 1657
🏁 Script executed:
#!/bin/bash
# Check if there are any other places that might call CloseChannel or CreateJobChannel to understand call patterns
rg "CloseChannel|CreateJobChannel" --type go -B 2 -A 2Repository: netbirdio/netbird
Length of output: 15401
🏁 Script executed:
#!/bin/bash
# Check what happens when sending on a closed channel in the current code
# to confirm it would panic
grep -n "jc.events <-" management/server/job/channel.goRepository: netbirdio/netbird
Length of output: 89
🏁 Script executed:
#!/bin/bash
# Verify the actual call site in grpc server to understand the flow
grep -B 5 -A 15 "sendJobsLoop" management/internals/shared/grpc/server.go | head -40Repository: netbirdio/netbird
Length of output: 1469
Race condition: Channel reference becomes stale after lock release.
After releasing the read lock at line 68, another goroutine can call CloseChannel (e.g., when a gRPC stream ends via sendJobsLoop's defer), which closes the channel and deletes it from the map. The ch reference obtained at line 67 then points to a closed channel. When AddEvent at line 82 attempts to send on the closed channel (line 38 of channel.go: jc.events <- event), it will panic with "send on closed channel".
This can occur in production when:
- CreateJobChannel creates a channel for a peer
- SendJob acquires a reference to the channel and releases the read lock
- The gRPC stream terminates and CloseChannel is called via the defer in sendJobsLoop
- SendJob resumes and calls AddEvent on the now-closed channel, causing a panic
Hold the lock across the entire operation or verify the channel still exists after acquiring the write lock:
func (jm *Manager) SendJob(ctx context.Context, accountID, peerID string, req *proto.JobRequest) error {
- jm.mu.RLock()
+ jm.mu.Lock()
+ defer jm.mu.Unlock()
+
ch, ok := jm.jobChannels[peerID]
- jm.mu.RUnlock()
if !ok {
return fmt.Errorf("peer %s has no channel", peerID)
}
event := &Event{
PeerID: peerID,
Request: req,
}
-
- jm.mu.Lock()
jm.pending[string(req.ID)] = event
- jm.mu.Unlock()
if err := ch.AddEvent(ctx, jm.responseWait, event); err != nil {
jm.cleanup(ctx, accountID, string(req.ID), err.Error())
return err
}
return nil
}

Note: This assumes ch.AddEvent can be called safely while holding the Manager's lock. Since AddEvent uses a 5-minute timeout, a more defensive approach would be to re-check the channel exists in the map after acquiring the write lock, then release the lock before calling AddEvent.
Committable suggestion skipped: line range outside the PR's diff.
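As an illustration of that more defensive approach, here is a minimal sketch against the Manager fields quoted above (`mu`, `jobChannels`, `pending`, `responseWait`). It is an assumption-level sketch, not the project's actual fix, and it still relies on `AddEvent` tolerating a channel that gets closed after the lock is released:

```go
func (jm *Manager) SendJob(ctx context.Context, accountID, peerID string, req *proto.JobRequest) error {
	event := &Event{PeerID: peerID, Request: req}

	// Register the pending job and re-verify the channel under the write lock,
	// so a CloseChannel that already ran cannot leave a stale reference unnoticed.
	jm.mu.Lock()
	ch, ok := jm.jobChannels[peerID]
	if !ok {
		jm.mu.Unlock()
		return fmt.Errorf("peer %s has no channel", peerID)
	}
	jm.pending[string(req.ID)] = event
	jm.mu.Unlock()

	// AddEvent can block up to responseWait, so it runs outside the lock.
	// It must still convert a concurrently closed channel into an error
	// (for example ErrJobChannelClosed) instead of panicking.
	if err := ch.AddEvent(ctx, jm.responseWait, event); err != nil {
		jm.cleanup(ctx, accountID, string(req.ID), err.Error())
		return err
	}
	return nil
}
```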
	for jobID, ev := range jm.pending {
		if ev.PeerID == peerID {
			// if the client disconnect and there is pending job then marke it as failed
			if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Time out peer disconnected"); err != nil {
				log.WithContext(ctx).Errorf("failed to mark pending jobs as failed: %v", err)
			}
			delete(jm.pending, jobID)
		}
	}
Redundant store calls: MarkPendingJobsAsFailed called in loop.
Line 132 calls MarkPendingJobsAsFailed for each pending job of the peer. This method likely marks ALL pending jobs for the peer in a single call, making the repeated calls redundant and potentially causing performance issues or unintended side effects.
Call the method once, outside the loop:
+ // Mark all pending jobs for this peer as failed in the store
+ if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Time out peer disconnected"); err != nil {
+ log.WithContext(ctx).Errorf("failed to mark pending jobs as failed: %v", err)
+ }
+
+ // Clean up in-memory pending entries
for jobID, ev := range jm.pending {
if ev.PeerID == peerID {
- // if the client disconnect and there is pending job then marke it as failed
- if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Time out peer disconnected"); err != nil {
- log.WithContext(ctx).Errorf("failed to mark pending jobs as failed: %v", err)
- }
delete(jm.pending, jobID)
}
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Mark all pending jobs for this peer as failed in the store
if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Time out peer disconnected"); err != nil {
	log.WithContext(ctx).Errorf("failed to mark pending jobs as failed: %v", err)
}

// Clean up in-memory pending entries
for jobID, ev := range jm.pending {
	if ev.PeerID == peerID {
		delete(jm.pending, jobID)
	}
}
🤖 Prompt for AI Agents
In management/server/job/manager.go around lines 129 to 137, the code calls
jm.Store.MarkPendingJobsAsFailed inside a loop for every pending job of the same
peer which is redundant and inefficient; move the single call to
MarkPendingJobsAsFailed outside the loop so it runs once per peer (before or
after iterating to delete matching entries), and only delete jm.pending entries
in the loop — ensure the error handling remains and log the error if the single
store call fails.
Actionable comments posted: 2
♻️ Duplicate comments (2)
management/server/types/job.go (1)
120-138: Potential JSON shape mismatch between stored proto.BundleResult and api.BundleResult
`ApplyResponse` stores `resp.WorkloadResults.(*proto.JobResponse_Bundle).Bundle` by JSON-marshalling the proto struct, while `buildBundleResponse` later unmarshals that blob into `api.BundleResult`. If `proto.BundleResult.UploadKey` is still a wrapped type (e.g. a protobuf `String` message) rather than a plain `string`, the JSON will be `{ "upload_key": { "value": "..." } }`, which will fail to unmarshal into `api.BundleResult`'s `UploadKey *string`. Consider mapping the proto result into an `api.BundleResult` and marshalling that instead, or updating the proto field to a plain `string`, so the stored JSON shape matches the HTTP type. To confirm the field type and JSON shape, please run:
#!/bin/bash
# Inspect BundleResult in the proto package and compare to HTTP BundleResult
rg -n "message BundleResult" shared/management/proto -n -C3 || true
rg -n "type BundleResult struct" shared/management/http/api -n -C5 || true

Also applies to: 164-201
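A possible shape for the first option, hedged because the exact accessor on the generated proto type has not been verified here (a plain-string `GetUploadKey` is an assumption, and `encoding/json` plus the `api` package are assumed imported):

```go
// marshalBundleResult is a sketch: it maps the proto result onto the HTTP
// api.BundleResult before persisting, so the stored JSON always matches what
// buildBundleResponse later unmarshals. If the proto field turns out to be a
// wrapper type, it must be unwrapped here first.
func marshalBundleResult(b *proto.BundleResult) ([]byte, error) {
	uploadKey := b.GetUploadKey() // assumed plain-string accessor on the generated type
	return json.Marshal(api.BundleResult{UploadKey: &uploadKey})
}
```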
management/server/job/manager.go (1)
24-32: Race: SendJob can panic sending on a channel closed by CloseChannel/CreateJobChannel, and old streams can close new channels
`SendJob` grabs a `*Channel` under `RLock`, releases the mutex, then calls `ch.AddEvent`, while `CloseChannel` and `CreateJobChannel` can concurrently call `ch.Close()` on the same channel under `Lock`. This leaves a window where `AddEvent` writes into a channel that has just been closed, which will panic with "send on closed channel." Additionally, because `CloseChannel` looks up the channel only by `peerID`, a previous Job stream's deferred `CloseChannel` can close and delete a new channel created for the same peer by a later Job stream. You should either (a) ensure channels are not closed while `AddEvent` may be sending (e.g. by coordinating via the same mutex or a per-channel state flag), or (b) make `AddEvent` robust to a concurrently closed channel (e.g. recovering from the panic and returning a typed `ErrJobChannelClosed`, which `SendJob` already routes into `cleanup`).

Also applies to: 34-45, 47-65, 68-91, 130-149, 151-162
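For option (b), one minimal way to harden the send side is to recover inside `AddEvent` and translate the panic into the existing typed error. This is only a sketch; it assumes `Channel` keeps an `events` channel and a timeout-based select, which may not match the current implementation exactly:

```go
// AddEvent enqueues the event for the peer's Job stream. A send on a channel
// that was closed concurrently is recovered and reported as
// ErrJobChannelClosed, so SendJob can run its cleanup path instead of panicking.
func (jc *Channel) AddEvent(ctx context.Context, timeout time.Duration, event *Event) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = ErrJobChannelClosed
		}
	}()

	select {
	case jc.events <- event:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(timeout):
		return fmt.Errorf("timed out waiting to enqueue job event for peer %s", event.PeerID)
	}
}
```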
🧹 Nitpick comments (2)
management/internals/shared/grpc/server.go (1)
446-449: Minor: comment still refers to "update message" instead of "job message"
The comment above `sendJob` mentions "update message" and "sync server", which is copy-pasted from the Sync path; updating it to refer to "job message"/"Job server" would avoid confusion.
management/server/store/sql_store.go (1)
109-118: Job persistence methods are scoped correctly; consider checking RowsAffected in CompletePeerJob
`CreatePeerJob`, `MarkPendingJobsAsFailed`, `MarkAllPendingJobsAsFailed`, `GetPeerJobByID`, and `GetPeerJobs` all use appropriate account/peer scoping and statuses, which should keep job rows well-bounded. In `CompletePeerJob`, however, a missing job ID currently yields a successful return even if no rows were updated; consider checking `result.RowsAffected` and returning a NotFound-style error when it is zero to avoid silently losing job completions.

Also applies to: 136-157, 159-193, 195-225
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
management/internals/modules/peers/manager.go(2 hunks)management/internals/server/controllers.go(3 hunks)management/internals/shared/grpc/server.go(10 hunks)management/server/job/manager.go(1 hunks)management/server/store/sql_store.go(4 hunks)management/server/store/store.go(1 hunks)management/server/types/job.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
management/internals/modules/peers/manager.go (1)
management/server/store/store.go (1)
LockingStrengthNone(47-47)
management/server/types/job.go (1)
shared/management/http/api/types.gen.go (7)
JobRequest(708-710)WorkloadResponse(1948-1950)BundleParameters(361-373)BundleResult(376-378)BundleWorkloadResponse(391-399)WorkloadTypeBundle(195-195)JobResponse(713-721)
management/server/job/manager.go (3)
management/server/job/channel.go (2)
Channel(18-21)NewChannel(23-29)management/server/store/store.go (1)
Store(50-213)management/internals/modules/peers/manager.go (1)
Manager(23-33)
management/server/store/store.go (1)
management/server/types/job.go (1)
Job(34-58)
management/server/store/sql_store.go (2)
management/server/types/job.go (3)
Job(34-58)JobStatusPending(18-18)JobStatusFailed(20-20)management/server/store/store.go (2)
LockingStrength(40-40)LockingStrengthNone(47-47)
🔇 Additional comments (6)
management/server/types/job.go (1)
66-103: Job creation and bundle parameter validation look consistent
`NewJob` and `validateAndBuildBundleParams` correctly derive the workload type, enforce tight bounds on `BundleForTime` and `LogFileCount`, and persist a normalized JSON parameters blob; this should interoperate cleanly with `buildStreamBundleResponse` and the proto `BundleParameters`.

Also applies to: 140-162
management/internals/server/controllers.go (1)
29-33: JobManager wiring via BaseServer accessor looks correct
The `JobManager()` accessor follows the existing `Create` pattern and passes the expected dependencies (`Metrics`, `Store`, `PeersManager`) into `job.NewJobManager`, so the wiring is coherent with the rest of the server.
management/internals/modules/peers/manager.go (1)
23-33: GetPeerID helper cleanly delegates to store
Extending `Manager` with `GetPeerID` and delegating to `store.GetPeerIDByKey` with `LockingStrengthNone` is straightforward and keeps peer-ID resolution centralized in the store.

Also applies to: 165-167
management/server/job/manager.go (1)
93-128: HandleResponse flow (pending lookup, peer validation, DB completion) looks sound
Within the manager lock, `HandleResponse` now looks up the pending event by jobID, resolves the responding peer ID via `peersManager.GetPeerID`, enforces equality with `event.PeerID`, applies the proto response onto a `types.Job`, persists with `CompletePeerJob`, and only then deletes the pending entry; this fixes earlier concerns about peer spoofing and premature pending cleanup.
management/internals/shared/grpc/server.go (1)
79-132: Job streaming wiring and handshake/response receiver are coherent
The `Job` handler's handshake, account/peer lookup, response receiver startup, and per-peer `CreateJobChannel`/`sendJobsLoop` integration follow the existing Sync patterns and correctly tie into `job.Manager` and the store; the response receiver also exits cleanly on EOF/context cancellation while validating and dispatching decrypted `JobResponse`s.

Also applies to: 175-207, 324-337, 339-362, 364-386, 446-469
management/server/store/store.go (1)
40-48: Store interface extensions for jobs and GetPeerIDByKey look appropriate
Adding job lifecycle methods and `GetPeerIDByKey` to `Store` matches the SqlStore implementation and the new job/peers flows; just ensure any non-SQL `Store` implementations are updated accordingly to avoid build breakage.
If you have other `Store` implementations (e.g. a FileStore), please confirm they now implement these methods as well.

Also applies to: 206-213
func (s *Server) sendJobsLoop(ctx context.Context, accountID string, peerKey wgtypes.Key, peer *nbpeer.Peer, updates *job.Channel, srv proto.ManagementService_JobServer) error {
	// todo figure out better error handling strategy
	defer s.jobManager.CloseChannel(ctx, accountID, peer.ID)

	for {
		event, err := updates.Event(ctx)
		if err != nil {
			if errors.Is(err, job.ErrJobChannelClosed) {
				log.WithContext(ctx).Debugf("jobs channel for peer %s was closed", peerKey.String())
				return nil
			}

			// happens when connection drops, e.g. client disconnects
			log.WithContext(ctx).Debugf("stream of peer %s has been closed", peerKey.String())
			return ctx.Err()
		}

		if err := s.sendJob(ctx, peerKey, event, srv); err != nil {
			log.WithContext(ctx).Warnf("send job failed: %v", err)
			return nil
		}
	}
}
Job send failures are swallowed, returning OK status to the client
In sendJobsLoop, if sendJob returns a non‑nil error (including internal encryption or Send failures), the error is only logged and the loop returns nil, so the gRPC Job call completes with an OK status even though the job delivery failed. Consider propagating the error (or mapping it to a suitable gRPC status) so callers and observability can distinguish successful from failed job streams.
Also applies to: 446-469
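One way to surface the failure, sketched with the standard gRPC status package; the chosen code is illustrative, and the import is aliased here on the assumption that the management server already uses its own `status` package:

```go
import (
	"google.golang.org/grpc/codes"
	grpcstatus "google.golang.org/grpc/status"
)

// inside sendJobsLoop, replacing the silent `return nil`
if err := s.sendJob(ctx, peerKey, event, srv); err != nil {
	log.WithContext(ctx).Warnf("send job failed: %v", err)
	// Surface the failure to the caller and to observability instead of
	// completing the Job stream with an OK status.
	return grpcstatus.Errorf(codes.Internal, "failed to send job to peer %s: %v", peerKey.String(), err)
}
```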
Actionable comments posted: 1
♻️ Duplicate comments (1)
management/server/store/sql_store.go (1)
146-157: Address past review feedback: Verify peer ownership in CompletePeerJob.
As noted in past reviews, this method should verify that the job being completed actually belongs to the peer making the request. Currently, it only checks the job ID without validating the peer_id, which could allow a peer to complete another peer's job.
Apply this diff to add peer verification:

 func (s *SqlStore) CompletePeerJob(ctx context.Context, job *types.Job) error {
 	result := s.db.
 		Model(&types.Job{}).
-		Where(idQueryCondition, job.ID).
+		Where(accountAndIDQueryCondition+" AND id = ?", job.AccountID, job.PeerID, job.ID).
 		Updates(job)
 	if result.Error != nil {
 		log.WithContext(ctx).Errorf("failed to update job in store: %s", result.Error)
 		return status.Errorf(status.Internal, "failed to update job in store")
 	}
+
+	if result.RowsAffected == 0 {
+		return status.Errorf(status.NotFound, "job %s not found or does not belong to peer %s", job.ID, job.PeerID)
+	}
+
 	return nil
 }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
management/server/store/sql_store.go(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
management/server/store/sql_store.go (2)
management/server/types/job.go (3)
Job(34-58)JobStatusPending(18-18)JobStatusFailed(20-20)management/server/store/store.go (2)
LockingStrength(40-40)LockingStrengthNone(47-47)
🔇 Additional comments (3)
management/server/store/sql_store.go (3)
43-52: LGTM: Constants follow existing patterns.
The new `accountAndPeerIDQueryCondition` constant is consistent with the existing query condition constants and will be used for job filtering operations.
117-117: LGTM: Job entity added to auto-migration.
The addition of `&types.Job{}` to the AutoMigrate call ensures the jobs table schema is properly created and maintained.
4221-4239: LGTM: GetPeerIDByKey correctly uses GetKeyQueryCondition helper.
The method properly uses `GetKeyQueryCondition(s)` at line 4230 to handle MySQL's reserved `key` identifier, addressing the concern raised in past reviews about MySQL compatibility.
// GetJobByID fetches job by ID
func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, jobID string) (*types.Job, error) {
	var job types.Job
	err := s.db.
		Where(accountAndIDQueryCondition, accountID, jobID).
		First(&job).Error
	if errors.Is(err, gorm.ErrRecordNotFound) {
		return nil, status.Errorf(status.NotFound, "job %s not found", jobID)
	}
	if err != nil {
		log.WithContext(ctx).Errorf("failed to fetch job from store: %s", err)
		return nil, err
	}
	return &job, nil
}
Address past review feedback: Add peer ownership verification to GetPeerJobByID.
As noted in past reviews, this method should verify that the job belongs to a specific peer when called from a peer context. Currently, it only checks accountID and jobID, which could allow any authenticated entity in the account to view other peers' jobs.
Consider either:
- Adding a `peerID` parameter and verifying ownership, OR
- Renaming this method to indicate it's for admin/management access only (e.g., `GetJobByIDForAccount`)
If peer verification is required, apply this diff:
-func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, jobID string) (*types.Job, error) {
+func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, peerID, jobID string) (*types.Job, error) {
var job types.Job
err := s.db.
- Where(accountAndIDQueryCondition, accountID, jobID).
+ Where(accountAndPeerIDQueryCondition+" AND id = ?", accountID, peerID, jobID).
First(&job).Error
if errors.Is(err, gorm.ErrRecordNotFound) {
return nil, status.Errorf(status.NotFound, "job %s not found", jobID)
}

Committable suggestion skipped: line range outside the PR's diff.
| return "", fmt.Errorf("upload debug bundle: %w", err) | ||
| } | ||
|
|
||
| log.Infof("debug bundle has been generated well") |
-	log.Infof("debug bundle has been generated well")
+	log.Infof("debug bundle has been generated successfully")
Actionable comments posted: 0
♻️ Duplicate comments (1)
client/jobexec/executor.go (1)
63-63: Fix grammatical error in log message.
The message "generated well" is grammatically incorrect and misleading since at this point both generation and upload have succeeded.

-	log.Infof("debug bundle has been generated well")
+	log.Infof("debug bundle has been uploaded successfully")
🧹 Nitpick comments (1)
client/jobexec/executor.go (1)
51-61: Consider retry-friendly cleanup strategy on upload failure.
Currently, the deferred `os.Remove` deletes the bundle file regardless of upload outcome. If the upload fails, the generated bundle is lost, requiring full regeneration for any retry attempt.
Consider keeping the file on upload failure to enable retry, or implementing retry logic within this method before cleanup.

-	defer func() {
-		if err := os.Remove(path); err != nil {
-			log.Errorf("failed to remove debug bundle file: %v", err)
-		}
-	}()
-
 	key, err := debug.UploadDebugBundle(ctx, types.DefaultBundleURL, mgmURL, path)
 	if err != nil {
 		log.Errorf("failed to upload debug bundle: %v", err)
 		return "", fmt.Errorf("upload debug bundle: %w", err)
 	}
+
+	if err := os.Remove(path); err != nil {
+		log.Errorf("failed to remove debug bundle file: %v", err)
+	}

Alternatively, implement retry logic before cleanup if transient upload failures are expected.
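If retries are wanted, a small wrapper around the upload step could look like the sketch below; the attempt count and backoff are illustrative only, and the local bundle is removed only after a successful upload (the `debug.UploadDebugBundle` call mirrors the one quoted above):

```go
// Sketch: retry the upload a few times before giving up.
var (
	key       string
	uploadErr error
)
for attempt := 1; attempt <= 3; attempt++ {
	key, uploadErr = debug.UploadDebugBundle(ctx, types.DefaultBundleURL, mgmURL, path)
	if uploadErr == nil {
		break
	}
	log.Warnf("debug bundle upload attempt %d failed: %v", attempt, uploadErr)
	select {
	case <-ctx.Done():
		return "", ctx.Err()
	case <-time.After(time.Duration(attempt) * 5 * time.Second):
	}
}
if uploadErr != nil {
	return "", fmt.Errorf("upload debug bundle: %w", uploadErr)
}
// Only remove the local bundle once an upload attempt has succeeded.
if err := os.Remove(path); err != nil {
	log.Errorf("failed to remove debug bundle file: %v", err)
}
return key, nil
```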
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
client/jobexec/executor.go(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.
Applied to files:
client/jobexec/executor.go
📚 Learning: 2025-11-14T13:05:31.729Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.729Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.
Applied to files:
client/jobexec/executor.go
🧬 Code graph analysis (1)
client/jobexec/executor.go (3)
client/internal/debug/debug.go (3)
GeneratorDependencies(238-243)BundleConfig(232-236)NewBundleGenerator(245-264)client/internal/debug/upload.go (1)
UploadDebugBundle(17-28)upload-server/types/upload.go (1)
DefaultBundleURL(11-11)
🔇 Additional comments (5)
client/jobexec/executor.go (5)
1-14: LGTM! Package declaration and imports are correctly structured. All imports are utilized within the file.
16-18: LGTM! The 60-minute cap on bundle wait time is a reasonable safeguard against excessively long waits.
24-29: LGTM! Empty struct with constructor is an acceptable pattern for future extensibility.
67-76: LGTM! The `waitFor` implementation correctly handles context cancellation and returns the appropriate error, addressing prior review feedback. The logging provides good observability.
20-22: Remove this comment — `ErrJobNotImplemented` is actively used. The error is referenced in `client/internal/engine.go:979`, where it's accessed via the `.Error()` method. It's appropriately exported for use across packages.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/internal/engine.go (1)
235-255: Remove unusedc *profilemanager.Configparameter inNewEngine
NewEngine’s last parameterc *profilemanager.Configis never used in the function body. This is dead API surface and (depending on tooling) can trigger compile or lint failures and unnecessary churn at call sites.Consider dropping this parameter from the signature and corresponding call sites, since
EngineConfig.ProfileConfigalready carries the same information.-func NewEngine(..., config *EngineConfig, mobileDep MobileDependency, statusRecorder *peer.Status, checks []*mgmProto.Checks, c *profilemanager.Config) *Engine { +func NewEngine(..., config *EngineConfig, mobileDep MobileDependency, statusRecorder *peer.Status, checks []*mgmProto.Checks) *Engine {…and update call sites accordingly.
♻️ Duplicate comments (4)
management/server/peer.go (3)
348-358: Potential nil-pointer onam.jobManager.The code calls
am.jobManager.IsPeerConnected(peerID)andam.jobManager.IsPeerHasPendingJobs(peerID)without checking ifam.jobManageris non-nil. If the JobManager is ever nil (e.g., in certain test configurations or misconfigured environments), this will panic.Consider adding a defensive check at the start of the function:
+ if am.jobManager == nil { + return status.Errorf(status.Internal, "job manager is not configured") + } + if !am.jobManager.IsPeerConnected(peerID) {Additionally, the TODO comment on lines 353-355 correctly identifies a race condition where concurrent calls could overwrite pending job entries. This should be addressed in a follow-up to ensure job queue integrity.
Based on past review comments, this issue was previously flagged but remains unaddressed.
365-398: Send-before-persist ordering can cause inconsistency between runtime and persistent state.The current implementation sends the job to the peer (line 366) before persisting it to the database (line 379). If the database write fails after the job has been dispatched, the job will execute on the client but will be absent from persistent storage and from
/jobsAPI listings.This ordering can lead to:
- Jobs executing on peers with no database record
- Inability to track or query these jobs via the API
- Potential audit and compliance issues
Consider reversing the order:
- Persist the job first with
status = pending- Send the job to the peer
- If sending fails, update the job status to
failedwith an appropriateFailedReasonThis ensures the database remains the source of truth and runtime state stays consistent.
Based on past review comments, this issue was previously flagged but remains unaddressed.
427-451: Missing job ownership validation - authorization bypass.After fetching the job by
jobID(line 445), the function does not verify thatjob.PeerIDmatches the requestedpeerID. This allows a caller to access any job within the account by manipulating the URL path:Example:
GET /peers/{peerA}/jobs/{jobOfPeerB}will returnjobOfPeerBeven though the path specifiespeerA.Add an ownership check after retrieving the job:
job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID) if err != nil { return nil, err } + + if job.PeerID != peerID { + return nil, status.NewPeerNotPartOfAccountError() + } return job, nilBased on past review comments, this issue was previously flagged but remains unaddressed.
client/internal/engine.go (1)
987-1010: Guard against nilProfileConfigandManagementURLinhandleBundle
handleBundledereferencese.config.ProfileConfigande.config.ProfileConfig.ManagementURLwithout checking for nil:
InternalConfig: e.config.ProfileConfige.config.ProfileConfig.ManagementURL.String()If
ProfileConfigor itsManagementURLis nil (e.g. in tests or alternativeEngineConfigconstruction), this will panic during remote debug bundle generation. A previous review already flagged this; it’s still present.func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) { log.Infof("handle remote debug bundle request: %s", params.String()) @@ - bundleDeps := debug.GeneratorDependencies{ - InternalConfig: e.config.ProfileConfig, + if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil { + return nil, fmt.Errorf("profile config or management URL not available for bundle generation") + } + + bundleDeps := debug.GeneratorDependencies{ + InternalConfig: e.config.ProfileConfig, @@ - waitFor := time.Duration(params.BundleForTime) * time.Minute - - uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String()) + waitFor := time.Duration(params.BundleForTime) * time.Minute + + managementURL := e.config.ProfileConfig.ManagementURL.String() + uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, managementURL)This keeps the job API unchanged while ensuring the engine can’t panic on missing profile data.
🧹 Nitpick comments (3)
client/server/server.go (1)
31-33: Usingnbstatus.ToProtoFullStatussimplifies and centralizes status conversionReplacing the local
toProtoFullStatushelper withnbstatus.ToProtoFullStatus(fullStatus)is a good consolidation. The returnedpbFullStatusis still enriched locally with events and SSH server state before being attached to the response, so behavior remains consistent while reducing duplication.Also applies to: 1078-1091
client/internal/connect.go (2)
64-67: Log path propagation throughConnectClient.Runand mobile variants is consistentThe new
Run(runningChan, logPath)signature, plus forwarding oflogPathintorun(...), cleanly introduces log‑path awareness without disturbing existing control flow. Android/iOS wrappers deliberately pass an empty log path, which keeps their behavior unchanged while still benefiting from other refactors.Also applies to: 70-88, 90-106, 108-108
251-256: EngineConfig enrichment withLogPathandProfileConfigis correctly wired
createEngineConfignow:
- Accepts
logPathand assigns it toEngineConfig.LogPath.- Sets
EngineConfig.ProfileConfig = config.This, combined with the updated call
createEngineConfig(myPrivateKey, c.config, peerConfig, logPath)and the use of these fields in the engine’s debug bundle path, ensures the engine has both the profile metadata and log location needed for remote debug bundles.Aside from the unused extra parameter on
NewEngine(flagged separately inengine.go), the data flow looks sound.Also applies to: 417-456
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
client/internal/connect.go(7 hunks)client/internal/engine.go(13 hunks)client/server/server.go(4 hunks)management/server/peer.go(2 hunks)management/server/permissions/modules/module.go(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
client/internal/engine.go (3)
client/internal/profilemanager/config.go (1)
Config(89-160)client/jobexec/executor.go (3)
Executor(24-25)NewExecutor(27-29)ErrJobNotImplemented(21-21)client/internal/debug/debug.go (2)
GeneratorDependencies(238-243)BundleConfig(232-236)
client/server/server.go (1)
client/status/status.go (1)
ToProtoFullStatus(549-635)
client/internal/connect.go (3)
client/internal/mobile_dependency.go (1)
MobileDependency(13-25)client/internal/engine.go (2)
NewEngine(235-276)EngineConfig(83-142)client/internal/profilemanager/config.go (1)
Config(89-160)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Management / Benchmark (amd64, sqlite)
- GitHub Check: Management / Benchmark (amd64, postgres)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Client / Unit (386)
- GitHub Check: release_ui_darwin
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: release
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
- GitHub Check: Darwin
- GitHub Check: Linux
- GitHub Check: Windows
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
🔇 Additional comments (6)
management/server/permissions/modules/module.go (1)
8-8: LGTM!The new
RemoteJobsmodule constant and its registration in theAllmap follow the established pattern correctly.Also applies to: 25-25
management/server/peer.go (2)
34-34: LGTM!The minimum version constant for remote jobs is clear and will enable proper feature gating.
400-425: Function looks good; note the TODO for dedicated permissions.The permission checks and peer ownership validation are correctly implemented. The TODO comment on line 401 indicates planned work to create more granular job permissions, which is a reasonable future enhancement.
client/internal/engine.go (2)
205-222: Job executor lifecycle wiring looks correctThe addition of
jobExecutor, its dedicatedjobExecutorWG, thereceiveJobEvents()call inStart(), ande.jobExecutorWG.Wait()inStop()form a coherent lifecycle: job goroutine is tracked and properly drained on shutdown, without sharingshutdownWgwith other engine goroutines. I don’t see deadlock or ordering issues given current usage.Also applies to: 278-337, 524-527, 950-985
809-822: Sync response persistence refactor is thread‑safeThe move to a dedicated
syncRespMuxforpersistSyncResponseandlatestSyncResponse(with R/W separation and cloning inGetLatestSyncResponse) removes contention withsyncMsgMuxand avoids data races on the stored message. Lock ordering is one‑way (syncMsgMux→syncRespMux) and I don’t see any reverse order, so deadlock risk is low.Also applies to: 1889-1925
client/server/server.go (1)
1539-1547: Passings.logFileintoConnectClient.Runmatches new signatureUpdating
connect()to calls.connectClient.Run(runningChan, s.logFile)correctly propagates the daemon’s log file path into the client, aligning with the new Run signature and the debug bundle/log‑path plumbing inConnectClientandEngineConfig.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
client/cmd/testutil_test.go(4 hunks)client/internal/engine_test.go(10 hunks)client/server/server_test.go(4 hunks)management/internals/modules/peers/manager_mock.go(1 hunks)management/server/account_test.go(2 hunks)management/server/dns_test.go(2 hunks)management/server/http/testing/testing_tools/channel/channel.go(2 hunks)management/server/management_proto_test.go(3 hunks)management/server/management_test.go(4 hunks)management/server/nameserver_test.go(2 hunks)management/server/peer_test.go(5 hunks)management/server/route_test.go(2 hunks)shared/management/client/client_test.go(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- management/server/dns_test.go
- management/server/management_proto_test.go
- management/server/peer_test.go
- management/server/http/testing/testing_tools/channel/channel.go
- management/server/account_test.go
- client/cmd/testutil_test.go
- management/server/route_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.
Applied to files:
shared/management/client/client_test.go
🧬 Code graph analysis (2)
management/server/management_test.go (3)
management/server/networks/routers/manager.go (1)
NewManager(40-46)management/server/settings/manager.go (1)
NewManager(37-44)management/server/job/manager.go (1)
NewJobManager(34-45)
management/server/nameserver_test.go (3)
management/internals/modules/peers/manager.go (1)
NewManager(44-49)management/server/account.go (1)
BuildManager(178-266)management/server/job/manager.go (1)
NewJobManager(34-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Build Cache
- GitHub Check: Linux
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: release_ui
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: Darwin
- GitHub Check: iOS / Build
- GitHub Check: Windows
- GitHub Check: Android / Build
- GitHub Check: Check External GPL/AGPL Licenses
🔇 Additional comments (10)
management/internals/modules/peers/manager_mock.go (1)
100-113: LGTM! Auto-generated mock looks correct.The mock method and recorder for
GetPeerIDfollow standard gomock patterns and are consistent with the existing mock methods in this file. The signatures and implementations are correct.client/server/server_test.go (2)
310-310: LGTM! JobManager correctly instantiated for tests.The JobManager is created with nil metrics, which is appropriate for test setup. The store and peersManager dependencies are properly provided.
323-323: No action needed—BuildManager parameter positions are correct.The jobManager is correctly positioned as the 5th parameter in the BuildManager call at line 323, matching the function signature where it follows networkMapController. All parameter alignments are accurate.
shared/management/client/client_test.go (2)
96-96: LGTM! JobManager wiring integrated correctly.The JobManager is properly instantiated and will be threaded through the test setup.
122-122: Ephemeral manager import alias improves readability.Using the
ephemeral_manageralias avoids conflicts and makes the code clearer.management/server/nameserver_test.go (1)
794-801: LGTM! Clean inline JobManager creation.Creating the JobManager inline within the BuildManager call is concise and appropriate for test code. The peersManager dependency is correctly instantiated on the preceding line.
client/internal/engine_test.go (3)
1602-1635: LGTM! Management server test wiring is consistent.The JobManager is correctly created and passed through to BuildManager and nbgrpc.NewServer, following the same pattern established in other test files.
1536-1536: No changes needed—NewEngine signature is correct.NewEngine returns only
*Engine(not(*Engine, error)), making the assignmente, err := NewEngine(...), nilvalid Go syntax. This is not unconventional; Go functions can return single values, and assigningniltoerrhere is a straightforward tuple assignment with no issues.
256-256: The two trailingnilparameters passed to NewEngine correspond to thechecksandcparameters in the function signature (line 235 of client/internal/engine.go), which are[]*mgmProto.Checksand*profilemanager.Configrespectively. These parameters are already documented in the function signature itself through their names and type annotations, so no additional documentation is needed. The speculation about "jobManager and logPath" does not match the actual implementation.Likely an incorrect or invalid review comment.
management/server/management_test.go (1)
206-244: LGTM! JobManager integration is clean and consistent.The test correctly:
- Extracts peersManager into a variable (line 206)
- Creates jobManager with nil metrics (line 207)
- Passes jobManager to BuildManager (line 219)
- Passes jobManager to nbgrpc.NewServer (line 244)
This follows the established pattern across all test files.
 	return nil, "", err
 }
-mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, secretsManager, nil, nil, &server.MockIntegratedValidator{}, networkMapController)
+mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &server.MockIntegratedValidator{}, networkMapController)
🧩 Analysis chain
🏁 Script executed:
# Find nbgrpc package files
fd -e go | grep -i nbgrpc | head -20

Repository: netbirdio/netbird
Length of output: 43

🏁 Script executed:
# Search for NewServer function definition
ast-grep --pattern $'func NewServer($_$$_) $_$$_'

Repository: netbirdio/netbird
Length of output: 43

🏁 Script executed:
# Also try a broader search for NewServer
rg -t go "func NewServer" --max-count 5 -A 10

Repository: netbirdio/netbird
Length of output: 5342

🏁 Script executed:
# Read the test file to see the actual call
cat -n client/server/server_test.go | sed -n '320,345p'

Repository: netbirdio/netbird
Length of output: 1643
Fix nbgrpc.NewServer call: pass concrete managers instead of nil for appMetrics and authManager parameters.
The parameter count (9) and order are correct. However, the call passes nil for the 6th and 7th parameters (appMetrics and authManager), which expect telemetry.AppMetrics and auth.Manager types respectively. Pass appropriate instances or mocks instead of nil for these parameters.
🤖 Prompt for AI Agents
In client/server/server_test.go around line 332, the nbgrpc.NewServer call
currently passes nil for the 6th and 7th parameters (appMetrics and
authManager); replace those nils with concrete instances or test mocks
implementing telemetry.AppMetrics and auth.Manager (e.g., create or reuse a
MockAppMetrics and MockAuthManager in the test setup or use existing test
helpers), then pass those instances to nbgrpc.NewServer in the same parameter
positions so the call receives proper typed objects instead of nil.

Bugfixes
Describe your changes
Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
New Features
Bug Fixes / Improvements
Other
✏️ Tip: You can customize this high-level summary in your review settings.