
Conversation

@pappz pappz (Contributor) commented Nov 21, 2025

Bugfixes

Describe your changes

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is it a typo/documentation fix
  • Is it a feature enhancement
  • Is it a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • Jobs system: peers can create, list, and retrieve jobs (e.g., request debug bundle generation and upload).
    • End-to-end job streaming and API models added for job workloads (bundle), plus client/server job handling and executor.
  • Bug Fixes / Improvements

    • Status rendering now uses the full status + daemon version for more accurate summaries.
    • Log collection more resilient with log-path propagation and systemd fallbacks.
    • Debug bundles uploaded via presigned streamed uploads for reliability.
  • Other

    • Debug bundles no longer include the prior standalone status string; status is produced differently.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai coderabbitai bot (Contributor) commented Nov 21, 2025

Walkthrough

Adds a per-peer Job subsystem (streaming RPC, persistence, manager, client executor), refactors debug bundle flow (removes DebugBundleRequest.status, renames LogFile→LogPath, centralizes upload), and propagates logPath/profile config into client connect/engine flows. (50 words)

Changes

  • Debug bundle refactor
    Files: client/proto/daemon.proto, client/cmd/debug.go, client/server/debug.go, client/internal/debug/debug.go, client/internal/debug/upload.go, client/internal/debug/upload_test.go, client/ui/debug.go
    Removed status from DebugBundleRequest; renamed the generator dependency LogFile → LogPath; moved upload logic to client/internal/debug/upload.go (UploadDebugBundle); CLI/UI no longer build or pass a status string; the generator now seeds status from recorder/FullStatus when present.
  • Status conversion
    Files: client/cmd/status.go, client/status/status.go, client/status/status_test.go
    ConvertToStatusOutputOverview now accepts *proto.FullStatus and an explicit daemonVersion; added ToProtoFullStatus() and updated call sites to pass the full status + version.
  • Client connect & engine
    Files: client/internal/connect.go, client/internal/engine.go, client/internal/engine_test.go, client/embed/embed.go, client/server/server.go, client/cmd/up.go
    ConnectClient.Run gained a logPath parameter; EngineConfig now includes LogPath and ProfileConfig; NewEngine receives the profile config; the engine integrates the job executor, a new syncResp mux, and job event processing; call sites updated to pass logPath and job wiring.
  • Job executor (client)
    Files: client/jobexec/executor.go
    New Executor with BundleJob() to optionally wait, create a debug bundle via debug.BundleGenerator, and upload via debug.UploadDebugBundle; adds MaxBundleWaitTime and ErrJobNotImplemented.
  • Server debug handling
    Files: client/server/debug.go
    Server debug handler delegates upload to debug.UploadDebugBundle and uses LogPath in generator dependencies; removed local upload helpers and unused imports.
  • Management job subsystem
    Files: management/server/job/*, management/server/account.go, management/server/peer.go, management/server/account/manager.go, management/server/types/job.go
    New job package with Channel and Manager for per-peer channels, pending-job tracking, send/response/cleanup; account manager wired to accept JobManager; job types, validation, mapping, and events added.
  • Persistence for jobs
    Files: management/server/store/store.go, management/server/store/sql_store.go
    Store interface extended with job methods; SqlStore implements Create/Complete/Get/MarkPendingAsFailed; migrations include types.Job.
  • Management API & proto
    Files: management/internals/shared/grpc/server.go, shared/management/proto/management.proto, shared/management/http/api/openapi.yml, shared/management/http/api/types.gen.go
    Added the streaming Job RPC and job protobuf messages (JobRequest/JobResponse/BundleParameters/BundleResult/JobStatus); the gRPC server implements the Job stream, handshake, response receiver, and job forwarding; OpenAPI + generated types include the Job workload union (bundle).
  • Client mgmt stream & mocks
    Files: shared/management/client/grpc.go, shared/management/client/client.go, shared/management/client/mock.go, shared/management/client/client_test.go
    Client streaming logic extended with Job stream helpers; the Client interface and MockClient gain a Job(...) method; tests/mocks updated to exercise the job stream.
  • HTTP handlers & API types
    Files: management/server/http/handlers/peers/peers_handler.go, shared/management/http/api/types.gen.go, shared/management/http/api/openapi.yml
    Added HTTP endpoints for peer jobs (Create/List/Get); handlers map Job types to API responses; generated API models and OpenAPI paths/schemas for Job/Workload (bundle).
  • Tests & wiring updates
    Files: many client/ and management/server/*_test.go, client/cmd/testutil_test.go, shared/management/client/client_test.go
    Tests and scaffolding updated to instantiate and pass job.NewJobManager(...) into BuildManager and NewServer call sites; mocks expanded with job hooks.
  • Server wiring & controllers
    Files: management/internals/server/controllers.go, management/internals/server/boot.go, management/internals/shared/grpc/*
    Added a BaseServer.JobManager() accessor; the JobManager is passed into the gRPC server and manager builders; the Server stores jobManager and the NewServer signature is extended.
  • API tooling / deps
    Files: shared/management/http/api/generate.sh, go.mod
    oapi-codegen invocation updated to v2; go.mod adds github.com/oapi-codegen/runtime v1.1.2 and an indirect github.com/apapsch/go-jsonmerge/v2.
  • Other small changes
    Files: assorted files
    Activity code for job creation added; mocks and imports updated; minor formatting adjustments across files.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Server as Management gRPC Server
    participant JobMgr as Job Manager
    participant Store as Store/DB
    participant Client as Peer Client (mgmt stream)
    participant Engine as Client Engine / Executor
    participant DebugGen as Debug Bundle Generator
    participant Upload as Upload service

    Server->>JobMgr: CreatePeerJob / SendJob(accountID, peerID, JobRequest)
    JobMgr->>Store: Persist job record (CreatePeerJob)
    Server->>Client: Stream JobRequest to peer (Job RPC)
    Client->>Engine: Executor.BundleJob(params, wait)
    Engine->>DebugGen: Generate bundle (uses LogPath, StatusRecorder)
    DebugGen-->>Engine: bundle file path
    Engine->>Upload: UploadDebugBundle(mgmURL, bundlePath)
    Upload-->>Engine: uploadKey
    Engine->>Client: Return JobResponse (uploadKey)
    Client->>Server: Stream JobResponse back
    Server->>JobMgr: HandleResponse(JobResponse)
    JobMgr->>Store: CompletePeerJob (mark succeeded/failed)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing careful review:
    • Job streaming handshake, encryption, retry and error semantics (shared/management/client/grpc.go; management/internals/shared/grpc/server.go)
    • Concurrency, channel lifecycle, pending-job tracking and cleanup (management/server/job/*)
    • Engine integration with job executor and shutdown ordering (client/internal/engine.go; client/jobexec/executor.go)
    • Store migrations and correctness of job persistence queries/transactions (management/server/store/sql_store.go)
    • HTTP handlers and generated union type correctness and JSON marshaling (management/server/http/handlers/peers/peers_handler.go; shared/management/http/api/types.gen.go)

Possibly related PRs

Suggested reviewers

  • crn4
  • mlsmaycon

Poem

🐰

I hopped through branches, soft and spry,
Laid down job channels, gave logs a try,
Bundles bundled, uploads sent on cue,
No more lone status strings to skew,
My little paws applaud the new sky.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description check: ⚠️ Warning. The PR description is largely incomplete: the title is informative, but the body provides no details about the changes, implementation, or context. Resolution: complete the "Describe your changes" section with what was changed and why, provide an issue ticket number/link, and explain whether documentation is needed for this feature.
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 15.87%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title check: ✅ Passed. The title clearly describes the main change: enabling debug bundle runs to be triggered from the API/Dashboard, with a reference issue number.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/remote-debug-release

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment (Contributor)

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
client/status/status.go (1)

234-315: ToProtoFullStatus mapping is good; add nil‑safety when reading latency/handshake

The new ToProtoFullStatus helper correctly maps:

  • Management and signal state (URL, connected, error).
  • Local peer state (IP, pub key, kernel, FQDN, Rosenpass flags, networks from route keys).
  • Per‑peer fields (status, timestamps, transfer stats, Rosenpass, networks, latency, SSH host key).
  • Relays and DNS group state, including error strings.

Downstream, mapPeers assumes Latency and LastWireguardHandshake are always set:

lastHandshake := pbPeerState.GetLastWireguardHandshake().AsTime().Local()
...
Latency: pbPeerState.GetLatency().AsDuration(),

That’s safe for FullStatus objects produced by ToProtoFullStatus, but will panic if a PeerState arrives with these fields unset (e.g., older daemon or other producer). To harden this:

var lastHandshake time.Time
if ts := pbPeerState.GetLastWireguardHandshake(); ts != nil {
    lastHandshake = ts.AsTime().Local()
}

var latency time.Duration
if d := pbPeerState.GetLatency(); d != nil {
    latency = d.AsDuration()
}

and then use those locals when building PeerStateDetailOutput.

Also applies to: 549-635

client/internal/engine.go (2)

34-53: Engine config/struct wiring for jobs is fine, but c *profilemanager.Config is unused

The new debug/job additions to EngineConfig (ProfileConfig, LogPath) and Engine (jobExecutor, jobExecutorWG) are wired in cleanly, and creating jobExecutor in NewEngine is reasonable.

However, the extra parameter c *profilemanager.Config on NewEngine is never used in the function body. Unlike an unused local variable, an unused parameter does not break compilation in Go, but it is misleading API surface. Either remove the parameter or actually thread it into the engine configuration, e.g.:

config.ProfileConfig = c
// or
engine.config.ProfileConfig = c

depending on where you intend to own that reference.

Also applies to: 83-142, 145-222, 235-276


278-366: Job stream consumption and bundle handling are generally solid; watch for restart behavior and nil profile config

The new job flow on the client side looks good overall:

  • receiveJobEvents hooks into mgmClient.Job with a per‑message handler that defaults responses to JobStatus_failed and branches on WorkloadParameters – currently only Bundle is handled, everything else returns ErrJobNotImplemented.
  • jobExecutorWG ensures Stop() waits for the Job stream goroutine to exit before tearing down the engine.
  • handleBundle builds debug.GeneratorDependencies from engine state, calls jobExecutor.BundleJob, and wraps the resulting upload key in a JobResponse_Bundle.

Two points to be aware of:

  • When mgmClient.Job returns any error (including when e.ctx is canceled during Stop()), you treat it as a hard failure and call CtxGetState(e.ctx).Wrap(ErrResetConnection) + e.clientCancel(), which triggers a full client restart. That mirrors how Sync/Signal errors are handled but also means a graceful “down” or engine stop will trigger a restart path instead of a quiet shutdown. If that’s not intended, you may want to distinguish context cancellation from other errors.

  • handleBundle assumes e.config.ProfileConfig and e.config.ProfileConfig.ManagementURL are non‑nil:

    InternalConfig: e.config.ProfileConfig,
    ...
    uploadKey, err := e.jobExecutor.BundleJob(..., e.config.ProfileConfig.ManagementURL.String())

    If ProfileConfig or its ManagementURL can be unset (e.g., in tests or some platforms), this will panic. A defensive check that returns a failed JobResponse with a clear reason would make the behavior safer.

Also applies to: 942-1012
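
For the nil ProfileConfig point above, the guard is small. A minimal sketch, assuming handleBundle is the place that builds the *proto.JobResponse sent back on the stream and that jobReq is the incoming request (the proto package may be imported under a different alias in engine.go):

```go
// Illustrative guard only; adapt to handleBundle's actual return shape.
if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
	return &proto.JobResponse{
		ID:     jobReq.ID,
		Status: proto.JobStatus_failed,
		Reason: []byte("profile config or management URL is not set on this peer"),
	}
}
```

This keeps a misconfigured or test-only engine from panicking and still reports a clear failure reason to the management side.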

🧹 Nitpick comments (38)
shared/management/client/grpc.go (5)

112-160: Centralized withMgmtStream helper looks good; double‑check backoff choice and final logging

The withMgmtStream wrapper nicely unifies readiness checks, server key fetching, and retry for both Sync and Job. One thing to double‑check is whether you explicitly want streaming calls to use defaultBackoff(ctx) while other RPCs still use nbgrpc.Backoff(...); if not, consider reusing the same backoff helper for consistency, or documenting the intentional difference.

Also, withMgmtStream logs "unrecoverable error" for any non‑nil err after backoff.Retry, including context.Canceled / DeadlineExceeded. If you expect normal shutdowns via context cancellation, you may want to special‑case those errors before logging at warn to avoid noisy logs.


162-216: Job stream error handling is solid; refine connection notifications for better state reporting

The overall handleJobStream flow (open stream → handshake → recv/process/respond loop) and error classification by gRPC code look good.

A couple of refinements around connection state notifications:

  • notifyDisconnected(err) is called for every receive error, including codes.Unimplemented. In that case, the server is reachable but doesn’t support Job; marking management as “disconnected” can mislead consumers of ConnStateNotifier. Consider moving notifyDisconnected into the switch and skipping it for codes.Unimplemented (and possibly codes.Canceled, which usually indicates local shutdown).
  • The Job path never calls notifyConnected(), so this stream can only move the state toward “disconnected”. If ConnStateNotifier is used for user‑visible connectivity, you might want to call notifyConnected() once the stream and handshake succeed (or when the first job is successfully received) to keep the state transitions balanced.
  • On sendJobResponse failure you return the error but don’t notify disconnection; if this error typically indicates a broken stream, it may be worth also invoking notifyDisconnected(err) there.

These are behavioral tweaks rather than correctness issues but would make connection state reporting more accurate.
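
A rough shape for the first two points, assuming the receive loop already classifies err by gRPC code (status here is google.golang.org/grpc/status; variable and method names are illustrative):

```go
// Sketch only: decide on connection-state notifications per error class.
switch status.Code(err) {
case codes.Unimplemented:
	// Server is reachable but has no Job support: don't mark management as disconnected.
	log.Debugf("management server does not implement the Job stream")
	return nil
case codes.Canceled:
	// Local shutdown path: also skip the disconnected notification.
	return err
default:
	c.notifyDisconnected(err)
	return err
}
```

A matching c.notifyConnected() after a successful handshake would keep the state transitions symmetric.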


218-252: Job helpers mirror existing encryption patterns; consider cleaning up unused ctx parameters

sendHandshake and receiveJobRequest correctly follow the existing EncryptMessage / DecryptMessage pattern and use the WireGuard public key in the EncryptedMessage as elsewhere in this file.

Right now the ctx parameter passed into these helpers isn’t used inside them; the only effective cancellation point is the stream’s own context from realClient.Job(ctx). That’s fine behaviorally, but for clarity you could either:

  • Drop ctx from these helper signatures, or
  • Start using it explicitly (e.g., check ctx.Err() before/after blocking operations, or for future timeouts / per‑job cancellation).

Given this is internal code, I’d treat it as a readability/maintenance cleanup to do when convenient.


254-295: Ensure JobResponse is always well‑formed for correlation and logging

processJobRequest nicely guards against a nil handler result by synthesizing a JobResponse with Status: JobStatus_failed and a reason. Two minor robustness tweaks you might consider:

  • If the handler returns a non‑nil JobResponse but forgets to populate ID, you could default it to jobReq.ID so the management server can always correlate responses:
jobResp := msgHandler(jobReq)
if jobResp == nil {
    jobResp = &proto.JobResponse{
        ID:     jobReq.ID,
        Status: proto.JobStatus_failed,
        Reason: []byte("handler returned nil response"),
    }
} else if len(jobResp.ID) == 0 {
    jobResp.ID = jobReq.ID
}
  • For logging, string(jobReq.ID) / string(resp.ID) assumes the IDs are valid UTF‑8. If these are ever changed to non‑textual bytes, consider logging them as %x or via hex.EncodeToString to avoid odd output.

Not blockers, but they make the job channel a bit more defensive and easier to debug.


297-395: Sync refactor via connectToSyncStream/receiveUpdatesEvents looks consistent with existing patterns

The split into connectToSyncStream and receiveUpdatesEvents reads clean and matches the encryption/decryption contract used elsewhere (EncryptMessage(serverPubKey, c.key, req) and DecryptMessage(serverPubKey, c.key, update.Body, resp)).

A small optional improvement: GetNetworkMap now reimplements the “single Recv + decrypt SyncResponse” logic that’s very similar to what receiveUpdatesEvents does in a loop. If you find yourself touching this code again, you might consider a tiny shared helper (e.g., decryptSyncUpdate(serverPubKey, update *proto.EncryptedMessage) (*proto.SyncResponse, error)) to keep the decryption path fully DRY.

Functionally, the new sync connection flow looks correct.
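
If you do extract that helper, a minimal version could look like the sketch below; the signature is a suggestion (ourPrivateKey corresponds to c.key), reusing the proto, encryption, and wgtypes imports grpc.go already has:

```go
// Possible shared helper for decrypting a single sync update.
func decryptSyncUpdate(serverPubKey, ourPrivateKey wgtypes.Key, update *proto.EncryptedMessage) (*proto.SyncResponse, error) {
	resp := &proto.SyncResponse{}
	if err := encryption.DecryptMessage(serverPubKey, ourPrivateKey, update.Body, resp); err != nil {
		return nil, fmt.Errorf("decrypt sync update: %w", err)
	}
	return resp, nil
}
```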

client/internal/debug/upload_test.go (1)

1-1: Test now exercises the exported UploadDebugBundle correctly

Switching to package debug and calling UploadDebugBundle(context.Background(), testURL+types.GetURLPath, testURL, file) matches the new API and keeps the expectations (getURLHash(testURL) prefix, stored file content) intact.

You might consider using a context with timeout here (and/or a simple readiness wait on srv.Start) to make this integration-style test more robust under slow CI environments.

Also applies to: 41-48
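
For example, keeping the test's existing variables (testURL, file, getURLHash) and assuming the (key, err) return shape and testify-style assertions, a bounded variant could look like:

```go
// Bound the upload so a stuck CI run fails fast instead of hanging.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

key, err := UploadDebugBundle(ctx, testURL+types.GetURLPath, testURL, file)
require.NoError(t, err)
require.True(t, strings.HasPrefix(key, getURLHash(testURL)))
```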

client/cmd/debug.go (1)

338-360: Debug bundle generator now uses LogPath and shared default upload URL

Passing LogPath: logFilePath into debug.NewBundleGenerator matches the updated GeneratorDependencies and ensures the bundle generator knows where to look for logs. Reusing types.DefaultBundleURL as the default for --upload-bundle-url in both debug bundle and debug for keeps the CLI consistent with the upload-server configuration.

Consider clarifying in the flag help text that the default is the NetBird-hosted debug upload service, so self‑hosters know they should override it if needed.

Also applies to: 369-379

client/server/server_test.go (1)

21-21: JobManager wiring into test management server is correct

Creating a jobManager := job.NewJobManager(nil, store) and passing it into both server.BuildManager(...) and nbgrpc.NewServer(...) matches the updated constructor signatures and ensures job flows are exercised in this test setup.

The if err != nil { return nil, "", err } right after eventStore := &activity.InMemoryEventStore{} still refers to the earlier err from NewTestStoreFromSQL, which has already been handled and is guaranteed nil here. It can be removed to avoid confusion.

Also applies to: 298-329

client/internal/debug/debug.go (1)

30-34: Status seeding via FullStatus/nbstatus is coherent with anonymization strategy

Using statusRecorder.GetFullStatus() → nbstatus.ToProtoFullStatus → ConvertToStatusOutputOverview → ParseToFullDetailSummary aligns the status.txt contents with the regular netbird status output and correctly injects event history and profile name. Seeding the bundle anonymizer via seedFromStatus(g.anonymizer, &fullStatus) leverages this richer status snapshot to anonymize later artifacts more consistently.

One minor nit: seedFromStatus runs even when g.anonymize is false, which is harmless but unnecessary work; you could optionally guard it with if g.anonymize { ... } for clarity.

Also applies to: 382-405, 852-880

shared/management/http/api/generate.sh (1)

14-16: oapi-codegen v2 install path looks correct; consider pinning a version

The switch to github.com/oapi-codegen/oapi-codegen/v2/cmd/oapi-codegen@latest matches the v2 module layout and should work fine with recent Go toolchains. For more reproducible codegen in CI, you might eventually want to pin a specific version instead of @latest.

client/embed/embed.go (1)

175-185: Recorder integration and updated Run call look consistent

Wiring a peer.NewRecorder into NewConnectClient and updating Run to Run(run, "") matches the new client API and should keep embedded startup behavior intact. If you later surface a log/debug path in Options, this is the right place to thread it through instead of an empty string.

management/server/http/testing/testing_tools/channel/channel.go (1)

19-39: JobManager wiring into BuildManager is correct; reuse metrics instead of nil

Injecting a concrete job.Manager into BuildManager here is a nice improvement in test realism. Given you already construct metrics in this helper, you can pass it into the job manager instead of nil so job‑related metrics behavior is also covered in tests:

-	peersUpdateManager := update_channel.NewPeersUpdateManager(nil)
-	jobManager := job.NewJobManager(nil, store)
+	peersUpdateManager := update_channel.NewPeersUpdateManager(nil)
+	jobManager := job.NewJobManager(metrics, store)
...
-	am, err := server.BuildManager(ctx, nil, store, networkMapController, jobManager, nil, "", &activity.InMemoryEventStore{}, geoMock, false, validatorMock, metrics, proxyController, settingsManager, permissionsManager, false)
+	am, err := server.BuildManager(ctx, nil, store, networkMapController, jobManager, nil, "", &activity.InMemoryEventStore{}, geoMock, false, validatorMock, metrics, proxyController, settingsManager, permissionsManager, false)

Also applies to: 53-79

management/server/account_test.go (1)

35-50: Account test manager now wires a JobManager; prefer passing metrics instead of nil

Importing job and passing a concrete JobManager into BuildManager in createManager matches the new account manager wiring and lets tests exercise job‑related paths. Since you already compute metrics in this helper, you can avoid a nil metrics field on the JobManager and better mirror production setup by doing:

-	updateManager := update_channel.NewPeersUpdateManager(metrics)
-	requestBuffer := NewAccountRequestBuffer(ctx, store)
-	networkMapController := controller.NewController(ctx, store, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.cloud", port_forwarding.NewControllerMock(), &config.Config{})
-	manager, err := BuildManager(ctx, nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	updateManager := update_channel.NewPeersUpdateManager(metrics)
+	requestBuffer := NewAccountRequestBuffer(ctx, store)
+	networkMapController := controller.NewController(ctx, store, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.cloud", port_forwarding.NewControllerMock(), &config.Config{})
+	manager, err := BuildManager(ctx, nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 2931-2969

management/server/route_test.go (1)

22-34: Route tests correctly inject a JobManager; reuse metrics for consistency

The additional job import and passing job.NewJobManager(nil, store) into BuildManager in createRouterManager are aligned with the new constructor and ensure route tests run with job support enabled. Since this helper already initializes metrics, you can tighten the fidelity of the test setup by wiring it into the JobManager too:

-	updateManager := update_channel.NewPeersUpdateManager(metrics)
-	requestBuffer := NewAccountRequestBuffer(ctx, store)
-	networkMapController := controller.NewController(ctx, store, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.selfhosted", port_forwarding.NewControllerMock(), &config.Config{})
-
-	am, err := BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	updateManager := update_channel.NewPeersUpdateManager(metrics)
+	requestBuffer := NewAccountRequestBuffer(ctx, store)
+	networkMapController := controller.NewController(ctx, store, metrics, updateManager, requestBuffer, MockIntegratedValidator{}, settingsMockManager, "netbird.selfhosted", port_forwarding.NewControllerMock(), &config.Config{})
+
+	am, err := BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 1258-1302

management/server/peer_test.go (1)

35-35: Prefer initializing job.Manager with real metrics in tests

The wiring of BuildManager looks correct (argument order matches the updated signature), but all these test setups construct the job manager with job.NewJobManager(nil, s) while metrics is already available. This leaves job.Manager.metrics nil in tests, which can (a) hide metric-related bugs and (b) potentially panic later if the manager starts using metrics without nil checks.

Consider passing the same metrics instance you already create:

-	am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(nil, s), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	am, err := BuildManager(context.Background(), nil, s, networkMapController, job.NewJobManager(metrics, s), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Apply the same pattern to the other BuildManager invocations in this file for consistency.

Also applies to: 1296-1296, 1381-1381, 1534-1534, 1614-1614

management/server/nameserver_test.go (1)

19-19: Use the existing metrics instance when creating JobManager

createNSManager correctly wires the new jobManager dependency into BuildManager, but you’re passing job.NewJobManager(nil, store) even though a real metrics instance is already created and used for update_channel.NewPeersUpdateManager.

To avoid a nil metrics inside job.Manager in tests (and potential surprises if metrics are used later), prefer:

-	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 798-799

management/server/dns_test.go (1)

17-17: Align test JobManager construction with real metrics

createDNSManager correctly injects a JobManager into BuildManager, but uses job.NewJobManager(nil, store) despite having a metrics instance already.

For more realistic tests and to avoid a nil metrics field inside job.Manager, consider:

-	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(nil, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)
+	return BuildManager(context.Background(), nil, store, networkMapController, job.NewJobManager(metrics, store), nil, "", eventStore, nil, false, MockIntegratedValidator{}, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false)

Also applies to: 229-230

management/server/management_proto_test.go (1)

32-32: JobManager wiring in tests looks correct; consider reusing metrics

The test correctly instantiates job.Manager and threads it through BuildManager and nbgrpc.NewServer, matching the new signatures. To exercise metrics inside the job manager during tests, you might consider passing the metrics instance instead of nil when creating jobManager, but that's optional.

Also applies to: 342-343, 369-371, 380-381

client/cmd/testutil_test.go (1)

19-19: Consistent JobManager setup in test helper; minor polish possible

The JobManager is correctly created and passed into BuildManager and nbgrpc.NewServer, aligning with the new signatures. Two small nits you could optionally address later:

  • Pass the metrics instance to job.NewJobManager instead of nil if you want job metrics in tests.
  • The if err != nil { return nil, nil } check after eventStore := &activity.InMemoryEventStore{} is dead code and can be removed.

Also applies to: 92-96, 123-124, 129-130

management/server/account.go (1)

18-18: JobManager injection is clean; clarify non‑nil contract

Adding jobManager *job.Manager to DefaultAccountManager and threading it via BuildManager is consistent and keeps wiring explicit. However, methods like CreatePeerJob dereference am.jobManager without nil checks, so BuildManager effectively requires a non‑nil JobManager in all call sites. Consider either:

  • Documenting that jobManager must be non‑nil (and enforcing via tests), or
  • Adding a defensive nil check that returns a clear status.Internal/status.PreconditionFailed error instead of panicking if it’s miswired.

Would you like a small ast-grep/rg script to verify that all BuildManager call sites now pass a non‑nil JobManager?

Also applies to: 68-76, 179-197, 203-224
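
If you opt for the defensive route, a small hypothetical helper keeps the check out of every job method's signature; this is a sketch using the status package the account manager already relies on:

```go
// Illustrative helper; call it at the top of the job methods and adapt the
// returned error to each method's signature.
func (am *DefaultAccountManager) requireJobManager() error {
	if am.jobManager == nil {
		return status.Errorf(status.Internal, "job manager is not configured")
	}
	return nil
}
```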

management/server/management_test.go (1)

31-31: Test server wiring with JobManager is correct

The test server now correctly constructs a JobManager and passes it through to BuildManager and nbgrpc.NewServer, matching the new APIs. As with other tests, you could optionally pass metrics instead of nil into job.NewJobManager if you want job metrics exercised, but the current setup is functionally fine.

Also applies to: 183-185, 212-228, 235-247

management/server/http/handlers/peers/peers_handler.go (1)

31-41: New peer job HTTP endpoints are well‑structured; tighten validation and rely on fixed ownership check

The new /peers/{peerId}/jobs and /peers/{peerId}/jobs/{jobId} handlers are consistent with existing patterns (auth from context, util.WriteError/WriteJSONObject, and conversion helpers). A few small points:

  • Once GetPeerJobByID is fixed to assert job.PeerID == peerID (see account manager comment), these handlers will correctly enforce per‑peer job ownership.
  • For consistency with other handlers in this file (e.g. HandlePeer, GetAccessiblePeers), you may want to add an explicit empty‑peerId check in CreateJob, ListJobs, and GetJob to return a clear InvalidArgument instead of silently passing "" through.
  • toSingleJobResponse cleanly maps the domain Job to the HTTP JobResponse, including optional FailedReason and typed Workload; that looks good.

Overall, the HTTP surface for jobs is in good shape once the backend ownership check is tightened.

Also applies to: 51-142, 618-638
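
The empty-peerId guard mentioned above would be a couple of lines per handler; a sketch, assuming the new handlers read path variables and write errors the same way the existing peer handlers do:

```go
// Illustrative guard at the top of CreateJob/ListJobs/GetJob.
peerID := mux.Vars(r)["peerId"]
if len(peerID) == 0 {
	util.WriteError(r.Context(), status.Errorf(status.InvalidArgument, "invalid peer ID"), w)
	return
}
```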

shared/management/proto/management.proto (1)

52-53: Refine Job proto types for clarity and forward compatibility

The Job RPC and related messages look structurally sound and align with the existing EncryptedMessage pattern. A few tweaks would make them easier to use:

  • JobResponse.Reason as bytes is unusual for a human-readable explanation; a string would be more ergonomic unless you explicitly plan to ship arbitrary binary blobs.
  • JobStatus only has unknown_status, succeeded, failed. If you intend to reflect persistent job lifecycle (pending/running vs completed) it may be worth adding an explicit in‑progress state instead of overloading unknown_status as a placeholder.
  • Consider documenting expected encoding/format for JobRequest.ID / JobResponse.ID (stringified UUID vs raw bytes) so store/types.Job integration remains consistent.

Functionally this is fine, these are just protocol polish suggestions.

Also applies to: 66-89

client/internal/debug/upload.go (1)

15-101: Upload helper is solid; consider small robustness improvements

The end-to-end flow (get presigned URL → size check → PUT upload) looks good and side‑effect free. A few minor refinements:

  • In upload, prefer %w when wrapping the HTTP error to preserve the error chain:
    return fmt.Errorf("upload failed: %w", err)
  • getUploadURL builds the query as url+"?id="+id. If url might ever include its own query string, using net/url to append id would be safer.
  • Both getUploadURL and upload strictly require StatusOK. If the upload service ever returns another 2xx (e.g. 201/204), this will be treated as failure. If you control that service and guarantee 200, current code is fine; otherwise consider if putResp.StatusCode/100 != 2.

None of these are blockers; the current implementation should work as intended.
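
For the query-building point, a small net/url based helper avoids edge cases if the configured URL ever carries its own query string; the function name and placement are suggestions only:

```go
// Builds "<base>?id=<id>" safely, preserving any query already present on base.
func buildGetURLTarget(base, id string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parse upload service URL: %w", err)
	}
	q := u.Query()
	q.Set("id", id)
	u.RawQuery = q.Encode()
	return u.String(), nil
}
```

Combined with a putResp.StatusCode/100 != 2 check, this keeps upload.go tolerant of benign server-side variations.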

client/server/debug.go (1)

27-39: Use the incoming context (and possibly avoid holding the mutex) during upload

The switch to debug.UploadDebugBundle and LogPath: s.logFile looks correct.

Two follow‑up improvements worth considering:

  • DebugBundle ignores the incoming context and calls UploadDebugBundle with context.Background(). That means client cancellation/timeouts won’t stop the upload. Renaming the parameter to ctx context.Context and passing ctx through would make this RPC better behaved under cancellation.
  • The upload is done while s.mutex is held. For large bundles or slow networks this can block other server operations behind this lock. If state safety allows it, you might generate the bundle under the lock, then release the mutex before starting the network upload.

Behavior is fine for a rarely used debug path, but these tweaks would make it more responsive.

Also applies to: 49-57

management/server/account/manager.go (1)

127-129: Job APIs on Manager look good; consider pagination/filters

The new job methods are consistent with the rest of the Manager interface (accountID/userID/peerID ordering, types.Job usage).

Two design points to keep in mind for implementations:

  • GetAllPeerJobs can grow unbounded over time; if jobs are expected to accumulate, consider adding pagination and/or server‑side filtering (e.g. by status or time range) at some point.
  • Ensure implementations consistently enforce userID authorization, since these methods expose per‑peer operational history.

No changes required here, just points to watch in the concrete manager.

client/internal/engine_test.go (1)

1591-1592: JobManager wiring in startManagement test mirrors production; consider reusing metrics

The new jobManager := job.NewJobManager(nil, store) and its injection into BuildManager and nbgrpc.NewServer properly mirror the production wiring, so tests now exercise the job pipeline end‑to‑end.

Since you already create metrics, err := telemetry.NewDefaultAppMetrics(...) a few lines later, you may want to:

  • Construct metrics first, then
  • Call job.NewJobManager(metrics, store)

This would keep metrics behavior consistent between the JobManager and the rest of the management stack even in tests. Not required for correctness, but a bit cleaner.

Also applies to: 1622-1623, 1628-1629

shared/management/http/api/openapi.yml (2)

41-121: Align bundle/workload schema with actual bundle generator config

The Bundle/Workload schemas look reasonable, but there are a couple of potential contract mismatches to double‑check:

  • BundleParameters exposes bundle_for and bundle_for_time while the current debug.BundleConfig only has Anonymize, IncludeSystemInfo, and LogFileCount. There is no way for callers to control IncludeSystemInfo, and the semantics of bundle_for/bundle_for_time vs. what the client actually does (currently just a wait + regular bundle) should be clarified or adjusted so the API is not misleading.
  • You’re using a discriminator on WorkloadRequest/WorkloadResponse with propertyName: type. Make sure the generated clients you care about support this pattern where the discriminator property lives only in the concrete types (BundleWorkloadRequest/Response) and not also declared at the base level; some generators are picky here.

I’d suggest either:

  • Extending BundleConfig + implementation to truly honor all declared parameters, or
  • Tightening the OpenAPI description and fields to only what is actually used today, to avoid surprising API consumers.

35-38: Clarify experimental nature and pagination/filtering for Jobs endpoints

The Jobs tag is marked x-experimental: true, but the individual operations are not. If you rely on tooling that inspects operation/vendor extensions rather than tags, you may want to duplicate x-experimental: true on the new /api/peers/{peerId}/jobs and /api/peers/{peerId}/jobs/{jobId} operations.

Additionally, listing jobs currently returns an unbounded array with no filtering or pagination parameters. If job counts can become large per peer, it’s worth at least documenting ordering (e.g. newest first) and planning for page/page_size/status filters, even if you defer implementing them now.

Also applies to: 2353-2456

shared/management/client/client_test.go (1)

25-26: JobManager wiring in tests looks correct; consider reusing metrics if you need job metrics

The new jobManager := job.NewJobManager(nil, store) and its injection into both BuildManager and nbgrpc.NewServer align with the updated constructor signatures and keep the tests realistic.

If you ever want to exercise job‑related metrics in tests, you could pass the already‑created metrics instead of nil to NewJobManager; otherwise this setup is fine as‑is for functional tests.

Also applies to: 76-77, 123-124, 130-132

management/server/store/sql_store.go (1)

136-207: Job CRUD helpers mostly fine; consider tightening error handling and update scope

Functionality looks correct overall (ID scoping, account/peer filters, status transitions), but a few details could be improved:

  • CompletePeerJob updates by id only; since ID is the primary key this is safe, but for consistency with GetPeerJobByID you may want to include account_id in the WHERE as well.
  • CreatePeerJob, CompletePeerJob, and MarkPendingJobsAsFailed correctly wrap errors with status.Internal, but GetPeerJobByID / GetPeerJobs return raw GORM errors on internal failure. Consider wrapping those in status.Errorf(status.Internal, ...) as done elsewhere in this store so callers don’t depend on storage-specific errors.
  • If types.Job.ApplyResponse only populates a subset of fields, CompletePeerJob’s Updates(job) call is fine; if it ever starts zeroing other fields, you may want to switch to an explicit field list/map to avoid unintended overwrites.
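
For the error-wrapping point, the change per query is mechanical. A hedged sketch for GetPeerJobs, assuming the store keeps its gorm handle in s.db as elsewhere in sql_store.go:

```go
// Illustrative: hide raw gorm errors behind the shared status package.
var jobs []*types.Job
result := s.db.Where("account_id = ? AND peer_id = ?", accountID, peerID).Find(&jobs)
if result.Error != nil {
	return nil, status.Errorf(status.Internal, "failed to get peer jobs from store: %v", result.Error)
}
return jobs, nil
```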
management/internals/shared/grpc/server.go (1)

184-219: Job stream handling is generally sound; consider surfacing send failures

The new Job RPC and helpers hold up well:

  • Handshake reads a single encrypted JobRequest and reuses the existing parseRequest path, so message framing and crypto stay consistent with Sync/Login.
  • Account/peer lookup mirrors the Sync flow and correctly maps missing peers to PermissionDenied / Unauthenticated.
  • CreateJobChannel + deferred CloseChannel provide a clear per-peer lifecycle, and startResponseReceiver cleanly demultiplexes JobResponse messages into the job manager.

One point to consider tightening:

  • In sendJobsLoop, any failure in sendJob is logged and then the method returns nil, so the gRPC stream terminates with an OK status even if job delivery to the client failed. For parity with handleUpdates/sendUpdate, you may want to return the error (or a mapped status error) so callers can observe transport failures on the Job stream.

Also applies to: 333-397, 450-467
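
A hedged sketch of that tightening; the loop structure and names (jobsChan, peerID, s.sendJob) are inferred from the description above rather than copied from the file:

```go
// Sketch only: propagate delivery failures instead of ending the stream with OK.
for event := range jobsChan {
	if err := s.sendJob(srv, event); err != nil {
		log.Errorf("failed to send job to peer %s: %v", peerID, err)
		return err // surfaces a non-OK status on the Job stream, like sendUpdate does for Sync
	}
}
return nil
```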

management/server/job/manager.go (2)

44-151: Pending‑job cleanup semantics are coarse; minor concurrency detail worth double‑checking

Overall behavior is coherent (fail stuck jobs, enqueue new ones, complete on response), but a few aspects could be sharpened:

  • CreateJobChannel calls MarkPendingJobsAsFailed for (accountID, peerID) before taking the lock. That’s fine functionally, but means “stuck too long” in the DB is actually “any pending job when a fresh channel is created” — if you later add explicit TTL semantics, you may want a more targeted query.
  • In SendJob, you look up the Channel under RLock and then call ch.AddEvent after releasing the lock. If CreateJobChannel or CloseChannel can call ch.Close() concurrently, make sure Channel.AddEvent is written to safely detect a closed channel and return ErrJobChannelClosed rather than panic on send to a closed chan.
  • Both cleanup and CloseChannel use MarkPendingJobsAsFailed(accountID, peerID, reason), which marks all pending DB jobs for that peer, not just the specific jobID. That’s acceptable if you only ever have a single pending job per peer, but if you later support multiple in‑flight jobs, you’ll likely want a job‑scoped failure path in the store.
  • In CloseChannel, the loop over jm.pending calls MarkPendingJobsAsFailed once per event for the same peer; a minor optimization is to call it once per peerID, then delete all relevant pending entries.

These are mostly behavioral clarifications and small optimizations rather than blockers.
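
On the AddEvent/Close race specifically, one panic-free shape is to guard the send with a closed flag under a mutex. This is a sketch, not the current channel.go implementation; Event and ErrJobChannelClosed are the existing names from the job package, the rest is illustrative:

```go
// Close-safe job channel sketch.
type safeChannel struct {
	mu     sync.Mutex
	closed bool
	events chan *Event
}

func (c *safeChannel) AddEvent(ev *Event) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return ErrJobChannelClosed
	}
	select {
	case c.events <- ev:
		return nil
	default:
		return fmt.Errorf("job channel is full")
	}
}

func (c *safeChannel) Close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return
	}
	c.closed = true
	close(c.events)
}
```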


90-117: HandleResponse flow is correct but could simplify error‑response coupling

HandleResponse correctly:

  • Looks up the pending event by jobID.
  • Builds a types.Job from the response via ApplyResponse.
  • Calls Store.CompletePeerJob and always removes the entry from pending.

Given that the event.Response field is only set when CompletePeerJob succeeds and then immediately dropped from pending, you could omit event.Response entirely or set it before calling the store to simplify reasoning. Current behavior is valid; this is mainly an internal API cleanup opportunity.

client/status/status.go (1)

122-169: Overview mapping from FullStatus looks correct; consider a nil guard

Switching ConvertToStatusOutputOverview to use *proto.FullStatus directly and threading in daemonVersion matches how the proto is structured and keeps the output fields aligned with the underlying status (management, signal, local peer, relays, DNS, events, SSH).

One small robustness improvement: if pbFullStatus can ever be nil (e.g., from an older daemon or error path), pbFullStatus.Get...() calls will panic. A quick upfront check like

if pbFullStatus == nil {
    return OutputOverview{}
}

would make the function safer against unexpected inputs.

management/server/types/job.go (2)

105-138: Minor: preserve wrapped error instead of using err.Error()

In BuildWorkloadResponse, you wrap bundle build errors as:

if err := j.buildBundleResponse(&wl); err != nil {
    return nil, status.Errorf(status.InvalidArgument, err.Error())
}

Using err.Error() as the format string loses the original error as a distinct argument and makes further wrapping/debugging slightly harder. Consider:

- return nil, status.Errorf(status.InvalidArgument, err.Error())
+ return nil, status.Errorf(status.InvalidArgument, "%v", err)

to keep the original error value intact and consistent with the rest of the file.


140-162: Bundle parameter validation is clear; consider tightening semantics only if needed

The validation for BundleForTime (1–5 minutes when BundleFor is true) and LogFileCount (1–1000) is straightforward and matches the API docs. You also normalize stored parameters to the BundleParameters JSON and initialize Result to {}, which keeps BuildWorkloadResponse happy for pending/failed jobs.

If you ever want to hard‑fail obviously nonsensical inputs when BundleFor is false (e.g., negative BundleForTime), you could extend the check here, but that’s optional and not strictly required by the current contract.

shared/management/http/api/types.gen.go (1)

2112-2169: WorkloadRequest helpers match server usage; consider using the WorkloadType constant

The As*/From*/Merge* helpers and Discriminator/ValueByDiscriminator for WorkloadRequest are consistent with how NewJob and validateAndBuildBundleParams consume the workload. One small optional improvement would be to use WorkloadTypeBundle instead of the literal "bundle" to avoid drift if the constant ever changes:

-func (t *WorkloadRequest) FromBundleWorkloadRequest(v BundleWorkloadRequest) error {
-    v.Type = "bundle"
+func (t *WorkloadRequest) FromBundleWorkloadRequest(v BundleWorkloadRequest) error {
+    v.Type = WorkloadTypeBundle

Same for MergeBundleWorkloadRequest and the switch in ValueByDiscriminator.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32146e5 and 62ae8d8.

⛔ Files ignored due to path filters (4)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • go.sum is excluded by !**/*.sum
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (52)
  • .github/workflows/wasm-build-validation.yml (1 hunks)
  • client/cmd/debug.go (1 hunks)
  • client/cmd/status.go (1 hunks)
  • client/cmd/testutil_test.go (3 hunks)
  • client/cmd/up.go (1 hunks)
  • client/embed/embed.go (1 hunks)
  • client/internal/connect.go (7 hunks)
  • client/internal/debug/debug.go (7 hunks)
  • client/internal/debug/upload.go (1 hunks)
  • client/internal/debug/upload_test.go (2 hunks)
  • client/internal/engine.go (13 hunks)
  • client/internal/engine_test.go (9 hunks)
  • client/jobexec/executor.go (1 hunks)
  • client/proto/daemon.proto (0 hunks)
  • client/server/debug.go (2 hunks)
  • client/server/server.go (4 hunks)
  • client/server/server_test.go (3 hunks)
  • client/status/status.go (4 hunks)
  • client/status/status_test.go (1 hunks)
  • client/ui/debug.go (2 hunks)
  • go.mod (2 hunks)
  • management/internals/server/boot.go (1 hunks)
  • management/internals/server/controllers.go (3 hunks)
  • management/internals/server/modules.go (1 hunks)
  • management/internals/shared/grpc/server.go (11 hunks)
  • management/server/account.go (4 hunks)
  • management/server/account/manager.go (1 hunks)
  • management/server/account_test.go (2 hunks)
  • management/server/activity/codes.go (2 hunks)
  • management/server/dns_test.go (2 hunks)
  • management/server/http/handlers/peers/peers_handler.go (3 hunks)
  • management/server/http/testing/testing_tools/channel/channel.go (3 hunks)
  • management/server/job/channel.go (1 hunks)
  • management/server/job/manager.go (1 hunks)
  • management/server/management_proto_test.go (4 hunks)
  • management/server/management_test.go (4 hunks)
  • management/server/mock_server/account_mock.go (2 hunks)
  • management/server/nameserver_test.go (2 hunks)
  • management/server/peer.go (1 hunks)
  • management/server/peer_test.go (5 hunks)
  • management/server/route_test.go (2 hunks)
  • management/server/store/sql_store.go (3 hunks)
  • management/server/store/store.go (1 hunks)
  • management/server/types/job.go (1 hunks)
  • shared/management/client/client.go (1 hunks)
  • shared/management/client/client_test.go (3 hunks)
  • shared/management/client/grpc.go (8 hunks)
  • shared/management/client/mock.go (2 hunks)
  • shared/management/http/api/generate.sh (1 hunks)
  • shared/management/http/api/openapi.yml (2 hunks)
  • shared/management/http/api/types.gen.go (8 hunks)
  • shared/management/proto/management.proto (2 hunks)
💤 Files with no reviewable changes (1)
  • client/proto/daemon.proto
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.

Applied to files:

  • client/jobexec/executor.go
📚 Learning: 2025-11-14T13:05:31.729Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.729Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.

Applied to files:

  • client/jobexec/executor.go
🧬 Code graph analysis (42)
shared/management/client/client.go (3)
management/server/types/job.go (1)
  • Job (34-58)
shared/management/http/api/types.gen.go (2)
  • JobRequest (708-710)
  • JobResponse (713-721)
shared/management/proto/management.pb.go (6)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
management/internals/server/controllers.go (3)
management/internals/server/server.go (1)
  • BaseServer (45-68)
management/server/job/manager.go (2)
  • Manager (23-30)
  • NewJobManager (32-42)
management/internals/server/container.go (1)
  • Create (6-10)
client/status/status_test.go (1)
client/status/status.go (1)
  • ConvertToStatusOutputOverview (122-169)
client/internal/debug/upload_test.go (2)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
upload-server/types/upload.go (1)
  • GetURLPath (9-9)
management/server/store/store.go (1)
management/server/types/job.go (1)
  • Job (34-58)
management/internals/server/boot.go (1)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/debug/upload.go (1)
upload-server/types/upload.go (3)
  • GetURLResponse (15-18)
  • ClientHeader (5-5)
  • ClientHeaderValue (7-7)
client/cmd/up.go (1)
util/log.go (1)
  • FindFirstLogPath (77-84)
management/server/account_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/peer.go (7)
management/server/account.go (1)
  • DefaultAccountManager (68-113)
management/server/types/job.go (2)
  • Job (34-58)
  • Workload (60-64)
management/server/permissions/modules/module.go (1)
  • Peers (7-7)
management/server/permissions/operations/operation.go (1)
  • Delete (9-9)
shared/management/status/error.go (5)
  • NewPermissionValidationError (213-215)
  • NewPermissionDeniedError (209-211)
  • NewPeerNotPartOfAccountError (105-107)
  • Errorf (70-75)
  • Type (46-46)
management/server/store/store.go (3)
  • Store (50-211)
  • LockingStrengthNone (47-47)
  • LockingStrengthUpdate (43-43)
management/server/activity/codes.go (1)
  • JobCreatedByUser (183-183)
management/server/http/testing/testing_tools/channel/channel.go (2)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/server/modules.go (1)
management/server/account.go (1)
  • BuildManager (180-268)
shared/management/client/mock.go (3)
shared/management/http/api/types.gen.go (2)
  • JobRequest (708-710)
  • JobResponse (713-721)
shared/management/proto/management.pb.go (6)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
management/server/types/job.go (1)
  • Job (34-58)
management/server/account/manager.go (1)
management/server/types/job.go (1)
  • Job (34-58)
client/embed/embed.go (1)
client/internal/connect.go (1)
  • NewConnectClient (51-62)
client/cmd/status.go (1)
client/status/status.go (1)
  • ConvertToStatusOutputOverview (122-169)
management/server/management_test.go (1)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/peer_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
client/server/server.go (1)
client/status/status.go (1)
  • ToProtoFullStatus (549-635)
management/server/mock_server/account_mock.go (1)
management/server/types/job.go (1)
  • Job (34-58)
management/server/nameserver_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/job/channel.go (1)
management/server/job/manager.go (1)
  • Event (17-21)
management/server/http/handlers/peers/peers_handler.go (4)
management/server/context/auth.go (1)
  • GetUserAuthFromContext (25-30)
shared/management/http/util/util.go (3)
  • WriteError (84-120)
  • WriteErrorResponse (70-80)
  • WriteJSONObject (27-35)
shared/management/http/api/types.gen.go (3)
  • JobRequest (708-710)
  • JobResponse (713-721)
  • JobResponseStatus (724-724)
management/server/types/job.go (3)
  • NewJob (67-103)
  • Job (34-58)
  • Workload (60-64)
client/jobexec/executor.go (3)
client/internal/debug/debug.go (3)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
  • NewBundleGenerator (245-264)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
upload-server/types/upload.go (1)
  • DefaultBundleURL (11-11)
client/server/debug.go (1)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
client/internal/engine.go (6)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
client/jobexec/executor.go (3)
  • Executor (23-24)
  • NewExecutor (26-28)
  • ErrJobNotImplemented (20-20)
shared/management/client/client.go (1)
  • Client (14-27)
shared/management/proto/management.pb.go (21)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • JobStatus_failed (30-30)
  • JobRequest_Bundle (457-459)
  • JobRequest_Bundle (461-461)
  • JobStatus_succeeded (29-29)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • JobResponse_Bundle (548-550)
  • JobResponse_Bundle (552-552)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
client/internal/state.go (1)
  • CtxGetState (31-33)
client/internal/debug/debug.go (2)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
management/internals/shared/grpc/server.go (6)
management/server/account/manager.go (1)
  • Manager (27-131)
management/server/job/manager.go (2)
  • Manager (23-30)
  • Event (17-21)
shared/management/proto/management_grpc.pb.go (1)
  • ManagementService_JobServer (427-431)
shared/management/proto/management.pb.go (9)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • EncryptedMessage (322-333)
  • EncryptedMessage (348-348)
  • EncryptedMessage (363-365)
management/server/job/channel.go (2)
  • Channel (18-21)
  • ErrJobChannelClosed (15-15)
encryption/message.go (1)
  • EncryptMessage (10-24)
management/server/store/sql_store.go (1)
management/server/types/job.go (3)
  • Job (34-58)
  • JobStatusPending (18-18)
  • JobStatusFailed (20-20)
management/server/job/manager.go (3)
shared/management/proto/management.pb.go (6)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
management/server/job/channel.go (2)
  • Channel (18-21)
  • NewChannel (23-29)
management/server/types/job.go (1)
  • Job (34-58)
shared/management/client/client_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/connect.go (2)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
management/server/dns_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
shared/management/client/grpc.go (3)
shared/management/proto/management.pb.go (13)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • EncryptedMessage (322-333)
  • EncryptedMessage (348-348)
  • EncryptedMessage (363-365)
  • JobStatus_failed (30-30)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
shared/management/proto/management_grpc.pb.go (2)
  • ManagementService_JobClient (169-173)
  • ManagementService_SyncClient (89-92)
encryption/message.go (2)
  • EncryptMessage (10-24)
  • DecryptMessage (27-40)
management/server/types/job.go (3)
shared/management/proto/management.pb.go (22)
  • JobStatus (25-25)
  • JobStatus (57-59)
  • JobStatus (61-63)
  • JobStatus (70-72)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • JobStatus_succeeded (29-29)
  • JobStatus_failed (30-30)
  • JobResponse_Bundle (548-550)
  • JobResponse_Bundle (552-552)
  • JobRequest_Bundle (457-459)
  • JobRequest_Bundle (461-461)
shared/management/http/api/types.gen.go (8)
  • JobRequest (708-710)
  • WorkloadResponse (1948-1950)
  • BundleParameters (361-373)
  • BundleResult (376-378)
  • BundleWorkloadResponse (391-399)
  • WorkloadTypeBundle (195-195)
  • WorkloadRequest (1943-1945)
  • JobResponse (713-721)
shared/management/status/error.go (4)
  • Errorf (70-75)
  • BadRequest (36-36)
  • InvalidArgument (27-27)
  • Error (54-57)
client/server/server_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/debug/debug.go (4)
util/log.go (1)
  • SpecialLogs (25-28)
client/internal/profilemanager/profilemanager.go (1)
  • NewProfileManager (56-58)
client/status/status.go (3)
  • ToProtoFullStatus (549-635)
  • ConvertToStatusOutputOverview (122-169)
  • ParseToFullDetailSummary (532-547)
version/version.go (1)
  • NetbirdVersion (18-20)
client/internal/engine_test.go (5)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
shared/management/client/mock.go (1)
  • MockClient (13-24)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
shared/management/http/api/types.gen.go (2)
shared/management/proto/management.pb.go (9)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
management/server/types/job.go (1)
  • Workload (60-64)
management/server/route_test.go (2)
management/server/account.go (1)
  • BuildManager (180-268)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
client/status/status.go (2)
client/proto/daemon.pb.go (21)
  • FullStatus (1994-2008)
  • FullStatus (2021-2021)
  • FullStatus (2036-2038)
  • ManagementState (1682-1689)
  • ManagementState (1702-1702)
  • ManagementState (1717-1719)
  • SignalState (1621-1628)
  • SignalState (1641-1641)
  • SignalState (1656-1658)
  • LocalPeerState (1528-1539)
  • LocalPeerState (1552-1552)
  • LocalPeerState (1567-1569)
  • PeerState (1347-1369)
  • PeerState (1382-1382)
  • PeerState (1397-1399)
  • RelayState (1743-1750)
  • RelayState (1763-1763)
  • RelayState (1778-1780)
  • NSGroupState (1803-1811)
  • NSGroupState (1824-1824)
  • NSGroupState (1839-1841)
client/internal/peer/status.go (6)
  • FullStatus (151-161)
  • ManagementState (128-132)
  • SignalState (121-125)
  • LocalPeerState (106-112)
  • RosenpassState (135-138)
  • NSGroupState (142-148)
management/server/management_proto_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/cmd/testutil_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: release_ui
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: iOS / Build
  • GitHub Check: Android / Build
  • GitHub Check: JS / Lint
  • GitHub Check: Windows
  • GitHub Check: Linux
  • GitHub Check: Darwin
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: Build Cache
  • GitHub Check: Check External GPL/AGPL Licenses
  • GitHub Check: Client / Unit

Comment on lines +11 to +21
// todo consider the channel buffer size when we allow to run multiple jobs
const jobChannelBuffer = 1

var (
	ErrJobChannelClosed = errors.New("job channel closed")
)

type Channel struct {
	events chan *Event
	once   sync.Once
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Guard against send‑on‑closed channel and reconsider where timeout is enforced

Two concerns around Channel’s concurrency and semantics:

  1. Potential panic on send after Close
    Close() closes jc.events, and AddEvent always does jc.events <- event in the select case without checking for closure. If Close can race with AddEvent (which is likely when job streams disconnect while new jobs are being queued), a send on a closed channel will panic the process.
    Consider adding internal state to mark the channel as closed and have AddEvent short‑circuit with ErrJobChannelClosed (or similar) instead of ever sending to a potentially closed channel, or ensure by design that only the sender closes and that no further AddEvent calls happen after Close.

  2. Timeout at enqueue vs. job response (already noted by your TODO)
    The current responseWait timeout only limits how long you wait to enqueue into the buffered channel, not how long you wait for the peer’s job response. With jobChannelBuffer = 1, this effectively prevents a second job from being queued while the first one is still pending, but does not clean up pending jobs when a peer never responds. Moving timeout handling to the job lifecycle (pending → failed) in the manager would better match the comment.

I’d treat the send‑on‑closed‑channel risk as something to fix before merging; the timeout refactor can follow.

Also applies to: 31-41, 43-59


I'll examine the full implementation to validate the concurrency concerns.


---

🏁 Script executed:

```shell
#!/bin/bash
# Retrieve the full channel.go file to examine Close(), AddEvent(), and Event() implementations
find . -name "channel.go" -path "*/job/*" -type f | head -5 | xargs -I {} cat {}
```

Length of output: 1183


Guard against send-on-closed channel and reconsider where timeout is enforced

Two concerns around Channel's concurrency and semantics:

  1. Potential panic on send after Close
    Close() uses sync.Once to close jc.events exactly once, but this only prevents double-close, not send-on-closed-channel panics. If AddEvent() is called concurrently with Close(), the select case that sends to jc.events can execute after the channel is closed, causing a panic. The sync.Once guard does not prevent this race.
    Add an internal closed flag (guarded by mutex) that AddEvent checks before entering the select, or ensure by documented contract and design that AddEvent cannot be called after Close(). This must be fixed before merging.

  2. Timeout at enqueue vs. job response (confirmed by TODO)
    The responseWait timeout only limits how long to wait to enqueue into the buffered channel, not how long to wait for the peer's job response. With jobChannelBuffer = 1, this prevents a second job from being queued while the first is pending, but does not clean up pending jobs when a peer never responds. The TODO comment (lines 34–35) confirms this is a known issue. Move timeout handling to the job lifecycle (pending → failed) in the manager.
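
A minimal, self-contained sketch of the closed-flag guard from point 1. The Event payload, buffer size, and the error for a full queue are illustrative stand-ins, not the PR's actual code:

```go
package job

import (
	"errors"
	"sync"
)

var (
	ErrJobChannelClosed = errors.New("job channel closed")
	ErrJobChannelFull   = errors.New("job channel full") // stand-in for whatever error the enqueue path should return
)

// Event is a stub here; the real type carries the job request/response pair.
type Event struct{}

type Channel struct {
	mu     sync.Mutex
	closed bool
	events chan *Event
}

func NewChannel() *Channel {
	return &Channel{events: make(chan *Event, 1)}
}

// AddEvent never touches the channel once it is marked closed, so a racing
// Close cannot cause a send-on-closed-channel panic.
func (jc *Channel) AddEvent(event *Event) error {
	jc.mu.Lock()
	defer jc.mu.Unlock()
	if jc.closed {
		return ErrJobChannelClosed
	}
	select {
	case jc.events <- event:
		return nil
	default:
		return ErrJobChannelFull
	}
}

// Close flips the flag and closes the channel under the same lock that guards sends.
func (jc *Channel) Close() {
	jc.mu.Lock()
	defer jc.mu.Unlock()
	if jc.closed {
		return
	}
	jc.closed = true
	close(jc.events)
}

// Events exposes the receive side for the manager's send loop.
func (jc *Channel) Events() <-chan *Event {
	return jc.events
}
```

Holding one mutex across the closed check, the send, and close() is what removes the race; the pending-to-failed timeout from point 2 would still belong in the manager, not in this type.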

🤖 Prompt for AI Agents
In management/server/job/channel.go around lines 11–21, AddEvent can panic if it
races with Close because closing the channel is guarded only by sync.Once; add
an internal boolean closed guarded by a mutex (or RWMutex) that Close sets and
AddEvent checks before attempting to send, returning ErrJobChannelClosed if
closed to avoid send-on-closed panics; also make the send non-blocking/select
with a default path to return an error if the channel is closed or full.
Separately, move the timeout logic out of the enqueue path into the job
lifecycle in the manager: stop using responseWait to gate channel enqueueing
(keep enqueue quick or increase buffer), and implement a manager-side
pending→failed transition with a timer for each job so unresponsive peers are
timed out and cleaned up rather than relying on the channel buffer to enforce
timeouts.

Comment on lines 317 to 387
func (am *DefaultAccountManager) CreatePeerJob(ctx context.Context, accountID, peerID, userID string, job *types.Job) error {
	// todo: Create permissions for job
	allowed, err := am.permissionsManager.ValidateUserPermissions(ctx, accountID, userID, modules.Peers, operations.Delete)
	if err != nil {
		return status.NewPermissionValidationError(err)
	}
	if !allowed {
		return status.NewPermissionDeniedError()
	}

	peerAccountID, err := am.Store.GetAccountIDByPeerID(ctx, store.LockingStrengthNone, peerID)
	if err != nil {
		return err
	}

	if peerAccountID != accountID {
		return status.NewPeerNotPartOfAccountError()
	}

	// check if peer connected
	if !am.jobManager.IsPeerConnected(peerID) {
		return status.Errorf(status.BadRequest, "peer not connected")
	}

	// check if already has pending jobs
	// todo: The job checks here are not protected. The user can run this function from multiple threads,
	// and each thread can think there is no job yet. This means entries in the pending job map will be overwritten,
	// and only one will be kept, but potentially another one will overwrite it in the queue.
	if am.jobManager.IsPeerHasPendingJobs(peerID) {
		return status.Errorf(status.BadRequest, "peer already has pending job")
	}

	jobStream, err := job.ToStreamJobRequest()
	if err != nil {
		return status.Errorf(status.BadRequest, "invalid job request %v", err)
	}

	// try sending job first
	if err := am.jobManager.SendJob(ctx, accountID, peerID, jobStream); err != nil {
		return status.Errorf(status.Internal, "failed to send job: %v", err)
	}

	var peer *nbpeer.Peer
	var eventsToStore func()

	// persist job in DB only if send succeeded
	err = am.Store.ExecuteInTransaction(ctx, func(transaction store.Store) error {
		peer, err = transaction.GetPeerByID(ctx, store.LockingStrengthUpdate, accountID, peerID)
		if err != nil {
			return err
		}
		if err := transaction.CreatePeerJob(ctx, job); err != nil {
			return err
		}

		jobMeta := map[string]any{
			"for_peer_name": peer.Name,
			"job_type":      job.Workload.Type,
		}

		eventsToStore = func() {
			am.StoreEvent(ctx, userID, peer.ID, accountID, activity.JobCreatedByUser, jobMeta)
		}
		return nil
	})
	if err != nil {
		return err
	}
	eventsToStore()
	return nil
}

⚠️ Potential issue | 🟠 Major

Peer job management: ownership check bug, nil‑safety, and ordering concerns

The new peer‑job methods are a good start, but there are a few issues worth addressing:

  1. Job ownership validation bug (path vs data mismatch)
    • GetPeerJobByID ignores the peerID argument when returning the job:
      job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
      // no check that job.PeerID == peerID
    • The HTTP handler GET /peers/{peerId}/jobs/{jobId} passes both peerID and jobID, but the account manager currently only checks that the peer belongs to the account, not that the job belongs to that peer.
    • Result: a caller can request /peers/{peerA}/jobs/{jobOfPeerB} (within the same account) and see the job for peerB.
    • Fix: after loading the job, assert job.PeerID == peerID and return status.NotFound or status.PermissionDenied otherwise:
      func (am *DefaultAccountManager) GetPeerJobByID(ctx context.Context, accountID, userID, peerID, jobID string) (*types.Job, error) {
          // ... existing permission + peerAccountID checks ...

          job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
          if err != nil {
              return nil, err
          }

          if job.PeerID != peerID {
              return nil, status.NewPeerNotPartOfAccountError()
          }
          return job, nil
      }
    
    
  2. Potential nil‑pointer on am.jobManager

    • CreatePeerJob assumes am.jobManager is non‑nil:
      if !am.jobManager.IsPeerConnected(peerID) { ... }
      if am.jobManager.IsPeerHasPendingJobs(peerID) { ... }
      if err := am.jobManager.SendJob(...); err != nil { ... }
    • If BuildManager is ever called with a nil JobManager (e.g., in older tests or misconfigured environments), this will panic on first use of the job API.
    • Consider a defensive check up front:
      if am.jobManager == nil {
        return status.Errorf(status.Internal, "job manager is not configured")
      }
    • Or make the non‑nil requirement explicit and enforced in construction/tests.
  3. Send‑before‑persist ordering can desync DB and in‑memory job state

    • The method currently:
      1. Checks for pending jobs.
      2. Builds jobStream and calls SendJob, which will enqueue/send work to the peer.
      3. Only then writes the job to the DB via CreatePeerJob inside a transaction.
    • If step (3) fails (DB error, constraint, etc.), CreatePeerJob returns an error but the job has already been dispatched to the peer, potentially leaving:
      • A job executing on the client but absent from persistent storage and from /jobs listings.
    • Consider flipping the order:
      • Persist the job first (status = pending), then send it to the peer, and if sending fails, update the job to failed with an appropriate FailedReason. That keeps DB and runtime state consistent at the cost of occasionally having “failed to send” jobs in the DB. A sketch of this ordering follows after this comment.
  4. Permissions reused from Peers/Delete

    • Using modules.Peers + operations.Delete as the gate for creating and viewing jobs is strict but workable short‑term. The TODO about introducing dedicated job permissions is valid; when you add those, these sites will be the key ones to update.

Overall structure is solid; tightening ownership checks and jobManager assumptions will make the new job API much safer.

Also applies to: 389-441, 618-638
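
A rough sketch of the persist-first ordering suggested above. Only the identifiers quoted from the diff (ExecuteInTransaction, CreatePeerJob, SendJob, ToStreamJobRequest) come from the PR; the surrounding function and the markJobFailed helper are hypothetical:

```go
// Hypothetical reordering; markJobFailed is an assumed helper that flips the
// persisted job to failed and records a FailedReason.
func (am *DefaultAccountManager) createAndDispatchPeerJob(ctx context.Context, accountID, peerID string, job *types.Job) error {
	jobStream, err := job.ToStreamJobRequest()
	if err != nil {
		return status.Errorf(status.BadRequest, "invalid job request %v", err)
	}

	// 1. persist as pending so the job is never running on the client while
	//    missing from the store
	err = am.Store.ExecuteInTransaction(ctx, func(transaction store.Store) error {
		return transaction.CreatePeerJob(ctx, job)
	})
	if err != nil {
		return err
	}

	// 2. only then dispatch it to the peer
	if err := am.jobManager.SendJob(ctx, accountID, peerID, jobStream); err != nil {
		// 3. keep store and runtime consistent: record the delivery failure
		am.markJobFailed(ctx, job, err)
		return status.Errorf(status.Internal, "failed to send job: %v", err)
	}
	return nil
}
```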

Comment on lines +164 to +201
// ApplyResponse validates and maps a proto.JobResponse into the Job fields.
func (j *Job) ApplyResponse(resp *proto.JobResponse) error {
	if resp == nil {
		return nil
	}

	j.ID = string(resp.ID)
	now := time.Now().UTC()
	j.CompletedAt = &now
	switch resp.Status {
	case proto.JobStatus_succeeded:
		j.Status = JobStatusSucceeded
	case proto.JobStatus_failed:
		j.Status = JobStatusFailed
		if len(resp.Reason) > 0 {
			reason := string(resp.Reason)
			if len(resp.Reason) > MaxJobReasonLength {
				reason = string(resp.Reason[:MaxJobReasonLength]) + "... (truncated)"
			}
			j.FailedReason = fmt.Sprintf("Client error: '%s'", reason)
		}
		return nil
	default:
		return fmt.Errorf("unexpected job status: %v", resp.Status)
	}

	// Handle workload results (oneof)
	var err error
	switch r := resp.WorkloadResults.(type) {
	case *proto.JobResponse_Bundle:
		if j.Workload.Result, err = json.Marshal(r.Bundle); err != nil {
			return fmt.Errorf("failed to marshal workload results: %w", err)
		}
	default:
		return fmt.Errorf("unsupported workload response type: %T", r)
	}
	return nil
}

⚠️ Potential issue | 🔴 Critical

Likely JSON shape mismatch between proto.BundleResult and api.BundleResult

In ApplyResponse, for a succeeded bundle job you serialize the proto result directly:

case *proto.JobResponse_Bundle:
    if j.Workload.Result, err = json.Marshal(r.Bundle); err != nil {
        return fmt.Errorf("failed to marshal workload results: %w", err)
    }

Later, buildBundleResponse does:

var r api.BundleResult
if err := json.Unmarshal(j.Workload.Result, &r); err != nil {
    return fmt.Errorf("invalid result for bundle job: %w", err)
}

However, the types differ:

  • proto.BundleResult has UploadKey String (a protobuf message type, not a plain string).
  • api.BundleResult has UploadKey *string.

Encoding a protobuf message with encoding/json will typically produce an object like:

{ "upload_key": { "value": "..." } }

whereas api.BundleResult expects:

{ "upload_key": "..." }

This will cause json.Unmarshal in buildBundleResponse to fail (cannot unmarshal object into Go value of type string), meaning succeeded jobs will not be convertible to HTTP WorkloadResponse.

I’d strongly recommend normalizing here by mapping the proto result into the HTTP shape instead of serializing the proto struct directly, e.g.:

case *proto.JobResponse_Bundle:
-   if j.Workload.Result, err = json.Marshal(r.Bundle); err != nil {
-       return fmt.Errorf("failed to marshal workload results: %w", err)
-   }
+   apiResult := api.BundleResult{}
+   if r.Bundle != nil {
+       // Assuming proto.String carries a `Value` or similar; adapt field access accordingly.
+       key := r.Bundle.UploadKey // map to string as appropriate
+       apiResult.UploadKey = &key
+   }
+   if j.Workload.Result, err = json.Marshal(apiResult); err != nil {
+       return fmt.Errorf("failed to marshal workload results: %w", err)
+   }

(Adjust field access/unwrap logic to whatever String actually exposes.)

This keeps the stored JSON shape aligned with api.BundleResult and avoids runtime errors in BuildWorkloadResponse.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62ae8d8 and de78255.

📒 Files selected for processing (3)
  • client/internal/connect.go (7 hunks)
  • client/internal/engine.go (13 hunks)
  • client/internal/engine_test.go (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
client/internal/engine_test.go (5)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
shared/management/client/mock.go (1)
  • MockClient (13-24)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (180-268)
management/internals/shared/grpc/server.go (1)
  • NewServer (84-147)
client/internal/engine.go (7)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
client/jobexec/executor.go (3)
  • Executor (23-24)
  • NewExecutor (26-28)
  • ErrJobNotImplemented (20-20)
shared/management/client/client.go (1)
  • Client (14-27)
shared/management/proto/management.pb.go (19)
  • JobRequest (388-398)
  • JobRequest (413-413)
  • JobRequest (428-430)
  • JobResponse (463-475)
  • JobResponse (490-490)
  • JobResponse (505-507)
  • JobStatus_failed (30-30)
  • JobRequest_Bundle (457-459)
  • JobRequest_Bundle (461-461)
  • JobStatus_succeeded (29-29)
  • BundleParameters (554-563)
  • BundleParameters (578-578)
  • BundleParameters (593-595)
  • SyncResponse (721-738)
  • SyncResponse (753-753)
  • SyncResponse (768-770)
  • BundleResult (625-631)
  • BundleResult (646-646)
  • BundleResult (661-663)
client/internal/state.go (1)
  • CtxGetState (31-33)
client/internal/debug/debug.go (2)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
client/server/server.go (1)
  • New (96-107)
client/internal/connect.go (3)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
shared/management/proto/management.pb.go (3)
  • PeerConfig (2082-2098)
  • PeerConfig (2113-2113)
  • PeerConfig (2128-2130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Management / Integration (amd64, postgres)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Client / Unit
  • GitHub Check: release
  • GitHub Check: release_ui_darwin
  • GitHub Check: Android / Build
  • GitHub Check: Linux
  • GitHub Check: Client / Unit
  • GitHub Check: Darwin
  • GitHub Check: Windows
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
🔇 Additional comments (16)
client/internal/connect.go (5)

65-66: LGTM! logPath parameter threading is clean.

The public Run signature now accepts logPath and forwards it correctly. Mobile variants appropriately default to empty string, which is sensible given platform-specific logging constraints.

Also applies to: 87-87, 105-105


108-108: LGTM!

Internal run signature correctly updated to receive logPath.


251-251: LGTM!

Call to createEngineConfig correctly passes logPath parameter.


275-275: LGTM!

NewEngine call correctly updated with profile config parameter, enabling debug bundle generation support.


414-414: LGTM! EngineConfig enriched for debug bundle support.

The function signature and field assignments correctly thread logPath and ProfileConfig through to EngineConfig. Note that the entire profilemanager.Config is stored in ProfileConfig—this appears intentional for debug bundle generation, which may need access to ManagementURL and other config details.

Also applies to: 449-452

client/internal/engine_test.go (3)

28-28: LGTM!

Import added to support JobManager usage in test setup.


256-256: LGTM! Test updates are consistent.

All NewEngine calls correctly updated with two additional nil parameters (checks, profileConfig). Using nil is appropriate for tests that don't exercise debug bundle or checks functionality.

Also applies to: 418-424, 637-643, 802-808, 1004-1010, 1536-1536


1595-1595: LGTM! Test infrastructure correctly wired.

JobManager created and threaded through BuildManager and NewServer, enabling job subsystem testing. Using nil for metrics is appropriate for test context.

Also applies to: 1626-1626, 1632-1632

client/internal/engine.go (8)

34-34: LGTM!

Imports added to support debug bundle generation and job execution.

Also applies to: 52-52


138-141: LGTM!

EngineConfig fields added to support debug bundle generation. Clear comment documents the purpose.


206-207: LGTM! Proper concurrency primitives added.

syncRespMux provides dedicated locking for sync response persistence, separate from syncMsgMux. jobExecutorWG ensures clean shutdown of job-related goroutines.

Also applies to: 220-221


234-234: LGTM!

NewEngine signature correctly extended with profile config parameter. jobExecutor initialized during construction for handling management-service job requests.

Also applies to: 254-254


335-335: LGTM! Proper shutdown ordering.

jobExecutorWG.Wait() ensures job-related goroutines complete before resource cleanup. The preceding e.cancel() will signal job goroutines to exit via context cancellation.


523-523: LGTM! receiveJobEvents follows established patterns.

The function correctly:

  • Increments jobExecutorWG before spawning goroutine
  • Handles job requests synchronously via callback
  • Triggers client restart on Job stream errors (consistent with Management/Signal event handlers)
  • Returns proper error codes for unimplemented job types

Also applies to: 942-977


803-814: LGTM! Proper lock separation for sync response persistence.

The code correctly uses syncRespMux (not syncMsgMux) to protect sync response persistence state. RLock for reading the flag, then separate write lock for storing the response—proper lock upgrade pattern.


1874-1886: LGTM! Thread-safe sync response persistence.

Both functions correctly use syncRespMux for protection. GetLatestSyncResponse properly:

  • Reads state under RLock
  • Releases lock before expensive clone operation
  • Returns a clone to prevent external mutation

Also applies to: 1889-1911
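
For illustration, the described behaviour can be reconstructed roughly as below; this is not the PR's literal code, the field names are taken from this review, and proto.Clone refers to the standard google.golang.org/protobuf/proto deep copy:

```go
func (e *Engine) GetLatestSyncResponse() (*mgmProto.SyncResponse, error) {
	e.syncRespMux.RLock()
	resp := e.latestSyncResponse
	e.syncRespMux.RUnlock()

	if resp == nil {
		return nil, errors.New("no sync response stored yet")
	}
	// clone outside the lock so callers cannot mutate engine state and the
	// potentially expensive copy does not block sync handling
	return proto.Clone(resp).(*mgmProto.SyncResponse), nil
}
```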

Comment on lines +979 to +1012
func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
	log.Infof("handle remote debug bundle request: %s", params.String())
	syncResponse, err := e.GetLatestSyncResponse()
	if err != nil {
		log.Warnf("get latest sync response: %v", err)
	}

	bundleDeps := debug.GeneratorDependencies{
		InternalConfig: e.config.ProfileConfig,
		StatusRecorder: e.statusRecorder,
		SyncResponse:   syncResponse,
		LogPath:        e.config.LogPath,
	}

	bundleJobParams := debug.BundleConfig{
		Anonymize:         params.Anonymize,
		IncludeSystemInfo: true,
		LogFileCount:      uint32(params.LogFileCount),
	}

	waitFor := time.Duration(params.BundleForTime) * time.Minute

	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
	if err != nil {
		return nil, err
	}

	response := &mgmProto.JobResponse_Bundle{
		Bundle: &mgmProto.BundleResult{
			UploadKey: uploadKey,
		},
	}
	return response, nil
}

⚠️ Potential issue | 🟡 Minor

Potential nil pointer dereference on Line 1001.

The code accesses e.config.ProfileConfig.ManagementURL.String() without checking if ProfileConfig is nil. While tests pass nil, they don't exercise this path. For robustness, consider adding a nil check.

Apply this defensive check:

+	if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+		return nil, errors.New("profile config or management URL not available for bundle generation")
+	}
+
 	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
	log.Infof("handle remote debug bundle request: %s", params.String())
	syncResponse, err := e.GetLatestSyncResponse()
	if err != nil {
		log.Warnf("get latest sync response: %v", err)
	}
	bundleDeps := debug.GeneratorDependencies{
		InternalConfig: e.config.ProfileConfig,
		StatusRecorder: e.statusRecorder,
		SyncResponse:   syncResponse,
		LogPath:        e.config.LogPath,
	}
	bundleJobParams := debug.BundleConfig{
		Anonymize:         params.Anonymize,
		IncludeSystemInfo: true,
		LogFileCount:      uint32(params.LogFileCount),
	}
	waitFor := time.Duration(params.BundleForTime) * time.Minute
	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
	if err != nil {
		return nil, err
	}
	response := &mgmProto.JobResponse_Bundle{
		Bundle: &mgmProto.BundleResult{
			UploadKey: uploadKey,
		},
	}
	return response, nil
}

func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
	log.Infof("handle remote debug bundle request: %s", params.String())
	syncResponse, err := e.GetLatestSyncResponse()
	if err != nil {
		log.Warnf("get latest sync response: %v", err)
	}
	bundleDeps := debug.GeneratorDependencies{
		InternalConfig: e.config.ProfileConfig,
		StatusRecorder: e.statusRecorder,
		SyncResponse:   syncResponse,
		LogPath:        e.config.LogPath,
	}
	bundleJobParams := debug.BundleConfig{
		Anonymize:         params.Anonymize,
		IncludeSystemInfo: true,
		LogFileCount:      uint32(params.LogFileCount),
	}
	waitFor := time.Duration(params.BundleForTime) * time.Minute
	if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
		return nil, errors.New("profile config or management URL not available for bundle generation")
	}
	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
	if err != nil {
		return nil, err
	}
	response := &mgmProto.JobResponse_Bundle{
		Bundle: &mgmProto.BundleResult{
			UploadKey: uploadKey,
		},
	}
	return response, nil
}
🤖 Prompt for AI Agents
In client/internal/engine.go around lines 979 to 1012, the call to
e.config.ProfileConfig.ManagementURL.String() can panic if ProfileConfig or its
ManagementURL is nil; add a defensive nil-check before calling String() and pass
a safe fallback (e.g., empty string or configured default) into
e.jobExecutor.BundleJob. Specifically, ensure e.config and
e.config.ProfileConfig are non-nil and that ProfileConfig.ManagementURL is
non-nil, compute a local managementURL string variable accordingly, and use that
variable in the BundleJob call so the function never dereferences a nil pointer.

}

// GetJobByID fetches job by ID
func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, jobID string) (*types.Job, error) {
Collaborator

This method should also verify the peer this is coming from


func (s *Server) startResponseReceiver(ctx context.Context, srv proto.ManagementService_JobServer) {
	go func() {
		for {
Collaborator

Ideally we should also check if the ctx is done here or Err() != nil

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/internal/engine.go (1)

1295-1326: Fix compile error: zone.GetSearchDomainDisabled is not defined on mgmProto.CustomZone

toDNSConfig currently does:

dnsZone := nbdns.CustomZone{
    Domain:               zone.GetDomain(),
    SearchDomainDisabled: zone.GetSearchDomainDisabled(),
    SkipPTRProcess:       zone.GetSkipPTRProcess(),
}

The pipeline error indicates GetSearchDomainDisabled does not exist on *mgmProto.CustomZone, so this code does not compile. You need to either:

  • Add a corresponding search_domain_disabled (or similarly named) field to the CustomZone message in shared/management/proto/management.proto and regenerate the Go bindings, or
  • Drop this mapping for now and rely on the default value in nbdns.CustomZone.

If you just want the current code to build without changing the proto, adjust as:

-        dnsZone := nbdns.CustomZone{
-            Domain:               zone.GetDomain(),
-            SearchDomainDisabled: zone.GetSearchDomainDisabled(),
-            SkipPTRProcess:       zone.GetSkipPTRProcess(),
-        }
+        dnsZone := nbdns.CustomZone{
+            Domain:         zone.GetDomain(),
+            SkipPTRProcess: zone.GetSkipPTRProcess(),
+        }

and reintroduce the mapping once the proto field exists.

♻️ Duplicate comments (1)
client/internal/engine.go (1)

958-993: Nil pointer risk when using ProfileConfig.ManagementURL in bundle job handler

handleBundle calls:

uploadKey, err := e.jobExecutor.BundleJob(
    e.ctx, bundleDeps, bundleJobParams, waitFor,
    e.config.ProfileConfig.ManagementURL.String(),
)

without checking whether e.config.ProfileConfig or ProfileConfig.ManagementURL are nil. Either being nil will panic when a bundle job arrives.

Add a defensive check before constructing bundleDeps and the URL, and return a failed job response if the configuration is missing. For example:

 func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
@@
-    syncResponse, err := e.GetLatestSyncResponse()
+    syncResponse, err := e.GetLatestSyncResponse()
@@
-    bundleDeps := debug.GeneratorDependencies{
-        InternalConfig: e.config.ProfileConfig,
+    if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+        return nil, errors.New("profile config or management URL not available for bundle generation")
+    }
+
+    bundleDeps := debug.GeneratorDependencies{
+        InternalConfig: e.config.ProfileConfig,
@@
-    waitFor := time.Duration(params.BundleForTime) * time.Minute
-
-    uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
+    waitFor := time.Duration(params.BundleForTime) * time.Minute
+    managementURL := e.config.ProfileConfig.ManagementURL.String()
+
+    uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, managementURL)

This ensures the job handler fails gracefully instead of panicking the client.

Also applies to: 995-1028

🧹 Nitpick comments (7)
shared/management/client/client_test.go (1)

76-76: Consider passing metrics to JobManager instead of nil.

The JobManager is created with nil for the metrics parameter, but a valid metrics object is created just below on lines 100-101. Consider reordering the initialization to create metrics first, then pass it to NewJobManager, unless nil metrics is intentionally acceptable for test scenarios.

Apply this diff to use the actual metrics object:

+	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
+	require.NoError(t, err)
+
-	jobManager := job.NewJobManager(nil, store)
+	jobManager := job.NewJobManager(metrics, store)
 	eventStore := &activity.InMemoryEventStore{}
 
 	ctrl := gomock.NewController(t)
 	t.Cleanup(ctrl.Finish)
 
 	permissionsManagerMock := permissions.NewMockManager(ctrl)
 	permissionsManagerMock.
 		EXPECT().
 		ValidateUserPermissions(
 			gomock.Any(),
 			gomock.Any(),
 			gomock.Any(),
 			gomock.Any(),
 			gomock.Any(),
 		).
 		Return(true, nil).
 		AnyTimes()
 
 	peersManger := peers.NewManager(store, permissionsManagerMock)
 	settingsManagerMock := settings.NewMockManager(ctrl)
 
 	ia, _ := integrations.NewIntegratedValidator(context.Background(), peersManger, settingsManagerMock, eventStore)
 
-	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
-	require.NoError(t, err)
-
shared/management/proto/management.proto (1)

91-96: Consider adding documentation for BundleParameters fields.

The field names could benefit from proto comments for clarity:

  • bundle_for is ambiguous—it's unclear what this flag enables without context from the HTTP API docs
  • bundle_for_time should document its unit (minutes, based on the HTTP API)
 message BundleParameters {
-  bool   bundle_for       = 1;
-  int64  bundle_for_time  = 2;
+  // Whether to generate a bundle for a specific timeframe.
+  bool   bundle_for       = 1;
+  // Time period in minutes for which to generate the bundle.
+  int64  bundle_for_time  = 2;
   int32  log_file_count   = 3;
+  // Whether sensitive data should be anonymized in the bundle.
   bool   anonymize        = 4;
 }
management/server/account/manager.go (1)

126-128: Normalize parameter ordering for peer job Manager methods

CreatePeerJob takes (accountID, peerID, userID string, ...), while GetAllPeerJobs and GetPeerJobByID use (accountID, userID, peerID string, ...). This asymmetry is easy to trip over in implementations and mocks.

Consider standardizing on one ordering (e.g. ctx, accountID, userID, peerID, ...) for all three methods and updating implementations accordingly.
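
For illustration, one consistent ordering could look like the interface below; the GetAllPeerJobs return type is an assumption:

```go
type peerJobAPI interface {
	CreatePeerJob(ctx context.Context, accountID, userID, peerID string, job *types.Job) error
	GetAllPeerJobs(ctx context.Context, accountID, userID, peerID string) ([]*types.Job, error)
	GetPeerJobByID(ctx context.Context, accountID, userID, peerID, jobID string) (*types.Job, error)
}
```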

client/internal/engine_test.go (1)

239-257: Consider centralizing NewEngine test construction to avoid duplicated nil, nil arguments

All the NewEngine(...) calls now end with ..., peer.NewRecorder("https://mgm"), nil, nil). This is correct for the current signature, but the repeated trailing nil, nil across multiple tests makes future signature changes easy to miss.

Consider a small helper (e.g. newTestEngine(t, ctx, cfg, mgmtClientOpts...)) that sets up the common arguments (including the last two) so only that helper needs updating if NewEngine evolves again.

Also applies to: 417-425, 637-644, 801-809, 1003-1011, 1536-1537

management/internals/shared/grpc/server.go (2)

55-77: Consider guarding against a nil jobManager in server construction

Server now depends on jobManager (CreateJobChannel, CloseChannel, HandleResponse), but NewServer accepts a *job.Manager without validating it. If any caller passes nil, the Job RPC will panic. Consider either:

  • Requiring non-nil and returning an error from NewServer if jobManager == nil, or
  • Making the Job handler explicitly reject calls when jobManager is nil with a gRPC Internal/Unavailable error.

This keeps failures explicit instead of panicking; a sketch of the handler-level check follows below.

Also applies to: 79-132
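
A sketch of the second option, assuming the generated handler shape Job(srv proto.ManagementService_JobServer) error and gRPC's status/codes packages:

```go
func (s *Server) Job(srv proto.ManagementService_JobServer) error {
	if s.jobManager == nil {
		return status.Error(codes.Unavailable, "job manager is not configured")
	}
	// ... existing handshake, response receiver, and send loop ...
	return nil
}
```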


175-210: Job streaming flow is coherent; error handling strategy is acceptable but could be tightened

The Job RPC handshake (decrypt first message, resolve account/peer, set context keys), background response receiver (startResponseReceiver), and sendJobsLoop/sendJob pipeline all look logically consistent and reuse existing encryption and context patterns from Sync. The goroutine cleanly exits on io.EOF/context.Canceled, and the job channel is closed via defer CloseChannel.

The only notable tradeoff is that sendJobsLoop logs and returns nil on sendJob failures, so the client sees a successful stream close rather than an error status; you already left a TODO to refine this. Given that, the current implementation is reasonable for an initial version.

Also applies to: 327-390, 450-473
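
If that tradeoff is tightened later, the send failure could be surfaced as a stream error instead of a clean close; sendJob's signature below is assumed from the surrounding discussion, and status/codes are the gRPC packages:

```go
if err := s.sendJob(ctx, srv, event); err != nil {
	log.Errorf("failed to send job to peer: %v", err)
	// end the stream with an error so the client does not treat it as success
	return status.Errorf(codes.Unavailable, "deliver job: %v", err)
}
```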

client/internal/engine.go (1)

235-255: Unused c *profilemanager.Config parameter in NewEngine

NewEngine accepts a c *profilemanager.Config argument but doesn’t use it; all configuration is read from config *EngineConfig instead. Unless future work will use c, consider removing this parameter to avoid confusion or, alternatively, wiring c into config.ProfileConfig here.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de78255 and 3e53c01.

⛔ Files ignored due to path filters (2)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (29)
  • client/cmd/testutil_test.go (4 hunks)
  • client/internal/engine.go (13 hunks)
  • client/internal/engine_test.go (10 hunks)
  • client/proto/daemon.proto (0 hunks)
  • client/server/server.go (4 hunks)
  • client/server/server_test.go (4 hunks)
  • go.mod (2 hunks)
  • management/internals/server/boot.go (1 hunks)
  • management/internals/server/controllers.go (3 hunks)
  • management/internals/server/modules.go (1 hunks)
  • management/internals/shared/grpc/server.go (10 hunks)
  • management/server/account.go (4 hunks)
  • management/server/account/manager.go (1 hunks)
  • management/server/account_test.go (2 hunks)
  • management/server/activity/codes.go (2 hunks)
  • management/server/dns_test.go (2 hunks)
  • management/server/http/handlers/peers/peers_handler.go (3 hunks)
  • management/server/http/testing/testing_tools/channel/channel.go (3 hunks)
  • management/server/management_proto_test.go (4 hunks)
  • management/server/management_test.go (3 hunks)
  • management/server/mock_server/account_mock.go (2 hunks)
  • management/server/nameserver_test.go (2 hunks)
  • management/server/peer.go (1 hunks)
  • management/server/peer_test.go (5 hunks)
  • management/server/route_test.go (2 hunks)
  • shared/management/client/client_test.go (4 hunks)
  • shared/management/http/api/openapi.yml (2 hunks)
  • shared/management/http/api/types.gen.go (8 hunks)
  • shared/management/proto/management.proto (2 hunks)
💤 Files with no reviewable changes (1)
  • client/proto/daemon.proto
🚧 Files skipped from review as they are similar to previous changes (12)
  • management/server/account_test.go
  • management/server/dns_test.go
  • management/server/account.go
  • management/internals/server/modules.go
  • management/server/route_test.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/peer_test.go
  • client/cmd/testutil_test.go
  • management/server/activity/codes.go
  • management/server/management_test.go
  • management/server/peer.go
  • management/server/nameserver_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.

Applied to files:

  • shared/management/client/client_test.go
🧬 Code graph analysis (10)
management/server/account/manager.go (1)
management/server/types/job.go (1)
  • Job (34-58)
client/server/server_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
management/internals/server/boot.go (1)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
management/server/mock_server/account_mock.go (1)
management/server/types/job.go (1)
  • Job (34-58)
management/server/management_proto_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
shared/management/client/client_test.go (3)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (178-266)
management/internals/shared/grpc/server.go (1)
  • NewServer (80-132)
client/server/server.go (1)
client/status/status.go (1)
  • ToProtoFullStatus (549-635)
management/internals/server/controllers.go (3)
management/internals/server/server.go (1)
  • BaseServer (45-68)
management/server/job/manager.go (2)
  • Manager (23-30)
  • NewJobManager (32-42)
management/internals/server/container.go (1)
  • Create (6-10)
management/server/http/handlers/peers/peers_handler.go (4)
management/server/context/auth.go (1)
  • GetUserAuthFromContext (25-30)
shared/management/http/util/util.go (3)
  • WriteError (84-120)
  • WriteErrorResponse (70-80)
  • WriteJSONObject (27-35)
shared/management/http/api/types.gen.go (3)
  • JobRequest (708-710)
  • JobResponse (713-721)
  • JobResponseStatus (724-724)
management/server/types/job.go (3)
  • NewJob (67-103)
  • Job (34-58)
  • Workload (60-64)
client/internal/engine_test.go (3)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/server/account.go (1)
  • BuildManager (178-266)
🪛 GitHub Actions: Linux
client/internal/engine.go

[error] 1312-1312: zone.GetSearchDomainDisabled undefined (type *"github.com/netbirdio/netbird/shared/management/proto".CustomZone has no field or method GetSearchDomainDisabled)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Android / Build
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: release_ui
  • GitHub Check: JS / Lint
  • GitHub Check: iOS / Build
  • GitHub Check: Linux
  • GitHub Check: Darwin
  • GitHub Check: Windows
  • GitHub Check: Client / Unit
  • GitHub Check: Check External GPL/AGPL Licenses
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
🔇 Additional comments (21)
shared/management/client/client_test.go (2)

27-27: LGTM!

The import addition is necessary to support JobManager instantiation in the test setup.


123-123: LGTM!

The BuildManager call correctly includes the jobManager parameter in the expected position (5th parameter).

management/internals/server/controllers.go (1)

29-33: LGTM!

The JobManager() accessor correctly follows the established lazy initialization pattern using the Create helper, consistent with other accessors like PeersUpdateManager() and IntegratedValidator().

shared/management/proto/management.proto (2)

51-53: LGTM!

The bidirectional streaming RPC correctly follows the encrypted message pattern used by other management service endpoints, maintaining consistency in the API design.


75-89: Potential inconsistency between proto and HTTP API status values.

The JobStatus enum here defines unknown_status, succeeded, and failed, but the HTTP API types (JobResponseStatus in types.gen.go) include an additional pending status. If jobs can be in a pending state before completion, consider adding it to the proto enum for consistency:

 enum JobStatus {
   unknown_status = 0; //placeholder
-  succeeded = 1;
-  failed = 2;
+  pending = 1;
+  succeeded = 2;
+  failed = 3;
 }

Also, Reason on line 84 uses PascalCase while proto convention typically uses snake_case (reason).

shared/management/http/api/types.gen.go (1)

1-4: This is auto-generated code.

This file is generated by oapi-codegen. Any issues with the types or structure should be addressed in the source OpenAPI specification rather than this file.

management/internals/server/boot.go (1)

148-148: LGTM!

The JobManager is correctly wired into the gRPC server construction, following the established dependency injection pattern. The parameter position matches the NewServer function signature in management/internals/shared/grpc/server.go.

management/server/management_proto_test.go (2)

343-343: LGTM!

The JobManager instantiation with nil metrics is acceptable for test purposes—the implementation handles nil metrics gracefully.


372-373: LGTM!

The test correctly wires jobManager into both BuildManager and nbgrpc.NewServer, matching the updated function signatures and maintaining consistency with the production code in boot.go.

Also applies to: 386-386

client/server/server.go (2)

31-33: Status conversion now correctly uses shared helper

Switching to nbstatus.ToProtoFullStatus(fullStatus) centralizes the in‑memory→proto mapping and keeps this server aligned with the status package; the subsequent mutation of Events and SshServerState still fits the previous behaviour.

Also applies to: 1080-1088


1537-1545: ConnectClient.Run wiring matches new signature

Passing s.logFile into s.connectClient.Run(runningChan, s.logFile) correctly adapts to the new Run signature and ensures the engine has access to the daemon’s log path without altering existing retry semantics.

shared/management/http/api/openapi.yml (3)

35-38: Jobs tag fits existing API categorisation

Adding the Jobs tag (marked x-experimental) matches the tagging style of other feature areas and clearly scopes the new endpoints.


41-156: Workload and Job schemas are consistent and extensible

The WorkloadType/Bundle* schemas plus WorkloadRequest/Response and JobRequest/JobResponse are structured cleanly:

  • Discriminator on type with bundle mapping is valid and forward‑extensible.
  • JobResponse required fields avoid nullable properties (completed_at, failed_reason) while guaranteeing id, status, triggered_by, and workload.
  • Bundle parameter/result fields match the described bundle use case.

No structural issues from an OpenAPI standpoint.


2353-2456: Peer Job endpoints integrate cleanly into peers API

/api/peers/{peerId}/jobs (GET/POST) and /api/peers/{peerId}/jobs/{jobId} (GET):

  • Use the new JobRequest/JobResponse schemas correctly.
  • Reuse BearerAuth and TokenAuth security consistent with other peer operations.
  • Return an array for list, 201 with a single JobResponse for create, and 200 for get.

The shape and placement of these endpoints look correct and consistent with the rest of the spec.

client/server/server_test.go (1)

23-24: JobManager wiring in management test server matches production setup

Creating a jobManager := job.NewJobManager(nil, store) and threading it into server.BuildManager(...) and nbgrpc.NewServer(...) keeps the test server aligned with the new job‑enabled construction path without altering existing behaviour of this test.

Also applies to: 298-332

client/internal/engine_test.go (1)

27-29: Engine tests correctly integrate JobManager into management server startup

Importing management/server/job, constructing jobManager := job.NewJobManager(nil, store), and passing it through to server.BuildManager(...) and nbgrpc.NewServer(...) keeps this test’s management server in sync with the new job‑aware construction path and should let engine tests run against the same wiring as production.

Also applies to: 1595-1636

management/server/http/handlers/peers/peers_handler.go (2)

31-41: Peer job HTTP endpoints and handlers look consistent with existing patterns

The new /peers/{peerId}/jobs and /peers/{peerId}/jobs/{jobId} routes plus CreateJob, ListJobs, and GetJob follow the existing handler conventions: user auth from context, mux.Vars for IDs, JSON decoding with proper 400 on parse failure, and error propagation via util.WriteError. Delegation to accountManager for authorization and data access is appropriate. I don’t see correctness or security issues in this flow.

Also applies to: 51-142


594-614: Job→API response mapping is correct and defensive

toSingleJobResponse builds api.JobResponse from types.Job cleanly, including building the workload via BuildWorkloadResponse, handling optional FailedReason, and preserving timestamps and status. This matches the shared API types and is safe as long as BuildWorkloadResponse returns a non-nil workload when err == nil, which is already checked.

management/server/mock_server/account_mock.go (1)

132-155: Job-related mock hooks are wired correctly

The new CreatePeerJobFunc, GetAllPeerJobsFunc, and GetPeerJobByIDFunc fields and their methods mirror the existing mock style (delegate if set, otherwise return codes.Unimplemented). This should integrate cleanly with the expanded account.Manager interface and make job behavior easily configurable in tests.

client/internal/engine.go (2)

205-212: Sync response persistence refactor with syncRespMux looks correct

The move from using syncMsgMux to a dedicated syncRespMux for persistSyncResponse/latestSyncResponse removes the race between sync handling and debug bundle reads:

  • handleSync now checks persistSyncResponse under RLock and conditionally writes latestSyncResponse under Lock.
  • SetSyncResponsePersistence and GetLatestSyncResponse both operate entirely under syncRespMux, with GetLatestSyncResponse cloning before returning.

The lock usage avoids deadlocks (no upgrade while holding RLock) and keeps sync message processing serialized via syncMsgMux as before.

Also applies to: 770-841, 1891-1929


958-993: Job event client wiring matches existing management/signal patterns

receiveJobEvents mirrors the existing receiveManagementEvents/receiveSignalEvents patterns: it uses a dedicated wait group, runs mgmClient.Job with the engine context, maps terminal errors to a global client restart (ErrResetConnection + clientCancel), and defaults job status to failed unless the handler succeeds. That’s a good, conservative default and makes job-stream failures consistent with other control-plane failures.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
shared/management/client/client_test.go (1)

76-135: Construct jobManager with real metrics instead of nil to match production wiring.

Right now the job manager is created with a nil metrics handle even though metrics is initialized later via telemetry.NewDefaultAppMetrics. To avoid potential nil dereferences inside the job manager and to keep the test server wiring closer to production, it’s better to build the job manager with the real metrics instance after it’s created. The rest of the wiring (NewController, BuildManager, nbgrpc.NewServer) matches the signatures shown in the referenced snippets. Based on relevant_code_snippets from management/server/job/manager.go, management/server/account.go, and management/internals/shared/grpc/server.go.

You could adjust as follows:

-	jobManager := job.NewJobManager(nil, store)
-	eventStore := &activity.InMemoryEventStore{}
+	eventStore := &activity.InMemoryEventStore{}
@@
-	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
-	require.NoError(t, err)
+	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
+	require.NoError(t, err)
+	jobManager := job.NewJobManager(metrics, store)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e53c01 and 80b6854.

⛔ Files ignored due to path filters (1)
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (1)
  • shared/management/client/client_test.go (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.

Applied to files:

  • shared/management/client/client_test.go
🧬 Code graph analysis (1)
shared/management/client/client_test.go (4)
management/server/job/manager.go (1)
  • NewJobManager (32-42)
management/internals/controllers/network_map/controller/controller.go (1)
  • NewController (75-111)
management/internals/modules/peers/ephemeral/manager/ephemeral.go (1)
  • NewEphemeralManager (53-61)
management/server/account.go (1)
  • BuildManager (178-266)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: release_ui
  • GitHub Check: release
  • GitHub Check: release_ui_darwin
  • GitHub Check: Build Cache
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
  • GitHub Check: Darwin
  • GitHub Check: Linux
  • GitHub Check: Windows
  • GitHub Check: iOS / Build
  • GitHub Check: Android / Build
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
🔇 Additional comments (1)
shared/management/client/client_test.go (1)

20-28: New imports for ephemeral manager and job manager look correct and are properly used.

The ephemeral_manager alias and job package are both referenced later in startManagement (for NewEphemeralManager and NewJobManager respectively), so the import changes are consistent and compile-time safe.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1d594b and 469e147.

📒 Files selected for processing (2)
  • management/server/job/manager.go (1 hunks)
  • management/server/management_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • management/server/management_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
management/server/job/manager.go (3)
management/server/job/channel.go (2)
  • Channel (18-21)
  • NewChannel (23-29)
management/server/store/store.go (1)
  • Store (50-211)
management/server/types/job.go (1)
  • Job (34-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Management / Integration (amd64, sqlite)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Integration (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Signal / Unit (386)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: Windows
  • GitHub Check: Darwin
  • GitHub Check: Linux
  • GitHub Check: Android / Build
  • GitHub Check: Client / Unit
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: release_ui
  • GitHub Check: JS / Lint
🔇 Additional comments (5)
management/server/job/manager.go (5)

17-21: LGTM!

The Event struct is a clean container for tracking job request/response pairs.


23-30: LGTM!

The Manager struct is properly designed with appropriate fields for concurrent job management.


32-42: LGTM!

The constructor properly initializes all Manager fields with sensible defaults.


153-159: LGTM!

The method correctly checks peer connection status with appropriate read lock usage.


161-171: LGTM!

The method correctly checks for pending jobs with appropriate read lock usage. The O(n) scan is acceptable for typical workloads.

Comment on lines 44 to 62
// CreateJobChannel creates or replaces a channel for a peer
func (jm *Manager) CreateJobChannel(ctx context.Context, accountID, peerID string) *Channel {
// all pending jobs stored in db for this peer should be failed
if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Pending job cleanup: marked as failed automatically due to being stuck too long"); err != nil {
log.WithContext(ctx).Error(err.Error())
}

jm.mu.Lock()
defer jm.mu.Unlock()

if ch, ok := jm.jobChannels[peerID]; ok {
ch.Close()
delete(jm.jobChannels, peerID)
}

ch := NewChannel()
jm.jobChannels[peerID] = ch
return ch
}

⚠️ Potential issue | 🔴 Critical

Race condition: Store operation before lock acquisition.

Lines 46-49 call MarkPendingJobsAsFailed before acquiring the lock at line 51. This creates a race with other methods (e.g., SendJob, HandleResponse) that may be modifying the pending map or the peer's state concurrently. The store operation and in-memory cleanup should be atomic.

Consider acquiring the lock before the store operation, or ensure the store call and subsequent in-memory cleanup are properly coordinated:

 func (jm *Manager) CreateJobChannel(ctx context.Context, accountID, peerID string) *Channel {
+	jm.mu.Lock()
+	defer jm.mu.Unlock()
+
 	// all pending jobs stored in db for this peer should be failed
 	if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Pending job cleanup: marked as failed automatically due to being stuck too long"); err != nil {
 		log.WithContext(ctx).Error(err.Error())
 	}
 
-	jm.mu.Lock()
-	defer jm.mu.Unlock()
-
 	if ch, ok := jm.jobChannels[peerID]; ok {
 		ch.Close()
 		delete(jm.jobChannels, peerID)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// CreateJobChannel creates or replaces a channel for a peer
func (jm *Manager) CreateJobChannel(ctx context.Context, accountID, peerID string) *Channel {
// all pending jobs stored in db for this peer should be failed
if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Pending job cleanup: marked as failed automatically due to being stuck too long"); err != nil {
log.WithContext(ctx).Error(err.Error())
}
jm.mu.Lock()
defer jm.mu.Unlock()
if ch, ok := jm.jobChannels[peerID]; ok {
ch.Close()
delete(jm.jobChannels, peerID)
}
ch := NewChannel()
jm.jobChannels[peerID] = ch
return ch
}
// CreateJobChannel creates or replaces a channel for a peer
func (jm *Manager) CreateJobChannel(ctx context.Context, accountID, peerID string) *Channel {
jm.mu.Lock()
defer jm.mu.Unlock()
// all pending jobs stored in db for this peer should be failed
if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Pending job cleanup: marked as failed automatically due to being stuck too long"); err != nil {
log.WithContext(ctx).Error(err.Error())
}
if ch, ok := jm.jobChannels[peerID]; ok {
ch.Close()
delete(jm.jobChannels, peerID)
}
ch := NewChannel()
jm.jobChannels[peerID] = ch
return ch
}
🤖 Prompt for AI Agents
management/server/job/manager.go lines 44-62: the call to
jm.Store.MarkPendingJobsAsFailed runs before jm.mu is acquired, causing a race
with in-memory maps; move the critical section so the mutex is locked before
coordinating the store update and in-memory cleanup: acquire jm.mu.Lock() at the
top of the function (or at least before any access/modification of
jm.jobChannels/pending state), defer jm.mu.Unlock(), then call
jm.Store.MarkPendingJobsAsFailed while holding the lock and proceed with
closing/deleting the old channel and creating the new one; if holding the lock
during the store call is unacceptable for latency, introduce a dedicated
finer-grained lock or perform an in-memory state flip under the lock and persist
the change afterwards in a safe, coordinated way.

Comment on lines +64 to +88
// SendJob sends a job to a peer and tracks it as pending
func (jm *Manager) SendJob(ctx context.Context, accountID, peerID string, req *proto.JobRequest) error {
jm.mu.RLock()
ch, ok := jm.jobChannels[peerID]
jm.mu.RUnlock()
if !ok {
return fmt.Errorf("peer %s has no channel", peerID)
}

event := &Event{
PeerID: peerID,
Request: req,
}

jm.mu.Lock()
jm.pending[string(req.ID)] = event
jm.mu.Unlock()

if err := ch.AddEvent(ctx, jm.responseWait, event); err != nil {
jm.cleanup(ctx, accountID, string(req.ID), err.Error())
return err
}

return nil
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find the exact location and check file size
find . -name "manager.go" -path "*/job/*" | head -5

Repository: netbirdio/netbird

Length of output: 95


🏁 Script executed:

#!/bin/bash
# Get the file to understand the context
wc -l management/server/job/manager.go

Repository: netbirdio/netbird

Length of output: 97


🏁 Script executed:

#!/bin/bash
# Read the SendJob method and surrounding code
head -150 management/server/job/manager.go | tail -100

Repository: netbirdio/netbird

Length of output: 2890


🏁 Script executed:

#!/bin/bash
# Read the full manager.go file to understand Channel type and AddEvent
cat -n management/server/job/manager.go

Repository: netbirdio/netbird

Length of output: 5836


🏁 Script executed:

#!/bin/bash
# Search for the Channel type definition and AddEvent method
fd "channel" --type f -e go | xargs grep -l "func.*AddEvent" | head -5

Repository: netbirdio/netbird

Length of output: 93


🏁 Script executed:

#!/bin/bash
# Search in common locations for Channel implementation
find . -name "*.go" -path "*/job/*" -exec grep -l "func.*AddEvent" {} \;

Repository: netbirdio/netbird

Length of output: 95


🏁 Script executed:

#!/bin/bash
# Read the Channel implementation
cat -n management/server/job/channel.go

Repository: netbirdio/netbird

Length of output: 1657


🏁 Script executed:

#!/bin/bash
# Check if there are any other places that might call CloseChannel or CreateJobChannel to understand call patterns
rg "CloseChannel|CreateJobChannel" --type go -B 2 -A 2

Repository: netbirdio/netbird

Length of output: 15401


🏁 Script executed:

#!/bin/bash
# Check what happens when sending on a closed channel in the current code
# to confirm it would panic
grep -n "jc.events <-" management/server/job/channel.go

Repository: netbirdio/netbird

Length of output: 89


🏁 Script executed:

#!/bin/bash
# Verify the actual call site in grpc server to understand the flow
grep -B 5 -A 15 "sendJobsLoop" management/internals/shared/grpc/server.go | head -40

Repository: netbirdio/netbird

Length of output: 1469


Race condition: Channel reference becomes stale after lock release.

After releasing the read lock at line 68, another goroutine can call CloseChannel (e.g., when a gRPC stream ends via sendJobsLoop's defer), which closes the channel and deletes it from the map. The ch reference obtained at line 67 then points to a closed channel. When AddEvent at line 82 attempts to send on the closed channel (line 38 of channel.go: jc.events <- event), it will panic with "send on closed channel".

This can occur in production when:

  1. CreateJobChannel creates a channel for a peer
  2. SendJob acquires a reference to the channel and releases the read lock
  3. The gRPC stream terminates and CloseChannel is called via the defer in sendJobsLoop
  4. SendJob resumes and calls AddEvent on the now-closed channel, causing a panic

Hold the lock across the entire operation or verify the channel still exists after acquiring the write lock:

 func (jm *Manager) SendJob(ctx context.Context, accountID, peerID string, req *proto.JobRequest) error {
-	jm.mu.RLock()
+	jm.mu.Lock()
+	defer jm.mu.Unlock()
+
 	ch, ok := jm.jobChannels[peerID]
-	jm.mu.RUnlock()
 	if !ok {
 		return fmt.Errorf("peer %s has no channel", peerID)
 	}
 
 	event := &Event{
 		PeerID:  peerID,
 		Request: req,
 	}
-
-	jm.mu.Lock()
 	jm.pending[string(req.ID)] = event
-	jm.mu.Unlock()
 
 	if err := ch.AddEvent(ctx, jm.responseWait, event); err != nil {
 		jm.cleanup(ctx, accountID, string(req.ID), err.Error())
 		return err
 	}
 
 	return nil
 }

Note: This assumes ch.AddEvent can be called safely while holding the Manager's lock. Since AddEvent uses a 5-minute timeout, a more defensive approach would be to re-check the channel exists in the map after acquiring the write lock, then release the lock before calling AddEvent.

Committable suggestion skipped: line range outside the PR's diff.
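A rough sketch of that more defensive variant follows, using simplified stand-in types rather than the PR's real Manager, Channel, and Event. It re-checks the channel and registers the pending entry atomically under the write lock, then releases the lock before the potentially long AddEvent call; AddEvent still needs to tolerate a concurrently closed channel.

```go
package jobsketch

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Simplified stand-ins for the types discussed above.
type Event struct{ ID string }

type Channel struct{ events chan *Event }

// AddEvent blocks until the event is accepted, the wait expires, or the
// context is cancelled (mirroring the behavior described for channel.go).
func (c *Channel) AddEvent(ctx context.Context, wait time.Duration, ev *Event) error {
	select {
	case c.events <- ev:
		return nil
	case <-time.After(wait):
		return fmt.Errorf("add event: timed out after %s", wait)
	case <-ctx.Done():
		return ctx.Err()
	}
}

type Manager struct {
	mu           sync.RWMutex
	jobChannels  map[string]*Channel
	pending      map[string]*Event
	responseWait time.Duration
}

// SendJob re-checks the channel and registers the pending entry under the
// write lock, then releases it before the blocking AddEvent call.
func (jm *Manager) SendJob(ctx context.Context, peerID string, ev *Event) error {
	jm.mu.Lock()
	ch, ok := jm.jobChannels[peerID]
	if !ok {
		jm.mu.Unlock()
		return fmt.Errorf("peer %s has no channel", peerID)
	}
	jm.pending[ev.ID] = ev
	jm.mu.Unlock()

	return ch.AddEvent(ctx, jm.responseWait, ev)
}
```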

Comment on lines 129 to 137
for jobID, ev := range jm.pending {
if ev.PeerID == peerID {
// if the client disconnect and there is pending job then marke it as failed
if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Time out peer disconnected"); err != nil {
log.WithContext(ctx).Errorf("failed to mark pending jobs as failed: %v", err)
}
delete(jm.pending, jobID)
}
}

⚠️ Potential issue | 🟠 Major

Redundant store calls: MarkPendingJobsAsFailed called in loop.

Line 132 calls MarkPendingJobsAsFailed for each pending job of the peer. This method likely marks ALL pending jobs for the peer in a single call, making the repeated calls redundant and potentially causing performance issues or unintended side effects.

Call the method once, outside the loop:

+	// Mark all pending jobs for this peer as failed in the store
+	if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Time out peer disconnected"); err != nil {
+		log.WithContext(ctx).Errorf("failed to mark pending jobs as failed: %v", err)
+	}
+
+	// Clean up in-memory pending entries
 	for jobID, ev := range jm.pending {
 		if ev.PeerID == peerID {
-			// if the client disconnect and there is pending job then marke it as failed
-			if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Time out peer disconnected"); err != nil {
-				log.WithContext(ctx).Errorf("failed to mark pending jobs as failed: %v", err)
-			}
 			delete(jm.pending, jobID)
 		}
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for jobID, ev := range jm.pending {
if ev.PeerID == peerID {
// if the client disconnect and there is pending job then marke it as failed
if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Time out peer disconnected"); err != nil {
log.WithContext(ctx).Errorf("failed to mark pending jobs as failed: %v", err)
}
delete(jm.pending, jobID)
}
}
// Mark all pending jobs for this peer as failed in the store
if err := jm.Store.MarkPendingJobsAsFailed(ctx, accountID, peerID, "Time out peer disconnected"); err != nil {
log.WithContext(ctx).Errorf("failed to mark pending jobs as failed: %v", err)
}
// Clean up in-memory pending entries
for jobID, ev := range jm.pending {
if ev.PeerID == peerID {
delete(jm.pending, jobID)
}
}
🤖 Prompt for AI Agents
In management/server/job/manager.go around lines 129 to 137, the code calls
jm.Store.MarkPendingJobsAsFailed inside a loop for every pending job of the same
peer which is redundant and inefficient; move the single call to
MarkPendingJobsAsFailed outside the loop so it runs once per peer (before or
after iterating to delete matching entries), and only delete jm.pending entries
in the loop — ensure the error handling remains and log the error if the single
store call fails.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
management/server/types/job.go (1)

120-138: Potential JSON shape mismatch between stored proto.BundleResult and api.BundleResult

ApplyResponse stores resp.WorkloadResults.(*proto.JobResponse_Bundle).Bundle by JSON‑marshalling the proto struct, while buildBundleResponse later unmarshals that blob into api.BundleResult. If proto.BundleResult.UploadKey is still a wrapped type (e.g. a protobuf String message) rather than a plain string, the JSON will be { "upload_key": { "value": "..." } }, which will fail to unmarshal into api.BundleResult’s UploadKey *string. Consider mapping the proto result into an api.BundleResult and marshalling that instead, or updating the proto field to a plain string, so the stored JSON shape matches the HTTP type.

To confirm the field type and JSON shape, please run:

#!/bin/bash
# Inspect BundleResult in the proto package and compare to HTTP BundleResult
rg -n "message BundleResult" shared/management/proto -n -C3 || true
rg -n "type BundleResult struct" shared/management/http/api -n -C5 || true

Also applies to: 164-201
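If the mapping approach is taken, it could look roughly like the sketch below. The struct shapes are placeholders, since the exact proto and HTTP types are what the comment asks to verify; only the idea of converting before persisting is the point.

```go
package jobsketch

import "encoding/json"

// Placeholder shapes; the real types are proto.BundleResult (possibly with a
// wrapped UploadKey) and api.BundleResult (plain *string).
type wrappedString struct {
	Value string `json:"value"`
}

type protoBundleResult struct {
	UploadKey *wrappedString
}

type apiBundleResult struct {
	UploadKey *string `json:"upload_key,omitempty"`
}

// marshalForStorage maps the proto result onto the HTTP type before
// persisting, so the stored JSON always matches what buildBundleResponse
// expects to unmarshal later.
func marshalForStorage(p *protoBundleResult) ([]byte, error) {
	out := apiBundleResult{}
	if p != nil && p.UploadKey != nil {
		key := p.UploadKey.Value
		out.UploadKey = &key
	}
	return json.Marshal(&out)
}
```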

management/server/job/manager.go (1)

24-32: Race: SendJob can panic sending on a channel closed by CloseChannel/CreateJobChannel, and old streams can close new channels

SendJob grabs a *Channel under RLock, releases the mutex, then calls ch.AddEvent, while CloseChannel and CreateJobChannel can concurrently call ch.Close() on the same channel under Lock. This leaves a window where AddEvent writes into a channel that has just been closed, which will panic with “send on closed channel.” Additionally, because CloseChannel looks up the channel only by peerID, a previous Job stream’s deferred CloseChannel can close and delete a new channel created for the same peer by a later Job stream. You should either (a) ensure channels are not closed while AddEvent may be sending (e.g. by coordinating via the same mutex or a per‑channel state flag), or (b) make AddEvent robust to a concurrently closed channel (e.g. recovering from the panic and returning a typed ErrJobChannelClosed, which SendJob already routes into cleanup).

Also applies to: 34-45, 47-65, 68-91, 130-149, 151-162
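Option (b) could look roughly like the following; this is only a sketch with simplified types, not the PR's actual channel implementation. The send panic is recovered and converted into the typed error the caller already knows how to handle.

```go
package jobsketch

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var ErrJobChannelClosed = errors.New("job channel closed")

type Event struct{ ID string }

type Channel struct{ events chan *Event }

// AddEvent tolerates a channel that was closed concurrently: the
// "send on closed channel" panic is recovered and returned as a typed error
// that SendJob can route into its cleanup path instead of crashing.
func (c *Channel) AddEvent(ctx context.Context, wait time.Duration, ev *Event) (err error) {
	defer func() {
		if recover() != nil {
			err = ErrJobChannelClosed
		}
	}()

	select {
	case c.events <- ev:
		return nil
	case <-time.After(wait):
		return fmt.Errorf("add event: timed out after %s", wait)
	case <-ctx.Done():
		return ctx.Err()
	}
}
```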

🧹 Nitpick comments (2)
management/internals/shared/grpc/server.go (1)

446-449: Minor: comment still refers to “update message” instead of “job message”

The comment above sendJob mentions “update message” and “sync server”, which is copy‑pasted from the Sync path; updating it to refer to “job message”/“Job server” would avoid confusion.

management/server/store/sql_store.go (1)

109-118: Job persistence methods are scoped correctly; consider checking RowsAffected in CompletePeerJob

CreatePeerJob, MarkPendingJobsAsFailed, MarkAllPendingJobsAsFailed, GetPeerJobByID, and GetPeerJobs all use appropriate account/peer scoping and statuses, which should keep job rows well‑bounded. In CompletePeerJob, however, a missing job ID currently yields a successful return even if no rows were updated; consider checking result.RowsAffected and returning a NotFound‑style error when it is zero to avoid silently losing job completions.

Also applies to: 136-157, 159-193, 195-225

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 469e147 and b1cbddd.

📒 Files selected for processing (7)
  • management/internals/modules/peers/manager.go (2 hunks)
  • management/internals/server/controllers.go (3 hunks)
  • management/internals/shared/grpc/server.go (10 hunks)
  • management/server/job/manager.go (1 hunks)
  • management/server/store/sql_store.go (4 hunks)
  • management/server/store/store.go (1 hunks)
  • management/server/types/job.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
management/internals/modules/peers/manager.go (1)
management/server/store/store.go (1)
  • LockingStrengthNone (47-47)
management/server/types/job.go (1)
shared/management/http/api/types.gen.go (7)
  • JobRequest (708-710)
  • WorkloadResponse (1948-1950)
  • BundleParameters (361-373)
  • BundleResult (376-378)
  • BundleWorkloadResponse (391-399)
  • WorkloadTypeBundle (195-195)
  • JobResponse (713-721)
management/server/job/manager.go (3)
management/server/job/channel.go (2)
  • Channel (18-21)
  • NewChannel (23-29)
management/server/store/store.go (1)
  • Store (50-213)
management/internals/modules/peers/manager.go (1)
  • Manager (23-33)
management/server/store/store.go (1)
management/server/types/job.go (1)
  • Job (34-58)
management/server/store/sql_store.go (2)
management/server/types/job.go (3)
  • Job (34-58)
  • JobStatusPending (18-18)
  • JobStatusFailed (20-20)
management/server/store/store.go (2)
  • LockingStrength (40-40)
  • LockingStrengthNone (47-47)
🔇 Additional comments (6)
management/server/types/job.go (1)

66-103: Job creation and bundle parameter validation look consistent

NewJob and validateAndBuildBundleParams correctly derive the workload type, enforce tight bounds on BundleForTime and LogFileCount, and persist a normalized JSON parameters blob; this should interoperate cleanly with buildStreamBundleResponse and the proto BundleParameters.

Also applies to: 140-162
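As an illustration of the kind of bound checks meant here, a minimal sketch is shown below; the limits are made up for the example and are not the values used in the PR.

```go
package jobsketch

import "fmt"

// Illustrative bundle parameters; field names follow the discussion above.
type bundleParams struct {
	BundleForTime int // minutes of extra collection before bundling
	LogFileCount  int
}

// validateBundleParams rejects out-of-range values before the job is
// persisted, keeping the stored parameters blob normalized.
func validateBundleParams(p bundleParams) error {
	if p.BundleForTime < 0 || p.BundleForTime > 60 {
		return fmt.Errorf("bundle_for_time must be within [0, 60] minutes, got %d", p.BundleForTime)
	}
	if p.LogFileCount < 1 || p.LogFileCount > 10 {
		return fmt.Errorf("log_file_count must be within [1, 10], got %d", p.LogFileCount)
	}
	return nil
}
```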

management/internals/server/controllers.go (1)

29-33: JobManager wiring via BaseServer accessor looks correct

The JobManager() accessor follows the existing Create pattern and passes the expected dependencies (Metrics, Store, PeersManager) into job.NewJobManager, so the wiring is coherent with the rest of the server.

management/internals/modules/peers/manager.go (1)

23-33: GetPeerID helper cleanly delegates to store

Extending Manager with GetPeerID and delegating to store.GetPeerIDByKey with LockingStrengthNone is straightforward and keeps peer‑ID resolution centralized in the store.

Also applies to: 165-167

management/server/job/manager.go (1)

93-128: HandleResponse flow (pending lookup, peer validation, DB completion) looks sound

Within the manager lock, HandleResponse now looks up the pending event by jobID, resolves the responding peer ID via peersManager.GetPeerID, enforces equality with event.PeerID, applies the proto response onto a types.Job, persists with CompletePeerJob, and only then deletes the pending entry; this fixes earlier concerns about peer spoofing and premature pending cleanup.
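The ordering described above, reduced to a self-contained sketch: the store and peers-manager calls are represented as function fields rather than the real interfaces, and field names are simplified.

```go
package jobsketch

import (
	"fmt"
	"sync"
)

// responseHandler captures the HandleResponse ordering: resolve and verify
// the peer, persist the completion, and only then drop the pending entry.
type responseHandler struct {
	mu          sync.Mutex
	pending     map[string]string                    // jobID -> peer ID that was sent the job
	resolvePeer func(peerKey string) (string, error) // stand-in for peersManager.GetPeerID
	complete    func(jobID, result string) error     // stand-in for store.CompletePeerJob
}

func (h *responseHandler) HandleResponse(peerKey, jobID, result string) error {
	h.mu.Lock()
	defer h.mu.Unlock()

	expectedPeer, ok := h.pending[jobID]
	if !ok {
		return fmt.Errorf("no pending job %s", jobID)
	}

	peerID, err := h.resolvePeer(peerKey)
	if err != nil {
		return err
	}
	if peerID != expectedPeer {
		return fmt.Errorf("job %s was not issued to peer %s", jobID, peerID)
	}

	if err := h.complete(jobID, result); err != nil {
		return err // keep the pending entry if persistence fails
	}
	delete(h.pending, jobID) // only after successful persistence
	return nil
}
```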

management/internals/shared/grpc/server.go (1)

79-132: Job streaming wiring and handshake/response receiver are coherent

The Job handler’s handshake, account/peer lookup, response receiver startup, and per‑peer CreateJobChannel/sendJobsLoop integration follow the existing Sync patterns and correctly tie into job.Manager and the store; the response receiver also exits cleanly on EOF/context cancellation while validating and dispatching decrypted JobResponses.

Also applies to: 175-207, 324-337, 339-362, 364-386, 446-469

management/server/store/store.go (1)

40-48: Store interface extensions for jobs and GetPeerIDByKey look appropriate

Adding job lifecycle methods and GetPeerIDByKey to Store matches the SqlStore implementation and the new job/peers flows; just ensure any non‑SQL Store implementations are updated accordingly to avoid build breakage.

If you have other Store implementations (e.g. a FileStore), please confirm they now implement these methods as well.

Also applies to: 206-213

Comment on lines +364 to +386
func (s *Server) sendJobsLoop(ctx context.Context, accountID string, peerKey wgtypes.Key, peer *nbpeer.Peer, updates *job.Channel, srv proto.ManagementService_JobServer) error {
// todo figure out better error handling strategy
defer s.jobManager.CloseChannel(ctx, accountID, peer.ID)

for {
event, err := updates.Event(ctx)
if err != nil {
if errors.Is(err, job.ErrJobChannelClosed) {
log.WithContext(ctx).Debugf("jobs channel for peer %s was closed", peerKey.String())
return nil
}

// happens when connection drops, e.g. client disconnects
log.WithContext(ctx).Debugf("stream of peer %s has been closed", peerKey.String())
return ctx.Err()
}

if err := s.sendJob(ctx, peerKey, event, srv); err != nil {
log.WithContext(ctx).Warnf("send job failed: %v", err)
return nil
}
}
}

⚠️ Potential issue | 🟠 Major

Job send failures are swallowed, returning OK status to the client

In sendJobsLoop, if sendJob returns a non‑nil error (including internal encryption or Send failures), the error is only logged and the loop returns nil, so the gRPC Job call completes with an OK status even though the job delivery failed. Consider propagating the error (or mapping it to a suitable gRPC status) so callers and observability can distinguish successful from failed job streams.

Also applies to: 446-469
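One minimal way to surface the failure, assuming the gRPC status/codes packages already used elsewhere in the server, is to map it to a status error and return that from the loop instead of nil:

```go
package grpcsketch

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// mapSendJobErr converts a job-delivery failure into a gRPC status error so
// sendJobsLoop can return it instead of nil and the Job RPC does not finish
// with OK after a failed delivery.
func mapSendJobErr(peerKey string, err error) error {
	if err == nil {
		return nil
	}
	return status.Errorf(codes.Unavailable, "failed to deliver job to peer %s: %v", peerKey, err)
}
```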

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
management/server/store/sql_store.go (1)

146-157: Address past review feedback: Verify peer ownership in CompletePeerJob.

As noted in past reviews, this method should verify that the job being completed actually belongs to the peer making the request. Currently, it only checks the job ID without validating the peer_id, which could allow a peer to complete another peer's job.

Apply this diff to add peer verification:

 func (s *SqlStore) CompletePeerJob(ctx context.Context, job *types.Job) error {
 	result := s.db.
 		Model(&types.Job{}).
-		Where(idQueryCondition, job.ID).
+		Where(accountAndIDQueryCondition+" AND id = ?", job.AccountID, job.PeerID, job.ID).
 		Updates(job)

 	if result.Error != nil {
 		log.WithContext(ctx).Errorf("failed to update job in store: %s", result.Error)
 		return status.Errorf(status.Internal, "failed to update job in store")
 	}
+
+	if result.RowsAffected == 0 {
+		return status.Errorf(status.NotFound, "job %s not found or does not belong to peer %s", job.ID, job.PeerID)
+	}
+
 	return nil
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1cbddd and 26eaac0.

📒 Files selected for processing (1)
  • management/server/store/sql_store.go (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
management/server/store/sql_store.go (2)
management/server/types/job.go (3)
  • Job (34-58)
  • JobStatusPending (18-18)
  • JobStatusFailed (20-20)
management/server/store/store.go (2)
  • LockingStrength (40-40)
  • LockingStrengthNone (47-47)
🔇 Additional comments (3)
management/server/store/sql_store.go (3)

43-52: LGTM: Constants follow existing patterns.

The new accountAndPeerIDQueryCondition constant is consistent with the existing query condition constants and will be used for job filtering operations.


117-117: LGTM: Job entity added to auto-migration.

The addition of &types.Job{} to the AutoMigrate call ensures the jobs table schema is properly created and maintained.


4221-4239: LGTM: GetPeerIDByKey correctly uses GetKeyQueryCondition helper.

The method properly uses GetKeyQueryCondition(s) at line 4230 to handle MySQL's reserved key identifier, addressing the concern raised in past reviews about MySQL compatibility.
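For context, the dialect handling amounts to quoting the reserved column name only for MySQL; a toy illustration follows (the real helper is GetKeyQueryCondition in the store, not this function).

```go
package storesketch

// keyQueryCondition quotes the "key" column for MySQL, where KEY is a
// reserved word, and leaves it bare for other dialects.
func keyQueryCondition(dialect string) string {
	if dialect == "mysql" {
		return "`key` = ?"
	}
	return "key = ?"
}
```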

Comment on lines +195 to +209
// GetJobByID fetches job by ID
func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, jobID string) (*types.Job, error) {
var job types.Job
err := s.db.
Where(accountAndIDQueryCondition, accountID, jobID).
First(&job).Error
if errors.Is(err, gorm.ErrRecordNotFound) {
return nil, status.Errorf(status.NotFound, "job %s not found", jobID)
}
if err != nil {
log.WithContext(ctx).Errorf("failed to fetch job from store: %s", err)
return nil, err
}
return &job, nil
}

⚠️ Potential issue | 🟠 Major

Address past review feedback: Add peer ownership verification to GetPeerJobByID.

As noted in past reviews, this method should verify that the job belongs to a specific peer when called from a peer context. Currently, it only checks accountID and jobID, which could allow any authenticated entity in the account to view other peers' jobs.

Consider either:

  1. Adding a peerID parameter and verifying ownership, OR
  2. Renaming this method to indicate it's for admin/management access only (e.g., GetJobByIDForAccount)

If peer verification is required, apply this diff:

-func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, jobID string) (*types.Job, error) {
+func (s *SqlStore) GetPeerJobByID(ctx context.Context, accountID, peerID, jobID string) (*types.Job, error) {
 	var job types.Job
 	err := s.db.
-		Where(accountAndIDQueryCondition, accountID, jobID).
+		Where(accountAndPeerIDQueryCondition+" AND id = ?", accountID, peerID, jobID).
 		First(&job).Error
 	if errors.Is(err, gorm.ErrRecordNotFound) {
 		return nil, status.Errorf(status.NotFound, "job %s not found", jobID)
 	}

Committable suggestion skipped: line range outside the PR's diff.

return "", fmt.Errorf("upload debug bundle: %w", err)
}

log.Infof("debug bundle has been generated well")
Collaborator

Suggested change
log.Infof("debug bundle has been generated well")
log.Infof("debug bundle has been generated successfully")

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
client/jobexec/executor.go (1)

63-63: Fix grammatical error in log message.

The message "generated well" is grammatically incorrect and misleading since at this point both generation and upload have succeeded.

-	log.Infof("debug bundle has been generated well")
+	log.Infof("debug bundle has been uploaded successfully")
🧹 Nitpick comments (1)
client/jobexec/executor.go (1)

51-61: Consider retry-friendly cleanup strategy on upload failure.

Currently, the deferred os.Remove deletes the bundle file regardless of upload outcome. If the upload fails, the generated bundle is lost, requiring full regeneration for any retry attempt.

Consider keeping the file on upload failure to enable retry, or implementing retry logic within this method before cleanup.

-	defer func() {
-		if err := os.Remove(path); err != nil {
-			log.Errorf("failed to remove debug bundle file: %v", err)
-		}
-	}()
-
 	key, err := debug.UploadDebugBundle(ctx, types.DefaultBundleURL, mgmURL, path)
 	if err != nil {
 		log.Errorf("failed to upload debug bundle: %v", err)
 		return "", fmt.Errorf("upload debug bundle: %w", err)
 	}
+
+	if err := os.Remove(path); err != nil {
+		log.Errorf("failed to remove debug bundle file: %v", err)
+	}

Alternatively, implement retry logic before cleanup if transient upload failures are expected.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26eaac0 and c71c0b6.

📒 Files selected for processing (1)
  • client/jobexec/executor.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-13T00:29:53.247Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.247Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.

Applied to files:

  • client/jobexec/executor.go
📚 Learning: 2025-11-14T13:05:31.729Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.729Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.

Applied to files:

  • client/jobexec/executor.go
🧬 Code graph analysis (1)
client/jobexec/executor.go (3)
client/internal/debug/debug.go (3)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
  • NewBundleGenerator (245-264)
client/internal/debug/upload.go (1)
  • UploadDebugBundle (17-28)
upload-server/types/upload.go (1)
  • DefaultBundleURL (11-11)
🔇 Additional comments (5)
client/jobexec/executor.go (5)

1-14: LGTM!

Package declaration and imports are correctly structured. All imports are utilized within the file.


16-18: LGTM!

The 60-minute cap on bundle wait time is a reasonable safeguard against excessively long waits.


24-29: LGTM!

Empty struct with constructor is an acceptable pattern for future extensibility.


67-76: LGTM!

The waitFor implementation correctly handles context cancellation and returns the appropriate error, addressing prior review feedback. The logging provides good observability.
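A context-aware wait of this kind can be as small as the following sketch (not the PR's exact implementation): sleep for the requested duration but return early with ctx.Err() if the engine is shutting down.

```go
package jobexecsketch

import (
	"context"
	"time"
)

// waitFor sleeps for d but returns early with ctx.Err() if the context is
// cancelled, so a long bundle delay never blocks shutdown.
func waitFor(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()

	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}
```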


20-22: Remove this comment — ErrJobNotImplemented is actively used.

The error is referenced in client/internal/engine.go:979 where it's accessed via .Error() method. It's appropriately exported for use across packages.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/internal/engine.go (1)

235-255: Remove unused c *profilemanager.Config parameter in NewEngine

NewEngine's last parameter c *profilemanager.Config is never used in the function body. This is dead API surface: Go does not reject unused function parameters at compile time, but linters may flag it, and it adds unnecessary churn at every call site.

Consider dropping this parameter from the signature and corresponding call sites, since EngineConfig.ProfileConfig already carries the same information.

-func NewEngine(..., config *EngineConfig, mobileDep MobileDependency, statusRecorder *peer.Status, checks []*mgmProto.Checks, c *profilemanager.Config) *Engine {
+func NewEngine(..., config *EngineConfig, mobileDep MobileDependency, statusRecorder *peer.Status, checks []*mgmProto.Checks) *Engine {

…and update call sites accordingly.

♻️ Duplicate comments (4)
management/server/peer.go (3)

348-358: Potential nil-pointer on am.jobManager.

The code calls am.jobManager.IsPeerConnected(peerID) and am.jobManager.IsPeerHasPendingJobs(peerID) without checking if am.jobManager is non-nil. If the JobManager is ever nil (e.g., in certain test configurations or misconfigured environments), this will panic.

Consider adding a defensive check at the start of the function:

+	if am.jobManager == nil {
+		return status.Errorf(status.Internal, "job manager is not configured")
+	}
+
 	if !am.jobManager.IsPeerConnected(peerID) {

Additionally, the TODO comment on lines 353-355 correctly identifies a race condition where concurrent calls could overwrite pending job entries. This should be addressed in a follow-up to ensure job queue integrity.

Based on past review comments, this issue was previously flagged but remains unaddressed.


365-398: Send-before-persist ordering can cause inconsistency between runtime and persistent state.

The current implementation sends the job to the peer (line 366) before persisting it to the database (line 379). If the database write fails after the job has been dispatched, the job will execute on the client but will be absent from persistent storage and from /jobs API listings.

This ordering can lead to:

  • Jobs executing on peers with no database record
  • Inability to track or query these jobs via the API
  • Potential audit and compliance issues

Consider reversing the order:

  1. Persist the job first with status = pending
  2. Send the job to the peer
  3. If sending fails, update the job status to failed with an appropriate FailedReason

This ensures the database remains the source of truth and runtime state stays consistent.

Based on past review comments, this issue was previously flagged but remains unaddressed.
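A sketch of the persist-first ordering, with the store and send calls reduced to placeholders (CreatePending and MarkFailed stand in for the real CreatePeerJob call and failure update; they are not the PR's actual method names):

```go
package jobsketch

import (
	"context"
	"fmt"
)

// jobStore is a placeholder for the persistence calls discussed above.
type jobStore interface {
	CreatePending(ctx context.Context, jobID string) error
	MarkFailed(ctx context.Context, jobID, reason string) error
}

// dispatchJob persists the job before sending it, so the database stays the
// source of truth; a delivery failure is recorded instead of leaving a job
// running on the peer with no stored record.
func dispatchJob(ctx context.Context, store jobStore, jobID string,
	send func(ctx context.Context, jobID string) error) error {

	if err := store.CreatePending(ctx, jobID); err != nil {
		return err
	}
	if err := send(ctx, jobID); err != nil {
		if mErr := store.MarkFailed(ctx, jobID, err.Error()); mErr != nil {
			return fmt.Errorf("send failed (%v) and status update failed: %w", err, mErr)
		}
		return err
	}
	return nil
}
```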


427-451: Missing job ownership validation - authorization bypass.

After fetching the job by jobID (line 445), the function does not verify that job.PeerID matches the requested peerID. This allows a caller to access any job within the account by manipulating the URL path:

Example: GET /peers/{peerA}/jobs/{jobOfPeerB} will return jobOfPeerB even though the path specifies peerA.

Add an ownership check after retrieving the job:

 	job, err := am.Store.GetPeerJobByID(ctx, accountID, jobID)
 	if err != nil {
 		return nil, err
 	}
+
+	if job.PeerID != peerID {
+		return nil, status.NewPeerNotPartOfAccountError()
+	}

 	return job, nil

Based on past review comments, this issue was previously flagged but remains unaddressed.

client/internal/engine.go (1)

987-1010: Guard against nil ProfileConfig and ManagementURL in handleBundle

handleBundle dereferences e.config.ProfileConfig and e.config.ProfileConfig.ManagementURL without checking for nil:

  • InternalConfig: e.config.ProfileConfig
  • e.config.ProfileConfig.ManagementURL.String()

If ProfileConfig or its ManagementURL is nil (e.g. in tests or alternative EngineConfig construction), this will panic during remote debug bundle generation. A previous review already flagged this; it’s still present.

 func (e *Engine) handleBundle(params *mgmProto.BundleParameters) (*mgmProto.JobResponse_Bundle, error) {
 	log.Infof("handle remote debug bundle request: %s", params.String())
@@
-	bundleDeps := debug.GeneratorDependencies{
-		InternalConfig: e.config.ProfileConfig,
+	if e.config.ProfileConfig == nil || e.config.ProfileConfig.ManagementURL == nil {
+		return nil, fmt.Errorf("profile config or management URL not available for bundle generation")
+	}
+
+	bundleDeps := debug.GeneratorDependencies{
+		InternalConfig: e.config.ProfileConfig,
@@
-	waitFor := time.Duration(params.BundleForTime) * time.Minute
-
-	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, e.config.ProfileConfig.ManagementURL.String())
+	waitFor := time.Duration(params.BundleForTime) * time.Minute
+
+	managementURL := e.config.ProfileConfig.ManagementURL.String()
+	uploadKey, err := e.jobExecutor.BundleJob(e.ctx, bundleDeps, bundleJobParams, waitFor, managementURL)

This keeps the job API unchanged while ensuring the engine can’t panic on missing profile data.

🧹 Nitpick comments (3)
client/server/server.go (1)

31-33: Using nbstatus.ToProtoFullStatus simplifies and centralizes status conversion

Replacing the local toProtoFullStatus helper with nbstatus.ToProtoFullStatus(fullStatus) is a good consolidation. The returned pbFullStatus is still enriched locally with events and SSH server state before being attached to the response, so behavior remains consistent while reducing duplication.

Also applies to: 1078-1091

client/internal/connect.go (2)

64-67: Log path propagation through ConnectClient.Run and mobile variants is consistent

The new Run(runningChan, logPath) signature, plus forwarding of logPath into run(...), cleanly introduces log‑path awareness without disturbing existing control flow. Android/iOS wrappers deliberately pass an empty log path, which keeps their behavior unchanged while still benefiting from other refactors.

Also applies to: 70-88, 90-106, 108-108


251-256: EngineConfig enrichment with LogPath and ProfileConfig is correctly wired

createEngineConfig now:

  • Accepts logPath and assigns it to EngineConfig.LogPath.
  • Sets EngineConfig.ProfileConfig = config.

This, combined with the updated call createEngineConfig(myPrivateKey, c.config, peerConfig, logPath) and the use of these fields in the engine’s debug bundle path, ensures the engine has both the profile metadata and log location needed for remote debug bundles.

Aside from the unused extra parameter on NewEngine (flagged separately in engine.go), the data flow looks sound.

Also applies to: 417-456

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c71c0b6 and 3b60fc8.

📒 Files selected for processing (5)
  • client/internal/connect.go (7 hunks)
  • client/internal/engine.go (13 hunks)
  • client/server/server.go (4 hunks)
  • management/server/peer.go (2 hunks)
  • management/server/permissions/modules/module.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
client/internal/engine.go (3)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
client/jobexec/executor.go (3)
  • Executor (24-25)
  • NewExecutor (27-29)
  • ErrJobNotImplemented (21-21)
client/internal/debug/debug.go (2)
  • GeneratorDependencies (238-243)
  • BundleConfig (232-236)
client/server/server.go (1)
client/status/status.go (1)
  • ToProtoFullStatus (549-635)
client/internal/connect.go (3)
client/internal/mobile_dependency.go (1)
  • MobileDependency (13-25)
client/internal/engine.go (2)
  • NewEngine (235-276)
  • EngineConfig (83-142)
client/internal/profilemanager/config.go (1)
  • Config (89-160)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: release_ui_darwin
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: release
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
  • GitHub Check: Darwin
  • GitHub Check: Linux
  • GitHub Check: Windows
  • GitHub Check: iOS / Build
  • GitHub Check: Android / Build
🔇 Additional comments (6)
management/server/permissions/modules/module.go (1)

8-8: LGTM!

The new RemoteJobs module constant and its registration in the All map follow the established pattern correctly.

Also applies to: 25-25

management/server/peer.go (2)

34-34: LGTM!

The minimum version constant for remote jobs is clear and will enable proper feature gating.


400-425: Function looks good; note the TODO for dedicated permissions.

The permission checks and peer ownership validation are correctly implemented. The TODO comment on line 401 indicates planned work to create more granular job permissions, which is a reasonable future enhancement.

client/internal/engine.go (2)

205-222: Job executor lifecycle wiring looks correct

The addition of jobExecutor, its dedicated jobExecutorWG, the receiveJobEvents() call in Start(), and e.jobExecutorWG.Wait() in Stop() form a coherent lifecycle: job goroutine is tracked and properly drained on shutdown, without sharing shutdownWg with other engine goroutines. I don’t see deadlock or ordering issues given current usage.

Also applies to: 278-337, 524-527, 950-985


809-822: Sync response persistence refactor is thread‑safe

The move to a dedicated syncRespMux for persistSyncResponse and latestSyncResponse (with R/W separation and cloning in GetLatestSyncResponse) removes contention with syncMsgMux and avoids data races on the stored message. Lock ordering is one‑way (syncMsgMux → syncRespMux) and I don’t see any reverse order, so deadlock risk is low.

Also applies to: 1889-1925
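The pattern being described, shown in isolation with simplified field names (using the protobuf Clone helper; this is a sketch, not the engine's actual code):

```go
package enginesketch

import (
	"sync"

	"google.golang.org/protobuf/proto"
)

// syncResponseCache keeps the latest sync response behind its own RWMutex:
// writers take the write lock, readers take the read lock and return a clone
// so no caller ever shares the stored message.
type syncResponseCache struct {
	mu     sync.RWMutex
	latest proto.Message
}

func (c *syncResponseCache) persist(msg proto.Message) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.latest = msg
}

func (c *syncResponseCache) getLatest() proto.Message {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if c.latest == nil {
		return nil
	}
	return proto.Clone(c.latest)
}
```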

client/server/server.go (1)

1539-1547: Passing s.logFile into ConnectClient.Run matches new signature

Updating connect() to call s.connectClient.Run(runningChan, s.logFile) correctly propagates the daemon’s log file path into the client, aligning with the new Run signature and the debug bundle/log‑path plumbing in ConnectClient and EngineConfig.

@sonarqubecloud

sonarqubecloud bot commented Dec 12, 2025

Quality Gate passed

Issues
0 New issues
8 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b60fc8 and 9ce25b9.

📒 Files selected for processing (13)
  • client/cmd/testutil_test.go (4 hunks)
  • client/internal/engine_test.go (10 hunks)
  • client/server/server_test.go (4 hunks)
  • management/internals/modules/peers/manager_mock.go (1 hunks)
  • management/server/account_test.go (2 hunks)
  • management/server/dns_test.go (2 hunks)
  • management/server/http/testing/testing_tools/channel/channel.go (2 hunks)
  • management/server/management_proto_test.go (3 hunks)
  • management/server/management_test.go (4 hunks)
  • management/server/nameserver_test.go (2 hunks)
  • management/server/peer_test.go (5 hunks)
  • management/server/route_test.go (2 hunks)
  • shared/management/client/client_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • management/server/dns_test.go
  • management/server/management_proto_test.go
  • management/server/peer_test.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/account_test.go
  • client/cmd/testutil_test.go
  • management/server/route_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.

Applied to files:

  • shared/management/client/client_test.go
🧬 Code graph analysis (2)
management/server/management_test.go (3)
management/server/networks/routers/manager.go (1)
  • NewManager (40-46)
management/server/settings/manager.go (1)
  • NewManager (37-44)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
management/server/nameserver_test.go (3)
management/internals/modules/peers/manager.go (1)
  • NewManager (44-49)
management/server/account.go (1)
  • BuildManager (178-266)
management/server/job/manager.go (1)
  • NewJobManager (34-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Build Cache
  • GitHub Check: Linux
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: release_ui
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: Darwin
  • GitHub Check: iOS / Build
  • GitHub Check: Windows
  • GitHub Check: Android / Build
  • GitHub Check: Check External GPL/AGPL Licenses
🔇 Additional comments (10)
management/internals/modules/peers/manager_mock.go (1)

100-113: LGTM! Auto-generated mock looks correct.

The mock method and recorder for GetPeerID follow standard gomock patterns and are consistent with the existing mock methods in this file. The signatures and implementations are correct.

client/server/server_test.go (2)

310-310: LGTM! JobManager correctly instantiated for tests.

The JobManager is created with nil metrics, which is appropriate for test setup. The store and peersManager dependencies are properly provided.


323-323: No action needed—BuildManager parameter positions are correct.

The jobManager is correctly positioned as the 5th parameter in the BuildManager call at line 323, matching the function signature where it follows networkMapController. All parameter alignments are accurate.

shared/management/client/client_test.go (2)

96-96: LGTM! JobManager wiring integrated correctly.

The JobManager is properly instantiated and will be threaded through the test setup.


122-122: Ephemeral manager import alias improves readability.

Using the ephemeral_manager alias avoids conflicts and makes the code clearer.

management/server/nameserver_test.go (1)

794-801: LGTM! Clean inline JobManager creation.

Creating the JobManager inline within the BuildManager call is concise and appropriate for test code. The peersManager dependency is correctly instantiated on the preceding line.

client/internal/engine_test.go (3)

1602-1635: LGTM! Management server test wiring is consistent.

The JobManager is correctly created and passed through to BuildManager and nbgrpc.NewServer, following the same pattern established in other test files.


1536-1536: No changes needed—NewEngine signature is correct.

NewEngine returns only *Engine (not (*Engine, error)), making the assignment e, err := NewEngine(...), nil valid Go syntax. This is not unconventional; Go functions can return single values, and assigning nil to err here is a straightforward tuple assignment with no issues.


256-256: The two trailing nil parameters passed to NewEngine correspond to the checks and c parameters in the function signature (line 235 of client/internal/engine.go), which are []*mgmProto.Checks and *profilemanager.Config respectively. These parameters are already documented in the function signature itself through their names and type annotations, so no additional documentation is needed. The speculation about "jobManager and logPath" does not match the actual implementation.

Likely an incorrect or invalid review comment.

management/server/management_test.go (1)

206-244: LGTM! JobManager integration is clean and consistent.

The test correctly:

  • Extracts peersManager into a variable (line 206)
  • Creates jobManager with nil metrics (line 207)
  • Passes jobManager to BuildManager (line 219)
  • Passes jobManager to nbgrpc.NewServer (line 244)

This follows the established pattern across all test files.

return nil, "", err
}
mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, secretsManager, nil, nil, &server.MockIntegratedValidator{}, networkMapController)
mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &server.MockIntegratedValidator{}, networkMapController)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find nbgrpc package files
fd -e go | grep -i nbgrpc | head -20

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

# Search for NewServer function definition
ast-grep --pattern $'func NewServer($_$$_) $_$$_'

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

# Also try a broader search for NewServer
rg -t go "func NewServer" --max-count 5 -A 10

Repository: netbirdio/netbird

Length of output: 5342


🏁 Script executed:

# Read the test file to see the actual call
cat -n client/server/server_test.go | sed -n '320,345p'

Repository: netbirdio/netbird

Length of output: 1643


Fix nbgrpc.NewServer call: pass concrete managers instead of nil for appMetrics and authManager parameters.

The parameter count (9) and order are correct. However, the call passes nil for the 6th and 7th parameters (appMetrics and authManager), which expect telemetry.AppMetrics and auth.Manager types respectively. Pass appropriate instances or mocks instead of nil for these parameters.

🤖 Prompt for AI Agents
In client/server/server_test.go around line 332, the nbgrpc.NewServer call
currently passes nil for the 6th and 7th parameters (appMetrics and
authManager); replace those nils with concrete instances or test mocks
implementing telemetry.AppMetrics and auth.Manager (e.g., create or reuse a
MockAppMetrics and MockAuthManager in the test setup or use existing test
helpers), then pass those instances to nbgrpc.NewServer in the same parameter
positions so the call receives proper typed objects instead of nil.



5 participants