Description
What problem does this solve?
We operate a monorepo setup with many tiny projects that need to be scanned. While we have sufficient concurrent scan capacity (scans complete within minutes), we experience queue capacity spikes when enqueuing many scans simultaneously.
Current Situation:
- Our concurrent scan capacity can handle the workload efficiently
- Individual scans are fast and complete quickly
- However, the initial enqueuing of many small projects creates a spike that exceeds the maximum queue capacity
- When queue capacity is exceeded, scan creation fails immediately without retries
Impact:
- Every team using our shared Checkmarx One subscription must implement their own retry logic
- This creates duplicated effort across teams
- Tools that depend on the CLI (like the Azure DevOps plugin) cannot easily benefit from retry logic
- Inconsistent retry implementations across different teams and tools
Root Cause:
The CLI currently retries only on specific HTTP errors (502, 401), as seen in internal/wrappers/client.go:83-105, but it does not retry on queue capacity errors during scan creation.
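For illustration, here is a minimal sketch of the status-code-based retry pattern described above. All names are hypothetical and this is not the repository's actual client.go code; it only shows why a queue-capacity failure during scan creation never enters the existing retry loop, because the retry decision is keyed purely off the HTTP status.

```go
// Illustrative sketch of retrying only selected HTTP statuses (e.g., 502, 401).
// Names are hypothetical; this is not the repository's client.go code.
package clientsketch

import (
	"net/http"
	"time"
)

func doWithStatusRetry(client *http.Client, req *http.Request, attempts int, baseDelay time.Duration) (*http.Response, error) {
	retryable := map[int]bool{http.StatusBadGateway: true, http.StatusUnauthorized: true}
	var resp *http.Response
	var err error
	for attempt := 0; attempt < attempts; attempt++ {
		// Note: a real implementation must rebuild the request body
		// (e.g., via req.GetBody) before each retry.
		resp, err = client.Do(req)
		if err == nil && !retryable[resp.StatusCode] {
			return resp, nil // success or a non-retryable status: stop here
		}
		if resp != nil {
			resp.Body.Close()
		}
		time.Sleep(baseDelay * time.Duration(1<<attempt)) // exponential backoff
	}
	return resp, err
}
```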
Proposed Solution
Add configurable retry functionality specifically for scan enqueuing failures due to queue capacity limits.
Proposed Flags:
- --scan-enqueue-retries <count> - Number of retry attempts (default: 0 to maintain backward compatibility)
- --scan-enqueue-retry-delay <seconds> - Delay between retry attempts (default: a reasonable value like 5-10 seconds); a registration sketch follows
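A minimal sketch of how these flags might be registered, assuming the cobra-based command setup the CLI already uses; the function name and the surrounding wiring are assumptions, only the flag names come from the proposal above.

```go
// Sketch only: registering the proposed flags on the scan create command.
// The function and parameter names here are hypothetical.
package scancmd

import "github.com/spf13/cobra"

func addScanEnqueueRetryFlags(scanCreateCmd *cobra.Command) {
	// Default of 0 retries preserves current behavior (no retry on queue-capacity errors).
	scanCreateCmd.PersistentFlags().Int("scan-enqueue-retries", 0,
		"Number of retry attempts when scan enqueuing fails due to queue capacity")
	// Base delay in seconds between retry attempts; combined with exponential backoff below.
	scanCreateCmd.PersistentFlags().Int("scan-enqueue-retry-delay", 5,
		"Base delay in seconds between scan enqueue retry attempts")
}
```

Keeping the retry count at 0 by default means nothing changes for users who do not opt in, matching the backward-compatibility goal stated above.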
Implementation Approach:
- Extend the retry logic in internal/wrappers/scans-http.go:32 (ScansHTTPWrapper.Create()) (see the sketch after this list)
- Detect queue capacity errors in the error response from the Checkmarx One API
- Apply retry logic similar to the existing SCM rate limit handling in internal/wrappers/rate-limit.go
- Respect the new flags when retrying scan creation requests
- Log retry attempts for visibility (e.g., "Scan creation failed due to queue capacity, retrying (attempt 1/5)...")
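As a sketch of how the Create() call site could be wrapped, under these assumptions: isQueueFullError is a hypothetical helper that inspects the API error for a queue-capacity failure, and createScan stands in for the existing ScansHTTPWrapper.Create call. This is not the actual wrapper code.

```go
// Rough sketch: retry scan creation only when the failure is a queue-capacity error.
package scanretry

import (
	"fmt"
	"log"
	"time"
)

type Scan struct{ ID string } // stand-in for the real scan model

func createScanWithRetry(
	createScan func() (*Scan, error), // wraps the existing Create call
	isQueueFullError func(error) bool, // hypothetical queue-capacity detector
	retries int,
	baseDelay time.Duration,
) (*Scan, error) {
	var lastErr error
	for attempt := 0; attempt <= retries; attempt++ {
		scan, err := createScan()
		if err == nil {
			return scan, nil
		}
		if !isQueueFullError(err) {
			return nil, err // only queue-capacity errors are retried
		}
		lastErr = err
		if attempt < retries {
			delay := baseDelay * time.Duration(1<<attempt) // exponential backoff
			log.Printf("Scan creation failed due to queue capacity, retrying (attempt %d/%d) in %s...",
				attempt+1, retries, delay)
			time.Sleep(delay)
		}
	}
	return nil, fmt.Errorf("scan enqueue failed after %d retries: %w", retries, lastErr)
}
```

Because every non-queue-capacity error is returned immediately, existing failure behavior is unchanged.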
Retry Strategy:
Reuse the existing exponential backoff implementation from internal/wrappers/client.go:83-105, which calculates delay as baseDelayInMilliSec * (1 << attempt). This provides:
- Consistent behavior with existing retry logic
- Proven exponential backoff pattern already in the codebase
- Reduced queue pressure as delays increase progressively
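For concreteness, a worked example of that formula: with a 5-second base delay (an illustrative value, not a CLI default), five attempts wait 5s, 10s, 20s, 40s, and 80s respectively.

```go
// Worked example of baseDelayInMilliSec * (1 << attempt) with a 5s base delay.
package main

import (
	"fmt"
	"time"
)

func main() {
	baseDelayInMilliSec := 5000 // e.g., --scan-enqueue-retry-delay 5
	for attempt := 0; attempt < 5; attempt++ {
		delayMs := baseDelayInMilliSec * (1 << attempt)
		fmt.Printf("attempt %d: wait %v\n", attempt+1, time.Duration(delayMs)*time.Millisecond)
	}
}
```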
Benefits:
- Centralized retry logic that all teams can use
- Works automatically for Azure DevOps plugin and other tools built on the CLI
- Backward compatible (default of 0 retries maintains current behavior)
- Reduces burden on individual teams to implement retry logic
- Consistent behavior across all teams using the shared subscription
Importance Level
Critical
Additional Information
Technical Context:
- Existing retry logic: internal/wrappers/client.go:83-105 (retryHTTPRequest)
- Scan creation: internal/commands/scan.go:2302 (runCreateScanCommand)
- HTTP wrapper: internal/wrappers/scans-http.go:32 (Create method)
- Precedent: internal/wrappers/rate-limit.go demonstrates similar retry logic for SCM rate limits
Use Case Details:
- Multiple teams on shared subscription
- Monorepo with 100+ small projects
- Automated CI/CD pipelines triggering many scans concurrently
- The queue drains quickly, but the initial spike causes enqueuing failures
Related Functionality:
The CLI already has similar flags for other operations:
- --wait-delay for polling intervals
- --scan-timeout for scan timeouts
- The proposed flags follow this existing pattern
Alternatives Considered:
- Implementing retry logic in each team's CI/CD pipeline → Creates duplication
- Increasing queue capacity → Not feasible, as queue capacity is tied to concurrent scan and developer licenses (which are adequate for our workload)
This enhancement would make the CLI more robust for enterprise environments with high-volume scan requirements.