From 47d3dfc53efa4390a019b5eb39b476572fae35ea Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Wed, 15 Oct 2025 13:30:52 -0700 Subject: [PATCH 01/20] Add Vision Pipeline API design document for domain experts Defines API workflows and implementation considerations for customer domain experts deploying vision analytics with SceneScape integration. --- docs/design/vision-pipeline-overview.md | 424 ++++++++++++++++++++++++ 1 file changed, 424 insertions(+) create mode 100644 docs/design/vision-pipeline-overview.md diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md new file mode 100644 index 000000000..42dc9fd4b --- /dev/null +++ b/docs/design/vision-pipeline-overview.md @@ -0,0 +1,424 @@ +# Design Document: Vision Pipeline API for Domain Experts + +- **Author(s)**: Rob Watts +- **Date**: 2025-10-07 +- **Status**: `Proposed` +- **Related ADRs**: TBD +### Data Flow Patterns + +- **Streaming**: Real-time sensor data processing with continuous output streams +- **Batch**: Offline processing of recorded sensor data +- **Hybrid**: Live processing with periodic batch analysis for quality assurance + +### Coordinate System Management + +- **Local Coordinates**: Pipeline outputs positions in camera/sensor coordinate space +- **World Coordinate Transformation**: External responsibility using calibration data (handled by SceneScape MOT) +- **Multi-Sensor Fusion**: Requires external coordinate system reconciliation and cross-sensor tracking +- **Single-Sensor Scope**: Vision pipeline operates only within individual sensor coordinate systems + +### Performance Considerationsiew + +This document defines a simple API for connecting cameras, configuring vision analytics pipelines, and accessing object detection metadata. The API enables domain experts to deploy computer vision capabilities without requiring deep technical knowledge of AI models, pipeline configurations, or video processing implementations. + +The vision pipeline API abstracts away technical complexity while providing reliable object detection metadata that feeds into downstream systems like Intel SceneScape for multi-camera tracking and scene analytics. 
+ +## Goals + +- **Simple Camera Management**: Easy API to connect and manage one or many camera inputs dynamically +- **Composable Analytics Pipelines**: Modular pipeline stages that can be chained together (e.g., vehicle detection → license plate detection → OCR) where each stage can be pre-configured but combined flexibly +- **Raw Frame Access**: On-demand access to original camera frames regardless of input type or source +- **Performance Optimization**: Easy configuration of inference targets (CPU, iGPU, GPU, NPU) for optimal hardware utilization +- **Abstracted Complexity**: Hide AI model management, pipeline optimization details, and video processing complexity from domain experts +- **API-First Design**: Enable development of reference UIs for managing pipelines and sensor sources, supporting integration with SceneScape UI, VIPPET, or customer-implemented interfaces + +## Non-Goals + +- Advanced computer vision research or custom model training +- Multi-camera tracking and scene analytics (handled by downstream systems like SceneScape) +- Complex video processing workflows or custom pipeline development + +## Design Context + +### Primary Persona: **Traffic Operations Expert** + +- **Background**: Transportation engineer, city planner, or traffic management specialist who wants to leverage computer vision to improve traffic flow, safety, and urban mobility +- **Goal**: Deploy smart intersection systems that provide actionable traffic insights and automated responses without requiring deep computer vision expertise +- **Technical Level**: Understands traffic engineering, urban planning, and sensor networks but has limited computer vision knowledge; wants to focus on traffic optimization, not algorithm configuration +- **Pain Points**: + + - Complex vision systems obscure traffic engineering value + - Difficulty translating traffic requirements into vision configurations + - Unclear what vision capabilities are available for traffic applications + - Technical complexity prevents rapid deployment and testing of traffic solutions + +### Use Case: "Vision Pipeline API for Traffic Monitoring" + +A traffic operations expert wants to deploy vision analytics at a busy intersection to feed object detection metadata into their Intel SceneScape system for multi-camera tracking and scene analytics. + +**API Requirements:** + +1. **Camera Management**: Connect 4-8 cameras dynamically via RTSP streams, USB connections, or video files - add/remove cameras without system restart + +2. **Pipeline Composition**: Compose analytics pipelines by chaining stages together: + + - Vehicle detection → license plate detection → OCR + - Person detection → re-identification embedding generation + - General object detection → vehicle classification + - Custom combinations based on specific needs + +3. **Metadata Output**: Send detection results to MQTT broker for SceneScape processing: + + - JSON format with validated schema structure + - Batched messages to minimize network chatter + - Preserved frame timestamps and camera source IDs + - Procedurally generated MQTT topics with optional namespace configuration + +4. 
**Raw Frame Access**: Provide on-demand access to original camera frames for debugging, validation, and manual review - regardless of camera type or connection method + +They want to say: Connect these cameras, run vehicle and person detection, send metadata to SceneScape via MQTT and have a simple API that handles all the technical complexity - without needing to understand AI model formats, video decoding, or pipeline optimization. + +The vision pipeline interface enables this by providing: + +- **Intuitive Input/Output Selection**: Ability to independently select and configure sensor inputs and desired outputs +- **Modular Component Configuration**: Modular approach to configuring video analytics components (detection, tracking, classification) +- **Standardized Abstraction**: Clean separation between data streams, algorithmic configuration, and output products +- **Technology Independence**: Interface that works with any underlying pipeline implementation + +## Vision Pipeline Interface + +### Interface Definition + +The vision pipeline interface defines a clear contract between data inputs, processing components, and outputs. This interface can be implemented by any computer vision technology stack. + +```mermaid +flowchart LR + subgraph Inputs["Inputs"] + subgraph SensorInputs["Sensor Inputs"] + CAM1["Camera 1
Raw Video"] + CAM2["Camera 2
Raw Video"] + LIDAR["LiDAR
Point Cloud"] + RADAR["Radar
Point Cloud"] + end + + subgraph ConfigInputs["Configuration Inputs"] + MODELS["AI Models
Detection/Classification"] + CALIB["Calibration Data
Intrinsics + Distortion"] + end + + subgraph PlatformInputs["Platform Inputs"] + TIME["Synchronized System Time
(timestamps, time sync)"] + end + end + + subgraph Pipeline["Vision Pipeline"] + VIDEO["Video Processing
Decode → Detect → Single-Camera Track → Embed → Classify"] + POINTCLOUD["Point Cloud Processing
Segment → Detect → Single-Sensor Track → Embed"] + end + + subgraph Outputs["Pipeline Outputs"] + DETECTIONS["Object Detections & Tracks
(bounding boxes, classifications, temporal associations, IDs, embeddings)"] + RAWDATA["Raw Data
(original frames, point clouds)"] + DECORATED["Decorated Data
(annotated images, segmented point clouds)"] + end + + %% Styling + classDef pipeline fill:#fff8e1,stroke:#ff8f00,stroke-width:3px,color:#000000 + classDef sensors fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000 + classDef config fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000000 + classDef platform fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#000000 + classDef outputs fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000 + + class VIDEO,POINTCLOUD pipeline + class CAM1,CAM2,LIDAR,RADAR sensors + class MODELS,CALIB config + class TIME platform + class DETECTIONS,RAWDATA,DECORATED outputs +``` + +## Vision Pipeline API Components + +### Camera Management API + +**Dynamic camera connection and configuration:** + +- **Add Camera**: Connect new cameras via RTSP, USB, or file input without system restart +- **Remove Camera**: Disconnect cameras and clean up resources gracefully +- **Camera Status**: Monitor connection health, frame rate, and video quality +- **Camera Configuration**: Set resolution, frame rate, and encoding parameters +- **Multi-Source Support**: Handle mixed camera types (IP cameras, USB webcams, video files) in single deployment + +### Pipeline Configuration API + +**Composable analytics pipeline stages:** + +- **Detection Stages**: Vehicle detection, person detection, general object detection, license plate detection, barcode detection, QR code detection, AprilTag detection +- **Classification Stages**: Vehicle type classification, person attribute classification, object categorization +- **Analysis Stages**: OCR text extraction, barcode decoding, QR code decoding, AprilTag pose estimation, re-identification embedding generation, pose estimation +- **Pipeline Composition**: Chain compatible stages together where outputs of one stage match inputs of the next (e.g., vehicle detection → vehicle classification, license plate detection → OCR, barcode detection → barcode decoding) +- **Compatibility Validation**: System prevents invalid stage chaining when output formats are incompatible (e.g., classification stage cannot feed into detection stage) +- **Parallel Processing**: Support both sequential stage chaining and parallel stage execution for independent analytics on the same input +- **Pre-configured Stages**: Each stage comes with optimized default settings but allows customization +- **Per-Stage Hardware Optimization**: Target each individual stage to specific hardware (CPU, iGPU, GPU, NPU) for optimal performance +- **Pipeline Templates**: Save and reuse common stage combinations across deployments + +**Pipeline Stage Architecture:** + +- **Self-Contained Processing**: Each stage includes its own pre-processing (data preparation, format conversion) and post-processing (result formatting, filtering, validation) +- **Technology Agnostic**: Stages can run any type of analytics including computer vision (CV), deep learning (DL), traditional image processing, or related technologies +- **Modular Interface**: Standardized input/output interfaces allow stages to be combined regardless of underlying technology +- **Independent Optimization**: Each stage can be optimized separately for different performance characteristics and hardware targets + +### Metadata Output API + +**MQTT-focused metadata publishing for SceneScape integration:** + +- **MQTT Publishing**: All detection metadata published to MQTT brokers in JSON format with procedurally generated topics +- **Batch Processing**: Minimized chatter with one message per batch to reduce network 
overhead and improve performance +- **Temporal Preservation**: Original frame timestamps preserved along with camera source ID for accurate temporal correlation +- **Schema Validation**: Updated JSON schema provided for metadata structure validation and downstream system integration +- **Topic Generation**: MQTT topics are procedurally generated based on camera IDs and pipeline configuration, with optional top-level namespace configuration to prevent user errors + +### Frame Access API + +**On-demand access to camera frame data:** + +- **Live Frame Retrieval**: Get current frame from any connected camera +- **Historical Frame Access**: Access stored frames with timestamp-based queries +- **Decorated Frame Access**: Retrieve frames with detection bounding boxes, labels, and confidence scores overlaid +- **Batch Frame Export**: Download multiple frames for analysis or debugging +- **Frame Metadata**: Include camera settings, timestamps, and detection overlays + +**Performance Note**: Frame access operations must be designed to avoid impacting system throughput or latency whenever possible. Frame retrieval should use separate data paths or buffering mechanisms that do not interfere with real-time analytics processing. + +## API Workflows + +This section demonstrates common workflows using sequence diagrams to show the API interactions for typical deployment scenarios. + +### Add Cameras for Connectivity and Calibration + +**Purpose**: Verify camera connectivity and enable downstream calibration without analytics processing. + +```mermaid +sequenceDiagram + participant User as Traffic Engineer + participant API as Vision Pipeline API + participant Server as Pipeline Server + participant Camera as Camera Source + participant MQTT as MQTT Broker + + User->>API: POST /cameras + Note over User,API: Configure camera (RTSP URL, resolution, etc.) + API->>Server: Create camera instance + Server->>Camera: Establish connection + Camera-->>Server: Video stream + Server->>MQTT: Publish camera status (connected) + API-->>User: Camera ID and status + + Note over User,MQTT: Camera running in free-run mode
Raw frames available for calibration
No analytics processing yet +``` + +### Add Single Pipeline Stage and Verify Results + +**Purpose**: Add analytics processing to connected cameras and verify output in SceneScape. + +```mermaid +sequenceDiagram + participant User as Traffic Engineer + participant API as Vision Pipeline API + participant Server as Pipeline Server + participant MQTT as MQTT Broker + participant SceneScape as SceneScape System + + User->>API: POST /pipelines + Note over User,API: Configure pipeline:
- Camera ID
- Stage: Vehicle Detection
- Hardware: GPU + API->>Server: Create pipeline with detection stage + Server->>Server: Start analytics processing + Server->>MQTT: Publish detection metadata + MQTT->>SceneScape: Forward detection data + SceneScape->>SceneScape: Process multi-camera tracking + SceneScape->>MQTT: Publish tracks and properties + MQTT-->>User: Track data available for consumption + SceneScape-->>User: Visual verification in SceneScape UI + + User->>API: GET /frames/{camera_id}/decorated + API-->>User: Frame with detection overlays + Note over User: Visual verification of detections
Complete data flow: detections → tracks → properties +``` + +### Modify Pipeline Stage Model + +**Purpose**: Change the analytics model for an existing pipeline stage. + +```mermaid +sequenceDiagram + participant User as Traffic Engineer + participant API as Vision Pipeline API + participant Server as Pipeline Server + participant MQTT as MQTT Broker + + User->>API: PUT /pipelines/{pipeline_id}/stages/{stage_id} + Note over User,API: Update stage configuration:
- Change from Vehicle Detection
- To Person Detection + API->>Server: Stop current stage + Server->>Server: Cleanup detection resources + Server->>Server: Initialize person detection + Server->>Server: Resume analytics processing + Server->>MQTT: Publish updated metadata schema + API-->>User: Stage update confirmation + + Note over Server,MQTT: Pipeline now outputs
person detection data +``` + +### Modify Camera Configuration + +**Purpose**: Update camera properties like camera ID with graceful system handling. + +```mermaid +sequenceDiagram + participant User as Traffic Engineer + participant API as Vision Pipeline API + participant Server as Pipeline Server + participant MQTT as MQTT Broker + + User->>API: PUT /cameras/{camera_id} + Note over User,API: Update camera config:
- Change camera ID
- From "cam_01"
- To "cam_north" + API->>Server: Update camera metadata + Server->>Server: Apply configuration changes + Server->>Server: Update internal camera references + Server->>MQTT: Publish with updated camera ID + API-->>User: Camera update confirmation + + Note over Server,MQTT: System gracefully handles
camera ID changes +``` + +### Delete Camera + +**Purpose**: Remove camera and clean up all associated resources. + +```mermaid +sequenceDiagram + participant User as Traffic Engineer + participant API as Vision Pipeline API + participant Server as Pipeline Server + participant MQTT as MQTT Broker + + User->>API: DELETE /cameras/{camera_id} + API->>Server: Initiate camera deletion + Server->>Server: Stop associated pipelines + Server->>Server: Cleanup analytics resources + Server->>Server: Disconnect from camera source + Server->>MQTT: Publish camera offline status + Server->>Server: Remove camera instance + API-->>User: Deletion confirmation + + Note over Server: All camera resources cleaned up
Associated pipelines terminated +``` + +### Add Sequential Pipeline Stages + +**Purpose**: Chain multiple analytics stages for complex processing workflows. + +```mermaid +sequenceDiagram + participant User as Traffic Engineer + participant API as Vision Pipeline API + participant Server as Pipeline Server + participant MQTT as MQTT Broker + + Note over User: Existing pipeline: Vehicle Detection + + User->>API: POST /pipelines/{pipeline_id}/stages + Note over User,API: Add classification stage:
- Input: Vehicle detections
- Stage: Vehicle Type Classification
- Hardware: NPU + API->>Server: Validate stage compatibility + Server->>Server: Create classification stage + Server->>Server: Link detection → classification + Server->>Server: Start chained processing + + Note over Server: Processing chain:
1. Vehicle Detection (GPU)
2. Vehicle Classification (NPU) + + Server->>MQTT: Publish enhanced metadata + Note over MQTT: Detection + classification data
in single message batch + API-->>User: Stage addition confirmation +``` + +### Add Additional Camera to Existing Pipeline + +**Purpose**: Scale pipeline to process multiple cameras with batched MQTT output while preserving individual camera metadata. + +```mermaid +sequenceDiagram + participant User as Traffic Engineer + participant API as Vision Pipeline API + participant Server as Pipeline Server + participant MQTT as MQTT Broker + participant SceneScape as SceneScape System + + Note over User: Existing pipeline processing Camera 1
with Vehicle Detection + Classification + + User->>API: POST /pipelines/{pipeline_id}/cameras + Note over User,API: Add camera to existing pipeline:
- Camera ID: "cam_south"
- RTSP URL, resolution
- Inherits pipeline analytics + API->>Server: Create camera and add to pipeline + Server->>Server: Establish camera connection + Server->>Server: Configure multi-camera processing + Server->>Server: Apply existing analytics to new camera + + Note over Server: Processing both cameras:
Camera 1 + Camera 2
→ Detection + Classification + + Server->>Server: Batch results from both cameras + Server->>MQTT: Publish aggregated batch + Note over Server,MQTT: Single MQTT message containing:
- Camera 1 detections (ID + timestamp)
- Camera 2 detections (ID + timestamp)
- Preserved individual metadata + + MQTT->>SceneScape: Process batched multi-camera data + API-->>User: Camera addition confirmation +``` + +## Implementation Considerations + +### Coordinate System Management + +- **Local Coordinates**: Pipeline outputs positions in camera/sensor coordinate space without knowledge of world coordinates or global scene context +- **Camera Coordinates**: 3D coordinate output depends on detection model and sensor modality: + - **Monocular 3D Detectors**: Require intrinsic calibration parameters to estimate depth and convert to 3D camera space + - **LiDAR/Radar Sensors**: Provide native 3D point cloud data in sensor coordinate space + - **2D-Only Models**: Most 2D detectors operate natively in image pixel coordinates (x, y within frame dimensions) and it is acceptable to publish detection results in these units +- **World Coordinate Transformation**: External responsibility using extrinsic calibration data (handled by downstream systems like SceneScape) +- **Multi-Sensor Fusion**: Requires external coordinate system reconciliation and cross-sensor tracking - accomplished outside of the pipeline scope +- **Single-Sensor Scope**: Vision pipeline operates independently within individual sensor coordinate systems, maintaining clear boundaries + +### Performance Considerations + +- **Resource Management**: Interface should specify computational and memory requirements per pipeline stage for capacity planning +- **Hardware Targeting**: Enable per-stage optimization across CPU, iGPU, GPU, and NPU resources for balanced performance +- **Latency Requirements**: Support configurable real-time guarantees based on application needs (traffic safety vs. analytics) +- **Throughput Scaling**: Interface should support multiple concurrent sensor streams without performance degradation +- **System Headroom**: Enable configuration of available computational headroom reserved for other workloads to prevent pipeline overload +- **Dynamic Load Balancing**: Support runtime adjustment of processing priorities based on system load and application criticality + +### Server Architecture + +- **Single Server Instance**: One persistent server instance per compute node manages all vision pipelines, eliminating configuration complexity from multiple service instances +- **Always Running**: Server instance maintains continuous availability, managing pipeline lifecycle internally without requiring external service management +- **Pipeline Management**: Server handles creation, configuration, monitoring, and cleanup of individual pipelines through a unified API interface +- **Port Consolidation**: All pipeline operations accessible through single API endpoint, avoiding the configuration challenges of multiple services on different ports +- **Resource Coordination**: Centralized server enables optimal resource allocation and conflict resolution across concurrent pipelines +- **Simplified Deployment**: Single service deployment model reduces operational complexity compared to per-pipeline service instances + +### Pipeline Stage Management + +A pipeline stage represents a single operation such as a detection or classification step that includes its pre- and post-processing operations. It can represent any number of types of analytics, including deep learning, computer vision, transformer, or other related operations. 
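To make the stage concept concrete, the sketch below shows one way a stage descriptor could be expressed; the field names and values are illustrative assumptions rather than a committed configuration schema.

```python
# Illustrative sketch only: field names are assumptions, not a committed schema.
# A stage bundles one analytic operation together with its own pre- and
# post-processing and the hardware target it should run on.
vehicle_detection_stage = {
    "name": "vehicle_detection",
    "task": "detection",                    # e.g., detection | classification | analysis
    "model": "vehicle-detection-example",   # hypothetical model reference
    "device": "GPU",                        # CPU | iGPU | GPU | NPU
    "preprocess": {"resize": [640, 640], "color_format": "BGR"},
    "postprocess": {"confidence_threshold": 0.5, "nms_iou_threshold": 0.45},
    "inputs": ["image"],                    # used for stage-compatibility validation
    "outputs": ["bounding_boxes", "labels", "confidences"],
}
```

The management considerations below apply to stages of this kind: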
+ +- **Initial Configuration**: Pipeline stages can be initially managed through manual configuration files or system administration tools +- **Stage Discovery**: System should provide mechanisms to discover available analytics stages and their capabilities (input/output formats, hardware requirements) +- **Stage Validation**: Automated validation of stage compatibility when composing pipelines to prevent invalid configurations +- **Stage Versioning**: Support for multiple versions of analytics stages to enable gradual upgrades and rollback capabilities +- **Customer Extensibility**: Future capability for customers to register custom analytics stages through standardized interfaces +- **Configuration Templates**: Pre-built stage combinations and templates for common use cases to simplify deployment +- **Runtime Management**: Eventually support dynamic loading and unloading of analytics stages without service restart + +--- + +## Conclusion + +This vision pipeline interface definition provides a clean separation between sensor inputs, configuration inputs, and standardized outputs. By focusing on the interface rather than implementation details, it enables technology-agnostic pipeline development while supporting debugging, validation, and gradual enhancement of existing robust pipeline technologies. + +The interface is motivated by SceneScape's architectural needs but designed as a reusable specification for any computer vision application requiring clear, maintainable pipeline boundaries built on proven technologies. From e928625f1b4acd6b75dda6ab9aa53969da2c5fde Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Wed, 15 Oct 2025 16:29:37 -0700 Subject: [PATCH 02/20] Restored correct document overview and removed corrupted sections. --- .../gvapython/sscape/sscape_adapter.py | 12 +++++++++--- docs/design/vision-pipeline-overview.md | 16 +++------------- 2 files changed, 12 insertions(+), 16 deletions(-) diff --git a/dlstreamer-pipeline-server/user_scripts/gvapython/sscape/sscape_adapter.py b/dlstreamer-pipeline-server/user_scripts/gvapython/sscape/sscape_adapter.py index 2dd6a88f1..d8fe80b0e 100644 --- a/dlstreamer-pipeline-server/user_scripts/gvapython/sscape/sscape_adapter.py +++ b/dlstreamer-pipeline-server/user_scripts/gvapython/sscape/sscape_adapter.py @@ -16,7 +16,9 @@ import paho.mqtt.client as mqtt from pytz import timezone -from utils import publisher_utils as utils +import sys +sys.path.insert(0, '/home/pipeline-server') +import utils.publisher_utils as utils from sscape_policies import ( detectionPolicy, detection3DPolicy, @@ -88,10 +90,14 @@ def processFrame(self, frame): return True class PostInferenceDataPublish: - def __init__(self, cameraid, metadatagenpolicy='detectionPolicy', publish_image=False): + def __init__(self, cameraid, metadatagenpolicy='detectionPolicy', publish_frame=False, publish_image=None): self.cameraid = cameraid - self.is_publish_image = publish_image + # Support both publish_frame and publish_image for backward compatibility + if publish_image is not None: + self.is_publish_image = publish_image + else: + self.is_publish_image = publish_frame self.is_publish_calibration_image = False self.cam_auto_calibrate = False self.cam_auto_calibrate_intrinsics = None diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index 42dc9fd4b..68ebfdc1f 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -4,20 +4,10 @@ - **Date**: 2025-10-07 - **Status**: `Proposed` - **Related 
ADRs**: TBD -### Data Flow Patterns -- **Streaming**: Real-time sensor data processing with continuous output streams -- **Batch**: Offline processing of recorded sensor data -- **Hybrid**: Live processing with periodic batch analysis for quality assurance - -### Coordinate System Management - -- **Local Coordinates**: Pipeline outputs positions in camera/sensor coordinate space -- **World Coordinate Transformation**: External responsibility using calibration data (handled by SceneScape MOT) -- **Multi-Sensor Fusion**: Requires external coordinate system reconciliation and cross-sensor tracking -- **Single-Sensor Scope**: Vision pipeline operates only within individual sensor coordinate systems +--- -### Performance Considerationsiew +## Overview This document defines a simple API for connecting cameras, configuring vision analytics pipelines, and accessing object detection metadata. The API enables domain experts to deploy computer vision capabilities without requiring deep technical knowledge of AI models, pipeline configurations, or video processing implementations. @@ -377,7 +367,7 @@ sequenceDiagram ### Coordinate System Management - **Local Coordinates**: Pipeline outputs positions in camera/sensor coordinate space without knowledge of world coordinates or global scene context -- **Camera Coordinates**: 3D coordinate output depends on detection model and sensor modality: +- **Camera Coordinates**: Coordinate output depends on detection model and sensor modality: - **Monocular 3D Detectors**: Require intrinsic calibration parameters to estimate depth and convert to 3D camera space - **LiDAR/Radar Sensors**: Provide native 3D point cloud data in sensor coordinate space - **2D-Only Models**: Most 2D detectors operate natively in image pixel coordinates (x, y within frame dimensions) and it is acceptable to publish detection results in these units From 4f12b1ca7c9d0a4ba2fb6d505c60e3ad56a2f179 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Wed, 15 Oct 2025 16:50:39 -0700 Subject: [PATCH 03/20] Remove unrelated sscape_adapter.py changes Keep only the vision pipeline API design document changes in this branch. 
--- .../user_scripts/gvapython/sscape/sscape_adapter.py | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/dlstreamer-pipeline-server/user_scripts/gvapython/sscape/sscape_adapter.py b/dlstreamer-pipeline-server/user_scripts/gvapython/sscape/sscape_adapter.py index d8fe80b0e..2dd6a88f1 100644 --- a/dlstreamer-pipeline-server/user_scripts/gvapython/sscape/sscape_adapter.py +++ b/dlstreamer-pipeline-server/user_scripts/gvapython/sscape/sscape_adapter.py @@ -16,9 +16,7 @@ import paho.mqtt.client as mqtt from pytz import timezone -import sys -sys.path.insert(0, '/home/pipeline-server') -import utils.publisher_utils as utils +from utils import publisher_utils as utils from sscape_policies import ( detectionPolicy, detection3DPolicy, @@ -90,14 +88,10 @@ def processFrame(self, frame): return True class PostInferenceDataPublish: - def __init__(self, cameraid, metadatagenpolicy='detectionPolicy', publish_frame=False, publish_image=None): + def __init__(self, cameraid, metadatagenpolicy='detectionPolicy', publish_image=False): self.cameraid = cameraid - # Support both publish_frame and publish_image for backward compatibility - if publish_image is not None: - self.is_publish_image = publish_image - else: - self.is_publish_image = publish_frame + self.is_publish_image = publish_image self.is_publish_calibration_image = False self.cam_auto_calibrate = False self.cam_auto_calibrate_intrinsics = None From 4234d67597491acbe313a5b442e3523ba73a68d7 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Thu, 16 Oct 2025 11:52:24 -0700 Subject: [PATCH 04/20] Added time synchronization section and clarified latency and throughput details. --- docs/design/vision-pipeline-overview.md | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index 68ebfdc1f..8cf91c66e 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -349,7 +349,7 @@ sequenceDiagram Note over User,API: Add camera to existing pipeline:
- Camera ID: "cam_south"
- RTSP URL, resolution
- Inherits pipeline analytics API->>Server: Create camera and add to pipeline Server->>Server: Establish camera connection - Server->>Server: Configure multi-camera processing + Server->>Server: Configure multi-camera batching Server->>Server: Apply existing analytics to new camera Note over Server: Processing both cameras:
Camera 1 + Camera 2
→ Detection + Classification @@ -379,11 +379,20 @@ sequenceDiagram - **Resource Management**: Interface should specify computational and memory requirements per pipeline stage for capacity planning - **Hardware Targeting**: Enable per-stage optimization across CPU, iGPU, GPU, and NPU resources for balanced performance -- **Latency Requirements**: Support configurable real-time guarantees based on application needs (traffic safety vs. analytics) -- **Throughput Scaling**: Interface should support multiple concurrent sensor streams without performance degradation +- **Latency Requirements**: Support configurable real-time guarantees based on application needs (e.g., <15ms latency for traffic safety may cause more frames to be dropped and consequent drop in throughput) + - Latency and throughput are not always inversely related when parallel operations are possible, such as cross-camera batching +- **Throughput Scaling**: Additional concurrent sensor streams should be optimized using techniques such as cross-sensor/camera batching and other methods that minimize latency and maximize throughput as much as possible - **System Headroom**: Enable configuration of available computational headroom reserved for other workloads to prevent pipeline overload - **Dynamic Load Balancing**: Support runtime adjustment of processing priorities based on system load and application criticality +### Time Coordination + +- **System Requirements**: Time synchronization must be better than the dynamic observability of the system; e.g., monitoring scenes with faster moving objects requires better time precision +- **Precision Timestamping**: Spatiotemporal fusion requires precision timestamping, ideally at the moment of sensor data acquisition (before encoding, transmission, and other operations) +- **Platform Responsibility**: Implementation of time synchronization is the responsibility of the hardware+OS platform and is outside the scope of the pipeline server (system timestamps are assumed to be synchronized) + - Various technologies may be applied, including NTP, IEEE 1588 PTP, time sensitive networking (TSN), GPS PPS, and related capabilities +- **Fallback Options**: Time synchronization may not always be possible at frame acquisition, and late timestamping may be the only viable option; in this case, a configurable latency offset may need to be applied (backdating the timestamp by some configurable amount on a per-camera and/or per camera batch basis) when the frame arrives at the pipeline + ### Server Architecture - **Single Server Instance**: One persistent server instance per compute node manages all vision pipelines, eliminating configuration complexity from multiple service instances @@ -404,6 +413,7 @@ A pipeline stage represents a single operation such as a detection or classifica - **Customer Extensibility**: Future capability for customers to register custom analytics stages through standardized interfaces - **Configuration Templates**: Pre-built stage combinations and templates for common use cases to simplify deployment - **Runtime Management**: Eventually support dynamic loading and unloading of analytics stages without service restart +- **Stage Management Service**: Future consideration for a dedicated stage management service, particularly when integrated with a model server for centralized analytics lifecycle management --- From d3940297ee2db768697f2f1300e29e5788ff9bc4 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Fri, 17 Oct 2025 14:22:51 -0700 Subject: [PATCH 05/20] Enhance Vision 
Pipeline API design with additional workflows and improvements - Add sequence diagram for parallel pipeline stages (concurrent analytics) - Add sequence diagram for retrieving pipeline overview (system-wide monitoring) - Update frame access API to be less prescriptive on implementation methods - Add web-based graph visualization note for UI integration - Replace 'raw' with 'source' terminology for better clarity with domain experts - Improve sequence diagram interactions to be more declarative - Various formatting and clarity improvements throughout --- docs/design/vision-pipeline-overview.md | 88 +++++++++++++++++++------ 1 file changed, 69 insertions(+), 19 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index 8cf91c66e..3c5035275 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -17,8 +17,8 @@ The vision pipeline API abstracts away technical complexity while providing reli - **Simple Camera Management**: Easy API to connect and manage one or many camera inputs dynamically - **Composable Analytics Pipelines**: Modular pipeline stages that can be chained together (e.g., vehicle detection → license plate detection → OCR) where each stage can be pre-configured but combined flexibly -- **Raw Frame Access**: On-demand access to original camera frames regardless of input type or source -- **Performance Optimization**: Easy configuration of inference targets (CPU, iGPU, GPU, NPU) for optimal hardware utilization +- **Source Frame Access**: On-demand access to original camera frames regardless of input type or source +- **Performance Optimization**: Easy configuration of hardware acceleration targets (CPU, iGPU, GPU, NPU) for optimal utilization - **Abstracted Complexity**: Hide AI model management, pipeline optimization details, and video processing complexity from domain experts - **API-First Design**: Enable development of reference UIs for managing pipelines and sensor sources, supporting integration with SceneScape UI, VIPPET, or customer-implemented interfaces @@ -32,7 +32,7 @@ The vision pipeline API abstracts away technical complexity while providing reli ### Primary Persona: **Traffic Operations Expert** -- **Background**: Transportation engineer, city planner, or traffic management specialist who wants to leverage computer vision to improve traffic flow, safety, and urban mobility +- **Background**: Transportation engineer, systems integrator, or traffic management specialist who wants to leverage computer vision to improve traffic flow, safety, and urban mobility - **Goal**: Deploy smart intersection systems that provide actionable traffic insights and automated responses without requiring deep computer vision expertise - **Technical Level**: Understands traffic engineering, urban planning, and sensor networks but has limited computer vision knowledge; wants to focus on traffic optimization, not algorithm configuration - **Pain Points**: @@ -64,7 +64,7 @@ A traffic operations expert wants to deploy vision analytics at a busy intersect - Preserved frame timestamps and camera source IDs - Procedurally generated MQTT topics with optional namespace configuration -4. **Raw Frame Access**: Provide on-demand access to original camera frames for debugging, validation, and manual review - regardless of camera type or connection method +4. 
**Source Frame Access**: Provide on-demand access to original camera frames for debugging, validation, and manual review - regardless of camera type or connection method They want to say: Connect these cameras, run vehicle and person detection, send metadata to SceneScape via MQTT and have a simple API that handles all the technical complexity - without needing to understand AI model formats, video decoding, or pipeline optimization. @@ -85,8 +85,8 @@ The vision pipeline interface defines a clear contract between data inputs, proc flowchart LR subgraph Inputs["Inputs"] subgraph SensorInputs["Sensor Inputs"] - CAM1["Camera 1
Raw Video"] - CAM2["Camera 2
Raw Video"] + CAM1["Camera 1
Source Video"] + CAM2["Camera 2
Source Video"] LIDAR["LiDAR
Point Cloud"] RADAR["Radar
Point Cloud"] end @@ -108,7 +108,7 @@ flowchart LR subgraph Outputs["Pipeline Outputs"] DETECTIONS["Object Detections & Tracks
(bounding boxes, classifications, temporal associations, IDs, embeddings)"] - RAWDATA["Raw Data
(original frames, point clouds)"] + RAWDATA["Source Data
(original frames, point clouds)"] DECORATED["Decorated Data
(annotated images, segmented point clouds)"] end @@ -157,7 +157,7 @@ flowchart LR - **Self-Contained Processing**: Each stage includes its own pre-processing (data preparation, format conversion) and post-processing (result formatting, filtering, validation) - **Technology Agnostic**: Stages can run any type of analytics including computer vision (CV), deep learning (DL), traditional image processing, or related technologies - **Modular Interface**: Standardized input/output interfaces allow stages to be combined regardless of underlying technology -- **Independent Optimization**: Each stage can be optimized separately for different performance characteristics and hardware targets +- **Flexible Optimization**: Each stage can be optimized for different performance characteristics and hardware targets, including inter-stage optimizations like buffer sharing on the same device ### Metadata Output API @@ -166,18 +166,19 @@ flowchart LR - **MQTT Publishing**: All detection metadata published to MQTT brokers in JSON format with procedurally generated topics - **Batch Processing**: Minimized chatter with one message per batch to reduce network overhead and improve performance - **Temporal Preservation**: Original frame timestamps preserved along with camera source ID for accurate temporal correlation -- **Schema Validation**: Updated JSON schema provided for metadata structure validation and downstream system integration +- **Updated Schema Availability**: Updated JSON schema provided for downstream metadata validation and integration (output from the pipeline is assumed to be valid against the provided schema) - **Topic Generation**: MQTT topics are procedurally generated based on camera IDs and pipeline configuration, with optional top-level namespace configuration to prevent user errors ### Frame Access API **On-demand access to camera frame data:** -- **Live Frame Retrieval**: Get current frame from any connected camera -- **Historical Frame Access**: Access stored frames with timestamp-based queries -- **Decorated Frame Access**: Retrieve frames with detection bounding boxes, labels, and confidence scores overlaid -- **Batch Frame Export**: Download multiple frames for analysis or debugging -- **Frame Metadata**: Include camera settings, timestamps, and detection overlays +- **Near Real-Time Source Frames**: Access undecorated source frames from any camera for calibration workflows and data flow confirmation +- **Near Real-Time Decorated Frames**: Access frames with detection bounding boxes, throughput, labels, and confidence scores overlaid for monitoring video analytics state +- **Web-Streamable Output**: Frame access designed for low-latency streaming into web application UIs (target <100ms latency) +- **Implementation Flexibility**: Frame access may be provided through various methods including REST endpoints, WebRTC streams, WebSocket connections, or dedicated streaming protocols + +**Note**: This API specification focuses on near real-time frame access only. Historical frame access (by camera ID and timestamp or timestamp range) is not required for this interface and may be considered as a separate system capability in future versions. **Performance Note**: Frame access operations must be designed to avoid impacting system throughput or latency whenever possible. Frame retrieval should use separate data paths or buffering mechanisms that do not interfere with real-time analytics processing. 
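As a sketch of how a client might spot-check decorated output, assuming a REST-style implementation along the lines of the GET /frames/{camera_id}/decorated call shown in the earlier workflow diagrams (the design equally permits WebRTC or WebSocket transports, so the exact path and response format may differ):

```python
# Sketch only: assumes a REST-style frame endpoint; the design also allows
# WebRTC/WebSocket transports, so the path and response format may differ.
import requests

API_BASE = "http://pipeline-server:8080"  # hypothetical address
camera_id = "cam_north"

resp = requests.get(f"{API_BASE}/frames/{camera_id}/decorated", timeout=5)
resp.raise_for_status()

# Decorated frames arrive as an encoded image (e.g., JPEG) ready for display
# in a web UI; save a copy for manual review.
with open(f"{camera_id}_decorated.jpg", "wb") as f:
    f.write(resp.content)
```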
@@ -205,7 +206,7 @@ sequenceDiagram Server->>MQTT: Publish camera status (connected) API-->>User: Camera ID and status - Note over User,MQTT: Camera running in free-run mode
Raw frames available for calibration
No analytics processing yet + Note over User,MQTT: Camera running in free-run mode
Source frames available for calibration
No analytics processing yet ``` ### Add Single Pipeline Stage and Verify Results @@ -231,8 +232,8 @@ sequenceDiagram MQTT-->>User: Track data available for consumption SceneScape-->>User: Visual verification in SceneScape UI - User->>API: GET /frames/{camera_id}/decorated - API-->>User: Frame with detection overlays + User->>API: Request decorated frames for camera + API-->>User: Stream frames with detection overlays Note over User: Visual verification of detections
Complete data flow: detections → tracks → properties ``` @@ -249,11 +250,11 @@ sequenceDiagram User->>API: PUT /pipelines/{pipeline_id}/stages/{stage_id} Note over User,API: Update stage configuration:
- Change from Vehicle Detection
- To Person Detection - API->>Server: Stop current stage + API->>Server: Send pipeline configuration change Server->>Server: Cleanup detection resources Server->>Server: Initialize person detection Server->>Server: Resume analytics processing - Server->>MQTT: Publish updated metadata schema + Server->>MQTT: Publish updated metadata API-->>User: Stage update confirmation Note over Server,MQTT: Pipeline now outputs
person detection data @@ -331,6 +332,34 @@ sequenceDiagram API-->>User: Stage addition confirmation ``` +### Add Parallel Pipeline Stages + +**Purpose**: Add concurrent analytics processing for independent object types on the same camera input. + +```mermaid +sequenceDiagram + participant User as Traffic Engineer + participant API as Vision Pipeline API + participant Server as Pipeline Server + participant MQTT as MQTT Broker + + Note over User: Existing pipeline: Vehicle Detection + + User->>API: POST /pipelines/{pipeline_id}/stages + Note over User,API: Add parallel stage:
- Input: Source camera frames
- Stage: Person Detection
- Hardware: GPU
- Mode: Parallel + API->>Server: Validate parallel stage configuration + Server->>Server: Create person detection stage + Server->>Server: Configure parallel processing + Server->>Server: Start concurrent analytics + + Note over Server: Parallel processing:
1. Vehicle Detection (GPU)
2. Person Detection (GPU)
Both processing same input frames + + Server->>Server: Merge results from parallel stages + Server->>MQTT: Publish combined metadata + Note over MQTT: Single message with unified detection list:
All vehicle + person detections
from concurrent analytics + API-->>User: Parallel stage addition confirmation +``` + ### Add Additional Camera to Existing Pipeline **Purpose**: Scale pipeline to process multiple cameras with batched MQTT output while preserving individual camera metadata. @@ -362,6 +391,27 @@ sequenceDiagram API-->>User: Camera addition confirmation ``` +### Retrieve Pipeline Overview + +**Purpose**: Request and view all pipelines with their associated cameras and sensors for system-wide inspection. + +```mermaid +sequenceDiagram + participant User as Traffic Engineer + participant API as Vision Pipeline API + participant Server as Pipeline Server + + User->>API: GET /pipelines + Note over User,API: Request all pipeline configurations + API->>Server: Retrieve system-wide pipeline data + Server->>Server: Collect all pipeline configurations + Server->>Server: Include associated cameras and stages + API-->>User: Complete pipeline overview (JSON format) + Note over User: UI displays system overview:
- All active pipelines
- Camera assignments
- Stage configurations
- Resource utilization +``` + +**Note**: The JSON response format is designed to be compatible with web-based graph visualization tools, enabling interactive pipeline diagrams where cameras appear as input nodes, stages as processing nodes, and data flows as connecting edges. + ## Implementation Considerations ### Coordinate System Management From f4b833cf59f9a6e5c9f788dc360ef1e5093c36b8 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Fri, 17 Oct 2025 14:26:11 -0700 Subject: [PATCH 06/20] Remove redundant mention of procedurally generated MQTT topics - Eliminate duplicate reference in MQTT Publishing bullet point - Keep detailed explanation in dedicated Topic Generation section - Improve document clarity and reduce redundancy --- docs/design/vision-pipeline-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index 3c5035275..0a54945ec 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -163,7 +163,7 @@ flowchart LR **MQTT-focused metadata publishing for SceneScape integration:** -- **MQTT Publishing**: All detection metadata published to MQTT brokers in JSON format with procedurally generated topics +- **MQTT Publishing**: All detection metadata published to MQTT brokers in JSON format - **Batch Processing**: Minimized chatter with one message per batch to reduce network overhead and improve performance - **Temporal Preservation**: Original frame timestamps preserved along with camera source ID for accurate temporal correlation - **Updated Schema Availability**: Updated JSON schema provided for downstream metadata validation and integration (output from the pipeline is assumed to be valid against the provided schema) From 3f5b026677763ece61a2cabfe05b69c0f33fafa1 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Mon, 20 Oct 2025 11:34:19 -0700 Subject: [PATCH 07/20] Enhance vision pipeline API design document - Standardize sequence diagrams to use 'User' consistently - Add robust camera error handling with persistent reconnection capabilities - Add comprehensive system monitoring API for observability - Add multimodal input support section with audio, LiDAR, and radar - Add security considerations note referencing separate hardening guide - Improve document structure and readability for target audience --- docs/design/vision-pipeline-overview.md | 113 ++++++++++++++++++------ 1 file changed, 85 insertions(+), 28 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index 0a54945ec..cb5518d88 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -1,7 +1,7 @@ # Design Document: Vision Pipeline API for Domain Experts - **Author(s)**: Rob Watts -- **Date**: 2025-10-07 +- **Date**: 2025-10-20 - **Status**: `Proposed` - **Related ADRs**: TBD @@ -89,6 +89,7 @@ flowchart LR CAM2["Camera 2
Source Video"] LIDAR["LiDAR
Point Cloud"] RADAR["Radar
Point Cloud"] + AUDIO["Audio
Sound Data"] end subgraph ConfigInputs["Configuration Inputs"] @@ -120,12 +121,22 @@ flowchart LR classDef outputs fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000 class VIDEO,POINTCLOUD pipeline - class CAM1,CAM2,LIDAR,RADAR sensors + class CAM1,CAM2,LIDAR,RADAR,AUDIO sensors class MODELS,CALIB config class TIME platform class DETECTIONS,RAWDATA,DECORATED outputs ``` +### Multimodal Input Support + +While this document primarily focuses on camera-based vision systems, the interface is designed to establish a unified approach that accommodates multiple sensor modalities including 3D point-cloud sources and audio data. This multimodal architecture ensures the API can support sensor fusion applications where different sensors contribute complementary information: + +- **Cameras**: Provide high-resolution visual data for object detection, classification, and visual analytics +- **LiDAR/Radar**: Contribute precise spatial positioning, distance measurements, and velocity data through 3D point-cloud processing +- **Audio**: Enable acoustic event detection, sound classification, and audio-visual correlation for comprehensive scene understanding + +The interface design anticipates the growing prevalence of multimodal sensing in computer vision deployments, such as demonstrated in the Sensor Fusion for Traffic Management sample application (formerly TFCC) in the Metro AI Suite. All general requirements, API patterns, and architectural principles described in this document apply to multimodal data sources, even while cameras remain the primary sensor type in current implementations. + ## Vision Pipeline API Components ### Camera Management API @@ -137,20 +148,30 @@ flowchart LR - **Camera Status**: Monitor connection health, frame rate, and video quality - **Camera Configuration**: Set resolution, frame rate, and encoding parameters - **Multi-Source Support**: Handle mixed camera types (IP cameras, USB webcams, video files) in single deployment +- **Robust Error Handling**: Comprehensive error handling for network issues, authentication failures, and protocol incompatibilities with detailed logging +- **Connection Resilience**: Automatic retry mechanisms with configurable backoff strategies for network interruptions and camera disconnections +- **Persistent Reconnection**: Optional continuous reconnection attempts that persist indefinitely until cameras return online, maintaining system resilience during extended outages +- **Connection Monitoring**: Real-time monitoring endpoints for camera connection status, error rates, and reconnection attempts to enable proactive troubleshooting ### Pipeline Configuration API -**Composable analytics pipeline stages:** +**Pipeline Stage Types:** + +The following stage types represent common analytics capabilities that can be configured and chained together. These are examples of the types of stages available - the system is designed to support additional stage types and custom analytics as needed. 
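As an illustration of how two of the stage types listed next might be composed into one chained pipeline (using the POST /pipelines call from the workflow diagrams), a rough sketch follows; the request-body field names are assumptions, not a finalized schema.

```python
# Sketch only: request-body field names are assumptions, not a finalized schema.
# Chains vehicle detection -> vehicle type classification on a single camera,
# with a per-stage hardware target as described in the requirements below.
import requests

API_BASE = "http://pipeline-server:8080"  # hypothetical address

pipeline_request = {
    "camera_ids": ["cam_north"],
    "stages": [
        {"type": "vehicle_detection", "device": "GPU"},
        {"type": "vehicle_type_classification", "device": "NPU"},
    ],
}

resp = requests.post(f"{API_BASE}/pipelines", json=pipeline_request, timeout=10)
resp.raise_for_status()
print("Created pipeline:", resp.json())
```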
+ +- **Detection Stages**: Vehicle detection, person detection, general object detection, license plate detection, oriented bounding box detection, segmentation, keypoint detection, 3D bounding box detection +- **Classification Stages**: Generate text labels for vehicle types, person attributes, object categories, age/gender, personal protective equipment, mask wearing, and image-to-text descriptions +- **Analysis Stages**: OCR text extraction, barcode detection/decoding, QR code detection/decoding, AprilTag detection/decoding, re-identification embedding generation, pose estimation + +**Pipeline Stage Requirements:** -- **Detection Stages**: Vehicle detection, person detection, general object detection, license plate detection, barcode detection, QR code detection, AprilTag detection -- **Classification Stages**: Vehicle type classification, person attribute classification, object categorization -- **Analysis Stages**: OCR text extraction, barcode decoding, QR code decoding, AprilTag pose estimation, re-identification embedding generation, pose estimation -- **Pipeline Composition**: Chain compatible stages together where outputs of one stage match inputs of the next (e.g., vehicle detection → vehicle classification, license plate detection → OCR, barcode detection → barcode decoding) +- **Pipeline Composition**: Chain compatible stages together where outputs of one stage match inputs of the next (e.g., vehicle detection → vehicle classification, license plate detection → OCR) - **Compatibility Validation**: System prevents invalid stage chaining when output formats are incompatible (e.g., classification stage cannot feed into detection stage) - **Parallel Processing**: Support both sequential stage chaining and parallel stage execution for independent analytics on the same input - **Pre-configured Stages**: Each stage comes with optimized default settings but allows customization - **Per-Stage Hardware Optimization**: Target each individual stage to specific hardware (CPU, iGPU, GPU, NPU) for optimal performance - **Pipeline Templates**: Save and reuse common stage combinations across deployments +- **Configuration Schema Availability**: JSON schemas for pipeline and stage configurations provided via API endpoints for validation and tooling integration **Pipeline Stage Architecture:** @@ -159,15 +180,19 @@ flowchart LR - **Modular Interface**: Standardized input/output interfaces allow stages to be combined regardless of underlying technology - **Flexible Optimization**: Each stage can be optimized for different performance characteristics and hardware targets, including inter-stage optimizations like buffer sharing on the same device -### Metadata Output API +### Metadata Output **MQTT-focused metadata publishing for SceneScape integration:** - **MQTT Publishing**: All detection metadata published to MQTT brokers in JSON format - **Batch Processing**: Minimized chatter with one message per batch to reduce network overhead and improve performance -- **Temporal Preservation**: Original frame timestamps preserved along with camera source ID for accurate temporal correlation -- **Updated Schema Availability**: Updated JSON schema provided for downstream metadata validation and integration (output from the pipeline is assumed to be valid against the provided schema) -- **Topic Generation**: MQTT topics are procedurally generated based on camera IDs and pipeline configuration, with optional top-level namespace configuration to prevent user errors +- **Individual Frame Timestamps**: Each frame 
maintains its individual timestamp within batched messages for accurate temporal correlation +- **Camera Source Identification**: Each frame preserves its camera source ID within batch metadata +- **Cross-Camera Batching**: Frames are captured and batched across cameras within small time windows for efficiency +- **Original Timing Preservation**: Each frame's metadata preserves its original capture timestamp and camera identifier +- **Metadata Schema Availability**: JSON schemas for detection metadata provided via dedicated API endpoints for programmatic validation and integration +- **Clean Configuration**: Schema artifacts must not be included in configuration JSON to maintain separation of concerns +- **Topic Generation**: MQTT topics procedurally generated based on camera IDs and pipeline configuration with optional namespace configuration ### Frame Access API @@ -182,6 +207,18 @@ flowchart LR **Performance Note**: Frame access operations must be designed to avoid impacting system throughput or latency whenever possible. Frame retrieval should use separate data paths or buffering mechanisms that do not interfere with real-time analytics processing. +### System Monitoring API + +**Observability endpoints for system health and performance:** + +- **Health Check Endpoints**: System-wide health status including API availability, pipeline server status, and MQTT broker connectivity +- **Camera Monitoring**: Per-camera connection status, frame rate statistics, error counts, and reconnection attempt history +- **Pipeline Performance**: Per-pipeline throughput metrics, processing latency measurements, and resource utilization statistics +- **Resource Monitoring**: Hardware utilization metrics for CPU, GPU, NPU, and memory across all pipeline stages +- **Error Rate Tracking**: Aggregated error rates and failure patterns across cameras, pipelines, and individual processing stages +- **System Metrics Export**: Prometheus-compatible metrics export for integration with existing monitoring infrastructure +- **Alert Integration**: Configurable thresholds and alert generation for proactive issue detection and notification + ## API Workflows This section demonstrates common workflows using sequence diagrams to show the API interactions for typical deployment scenarios. 
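Pulling together the batching, per-frame timestamp, and camera-ID requirements in the Metadata Output section above, one possible shape for a batched detection message is sketched here; the topic and field names are illustrative assumptions, and actual payloads are governed by the published JSON schema.

```python
# Illustrative sketch only: topic and field names are assumptions; the published
# JSON schema is the authoritative definition of the metadata format.
import json

topic = "vision/detections/intersection_01"  # hypothetical procedurally generated topic

batched_message = {
    "batch_timestamp": "2025-10-20T18:00:00.123Z",
    "frames": [
        {
            "camera_id": "cam_north",
            "timestamp": "2025-10-20T18:00:00.101Z",  # original capture time preserved
            "detections": [
                {"label": "vehicle", "confidence": 0.91,
                 "bounding_box": {"x": 120, "y": 244, "w": 96, "h": 54}},
            ],
        },
        {
            "camera_id": "cam_south",
            "timestamp": "2025-10-20T18:00:00.108Z",
            "detections": [],
        },
    ],
}

payload = json.dumps(batched_message)  # published to the MQTT broker, e.g. via paho-mqtt
```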
@@ -192,7 +229,7 @@ This section demonstrates common workflows using sequence diagrams to show the A ```mermaid sequenceDiagram - participant User as Traffic Engineer + participant User participant API as Vision Pipeline API participant Server as Pipeline Server participant Camera as Camera Source @@ -215,7 +252,7 @@ sequenceDiagram ```mermaid sequenceDiagram - participant User as Traffic Engineer + participant User participant API as Vision Pipeline API participant Server as Pipeline Server participant MQTT as MQTT Broker @@ -243,7 +280,7 @@ sequenceDiagram ```mermaid sequenceDiagram - participant User as Traffic Engineer + participant User participant API as Vision Pipeline API participant Server as Pipeline Server participant MQTT as MQTT Broker @@ -266,7 +303,7 @@ sequenceDiagram ```mermaid sequenceDiagram - participant User as Traffic Engineer + participant User participant API as Vision Pipeline API participant Server as Pipeline Server participant MQTT as MQTT Broker @@ -288,7 +325,7 @@ sequenceDiagram ```mermaid sequenceDiagram - participant User as Traffic Engineer + participant User participant API as Vision Pipeline API participant Server as Pipeline Server participant MQTT as MQTT Broker @@ -311,7 +348,7 @@ sequenceDiagram ```mermaid sequenceDiagram - participant User as Traffic Engineer + participant User participant API as Vision Pipeline API participant Server as Pipeline Server participant MQTT as MQTT Broker @@ -338,7 +375,7 @@ sequenceDiagram ```mermaid sequenceDiagram - participant User as Traffic Engineer + participant User participant API as Vision Pipeline API participant Server as Pipeline Server participant MQTT as MQTT Broker @@ -366,7 +403,7 @@ sequenceDiagram ```mermaid sequenceDiagram - participant User as Traffic Engineer + participant User participant API as Vision Pipeline API participant Server as Pipeline Server participant MQTT as MQTT Broker @@ -397,7 +434,7 @@ sequenceDiagram ```mermaid sequenceDiagram - participant User as Traffic Engineer + participant User participant API as Vision Pipeline API participant Server as Pipeline Server @@ -425,23 +462,39 @@ sequenceDiagram - **Multi-Sensor Fusion**: Requires external coordinate system reconciliation and cross-sensor tracking - accomplished outside of the pipeline scope - **Single-Sensor Scope**: Vision pipeline operates independently within individual sensor coordinate systems, maintaining clear boundaries +### Time Coordination + +- **System Requirements**: Time synchronization must be better than the dynamic observability of the system; e.g., monitoring scenes with faster moving objects requires better time precision +- **Precision Timestamping**: Spatiotemporal fusion requires precision timestamping, ideally at the moment of sensor data acquisition (before encoding, transmission, and other operations) +- **Platform Responsibility**: Implementation of time synchronization is the responsibility of the hardware+OS platform and is outside the scope of the pipeline server (system timestamps are assumed to be synchronized) + - Various technologies may be applied, including NTP, IEEE 1588 PTP, time sensitive networking (TSN), GPS PPS, and related capabilities +- **Fallback Options**: Time synchronization may not always be possible at frame acquisition, and late timestamping may be the only viable option; in this case, a configurable latency offset may need to be applied (backdating the timestamp by some configurable amount on a per-camera and/or per camera batch basis) when the frame arrives at the pipeline 
+- **Distributed System Architecture**: In many deployments, the system operates in a distributed manner across edge clusters with various processing stages running on different compute nodes. This distributed architecture requires robust time synchronization across network boundaries and careful consideration of network latency when correlating timestamped data between processing stages. + ### Performance Considerations - **Resource Management**: Interface should specify computational and memory requirements per pipeline stage for capacity planning - **Hardware Targeting**: Enable per-stage optimization across CPU, iGPU, GPU, and NPU resources for balanced performance -- **Latency Requirements**: Support configurable real-time guarantees based on application needs (e.g., <15ms latency for traffic safety may cause more frames to be dropped and consequent drop in throughput) - - Latency and throughput are not always inversely related when parallel operations are possible, such as cross-camera batching - **Throughput Scaling**: Additional concurrent sensor streams should be optimized using techniques such as cross-sensor/camera batching and other methods that minimize latency and maximize throughput as much as possible - **System Headroom**: Enable configuration of available computational headroom reserved for other workloads to prevent pipeline overload - **Dynamic Load Balancing**: Support runtime adjustment of processing priorities based on system load and application criticality -### Time Coordination +### Latency Requirements -- **System Requirements**: Time synchronization must be better than the dynamic observability of the system; e.g., monitoring scenes with faster moving objects requires better time precision -- **Precision Timestamping**: Spatiotemporal fusion requires precision timestamping, ideally at the moment of sensor data acquisition (before encoding, transmission, and other operations) -- **Platform Responsibility**: Implementation of time synchronization is the responsibility of the hardware+OS platform and is outside the scope of the pipeline server (system timestamps are assumed to be synchronized) - - Various technologies may be applied, including NTP, IEEE 1588 PTP, time sensitive networking (TSN), GPS PPS, and related capabilities -- **Fallback Options**: Time synchronization may not always be possible at frame acquisition, and late timestamping may be the only viable option; in this case, a configurable latency offset may need to be applied (backdating the timestamp by some configurable amount on a per-camera and/or per camera batch basis) when the frame arrives at the pipeline +Latency is critical for real-time operation and must be configurable based on application needs (e.g., <15ms for traffic safety applications). 
+ +- **Real-Time Priority**: Low latency is essential for safety-critical applications where delayed responses can impact traffic flow and safety +- **Critical Use Cases**: Ultra-low latency enables mission-critical applications such as CV2X signaling for jaywalking detection, adaptive traffic light controls using pedestrian monitoring, and collision avoidance systems where milliseconds can prevent accidents +- **Latency vs Throughput Trade-offs**: Strict latency requirements may necessitate dropping frames to maintain real-time guarantees, but parallel operations like cross-camera batching can optimize both +- **End-to-End Optimization**: Minimize total pipeline latency from camera data acquisition through analytics output using multiple techniques: + - Avoid unnecessary streaming/restreaming stages that add buffering delays + - Implement cross-camera batching to process multiple camera feeds simultaneously for improved GPU utilization + - Use direct memory access (DMA) and zero-copy operations between pipeline stages + - Optimize network configurations with dedicated VLANs, jumbo frames, and quality of service (QoS) settings + - Minimize intermediate data serialization and format conversions + - Configure hardware-specific optimizations like GPU memory pooling and CPU affinity + - Implement frame skipping strategies under high load to maintain real-time guarantees +- **IP Camera Protocol Selection**: Both RTSP and MJPEG streaming protocols must be supported (robust MJPEG support was lacking in DLS-PS). MJPEG can provide significant latency improvements compared to RTSP (typical: MJPEG ~50-100ms vs RTSP ~500-2000ms+, with some configurations experiencing even higher delays) at the cost of 3-5x higher bandwidth usage, making MJPEG preferable for edge deployments with local network connectivity ### Server Architecture @@ -465,6 +518,10 @@ A pipeline stage represents a single operation such as a detection or classifica - **Runtime Management**: Eventually support dynamic loading and unloading of analytics stages without service restart - **Stage Management Service**: Future consideration for a dedicated stage management service, particularly when integrated with a model server for centralized analytics lifecycle management +### Security Considerations + +Security requirements including authentication, authorization, data encryption, and access control are not covered in this document but must be considered in the implementation. Security architecture, threat models, and hardening procedures will be documented in a separate security and hardening guide. + --- ## Conclusion From 1ec7a0f7f9922a2826aa8757ff676388d669d858 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Tue, 21 Oct 2025 13:47:25 -0700 Subject: [PATCH 08/20] docs: enhance Vision Pipeline API design with core principles and resilience - Create core operating principles section - Strengthen separation of concerns language throughout Camera and Pipeline APIs - Add JSON configuration examples for a couple of camera types - Fix sequence diagram terminology and ensure consistency Incorporate review feedback from Vibhu and Sarat, emphasizing the importance of resilience in production environments and modular system design for operational manageability. 
--- docs/design/vision-pipeline-overview.md | 132 ++++++++++++++++++------ 1 file changed, 100 insertions(+), 32 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index cb5518d88..937718f1d 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -1,7 +1,7 @@ # Design Document: Vision Pipeline API for Domain Experts - **Author(s)**: Rob Watts -- **Date**: 2025-10-20 +- **Date**: 2025-10-21 - **Status**: `Proposed` - **Related ADRs**: TBD @@ -9,16 +9,28 @@ ## Overview -This document defines a simple API for connecting cameras, configuring vision analytics pipelines, and accessing object detection metadata. The API enables domain experts to deploy computer vision capabilities without requiring deep technical knowledge of AI models, pipeline configurations, or video processing implementations. +This document defines a simple REST API for connecting cameras, configuring vision analytics pipelines, and managing pipeline metadata publishing. The API enables domain experts to deploy computer vision capabilities without requiring deep technical knowledge of AI models, pipeline configurations, or video processing implementations. + +A **domain expert** in this context is a consumer of the video analytics pipeline who has expertise in a given field that is not computer vision. They understand their domain-specific requirements and goals but prefer to focus on their area of expertise rather than the technical complexities of computer vision implementation. The vision pipeline API abstracts away technical complexity while providing reliable object detection metadata that feeds into downstream systems like Intel SceneScape for multi-camera tracking and scene analytics. +## Core Operating Principles + +The vision pipeline API is built on three fundamental principles: + +**1. Production Robustness**: The pipeline must maintain continuous operation in dynamic production environments. Common scenarios such as network jitter, RTSP stream timeouts, camera power cycling, pipeline stage model updates, hardware acceleration target changes (switching between CPU, GPU, NPU), and MQTT broker reconnections should have minimal impact on running pipelines, including minimizing restarts or loss of metadata output when changes or errors occur in any aspect of the system. + +**2. Domain Expert Accessibility**: The pipeline must be easily configurable by domain experts without requiring deep technical knowledge of computer vision implementations. All operations should be intuitive, well-documented, and abstracted from underlying technical complexity while maintaining full functionality and flexibility. + +**3. Modular Manageability**: System components (cameras, pipelines, and pipeline stages) must be defined once and connected together in modular ways. Cameras are managed independently from pipelines, pipelines are managed independently from cameras, and pipeline stages are reusable across different pipeline configurations. This "define once, connect many" approach dramatically reduces configuration complexity, eliminates duplication, and enables rapid deployment changes without system-wide reconfiguration. 
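As a rough sketch of the "define once, connect many" principle, a single camera definition might be referenced by two independent pipeline definitions without being redefined. All identifiers, stage names, and fields below are hypothetical and are not part of a committed API.

```json
{
  "cameras": [
    {
      "camera_id": "cam_north",
      "source": "rtsp://192.168.1.100:554/stream1"
    }
  ],
  "pipelines": [
    {
      "pipeline_id": "vehicle_lpr",
      "stages": ["vehicle-detection", "license-plate-detection", "ocr"],
      "cameras": ["cam_north"]
    },
    {
      "pipeline_id": "pedestrian_reid",
      "stages": ["person-detection", "reid-embedding"],
      "cameras": ["cam_north"]
    }
  ]
}
```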
+ ## Goals - **Simple Camera Management**: Easy API to connect and manage one or many camera inputs dynamically -- **Composable Analytics Pipelines**: Modular pipeline stages that can be chained together (e.g., vehicle detection → license plate detection → OCR) where each stage can be pre-configured but combined flexibly +- **Composable Analytics Pipelines**: Modular pipeline stages that can be chained together (e.g., vehicle detection → license plate detection → OCR) where each pre-configured stage can be flexibly combined and downstream stages operate on the output of the previous stage only (and do not operate at all when stage output is null) - **Source Frame Access**: On-demand access to original camera frames regardless of input type or source -- **Performance Optimization**: Easy configuration of hardware acceleration targets (CPU, iGPU, GPU, NPU) for optimal utilization +- **Performance Optimization**: Easy configuration of hardware acceleration targets (CPU, iGPU, GPU, NPU) for optimal utilization with automatic but configurable hardware acceleration for all operations where possible - **Abstracted Complexity**: Hide AI model management, pipeline optimization details, and video processing complexity from domain experts - **API-First Design**: Enable development of reference UIs for managing pipelines and sensor sources, supporting integration with SceneScape UI, VIPPET, or customer-implemented interfaces @@ -32,7 +44,7 @@ The vision pipeline API abstracts away technical complexity while providing reli ### Primary Persona: **Traffic Operations Expert** -- **Background**: Transportation engineer, systems integrator, or traffic management specialist who wants to leverage computer vision to improve traffic flow, safety, and urban mobility +- **Background**: Independent Software Vendor (ISV) engineer working with a city to build smart intersections to improve traffic flow, safety, and urban mobility - **Goal**: Deploy smart intersection systems that provide actionable traffic insights and automated responses without requiring deep computer vision expertise - **Technical Level**: Understands traffic engineering, urban planning, and sensor networks but has limited computer vision knowledge; wants to focus on traffic optimization, not algorithm configuration - **Pain Points**: @@ -48,7 +60,7 @@ A traffic operations expert wants to deploy vision analytics at a busy intersect **API Requirements:** -1. **Camera Management**: Connect 4-8 cameras dynamically via RTSP streams, USB connections, or video files - add/remove cameras without system restart +1. **Camera Management**: Dynamically connect 4-8 cameras using various input methods (RTSP streams, MJPEG streams, WebRTC streams, USB connections, or offline video files) with fast, API-driven camera addition and removal that handles backend operations transparently 2. **Pipeline Composition**: Compose analytics pipelines by chaining stages together: @@ -141,20 +153,65 @@ The interface design anticipates the growing prevalence of multimodal sensing in ### Camera Management API +**Camera-Pipeline Separation**: Cameras are managed independently from pipeline configuration, dramatically improving system manageability through modular design. Each camera is defined once with its connection details and properties, then can be dynamically connected to any compatible pipeline without reconfiguration. A single camera can feed into multiple pipelines for different analytics, while a single pipeline can process video from multiple cameras simultaneously. 
This modular approach eliminates configuration duplication, reduces operational complexity, and enables rapid deployment changes without system-wide reconfiguration. + **Dynamic camera connection and configuration:** -- **Add Camera**: Connect new cameras via RTSP, USB, or file input without system restart +- **Add Camera**: Connect new cameras via RTSP, MJPEG, WebRTC, USB, or file input - **Remove Camera**: Disconnect cameras and clean up resources gracefully -- **Camera Status**: Monitor connection health, frame rate, and video quality - **Camera Configuration**: Set resolution, frame rate, and encoding parameters +- **Camera Properties**: Configure camera intrinsics and distortion parameters, with support for dynamically updating these values in near real-time to support zoom cameras +- **Default Configuration**: Apply sensible defaults when configuration parameters are not explicitly provided, minimizing setup complexity for common camera types and use cases +- **JSON Configuration**: All camera configuration handled through JSON-only payloads for consistent API interaction - **Multi-Source Support**: Handle mixed camera types (IP cameras, USB webcams, video files) in single deployment -- **Robust Error Handling**: Comprehensive error handling for network issues, authentication failures, and protocol incompatibilities with detailed logging +- **Robust Error Handling**: Comprehensive error handling for network issues, frame corruption, authentication failures, and protocol incompatibilities with detailed logging, while maintaining pipeline operation when possible - **Connection Resilience**: Automatic retry mechanisms with configurable backoff strategies for network interruptions and camera disconnections - **Persistent Reconnection**: Optional continuous reconnection attempts that persist indefinitely until cameras return online, maintaining system resilience during extended outages -- **Connection Monitoring**: Real-time monitoring endpoints for camera connection status, error rates, and reconnection attempts to enable proactive troubleshooting + +The following examples demonstrate adding cameras independently of pipeline configuration. Each camera inherits sensible system defaults (such as auto-detected resolution, frame rate, and default intrinsics) while allowing selective override of specific parameters when needed. Cameras can be added without concern for what analytics pipelines will eventually process their video streams. + +**Example Configuration (RTSP Camera):** +```json +{ + "camera_id": "cam_north", + "source": "rtsp://192.168.1.100:554/stream1" +} +``` + +**Example Configuration (USB Camera):** +```json +{ + "camera_id": "cam_usb", + "source": "/dev/video0" +} +``` + +**Example Configuration (MJPEG Camera):** +```json +{ + "camera_id": "cam_mjpeg", + "source": "http://192.168.1.102:8080/video" +} +``` + +**Example Configuration (RTSP with Authentication and Custom Intrinsics):** +```json +{ + "camera_id": "cam_south", + "source": "rtsp://admin:camera_pass@192.168.1.101:554/stream1", + "intrinsics": [ + [1000.0, 0.0, 960.0], + [0.0, 1000.0, 540.0], + [0.0, 0.0, 1.0] + ], + "distortion": [-0.1, 0.05, 0.0, 0.0, -0.01] +} +``` ### Pipeline Configuration API +**Pipeline-Camera Independence**: Pipelines are defined and managed independently from camera sources, significantly improving system manageability through reusable analytics configurations. 
Each pipeline is defined once with its analytics stages and processing requirements, then can be applied to any compatible camera or set of cameras without modification. This modular approach eliminates the need to recreate identical analytics configurations for each camera, reduces maintenance overhead, and enables consistent analytics behavior across diverse camera deployments. Pipeline definitions become reusable assets that can be instantly deployed across new camera installations. + **Pipeline Stage Types:** The following stage types represent common analytics capabilities that can be configured and chained together. These are examples of the types of stages available - the system is designed to support additional stage types and custom analytics as needed. @@ -165,34 +222,27 @@ The following stage types represent common analytics capabilities that can be co **Pipeline Stage Requirements:** +- **Multi-Camera Processing**: Pipelines can simultaneously process video from multiple cameras, applying identical analytics configurations across all camera sources while maintaining per-camera metadata identification - **Pipeline Composition**: Chain compatible stages together where outputs of one stage match inputs of the next (e.g., vehicle detection → vehicle classification, license plate detection → OCR) - **Compatibility Validation**: System prevents invalid stage chaining when output formats are incompatible (e.g., classification stage cannot feed into detection stage) -- **Parallel Processing**: Support both sequential stage chaining and parallel stage execution for independent analytics on the same input +- **Parallel and Sequential Processing**: Support both sequential stage chaining and parallel stage execution for independent analytics on the same input - **Pre-configured Stages**: Each stage comes with optimized default settings but allows customization - **Per-Stage Hardware Optimization**: Target each individual stage to specific hardware (CPU, iGPU, GPU, NPU) for optimal performance - **Pipeline Templates**: Save and reuse common stage combinations across deployments -- **Configuration Schema Availability**: JSON schemas for pipeline and stage configurations provided via API endpoints for validation and tooling integration +- **Configuration Schema Availability**: JSON schema for pipeline and stage configurations provided via API endpoints for validation and tooling integration, ideally with a single extensible schema for all possible pipeline configurations +- **Stage Input/Output Behavior**: A given stage operates on the output of the previous stage (or the original frame for the first stage), and may operate on multiple outputs +- **Unscaled Image Data Output**: For stages that output image-like data (rather than text data), the output must refer to the unscaled portion of the input associated with the detection, such as the bounding box or a masked output of oriented bounding box or instance segment +- **Metadata Collation**: Whenever a stage runs, the metadata is collated into a single object array per chain, with a property key defined by each stage that has run (e.g. 
when `vehicle+lpd+lpr` finds a vehicle but no plate, the metadata will have an empty `"lpd: []"` array to indicate the stage ran but found nothing, and no `lpr` value exists because it didn't run) +- **Guaranteed Output**: Every frame input must have a resultant metadata output, even if nothing is detected (not detecting something is also an important result) +- **Source Frame Coordinates**: All collated metadata is reported in source frame coordinates for staged operations, e.g. vehicle bounding box and the license plate bounding box are both reported in original frame pixel units **Pipeline Stage Architecture:** - **Self-Contained Processing**: Each stage includes its own pre-processing (data preparation, format conversion) and post-processing (result formatting, filtering, validation) - **Technology Agnostic**: Stages can run any type of analytics including computer vision (CV), deep learning (DL), traditional image processing, or related technologies -- **Modular Interface**: Standardized input/output interfaces allow stages to be combined regardless of underlying technology +- **Modular Interface**: Standardized input/output interfaces allow stages to be combined regardless of underlying technology, dramatically improving system manageability by enabling stage reuse across different pipelines - **Flexible Optimization**: Each stage can be optimized for different performance characteristics and hardware targets, including inter-stage optimizations like buffer sharing on the same device - -### Metadata Output - -**MQTT-focused metadata publishing for SceneScape integration:** - -- **MQTT Publishing**: All detection metadata published to MQTT brokers in JSON format -- **Batch Processing**: Minimized chatter with one message per batch to reduce network overhead and improve performance -- **Individual Frame Timestamps**: Each frame maintains its individual timestamp within batched messages for accurate temporal correlation -- **Camera Source Identification**: Each frame preserves its camera source ID within batch metadata -- **Cross-Camera Batching**: Frames are captured and batched across cameras within small time windows for efficiency -- **Original Timing Preservation**: Each frame's metadata preserves its original capture timestamp and camera identifier -- **Metadata Schema Availability**: JSON schemas for detection metadata provided via dedicated API endpoints for programmatic validation and integration -- **Clean Configuration**: Schema artifacts must not be included in configuration JSON to maintain separation of concerns -- **Topic Generation**: MQTT topics procedurally generated based on camera IDs and pipeline configuration with optional namespace configuration +- **Define Once, Connect Many**: Pipeline stages are defined once with their analytics capabilities and requirements, then can be dynamically connected into different pipeline configurations without modification, reducing configuration complexity and enabling rapid analytics deployment ### Frame Access API @@ -219,6 +269,20 @@ The following stage types represent common analytics capabilities that can be co - **System Metrics Export**: Prometheus-compatible metrics export for integration with existing monitoring infrastructure - **Alert Integration**: Configurable thresholds and alert generation for proactive issue detection and notification +## Metadata Output + +**MQTT-focused metadata publishing for SceneScape integration:** + +- **MQTT Publishing**: All detection metadata published to MQTT brokers in JSON format +- 
**Batch Processing**: Minimized chatter with one message per batch to reduce network overhead and improve performance +- **Individual Frame Timestamps**: Each frame maintains its individual timestamp within batched messages for accurate temporal correlation +- **Camera Source Identification**: Each frame preserves its camera source ID within batch metadata +- **Cross-Camera Batching**: Frames are captured and batched across cameras within small time windows for efficiency +- **Original Timing Preservation**: Each frame's metadata preserves its original capture timestamp and camera identifier +- **Metadata Schema Availability**: JSON schemas for detection metadata provided via dedicated API endpoints for programmatic validation and integration +- **Clean Configuration**: Schema artifacts must not be included in configuration JSON to maintain separation of concerns +- **Topic Generation**: MQTT topics procedurally generated based on camera IDs and pipeline configuration with optional namespace configuration + ## API Workflows This section demonstrates common workflows using sequence diagrams to show the API interactions for typical deployment scenarios. @@ -236,7 +300,7 @@ sequenceDiagram participant MQTT as MQTT Broker User->>API: POST /cameras - Note over User,API: Configure camera (RTSP URL, resolution, etc.) + Note over User,API: Configure camera (RTSP URL, camera ID, etc.) API->>Server: Create camera instance Server->>Camera: Establish connection Camera-->>Server: Video stream @@ -412,7 +476,7 @@ sequenceDiagram Note over User: Existing pipeline processing Camera 1
with Vehicle Detection + Classification User->>API: POST /pipelines/{pipeline_id}/cameras - Note over User,API: Add camera to existing pipeline:<br/>- Camera ID: "cam_south"<br/>- RTSP URL, resolution<br/>- Inherits pipeline analytics + Note over User,API: Add camera to existing pipeline:<br/>- Camera ID: "cam_south"<br/>- RTSP URL<br/>
- Inherits pipeline analytics API->>Server: Create camera and add to pipeline Server->>Server: Establish camera connection Server->>Server: Configure multi-camera batching @@ -449,7 +513,7 @@ sequenceDiagram **Note**: The JSON response format is designed to be compatible with web-based graph visualization tools, enabling interactive pipeline diagrams where cameras appear as input nodes, stages as processing nodes, and data flows as connecting edges. -## Implementation Considerations +## Implementation Requirements ### Coordinate System Management @@ -526,6 +590,10 @@ Security requirements including authentication, authorization, data encryption, ## Conclusion -This vision pipeline interface definition provides a clean separation between sensor inputs, configuration inputs, and standardized outputs. By focusing on the interface rather than implementation details, it enables technology-agnostic pipeline development while supporting debugging, validation, and gradual enhancement of existing robust pipeline technologies. +This vision pipeline interface definition is built on three core operating principles that drive both technical excellence and business value: **Production Robustness** ensures reliable operation in dynamic environments, **Domain Expert Accessibility** enables non-specialists to deploy sophisticated computer vision capabilities, and **Modular Manageability** provides unprecedented flexibility through reusable, composable components. + +This principled approach delivers significant customer benefits that accelerate adoption and reduce time to market. Domain experts can rapidly deploy computer vision solutions without deep technical expertise, while the modular architecture eliminates configuration duplication and enables instant reuse of analytics across diverse deployments. The clean separation between cameras, pipelines, and stages dramatically reduces integration complexity and operational overhead. + +The modular composability of pipeline components also enables automated optimization of hardware platforms. Since cameras, analytics stages, and acceleration targets are independently configurable, optimization systems can dynamically reassign workloads across CPU, GPU, and NPU resources based on real-time performance metrics and system load, maximizing throughput while maintaining quality of service guarantees. -The interface is motivated by SceneScape's architectural needs but designed as a reusable specification for any computer vision application requiring clear, maintainable pipeline boundaries built on proven technologies. +By focusing on interface definitions rather than implementation details, this specification enables technology-agnostic pipeline development while supporting debugging, validation, and gradual enhancement of existing robust pipeline technologies. The interface is motivated by SceneScape's architectural needs but designed as a reusable specification for any computer vision application requiring clear, maintainable pipeline boundaries built on proven technologies. From 9eae864d94fcf12057f2402ec145661926b89eab Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Tue, 21 Oct 2025 13:57:58 -0700 Subject: [PATCH 09/20] docs: clarify Stage Input/Output Behavior in pipeline requirements Specify that a stage operates on an array of outputs from a single previous stage, not outputs from multiple different stages, removing ambiguity about pipeline architecture. 
--- docs/design/vision-pipeline-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index 937718f1d..0541998a3 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -230,7 +230,7 @@ The following stage types represent common analytics capabilities that can be co - **Per-Stage Hardware Optimization**: Target each individual stage to specific hardware (CPU, iGPU, GPU, NPU) for optimal performance - **Pipeline Templates**: Save and reuse common stage combinations across deployments - **Configuration Schema Availability**: JSON schema for pipeline and stage configurations provided via API endpoints for validation and tooling integration, ideally with a single extensible schema for all possible pipeline configurations -- **Stage Input/Output Behavior**: A given stage operates on the output of the previous stage (or the original frame for the first stage), and may operate on multiple outputs +- **Stage Input/Output Behavior**: A given stage operates on the output of the previous stage (or the original frame for the first stage), and may operate on an array of outputs from that single previous stage - **Unscaled Image Data Output**: For stages that output image-like data (rather than text data), the output must refer to the unscaled portion of the input associated with the detection, such as the bounding box or a masked output of oriented bounding box or instance segment - **Metadata Collation**: Whenever a stage runs, the metadata is collated into a single object array per chain, with a property key defined by each stage that has run (e.g. when `vehicle+lpd+lpr` finds a vehicle but no plate, the metadata will have an empty `"lpd: []"` array to indicate the stage ran but found nothing, and no `lpr` value exists because it didn't run) - **Guaranteed Output**: Every frame input must have a resultant metadata output, even if nothing is detected (not detecting something is also an important result) From d189958b3152bbfabbf22faad120087ddaef3f03 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Tue, 21 Oct 2025 14:05:20 -0700 Subject: [PATCH 10/20] docs: add MQTT broker configuration requirement to pipeline API Specify that pipelines must support configuring MQTT broker connection details including host, port, credentials, TLS settings, and optional topic namespace prefix via the REST API. 
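A minimal sketch of such a broker configuration is shown below; the property names are hypothetical and do not reflect a finalized schema.

```json
{
  "mqtt": {
    "host": "broker.example.local",
    "port": 8883,
    "username": "pipeline_publisher",
    "password": "change_me",
    "tls": {
      "enabled": true,
      "ca_certificate": "/certs/ca.pem"
    },
    "topic_namespace": "intersection_42"
  }
}
```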
--- docs/design/vision-pipeline-overview.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index 0541998a3..ba27dbbe4 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -222,6 +222,7 @@ The following stage types represent common analytics capabilities that can be co **Pipeline Stage Requirements:** +- **MQTT Broker Configuration**: Each pipeline must specify the MQTT broker connection details where metadata will be published, including broker host, port, authentication credentials, TLS settings, and optional topic namespace prefix - **Multi-Camera Processing**: Pipelines can simultaneously process video from multiple cameras, applying identical analytics configurations across all camera sources while maintaining per-camera metadata identification - **Pipeline Composition**: Chain compatible stages together where outputs of one stage match inputs of the next (e.g., vehicle detection → vehicle classification, license plate detection → OCR) - **Compatibility Validation**: System prevents invalid stage chaining when output formats are incompatible (e.g., classification stage cannot feed into detection stage) From f3079fab1b0fc1c01d4b56c97d26cd61403a469a Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Tue, 21 Oct 2025 14:10:05 -0700 Subject: [PATCH 11/20] docs: add line break to prevent Mermaid note text from running outside of the note container. --- docs/design/vision-pipeline-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index ba27dbbe4..e02507eb4 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -301,7 +301,7 @@ sequenceDiagram participant MQTT as MQTT Broker User->>API: POST /cameras - Note over User,API: Configure camera (RTSP URL, camera ID, etc.) + Note over User,API: Configure camera
(RTSP URL, camera ID, etc.) API->>Server: Create camera instance Server->>Camera: Establish connection Camera-->>Server: Video stream From eeb902ff0257128894e5c8574dcd1bba872cee13 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Wed, 22 Oct 2025 09:10:20 -0700 Subject: [PATCH 12/20] Fix mermaid diagrams to use curly braces for CI validation --- docs/design/vision-pipeline-overview.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index e02507eb4..ac1e58851 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -93,7 +93,7 @@ The vision pipeline interface enables this by providing: The vision pipeline interface defines a clear contract between data inputs, processing components, and outputs. This interface can be implemented by any computer vision technology stack. -```mermaid +```{mermaid} flowchart LR subgraph Inputs["Inputs"] subgraph SensorInputs["Sensor Inputs"] @@ -292,7 +292,7 @@ This section demonstrates common workflows using sequence diagrams to show the A **Purpose**: Verify camera connectivity and enable downstream calibration without analytics processing. -```mermaid +```{mermaid} sequenceDiagram participant User participant API as Vision Pipeline API @@ -315,7 +315,7 @@ sequenceDiagram **Purpose**: Add analytics processing to connected cameras and verify output in SceneScape. -```mermaid +```{mermaid} sequenceDiagram participant User participant API as Vision Pipeline API @@ -343,7 +343,7 @@ sequenceDiagram **Purpose**: Change the analytics model for an existing pipeline stage. -```mermaid +```{mermaid} sequenceDiagram participant User participant API as Vision Pipeline API @@ -366,7 +366,7 @@ sequenceDiagram **Purpose**: Update camera properties like camera ID with graceful system handling. -```mermaid +```{mermaid} sequenceDiagram participant User participant API as Vision Pipeline API @@ -388,7 +388,7 @@ sequenceDiagram **Purpose**: Remove camera and clean up all associated resources. -```mermaid +```{mermaid} sequenceDiagram participant User participant API as Vision Pipeline API @@ -411,7 +411,7 @@ sequenceDiagram **Purpose**: Chain multiple analytics stages for complex processing workflows. -```mermaid +```{mermaid} sequenceDiagram participant User participant API as Vision Pipeline API @@ -438,7 +438,7 @@ sequenceDiagram **Purpose**: Add concurrent analytics processing for independent object types on the same camera input. -```mermaid +```{mermaid} sequenceDiagram participant User participant API as Vision Pipeline API @@ -466,7 +466,7 @@ sequenceDiagram **Purpose**: Scale pipeline to process multiple cameras with batched MQTT output while preserving individual camera metadata. -```mermaid +```{mermaid} sequenceDiagram participant User participant API as Vision Pipeline API @@ -497,7 +497,7 @@ sequenceDiagram **Purpose**: Request and view all pipelines with their associated cameras and sensors for system-wide inspection. -```mermaid +```{mermaid} sequenceDiagram participant User participant API as Vision Pipeline API From cf76c655a4e33973631f74b94328d372ff09bde3 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Wed, 22 Oct 2025 09:26:01 -0700 Subject: [PATCH 13/20] Revert curly braces around Mermaid diagram language declaration since it breaks GitHub rendering of the diagrams. Will look for cross-support with Sphynx using some other method. 
--- docs/design/vision-pipeline-overview.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index ac1e58851..e02507eb4 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -93,7 +93,7 @@ The vision pipeline interface enables this by providing: The vision pipeline interface defines a clear contract between data inputs, processing components, and outputs. This interface can be implemented by any computer vision technology stack. -```{mermaid} +```mermaid flowchart LR subgraph Inputs["Inputs"] subgraph SensorInputs["Sensor Inputs"] @@ -292,7 +292,7 @@ This section demonstrates common workflows using sequence diagrams to show the A **Purpose**: Verify camera connectivity and enable downstream calibration without analytics processing. -```{mermaid} +```mermaid sequenceDiagram participant User participant API as Vision Pipeline API @@ -315,7 +315,7 @@ sequenceDiagram **Purpose**: Add analytics processing to connected cameras and verify output in SceneScape. -```{mermaid} +```mermaid sequenceDiagram participant User participant API as Vision Pipeline API @@ -343,7 +343,7 @@ sequenceDiagram **Purpose**: Change the analytics model for an existing pipeline stage. -```{mermaid} +```mermaid sequenceDiagram participant User participant API as Vision Pipeline API @@ -366,7 +366,7 @@ sequenceDiagram **Purpose**: Update camera properties like camera ID with graceful system handling. -```{mermaid} +```mermaid sequenceDiagram participant User participant API as Vision Pipeline API @@ -388,7 +388,7 @@ sequenceDiagram **Purpose**: Remove camera and clean up all associated resources. -```{mermaid} +```mermaid sequenceDiagram participant User participant API as Vision Pipeline API @@ -411,7 +411,7 @@ sequenceDiagram **Purpose**: Chain multiple analytics stages for complex processing workflows. -```{mermaid} +```mermaid sequenceDiagram participant User participant API as Vision Pipeline API @@ -438,7 +438,7 @@ sequenceDiagram **Purpose**: Add concurrent analytics processing for independent object types on the same camera input. -```{mermaid} +```mermaid sequenceDiagram participant User participant API as Vision Pipeline API @@ -466,7 +466,7 @@ sequenceDiagram **Purpose**: Scale pipeline to process multiple cameras with batched MQTT output while preserving individual camera metadata. -```{mermaid} +```mermaid sequenceDiagram participant User participant API as Vision Pipeline API @@ -497,7 +497,7 @@ sequenceDiagram **Purpose**: Request and view all pipelines with their associated cameras and sensors for system-wide inspection. -```{mermaid} +```mermaid sequenceDiagram participant User participant API as Vision Pipeline API From b2567d4f5390e95599a161f5b5995c3e56e87f26 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Wed, 22 Oct 2025 12:13:57 -0700 Subject: [PATCH 14/20] Add Hardware Enumeration requirement to the System Monitoring API per feedback from Tomasz D. 
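A hypothetical response from such a hardware enumeration endpoint is sketched below; the device identifiers follow the examples given in the requirement, while the remaining fields are illustrative assumptions.

```json
{
  "devices": [
    { "id": "CPU", "type": "cpu", "available": true },
    { "id": "GPU.0", "type": "integrated_gpu", "memory_mb": 8192, "available": true },
    { "id": "GPU.1", "type": "discrete_gpu", "memory_mb": 16384, "available": true },
    { "id": "NPU.0", "type": "npu", "available": false }
  ]
}
```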
--- docs/design/vision-pipeline-overview.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index e02507eb4..60d698c25 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -263,6 +263,7 @@ The following stage types represent common analytics capabilities that can be co **Observability endpoints for system health and performance:** - **Health Check Endpoints**: System-wide health status including API availability, pipeline server status, and MQTT broker connectivity +- **Hardware Enumeration**: Discovery endpoint that returns available hardware accelerators on the platform, providing device identifiers that can be used in pipeline stage configuration (e.g., "CPU", "GPU.0", "GPU.1", "NPU.0", "iGPU") along with device capabilities, memory specifications, and current availability status - **Camera Monitoring**: Per-camera connection status, frame rate statistics, error counts, and reconnection attempt history - **Pipeline Performance**: Per-pipeline throughput metrics, processing latency measurements, and resource utilization statistics - **Resource Monitoring**: Hardware utilization metrics for CPU, GPU, NPU, and memory across all pipeline stages From 36a768ad56685194237b98ee0626c753e97d6b8a Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Wed, 22 Oct 2025 14:33:15 -0700 Subject: [PATCH 15/20] Improve clarity of Implementation Requirements and Considerations section MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Move misplaced items between Requirements and Considerations - Strengthen requirement language (should → must) for mandatory items - Reorganize Time Coordination, Performance Management, and Latency sections - Clarify system resource limits and frame drop strategy requirements - Separate actionable requirements from implementation guidance --- docs/design/vision-pipeline-overview.md | 62 ++++++++++++++++++------- 1 file changed, 46 insertions(+), 16 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index 60d698c25..72100afe7 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -263,7 +263,7 @@ The following stage types represent common analytics capabilities that can be co **Observability endpoints for system health and performance:** - **Health Check Endpoints**: System-wide health status including API availability, pipeline server status, and MQTT broker connectivity -- **Hardware Enumeration**: Discovery endpoint that returns available hardware accelerators on the platform, providing device identifiers that can be used in pipeline stage configuration (e.g., "CPU", "GPU.0", "GPU.1", "NPU.0", "iGPU") along with device capabilities, memory specifications, and current availability status +- **Hardware Enumeration**: Discovery endpoint that returns available hardware accelerators on the platform, providing device identifiers that can be used in pipeline stage configuration (e.g., "CPU", "GPU.0", "GPU.1", "NPU.0") along with device capabilities, memory specifications, and current availability status - **Camera Monitoring**: Per-camera connection status, frame rate statistics, error counts, and reconnection attempt history - **Pipeline Performance**: Per-pipeline throughput metrics, processing latency measurements, and resource utilization statistics - **Resource Monitoring**: Hardware utilization 
metrics for CPU, GPU, NPU, and memory across all pipeline stages @@ -519,39 +519,60 @@ sequenceDiagram ### Coordinate System Management +**Requirements:** + - **Local Coordinates**: Pipeline outputs positions in camera/sensor coordinate space without knowledge of world coordinates or global scene context - **Camera Coordinates**: Coordinate output depends on detection model and sensor modality: - **Monocular 3D Detectors**: Require intrinsic calibration parameters to estimate depth and convert to 3D camera space - **LiDAR/Radar Sensors**: Provide native 3D point cloud data in sensor coordinate space - **2D-Only Models**: Most 2D detectors operate natively in image pixel coordinates (x, y within frame dimensions) and it is acceptable to publish detection results in these units -- **World Coordinate Transformation**: External responsibility using extrinsic calibration data (handled by downstream systems like SceneScape) -- **Multi-Sensor Fusion**: Requires external coordinate system reconciliation and cross-sensor tracking - accomplished outside of the pipeline scope - **Single-Sensor Scope**: Vision pipeline operates independently within individual sensor coordinate systems, maintaining clear boundaries +**Considerations:** + +- **World Coordinate Transformation**: External responsibility using extrinsic calibration data (handled by downstream systems like SceneScape) +- **Multi-Sensor Fusion**: Requires external coordinate system reconciliation and cross-sensor tracking (also typically handled by downstream systems) + ### Time Coordination -- **System Requirements**: Time synchronization must be better than the dynamic observability of the system; e.g., monitoring scenes with faster moving objects requires better time precision -- **Precision Timestamping**: Spatiotemporal fusion requires precision timestamping, ideally at the moment of sensor data acquisition (before encoding, transmission, and other operations) +**Requirements:** + +- **Timestamp at Data Acquisition**: System must timestamp sensor data as early as possible in the acquisition process, preferably at the moment of sensor data capture before any encoding, transmission, or other processing operations to ensure maximum precision for spatiotemporal fusion +- **Fallback Options**: Time synchronization may not always be possible at frame acquisition, and late timestamping may be the only viable option; in this case, a configurable latency offset may need to be applied (backdating the timestamp by some configurable amount on a per-camera and/or per camera batch basis) when the frame arrives at the pipeline +- **Precision Requirements**: For sensor fusion solutions, time synchronization must be better than the dynamic observability of the system (e.g., monitoring scenes with faster moving objects requires better time precision) + +**Considerations:** + - **Platform Responsibility**: Implementation of time synchronization is the responsibility of the hardware+OS platform and is outside the scope of the pipeline server (system timestamps are assumed to be synchronized) - Various technologies may be applied, including NTP, IEEE 1588 PTP, time sensitive networking (TSN), GPS PPS, and related capabilities -- **Fallback Options**: Time synchronization may not always be possible at frame acquisition, and late timestamping may be the only viable option; in this case, a configurable latency offset may need to be applied (backdating the timestamp by some configurable amount on a per-camera and/or per camera batch basis) when the frame 
arrives at the pipeline - **Distributed System Architecture**: In many deployments, the system operates in a distributed manner across edge clusters with various processing stages running on different compute nodes. This distributed architecture requires robust time synchronization across network boundaries and careful consideration of network latency when correlating timestamped data between processing stages. -### Performance Considerations +### Performance Management -- **Resource Management**: Interface should specify computational and memory requirements per pipeline stage for capacity planning +**Requirements:** + +- **Resource Management**: Interface must specify computational and memory requirements per pipeline stage for capacity planning - **Hardware Targeting**: Enable per-stage optimization across CPU, iGPU, GPU, and NPU resources for balanced performance -- **Throughput Scaling**: Additional concurrent sensor streams should be optimized using techniques such as cross-sensor/camera batching and other methods that minimize latency and maximize throughput as much as possible -- **System Headroom**: Enable configuration of available computational headroom reserved for other workloads to prevent pipeline overload +- **System Resource Limits**: Enable dynamic configuration of maximum resource utilization limits (CPU cores, GPU memory, system memory) that pipeline operations can consume, ensuring sufficient computational resources remain available for operating system services, container orchestration, monitoring agents, and other non-pipeline workloads - **Dynamic Load Balancing**: Support runtime adjustment of processing priorities based on system load and application criticality +**Considerations:** + +- **Implementation Strategies**: Various optimization approaches may be applied including cross-camera batching, GPU memory pooling, and hardware-specific acceleration techniques + ### Latency Requirements Latency is critical for real-time operation and must be configurable based on application needs (e.g., <15ms for traffic safety applications). -- **Real-Time Priority**: Low latency is essential for safety-critical applications where delayed responses can impact traffic flow and safety +**Requirements:** + +- **Real-Time Operation**: Low latency is essential for safety-critical applications where delayed responses can impact traffic flow and safety +- **Frame Drop Strategy**: System must implement frame dropping mechanisms to maintain real-time latency guarantees, always processing the latest available frame and discarding queued frames when processing cannot keep up with input rate +- **IP Camera Protocol Selection**: Both RTSP and MJPEG streaming protocols must be supported (robust MJPEG support was lacking in DLS-PS). 
MJPEG can provide significant latency improvements compared to RTSP (typical: MJPEG ~50-100ms vs RTSP ~500-2000ms+, with some configurations experiencing even higher delays) at the cost of 3-5x higher bandwidth usage, making MJPEG preferable for edge deployments with local network connectivity + +**Considerations:** + - **Critical Use Cases**: Ultra-low latency enables mission-critical applications such as CV2X signaling for jaywalking detection, adaptive traffic light controls using pedestrian monitoring, and collision avoidance systems where milliseconds can prevent accidents -- **Latency vs Throughput Trade-offs**: Strict latency requirements may necessitate dropping frames to maintain real-time guarantees, but parallel operations like cross-camera batching can optimize both - **End-to-End Optimization**: Minimize total pipeline latency from camera data acquisition through analytics output using multiple techniques: - Avoid unnecessary streaming/restreaming stages that add buffering delays - Implement cross-camera batching to process multiple camera feeds simultaneously for improved GPU utilization @@ -560,14 +581,18 @@ Latency is critical for real-time operation and must be configurable based on ap - Minimize intermediate data serialization and format conversions - Configure hardware-specific optimizations like GPU memory pooling and CPU affinity - Implement frame skipping strategies under high load to maintain real-time guarantees -- **IP Camera Protocol Selection**: Both RTSP and MJPEG streaming protocols must be supported (robust MJPEG support was lacking in DLS-PS). MJPEG can provide significant latency improvements compared to RTSP (typical: MJPEG ~50-100ms vs RTSP ~500-2000ms+, with some configurations experiencing even higher delays) at the cost of 3-5x higher bandwidth usage, making MJPEG preferable for edge deployments with local network connectivity ### Server Architecture +**Requirements:** + - **Single Server Instance**: One persistent server instance per compute node manages all vision pipelines, eliminating configuration complexity from multiple service instances - **Always Running**: Server instance maintains continuous availability, managing pipeline lifecycle internally without requiring external service management - **Pipeline Management**: Server handles creation, configuration, monitoring, and cleanup of individual pipelines through a unified API interface -- **Port Consolidation**: All pipeline operations accessible through single API endpoint, avoiding the configuration challenges of multiple services on different ports +- **Single Port Operation**: All pipeline operations accessible through a single pipeline server instance running on one port, avoiding the configuration challenges and operational complexity of multiple services running on different ports + +**Considerations:** + - **Resource Coordination**: Centralized server enables optimal resource allocation and conflict resolution across concurrent pipelines - **Simplified Deployment**: Single service deployment model reduces operational complexity compared to per-pipeline service instances @@ -575,10 +600,15 @@ Latency is critical for real-time operation and must be configurable based on ap A pipeline stage represents a single operation such as a detection or classification step that includes its pre- and post-processing operations. It can represent any number of types of analytics, including deep learning, computer vision, transformer, or other related operations. 
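As a purely hypothetical sketch, a discoverable stage definition of this kind might be described as follows; the stage identifier, field names, and device labels are assumptions for illustration, not a committed format.

```json
{
  "stage_id": "vehicle-detection",
  "type": "detection",
  "inputs": ["image"],
  "outputs": ["bounding_boxes"],
  "supported_devices": ["CPU", "GPU.0", "NPU.0"],
  "default_device": "GPU.0",
  "model": {
    "name": "vehicle-detector",
    "version": "1.2.0"
  }
}
```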
-- **Initial Configuration**: Pipeline stages can be initially managed through manual configuration files or system administration tools -- **Stage Discovery**: System should provide mechanisms to discover available analytics stages and their capabilities (input/output formats, hardware requirements) +**Requirements:** + +- **Stage Discovery**: System must provide mechanisms to discover available analytics stages and their capabilities (input/output formats, hardware requirements) - **Stage Validation**: Automated validation of stage compatibility when composing pipelines to prevent invalid configurations - **Stage Versioning**: Support for multiple versions of analytics stages to enable gradual upgrades and rollback capabilities + +**Considerations:** + +- **Initial Configuration**: Pipeline stages can be initially managed through manual configuration files or system administration tools - **Customer Extensibility**: Future capability for customers to register custom analytics stages through standardized interfaces - **Configuration Templates**: Pre-built stage combinations and templates for common use cases to simplify deployment - **Runtime Management**: Eventually support dynamic loading and unloading of analytics stages without service restart From 9f8634d32bd2090aa18e6f36627cecc67c6a93c0 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Wed, 22 Oct 2025 15:03:20 -0700 Subject: [PATCH 16/20] Reduce specificity on the metadata output -- change from "send detection results" to "send pipeline results" --- docs/design/vision-pipeline-overview.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index 72100afe7..3cbbc7e54 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -69,7 +69,7 @@ A traffic operations expert wants to deploy vision analytics at a busy intersect - General object detection → vehicle classification - Custom combinations based on specific needs -3. **Metadata Output**: Send detection results to MQTT broker for SceneScape processing: +3. **Metadata Output**: Send pipeline results to MQTT broker for SceneScape processing: - JSON format with validated schema structure - Batched messages to minimize network chatter @@ -515,7 +515,7 @@ sequenceDiagram **Note**: The JSON response format is designed to be compatible with web-based graph visualization tools, enabling interactive pipeline diagrams where cameras appear as input nodes, stages as processing nodes, and data flows as connecting edges. -## Implementation Requirements +## Implementation Requirements and Considerations ### Coordinate System Management From 7d4ae03650a881e8dafdf424cfd4abbf28c4509b Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Mon, 27 Oct 2025 15:15:02 -0700 Subject: [PATCH 17/20] Enhance Vision Pipeline API design document with comprehensive DAG composition per feedback from Tomasz D. 
- Add Advanced Pipeline Composition section with DAG architecture - Implement comprehensive DAG syntax with sequential, parallel, and branching patterns - Add visual Mermaid diagram showing metadata convergence architecture - Include practical JSON metadata examples with model tracking - Add hardware targeting syntax (@GPU, @NPU, @CPU) for per-stage optimization - Implement stage aliases (vehicle, vattrib, lpd, lpr, reid) with critical importance warnings - Add comprehensive model metadata tracking (name, version, hash) for reproducibility - Include complex DAG examples for traffic intersection and video analysis workflows - Add DAG validation, execution order, and performance optimization details - Update document date and clarify scope to include supporting functionality --- docs/design/vision-pipeline-overview.md | 243 +++++++++++++++++++++++- 1 file changed, 234 insertions(+), 9 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index 3cbbc7e54..c7a24ac51 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -1,7 +1,7 @@ # Design Document: Vision Pipeline API for Domain Experts - **Author(s)**: Rob Watts -- **Date**: 2025-10-21 +- **Date**: 2025-10-27 - **Status**: `Proposed` - **Related ADRs**: TBD @@ -9,7 +9,7 @@ ## Overview -This document defines a simple REST API for connecting cameras, configuring vision analytics pipelines, and managing pipeline metadata publishing. The API enables domain experts to deploy computer vision capabilities without requiring deep technical knowledge of AI models, pipeline configurations, or video processing implementations. +This document defines a simple REST API and supporting functionality for connecting cameras, configuring vision analytics pipelines, and managing pipeline metadata publishing. The API enables domain experts to deploy computer vision capabilities without requiring deep technical knowledge of AI models, pipeline configurations, or video processing implementations. A **domain expert** in this context is a consumer of the video analytics pipeline who has expertise in a given field that is not computer vision. They understand their domain-specific requirements and goals but prefer to focus on their area of expertise rather than the technical complexities of computer vision implementation. @@ -115,14 +115,14 @@ flowchart LR end subgraph Pipeline["Vision Pipeline"] - VIDEO["Video Processing
Decode → Detect → Single-Camera Track → Embed → Classify"] - POINTCLOUD["Point Cloud Processing
Segment → Detect → Single-Sensor Track → Embed"] + VIDEO["Video Processing
Decode → Detect
→ Track → Classify"] + POINTCLOUD["Point Cloud Processing
Segment → Detect
→ Track → Embed"] end subgraph Outputs["Pipeline Outputs"] - DETECTIONS["Object Detections & Tracks
(bounding boxes, classifications, temporal associations, IDs, embeddings)"] - RAWDATA["Source Data
(original frames, point clouds)"] - DECORATED["Decorated Data
(annotated images, segmented point clouds)"] + DETECTIONS["Object Detections
& Tracks
(boxes, classes, IDs)"] + RAWDATA["Source Data
(frames, clouds)"] + DECORATED["Decorated Data
(annotated frames,
segmented clouds)"] end %% Styling @@ -159,8 +159,13 @@ The interface design anticipates the growing prevalence of multimodal sensing in - **Add Camera**: Connect new cameras via RTSP, MJPEG, WebRTC, USB, or file input - **Remove Camera**: Disconnect cameras and clean up resources gracefully -- **Camera Configuration**: Set resolution, frame rate, and encoding parameters +- **Camera Configuration**: Automatically detect and use camera's native frame properties (resolution, frame rate, encoding) by default, with optional pipeline-level overrides for specific requirements - **Camera Properties**: Configure camera intrinsics and distortion parameters, with support for dynamically updating these values in near real-time to support zoom cameras +- **Distortion Handling**: No undistortion by default; automatically enable undistortion when distortion coefficients are provided, with optional flag to disable undistortion even when coefficients are present +- **Distortion Models**: Use Brown-Conrady distortion model by default with override option for fisheye undistortion models +- **Undistortion Alpha**: Configure alpha parameter for undistortion output cropping (crop to remove black areas or preserve full frame with black regions) +- **Undistortion Metadata**: When undistortion occurs, compute new camera matrix and zero out distortion coefficients in the output metadata +- **Performance Optimization**: All input frame processing operations must use optimized implementations including GPU acceleration for compute-intensive tasks, precomputed undistortion map caching for repeated coordinate transformations, and efficient pixel remapping to minimize processing latency - **Default Configuration**: Apply sensible defaults when configuration parameters are not explicitly provided, minimizing setup complexity for common camera types and use cases - **JSON Configuration**: All camera configuration handled through JSON-only payloads for consistent API interaction - **Multi-Source Support**: Handle mixed camera types (IP cameras, USB webcams, video files) in single deployment @@ -226,7 +231,7 @@ The following stage types represent common analytics capabilities that can be co - **Multi-Camera Processing**: Pipelines can simultaneously process video from multiple cameras, applying identical analytics configurations across all camera sources while maintaining per-camera metadata identification - **Pipeline Composition**: Chain compatible stages together where outputs of one stage match inputs of the next (e.g., vehicle detection → vehicle classification, license plate detection → OCR) - **Compatibility Validation**: System prevents invalid stage chaining when output formats are incompatible (e.g., classification stage cannot feed into detection stage) -- **Parallel and Sequential Processing**: Support both sequential stage chaining and parallel stage execution for independent analytics on the same input +- **Pipeline Composition**: Support complex Directed Acyclic Graph (DAG) structures including sequential chaining, parallel execution, and branching patterns (see Advanced Pipeline Composition below) - **Pre-configured Stages**: Each stage comes with optimized default settings but allows customization - **Per-Stage Hardware Optimization**: Target each individual stage to specific hardware (CPU, iGPU, GPU, NPU) for optimal performance - **Pipeline Templates**: Save and reuse common stage combinations across deployments @@ -234,6 +239,7 @@ The following stage types represent common analytics capabilities that 
can be co - **Stage Input/Output Behavior**: A given stage operates on the output of the previous stage (or the original frame for the first stage), and may operate on an array of outputs from that single previous stage - **Unscaled Image Data Output**: For stages that output image-like data (rather than text data), the output must refer to the unscaled portion of the input associated with the detection, such as the bounding box or a masked output of oriented bounding box or instance segment - **Metadata Collation**: Whenever a stage runs, the metadata is collated into a single object array per chain, with a property key defined by each stage that has run (e.g. when `vehicle+lpd+lpr` finds a vehicle but no plate, the metadata will have an empty `"lpd: []"` array to indicate the stage ran but found nothing, and no `lpr` value exists because it didn't run) +- **Model Metadata**: Each stage must include model information in the metadata output, including model name, version identifier, and content hash for reproducibility and compliance tracking. This enables debugging, model lifecycle management, and audit trails for regulatory requirements - **Guaranteed Output**: Every frame input must have a resultant metadata output, even if nothing is detected (not detecting something is also an important result) - **Source Frame Coordinates**: All collated metadata is reported in source frame coordinates for staged operations, e.g. vehicle bounding box and the license plate bounding box are both reported in original frame pixel units @@ -245,6 +251,225 @@ The following stage types represent common analytics capabilities that can be co - **Flexible Optimization**: Each stage can be optimized for different performance characteristics and hardware targets, including inter-stage optimizations like buffer sharing on the same device - **Define Once, Connect Many**: Pipeline stages are defined once with their analytics capabilities and requirements, then can be dynamically connected into different pipeline configurations without modification, reducing configuration complexity and enabling rapid analytics deployment +### Advanced Pipeline Composition + +The vision pipeline system supports complex Directed Acyclic Graph (DAG) structures for sophisticated analytics workflows. This enables domain experts to create powerful analytics chains without requiring deep understanding of the underlying computer vision implementations. + +**DAG Construction Principles:** + +Pipeline composition uses DAG structures where each stage represents a processing node, and data flows along directed edges between stages. This approach provides maximum flexibility while ensuring deterministic execution order and preventing circular dependencies. + +```mermaid +flowchart LR + INPUT["Preprocessed Frame
(intrinsics, distortion)"] + VEH["Vehicle Detection
(vehicle)"] + PERSON["Person Detection
(person)"] + LPD["License Plate Detection
(lpd)"] + LPR["License Plate OCR
(lpr)"] + VATTR["Vehicle Attributes
(vattrib)"] + REID["Person ReID
(reid)"] + PUBLISH["Metadata Publish Node
(MQTT Output)"] + + INPUT --> VEH + INPUT --> PERSON + VEH --> LPD + VEH --> VATTR + LPD --> LPR + PERSON --> REID + + %% All stages converge to single publish node + VATTR --> PUBLISH + LPR --> PUBLISH + REID --> PUBLISH + + classDef input fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000 + classDef detection fill:#fff8e1,stroke:#ff8f00,stroke-width:2px,color:#000000 + classDef analysis fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000000 + classDef output fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000 + + class INPUT input + class VEH,PERSON,LPD detection + class LPR,VATTR,REID analysis + class PUBLISH output +``` + +**Corresponding DAG Definition:** + +The above diagram can be expressed using the following DAG syntax for direct comparison: + +```text +vehicle+[vattrib,lpd+lpr],person+reid +``` + +This demonstrates how complex multi-branch analytics workflows can be concisely defined through intuitive syntax while maintaining clear visual correspondence with the diagram representation. + +**Corresponding Metadata Output:** + +The DAG execution produces structured JSON metadata that combines results from all executed stages. Here's an example showing how the parallel branches contribute to the final output: + +```json +{ + "pipeline_start": "2025-01-25T15:30:45.120Z", + "pipeline_complete": "2025-01-25T15:30:45.155Z", + "models": { + "vehicle": { + "name": "yolov8n-vehicle", + "version": "1.2.3", + "hash": "sha256:a1b2c3d4e5f6..." + }, + "vattrib": { + "name": "vehicle-attributes-resnet50", + "version": "2.1.0", + "hash": "sha256:f6e5d4c3b2a1..." + }, + "lpd": { + "name": "license-plate-detector", + "version": "1.0.5", + "hash": "sha256:9f8e7d6c5b4a..." + }, + "lpr": { + "name": "license-plate-ocr-crnn", + "version": "3.2.1", + "hash": "sha256:3a2b1c9d8e7f..." + }, + "person": { + "name": "yolov8s-person", + "version": "1.2.3", + "hash": "sha256:7f8e9d0c1b2a..." + }, + "reid": { + "name": "person-reid-osnet", + "version": "2.0.8", + "hash": "sha256:5d4c3b2a1f9e..." 
+ } + }, + "objects": [ + { + "timestamp": "2025-01-25T15:30:45.100Z", + "camera_id": "cam_north", + "category": "vehicle", + "confidence": 0.94, + "bounding_box": {"x": 120, "y": 80, "width": 200, "height": 120}, + "id": 1, + "vattrib": { + "subtype": "car", + "color": "blue" + }, + "lpd": [ + { + "category": "license_plate", + "confidence": 0.89, + "bounding_box": {"x": 180, "y": 160, "width": 80, "height": 20}, + "lpr": { + "text": "ABC123", + "confidence": 0.91 + } + } + ] + }, + { + "timestamp": "2025-01-25T15:30:45.100Z", + "camera_id": "cam_north", + "category": "person", + "confidence": 0.87, + "bounding_box": {"x": 350, "y": 100, "width": 60, "height": 180}, + "id": 2, + "reid": "eyJ2ZWN0b3IiOiJbMC4xMiwgMC44NywgLi4uXSJ9" + } + ] +} +``` + +**Key Metadata Features:** + +- **Stage Collation**: Each stage contributes its results as nested properties (e.g., `vattrib`, `lpd`, `lpr`, `reid`) +- **Model Information**: Top-level `models` object provides name, version, and hash for each stage that executed, enabling reproducibility and audit trails +- **Guaranteed Output**: Empty arrays appear for stages that ran but found nothing (e.g., `"lpd": []` when no license plate detected) +- **Source Coordinates**: All bounding boxes reported in original frame pixel coordinates +- **Nested Dependencies**: Downstream stages only execute when upstream stages produce results (LPR only runs when LPD finds a plate) +- **Per-Object Results**: Each detected object carries its own stage-specific metadata + +**Critical Importance of Stage Aliases:** + +Stage aliases (such as `vehicle`, `vattrib`, `lpd`, `lpr`, `reid`) are not merely convenient shorthand—they are **critical identifiers that directly determine the metadata structure**. Each alias becomes a property key in the output JSON, defining how downstream systems access and process the results. Changing a stage alias fundamentally changes the metadata schema and will break integration with systems expecting specific property names. Stage aliases must be carefully chosen and consistently maintained across deployments to ensure metadata compatibility and system interoperability. + +**Note**: The DAG syntax shown throughout this document is a suggested approach for pipeline composition. Alternative syntax designs and composition methods may be carefully considered based on implementation requirements, user feedback, and evolving best practices in pipeline orchestration. 
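**Illustrative Metadata Consumption (non-normative sketch):** Because stage aliases become the property keys of the collated metadata, downstream consumers address results purely by alias. The short Python sketch below shows that contract against the example message above; it assumes the example schema as-is, and the helper name `summarize` is illustrative rather than part of the API.

```python
import json


def summarize(message_json: str) -> list[str]:
    """Report per-object results from one collated metadata message."""
    lines = []
    for obj in json.loads(message_json).get("objects", []):
        line = f'{obj["camera_id"]} {obj["category"]} id={obj["id"]}'
        # An absent alias key means the stage never ran for this object;
        # an empty list (e.g. "lpd": []) means it ran and found nothing.
        for plate in obj.get("lpd", []):
            lpr = plate.get("lpr")  # missing when OCR did not run
            if lpr:
                line += f' plate={lpr["text"]} ({lpr["confidence"]:.2f})'
        if "reid" in obj:
            line += " reid=embedding-present"
        lines.append(line)
    return lines
```

Renaming an alias (for example `lpd` to `plate_detect`) would silently break a consumer like this, which is why aliases must be treated as part of the metadata schema.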
+ +**Pipeline Composition Syntax:** + +The system uses a concise syntax inspired by Percebro's DAG notation for defining complex pipeline structures: + +- **Sequential Chaining**: Use `+` to chain stages in sequence + + ```text + vehicle+vattrib // Vehicle detection → Vehicle attributes + person+reid // Person detection → ReID embedding + vehicle+lpd+lpr // Vehicle → License plate detection → OCR + ``` + +- **Parallel Execution**: Use `,` to run stages in parallel on the same input + + ```text + vehicle,person // Vehicle and person detection in parallel + vehicle+vattrib,person+reid // Two parallel chains + ``` + +- **Branching**: Use `[...]` to feed one stage output to multiple downstream stages + + ```text + vehicle+[vattrib,lpd] // Vehicle detection feeds both attributes and LPD + person+[reid,head+agr] // Person detection feeds ReID and head detection chain + ``` + +- **Hardware Targeting**: Use `@TARGET` to specify hardware for individual stages + + ```text + vehicle@GPU+vattrib@NPU // Vehicle detection on GPU, attributes on NPU + person,vehicle@GPU // Parallel stages on different hardware + ``` + +**Complex DAG Examples:** + +1. **Traffic Intersection Analytics**: + + ```text + vehicle@GPU+[vattrib@NPU,lpd+lpr],person@GPU+reid@NPU + ``` + + This creates: + - Vehicle detection (GPU) → Vehicle attributes (NPU) + License plate chain (default CPU) + - Person detection (GPU) → ReID embedding (NPU) + - All running in parallel from the same camera input + +2. **Comprehensive Video Analysis**: + + ```text + vehicle+[vattrib,lpd+lpr,safety@NPU],person+[reid,ppe@NPU],general + ``` + + This creates three parallel branches: + - Vehicle analysis with attributes, license plates, and safety assessment + - Person analysis with ReID and PPE detection + - General object detection for scene context + +**DAG Validation and Execution:** + +- **Compatibility Checking**: System validates that stage outputs match downstream stage inputs before execution +- **Resource Management**: Hardware assignments are validated against available accelerators +- **Execution Order**: DAG topology determines optimal execution scheduling +- **Metadata Convergence**: All stages must ultimately converge on a single metadata publish node to ensure unified output regardless of DAG complexity +- **Error Handling**: Failed stages don't block parallel branches; metadata indicates stage completion status +- **Dynamic Reconfiguration**: DAG structure can be modified at runtime without stopping the pipeline + +**Performance Considerations:** + +- **Parallel Optimization**: Independent branches execute concurrently to maximize hardware utilization +- **Memory Management**: Intermediate results are efficiently shared between branching stages +- **Load Balancing**: Hardware assignments can be dynamically adjusted based on system performance +- **Batch Processing**: Multiple detection outputs from one stage efficiently feed downstream stages + +This DAG-based approach enables domain experts to create sophisticated analytics workflows through intuitive syntax while the system handles all the underlying complexity of stage coordination, resource management, and data flow optimization. 
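**Illustrative Syntax Decomposition (non-normative sketch):** To show how the suggested notation decomposes, the sketch below is a small illustrative Python parser, not a normative grammar. It splits an expression into top-level parallel branches, sequential chains, bracketed fan-outs, and per-stage `@TARGET` assignments (defaulting to CPU); the function names and returned structure are assumptions for illustration only.

```python
def split_top_level(expr: str, sep: str) -> list[str]:
    """Split on `sep`, but only outside [...] brackets."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(expr):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(expr[start:i])
            start = i + 1
    parts.append(expr[start:])
    return parts


def parse_stage(token: str) -> dict:
    """'vehicle@GPU' -> {'alias': 'vehicle', 'target': 'GPU'}; default target is CPU."""
    alias, _, target = token.partition("@")
    return {"alias": alias, "target": target or "CPU"}


def parse_chain(chain: str) -> list:
    """Stages joined by '+'; a '[...]' element fans the previous output to branches."""
    nodes = []
    for token in split_top_level(chain, "+"):
        if token.startswith("[") and token.endswith("]"):
            nodes.append([parse_chain(b) for b in split_top_level(token[1:-1], ",")])
        else:
            nodes.append(parse_stage(token))
    return nodes


def parse_dag(expr: str) -> list:
    """Top-level ',' separates parallel branches that share the same input."""
    return [parse_chain(branch) for branch in split_top_level(expr, ",")]


# The traffic intersection example from this section:
print(parse_dag("vehicle@GPU+[vattrib@NPU,lpd+lpr],person@GPU+reid@NPU"))
```

A real implementation would additionally validate stage compatibility, acyclicity, and hardware availability before scheduling, as described in the DAG validation requirements above.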
+ ### Frame Access API **On-demand access to camera frame data:** From 44166cb10026f5607a1eb184d6019ceadce2d142 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Wed, 29 Oct 2025 10:00:07 -0700 Subject: [PATCH 18/20] Refine metadata output section: remove models object from JSON example, clarify model info is available via separate endpoint, and combine timestamp/camera ID bullets for clarity. --- docs/design/vision-pipeline-overview.md | 40 +++---------------------- 1 file changed, 4 insertions(+), 36 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index c7a24ac51..6c0c15254 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -305,44 +305,12 @@ This demonstrates how complex multi-branch analytics workflows can be concisely **Corresponding Metadata Output:** -The DAG execution produces structured JSON metadata that combines results from all executed stages. Here's an example showing how the parallel branches contribute to the final output: +The DAG execution produces structured JSON metadata that combines results from all executed stages. Here's an example output from our example pipeline showing how the parallel and sequential branches of the DAG contribute to a single metadata output message: ```json { "pipeline_start": "2025-01-25T15:30:45.120Z", "pipeline_complete": "2025-01-25T15:30:45.155Z", - "models": { - "vehicle": { - "name": "yolov8n-vehicle", - "version": "1.2.3", - "hash": "sha256:a1b2c3d4e5f6..." - }, - "vattrib": { - "name": "vehicle-attributes-resnet50", - "version": "2.1.0", - "hash": "sha256:f6e5d4c3b2a1..." - }, - "lpd": { - "name": "license-plate-detector", - "version": "1.0.5", - "hash": "sha256:9f8e7d6c5b4a..." - }, - "lpr": { - "name": "license-plate-ocr-crnn", - "version": "3.2.1", - "hash": "sha256:3a2b1c9d8e7f..." - }, - "person": { - "name": "yolov8s-person", - "version": "1.2.3", - "hash": "sha256:7f8e9d0c1b2a..." - }, - "reid": { - "name": "person-reid-osnet", - "version": "2.0.8", - "hash": "sha256:5d4c3b2a1f9e..." 
- } - }, "objects": [ { "timestamp": "2025-01-25T15:30:45.100Z", @@ -380,6 +348,8 @@ The DAG execution produces structured JSON metadata that combines results from a } ``` +*Note: Model information (name, version, hash) for each stage can be retrieved via a separate endpoint if needed, rather than being included in every output message.* + **Key Metadata Features:** - **Stage Collation**: Each stage contributes its results as nested properties (e.g., `vattrib`, `lpd`, `lpr`, `reid`) @@ -502,10 +472,8 @@ This DAG-based approach enables domain experts to create sophisticated analytics - **MQTT Publishing**: All detection metadata published to MQTT brokers in JSON format - **Batch Processing**: Minimized chatter with one message per batch to reduce network overhead and improve performance -- **Individual Frame Timestamps**: Each frame maintains its individual timestamp within batched messages for accurate temporal correlation -- **Camera Source Identification**: Each frame preserves its camera source ID within batch metadata +- **Detection-Level Timestamps and Camera IDs**: Each detection includes its original timestamp from the source and the camera ID, ensuring accurate temporal correlation and source identification in all metadata outputs - **Cross-Camera Batching**: Frames are captured and batched across cameras within small time windows for efficiency -- **Original Timing Preservation**: Each frame's metadata preserves its original capture timestamp and camera identifier - **Metadata Schema Availability**: JSON schemas for detection metadata provided via dedicated API endpoints for programmatic validation and integration - **Clean Configuration**: Schema artifacts must not be included in configuration JSON to maintain separation of concerns - **Topic Generation**: MQTT topics procedurally generated based on camera IDs and pipeline configuration with optional namespace configuration From 530b87f975eda9463996883213836ea9b5c6c601 Mon Sep 17 00:00:00 2001 From: "Watts, Robert A" Date: Wed, 29 Oct 2025 10:34:10 -0700 Subject: [PATCH 19/20] Clarify model properties and versioning requirement: require API access to model info per stage, without specifying output structure and without requiring this information in pipeline metadata output --- docs/design/vision-pipeline-overview.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index 6c0c15254..c99785e6f 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -239,7 +239,7 @@ The following stage types represent common analytics capabilities that can be co - **Stage Input/Output Behavior**: A given stage operates on the output of the previous stage (or the original frame for the first stage), and may operate on an array of outputs from that single previous stage - **Unscaled Image Data Output**: For stages that output image-like data (rather than text data), the output must refer to the unscaled portion of the input associated with the detection, such as the bounding box or a masked output of oriented bounding box or instance segment - **Metadata Collation**: Whenever a stage runs, the metadata is collated into a single object array per chain, with a property key defined by each stage that has run (e.g. 
when `vehicle+lpd+lpr` finds a vehicle but no plate, the metadata will have an empty `"lpd: []"` array to indicate the stage ran but found nothing, and no `lpr` value exists because it didn't run) -- **Model Metadata**: Each stage must include model information in the metadata output, including model name, version identifier, and content hash for reproducibility and compliance tracking. This enables debugging, model lifecycle management, and audit trails for regulatory requirements +**Model Metadata**: There must be a method to retrieve model information (such as model name, version identifier, and content hash) for each stage via the API, to support reproducibility, compliance tracking, debugging, and audit requirements. - **Guaranteed Output**: Every frame input must have a resultant metadata output, even if nothing is detected (not detecting something is also an important result) - **Source Frame Coordinates**: All collated metadata is reported in source frame coordinates for staged operations, e.g. vehicle bounding box and the license plate bounding box are both reported in original frame pixel units @@ -353,7 +353,6 @@ The DAG execution produces structured JSON metadata that combines results from a **Key Metadata Features:** - **Stage Collation**: Each stage contributes its results as nested properties (e.g., `vattrib`, `lpd`, `lpr`, `reid`) -- **Model Information**: Top-level `models` object provides name, version, and hash for each stage that executed, enabling reproducibility and audit trails - **Guaranteed Output**: Empty arrays appear for stages that ran but found nothing (e.g., `"lpd": []` when no license plate detected) - **Source Coordinates**: All bounding boxes reported in original frame pixel coordinates - **Nested Dependencies**: Downstream stages only execute when upstream stages produce results (LPR only runs when LPD finds a plate) From 61a4e0c6ebeafef6000aa5d10e56ceeebec5e32f Mon Sep 17 00:00:00 2001 From: Sarat Poluri Date: Wed, 12 Nov 2025 09:52:51 -0700 Subject: [PATCH 20/20] Prettier fix --- docs/design/vision-pipeline-overview.md | 71 +++++++++++++------------ 1 file changed, 36 insertions(+), 35 deletions(-) diff --git a/docs/design/vision-pipeline-overview.md b/docs/design/vision-pipeline-overview.md index c99785e6f..100173fd0 100644 --- a/docs/design/vision-pipeline-overview.md +++ b/docs/design/vision-pipeline-overview.md @@ -48,7 +48,6 @@ The vision pipeline API is built on three fundamental principles: - **Goal**: Deploy smart intersection systems that provide actionable traffic insights and automated responses without requiring deep computer vision expertise - **Technical Level**: Understands traffic engineering, urban planning, and sensor networks but has limited computer vision knowledge; wants to focus on traffic optimization, not algorithm configuration - **Pain Points**: - - Complex vision systems obscure traffic engineering value - Difficulty translating traffic requirements into vision configurations - Unclear what vision capabilities are available for traffic applications @@ -63,14 +62,12 @@ A traffic operations expert wants to deploy vision analytics at a busy intersect 1. **Camera Management**: Dynamically connect 4-8 cameras using various input methods (RTSP streams, MJPEG streams, WebRTC streams, USB connections, or offline video files) with fast, API-driven camera addition and removal that handles backend operations transparently 2. 
**Pipeline Composition**: Compose analytics pipelines by chaining stages together: - - Vehicle detection → license plate detection → OCR - Person detection → re-identification embedding generation - General object detection → vehicle classification - Custom combinations based on specific needs 3. **Metadata Output**: Send pipeline results to MQTT broker for SceneScape processing: - - JSON format with validated schema structure - Batched messages to minimize network chatter - Preserved frame timestamps and camera source IDs @@ -98,40 +95,40 @@ flowchart LR subgraph Inputs["Inputs"] subgraph SensorInputs["Sensor Inputs"] CAM1["Camera 1
Source Video"] - CAM2["Camera 2
Source Video"] + CAM2["Camera 2
Source Video"] LIDAR["LiDAR
Point Cloud"] RADAR["Radar
Point Cloud"] AUDIO["Audio
Sound Data"] end - + subgraph ConfigInputs["Configuration Inputs"] MODELS["AI Models
Detection/Classification"] CALIB["Calibration Data
Intrinsics + Distortion"] end - + subgraph PlatformInputs["Platform Inputs"] TIME["Synchronized System Time
(timestamps, time sync)"] end end - + subgraph Pipeline["Vision Pipeline"] VIDEO["Video Processing
Decode → Detect
→ Track → Classify"] POINTCLOUD["Point Cloud Processing
Segment → Detect
→ Track → Embed"] end - + subgraph Outputs["Pipeline Outputs"] DETECTIONS["Object Detections
& Tracks
(boxes, classes, IDs)"] RAWDATA["Source Data
(frames, clouds)"] DECORATED["Decorated Data
(annotated frames,
segmented clouds)"] end - + %% Styling classDef pipeline fill:#fff8e1,stroke:#ff8f00,stroke-width:3px,color:#000000 classDef sensors fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000 classDef config fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000000 classDef platform fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#000000 classDef outputs fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000 - + class VIDEO,POINTCLOUD pipeline class CAM1,CAM2,LIDAR,RADAR,AUDIO sensors class MODELS,CALIB config @@ -176,6 +173,7 @@ The interface design anticipates the growing prevalence of multimodal sensing in The following examples demonstrate adding cameras independently of pipeline configuration. Each camera inherits sensible system defaults (such as auto-detected resolution, frame rate, and default intrinsics) while allowing selective override of specific parameters when needed. Cameras can be added without concern for what analytics pipelines will eventually process their video streams. **Example Configuration (RTSP Camera):** + ```json { "camera_id": "cam_north", @@ -184,6 +182,7 @@ The following examples demonstrate adding cameras independently of pipeline conf ``` **Example Configuration (USB Camera):** + ```json { "camera_id": "cam_usb", @@ -192,6 +191,7 @@ The following examples demonstrate adding cameras independently of pipeline conf ``` **Example Configuration (MJPEG Camera):** + ```json { "camera_id": "cam_mjpeg", @@ -200,9 +200,10 @@ The following examples demonstrate adding cameras independently of pipeline conf ``` **Example Configuration (RTSP with Authentication and Custom Intrinsics):** + ```json { - "camera_id": "cam_south", + "camera_id": "cam_south", "source": "rtsp://admin:camera_pass@192.168.1.101:554/stream1", "intrinsics": [ [1000.0, 0.0, 960.0], @@ -239,7 +240,7 @@ The following stage types represent common analytics capabilities that can be co - **Stage Input/Output Behavior**: A given stage operates on the output of the previous stage (or the original frame for the first stage), and may operate on an array of outputs from that single previous stage - **Unscaled Image Data Output**: For stages that output image-like data (rather than text data), the output must refer to the unscaled portion of the input associated with the detection, such as the bounding box or a masked output of oriented bounding box or instance segment - **Metadata Collation**: Whenever a stage runs, the metadata is collated into a single object array per chain, with a property key defined by each stage that has run (e.g. when `vehicle+lpd+lpr` finds a vehicle but no plate, the metadata will have an empty `"lpd: []"` array to indicate the stage ran but found nothing, and no `lpr` value exists because it didn't run) -**Model Metadata**: There must be a method to retrieve model information (such as model name, version identifier, and content hash) for each stage via the API, to support reproducibility, compliance tracking, debugging, and audit requirements. + **Model Metadata**: There must be a method to retrieve model information (such as model name, version identifier, and content hash) for each stage via the API, to support reproducibility, compliance tracking, debugging, and audit requirements. - **Guaranteed Output**: Every frame input must have a resultant metadata output, even if nothing is detected (not detecting something is also an important result) - **Source Frame Coordinates**: All collated metadata is reported in source frame coordinates for staged operations, e.g. 
vehicle bounding box and the license plate bounding box are both reported in original frame pixel units @@ -269,24 +270,24 @@ flowchart LR VATTR["Vehicle Attributes
(vattrib)"] REID["Person ReID
(reid)"] PUBLISH["Metadata Publish Node
(MQTT Output)"] - + INPUT --> VEH INPUT --> PERSON VEH --> LPD VEH --> VATTR LPD --> LPR PERSON --> REID - + %% All stages converge to single publish node VATTR --> PUBLISH LPR --> PUBLISH REID --> PUBLISH - + classDef input fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#000000 classDef detection fill:#fff8e1,stroke:#ff8f00,stroke-width:2px,color:#000000 classDef analysis fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000000 classDef output fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000 - + class INPUT input class VEH,PERSON,LPD detection class LPR,VATTR,REID analysis @@ -317,7 +318,7 @@ The DAG execution produces structured JSON metadata that combines results from a "camera_id": "cam_north", "category": "vehicle", "confidence": 0.94, - "bounding_box": {"x": 120, "y": 80, "width": 200, "height": 120}, + "bounding_box": { "x": 120, "y": 80, "width": 200, "height": 120 }, "id": 1, "vattrib": { "subtype": "car", @@ -327,7 +328,7 @@ The DAG execution produces structured JSON metadata that combines results from a { "category": "license_plate", "confidence": 0.89, - "bounding_box": {"x": 180, "y": 160, "width": 80, "height": 20}, + "bounding_box": { "x": 180, "y": 160, "width": 80, "height": 20 }, "lpr": { "text": "ABC123", "confidence": 0.91 @@ -340,7 +341,7 @@ The DAG execution produces structured JSON metadata that combines results from a "camera_id": "cam_north", "category": "person", "confidence": 0.87, - "bounding_box": {"x": 350, "y": 100, "width": 60, "height": 180}, + "bounding_box": { "x": 350, "y": 100, "width": 60, "height": 180 }, "id": 2, "reid": "eyJ2ZWN0b3IiOiJbMC4xMiwgMC44NywgLi4uXSJ9" } @@ -348,7 +349,7 @@ The DAG execution produces structured JSON metadata that combines results from a } ``` -*Note: Model information (name, version, hash) for each stage can be retrieved via a separate endpoint if needed, rather than being included in every output message.* +_Note: Model information (name, version, hash) for each stage can be retrieved via a separate endpoint if needed, rather than being included in every output message._ **Key Metadata Features:** @@ -500,7 +501,7 @@ sequenceDiagram Camera-->>Server: Video stream Server->>MQTT: Publish camera status (connected) API-->>User: Camera ID and status - + Note over User,MQTT: Camera running in free-run mode
Source frames available for calibration
No analytics processing yet ``` @@ -526,7 +527,7 @@ sequenceDiagram SceneScape->>MQTT: Publish tracks and properties MQTT-->>User: Track data available for consumption SceneScape-->>User: Visual verification in SceneScape UI - + User->>API: Request decorated frames for camera API-->>User: Stream frames with detection overlays Note over User: Visual verification of detections
Complete data flow: detections → tracks → properties @@ -551,7 +552,7 @@ sequenceDiagram Server->>Server: Resume analytics processing Server->>MQTT: Publish updated metadata API-->>User: Stage update confirmation - + Note over Server,MQTT: Pipeline now outputs
person detection data ``` @@ -573,7 +574,7 @@ sequenceDiagram Server->>Server: Update internal camera references Server->>MQTT: Publish with updated camera ID API-->>User: Camera update confirmation - + Note over Server,MQTT: System gracefully handles
camera ID changes ``` @@ -596,7 +597,7 @@ sequenceDiagram Server->>MQTT: Publish camera offline status Server->>Server: Remove camera instance API-->>User: Deletion confirmation - + Note over Server: All camera resources cleaned up
Associated pipelines terminated ``` @@ -612,16 +613,16 @@ sequenceDiagram participant MQTT as MQTT Broker Note over User: Existing pipeline: Vehicle Detection - + User->>API: POST /pipelines/{pipeline_id}/stages Note over User,API: Add classification stage:
- Input: Vehicle detections
- Stage: Vehicle Type Classification
- Hardware: NPU API->>Server: Validate stage compatibility Server->>Server: Create classification stage Server->>Server: Link detection → classification Server->>Server: Start chained processing - + Note over Server: Processing chain:
1. Vehicle Detection (GPU)
2. Vehicle Classification (NPU) - + Server->>MQTT: Publish enhanced metadata Note over MQTT: Detection + classification data
in single message batch API-->>User: Stage addition confirmation @@ -639,16 +640,16 @@ sequenceDiagram participant MQTT as MQTT Broker Note over User: Existing pipeline: Vehicle Detection - + User->>API: POST /pipelines/{pipeline_id}/stages Note over User,API: Add parallel stage:
- Input: Source camera frames
- Stage: Person Detection
- Hardware: GPU
- Mode: Parallel API->>Server: Validate parallel stage configuration Server->>Server: Create person detection stage Server->>Server: Configure parallel processing Server->>Server: Start concurrent analytics - + Note over Server: Parallel processing:
1. Vehicle Detection (GPU)
2. Person Detection (GPU)
Both processing same input frames - + Server->>Server: Merge results from parallel stages Server->>MQTT: Publish combined metadata Note over MQTT: Single message with unified detection list:
All vehicle + person detections
from concurrent analytics @@ -668,20 +669,20 @@ sequenceDiagram participant SceneScape as SceneScape System Note over User: Existing pipeline processing Camera 1
with Vehicle Detection + Classification - + User->>API: POST /pipelines/{pipeline_id}/cameras Note over User,API: Add camera to existing pipeline:
- Camera ID: "cam_south"
- RTSP URL
- Inherits pipeline analytics API->>Server: Create camera and add to pipeline Server->>Server: Establish camera connection Server->>Server: Configure multi-camera batching Server->>Server: Apply existing analytics to new camera - + Note over Server: Processing both cameras:
Camera 1 + Camera 2
→ Detection + Classification - + Server->>Server: Batch results from both cameras Server->>MQTT: Publish aggregated batch Note over Server,MQTT: Single MQTT message containing:
- Camera 1 detections (ID + timestamp)
- Camera 2 detections (ID + timestamp)
- Preserved individual metadata - + MQTT->>SceneScape: Process batched multi-camera data API-->>User: Camera addition confirmation ```
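
**Illustrative Batch Consumption (non-normative sketch):** The batched multi-camera message in this workflow is a single JSON payload whose detections each carry their own camera ID and original capture timestamp. The sketch below shows how a consumer (for example, the handler behind an MQTT subscription) could split such a batch per camera. The field names follow the earlier metadata examples, and the batch envelope shown is an assumption, since the authoritative schema is provided separately through the schema endpoints.

```python
import json
from collections import defaultdict


def split_batch_by_camera(payload: bytes) -> dict[str, list[dict]]:
    """Group one batched multi-camera message into per-camera detection lists."""
    batch = json.loads(payload)
    per_camera: dict[str, list[dict]] = defaultdict(list)
    for detection in batch.get("objects", []):
        # Each detection keeps its own source camera ID and capture timestamp,
        # so batching never loses per-frame identity.
        per_camera[detection["camera_id"]].append(detection)
    return dict(per_camera)


# Illustrative batch shaped like the metadata examples in this document:
example = json.dumps({
    "objects": [
        {"camera_id": "cam_north", "timestamp": "2025-01-25T15:30:45.100Z",
         "category": "vehicle", "id": 1},
        {"camera_id": "cam_south", "timestamp": "2025-01-25T15:30:45.102Z",
         "category": "person", "id": 7},
    ]
}).encode()

for camera_id, detections in split_batch_by_camera(example).items():
    print(camera_id, [d["id"] for d in detections])
```

Grouping on the consumer side preserves the single-message batching goal while still letting downstream tracking treat each camera stream independently.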