41 changes: 41 additions & 0 deletions cookbooks/cosmos3/generator/action/README.md
@@ -25,6 +25,7 @@ HF repository before running these examples. To disable the guardrail, set
- [Run with Cosmos Framework](#run-with-cosmos-framework)
- [Quickstart](#quickstart)
- [Cosmos Framework Walkthrough](#cosmos-framework-walkthrough)
- [Topological Modeling](#topological-modeling)
- [Run with vLLM-Omni](#run-with-vllm-omni)
- [Quickstart](#quickstart-1)
- [Notebook walkthrough](#notebook-walkthrough)
@@ -48,6 +49,46 @@ fingers.

Action data samples across different embodiments can be inspected interactively in the [Cosmos3 Action Viewer](https://huggingface.co/spaces/nvidia/Cosmos3-Action-Viewer) Hugging Face Space.

## Topological Modeling

The optional [`topology_helpers.py`](./topology_helpers.py) module adds
topology-aware diagnostics for action forward-dynamics rollouts. It evaluates
binary or labeled masks from generated robotics videos and reports connected
components, hole proxies, Euler-characteristic-style summaries, chunk-boundary
drift, and rollout stability scores. The helper is side-effect-free and does not
change Cosmos3 inference behavior.

The modeling note in [`topological_modeling.md`](./topological_modeling.md)
describes the contribution as a single topology-aware layer for Cosmos action
forward dynamics (FD). It covers finite-difference and finite-difference
time-domain (FDTD) rollout structure, topology metrics, and extension points
for topological inference and sparse swarm routing.

Minimal mask-first usage:

```python
from pathlib import Path

from topology_helpers import (
    RolloutSpec,
    TopologyConfig,
    evaluate_fd_rollout,
    write_topology_csv,
    write_topology_json,
)

# `object_masks` is supplied by the caller: a sequence of binary masks,
# or a dict of named sequences such as {"object": masks, "gripper": masks}.
report = evaluate_fd_rollout(
    masks=object_masks,
    rollout=RolloutSpec(
        video_id="robotics_action_cond_stitched",
        domain_name="droid_lerobot",
        fps=15,
        action_chunk_size=16,
    ),
    config=TopologyConfig(min_component_area_px=16, generated_frame_start=1),
)
write_topology_json(report, Path("topology_metrics.json"))
write_topology_csv(report, Path("topology_metrics.csv"))
```

## Run with Cosmos Framework

### Quickstart
146 changes: 146 additions & 0 deletions cookbooks/cosmos3/generator/action/topological_modeling.md
@@ -0,0 +1,146 @@
# Topological Modeling for Cosmos Action Forward Dynamics

This note describes an optional modeling and evaluation layer for Cosmos3 action
forward dynamics (FD). It does not change model inference; it adds a way to
inspect whether generated robotics rollouts preserve simple topology over time.

## Modeling Rationale

Forward dynamics predicts future observations from a start image and an action
trajectory. For robotics rollouts, useful structure often lives in relationships:
object separation, contact regions, gripper/object coupling, holes, occlusions,
and chunk-boundary continuity. Pixel error alone does not cleanly expose
failures in these relationships.

The contribution treats generated rollout masks or point-cloud proxies as
field-like state samples:

- finite-difference and finite-difference time-domain (FDTD) thinking motivate
checking how state evolves across adjacent frames and chunks;
- topology metrics summarize whether visible structure splits, merges, opens, or
collapses;
- topological inference concepts motivate stable manifold state, sparse
neighborhood reasoning, and convergence checks;
- swarm-routing concepts motivate future multi-agent robot/model specialists
selected by topology-aware state similarity.

The implementation is new, Cosmos-native code written for this cookbook. No
external research prototype code is imported or vendored.

## Implemented Core

`topology_helpers.py` evaluates masks for generated FD rollouts and emits
deterministic JSON/CSV summaries.

The mask-first API is deliberate. It keeps the core metric independent of any
specific segmentation model and makes the evaluator safe for CPU-only tests.
Notebook users can provide task-specific binary masks, labeled masks, or simple
luminance-threshold masks for quick inspection.
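
For quick inspection, a luminance-threshold mask can be built without any segmentation model. The sketch below is illustrative only: `luminance_mask` and its default threshold are hypothetical names and values, not part of `topology_helpers.py`, and the Rec. 601 luma weights are one common choice rather than the helper's documented behavior.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), values assumed to be in [0, 255]


def luminance_mask(
    frame: Sequence[Sequence[Pixel]], threshold: float = 128.0
) -> List[List[int]]:
    """Return a binary mask with 1 where the Rec. 601 luma exceeds `threshold`."""
    mask: List[List[int]] = []
    for row in frame:
        mask_row = []
        for r, g, b in row:
            luma = 0.299 * r + 0.587 * g + 0.114 * b
            mask_row.append(1 if luma > threshold else 0)
        mask.append(mask_row)
    return mask
```

A threshold mask like this is only a rough proxy for object structure; task-specific masks remain the better input whenever they are available.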

Frame-level metrics include:

- connected foreground components;
- enclosed background holes as a loop proxy;
- Euler-characteristic-style `components - holes`;
- foreground area, bounding box, and centroid;
- drift from the previous frame;
- drift from the first generated frame;
- optional persistent-homology summaries when `ripser` is installed and enabled.
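
The geometric entries in this list (area, bounding box, centroid) are simple to compute from a binary mask. The following is a minimal sketch under stated assumptions: `frame_geometry` and `centroid_shift` are hypothetical names, and using centroid displacement as a drift ingredient is an assumption, since the helper's exact drift definition is not given here.

```python
import math
from typing import Optional, Sequence, Tuple

BBox = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive
Point = Tuple[float, float]       # (row, col)


def frame_geometry(
    mask: Sequence[Sequence[int]],
) -> Tuple[int, Optional[BBox], Optional[Point]]:
    """Return (area, bbox, centroid) of the foreground; bbox/centroid are None if empty."""
    rows, cols = [], []
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                rows.append(r)
                cols.append(c)
    area = len(rows)
    if area == 0:
        return 0, None, None
    bbox = (min(rows), min(cols), max(rows), max(cols))
    centroid = (sum(rows) / area, sum(cols) / area)
    return area, bbox, centroid


def centroid_shift(prev: Optional[Point], curr: Optional[Point]) -> Optional[float]:
    """Euclidean distance between two centroids, or None if either frame is empty."""
    if prev is None or curr is None:
        return None
    return math.hypot(curr[0] - prev[0], curr[1] - prev[1])
```

Calling `centroid_shift` against the previous frame or against the first generated frame gives two displacement signals that mirror the two drift entries above.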

Sequence-level metrics include:

- component and hole change counts;
- mean and maximum topology drift;
- chunk-boundary jump count;
- stable frame ratio;
- heuristic stability score in `[0, 1]`.
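
Several of these sequence-level counts follow directly from per-frame `(components, holes)` pairs. The sketch below is a hypothetical illustration, not the helper's implementation: `summarize_sequence` is an invented name, and it assumes that frame index `t` starts a new chunk when `t % chunk_size == 0`. The heuristic stability score is omitted because its formula is not stated in this note.

```python
from typing import Dict, Sequence, Tuple


def summarize_sequence(
    counts: Sequence[Tuple[int, int]], chunk_size: int
) -> Dict[str, float]:
    """Summarize per-frame (components, holes) pairs over one rollout."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    component_changes = hole_changes = boundary_jumps = stable = 0
    for t in range(1, len(counts)):
        prev_components, prev_holes = counts[t - 1]
        components, holes = counts[t]
        component_changed = components != prev_components
        hole_changed = holes != prev_holes
        component_changes += int(component_changed)
        hole_changes += int(hole_changed)
        if component_changed or hole_changed:
            # Assumption: the transition into frame t crosses a chunk boundary
            # when t is a multiple of the chunk size.
            if t % chunk_size == 0:
                boundary_jumps += 1
        else:
            stable += 1
    transitions = len(counts) - 1
    stable_ratio = stable / transitions if transitions > 0 else 1.0
    return {
        "component_changes": component_changes,
        "hole_changes": hole_changes,
        "chunk_boundary_jumps": boundary_jumps,
        "stable_frame_ratio": stable_ratio,
    }
```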

Finite-difference and convergence metrics include:

- `compute_fdtd_rollout_trace`, an FDTD-inspired trace over topology state
velocity, acceleration, and local change ratio;
- `TopologyConvergenceGate`, a bundle of thresholds that defines when a rollout
counts as stable;
- `evaluate_topological_convergence`, a pass/fail certificate over stability,
Betti-stability, drift, boundary jumps, and finite-difference speed.
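
A threshold gate of this kind can be pictured as a small dataclass plus a function that checks each metric against its limit. Everything in the sketch below is hypothetical: `GateSketch`, `check_gate`, the field names, and the default thresholds are invented for illustration and do not reflect the actual fields of `TopologyConvergenceGate`.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class GateSketch:
    """Hypothetical threshold bundle; the values are placeholders, not recommendations."""

    min_stability: float = 0.8
    max_mean_drift: float = 0.1
    max_boundary_jumps: int = 0
    max_speed: float = 1.0


def check_gate(metrics: Mapping[str, float], gate: GateSketch) -> Dict[str, object]:
    """Return per-check results and an overall pass/fail flag."""
    checks = {
        "stability": metrics["stability"] >= gate.min_stability,
        "mean_drift": metrics["mean_drift"] <= gate.max_mean_drift,
        "boundary_jumps": metrics["boundary_jumps"] <= gate.max_boundary_jumps,
        "speed": metrics["max_speed"] <= gate.max_speed,
    }
    return {"passed": all(checks.values()), "checks": checks}
```

Reporting the individual checks alongside the overall flag makes it clear which threshold a failing rollout violated.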

## Finite-Difference/FDTD Modeling

Finite-difference modeling gives the evaluator its local-time bias. A rollout is
not inspected as disconnected images; it is inspected as a sequence whose visible
structure should evolve smoothly unless the action implies a topological event.

This is especially relevant for autoregressive robotics chunks. A chunk boundary
that splits one object mask into two components, opens a previously closed
contact loop, or collapses a stable foreground component can be flagged even when
the generated video still looks plausible at a glance.

The helper computes finite differences over compact topology state vectors. This
does not claim to solve a physical PDE. It gives the action forward-dynamics
cookbook a deterministic analogue of velocity and acceleration over rollout
topology, so large local topology jumps become measurable.
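
The velocity and acceleration analogues are first and second forward differences over consecutive state vectors. The sketch below shows that arithmetic only; `finite_differences` and `l1_speed` are hypothetical names, and the layout of the state vector is an assumption rather than the output of `compute_fdtd_rollout_trace`.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def finite_differences(
    states: Sequence[Sequence[float]],
) -> Tuple[List[Vector], List[Vector]]:
    """Return (velocity, acceleration) as first and second forward differences.

    velocity[t]     = states[t + 1] - states[t]
    acceleration[t] = velocity[t + 1] - velocity[t]
    """
    velocity = [
        [b - a for a, b in zip(s0, s1)] for s0, s1 in zip(states, states[1:])
    ]
    acceleration = [
        [b - a for a, b in zip(v0, v1)] for v0, v1 in zip(velocity, velocity[1:])
    ]
    return velocity, acceleration


def l1_speed(vector: Sequence[float]) -> float:
    """L1 norm of a difference vector, used here as a scalar 'speed'."""
    return sum(abs(x) for x in vector)
```

A rollout whose state is constant except for one jump produces a single nonzero velocity entry and a pair of opposite-signed acceleration entries around it, which is the signature a chunk-boundary check would look for.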

## Topology Metrics

The first metric layer is intentionally simple:

1. build connected components over foreground masks;
2. count enclosed background regions as hole proxies;
3. compare component and hole counts over time;
4. summarize stability per rollout.

This gives a deterministic baseline that can be reviewed without GPU access,
model weights, or external topology packages. Persistent homology is treated as
an optional enrichment, not a hard dependency.
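
The first two steps can be sketched with a plain flood fill and no external packages. This is an illustrative sketch, not the code in `topology_helpers.py`: `count_regions` and `topology_counts` are hypothetical names, and the pairing of 8-connected foreground with 4-connected background is an assumed convention. Holes are counted as background regions that do not reach the image border.

```python
from collections import deque
from typing import List, Sequence, Tuple

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_regions(
    grid: Sequence[Sequence[int]], target: int, neighbors: List[Tuple[int, int]]
) -> int:
    """Count connected regions of cells equal to `target` using BFS flood fill."""
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = 0
    for r in range(height):
        for c in range(width):
            if grid[r][c] != target or seen[r][c]:
                continue
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (
                        0 <= ny < height
                        and 0 <= nx < width
                        and not seen[ny][nx]
                        and grid[ny][nx] == target
                    ):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


def topology_counts(mask: Sequence[Sequence[int]]) -> Tuple[int, int, int]:
    """Return (components, holes, components - holes) for one mask."""
    width = len(mask[0]) if mask else 0
    binary = [[1 if value else 0 for value in row] for row in mask]
    # Pad with a background ring so all border-touching background merges
    # into a single outer region; every other background region is a hole.
    padded = (
        [[0] * (width + 2)]
        + [[0] + row + [0] for row in binary]
        + [[0] * (width + 2)]
    )
    components = count_regions(padded, 1, N8)
    holes = count_regions(padded, 0, N4) - 1
    return components, holes, components - holes
```

Applying `topology_counts` to every frame yields the per-frame counts that steps 3 and 4 compare over time.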

## Topological Inference Extension

The same report schema supports an inference-oriented state vector today:

- `frame_to_state_vector` compresses a frame report into component, hole,
Euler, area, centroid, and drift features;
- `topology_state_distance` compares state vectors with a weighted L1 distance;
- `betti_stability_score` checks whether component/hole proxies stabilize over a
sliding window;
- `evaluate_topological_convergence` produces a small convergence certificate
from stability, Betti-stability, drift, and finite-difference speed checks.

These helpers are not a new inference engine. They are small interfaces that can
support later inference-oriented analysis:

- represent rollout frames as points on a manifold;
- use sparse neighborhood queries to compare the current state to prior states;
- monitor Betti-stability over a sliding window;
- detect drift when topology changes faster than action-conditioned dynamics
justify;
- gate stronger claims behind reproducible tests or formal checks.

Those ideas remain extension points; the current helper exposes only metrics and
schemas.
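
The state-vector comparison and sliding-window stability check described in this section reduce to short calculations. The sketch below uses hypothetical stand-ins (`weighted_l1`, `window_stability`) rather than the helper's own functions, and the window rule, a window counting as stable when all of its `(components, holes)` pairs are identical, is an assumption.

```python
from typing import Sequence, Tuple


def weighted_l1(
    a: Sequence[float], b: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted L1 distance between two equal-length state vectors."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return sum(w * abs(x - y) for x, y, w in zip(a, b, weights))


def window_stability(counts: Sequence[Tuple[int, int]], window: int) -> float:
    """Fraction of sliding windows whose (components, holes) pairs are all equal."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(counts) < window:
        # Too few frames for a full window: treat the whole sequence as one window.
        return 1.0 if len(set(counts)) <= 1 else 0.0
    total = len(counts) - window + 1
    stable = sum(
        1 for i in range(total) if len(set(counts[i : i + window])) == 1
    )
    return stable / total
```

Weights let count features (components, holes) dominate over continuous ones such as area or centroid, which typically live on very different scales.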

## Swarm Robotics Extension

For multi-agent robotics, topology summaries can become routing features. A
future swarm evaluator can build on the included `rank_topology_specialists`
helper, which applies a sparse top-k rule over:

- topology-aware state similarity;
- worker reliability;
- estimated cost;
- utilization balance;
- local drift risk.

This would let a multi-robot or multi-model system route hard local dynamics
problems to specialists without changing the base Cosmos3 generation interface.
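
A sparse top-k rule over those five features can be sketched as a score followed by a partial sort. This is a hypothetical illustration and not the `rank_topology_specialists` implementation: the `SpecialistSketch` fields, the unweighted additive score, and the function names are all assumptions.

```python
import heapq
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SpecialistSketch:
    """Hypothetical candidate record; all features assumed normalized to [0, 1]."""

    name: str
    similarity: float   # topology-aware state similarity (higher is better)
    reliability: float  # worker reliability (higher is better)
    cost: float         # estimated cost (lower is better)
    utilization: float  # current load (lower is better)
    drift_risk: float   # local drift risk (lower is better)


def routing_score(s: SpecialistSketch) -> float:
    """Assumed score: reward similarity and reliability, penalize the rest."""
    return s.similarity + s.reliability - s.cost - s.utilization - s.drift_risk


def top_k_specialists(
    candidates: Sequence[SpecialistSketch], k: int
) -> List[SpecialistSketch]:
    """Keep only the k highest-scoring candidates (the 'sparse' part of the rule)."""
    return heapq.nlargest(k, candidates, key=routing_score)
```

In practice the penalty terms would likely need tuned weights; the equal weighting here only keeps the sketch short.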

## Recommended Use

1. Generate DROID or UMI action FD outputs using the existing notebooks.
2. Produce task-specific masks for objects, grippers, or contact regions.
3. Call `evaluate_fd_rollout` with a `RolloutSpec`.
4. Write `topology_metrics.json` and, when useful, `topology_metrics.csv` beside
the generated rollout output.
5. Inspect drift spikes at autoregressive chunk boundaries.

The metric is diagnostic. It should not be reported as evidence of improved
generation quality unless paired with a controlled validation protocol.