From e5b686e653f504001ab4b2fb22826e4bb5591c2f Mon Sep 17 00:00:00 2001
From: Yang Fu <yangf@nvidia.com>
Date: Wed, 24 Jun 2026 16:22:17 -0700
Subject: [PATCH 1/2] docs: add Cosmos3 Reasoner Transformers notebook

Add run_with_transformers.ipynb so all four Reasoner backends (Cosmos
Framework, vLLM, NIM, Transformers) ship a notebook. The notebook mirrors
the in-process Diffusers notebook flow: it creates a dedicated venv,
registers a `Cosmos3 Transformers (Python 3.13)` kernel, loads
Cosmos3OmniForConditionalGeneration in process, and runs the image and
video examples through a small run_reasoner helper.

Also add the matching "Notebook walkthrough" subsection to the Run with
Transformers section of the reasoner README for parity with the other
backends.

Verified on a GB200 box: the notebook's model-load, image, and video cells
execute end-to-end with correct outputs, and the notebook structure passes
nbformat.validate.
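
For reviewers without nbformat installed, the structural check can be
roughly approximated with the standard library. The sketch below is a
hypothetical helper (check_notebook_structure is not part of this patch
or of nbformat) and covers only a few of the schema rules, so
nbformat.validate remains the authoritative check:

```python
import json


def check_notebook_structure(nb: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none found.

    Hypothetical helper: a rough stdlib-only approximation, not a
    replacement for nbformat.validate.
    """
    problems = []
    if nb.get("nbformat") != 4:
        problems.append("nbformat must be 4")
    cells = nb.get("cells")
    if not isinstance(cells, list):
        return problems + ["cells must be a list"]
    for i, cell in enumerate(cells):
        kind = cell.get("cell_type")
        if kind not in ("markdown", "code", "raw"):
            problems.append(f"cell {i}: unknown cell_type {kind!r}")
        if "source" not in cell:
            problems.append(f"cell {i}: missing source")
        if kind == "code":
            # Code cells must also carry outputs and execution_count.
            for key in ("outputs", "execution_count"):
                if key not in cell:
                    problems.append(f"cell {i}: missing {key}")
    return problems


# Tiny in-memory notebook shaped like the cells added by this patch.
sample = json.loads("""
{"cells": [
  {"cell_type": "markdown", "id": "a1", "metadata": {}, "source": ["# Title"]},
  {"cell_type": "code", "id": "b2", "metadata": {}, "source": ["print(1)"],
   "outputs": [], "execution_count": null}
 ],
 "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
""")
print(check_notebook_structure(sample))  # prints []
```

To check the real file, pass the result of json.load on the opened
notebook instead of the in-memory sample.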

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 cookbooks/cosmos3/reasoner/README.md          |  11 +
 .../reasoner/run_with_transformers.ipynb      | 448 ++++++++++++++++++
 2 files changed, 459 insertions(+)
 create mode 100644 cookbooks/cosmos3/reasoner/run_with_transformers.ipynb

diff --git a/cookbooks/cosmos3/reasoner/README.md b/cookbooks/cosmos3/reasoner/README.md
index bdf14d2f..991de5f8 100644
--- a/cookbooks/cosmos3/reasoner/README.md
+++ b/cookbooks/cosmos3/reasoner/README.md
@@ -272,3 +272,14 @@ To run **Cosmos3-Super**, change `model_id` to `nvidia/Cosmos3-Super`.
 `device_map="auto"` can shard the model across multiple GPUs when Accelerate is
 installed. Use [vLLM](#run-with-vllm) or [NIM](#run-with-nim) when you need an
 OpenAI-compatible server instead of local Python inference.
+
+### Notebook walkthrough
+
+[`run_with_transformers.ipynb`](./run_with_transformers.ipynb) is the Python-first
+counterpart to the server notebooks: instead of launching a server, it installs an
+isolated venv, registers a `Cosmos3 Transformers (Python 3.13)` Jupyter kernel,
+and loads `Cosmos3OmniForConditionalGeneration` in process. A small
+`run_reasoner` helper wraps `apply_chat_template` + `generate`, and the notebook
+then runs the image and video examples shown above. To scale from **Nano** to
+**Super**, change only `model_id` in the load cell and re-run; `device_map="auto"`
+can shard Super across multiple GPUs.
diff --git a/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb b/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb
new file mode 100644
index 00000000..cccc482d
--- /dev/null
+++ b/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb
@@ -0,0 +1,448 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "458e701e",
+   "metadata": {},
+   "source": [
+    "<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.\n",
+    "SPDX-License-Identifier: OpenMDW-1.1 -->"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "28202840",
+   "metadata": {},
+   "source": [
+    "# Cosmos3 Reasoner with Transformers\n",
+    "\n",
+    "This notebook runs Cosmos3 Reasoner inference directly with Hugging Face\n",
+    "Transformers — a Python-first path with no server to launch.\n",
+    "\n",
+    "1. Sets up an isolated venv with the Cosmos3 Transformers integration (`transformers>=5.11.0`).\n",
+    "2. Registers a Jupyter kernel and loads `Cosmos3OmniForConditionalGeneration` in process.\n",
+    "3. Runs image and video reasoning requests.\n",
+    "\n",
+    "The integration loads **only the Reasoner tower** from the unified `nvidia/Cosmos3-Nano`\n",
+    "(or `nvidia/Cosmos3-Super`) checkpoint and returns text for text, image, and video\n",
+    "understanding. It does not generate images, video, audio, or actions — use the\n",
+    "Diffusers or vLLM-Omni cookbooks for those.\n",
+    "\n",
+    "Note: if you have already completed steps 1-4 and installed the\n",
+    "`Cosmos3 Transformers (Python 3.13)` kernel, switch to that kernel, run the\n",
+    "Restore Environment cell in step 4, then continue from step 5."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "98a8252e",
+   "metadata": {},
+   "source": [
+    "## 1. Prerequisites\n",
+    "\n",
+    "Use a Linux machine with NVIDIA GPU access, model access on Hugging Face, and\n",
+    "either `uvx hf@latest auth login` or `HF_TOKEN` set.\n",
+    "\n",
+    "> **Headless servers:** if you see an error like `libxcb.so.1: cannot open shared\n",
+    "> object file` when importing, install the required system libraries:\n",
+    ">\n",
+    "> ```bash\n",
+    "> apt-get install -y libxcb1 libgl1 libglib2.0-0\n",
+    "> ```\n",
+    "\n",
+    "> **uv version:** these notebooks need `uv >= 0.11.3`. Older versions do not\n",
+    "> recognize newer `--torch-backend` values such as `cu130`. Upgrade with\n",
+    "> `uv self update` if you hit version-related errors."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f9f3020e",
+   "metadata": {},
+   "source": [
+    "## 2. Configure Paths and Environment\n",
+    "\n",
+    "The defaults are relative to this `cosmos` checkout. Override any of these before\n",
+    "running the next cell if needed:\n",
+    "\n",
+    "```bash\n",
+    "export COSMOS3_TRANSFORMERS_VENV=/path/to/.venv-cosmos3-transformers\n",
+    "export COSMOS3_TORCH_BACKEND=auto   # or cu130 / cu128 to pin an explicit CUDA wheel\n",
+    "export HF_HOME=/path/to/large/huggingface/cache\n",
+    "export UV_LINK_MODE=copy\n",
+    "export CUDA_VISIBLE_DEVICES=0\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "38384104",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pathlib import Path\n",
+    "import os\n",
+    "\n",
+    "\n",
+    "def find_repo_root(start: Path) -> Path:\n",
+    "    for path in [start, *start.parents]:\n",
+    "        if (path / \"README.md\").exists() and (path / \"cookbooks\").exists():\n",
+    "            return path\n",
+    "    return start\n",
+    "\n",
+    "\n",
+    "def configure_transformers_environment() -> None:\n",
+    "    global COSMOS_ROOT\n",
+    "    global COSMOS_REASONER_ASSETS\n",
+    "    global COSMOS3_TRANSFORMERS_VENV\n",
+    "    global COSMOS3_TORCH_BACKEND\n",
+    "\n",
+    "    COSMOS_ROOT = find_repo_root(Path.cwd().resolve())\n",
+    "    COSMOS_REASONER_ASSETS = COSMOS_ROOT / \"cookbooks\" / \"cosmos3\" / \"reasoner\" / \"assets\"\n",
+    "    COSMOS3_TRANSFORMERS_VENV = Path(\n",
+    "        os.environ.get(\"COSMOS3_TRANSFORMERS_VENV\", COSMOS_ROOT / \".venv-cosmos3-transformers\")\n",
+    "    ).resolve()\n",
+    "    COSMOS3_TORCH_BACKEND = os.environ.get(\"COSMOS3_TORCH_BACKEND\", \"auto\")\n",
+    "\n",
+    "    os.environ[\"COSMOS3_TRANSFORMERS_VENV\"] = str(COSMOS3_TRANSFORMERS_VENV)\n",
+    "    os.environ[\"COSMOS3_TORCH_BACKEND\"] = COSMOS3_TORCH_BACKEND\n",
+    "    os.environ.setdefault(\"UV_LINK_MODE\", \"copy\")\n",
+    "\n",
+    "    assert COSMOS_REASONER_ASSETS.exists(), COSMOS_REASONER_ASSETS\n",
+    "\n",
+    "\n",
+    "def asset_path(name: str) -> Path:\n",
+    "    path = COSMOS_REASONER_ASSETS / name\n",
+    "    if not path.exists():\n",
+    "        raise FileNotFoundError(path)\n",
+    "    return path\n",
+    "\n",
+    "\n",
+    "configure_transformers_environment()\n",
+    "print(\"cosmos root:\", COSMOS_ROOT)\n",
+    "print(\"Reasoner assets:\", COSMOS_REASONER_ASSETS)\n",
+    "print(\"Transformers venv:\", COSMOS3_TRANSFORMERS_VENV)\n",
+    "print(\"Torch backend:\", COSMOS3_TORCH_BACKEND)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0ea0d464",
+   "metadata": {},
+   "source": [
+    "## 3. Install Transformers Dependencies\n",
+    "\n",
+    "Cosmos3 support first appears in the Transformers `v5.11.0` release tag. This cell\n",
+    "creates the venv, installs the dependencies, and registers a Jupyter kernel so the\n",
+    "model can run in process.\n",
+    "\n",
+    "`--torch-backend` defaults to `auto`, which lets uv pick a CUDA build of\n",
+    "`torch`/`torchvision` that matches your driver. Set `COSMOS3_TORCH_BACKEND=cu130`\n",
+    "(or `cu128`) above to pin an explicit wheel."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "34acbe10",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%bash\n",
+    "set -euo pipefail\n",
+    "\n",
+    "if ! command -v uv >/dev/null 2>&1; then\n",
+    "  echo \"uv is not installed. Install it first: https://docs.astral.sh/uv/getting-started/installation/\"\n",
+    "  exit 1\n",
+    "fi\n",
+    "\n",
+    "export UV_LINK_MODE=\"${UV_LINK_MODE:-copy}\"\n",
+    "uv venv \"$COSMOS3_TRANSFORMERS_VENV\" --python 3.13 --seed --managed-python --allow-existing\n",
+    "source \"$COSMOS3_TRANSFORMERS_VENV/bin/activate\"\n",
+    "\n",
+    "uv pip install --torch-backend=\"$COSMOS3_TORCH_BACKEND\" \\\n",
+    "  accelerate \\\n",
+    "  av \\\n",
+    "  ipykernel \\\n",
+    "  pillow \\\n",
+    "  \"safetensors>=0.8.0\" \\\n",
+    "  torch \\\n",
+    "  torchvision \\\n",
+    "  \"transformers>=5.11.0\"\n",
+    "\n",
+    "\"$COSMOS3_TRANSFORMERS_VENV/bin/python\" -m ipykernel install --user \\\n",
+    "  --name cosmos3-transformers \\\n",
+    "  --display-name \"Cosmos3 Transformers (Python 3.13)\"\n",
+    "\n",
+    "echo\n",
+    "echo \"Installed dependencies into: $COSMOS3_TRANSFORMERS_VENV\"\n",
+    "echo \"Next: switch this notebook kernel to: Cosmos3 Transformers (Python 3.13)\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1149a05a",
+   "metadata": {},
+   "source": [
+    "## 4. Select the Transformers Kernel\n",
+    "\n",
+    "The install cell registers the `Cosmos3 Transformers (Python 3.13)` Jupyter kernel.\n",
+    "\n",
+    "**Switch this notebook to that kernel before running the remaining Python cells**,\n",
+    "then run the Restore Environment cell immediately below. It can take a moment for\n",
+    "the new kernel to appear in the notebook interface."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3d209e12",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run this cell immediately after switching to the Cosmos3 Transformers kernel.\n",
+    "# It restores the same paths as the configure cell in step 2.\n",
+    "from pathlib import Path\n",
+    "import os\n",
+    "\n",
+    "\n",
+    "def find_repo_root(start: Path) -> Path:\n",
+    "    for path in [start, *start.parents]:\n",
+    "        if (path / \"README.md\").exists() and (path / \"cookbooks\").exists():\n",
+    "            return path\n",
+    "    return start\n",
+    "\n",
+    "\n",
+    "COSMOS_ROOT = find_repo_root(Path.cwd().resolve())\n",
+    "COSMOS_REASONER_ASSETS = COSMOS_ROOT / \"cookbooks\" / \"cosmos3\" / \"reasoner\" / \"assets\"\n",
+    "COSMOS3_TRANSFORMERS_VENV = Path(\n",
+    "    os.environ.get(\"COSMOS3_TRANSFORMERS_VENV\", COSMOS_ROOT / \".venv-cosmos3-transformers\")\n",
+    ").resolve()\n",
+    "os.environ[\"COSMOS3_TRANSFORMERS_VENV\"] = str(COSMOS3_TRANSFORMERS_VENV)\n",
+    "\n",
+    "\n",
+    "def asset_path(name: str) -> Path:\n",
+    "    path = COSMOS_REASONER_ASSETS / name\n",
+    "    if not path.exists():\n",
+    "        raise FileNotFoundError(path)\n",
+    "    return path\n",
+    "\n",
+    "\n",
+    "print(\"cosmos root:\", COSMOS_ROOT)\n",
+    "print(\"Reasoner assets:\", COSMOS_REASONER_ASSETS)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ab6a9c99",
+   "metadata": {},
+   "source": [
+    "## 5. Verify GPU and Python Environment"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "948b294f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import sys\n",
+    "from pathlib import Path\n",
+    "\n",
+    "if \"COSMOS3_TRANSFORMERS_VENV\" not in os.environ:\n",
+    "    raise RuntimeError(\"Run the Restore Environment cell after switching to the Transformers kernel.\")\n",
+    "\n",
+    "expected_python = (Path(os.environ[\"COSMOS3_TRANSFORMERS_VENV\"]) / \"bin\" / \"python\").resolve()\n",
+    "current_python = Path(sys.executable).resolve()\n",
+    "print(\"kernel python:\", current_python)\n",
+    "print(\"expected python:\", expected_python)\n",
+    "if current_python != expected_python:\n",
+    "    raise RuntimeError(\n",
+    "        \"This notebook is not running inside the Transformers venv. \"\n",
+    "        \"Switch the kernel to 'Cosmos3 Transformers (Python 3.13)', then run the Restore Environment cell above.\"\n",
+    "    )\n",
+    "\n",
+    "import torch\n",
+    "import transformers\n",
+    "\n",
+    "print(\"transformers:\", transformers.__version__)\n",
+    "print(\"torch:\", torch.__version__)\n",
+    "print(\"torch cuda:\", torch.version.cuda)\n",
+    "print(\"cuda available:\", torch.cuda.is_available())\n",
+    "print(\"device count:\", torch.cuda.device_count())\n",
+    "if torch.cuda.is_available():\n",
+    "    print(\"device 0:\", torch.cuda.get_device_name(0))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0e00b6d3",
+   "metadata": {},
+   "source": [
+    "## 6. Load the Reasoner\n",
+    "\n",
+    "Load the processor and model once, then reuse them for every request below.\n",
+    "\n",
+    "`device_map=\"auto\"` places the model on the available GPU(s) and can shard\n",
+    "`Cosmos3-Super` across multiple GPUs when Accelerate is installed.\n",
+    "\n",
+    "> **First run downloads the full unified checkpoint** (tens of GiB; ~28 GiB for\n",
+    "> Nano) even though only the Reasoner tower is loaded into memory (~17 GiB).\n",
+    "> Subsequent runs reuse the Hugging Face cache."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "937f9ede",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "from transformers import AutoProcessor, Cosmos3OmniForConditionalGeneration\n",
+    "\n",
+    "model_id = \"nvidia/Cosmos3-Nano\"  # or \"nvidia/Cosmos3-Super\"\n",
+    "\n",
+    "processor = AutoProcessor.from_pretrained(model_id)\n",
+    "model = Cosmos3OmniForConditionalGeneration.from_pretrained(\n",
+    "    model_id,\n",
+    "    dtype=torch.bfloat16,\n",
+    "    device_map=\"auto\",\n",
+    ")\n",
+    "\n",
+    "\n",
+    "def run_reasoner(content, fps=None, max_new_tokens=512):\n",
+    "    \"\"\"Run one Reasoner request. `content` is a chat content list (image/video/text blocks).\"\"\"\n",
+    "    messages = [{\"role\": \"user\", \"content\": content}]\n",
+    "    template_kwargs = dict(\n",
+    "        tokenize=True,\n",
+    "        add_generation_prompt=True,\n",
+    "        return_dict=True,\n",
+    "        return_tensors=\"pt\",\n",
+    "    )\n",
+    "    if fps is not None:\n",
+    "        template_kwargs[\"fps\"] = fps\n",
+    "\n",
+    "    inputs = processor.apply_chat_template(messages, **template_kwargs).to(model.device, torch.bfloat16)\n",
+    "    generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=max_new_tokens)\n",
+    "    generated_ids_trimmed = [\n",
+    "        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)\n",
+    "    ]\n",
+    "    return processor.batch_decode(\n",
+    "        generated_ids_trimmed,\n",
+    "        skip_special_tokens=True,\n",
+    "        clean_up_tokenization_spaces=False,\n",
+    "    )[0]\n",
+    "\n",
+    "\n",
+    "print(\"Loaded\", model_id)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0f80ec19",
+   "metadata": {},
+   "source": [
+    "## 7. Image Reasoning"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3cdd010c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from IPython.display import Image, display\n",
+    "\n",
+    "image_path = asset_path(\"robot_153.jpg\")\n",
+    "display(Image(filename=str(image_path), width=512))\n",
+    "\n",
+    "output = run_reasoner(\n",
+    "    [\n",
+    "        {\"type\": \"image\", \"path\": str(image_path.resolve())},\n",
+    "        {\"type\": \"text\", \"text\": \"Caption the image in detail.\"},\n",
+    "    ]\n",
+    ")\n",
+    "print(output)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "88cadada",
+   "metadata": {},
+   "source": [
+    "## 8. Video Reasoning\n",
+    "\n",
+    "Use a `video` content block and pass a frame sampling rate (`fps`) to the helper.\n",
+    "\n",
+    "> Video decoding uses the packages installed above. Transformers prints a\n",
+    "> deprecation warning that it fell back to the `torchvision` decoder — this is\n",
+    "> expected and harmless. To switch to the modern `torchcodec` decoder, install it\n",
+    "> along with system FFmpeg libraries (`libavutil`/`libavcodec`)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e7cab369",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from IPython.display import Video, display\n",
+    "\n",
+    "video_path = asset_path(\"video_caption.mp4\")\n",
+    "display(Video(str(video_path), embed=True, width=640))\n",
+    "\n",
+    "output = run_reasoner(\n",
+    "    [\n",
+    "        {\"type\": \"video\", \"path\": str(video_path.resolve())},\n",
+    "        {\"type\": \"text\", \"text\": \"Describe the notable events in this video.\"},\n",
+    "    ],\n",
+    "    fps=2,\n",
+    ")\n",
+    "print(output)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5d2a6b57",
+   "metadata": {},
+   "source": [
+    "## 9. Next Steps\n",
+    "\n",
+    "- Run **Cosmos3-Super**: set `model_id = \"nvidia/Cosmos3-Super\"` in step 6 and\n",
+    "  re-run from there. `device_map=\"auto\"` can shard it across multiple GPUs.\n",
+    "- Try other Reasoner tasks (temporal localization, grounding, embodied reasoning)\n",
+    "  by changing the prompt and asset — see the\n",
+    "  [Reasoner Prompt Guide](./reasoner_prompt_guide.md).\n",
+    "- Need an OpenAI-compatible server instead of in-process Python? See\n",
+    "  [`run_with_vllm.ipynb`](./run_with_vllm.ipynb) or [`run_with_nim.ipynb`](./run_with_nim.ipynb)."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

From bb5fbe9cccd13712274f32b9bd6ce61ce00bc466 Mon Sep 17 00:00:00 2001
From: Yang Fu <yangf@nvidia.com>
Date: Wed, 24 Jun 2026 18:38:02 -0700
Subject: [PATCH 2/2] docs: pin torchvision to 0.25.0 in Transformers notebook
 and README

---
 cookbooks/cosmos3/README.md                            | 2 +-
 cookbooks/cosmos3/reasoner/run_with_transformers.ipynb | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/cookbooks/cosmos3/README.md b/cookbooks/cosmos3/README.md
index 4c1f9fdc..a12f123d 100644
--- a/cookbooks/cosmos3/README.md
+++ b/cookbooks/cosmos3/README.md
@@ -180,7 +180,7 @@ uv pip install --torch-backend=auto \
   pillow \
   "safetensors>=0.8.0" \
   torch \
-  torchvision \
+  "torchvision==0.25.0" \
   "transformers>=5.11.0"
 ```
 
diff --git a/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb b/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb
index cccc482d..258a0ef6 100644
--- a/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb
+++ b/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb
@@ -168,7 +168,7 @@
     "  pillow \\\n",
     "  \"safetensors>=0.8.0\" \\\n",
     "  torch \\\n",
-    "  torchvision \\\n",
+    "  \"torchvision==0.25.0\" \\\n",
     "  \"transformers>=5.11.0\"\n",
     "\n",
     "\"$COSMOS3_TRANSFORMERS_VENV/bin/python\" -m ipykernel install --user \\\n",
@@ -426,7 +426,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
@@ -440,7 +440,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.3"
+   "version": "3.10.20"
   }
  },
  "nbformat": 4,