From e5b686e653f504001ab4b2fb22826e4bb5591c2f Mon Sep 17 00:00:00 2001
From: Yang Fu
Date: Wed, 24 Jun 2026 16:22:17 -0700
Subject: [PATCH 1/2] docs: add Cosmos3 Reasoner Transformers notebook

Add run_with_transformers.ipynb so all four Reasoner backends (Cosmos
Framework, vLLM, NIM, Transformers) ship a notebook. The notebook mirrors
the in-process Diffusers notebook flow: dedicated venv, registered
`Cosmos3 Transformers (Python 3.13)` kernel, then loads
Cosmos3OmniForConditionalGeneration in process and runs the image and
video examples via a small run_reasoner helper.

Also add the matching "Notebook walkthrough" subsection to the Run with
Transformers section of the reasoner README for parity with the other
backends.

Verified on a GB200 box: the notebook's model-load + image + video cells
execute end-to-end with correct outputs; structure passes nbformat.validate.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 cookbooks/cosmos3/reasoner/README.md     |  11 +
 .../reasoner/run_with_transformers.ipynb | 448 ++++++++++++++++++
 2 files changed, 459 insertions(+)
 create mode 100644 cookbooks/cosmos3/reasoner/run_with_transformers.ipynb

diff --git a/cookbooks/cosmos3/reasoner/README.md b/cookbooks/cosmos3/reasoner/README.md
index bdf14d2f..991de5f8 100644
--- a/cookbooks/cosmos3/reasoner/README.md
+++ b/cookbooks/cosmos3/reasoner/README.md
@@ -272,3 +272,14 @@ To run **Cosmos3-Super**, change `model_id` to `nvidia/Cosmos3-Super`.
 `device_map="auto"` can shard the model across multiple GPUs when Accelerate is installed.
 
 Use [vLLM](#run-with-vllm) or [NIM](#run-with-nim) when you need an OpenAI-compatible server instead of local Python inference.
+
+### Notebook walkthrough
+
+[`run_with_transformers.ipynb`](./run_with_transformers.ipynb) is the Python-first
+counterpart to the server notebooks: instead of launching a server, it creates an
+isolated venv, registers a `Cosmos3 Transformers (Python 3.13)` Jupyter kernel,
+and loads `Cosmos3OmniForConditionalGeneration` in process. A small
+`run_reasoner` helper wraps `apply_chat_template` + `generate`, and the notebook
+then runs the image and video examples shown above. To scale from **Nano** to
+**Super**, change only `model_id` in the load cell and re-run; `device_map="auto"`
+can shard Super across multiple GPUs.
diff --git a/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb b/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb
new file mode 100644
index 00000000..cccc482d
--- /dev/null
+++ b/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb
@@ -0,0 +1,448 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "458e701e",
+   "metadata": {},
+   "source": [
+    ""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "28202840",
+   "metadata": {},
+   "source": [
+    "# Cosmos3 Reasoner with Transformers\n",
+    "\n",
+    "This notebook runs Cosmos3 Reasoner inference directly with Hugging Face\n",
+    "Transformers: a Python-first path with no server to launch.\n",
+    "\n",
+    "1. Sets up an isolated venv with the Cosmos3 Transformers integration (`transformers>=5.11.0`).\n",
+    "2. Registers a Jupyter kernel and loads `Cosmos3OmniForConditionalGeneration` in process.\n",
+    "3. Runs image and video reasoning requests.\n",
+    "\n",
+    "The integration loads **only the Reasoner tower** from the unified `nvidia/Cosmos3-Nano`\n",
+    "(or `nvidia/Cosmos3-Super`) checkpoint and returns text for text, image, and video\n",
+    "understanding. It does not generate images, video, audio, or actions; use the\n",
+    "Diffusers or vLLM-Omni cookbooks for those.\n",
+    "\n",
+    "Note: if you have already completed steps 1-4 and installed the\n",
+    "`Cosmos3 Transformers (Python 3.13)` kernel, switch to that kernel, run the\n",
+    "Restore Environment cell in step 4, then continue from step 5."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "98a8252e",
+   "metadata": {},
+   "source": [
+    "## 1. Prerequisites\n",
+    "\n",
+    "Use a Linux machine with NVIDIA GPU access and model access on Hugging Face, then\n",
+    "either run `uvx hf@latest auth login` or set `HF_TOKEN`.\n",
+    "\n",
+    "> **Headless servers:** if you see an error like `libxcb.so.1: cannot open shared\n",
+    "> object file` when importing, install the required system libraries:\n",
+    ">\n",
+    "> ```bash\n",
+    "> apt-get install -y libxcb1 libgl1 libglib2.0-0\n",
+    "> ```\n",
+    "\n",
+    "> **uv version:** these notebooks need `uv >= 0.11.3`. Older versions do not\n",
+    "> recognize newer `--torch-backend` values such as `cu130`. Upgrade with\n",
+    "> `uv self update` if you hit version-related errors."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f9f3020e",
+   "metadata": {},
+   "source": [
+    "## 2. Configure Paths and Environment\n",
+    "\n",
+    "The defaults are relative to this `cosmos` checkout. To override any of them, export\n",
+    "the variables in the shell that launches Jupyter so both kernels inherit them:\n",
+    "\n",
+    "```bash\n",
+    "export COSMOS3_TRANSFORMERS_VENV=/path/to/.venv-cosmos3-transformers\n",
+    "export COSMOS3_TORCH_BACKEND=auto  # or cu130 / cu128 to pin an explicit CUDA wheel\n",
+    "export HF_HOME=/path/to/large/huggingface/cache\n",
+    "export UV_LINK_MODE=copy\n",
+    "export CUDA_VISIBLE_DEVICES=0\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "38384104",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pathlib import Path\n",
+    "import os\n",
+    "\n",
+    "\n",
+    "def find_repo_root(start: Path) -> Path:\n",
+    "    for path in [start, *start.parents]:\n",
+    "        if (path / \"README.md\").exists() and (path / \"cookbooks\").exists():\n",
+    "            return path\n",
+    "    return start\n",
+    "\n",
+    "\n",
+    "def configure_transformers_environment() -> None:\n",
+    "    global COSMOS_ROOT\n",
+    "    global COSMOS_REASONER_ASSETS\n",
+    "    global COSMOS3_TRANSFORMERS_VENV\n",
+    "    global COSMOS3_TORCH_BACKEND\n",
+    "\n",
+    "    COSMOS_ROOT = find_repo_root(Path.cwd().resolve())\n",
+    "    COSMOS_REASONER_ASSETS = COSMOS_ROOT / \"cookbooks\" / \"cosmos3\" / \"reasoner\" / \"assets\"\n",
+    "    COSMOS3_TRANSFORMERS_VENV = Path(\n",
+    "        os.environ.get(\"COSMOS3_TRANSFORMERS_VENV\", COSMOS_ROOT / \".venv-cosmos3-transformers\")\n",
+    "    ).resolve()\n",
+    "    COSMOS3_TORCH_BACKEND = os.environ.get(\"COSMOS3_TORCH_BACKEND\", \"auto\")\n",
+    "\n",
+    "    os.environ[\"COSMOS3_TRANSFORMERS_VENV\"] = str(COSMOS3_TRANSFORMERS_VENV)\n",
+    "    os.environ[\"COSMOS3_TORCH_BACKEND\"] = COSMOS3_TORCH_BACKEND\n",
+    "    os.environ.setdefault(\"UV_LINK_MODE\", \"copy\")\n",
+    "\n",
+    "    assert COSMOS_REASONER_ASSETS.exists(), COSMOS_REASONER_ASSETS\n",
+    "\n",
+    "\n",
+    "def asset_path(name: str) -> Path:\n",
+    "    path = COSMOS_REASONER_ASSETS / name\n",
+    "    if not path.exists():\n",
+    "        raise FileNotFoundError(path)\n",
+    "    return path\n",
+    "\n",
+    "\n",
+    "configure_transformers_environment()\n",
+    "print(\"cosmos root:\", COSMOS_ROOT)\n",
+    "print(\"Reasoner assets:\", COSMOS_REASONER_ASSETS)\n",
+    "print(\"Transformers venv:\", COSMOS3_TRANSFORMERS_VENV)\n",
+    "print(\"Torch backend:\", COSMOS3_TORCH_BACKEND)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0ea0d464",
+   "metadata": {},
+   "source": [
+    "## 3. Install Transformers Dependencies\n",
+    "\n",
+    "Cosmos3 support first appears in the Transformers `v5.11.0` release tag. This cell\n",
+    "creates the venv, installs the dependencies, and registers a Jupyter kernel so the\n",
+    "model can run in process.\n",
+    "\n",
+    "`--torch-backend` defaults to `auto`, which lets uv pick a CUDA build of\n",
+    "`torch`/`torchvision` that matches your driver. Set `COSMOS3_TORCH_BACKEND=cu130`\n",
+    "(or `cu128`) above to pin an explicit wheel."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "34acbe10",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%bash\n",
+    "set -euo pipefail\n",
+    "\n",
+    "if ! command -v uv >/dev/null 2>&1; then\n",
+    "  echo \"uv is not installed. Install it first: https://docs.astral.sh/uv/getting-started/installation/\"\n",
+    "  exit 1\n",
+    "fi\n",
+    "\n",
+    "export UV_LINK_MODE=\"${UV_LINK_MODE:-copy}\"\n",
+    "uv venv \"$COSMOS3_TRANSFORMERS_VENV\" --python 3.13 --seed --managed-python --allow-existing\n",
+    "source \"$COSMOS3_TRANSFORMERS_VENV/bin/activate\"\n",
+    "\n",
+    "uv pip install --torch-backend=\"$COSMOS3_TORCH_BACKEND\" \\\n",
+    "  accelerate \\\n",
+    "  av \\\n",
+    "  ipykernel \\\n",
+    "  pillow \\\n",
+    "  \"safetensors>=0.8.0\" \\\n",
+    "  torch \\\n",
+    "  torchvision \\\n",
+    "  \"transformers>=5.11.0\"\n",
+    "\n",
+    "\"$COSMOS3_TRANSFORMERS_VENV/bin/python\" -m ipykernel install --user \\\n",
+    "  --name cosmos3-transformers \\\n",
+    "  --display-name \"Cosmos3 Transformers (Python 3.13)\"\n",
+    "\n",
+    "echo\n",
+    "echo \"Installed dependencies into: $COSMOS3_TRANSFORMERS_VENV\"\n",
+    "echo \"Next: switch this notebook kernel to: Cosmos3 Transformers (Python 3.13)\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1149a05a",
+   "metadata": {},
+   "source": [
+    "## 4. Select the Transformers Kernel\n",
+    "\n",
+    "The install cell registers the `Cosmos3 Transformers (Python 3.13)` Jupyter kernel.\n",
+    "\n",
+    "**Switch this notebook to that kernel before running the remaining Python cells**,\n",
+    "then run the Restore Environment cell immediately below. It can take a moment for\n",
+    "the new kernel to appear in the notebook interface."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3d209e12",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run this cell immediately after switching to the Cosmos3 Transformers kernel.\n",
+    "# It restores the same paths as the configure cell in step 2.\n",
+    "from pathlib import Path\n",
+    "import os\n",
+    "\n",
+    "\n",
+    "def find_repo_root(start: Path) -> Path:\n",
+    "    for path in [start, *start.parents]:\n",
+    "        if (path / \"README.md\").exists() and (path / \"cookbooks\").exists():\n",
+    "            return path\n",
+    "    return start\n",
+    "\n",
+    "\n",
+    "COSMOS_ROOT = find_repo_root(Path.cwd().resolve())\n",
+    "COSMOS_REASONER_ASSETS = COSMOS_ROOT / \"cookbooks\" / \"cosmos3\" / \"reasoner\" / \"assets\"\n",
+    "COSMOS3_TRANSFORMERS_VENV = Path(\n",
+    "    os.environ.get(\"COSMOS3_TRANSFORMERS_VENV\", COSMOS_ROOT / \".venv-cosmos3-transformers\")\n",
+    ").resolve()\n",
+    "os.environ[\"COSMOS3_TRANSFORMERS_VENV\"] = str(COSMOS3_TRANSFORMERS_VENV)\n",
+    "\n",
+    "\n",
+    "def asset_path(name: str) -> Path:\n",
+    "    path = COSMOS_REASONER_ASSETS / name\n",
+    "    if not path.exists():\n",
+    "        raise FileNotFoundError(path)\n",
+    "    return path\n",
+    "\n",
+    "\n",
+    "print(\"cosmos root:\", COSMOS_ROOT)\n",
+    "print(\"Reasoner assets:\", COSMOS_REASONER_ASSETS)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ab6a9c99",
+   "metadata": {},
+   "source": [
+    "## 5. Verify GPU and Python Environment"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "948b294f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import sys\n",
+    "from pathlib import Path\n",
+    "\n",
+    "if \"COSMOS3_TRANSFORMERS_VENV\" not in os.environ:\n",
+    "    raise RuntimeError(\"Run the Restore Environment cell after switching to the Transformers kernel.\")\n",
+    "\n",
+    "expected_python = (Path(os.environ[\"COSMOS3_TRANSFORMERS_VENV\"]) / \"bin\" / \"python\").resolve()\n",
+    "current_python = Path(sys.executable).resolve()\n",
+    "print(\"kernel python:\", current_python)\n",
+    "print(\"expected python:\", expected_python)\n",
+    "if current_python != expected_python:\n",
+    "    raise RuntimeError(\n",
+    "        \"This notebook is not running inside the Transformers venv. \"\n",
+    "        \"Switch the kernel to 'Cosmos3 Transformers (Python 3.13)', then run the Restore Environment cell above.\"\n",
+    "    )\n",
+    "\n",
+    "import torch\n",
+    "import transformers\n",
+    "\n",
+    "print(\"transformers:\", transformers.__version__)\n",
+    "print(\"torch:\", torch.__version__)\n",
+    "print(\"torch cuda:\", torch.version.cuda)\n",
+    "print(\"cuda available:\", torch.cuda.is_available())\n",
+    "print(\"device count:\", torch.cuda.device_count())\n",
+    "if torch.cuda.is_available():\n",
+    "    print(\"device 0:\", torch.cuda.get_device_name(0))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0e00b6d3",
+   "metadata": {},
+   "source": [
+    "## 6. Load the Reasoner\n",
+    "\n",
+    "Load the processor and model once, then reuse them for every request below.\n",
+    "\n",
+    "`device_map=\"auto\"` places the model on the available GPU(s) and can shard\n",
+    "`Cosmos3-Super` across multiple GPUs when Accelerate is installed.\n",
+    "\n",
+    "> **First run downloads the full unified checkpoint** (tens of GiB; ~28 GiB for\n",
+    "> Nano) even though only the Reasoner tower is loaded into memory (~17 GiB).\n",
+    "> Subsequent runs reuse the Hugging Face cache."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "937f9ede",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "from transformers import AutoProcessor, Cosmos3OmniForConditionalGeneration\n",
+    "\n",
+    "model_id = \"nvidia/Cosmos3-Nano\"  # or \"nvidia/Cosmos3-Super\"\n",
+    "\n",
+    "processor = AutoProcessor.from_pretrained(model_id)\n",
+    "model = Cosmos3OmniForConditionalGeneration.from_pretrained(\n",
+    "    model_id,\n",
+    "    dtype=torch.bfloat16,\n",
+    "    device_map=\"auto\",\n",
+    ")\n",
+    "\n",
+    "\n",
+    "def run_reasoner(content, fps=None, max_new_tokens=512):\n",
+    "    \"\"\"Run one Reasoner request. `content` is a chat content list (image/video/text blocks).\"\"\"\n",
+    "    messages = [{\"role\": \"user\", \"content\": content}]\n",
+    "    template_kwargs = dict(\n",
+    "        tokenize=True,\n",
+    "        add_generation_prompt=True,\n",
+    "        return_dict=True,\n",
+    "        return_tensors=\"pt\",\n",
+    "    )\n",
+    "    if fps is not None:\n",
+    "        template_kwargs[\"fps\"] = fps\n",
+    "\n",
+    "    inputs = processor.apply_chat_template(messages, **template_kwargs).to(model.device, torch.bfloat16)\n",
+    "    generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=max_new_tokens)\n",
+    "    generated_ids_trimmed = [\n",
+    "        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)\n",
+    "    ]\n",
+    "    return processor.batch_decode(\n",
+    "        generated_ids_trimmed,\n",
+    "        skip_special_tokens=True,\n",
+    "        clean_up_tokenization_spaces=False,\n",
+    "    )[0]\n",
+    "\n",
+    "\n",
+    "print(\"Loaded\", model_id)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0f80ec19",
+   "metadata": {},
+   "source": [
+    "## 7. Image Reasoning"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3cdd010c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from IPython.display import Image, display\n",
+    "\n",
+    "image_path = asset_path(\"robot_153.jpg\")\n",
+    "display(Image(filename=str(image_path), width=512))\n",
+    "\n",
+    "output = run_reasoner(\n",
+    "    [\n",
+    "        {\"type\": \"image\", \"path\": str(image_path.resolve())},\n",
+    "        {\"type\": \"text\", \"text\": \"Caption the image in detail.\"},\n",
+    "    ]\n",
+    ")\n",
+    "print(output)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "88cadada",
+   "metadata": {},
+   "source": [
+    "## 8. Video Reasoning\n",
+    "\n",
+    "Use a `video` content block and pass a frame sampling rate (`fps`) to the helper.\n",
+    "\n",
+    "> Video decoding uses the packages installed above. Transformers prints a\n",
+    "> deprecation warning that it fell back to the `torchvision` decoder; this is\n",
+    "> expected and harmless. To switch to the modern `torchcodec` decoder, install it\n",
+    "> along with system FFmpeg libraries (`libavutil`/`libavcodec`)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e7cab369",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from IPython.display import Video, display\n",
+    "\n",
+    "video_path = asset_path(\"video_caption.mp4\")\n",
+    "display(Video(str(video_path), embed=True, width=640))\n",
+    "\n",
+    "output = run_reasoner(\n",
+    "    [\n",
+    "        {\"type\": \"video\", \"path\": str(video_path.resolve())},\n",
+    "        {\"type\": \"text\", \"text\": \"Describe the notable events in this video.\"},\n",
+    "    ],\n",
+    "    fps=2,\n",
+    ")\n",
+    "print(output)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5d2a6b57",
+   "metadata": {},
+   "source": [
+    "## 9. Next Steps\n",
+    "\n",
+    "- Run **Cosmos3-Super**: set `model_id = \"nvidia/Cosmos3-Super\"` in step 6 and\n",
+    "  re-run from there. `device_map=\"auto\"` can shard it across multiple GPUs.\n",
+    "- Try other Reasoner tasks (temporal localization, grounding, embodied reasoning)\n",
+    "  by changing the prompt and asset; see the\n",
+    "  [Reasoner Prompt Guide](./reasoner_prompt_guide.md).\n",
+    "- Need an OpenAI-compatible server instead of in-process Python? See\n",
+    "  [`run_with_vllm.ipynb`](./run_with_vllm.ipynb) or [`run_with_nim.ipynb`](./run_with_nim.ipynb)."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

From bb5fbe9cccd13712274f32b9bd6ce61ce00bc466 Mon Sep 17 00:00:00 2001
From: Yang Fu
Date: Wed, 24 Jun 2026 18:38:02 -0700
Subject: [PATCH 2/2] docs: pin torchvision to 0.25.0 in Transformers notebook and README

---
 cookbooks/cosmos3/README.md                            | 2 +-
 cookbooks/cosmos3/reasoner/run_with_transformers.ipynb | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/cookbooks/cosmos3/README.md b/cookbooks/cosmos3/README.md
index 4c1f9fdc..a12f123d 100644
--- a/cookbooks/cosmos3/README.md
+++ b/cookbooks/cosmos3/README.md
@@ -180,7 +180,7 @@ uv pip install --torch-backend=auto \
   pillow \
   "safetensors>=0.8.0" \
   torch \
-  torchvision \
+  "torchvision==0.25.0" \
   "transformers>=5.11.0"
 ```
 
diff --git a/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb b/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb
index cccc482d..258a0ef6 100644
--- a/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb
+++ b/cookbooks/cosmos3/reasoner/run_with_transformers.ipynb
@@ -168,7 +168,7 @@
     "  pillow \\\n",
     "  \"safetensors>=0.8.0\" \\\n",
     "  torch \\\n",
-    "  torchvision \\\n",
+    "  \"torchvision==0.25.0\" \\\n",
     "  \"transformers>=5.11.0\"\n",
     "\n",
     "\"$COSMOS3_TRANSFORMERS_VENV/bin/python\" -m ipykernel install --user \\\n",
@@ -426,7 +426,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
@@ -440,7 +440,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.3"
+   "version": "3.10.20"
   }
  },
  "nbformat": 4,