diff --git a/examples/deep_learning-computerVision/nvidia-cifar10-notebook/README.md b/examples/deep_learning-computerVision/nvidia-cifar10-notebook/README.md new file mode 100644 index 00000000..3ac09476 --- /dev/null +++ b/examples/deep_learning-computerVision/nvidia-cifar10-notebook/README.md @@ -0,0 +1,50 @@ +# PyTorch CIFAR10 Classifier Dashboard + +Train a ResNet18 deep learning model on the CIFAR10 image dataset using PyTorch with GPU acceleration and visualize results in an interactive dashboard. + +## Overview + +This example shows how you can use the power of a GPU to quickly train an image classification neural network in Saturn Cloud. This code runs on a single GPU of a Jupyter server resource. + +This is an example of a computer vision neural network which is trained on the CIFAR10 dataset to classify images into 10 categories: airplane, car, bird, cat, deer, dog, frog, horse, ship, and truck. The model uses a ResNet18 architecture which is especially good at image recognition tasks. ResNet uses "residual connections" that allow the network to learn complex patterns while avoiding the vanishing gradient problem. The results are displayed in an interactive dashboard that can be viewed in the notebook or deployed to Saturn Cloud for continuous hosting. + +## What is CIFAR10? + +CIFAR10 is a dataset of 60,000 32Γ—32 color images across 10 classes: +- 50,000 training images +- 10,000 test images +- 10 classes: airplane, car, bird, cat, deer, dog, frog, horse, ship, truck + +## What is ResNet18? + +ResNet18 is an 18-layer convolutional neural network that uses "residual connections" (skip connections) to enable training of deeper networks. These skip connections allow gradients to flow more easily during backpropagation, solving the vanishing gradient problem that plagued earlier deep networks. With approximately 11 million parameters, ResNet18 strikes an ideal balance between accuracy and computational efficiency for image classification tasks. + +## Requirements + +### Hardware +- **NVIDIA GPU (1Γ—)** - Required for training acceleration + +### Software +- Python 3.8+ +- PyTorch +- Torchvision +- Panel +- hvPlot +- Pandas +- NumPy +- Matplotlib + +## What This Template Does + +1. **Downloads** the CIFAR10 dataset automatically (~170MB) +2. **Applies** data augmentation (random flips and crops) to training images +3. **Trains** a ResNet18 model for 5 epochs on GPU +4. **Tracks** training loss and accuracy metrics +5. **Evaluates** model performance on test data after each epoch +6. **Visualizes** training curves showing loss and accuracy over time +7. **Displays** sample predictions with correct/incorrect labels +8. 
**Creates** an interactive dashboard with: + - Training and test accuracy curves + - Loss curve over epochs + - Sample prediction grid (16 images) + - Key performance indicators (KPIs) \ No newline at end of file diff --git a/examples/deep_learning-computerVision/nvidia-cifar10-notebook/pytorch-cifar10.ipynb b/examples/deep_learning-computerVision/nvidia-cifar10-notebook/pytorch-cifar10.ipynb new file mode 100644 index 00000000..f4bb7369 --- /dev/null +++ b/examples/deep_learning-computerVision/nvidia-cifar10-notebook/pytorch-cifar10.ipynb @@ -0,0 +1,433 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# PyTorch CIFAR10 Classifier\n", + "\n", + "![PyTorch Logo](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/pytorch-logo.png)\n", + "\n", + "This example demonstrates GPU-accelerated image classification using PyTorch on Saturn Cloud, running on GPU Jupyter server resource. \n", + "\n", + "The code trains a ResNet18 neural network on the CIFAR10 dataset to classify images into 10 categories: airplane, car, bird, cat, deer, dog, frog, horse, ship, and truck. ResNet18 uses \"residual connections\" that enable the network to learn complex visual patterns while avoiding the vanishing gradient problem.\n", + "\n", + "Leverage the use of multiple GPU on [Saturn Cloud](https://saturncloud.io/?utm_source=Blog+&utm_medium=Try&utm_campaign=Try) to effectively decrease model’s training time and handle larger datasets. [Check our blog](https://saturncloud.io/blog/how-to-use-multiple-gpus-in-pytorch/) on exploring the computational power of multiple GPUs in PyTorch." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Installing Dependencies\n", + "First, ensure all required packages are installed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -q -r requirements.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Import Libraries and Setup\n", + "This code mainly relies on PyTorch and Torchvision for the deep learning work." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "import torchvision\n", + "import torchvision.transforms as transforms\n", + "from torchvision.models import resnet18\n", + "import numpy as np\n", + "import pandas as pd\n", + "import hvplot.pandas\n", + "import panel as pn\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Enable Panel extension for dashboard\n", + "pn.extension()\n", + "\n", + "# Check if GPU is available\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "print(f\"Using device: {device}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Preparing Data\n", + "\n", + "This code is used to get the CIFAR10 dataset and format it properly for training, also define the image transformations. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Training transforms with data augmentation\n", + "transform_train = transforms.Compose([\n", + " transforms.RandomHorizontalFlip(),\n", + " transforms.RandomCrop(32, padding=4),\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),\n", + "])\n", + "\n", + "# Test transforms without augmentation\n", + "transform_test = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),\n", + "])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, download the CIFAR10 dataset and create data loaders. The dataset will be automatically downloaded on first run (~170MB)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Download and load training data\n", + "trainset = torchvision.datasets.CIFAR10(\n", + " root='./data', train=True, download=True, transform=transform_train\n", + ")\n", + "trainloader = torch.utils.data.DataLoader(\n", + " trainset, batch_size=128, shuffle=True, num_workers=2\n", + ")\n", + "\n", + "# Download and load test data\n", + "testset = torchvision.datasets.CIFAR10(\n", + " root='./data', train=False, download=True, transform=transform_test\n", + ")\n", + "testloader = torch.utils.data.DataLoader(\n", + " testset, batch_size=100, shuffle=False, num_workers=2\n", + ")\n", + "\n", + "# Class names for visualization\n", + "classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')\n", + "\n", + "print(f\"Training samples: {len(trainset)}\")\n", + "print(f\"Test samples: {len(testset)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Model Architecture\n", + "\n", + "This code defines the ResNet18 structure that the neural network will use. ResNet18 is a convolutional neural network with 18 layers that uses residual connections to enable training of deeper networks." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load ResNet18 and modify for CIFAR10 (10 classes)\n", + "model = resnet18(weights=None, num_classes=10)\n", + "model = model.to(device)\n", + "\n", + "# Define loss function and optimizer\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.Adam(model.parameters(), lr=0.001)\n", + "\n", + "print(\"Model initialized and ready for training\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Train the Model\n", + "\n", + "We define a `train()` function that will do the work to train the neural network. This function trains the model for a specified number of epochs and tracks both training and test accuracy. It uses the GPU via `device` for acceleration." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def train(model, trainloader, testloader, criterion, optimizer, device, num_epochs=5):\n", + " train_losses = []\n", + " train_accuracies = []\n", + " test_accuracies = []\n", + " \n", + " print(\"Starting training...\\n\")\n", + " \n", + " for epoch in range(num_epochs):\n", + " model.train()\n", + " running_loss = 0.0\n", + " correct = 0\n", + " total = 0\n", + " \n", + " # Training loop\n", + " for i, (inputs, labels) in enumerate(trainloader):\n", + " inputs, labels = inputs.to(device), labels.to(device)\n", + " \n", + " optimizer.zero_grad()\n", + " outputs = model(inputs)\n", + " loss = criterion(outputs, labels)\n", + " loss.backward()\n", + " optimizer.step()\n", + " \n", + " running_loss += loss.item()\n", + " _, predicted = outputs.max(1)\n", + " total += labels.size(0)\n", + " correct += predicted.eq(labels).sum().item()\n", + " \n", + " # Calculate training metrics\n", + " epoch_loss = running_loss / len(trainloader)\n", + " epoch_acc = 100. * correct / total\n", + " train_losses.append(epoch_loss)\n", + " train_accuracies.append(epoch_acc)\n", + " \n", + " # Evaluate on test set\n", + " model.eval()\n", + " test_correct = 0\n", + " test_total = 0\n", + " \n", + " with torch.no_grad():\n", + " for inputs, labels in testloader:\n", + " inputs, labels = inputs.to(device), labels.to(device)\n", + " outputs = model(inputs)\n", + " _, predicted = outputs.max(1)\n", + " test_total += labels.size(0)\n", + " test_correct += predicted.eq(labels).sum().item()\n", + " \n", + " test_acc = 100. * test_correct / test_total\n", + " test_accuracies.append(test_acc)\n", + " \n", + " print(f'Epoch [{epoch+1}/{num_epochs}] '\n", + " f'Loss: {epoch_loss:.3f} | '\n", + " f'Train Acc: {epoch_acc:.2f}% | '\n", + " f'Test Acc: {test_acc:.2f}%')\n", + " \n", + " print(\"\\nTraining completed!\")\n", + " return train_losses, train_accuracies, test_accuracies" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next block of code actually runs the training function and creates the trained model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "num_epochs = 5\n", + "train_losses, train_accuracies, test_accuracies = train(\n", + " model, trainloader, testloader, criterion, optimizer, device, num_epochs\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Visualizing Results\n", + "### Generate Training Curves\n", + "To visualize the training progress, this create interactive plots showing loss and accuracy over time." 
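+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before plotting the curves, the next cell builds a small grid of sample test predictions and stores it in a Matplotlib figure named `fig`, which the dashboard cell further below expects. This is a minimal sketch that reuses the `model`, `testloader`, and `classes` objects defined above; the 4x4 grid size is an illustrative choice."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Build a 4x4 grid of sample test images with predicted and true labels\n",
+    "model.eval()\n",
+    "images, labels = next(iter(testloader))\n",
+    "with torch.no_grad():\n",
+    "    preds = model(images.to(device)).argmax(dim=1).cpu()\n",
+    "\n",
+    "# Undo the normalization so the images display with natural colors\n",
+    "mean = np.array([0.4914, 0.4822, 0.4465])\n",
+    "std = np.array([0.2023, 0.1994, 0.2010])\n",
+    "\n",
+    "fig, axes = plt.subplots(4, 4, figsize=(8, 8))\n",
+    "for ax, img, pred, true in zip(axes.flat, images, preds, labels):\n",
+    "    img = np.clip(img.numpy().transpose(1, 2, 0) * std + mean, 0, 1)\n",
+    "    ax.imshow(img)\n",
+    "    correct = int(pred) == int(true)\n",
+    "    ax.set_title(f'{classes[int(pred)]} ({classes[int(true)]})', fontsize=8,\n",
+    "                 color='green' if correct else 'red')\n",
+    "    ax.axis('off')\n",
+    "fig.suptitle('Sample Predictions: predicted (true)')\n",
+    "plt.tight_layout()\n",
+    "plt.show()"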
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create dataframe with training metrics\n", + "curves_data = pd.DataFrame({\n", + " 'Epoch': range(1, num_epochs + 1),\n", + " 'Training Loss': train_losses,\n", + " 'Training Accuracy': train_accuracies,\n", + " 'Test Accuracy': test_accuracies\n", + "})\n", + "\n", + "# Create loss curve\n", + "loss_curve = curves_data.hvplot.line(\n", + " x='Epoch', \n", + " y='Training Loss', \n", + " title='Training Loss Over Time',\n", + " color='#e74c3c',\n", + " line_width=3,\n", + " width=600,\n", + " height=300\n", + ")\n", + "\n", + "# Create accuracy curves\n", + "acc_curve = curves_data.hvplot.line(\n", + " x='Epoch', \n", + " y=['Training Accuracy', 'Test Accuracy'],\n", + " title='Accuracy Over Time',\n", + " line_width=3,\n", + " width=600,\n", + " height=300,\n", + " legend='top_left'\n", + ")\n", + "\n", + "# Display curves\n", + "pn.Column(loss_curve, acc_curve)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Interactive Dashboard\n", + "\n", + "Finally, we can create an interactive dashboard that displays all results in one place. This dashboard can be deployed to Saturn Cloud for continuous hosting." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Helper function to create KPI boxes\n", + "def kpi_box(title, color, value, unit=\"\"):\n", + " return pn.pane.Markdown(\n", + " f\"\"\"\n", + " ### {title}\n", + " # {value}{unit}\n", + " \"\"\",\n", + " styles={\n", + " \"background-color\": \"#F6F6F6\",\n", + " \"border\": f\"2px solid {color}\",\n", + " \"border-radius\": \"5px\",\n", + " \"padding\": \"10px\",\n", + " \"color\": color,\n", + " },\n", + " )\n", + "\n", + "# Final metric values\n", + "final_test_acc = test_accuracies[-1]\n", + "final_train_acc = train_accuracies[-1]\n", + "\n", + "# Create KPI boxes\n", + "test_acc_kpi = kpi_box(\"Final Test Accuracy\", \"#27ae60\", f\"{final_test_acc:.2f}\", \"%\")\n", + "train_acc_kpi = kpi_box(\"Final Train Accuracy\", \"#3498db\", f\"{final_train_acc:.2f}\", \"%\")\n", + "epochs_kpi = kpi_box(\"Epochs Trained\", \"#9b59b6\", num_epochs)\n", + "\n", + "# Dashboard introduction content\n", + "dashboard_intro = \"\"\"\n", + "# CIFAR10 Image Classification Dashboard\n", + "\n", + "This dashboard shows the results of training a **ResNet18** model on the CIFAR10 dataset using GPU acceleration.\n", + "The model was trained to classify images into 10 categories: plane, car, bird, cat, deer, dog, frog, horse, ship, and truck.\n", + "\n", + "## About the Model\n", + "\n", + "- **Architecture:** ResNet18 (18-layer residual network)\n", + "- **Dataset:** CIFAR10 (60,000 images, 10 classes)\n", + "- **Training:** GPU-accelerated with PyTorch\n", + "\"\"\"\n", + "\n", + "# Deployment information\n", + "about_deployment = \"\"\"\n", + "## Deploying on Saturn Cloud\n", + "\n", + "This dashboard can be deployed to Saturn Cloud for continuous hosting, allowing users without notebook access \n", + "to view the results. The model training leverages **NVIDIA GPU acceleration** for faster training times. 
\n", + "Learn more about GPU deployments in the [Saturn Cloud documentation](https://saturncloud.io/docs/).\n", + "\"\"\"\n", + "\n", + "# Convert Matplotlib figure to Panel pane\n", + "image_grid = pn.pane.Matplotlib(fig, tight=True)\n", + "\n", + "# Create dashboard layout using GridSpec\n", + "dashboard = pn.GridSpec(\n", + " name=\"CIFAR10 Dashboard\",\n", + " sizing_mode=\"stretch_both\",\n", + " min_width=900,\n", + " min_height=800\n", + ")\n", + "\n", + "# Layout configuration\n", + "dashboard[0:3, 0:2] = pn.Column(dashboard_intro, about_deployment)\n", + "dashboard[0, 2] = test_acc_kpi\n", + "dashboard[0, 3] = train_acc_kpi\n", + "dashboard[0, 4] = epochs_kpi\n", + "dashboard[1:3, 2:5] = loss_curve\n", + "dashboard[3:5, 0:5] = acc_curve\n", + "dashboard[5:9, 0:5] = image_grid\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Display the dashboard\n", + "dashboard" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you wanted to experiment with trying many different hyperparameters for the model, you could concurrently train models with different hyperparameters using [distributed computing](https://github.com/saturncloud/examples/blob/54589b72f63737afd2ccce5a38b7feac86b17c96/examples/pytorch/02-pytorch-gpu-dask-multiple-models.ipynb). You could also train a single neural network over many GPUs at once with distributed computing via PyTorch's DistributedDataParallel or Dask." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/examples/deep_learning-computerVision/nvidia-cifar10-notebook/requirements.txt b/examples/deep_learning-computerVision/nvidia-cifar10-notebook/requirements.txt new file mode 100644 index 00000000..07808a4a --- /dev/null +++ b/examples/deep_learning-computerVision/nvidia-cifar10-notebook/requirements.txt @@ -0,0 +1,21 @@ +# Deep Learning Framework +torch>=2.0.0 +torchvision>=0.15.0 + +# Dashboard and Visualization +panel>=1.2.0 +hvplot>=0.8.0 +holoviews>=1.16.0 +bokeh>=3.1.0 + +# Data Processing +pandas>=2.0.0 +numpy>=1.24.0 + +# Image Processing and Plotting +matplotlib>=3.7.0 +Pillow>=9.5.0 + +# Jupyter Support (optional, for notebook rendering) +jupyter>=1.0.0 +notebook>=6.5.0 \ No newline at end of file diff --git a/examples/deep_learning-computerVision/nvidia-fastrcnn_object_detection/README.md b/examples/deep_learning-computerVision/nvidia-fastrcnn_object_detection/README.md new file mode 100644 index 00000000..c30d7186 --- /dev/null +++ b/examples/deep_learning-computerVision/nvidia-fastrcnn_object_detection/README.md @@ -0,0 +1,98 @@ +# πŸͺ Saturn Cloud Template: Object Detection with Faster R-CNN + +This template provides a ready-to-run **object detection project** built for **Saturn Cloud**. +It uses a pre-trained **Faster R-CNN** model to detect common objects in images and visualize results β€” all powered by **GPU acceleration**. + +Use this template as a **fast start** for your own computer vision or image analysis projects on Saturn Cloud. 
+ +--- + +## 🧠 What This Template Does + +- Load and analyze images from **local paths** or **URLs** +- Detect objects using a pre-trained **Faster R-CNN** model +- Display bounding boxes and confidence scores +- Run interactively from a terminal or Jupyter Notebook +- Easily extend to **custom training, datasets, or scaling** with Saturn Cloud’s GPU clusters + +--- + +## βš™οΈ Saturn Cloud Environment Setup + +This template is pre-configured for **Saturn Cloud GPU environments**. +You can run it immediately on a GPU-backed resource β€” no setup required beyond installing dependencies. + +### Default Environment +- **Image**: `saturncloud/pytorch:latest` +- **Hardware**: GPU instance (recommended: 1Γ— NVIDIA T4 or A10G) +- **Python**: 3.10+ +- **Memory**: 8GB+ + +### Dependencies (from `requirements.txt`) +``` + +torch +torchvision +matplotlib +pillow +requests + +```` + +To reproduce the environment manually: + +```bash +pip install -r requirements.txt +```` + +--- + +## πŸš€ Quickstart (in Saturn Cloud) + +1. **Launch this template** in your Saturn Cloud workspace: + + * Go to [Saturn Cloud](https://saturncloud.io/) + * Click **New Project β†’ From Template** + * Choose **Object Detection with Faster R-CNN** + +2. **Open the Jupyter notebook and run all the code cells**. + +3. When prompted, enter an image path or URL. + You can test with this example URL: + + ``` + https://plus.unsplash.com/premium_photo-1667030489905-d8e6309ebe0e?ixlib=rb-4.1.0&auto=format&fit=crop&q=60&w=200 + ``` + + Output: + + ``` + πŸ“‘ Downloading image from URL... + βœ… Image downloaded successfully + 🎯 Detected 3 objects (threshold: 0.5): + 1. Person: 99.3% + 2. Dog: 97.1% + 3. Chair: 88.4% + ``` + +4. A visualization window will display the bounding boxes drawn over the detected objects. + +--- + +## 🧩 Core Components + +### `detect_in_uploaded_image(image_input, threshold=0.5)` + +Detects objects in an image (from a local file or URL) using the pre-trained model. +Returns the bounding boxes, labels, and confidence scores. + +--- + +## πŸ“š References + +* [Saturn Cloud Examples Repository](https://github.com/saturncloud/examples) +* [Faster R-CNN Model Implementation](https://github.com/trzy/FasterRCNN) +* [COCO Dataset Classes](https://cocodataset.org/#home) +* [Saturn Cloud Documentation](https://saturncloud.io/docs/) + + diff --git a/examples/deep_learning-computerVision/nvidia-fastrcnn_object_detection/object-detection_torchvision.ipynb b/examples/deep_learning-computerVision/nvidia-fastrcnn_object_detection/object-detection_torchvision.ipynb new file mode 100644 index 00000000..3c016c90 --- /dev/null +++ b/examples/deep_learning-computerVision/nvidia-fastrcnn_object_detection/object-detection_torchvision.ipynb @@ -0,0 +1,510 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Em9lhruBVWh8" + }, + "source": [ + "# Object Detection with Faster R-CNN\n", + "\n", + "Faster R-CNN is a powerful object detection model used by developers across industries for tasks like surveillance, autonomous driving, and image analysis. In this example, you'll detect objects in images using a pre-trained Faster R-CNN model. The model identifies objects like people, cars, animals, and furniture, drawing bounding boxes around each detection with confidence scores.\n", + "\n", + "The model is pre-trained on the COCO dataset, which contains 80 common object categories including person, car, dog, cat, bicycle, and many more. 
It uses a two-stage detection process: first identifying potential object regions, then classifying what each object is. The model is approximately 160MB and will be downloaded from PyTorch on first run." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oaMnDyq9VWh_" + }, + "source": [ + "The code below installs the necessary packages. TorchVision provides the pre-trained Faster R-CNN model and detection utilities:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Ycs51IFnVWiA" + }, + "outputs": [], + "source": [ + "!pip install -q torch torchvision pillow matplotlib numpy" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tHwIFOlsVWiC" + }, + "source": [ + "In the code below, we import the required libraries. TorchVision handles the model and image transformations, while Matplotlib will be used for visualizing the detection results:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "apH_UXTlVWiD" + }, + "outputs": [], + "source": [ + "import torch\n", + "import torchvision\n", + "from torchvision import transforms\n", + "from torchvision.models.detection import fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights\n", + "from PIL import Image\n", + "import matplotlib.pyplot as plt\n", + "import matplotlib.patches as patches\n", + "import numpy as np\n", + "import urllib.request\n", + "from io import BytesIO\n", + "import requests\n", + "import os\n", + "\n", + "# Check if GPU is available\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "print(f\"Using device: {device}\")\n", + "\n", + "if device == \"cpu\":\n", + " print(\"⚠️ Warning: Running on CPU. Detection will be slower.\")\n", + " print(\" For best results, use a GPU-enabled resource.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lGYoPryFVWiD" + }, + "source": [ + "The code below loads the pre-trained Faster R-CNN model with a ResNet-50 backbone. The model will be downloaded automatically on first run and cached for future use. We set the model to evaluation mode since we're only doing inference, not training:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f2PYxvGEVWiE" + }, + "outputs": [], + "source": [ + "print(\"Loading Faster R-CNN model...\")\n", + "print(\"(First run may take a moment to download the model)\\n\")\n", + "\n", + "# Load pre-trained model with updated weights API\n", + "weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT\n", + "model = fasterrcnn_resnet50_fpn(weights=weights)\n", + "model.eval()\n", + "model = model.to(device)\n", + "\n", + "print(\"βœ“ Model loaded successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VFu2VuXTVWiF" + }, + "source": [ + "The COCO dataset contains 80 object categories. 
In the code below, we define the class names that correspond to the model's predictions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "u-_gaVM0VWiF" + }, + "outputs": [], + "source": [ + "# COCO class names (91 total, but only 80 are used)\n", + "COCO_CLASSES = [\n", + " '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',\n", + " 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',\n", + " 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',\n", + " 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',\n", + " 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',\n", + " 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',\n", + " 'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',\n", + " 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',\n", + " 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',\n", + " 'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',\n", + " 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',\n", + " 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'\n", + "]\n", + "\n", + "print(f\"Model can detect {len([c for c in COCO_CLASSES if c != 'N/A'])} object categories\")\n", + "print(\"\\nSample categories:\", COCO_CLASSES[1:11])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WqMet0x3VWiH" + }, + "source": [ + "## Object Detection\n", + "\n", + "The code below defines our detection function. It takes an image, runs it through the model, and returns the detected objects with their locations (bounding boxes), class labels, and confidence scores. We filter out low-confidence detections using a threshold:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "uMC_rAH6VWiH" + }, + "outputs": [], + "source": [ + "def detect_objects(image, model, threshold=0.5):\n", + " \"\"\"\n", + " Detect objects in an image.\n", + "\n", + " Args:\n", + " image: PIL Image or numpy array\n", + " model: Faster R-CNN model\n", + " threshold: Confidence threshold (0-1)\n", + "\n", + " Returns:\n", + " boxes: Bounding box coordinates [x1, y1, x2, y2]\n", + " labels: Class IDs for each detection\n", + " scores: Confidence scores for each detection\n", + " \"\"\"\n", + " # Convert to PIL Image if needed\n", + " if isinstance(image, np.ndarray):\n", + " image = Image.fromarray(image)\n", + "\n", + " # Transform image to tensor\n", + " transform = transforms.Compose([transforms.ToTensor()])\n", + " image_tensor = transform(image).unsqueeze(0).to(device)\n", + "\n", + " # Run detection\n", + " with torch.no_grad():\n", + " predictions = model(image_tensor)\n", + "\n", + " # Extract predictions\n", + " pred = predictions[0]\n", + "\n", + " # Filter by threshold\n", + " mask = pred['scores'] > threshold\n", + " boxes = pred['boxes'][mask].cpu().numpy()\n", + " labels = pred['labels'][mask].cpu().numpy()\n", + " scores = pred['scores'][mask].cpu().numpy()\n", + "\n", + " return boxes, labels, scores" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bWovqkzrVWiI" + }, + "source": [ + "Now let's create a function to visualize the detections. 
The code below draws bounding boxes around detected objects, adds labels with confidence scores, and uses different colors for better visibility:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2yB44niFVWiJ" + }, + "outputs": [], + "source": [ + "def visualize_detections(image, boxes, labels, scores, class_names, threshold=0.5):\n", + " \"\"\"\n", + " Draw bounding boxes on image with labels and scores.\n", + "\n", + " Args:\n", + " image: PIL Image or numpy array\n", + " boxes: Array of bounding boxes [x1, y1, x2, y2]\n", + " labels: Array of class IDs\n", + " scores: Array of confidence scores\n", + " class_names: List of class names\n", + " threshold: Only show detections above this confidence\n", + " \"\"\"\n", + " # Convert to numpy if PIL Image\n", + " if isinstance(image, Image.Image):\n", + " image = np.array(image)\n", + "\n", + " fig, ax = plt.subplots(1, figsize=(12, 8))\n", + " ax.imshow(image)\n", + "\n", + " # Color palette for different object classes\n", + " colors = plt.cm.hsv(np.linspace(0, 1, len(class_names)))\n", + "\n", + " for box, label, score in zip(boxes, labels, scores):\n", + " if score < threshold:\n", + " continue\n", + "\n", + " x1, y1, x2, y2 = box\n", + " width = x2 - x1\n", + " height = y2 - y1\n", + "\n", + " # Select color based on class\n", + " color = colors[label % len(colors)]\n", + "\n", + " # Draw rectangle\n", + " rect = patches.Rectangle(\n", + " (x1, y1), width, height,\n", + " linewidth=2, edgecolor=color, facecolor='none'\n", + " )\n", + " ax.add_patch(rect)\n", + "\n", + " # Add label with background\n", + " label_text = f\"{class_names[label]}: {score:.2f}\"\n", + " ax.text(\n", + " x1, y1 - 5, label_text,\n", + " bbox=dict(facecolor=color, alpha=0.7, edgecolor='none', pad=2),\n", + " color='white', fontsize=10, weight='bold'\n", + " )\n", + "\n", + " ax.axis('off')\n", + " plt.tight_layout()\n", + " return fig" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N3DLJhOVVWiJ" + }, + "source": [ + "## Test with Sample Images\n", + "\n", + "Let's test our object detection on some sample images. The code below downloads a few test images from the web and runs detection on them. You'll see bounding boxes appear around detected objects!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xb3SlHsRVWiJ" + }, + "outputs": [], + "source": [ + "# Sample image URLs\n", + "sample_images = [\n", + " \"https://images.unsplash.com/photo-1543466835-00a7907e9de1\", # Dog\n", + " \"https://images.unsplash.com/photo-1507003211169-0a1dd7228f2d\", # Person\n", + " \"https://images.unsplash.com/photo-1549399542-7e3f8b79c341\", # car\n", + " \"https://images.pexels.com/photos/1133957/pexels-photo-1133957.jpeg\", # Bird\n", + "]\n", + "\n", + "print(\"Downloading and processing sample images...\\n\")\n", + "\n", + "for idx, url in enumerate(sample_images):\n", + " try:\n", + " print(f\"Processing image {idx + 1}/{len(sample_images)}...\")\n", + "\n", + " # Use requests with headers\n", + " headers = {\n", + " 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'\n", + " }\n", + "\n", + " response = requests.get(url, headers=headers, timeout=10)\n", + " response.raise_for_status()\n", + "\n", + " # Open image from memory\n", + " image = Image.open(BytesIO(response.content)).convert(\"RGB\")\n", + "\n", + " # Detect objects\n", + " boxes, labels, scores = detect_objects(image, model, threshold=0.5)\n", + "\n", + " print(f\" Found {len(boxes)} objects\")\n", + "\n", + " # Show detected object types\n", + " detected_classes = [COCO_CLASSES[label] for label in labels]\n", + " print(f\" Detected: {', '.join(detected_classes)}\\n\")\n", + "\n", + " # Visualize\n", + " fig = visualize_detections(image, boxes, labels, scores, COCO_CLASSES)\n", + " plt.show()\n", + "\n", + " except Exception as e:\n", + " print(f\" Error processing image {url}: {e}\\n\")\n", + "\n", + "print(\"βœ“ Sample detection complete!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PrdZCLcoVWiK" + }, + "source": [ + "## Upload Your Own Images\n", + "\n", + "Now you can try the detector on your own images! 
The code below allows you to upload an image using path directory or URL of the image and see what objects are detected:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OgqoO5gYv1bq" + }, + "outputs": [], + "source": [ + "def detect_in_uploaded_image(image_input, threshold=0.5):\n", + " \"\"\"\n", + " Detect objects in an uploaded image from either local path or URL.\n", + "\n", + " Args:\n", + " image_input: Can be either:\n", + " - Local file path (e.g., '/content/image.jpg')\n", + " - Image URL (e.g., 'https://example.com/image.jpg')\n", + " threshold: Confidence threshold for detection (0.0 to 1.0)\n", + "\n", + " Returns:\n", + " boxes, labels, scores: Detection results\n", + " \"\"\"\n", + " try:\n", + " # Check if input is a URL\n", + " if image_input.startswith(('http://', 'https://')):\n", + " print(f\"πŸ“‘ Downloading image from URL: {image_input}\")\n", + "\n", + " # Download image from URL with headers\n", + " headers = {\n", + " 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'\n", + " }\n", + " response = requests.get(image_input, headers=headers, timeout=10)\n", + " response.raise_for_status()\n", + "\n", + " # Open image from memory\n", + " image = Image.open(BytesIO(response.content)).convert(\"RGB\")\n", + " print(\"βœ… Image downloaded successfully\")\n", + "\n", + " else:\n", + " # Treat as local file path\n", + " print(f\"πŸ“ Loading image from path: {image_input}\")\n", + "\n", + " # Check if file exists\n", + " if not os.path.exists(image_input):\n", + " raise FileNotFoundError(f\"Image file not found: {image_input}\")\n", + "\n", + " image = Image.open(image_input).convert(\"RGB\")\n", + " print(\"βœ… Image loaded successfully\")\n", + "\n", + " # Display original image\n", + " print(f\"πŸ“Š Image size: {image.size}\")\n", + "\n", + " # Detect objects\n", + " boxes, labels, scores = detect_objects(image, model, threshold=threshold)\n", + "\n", + " # Display results\n", + " print(f\"\\n🎯 Detected {len(boxes)} objects (threshold: {threshold}):\")\n", + " for i, (label, score) in enumerate(zip(labels, scores)):\n", + " print(f\" {i+1}. 
{COCO_CLASSES[label]}: {score:.1%} confidence\")\n",
+    "\n",
+    "        # Visualize detections\n",
+    "        fig = visualize_detections(image, boxes, labels, scores, COCO_CLASSES, threshold)\n",
+    "        plt.show()\n",
+    "\n",
+    "        return boxes, labels, scores\n",
+    "\n",
+    "    except Exception as e:\n",
+    "        print(f\"❌ Error: {e}\")\n",
+    "        return None, None, None\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "JvxNXqMvEKdB"
+   },
+   "source": [
+    "The code below creates an interactive prompt for the function above: enter an image path or URL and a confidence threshold to run detection."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "WnsxeHbjvz6J"
+   },
+   "outputs": [],
+   "source": [
+    "def interactive_detection():\n",
+    "    \"\"\"\n",
+    "    Interactive version that asks for input\n",
+    "    \"\"\"\n",
+    "    print(\"πŸ€– Object Detection - Enter your image source\")\n",
+    "    print(\"=\" * 50)\n",
+    "\n",
+    "    # Get image source\n",
+    "    image_source = input(\"Enter image path or URL: \").strip()\n",
+    "\n",
+    "    # Get threshold with default value\n",
+    "    threshold_input = input(\"Confidence threshold (0.1-1.0) [default: 0.5]: \").strip()\n",
+    "    threshold = float(threshold_input) if threshold_input else 0.5\n",
+    "\n",
+    "    print(\"\\n\" + \"=\" * 50)\n",
+    "\n",
+    "    # Run detection\n",
+    "    return detect_in_uploaded_image(image_source, threshold)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "PMO7Qmn5ET9P"
+   },
+   "source": [
+    "You can run detection on your own image here by providing its path or URL. If you don't have an image ready, try this link: \"https://plus.unsplash.com/premium_photo-1667030489905-d8e6309ebe0e?ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1pbi1zYW1lLXNlcmllc3wxfHx8ZW58MHx8fHx8&auto=format&fit=crop&q=60&w=200\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "M-5xPS2_2Wfr"
+   },
+   "outputs": [],
+   "source": [
+    "interactive_detection()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "Jd-izSyaVWiN"
+   },
+   "source": [
+    "## Conclusion\n",
+    "\n",
+    "We have now set up [Faster R-CNN](https://github.com/trzy/FasterRCNN) for object detection on a GPU! The model can detect 80 different object types in images, drawing bounding boxes around each detection with confidence scores. Once you've got this working, you have all the tools you need to build object detection applications on Saturn Cloud.\n",
+    "\n",
+    "If you wanted to process images at scale or work with video streams, you could distribute the workload across multiple GPUs using distributed computing, which you can set up on [Saturn Cloud](https://app.community.saturnenterprise.io/auth/hosted-registration). You could also fine-tune the model on custom datasets to detect specialized objects specific to your use case."
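+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a starting point for that kind of fine-tuning, torchvision lets you swap the detection head for one sized to your own classes. The sketch below is illustrative only: `num_classes` is a placeholder, and you would still need a labeled dataset and a training loop.\n",
+    "\n",
+    "```python\n",
+    "from torchvision.models.detection.faster_rcnn import FastRCNNPredictor\n",
+    "\n",
+    "num_classes = 3  # placeholder: background + 2 custom classes\n",
+    "in_features = model.roi_heads.box_predictor.cls_score.in_features\n",
+    "model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)\n",
+    "# the modified model can then be trained on your own dataset\n",
+    "```"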
+ ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/examples/deep_learning-computerVision/nvidia-sd-gradio/README.md b/examples/deep_learning-computerVision/nvidia-sd-gradio/README.md new file mode 100644 index 00000000..43ecb51f --- /dev/null +++ b/examples/deep_learning-computerVision/nvidia-sd-gradio/README.md @@ -0,0 +1,5 @@ +# Stable Diffusion Turbo + +This resource provides a starting point for running Stable Diffusion Turbo text-to-image generation on GPU within Saturn Cloud. You'll deploy an interactive Gradio application that transforms text prompts into AI-generated images in seconds, leveraging the optimized SDXL-Turbo model that produces high-quality results in just 1-4 steps instead of the typical 50+ steps. + +The example creates a web interface where users can input any text description and instantly visualize generated images, with options to control output variations and quality settings. This interface runs directly in your Jupyter server and can be deployed on Saturn Cloud for continuous hosting and sharing. \ No newline at end of file diff --git a/examples/deep_learning-computerVision/nvidia-sd-gradio/stable-diffusion-turbo.ipynb b/examples/deep_learning-computerVision/nvidia-sd-gradio/stable-diffusion-turbo.ipynb new file mode 100644 index 00000000..9b1ce178 --- /dev/null +++ b/examples/deep_learning-computerVision/nvidia-sd-gradio/stable-diffusion-turbo.ipynb @@ -0,0 +1,371 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Stable Diffusion Turbo - Text-to-Image Generation\n", + "\n", + "![Stable Diffusion](https://stability.ai/favicon.ico)\n", + "\n", + "This example shows how you can use the power of a GPU to quickly generate AI images from text prompts in Saturn Cloud. This code runs on a single GPU of a Jupyter server resource.\n", + "\n", + "This is an example of a text-to-image generation model using Stable Diffusion Turbo, which can create photorealistic or artistic images from natural language descriptions. The model uses a diffusion process that starts with random noise and gradually refines it into a coherent image based on your text prompt. The \"Turbo\" version is optimized for speed, generating high-quality images in just 1-4 steps instead of the typical 50+ steps.\n", + "\n", + "This notebook creates an interactive Gradio interface where users can type any prompt and instantly see generated images. The interface can be viewed in the notebook or deployed to Saturn Cloud for continuous hosting." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, we ensure all required packages are installed. This will only install packages that are missing from your environment. We use specific versions of `diffusers`, `transformers`, and `accelerate` to maintain compatibility with the Stable Diffusion Turbo model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -q \"diffusers[torch]==0.35.1\" transformers==4.56.2 accelerate==1.10.1 gradio safetensors gradio>=4.0.0 pillow>=9.5.0 " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This code mainly relies on the Diffusers library for loading the Stable Diffusion model, Gradio for creating the interactive interface, and PyTorch for GPU acceleration. The architecture and components are automatically handled." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "from diffusers import AutoPipelineForText2Image\n", + "import gradio as gr\n", + "from PIL import Image\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We load the SDXL-Turbo model from Hugging Face Hub, which is optimized for fast single-step image generation. The model is approximately 7GB and will be downloaded on first run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Loading Stable Diffusion Turbo model...\")\n", + "print(\"(This may take a few minutes on first run as the model downloads)\\n\")\n", + "\n", + "# Load the model with optimizations for GPU\n", + "pipe = AutoPipelineForText2Image.from_pretrained(\n", + " \"stabilityai/sdxl-turbo\",\n", + " torch_dtype=torch.float16,\n", + " variant=\"fp16\"\n", + ")\n", + "\n", + "# Move model to GPU\n", + "pipe = pipe.to(\"cuda\")\n", + "\n", + "pipe.enable_attention_slicing()\n", + "\n", + "print(\"βœ“ Model loaded successfully!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we define the core function that interacts with the loaded pipeline. The `generate_image` function takes a text prompt and uses the diffusion model to create an image. For the Turbo model, we use just 1 inference step and a guidance scale of 0.0 for maximum speed, as this is its intended configuration. The function also supports a random seed for reproducible results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def generate_image(prompt, num_inference_steps=1, guidance_scale=0.0, seed=None):\n", + " \"\"\"\n", + " Generate an image from a text prompt.\n", + "\n", + " Args:\n", + " prompt: Text description of the desired image\n", + " num_inference_steps: Number of denoising steps (1 for Turbo)\n", + " guidance_scale: How strictly to follow the prompt (0.0 for Turbo)\n", + " seed: Random seed for reproducibility (None for random)\n", + "\n", + " Returns:\n", + " PIL Image\n", + " \"\"\"\n", + " # Set seed for reproducibility if provided\n", + " generator = None\n", + " device = pipe.device # Define the device\n", + " if seed is not None:\n", + " generator = torch.Generator(device=device).manual_seed(seed)\n", + "\n", + " # Generate image\n", + " image = pipe(\n", + " prompt=prompt,\n", + " num_inference_steps=num_inference_steps,\n", + " guidance_scale=guidance_scale,\n", + " generator=generator\n", + " ).images[0]\n", + "\n", + " return image" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To showcase the model's ability to create variations, we define a function to generate a grid of images from the same prompt using different random seeds. 
This is useful for exploring different interpretations of a single concept. The function creates a matplotlib figure displaying all generated images in a grid layout with their corresponding seeds." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Gradio Blocks Interface for Image Grid Generation\n", + "def generate_image_grid_gradio(prompt, num_images=4, seed_start=42):\n", + " \"\"\"\n", + " Generate a grid of images for Gradio interface.\n", + " \n", + " Args:\n", + " prompt: Text description\n", + " num_images: Number of variations to generate\n", + " seed_start: Starting seed\n", + " \n", + " Returns:\n", + " matplotlib figure\n", + " \"\"\"\n", + " images = []\n", + " seeds = []\n", + " \n", + " for i in range(num_images):\n", + " seed = seed_start + i\n", + " image = generate_image(prompt, seed=seed)\n", + " images.append(image)\n", + " seeds.append(seed)\n", + " \n", + " # Create grid layout\n", + " grid_size = int(np.ceil(np.sqrt(num_images)))\n", + " fig, axes = plt.subplots(grid_size, grid_size, figsize=(10, 10))\n", + " \n", + " if grid_size == 1:\n", + " axes = [[axes]]\n", + " elif len(axes.shape) == 1:\n", + " axes = [axes]\n", + " \n", + " fig.suptitle(f'Prompt: \"{prompt}\"', fontsize=14, fontweight='bold')\n", + " \n", + " for idx, ax in enumerate(axes.flat):\n", + " if idx < len(images):\n", + " ax.imshow(images[idx])\n", + " ax.set_title(f'Seed: {seeds[idx]}', fontsize=8)\n", + " ax.axis('off')\n", + " \n", + " plt.tight_layout()\n", + " return fig" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's create an advanced interactive web interface using Gradio's Blocks API. This interface provides more control, allowing users to generate a grid of image variations from a single prompt. It includes sliders to control the number of images and the starting seed for reproducibility." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create Gradio Blocks interface\n", + "with gr.Blocks(title=\"🎨 AI Image Grid Generator\") as demo:\n", + " gr.Markdown(\"# 🎨 AI Image Grid Generator\")\n", + " gr.Markdown(\"Generate multiple variations of the same prompt with different seeds\")\n", + " \n", + " with gr.Row():\n", + " with gr.Column():\n", + " prompt_input = gr.Textbox(\n", + " label=\"Prompt\",\n", + " placeholder=\"Describe the image you want to generate...\",\n", + " lines=2\n", + " )\n", + " num_images = gr.Slider(\n", + " minimum=1,\n", + " maximum=9,\n", + " value=4,\n", + " step=1,\n", + " label=\"Number of Images\"\n", + " )\n", + " seed_start = gr.Slider(\n", + " minimum=0,\n", + " maximum=1000,\n", + " value=42,\n", + " step=1,\n", + " label=\"Starting Seed\"\n", + " )\n", + " generate_btn = gr.Button(\"Generate Image Grid\", variant=\"primary\")\n", + " \n", + " with gr.Column():\n", + " output_plot = gr.Plot(label=\"Generated Images\")\n", + " \n", + " examples = gr.Examples(\n", + " examples=[\n", + " [\"a cat wearing sunglasses on a beach\", 4, 42],\n", + " [\"a futuristic city at sunset with flying cars\", 4, 123],\n", + " [\"a magical forest with glowing mushrooms\", 4, 456]\n", + " ],\n", + " inputs=[prompt_input, num_images, seed_start]\n", + " )\n", + " \n", + " generate_btn.click(\n", + " fn=generate_image_grid_gradio,\n", + " inputs=[prompt_input, num_images, seed_start],\n", + " outputs=output_plot\n", + " )\n", + "\n", + "print(\"βœ“ Gradio Grid interface created!\")\n", + "demo.launch(share=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def gradio_generate(prompt, seed, num_steps):\n", + " \"\"\"\n", + " Wrapper function for Gradio interface.\n", + " \n", + " Args:\n", + " prompt: Text description\n", + " seed: Random seed (-1 for random)\n", + " num_steps: Number of inference steps (1-4 for Turbo)\n", + " \n", + " Returns:\n", + " PIL Image\n", + " \"\"\"\n", + " # Convert -1 to None for random seed\n", + " actual_seed = None if seed == -1 else seed\n", + " \n", + " # Generate and return image\n", + " return generate_image(prompt, num_inference_steps=num_steps, seed=actual_seed)\n", + "\n", + "\n", + "# Create Gradio interface\n", + "demo = gr.Interface(\n", + " fn=gradio_generate,\n", + " inputs=[\n", + " gr.Textbox(\n", + " label=\"Prompt\",\n", + " placeholder=\"Describe the image you want to generate...\",\n", + " lines=3\n", + " ),\n", + " gr.Slider(\n", + " minimum=-1,\n", + " maximum=10000,\n", + " value=-1,\n", + " step=1,\n", + " label=\"Seed (-1 for random)\"\n", + " ),\n", + " gr.Slider(\n", + " minimum=1,\n", + " maximum=4,\n", + " value=1,\n", + " step=1,\n", + " label=\"Inference Steps (1 = fastest, 4 = highest quality)\"\n", + " )\n", + " ],\n", + " outputs=gr.Image(label=\"Generated Image\", type=\"pil\"),\n", + " title=\"🎨 Stable Diffusion Turbo - Text-to-Image Generator\",\n", + " description=\"Generate AI images from text prompts using Stable Diffusion Turbo. 
Type your prompt and click Submit!\",\n", + " examples=[\n", + " [\"a cat wearing sunglasses on a beach\", 42, 1],\n", + " [\"a futuristic city at sunset with flying cars\", 123, 1],\n", + " [\"a magical forest with glowing mushrooms and fairy lights\", 456, 1],\n", + " [\"portrait of a wise old wizard with a long beard\", 789, 1],\n", + " [\"a steampunk robot playing piano in a Victorian mansion\", 999, 1],\n", + " [\"an underwater city with bioluminescent coral and fish\", 555, 1]\n", + " ],\n", + " theme=gr.themes.Soft(),\n", + " allow_flagging=\"never\"\n", + ")\n", + "\n", + "print(\"\\nβœ“ Gradio interface created!\")\n", + "print(\" Run the next cell to launch the interface.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Launch the Interface\n", + "\n", + "Run this cell to launch the interactive Gradio interface. You can use it directly in the notebook or share the public URL. The interface connects the UI elements to the model, allowing real-time image generation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Launch the interface\n", + "demo.launch(share=False, debug=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The interface is now live! You can:\n", + "- Type any prompt in the text box\n", + "- Adjust the seed for reproducibility\n", + "- Change inference steps (1 = fastest, 4 = best quality)\n", + "- Click example prompts to try them instantly\n", + "- Generate unlimited images!\n", + "\n", + "**Tip:** For best results, be descriptive in your prompts. Instead of \"a cat\", try \"a fluffy orange cat sitting on a windowsill at sunset\"." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/examples/deep_learning-computerVision/nvidia_tensorrt/ONNX_TensorRT_Optimization.ipynb b/examples/deep_learning-computerVision/nvidia_tensorrt/ONNX_TensorRT_Optimization.ipynb new file mode 100644 index 00000000..346425ae --- /dev/null +++ b/examples/deep_learning-computerVision/nvidia_tensorrt/ONNX_TensorRT_Optimization.ipynb @@ -0,0 +1 @@ +{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"machine_shape":"hm","gpuType":"A100","toc_visible":true,"authorship_tag":"ABX9TyMf6mR8BjApc1msAtoNvl4l"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"},"accelerator":"GPU"},"cells":[{"cell_type":"markdown","source":["## βœ… ONNX/TensorRT Optimization\n","\n","![Speedometer latency chart](https://cdn-icons-png.freepik.com/512/12710/12710013.png)\n","\n","In this template, we optimized a PyTorch-based image classification model using **ONNX + TensorRT** for **fast, low-latency inference**, with minimal code changes. You trained a model using **AutoGluon’s MultiModalPredictor**, exported it to **ONNX format**, and ran **benchmark comparisons** between vanilla PyTorch inference and accelerated **TensorRT** inference β€” all inside a Jupyter Notebook.\n","\n","Running this workflow on [**Saturn Cloud**](https://saturncloud.io) makes it both **GPU-accelerated and production-ready**. 
With Saturn Cloud, you can:\n","\n","* πŸ” Train models using **multiple frameworks** like PyTorch, TensorFlow, Hugging Face, and AutoGluon.\n","* πŸš€ Deploy on **scalable NVIDIA GPU machines** with support for CUDA, TensorRT, and ONNXRuntime.\n","* πŸ“Š Run inference at scale using **interactive notebooks or scheduled jobs** β€” ideal for real-time applications.\n","\n","This template gives you a full pipeline to **quantize, export, and benchmark** fast image inference β€” ideal for edge devices, cloud APIs, or any latency-sensitive AI application."],"metadata":{"id":"QReXVCBz9iDb"}},{"cell_type":"code","source":["# Install AutoGluon with multimodal and TensorRT support\n","!pip install -q autogluon.multimodal[all] onnx onnxruntime-gpu\n"],"metadata":{"id":"3wD6onKP5i9A"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### πŸ“₯ Step 1: Download and Prepare the PetFinder Dataset\n","\n","This downloads a simplified **PetFinder** dataset used to train a model that predicts whether a pet is likely to be adopted quickly.\n","\n","The dataset contains images + metadata (text, tabular, etc), but in this flow we’ll use **images only** for simplicity.\n","\n","---\n","\n","### 🧹 Step 2: Load and Clean the Data\n","\n","Loads the CSV files and defines the key columns. We'll drop text/numerical data later to focus on image performance.\n","\n","---\n","\n"],"metadata":{"id":"10Z3RdMz9i3k"}},{"cell_type":"code","source":["import os\n","from autogluon.core.utils.loaders import load_zip\n","\n","# πŸ”½ Download & unzip dataset\n","download_dir = './ag_automm_tutorial'\n","zip_url = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip'\n","load_zip.unzip(zip_url, unzip_dir=download_dir)\n","\n","# βœ… Confirm structure\n","dataset_path = os.path.join(download_dir, 'petfinder_for_tutorial')\n","print(\"πŸ“ Dataset folder:\", dataset_path)\n","print(\"πŸ“· Sample images:\", os.listdir(os.path.join(dataset_path, 'images'))[:3])\n"],"metadata":{"id":"M8fRg9fX5i50"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["\n","### 🧹 Step 2: Load and Clean the Data\n","\n","Loads the CSV files and defines the key columns. 
We'll drop text/numerical data later to focus on image performance.\n","\n","---\n","\n","### πŸ–ΌοΈ Step 3: Preprocess Image Paths\n","\n","Ensures all image paths are **fully resolved** so AutoGluon can locate them.\n","---"],"metadata":{"id":"BUMNjPyB9jQr"}},{"cell_type":"code","source":["import pandas as pd\n","import os\n","\n","# Define dataset path (same as from Step 1)\n","dataset_path = './ag_automm_tutorial/petfinder_for_tutorial'\n","\n","# Load CSVs\n","train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)\n","test_data = pd.read_csv(f'{dataset_path}/test.csv', index_col=0)\n","\n","# Target label\n","label_col = 'AdoptionSpeed'\n","image_col = 'Images'\n","\n","# For this tutorial, we only use the first image in the list\n","train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0])\n","test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])\n","\n","# Expand image path\n","def path_expander(path, base_folder):\n"," return os.path.abspath(os.path.join(base_folder, path))\n","\n","train_data[image_col] = train_data[image_col].apply(lambda x: path_expander(x, dataset_path))\n","test_data[image_col] = test_data[image_col].apply(lambda x: path_expander(x, dataset_path))\n","\n","# πŸ” Preview\n","print(\"βœ… Sample rows from training data:\")\n","train_data[[image_col, label_col]].head()\n"],"metadata":{"id":"pjHceXrS5i28"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### 🧠 Step 4: Train Image-Only Model with AutoGluon\n","\n","Trains a lightweight image classification model (e.g., MobileNetV3) using only image data.\n","\n","The model is trained quickly (~2 mins) to allow for demonstration and benchmarking.\n","\n","---"],"metadata":{"id":"0CTsXfKE9kEq"}},{"cell_type":"code","source":["from autogluon.multimodal import MultiModalPredictor\n","\n","# Drop extra columns – use image + label only\n","train_image_only = train_data[[image_col, label_col]]\n","test_image_only = test_data[[image_col, label_col]]\n","\n","# Set a short training time (you can increase this later)\n","predictor = MultiModalPredictor(label=label_col).fit(\n"," train_data=train_image_only,\n"," time_limit=120, # seconds\n",")\n","\n","# Save model path\n","model_path = predictor.path\n","print(f\"βœ… Model trained and saved at: {model_path}\")\n"],"metadata":{"id":"J2o9UegO5i0h"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### βš™οΈ Step 5: Optimize the Model for Inference (TensorRT via ONNX)\n","\n","Converts the model internally to ONNX format and enables **ONNX Runtime acceleration**.\n","\n","---\n"],"metadata":{"id":"HT0lXvoz9kqt"}},{"cell_type":"code","source":["from autogluon.multimodal import MultiModalPredictor\n","\n","# Load the model from previous training path\n","trt_predictor = MultiModalPredictor.load(path=model_path)\n","\n","# Optimize for fast inference (uses ONNX + TensorRT if available)\n","trt_predictor.optimize_for_inference()\n","\n","print(\"βœ… Model optimized for ONNX/TensorRT inference!\")\n"],"metadata":{"id":"8tdblQVA5ixy"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":[],"metadata":{"id":"v1nGxFzW9lFm"}},{"cell_type":"code","source":["import time\n","import numpy as np\n","\n","# Use small batch for timing test\n","batch_size = 2\n","n_trials = 10\n","sample = test_image_only.head(batch_size)\n","\n","# --- PyTorch inference timing ---\n","pt_times = []\n","for _ in range(n_trials):\n"," start = time.time()\n"," _ = 
predictor.predict_proba(sample)\n"," pt_times.append(time.time() - start)\n","\n","pt_avg = np.mean(pt_times)\n","print(f\"🐒 PyTorch Avg Time (per batch): {pt_avg*1000:.2f} ms\")\n","\n","# --- TensorRT (ONNX Runtime) inference timing ---\n","trt_times = []\n","for _ in range(n_trials):\n"," start = time.time()\n"," _ = trt_predictor.predict_proba(sample)\n"," trt_times.append(time.time() - start)\n","\n","trt_avg = np.mean(trt_times)\n","print(f\"⚑ TensorRT Avg Time (per batch): {trt_avg*1000:.2f} ms\")\n","\n","\n","print(\"Final comparison of time lapse:\")\n","print(f\"🐒 PyTorch Avg Time (per batch): {pt_avg*1000:.2f} ms\")\n","print(f\"⚑ TensorRT Avg Time (per batch): {trt_avg*1000:.2f} ms\")\n"],"metadata":{"id":"LivVBwxg5iu0"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### πŸ§ͺ Step 6: Compare Inference Speeds\n","\n","Measures inference speed in **rows per second** across 10 runs.\n","Compares vanilla PyTorch inference with accelerated ONNX/TensorRT inference.\n","\n","---"],"metadata":{"id":"TYBJ7xLu9l6_"}},{"cell_type":"code","source":["import numpy as np\n","\n","# Run predictions on the same sample\n","proba_pt = predictor.predict_proba(sample)\n","proba_trt = trt_predictor.predict_proba(sample)\n","\n","# Show outputs\n","print(\"🎯 PyTorch Output:\")\n","print(proba_pt)\n","\n","print(\"\\nπŸš€ TensorRT Output:\")\n","print(proba_trt)\n","\n","# Check if close (with small tolerance due to FP16 rounding)\n","try:\n"," np.testing.assert_allclose(proba_pt, proba_trt, rtol=1e-2, atol=1e-2)\n"," print(\"\\nβœ… Predictions are numerically close!\")\n","except AssertionError as e:\n"," print(\"\\n⚠️ Predictions differ more than expected.\")\n"," print(str(e))\n"],"metadata":{"id":"EV2dKlRS5ish"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### πŸ“Š Step 7: Visualize Speedup\n","\n","\n","Shows the real benefit of inference optimization β€” TensorRT is often **2–5x faster** than vanilla PyTorch.\n","Ideal for reducing latency in production.\n","\n","---\n"],"metadata":{"id":"gnjxSIrg9TKW"}},{"cell_type":"code","source":["import time\n","import matplotlib.pyplot as plt\n","import numpy as np\n","\n","# Define test batch\n","batch_size = 2\n","n_trials = 10\n","sample = test_data.head(batch_size)\n","\n","# Measure PyTorch inference times\n","pt_times = []\n","for _ in range(n_trials):\n"," start = time.time()\n"," _ = predictor.predict_proba(sample)\n"," pt_times.append(time.time() - start)\n","\n","# Measure TensorRT inference times\n","trt_times = []\n","for _ in range(n_trials):\n"," start = time.time()\n"," _ = trt_predictor.predict_proba(sample)\n"," trt_times.append(time.time() - start)\n","\n","# Calculate rows per second\n","pt_speed = batch_size / np.mean(pt_times)\n","trt_speed = batch_size / np.mean(trt_times)\n","\n","print(f\"⚑ PyTorch speed: {pt_speed:.1f} rows/sec\")\n","print(f\"⚑ TensorRT speed: {trt_speed:.1f} rows/sec\")\n","\n","# Bar chart comparison\n","fig, ax = plt.subplots()\n","fig.set_figheight(1.5)\n","ax.barh([\"PyTorch\", \"TensorRT\"], [pt_speed, trt_speed])\n","ax.annotate(f\"{pt_speed:.1f} rows/s\", xy=(pt_speed, 0))\n","ax.annotate(f\"{trt_speed:.1f} rows/s\", xy=(trt_speed, 1))\n","_ = plt.xlabel(\"Inference Speed (rows per second)\")\n","plt.show()\n"],"metadata":{"id":"AoWKUlo_5ipo"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["\n","\n","## βœ… Conclusion\n","\n","In this template, we:\n","\n","βœ… Trained an image classifier using **AutoGluon’s 
MultiModalPredictor**\n","βœ… Exported and optimized the model with **ONNX + TensorRT**\n","βœ… Benchmarked inference performance side-by-side\n","βœ… Verified minimal accuracy loss with **significant speed gains**\n","\n","Running this end-to-end on **[Saturn Cloud](https://saturncloud.io)** ensures you have:\n","\n","* πŸ’» Pre-configured NVIDIA GPU environments\n","* ⚑ Fast installation of CUDA, TensorRT, PyTorch, and ONNX\n","* πŸ“… Support for scheduled inference pipelines and model serving\n","\n","Whether you're deploying to production or prototyping locally, **this template helps you productionize your models with speed and efficiency**.\n","\n","---\n","\n","### πŸ“š Continue Exploring\n","\n","* [Saturn Cloud Documentation](https://saturncloud.io/docs/) – Custom environments, GPUs, and scheduling\n","* [ONNX + TensorRT Blog Post](https://saturncloud.io/blog/) – Coming soon\n","* [Saturn Cloud Templates](https://saturncloud.io/resources/templates/) – Other GPU-accelerated projects.\n"],"metadata":{"id":"PZBmu-tW1Q_R"}}]} \ No newline at end of file diff --git a/examples/deep_learning-computerVision/nvidia_tensorrt/README.md b/examples/deep_learning-computerVision/nvidia_tensorrt/README.md new file mode 100644 index 00000000..fc7d971f --- /dev/null +++ b/examples/deep_learning-computerVision/nvidia_tensorrt/README.md @@ -0,0 +1,94 @@ +# πŸ”₯ ONNX/TensorRT Optimization + +This project demonstrates how to accelerate inference in image classification tasks using **ONNX** and **TensorRT**, powered by **AutoGluon MultiModalPredictor**. You’ll train a lightweight image model and optimize it for **low-latency prediction**, comparing vanilla PyTorch vs. optimized inference. + +> πŸš€ Ideal for real-time AI applications, edge deployment, and cloud inference. + +--- + +## πŸ“¦ Key Features + +- βœ… Train image classification models with [AutoGluon](https://www.autogluon.ai/) +- βœ… Convert models to ONNX for hardware-agnostic inference +- βœ… Accelerate inference using NVIDIA [TensorRT](https://developer.nvidia.com/tensorrt) +- βœ… Benchmark PyTorch vs. TensorRT speed +- βœ… Fully reproducible in Jupyter / Colab / [Saturn Cloud](https://saturncloud.io) + +--- + +## πŸ“ Project Structure + +``` + +. +β”œβ”€β”€ ag_automm_tutorial/ +β”‚ └── petfinder_for_tutorial/ +β”œβ”€β”€ notebook.ipynb # Main inference pipeline +└── README.md # This file + +```` + +--- + +## βš™οΈ Dependencies + +Install in Jupyter or Colab: + +```bash +pip install autogluon +pip install onnx onnxruntime-gpu +pip install matplotlib +```` + +> πŸ’‘ TensorRT is used automatically by ONNXRuntime on supported NVIDIA GPUs. No manual installation required for ONNXRuntime-GPU. + +--- + +## πŸš€ How to Run + +1. Launch the notebook: `notebook.ipynb` +2. Follow the **7 steps** to: + + * Load and clean dataset + * Train image model + * Export to ONNX + * Optimize with TensorRT + * Compare inference speed (PyTorch vs. ONNX) +3. Visualize performance + +--- + +## πŸ“Š Sample Speedup + +| Framework | Inference Speed | +| --------- | --------------- | +| PyTorch | ~50 rows/sec | +| TensorRT | ~150 rows/sec | + +> πŸ”§ Results will vary based on GPU model and batch size. 
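+
+### πŸ” Optional: Verify GPU Execution Providers
+
+Before benchmarking, you can confirm that ONNX Runtime actually sees your GPU and the TensorRT/CUDA execution providers. This is a minimal sketch (not part of the notebook) using the standard `onnxruntime` API; if only `CPUExecutionProvider` is listed, the optimized path will fall back to CPU and the speedups in the table above will not reproduce.
+
+```python
+import onnxruntime as ort
+
+# Execution providers this onnxruntime build can use, e.g.
+# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
+print("Available providers:", ort.get_available_providers())
+
+# Reports whether this build targets GPU or CPU
+print("Default device:", ort.get_device())
+```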
+
+---
+
+## ☁️ Recommended Environment: Saturn Cloud
+
+This notebook runs well on [**Saturn Cloud**](https://saturncloud.io):
+
+* πŸ” NVIDIA GPUs with CUDA + TensorRT preinstalled
+* πŸ’‘ Jupyter, VSCode, and Python environments out of the box
+* πŸ•’ Scheduled jobs and model-serving deployments
+* πŸš€ Suited to prototyping and production AI workflows
+
+🟒 Try it free: [https://saturncloud.io](https://saturncloud.io)
+
+---
+
+## 🧠 Related Projects
+
+* [AutoGluon Examples](https://github.com/autogluon/autogluon/tree/master/examples)
+* [ONNX Runtime](https://onnxruntime.ai/)
+* [TensorRT Developer Docs](https://docs.nvidia.com/deeplearning/tensorrt/)
+* [Saturn Cloud Templates](https://saturncloud.io/resources/templates/)
+
+---
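+
+## ⚑ Minimal Usage Sketch
+
+For quick reference, the sketch below condenses the notebook's train β†’ optimize β†’ predict flow into one script. It assumes the PetFinder CSVs from Step 1 are already downloaded and reuses the notebook's column names (`Images`, `AdoptionSpeed`); treat it as an outline of the steps rather than a drop-in replacement for the notebook.
+
+```python
+import os
+import pandas as pd
+from autogluon.multimodal import MultiModalPredictor
+
+dataset_path = './ag_automm_tutorial/petfinder_for_tutorial'
+train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)
+
+# Keep only the first image per row and resolve absolute paths (Step 3)
+train_data['Images'] = train_data['Images'].apply(
+    lambda p: os.path.abspath(os.path.join(dataset_path, p.split(';')[0]))
+)
+
+# Train a small image-only model (Step 4), then enable ONNX/TensorRT inference (Step 5)
+predictor = MultiModalPredictor(label='AdoptionSpeed').fit(
+    train_data=train_data[['Images', 'AdoptionSpeed']],
+    time_limit=120,  # seconds; increase for better accuracy
+)
+predictor.optimize_for_inference()
+
+# Fast batched inference (Steps 6-7 benchmark this against the unoptimized predictor)
+proba = predictor.predict_proba(train_data[['Images', 'AdoptionSpeed']].head(4))
+print(proba)
+```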