diff --git a/tutorials/notebooks/task_notebooks/README.md b/tutorials/notebooks/task_notebooks/README.md index 2309930f7..1fa8126e5 100644 --- a/tutorials/notebooks/task_notebooks/README.md +++ b/tutorials/notebooks/task_notebooks/README.md @@ -6,7 +6,7 @@ The notebooks in this section demonstrate how to use MCT for various tasks and m | Model | Task | Notes | |----------------------------------------------------------------------------|---------------------------|----------------------------------------------------------------------------------------------| - | TBD | | | + | [EfficientDet](keras/example_effdet_keras_mixed_precision_ptq.ipynb) | Object Detection | use [CustomLayer](https://github.com/SonySemiconductorSolutions/aitrios-edge-mdt-cl/tree/main) | ### Pytorch Tutorials diff --git a/tutorials/notebooks/task_notebooks/keras/__init__.py b/tutorials/notebooks/task_notebooks/keras/__init__.py new file mode 100644 index 000000000..8b1378917 --- /dev/null +++ b/tutorials/notebooks/task_notebooks/keras/__init__.py @@ -0,0 +1 @@ + diff --git a/tutorials/notebooks/task_notebooks/keras/example_effdet_keras_mixed_precision_ptq.ipynb b/tutorials/notebooks/task_notebooks/keras/example_effdet_keras_mixed_precision_ptq.ipynb new file mode 100644 index 000000000..03dcb44f4 --- /dev/null +++ b/tutorials/notebooks/task_notebooks/keras/example_effdet_keras_mixed_precision_ptq.ipynb @@ -0,0 +1,810 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "6bb4870d", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "# EfficientDet and Mixed-Precision Post-Training Quantization in Keras using the Model Compression Toolkit (MCT)\n", + "\n", + "## Overview\n", + "This quick-start guide explains how to use the **Model Compression Toolkit (MCT)** to quantize an EfficientDet model. We will load a pre-trained model and quantize it using MCT with **Mixed-Precision Post-Training Quantization (PTQ)**.\n", + "\n", + "## Summary\n", + "In this tutorial, we will cover:\n", + "\n", + "1. Loading and preprocessing the COCO dataset.\n", + "2. Constructing an unlabeled representative dataset.\n", + "3. Post-Training Quantization using MCT.\n", + "4. Accuracy evaluation of the floating-point and the quantized models.\n", + "\n", + "## efficientdet-pytorch (Dependent External Repository)\n", + "This tutorial uses a pre-trained PyTorch model from the repository linked below and converts it to a Keras model. \n", + "Installation instructions are provided in the **Setup** section. 
\n", + "[efficientdet-pytorch](https://github.com/rwightman/efficientdet-pytorch)\n", + "\n", + "### License(efficientdet-pytorch)\n", + " Copyright 2020 Ross Wightman\n", + "\n", + " Licensed under the Apache License, Version 2.0 (the \"License\");\n", + " you may not use this file except in compliance with the License.\n", + " You may obtain a copy of the License at\n", + "\n", + " http://www.apache.org/licenses/LICENSE-2.0\n", + "\n", + " Unless required by applicable law or agreed to in writing, software\n", + " distributed under the License is distributed on an \"AS IS\" BASIS,\n", + " WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + " See the License for the specific language governing permissions and\n", + " limitations under the License.\n", + "\n", + "## Additional Code Attribution\n", + "\n", + "This tutorial uses custom model conversion code located in the `models/efficientdet/` and `models/utils/` directory.\n", + "These files facilitate the conversion of PyTorch EfficientDet models to Keras/TensorFlow format for use with MCT.\n", + "\n", + "### Source Code Attribution\n", + "\n", + "The following files contain code derived from open-source PyTorch implementations:\n", + "\n", + "**efficientdet-pytorch**\n", + "- Source: https://github.com/rwightman/efficientdet-pytorch\n", + "- License: Apache License 2.0\n", + "- Files: `effdet_keras.py`, `torch2keras_weights_translation.py`\n", + "- Modifications: PyTorch layers converted to Keras/TensorFlow equivalents, weight loading adapted for Keras format\n", + "\n", + "**pytorch-image-models (timm)**\n", + "- Source: https://github.com/huggingface/pytorch-image-models\n", + "- License: Apache License 2.0\n", + "- Files: `effnet_keras.py`, `effnet_blocks_keras.py`\n", + "- Modifications: `torch.nn.Module` classes converted to Keras layers\n", + "\n", + "### License(efficientdet-pytorch)\n", + "Please refer to the license section described earlier in this notebook.\n", + "\n", + "### License(pytorch-image-models)\n", + "```\n", + " Copyright 2019 Ross Wightman\n", + "\n", + " Licensed under the Apache License, Version 2.0 (the \"License\");\n", + " you may not use this file except in compliance with the License.\n", + " You may obtain a copy of the License at\n", + "\n", + " http://www.apache.org/licenses/LICENSE-2.0\n", + "\n", + " Unless required by applicable law or agreed to in writing, software\n", + " distributed under the License is distributed on an \"AS IS\" BASIS,\n", + " WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + " See the License for the specific language governing permissions and\n", + " limitations under the License.\n", + "```\n", + "\n", + "For detailed attribution information, see the header comments in each file under `models/`.\n", + "\n", + "## Setup " + ] + }, + { + "cell_type": "markdown", + "id": "646df95c", + "metadata": {}, + "source": [ + "First, install the relevant packages: \n", + "This step may take several minutes...\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa7b9ab2", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install tensorflow==2.15.*\n", + "!pip install numpy==1.26.4\n", + "!pip install opencv-python==4.9.0.80\n", + "!pip install pycocotools==2.0.10\n", + "\n", + "# install efficientdet-pytorch(effdet) and dependencies\n", + "!pip install torch==2.6.0 torchvision==0.21.0\n", + "!pip install timm==0.9.16\n", + "!pip install effdet==0.4.1" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "57c3f3ea", + "metadata": {}, + "outputs": [], + "source": [ + "import importlib\n", + "if not importlib.util.find_spec('model_compression_toolkit'):\n", + " !pip install model_compression_toolkit" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0dba768a", + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "from typing import Dict, List, Tuple, Any\n", + "import random\n", + "import os\n", + "import cv2\n", + "import numpy as np\n", + "import itertools\n", + "from tqdm import tqdm\n", + "from pycocotools.coco import COCO\n", + "from pycocotools.cocoeval import COCOeval\n", + "\n", + "import model_compression_toolkit as mct\n", + "from edgemdt_cl.keras import SSDPostProcess\n", + "from edgemdt_cl.keras.object_detection import ScoreConverter\n", + "\n", + "from effdet.config import get_efficientdet_config\n", + "from effdet.anchors import Anchors" + ] + }, + { + "cell_type": "markdown", + "id": "7c766698", + "metadata": {}, + "source": [ + "### Various Settings\n", + "Here, you can configure the parameters listed below. \n", + "\n", + "#### Parameter setting\n", + "- IMG_HEIGHT, IMG_WIDTH \n", + " This parameter allows you to set the size of input images.\n", + "- SCORE_THR \n", + " This parameter allows you to set the threshold of class score for the Non-Maximum Suppression (NMS) and evaluation.\n", + "- IOU_THR \n", + " This parameter allows you to set the threshold of iou for the Non-Maximum Suppression (NMS).\n", + "- CALIB_ITER \n", + " This parameter allows you to set how many samples to use when generating representative data for quantization.\n", + "- WEIGHTS_COMPRESSION_RATIO \n", + " This parameter allows you to set the quantization ratio based on the weight size of the 8-bit model when using mixed-precision quantization." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b6f4914b", + "metadata": {}, + "outputs": [], + "source": [ + "# Parameter setting\n", + "IMG_HEIGHT = 320\n", + "IMG_WIDTH = 320\n", + "SCORE_THR = 0.001\n", + "IOU_THR = 0.50\n", + "CALIB_ITER = 10\n", + "WEIGHTS_COMPRESSION_RATIO = 0.85\n", + "BATCH_SIZE = 16" + ] + }, + { + "cell_type": "markdown", + "id": "c928c12f", + "metadata": {}, + "source": [ + "Load a pre-trained PyTorch model, and Convert to Keras model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a87d9885", + "metadata": {}, + "outputs": [], + "source": [ + "from models.efficientdet import EfficientDetKeras\n", + "model_name = 'tf_efficientdet_lite0'\n", + "config = get_efficientdet_config(model_name)\n", + "input_shape = [*config.image_size] + [3]\n", + "\n", + "float_model = EfficientDetKeras(config, pretrained_backbone=False).get_model(input_shape)" + ] + }, + { + "cell_type": "markdown", + "id": "436710a6", + "metadata": {}, + "source": [ + "Next, we add the CustomLayer (edgemdt_cl) **SSDPostProcess** as post-processing. \n", + "\n", + "SSDPostProcess: Decodes EfficientDet inference results from Anchor format to BoundingBox format and Executes the Non-Maximum Suppression to remove overlapping boxes." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "97e10a7b", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a CustomLayer instance\n", + "anchors = tf.constant(Anchors.from_config(config).boxes.detach().cpu().numpy())\n", + "ssd_pp = SSDPostProcess(anchors, [1, 1, 1, 1], [*config.image_size],\n", + " ScoreConverter.SIGMOID, score_threshold=SCORE_THR, iou_threshold=IOU_THR,\n", + " max_detections=config.max_det_per_image)\n", + "\n", + "# Add the CustomLayer to the model\n", + "model_input = tf.keras.layers.Input(shape=input_shape)\n", + "x_class, x_box = float_model(model_input)\n", + "outputs = ssd_pp((x_box, x_class))\n", + "full_float_model = tf.keras.Model(inputs=model_input, outputs=outputs)" + ] + }, + { + "cell_type": "markdown", + "id": "0af4bac0", + "metadata": {}, + "source": [ + "The input and output formats of SSDPostProcess are shown below. \n", + "For details, see the [API documentation](https://sonysemiconductorsolutions.github.io/aitrios-edge-mdt-cl/edgemdt_cl/keras.html#SSDPostProcess).\n", + "\n", + "Inputs: \n", + " A list or tuple of: \n", + "- rel_codes: Relative codes (encoded offsets). \n", + "- scores: Scores or logits. \n", + "\n", + "Returns: \n", + " 'CombinedNonMaxSuppression' named tuple: \n", + "- nmsed_boxes: Selected boxes sorted by scores in descending order.\n", + "- nmsed_scores: Scores corresponding to the selected boxes.\n", + "- nmsed_classes: Labels corresponding to the selected boxes. \n", + "- valid_detections: The number of valid detections out of max_detections (unused in this tutorial)." + ] + }, + { + "cell_type": "markdown", + "id": "39a4d437", + "metadata": {}, + "source": [ + "## Dataset preparation\n", + "### Download the COCO dataset\n", + "\n", + "**Note** \n", + "In this tutorial, we will use a subset of COCO train2017 for calibration during quantization and COCO val2017 for evaluation.\n", + "\n", + "This step may take several minutes..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "afe82c55", + "metadata": {}, + "outputs": [], + "source": [ + "if not os.path.isdir('COCO_dataset'):\n", + " !mkdir COCO_dataset\n", + " !wget -P COCO_dataset http://images.cocodataset.org/annotations/annotations_trainval2017.zip\n", + " !wget -P COCO_dataset http://images.cocodataset.org/zips/train2017.zip\n", + " !wget -P COCO_dataset http://images.cocodataset.org/zips/val2017.zip\n", + " !unzip COCO_dataset/annotations_trainval2017.zip -d COCO_dataset\n", + " !unzip COCO_dataset/train2017.zip -d COCO_dataset\n", + " !unzip COCO_dataset/val2017.zip -d COCO_dataset" + ] + }, + { + "cell_type": "markdown", + "id": "fe934651", + "metadata": {}, + "source": [ + "Here, we set the paths to the annotation files and image folders of the downloaded dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36d8913a", + "metadata": {}, + "outputs": [], + "source": [ + "COCO_TRAIN_IMG_DIR = \"COCO_dataset/train2017/\"\n", + "COCO_VAL_IMG_DIR = \"COCO_dataset/val2017/\"\n", + "COCO_TRAIN_ANN_JSON = \"COCO_dataset/annotations/instances_train2017.json\"\n", + "COCO_VAL_ANN_JSON = \"COCO_dataset/annotations/instances_val2017.json\"" + ] + }, + { + "cell_type": "markdown", + "id": "b64c60b0", + "metadata": {}, + "source": [ + "Next, we define a dataset class that processes the downloaded COCO dataset for calibration during quantization and for evaluation. \n", + "We also define a dataloader class for the COCO dataset." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa9c159c", + "metadata": {}, + "outputs": [], + "source": [ + "class CocoDataset:\n", + " \"\"\"\n", + " COCO dataset class (preprocessor matching effdet's pipeline).\n", + "\n", + " Args:\n", + " img_dir (str): A directory path containing COCO images.\n", + " ann_json (str): A file path to COCO annotation json file.\n", + " img_size (Tuple[float, float]): Target image size for effdet model.\n", + " \"\"\"\n", + " def __init__(self, img_dir: str, ann_json: str, img_size: Tuple = (320, 320)):\n", + " self.img_dir = img_dir\n", + " self.coco = COCO(ann_json)\n", + " self.img_ids = self.coco.getImgIds()\n", + " self.img_size = img_size\n", + " \n", + " # Normalization parameters matching effdet configuration\n", + " self.mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)\n", + " self.std = np.array([0.5, 0.5, 0.5], dtype=np.float32)\n", + " self.fill_value = (self.mean * 255).astype(np.uint8)\n", + "\n", + " def __len__(self) -> int: \n", + " return len(self.img_ids)\n", + "\n", + " \"\"\"\n", + " Iteration of COCO dataset.\n", + "\n", + " Args:\n", + " idx (int): Index of the image in the dataset.\n", + "\n", + " Returns:\n", + " Dict[str, Any]: A dictionary containing:\n", + " 'input' (np.ndarray): Preprocessed image.\n", + " 'id' (int): Image ID.\n", + " 'file_name' (str): Image file name.\n", + " 'ratio' (float): Scale factor used in preprocessing.\n", + " \"\"\"\n", + " def __getitem__(self, idx: int) -> Dict[str, Any]:\n", + " img_id = self.img_ids[idx]\n", + " img_info = self.coco.loadImgs([img_id])[0]\n", + " img_path = os.path.join(self.img_dir, img_info['file_name'])\n", + "\n", + " org_img = cv2.imread(img_path)\n", + " org_img = cv2.cvtColor(org_img, cv2.COLOR_BGR2RGB)\n", + " input_img, ratio = self.preprocess(input_img=org_img)\n", + "\n", + " sample = {\n", + " 'input': input_img,\n", + " 'id': img_id,\n", + " 'file_name': img_info['file_name'],\n", + " 'ratio': ratio\n", + " }\n", + " return sample\n", + " \n", + " def preprocess(self, input_img: np.ndarray) -> Tuple:\n", + " \"\"\"\n", + " Preprocess image to match effdet's pipeline.\n", + " \n", + " Args:\n", + " input_img (np.ndarray): Input image in HWC format.\n", + "\n", + " Returns:\n", + " Tuple[np.ndarray, float]:\n", + " - Preprocessed image.\n", + " - Scale factor used in resizing.\n", + " \"\"\"\n", + " height, width = input_img.shape[:2]\n", + " target_h, target_w = self.img_size\n", + " \n", + " # Calculate scale factor for letterbox resize\n", + " img_scale = min(target_h / height, target_w / width)\n", + " \n", + " # Resize with bilinear interpolation\n", + " scaled_h = int(height * img_scale)\n", + " scaled_w = int(width * img_scale)\n", + " resized_img = cv2.resize(input_img, (scaled_w, scaled_h), interpolation=cv2.INTER_LINEAR)\n", + " \n", + " # Pad with mean value\n", + " padded_img = np.full((target_h, target_w, 3), self.fill_value, dtype=np.uint8)\n", + " padded_img[:scaled_h, :scaled_w, :] = resized_img\n", + " \n", + " # Normalize: (x/255 - mean) / std\n", + " normalized_img = (padded_img.astype(np.float32) / 255.0 - self.mean) / self.std\n", + "\n", + " return normalized_img, img_scale" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3171def3", + "metadata": {}, + "outputs": [], + "source": [ + "class CocoDataLoader:\n", + " \"\"\"\n", + " Dataloader class like pytorch for CocoDataset.\n", + "\n", + " Args:\n", + " dataset (List[Tuple]): A list of dataset samples.\n", + " batch_size (int): Number of samples 
per batch.\n", + " shuffle (bool): Whether to shuffle the dataset at the start of each iteration.\n", + " \"\"\"\n", + " def __init__(self, dataset: List[Tuple], batch_size: int, shuffle: bool = False):\n", + " self.dataset = dataset\n", + " self.batch_size = batch_size\n", + " self.shuffle = shuffle\n", + " self.count = 0\n", + " self.inds = list(range(len(dataset)))\n", + "\n", + " def __iter__(self):\n", + " self.count = 0\n", + " if self.shuffle:\n", + " random.shuffle(self.inds)\n", + "\n", + " return self\n", + "\n", + " \"\"\"\n", + " Iteration of COCO dataloader.\n", + "\n", + " Returns:\n", + " Dict[str, Any]: A dictionary containing:\n", + " 'input' (np.ndarray): Preprocessed image.\n", + " 'id' (int): Image ID.\n", + " 'file_name' (str): Image file name.\n", + " 'ratio' (float): Scale factor used in preprocessing.\n", + " \"\"\"\n", + " def __next__(self):\n", + " if self.count >= len(self.dataset):\n", + " raise StopIteration\n", + "\n", + " batch_sample = {}\n", + " batch_count = 0\n", + " while batch_count < self.batch_size and self.count < len(self.dataset):\n", + " index = self.inds[self.count]\n", + " sample = self.dataset[index]\n", + " for sample_key in sample.keys():\n", + " batch_sample.setdefault(sample_key, []).append(sample[sample_key])\n", + " self.count += 1\n", + " batch_count += 1\n", + " for sample_key in batch_sample.keys():\n", + " batch_sample[sample_key] = np.array(batch_sample[sample_key])\n", + "\n", + " return batch_sample" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4061db7f", + "metadata": {}, + "outputs": [], + "source": [ + "val_dataset = CocoDataset(\n", + " img_dir = COCO_VAL_IMG_DIR, ann_json = COCO_VAL_ANN_JSON,\n", + " img_size = (IMG_HEIGHT, IMG_WIDTH)\n", + ")\n", + "train_dataset = CocoDataset(\n", + " img_dir = COCO_TRAIN_IMG_DIR, ann_json=COCO_TRAIN_ANN_JSON,\n", + " img_size = (IMG_HEIGHT, IMG_WIDTH)\n", + ")\n", + "\n", + "# For evaluation\n", + "val_dataloader = CocoDataLoader(\n", + " val_dataset, batch_size=BATCH_SIZE, shuffle=False\n", + ")\n", + "# For calibration(No label required)\n", + "calib_loader = CocoDataLoader(\n", + " train_dataset, batch_size=1, shuffle=False\n", + ")\n", + "\n", + "print(len(train_dataset))\n", + "print(len(val_dataset))" + ] + }, + { + "cell_type": "markdown", + "id": "6f2915a6", + "metadata": {}, + "source": [ + "## Representative Dataset\n", + "For quantization with MCT, we need to define a representative dataset required by the PTQ algorithm. This dataset is a generator that returns a list of images:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7bba02a5", + "metadata": {}, + "outputs": [], + "source": [ + "def representative_dataset_gen():\n", + " for sample in itertools.islice(itertools.cycle(calib_loader), CALIB_ITER):\n", + " yield [sample['input']]" + ] + }, + { + "cell_type": "markdown", + "id": "e5ea063a", + "metadata": {}, + "source": [ + "## Target Platform Capabilities (TPC)\n", + "In addition, MCT optimizes the model for dedicated hardware platforms. This is done using TPC (for more details, please visit our [documentation](https://sonysemiconductorsolutions.github.io/mct-model-optimization/api/api_docs/modules/target_platform_capabilities.html)). 
Here, we use the default Keras TPC:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5d5f5715", + "metadata": {}, + "outputs": [], + "source": [ + "tpc = mct.get_target_platform_capabilities('tensorflow', 'default')" + ] + }, + { + "cell_type": "markdown", + "id": "01d232bb", + "metadata": {}, + "source": [ + "## Mixed Precision Configurations\n", + "We will create a `MixedPrecisionQuantizationConfig` that defines the search options for mixed-precision:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f25783c9", + "metadata": {}, + "outputs": [], + "source": [ + "configuration = mct.core.CoreConfig(\n", + " mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(num_of_images=CALIB_ITER))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "78f1557f", + "metadata": {}, + "outputs": [], + "source": [ + "# Get Resource Utilization information to constraint your model's memory size.\n", + "resource_utilization_data = mct.core.keras_resource_utilization_data(\n", + " full_float_model,\n", + " representative_dataset_gen,\n", + " configuration,\n", + " target_platform_capabilities=tpc)\n", + " \n", + "# Define target Resource Utilization for mixed precision weights quantization.\n", + "resource_utilization = mct.core.ResourceUtilization(resource_utilization_data.weights_memory * WEIGHTS_COMPRESSION_RATIO)" + ] + }, + { + "cell_type": "markdown", + "id": "fa6fcf4d", + "metadata": {}, + "source": [ + "# Post-Training Quantization using MCT\n", + "Now for the exciting part! Let's run PTQ on the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a31c7f1a", + "metadata": {}, + "outputs": [], + "source": [ + "quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(\n", + " in_model=full_float_model,\n", + " representative_data_gen=representative_dataset_gen,\n", + " target_platform_capabilities=tpc,\n", + " core_config=configuration,\n", + " target_resource_utilization=resource_utilization)" + ] + }, + { + "cell_type": "markdown", + "id": "8773b178", + "metadata": {}, + "source": [ + "# Model Evaluation\n", + "Now, we will create a function for evaluating a model. \n", + "The inference results before and after quantization are displayed on the terminal." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "62d0e32a", + "metadata": {}, + "outputs": [], + "source": [ + "def evaluate(model: tf.keras.Model, val_dataloader: CocoDataLoader,\n", + " score_threshold: float = 0.1):\n", + " \"\"\"\n", + " Evaluation of the COCO dataset.\n", + "\n", + " Args:\n", + " model (tf.keras.Model): Evaluation model.\n", + " val_dataloader (CocoDataLoader): Evaluation dataset.\n", + " score_threshold (float): Score threshold.\n", + " \"\"\"\n", + " model.trainable = False\n", + "\n", + " results = []\n", + " for sample in tqdm(val_dataloader, desc=\"Evaluating\"):\n", + " input_imgs = sample['input']\n", + " img_ids = sample['id']\n", + " ratios = sample['ratio']\n", + "\n", + " nmsed_boxes, nmsed_scores, nmsed_classes, _ = model(input_imgs)\n", + " nmsed_boxes = nmsed_boxes.numpy()\n", + " nmsed_scores = nmsed_scores.numpy()\n", + " nmsed_classes = nmsed_classes.numpy()\n", + "\n", + " for batch_idx in range(len(img_ids)):\n", + " img_id = img_ids[batch_idx]\n", + " ratio = ratios[batch_idx]\n", + " # boxes: [N, 4] (ymin, xmin, ymax, xmax), scores: [N], labels: [N]\n", + " for box, score, label in zip(nmsed_boxes[batch_idx], nmsed_scores[batch_idx], nmsed_classes[batch_idx]):\n", + " if score > score_threshold:\n", + " box /= ratio\n", + " y_min, x_min, y_max, x_max = box.tolist()\n", + " width = x_max - x_min\n", + " height = y_max - y_min\n", + " result = {\n", + " 'image_id': int(img_id),\n", + " 'category_id': int(label) + 1, # Convert class index (0-based) to COCO category ID (1-based)\n", + " 'bbox': [int(x_min), int(y_min), int(width), int(height)],\n", + " 'score': float(score),\n", + " }\n", + " results.append(result)\n", + "\n", + " # evaluation\n", + " coco_gt = val_dataset.coco\n", + "\n", + " coco_dt = coco_gt.loadRes(results)\n", + " evaluator = COCOeval(coco_gt, coco_dt, iouType='bbox')\n", + " evaluator.evaluate()\n", + " evaluator.accumulate()\n", + " evaluator.summarize()" + ] + }, + { + "cell_type": "markdown", + "id": "029eb053", + "metadata": {}, + "source": [ + "Let's start with the floating-point model evaluation. \n", + "This step may take several minutes..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a557583a", + "metadata": {}, + "outputs": [], + "source": [ + "print(\"evaluating float model(COCO mAP)...\")\n", + "evaluate(full_float_model, val_dataloader,\n", + " score_threshold = SCORE_THR)" + ] + }, + { + "cell_type": "markdown", + "id": "85ef5d0b", + "metadata": {}, + "source": [ + "Finally, let's evaluate the quantized model: \n", + "This step may take several minutes..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5ba05924", + "metadata": {}, + "outputs": [], + "source": [ + "print(\"evaluating quantized model(COCO mAP)...\")\n", + "evaluate(quantized_model, val_dataloader,\n", + " score_threshold = SCORE_THR)" + ] + }, + { + "cell_type": "markdown", + "id": "8b590922", + "metadata": {}, + "source": [ + "## Export and Load the quantized model\n", + "Lastly, we will demonstrate how to export the quantized model into a file and then load it.\n", + "\n", + "We will use `keras_export_model` function to save the quantized model with the integrated custom quantizers into a \".keras\" file format." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05c42212", + "metadata": {}, + "outputs": [], + "source": [ + "# Export a keras model with mctq custom quantizers into a file\n", + "mct.exporter.keras_export_model(model=quantized_model,\n", + " save_model_path='./effdet_keras_mixed_precision_ptq.keras')" + ] + }, + { + "cell_type": "markdown", + "id": "2ac971ae", + "metadata": {}, + "source": [ + "Then, we can load the saved model using `keras_load_quantized_model` function. For this specific case, we'll have to supply the load function with an extra custom layer integrated into the model, namely `SSDPostProcess`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "613e3095", + "metadata": {}, + "outputs": [], + "source": [ + "from edgemdt_cl.keras.object_detection.ssd_post_process import SSDPostProcess\n", + "\n", + "custom_objects = {SSDPostProcess.__name__: SSDPostProcess} # An extra custom layer integrated in the model \n", + "quant_model_from_file = mct.keras_load_quantized_model('./effdet_keras_mixed_precision_ptq.keras', custom_objects=custom_objects)" + ] + }, + { + "cell_type": "markdown", + "id": "1256f29a", + "metadata": {}, + "source": [ + "## Copyrights\n", + "\n", + "Copyright 2025 Sony Semiconductor Solutions, Inc. All rights reserved.\n", + "\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + "\n", + " http://www.apache.org/licenses/LICENSE-2.0\n", + "\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "py310-effdettest4 (3.10.12)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tutorials/notebooks/task_notebooks/keras/models/__init__.py b/tutorials/notebooks/task_notebooks/keras/models/__init__.py new file mode 100644 index 000000000..88f8a3322 --- /dev/null +++ b/tutorials/notebooks/task_notebooks/keras/models/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2025 Sony Semiconductor Solutions, Inc. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== diff --git a/tutorials/notebooks/task_notebooks/keras/models/efficientdet/__init__.py b/tutorials/notebooks/task_notebooks/keras/models/efficientdet/__init__.py new file mode 100644 index 000000000..5686219fe --- /dev/null +++ b/tutorials/notebooks/task_notebooks/keras/models/efficientdet/__init__.py @@ -0,0 +1,16 @@ +# Copyright 2025 Sony Semiconductor Solutions, Inc. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +from models.efficientdet.effdet_keras import EfficientDetKeras \ No newline at end of file diff --git a/tutorials/notebooks/task_notebooks/keras/models/efficientdet/effdet_keras.py b/tutorials/notebooks/task_notebooks/keras/models/efficientdet/effdet_keras.py new file mode 100644 index 000000000..9e0e1004c --- /dev/null +++ b/tutorials/notebooks/task_notebooks/keras/models/efficientdet/effdet_keras.py @@ -0,0 +1,581 @@ +# Copyright 2025 Sony Semiconductor Solutions, Inc. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +# The following code was mostly duplicated from https://github.com/rwightman/efficientdet-pytorch +# and changed to generate an equivalent Keras model. +# Main changes: +# * Torch layers replaced with Keras layers +# * removed class inheritance from torch.nn.Module +# * changed "forward" class methods with "__call__" +# * removed processes unused in effdet tutorial. 
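+# * added TensorFlow GPU memory growth configuration (an addition in this tutorial, not part of the original PyTorch code).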
+# ============================================================================== + +import logging +from functools import partial +from typing import List, Optional, Union, Tuple + +import tensorflow as tf + + +gpus = tf.config.list_physical_devices('GPU') +if gpus: + try: + # Currently, memory growth needs to be the same across GPUs + for gpu in gpus: + tf.config.experimental.set_memory_growth(gpu, True) + logical_gpus = tf.config.list_logical_devices('GPU') + print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs") + except RuntimeError as e: + # Memory growth must be set before GPUs have been initialized + print(e) + + +from effdet.anchors import get_feat_sizes +from effdet.config import get_fpn_config, set_config_readonly +from effdet.efficientdet import get_feature_info +from models.efficientdet.effnet_keras import create_model, handle_name +from models.efficientdet.effnet_blocks_keras import create_conv2d, create_pool2d +from models.utils.torch2keras_weights_translation import load_state_dict + +_ACT_LAYER = tf.nn.swish + +# ####################################################################################### +# This file generates the Keras model. It's based on the EfficientDet repository in +# https://github.com/rwightman/efficientdet-pytorch, and switched the Torch Modules +# with Keras layers +# ####################################################################################### + +def get_act_layer(act_type): + if act_type == 'relu6': + return partial(tf.keras.layers.ReLU, max_value=6.0) + else: + raise NotImplemented + + +class ConvBnAct2d: + def __init__( + self, + in_channels, + out_channels, + kernel_size, + stride=1, + dilation=1, + padding='', + bias=False, + norm_layer=tf.keras.layers.BatchNormalization, + act_layer=_ACT_LAYER, + name=None + ): + name = handle_name(name) + self.conv = create_conv2d( + in_channels, + out_channels, + kernel_size, + stride=stride, + dilation=dilation, + padding=padding, + bias=bias, + name=name + '/conv' + ) + self.bn = None if norm_layer is None else norm_layer(name=name + '/bn') + self.act = None if act_layer is None else act_layer() + + def __call__(self, x): + x = self.conv(x) + if self.bn is not None: + x = self.bn(x) + if self.act is not None: + x = self.act(x) + return x + + +class SeparableConv2d: + """ Separable Conv + """ + def __init__( + self, + in_channels, + out_channels, + kernel_size=3, + stride=1, + dilation=1, + padding='', + bias=False, + channel_multiplier=1.0, + pw_kernel_size=1, + norm_layer=tf.keras.layers.BatchNormalization, + act_layer=_ACT_LAYER, + name=None + ): + name = handle_name(name) + self.conv_dw = create_conv2d( + in_channels, + int(in_channels * channel_multiplier), + kernel_size, + stride=stride, + dilation=dilation, + padding=padding, + depthwise=True, + name=name + '/conv_dw' + ) + self.conv_pw = create_conv2d( + int(in_channels * channel_multiplier), + out_channels, + pw_kernel_size, + padding=padding, + bias=bias, + name=name + '/conv_pw' + ) + self.bn = None if norm_layer is None else norm_layer(name=name + '/bn') + self.act = None if act_layer is None else act_layer() + + def __call__(self, x): + x = self.conv_dw(x) + x = self.conv_pw(x) + if self.bn is not None: + x = self.bn(x) + if self.act is not None: + x = self.act(x) + return x + + +class Interpolate2d: + r"""Resamples a 2d Image + + The input data is assumed to be of the form + `minibatch x channels x [optional depth] x [optional height] x width`. 
+ Hence, for spatial inputs, we expect a 4D Tensor and for volumetric inputs, we expect a 5D Tensor. + + The algorithms available for upsampling are nearest neighbor and linear, + bilinear, bicubic and trilinear for 3D, 4D and 5D input Tensor, + respectively. + + One can either give a :attr:`scale_factor` or the target output :attr:`size` to + calculate the output size. (You cannot give both, as it is ambiguous) + + Args: + size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int], optional): + output spatial sizes + scale_factor (float or Tuple[float] or Tuple[float, float] or Tuple[float, float, float], optional): + multiplier for spatial size. Has to match input size if it is a tuple. + mode (str, optional): the upsampling algorithm: one of ``'nearest'``, + ``'linear'``, ``'bilinear'``, ``'bicubic'`` and ``'trilinear'``. + Default: ``'nearest'`` + align_corners (bool, optional): if ``True``, the corner pixels of the input + and output tensors are aligned, and thus preserving the values at + those pixels. This only has effect when :attr:`mode` is + ``'linear'``, ``'bilinear'``, or ``'trilinear'``. Default: ``False`` + """ + __constants__ = ['size', 'scale_factor', 'mode', 'align_corners', 'name'] + name: str + size: Optional[Union[int, Tuple[int, int]]] + scale_factor: Optional[Union[float, Tuple[float, float]]] + mode: str + align_corners: Optional[bool] + + def __init__( + self, + size: Optional[Union[int, Tuple[int, int]]] = None, + scale_factor: Optional[Union[float, Tuple[float, float]]] = None, + mode: str = 'nearest', + align_corners: bool = False, + ) -> None: + self.name = type(self).__name__ + self.size = size + if isinstance(scale_factor, tuple): + self.scale_factor = tuple(float(factor) for factor in scale_factor) + else: + self.scale_factor = float(scale_factor) if scale_factor else None + self.mode = mode + self.align_corners = None if mode == 'nearest' else align_corners + + # tested in keras + assert self.align_corners in [None, False] + assert self.scale_factor is None + if self.mode == 'nearest': + self.mode = tf.image.ResizeMethod.NEAREST_NEIGHBOR + else: + raise NotImplemented + + def __call__(self, input: tf.Tensor) -> tf.Tensor: + return tf.image.resize(input, self.size, method=self.mode) + + +class ResampleFeatureMap: + + def __init__( + self, + in_channels, + out_channels, + input_size, + output_size, + pad_type='', + downsample=None, + upsample=None, + norm_layer=tf.keras.layers.BatchNormalization, + apply_bn=False, + redundant_bias=False, + name=None + ): + name = handle_name(name) + downsample = downsample or 'max' + upsample = upsample or 'nearest' + self.in_channels = in_channels + self.out_channels = out_channels + self.input_size = input_size + self.output_size = output_size + + self.layers = [] + if in_channels != out_channels: + self.layers.append(ConvBnAct2d( + in_channels, + out_channels, + kernel_size=1, + padding=pad_type, + norm_layer=norm_layer if apply_bn else None, + bias=not apply_bn or redundant_bias, + act_layer=None, + name=f'{name}/conv'#/{len(self.layers)}' + )) + + if input_size[0] > output_size[0] and input_size[1] > output_size[1]: + if downsample in ('max', 'avg'): + stride_size_h = int((input_size[0] - 1) // output_size[0] + 1) + stride_size_w = int((input_size[1] - 1) // output_size[1] + 1) + assert stride_size_h == stride_size_w + kernel_size = stride_size_h + 1 + stride = stride_size_h + down_inst = create_pool2d(downsample, kernel_size=kernel_size, stride=stride, padding=pad_type, + name=name + '/downsample') + else: + 
down_inst = Interpolate2d(size=output_size, mode=downsample, name=name) + self.layers.append(down_inst) + else: + if input_size[0] < output_size[0] or input_size[1] < output_size[1]: + self.layers.append(Interpolate2d(size=output_size, mode=upsample)) # 'upsample' + + def __call__(self, x: tf.Tensor) -> List[tf.Tensor]: + for module in self.layers: + x = module(x) + return x + + +class FpnCombine: + def __init__( + self, + feature_info, + fpn_channels, + inputs_offsets, + output_size, + pad_type='', + downsample=None, + upsample=None, + norm_layer=tf.keras.layers.BatchNormalization, + apply_resample_bn=False, + redundant_bias=False, + weight_method='sum', + name=None + ): + name = handle_name(name) + self.inputs_offsets = inputs_offsets + self.weight_method = weight_method + + self.resample = [] # nn.ModuleDict() + for idx, offset in enumerate(inputs_offsets): + self.resample.append(ResampleFeatureMap( + feature_info[offset]['num_chs'], + fpn_channels, + input_size=feature_info[offset]['size'], + output_size=output_size, + pad_type=pad_type, + downsample=downsample, + upsample=upsample, + norm_layer=norm_layer, + apply_bn=apply_resample_bn, + redundant_bias=redundant_bias, + name = name + f'/resample/{offset}' + )) + + def __call__(self, x: List[tf.Tensor]): + nodes = [] + for offset, resample in zip(self.inputs_offsets, self.resample): + input_node = x[offset] + input_node = resample(input_node) + nodes.append(input_node) + + if self.weight_method == 'sum': + out = tf.keras.layers.Add()(nodes[:2]) + for i in range(2, len(nodes)): + out = tf.keras.layers.Add()([out, nodes[i]]) + else: + raise ValueError('unknown weight_method {}'.format(self.weight_method)) + return out + + +class Fnode: + """ A simple wrapper used in place of nn.Sequential for torchscript typing + Handles input type List[Tensor] -> output type Tensor + """ + def __init__(self, combine, after_combine): + self.combine = combine + self.after_combine = after_combine + + def __call__(self, x: List[tf.Tensor]) -> tf.Tensor: + x = self.combine(x) + for fn in self.after_combine: + x = fn(x) + return x + + +class BiFpnLayer: + def __init__( + self, + feature_info, + feat_sizes, + fpn_config, + fpn_channels, + num_levels=5, + pad_type='', + downsample=None, + upsample=None, + norm_layer=tf.keras.layers.BatchNormalization, + act_layer=_ACT_LAYER, + apply_resample_bn=False, + pre_act=True, + separable_conv=True, + redundant_bias=False, + name=None + ): + name = handle_name(name) + self.num_levels = num_levels + # fill feature info for all FPN nodes (chs and feat size) before creating FPN nodes + fpn_feature_info = feature_info + [ + dict(num_chs=fpn_channels, size=feat_sizes[fc['feat_level']]) for fc in fpn_config.nodes] + + self.fnode = [] # nn.ModuleList() + for i, fnode_cfg in enumerate(fpn_config.nodes): + logging.debug('fnode {} : {}'.format(i, fnode_cfg)) + combine = FpnCombine( + fpn_feature_info, + fpn_channels, + tuple(fnode_cfg['inputs_offsets']), + output_size=feat_sizes[fnode_cfg['feat_level']], + pad_type=pad_type, + downsample=downsample, + upsample=upsample, + norm_layer=norm_layer, + apply_resample_bn=apply_resample_bn, + redundant_bias=redundant_bias, + weight_method=fnode_cfg['weight_method'], + name=f'{name}/fnode/{i}/combine' + ) + + after_combine = [] # nn.Sequential() + conv_kwargs = dict( + in_channels=fpn_channels, + out_channels=fpn_channels, + kernel_size=3, + padding=pad_type, + bias=False, + norm_layer=norm_layer, + act_layer=act_layer, + ) + if pre_act: + conv_kwargs['bias'] = redundant_bias + 
conv_kwargs['act_layer'] = None + after_combine.append(act_layer()) # 'act' + after_combine.append( + SeparableConv2d(name=f'{name}/fnode/{i}/after_combine/conv', **conv_kwargs) if separable_conv + else ConvBnAct2d(name=f'{name}/fnode/{i}/after_combine/conv', **conv_kwargs)) + + self.fnode.append(Fnode(combine=combine, after_combine=after_combine)) + + self.feature_info = fpn_feature_info[-num_levels::] + + def __call__(self, x: List[tf.Tensor]): + for fn in self.fnode: + x.append(fn(x)) + return x[-self.num_levels::] + + +class BiFpn: + + def __init__(self, config, feature_info, name): + self.num_levels = config.num_levels + norm_layer = config.norm_layer or tf.keras.layers.BatchNormalization + norm_kwargs = {**config.norm_kwargs} + norm_kwargs['epsilon'] = norm_kwargs.pop('eps', 0.001) + if config.norm_kwargs: + norm_layer = partial(norm_layer, **norm_kwargs) + act_layer = get_act_layer(config.act_type) or _ACT_LAYER + fpn_config = config.fpn_config or get_fpn_config( + config.fpn_name, min_level=config.min_level, max_level=config.max_level) + + feat_sizes = get_feat_sizes(config.image_size, max_level=config.max_level) + prev_feat_size = feat_sizes[config.min_level] + self.resample = [] # nn.ModuleDict() + for level in range(config.num_levels): + feat_size = feat_sizes[level + config.min_level] + if level < len(feature_info): + in_chs = feature_info[level]['num_chs'] + feature_info[level]['size'] = feat_size + else: + # Adds a coarser level by downsampling the last feature map + self.resample.append(ResampleFeatureMap( + in_channels=in_chs, + out_channels=config.fpn_channels, + input_size=prev_feat_size, + output_size=feat_size, + pad_type=config.pad_type, + downsample=config.downsample_type, + upsample=config.upsample_type, + norm_layer=norm_layer, + apply_bn=config.apply_resample_bn, + redundant_bias=config.redundant_bias, + name=name + f'/resample/{level}' + )) + in_chs = config.fpn_channels + feature_info.append(dict(num_chs=in_chs, size=feat_size)) + prev_feat_size = feat_size + + self.cell = [] # SequentialList() + for rep in range(config.fpn_cell_repeats): + logging.debug('building cell {}'.format(rep)) + fpn_layer = BiFpnLayer( + feature_info=feature_info, + feat_sizes=feat_sizes, + fpn_config=fpn_config, + fpn_channels=config.fpn_channels, + num_levels=config.num_levels, + pad_type=config.pad_type, + downsample=config.downsample_type, + upsample=config.upsample_type, + norm_layer=norm_layer, + act_layer=act_layer, + separable_conv=config.separable_conv, + apply_resample_bn=config.apply_resample_bn, + pre_act=not config.conv_bn_relu_pattern, + redundant_bias=config.redundant_bias, + name=name + f'/cell/{rep}' + ) + self.cell.append(fpn_layer) + feature_info = fpn_layer.feature_info + + def __call__(self, x: List[tf.Tensor]): + for resample in self.resample: + x.append(resample(x[-1])) + for _cell in self.cell: + x = _cell(x) + return x + + +class HeadNet: + + def __init__(self, config, num_outputs, name): + self.num_levels = config.num_levels + norm_layer = config.norm_layer or tf.keras.layers.BatchNormalization + if config.norm_kwargs: + norm_kwargs = {**config.norm_kwargs} + if 'eps' in norm_kwargs: + eps = norm_kwargs.pop('eps') + norm_kwargs['epsilon'] = eps + norm_layer = partial(norm_layer, **norm_kwargs) + act_type = config.head_act_type if getattr(config, 'head_act_type', None) else config.act_type + act_layer = get_act_layer(act_type) or _ACT_LAYER + + # Build convolution repeats + conv_fn = SeparableConv2d if config.separable_conv else ConvBnAct2d + conv_kwargs = dict( 
+ in_channels=config.fpn_channels, + out_channels=config.fpn_channels, + kernel_size=3, + padding=config.pad_type, + bias=config.redundant_bias, + act_layer=None, + norm_layer=None, + ) + self.conv_rep = [conv_fn(name=f'{name}/conv_rep/{_}', **conv_kwargs) for _ in range(config.box_class_repeats)] + + # Build batchnorm repeats. There is a unique batchnorm per feature level for each repeat. + # This can be organized with repeats first or feature levels first in module lists, the original models + # and weights were setup with repeats first, levels first is required for efficient torchscript usage. + self.bn_rep = [] # nn.ModuleList() + for _ in range(config.box_class_repeats): + self.bn_rep.append([norm_layer(name=f'{name}/bn_rep/{_}/{_level}/bn') for _level in range(self.num_levels)]) + + self.act = act_layer + + # Prediction (output) layer. Has bias with special init reqs, see init fn. + num_anchors = len(config.aspect_ratios) * config.num_scales + predict_kwargs = dict( + in_channels=config.fpn_channels, + out_channels=num_outputs * num_anchors, + kernel_size=3, + padding=config.pad_type, + bias=True, + norm_layer=None, + act_layer=None, + name=f'{name}/predict' + ) + self.predict = conv_fn(**predict_kwargs) + + def _forward(self, x: List[tf.Tensor]) -> List[tf.Tensor]: + outputs = [] + for level in range(self.num_levels): + x_level = x[level] + for conv, bn in zip(self.conv_rep, self.bn_rep): + x_level = conv(x_level) + x_level = bn[level](x_level) # this is not allowed in torchscript + x_level = self.act()(x_level) + outputs.append(self.predict(x_level)) + return outputs + + def __call__(self, x: List[tf.Tensor]) -> List[tf.Tensor]: + return self._forward(x) + + +class EfficientDetKeras: + + def __init__(self, config, pretrained_backbone=True, alternate_init=False): + self.config = config + set_config_readonly(self.config) + self.backbone = create_model( + config.backbone_name, + features_only=True, + out_indices=self.config.backbone_indices or (2, 3, 4), + pretrained=pretrained_backbone, + **config.backbone_args, + ) + feature_info = get_feature_info(self.backbone) + self.fpn = BiFpn(self.config, feature_info, 'fpn') + self.class_net = HeadNet(self.config, num_outputs=self.config.num_classes, name='class_net') + self.box_net = HeadNet(self.config, num_outputs=4, name='box_net') + + def get_model(self, input_shape, load_state_dict_to_model=True): + _input = tf.keras.layers.Input(shape=input_shape) + x = self.backbone(_input) + x = self.fpn(x) + x_class = self.class_net(x) + x_box = self.box_net(x) + + x_class = [tf.keras.layers.Reshape((-1, self.config.num_classes))(_x) for _x in x_class] + x_class = tf.keras.layers.Concatenate(axis=1)(x_class) + x_box = [tf.keras.layers.Reshape((-1, 4))(_x) for _x in x_box] + x_box = tf.keras.layers.Concatenate(axis=1)(x_box) + + model = tf.keras.Model(inputs=_input, outputs=[x_class, x_box]) + if load_state_dict_to_model: + load_state_dict(model, self.config.url) + return model diff --git a/tutorials/notebooks/task_notebooks/keras/models/efficientdet/effnet_blocks_keras.py b/tutorials/notebooks/task_notebooks/keras/models/efficientdet/effnet_blocks_keras.py new file mode 100644 index 000000000..223c3483d --- /dev/null +++ b/tutorials/notebooks/task_notebooks/keras/models/efficientdet/effnet_blocks_keras.py @@ -0,0 +1,261 @@ +# Copyright 2025 Sony Semiconductor Solutions, Inc. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +# The following code was mostly duplicated from https://github.com/huggingface/pytorch-image-models +# and changed to generate an equivalent Keras model. +# Main changes: +# * Torch layers replaced with Keras layers +# * removed class inheritance from torch.nn.Module +# * changed "forward" class methods with "__call__" +# * removed processes unused in effdet tutorial. +# ============================================================================== + +import types +from functools import partial +import tensorflow as tf + +from timm.layers import DropPath, make_divisible + +__all__ = [ + 'DepthwiseSeparableConv', 'InvertedResidual'] + + +def handle_name(_name): + return '' if _name is None or _name == '' else _name + + +def num_groups(group_size, channels): + if not group_size: # 0 or None + return 1 # normal conv with 1 group + else: + # NOTE group_size == 1 -> depthwise conv + assert channels % group_size == 0 + return channels // group_size + + +def get_attn(attn_type): + if isinstance(attn_type, tf.keras.layers.Layer): + return attn_type + module_cls = None + if attn_type: + module_cls = attn_type + return module_cls + + +def create_conv2d_pad(in_chs, out_chs, kernel_size, **kwargs): + padding = kwargs.pop('padding', '') + s = kwargs.pop('stride', None) + if s is not None: + kwargs.update({'strides': s}) + d = kwargs.pop('dilation', None) + if d is not None: + kwargs.update({'dilation_rate': d}) + assert padding in ['valid', 'same'], 'Not Implemented' + kwargs.setdefault('use_bias', kwargs.pop('bias', False)) + if kwargs.get('groups', -1) == in_chs: + kwargs.pop('groups', None) + return tf.keras.layers.DepthwiseConv2D(kernel_size, padding=padding, **kwargs) + else: + return tf.keras.layers.Conv2D(out_chs, kernel_size, padding=padding, **kwargs) + + +def create_pool2d(pool_type, kernel_size, stride=None, **kwargs): + stride = stride or kernel_size + padding = kwargs.pop('padding', '') + padding = padding.lower() + if pool_type == 'max': + # return MaxPool2dSame(kernel_size, stride=stride, **kwargs) + return tf.keras.layers.MaxPooling2D(kernel_size, strides=stride, padding=padding.lower()) + else: + assert False, f'Unsupported pool type {pool_type}' + + +def create_conv2d(in_channels, out_channels, kernel_size, **kwargs): + """ Select a 2d convolution implementation based on arguments + Creates and returns one of torch.nn.Conv2d, Conv2dSame, MixedConv2d, or CondConv2d. + + Used extensively by EfficientNet, MobileNetv3 and related networks. + """ + depthwise = kwargs.pop('depthwise', False) + # for DW out_channels must be multiple of in_channels as must have out_channels % groups == 0 + groups = in_channels if depthwise else kwargs.pop('groups', 1) + m = create_conv2d_pad(in_channels, out_channels, kernel_size, groups=groups, **kwargs) + return m + + + +class DepthwiseSeparableConv: + """ DepthwiseSeparable block + Used for DS convs in MobileNet-V1 and in the place of IR blocks that have no expansion + (factor of 1.0). This is an alternative to having a IR with an optional first pw conv. 
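+    Computation order: depthwise conv -> BN/act -> optional SE -> pointwise conv -> BN,
+    with a residual add when stride == 1 and in_chs == out_chs (unless noskip is set).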
+ """ + def __init__( + self, in_chs, out_chs, dw_kernel_size=3, stride=1, dilation=1, group_size=1, pad_type='', + noskip=False, pw_kernel_size=1, pw_act=False, act_layer=tf.keras.layers.ReLU, + norm_layer=tf.keras.layers.BatchNormalization, se_layer=None, drop_path_rate=0., name=None): + norm_act_layer = get_norm_act_layer(norm_layer, act_layer) + groups = num_groups(group_size, in_chs) + self.has_skip = (stride == 1 and in_chs == out_chs) and not noskip + self.has_pw_act = pw_act # activation after point-wise conv + + self.conv_dw = create_conv2d( + in_chs, in_chs, dw_kernel_size, stride=stride, dilation=dilation, padding=pad_type, + groups=groups, name=name + '/conv_dw') + self.bn1 = norm_act_layer(in_chs, name=name + '/bn1') + + # Squeeze-and-excitation + self.se = se_layer(in_chs, act_layer=act_layer, name=name + '/se') if se_layer else None + + self.conv_pw = create_conv2d(in_chs, out_chs, pw_kernel_size, padding=pad_type, name=name + '/conv_pw') + self.bn2 = norm_act_layer(out_chs, inplace=True, apply_act=self.has_pw_act, name=name + '/bn2') + self.drop_path = DropPath(drop_path_rate) if drop_path_rate else None + + def feature_info(self, location): + if location == 'expansion': # after SE, input to PW + return dict(module='conv_pw', hook_type='forward_pre', num_chs=self.conv_pw.in_channels) + else: # location == 'bottleneck', block output + return dict(module='', num_chs=self.conv_pw.filters) + + def __call__(self, x): + shortcut = x + x = self.conv_dw(x) + x = self.bn1(x) + if self.se is not None: + x = self.se(x) + x = self.conv_pw(x) + x = self.bn2(x) + if self.has_skip: + if self.drop_path is not None: + x = self.drop_path(x) + x = x + shortcut + return x + + +class InvertedResidual: + """ Inverted residual block w/ optional SE + + Originally used in MobileNet-V2 - https://arxiv.org/abs/1801.04381v4, this layer is often + referred to as 'MBConv' for (Mobile inverted bottleneck conv) and is also used in + * MNasNet - https://arxiv.org/abs/1807.11626 + * EfficientNet - https://arxiv.org/abs/1905.11946 + * MobileNet-V3 - https://arxiv.org/abs/1905.02244 + """ + + def __init__( + self, in_chs, out_chs, dw_kernel_size=3, stride=1, dilation=1, group_size=1, pad_type='', + noskip=False, exp_ratio=1.0, exp_kernel_size=1, pw_kernel_size=1, act_layer=tf.keras.layers.ReLU, + norm_layer=tf.keras.layers.BatchNormalization, se_layer=None, conv_kwargs=None, drop_path_rate=0., + name=None): + norm_act_layer = get_norm_act_layer(norm_layer, act_layer) + conv_kwargs = conv_kwargs or {} + mid_chs = make_divisible(in_chs * exp_ratio) + groups = num_groups(group_size, mid_chs) + self.has_skip = (in_chs == out_chs and stride == 1) and not noskip + + # Point-wise expansion + self.conv_pw = create_conv2d(in_chs, mid_chs, exp_kernel_size, padding=pad_type, name=name + '/conv_pw', **conv_kwargs) + self.bn1 = norm_act_layer(mid_chs, name=name + '/bn1') + + # Depth-wise convolution + self.conv_dw = create_conv2d( + mid_chs, mid_chs, dw_kernel_size, stride=stride, dilation=dilation, + groups=groups, padding=pad_type, name=name + '/conv_dw', **conv_kwargs) + self.bn2 = norm_act_layer(mid_chs, name=name + '/bn2') + + # Squeeze-and-excitation + self.se = se_layer(mid_chs, act_layer=act_layer) if se_layer else None + + # Point-wise linear projection + self.conv_pwl = create_conv2d(mid_chs, out_chs, pw_kernel_size, padding=pad_type, + name=name + '/conv_pwl', **conv_kwargs) + self.bn3 = norm_act_layer(out_chs, apply_act=False, name=name + '/bn3') + self.drop_path = DropPath(drop_path_rate) if drop_path_rate 
else None + + def feature_info(self, location): + if location == 'expansion': # after SE, input to PWL + return dict(module='conv_pwl', hook_type='forward_pre', num_chs=self.conv_pwl.in_channels) + else: # location == 'bottleneck', block output + return dict(module='', num_chs=self.conv_pwl.filters) + + def __call__(self, x): + shortcut = x + x = self.conv_pw(x) + x = self.bn1(x) + x = self.conv_dw(x) + x = self.bn2(x) + if self.se is not None: + x = self.se(x) + x = self.conv_pwl(x) + x = self.bn3(x) + if self.has_skip: + if self.drop_path is not None: + x = self.drop_path(x) + x = x + shortcut + return x + + +class BatchNormAct2d: + """BatchNorm + Activation + + This module performs BatchNorm + Activation in a manner that will remain backwards + compatible with weights trained with separate bn, act. This is why we inherit from BN + instead of composing it as a .bn member. + """ + def __init__( + self, + num_features, + epsilon=1e-5, + momentum=0.1, + affine=True, + track_running_stats=True, + apply_act=True, + act_layer=tf.keras.layers.ReLU, + act_kwargs=None, + inplace=True, + drop_layer=None, + device=None, + dtype=None, + name=None + ): + assert affine, 'Not Implemented' + self.bn = tf.keras.layers.BatchNormalization(momentum=momentum, epsilon=epsilon, name=name) + if act_kwargs is None: + act_kwargs = {} + self.act = act_layer(**act_kwargs) if apply_act else None + + def __call__(self, x): + x = self.bn(x) + if self.act is not None: + x = self.act(x) + return x + + +def get_norm_act_layer(norm_layer, act_layer=None): + assert isinstance(norm_layer, (type, str, types.FunctionType, partial)) + # assert act_layer is None or isinstance(act_layer, (type, str, types.FunctionType, partial)) + norm_act_kwargs = {} + + # unbind partial fn, so args can be rebound later + if isinstance(norm_layer, partial): + norm_act_kwargs.update(norm_layer.keywords) + norm_layer = norm_layer.func + + type_name = norm_layer.__name__.lower() + if type_name.startswith('batchnormalization'): + norm_act_layer = BatchNormAct2d + + norm_act_kwargs.setdefault('act_layer', act_layer) + norm_act_layer = partial(norm_act_layer, **norm_act_kwargs) # bind/rebind args + return norm_act_layer diff --git a/tutorials/notebooks/task_notebooks/keras/models/efficientdet/effnet_keras.py b/tutorials/notebooks/task_notebooks/keras/models/efficientdet/effnet_keras.py new file mode 100644 index 000000000..70671e984 --- /dev/null +++ b/tutorials/notebooks/task_notebooks/keras/models/efficientdet/effnet_keras.py @@ -0,0 +1,410 @@ +# Copyright 2025 Sony Semiconductor Solutions, Inc. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +# The following code was mostly duplicated from https://github.com/huggingface/pytorch-image-models +# and changed to generate an equivalent Keras model. 
+# Main changes: +# * Torch layers replaced with Keras layers +# * removed class inheritance from torch.nn.Module +# * changed "forward" class methods with "__call__" +# * removed processes unused in effdet tutorial. +# ============================================================================== + +from functools import partial +from typing import Any, Dict, Optional, Union, List +import tensorflow as tf +from timm.models import parse_model_name, split_model_name_tag, build_model_with_cfg, FeatureInfo +from timm.models._efficientnet_builder import BN_EPS_TF_DEFAULT, decode_arch_def, round_channels +from timm.models._builder import pretrained_cfg_for_features + +from models.efficientdet.effnet_blocks_keras import create_conv2d, get_attn, \ + handle_name, get_norm_act_layer, InvertedResidual, DepthwiseSeparableConv + + +__all__ = ["EfficientNetBuilder", "decode_arch_def", "efficientnet_init_weights", + 'resolve_bn_args', 'resolve_act_layer', 'round_channels', 'BN_MOMENTUM_TF_DEFAULT', 'BN_EPS_TF_DEFAULT'] + + +# ####################################################################################### +# This file generates the Keras model. It's based on the EfficientNet code in the timm +# repository, and switched the Torch Modules with Keras layers +# ####################################################################################### + + +class EfficientNetBuilder: + """ Build Trunk Blocks + + This ended up being somewhat of a cross between + https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet/mnasnet_models.py + and + https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/maskrcnn_benchmark/modeling/backbone/fbnet_builder.py + + """ + def __init__(self, output_stride=32, pad_type='', round_chs_fn=round_channels, se_from_exp=False, + act_layer=None, norm_layer=None, se_layer=None, drop_path_rate=0., feature_location=''): + self.output_stride = output_stride + self.pad_type = pad_type + self.round_chs_fn = round_chs_fn + self.se_from_exp = se_from_exp # calculate se channel reduction from expanded (mid) chs + self.act_layer = act_layer + self.norm_layer = norm_layer + self.se_layer = get_attn(se_layer) + try: + self.se_layer(8, rd_ratio=1.0) # test if attn layer accepts rd_ratio arg + self.se_has_ratio = True + except TypeError: + self.se_has_ratio = False + self.drop_path_rate = drop_path_rate + self.feature_location = feature_location + assert feature_location in ('bottleneck', 'expansion', '') + + # state updated during build, consumed by model + self.in_chs = None + self.features = [] + + def _make_block(self, ba, block_idx, block_count, name): + drop_path_rate = self.drop_path_rate * block_idx / block_count + bt = ba.pop('block_type') + ba['name'] = name + ba['in_chs'] = self.in_chs + ba['out_chs'] = self.round_chs_fn(ba['out_chs']) + ba['pad_type'] = self.pad_type + # block act fn overrides the model default + ba['act_layer'] = ba['act_layer'] if ba['act_layer'] is not None else self.act_layer + assert ba['act_layer'] is not None + ba['norm_layer'] = self.norm_layer + ba['drop_path_rate'] = drop_path_rate + if bt != 'cn': + ba.pop('se_ratio') + + if bt == 'ir': + block = InvertedResidual(**ba) + elif bt == 'ds' or bt == 'dsa': + block = DepthwiseSeparableConv(**ba) + else: + assert False, 'Uknkown block type (%s) while building model.' 
% bt + + self.in_chs = ba['out_chs'] # update in_chs for arg of next block + return block + + def __call__(self, in_chs, model_block_args, name=None): + """ Build the blocks + Args: + in_chs: Number of input-channels passed to first block + model_block_args: A list of lists, outer list defines stages, inner + list contains strings defining block configuration(s) + Return: + List of block stacks (each stack wrapped in nn.Sequential) + """ + name = handle_name(name) + self.in_chs = in_chs + total_block_count = sum([len(x) for x in model_block_args]) + total_block_idx = 0 + current_stride = 2 + current_dilation = 1 + stages = [] + + # outer list of block_args defines the stacks + for stack_idx, stack_args in enumerate(model_block_args): + assert isinstance(stack_args, list) + + blocks = [] + # each stack (stage of blocks) contains a list of block arguments + for block_idx, block_args in enumerate(stack_args): + last_block = block_idx + 1 == len(stack_args) + assert block_args['stride'] in (1, 2) + if block_idx >= 1: # only the first block in any stack can have a stride > 1 + block_args['stride'] = 1 + + extract_features = False + if last_block: + next_stack_idx = stack_idx + 1 + extract_features = next_stack_idx >= len(model_block_args) or \ + model_block_args[next_stack_idx][0]['stride'] > 1 + + next_dilation = current_dilation + if block_args['stride'] > 1: + next_output_stride = current_stride * block_args['stride'] + current_stride = next_output_stride + block_args['dilation'] = current_dilation + if next_dilation != current_dilation: + current_dilation = next_dilation + + # create the block + block = self._make_block(block_args, total_block_idx, total_block_count, f'{name}/{stack_idx}/{block_idx}') + blocks.append(block) + + # stash feature module name and channel info for model feature extraction + if extract_features: + feature_info = dict( + stage=stack_idx + 1, + reduction=current_stride, + **block.feature_info(self.feature_location), + ) + feature_info['module'] = f'blocks.{stack_idx}' + self.features.append(feature_info) + + total_block_idx += 1 # incr global block idx (across all stacks) + stages.append(blocks) + return stages + + +class EfficientNetFeatures: + """ EfficientNet Feature Extractor + + A work-in-progress feature extraction module for EfficientNet, to use as a backbone for segmentation + and object detection models. 
+ """ + + def __init__( + self, + block_args, + out_indices=(0, 1, 2, 3, 4), + feature_location='bottleneck', + in_chans=3, + stem_size=32, + fix_stem=False, + output_stride=32, + pad_type='', + round_chs_fn=round_channels, + act_layer=None, + norm_layer=None, + se_layer=None, + drop_rate=0., + drop_path_rate=0., + name=None + ): + name = handle_name(name) + act_layer = act_layer or tf.keras.layers.ReLU + norm_layer = norm_layer or tf.keras.layers.BatchNormalization + norm_act_layer = get_norm_act_layer(norm_layer, act_layer) + se_layer = se_layer + self.drop_rate = drop_rate + self.grad_checkpointing = False + + # Stem + self.conv_stem = create_conv2d(in_chans, stem_size, 3, stride=2, padding=pad_type, name=name + '/conv_stem') + self.bn1 = norm_act_layer(stem_size, name=name + '/bn1') + + # Middle stages (IR/ER/DS Blocks) + builder = EfficientNetBuilder( + output_stride=output_stride, + pad_type=pad_type, + round_chs_fn=round_chs_fn, + act_layer=act_layer, + norm_layer=norm_layer, + se_layer=se_layer, + drop_path_rate=drop_path_rate, + feature_location=feature_location, + ) + self.blocks = builder(stem_size, block_args, name=name + '/blocks') + self.feature_info = FeatureInfo(builder.features, out_indices) + self._stage_out_idx = {f['stage']: f['index'] for f in self.feature_info.get_dicts()} + + # efficientnet_init_weights(self) + + # Register feature extraction hooks with FeatureHooks helper + self.feature_hooks = None + + def __call__(self, x) -> List[tf.Tensor]: + x = self.conv_stem(x) + x = self.bn1(x) + features = [] + for i, b in enumerate(self.blocks): + for bb in b: + # print(i, type(b), type(bb)) + x = bb(x) + if i + 1 in self._stage_out_idx: + features.append(x) + return features + + +def _create_effnet(variant, pretrained=False, **kwargs): + features_mode = '' + model_cls = None # EfficientNet + kwargs_filter = None + if kwargs.pop('features_only', False): + if 'feature_cfg' in kwargs: + features_mode = 'cfg' + else: + kwargs_filter = ('num_classes', 'num_features', 'head_conv', 'global_pool') + model_cls = EfficientNetFeatures + features_mode = 'cls' + else: + raise NotImplemented + + model = build_model_with_cfg( + model_cls, + variant, + pretrained, + features_only=features_mode == 'cfg', + pretrained_strict=features_mode != 'cls', + kwargs_filter=kwargs_filter, + **kwargs, + ) + model.pretrained_cfg = model.default_cfg = pretrained_cfg_for_features(model.pretrained_cfg) + return model + + +def resolve_bn_args(kwargs): + bn_args = {} + bn_eps = kwargs.pop('bn_eps', None) + if bn_eps is not None: + bn_args['epsilon'] = bn_eps + return bn_args + + +def resolve_act_layer(kwargs, default='relu'): + act_name = kwargs.pop('act_layer', default) + if act_name == 'relu': + return tf.keras.layers.ReLU + elif act_name == 'relu6': + return partial(tf.keras.layers.ReLU, max_value=6.0) + else: + raise NotImplemented + + +def _gen_efficientnet_lite(variant, channel_multiplier=1.0, depth_multiplier=1.0, pretrained=False, **kwargs): + """Creates an EfficientNet-Lite model. 
+ + Ref impl: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite + Paper: https://arxiv.org/abs/1905.11946 + + EfficientNet params + name: (channel_multiplier, depth_multiplier, resolution, dropout_rate) + 'efficientnet-lite0': (1.0, 1.0, 224, 0.2), + 'efficientnet-lite1': (1.0, 1.1, 240, 0.2), + 'efficientnet-lite2': (1.1, 1.2, 260, 0.3), + 'efficientnet-lite3': (1.2, 1.4, 280, 0.3), + 'efficientnet-lite4': (1.4, 1.8, 300, 0.3), + + Args: + channel_multiplier: multiplier to number of channels per layer + depth_multiplier: multiplier to number of repeats per stage + """ + arch_def = [ + ['ds_r1_k3_s1_e1_c16'], + ['ir_r2_k3_s2_e6_c24'], + ['ir_r2_k5_s2_e6_c40'], + ['ir_r3_k3_s2_e6_c80'], + ['ir_r3_k5_s1_e6_c112'], + ['ir_r4_k5_s2_e6_c192'], + ['ir_r1_k3_s1_e6_c320'], + ] + model_kwargs = dict( + block_args=decode_arch_def(arch_def, depth_multiplier, fix_first_last=True), + num_features=1280, + stem_size=32, + fix_stem=True, + round_chs_fn=partial(round_channels, multiplier=channel_multiplier), + act_layer=resolve_act_layer(kwargs, 'relu6'), + norm_layer=kwargs.pop('norm_layer', None) or partial(tf.keras.layers.BatchNormalization, **resolve_bn_args(kwargs)), + **kwargs, + ) + model = _create_effnet(variant, pretrained, **model_kwargs) + return model + + +def tf_efficientnet_lite0(pretrained=False, **kwargs): + """ EfficientNet-Lite0 """ + # NOTE for train, drop_rate should be 0.2, drop_path_rate should be 0.2 + kwargs['bn_eps'] = BN_EPS_TF_DEFAULT + kwargs['pad_type'] = 'same' + kwargs['name'] = 'backbone' + model = _gen_efficientnet_lite( + 'tf_efficientnet_lite0', channel_multiplier=1.0, depth_multiplier=1.0, pretrained=pretrained, **kwargs) + return model + + +model_entrypoints = {'tf_efficientnet_lite0': tf_efficientnet_lite0} + + +def create_model( + model_name: str, + pretrained: bool = False, + pretrained_cfg: Optional[Union[str, Dict[str, Any], Any]] = None, + pretrained_cfg_overlay: Optional[Dict[str, Any]] = None, + checkpoint_path: str = '', + scriptable: Optional[bool] = None, + exportable: Optional[bool] = None, + no_jit: Optional[bool] = None, + **kwargs, +): + """Create a model. + + Lookup model's entrypoint function and pass relevant args to create a new model. + + + **kwargs will be passed through entrypoint fn to ``timm.models.build_model_with_cfg()`` + and then the model class __init__(). kwargs values set to None are pruned before passing. + + + Args: + model_name: Name of model to instantiate. + pretrained: If set to `True`, load pretrained ImageNet-1k weights. + pretrained_cfg: Pass in an external pretrained_cfg for model. + pretrained_cfg_overlay: Replace key-values in base pretrained_cfg with these. + checkpoint_path: Path of checkpoint to load _after_ the model is initialized. + scriptable: Set layer config so that model is jit scriptable (not working for all models yet). + exportable: Set layer config so that model is traceable / ONNX exportable (not fully impl/obeyed yet). + no_jit: Set layer config so that model doesn't utilize jit scripted layers (so far activations only). + + Keyword Args: + drop_rate (float): Classifier dropout rate for training. + drop_path_rate (float): Stochastic depth drop rate for training. + global_pool (str): Classifier global pooling type. + + Example: + + ```py + >>> from timm import create_model + + >>> # Create a MobileNetV3-Large model with no pretrained weights. + >>> model = create_model('mobilenetv3_large_100') + + >>> # Create a MobileNetV3-Large model with pretrained weights. 
+ >>> model = create_model('mobilenetv3_large_100', pretrained=True) + >>> model.num_classes + 1000 + + >>> # Create a MobileNetV3-Large model with pretrained weights and a new head with 10 classes. + >>> model = create_model('mobilenetv3_large_100', pretrained=True, num_classes=10) + >>> model.num_classes + 10 + ``` + """ + # Parameters that aren't supported by all models or are intended to only override model defaults if set + # should default to None in command line args/cfg. Remove them if they are present and not set so that + # non-supporting models don't break and default args remain in effect. + kwargs = {k: v for k, v in kwargs.items() if v is not None} + + model_source, model_name = parse_model_name(model_name) + model_name, pretrained_tag = split_model_name_tag(model_name) + if pretrained_tag and not pretrained_cfg: + # a valid pretrained_cfg argument takes priority over tag in model name + pretrained_cfg = pretrained_tag + + model_name, pretrained_tag = split_model_name_tag(model_name) + + create_fn = model_entrypoints[model_name] + # with set_layer_config(scriptable=scriptable, exportable=exportable, no_jit=no_jit): + model = create_fn( + pretrained=pretrained, + pretrained_cfg=pretrained_cfg, + pretrained_cfg_overlay=pretrained_cfg_overlay, + **kwargs, + ) + + return model diff --git a/tutorials/notebooks/task_notebooks/keras/models/utils/__init__.py b/tutorials/notebooks/task_notebooks/keras/models/utils/__init__.py new file mode 100644 index 000000000..88f8a3322 --- /dev/null +++ b/tutorials/notebooks/task_notebooks/keras/models/utils/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2025 Sony Semiconductor Solutions, Inc. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== diff --git a/tutorials/notebooks/task_notebooks/keras/models/utils/torch2keras_weights_translation.py b/tutorials/notebooks/task_notebooks/keras/models/utils/torch2keras_weights_translation.py new file mode 100644 index 000000000..e29d96a45 --- /dev/null +++ b/tutorials/notebooks/task_notebooks/keras/models/utils/torch2keras_weights_translation.py @@ -0,0 +1,92 @@ +# Copyright 2025 Sony Semiconductor Solutions, Inc. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +from typing import Dict +import tensorflow as tf +import torch +import numpy as np + +def weight_translation(keras_name: str, pytorch_weights_dict: Dict[str, np.ndarray], + layer: tf.keras.layers.Layer) -> np.ndarray: + """ + Convert a keras weight name format to torch naming format, so the value of the weight can be + retrieved from the Torch model state_dict. + + For example: + * Keras name: model_name/layer_name/kernel:0 + is translated to: + * Torch name: model_name.layer_name.weight + + Args: + keras_name (str): keras weight name + pytorch_weights_dict (Dict[str, np.ndarray]): the Torch model state_dict, as {name_str: weight value as numpy array} + layer (tf.keras.layers.Layer): the Keras layer of the weight + + Returns: + np.ndarray: the weight value as a numpy array + + """ + keras_name = keras_name.replace('/', '.') + # Handle Convolution layers + if '.depthwise_kernel:0' in keras_name: + value = pytorch_weights_dict.pop(keras_name.replace(".depthwise_kernel:0", ".weight")).transpose((2, 3, 0, 1)) + elif '.kernel:0' in keras_name: + value = pytorch_weights_dict.pop(keras_name.replace(".kernel:0", ".weight")) + value = value.transpose((2, 3, 1, 0)) + elif '.bias:0' in keras_name: + value = pytorch_weights_dict.pop(keras_name.replace(".bias:0", ".bias")) + + # Handle normalization layers + elif '.beta:0' in keras_name: + value = pytorch_weights_dict.pop(keras_name.replace(".beta:0", ".bias")) + elif '.gamma:0' in keras_name: + value = pytorch_weights_dict.pop(keras_name.replace(".gamma:0", ".weight")) + elif '.moving_mean:0' in keras_name: + value = pytorch_weights_dict.pop(keras_name.replace(".moving_mean:0", ".running_mean")) + elif '.moving_variance:0' in keras_name: + value = pytorch_weights_dict.pop(keras_name.replace(".moving_variance:0", ".running_var")) + else: + value = pytorch_weights_dict.pop(keras_name) + return value + + +def load_state_dict(model: tf.keras.Model, state_dict_url: str = None, + state_dict_torch: Dict = None): + """ + Assign a Keras model weights according to a state_dict from the equivalent Torch model. + Args: + model (tf.keras.Model): A Keras model + state_dict_url (str): the Torch model state_dict location + state_dict_torch(Dict[str, np.ndarray]): Torch model state_dict. If not None, will be used instead of state_dict_url + + Returns: + tf.keras.Model: The same model object after assigning the weights(model) + + """ + if state_dict_torch is None: + assert state_dict_url is not None, "either 'state_dict_url' or 'state_dict_torch' should not be None" + state_dict_torch = torch.hub.load_state_dict_from_url(state_dict_url, progress=False, + map_location='cpu') + state_dict = {k: v.numpy() for k, v in state_dict_torch.items()} + + for layer in model.layers: + for w in layer.weights: + w.assign(weight_translation(w.name, state_dict, layer)) + + # look for variables not assigned in torch state dict + for k in state_dict: + if 'num_batches_tracked' in k: + continue + print(f' WARNING: {k} not assigned to keras model !!!')
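To make the relationship between these files concrete, below is a minimal sketch of how the Keras backbone factory from `effnet_keras.py` and the weight-name translation from `torch2keras_weights_translation.py` are meant to fit together. It assumes the notebook's environment (TensorFlow 2.15, `timm`, and the `models/` package importable from the notebook directory) and uses a toy state_dict rather than the real pretrained checkpoint; in the tutorial itself the PyTorch weights come from the pretrained `effdet` model and are applied with `load_state_dict` over the assembled `tf.keras.Model`.

```python
import numpy as np
import tensorflow as tf

from models.efficientdet.effnet_keras import create_model
from models.utils.torch2keras_weights_translation import weight_translation

# Build the Keras EfficientNet-Lite0 feature extractor. `features_only=True`
# is required because _create_effnet only implements the EfficientNetFeatures
# path; the entrypoint also fixes pad_type='same' and names the model 'backbone'.
backbone = create_model('tf_efficientnet_lite0', features_only=True)

# The blocks are plain Python wrappers around Keras layers, so a forward pass
# with a concrete tensor builds the weights and returns the multi-scale feature
# maps collected at the stage boundaries recorded by EfficientNetBuilder.
features = backbone(tf.zeros((1, 320, 320, 3)))
print([f.shape for f in features])

# Weight loading maps Keras variable names back to Torch state_dict keys,
# e.g. 'backbone/conv_stem/kernel:0' -> 'backbone.conv_stem.weight', and
# transposes conv kernels from Torch (out, in, kh, kw) to Keras (kh, kw, in, out).
# The `layer` argument is not used by this branch, so None is passed here.
toy_state_dict = {
    'backbone.conv_stem.weight': np.zeros((32, 3, 3, 3), dtype=np.float32),
}
kernel = weight_translation('backbone/conv_stem/kernel:0', toy_state_dict, layer=None)
print(kernel.shape)  # (3, 3, 3, 32)

# In the notebook, load_state_dict(model, state_dict_torch=...) runs this same
# translation over every weight of the full Keras model and warns about any
# Torch entries (other than num_batches_tracked) that were never assigned.
```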