diff --git a/README.md b/README.md
index 34b658758..43da6db81 100644
--- a/README.md
+++ b/README.md
@@ -1,111 +1,119 @@
-# Detectron
+# Detectron Transfer Learning with the PASCAL VOC 2007 Dataset
-Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including [Mask R-CNN](https://arxiv.org/abs/1703.06870). It is written in Python and powered by the [Caffe2](https://github.com/caffe2/caffe2) deep learning framework.
+**Detectron implements several object detection algorithms, all trained on the COCO 2014 dataset, which has 80 categories. I want to fine-tune Faster R-CNN with FPN on the PASCAL VOC 2007 dataset, which has only 20 categories. The same procedure can be used to fine-tune your own model on a new dataset.**
-At FAIR, Detectron has enabled numerous research projects, including: [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144), [Mask R-CNN](https://arxiv.org/abs/1703.06870), [Detecting and Recognizing Human-Object Interactions](https://arxiv.org/abs/1704.07333), [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002), [Non-local Neural Networks](https://arxiv.org/abs/1711.07971), [Learning to Segment Every Thing](https://arxiv.org/abs/1711.10370), and [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/abs/1712.04440).
-
-<div align="center">
-  <p>Example Mask R-CNN output.</p>
-</div>
-
-## Introduction
-
-The goal of Detectron is to provide a high-quality, high-performance
-codebase for object detection *research*. It is designed to be flexible in order
-to support rapid implementation and evaluation of novel research. Detectron
-includes implementations of the following object detection algorithms:
-
-- [Mask R-CNN](https://arxiv.org/abs/1703.06870) -- *Marr Prize at ICCV 2017*
-- [RetinaNet](https://arxiv.org/abs/1708.02002) -- *Best Student Paper Award at ICCV 2017*
-- [Faster R-CNN](https://arxiv.org/abs/1506.01497)
-- [RPN](https://arxiv.org/abs/1506.01497)
-- [Fast R-CNN](https://arxiv.org/abs/1504.08083)
-- [R-FCN](https://arxiv.org/abs/1605.06409)
-
-using the following backbone network architectures:
-
-- [ResNeXt{50,101,152}](https://arxiv.org/abs/1611.05431)
-- [ResNet{50,101,152}](https://arxiv.org/abs/1512.03385)
-- [Feature Pyramid Networks](https://arxiv.org/abs/1612.03144) (with ResNet/ResNeXt)
-- [VGG16](https://arxiv.org/abs/1409.1556)
-
-Additional backbone architectures may be easily implemented. For more details about these models, please see [References](#references) below.
-
-## License
-
-Detectron is released under the [Apache 2.0 license](https://github.com/facebookresearch/detectron/blob/master/LICENSE). See the [NOTICE](https://github.com/facebookresearch/detectron/blob/master/NOTICE) file for additional details.
-
-## Citing Detectron
-
-If you use Detectron in your research or wish to refer to the baseline results published in the [Model Zoo](MODEL_ZOO.md), please use the following BibTeX entry.
+## 1. Set up Caffe2 and Detectron and run the Detectron demo successfully
+I will refer to the Detectron directory as $DETECTRON below.
+## 2. Download the pre-trained models
+The code downloads the models automatically, but my internet is slow and I'd like to fetch them before I run anything.
+Because ResNet-50 is the backbone, I need both the ResNet-50 ImageNet weights and the e2e_faster_rcnn_R-50-FPN COCO model.
```
-@misc{Detectron2018,
-  author = {Ross Girshick and Ilija Radosavovic and Georgia Gkioxari and
-            Piotr Doll\'{a}r and Kaiming He},
-  title = {Detectron},
-  howpublished = {\url{https://github.com/facebookresearch/detectron}},
-  year = {2018}
-}
+mkdir -p /tmp/detectron/detectron-download-cache/ImageNetPretrained/MSRA
+wget https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl -O /tmp/detectron/detectron-download-cache/ImageNetPretrained/MSRA/R-50.pkl
+wget https://s3-us-west-2.amazonaws.com/detectron/35857389/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_2x.yaml.01_37_22.KSeq0b5q/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl
```
+Then move the downloaded model_final.pkl to the path that WEIGHTS points to in the configuration below.
-## Model Zoo and Baselines
+## 3. Prepare the configuration file
+### a. Copy the sample configuration file from $DETECTRON/configs/getting_started
+```
+cd $DETECTRON
+mkdir experiments && cd experiments
+cp ../configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml e2e_faster_rcnn_resnet-50-FPN_pascal2007.yaml
+```
+### b. Change the configuration file
+```
+MODEL:
+  TYPE: generalized_rcnn
+  CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
+  NUM_CLASSES: 21
+  FASTER_RCNN: True
+```
+PASCAL VOC 2007 has only 20 object classes plus one background class, so NUM_CLASSES is set to 21.
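+For reference, here is where 21 comes from (the class list is the standard VOC 2007 ordering, written out by hand rather than read from Detectron):
+```
+# The 20 PASCAL VOC object categories; Detectron reserves index 0 for background.
+VOC_CLASSES = [
+    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
+    'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
+    'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
+
+NUM_CLASSES = len(VOC_CLASSES) + 1  # +1 for the background class
+print(NUM_CLASSES)  # 21
+```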
-We provide a large set of baseline results and trained models available for download in the [Detectron Model Zoo](MODEL_ZOO.md).
+```
+TRAIN:
+  SNAPSHOT_ITERS: 5000
+  WEIGHTS: /tmp/detectron/detectron-download-cache/35857389/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_2x.yaml.01_37_22.KSeq0b5q/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl
+  DATASETS: ('voc_2007_train',)
+```
+Change the WEIGHTS value to wherever you placed the model downloaded in step 2.
-
-## Installation
+## 4. Download PASCAL VOC 2007 and the COCO-format annotations
-
-Please find installation instructions for Caffe2 and Detectron in [`INSTALL.md`](INSTALL.md).
+Refer to the [data readme file](https://github.com/facebookresearch/Detectron/blob/master/lib/datasets/data/README.md) to prepare the PASCAL dataset.
+The PASCAL dataset support has a bug: the entries in $DETECTRON/lib/datasets/dataset_catalog.py should be changed as follows.
+```
+    'voc_2007_train': {
+        IM_DIR:
+            _DATA_DIR + '/VOC2007/JPEGImages',
+        ANN_FN:
+            _DATA_DIR + '/VOC2007/annotations/pascal_train2007.json',
+        DEVKIT_DIR:
+            _DATA_DIR + '/VOC2007/VOCdevkit2007'
+    },
+    'voc_2007_test': {
+        IM_DIR:
+            _DATA_DIR + '/VOC2007/JPEGImages',
+        ANN_FN:
+            _DATA_DIR + '/VOC2007/annotations/pascal_test2007.json',
+        DEVKIT_DIR:
+            _DATA_DIR + '/VOC2007/VOCdevkit2007'
+    },
+
-## Quick Start: Using Detectron
+```
-
-After installation, please see [`GETTING_STARTED.md`](GETTING_STARTED.md) for brief tutorials covering inference and training with Detectron.
+## 5. Rename cls_score and bbox_pred to prevent errors when train_net.py loads the weights
-
-## Getting Help
+In lib/modeling/fast_rcnn_heads.py, change every cls_score to cls_score_voc and every bbox_pred to bbox_pred_voc. Detectron restores parameters from a checkpoint by blob name, and the COCO model's cls_score/bbox_pred weights are shaped for 81 classes; renaming the blobs makes the loader skip them, so the new 21-class output heads are freshly initialized.
+## 6. Run the command to begin training
+```
+python2 tools/train_net.py --cfg experiments/e2e_faster_rcnn_resnet-50-FPN_pascal2007.yaml OUTPUT_DIR experiments/output
+```
-
-To start, please check the [troubleshooting](INSTALL.md#troubleshooting) section of our installation instructions as well as our [FAQ](FAQ.md). If you couldn't find help there, try searching our GitHub issues. We intend the issues page to be a forum in which the community collectively troubleshoots problems.
+## 7. Copy the final trained model
+```
+mkdir -p /tmp/detectron-download-cache/voc2007/
+cp experiments/output/train/voc_2007_train/generalized_rcnn/model_iter49999.pkl /tmp/detectron-download-cache/voc2007/model_final.pkl
-
-If bugs are found, **we appreciate pull requests** (including adding Q&A's to `FAQ.md` and improving our installation instructions and troubleshooting documents). Please see [CONTRIBUTING.md](CONTRIBUTING.md) for more information about contributing to Detectron.
+```
+## 8. Run inference on some PASCAL 2007 test images
+```
+python2 tools/infer_simple.py --cfg experiments/e2e_faster_rcnn_resnet-50-FPN_pascal2007.yaml \
+    --output-dir /tmp/detectron-visualizations --wts /tmp/detectron-download-cache/voc2007/model_final.pkl \
+    demo2
+```
+Unfortunately, I found that every person is labeled as bird. This may be caused by incorrectly converted JSON annotations, but a more likely explanation is that tools/infer_simple.py draws its class names from the COCO dummy dataset (dummy_datasets.get_coco_dataset()), so the VOC class indices are rendered with the wrong names; see the sketch below.
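+A minimal check of that explanation. Both class orderings below are written out by hand: the VOC list is the standard 2007 ordering, and the COCO list is assumed to match the ordering in lib/utils/dummy_datasets.py (worth verifying there):
+```
+# VOC ordering the model was trained with (background = index 0).
+VOC = ['__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
+       'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
+       'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train',
+       'tvmonitor']
+# First 16 entries of the COCO ordering used by the visualizer (truncated).
+COCO = ['__background__', 'person', 'bicycle', 'car', 'motorcycle',
+        'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
+        'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird']
+
+i = VOC.index('person')  # person is VOC class 15
+print(COCO[i])           # prints 'bird': a VOC person box is captioned as bird
+```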
-
-## References
-
+## 9. Run test_net.py on the PASCAL 2007 test dataset
+```
+python2 tools/test_net.py \
+    --cfg experiments/e2e_faster_rcnn_resnet-50-FPN_pascal2007.yaml \
+    TEST.WEIGHTS /tmp/detectron-download-cache/voc2007/model_final.pkl \
+    NUM_GPUS 1
+```
+The test reports the per-class AP and the mAP:
+```
+INFO voc_dataset_evaluator.py: 127: AP for aeroplane = 0.8095
+INFO voc_dataset_evaluator.py: 127: AP for bicycle = 0.8042
+INFO voc_dataset_evaluator.py: 127: AP for bird = 0.7086
+INFO voc_dataset_evaluator.py: 127: AP for boat = 0.6418
+INFO voc_dataset_evaluator.py: 127: AP for bottle = 0.6861
+INFO voc_dataset_evaluator.py: 127: AP for bus = 0.8822
+INFO voc_dataset_evaluator.py: 127: AP for car = 0.8794
+INFO voc_dataset_evaluator.py: 127: AP for cat = 0.8621
+INFO voc_dataset_evaluator.py: 127: AP for chair = 0.5876
+INFO voc_dataset_evaluator.py: 127: AP for cow = 0.7799
+INFO voc_dataset_evaluator.py: 127: AP for diningtable = 0.7404
+INFO voc_dataset_evaluator.py: 127: AP for dog = 0.8497
+INFO voc_dataset_evaluator.py: 127: AP for horse = 0.8855
+INFO voc_dataset_evaluator.py: 127: AP for motorbike = 0.7912
+INFO voc_dataset_evaluator.py: 127: AP for person = 0.7931
+INFO voc_dataset_evaluator.py: 127: AP for pottedplant = 0.5142
+INFO voc_dataset_evaluator.py: 127: AP for sheep = 0.7950
+INFO voc_dataset_evaluator.py: 127: AP for sofa = 0.7457
+INFO voc_dataset_evaluator.py: 127: AP for train = 0.7956
+INFO voc_dataset_evaluator.py: 127: AP for tvmonitor = 0.6960
+INFO voc_dataset_evaluator.py: 130: Mean AP = 0.7624
+```
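+As a quick sanity check, Mean AP here is just the arithmetic mean of the 20 per-class APs above:
+```
+# Per-class APs copied from the log above, in the same order.
+aps = [0.8095, 0.8042, 0.7086, 0.6418, 0.6861, 0.8822, 0.8794, 0.8621,
+       0.5876, 0.7799, 0.7404, 0.8497, 0.8855, 0.7912, 0.7931, 0.5142,
+       0.7950, 0.7457, 0.7956, 0.6960]
+print(round(sum(aps) / len(aps), 4))  # 0.7624, matching the reported Mean AP
+```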
-- [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/abs/1712.04440).
-  Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, and Kaiming He.
-  Tech report, arXiv, Dec. 2017.
-- [Learning to Segment Every Thing](https://arxiv.org/abs/1711.10370).
-  Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick.
-  Tech report, arXiv, Nov. 2017.
-- [Non-Local Neural Networks](https://arxiv.org/abs/1711.07971).
-  Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He.
-  Tech report, arXiv, Nov. 2017.
-- [Mask R-CNN](https://arxiv.org/abs/1703.06870).
-  Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick.
-  IEEE International Conference on Computer Vision (ICCV), 2017.
-- [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002).
-  Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár.
-  IEEE International Conference on Computer Vision (ICCV), 2017.
-- [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677).
-  Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He.
-  Tech report, arXiv, June 2017.
-- [Detecting and Recognizing Human-Object Interactions](https://arxiv.org/abs/1704.07333).
-  Georgia Gkioxari, Ross Girshick, Piotr Dollár, and Kaiming He.
-  Tech report, arXiv, Apr. 2017.
-- [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144).
-  Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie.
-  IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
-- [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431).
-  Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He.
-  IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
-- [R-FCN: Object Detection via Region-based Fully Convolutional Networks](http://arxiv.org/abs/1605.06409).
-  Jifeng Dai, Yi Li, Kaiming He, and Jian Sun.
-  Conference on Neural Information Processing Systems (NIPS), 2016.
-- [Deep Residual Learning for Image Recognition](http://arxiv.org/abs/1512.03385).
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.
-  IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-- [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](http://arxiv.org/abs/1506.01497)
-  Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun.
-  Conference on Neural Information Processing Systems (NIPS), 2015.
-- [Fast R-CNN](http://arxiv.org/abs/1504.08083).
-  Ross Girshick.
-  IEEE International Conference on Computer Vision (ICCV), 2015.
+
diff --git a/demo2/000012.jpg b/demo2/000012.jpg
new file mode 100644
index 000000000..b829107b8
Binary files /dev/null and b/demo2/000012.jpg differ
diff --git a/demo2/000014.jpg b/demo2/000014.jpg
new file mode 100644
index 000000000..5d8583e6a
Binary files /dev/null and b/demo2/000014.jpg differ
diff --git a/demo2/000017.jpg b/demo2/000017.jpg
new file mode 100644
index 000000000..19dd833f8
Binary files /dev/null and b/demo2/000017.jpg differ
diff --git a/demo2/000018.jpg b/demo2/000018.jpg
new file mode 100644
index 000000000..e76c295f2
Binary files /dev/null and b/demo2/000018.jpg differ
diff --git a/demo2/000019.jpg b/demo2/000019.jpg
new file mode 100644
index 000000000..f4f2e76ab
Binary files /dev/null and b/demo2/000019.jpg differ
diff --git a/demo2/000021.jpg b/demo2/000021.jpg
new file mode 100644
index 000000000..bf32a0002
Binary files /dev/null and b/demo2/000021.jpg differ
diff --git a/demo2/000022.jpg b/demo2/000022.jpg
new file mode 100644
index 000000000..c83174d74
Binary files /dev/null and b/demo2/000022.jpg differ
diff --git a/demo2/000023.jpg b/demo2/000023.jpg
new file mode 100644
index 000000000..ff2886993
Binary files /dev/null and b/demo2/000023.jpg differ
diff --git a/demo2/000025.jpg b/demo2/000025.jpg
new file mode 100644
index 000000000..b1b0359c9
Binary files /dev/null and b/demo2/000025.jpg differ
diff --git a/demo2/000028.jpg b/demo2/000028.jpg
new file mode 100644
index 000000000..df4a907e5
Binary files /dev/null and b/demo2/000028.jpg differ
diff --git a/demo2/000030.jpg b/demo2/000030.jpg
new file mode 100644
index 000000000..ea141fc52
Binary files /dev/null and b/demo2/000030.jpg differ
diff --git a/demo2/000031.jpg b/demo2/000031.jpg
new file mode 100644
index 000000000..74ea3a85c
Binary files /dev/null and b/demo2/000031.jpg differ
diff --git a/demo2/000032.jpg b/demo2/000032.jpg
new file mode 100644
index 000000000..b111b5a0b
Binary files /dev/null and b/demo2/000032.jpg differ
diff --git a/demo2/000034.jpg b/demo2/000034.jpg
new file mode 100644
index 000000000..aacf7d075
Binary files /dev/null and b/demo2/000034.jpg differ
diff --git a/demo2/000035.jpg b/demo2/000035.jpg
new file mode 100644
index 000000000..fa23a0b27
Binary files /dev/null and b/demo2/000035.jpg differ
diff --git a/demo2/000036.jpg b/demo2/000036.jpg
new file mode 100644
index 000000000..911d58131
Binary files /dev/null and b/demo2/000036.jpg differ
diff --git a/experiments/_init_paths.py b/experiments/_init_paths.py
new file mode 100644
index 000000000..11f3d85a2
--- /dev/null
+++ b/experiments/_init_paths.py
@@ -0,0 +1,20 @@
+#!/usr/bin/env python2
+# -*- coding: utf-8 -*-
+"""
+Created on Tue Feb 6 08:39:36 2018
+
+@author: roy
+"""
+
+"""Insert /home/roy/projects/caffe2/build to PYTHONPATH"""
+
+import sys
+
+pt = '/home/roy/projects/caffe2/build'
+
+
+def add_path(path):
+    if path not in sys.path:
+        sys.path.insert(0, path)
+ +add_path(pt) \ No newline at end of file diff --git a/experiments/demo.txt b/experiments/demo.txt new file mode 100644 index 000000000..5c8c91e13 --- /dev/null +++ b/experiments/demo.txt @@ -0,0 +1,9 @@ +python2 tools/infer_simple.py --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --output-dir /tmp/detectron-visualizations --image-ext jpg --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl demo + +python2 tools/infer_simple.py \ + --cfg configs/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_2x.yaml \ + --output-dir /tmp/detectron-visualizations \ + --image-ext jpg \ + --wts https://s3-us-west-2.amazonaws.com/detectron/35857389/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_2x.yaml.01_37_22.KSeq0b5q/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl demo + +python2 tools/train_net.py --cfg experiments/e2e_faster_rcnn_resnet-50-FPN_pascal2007.yaml OUTPUT_DIR experiments/output diff --git a/experiments/e2e_faster_rcnn_resnet-50-FPN_pascal2007.yaml b/experiments/e2e_faster_rcnn_resnet-50-FPN_pascal2007.yaml new file mode 100644 index 000000000..994c86a75 --- /dev/null +++ b/experiments/e2e_faster_rcnn_resnet-50-FPN_pascal2007.yaml @@ -0,0 +1,55 @@ +MODEL: + TYPE: generalized_rcnn + CONV_BODY: FPN.add_fpn_ResNet50_conv5_body + NUM_CLASSES: 21 + FASTER_RCNN: True +NUM_GPUS: 1 +SOLVER: + WEIGHT_DECAY: 0.0001 + LR_POLICY: steps_with_decay + BASE_LR: 0.0025 + GAMMA: 0.1 + MAX_ITER: 50000 + STEPS: [0, 30000, 40000] + # Equivalent schedules with... + # 1 GPU: + # BASE_LR: 0.0025 + # MAX_ITER: 60000 + # STEPS: [0, 30000, 40000] + # 2 GPUs: + # BASE_LR: 0.005 + # MAX_ITER: 30000 + # STEPS: [0, 15000, 20000] + # 4 GPUs: + # BASE_LR: 0.01 + # MAX_ITER: 15000 + # STEPS: [0, 7500, 10000] + # 8 GPUs: + # BASE_LR: 0.02 + # MAX_ITER: 7500 + # STEPS: [0, 3750, 5000] +FPN: + FPN_ON: True + MULTILEVEL_ROIS: True + MULTILEVEL_RPN: True +FAST_RCNN: + ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head + ROI_XFORM_METHOD: RoIAlign + ROI_XFORM_RESOLUTION: 7 + ROI_XFORM_SAMPLING_RATIO: 2 +TRAIN: + SNAPSHOT_ITERS: 5000 + WEIGHTS: /tmp/detectron/detectron-download-cache/35857389/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_2x.yaml.01_37_22.KSeq0b5q/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl + DATASETS: ('voc_2007_train',) + SCALES: (500,) + MAX_SIZE: 833 + BATCH_SIZE_PER_IM: 256 + RPN_PRE_NMS_TOP_N: 2000 # Per FPN level +TEST: + DATASETS: ('voc_2007_test',) + SCALES: (500,) + MAX_SIZE: 833 + NMS: 0.5 + RPN_PRE_NMS_TOP_N: 1000 # Per FPN level + RPN_POST_NMS_TOP_N: 1000 +OUTPUT_DIR: . 
diff --git a/experiments/net2json.ipynb b/experiments/net2json.ipynb new file mode 100644 index 000000000..69da3d4aa --- /dev/null +++ b/experiments/net2json.ipynb @@ -0,0 +1,156 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from __future__ import print_function\n", + "import _init_paths\n", + "import json\n", + "\n", + "from caffe2.python.core import Net\n", + "from caffe2.python import workspace\n", + "from caffe2.proto.caffe2_pb2 import NetDef\n", + "import caffe2.proto.caffe2_pb2 as pb2\n", + "\n", + "from google.protobuf import json_format\n", + "import pprint" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "net = Net('fist_net')\n", + "X = net.GaussianFill([], ['X'], mean=0.0, std=1.0, shape=[2,3], run_once=0)\n", + "\n", + "net_json = json_format.MessageToJson(net.Proto())\n", + "\n", + "#json.loads convert a json instance to Python object\n", + "#json.dumps convert a Python object to JSON formatted str\n", + "with open('testnet.txt', 'w') as fid:\n", + " fid.write(json.dumps(json.loads(net_json), indent=4))\n", + " \n", + "jstr = None \n", + "with open('testnet.txt', 'r') as fid:\n", + " jstr = json.dumps(json.load(fid))\n", + " \n", + "net2 = NetDef()\n", + "json_format.Parse(jstr, net2)\n", + "\n", + "print(str(net2))\n", + " \n", + "'''\n", + "init_net = NetDef()\n", + "with open('param_init_net.pb', 'r') as fid:\n", + " init_net.ParseFromString(fid.read())\n", + " \n", + "\n", + "train_net = NetDef()\n", + "with open('net.pb', 'r') as fid:\n", + " train_net.ParseFromString(fid.read())\n", + " \n", + "print(init_net.name)\n", + "print(train_net.name)\n", + "'''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "jstr = None \n", + "with open('net.json', 'r') as fid:\n", + " jstr = json.dumps(json.load(fid))\n", + "\n", + "net = NetDef()\n", + "json_format.Parse(jstr, net)\n", + "\n", + "with open('param_init_net.json', 'r') as fid:\n", + " jstr = json.dumps(json.load(fid))\n", + " \n", + "init_net = NetDef()\n", + "json_format.Parse(jstr, init_net)\n", + "\n", + "print(str(net))\n", + "print('=======================')\n", + "print(str(init_net))\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "jstr = None \n", + "with open('/home/roy/working/Detectron/experiments/output/train/voc_2007_train/generalized_rcnn/net.json', 'r') as fid:\n", + " jstr = json.load(fid)\n", + "\n", + "\n", + "ext_list = jstr['externalInput']\n", + "for it in ext_list:\n", + " #print(it)\n", + " if it.startswith('gpu_0/roi_blobs_queue'):\n", + " #print(it)\n", + " ext_list.remove(it)\n", + " \n", + "#print(str(ext_list))\n", + "\n", + "op_list = jstr['op']\n", + "for it_op in op_list:\n", + " if it_op['input'][0].startswith('gpu_0/roi_blobs_queue'):\n", + " op_list.remove(it_op)\n", + "\n", + "print(json.dumps(jstr, indent=4))\n", + "\n", + "\n", + " \n", + "#net = NetDef()\n", + "#json_format.Parse(jstr, net)\n", + "\n", + "\n", + "#print(str(net))\n", + "#print('=======================')\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + 
"name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/experiments/test.yaml b/experiments/test.yaml new file mode 100644 index 000000000..55aff2295 --- /dev/null +++ b/experiments/test.yaml @@ -0,0 +1,57 @@ +MODEL: + TYPE: generalized_rcnn + CONV_BODY: FPN.add_fpn_ResNet50_conv5_body + NUM_CLASSES: 81 + FASTER_RCNN: True +NUM_GPUS: 1 +SOLVER: + WEIGHT_DECAY: 0.0001 + LR_POLICY: steps_with_decay + BASE_LR: 0.0025 + GAMMA: 0.1 + MAX_ITER: 5000 + STEPS: [0, 1000, 3000] + # Equivalent schedules with... + # 1 GPU: + # BASE_LR: 0.0025 + # MAX_ITER: 60000 + # STEPS: [0, 30000, 40000] + # 2 GPUs: + # BASE_LR: 0.005 + # MAX_ITER: 30000 + # STEPS: [0, 15000, 20000] + # 4 GPUs: + # BASE_LR: 0.01 + # MAX_ITER: 15000 + # STEPS: [0, 7500, 10000] + # 8 GPUs: + # BASE_LR: 0.02 + # MAX_ITER: 7500 + # STEPS: [0, 3750, 5000] +FPN: + FPN_ON: True + MULTILEVEL_ROIS: True + MULTILEVEL_RPN: True +FAST_RCNN: + ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head + ROI_XFORM_METHOD: RoIAlign + ROI_XFORM_RESOLUTION: 7 + ROI_XFORM_SAMPLING_RATIO: 2 +TRAIN: + SNAPSHOT_ITERS: 1000 + #WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl + WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/35857389/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_2x.yaml.01_37_22.KSeq0b5q/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl + DATASETS: ('voc_2007_train',) + SCALES: (500,) + MAX_SIZE: 833 + BATCH_SIZE_PER_IM: 256 + RPN_PRE_NMS_TOP_N: 2000 # Per FPN level +TEST: + WEIGHTS: /home/roy/working/Detectron/experiments/output/train/voc_2007_train/generalized_rcnn/model_iter999.pkl + DATASETS: ('voc_2007_test',) + SCALES: (500,) + MAX_SIZE: 833 + NMS: 0.5 + RPN_PRE_NMS_TOP_N: 1000 # Per FPN level + RPN_POST_NMS_TOP_N: 1000 +OUTPUT_DIR: . 
diff --git a/lib/core/config.py b/lib/core/config.py index 08c7068f8..1e5ebe28e 100644 --- a/lib/core/config.py +++ b/lib/core/config.py @@ -483,7 +483,8 @@ # Use 'prof_dag' to get profiling statistics __C.MODEL.EXECUTION_TYPE = b'dag' - +__C.MODEL.PROTOTXT = b'' +__C.MODEL.INIT_PROTOTXT = b'' # ---------------------------------------------------------------------------- # # RetinaNet options # ---------------------------------------------------------------------------- # diff --git a/lib/core/test.py b/lib/core/test.py index bf788afad..dc8c5e050 100644 --- a/lib/core/test.py +++ b/lib/core/test.py @@ -168,7 +168,7 @@ def im_detect_bbox(model, im, boxes=None): if cfg.TEST.BBOX_REG: # Apply bounding-box regression deltas - box_deltas = workspace.FetchBlob(core.ScopedName('bbox_pred')).squeeze() + box_deltas = workspace.FetchBlob(core.ScopedName('bbox_pred_voc')).squeeze() # In case there is 1 proposal box_deltas = box_deltas.reshape([-1, box_deltas.shape[-1]]) if cfg.MODEL.CLS_AGNOSTIC_BBOX_REG: diff --git a/lib/datasets/dataset_catalog.py b/lib/datasets/dataset_catalog.py index 9a0f35dbb..66e0d997b 100644 --- a/lib/datasets/dataset_catalog.py +++ b/lib/datasets/dataset_catalog.py @@ -162,11 +162,11 @@ ANN_FN: _DATA_DIR + '/coco/annotations/image_info_test-dev2015.json' }, - 'voc_2007_trainval': { + 'voc_2007_train': { IM_DIR: _DATA_DIR + '/VOC2007/JPEGImages', ANN_FN: - _DATA_DIR + '/VOC2007/annotations/voc_2007_trainval.json', + _DATA_DIR + '/VOC2007/annotations/pascal_train2007.json', DEVKIT_DIR: _DATA_DIR + '/VOC2007/VOCdevkit2007' }, @@ -174,7 +174,7 @@ IM_DIR: _DATA_DIR + '/VOC2007/JPEGImages', ANN_FN: - _DATA_DIR + '/VOC2007/annotations/voc_2007_test.json', + _DATA_DIR + '/VOC2007/annotations/pascal_test2007.json', DEVKIT_DIR: _DATA_DIR + '/VOC2007/VOCdevkit2007' }, diff --git a/lib/modeling/fast_rcnn_heads.py b/lib/modeling/fast_rcnn_heads.py index e2cadc0c5..a338ef96c 100644 --- a/lib/modeling/fast_rcnn_heads.py +++ b/lib/modeling/fast_rcnn_heads.py @@ -46,7 +46,7 @@ def add_fast_rcnn_outputs(model, blob_in, dim): """Add RoI classification and bounding box regression output ops.""" model.FC( blob_in, - 'cls_score', + 'cls_score_voc', dim, model.num_classes, weight_init=gauss_fill(0.01), @@ -55,10 +55,10 @@ def add_fast_rcnn_outputs(model, blob_in, dim): if not model.train: # == if test # Only add softmax when testing; during training the softmax is combined # with the label cross entropy loss for numerical stability - model.Softmax('cls_score', 'cls_prob', engine='CUDNN') + model.Softmax('cls_score_voc', 'cls_prob', engine='CUDNN') model.FC( blob_in, - 'bbox_pred', + 'bbox_pred_voc', dim, model.num_classes * 4, weight_init=gauss_fill(0.001), @@ -69,12 +69,12 @@ def add_fast_rcnn_outputs(model, blob_in, dim): def add_fast_rcnn_losses(model): """Add losses for RoI classification and bounding box regression.""" cls_prob, loss_cls = model.net.SoftmaxWithLoss( - ['cls_score', 'labels_int32'], ['cls_prob', 'loss_cls'], + ['cls_score_voc', 'labels_int32'], ['cls_prob', 'loss_cls'], scale=model.GetLossScale() ) loss_bbox = model.net.SmoothL1Loss( [ - 'bbox_pred', 'bbox_targets', 'bbox_inside_weights', + 'bbox_pred_voc', 'bbox_targets', 'bbox_inside_weights', 'bbox_outside_weights' ], 'loss_bbox', diff --git a/old-README.md b/old-README.md new file mode 100644 index 000000000..34b658758 --- /dev/null +++ b/old-README.md @@ -0,0 +1,111 @@ +# Detectron + +Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including [Mask 
R-CNN](https://arxiv.org/abs/1703.06870). It is written in Python and powered by the [Caffe2](https://github.com/caffe2/caffe2) deep learning framework. + +At FAIR, Detectron has enabled numerous research projects, including: [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144), [Mask R-CNN](https://arxiv.org/abs/1703.06870), [Detecting and Recognizing Human-Object Interactions](https://arxiv.org/abs/1704.07333), [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002), [Non-local Neural Networks](https://arxiv.org/abs/1711.07971), [Learning to Segment Every Thing](https://arxiv.org/abs/1711.10370), and [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/abs/1712.04440). + +
+<div align="center">
+  <p>Example Mask R-CNN output.</p>
+</div>
+ +## Introduction + +The goal of Detectron is to provide a high-quality, high-performance +codebase for object detection *research*. It is designed to be flexible in order +to support rapid implementation and evaluation of novel research. Detectron +includes implementations of the following object detection algorithms: + +- [Mask R-CNN](https://arxiv.org/abs/1703.06870) -- *Marr Prize at ICCV 2017* +- [RetinaNet](https://arxiv.org/abs/1708.02002) -- *Best Student Paper Award at ICCV 2017* +- [Faster R-CNN](https://arxiv.org/abs/1506.01497) +- [RPN](https://arxiv.org/abs/1506.01497) +- [Fast R-CNN](https://arxiv.org/abs/1504.08083) +- [R-FCN](https://arxiv.org/abs/1605.06409) + +using the following backbone network architectures: + +- [ResNeXt{50,101,152}](https://arxiv.org/abs/1611.05431) +- [ResNet{50,101,152}](https://arxiv.org/abs/1512.03385) +- [Feature Pyramid Networks](https://arxiv.org/abs/1612.03144) (with ResNet/ResNeXt) +- [VGG16](https://arxiv.org/abs/1409.1556) + +Additional backbone architectures may be easily implemented. For more details about these models, please see [References](#references) below. + +## License + +Detectron is released under the [Apache 2.0 license](https://github.com/facebookresearch/detectron/blob/master/LICENSE). See the [NOTICE](https://github.com/facebookresearch/detectron/blob/master/NOTICE) file for additional details. + +## Citing Detectron + +If you use Detectron in your research or wish to refer to the baseline results published in the [Model Zoo](MODEL_ZOO.md), please use the following BibTeX entry. + +``` +@misc{Detectron2018, + author = {Ross Girshick and Ilija Radosavovic and Georgia Gkioxari and + Piotr Doll\'{a}r and Kaiming He}, + title = {Detectron}, + howpublished = {\url{https://github.com/facebookresearch/detectron}}, + year = {2018} +} +``` + +## Model Zoo and Baselines + +We provide a large set of baseline results and trained models available for download in the [Detectron Model Zoo](MODEL_ZOO.md). + +## Installation + +Please find installation instructions for Caffe2 and Detectron in [`INSTALL.md`](INSTALL.md). + +## Quick Start: Using Detectron + +After installation, please see [`GETTING_STARTED.md`](GETTING_STARTED.md) for brief tutorials covering inference and training with Detectron. + +## Getting Help + +To start, please check the [troubleshooting](INSTALL.md#troubleshooting) section of our installation instructions as well as our [FAQ](FAQ.md). If you couldn't find help there, try searching our GitHub issues. We intend the issues page to be a forum in which the community collectively troubleshoots problems. + +If bugs are found, **we appreciate pull requests** (including adding Q&A's to `FAQ.md` and improving our installation instructions and troubleshooting documents). Please see [CONTRIBUTING.md](CONTRIBUTING.md) for more information about contributing to Detectron. + +## References + +- [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/abs/1712.04440). + Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, and Kaiming He. + Tech report, arXiv, Dec. 2017. +- [Learning to Segment Every Thing](https://arxiv.org/abs/1711.10370). + Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick. + Tech report, arXiv, Nov. 2017. +- [Non-Local Neural Networks](https://arxiv.org/abs/1711.07971). + Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. + Tech report, arXiv, Nov. 2017. +- [Mask R-CNN](https://arxiv.org/abs/1703.06870). 
+ Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. + IEEE International Conference on Computer Vision (ICCV), 2017. +- [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002). + Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. + IEEE International Conference on Computer Vision (ICCV), 2017. +- [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677). + Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. + Tech report, arXiv, June 2017. +- [Detecting and Recognizing Human-Object Interactions](https://arxiv.org/abs/1704.07333). + Georgia Gkioxari, Ross Girshick, Piotr Dollár, and Kaiming He. + Tech report, arXiv, Apr. 2017. +- [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144). + Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. + IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. +- [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431). + Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. + IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. +- [R-FCN: Object Detection via Region-based Fully Convolutional Networks](http://arxiv.org/abs/1605.06409). + Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. + Conference on Neural Information Processing Systems (NIPS), 2016. +- [Deep Residual Learning for Image Recognition](http://arxiv.org/abs/1512.03385). + Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. + IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. +- [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](http://arxiv.org/abs/1506.01497) + Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. + Conference on Neural Information Processing Systems (NIPS), 2015. +- [Fast R-CNN](http://arxiv.org/abs/1504.08083). + Ross Girshick. + IEEE International Conference on Computer Vision (ICCV), 2015.