# Contributing to the Model Zoo for Intel® Architecture

## Adding scripts for a new TensorFlow model

### Code updates

In order to add a new model to the zoo, there are a few things that are
required:

1. Set up the directory structure to allow the
   [launch script](/docs/general/tensorflow/LaunchBenchmark.md) to find
   your model. This involves creating folders for:
   `/benchmarks/<use case>/<framework>/<model name>/<mode>/<precision>`.
   Note that you will need to add `__init__.py` files in each new
   directory that you add, in order for Python to find the code.

   
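
   For example, a hypothetical image recognition model named `my_model`
   with an FP32 inference mode would use the layout below (the use case
   and model name are illustrative):

   ```
   benchmarks/
   └── image_recognition/
       ├── __init__.py
       └── tensorflow/
           ├── __init__.py
           └── my_model/
               ├── __init__.py
               └── inference/
                   ├── __init__.py
                   └── fp32/
                       └── __init__.py
   ```
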
2. Next, in the leaf folder that was created in the previous step, you
   will need to create `config.json` and `model_init.py` files:

   

   The `config.json` file contains the best-known KMP environment
   variable settings for getting optimal performance from the model.
   The default settings below are recommended for most of the models in
   the Model Zoo.

   ```json
   {
     "optimization_parameters": {
       "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
       "KMP_BLOCKTIME": 1,
       "KMP_SETTINGS": 1
     }
   }
   ```

   The `model_init.py` file is used to initialize the best-known
   configuration for the model and then start executing inference or
   training. When the
   [launch script](/docs/general/tensorflow/LaunchBenchmark.md) is run,
   it will look for the appropriate `model_init.py` file to use
   according to the model name, framework, mode, and precision that are
   specified by the user.

   The contents of the `model_init.py` file will vary by framework. For
   TensorFlow models, we typically use the
   [base model init class](/benchmarks/common/base_model_init.py), which
   includes functions for common tasks such as setting up the best-known
   environment variables (`KMP_BLOCKTIME`, `KMP_SETTINGS`, and
   `KMP_AFFINITY`, which are loaded from `config.json`, as well as
   `OMP_NUM_THREADS`) and setting the number of intra-op and inter-op
   threads. The `model_init.py` file also builds the command string that
   will ultimately be used to run inference or model training, which
   normally includes the use of `numactl` and passes all of the
   appropriate arguments to the model's script. Also, if your model
   requires any non-standard arguments (arguments that are not part of
   the [launch script flags](/docs/general/tensorflow/LaunchBenchmark.md#launch_benchmarkpy-flags)),
   the `model_init.py` file is where you would define and parse those
   args.
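
   For illustration, below is a minimal sketch of what such a
   `model_init.py` might look like. The helper calls follow the
   [base model init class](/benchmarks/common/base_model_init.py), but
   the model script location and the custom `--steps` flag are
   illustrative assumptions rather than requirements:

   ```python
   # model_init.py sketch for a hypothetical FP32 inference model.
   import argparse
   import os

   from common.base_model_init import BaseModelInitializer, set_env_var


   class ModelInitializer(BaseModelInitializer):
       def __init__(self, args, custom_args=[], platform_util=None):
           super(ModelInitializer, self).__init__(
               args, custom_args, platform_util)

           # Load the best-known KMP settings from this directory's
           # config.json
           config_file_path = os.path.join(
               os.path.dirname(os.path.realpath(__file__)), "config.json")
           self.set_kmp_vars(config_file_path)

           # Set the intra-op/inter-op thread counts and OMP_NUM_THREADS
           self.set_num_inter_intra_threads()
           set_env_var("OMP_NUM_THREADS", self.args.num_intra_threads)

           # Parse a model-specific arg (hypothetical --steps flag)
           parser = argparse.ArgumentParser()
           parser.add_argument("--steps", dest="steps", type=int, default=50)
           self.args = parser.parse_args(
               self.custom_args, namespace=self.args)

           # Build the run command; the command prefix adds numactl based
           # on the socket ID. The script path is an assumption.
           script = os.path.join(self.args.intelai_models, self.args.mode,
                                 self.args.precision, "inference.py")
           self.cmd = "{} {} {} --batch_size={} --steps={}".format(
               self.get_command_prefix(self.args.socket_id),
               self.python_exe, script,
               self.args.batch_size, self.args.steps)

       def run(self):
           self.run_command(self.cmd)
   ```
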
3. [start.sh](/benchmarks/common/tensorflow/start.sh) is a shell script
   that is called by the `launch_benchmark.py` script in the Docker
   container. This script installs dependencies that are required by
   the model, sets up the `PYTHONPATH` environment variable, and then
   calls the [run_tf_benchmark.py](/benchmarks/common/tensorflow/run_tf_benchmark.py)
   script with the appropriate args. That run script will end up calling
   the `model_init.py` file that you have defined in the previous step.

   To add support for a new model in the `start.sh` script, you will
   need to add a function with the same name as your model. Note that
   this function name should match the `<model name>` folder from the
   first step where you set up the directories for your model. In this
   function, add commands to install any third-party dependencies within
   an `if [ ${NOINSTALL} != "True" ]; then` conditional block. The
   purpose of the `NOINSTALL` flag is to be able to skip the installs
   for quicker iteration when running on bare metal or debugging. If
   your model requires the `PYTHONPATH` environment variable to be set
   up to find model code or dependencies, that should be done in the
   model's function. Next, set up the command that will be run. The
   standard launch script args are already added to the `CMD` variable,
   so your model function only needs to append more args if you have
   model-specific args defined in your `model_init.py`. Lastly, call the
   `run_model` function with the `PYTHONPATH` and the `CMD` string.

   Below is a sample template of a `start.sh` model function that
   installs dependencies from a `requirements.txt` file, sets up the
   `PYTHONPATH` to find the model source files, adds a custom steps flag
   to the run command, and then runs the model:
   ```bash
   function <model_name>() {
     if [ ${PRECISION} == "fp32" ]; then
       if [ ${NOINSTALL} != "True" ]; then
         pip install -r ${MOUNT_EXTERNAL_MODELS_SOURCE}/requirements.txt
       fi

       export PYTHONPATH=${PYTHONPATH}:${MOUNT_EXTERNAL_MODELS_SOURCE}
       CMD="${CMD} $(add_steps_args)"
       PYTHONPATH=${PYTHONPATH} CMD=${CMD} run_model
     else
       echo "PRECISION=${PRECISION} is not supported for ${MODEL_NAME}"
       exit 1
     fi
   }
   ```
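
   The `add_steps_args` helper used in the template above is
   model-specific. A minimal sketch of what such a helper might look
   like is shown here (the `--steps` flag and the `steps` shell variable
   are illustrative assumptions; your model's custom args will differ):
   ```bash
   # Hypothetical helper: echoes a --steps arg to append to the run
   # command when a steps value has been set (names are illustrative).
   function add_steps_args() {
     if [ -n "${steps}" ]; then
       echo "--steps=${steps}"
     fi
   }
   ```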

Optional step:
* If there is CPU-optimized model code that has not been upstreamed to
  the original repository, then it can be added to the
  [models](/models) directory in the zoo repo. As with the first step
  in the previous section, the directory structure should be set up
  like: `/models/<use case>/<framework>/<model name>/<mode>/<precision>`.

  

  If there are model files that can be shared by multiple modes or
  precisions, they can be placed in a higher-level directory. For
  example, if a file could be shared by both `FP32` and `Int8`
  precisions, then it could be placed in the directory at:
  `/models/<use case>/<framework>/<model name>/<mode>` (omitting the
  `<precision>` directory), as illustrated below. Note that if this is
  being done, you need to ensure that the license that is associated
  with the original model repository is compatible with the license of
  the model zoo.
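
  For example, a hypothetical `preprocessing.py` file shared by the
  `FP32` and `Int8` inference precisions could sit at the `<mode>`
  level (the file, model, and use case names here are illustrative):

  ```
  models/
  └── image_recognition/
      └── tensorflow/
          └── my_model/
              └── inference/
                  ├── preprocessing.py    # shared by fp32 and int8
                  ├── fp32/
                  │   └── inference.py
                  └── int8/
                      └── inference.py
  ```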

### Debugging

There are a couple of options for debugging and quicker iteration when
developing new scripts:
* Use the `--debug` flag in the `launch_benchmark.py` script, which will
  give you a shell into the Docker container (an example command is
  shown after this list). See the
  [debugging section](/docs/general/tensorflow/LaunchBenchmark.md#debugging)
  of the launch script documentation for more information on using this
  flag.
* Run the launch script on bare metal (without a Docker container). The
  launch script documentation also has a
  [section](/docs/general/tensorflow/LaunchBenchmark.md#alpha-feature-running-on-bare-metal)
  with instructions on how to do this. Note that when running without
  Docker, you are responsible for installing all dependencies on your
  system before running the launch script. If you are using this option
  during development, be sure to also test _with_ a Docker container to
  ensure that the `start.sh` script dependency installation is working
  properly for your model.
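
For example, a debug session for a hypothetical FP32 inference run might
be started like this (the model name, Docker image, and graph path are
illustrative; check the launch script documentation for the flags that
apply to your model):

```bash
python launch_benchmark.py \
    --model-name my_model \
    --framework tensorflow \
    --precision fp32 \
    --mode inference \
    --docker-image intel/intel-optimized-tensorflow:latest \
    --in-graph /home/<user>/my_model_fp32.pb \
    --debug
```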

### Documentation updates

1. Create a `README.md` file in the
   `/benchmarks/<use case>/<framework>/<model name>` directory:

   

   This README file should describe all of the steps necessary to run
   the model, including downloading and preprocessing the dataset,
   downloading the pretrained model, cloning repositories, and running
   the model script with the appropriate arguments. Most models have
   best-known settings for batch and online inference performance
   testing, as well as for accuracy testing. The README file should
   explain how to set these configurations using the
   `launch_benchmark.py` script.

2. Update the table in the [main `benchmarks` README](/benchmarks/README.md)
   with a link to the model that you are adding. Note that the models
   in this table are ordered alphabetically by use case, framework, and
   model name. The model name should link to the original paper for
   the model, and the instructions column should link to the README
   file that you created in the previous step.

### Testing

1. After you've completed the above steps, run the model according to
   the instructions in the README file for the new model. Ensure that
   the performance and accuracy metrics are on par with what you would
   expect.

2. Add unit tests to cover the new model.
   * For TensorFlow models, there is a
     [parameterized test](/tests/unit/common/tensorflow/test_run_tf_benchmarks.py#L80)
     that checks the flow from `run_tf_benchmark.py` to the inference
     command that is executed by the `model_init.py` file. The test
     ensures that the inference command has all of the expected
     arguments.

     To add a new parameterized instance of the test for your new
     model, add a new JSON file named `tf_<model_name>_args.json` to the
     [tf_model_args](/tests/unit/common/tensorflow/tf_model_args)
     directory. Each file contains a list of dictionaries, and each
     dictionary has three items: (1) `_comment`, a comment that
     describes the command; (2) `input`, the `run_tf_benchmark.py`
     command with the appropriate flags to run the model; and
     (3) `output`, the expected inference or training command that
     should get run by the `model_init.py` file.
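
     A minimal sketch of such a file is shown below, with illustrative
     flag values (the model name and both commands are hypothetical):
     ```json
     [
       {
         "_comment": "batch inference with --batch-size 128",
         "input": "run_tf_benchmark.py --framework=tensorflow --use-case=image_recognition --model-name=my_model --precision=fp32 --mode=inference --batch-size=128",
         "output": "numactl --cpunodebind=0 --membind=0 python inference.py --batch_size=128"
       }
     ]
     ```
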
   * If any launch script or base class files were changed, then
     additional unit tests should be added.
   * Unit tests and style checks are run when you post a GitHub PR, and
     the tests must be passing before the PR is merged.
   * For information on how to run the unit tests and style checks
     locally, see the [tests documentation](/tests/README.md).