venv/
*.pyc
.idea/
build/
__pycache__/
# License Guide

When you create a new open-source repository, you need to be attentive to the right way to create or update a LICENSE file.

In the copyright year, write the year in which your work began.

The copyright notice should include the year in which you finished preparing the release (so if you finished it in 1998 but didn't post it until 1999, use 1998). You should add the proper year for each release; for example, “Copyright 1998, 1999 Terry Jones” if some versions were finished in 1998 and some were finished in 1999. If several people helped write the code, use all their names.

For software with several releases over multiple years, it's okay to use a range (“2008-2010”) instead of listing individual years (“2008, 2009, 2010”) if and only if every year in the range, inclusive, really is a “copyrightable” year that would be listed individually, and you make an explicit statement in your documentation about this usage.

If you made copyrightable changes in a given year, do include that year in the comma-separated list in your copyright notice.

If you did not make copyrightable changes in a given year, do not include that year in your copyright notice.

### Useful links

- [What's the right license for me?](http://choosealicense.com)
- [Is renewal of MIT license needed on github at the beginning of each year?](http://programmers.stackexchange.com/a/210491)

### Examples

```
The MIT License (MIT)
Copyright (c) 2016 Pagar.me Pagamentos S/A
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

> Example of an MIT license for projects in constantly active development (like our libraries)
```
The MIT License (MIT)
Copyright (c) 2013-present Pagar.me Pagamentos S/A
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

> Example of an MIT license when you're the new maintainer of a deprecated or discontinued project.
```
The MIT License (MIT)
Copyright: (c) 2016 Pagar.me Pagamentos S/A
           (c) 2010 John Doe
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
include gazenet/configs/infer_configs/*.json
include gazenet/configs/train_configs/*.json
include gazenet/readers/visualization/assets/*.css
global-include *.sh
exclude *.zip *.7z *.tar.gz *.ptb *.ptb.tar *.npy *.npz *.hd5 *.txt *.jpg *.png *.avi *.gif *.mp4 *.wav *.mp3
include datasets/processed/center_bias.jpg
include datasets/processed/center_bias_bw.jpg
# GASP: Gated Attention for Saliency Prediction

[\[Project Page: KT\]](http://software.knowledge-technology.info/#gasp) | [\[Abstract\]](https://www.ijcai.org/proceedings/2021/81) | [\[Paper\]](https://www.ijcai.org/proceedings/2021/0081.pdf) | [\[BibTeX\]](https://www.ijcai.org/proceedings/2021/bibtex/81)

This is the official [GASP](http://software.knowledge-technology.info/#gasp) code for our paper, presented at
[IJCAI 2021](https://ijcai-21.org). If you find this work useful, please cite our [paper](https://www2alt.informatik.uni-hamburg.de/wtm/publications/2021/AWW21/index.php):

```
@inproceedings{abawi2021gasp,
  title={{GASP: Gated Attention for Saliency Prediction}},
  author={Abawi, Fares and Weber, Tom and Wermter, Stefan},
  booktitle={Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI)},
  pages={584--591},
  year={2021},
  doi={10.24963/ijcai.2021/81},
  publisher={IJCAI Organization}
}
```

## Architecture Overview



## Environment Variables and Preparation

The training and inference pipelines are configured through Python scripts to maintain flexibility
in adding functionality within the configuration itself. Configurations can also be added externally in the
form of `json` files found in [infer_configs](gazenet/configs/infer_configs) and [train_configs](gazenet/configs/train_configs).
In general, all the configurations for reproducing the paper experiments
can be found in [infer_config.py](gazenet/configs/infer_config.py) and [train_config.py](gazenet/configs/train_config.py).

We recommend using [comet.ml](https://comet.ml) for tracking your experiments during training. You can opt to use `tensorboard` as well,
but the logger then needs to be specified as an argument (`--logger_name tensorboard`) to `gasp_train`, or changed in the training configuration itself:

```
logger = 'tensorboard'  # logger = '' -> does not log the experiment
```

When choosing to use comet.ml by specifying `--logger_name comet`, set the following environment variables:

```
export COMET_WORKSPACE=<YOUR COMET WORKSPACE>
export COMET_KEY=<YOUR COMET API KEY>
```

replacing `<YOUR COMET WORKSPACE>` with your workspace name and `<YOUR COMET API KEY>` with your comet.ml API key.
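
These variables only live for the current shell session. As a convenience, you could persist them in your shell profile; a minimal sketch, assuming a bash shell and your own placeholder values:

```
# Append the comet.ml credentials to ~/.bashrc so new shells inherit them
echo 'export COMET_WORKSPACE=<YOUR COMET WORKSPACE>' >> ~/.bashrc
echo 'export COMET_KEY=<YOUR COMET API KEY>' >> ~/.bashrc
source ~/.bashrc
```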

It is recommended to create a separate working space outside of this repository. This can be done by setting:

```
export GASP_CWD=<DIRECTORY WHERE MODELS, DATASETS AND RESULTS ARE STORED>
# CAN BE SET TO $(pwd) TO INSTALL IN THE CODE REPO:
# export GASP_CWD=$(pwd)
```
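
For example, a minimal sketch assuming a hypothetical `~/gasp_workspace` directory (any writable path outside the repository works):

```
# Create the working directory and point GASP at it
mkdir -p ~/gasp_workspace
export GASP_CWD=~/gasp_workspace
```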

## Setup

The following system packages need to be installed:

```
sudo apt-get install pv jq libportaudio2 ffmpeg
```

GASP is implemented in PyTorch and trained using the PyTorch Lightning library. To install the GASP requirements, create a virtual environment and run:

```
python3 setup.py install
```
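
If you do not already have a virtual environment, here is a minimal sketch assuming Python 3 with the built-in `venv` module (the `venv/` directory name matches this repository's `.gitignore`):

```
# Create and activate an isolated environment, then install GASP into it
python3 -m venv venv
source venv/bin/activate
python3 setup.py install
```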

### Preprocessing and Generating Social Cues + Saliency Prediction Representations (SCD)

| <span style="display: inline-block; width:500px"> Saliency Prediction (DAVE) + Video </span> | Gaze Direction Estimation (Gaze360) | Gaze Following (VideoGaze) | Facial Expression Recognition (ESR9) |
| ------------------------- | ------------------------- | ------------------------- |:------------------------- |
| <img src="showcase/det_transformed_dave_coutrot1_clip48_compressed.gif" width="200" height="200"/> | <img src="showcase/det_transformed_gaze360_coutrot1_clip48_compressed.gif" width="200" height="200"/> | <img src="showcase/det_transformed_videogaze_coutrot1_clip48_compressed.gif" width="200" height="200"/> | <img src="showcase/det_transformed_esr9_coutrot1_clip48_compressed.gif" width="200" height="200"/> |

**Download**

You can download the preprocessed spatiotemporal maps directly, without the need to process the training data locally:

```
gasp_download_manager --working_dir $GASP_CWD \
                      --datasets processed/Grouped_frames/coutrot1 \
                                 processed/Grouped_frames/coutrot2 \
                                 processed/Grouped_frames/diem
```

**Preprocess Locally**

Alternatively, generate the modality representations using the provided scripts.
Note that this might take upwards of a day depending on your CPU and/or GPU.

1. Download the datasets and pretrained social cue parameters directly by running the following script
(it shells out to bash scripts and has been tested on Ubuntu 20.04):

```
gasp_download_manager --working_dir $GASP_CWD \
                      --datasets ave/database1 ave/database2 ave/diem stavis_preprocessed \
                      --models emotion_recognition/esr9/checkpoints/pretrained_esr9_orig \
                               gaze_estimation/gaze360/checkpoints/pretrained_gaze360_orig \
                               gaze_following/videogaze/checkpoints/pretrained_videogaze_orig \
                               saliency_prediction/dave/checkpoints/pretrained_dave_orig
```

*Note*: You can instead navigate to `datasets/<CHOSEN DATASET>` to download individual datasets and run the corresponding `download_dataset.sh` bash file directly.
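
For instance, a sketch for a single dataset, assuming the directory layout implied above with `ave/database1` standing in for `<CHOSEN DATASET>`:

```
# Download only the first AVE database by invoking its script directly
cd $GASP_CWD/datasets/ave/database1
bash download_dataset.sh
```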

2. On download completion, the dataset can be generated (run from within the `--working_dir $GASP_CWD` specified in the previous step):

```
gasp_infer --infer_config InferGeneratorAllModelsCoutrot1
gasp_infer --infer_config InferGeneratorAllModelsCoutrot2
gasp_infer --infer_config InferGeneratorAllModelsDIEM
```

3. Finally, you can choose to replace the ground-truth fixation density maps and fixation points with the preprocessed
maps generated by [STAViS: Tsiami et al.](https://github.com/atsiami/STAViS) Note that we use these ground-truth maps in all our experiments:

```
gasp_scripts --working_dir $GASP_CWD --scripts postprocess_get_from_stavis
```

## Training

To train the best-performing sequential model **(DAM + LARGMU; Context Size = 10)**, invoke the script configuration on the social event subset of the [AVE dataset \[Tavakoli et al.\]](https://hrtavakoli.github.io/AVE/):

```
gasp_train --train_config GASPExp002_SeqDAMALSTMGMU1x1Conv_10Norm \
           --infer_configs InferMetricsGASPTrain \
           --checkpoint_save_every_n_epoch 50 --checkpoint_save_n_top 5 --check_val_every_n_epoch 49 --max_epochs 2000 \
           --gpus "0," --logger_name "comet" --val_store_image_samples --compute_metrics
```

or specify a json configuration file:

```
gasp_train --train_config_file $GASP_CWD/gazenet/configs/train_configs/GASPExp002_SeqDAMALSTMGMU1x1Conv_10Norm.json \
           --infer_config_files $GASP_CWD/gazenet/configs/infer_configs/InferMetricsGASPTrain.json \
           --checkpoint_save_every_n_epoch 50 --checkpoint_save_n_top 5 --check_val_every_n_epoch 49 --max_epochs 2000 \
           --gpus "0," --logger_name "comet" --val_store_image_samples --compute_metrics
```

The `--compute_metrics` argument will run the inference script on completion and store the metrics results in [logs/metrics](logs/metrics) within the working directory.

*Note*: We treat a single peek (covering the context size of GASP) into each of the videos in the dataset as an epoch, since we visualize validation samples at short intervals rather than over entire epochs. This should not be misconstrued as 2000 epochs over the dataset.

## Inference



**WARNING: Always run** `gasp_scripts --working_dir $GASP_CWD --scripts clean_temp` **before executing any inference script when changing dataset splits.
As a precaution, delete the temporary files before executing any inference script, provided that regenerating them does not take too long.**

The inferer can run and visualize all integrated models as well as the GASP variants. To download all GASP variants:

```
gasp_download_manager --working_dir $GASP_CWD \
                      --models "saliency_prediction/gasp/<...>"
```

To run GASP inference, select a configuration class or json file and execute the inference script:

```
gasp_infer --infer_config InferVisualizeGASPSeqDAMALSTMGMU1x1Conv_10Norm --gpu 0
```

*Note*: Remove the `--gpu 0` argument to run on the CPU.

*Tip*: To try out specific videos, create a new split (`.csv` files found in [datasets/processed](datasets/processed)), e.g.:

```
video_id,fps,scene_type,dataset
clip_11,25,Other,coutrot1
```

and in the configuration file, e.g. [InferVisualizeGASPSeqDAMALSTMGMU1x1Conv_10Norm](gazenet/configs/infer_configs/InferVisualizeGASPSeqDAMALSTMGMU1x1Conv_10Norm.json), replace:

```
"datasplitter_properties": {
    "train_csv_file": "datasets/processed/test_ave.csv",
    "val_csv_file": null,
    "test_csv_file": null
},
```

with the new split's name:

```
"train_csv_file": "datasets/processed/<NEW_SPLIT_NAME>.csv"
```
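
For instance, with a hypothetical split saved as `datasets/processed/my_clips.csv` (`my_clips` is an illustrative name, not a file in the repository), the resulting block would read:

```
"datasplitter_properties": {
    "train_csv_file": "datasets/processed/my_clips.csv",
    "val_csv_file": null,
    "test_csv_file": null
},
```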

## TODOs

- [ ] Support parallelizing inference models on multiple GPUs
- [ ] Support parallelizing inference models on multiple machines using middleware
- [ ] Support realtime inference for GASP (currently works for selected models)
- [ ] Restructure configuration files for more consistency
- [ ] Support intermediate invocation of external applications within the inference model pipeline

## Attribution

This work relies on several packages and code bases, which we have modified to fit our framework.
If any attributions are missing, please notify us by [Email](mailto:[email protected]?subject=[GitHub]%20Missing%20GASP%20Attribution).
The following is a list of repositories which have a substantial portion of their content included in this work:

* [STAViS: Spatio-Temporal AudioVisual Saliency Network](https://github.com/atsiami/STAViS)
* [Unified Image and Video Saliency Modeling](https://github.com/rdroste/unisal)
* [TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection](https://github.com/MichiganCOG/TASED-Net)
* [ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction](https://github.com/samyak0210/ViNet)
* [DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction](https://github.com/hrtavakoli/DAVE)
* [Gaze360: Physically Unconstrained Gaze Estimation in the Wild Dataset](https://github.com/erkil1452/gaze360)
* [Following Gaze in Video](https://github.com/recasens/Gaze-Following)
* [Efficient Facial Feature Learning with Wide Ensemble-based Convolutional Neural Networks](https://github.com/siqueira-hc/Efficient-Facial-Feature-Learning-with-Wide-Ensemble-based-Convolutional-Neural-Networks)
* [Saliency Metrics](https://github.com/tarunsharma1/saliency_metrics)

## Acknowledgement

This work was supported by the German Research Foundation (DFG) under project [CML (TRR 169)](https://www.crossmodal-learning.org/).
#!/bin/bash

# Download the stimuli archive (ERB3_Stimuli.zip) from mega.nz
./megadown "https://mega.nz/#!At8DWR7L!yf5k0jVwL961-jI4FJ2DGUAUqAu-yNbq3s3i6b52M2I"
# Fetch the Coutrot database 1 annotations (MATLAB .mat file)
wget -O coutrot_database1.mat "http://antoinecoutrot.magix.net/public/assets/coutrot_database1.mat"
# Extract the stimuli archive, then remove it to save space
7za x "ERB3_Stimuli.zip"
rm "ERB3_Stimuli.zip"