
Commit

first commit 🎉
simonbatzner committed Mar 16, 2021
0 parents commit 6d378ea
Showing 64 changed files with 10,582 additions and 0 deletions.
159 changes: 159 additions & 0 deletions .gitignore
@@ -0,0 +1,159 @@
.idea/
.vscode/
*__pycache__*
*wandb*
*.npz
results/
experiments/
nequip.egg-info/
saved_models/
cov*.xml
.coverage
analysis/
*.icloud
benchmark_data/
log.*
saved_model/
*.ipynb_checkpoints/
tutorial/results/
tutorial/tutorial_data/
.DS_store/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2021 The President and Fellows of Harvard College

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
91 changes: 91 additions & 0 deletions README.md
@@ -0,0 +1,91 @@
# NequIP

NequIP is an open-source deep learning package for learning interatomic potentials using E(3)-equivariant convolutions.

### Requirements

* Python, v3.8
* PyTorch, v1.8
* NumPy, v1.19.5
* SciPy, v1.6.0
* ASE, v3.20.1

In particular, please be sure to install Python 3.8 and PyTorch 1.8.
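
For example, with pip (a sketch; on a GPU machine, choose the PyTorch wheel that matches your CUDA version as described in the official PyTorch install instructions):

```
pip install torch==1.8.0
```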

### Installation

* Install [PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric); make sure to install it with the correct version of CUDA/CPU for your system, for example:
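
One possible set of commands (a sketch, assuming PyTorch 1.8 with CUDA 10.2; substitute the wheel index matching your PyTorch/CUDA versions, as given in the PyTorch Geometric installation instructions):

```
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.8.0+cu102.html
pip install torch-geometric
```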
* Install [e3nn](https://github.com/e3nn/e3nn); it is important to install the ```main``` branch, not ```master```:

```
pip install git+https://github.com/e3nn/e3nn.git
```

* We use [Weights&Biases](https://wandb.ai) to keep track of experiments. This is not a strict requirement; you can use our software without it, but it may make your life easier. If you want to use it, create an account [here](https://wandb.ai) and install it:

```
pip install wandb
```

* Install NequIP

```
pip install git+https://github.com/mir-group/nequip.git
```

### Installation Issues

We recommend running the tests using ```pytest```:

```
pip install pytest
pytest ./tests
```

On some platforms, the installation may complain about the scikit-learn installation. If that's the case, install the following scikit-learn version specifically:

```
pip install -U scikit-learn==0.23.0
```

That should fix it.

### Tutorial

The best way to learn how to use NequIP is through the tutorial notebook in ```tutorials```.

### Training a network

To train a network, all you need to do is run train.py with a config file that describes your data set and network, for example:

```
python scripts/train.py configs/example.yaml
```

### References

The theory behind NequIP is described in our preprint [1]. NequIP's backend builds on e3nn, a general framework for building E(3)-equivariant neural networks [2].

[1] https://arxiv.org/abs/2101.03164
[2] https://github.com/e3nn/e3nn

### Authors

NequIP is being developed by:

- Simon Batzner
- Anders Johansson
- Albert Musaelian
- Lixin Sun
- Mario Geiger
- Tess Smidt

under the guidance of Boris Kozinsky at Harvard.


### Citing

If you use this repository in your work, please consider citing us with the following pre-print:

[1] https://arxiv.org/abs/2101.03164

124 changes: 124 additions & 0 deletions configs/example.yaml
@@ -0,0 +1,124 @@
# general

# Two folders will be used during the training: 'root'/process and 'root'/'project'
# project contains logfiles and saved models
# process contains processed data sets
# if 'root'/'project' exists, 'root'/'project'_'year'-'month'-'day'-'hour'-'min'-'s' will be used instead.
root: results/aspirin
project: maximal
seed: 0 # random number seed for numpy and torch
restart: false # set True for a restarted run
append: false # set True if a restarted run should append to the previous log file

# network
compile_model: False # whether to compile the constructed model to TorchScript
num_basis: 8 # number of basis functions
r_max: 4.0 # cutoff radius
irreps_edge_sh: 0e + 1o + 2e # irreps of the spherical harmonics used for edges. If a single integer, indicates the full SH up to L_max=that_integer
conv_to_output_hidden_irreps_out: 16x0e # irreps used in hidden layer of output block
feature_irreps_hidden: 16x0o + 16x0e + 16x1o + 16x1e + 16x2o + 16x2e # irreps used for hidden features, here we go up to lmax=2, with even and odd parities
BesselBasis_trainable: true # set true to train the bessel weights
nonlinearity_type: gate # may be 'gate' or 'norm', 'gate' is recommended
num_layers: 6 # number of interaction blocks, we found 5-6 to work best
resnet: false # set True to make interaction block a resnet-style update
PolynomialCutoff_p: 6 # p-value used in polynomial cutoff function
convolution_kwargs:
  invariant_layers: 1 # number of radial layers, we found it important to keep this small, 1 or 2
  invariant_neurons: 8 # number of hidden neurons in radial function, again keep this small for MD applications, 8 - 32, smaller is faster
  avg_num_neighbors: null # number of neighbors to divide by, None => no normalization.
  use_sc: true # use self-connection or not, usually gives big improvement

# data set
# the keys used need to be stated at least once in key_mapping, npz_fixed_field_keys or npz_keys
# key_mapping is used to map the key in the npz file to the NequIP default values (see data/_key.py)
# all arrays are expected to have the shape of (nframe, natom, ?) except the fixed fields
dataset: npz # type of data set, can be npz or ase
dataset_file_name: ./benchmark_data/aspirin_dft.npz # path to data set file
key_mapping:
  z: atomic_numbers # atomic species, integers
  E: total_energy # total potential energies to train to
  F: forces # atomic forces to train to
  R: pos # raw atomic positions
npz_fixed_field_keys: # fields that are repeated across different examples
  - atomic_numbers
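
# For illustration, a data set in this npz layout could be written with numpy roughly as follows
# (a sketch; the array names below are hypothetical):
#   import numpy as np
#   np.savez(
#       "my_data.npz",
#       R=positions,        # (nframe, natom, 3) atomic positions
#       F=forces,           # (nframe, natom, 3) atomic forces
#       E=energies,         # per-frame total potential energies
#       z=atomic_numbers,   # (natom,) fixed field, shared across all frames
#   )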

# As an alternative option to npz, you can also pass data as ASE Atoms objects
# dataset: ase
# dataset_file_name: xxx.xyz # needs to be a format accepted by ase.io.read
# ase_args: # any arguments needed by ase.io.read
# format: extxyz

# logging
wandb: true # we recommend using wandb for logging
verbose: info # the same as python logging, e.g. warning, info, debug, error. case insensitive
log_batch_freq: 1 # batch frequency, how often to print training errors within the same epoch
log_epoch_freq: 1 # epoch frequency, how often to print and save the model

# training
n_train: 975 # number of training data
n_val: 25 # number of validation data
learning_rate: 0.01 # learning rate, we found 0.01 to work best - this is often one of the most important hyperparameters to tune
batch_size: 5 # batch size
max_epochs: 1000000 # stop training after _ number of epochs
train_val_split: random # can be random or sequential; if sequential, the first n_train elements are used for training and the next n_val for validation, otherwise the split is random
shuffle: true # If true, the data loader will shuffle the data
metrics_key: loss # metric used for scheduling and saving the best model. Options: loss, or anything that appears in the
# validation batch step header, such as f_mae, f_rmse, e_mae, e_rmse

# loss function
loss_coeffs: # different weights to use in a weighted loss function
  forces: 1.0 # for MD applications, we recommend a force weight of 1
  total_energy: 0.0 # and an energy weight of 0., this usually gives the best errors in the forces

# default loss function is MSELoss, the name has to be exactly the same as in torch.nn
# the only supported targets are forces and total_energy

# here are some examples of other ways to declare different types of loss functions, depending on your application:
# loss_coeffs:
# total_energy: MSELoss
#
# loss_coeffs:
# total_energy:
# - 3.0
# - MSELoss
#
# loss_coeffs:
# forces:
# - 1.0
# - PerSpeciesL1Loss
#
# loss_coeffs: total_energy
#
# loss_coeffs:
# total_energy:
# - 3.0
# - L1Loss
# forces: 1.0

# optional keys
# if true and weights_forces defined in the dataset, the loss function will be weighted
atomic_weight_on: false

# optimizer, may be any optimizer defined in torch.optim
# the name `optimizer_name` is case sensitive
optimizer_name: Adam # default optimizer is Adam in the amsgrad mode
optimizer_params: # any params taken by the torch.optim.xxx constructor
  amsgrad: true
  betas: !!python/tuple
    - 0.9
    - 0.999
  eps: 1.0e-08
  weight_decay: 0

# lr scheduler, currently only supports the two options listed below; if you need more, please file an issue
# first option: cosine annealing with warm restarts
lr_scheduler_name: CosineAnnealingWarmRestarts
lr_scheduler_params:
  T_0: 10000
  T_mult: 2
  eta_min: 0
  last_epoch: -1

# alternative option, on-plateau
# lr_scheduler_name: ReduceLROnPlateau
# lr_patience: 1000