The majority of datasets explored in this directory are from the matbench
collection. Others include:

ricci_carrier_transport
: Electronic Transport Properties by F. Ricci et al. from MPContribs, which contains 48,000 DFT Seebeck coefficients (Paper). [Download link (from here)].

boltztrap_mp
: Contains ~9000 effective mass and thermoelectric properties calculated by the BoltzTraP software package.

tri_camd_2022
: Toyota Research Institute's 2nd active learning crystal discovery dataset from Computational Autonomy for Materials Discovery (CAMD).

WBM
: From the paper "Predicting stable crystalline compounds using chemical similarity", published Jan 26, 2021 in npj Computational Materials. The dataset was generated with DFT and builds on earlier work by some of the same authors, published in "The optimal one dimensional periodic table: a modified Pettifor chemical scale from data mining". Kindly shared by the author Hai-Chen Wang on email request.

MatBench is an ImageNet for materials science: a set of 13 supervised, pre-cleaned, ready-to-use ML tasks for benchmarking and fair comparison. The tasks span the domain of inorganic materials science applications.
To browse these datasets online, go to ml.materialsproject.org and log in. Datasets were originally published in https://www.nature.com/articles/s41524-020-00406-3.
Detailed information about how each dataset was created and prepared for use is available at https://hackingmaterials.lbl.gov/matminer/dataset_summary.html
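
As a minimal sketch of how one of these tasks might be pulled into a notebook, the snippet below wraps matminer's `load_dataset`. It assumes matminer is installed and that the task names in this README match matminer's dataset names; the wrapper `load_matbench_task` and its prefix check are illustrative helpers, not part of matminer or Matbench.

```python
def load_matbench_task(task_name: str):
    """Return one Matbench task as a pandas DataFrame (sketch, assumes matminer)."""
    # Hypothetical sanity check: every task listed in this README starts with "matbench_".
    if not task_name.startswith("matbench_"):
        raise ValueError(f"expected a matbench_* task name, got {task_name!r}")

    # Imported lazily so this helper can be defined without matminer installed.
    from matminer.datasets import load_dataset

    # matminer downloads the dataset on first use and caches it locally.
    return load_dataset(task_name)
```

Calling `load_matbench_task("matbench_steels")` should then return a DataFrame with one row per sample, which can be inspected with `df.shape` and `df.columns`.
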
task name | target column (unit) | sample count | task type | input | download |
---|---|---|---|---|---|
matbench_dielectric | n (unitless) | 4764 | regression | structure | download |
matbench_expt_gap | gap expt (eV) | 4604 | regression | composition | download |
matbench_expt_is_metal | is_metal (unitless) | 4921 | classification | composition | download |
matbench_glass | gfa (unitless) | 5680 | classification | composition | download |
matbench_jdft2d | exfoliation_en (meV/atom) | 636 | regression | structure | download |
matbench_log_gvrh | log10(G_VRH) (log(GPa)) | 10987 | regression | structure | download |
matbench_log_kvrh | log10(K_VRH) (log(GPa)) | 10987 | regression | structure | download |
matbench_mp_e_form | e_form (eV/atom) | 132752 | regression | structure | download |
matbench_mp_gap | gap pbe (eV) | 106113 | regression | structure | download |
matbench_mp_is_metal | is_metal (unitless) | 106113 | classification | structure | download |
matbench_perovskites | e_form (eV, per unit cell) | 18928 | regression | structure | download |
matbench_phonons | last phdos peak (1/cm) | 1265 | regression | structure | download |
matbench_steels | yield strength (MPa) | 312 | regression | composition | download |
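
When choosing which tasks to explore, it can help to mirror the task table above as a small in-memory structure and filter it by task type or input type. The sketch below copies the values from the table; the `Task` tuple and the `select_tasks` helper are hypothetical names introduced only for illustration.

```python
from typing import List, NamedTuple, Optional


class Task(NamedTuple):
    name: str
    target: str
    n_samples: int
    task_type: str   # "regression" or "classification"
    input_type: str  # "structure" or "composition"


# Values copied from the task table in this README.
TASKS = [
    Task("matbench_dielectric", "n (unitless)", 4764, "regression", "structure"),
    Task("matbench_expt_gap", "gap expt (eV)", 4604, "regression", "composition"),
    Task("matbench_expt_is_metal", "is_metal (unitless)", 4921, "classification", "composition"),
    Task("matbench_glass", "gfa (unitless)", 5680, "classification", "composition"),
    Task("matbench_jdft2d", "exfoliation_en (meV/atom)", 636, "regression", "structure"),
    Task("matbench_log_gvrh", "log10(G_VRH) (log(GPa))", 10987, "regression", "structure"),
    Task("matbench_log_kvrh", "log10(K_VRH) (log(GPa))", 10987, "regression", "structure"),
    Task("matbench_mp_e_form", "e_form (eV/atom)", 132752, "regression", "structure"),
    Task("matbench_mp_gap", "gap pbe (eV)", 106113, "regression", "structure"),
    Task("matbench_mp_is_metal", "is_metal (unitless)", 106113, "classification", "structure"),
    Task("matbench_perovskites", "e_form (eV, per unit cell)", 18928, "regression", "structure"),
    Task("matbench_phonons", "last phdos peak (1/cm)", 1265, "regression", "structure"),
    Task("matbench_steels", "yield strength (MPa)", 312, "regression", "composition"),
]


def select_tasks(task_type: Optional[str] = None,
                 input_type: Optional[str] = None) -> List[Task]:
    """Return the tasks matching the given filters; None means 'any'."""
    return [
        t for t in TASKS
        if (task_type is None or t.task_type == task_type)
        and (input_type is None or t.input_type == input_type)
    ]
```

For example, `select_tasks(input_type="composition")` returns the four tasks that need only a chemical formula, which are usually the cheapest ones to start with.
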
task name | verified top score (MAE or ROCAUC) | algorithm name, config | general purpose algorithm? |
---|---|---|---|
matbench_dielectric | 0.299 (unitless) | Automatminer express v1.0.3.2019111 | yes |
matbench_expt_gap | 0.416 eV | Automatminer express v1.0.3.2019111 | yes |
matbench_expt_is_metal | 0.92 | Automatminer express v1.0.3.2019111 | yes |
matbench_glass | 0.861 | Automatminer express v1.0.3.2019111 | yes |
matbench_jdft2d | 38.6 meV/atom | Automatminer express v1.0.3.2019111 | yes |
matbench_log_gvrh | 0.0849 log(GPa) | Automatminer express v1.0.3.2019111 | yes |
matbench_log_kvrh | 0.0679 log(GPa) | Automatminer express v1.0.3.2019111 | yes |
matbench_mp_e_form | 0.0327 eV/atom | MEGNet v0.2.2 | yes, structure only |
matbench_mp_gap | 0.228 eV | CGCNN (2019) | yes, structure only |
matbench_mp_is_metal | 0.977 | MEGNet v0.2.2 | yes, structure only |
matbench_perovskites | 0.0417 | MEGNet v0.2.2 | yes, structure only |
matbench_phonons | 36.9 cm^-1 | MEGNet v0.2.2 | yes, structure only |
matbench_steels | 95.2 MPa | Automatminer express v1.0.3.2019111 | yes |
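
The scores above are reported as MAE for regression tasks and ROCAUC for classification tasks. For readers unfamiliar with the two metrics, here is a dependency-free sketch of how each could be computed from predictions. It only illustrates the metric definitions; it is not the evaluation protocol used to produce the scores in the table, and the function names are illustrative.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    This pairwise form is O(n_pos * n_neg), fine for a sketch but slow on the
    largest tasks.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have equal length")
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))
```

A lower MAE is better and carries the unit of the target column, while ROCAUC is unitless, with 0.5 corresponding to random ranking and 1.0 to perfect ranking.
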