dwmoreau/MLI

MLINDEX - A data driven approach to powder diffraction indexing

A powder diffraction indexing program that uses machine learning models to initialize the SVD-Index algorithm. It takes an input peak list and returns a list of unit cells ranked by Figure of Merit.

Note: The paper describing the methods has been submitted to the Journal of Applied Crystallography. This application is in beta; usage reports and feedback are greatly appreciated and will help improve the user experience.

Installation

Standard installation (pip)

pip install mlindex
mlindex.download_models

mlindex.download_models fetches the ML model files (~1 GB) from GitHub using git and git-lfs and installs them to ~/.local/share/mlindex/models/. git-lfs must be installed before running this step.

The model directory can be customized with --models-dir or the MLINDEX_MODELS_DIR environment variable:

mlindex.download_models --models-dir /path/to/models
export MLINDEX_MODELS_DIR=/path/to/models

Developer installation (git clone)

Required for model training, dataset generation, or contributing to the codebase. The machine learning models are version controlled through git-lfs.

  1. Clone the repository:

    git clone git@github.com:dwmoreau/MLI.git
  2. Change into the cloned repository and retrieve the model files:

    cd /path/to/the/cloned/repo
    git lfs fetch --all
    git lfs checkout
  3. Install the project:

    pip install .

Usage

Peak List Generation

Peak list files generated by GSAS-II can be used directly. GSAS-II provides tutorials for creating peak lists.

Alternatively, provide the positions of the observed diffraction peaks as q² values, where q² = (2 sin θ / λ)² = 1/d² (in Å⁻²), and save the list as a NumPy array in a .npy file.

Note: Only the first 20 peaks in the list are used internally.
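The conversion from measured peak positions to q² can be sketched as below. This is a minimal example, not part of mlindex: the function name two_theta_to_q2, the peak positions, and the output file name are hypothetical, and sorting the values in ascending order is an assumption about the expected input order.

```python
import numpy as np


def two_theta_to_q2(two_theta_deg, wavelength):
    """Convert 2-theta peak positions (degrees) to q^2 = (2 sin(theta) / lambda)^2 = 1/d^2."""
    theta = np.radians(np.asarray(two_theta_deg, dtype=float)) / 2.0
    q = 2.0 * np.sin(theta) / wavelength  # q = 1/d, in 1/Angstrom
    return q ** 2                         # q^2 = 1/d^2, in 1/Angstrom^2


# Hypothetical peak positions (degrees 2-theta) measured at this wavelength (Angstrom).
two_theta = [4.52, 6.40, 7.84, 9.06]
q2 = np.sort(two_theta_to_q2(two_theta, wavelength=0.413128))

# Assumption: ascending order, so the lowest-angle peaks come first.
np.save("peaks.npy", q2)
```

The resulting peaks.npy can then be passed to --peak-file as shown under Code Execution.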

Code Execution

Using a numpy array

mlindex.run --peak-file /path/to/your/file/peaks.npy

Using a GSAS-II pkslst file

When using a GSAS-II pkslst file, you must supply the wavelength:

mlindex.run --peak-file /path/to/your/file/peaks.pkslst --wavelength 0.413128

Parallel execution (recommended)

Use --nproc N to run with N parallel worker processes. This is the recommended way to speed up indexing:

mlindex.run --peak-file /path/to/your/file/peaks.npy --nproc 4

Zero-point error correction

If your instrument has a systematic 2θ offset, use --zero-error to correct for it during indexing. This option requires a wavelength to be specified:

mlindex.run --peak-file /path/to/your/file/peaks.pkslst --wavelength 0.413128 --zero-error
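How mlindex applies the correction internally is not described here. The sketch below only illustrates why a wavelength is needed: a constant offset in 2θ produces a shift in q² that depends on both the angle and the wavelength. The function name, the offset value, and the sign convention (observed 2θ = true 2θ + offset) are assumptions made for illustration.

```python
import math


def q2_from_two_theta(two_theta_deg, wavelength, zero_offset_deg=0.0):
    """q^2 (1/Angstrom^2) for a peak at two_theta_deg after subtracting a constant 2-theta offset."""
    theta = math.radians(two_theta_deg - zero_offset_deg) / 2.0
    return (2.0 * math.sin(theta) / wavelength) ** 2


wavelength = 0.413128  # Angstrom, as in the command above
offset = 0.01          # degrees 2-theta, hypothetical instrument offset

shifts = []
for two_theta in (5.0, 10.0, 20.0):
    raw = q2_from_two_theta(two_theta, wavelength)
    corrected = q2_from_two_theta(two_theta, wavelength, offset)
    shifts.append(raw - corrected)
    print(f"2theta = {two_theta:5.1f} deg   q2 shift = {shifts[-1]:.6f}")
```

The printed shift grows with angle, so the offset cannot be removed by subtracting a single constant from a q² peak list.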

MPI mode (HPC clusters)

MPI mode is available for use on HPC clusters with MPI infrastructure. It requires exactly 6 MPI ranks and the --mpi flag:

mpiexec -n 6 mlindex.run --peak-file /path/to/your/file/peaks.pkslst --wavelength 0.413128 --mpi
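When indexing is launched from a script, the documented flags can be assembled programmatically. The helper below is a hypothetical sketch, not part of mlindex: it uses only the flags shown above, checks the stated wavelength requirements before launching, and assumes that --nproc, --wavelength, and --zero-error may be combined in one call.

```python
import shlex
from pathlib import Path


def build_mlindex_command(peak_file, wavelength=None, nproc=None, zero_error=False):
    """Assemble an mlindex.run argument list from the options documented above."""
    peak_file = Path(peak_file)
    # pkslst input and --zero-error both require a wavelength.
    if wavelength is None and (peak_file.suffix == ".pkslst" or zero_error):
        raise ValueError("a wavelength is required for pkslst files and for --zero-error")
    cmd = ["mlindex.run", "--peak-file", str(peak_file)]
    if wavelength is not None:
        cmd += ["--wavelength", str(wavelength)]
    if nproc is not None:
        cmd += ["--nproc", str(nproc)]
    if zero_error:
        cmd.append("--zero-error")
    return cmd


cmd = build_mlindex_command("peaks.pkslst", wavelength=0.413128, nproc=4, zero_error=True)
print(shlex.join(cmd))
# To execute the command: subprocess.run(cmd, check=True)
```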

Analytical Indexer (lightweight alternative)

mlindex.run_analytical uses a geometry-based guess-and-check approach instead of ML models. It covers the 11 higher-symmetry Bravais lattices (cF, cI, cP, hP, hR, tI, tP, oC, oF, oI, oP) and requires no model files.

Basic usage

mlindex.run_analytical --peak-file /path/to/your/file/peaks.npy

Using a GSAS-II pkslst file

mlindex.run_analytical --peak-file /path/to/your/file/peaks.pkslst --wavelength 0.413128

Parallel execution

mlindex.run_analytical --peak-file /path/to/your/file/peaks.npy --nproc 4

Zero-point error correction

mlindex.run_analytical --peak-file /path/to/your/file/peaks.pkslst --wavelength 0.413128 --zero-error

MPI mode

mpiexec -n 6 mlindex.run_analytical --peak-file /path/to/your/file/peaks.pkslst --wavelength 0.413128 --mpi

Results are written to analytic_results.json.


Results Interpretation

The program outputs the top 20 unit cell candidates ranked by M20 score and writes them to indexing_results.json:

Indexing Results

Column Descriptions

Column               Description
M20                  de Wolff figure of merit (de Wolff, 1968)
Minfo                Figure of merit from Taupin (1988)
n_indexed            Number of indexed peaks, using a probability from Taupin (1988) and a 95% threshold
bravais_lattice      Assumed Bravais lattice for the unit cell optimization
spacegroup           Space group whose systematic absences best align with the observed peak list
volume               Unit cell volume (Å³)
a, b, c              Unit cell edge lengths (Å)
alpha, beta, gamma   Unit cell angles (°)
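For reference, the standard de Wolff definition behind the M20 column is M20 = Q20 / (2 · ⟨|ΔQ|⟩ · N20), where Q is the same 1/d² quantity this README calls q², Q20 is the value for the 20th observed peak, ⟨|ΔQ|⟩ is the mean absolute difference between observed and calculated Q, and N20 is the number of distinct calculated Q values up to Q20. The sketch below implements that textbook formula only. How mlindex counts N20 or treats unindexed peaks is not described here, so the function name, its arguments, and the numbers are illustrative assumptions.

```python
def de_wolff_m20(q_obs, q_calc_assigned, n_calc):
    """Textbook de Wolff figure of merit: M20 = Q20 / (2 * mean|Q_obs - Q_calc| * N20).

    q_obs            first 20 observed Q = 1/d^2 values, ascending
    q_calc_assigned  calculated Q of the reflection assigned to each observed peak
    n_calc           number of distinct calculated Q values up to Q20
    """
    if len(q_obs) != 20 or len(q_calc_assigned) != 20:
        raise ValueError("M20 is defined on exactly 20 observed peaks")
    mean_abs_error = sum(abs(o - c) for o, c in zip(q_obs, q_calc_assigned)) / 20
    return q_obs[-1] / (2.0 * mean_abs_error * n_calc)


# Synthetic numbers: evenly spaced peaks, each off by 1e-5 from its calculated position.
q_obs = [0.01 * (i + 1) for i in range(20)]
q_calc = [q + 1e-5 for q in q_obs]
m20 = de_wolff_m20(q_obs, q_calc, n_calc=25)
print(f"M20 = {m20:.1f}")
```

A higher M20 reflects smaller discrepancies and fewer unobserved calculated lines, which is why candidates are ranked by it.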

Acknowledgements

The US Department of Energy Integrated Computational and Data Infrastructure for Scientific Discovery supported this work via grant DE-SC0022215 to Aaron S. Brewster (LBL), Tess Smidt (MIT), and Nate Hohmann (UCONN).

Citations

  • Taupin, D. (1988). J. Appl. Cryst. 21, 485-489.
  • de Wolff, P. M. (1968). J. Appl. Cryst. 1, 108-113.
