-
Notifications
You must be signed in to change notification settings - Fork 18
Add new makers for unified training data generation #451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| rattle_seed: int = 42 | ||
| rattle_mc_n_iter: int = 10 | ||
| w_angle: list[float] | None = None | ||
| w_angle: list[int] | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not float? Technically, an angle can be a float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s not the exact angle; instead, it’s a list of angle indices to be distorted — 0 for alpha, 1 for beta, and 2 for gamma. This part wasn’t written by me, but after reading through the code, that’s how I understood it. Please correct me if I’m wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, in this case, you could use Literal to say it can only be 0, 1 or 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, use Literal in other places where a fixed set of options is possible. Example rattle_type, distort_type
| BaseVaspMaker | CastepStaticMaker | ForceFieldStaticMaker | None | ||
| BaseVaspMaker | BaseCastepMaker | ForceFieldStaticMaker | None | ||
| ) = None | ||
| config_type: str | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why you moved this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to define all parameters in one place, and have the make function only handle the structure. Is there any convention to follow for this?
| This Maker will first generate training data based on the chosen ML model (default is GAP) | ||
| by randomizing (ase rattle) atomic displacements in supercells of the provided input structures. | ||
| Then it will proceed with MLIP-based Phonon calculations (based on atomate2 PhononMaker), collect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a pure data generation or a data generation with benchmark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be a reusable maker? Why is it named GenerateTrainingDataForTesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be a reusable maker? Why is it named
GenerateTrainingDataForTesting?
Ah, I found it 7f47badfaf202339e6ff770394fb7bca7a146f9f
It was supposed to be temporary. Additionally, the text description of the maker and what it is doing does not match. Maybe best to remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not new code; it was already there before. It also looks like this part is not being used. I can remove it if @JaGeo, @naik-aakash, and @QuantumChemist are okay with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we take it out?
src/autoplex/data/common/flows.py
Outdated
| """ | ||
|
|
||
| name: str = "do_rattling" | ||
| bulk_relax_maker: BaseVaspMaker | BaseCastepMaker | ForceFieldRelaxMaker = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use lambda calls here with field and default_factory like it's done in other makers.
src/autoplex/data/rss/utils.py
Outdated
| tuple: | ||
| atoms: list | ||
| List of ASE Atoms objects read from the trajectory files. | ||
| pressures: list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it list[list[float]] ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For pressures, yes. Modified.
| scaled_cell_2_2 = scale_cell(structure=structure, volume_custom_scale_factors=[0.95, 1.0, 1.05, 1.10], n_structures=4) | ||
|
|
||
| assert len(scaled_cell) == 11 | ||
| assert len(scaled_cell) == 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was there a bug before ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue was that the previous code had a default value of n_structures = 10, but it actually generated 11 structures in the end, which was quite confusing. So I modified the code to make the number of generated structures consistent with the defined value.
| job3 = DFTStaticLabelling(e0_spin=e0_spin, | ||
| isolated_atom=isolated_atom, | ||
| dimer=dimer, | ||
| include_isolated_atom=isolated_atom, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we still need the mock_do_rss_iterations_multi_jobs and mock_do_rss_iterations functions, as we have a complete RSS workflow test in place?
This PR adds new makers that unify both the generation and DFT labeling of training data, so that the produced data can be directly used for fitting machine-learned potentials.
Major updates include:
MDMaker: Enables molecular dynamics simulations with multiple temperature schedules (e.g., NVT, NPT, heating/cooling).RattledTrainingDataMaker: Added to support structure perturbation–based data generation.CastepRelaxMakerandCastepDoubleRelaxMakerfor CASTEP interface.DataGenerationMaker: Implement a unified maker to combine different data generation methods (e.g., RSS, MD, rattling, etc.).