Skip to content

Conversation

@YuanbinLiu
Copy link
Collaborator

@YuanbinLiu YuanbinLiu commented Oct 24, 2025

This PR adds new makers that unify both the generation and DFT labeling of training data, so that the produced data can be directly used for fitting machine-learned potentials.

Major updates include:

  • MDMaker: Enables molecular dynamics simulations with multiple temperature schedules (e.g., NVT, NPT, heating/cooling).
  • RattledTrainingDataMaker: Added to support structure perturbation–based data generation.
  • Castep-specific makers: Introduced CastepRelaxMaker and CastepDoubleRelaxMaker for CASTEP interface.
  • DataGenerationMaker: Implement a unified maker to combine different data generation methods (e.g., RSS, MD, rattling, etc.).

rattle_seed: int = 42
rattle_mc_n_iter: int = 10
w_angle: list[float] | None = None
w_angle: list[int] | None = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not float? Technically, an angle can be a float?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s not the exact angle; instead, it’s a list of angle indices to be distorted — 0 for alpha, 1 for beta, and 2 for gamma. This part wasn’t written by me, but after reading through the code, that’s how I understood it. Please correct me if I’m wrong.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, in this case, you could use Literal to say it can only be 0, 1 or 2.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, use Literal in other places where a fixed set of options is possible. Example rattle_type, distort_type

BaseVaspMaker | CastepStaticMaker | ForceFieldStaticMaker | None
BaseVaspMaker | BaseCastepMaker | ForceFieldStaticMaker | None
) = None
config_type: str | None = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why you moved this here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to define all parameters in one place, and have the make function only handle the structure. Is there any convention to follow for this?

This Maker will first generate training data based on the chosen ML model (default is GAP)
by randomizing (ase rattle) atomic displacements in supercells of the provided input structures.
Then it will proceed with MLIP-based Phonon calculations (based on atomate2 PhononMaker), collect
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a pure data generation or a data generation with benchmark?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be a reusable maker? Why is it named GenerateTrainingDataForTesting?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be a reusable maker? Why is it named GenerateTrainingDataForTesting?

Ah, I found it 7f47badfaf202339e6ff770394fb7bca7a146f9f

It was supposed to be temporary. Additionally, the text description of the maker and what it is doing does not match. Maybe best to remove it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not new code; it was already there before. It also looks like this part is not being used. I can remove it if @JaGeo, @naik-aakash, and @QuantumChemist are okay with it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we take it out?

"""

name: str = "do_rattling"
bulk_relax_maker: BaseVaspMaker | BaseCastepMaker | ForceFieldRelaxMaker = (
Copy link
Collaborator

@naik-aakash naik-aakash Oct 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use lambda calls here with field and default_factory like it's done in other makers.

tuple:
atoms: list
List of ASE Atoms objects read from the trajectory files.
pressures: list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it list[list[float]] ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For pressures, yes. Modified.

scaled_cell_2_2 = scale_cell(structure=structure, volume_custom_scale_factors=[0.95, 1.0, 1.05, 1.10], n_structures=4)

assert len(scaled_cell) == 11
assert len(scaled_cell) == 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was there a bug before ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue was that the previous code had a default value of n_structures = 10, but it actually generated 11 structures in the end, which was quite confusing. So I modified the code to make the number of generated structures consistent with the defined value.

job3 = DFTStaticLabelling(e0_spin=e0_spin,
isolated_atom=isolated_atom,
dimer=dimer,
include_isolated_atom=isolated_atom,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we still need the mock_do_rss_iterations_multi_jobs and mock_do_rss_iterations functions, as we have a complete RSS workflow test in place?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants