feat: Add finetune method for MatterSim #68

Open: wants to merge 2 commits into base: main

brian-xue

  1. Logger Enhancements

    • Updated the logger format.
    • Introduced a new logger class that filters redundant output in multi-process environments.
  2. Installation Update

    • Added wandb to the installation requirements.
  3. Dataset Construction

    • Enabled one-click dataset construction from vasprun.xml to .xyz files.
  4. Enhancements in potential.py

    • src/mattersim/forcefield/potential.py: Integrated the new logger utility, added early stopping based on the best-metric epoch, and updated the training process to support distributed training.
  5. Finetune Feature for MatterSim Model

    New scripts:
    • script/finetune_mattersim.py: Fine-tunes the MatterSim model, with support for distributed training and logging.
    • script/vasprun_to_xyz.py: Converts VASP output files to XYZ format and splits the data into training, validation, and test sets.
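
For readers unfamiliar with the conversion step, here is a minimal sketch of reading the ionic steps from a vasprun.xml with ASE and splitting them into three extended-XYZ files. It is not the code in script/vasprun_to_xyz.py: the function names, the 80/10/10 split ratios, and the fixed seed are all assumptions made for illustration.

```python
import os
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_frames(
    frames: Sequence[T],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),  # assumed ratios
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle frames reproducibly and split them into train/valid/test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_valid = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


def convert_vasprun(vasprun_path: str, out_dir: str) -> None:
    """Hypothetical helper: write train.xyz, valid.xyz, and test.xyz to out_dir."""
    # ASE is imported lazily so split_frames stays usable without it installed.
    from ase.io import read, write

    # index=":" returns a list with one ase.Atoms object per ionic step.
    frames = read(vasprun_path, index=":", format="vasp-xml")
    os.makedirs(out_dir, exist_ok=True)
    names = ("train", "valid", "test")
    for name, subset in zip(names, split_frames(frames)):
        if subset:  # skip empty subsets for very short trajectories
            write(os.path.join(out_dir, f"{name}.xyz"), subset, format="extxyz")
```

Under these assumptions, `convert_vasprun("vasprun.xml", "xyz_files")` would produce files matching the paths used in the command below. Shuffling before splitting matters here because consecutive ionic steps from one relaxation or MD run are strongly correlated.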

I tested the finetune method on my custom H-structure dataset. The process involved generating .xyz files from vasprun.xml and then finetuning the model using the following command:

torchrun --nproc_per_node=4 --master_port 29501 script/finetune_mattersim.py \
--load_model_path mattersim-v1.0.0-1m \
--train_data_path xyz_files/train.xyz \
--valid_data_path xyz_files/valid.xyz \
--batch_size 8 \
--lr 2e-4 \
--step_size 20 \
--epochs 1500 \
--early_stop_patience 30 \
--save_path ./test \
--wandb \
--save_checkpoint \
--ckpt_interval 10
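
The command above launches four worker processes, each of which would otherwise print the same log messages. The sketch below shows one common way to suppress the duplicates: a `logging.Filter` that passes records only on the process whose `RANK` environment variable (set by `torchrun`) is 0. The class and function names are hypothetical, and the logger class added in this PR may work differently.

```python
import logging
import os


class RankZeroFilter(logging.Filter):
    """Let records through only on the process whose RANK is 0."""

    def filter(self, record: logging.LogRecord) -> bool:
        # torchrun exports RANK for every worker; fall back to 0 so that
        # plain single-process runs still log normally.
        return int(os.environ.get("RANK", "0")) == 0


def get_logger(name: str = "finetune") -> logging.Logger:
    """Hypothetical helper returning a logger that is silent on non-zero ranks."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.addFilter(RankZeroFilter())
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep records out of the root logger
    return logger
```

The rank is read at filter time instead of at import time, so the filter also behaves correctly if the logger is created before the distributed environment is fully set up.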

I trained it on 4×V100 GPUs, and training took about 1 hour. The model met my expectations, and I am happy with the results.
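
The run above sets `--epochs 1500` together with `--early_stop_patience 30`, so training can end well before the epoch limit. As a rough illustration of patience-based early stopping keyed on the best-metric epoch, here is a minimal sketch. It assumes a lower validation metric is better; the `EarlyStopper` class is hypothetical and is not the implementation in potential.py.

```python
class EarlyStopper:
    """Stop when the metric has not improved for `patience` epochs."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best_metric = float("inf")  # assumes lower is better
        self.best_epoch = -1

    def update(self, epoch: int, metric: float) -> bool:
        """Record this epoch's metric; return True if training should stop."""
        if metric < self.best_metric:
            self.best_metric = metric
            self.best_epoch = epoch
        # Number of epochs elapsed since the best one was observed.
        return epoch - self.best_epoch >= self.patience


# Usage with made-up validation losses and a short patience.
stopper = EarlyStopper(patience=3)
val_losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break
```

In a distributed run, every rank needs to reach the same stop decision in the same epoch, for example by computing the validation metric identically on all ranks or by broadcasting the decision from rank 0. Otherwise some processes would leave the training loop while others wait in a collective operation.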

[Image attachment: WeChat image 20241225144853]

@brian-xue

@microsoft-github-policy-service agree
