
maciek-wisniewski/MolecularDynamicsPipeline


Molecular Dynamics Simulation Pipeline

A comprehensive molecular dynamics simulation pipeline for protein-ligand systems, built with OpenMM and designed for high-throughput screening and detailed biomolecular analysis. This pipeline supports multi-stage MD simulations with advanced bond preservation, checkpoint recovery, and SLURM cluster integration.

🚀 Features

  • Multi-stage MD simulation pipeline: Warmup → Backbone restraint removal → NVT → NPT → Production
  • Checkpoint-based recovery: Resume interrupted simulations from any stage
  • SLURM cluster integration: High-throughput batch processing capabilities
  • Multiple file format support: PDB, CIF, SDF, MOL2 with proper bond handling
  • Comprehensive reporting: Forces, trajectories, thermodynamic data, and Hessians
  • GPU acceleration: CUDA and OpenCL platform support
  • Flexible force fields: AMBER, GAFF, OpenFF with customizable parameters
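The stage order and checkpoint-based resume described above can be illustrated with a small helper. This is a hypothetical sketch, not the pipeline's own code: `STAGES` reuses the stage keys from the configuration section, and `stages_to_run` assumes the caller already knows which stages have finished (how the pipeline itself detects a finished stage is not documented here). Because each stage starts from the previous stage's final state, one unfinished stage forces every later stage to rerun.

```python
from typing import List, Set

# Stage keys in pipeline order (names taken from the configuration section).
STAGES: List[str] = ["warmup", "backbone_removal", "nvt", "npt", "production"]


def stages_to_run(completed: Set[str]) -> List[str]:
    """Hypothetical helper: return the stages that still have to run.

    Stages are skipped only while they form an unbroken completed prefix;
    the first unfinished stage and everything after it are returned.
    """
    remaining: List[str] = []
    for stage in STAGES:
        if remaining or stage not in completed:
            remaining.append(stage)
    return remaining


if __name__ == "__main__":
    print(stages_to_run({"warmup", "backbone_removal"}))  # ['nvt', 'npt', 'production']
```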

📋 Prerequisites

  • Python: 3.7-3.12 (recommended: 3.11)
  • CUDA: For GPU acceleration (optional but recommended)
  • Git: For installation from source
  • Conda/Mamba: For environment management

🔧 Installation

Method 1: Installation with Conda Environment (Recommended)

  1. Clone the repository:

    git clone https://github.com/maciejwisniewski-drugdiscovery/MolecularDynamicsPipeline.git
    cd MolecularDynamicsPipeline
  2. Create conda environment from YAML:

    conda env create -f environment.yml
    conda activate molecular_dynamics_pipeline
  3. Install the package in development mode:

    pip install -e .

Verification

Test your installation:

# Quick test
python -c "import molecular_dynamics_pipeline; print('Installation successful!')"

Beyond this import test, a complete check of the installation covers:

  • Python version compatibility
  • All required dependencies
  • GPU/CUDA support
  • Basic OpenMM functionality
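A fuller check along these lines might look like the sketch below. It uses only the standard library plus OpenMM's `Platform` API (`Platform.getNumPlatforms`, `Platform.getPlatform`) and falls back gracefully when OpenMM is missing. The module list passed to `missing_modules` (`openmm`, `pdbfixer`, `yaml`) is an assumption; the authoritative dependency list is `environment.yml`.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

# Supported range from the Prerequisites section (3.7 to 3.12 inclusive).
MIN_PY: Tuple[int, int] = (3, 7)
MAX_PY: Tuple[int, int] = (3, 12)


def python_supported(version: Sequence[int] = sys.version_info[:2]) -> bool:
    """True if the (major, minor) version lies in the supported range."""
    return MIN_PY <= tuple(version[:2]) <= MAX_PY


def missing_modules(names: Sequence[str]) -> List[str]:
    """Top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def openmm_platforms() -> List[str]:
    """OpenMM platform names (e.g. CUDA, OpenCL, CPU), or [] if OpenMM is absent."""
    try:
        import openmm
    except ImportError:
        return []
    return [openmm.Platform.getPlatform(i).getName()
            for i in range(openmm.Platform.getNumPlatforms())]


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("Missing modules (assumed list):", missing_modules(["openmm", "pdbfixer", "yaml"]))
    print("OpenMM platforms:", openmm_platforms())  # "CUDA" appears only if GPU support loaded
```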

πŸ“ Project Structure

plinder_dynamics/
β”œβ”€β”€ config/                           # Configuration templates
β”‚   β”œβ”€β”€ plinder_parameters_bound.yaml      # Bound state simulations
β”‚   β”œβ”€β”€ plinder_parameters_unbound.yaml    # Unbound state simulations  
β”‚   β”œβ”€β”€ plinder_parameters_metadynamics.yaml # Enhanced sampling
β”‚   β”œβ”€β”€ misato_parameters.yaml             # MISATO dataset configs
β”‚   └── simulation_parameters.yaml         # Base parameters
β”œβ”€β”€ scripts/                          # Execution scripts
β”‚   β”œβ”€β”€ run_simulation.py                 # Main simulation runner
β”‚   β”œβ”€β”€ plinder_scripts/                  # PLINDER-specific scripts
β”‚   └── misato_scripts/                   # MISATO-specific scripts
β”œβ”€β”€ src/dynamics_pipeline/            # Core pipeline modules
β”‚   β”œβ”€β”€ simulation/                       # MD simulation engine
β”‚   β”œβ”€β”€ data/                            # Data handling and processing
β”‚   └── utils/                           # Utilities and helpers
β”œβ”€β”€ environment.yml                   # Conda environment specification
└── setup.py                        # Package installation

βš™οΈ Configuration

Configuration File Structure

The pipeline uses YAML configuration files with the following sections:

1. System Information (info)

info:
  system_id: "1abc_ligand_123"        # Unique system identifier
  simulation_id: "bound_state_md"      # Simulation identifier
  use_plinder_index: true             # Use PLINDER database integration
  bound_state: true                   # Bound vs unbound simulation

2. File Paths (paths)

paths:
  raw_protein_files: 
    - "path/to/protein.pdb"           # Protein structure files
  raw_ligand_files:
    - "path/to/ligand.sdf"            # Ligand structure files  
  output_dir: "path/to/output"        # Output directory

3. Preprocessing Parameters (preprocessing)

preprocessing:
  process_protein: true               # Clean protein with PDBFixer
  process_ligand: true                # Process ligand with OpenFF
  add_solvent: true                   # Add explicit solvent
  ionic_strength: 0.15                # Salt concentration (M)
  box_padding: 1.0                    # Solvent box padding (nm)
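To get a feel for what `ionic_strength` and `box_padding` imply, the number of salt ion pairs can be estimated as concentration × Avogadro's number × box volume. This back-of-envelope sketch assumes a cubic box whose edge is the solute's largest extent plus the padding on both sides, ignores the volume occupied by the solute, and treats ionic strength as the salt concentration (true for a 1:1 salt such as NaCl). The number of ions the pipeline actually adds, including any neutralizing counterions, may differ.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)
LITRES_PER_NM3 = 1e-24    # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def cubic_box_edge_nm(solute_extent_nm: float, padding_nm: float) -> float:
    """Assumed box geometry: largest solute extent plus padding on each side."""
    return solute_extent_nm + 2.0 * padding_nm


def estimate_ion_pairs(ionic_strength_molar: float, box_edge_nm: float) -> int:
    """Rough count of cation/anion pairs for a 1:1 salt in a cubic box."""
    volume_litres = box_edge_nm ** 3 * LITRES_PER_NM3
    return round(ionic_strength_molar * AVOGADRO * volume_litres)


if __name__ == "__main__":
    edge = cubic_box_edge_nm(6.0, 1.0)  # hypothetical 6 nm solute, 1.0 nm padding
    print(edge, estimate_ion_pairs(0.15, edge))  # 8.0 46
```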

4. Force Field Configuration (forcefield)

forcefield:
  proteinFF: "amber14-all.xml"        # Protein force field
  nucleicFF: "amber14/DNA.OL15.xml"   # Nucleic acid force field
  ligandFF: "gaff-2.11"               # Ligand force field (gaff-2.11, openff-2.0.0)
  waterFF: "amber14/tip3pfb.xml"      # Water model
  water_model: "tip3p"                # Water model name
  forcefield_kwargs:                  # Additional FF parameters
    rigidWater: true
    removeCMMotion: false
    hydrogenMass: 1.5                 # Hydrogen mass repartitioning
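With `hydrogenMass` set, OpenMM's `ForceField.createSystem` gives hydrogens bonded to heavy atoms the requested mass and subtracts the added mass from the heavy atom they are bonded to, so the total mass is unchanged; heavier hydrogens slow the fastest motions in the system and are commonly used to permit larger time steps. The arithmetic for a single heavy atom is sketched below, taking 1.008 Da as the standard hydrogen mass (an approximation of the element mass OpenMM uses).

```python
STANDARD_H_MASS = 1.008  # Da, approximate mass of a hydrogen atom


def repartition(heavy_mass: float, n_hydrogens: int, hydrogen_mass: float = 1.5):
    """Return (new_heavy_mass, new_hydrogen_mass) for one heavy atom and its hydrogens.

    Mass added to each hydrogen is taken from the bonded heavy atom, so the
    group's total mass is conserved.
    """
    added_per_h = hydrogen_mass - STANDARD_H_MASS
    return heavy_mass - n_hydrogens * added_per_h, hydrogen_mass


if __name__ == "__main__":
    carbon, hydrogen = repartition(12.011, 3)  # methyl carbon with three hydrogens
    print(round(carbon, 3), hydrogen)  # 10.535 1.5
```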

Force Field Options:

  • Protein: amber14-all.xml, amber14/protein.ff14SB.xml, amber99sbildn.xml
  • Ligand: gaff-2.11, openff-2.0.0, openff-2.1.0
  • Water: tip3p, tip4pew, spce

5. Simulation Parameters (simulation_params)

Platform Configuration:

simulation_params:
  platform:
    type: "CUDA"                      # Platform: CUDA, OpenCL, CPU
    devices: "0"                      # GPU device indices
  backbone_restraint_force: 100.0     # Backbone restraint (kcal/mol/Å²)
  save_forces: true                   # Save force data
  save_hessian: false                 # Save Hessian matrices
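OpenMM works internally in kJ/mol and nm, so a restraint constant given in kcal/mol/Å² has to be converted before it is used in an OpenMM force (the `openmm.unit` module can also do this). Whether the pipeline converts `backbone_restraint_force` internally is not stated here; the sketch only shows the arithmetic: 1 kcal = 4.184 kJ and 1/Å² = 100/nm².

```python
KJ_PER_KCAL = 4.184      # thermochemical calorie
ANGSTROMS_PER_NM = 10.0


def kcal_mol_A2_to_kj_mol_nm2(k: float) -> float:
    """Convert a force constant from kcal/mol/Å^2 to kJ/mol/nm^2."""
    # 1 / Å^2 = (10 / nm)^2 = 100 / nm^2
    return k * KJ_PER_KCAL * ANGSTROMS_PER_NM ** 2


if __name__ == "__main__":
    print(kcal_mol_A2_to_kj_mol_nm2(100.0))  # approximately 41840 kJ/mol/nm^2
```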

Stage-Specific Parameters:

Each simulation stage (warmup, backbone_removal, nvt, npt, production) has its own parameter block. For example, the warmup stage:

  warmup:
    init_temp: 50.0                   # Initial temperature (K)
    final_temp: 300.0                 # Final temperature (K)
    friction: 1.0                     # Langevin friction (ps⁻¹)
    time_step: 2.0                    # Integration timestep (fs)
    heating_step: 100                 # Steps per 1K temperature increase
    checkpoint_interval: 1000         # Checkpoint frequency
    trajectory_interval: 1000         # Trajectory save frequency
    state_data_reporter_interval: 1000 # State data frequency
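The warmup parameters above imply a simple schedule. Assuming a linear ramp in 1 K increments, as the `heating_step` comment suggests, going from 50 K to 300 K takes 250 increments × 100 steps = 25,000 steps, which at a 2 fs time step is 50 ps of heating. The sketch computes that schedule; the commented lines show how it could drive an OpenMM integrator (OpenMM's Langevin integrators provide `setTemperature`), but the pipeline's actual heating loop is not shown in this README.

```python
from typing import List, Tuple


def heating_schedule(init_temp: float, final_temp: float,
                     heating_step: int) -> List[Tuple[float, int]]:
    """(temperature_K, n_steps) segments for a ramp in 1 K increments."""
    n_increments = int(round(final_temp - init_temp))
    return [(init_temp + i, heating_step) for i in range(n_increments)]


def duration_ps(n_steps: int, time_step_fs: float) -> float:
    """Simulated time for n_steps of length time_step_fs (1 ps = 1000 fs)."""
    return n_steps * time_step_fs / 1000.0


# Possible use with OpenMM (not executed here):
#   for temp, steps in heating_schedule(50.0, 300.0, 100):
#       integrator.setTemperature(temp * unit.kelvin)
#       simulation.step(steps)
#   integrator.setTemperature(300.0 * unit.kelvin)

if __name__ == "__main__":
    schedule = heating_schedule(50.0, 300.0, 100)
    total_steps = sum(steps for _, steps in schedule)
    print(len(schedule), total_steps, duration_ps(total_steps, 2.0))  # 250 25000 50.0
```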

Creating Configuration Files

  1. Copy a template:

    cp config/plinder_parameters_bound.yaml my_simulation.yaml
  2. Edit required fields:

    • Set system_id and simulation_id
    • Update file paths in paths section
    • Adjust simulation parameters as needed
  3. Validate configuration:

    python scripts/run_simulation.py --config my_simulation.yaml --validate-only
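Before submitting many jobs it can help to catch obvious mistakes in a config file programmatically. The sketch below assumes the YAML has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`) and checks only what this README describes: the five top-level sections, `system_id`/`simulation_id`, the input file paths, and `output_dir`. It is a hypothetical helper and does not replace `--validate-only`, whose exact checks are not documented here.

```python
from pathlib import Path
from typing import Any, Dict, List

REQUIRED_SECTIONS = ("info", "paths", "preprocessing", "forcefield", "simulation_params")


def config_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a loaded configuration dict."""
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in config]

    info = config.get("info") or {}
    for key in ("system_id", "simulation_id"):
        if not info.get(key):
            problems.append(f"info.{key} is not set")

    paths = config.get("paths") or {}
    for key in ("raw_protein_files", "raw_ligand_files"):
        for file_name in paths.get(key) or []:  # an empty or absent list is allowed
            if not Path(file_name).is_file():
                problems.append(f"paths.{key}: file not found: {file_name}")
    if not paths.get("output_dir"):
        problems.append("paths.output_dir is not set")
    return problems
```

For example, `config_problems(yaml.safe_load(open("my_simulation.yaml")))` returns an empty list when none of these checks fail.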

πŸƒ Usage

Basic Simulation Execution

Single simulation:

python scripts/run_simulation.py --config config/my_simulation.yaml

With custom output directory:

python scripts/run_simulation.py \
  --config config/my_simulation.yaml \
  --output-dir /path/to/output

Advanced Options

Validation mode (check config without running):

python scripts/run_simulation.py \
  --config config/my_simulation.yaml \
  --validate-only

Verbose logging:

python scripts/run_simulation.py \
  --config config/my_simulation.yaml \
  --log-level DEBUG

PLINDER Integration

For PLINDER database systems:

python scripts/plinder_scripts/run_single_plinder_simulation.py \
  --plinder_id "1abc__1.00__ligand_113" \
  --config config/plinder_parameters_bound.yaml \
  --output-dir /path/to/output
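For a handful of PLINDER systems on a single machine, the command above can be looped from Python. This is a hypothetical convenience sketch that only reuses the flags shown above; writing each system to its own subdirectory of `output_root` is an assumption, and for large batches the pipeline's SLURM integration (not documented in this section) is the better fit. By default it only prints the commands.

```python
import subprocess
import sys
from typing import List, Sequence

SCRIPT = "scripts/plinder_scripts/run_single_plinder_simulation.py"


def build_command(plinder_id: str, config: str, output_dir: str) -> List[str]:
    """Command line for one PLINDER system, using the flags shown in this README."""
    return [sys.executable, SCRIPT,
            "--plinder_id", plinder_id,
            "--config", config,
            "--output-dir", output_dir]


def run_all(plinder_ids: Sequence[str], config: str, output_root: str,
            dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute, one after another, a simulation per PLINDER ID."""
    commands = [build_command(pid, config, f"{output_root}/{pid}") for pid in plinder_ids]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing system
    return commands


if __name__ == "__main__":
    run_all(["1abc__1.00__ligand_113"], "config/plinder_parameters_bound.yaml", "/path/to/output")
```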

📊 Output Structure

output_directory/
├── forcefields/                                 # Ligand topology with bonds
│   ├── {ligand_name}_topology.sdf               # SDF format with bonds
│   ├── {ligand_name}_topology.mol2              # MOL2 format with bonds
│   └── {ligand_name}_info.yaml                  # Ligand metadata
├── trajectories/                                # Simulation trajectories
│   ├── {system_id}_warmup_trajectory.npz        # NPZ trajectory data
│   ├── {system_id}_nvt_trajectory.npz           # NPZ trajectory data
│   └── {system_id}_production_trajectory.npz    # NPZ trajectory data
├── checkpoints/                                 # Checkpoint files for recovery
│   ├── {system_id}_warmup_checkpoint.dcd
│   └── {system_id}_production_checkpoint.dcd
├── state_data_reporters/                        # Thermodynamic data
│   ├── {system_id}_warmup_state_data.csv
│   └── {system_id}_production_state_data.csv
├── states/                                      # XML state files
├── topologies/                                  # Structure files with bonds
│   ├── {system_id}_warmup_topology.cif
│   └── {system_id}_production_topology.cif
├── forces/                                      # Force data (if enabled)
│   └── {system_id}_production_forces.npy
├── hessians/                                    # Hessian matrices (if enabled)
│   └── {system_id}_production_hessian.npy
└── {system_id}_init_complex.cif                 # Initial system structure
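The naming pattern above makes it easy to locate a stage's files from a script. The helper below is a hypothetical sketch that simply rebuilds the paths shown in the tree; it does not read the files, since their internal layout (for example the arrays stored in the NPZ trajectories) is not described here. Not every stage necessarily writes every file type, so a "missing" entry is not by itself an error.

```python
from pathlib import Path
from typing import Dict, List


def stage_outputs(output_dir: str, system_id: str, stage: str) -> Dict[str, Path]:
    """Expected per-stage output files, following the directory layout above."""
    root = Path(output_dir)
    return {
        "trajectory": root / "trajectories" / f"{system_id}_{stage}_trajectory.npz",
        "checkpoint": root / "checkpoints" / f"{system_id}_{stage}_checkpoint.dcd",
        "state_data": root / "state_data_reporters" / f"{system_id}_{stage}_state_data.csv",
        "topology": root / "topologies" / f"{system_id}_{stage}_topology.cif",
    }


def missing_outputs(output_dir: str, system_id: str, stage: str) -> List[str]:
    """Kinds of expected files that are not on disk (yet)."""
    files = stage_outputs(output_dir, system_id, stage)
    return [kind for kind, path in files.items() if not path.exists()]
```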

📞 Support

For questions and support:
