This repository contains the official implementation of the paper "LipoPU: Pocket-level Prediction of Lipid-Protein Interactions via Positive-Unlabeled Learning", accepted at ICML 2026.
LipoPU is a pocket-centric model for lipid-protein interaction prediction under incomplete annotation. It uses a ranking-based positive-unlabeled learning objective to prioritize lipid-binding pockets without treating all unlabeled pockets as negatives. LipoPU supports both binary lipid-binding detection and multi-label lipid category prediction. The model represents each pocket with attention-based multiple instance learning (MIL) pooling over residue-level protein embeddings, enabling residue-level interpretability.
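To make the attention-based MIL pooling idea concrete, here is a minimal, dependency-free sketch of masked attention pooling over padded residue embeddings. It is not the code in model_attn.py: the function name `masked_attention_pool`, the single scoring vector `w` (standing in for a learned scoring layer), and the toy embeddings are all assumptions made for illustration, and the repository's model may score and normalize residues differently.

```python
import math


def masked_attention_pool(residues, mask, w):
    """Pool residue embeddings into one pocket embedding (illustrative only).

    residues: list of residue embedding vectors, padded to a common length.
    mask:     list of bools, True for real residues and False for padding.
    w:        hypothetical scoring vector; a learned layer would play this role.
    Returns (pocket_embedding, attention_weights).
    """
    if not any(mask):
        raise ValueError("pocket has no real residues")

    # One raw attention score per residue: dot product with the scoring vector.
    scores = [sum(x * wi for x, wi in zip(r, w)) for r in residues]

    # Softmax restricted to real residues; subtracting the max keeps exp() stable.
    top = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Pocket embedding is the attention-weighted sum of residue embeddings.
    dim = len(residues[0])
    pooled = [sum(a * r[d] for a, r in zip(weights, residues)) for d in range(dim)]
    return pooled, weights


# Toy pocket: two real residues plus one padding row that must be ignored.
residues = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
mask = [True, True, False]
pooled, weights = masked_attention_pool(residues, mask, w=[1.0, 0.0])
```

The returned `weights` are what make this kind of pooling interpretable at the residue level: padded positions receive exactly zero weight, and the remaining weights sum to one, so each can be read as a residue's relative contribution to the pocket representation.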
This repository provides the LipoPU implementation, a pretrained checkpoint, and a small case-study tutorial. Users can prepare their own datasets by following the input format demonstrated in Case_study_example.csv, LipoPU_tutorial.ipynb, and the accompanying Python scripts.
.
├── assets/
│   └── LipoPU.png
├── README.md
├── LipoPU_tutorial.ipynb
├── Case_study_example.csv
├── best_LipoPU_attn_pooling_64bs_only_BioLiP2_run1.pt
├── DataLoader.py
├── model_attn.py
├── train_attn_pooling_multi_rounds.py
├── loss_es.py
└── utils.py
Key files:
- LipoPU_tutorial.ipynb: tutorial notebook for running the pretrained model on the provided example.
- Case_study_example.csv: small example input table used by the tutorial.
- best_LipoPU_attn_pooling_64bs_only_BioLiP2_run1.pt: pretrained checkpoint.
- model_attn.py: model definition, including masked attention pooling.
- DataLoader.py: HDF5 dataset, PU sampler, and padding collate utilities used for training.
- loss_es.py: positive-unlabeled loss and validation ranking loss.
- utils.py: utility functions, including the prior estimator used by the training script.
- train_attn_pooling_multi_rounds.py: training script for LipoPU.
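For readers unfamiliar with ranking-based positive-unlabeled objectives, the sketch below shows one generic form: a pairwise logistic loss that only asks labeled positives to score above unlabeled pockets, without assigning unlabeled pockets a hard negative label. This is an assumption-laden illustration, not the loss in loss_es.py: the function names are hypothetical, and the sketch omits any class-prior correction that the actual training code may apply.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_pu_ranking_loss(pos_scores, unl_scores):
    """Mean logistic loss over all (positive, unlabeled) score pairs.

    Each pair contributes log(1 + exp(u - p)), which is small when the
    positive score p exceeds the unlabeled score u. Hypothetical helper,
    not the repository's implementation.
    """
    if not pos_scores or not unl_scores:
        raise ValueError("need at least one positive and one unlabeled score")
    total = sum(softplus(u - p) for p in pos_scores for u in unl_scores)
    return total / (len(pos_scores) * len(unl_scores))


# Positives ranked above unlabeled pockets give a lower loss than the reverse.
good = pairwise_pu_ranking_loss([2.0, 3.0], [-1.0, 0.0])
bad = pairwise_pu_ranking_loss([-1.0, 0.0], [2.0, 3.0])
```

Because the loss depends only on score differences within pairs, an unlabeled pocket that is in fact lipid-binding is penalized only relative to labeled positives, rather than being pushed toward a fixed negative target.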
Citation information will be added after publication.
