
Implemented PEFT feature in CSModel class #28

Open
MKHarsha03 wants to merge 19 commits into vandijklab:master from MKHarsha03:peft_lora




MKHarsha03 commented Mar 14, 2026

  • Added the LoraConfig class from peft to CSModel
  • Added new parameters to the CSModel constructor to support the newly added LoraConfig
  • Added an argument to the CSModel constructor to accept a Hugging Face token
  • Added tutorials 7 to 10 to the README and changed the required Python version from 3.8 to 3.10. Also changed the pip command for cell2sentence to install version 1.2.0
  • Added new requirements (sentencepiece, tiktoken and protobuf) to support the Gemma model tokenizers
  • Commented out the old CSModel test script and added a new one that checks whether the model loads with LoRA modules correctly and validates the save path
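For readers unfamiliar with the peft side of this change, here is a minimal sketch of how LoRA parameters could be collected and applied to a Hugging Face causal LM. The helper names build_lora_kwargs and wrap_with_lora, the default hyperparameters, and the Gemma projection names in target_modules are assumptions for illustration, not the code in this PR; LoraConfig and get_peft_model are peft's public API.

```python
# Hypothetical helpers for illustration; not the code added in this PR.
def build_lora_kwargs(r=8, lora_alpha=16, lora_dropout=0.05, target_modules=None):
    """Collect keyword arguments for peft.LoraConfig, with basic validation."""
    if not isinstance(r, int) or r <= 0:
        raise ValueError("LoRA rank r must be a positive integer")
    if target_modules is None:
        # Assumed attention projection names for Gemma-style decoder layers.
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"]
    return {
        "r": r,
        "lora_alpha": lora_alpha,
        "lora_dropout": lora_dropout,
        "target_modules": list(target_modules),
        "bias": "none",
        "task_type": "CAUSAL_LM",
    }


def wrap_with_lora(model, **lora_overrides):
    """Wrap a Hugging Face causal LM with LoRA adapters (requires peft)."""
    # Imported here so build_lora_kwargs can be used and tested without peft.
    from peft import LoraConfig, get_peft_model

    config = LoraConfig(**build_lora_kwargs(**lora_overrides))
    peft_model = get_peft_model(model, config)
    peft_model.print_trainable_parameters()  # reports trainable vs. total params
    return peft_model
```

Keeping the kwargs builder separate from the peft import means the parameter validation can be unit-tested without downloading a model.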

Copilot AI review requested due to automatic review settings March 14, 2026 12:08
MKHarsha03 (Author)

@aakashdp6548 @SyedA5688 I am an engineering student and this is my first contribution. I am working on a project that uses your library, and I added the LoRA feature because I needed it for that project. Please review my code and point out any corrections.

Thank you


Copilot AI left a comment


Pull request overview

This PR adds initial PEFT/LoRA support to the CSModel wrapper and updates packaging/docs to support newer model/tokenizer requirements and a newer Python baseline.

Changes:

  • Add optional LoRA (PEFT) configuration to CSModel initialization.
  • Update packaging requirements (Python >= 3.10; add tokenizer-related deps; add peft/bitsandbytes).
  • Update documentation + README tutorials list and adjust ReadTheDocs configuration.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| src/cell2sentence/csmodel.py | Adds PEFT/LoRA parameters, HF login option, and updates Trainer initialization. |
| src/cell2sentence/tests/test_csmodel.py | Replaces prior CSModel test with a new PEFT-loading-focused test (currently problematic). |
| src/cell2sentence/tests/small_data_diffgenes.csv | Modifies a test fixture CSV (currently breaks existing CSData tests). |
| setup.cfg | Raises minimum Python version and adds new dependencies. |
| docs/source/csmodel.rst | Updates CSModel docs page (currently contains an unresolved conflict marker). |
| README.md | Updates Python version, install instructions, and tutorials list (contains a broken tutorial link and an incomplete row). |
| .readthedocs.yaml | Attempts to configure RTD installs (currently duplicated blocks). |
| docs/Makefile, docs/make.bat | Adds trailing newline only. |


Comment on lines +1 to +23

A CSModel object is a wrapper around a Cell2Sentence model, which tracks the path of the model
saved on disk. When needed, the model is loaded from the path on disk for inference or finetuning.
The class contains utilities for model generation and cell embedding with a Huggingface backend.

.. autofunction:: csmodel.CSModel

.. autofunction:: csmodel.CSModel.__init__

.. autofunction:: csmodel.CSModel.__str__

.. autofunction:: csmodel.CSModel.fine_tune

.. autofunction:: csmodel.CSModel.generate_from_prompt

.. autofunction:: csmodel.CSModel.generate_from_prompt_batched

.. autofunction:: csmodel.CSModel.embed_cell

.. autofunction:: csmodel.CSModel.embed_cells_batched

.. autofunction:: csmodel.CSModel.push_model_to_hub
=======
Author

@copilot This is resolved in the latest commit.

@@ -2,4 +2,3 @@
g1,0,3,0,1,3
g2,0,0,1,1,2
g3,3,1,0,0,1
Author

Resolved

Comment on lines +87 to +88
if not os.path.exists(save_dir):
    os.mkdir(save_dir)
Author

resolved
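The snippet above checks for the directory and then calls os.mkdir, which fails if a parent directory is missing and can race with another process creating the same path. A minimal alternative, assuming nothing else about how CSModel builds save_dir, is os.makedirs with exist_ok=True; ensure_save_dir is a hypothetical helper name.

```python
import os
import tempfile


def ensure_save_dir(save_dir):
    """Create save_dir (and any missing parents) if it does not already exist."""
    # exist_ok=True makes this idempotent and avoids the exists()/mkdir() race.
    os.makedirs(save_dir, exist_ok=True)
    return save_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        nested = os.path.join(tmp, "c2s_models", "lora_gemma_model")
        ensure_save_dir(nested)
        ensure_save_dir(nested)  # second call is a no-op, not an error
        print(os.path.isdir(nested))
```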

Comment on lines +78 to +80
if huggingface_token:
    login(huggingface_token)

data_collator=data_collator,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
processing_class=self.tokenizer  # changed argument from tokenizer to processing_class as per modern documentation
Author

Fixed
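If the package needs to keep working with transformers releases that predate the processing_class argument, one option is to pick the keyword by inspecting Trainer's signature instead of pinning a version. This is a sketch; tokenizer_kwarg is a hypothetical helper, not part of transformers or this PR.

```python
import inspect


def tokenizer_kwarg(trainer_cls, tokenizer):
    """Return {keyword: tokenizer} using whichever keyword trainer_cls accepts.

    Recent transformers releases take `processing_class` and deprecate
    `tokenizer`; older releases only accept `tokenizer`.
    """
    params = inspect.signature(trainer_cls.__init__).parameters
    key = "processing_class" if "processing_class" in params else "tokenizer"
    return {key: tokenizer}


# Possible usage inside CSModel (train_args is an assumed variable name):
# trainer = Trainer(
#     model=model,
#     args=train_args,
#     data_collator=data_collator,
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
#     **tokenizer_kwarg(Trainer, self.tokenizer),
# )
```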

Comment on lines +52 to +70
class TestCSModelPeftModelLoadingAndErrorHandling:
    @classmethod
    def setup_class(cls):
        cls.save_dir = "/mnt/c/Users/khmam/Desktop/c2s_model_directory"
        cls.save_name = "lora_gemma_model"
        hf_model_path = "vandijklab/C2S-Scale-Gemma-2-2B"
        cls.csmodel = CSModel(
            model_name_or_path=hf_model_path,
            save_dir=cls.save_dir,
            save_name=cls.save_name,
            peft=True,
        )

    def test_csmodel_created_correctly(self):
        assert self.csmodel.save_path == os.path.join(self.save_dir, self.save_name)

    def test_layers_are_created_correctly(self):
        model = AutoModelForCausalLM.from_pretrained(self.csmodel.save_path, trust_remote_code=True)
        print(model)
Author

fixed
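The test above prints the loaded model; an assertion would make the check automatic. This sketch assumes the model is a torch module wrapped by peft, whose LoRA layers register their adapter weights under submodules named lora_A and lora_B. lora_module_names is a hypothetical helper, and the module names in the test data are made up for illustration.

```python
def lora_module_names(named_modules):
    """Return names of LoRA adapter containers found in `named_modules`.

    `named_modules` is any iterable of (name, module) pairs, e.g. the output of
    torch.nn.Module.named_modules(). Only the last dotted component is checked,
    so each lora_A / lora_B container is reported once, not once per adapter.
    """
    found = []
    for name, _module in named_modules:
        if name.rsplit(".", 1)[-1] in ("lora_A", "lora_B"):
            found.append(name)
    return found


# Possible use inside test_layers_are_created_correctly (assumes `model`
# is the PEFT-wrapped model):
# assert lora_module_names(model.named_modules()), "no LoRA layers were injected"
```

Separately, the hardcoded save_dir in setup_class ties the test to one machine; pytest's built-in tmp_path_factory fixture is a portable alternative for class-scoped setup.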

README.md Outdated
| [c2s_tutorial_4_cell_type_prediction.ipynb](tutorials/c2s_tutorial_4_cell_type_prediction.ipynb) | Cell type prediction using C2S models
| [c2s_tutorial_5_cell_generation.ipynb](tutorials/c2s_tutorial_5_cell_generation.ipynb) | Cell generation conditioned on cell type
| [c2s_tutorial_6_cell_annotation_with_foundation_model.ipynb](tutorials/c2s_tutorial_6_cell_annotation_with_foundation_model.ipynb) | Cell type annotation with foundation model
| [c2s_tutorial_7_custom_prompt_templates.ipynb](tutorials/c2s_tutorials_7_custom_prompt_templates.ipynb) | Custom Prompt Templates with C2S PromptFormatter class
Author

resolved
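In the tutorial 7 row, the link target uses c2s_tutorials_7_... while the link text says c2s_tutorial_7_..., the kind of mismatch that is easy to miss by eye. A small check like the one below could catch broken relative links in the README; broken_local_links is a hypothetical helper, and its regex only handles simple [text](target) links, not full Markdown.

```python
import re
from pathlib import Path

# Simple [text](target) links only; not a full Markdown parser.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def broken_local_links(markdown_text, repo_root=".", exists=None):
    """Return relative link targets in markdown_text that do not exist."""
    root = Path(repo_root)
    # `exists` can be injected for testing; by default check the filesystem.
    check = exists if exists is not None else (lambda target: (root / target).exists())
    broken = []
    for _text, target in LINK_RE.findall(markdown_text):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # skip external URLs and in-page anchors
        if not check(target):
            broken.append(target)
    return broken
```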

Comment on lines +23 to +29
install_requires =
    torch
    transformers
    peft
    bitsandbytes
    datasets
    anndata
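Because this PR raises the minimum Python version to 3.10 and adds sentencepiece, tiktoken and protobuf for the Gemma tokenizers, a quick environment check can give a clearer error than a failure deep inside tokenizer loading. A minimal sketch using only the standard library; the helper names are hypothetical and the dependency list simply mirrors the PR description.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the PR description; illustrative only.
TOKENIZER_DEPS = ("sentencepiece", "tiktoken", "protobuf")


def python_is_supported(current=None, minimum=(3, 10)):
    """True if the (major, minor) Python version meets the minimum."""
    current = tuple(current if current is not None else sys.version_info[:2])
    return current >= minimum


def missing_distributions(names=TOKENIZER_DEPS):
    """Return the distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)  # looks up installed distribution metadata
        except PackageNotFoundError:
            missing.append(name)
    return missing
```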
MKHarsha03 closed this Mar 14, 2026
MKHarsha03 deleted the peft_lora branch March 14, 2026 13:09
MKHarsha03 reopened this Mar 14, 2026
MKHarsha03 (Author)

  • Added the LoraConfig class from peft to CSModel
  • Added new parameters to the CSModel constructor to support the newly added LoraConfig
  • Added an argument to the CSModel constructor to accept a Hugging Face token
  • Added tutorials 7 to 10 to the README and changed the required Python version from 3.8 to 3.10. Also changed the pip command for cell2sentence to install version 1.2.0
  • Added new requirements (sentencepiece, tiktoken and protobuf) to support the Gemma model tokenizers
  • Commented out the old CSModel test script and added a new one that checks whether the model loads with LoRA modules correctly and validates the save path
  • Changed the tokenizer argument in Trainer to processing_class (newer versions of transformers accept this argument)
  • Modified the test code to check whether LoRA layers are created
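On the Hugging Face token argument: one way to keep the token optional is to fall back to the HF_TOKEN environment variable, which huggingface_hub also reads on its own. resolve_hf_token is a hypothetical helper; huggingface_hub.login(token=...) and the token= argument of from_pretrained are existing APIs.

```python
import os


def resolve_hf_token(explicit_token=None, env=None):
    """Prefer an explicit token, else HF_TOKEN from the environment, else None."""
    env = os.environ if env is None else env
    if explicit_token:
        return explicit_token
    return env.get("HF_TOKEN") or None


# Possible usage inside CSModel's constructor (huggingface_token is the PR's argument):
# token = resolve_hf_token(huggingface_token)
# if token:
#     from huggingface_hub import login
#     login(token=token)
```

Design note: login() saves the token to the local Hugging Face cache for later sessions; passing token=token directly to from_pretrained instead limits the credential to that call.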


Labels

None yet

Projects

None yet


2 participants