ModernBERT Code retrieval sample project

This repository contains a sample project for using the ModernBERT model. It uses uv for dependency management and notebooks for running the code.

Make sure to restore the dependencies before running the notebooks:

uv sync

After that you can open the notebooks in your favorite Jupyter environment (which clearly should be VSCode) and start experimenting with the ModernBERT model!

The ModernBert.ipynb notebook contains two samples, each using a different version of the model.

Note

The repository uses a forked version of PyLate. The only change I made was bumping its dependencies to the pre-release version of transformers (required for ModernBERT).

Code retrieval

The CodeRetrieval.ipynb notebook contains sample code for using the PyLate model for a code retrieval task.

I'm using GitPython to clone one of the well-known OSS Rust projects (https://github.com/chronotope/chrono) and then tree-sitter to parse the code and extract the function definitions together with their docstrings.
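To give a feel for what the extraction step produces, here is a minimal sketch that pairs Rust `///` doc comments with the function that follows them. It deliberately swaps tree-sitter for a plain regular expression so it runs with the standard library only. The notebook itself uses tree-sitter, and the sample Rust snippet and the `extract_documented_functions` helper are made up for illustration.

```python
import re

# Simplified stand-in for a tree-sitter query: a run of `///` doc-comment
# lines followed directly by a function signature.
FN_WITH_DOCS = re.compile(
    r"(?P<docs>(?:^[ \t]*///.*\n)+)"  # one or more doc-comment lines
    r"^[ \t]*(?:pub(?:\([^)]*\))?\s+)?(?:const\s+)?(?:unsafe\s+)?fn\s+(?P<name>\w+)",
    re.MULTILINE,
)


def extract_documented_functions(source):
    """Return a list of {"name", "doc"} dicts for documented functions."""
    results = []
    for match in FN_WITH_DOCS.finditer(source):
        # Drop the leading `///` from every doc line and join into one string.
        doc_lines = [line.strip()[3:].strip() for line in match.group("docs").splitlines()]
        results.append({"name": match.group("name"), "doc": " ".join(doc_lines)})
    return results


# Hypothetical Rust source, not taken from the chrono repository.
SAMPLE = '''
/// Returns the year number.
/// Negative years are BCE.
pub fn year(&self) -> i32 {
    0
}

fn helper() {}

/// Adds two durations.
pub const fn add(a: i64, b: i64) -> i64 { a + b }
'''

functions = extract_documented_functions(SAMPLE)
for fn in functions:
    print(fn["name"], "->", fn["doc"])
```

The regex misses cases a real parser handles, such as attributes like `#[inline]` sitting between the doc comment and the `fn` keyword, which is one reason to prefer tree-sitter for a real codebase.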

After that I'm using the PyLate version of ModernBERT to create the embeddings and the index of the codebase, and to perform a sample retrieval task.
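PyLate is built around ColBERT-style late-interaction models, where every query token is compared against every document token and the best matches are summed (often called MaxSim). The sketch below illustrates that scoring idea in plain Python with toy 2-dimensional token embeddings. It is not PyLate's API: the `maxsim` and `retrieve` helpers and the vectors are assumptions for illustration, and a real index would use the model's embeddings and an approximate nearest-neighbour search.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def maxsim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best cosine similarity with any document token."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(sum(a * b for a, b in zip(qt, dt)) for dt in d) for qt in q)


def retrieve(query_tokens, index, k=1):
    """Brute-force retrieval: score every document and return the top k."""
    scored = [(doc_id, maxsim(query_tokens, tokens)) for doc_id, tokens in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "index": document id -> list of per-token embeddings (made-up values).
index = {
    "fn year": [[1.0, 0.0], [0.9, 0.1]],
    "fn add": [[0.0, 1.0], [0.1, 0.9]],
}

# Toy query whose token embeddings point mostly along the first axis.
query = [[1.0, 0.1], [0.8, 0.0]]

top = retrieve(query, index, k=2)
for doc_id, score in top:
    print(f"{doc_id}: {score:.3f}")
```

Here the query ranks `fn year` first because both of its tokens find a close match among that document's tokens.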