The repository contains a sample project for using the ModernBERT model. It uses uv
for dependency management and notebooks for running the code.
Make sure to restore the dependencies before running the notebooks:
uv sync
After that you can open the notebooks in your favorite Jupyter environment (which clearly should be VSCode) and start experimenting with the ModernBERT model!
The ModernBert.ipynb notebook contains two samples, one for each of two versions of the model:
- base model used on a fill-mask task (answerdotai/ModernBERT-base)
- PyLate model used on a multi-vector retrieval task (joe32140/ColModernBERT-base-msmarco-en-bge)
Note
The repository uses a forked version of PyLate. The only change I made was bumping the dependencies to the pre-release version of transformers (required for ModernBERT).
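To give a rough idea of what "multi-vector retrieval" means here, the sketch below scores documents with the MaxSim late-interaction rule commonly described for ColBERT-style models: every query token vector is matched against its best document token vector and the matches are summed. It is only an illustration in plain Python. The function names (`maxsim_score`, `rank`) and the tiny 2-dimensional vectors are made up for this example, and it does not use or reproduce PyLate's API.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rank(query_tokens: Sequence[Vector], docs: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return document ids ordered from highest to lowest MaxSim score."""
    scores = {doc_id: maxsim_score(query_tokens, toks) for doc_id, toks in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): one vector per token instead of one vector per text.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query tokens
    "doc_b": [[1.0, 0.0], [1.0, 0.0]],  # covers only the first query token
}
ranking = rank(query, docs)
```

In the notebook the token vectors come from the model rather than being written by hand; the scoring idea is the part this sketch is meant to show.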
The CodeRetrieval.ipynb notebook contains sample code for using the PyLate model on a code retrieval task.
I'm using GitPython to clone one of the well-known OSS Rust projects (https://github.com/chronotope/chrono) and then tree-sitter to parse the code and extract the function definitions together with their docstrings.
After that, I'm using the PyLate version of ModernBERT to create the embeddings and the index of the codebase, and to perform a sample retrieval task.
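The extraction step can be pictured with the simplified sketch below. It pairs Rust `///` doc comments with the function that follows them, which is the kind of (function, docstring) record that gets embedded later. Note that this is a regex-based stand-in, not the tree-sitter code from the notebook: a real syntax tree handles multi-line signatures, nested items, and block comments, which this line scanner does not. The names `FN_PATTERN` and `extract_documented_functions` are hypothetical.

```python
import re
from typing import List, Tuple

# Matches the start of a Rust function item, with optional common qualifiers.
FN_PATTERN = re.compile(
    r"^\s*(?:pub(?:\([^)]*\))?\s+)?(?:const\s+)?(?:async\s+)?(?:unsafe\s+)?fn\s+(\w+)"
)


def extract_documented_functions(source: str) -> List[Tuple[str, str]]:
    """Return (function name, joined doc comment) pairs found in Rust source."""
    results: List[Tuple[str, str]] = []
    pending_docs: List[str] = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("///"):
            # Collect consecutive doc comment lines.
            pending_docs.append(stripped[3:].strip())
        elif stripped.startswith("#["):
            # Attributes may sit between the docs and the function; keep the docs.
            continue
        else:
            match = FN_PATTERN.match(line)
            if match:
                results.append((match.group(1), " ".join(pending_docs)))
            # Any other line ends the current doc comment block.
            pending_docs = []
    return results


sample = '''
/// Adds two numbers.
/// Overflow is not checked.
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn undocumented() {}
'''
functions = extract_documented_functions(sample)
```

Functions without a doc comment come back with an empty string, so a caller can decide whether to index them by name only or skip them.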