Krzysztof-Cieslak/ModernBert-CodeRetrieval

ModernBERT Code retrieval sample project

This repository contains a sample project for using the ModernBERT model. It uses uv for dependency management and Jupyter notebooks for running the code.

Make sure to restore the dependencies before running the notebooks:

uv sync

After that you can open the notebooks in your favorite Jupyter environment (which clearly should be VSCode) and start experimenting with the ModernBERT model!

The ModernBert.ipynb notebook contains two samples, each using a different version of the model.

Note

The repository uses a forked version of PyLate. The only change I made was bumping its dependencies to the pre-release version of transformers (required for ModernBERT).

Code retrieval

The CodeRetrieval.ipynb notebook contains sample code that uses the PyLate model for a code retrieval task.

I'm using GitPython to clone a well-known OSS Rust project, chrono (https://github.com/chronotope/chrono), and then tree-sitter to parse the code and extract the function definitions together with their doc comments.
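The clone-and-parse step could look roughly like the sketch below. It is not the notebook's actual code: the helper names (`clone_repo`, `parse_rust`, `extract_functions`, `collect_functions`) and the `RustFunction` record are made up for illustration, and it assumes py-tree-sitter 0.23 or newer together with the `tree_sitter_rust` grammar package. Only `///` line doc comments are collected.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RustFunction:
    # Hypothetical record type, not taken from the notebook.
    name: str
    doc: str
    code: str


def clone_repo(url: str, target: Path) -> Path:
    """Shallow-clone `url` into `target` with GitPython unless it already exists."""
    from git import Repo  # GitPython, imported lazily

    if not target.exists():
        Repo.clone_from(url, str(target), depth=1)
    return target


def parse_rust(source: bytes):
    """Parse Rust source and return the tree-sitter root node."""
    # Assumes py-tree-sitter >= 0.23 and the tree_sitter_rust package.
    import tree_sitter_rust
    from tree_sitter import Language, Parser

    parser = Parser(Language(tree_sitter_rust.language()))
    return parser.parse(source).root_node


def doc_comment_before(node, source: bytes) -> str:
    """Collect the `///` lines directly above `node`, skipping attributes."""
    lines = []
    sibling = node.prev_sibling
    while sibling is not None and sibling.type in ("line_comment", "attribute_item"):
        if sibling.type == "line_comment":
            text = source[sibling.start_byte:sibling.end_byte].decode("utf-8").strip()
            if not text.startswith("///"):
                break  # an ordinary comment ends the doc block
            lines.append(text[3:].strip())
        sibling = sibling.prev_sibling
    return "\n".join(reversed(lines))


def extract_functions(root, source: bytes) -> list[RustFunction]:
    """Walk the syntax tree and return every `function_item` with its docs."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.type == "function_item":
            name_node = node.child_by_field_name("name")
            name = source[name_node.start_byte:name_node.end_byte].decode("utf-8")
            code = source[node.start_byte:node.end_byte].decode("utf-8")
            found.append(RustFunction(name, doc_comment_before(node, source), code))
        # Reversed so that nodes are visited in document order.
        stack.extend(reversed(node.children))
    return found


def collect_functions(repo_dir: Path) -> list[RustFunction]:
    """Extract functions from every .rs file under `repo_dir`."""
    functions = []
    for path in sorted(repo_dir.rglob("*.rs")):
        source = path.read_bytes()
        functions.extend(extract_functions(parse_rust(source), source))
    return functions
```

With the dependencies installed, `collect_functions(clone_repo("https://github.com/chronotope/chrono", Path("chrono")))` would return one record per function found in the repository.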

After that I'm using the PyLate version of ModernBERT to create the embeddings and an index of the codebase, and to perform a sample retrieval task.
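A minimal sketch of the indexing and retrieval step, based on PyLate's documented `models.ColBERT`, `indexes.Voyager`, and `retrieve.ColBERT` classes, might look like this. The helper names, the index folder and index name, and the way the doc comment and code are joined into one text are assumptions for illustration; `model_name` is a placeholder for whichever ModernBERT-based PyLate checkpoint you use, not the one from the notebook.

```python
def build_document(name: str, doc: str, code: str) -> str:
    """Join a function's name, doc comment, and source into the text to embed."""
    # Hypothetical format; the notebook may combine the fields differently.
    header = f"{name}: {doc}" if doc else name
    return f"{header}\n{code}"


def index_and_search(model_name: str, documents: dict[str, str], query: str, k: int = 5):
    """Embed `documents` (id -> text), index them, and return the top-k hits for `query`."""
    from pylate import indexes, models, retrieve  # imported lazily

    model = models.ColBERT(model_name_or_path=model_name)

    # Folder and index names are arbitrary; override=True rebuilds the index.
    index = indexes.Voyager(index_folder="pylate-index", index_name="chrono", override=True)

    ids = list(documents)
    doc_embeddings = model.encode(
        [documents[i] for i in ids],
        is_query=False,  # encode as documents, not queries
        show_progress_bar=True,
    )
    index.add_documents(documents_ids=ids, documents_embeddings=doc_embeddings)

    retriever = retrieve.ColBERT(index=index)
    query_embeddings = model.encode([query], is_query=True)

    # One result list per query; only a single query is sent here.
    return retriever.retrieve(queries_embeddings=query_embeddings, k=k)[0]
```

Each hit should carry the document id and a relevance score, so the ids can be mapped back to the extracted functions.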
