dimiboeckaerts/ProteinLanguageWorkshop

A workshop on protein language models

Building on the success of word embeddings and transformer models for language, these architectures are increasingly being used to learn the 'language of proteins'. This workshop introduces you, both theoretically and practically, to this latest trend in protein sequence analysis and feature engineering.

Theory slides

The slides cover the following topics:

  • A short and broad introduction to the similarities and differences between language and proteins
  • The history of, and recent developments in, encoding the information in proteins
  • The state-of-the-art & recent applications
  • An outlook with trends and challenges

Jupyter notebook

The Jupyter notebook lets you explore protein embeddings hands-on using two publicly available software packages. It covers both simple explorations and protein-related machine learning tasks. The notebook is set up to be run in Google Colab or on Kaggle with a GPU enabled, but you can also run it on your own local machine.

Open in Google Colab

Save this notebook to your computer, then go to http://colab.research.google.com/ and choose File > Upload notebook to select it. Next, click Connect (upper right of the screen) to attach a runtime, and finally go to Runtime > Change runtime type and make sure a GPU is enabled.
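To confirm that the runtime really has a GPU attached, you can run a quick check in the first notebook cell. The snippet below is a minimal sketch that is not part of the workshop notebook itself: it assumes an NVIDIA GPU and that the `nvidia-smi` tool is on the PATH, which is typically the case on Colab and Kaggle GPU runtimes. The helper name `gpu_visible` is made up for illustration.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU.

    Assumption: an NVIDIA GPU runtime. Other accelerators (e.g. TPUs)
    are not detected by this check.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # Tool not found: most likely no GPU runtime is attached.
        return False
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `GPU visible: False`, revisit Runtime > Change runtime type before running the rest of the notebook.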

Open in Kaggle

Go to https://www.kaggle.com and sign in. Then click the 'Create' button on the left and start a new notebook. On the next screen, go to File > Import notebook.

Enjoy!
