A workshop on protein language models

Building on the success of word embeddings and transformer models for natural language, a growing number of these architectures are now being used to learn the 'language of proteins'. This workshop introduces you, both theoretically and practically, to this recent trend in protein sequence analysis and feature engineering.

Theory slides

The slides cover the following topics:

A short, broad introduction to the similarities and differences between language and proteins
The history of, and recent developments in, encoding the information in proteins
The state of the art and recent applications
An outlook with trends and challenges

Jupyter notebook

The Jupyter notebook lets you explore protein embeddings hands-on using two publicly available software packages. It covers both simple explorations and protein-related machine learning tasks. The notebook is set up to be run in Google Colab or on Kaggle with a GPU enabled, but you can also run it on your own local machine.
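To make the idea of turning a protein sequence into a fixed-length feature vector concrete, here is a minimal sketch that uses only the Python standard library. It does not use the two packages from the notebook and it does not compute language-model embeddings: amino acid composition is swapped in as a simple stand-in encoding. The names `read_fasta`, `composition_vector`, `cosine_similarity` and the toy records are hypothetical and exist only for this illustration.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def composition_vector(sequence):
    """Stand-in encoding (assumption, not a language-model embedding):
    the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy records made up for this example (not taken from the workshop data).
toy_fasta = """>toy_protein_1
MKTAYIAKQR
QISFVKSHFS
>toy_protein_2
MKTAYIAKQRQISFVK
""".splitlines()

records = list(read_fasta(toy_fasta))
vectors = {header: composition_vector(seq) for header, seq in records}
similarity = cosine_similarity(vectors["toy_protein_1"], vectors["toy_protein_2"])
print(f"Parsed {len(records)} sequences; cosine similarity = {similarity:.3f}")
```

The same parser should work on a real FASTA file by passing an open file handle instead of the toy lines, for example `read_fasta(open("uniprot_yeast.fasta"))`, assuming that file is in your working directory. A language-model embedding would take the place of `composition_vector`, while the similarity step stays the same.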

Open in Google Colab

Save this notebook to your computer, go to http://colab.research.google.com/, then choose File > Upload notebook and select the saved notebook. Next, click Connect (upper right of the screen) to connect to a runtime, and finally go to Runtime > Change runtime type and make sure a GPU is enabled.
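Once the runtime is connected, you may want to confirm from inside the notebook that a GPU is actually visible. The sketch below is one possible check, not part of the workshop notebook: it assumes that either PyTorch is installed or the `nvidia-smi` tool is on the path, and the helper name `gpu_available` is hypothetical.

```python
import shutil
import subprocess


def gpu_available():
    """Return True if a CUDA GPU appears to be visible to this runtime."""
    try:
        # Assumption: PyTorch may be installed in the runtime.
        import torch
        return bool(torch.cuda.is_available())
    except Exception:
        pass  # PyTorch missing or unusable: fall back to nvidia-smi.

    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is None:
        return False
    try:
        # `nvidia-smi -L` lists the GPUs visible to the driver, one per line.
        result = subprocess.run(
            [nvidia_smi, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU detected:", gpu_available())
```

If this reports no GPU, revisit the runtime type setting before running the heavier cells.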

Open in Kaggle

Go to https://www.kaggle.com and sign in. Then click the 'Create' button on the left and start a new notebook. On the next screen, choose File > Import notebook and select this notebook.

Enjoy!

Folders and files

.gitignore
LICENSE
README.md
protein_language_practice.ipynb
protein_language_slides.pdf
protein_language_solutions.ipynb
uniprot_sapiens.fasta
uniprot_yeast.fasta

dimiboeckaerts/ProteinLanguageWorkshop
