Building on the success of word embeddings and transformer models for language, a growing number of these architectures are now being used to learn the 'language of proteins'. This workshop introduces you, both theoretically and practically, to this latest trend in protein sequence analysis and feature engineering.
The slides cover the following topics:
- A short and broad introduction to the similarities and differences between language and proteins
- The history of, and recent developments in, how to encode the information in proteins
- The state of the art and recent applications
- An outlook with trends and challenges
The Jupyter notebook lets you explore protein embeddings hands-on using two publicly available software packages. It covers both simple explorations and protein-related machine learning tasks. The notebook is set up to be run in Google Colab or on Kaggle with a GPU enabled, but you can also run it on your own local machine.
To run it in Google Colab: go to http://colab.research.google.com/, then go to File > Upload notebook and choose this notebook after having saved it to your computer. Then connect to a runtime (upper right of the screen) and finally go to Runtime > Change runtime type and make sure a GPU is enabled.
To run it on Kaggle: go to https://www.kaggle.com and sign in. Then click the 'Create' button on the left and start a new notebook. On the next screen, choose File > Import notebook.
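On either platform, it can be worth confirming that a GPU is actually visible before running the heavier cells. The snippet below is a minimal sketch, not part of the workshop notebook: it assumes an NVIDIA GPU runtime where the `nvidia-smi` command-line tool is installed, and the helper name `gpu_visible` is made up for illustration.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is on the PATH and lists at least one GPU.

    Assumption: the runtime exposes NVIDIA GPUs through nvidia-smi.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not installed, so no NVIDIA GPU runtime is attached.
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False`, revisit the runtime or accelerator settings of the notebook before continuing.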
Enjoy!