Educational resources to get started with Natural Language Processing in Python.
By Sebastian Castro, 2020
For more background, check out the following resources:
Install conda, then create and activate a conda environment:
conda create --name intro-nlp --file conda-requirements.txt
conda activate intro-nlp
The version of HuggingFace Transformers available in conda is quite outdated, so you should install it directly using pip. To do this, first make sure that you are in your conda environment!
conda activate intro-nlp
pip install transformers
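As a quick sanity check after the steps above, the following sketch reports which conda environment is active and whether a package is visible to the current interpreter. The helper names `active_conda_env` and `installed_version` are made up for illustration; the sketch assumes conda exposes the active environment through the `CONDA_DEFAULT_ENV` variable, which it normally sets on activation.

```python
import os
from importlib import metadata


def active_conda_env():
    """Return the active conda environment name, or None if none is set."""
    # Assumption: `conda activate` sets CONDA_DEFAULT_ENV in the shell.
    return os.environ.get("CONDA_DEFAULT_ENV")


def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("conda environment:", active_conda_env())
print("transformers version:", installed_version("transformers"))
```

If the first line does not print `intro-nlp`, the pip install likely went into a different environment.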
Basic text processing and sentence parsing using a grammar.
Refer to the Rule-Based Processing README for more information.
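To give a flavor of rule-based processing, here is a minimal sketch that tokenizes text with `re` and checks a sentence against a toy grammar. The token pattern, the lexicon, and the three grammar rules (S -> NP VP, NP -> Det N, VP -> V NP) are invented for this example and are not taken from the repository; a real project would more likely use NLTK's grammar tools.

```python
import re

# Illustrative pattern: a word (optionally with an internal apostrophe)
# or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.,!?;]")

# Toy lexicon mapping words to part-of-speech tags (hypothetical).
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N",
           "sees": "V", "chases": "V"}


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def parse_np(tags, start):
    """Match NP -> Det N at `start`; return the next index or None."""
    if tags[start:start + 2] == ["Det", "N"]:
        return start + 2
    return None


def is_sentence(tokens):
    """Check tokens against S -> NP VP, where VP -> V NP."""
    tags = [LEXICON.get(token) for token in tokens]
    verb_index = parse_np(tags, 0)
    if verb_index is None or verb_index >= len(tags) or tags[verb_index] != "V":
        return False
    end = parse_np(tags, verb_index + 1)
    return end == len(tags)


print(tokenize("Don't panic! NLP is fun."))
print(is_sentence(tokenize("The dog chases a cat")))
print(is_sentence(tokenize("Dog the chases")))
```

The grammar only accepts sentences whose words all appear in the lexicon, which illustrates both the precision and the brittleness of hand-written rules.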
The "old school" of NLP, including features such as bag-of-words and machine learning classifiers that do not use neural networks, such as Naive Bayes and Support Vector Machines (SVM).
Refer to the Traditional Machine Learning README for more information.
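The bag-of-words idea and a Naive Bayes classifier can be sketched in plain Python as follows. This is a simplified illustration using only the standard library, not the repository's code; in practice scikit-learn would typically handle both steps. The `NaiveBayes` class and the four training sentences are made up for this example.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Count word occurrences, ignoring word order."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            for word, n in bag_of_words(text).items():
                if word in self.vocab:
                    score += n * math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data, invented for illustration.
texts = ["great fun movie", "loved this great film",
         "boring slow movie", "terrible boring plot"]
labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("great film"))
print(model.predict("boring plot"))
```

Words unseen during training are skipped at prediction time, which is one common way to handle out-of-vocabulary tokens.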
Here we will see how neural networks have revolutionized NLP, using techniques like word embeddings to reduce vocabulary dimensionality and recurrent neural networks with elements like Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM) units.
Finally, we will look at state-of-the-art deep-learning-based NLP models such as Transformers, which do away with recurrent neural networks and their disadvantages by using attention mechanisms.
Refer to the Deep Learning README for more information.
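The attention mechanism at the core of Transformers can be sketched without any deep learning library. The example below implements scaled dot-product attention on plain Python lists; it is a teaching sketch under simplifying assumptions (a single head, no learned projections, no masking), and the function names and the tiny input vectors are invented for illustration. In practice PyTorch tensors would replace the lists.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for query in queries:
        scores = [dot(query, key) / math.sqrt(d_k) for key in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        output = [sum(w * value[j] for w, value in zip(weights, values))
                  for j in range(len(values[0]))]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Two toy token vectors used as queries, keys, and values (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
print(weights)
print(outputs)
```

Because every token attends to every other token in one step, no recurrence is needed, which is the property the section above refers to.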
- re for regular expressions
- NLTK for traditional NLP
- scikit-learn for traditional machine learning
- PyTorch for neural networks
- HuggingFace Transformers for transformer networks