20 changes: 10 additions & 10 deletions nlp/README.md
@@ -2,40 +2,40 @@

Current content:

- [_Multilingual Sentence Embeddings_ (21/01/2021)](2021_01_21_multilingual_sentence_embeddings):
- [_Multilingual Sentence Embeddings_](multilingual_sentence_embeddings):
Contributor

Remove double space between the bullet point and the text (this applies to all list entries except the last).

Gives an overview of various current multilingual sentence embedding techniques and tools, and
how they compare given various sequence lengths.

- [_Spacy 3.0_ (01/02/2021)](2021_02_01_spacy_3_projects):
- [_Spacy 3.0_](spacy_3_projects):
Spacy 3.0 has just been released and in this tip, we'll have a look at some of the new features.
We'll be training a German NER model and streamline the end-to-end pipeline using the brand new spaCy projects!

- [_Compact transformers_ (26/02/2021)](2021_02_26_compact_transformers):
- [_Compact transformers_](compact_transformers):
Bigger isn't always better. In this tip we look at some compact BERT-based models that provide a nice balance
between computational resources and model accuracy.

- [_Keyword Extraction with pke_ (18/03/2021)](2021_03_18_pke_keyword_extraction):
- [_Keyword Extraction with pke_](pke_keyword_extraction):
The KEYNG (read *king*) is dead, long live the KEYNG!
In this tip we look at `pke`, an alternative to Gensim for keyword extraction.

- [_Explainable transformers using SHAP_ (22/04/2021)](2021_04_22_shap_for_huggingface_transformers):
- [_Explainable transformers using SHAP_](shap_for_huggingface_transformers):
BERT, explain yourself! 📖
Up until recently language model predictions have lacked transparency. In this tip we look at `SHAP`, a way to explain your latest transformer based models.

- [_Transformer-based Data Augmentation_ (18/06/2021)](2021_06_18_data_augmentation):
- [_Transformer-based Data Augmentation_](data_augmentation):
Ever struggled with having a limited non-English NLP dataset for a project? Fear not, data augmentation to the rescue ⛑️
In this week's tip, we look at backtranslation 🔀 and contextual word embedding insertions as data augmentation techniques for multilingual NLP.

- [_Long range transformers_ (14/07/2021)](2021_06_29_long_range_transformers):
- [_Long range transformers_](long_range_transformers):
Beyond and above the 512! 🏅 In this week's tip, we look at novel long range transformer architectures and compare them against the well-known RoBERTa model.

- [_Neural Keyword Extraction_ (10/09/2021)](2021_09_10_neural_keyword_extraction):
- [_Neural Keyword Extraction_](neural_keyword_extraction):
Neural Keyword Extraction 🧠
In this week's tip, we look at neural keyword extraction methods and how they compare to classical methods.

- [_HuggingFace Optimum_ (12/10/2021)](2021_10_12_huggingface_optimum):
- [_HuggingFace Optimum_](huggingface_optimum):
HuggingFace Optimum Quantization ✂️
In this week's tip, we take a look at the new HuggingFace Optimum package to check out some model quantization techniques.

- [ _Text Augmentation using large-scale LMs and prompt engineering_ (25/11/2021)](2021_11_25_augmentation_lm):
- [ _Text Augmentation using large-scale LMs and prompt engineering_](augmentation_lm):
Contributor

Remove leading space in link label.

Typically, the more data we have, the better performance we can achieve 🤙. However, it is sometimes difficult and/or expensive to annotate a large amount of training data 😞. In this tip, we leverage three large-scale LMs (GPT-3, GPT-J and GPT-Neo) to generate very realistic samples from a very small dataset.
@@ -5,4 +5,4 @@ Typically, the more data we have, the better performance we can achieve 🤙. Ho
Large-scale language models (LMs) are excellent few-shot learners, allowing them to be controlled via natural text prompts. In this tip, we leverage three large-scale LMs (GPT-3, GPT-J and GPT-Neo) and prompt engineering to generate very realistic samples from a very small dataset. The model takes as input two real samples from our dataset, embeds them in a carefully designed prompt and generates an augmented mixed sample influenced by the sample sentences. We use the [Emotion](https://huggingface.co/datasets/emotion) dataset and distilled BERT pre-trained model and show that this augmentation method boosts the model performance and generates very realistic samples. For more information on text augmentation using large-scale LMs check [GPT3Mix](https://arxiv.org/pdf/2104.08826.pdf).

We recommend to open the notebook using Colab for an interactive explainable experience and optimal rendering of the visuals 👇:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml6team/quick-tips/blob/main/nlp/2021_11_25_augmentation_lm/nlp_augmentation_lm.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml6team/quick-tips/blob/main/nlp/augmentation_lm/nlp_augmentation_lm.ipynb)
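
The link changes in `nlp/README.md` follow one pattern: the `(DD/MM/YYYY)` date is dropped from the link label and the `YYYY_MM_DD_` prefix is dropped from the link target. A minimal sketch of how such a rewrite could be scripted is shown here. The regex and the `strip_dates` helper are hypothetical illustrations and are not part of this PR; the sketch assumes every entry has the shape `[label (date)](date_prefix_slug)`.

```python
import re

# Assumed entry shape: "[_Title_ (DD/MM/YYYY)](YYYY_MM_DD_slug)"
LINK = re.compile(
    r"\[(?P<label>[^\]]*?)\s*\(\d{2}/\d{2}/\d{4}\)\]"    # label, then a date in parentheses
    r"\((?P<prefix>\d{4}_\d{2}_\d{2}_)(?P<slug>[^)]+)\)"  # target with a date prefix
)


def strip_dates(line: str) -> str:
    """Drop the date from the link label and the date prefix from the target.

    Lines that do not match the assumed shape are returned unchanged.
    """
    return LINK.sub(
        # .strip() also removes stray whitespace around the label text
        lambda m: f"[{m.group('label').strip()}]({m.group('slug')})",
        line,
    )


if __name__ == "__main__":
    print(strip_dates("- [_Spacy 3.0_ (01/02/2021)](2021_02_01_spacy_3_projects):"))
```

Because the label is stripped of surrounding whitespace, this sketch would also remove the leading space in the link label flagged in the review comment. It leaves other paths, such as the Colab badge URL, untouched, so those would still need to be updated separately.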