Skip to content

nflechas/sentence_similarity

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Semantic Textual Similarity

Table of Contents (Optional)

1. Description

This work investigates how three types of large language models pre-trained on different types of corpora (monolingual, multilingual and multimodal), perform on a Semantic Similarity Task after being fine-tuned on Spanish or English dataset. It finds: 1). A comparable performance between monolingual and multilingual models; 2). A sharp advantage of monolingual models over multimodal models and 3). A similar performance to the state-of-the-art presented in the SemEval-2017 STS shared-task (Cer et al., 2017).

Summary of the fine-tuning experiments

2. Usage

The fine-tuned models with the best scores are available at:

To use any of these models:

```
from transformers import BertTokenizer, AutoModelForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("nflechas/<MODEL_NAME>")
model = AutoTokenizer.from_pretrained("bert-base-cased")

sent_1 = "Replace me by any text you'd like."
sent_2 = "Same here."
encoded_input = tokenizer(sent_1, 
                          sent_2, 
                          return_tensors='tf', 
                          model_max_length=512)
output = model(encoded_input)

```

3. Results

Pearson’s correlation RMSE Dataset
Baseline Spanish (SemEval-2017) 0.856 - STS-es
Baseline English (SemEval-2017) 0.855 - STS-en
BERT base (GLUE) (Devlin et al., 2019) 0.858 - STS-en
BERT large (GLUE) (Devlin et al., 2019) 0.865 - STS-en
A. Spanish 0.833 0.684 STS-es
B. Multilingual-Es 0.835 0.663 STS-es
C. English 0.881 0.704 STS-en
D. Multimodal-En 0.130 1.65 STS-en
E. Multimodal-Es 0.190 1.18 STS-es

4. Data

5. References

  • Cer, D.M., Diab, M.T., Agirre, E., Lopez-Gazpio, I., & Specia, L. (2017). SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. International Workshop on Semantic Evaluation.
  • Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv, abs/1810.04805.
  • Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S.R. (2018). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. BlackboxNLP@EMNLP.

6. License

The MIT License (MIT) Copyright (c) 2023, Natalia Flechas Manrique

About

Comparison of DL models in a sentence similarity task

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors