Text similarity task

A program that retrieves the n most similar documents (3 by default) for an input sentence. Only French is supported.

Details on the preprocessing, the model used for training, and inference can be found in the paper.

Training data

The data used for training was Belgian French news from the publicly available RTBF corpus.

Model

The model used was Word2Vec.

DEPENDENCIES

pip install -r requirements.txt

LAUNCHING

Optionally, download the spaCy French model first:

python -m spacy download fr_core_news_md

Main program

python launch_session.py

The program starts an interactive CLI session for model inference.

After you enter a sentence, it prints the n most similar documents, along with other details, to the console.
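The retrieval step can be illustrated with a minimal sketch. It assumes that each document is represented by the average of its word vectors and that documents are ranked by cosine similarity to the query; the paper describes the method actually used, which may differ. The names `average_vector`, `most_similar`, `TOY_VECTORS` and `TOY_CORPUS` are hypothetical and are not taken from this repository, and the two-dimensional vectors are made-up toy values, not output of the trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_vector(tokens, word_vectors, dim):
    """Average the vectors of known tokens; unknown tokens are skipped (assumption)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def most_similar(query_tokens, corpus, word_vectors, dim, n=3):
    """Return the n (doc_id, score) pairs with the highest cosine similarity to the query."""
    query_vec = average_vector(query_tokens, word_vectors, dim)
    scored = [
        (cosine(query_vec, average_vector(tokens, word_vectors, dim)), doc_id)
        for doc_id, tokens in corpus.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:n]]


# Hypothetical toy data: made-up 2-dimensional "word vectors" and tokenized documents.
TOY_VECTORS = {
    "chat": [1.0, 0.0],
    "chien": [0.9, 0.1],
    "banque": [0.0, 1.0],
    "argent": [0.1, 0.9],
}
TOY_CORPUS = {
    "doc_animaux": ["chat", "chien"],
    "doc_finance": ["banque", "argent"],
    "doc_mixte": ["chat", "banque"],
}

results = most_similar(["chien"], TOY_CORPUS, TOY_VECTORS, dim=2, n=2)
for doc_id, score in results:
    print(f"{doc_id}\t{score:.3f}")
```

With this toy data, the query "chien" ranks doc_animaux first and doc_mixte second. In a real session the vectors would come from the trained Word2Vec model rather than a hand-written dictionary.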

AUTHOR

[email protected]
