C-bianc/Text-Similarity-Task

Text similarity task

Program that retrieves the n most similar documents (3 by default) for a given sentence. Only French is supported.

Details on preprocessing, the model used for training, and inference can be found in the paper.

Training data

The model was trained on Belgian French news from the publicly available RTBF corpus.

Model

The model used was Word2Vec.

DEPENDENCIES

pip install -r requirements.txt

LAUNCHING

(optional) Download the spaCy French model:

python -m spacy download fr_core_news_md

Main program

python launch_session.py

The program starts an interactive CLI session for model inference.

After you enter a sentence, it prints the n most similar documents, along with other details, to the console.
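
To illustrate the retrieval step, here is a minimal sketch of ranking documents against a query sentence. It is not the code from launch_session.py. It assumes each text is represented by the average of its word vectors and compared with cosine similarity, which is a common approach with Word2Vec but is not confirmed by this README. The names `TOY_VECTORS`, `embed`, `cosine`, and `most_similar_docs` are hypothetical, and the toy vectors stand in for a trained model.

```python
import math

# Hypothetical 2-dimensional vectors standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "chat": [1.0, 0.0],
    "chien": [0.9, 0.1],
    "banque": [0.0, 1.0],
    "argent": [0.1, 0.9],
}


def embed(text, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[tok] for tok in text.lower().split() if tok in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_docs(query, docs, vectors, n=3):
    """Return up to n (score, doc) pairs, highest similarity first."""
    query_vec = embed(query, vectors)
    if query_vec is None:
        return []  # no known word in the query (assumed behaviour)
    scored = []
    for doc in docs:
        doc_vec = embed(doc, vectors)
        if doc_vec is not None:
            scored.append((cosine(query_vec, doc_vec), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


docs = ["le chien", "la banque", "argent banque", "chat chien"]
results = most_similar_docs("chat", docs, TOY_VECTORS, n=3)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

Words missing from the vocabulary (such as "le" and "la" here) are simply skipped in this sketch. The actual program may handle out-of-vocabulary words and preprocessing differently, as described in the paper.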

AUTHOR

[email protected]
