A Smart Twitter Search Engine.

  • Local Twitter search engine, including:

  • Custom-built parser that extracts data from emojis, numbers, countries, entities and more.

  • Smart inverted index for creating posting files.
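As a rough illustration of the inverted-index idea, here is a minimal sketch that maps each term to a posting list of `(doc_id, term_frequency)` pairs. The `tokenize` helper, the function names and the sample tweets are assumptions made for this example, not the project's actual parser or index; a real parser would also handle emojis, numbers, entities and so on.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical stand-in for the custom parser: lowercase, split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, term_frequency) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # Count each term once per document, then append one posting per term.
        for term, tf in Counter(tokenize(text)).items():
            index[term].append((doc_id, tf))
    return dict(index)


# Made-up sample tweets, for illustration only.
tweets = {
    1: "covid vaccine news",
    2: "vaccine vaccine rollout",
}
index = build_inverted_index(tweets)
print(index["vaccine"])
```

In a disk-backed setup, each posting list would typically be written out to a posting file rather than kept in memory as it is here.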

Uses three main algorithms:

  • GloVe: Global Vectors for Word Representation.

  • W2V: Word2Vec technique for natural language processing.

  • WordNet: provides lexical information such as synonyms, hypernyms and hyponyms.

Similarity and ranking measures: cosine similarity, TF-IDF, BM25 and more.
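To make the ranking step concrete, the following is a minimal sketch of Okapi BM25 scoring over a tiny in-memory corpus. It assumes whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)`; the function name and sample tweets are hypothetical, and the project's own scoring may differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return {doc_id: BM25 score} for a whitespace-tokenized query."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(terms) for terms in tokenized.values()) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for terms in tokenized.values():
        df.update(set(terms))

    scores = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


# Made-up sample tweets, for illustration only.
tweets = {
    1: "covid vaccine news",
    2: "football match tonight",
    3: "vaccine vaccine rollout",
}
scores = bm25_scores("vaccine", tweets)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because all three sample tweets have the same length, the ranking here is driven purely by term frequency: the tweet that mentions the query term twice comes first.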

Also includes the following APIs and libraries:

gensim.models, NumPy, pandas, scipy.spatial, nltk, PorterStemmer, emoji, etc.

Result quality was tested against a benchmark DB using MAP, recall, precision and precision@5.
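For reference, these metrics can be computed as in the sketch below. The function names, the tweet IDs and the relevance judgments are made up for illustration; they are not taken from the benchmark, and the project's own evaluation code may define edge cases differently (for example, how precision@k treats rankings shorter than k).

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall(ranked, relevant):
    """Fraction of all relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked if doc in relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked, relevant) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical single-query run: two of the three relevant tweets are retrieved.
ranked = ["t1", "t2", "t3", "t4", "t5"]
relevant = {"t1", "t3", "t9"}
p5 = precision_at_k(ranked, relevant, 5)
rec = recall(ranked, relevant)
ap = average_precision(ranked, relevant)
print(p5, rec, ap)
```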

Includes sample .parquet files.

Tests