ICS_UCI_search_engine

A search engine for the ics.uci.edu domain, built without the use of any indexing libraries.

The HTML parser extracts all text in a page, including non-visible text such as image names. This was particularly useful for the performance of our search engine. The TF-IDF measure is used to rank each document's relevance to a given query. For each result, the engine shows the URL of the document, the positions of the query terms in it, and the document's TF-IDF score.

Many different TF normalization schemes were tried, and the best results were obtained with no normalization at all. Although this biases the ranking towards documents with many terms, it works well for our collection, which contains many short documents that hold nothing but a directory listing. With normalization, these non-useful documents were prioritized and documents with meaningful content appeared lower in the results. The TF-IDF scores for all terms and documents in the index are pre-computed and stored for further use. Finally, each query is preprocessed by converting it to lowercase, removing stopwords, and applying a stemmer.
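The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the code in `indexer.py`: the stopword list, the `toy_stem` suffix stripper (a stand-in for a real stemmer), the function names, and the `log10(N / df)` form of IDF are all assumptions, since the README does not specify them. TF is the raw term count, matching the "no normalization" choice.

```python
import math
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption; the project's list is not shown).
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def toy_stem(token):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


def build_index(docs):
    """Map term -> {doc_id: [positions]} for a dict of doc_id -> text.

    Positions are counted after stopword removal.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(preprocess(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def tf_idf(index, n_docs):
    """Precompute scores[term][doc_id] = raw_tf * log10(N / df)."""
    scores = {}
    for term, postings in index.items():
        idf = math.log10(n_docs / len(postings))
        scores[term] = {d: len(p) * idf for d, p in postings.items()}
    return scores


docs = {
    "a": "Machine learning courses",
    "b": "Learning the machines of learning",
    "c": "Faculty directory",
}
index = build_index(docs)
scores = tf_idf(index, len(docs))
```

Because TF is not normalized, document `b`, which mentions the stemmed term `learn` twice, scores higher for that term than document `a`, which mentions it once. Queries would go through the same `preprocess` function so that query terms and index terms line up.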

There are two options for the search function: 1. elastic search, where not all words of the query are necessary for a match; 2. strict search, where all terms of the query must be present for a match.
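The difference between the two modes comes down to taking the union or the intersection of the documents that contain each query term. The sketch below assumes a precomputed mapping `scores[term][doc_id]` of TF-IDF values and sums the scores of the query terms per document. The `search` function, its `strict` flag, and the summing rule are illustrative assumptions, not the interface of `ics_search_engine.py`.

```python
def search(scores, query_terms, strict=False):
    """Rank documents by summed TF-IDF over the query terms.

    strict=False ("elastic"): a document matches if it contains any query term.
    strict=True: a document matches only if it contains every query term.
    """
    postings = [scores.get(term, {}) for term in query_terms]
    if not postings:
        return []
    doc_sets = [set(p) for p in postings]
    matches = set.intersection(*doc_sets) if strict else set.union(*doc_sets)
    totals = {d: sum(p.get(d, 0.0) for p in postings) for d in matches}
    # Highest score first; ties broken by document id for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical precomputed scores for two already-preprocessed terms.
toy_scores = {
    "machine": {"d1": 0.2, "d2": 0.4},
    "learn": {"d2": 0.3, "d3": 0.5},
}
elastic_hits = search(toy_scores, ["machine", "learn"])
strict_hits = search(toy_scores, ["machine", "learn"], strict=True)
```

Here the elastic search returns all three documents, with `d2` first because it matches both terms, while the strict search returns only `d2`.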

The results are evaluated and ordered based on DCG (discounted cumulative gain) values.
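For reference, a common formulation of DCG sums each result's relevance grade discounted by the logarithm of its rank. The README does not say which DCG variant is used or how relevance grades are obtained, so the functions below are a generic sketch under that assumption; `ndcg` is included only as the usual normalized companion measure.

```python
import math


def dcg(relevances):
    """DCG of a ranked list of relevance grades: sum(rel_i / log2(i + 1))."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance grades of the top six results for one hypothetical query.
ranking = [3, 2, 3, 0, 1, 2]
print(round(dcg(ranking), 3), round(ndcg(ranking), 3))
```

A ranking that places the most relevant documents first yields a higher DCG, which is what makes the measure usable for comparing result orderings.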

Repository files:

.gitignore
README.md
bookkeeping.tsv
google_query_fetcher.py
ics_search_engine.py
indexer.py
