Anton87/SearchEngine

Itwiki

Indexing and retrieving tools for the itwiki corpus.

Clean up

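The following command cleans up the itwiki corpus by removing every line that begins with one of the namespace prefixes Wikipedia, Progetto, Template, File, Aiuto or Portale followed by a colon: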

	shell> grep --invert-match --perl-regexp "^(Wikipedia|Progetto|Template|File|Aiuto|Portale):" "itwiki-20140127-pages-articles-multistream-paragraphs-with-lists-no-images.txt" > "data/itwiki-20140127-pages-articles-multistream-paragraphs-with-lists-no-images.cleaned.txt"
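For readers who want to run the same filter without GNU grep, or reuse it inside a larger pipeline, the command can be approximated in Python. This is a minimal sketch, not part of the repository: the names `clean_lines` and `clean_file` are hypothetical, and UTF-8 encoding of the corpus file is an assumption.

```python
import re
from typing import Iterable, Iterator

# Same namespace prefixes as the grep pattern, anchored at the start of the line.
NAMESPACE_PREFIX = re.compile(r"^(Wikipedia|Progetto|Template|File|Aiuto|Portale):")


def clean_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that do NOT start with a namespace prefix.

    Mirrors `grep --invert-match`: matching lines are dropped,
    all other lines pass through unchanged.
    """
    for line in lines:
        if not NAMESPACE_PREFIX.match(line):
            yield line


def clean_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path, skipping namespace lines.

    Assumption: the corpus is UTF-8 encoded text, one entry per line.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(clean_lines(src))


if __name__ == "__main__":
    # Small in-memory sample; the titles are made up for illustration.
    sample = [
        "Template:Infobox\n",
        "Roma\n",
        "Portale:Storia\n",
        "Un File: citato a metà riga\n",
    ]
    for kept in clean_lines(sample):
        print(kept, end="")
```

Because the pattern is anchored with `^`, a prefix such as `File:` that appears in the middle of a line does not cause that line to be dropped, which matches the behaviour of the grep command. The file is processed line by line, so the whole corpus never has to fit in memory.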
