A file is given as input; its text is then pre-processed, tokenized, stemmed, and lemmatized. The lemmatized words are then part-of-speech tagged in the context of the Brown Corpus. The result is a list of words along with their respective tags. Some side results are also produced:
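The pre-processing stages above can be sketched in pure Python. This is only an illustrative sketch: a real pipeline would typically use a library such as NLTK (its tokenizers, `PorterStemmer`, and `WordNetLemmatizer`), and the toy suffix-stripping stemmer here is a stand-in for proper stemming.

```python
import re
from collections import Counter

def preprocess(text):
    """Lowercase the text and strip non-alphabetic characters."""
    return re.sub(r"[^a-z\s]", " ", text.lower())

def tokenize(text):
    """Split pre-processed text into word tokens."""
    return text.split()

def stem(word):
    """Toy suffix-stripping stemmer (illustrative stand-in for
    e.g. Porter stemming)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

tokens = tokenize(preprocess("The cats were running and jumped quickly."))
stems = [stem(t) for t in tokens]
print(stems)
```

Lemmatization and Brown-style tagging need dictionary and corpus resources, so they are not reproduced here; in NLTK they would follow the same per-token mapping pattern.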
- Bar-Graphs:
- Relationship between word length and frequency (tokens with stopwords)
- Relationship between word length and frequency (tokens without stopwords)
- Top 10 most frequently occurring words in the file (with stopwords)
- Top 10 most frequently occurring words in the file (without stopwords)
- Relationship between word tags and frequency
- Word Cloud:
- Of tokens (with stopwords)
- Of tokens (without stopwords)
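The counts behind the bar graphs above can be sketched with `collections.Counter`. The small `STOPWORDS` set is hypothetical; in practice one would use a full list such as NLTK's stopwords corpus, and the resulting counts would be passed to a plotting library (e.g. matplotlib's `plt.bar`) or to a word-cloud generator.

```python
from collections import Counter

# Hypothetical minimal stopword list; a real run would use a
# full list such as NLTK's stopwords corpus.
STOPWORDS = {"the", "a", "an", "and", "is", "of", "to", "in"}

def word_length_frequencies(tokens):
    """Map each word length to how many tokens have that length
    (the word length vs. frequency bar graph)."""
    return Counter(len(t) for t in tokens)

def top_words(tokens, n=10, keep_stopwords=True):
    """Return the n most frequent tokens, optionally dropping
    stopwords first (the top-10 bar graphs)."""
    if not keep_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(tokens).most_common(n)

tokens = "the cat and the dog sat in the garden".split()
print(word_length_frequencies(tokens))
print(top_words(tokens, n=3))
print(top_words(tokens, n=3, keep_stopwords=False))
```

The tag-frequency bar graph is the same pattern applied to the tag of each (word, tag) pair instead of to the token itself.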