Text Feature Extraction
Sagen Soren edited this page Nov 30, 2020 · 2 revisions
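Bag of words is a feature extraction method that turns text into numeric features for training machine learning models. It is one of the fundamental methods for converting tokens into features: it records which words occur in a text and ignores their order.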
- Text preprocessing:
  - convert the entire text to lowercase
  - remove all punctuation and unnecessary symbols
- Vocabulary creation:
  - from the text, create a set of unique words
- Text vectorization:
  - create a feature matrix with one column for each word in the vocabulary and one row for each sentence
  - assign 1 if the word is present in the sentence, and 0 if it is not
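
The three steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation. The function names (`preprocess`, `build_vocabulary`, `vectorize`) and the sample sentences are hypothetical. It also assumes that splitting on whitespace is good enough for tokenization and that only ASCII letters and digits need to be kept.

```python
import re


def preprocess(text):
    """Lowercase the text, replace punctuation and symbols with spaces, and split into tokens."""
    text = text.lower()
    # Assumption: anything that is not an ASCII letter, digit, or whitespace is "unnecessary".
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def build_vocabulary(sentences):
    """Collect the unique words across all sentences, sorted so column order is stable."""
    return sorted({word for sentence in sentences for word in preprocess(sentence)})


def vectorize(sentences, vocabulary):
    """Build a binary matrix: one row per sentence, one column per vocabulary word."""
    column = {word: i for i, word in enumerate(vocabulary)}
    matrix = []
    for sentence in sentences:
        row = [0] * len(vocabulary)
        for word in preprocess(sentence):
            if word in column:  # words outside the vocabulary are ignored
                row[column[word]] = 1
        matrix.append(row)
    return matrix


# Hypothetical sample data for illustration only.
sentences = ["The cat sat.", "The dog sat!"]
vocabulary = build_vocabulary(sentences)
matrix = vectorize(sentences, vocabulary)

print(vocabulary)
for row in matrix:
    print(row)
```

Sorting the vocabulary is a design choice made here so that each word always maps to the same column. Any fixed ordering would work, as long as the same one is used for training and for new text.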