Coercing issue when the code is run #4

Open
@KungFuPandey

I am getting the following error when I run the code. Can you please help?


TypeError Traceback (most recent call last)
in ()
60 #tries using all words as the feature selection mechanism
61 print 'using all words as features'
---> 62 evaluate_features(make_full_dict)
63
64 #scores words based on chi-squared test to show information gain (http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/)

in evaluate_features(feature_select)
14 #http://stackoverflow.com/questions/367155/splitting-a-string-into-words-and-punctuation
15 #breaks up the sentences into lists of individual words (as selected by the input mechanism) and appends 'pos' or 'neg' after each list
---> 16 with open(RT_POLARITY_POS_FILE, 'r') as posSentences:
17 for i in posSentences:
18 posWords = re.findall(r"[\w']+|[.,!?;]", i.rstrip())

TypeError: coercing to Unicode: need string or buffer, RDD found
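
A note on what the traceback above seems to indicate: the built-in open() expects a file path string, but RT_POLARITY_POS_FILE holds a Spark RDD at the point of the call. One likely cause (an assumption, since the assignment is not shown in the issue) is that the constant was set from something like sc.textFile(...) instead of a plain path. The minimal sketch below keeps the path as a string and applies the same regex as line 18 of the traceback. The names tokenize_lines, tokenize_file, and demo_path are hypothetical, and the sketch is written for Python 3 even though the original script uses Python 2 print statements.

```python
import re
import tempfile

# Same pattern as in the traceback: words (with apostrophes) or single punctuation marks.
TOKEN_PATTERN = re.compile(r"[\w']+|[.,!?;]")


def tokenize_lines(lines):
    """Split each line into a list of words and punctuation tokens.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    return [TOKEN_PATTERN.findall(line.rstrip()) for line in lines]


def tokenize_file(path):
    """Open a local text file by its path (a plain string) and tokenize it."""
    # open() needs a path string; handing it an RDD raises the TypeError above.
    with open(path, 'r') as sentences:
        return tokenize_lines(sentences)


# Demo with a throwaway file standing in for the positive-polarity file (hypothetical data).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("a great movie!\n")
    demo_path = tmp.name

demo_tokens = tokenize_file(demo_path)
print(demo_tokens)
```

If the sentences really do need to live in an RDD, tokenize_lines could instead be given the RDD's contents as a local iterable, for example via rdd.collect() or rdd.toLocalIterator(), rather than passing the RDD to open().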
