Latest commit

e37f53e · Jun 30, 2013

versionWithStopList

This folder contains the same Wordcount Hive and Pig scripts, with an added "stoplist" feature: http://en.wikipedia.org/wiki/Stop_list

A stoplist, in this case, is a list of words that are filtered out of the final results before they are output. I'm using a Left Outer Join against the stoplist to remove any words that appear in it: a word is kept only if the join finds no match for it in the stoplist. I got the stop list idea from the book Enterprise Data Workflows with Cascading by Paco Nathan.
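To make the join-then-filter idea concrete, here is a minimal sketch in plain Python. It is not the Hive or Pig code from this folder, only an illustration of the same logic. The function names `left_outer_join` and `word_count_with_stoplist` are hypothetical, and splitting on whitespace and lowercasing are assumptions made for the example.

```python
from collections import Counter


def left_outer_join(words, stoplist):
    """Pair every word with its stoplist match, or None when there is no match.

    Every input word survives the join, which is what makes it a *left outer* join.
    """
    stop = set(stoplist)
    return [(word, word if word in stop else None) for word in words]


def word_count_with_stoplist(text, stoplist):
    """Count words in `text`, dropping any word that appears in `stoplist`."""
    # Assumption for this sketch: tokens are whitespace-separated and case-insensitive.
    words = text.lower().split()
    joined = left_outer_join(words, [s.lower() for s in stoplist])
    # Keep only the rows whose right-hand (stoplist) side is empty.
    kept = [word for word, match in joined if match is None]
    return Counter(kept)


if __name__ == "__main__":
    counts = word_count_with_stoplist("the cat and the hat", ["the", "and"])
    for word, n in sorted(counts.items()):
        print(word, n)
```

The filter on the empty right-hand side is what turns the left outer join into a removal step. Without it, every word would pass through unchanged.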

The directory SampleData contains a sample stop list and input text.