This folder contains the same Wordcount Hive and Pig scripts, but with an added "stoplist" feature (see http://en.wikipedia.org/wiki/Stop_list).

Here, a stoplist is a list of words that are filtered out of the results before they are output. I'm using a left outer join to remove any words that appear in the stoplist: each word is joined against the stoplist, and only the words with no matching stoplist entry (a NULL on the stoplist side of the join) are kept. I got the stoplist idea from the book Enterprise Data Workflows with Cascading by Paco Nathan.
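To illustrate the join-then-filter idea outside of Hive and Pig, here is a minimal Python sketch. It is not a translation of the scripts in this folder: the helper name `wordcount_with_stoplist`, the lowercase matching, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def wordcount_with_stoplist(lines, stoplist):
    """Hypothetical helper: count words, dropping any that appear in the stoplist.

    Mimics a left outer join followed by a null filter: each word is
    paired with its stoplist match (or None), and only unmatched words are kept.
    """
    # Right-hand side of the "join"; matching is assumed to be case-insensitive.
    stop = {w.lower() for w in stoplist}
    # Naive whitespace tokenization; the real scripts may tokenize differently.
    words = [w.lower() for line in lines for w in line.split()]
    # Left outer join: every word survives, paired with its stoplist match or None.
    joined = [(w, w if w in stop else None) for w in words]
    # Keep only rows whose stoplist side is null, then count them.
    return Counter(w for w, match in joined if match is None)


if __name__ == "__main__":
    text = ["the quick brown fox", "jumps over the lazy dog the end"]
    stoplist = ["the", "over"]
    for word, count in sorted(wordcount_with_stoplist(text, stoplist).items()):
        print(word, count)
```

In plain Python a set-membership test alone would normally be enough; the explicit join-then-filter step is kept only to mirror the left outer join pattern the scripts rely on.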

The SampleData directory contains a sample stoplist and sample input text.