This is an implementation of Cloudera's blog post on indexing scanned PDFs at scale, http://blog.cloudera.com/blog/2015/10/how-to-index-scanned-pdfs-at-scale-using-fewer-than-50-lines-of-code/ , using Elasticsearch and Cassandra in place of Solr and HBase.