If you are practicing with local files, steps 1-5 can be reduced to copying the data onto HDFS with the command
hadoop fs -copyFromLocal 'file/address/in/linux' 'hdfs/location/'
Query the collected data in one Hive table (Hue/Company HDFS) and store the result on your personal HDFS
insert overwrite directory '/user/hue/sample_test' row format delimited fields terminated by '|' select device_idfa,device_mac,device_manufacturer,device_screen_pixel_metric,device_model from adcocoa_device where device_idfa is not null and device_idfa != 'null'
The command above writes the result as several small files in /user/hue/sample_test. From there the files can be downloaded and imported into the local file system.
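Since the exported files are pipe-delimited, any later parsing step has to split on '|'. The following is a minimal Java sketch of that split, assuming the five columns of the select list above; the class name DeviceLine and the sample line are made up for illustration and are not part of the project.

```java
import java.util.regex.Pattern;

// Hypothetical helper: splits one exported line into its five columns.
class DeviceLine {

    // Column order taken from the select list of the export query.
    static final String[] COLUMNS = {
        "device_idfa", "device_mac", "device_manufacturer",
        "device_screen_pixel_metric", "device_model"
    };

    // String.split takes a regex, so the pipe has to be quoted.
    // The limit of -1 keeps trailing empty fields instead of dropping them.
    static String[] parse(String line) {
        String[] fields = line.split(Pattern.quote("|"), -1);
        if (fields.length != COLUMNS.length) {
            throw new IllegalArgumentException(
                "expected " + COLUMNS.length + " fields, got " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line, not real data.
        String[] fields = parse("ABC-123|00:11:22:33:44:55|Acme|2.0|Model X");
        for (int i = 0; i < fields.length; i++) {
            System.out.println(COLUMNS[i] + " = " + fields[i]);
        }
    }
}
```

Splitting on the unquoted string "|" would be treated as a regex alternation and break the line into single characters, which is why the delimiter is quoted here.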
Write a Java program that reads the files from the local environment, parses them line by line, and sends the data to Kafka using the log4j and Kafka packages available on Maven.
/src/main/java/parser.java
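The read-and-forward loop could look roughly like the sketch below. This is not the actual parser.java: the class name LineForwarder and the Consumer<String> sender are assumptions for illustration. In the real program the sender would wrap the Kafka producer (or log4j appender) pulled in through Maven; it is left out here so the example runs with the JDK alone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for parser.java: reads files line by line and forwards each line.
class LineForwarder {

    // Sends every non-empty line of every regular file in `folder` to `sender`.
    // Returns the number of lines forwarded.
    static int forwardFolder(Path folder, Consumer<String> sender) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(folder)) {
            files = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        int sent = 0;
        for (Path file : files) {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.isEmpty()) {
                        continue; // skip blank lines
                    }
                    sender.accept(line);
                    sent++;
                }
            }
        }
        return sent;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LineForwarder <folder>");
            return;
        }
        // Assumption: printing stands in for the Kafka send call.
        int sent = forwardFolder(Paths.get(args[0]), System.out::println);
        System.err.println("Forwarded " + sent + " lines");
    }
}
```

Keeping the sender behind a Consumer<String> means the file-reading logic can be checked locally before Kafka is involved.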
Package the program into a .jar file and export it to HDFS for Kafka/Flume processing
Start up all needed resources on CentOS (the Linux distribution I'm using; yours may be different): Name/Data nodes, Zookeeper, Kafka, Hadoop. Run jps to make sure all of them are online.
After setting up a flume.conf file, run Flume so that a receiver is listening.
flume-ng agent -n flume1 -c conf -f flume.conf -Dflume.root.logger=INFO,console
Run the .jar file with java -jar to start the Kafka producer
java -jar tooltest-VERSION-SNAPSHOT.jar
With both the consumer (Flume) and the producer running, the files from the folder are read into the Hadoop Distributed File System (HDFS) and stored under '/user/kafka/database/%topic/%y-%m-%d'
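To make the target path concrete, here is a small Java sketch of how such a path expands. It assumes the placeholders follow the Flume HDFS sink escape sequences, where %y is the two-digit year, %m the two-digit month, and %d the two-digit day, and that the topic placeholder is replaced by the Kafka topic name. The class name SinkPathPreview, the topic "devices", and the date are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical preview of the directory a record would land in.
class SinkPathPreview {

    // "yy-MM-dd" mirrors %y-%m-%d: two-digit year, month, and day.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yy-MM-dd");

    static String expand(String topic, LocalDate date) {
        return "/user/kafka/database/" + topic + "/" + date.format(DAY);
    }

    public static void main(String[] args) {
        // prints /user/kafka/database/devices/17-07-04
        System.out.println(expand("devices", LocalDate.of(2017, 7, 4)));
    }
}
```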
Install and configure Hive. Start up Hive.
'$HIVE_HOME/bin/hive'
Create a table in Hive using the same field delimiter as your data; in this case it's the pipe character |
create table tablename(a int, b string, c string, d string, e string)
row format delimited
fields terminated by '|';
Load the data from HDFS into the Hive table
load data inpath 'filepath/path' into table tester2;
Create a table of counts per phone brand, sorted by count. We'll use this data to create a visual after sending it to MySQL.
'insert into table sortorder select phone,count(phone) as phoneCount from tester2 group by phone order by phoneCount desc;'
The result is a sorted table with two columns: the phone brand and the number of times people using that brand have accessed our app.
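For readers less familiar with group by and order by, the sketch below reproduces the same aggregation in plain Java. It is only an illustration of what the query computes, not part of the pipeline; the class name PhoneCounts and the sample brands are made up. One simplification is labeled in the code: null brands are dropped, and ties are broken alphabetically so the output is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical in-memory version of: group by phone, order by count desc.
class PhoneCounts {

    static List<Map.Entry<String, Long>> countByPhone(List<String> phones) {
        // Simplification: null brands are dropped before counting.
        Map<String, Long> counts = phones.stream()
            .filter(Objects::nonNull)
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        List<Map.Entry<String, Long>> rows = new ArrayList<>(counts.entrySet());
        // Highest count first; equal counts fall back to alphabetical order.
        rows.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<String> phones = Arrays.asList(
            "Apple", "Samsung", "Apple", "Huawei", "Samsung", "Apple");
        for (Map.Entry<String, Long> row : countByPhone(phones)) {
            System.out.println(row.getKey() + " " + row.getValue());
        }
        // prints:
        // Apple 3
        // Samsung 2
        // Huawei 1
    }
}
```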
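Use Sqoop (version 1.4.6, compatible with Hadoop 2.8.0) to export the data from the Hive warehouse to MySQL for the web visualization. The --fields-terminated-by value must match the delimiter of the files in the export directory.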
./sqoop export --connect jdbc:mysql://localhost/test --username root -P --table test --fields-terminated-by ',' --lines-terminated-by '\n' --export-dir /user/hive/warehouse/tester2
See the other project for continued development, including serving the SQL data on a webpage using Java and Spring.