- This repo contains the Linux/Unix command-line code I used to analyze, clean, mine, and visualize the real New York City taxi dataset for August 2019.
- cmds.log contains the main commands used for data cleaning and mining
- plotcmnds.log contains the commands run in Gnuplot
- a3.txt contains the top 10 pickup locations and the top 10 pickup/dropoff location pairs that yielded the highest average "total amount" in August 2019
- a3t3.svg contains the correlation chart between average tip amounts and passenger counts
- a3t4.svg contains the correlation chart between trip distances (miles) and average total earnings amounts ($)
- awk.scr, awk.task2.scr, and awk.task3.scr are the awk scripts I wrote to compute average total earnings and tip amounts
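
To give a feel for the kind of aggregation behind a3.txt, here is a minimal sketch of averaging "total amount" per pickup location and per pickup/dropoff pair with awk. It is not the content of the awk scripts in this repo. The three-column layout, the sample rows, and the variable names (`sample`, `top_pickups`, `top_pairs`) are assumptions made up for illustration; the real dataset has more columns, so the field numbers would need adjusting.

```shell
#!/bin/sh
# Force the C locale so printf always uses "." as the decimal separator.
export LC_ALL=C

# Hypothetical sample rows (not real data), in an assumed simplified layout:
# pickup location ID, dropoff location ID, total amount.
sample=$(mktemp)
cat > "$sample" <<'EOF'
PULocationID,DOLocationID,total_amount
1,2,10.00
1,2,20.00
3,4,50.00
3,5,30.00
7,7,5.00
EOF

# Average total amount per pickup location, highest first, top 10.
# NR > 1 skips the header row.
top_pickups=$(awk -F, 'NR > 1 { sum[$1] += $3; n[$1]++ }
    END { for (k in sum) printf "%s,%.2f\n", k, sum[k] / n[k] }' "$sample" |
    sort -t, -k2,2nr | head -n 10)

# Same idea, keyed on the pickup/dropoff pair instead of the pickup alone.
top_pairs=$(awk -F, 'NR > 1 { key = $1 "-" $2; sum[key] += $3; n[key]++ }
    END { for (k in sum) printf "%s,%.2f\n", k, sum[k] / n[k] }' "$sample" |
    sort -t, -k2,2nr | head -n 10)

echo "Top pickup locations (id,average):"
echo "$top_pickups"
echo "Top pickup-dropoff pairs (pair,average):"
echo "$top_pairs"

rm -f "$sample"
```

The sums and counts are kept in awk associative arrays and divided only in the `END` block, so the file is read once per query; sorting is left to `sort` because awk's `for (k in sum)` iteration order is unspecified.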