Skip to content

Latest commit

 

History

History
102 lines (97 loc) · 4.37 KB

README.md

File metadata and controls

102 lines (97 loc) · 4.37 KB

Learning how to tame the Big Data with Hadoop and related technologies

Table of Contents

Hadoop

  • Hadoop is an open source software platform for distributed storage and distributed processing of very large datasets on computer clusters built from commodity hardware
  • Why Hadoop?
    • Data is too big
    • Vertical scaling isn't an option
      • Disk seek times
      • Hardware failures
      • Processing times
    • Horizontal scaling is linear
    • You can do much more instead of just batch processing

Installation

  • Download Virtual Box from https://www.virtualbox.org/
  • Download image of Hadoop to run on Virtual Box
  • Import the image into Virtual Box
  • Once you bootup, you will have CentOS instance that has Hadoop up and running
  • We can use CLI, it also has browser interface
    • Ambari is available to easily navigate and manage different systems on Hadoop
    • Goto http://localhost:8888
  • Launch Dashboard and login to Ambari
    • Username: maria_dev
    • Password: maria_dev
  • Trouble shooting

    • Enable virtualization in your BIOS
    • Disable Hyper-V acceleration in Windows