---
layout: post
title: Apache Spark Tutorial with Hortonworks Data Platform
date: 2015-09-12T12:34:00.001-07:00
author: Saptak Sen
tags:
- spark
- hadoop
modified_time: 2015-09-12T15:11:18.054-07:00
---

Apache Spark is a fast, in-memory data processing engine with an elegant development API that lets data workers efficiently run algorithms that need iterative access to datasets, such as machine learning algorithms. Spark on YARN enables deep integration with Hadoop and other YARN-enabled workloads in the enterprise.

Below, we explore the basic concepts of Apache Spark and the first few steps needed to get started.

- Introduction
- Configuring Hortonworks Sandbox on Azure
- Installing Apache Spark 1.3.1 on HDP 2.2.4.2
- Installing Apache Spark 1.2.0 on HDP 2.2
- Basics of programming Apache Spark
- A short primer on Scala
- Exploring Spark with Scala
- Using Hive and ORC with Apache Spark
- Installing and configuring Zeppelin
- Using IPython Notebook with Apache Spark
