Skip to content

Commit

Permalink
[SPARK-7249] Updated Hadoop dependencies due to inconsistency in the …
Browse files Browse the repository at this point in the history
…versions

Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons.

Changes proposed by vanzin resulting from previous pull-request apache#5783 that did not fixed the problem correctly.

Please let me know if this is the correct way of doing this, the comments of vanzin are in the pull-request mentioned.

Author: FavioVazquez <[email protected]>

Closes apache#5786 from FavioVazquez/update-hadoop-dependencies and squashes the following commits:

11670e5 [FavioVazquez] - Added missing instance of -Phadoop-2.2 in create-release.sh
379f50d [FavioVazquez] - Added instances of -Phadoop-2.2 in create-release.sh, run-tests, scalastyle and building-spark.md - Reconstructed docs to not ask users to rely on default behavior
3f9249d [FavioVazquez] Merge branch 'master' of https://github.com/apache/spark into update-hadoop-dependencies
31bdafa [FavioVazquez] - Added missing instances in -Phadoop-1 in create-release.sh, run-tests and in the building-spark documentation
cbb93e8 [FavioVazquez] - Added comment related to SPARK-3710 about  hadoop-yarn-server-tests in Hadoop 2.2 that fails to pull some needed dependencies
83dc332 [FavioVazquez] - Cleaned up the main POM concerning the yarn profile - Erased hadoop-2.2 profile from yarn/pom.xml and its content was integrated into yarn/pom.xml
93f7624 [FavioVazquez] - Deleted unnecessary comments and <activation> tag on the YARN profile in the main POM
668d126 [FavioVazquez] - Moved <dependencies> <activation> and <properties> sections of the hadoop-2.2 profile in the YARN POM to the YARN profile in the root POM - Erased unnecessary hadoop-2.2 profile from the YARN POM
fda6a51 [FavioVazquez] - Updated hadoop1 releases in create-release.sh  due to changes in the default hadoop version set - Erased unnecessary instance of -Dyarn.version=2.2.0 in create-release.sh - Prettify comment in yarn/pom.xml
0470587 [FavioVazquez] - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in create-release.sh - Updated how the releases are made in the create-release.sh no that the default hadoop version is the 2.2.0 - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in scalastyle - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in run-tests - Better example given in the hadoop-third-party-distributions.md now that the default hadoop version is 2.2.0
a650779 [FavioVazquez] - Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml - Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set in avro.mapred.classifier in pom.xml
199f40b [FavioVazquez] - Erased unnecessary CDH5-specific note in docs/building-spark.md - Remove example of instance -Phadoop-2.2 -Dhadoop.version=2.2.0 in docs/building-spark.md - Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now the default .Added comment in the yarn/pom.xml to specify that.
88a8b88 [FavioVazquez] - Simplified Hadoop profiles due to new setting of global properties in the pom.xml file - Added comment to specify that the hadoop-2.2 profile is now the default hadoop profile in the pom.xml file - Erased hadoop-2.2 from related hadoop profiles now that is a no-op in the make-distribution.sh file
70b8344 [FavioVazquez] - Fixed typo in the make-distribution.sh file and added hadoop-1 in the Related profiles
287fa2f [FavioVazquez] - Updated documentation about specifying the hadoop version in building-spark. Now is clear that Spark will build against Hadoop 2.2.0 by default. - Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark doc.
1354292 [FavioVazquez] - Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests and documentation
6b4bfaf [FavioVazquez] - Cleanup in hadoop-2.x profiles since they contained mostly redundant stuff.
7e9955d [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
660decc [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
ec91ce3 [FavioVazquez] - Updated protobuf-java version of com.google.protobuf dependancy to fix blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix for 2.5.0-cdh5.3.3 version)
  • Loading branch information
FavioVazquez authored and srowen committed May 14, 2015
1 parent c1080b6 commit 7fb715d
Show file tree
Hide file tree
Showing 8 changed files with 79 additions and 90 deletions.
14 changes: 7 additions & 7 deletions dev/create-release/create-release.sh
Original file line number Diff line number Diff line change
Expand Up @@ -118,14 +118,14 @@ if [[ ! "$@" =~ --skip-publish ]]; then

rm -rf $SPARK_REPO

build/mvn -DskipTests -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 \
-Pyarn -Phive -Phive-thriftserver -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
build/mvn -DskipTests -Pyarn -Phive \
-Phive-thriftserver -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
clean install

./dev/change-version-to-2.11.sh

build/mvn -DskipTests -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 \
-Dscala-2.11 -Pyarn -Phive -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
build/mvn -DskipTests -Pyarn -Phive \
-Dscala-2.11 -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
clean install

./dev/change-version-to-2.10.sh
Expand Down Expand Up @@ -228,9 +228,9 @@ if [[ ! "$@" =~ --skip-package ]]; then

# We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds
# share the same Zinc server.
make_binary_release "hadoop1" "-Phive -Phive-thriftserver -Dhadoop.version=1.0.4" "3030" &
make_binary_release "hadoop1-scala2.11" "-Phive -Dscala-2.11" "3031" &
make_binary_release "cdh4" "-Phive -Phive-thriftserver -Dhadoop.version=2.0.0-mr1-cdh4.2.0" "3032" &
make_binary_release "hadoop1" "-Phadoop-1 -Phive -Phive-thriftserver" "3030" &
make_binary_release "hadoop1-scala2.11" "-Phadoop-1 -Phive -Dscala-2.11" "3031" &
make_binary_release "cdh4" "-Phadoop-1 -Phive -Phive-thriftserver -Dhadoop.version=2.0.0-mr1-cdh4.2.0" "3032" &
make_binary_release "hadoop2.3" "-Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn" "3033" &
make_binary_release "hadoop2.4" "-Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn" "3034" &
make_binary_release "mapr3" "-Pmapr3 -Phive -Phive-thriftserver" "3035" &
Expand Down
6 changes: 3 additions & 3 deletions dev/run-tests
Original file line number Diff line number Diff line change
Expand Up @@ -40,11 +40,11 @@ function handle_error () {
{
if [ -n "$AMPLAB_JENKINS_BUILD_PROFILE" ]; then
if [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop1.0" ]; then
export SBT_MAVEN_PROFILES_ARGS="-Dhadoop.version=1.0.4"
export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=1.0.4"
elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.0" ]; then
export SBT_MAVEN_PROFILES_ARGS="-Dhadoop.version=2.0.0-mr1-cdh4.1.1"
export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1"
elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.2" ]; then
export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0"
export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.2"
elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.3" ]; then
export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0"
fi
Expand Down
4 changes: 2 additions & 2 deletions dev/scalastyle
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,8 @@
echo -e "q\n" | build/sbt -Phive -Phive-thriftserver scalastyle > scalastyle.txt
echo -e "q\n" | build/sbt -Phive -Phive-thriftserver test:scalastyle >> scalastyle.txt
# Check style with YARN built too
echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 scalastyle >> scalastyle.txt
echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 test:scalastyle >> scalastyle.txt
echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 scalastyle >> scalastyle.txt
echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 test:scalastyle >> scalastyle.txt

ERRORS=$(cat scalastyle.txt | awk '{if($1~/error/)print}')
rm scalastyle.txt
Expand Down
11 changes: 6 additions & 5 deletions docs/building-spark.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,14 +59,14 @@ You can fix this by setting the `MAVEN_OPTS` variable as discussed before.

# Specifying the Hadoop Version

Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the "hadoop.version" property. If unset, Spark will build against Hadoop 1.0.4 by default. Note that certain build profiles are required for particular Hadoop versions:
Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the "hadoop.version" property. If unset, Spark will build against Hadoop 2.2.0 by default. Note that certain build profiles are required for particular Hadoop versions:

<table class="table">
<thead>
<tr><th>Hadoop version</th><th>Profile required</th></tr>
</thead>
<tbody>
<tr><td>1.x to 2.1.x</td><td>(none)</td></tr>
<tr><td>1.x to 2.1.x</td><td>hadoop-1</td></tr>
<tr><td>2.2.x</td><td>hadoop-2.2</td></tr>
<tr><td>2.3.x</td><td>hadoop-2.3</td></tr>
<tr><td>2.4.x</td><td>hadoop-2.4</td></tr>
Expand All @@ -77,19 +77,20 @@ For Apache Hadoop versions 1.x, Cloudera CDH "mr1" distributions, and other Hado

{% highlight bash %}
# Apache Hadoop 1.2.1
mvn -Dhadoop.version=1.2.1 -DskipTests clean package
mvn -Dhadoop.version=1.2.1 -Phadoop-1 -DskipTests clean package

# Cloudera CDH 4.2.0 with MapReduce v1
mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package
mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -Phadoop-1 -DskipTests clean package
{% endhighlight %}

You can enable the "yarn" profile and optionally set the "yarn.version" property if it is different from "hadoop.version". Spark only supports YARN versions 2.2.0 and later.

Examples:

{% highlight bash %}

# Apache Hadoop 2.2.X
mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
mvn -Pyarn -Phadoop-2.2 -DskipTests clean package

# Apache Hadoop 2.3.X
mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package
Expand Down
2 changes: 1 addition & 1 deletion docs/hadoop-third-party-distributions.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ property. For certain versions, you will need to specify additional profiles. Fo
see the guide on [building with maven](building-spark.html#specifying-the-hadoop-version):

mvn -Dhadoop.version=1.0.4 -DskipTests clean package
mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
mvn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package

The table below lists the corresponding `hadoop.version` code for each CDH/HDP release. Note that
some Hadoop releases are binary compatible across client versions. This means the pre-built Spark
Expand Down
2 changes: 1 addition & 1 deletion make-distribution.sh
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ while (( "$#" )); do
--hadoop)
echo "Error: '--hadoop' is no longer supported:"
echo "Error: use Maven profiles and options -Dhadoop.version and -Dyarn.version instead."
echo "Error: Related profiles include hadoop-2.2, hadoop-2.3 and hadoop-2.4."
echo "Error: Related profiles include hadoop-1, hadoop-2.2, hadoop-2.3 and hadoop-2.4."
exit_with_usage
;;
--with-yarn)
Expand Down
33 changes: 15 additions & 18 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -122,9 +122,9 @@
<slf4j.version>1.7.10</slf4j.version>
<log4j.version>1.2.17</log4j.version>
<hadoop.version>2.2.0</hadoop.version>
<protobuf.version>2.4.1</protobuf.version>
<protobuf.version>2.5.0</protobuf.version>
<yarn.version>${hadoop.version}</yarn.version>
<hbase.version>0.98.7-hadoop1</hbase.version>
<hbase.version>0.98.7-hadoop2</hbase.version>
<hbase.artifact>hbase</hbase.artifact>
<flume.version>1.4.0</flume.version>
<zookeeper.version>3.4.5</zookeeper.version>
Expand All @@ -143,7 +143,7 @@
<oro.version>2.0.8</oro.version>
<codahale.metrics.version>3.1.0</codahale.metrics.version>
<avro.version>1.7.7</avro.version>
<avro.mapred.classifier></avro.mapred.classifier>
<avro.mapred.classifier>hadoop2</avro.mapred.classifier>
<jets3t.version>0.7.1</jets3t.version>
<aws.java.sdk.version>1.8.3</aws.java.sdk.version>
<aws.kinesis.client.version>1.1.0</aws.kinesis.client.version>
Expand All @@ -155,7 +155,7 @@
<jline.version>${scala.version}</jline.version>
<jline.groupid>org.scala-lang</jline.groupid>
<jodd.version>3.6.3</jodd.version>
<codehaus.jackson.version>1.8.8</codehaus.jackson.version>
<codehaus.jackson.version>1.9.13</codehaus.jackson.version>
<fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
<snappy.version>1.1.1.7</snappy.version>
<netlib.java.version>1.1.2</netlib.java.version>
Expand Down Expand Up @@ -1644,39 +1644,36 @@
-->

<profile>
<id>hadoop-2.2</id>
<id>hadoop-1</id>
<properties>
<hadoop.version>2.2.0</hadoop.version>
<protobuf.version>2.5.0</protobuf.version>
<hbase.version>0.98.7-hadoop2</hbase.version>
<avro.mapred.classifier>hadoop2</avro.mapred.classifier>
<codehaus.jackson.version>1.9.13</codehaus.jackson.version>
<hadoop.version>1.0.4</hadoop.version>
<protobuf.version>2.4.1</protobuf.version>
<hbase.version>0.98.7-hadoop1</hbase.version>
<avro.mapred.classifier>hadoop1</avro.mapred.classifier>
<codehaus.jackson.version>1.8.8</codehaus.jackson.version>
</properties>
</profile>

<profile>
<id>hadoop-2.2</id>
<!-- SPARK-7249: Default hadoop profile. Uses global properties. -->
</profile>

<profile>
<id>hadoop-2.3</id>
<properties>
<hadoop.version>2.3.0</hadoop.version>
<protobuf.version>2.5.0</protobuf.version>
<jets3t.version>0.9.3</jets3t.version>
<hbase.version>0.98.7-hadoop2</hbase.version>
<commons.math3.version>3.1.1</commons.math3.version>
<avro.mapred.classifier>hadoop2</avro.mapred.classifier>
<codehaus.jackson.version>1.9.13</codehaus.jackson.version>
</properties>
</profile>

<profile>
<id>hadoop-2.4</id>
<properties>
<hadoop.version>2.4.0</hadoop.version>
<protobuf.version>2.5.0</protobuf.version>
<jets3t.version>0.9.3</jets3t.version>
<hbase.version>0.98.7-hadoop2</hbase.version>
<commons.math3.version>3.1.1</commons.math3.version>
<avro.mapred.classifier>hadoop2</avro.mapred.classifier>
<codehaus.jackson.version>1.9.13</codehaus.jackson.version>
</properties>
</profile>

Expand Down
97 changes: 44 additions & 53 deletions yarn/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
<name>Spark Project YARN</name>
<properties>
<sbt.project.name>yarn</sbt.project.name>
<jersey.version>1.9</jersey.version>
</properties>

<dependencies>
Expand Down Expand Up @@ -85,7 +86,12 @@
<artifactId>jetty-servlet</artifactId>
</dependency>
<!-- End of shaded deps. -->


<!--
See SPARK-3710. hadoop-yarn-server-tests in Hadoop 2.2 fails to pull some needed
dependencies, so they need to be added manually for the tests to work.
-->

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-tests</artifactId>
Expand All @@ -97,59 +103,44 @@
<artifactId>mockito-all</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty</artifactId>
<version>6.1.26</version>
<exclusions>
<exclusion>
<groupId>org.mortbay.jetty</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
</exclusions>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-core</artifactId>
<version>${jersey.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-json</artifactId>
<version>${jersey.version}</version>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>stax</groupId>
<artifactId>stax-api</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-server</artifactId>
<version>${jersey.version}</version>
<scope>test</scope>
</dependency>
</dependencies>

<!--
See SPARK-3710. hadoop-yarn-server-tests in Hadoop 2.2 fails to pull some needed
dependencies, so they need to be added manually for the tests to work.
-->
<profiles>
<profile>
<id>hadoop-2.2</id>
<properties>
<jersey.version>1.9</jersey.version>
</properties>
<dependencies>
<dependency>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty</artifactId>
<version>6.1.26</version>
<exclusions>
<exclusion>
<groupId>org.mortbay.jetty</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
</exclusions>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-core</artifactId>
<version>${jersey.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-json</artifactId>
<version>${jersey.version}</version>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>stax</groupId>
<artifactId>stax-api</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-server</artifactId>
<version>${jersey.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
</profile>
</profiles>


<build>
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
Expand Down

0 comments on commit 7fb715d

Please sign in to comment.