Notes on building Spark 1.6.0 from source against Hive 0.13.1

-----Installing Maven
Download Maven from the official site: http://maven.apache.org/download.cgi
tar -zxvf apache-maven-3.5.0-bin.tar.gz -C ../
mv apache-maven-3.5.0 maven
Set the Maven environment variables
vi /etc/profile
export M2=/root/maven
export PATH=${PATH}:$M2/bin
source /etc/profile
Verify that Maven is installed correctly
mvn -version
Add mirror entries (including one hosted in China) inside the <mirrors> element of conf/settings.xml
<mirror>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <mirrorOf>central</mirrorOf>
</mirror>
<mirror>
  <id>repo2</id>
  <mirrorOf>central</mirrorOf>
  <name>Human Readable Name for this Mirror.</name>
  <url>http://repo2.maven.org/maven2/</url>
</mirror>
<mirror>
  <id>ibiblio</id>
  <mirrorOf>central</mirrorOf>
  <name>ibiblio Mirror of http://repo1.maven.org/maven2/</name>
  <url>http://mirrors.ibiblio.org/pub/mirrors/maven2/</url>
</mirror>
<mirror>
  <id>jboss-public-repository-group</id>
  <mirrorOf>central</mirrorOf>
  <name>JBoss Public Repository Group</name>
  <url>http://repository.jboss.org/nexus/content/groups/public</url>
</mirror>
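As far as Maven's mirror handling goes, at most one mirror is applied to a given repository, so when several entries all declare mirrorOf central, only the first match is used and the rest do not act as fallbacks. A minimal sketch for checking how many entries compete for central; the helper name count_central_mirrors is hypothetical and it assumes each mirrorOf element sits on its own line, as in the fragment above:

```shell
# Hypothetical helper: count the <mirror> entries in a settings.xml
# that declare <mirrorOf>central</mirrorOf>.
# Assumes each <mirrorOf> element is written on a single line.
count_central_mirrors() {
  grep -c '<mirrorOf>central</mirrorOf>' "$1"
}

# Example (path assumed from M2=/root/maven above):
#   count_central_mirrors /root/maven/conf/settings.xml
```

If the count is greater than 1, putting the preferred mirror first is the part that matters.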
-----Installing Scala 2.10.6
tar -zxvf scala-2.10.6.tgz -C ../
Add the environment variables
vi /etc/profile
export SCALA_HOME=/root/scala-2.10.6
export PATH=$PATH:$SCALA_HOME/bin
source /etc/profile
Verify that Scala is installed correctly
scala -version
-----Building Spark 1.6 from source

Download the source from http://spark.apache.org/downloads.html

tar -zxvf spark-1.6.0.tgz -C ../
cd /root/spark-1.6.0

Adjust the version settings in the pom

Set hive.version to 0.13.1, hadoop.version to 2.5.0, and scala.version to 2.10.6
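The property changes can also be scripted instead of edited by hand. The following is only a sketch: set_pom_property is a hypothetical helper, and it assumes each property is written as a single-line element such as <hive.version>...</hive.version>. It rewrites every line that defines the property, including any inside profiles, and leaves a pom.xml.bak backup.

```shell
# Hypothetical helper: set <name>value</name> in a pom.xml in place.
# Assumes the property element is written on a single line.
set_pom_property() {
  pom="$1"; name="$2"; value="$3"
  # Escape dots so that "hive.version" is matched literally.
  pattern=$(printf '%s' "$name" | sed 's/\./\\./g')
  # -i.bak edits in place and keeps a backup next to the file.
  sed -i.bak "s|<${pattern}>[^<]*</${pattern}>|<${name}>${value}</${name}>|" "$pom"
}

# Example, run from the Spark source directory:
#   set_pom_property pom.xml hive.version 0.13.1
#   set_pom_property pom.xml hadoop.version 2.5.0
#   set_pom_property pom.xml scala.version 2.10.6
```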

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package

Handling the PermGen space error:

Fix:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
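The MaxPermSize option only matters on Java 7 and earlier; Java 8 removed the permanent generation and the JVM ignores the flag with a warning. A small sketch of choosing the options by Java version follows. The helper name maven_opts_for is hypothetical, and the commented usage line assumes the usual `java -version` output format with the version in double quotes.

```shell
# Hypothetical helper: print MAVEN_OPTS suited to a Java version string.
maven_opts_for() {
  case "$1" in
    1.[0-7]|1.[0-7].*)
      # Java 7 and earlier still have a permanent generation.
      echo "-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" ;;
    *)
      # Java 8 and later have no PermGen, so MaxPermSize is dropped.
      echo "-Xmx2g -XX:ReservedCodeCacheSize=512m" ;;
  esac
}

# Example (assumes `java -version` prints: java version "1.7.0_80"):
#   v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p')
#   export MAVEN_OPTS="$(maven_opts_for "$v")"
```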


[ERROR] Failed to execute goal on project spark-hive_2.10: Could not resolve dependencies for project org.apache.spark:spark-hive_2.10:jar:1.6.0: Failed to collect dependencies at org.spark-project.hive:hive-exec:jar:0.13.1.spark: Failed to read artifact descriptor for org.spark-project.hive:hive-exec:jar:0.13.1.spark: Could not transfer artifact org.spark-project.hive:hive-exec:pom:0.13.1.spark from/to twttr-repo (http://maven.twttr.com): Connect to maven.twttr.com:80 [maven.twttr.com/199.59.149.208] failed: Connection timed ou
Fix: in pom.xml under the spark directory, change the hive.version value from 0.13.1.spark to 0.13.1, then resume the build from the failed module:
mvn -rf :spark-hive_2.10 -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
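The -rf (resume from) argument is the artifact id of the module that failed, which appears in the [ERROR] line after "on project". A rough sketch for pulling it out of a saved build log; failed_module is a hypothetical helper and build.log is an assumed file name:

```shell
# Hypothetical helper: read Maven output on stdin and print the artifact id
# of the first module named in a "Failed to execute goal ... on project" line.
failed_module() {
  sed -n 's/^\[ERROR\] Failed to execute goal.* on project \([^: ]*\):.*/\1/p' | head -n 1
}

# Example (build.log is an assumed file holding the mvn output):
#   mvn -rf ":$(failed_module < build.log)" -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
```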

[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project spark-hive_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed.: CompileFailed -> [Help 1]

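Final conclusion:

Spark 1.6.0 cannot be built from source against hive-0.13.1, because variables and types that the Spark source relies on do not exist or do not match in that Hive version.

Changing the Hive version to 1.2.1 and rebuilding Spark 1.6.0 works.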