Packaging a Spark Program in IDEA and Running It on a Remote Hadoop HA Cluster
1. Install IDEA and create the project (omitted)
2. Create a Scala Maven project (omitted)
3. Import the Maven dependencies (important)
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>SparkHbase</groupId>
  <artifactId>SparkHbase</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>

  <properties>
    <hadoop.version>2.6.0</hadoop.version>
    <hbase.version>1.2.0</hbase.version>
    <spark.version>1.6.0</spark.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>${hbase.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-server</artifactId>
      <version>${hbase.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase</artifactId>
      <version>${hbase.version}</version>
      <type>pom</type>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <version>2.15.2</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <createDependencyReducedPom>false</createDependencyReducedPom>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <!--<transformers>-->
                <!--<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">-->
                  <!--<mainClass>com.test.SparkCount</mainClass>-->
                <!--</transformer>-->
              <!--</transformers>-->
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
A note on this choice (I am not very fluent with Maven, so this is simply what worked for me):
Adding this configuration resolves dependency conflicts after packaging. Without it, duplicate signed dependencies can end up in the packaged jar, and at runtime you may see an error like:
java.lang.SecurityException: class "javax.servlet.FilterRegistration"'s signer information does not match signer information of other classes in the same package
If you hit this with an already-built jar, strip the signature files from it (run this command):
zip -d SparkHBase.jar META-INF/*.SF META-INF/*.DSA META-INF/*.RSA
To avoid the problem at build time, the shade plugin filter shown in the pom above excludes the signature files during packaging:
<excludes>
  <exclude>META-INF/*.SF</exclude>
  <exclude>META-INF/*.DSA</exclude>
  <exclude>META-INF/*.RSA</exclude>
</excludes>
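Either way, a quick sanity check (a minimal sketch; the jar name is assumed to match the one submitted below) is to list the jar contents and confirm that no signature files remain:

unzip -l SparkHBase.jar | grep -E 'META-INF/.*\.(SF|DSA|RSA)'

If this prints nothing, the signature files are gone.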
4. Packaging:
Test code:
package SparkTest

import org.apache.spark.{SparkConf, SparkContext}

object TestStreaming {
  def main(args: Array[String]): Unit = {
    // Point the driver at the standalone master; the jar path could also be set here via setJars
    val conf = new SparkConf().setMaster("spark://slaver3:7077").setAppName("2018_3_19")
    // setJars(List("D:\\project\\SparkHbase\\out\\artifacts\\SparkHBase_jar"))
    val ssc = new SparkContext(conf)
    // "Machenmaster" is the HDFS HA nameservice, not a single NameNode host
    val input = ssc.textFile("hdfs://Machenmaster/hbase/data.txt")
    // Classic word count: split lines into words, map to (word, 1), sum the counts per word
    val words = input.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((x, y) => x + y)
    // Note: this prints the RDD's toString, not its contents
    println(words)
    words.saveAsTextFile("hdfs://Machenmaster/hbase/OUT")
  }
}
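The input and output paths go through the HA nameservice Machenmaster rather than a single NameNode address. For that URI to resolve, the driver and executors need the cluster's HA settings, which are normally picked up from core-site.xml/hdfs-site.xml on the classpath (for example via HADOOP_CONF_DIR). If those files are not visible to the driver, the settings can also be supplied in code. The following is a minimal sketch that assumes the ssc SparkContext from the code above; the NameNode hostnames nn1host/nn2host are placeholders, not values from the original post:

// Hypothetical HA client configuration for the "Machenmaster" nameservice;
// replace the two NameNode hosts with the real cluster addresses.
val hc = ssc.hadoopConfiguration
hc.set("fs.defaultFS", "hdfs://Machenmaster")
hc.set("dfs.nameservices", "Machenmaster")
hc.set("dfs.ha.namenodes.Machenmaster", "nn1,nn2")
hc.set("dfs.namenode.rpc-address.Machenmaster.nn1", "nn1host:8020")
hc.set("dfs.namenode.rpc-address.Machenmaster.nn2", "nn2host:8020")
hc.set("dfs.client.failover.proxy.provider.Machenmaster",
  "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")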
You can package either with Maven or with IDEA's own artifact build (a Maven-based sketch follows after the spark-submit command below):
【1】I used IDEA's own artifact packaging;
【2】As in the earlier article, I removed all dependency jars from the artifact (the Linux cluster environment already provides all of them);
【3】Upload the jar to the cluster and run:
spark-submit --class SparkTest.TestStreaming SparkHBase.jar
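If you take the Maven route instead, a minimal sketch of the equivalent flow looks like this (the jar name follows from the artifactId and version in the pom above, and the output path matches the test code; adjust both to your setup):

# Build the shaded jar (maven-scala-plugin compiles, maven-shade-plugin runs at the package phase)
mvn clean package

# Submit the shaded jar; the class name matches the test code above
spark-submit --class SparkTest.TestStreaming target/SparkHbase-1.0-SNAPSHOT.jar

# Inspect the word-count output written by saveAsTextFile
hdfs dfs -cat /hbase/OUT/part-*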