Creating a Maven Project in IDEA and Using Spark SQL to Operate Hive

1. Create the Maven project; I won't go over that again here. The project structure is shown below:

(screenshot: the project structure)
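For reference, a typical layout for this kind of project looks roughly like the following; apart from pom.xml, hive-site.xml, and log4j.properties, which are discussed below, the project and file names are placeholders:

sparksql-hive-demo/
├── pom.xml
└── src/
    └── main/
        ├── resources/
        │   ├── hive-site.xml
        │   └── log4j.properties
        └── scala/
            └── SparkSQLHiveDemo.scala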

2. Add the Spark dependencies to the pom file.

<properties>
    <spark.version>2.2.0</spark.version>
    <scala.version>2.11</scala.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.10.0.0</version>
    </dependency>

    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.38</version>
    </dependency>
    <!--<dependency>-->
        <!--<groupId>commons-dbutils</groupId>-->
        <!--<artifactId>commons-dbutils</artifactId>-->
        <!--<version>1.6</version>-->
    <!--</dependency>-->
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
</dependencies>
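One thing the dependency list above does not cover: to compile Scala sources, the pom also needs a Scala build plugin in a <build> section. A minimal sketch using scala-maven-plugin follows; the version shown is an assumption, pick whatever is current for you:

<build>
    <plugins>
        <plugin>
            <!-- Compiles the sources under src/main/scala. -->
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>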

3. My Hive version is 1.2.2 and Hadoop is 2.7.1. Copy Hive's configuration file hive-site.xml into src/main/resources/ in the Maven project so it ends up on the classpath. In the same directory, create a log4j.properties file so the run logs are easy to inspect. The log4j configuration below can be copied and used as-is:

### root logger settings ###
log4j.rootLogger = info,stdout
### print log messages to the console ###
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%-5p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%n%m%n

The contents of the hive-site.xml file are given below.

<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <!-- Change this address to that of your own Hadoop NameNode. -->
        <value>hdfs://172.16.0.37:9000/user/hive/warehouse</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <!-- The address of the MySQL host where the Hive metadata is stored. -->
        <value>jdbc:mysql://172.16.0.37:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
</configuration>

Below is a demo test program.

(screenshot: the demo test program)
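As a minimal Scala sketch of what such a demo can look like: the object name, the local[2] master, and the default.src table below are placeholders, not taken from the screenshot; substitute a table that actually exists in your warehouse.

import org.apache.spark.sql.SparkSession

object SparkSQLHiveDemo {
  def main(args: Array[String]): Unit = {
    // hive-site.xml on the classpath (src/main/resources) tells Spark where the
    // metastore and the warehouse live, so no extra configuration is needed here.
    val spark = SparkSession.builder()
      .appName("SparkSQLHiveDemo")
      .master("local[2]")      // run locally from inside IDEA
      .enableHiveSupport()     // required for Hive access
      .getOrCreate()

    // List the databases known to the Hive metastore.
    spark.sql("show databases").show()

    // Query a Hive table (replace default.src with one of your own tables).
    spark.sql("select * from default.src limit 10").show()

    spark.stop()
  }
}

With hive-site.xml on the classpath, enableHiveSupport() is all it takes for the session to talk to the metastore configured above.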

Below is the output of the run:

(screenshot: the console output of the run)