Metadata and Data Governance | A Detailed Guide to Installing and Configuring Hive — after half a day of fiddling, it finally all makes sense
This article walks through installing, configuring, and testing Hive; I hope you get something useful out of it.
What is Hive?
Hive is built on top of Hadoop. If Hadoop's MapReduce is handed directly to DBAs, things get awkward: not every DBA understands how MapReduce works, and with limited time, learning MapReduce from scratch is rarely worth the effort.
Hive is exactly the answer to this: it uses a SQL-like language (HiveQL) to manage the data stored in Hadoop. Hive belongs to the data-warehouse category. Databases focus on OLTP (online transaction processing), while data warehouses focus on OLAP (online analytical processing); in other words, a database such as MySQL is geared toward short transactional workloads, whereas a data warehouse like Hive is geared toward analytical processing over large data sets.
Without Hive: user -> MapReduce -> data in Hadoop (the user may need to know MapReduce)
With Hive: user -> HQL (SQL) -> Hive -> MapReduce -> data in Hadoop (the user only needs to know SQL)
That should make Hive's positioning clear. As for how the technology evolved, the (hand-drawn) diagram below tells that story.
Main part: Hive installation and configuration
Installation
1. Download Hive from: http://mirror.bit.edu.cn/apache/hive/
The download speed is acceptable; this guide uses hive-3.1.1 as the example.
2. Upload the Hive archive to the server and extract it into /usr/local:
tar -zxvf apache-hive-3.1.1-bin.tar.gz -C /usr/local/
3. Rename the extracted directory to hive:
mv apache-hive-3.1.1-bin hive
4. Edit the environment variables in /etc/profile:
vi /etc/profile
Append the following at the bottom of the file:
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
Then run the following (otherwise the change will not take effect immediately):
source /etc/profile
5. Verify the installation:
hive --version
Instead of the expected version output, the command fails with the following error:
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
This is a Hadoop path configuration problem: edit conf/hive-env.sh and set the Hadoop path there.
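A minimal sketch of that change (assuming Hadoop is installed under /home/hadoop/hadoop, which matches the paths in the logs later on; adjust to your own location):
cd /usr/local/hive/conf
cp hive-env.sh.template hive-env.sh
# point Hive at the local Hadoop installation
echo 'export HADOOP_HOME=/home/hadoop/hadoop' >> hive-env.sh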
After that, run hive --version again and it succeeds.
Configuration
1. Go to the configuration directory:
cd /usr/local/hive/conf/
2. Edit the configuration file:
vi hive-site.xml
If the file does not exist, create it from the template:
cp hive-default.xml.template hive-site.xml
Add the following properties to the file (and remove the original properties it contains):
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>[email protected]</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.68.112:3306/datacenter</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
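Note (an assumption beyond what the original shows): these values target MySQL 5.x with the Connector/J 5.x driver; if your metastore runs on MySQL 8 with Connector/J 8, the driver class name is com.mysql.cj.jdbc.Driver instead.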
3. Copy the MySQL JDBC driver JAR into hive/lib.
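For example (a sketch — the JAR name here is an assumption; use whichever Connector/J version matches your MySQL server, downloaded separately):
# copy the previously downloaded JDBC driver into Hive's lib directory
cp mysql-connector-java-5.1.47.jar /usr/local/hive/lib/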
4. Initialize the datacenter schema in MySQL. (The corresponding database must already exist on the MySQL side; a sketch of creating it follows.)
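A minimal sketch of creating that database (an assumption about your MySQL setup; adjust host, user, and password to match the connection settings above):
# create the metastore database referenced by the ConnectionURL
mysql -h 192.168.68.112 -uroot -p -e "CREATE DATABASE datacenter;"
With the database in place, run schematool to initialize the metastore schema: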
schematool -dbType mysql -initSchema
Running into errors here is nothing unusual.
The error above points to an illegal character at line 3207, character 96 of hive-site.xml — a stray non-printable character that ships in the hive-default.xml.template — commenting out or deleting that character is enough to fix it.
After that change, surely it will work this time...
Nope, here comes another one:
[[email protected] bin]# schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:mysql://192.168.68.112:3306/datacenter
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException : Communications link failure
Last packet sent to the server was 0 ms ago.
SQL Error code: 0
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
The key line is "Underlying cause: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException : Communications link failure". This comes down to the MySQL version: newer MySQL releases need the SSL behavior spelled out explicitly on the connection URL.
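Before touching the URL, it can also help to rule out a plain network or credentials problem by connecting from the Hive host with the mysql client (a sketch; adjust host and credentials to your environment):
mysql -h 192.168.68.112 -P 3306 -uroot -p -e "select 1;"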
Solution: fix the connection URL in hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.68.112:3306/datacenter?useSSL=false</value>
</property>
And finally the schema initialization succeeds.
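Optionally, you can double-check what was written by asking schematool for the recorded schema version (a quick sanity check):
schematool -dbType mysql -info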
5. Run the hive command:
[[email protected] bin]# hive
It fails with an error:
[[email protected] bin]# hive
which: no hbase in (/opt/jdk1.8.0_171/bin:/opt/jdk1.8.0_171/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/hive/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 9c03f0d0-fece-4fe4-8c3d-4b6f7f411ecd
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-3.1.1.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx------
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:315)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:242)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:604)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1802)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1820)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:672)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:111)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3063)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1147)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:940)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
        at java.security.AccessController.doPrivileged(Native Method)
I forgot that Hadoop was installed as the hadoop user, so switch to the hadoop user:
[[email protected] bin]# su hadoop
After switching, running hive still fails:
bash-4.2$ hive
which: no hbase in (/opt/jdk1.8.0_171/bin:/opt/jdk1.8.0_171/bin:/opt/jdk1.8.0_171/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/hive/bin:/usr/local/hive/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 329a7077-5513-4905-b9d3-30e6d0b9b54a
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-3.1.1.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
        at org.apache.hadoop.fs.Path.initialize(Path.java:259)
        at org.apache.hadoop.fs.Path.<init>(Path.java:217)
        at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:707)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:624)
        at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:588)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
The solution is as follows:
In the Hive configuration file hive-site.xml, find every occurrence of ${system:java.io.tmpdir} and replace it with a concrete directory, for example /usr/local/hive/temp.
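One way to do the replacement in bulk (a sketch; back up hive-site.xml first, and create the directory you point it at):
# create the concrete temp directory, then substitute it for the placeholder everywhere
mkdir -p /usr/local/hive/temp
sed -i 's#${system:java.io.tmpdir}#/usr/local/hive/temp#g' /usr/local/hive/conf/hive-site.xml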
If the directory cannot be created, switch back to the root user and run the following to grant write permission (substitute your own directory):
[[email protected] bin]# sudo chmod -R a+w /usr/local/hive
Run hive again; this time it starts, so on to the next part.
Testing
1. Start the hive CLI and create a test database named hive_test:
hive> create database hive_test;
OK
Time taken: 1.769 seconds
hive>
2. Show the databases:
hive> show databases;
OK
default
hive_test
Time taken: 0.337 seconds, Fetched: 2 row(s)
hive>
3. Create a table named hive_test01:
hive> use hive_test;
OK
Time taken: 0.06 seconds
hive> create table hive_test01 (id int, name string);
OK
Time taken: 0.615 seconds
hive>
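As an optional extra check (a sketch, not part of the original walkthrough), you can insert a row and query it back from the same hive> prompt; INSERT ... VALUES has been supported since Hive 0.14 and launches an execution job behind the scenes:
hive> insert into hive_test01 values (1, 'tom');
hive> select * from hive_test01;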
4. What does the HDFS directory structure in Hadoop look like now?
[[email protected] conf]# hadoop fs -lsr /
In the listing we can see that the hive_test database and the hive_test01 table now exist in HDFS. (This should make the relationship between Hive and Hadoop clear.)
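For reference, with the default hive.metastore.warehouse.dir the relevant entries in that listing typically look like this (a sketch of the expected layout, since the original relies on a screenshot):
/user/hive/warehouse/hive_test.db
/user/hive/warehouse/hive_test.db/hive_test01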
If instead you hit the error: bash: hadoop: command not found
you need to add hadoop/bin to PATH in your environment variables:
[[email protected] bin]# vi ~/.bash_profile
Open the file and append Hadoop's bin directory to PATH:
PATH=$PATH:$HOME/bin:/home/hadoop/hadoop/bin
[[email protected] bin]# source ~/.bash_profile
This makes the .bash_profile changes take effect.
5. To check the same thing from Hadoop's web UI, browse to the hive_test path shown above, as in the figure below.
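For reference (an assumption based on defaults, since the screenshot is not reproduced here): on Hadoop 3.x the NameNode web UI listens on port 9870 (50070 on Hadoop 2.x), e.g. http://192.168.68.112:9870, and under Utilities > Browse the file system you can navigate to /user/hive/warehouse/hive_test.db to see the same directories.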
End
This article covered three things: installing, configuring, and testing Hive. Hopefully the relationship between Hive, MySQL, and Hadoop is also a lot clearer now. If you have any questions, leave a comment below and I will reply. More material is being put together and will be shared soon; if you found this interesting, feel free to share and follow.