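Configuring Hive Compression with Cloudera Manager 5
Configuring compression for Hive really means configuring compression for MapReduce, covering both the final job output and the intermediate results.
1. Configuration from the Hive command line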
- set hive.enforce.bucketing=true;
- set hive.exec.compress.output=true;
- set mapred.output.compress=true;
- set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
- set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
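Run the statements above at the Hive prompt; this example uses Gzip compression. Of these settings, hive.enforce.bucketing only controls how bucketed tables are written and has no effect on compression, and io.compression.codecs lists the codec classes available to Hadoop. The mapred.output.* names are the older MRv1 property names; on YARN the equivalents are mapreduce.output.fileoutputformat.compress and mapreduce.output.fileoutputformat.compress.codec.
2. Compression configuration via XML files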
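The settings above only cover the final output. As a minimal sketch of the intermediate-result side mentioned in the introduction, the following session settings could be used. The property names are the standard Hive and Hadoop 2 ones; the choice of Snappy is an assumption made here for illustration (it requires the native Snappy library on the cluster), and GzipCodec can be substituted.

```sql
-- Compress data passed between the stages of a multi-stage Hive query.
SET hive.exec.compress.intermediate=true;

-- Compress map output before it is shuffled to the reducers.
SET mapreduce.map.output.compress=true;

-- Codec for map output. Snappy is an assumption, not taken from the
-- original article; swap in org.apache.hadoop.io.compress.GzipCodec if preferred.
SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
```

Intermediate data is written and read once within the job, so a fast codec is usually favored here over one with a higher compression ratio.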
mapred-site.xml
- <property>
- <name>mapred.output.compress</name>
- <value>true</value>
- <description>Should the job outputs be compressed?
- </description>
- </property>
- <property>
- <name>mapred.output.compression.codec</name>
- <value>org.apache.hadoop.io.compress.GzipCodec</value>
- <description>If the job outputs are compressed, how should they be compressed?
- </description>
- </property>
hive-site.xml
- <property>
- <name>hive.enforce.bucketing</name>
- <value>true</value>
- </property>
- <property>
- <name>hive.exec.compress.output</name>
- <value>true</value>
- </property>
- <property>
- <name>io.compression.codecs</name>
- <value>org.apache.hadoop.io.compress.GzipCodec</value>
- </property>
3. Configuring Hive compression with Cloudera Manager 5
1) MapReduce configuration on YARN
2) Hive configuration
Add the following:
- <property>
- <name>hive.enforce.bucketing</name>
- <value>true</value>
- </property>
- <property>
- <name>hive.exec.compress.output</name>
- <value>true</value>
- </property>
- <property>
- <name>io.compression.codecs</name>
- <value>org.apache.hadoop.io.compress.GzipCodec</value>
- </property>
Once this is configured, the output of MapReduce jobs, including Hive queries, is compressed with Gzip.
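One way to check that the new values actually reached a Hive session is to issue SET with a property name and no value, which prints the current setting. This is only a quick sanity check, sketched here under the assumption that the services were restarted and the client configuration was redeployed after the change.

```sql
-- Print the effective value of each property in the current Hive session.
SET hive.exec.compress.output;
SET mapreduce.output.fileoutputformat.compress;
SET mapreduce.output.fileoutputformat.compress.codec;
```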
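From the Hive command line, compression can also be applied on a per-partition basis, because the settings only affect the statements run in that session. Set the following:
-- enable compression of Hive output
set hive.exec.compress.output=true;
-- YARN (MRv2) property names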
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
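To illustrate per-partition compression, here is a rough sketch that enables the settings and then rewrites a single partition. The tables logs_raw and logs_gz, their columns, and the partition value are hypothetical names invented for this example; only the SET properties come from the text above.

```sql
-- Hypothetical target table; a text-format table is assumed so that the
-- output codec applies directly to the files that are written.
CREATE TABLE IF NOT EXISTS logs_gz (
  host STRING,
  msg  STRING
)
PARTITIONED BY (dt STRING)
STORED AS TEXTFILE;

-- Enable Gzip output compression for this session only.
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Only the partition written here is compressed; partitions written
-- earlier without these settings are left as they were.
INSERT OVERWRITE TABLE logs_gz PARTITION (dt='2015-01-01')
SELECT host, msg
FROM logs_raw
WHERE dt='2015-01-01';
```

After the insert, the files under that partition's directory should carry a .gz extension, while other partitions of the same table can remain uncompressed. Hive reads both kinds transparently for text tables.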