Hadoop 2.7.4 + HBase 1.2.6 + Zookeeper 3.4.8 Integrated Installation, Configuration, Debugging, and Application (Part 1)

Chapter 1: Prepare the Hadoop 2.7.4, Zookeeper 3.4.8, and HBase 1.2.6 installation files

Hadoop 2.7.4 installation package, download: http://hadoop.apache.org/releases.html

HBase 1.2.6 installation package, download: http://mirror.bit.edu.cn/apache/hbase/

Zookeeper 3.4.8 installation package, download: http://mirror.bit.edu.cn/apache/zookeeper/



Chapter 2: Installation environment: CentOS 6.7

1. Eleven Linux virtual machines; host CPU: Intel i5 dual-core or better; memory: 2 GB or more per VM.


2. Hostname, IP address, installed software, and running processes for each machine:


master1  192.168.1.20  hadoop, Zookeeper, hbase  NN, DN, RM, DFSZKFC, JournalNode, HMaster, QuorumPeerMain
master2  192.168.1.21  hadoop, Zookeeper, hbase  NN, DN, RM, DFSZKFC, JournalNode, QuorumPeerMain
slave1   192.168.1.22  hadoop, Zookeeper, hbase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain
slave2   192.168.1.23  hadoop, Zookeeper, hbase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain
slave3   192.168.1.24  hadoop, Zookeeper, hbase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain
slave4   192.168.1.25  hadoop, Zookeeper, hbase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain
slave5   192.168.1.26  hadoop, Zookeeper, hbase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain
slave6   192.168.1.27  hadoop, Zookeeper, hbase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain
slave7   192.168.1.28  hadoop, Zookeeper, hbase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain
slave8   192.168.1.29  hadoop, Zookeeper, hbase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain
slave9   192.168.1.30  hadoop, Zookeeper, hbase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain



3. For now, configure only the first machine; the others will be cloned from it later.



4. Edit the /etc/hosts file, for example (reference style):

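A sketch of what /etc/hosts should contain, matching the node list in section 2 (the IPs are the sample addresses above; adjust them to your network):

```
192.168.1.20   master1
192.168.1.21   master2
192.168.1.22   slave1
192.168.1.23   slave2
192.168.1.24   slave3
192.168.1.25   slave4
192.168.1.26   slave5
192.168.1.27   slave6
192.168.1.28   slave7
192.168.1.29   slave8
192.168.1.30   slave9
```

Every node in the cluster should end up with the same /etc/hosts so hostnames resolve identically everywhere.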



Edit /etc/sysconfig/network:

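On CentOS 6 the permanent hostname lives in /etc/sysconfig/network; for the first machine it might look like:

```
NETWORKING=yes
HOSTNAME=master1
```

After cloning, each node gets its own HOSTNAME value.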



5. Reboot the machine so the new hostname takes effect.

Or change the hostname temporarily (it lasts until the next reboot) with the command: hostname master1



6. Install JDK 1.7.0 or later (skip this step if a JDK is already installed).

6.1 Unpack the JDK

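Assuming the JDK tarball sits in /data (the archive name jdk-7u80-linux-x64.tar.gz is a hypothetical example; use the file you actually downloaded), unpacking it to /usr/local might look like:

```shell
# Unpack the JDK archive into /usr/local (creates e.g. /usr/local/jdk1.7.0_80)
tar -zxvf /data/jdk-7u80-linux-x64.tar.gz -C /usr/local/
```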
6.2 Edit /etc/profile and add the JDK paths
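Typical lines to append to /etc/profile (the install path /usr/local/jdk1.7.0_80 is an assumption matching the unpack step above; substitute your actual directory):

```shell
# JDK environment, appended to /etc/profile
export JAVA_HOME=/usr/local/jdk1.7.0_80
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
```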
6.3 Save and exit, then apply the changes by running: source /etc/profile

6.4 Update the Java priority in CentOS
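On CentOS the alternatives tool selects the default java. Registering the new JDK with a high priority might look like the following (run as root; the JDK path is the same assumption as above):

```
alternatives --install /usr/bin/java java /usr/local/jdk1.7.0_80/bin/java 300
alternatives --install /usr/bin/javac javac /usr/local/jdk1.7.0_80/bin/javac 300
alternatives --config java    # interactively pick the new JDK
java -version                 # verify the active version
```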
The JDK installation is now complete.




7. Unpack Hadoop and set the environment variables


7.1 Unpack the Hadoop 2.7.4 installation package (all installation packages are kept in the /data directory under the filesystem root) and configure the environment variables

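A sketch of the unpack-and-configure step, assuming Hadoop is installed under /soft as the configuration files later in this guide expect:

```shell
# Unpack Hadoop from /data into /soft
tar -zxvf /data/hadoop-2.7.4.tar.gz -C /soft/

# Append to /etc/profile, then run: source /etc/profile
export HADOOP_HOME=/soft/hadoop-2.7.4
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```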



8. Edit the configuration files

8.1 Edit the $HADOOP_HOME/etc/hadoop/slaves file

Add the hostnames of all slave nodes:
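With the node list from section 2, $HADOOP_HOME/etc/hadoop/slaves would contain one hostname per line:

```
slave1
slave2
slave3
slave4
slave5
slave6
slave7
slave8
slave9
```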


8.2 Edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME path


For example:

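The relevant line in hadoop-env.sh, assuming the JDK path used earlier in this guide:

```
export JAVA_HOME=/usr/local/jdk1.7.0_80
```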




8.3 Edit the $HADOOP_HOME/etc/hadoop/yarn-env.sh file and set the JAVA_HOME path



For example:

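The same change in yarn-env.sh (again assuming the JDK path used earlier):

```
export JAVA_HOME=/usr/local/jdk1.7.0_80
```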




8.4 Among the configuration files, locate core-site.xml and hdfs-site.xml. First edit the $HADOOP_HOME/etc/hadoop/core-site.xml file:


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->


<!-- Put site-specific property overrides in this file. -->


<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://cluster</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/soft/hadoop-2.7.4/tmp/</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_dsa</value>
    </property>
</configuration>



Then edit the $HADOOP_HOME/etc/hadoop/hdfs-site.xml file:


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->


<!-- Put site-specific property overrides in this file. -->


<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>cluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.cluster</name>
        <value>master1,master2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.cluster.master1</name>
        <value>master1:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.cluster.master2</name>
        <value>master2:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.cluster.master1</name>
        <value>master1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.cluster.master2</name>
        <value>master2:50070</value>
    </property>
    <property>
        <name>dfs.namenode.servicerpc-address.cluster.master1</name>
        <value>master1:53333</value>
    </property>
    <property>
        <name>dfs.namenode.servicerpc-address.cluster.master2</name>
        <value>master2:53333</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://master1:8485;master2:8485;slave1:8485;slave2:8485;slave3:8485/cluster</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.cluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/soft/hadoop-2.7.4/mydata/journal</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/soft/hadoop-2.7.4/mydata/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/soft/hadoop-2.7.4/mydata/data</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.journalnode.http-address</name>
        <value>0.0.0.0:8480</value>
    </property>
    <property>
        <name>dfs.journalnode.rpc-address</name>
        <value>0.0.0.0:8485</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>


8.5 Among the configuration files, locate mapred-site.xml and yarn-site.xml. First edit the $HADOOP_HOME/etc/hadoop/mapred-site.xml file:


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->


<!-- Put site-specific property overrides in this file. -->


<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master1:19888</value>
    </property>
</configuration>


Then edit the $HADOOP_HOME/etc/hadoop/yarn-site.xml file. Note that the value of yarn.resourcemanager.ha.id must be changed to rm2 on the master2 machine:


<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>master1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>master2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.id</name>
        <value>rm1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>${yarn.resourcemanager.hostname.rm1}:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>${yarn.resourcemanager.hostname.rm1}:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm1</name>
        <value>${yarn.resourcemanager.hostname.rm1}:8089</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>${yarn.resourcemanager.hostname.rm1}:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>${yarn.resourcemanager.hostname.rm1}:8025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>${yarn.resourcemanager.hostname.rm1}:8041</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>${yarn.resourcemanager.hostname.rm2}:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>${yarn.resourcemanager.hostname.rm2}:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.https.address.rm2</name>
        <value>${yarn.resourcemanager.hostname.rm2}:8089</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>${yarn.resourcemanager.hostname.rm2}:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>${yarn.resourcemanager.hostname.rm2}:8025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>${yarn.resourcemanager.hostname.rm2}:8041</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/soft/hadoop-2.7.4/mydata/yarn/local</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/soft/hadoop-2.7.4/mydata/yarn/log</value>
    </property>
    <property>
        <name>yarn.client.failover-proxy-provider</name>
        <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-state-store.address</name>
        <value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
        <name>yarn.scheduler.fair.allocation.file</name>
        <value>/soft/hadoop-2.7.4/etc/hadoop/fairscheduler.xml</value>
    </property>
</configuration>
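On master2 the yarn.resourcemanager.ha.id value must read rm2 instead of rm1. One way to patch the copied file, assuming the value sits on its own line exactly as in the listing above:

```shell
# Only the ha.id line matches '<value>rm1</value>' exactly; the rm-ids line
# is '<value>rm1,rm2</value>' and the rm1 property *names* are in <name>
# tags, so neither is touched.
sed -i 's|<value>rm1</value>|<value>rm2</value>|' $HADOOP_HOME/etc/hadoop/yarn-site.xml
```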


8.6 Add a $HADOOP_HOME/etc/hadoop/fairscheduler.xml file:


<?xml version="1.0"?>
<allocations>
         <queue name="news">
                 <minResources>1024 mb, 1 vcores </minResources>
                 <maxResources>1536 mb, 1 vcores </maxResources>
                 <maxRunningApps>5</maxRunningApps>
                 <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
                 <weight>1.0</weight>
                 <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
         </queue>
         <queue name="crawler">
                 <minResources>1024 mb, 1 vcores</minResources>
                 <maxResources>1536 mb, 1 vcores</maxResources>
         </queue>
         <queue name="map">
                 <minResources>1024 mb, 1 vcores</minResources>
                 <maxResources>1536 mb, 1 vcores</maxResources>
         </queue>
</allocations>



8.7 Create the directories referenced by the XML configuration files above, for example:

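The directories referenced by the configuration can be created in one go. The prefix below assumes Hadoop lives in /soft/hadoop-2.7.4; adjust it to wherever your install actually resides, and run as root:

```shell
# tmp dir, JournalNode edits, NameNode/DataNode storage, and YARN work dirs
mkdir -p /soft/hadoop-2.7.4/tmp \
         /soft/hadoop-2.7.4/mydata/journal \
         /soft/hadoop-2.7.4/mydata/name \
         /soft/hadoop-2.7.4/mydata/data \
         /soft/hadoop-2.7.4/mydata/yarn/local \
         /soft/hadoop-2.7.4/mydata/yarn/log
```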


At this point the Hadoop HA configuration files are in place; all that remains is passwordless SSH login and formatting the Hadoop filesystem.
We will set up passwordless SSH and format Hadoop after all the software (Zookeeper + HBase) is installed and the machines have been cloned. After cloning, remember to change the hostname in /etc/sysconfig/network on every node, and on master2 change the yarn.resourcemanager.ha.id value in $HADOOP_HOME/etc/hadoop/yarn-site.xml to rm2 (no other node needs this change).


Note: this is the opening article of the big-data series. Tutorials on the cluster knowledge used in real projects are coming next. Stay tuned!