(Below is a brief introduction to DRBD and NFS; Heartbeat was covered in my previous post, so I won't repeat it here.)

DRBD Basics

Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing storage replication solution that mirrors the contents of block devices between servers.

Data mirroring: real-time, transparent, and either synchronous (a write returns only after all servers have the data) or asynchronous (a write returns as soon as the local server has it).

DRBD's core functionality is implemented in the Linux kernel, at the lowest level of the I/O stack; it cannot magically add higher-level features, such as detecting that an ext3 filesystem has been corrupted.

DRBD sits below the filesystem, closer to the kernel and the I/O stack than the filesystem is.

Tools:

drbdadm: the high-level administration tool; it reads /etc/drbd.conf and issues commands to drbdsetup and drbdmeta.

drbdsetup: configures the DRBD module loaded into the kernel; rarely used directly.

drbdmeta: manages DRBD metadata structures; rarely used directly.

In DRBD, a resource refers to all aspects of a particular replicated storage device: the resource name, the DRBD device (/dev/drbdm, where m is the device minor number, up to 147), the disk configuration (which makes local data available to DRBD), and the network configuration (for communicating with the peer).

Each resource has a role, either Primary or Secondary.

A DRBD device in the Primary role can be read and written without restriction, e.g. creating and mounting a filesystem, or raw/direct I/O to the block device.

A DRBD device in the Secondary role accepts all updates from the peer, but cannot be accessed by applications at all, not even read-only.

Roles can be switched.

DRBD Features

Single-primary mode: the typical high-availability cluster scenario.

Dual-primary mode: requires a shared cluster filesystem such as GFS or OCFS2. Used when data must be accessed concurrently from two nodes; needs special configuration.

Replication modes, three protocols:

Protocol A: asynchronous replication. A write returns as soon as the local write completes; the data sits in the send buffer and may be lost.

Protocol B: memory-synchronous (semi-synchronous) replication. A write returns once the local write completes and the data has reached the peer; if both nodes lose power, data may be lost.

Protocol C: synchronous replication. A write returns only after both the local and the remote writes are confirmed. Data can only be lost if both nodes lose power or both disks fail at the same time.

Protocol C is the usual choice. The chosen protocol affects the traffic pattern and therefore the network latency.
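The protocol is set per resource in the DRBD configuration. A minimal sketch (device, disk, hostnames and addresses here are placeholders, not taken from this setup):

```
resource r0 {
  protocol C;                # synchronous replication
  device    /dev/drbd0;
  disk      /dev/sda5;
  meta-disk internal;
  on alice { address 10.0.0.1:7789; }
  on bob   { address 10.0.0.2:7789; }
}
```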

Efficient synchronization: blocks are synchronized in linear on-disk order rather than in the order they were originally written, and only the data that became inconsistent during the outage is resynchronized.

Online device verification: one node computes a digest over the underlying storage, block by block, and sends it to the peer; the peer computes the same digest, and mismatched blocks are resynchronized later. Running this weekly or monthly is recommended.

Replication traffic integrity checking: a cryptographic digest accompanies each replicated block, and a mismatch triggers retransmission. This guards against bit flips and overwrites caused by faulty NICs, buffers, and the like.

Split brain: a temporary network failure causes both nodes to promote themselves to Primary. When the two sides reconnect, DRBD can send an email notification; handling the situation manually is recommended.
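DRBD can also be told how to recover from split brain automatically, via the `net` section of the resource configuration. A hedged sketch of the relevant options (these values are illustrative, not taken from this setup):

```
net {
  after-sb-0pri discard-zero-changes;  # no primaries: drop the side with no new data
  after-sb-1pri discard-secondary;     # one primary: the secondary's changes lose
  after-sb-2pri disconnect;            # two primaries: give up, require manual repair
}
```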

When data sits in a write cache and has not actually reached the disk, a system crash loses it. A disk flush, by contrast, returns only after the data is truly on disk.

Some disk controllers, such as Dell PERC RAID cards with a battery backup unit, have both their own cache and their own battery, and will write the last cached data to disk after an unexpected power loss or crash. With such controllers, disk flushes can safely be disabled, improving performance without sacrificing data safety.

Disk error handling strategies:

Pass the error to the upper layers: may cause the filesystem to be remounted read-only; not recommended.

Mask the error from the upper layers: reads and writes are served from the corresponding blocks on the peer, and applications are not interrupted; the failed disk can be replaced at any convenient time.

Inconsistent data: data that cannot be accessed or used in any way, e.g. the data on the target node while synchronization is in progress; it cannot be identified, cannot be mounted, and cannot even pass the disk's auto-detection.

Outdated data: data on the secondary that is internally consistent but no longer up to date with the primary, e.g. the secondary's data after the network between primary and secondary is interrupted.

DRBD provides interfaces that let an application mark the secondary's data as outdated when the network fails; DRBD then refuses to promote that node to the Primary role. These management interfaces are fully integrated with the Heartbeat framework.

Once the replication link of an outdated resource comes back, its outdated flag is cleared automatically and a background synchronization begins.

How It Works

Using DRBD in a high-availability (HA) cluster can replace a shared disk array: because the data exists on both the local and the remote host,

on failover the remote host simply uses its own copy of the data and carries on serving.

DRBD's data path looks like this:

            +----------------+
            |   filesystem   |
            +----------------+
                    |
                    v
            +----------------+
            |  block device  |
            |  (/dev/drbd1)  |
            +----------------+
               |          |
               v          v
  +--------------+    +-------------------+
  |  local disk  |    | remote host disk  |
  | (/dev/hdb1)  |    |   (/dev/hdb1)     |
  +--------------+    +-------------------+

 

NFS Overview

NFS stands for Network File System.

The network file system is one of the filesystems supported by FreeBSD, also known as NFS. NFS allows a system to share directories and files with others over a network. With NFS, users and programs can access files on remote systems as if they were local.

Benefits of NFS

The most obvious benefits of NFS are:

1. Local workstations use less disk space, because commonly used data can be stored on a single machine and still be accessible over the network.

2. Users do not need a home directory on every machine on the network; home directories can live on the NFS server and be available everywhere.

3. Storage devices such as floppy drives, CD-ROMs, and Zip drives (high-density removable disk drives) can be used by other machines on the network, reducing the number of removable-media devices across the network.

NFS Components
NFS has at least two main parts: a server and one or more clients. The clients remotely access the data stored on the server. For this to work, a few processes have to be configured and running.
Practical Uses
NFS has many practical uses. Some of the more common ones:

1. Several machines sharing one CD-ROM or other drive. This is cheaper and more convenient for installing software on multiple machines.

2. On large networks, it can be convenient to set up a central NFS server to hold all user home directories. These can be exported to the network so that users always get the same home directory no matter which workstation they log in to.

3. Several machines can share a common /usr/ports/distfiles directory. When you need to install a port on several machines, you can access the source quickly without downloading it on each one.

Configuring NFS
Configuring NFS is relatively straightforward; it only requires a few simple edits to /etc/rc.conf.

1. On the NFS server side, make sure the following options are set in /etc/rc.conf:

rpcbind_enable="YES"

nfs_server_enable="YES"

mountd_flags="-r"

mountd runs automatically whenever the NFS server is enabled.

2. On the client side, make sure this option is present in /etc/rc.conf:

nfs_client_enable="YES"

The /etc/exports file specifies which filesystems NFS should export (sometimes called "share"). Each line in /etc/exports specifies one exported filesystem and which machines may access it. Along with the machines, access options may also be specified.
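A minimal sketch of an /etc/exports entry in FreeBSD's syntax (the path and hostnames are illustrative):

```
# export /home read-write to two named clients, mapping remote root to root
/home   -maproot=root   client1.example.com client2.example.com
```

Note this is the FreeBSD form; Linux uses a different `path host(options)` form, which appears later in this article.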

 

 

Experiment topology:

(topology diagram not preserved)

 

Part 1: Preparation

1.1 Configure node1's address:

(screenshots of node1's network configuration not preserved)

1.2 Carve out a new partition for DRBD to use

[root@node1 drbd.d]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1288    10241437+  83  Linux
/dev/sda3            1289        1353      522112+  82  Linux swap / Solaris
/dev/sda4            1354        2610    10096852+  83  Linux
[root@node1 drbd.d]# fdisk /dev/sda

The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n
You must delete some partition and add an extended partition first

Command (m for help): d
Partition number (1-4): 4

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
e
Selected partition 4
First cylinder (1354-2610, default 1354):
Using default value 1354
Last cylinder or +size or +sizeM or +sizeK (1354-2610, default 2610): +1G

Command (m for help): n
First cylinder (1354-1476, default 1354):
Using default value 1354
Last cylinder or +size or +sizeM or +sizeK (1354-1476, default 1476): +1G

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: 设备或资源忙.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

[root@node1 drbd.d]#
[root@node1 drbd.d]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1288    10241437+  83  Linux
/dev/sda3            1289        1353      522112+  82  Linux swap / Solaris
/dev/sda4            1354        1476      987997+   5  Extended
/dev/sda5            1354        1476      987966   83  Linux

[root@node1 drbd.d]# partprobe /dev/sda
[root@node1 drbd.d]#  cat /proc/partitions
major minor  #blocks  name

   8     0   20971520 sda
   8     1     104391 sda1
   8     2   10241437 sda2
   8     3     522112 sda3
   8     4          0 sda4
   8     5     987966 sda5

 

 

1.4 Configure node2's address:

(screenshot not preserved)

[root@node2 ~]# hostname
node2.a.com

1.5 Carve out new partitions for DRBD to use

[root@node2 ~]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): p

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
e
Partition number (1-4): 1
First cylinder (1-2610, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-2610, default 2610): +1000M

Command (m for help): n
Command action
   l   logical (5 or over)
   p   primary partition (1-4)
p
Partition number (1-4): 1
Partition 1 is already defined.  Delete it before re-adding it.

Command (m for help): n
Command action
   l   logical (5 or over)
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (124-2610, default 124):
Using default value 124
Last cylinder or +size or +sizeM or +sizeK (124-2610, default 2610): +1000M

Command (m for help): n
Command action
   l   logical (5 or over)
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (247-2610, default 247):
Using default value 247
Last cylinder or +size or +sizeM or +sizeK (247-2610, default 2610): +1000M

Command (m for help): n
Command action
   l   logical (5 or over)
   p   primary partition (1-4)
p
Selected partition 4
First cylinder (370-2610, default 370):
Using default value 370
Last cylinder or +size or +sizeM or +sizeK (370-2610, default 2610): +1000M

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@node2 ~]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        2610    20860402+  8e  Linux LVM

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         123      987966    5  Extended
/dev/sdb2             124         246      987997+  83  Linux
/dev/sdb3             247         369      987997+  83  Linux
/dev/sdb4             370         492      987997+  83  Linux
[root@node2 ~]# partprobe /dev/sdb
[root@node2 ~]# cat  /proc/partitions
major minor  #blocks  name

   8     0   20971520 sda
   8     1     104391 sda1
   8     2   20860402 sda2
   8    16   20971520 sdb
   8    17          0 sdb1
   8    18     987997 sdb2
   8    19     987997 sdb3
   8    20     987997 sdb4
253     0   19791872 dm-0
253     1    1048576 dm-1
[root@node2 ~]#


1.6 Set up SSH key authentication between node1 and node2, so that each node can run commands on the other directly:

On node1:

[root@node1 ~]# ssh-keygen -t rsa

[root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node2.a.com

On node2:

[root@node2 ~]# ssh-keygen -t rsa

[root@node2 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node1.a.com

Part 2: DRBD Installation and Configuration

Do the following on both node1 and node2:

The packages I downloaded (placed in /root/):

drbd83-8.3.8-1.el5.centos.i386.rpm

kmod-drbd83-8.3.8-1.el5.centos.i686.rpm

2.1 Install the DRBD packages

[root@node1 ~]# rpm -ivh drbd83-8.3.8-1.el5.centos.i386.rpm

[root@node1 ~]# rpm -ivh kmod-drbd83-8.3.8-1.el5.centos.i686.rpm

[root@node2 ~]# rpm -ivh drbd83-8.3.8-1.el5.centos.i386.rpm

[root@node2 ~]# rpm -ivh kmod-drbd83-8.3.8-1.el5.centos.i686.rpm

2.2 Load the DRBD module

[root@node1 ~]# modprobe drbd

[root@node1 ~]# lsmod | grep drbd

drbd 228528 0

[root@node1 ~]#

[root@node2 ~]# modprobe drbd

[root@node2 ~]# lsmod | grep drbd

drbd 228528 0

[root@node2 ~]#

2.3 Edit the configuration files

At runtime, DRBD reads the configuration file /etc/drbd.conf, which describes the mapping between DRBD devices and disk partitions.

2.3.1 On node1:

[root@node1 ~]# cd /etc/drbd.d/

[root@node1 drbd.d]# ll

total 4

-rwxr-xr-x 1 root root 1418 Jun 4 2010 global_common.conf

[root@node1 drbd.d]# cp global_common.conf global_common.conf.bak

Edit the global configuration file: vim global_common.conf   (screenshot not preserved)
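Since the screenshot is lost, here is a hedged sketch of what a DRBD 8.3 global_common.conf typically contains (the values are illustrative, not necessarily what was used here):

```
global {
  usage-count no;            # opt out of the online usage counter
}
common {
  protocol C;                # synchronous replication, as discussed above
  syncer { rate 100M; }      # cap background resync bandwidth
}
```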

Edit the resource configuration file: vim /etc/drbd.d/nfs.res   (screenshot not preserved)
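The nfs.res screenshot is lost. Based on details that appear later in this article (/dev/drbd0, /dev/sda5 on node1, node2 at 192.168.145.99, hosts node1.a.com/node2.a.com), a plausible reconstruction is the following; node1's IP and node2's backing partition are assumptions:

```
resource nfs {
  on node1.a.com {
    device    /dev/drbd0;
    disk      /dev/sda5;
    address   192.168.145.100:7789;   # node1's IP is an assumption
    meta-disk internal;
  }
  on node2.a.com {
    device    /dev/drbd0;
    disk      /dev/sdb2;              # which sdb partition was used is not shown
    address   192.168.145.99:7789;    # node2's IP, from the ssh output below
    meta-disk internal;
  }
}
```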

2.3.2 Copy the configuration to node2:
[root@node1 drbd.d]# scp /etc/drbd.conf node2.a.com:/etc/
drbd.conf                                                                                  100%  133     0.1KB/s   00:00   
[root@node1 drbd.d]# scp /etc/drbd.d/* node2.a.com:/etc/drbd.d/
global_common.conf                                                                         100% 1148     1.1KB/s   00:00   
global_common.conf.bak                                                                     100% 1418     1.4KB/s   00:00   
nfs.res                                                                                    100%  352     0.3KB/s   00:00   

2.4 Validate the configuration

[root@node1 drbd.d]# drbdadm   adjust nfs
[root@node1 drbd.d]# drbdadm   create-md nfs
Device '0' is configured!
Command 'drbdmeta 0 v08 /dev/sda5 internal create-md' terminated with exit code 20
drbdadm create-md nfs: exited with code 20
[root@node1 drbd.d]#

2.5 Create the metadata for the nfs resource

2.5.1 On node1:

[root@node1 drbd.d]# drbdadm create-md nfs

Device '0' is configured!
Command 'drbdmeta 0 v08 /dev/sda5 internal create-md' terminated with exit code 20
drbdadm create-md nfs: exited with code 20

2.5.2 On node2:

[root@node1 drbd.d]#  ssh node2.a.com 'drbdadm create-md nfs'
NOT initialized bitmap
Writing meta data...
initializing activity log
New drbd meta data block successfully created.
[root@node1 drbd.d]#

2.6 Start the DRBD service

[root@node1 ~]# service drbd start
Starting DRBD resources: drbdsetup 0 show:5: delay-probe-volume 0k => 0k out of range [4..1048576]k.
[root@node1 ~]# ssh node2.a.com 'service drbd start'
The authenticity of host 'node2.a.com (192.168.145.99)' can't be established.
RSA key fingerprint is b5:21:65:8a:4b:b1:5a:71:92:81:cc:89:06:56:9a:77.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2.a.com,192.168.145.99' (RSA) to the list of known hosts.
root@node2.a.com's password:
Starting DRBD resources: drbdsetup 0 show:5: delay-probe-volume 0k => 0k out of range [4..1048576]k.
[root@node1 ~]#

2.7 Check the DRBD status and enable the service at boot

[root@node1 drbd.d]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
m:res  cs            ro                 ds                     p  mounted  fstype
0:nfs  WFConnection  Secondary/Unknown  Inconsistent/DUnknown  C
[root@node1 drbd.d]#  ssh node2.a.com 'service drbd status'
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
m:res  cs            ro  ds  p  mounted  fstype
0:nfs  Unconfigured
[root@node1 drbd.d]#  drbd-overview
  0:nfs  WFConnection Secondary/Unknown Inconsistent/DUnknown C r----
[root@node1 drbd.d]# ssh node2.a.com 'drbd-overview'
  0:nfs  Unconfigured . . . .
[root@node1 drbd.d]#  chkconfig drbd on
[root@node1 drbd.d]# ssh node2.a.com 'chkconfig drbd on'
[root@node1 drbd.d]#  ssh node2.a.com 'chkconfig --list drbd'
drbd               0:关闭    1:关闭    2:启用    3:启用    4:启用    5:启用    6:关闭
[root@node1 drbd.d]#

2.8 On the primary node node1, promote the device, create a filesystem, mount it, and check the status.

[root@node1 drbd.d]# mkdir /mnt/nfs

[root@node1 drbd.d]# ssh node2.a.com 'mkdir /mnt/nfs'

[root@node1 drbd.d]# drbdsetup /dev/drbd0 primary -o

[root@node1 drbd.d]# mkfs.ext3 /dev/drbd0

[root@node1 drbd.d]# mount /dev/drbd0 /mnt/nfs/

[root@node1 drbd.d]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
m:res  cs            ro               ds                 p  mounted   fstype
0:nfs  WFConnection  Primary/Unknown  UpToDate/Outdated  C  /mnt/nfs  ext3
[root@node1 drbd.d]#  drbd-overview
  0:nfs  WFConnection Primary/Unknown UpToDate/Outdated C r---- /mnt/nfs ext3 950M 18M 885M 2%
[root@node1 drbd.d]#  ssh node2.a.com 'service drbd status'
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
m:res  cs            ro  ds  p  mounted  fstype
0:nfs  Unconfigured
[root@node1 drbd.d]# ssh node2.a.com 'drbd-overview'
  0:nfs  Unconfigured . . . .
[root@node1 drbd.d]# ll /mnt/nfs
总计 16
drwx------ 2 root root 16384 10-17 15:02 lost+found
[root@node1 drbd.d]#

DRBD is now configured successfully!

Part 3: NFS Configuration

On both servers, edit the NFS exports file and the NFS init script, as follows.

On node1:

vim /etc/exports   (screenshot not preserved)
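The exported directory is the DRBD mount point /mnt/nfs; a plausible reconstruction of the lost /etc/exports screenshot (the client network is assumed from the 192.168.145.0/24 addresses used elsewhere in this article):

```
/mnt/nfs   192.168.145.0/255.255.255.0(rw,sync,no_root_squash)
```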
[root@node1 ~]# chkconfig portmap on
[root@node1 ~]# chkconfig  --list portmap
portmap            0:关闭    1:关闭    2:启用    3:启用    4:启用    5:启用    6:关闭
[root@node1 ~]# service portmap start
启动 portmap:                                             [确定]
[root@node1 ~]# chkconfig nfs on
[root@node1 ~]# chkconfig  --list nfs
nfs                0:关闭    1:关闭    2:启用    3:启用    4:启用    5:启用    6:关闭
[root@node1 ~]# service nfs start
启动 NFS 服务:                                            [确定]
关掉 NFS 配额:                                            [确定]
启动 NFS 守护进程:                                        [确定]
启动 NFS mountd:                                          [确定]
[root@node1 ~]#

vim /etc/init.d/nfs   (screenshot not preserved)
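In DRBD+NFS failover setups of this kind, the init script is commonly edited so that nfsd is killed hard on stop, forcing clients to re-establish their connections cleanly after a takeover. The lost screenshot most likely showed a change along these lines (a guess; the exact edit is not preserved):

```
# in the stop section of /etc/init.d/nfs, change the nfsd kill signal:
#   before:  killproc nfsd -2
#   after:   killproc nfsd -9
```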

 

On node2:

[root@node2 ~]# vim /etc/exports   (same entry as on node1; screenshot not preserved)
[root@node2 ~]# chkconfig portmap on
[root@node2 ~]# chkconfig  --list portmap
portmap            0:关闭    1:关闭    2:启用    3:启用    4:启用    5:启用    6:关闭
[root@node2 ~]# service protmap start
protmap: 未被识别的服务
[root@node2 ~]# service portmap start
启动 portmap:                                             [确定]
[root@node2 ~]# chkconfig nfs on
[root@node2 ~]# chkconfig  --list nfs
nfs                0:关闭    1:关闭    2:启用    3:启用    4:启用    5:启用    6:关闭
[root@node2 ~]# service nfs start
启动 NFS 服务:                                            [确定]
关掉 NFS 配额:                                            [确定]
启动 NFS 守护进程:                                        [确定]
启动 NFS mountd:                                          [确定]
[root@node2 ~]# vim /etc/init.d/nfs   (same edit as on node1; screenshot not preserved)
[root@node2 ~]#

Part 4: Heartbeat Configuration

Do the following on server1 and server2 (node1 and node2 are renamed server1 and server2 later in this section):

1. Install the Heartbeat packages

[root@node1 ~]# yum localinstall -y heartbeat-2.1.4-9.el5.i386.rpm heartbeat-pils-2.1.4-10.el5.i386.rpm heartbeat-stonith-2.1.4-10.el5.i386.rpm libnet-1.1.4-3.el5.i386.rpm perl-MailTools-1.77-1.el5.noarch.rpm --nogpgcheck

[root@node2 ~]# yum localinstall -y heartbeat-2.1.4-9.el5.i386.rpm heartbeat-pils-2.1.4-10.el5.i386.rpm heartbeat-stonith-2.1.4-10.el5.i386.rpm libnet-1.1.4-3.el5.i386.rpm perl-MailTools-1.77-1.el5.noarch.rpm --nogpgcheck

2. Copy the sample configuration files

[root@node1 ~]# cd /usr/share/doc/heartbeat-2.1.4/

[root@node1 heartbeat-2.1.4]# cp authkeys ha.cf haresources /etc/ha.d/

[root@node1 heartbeat-2.1.4]# cd /etc/ha.d/

vim /etc/ha.d/ha.cf

(screenshots of ha.cf not preserved)
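The ha.cf screenshots are lost; this is a hedged sketch of the directives such a two-node Heartbeat 2.x setup typically needs (the interface, timers, and node names are assumptions based on the rest of the article):

```
logfile /var/log/ha-log
keepalive 2                 # heartbeat interval, in seconds
deadtime 30                 # declare the peer dead after 30 s of silence
bcast eth0                  # send heartbeats over eth0 broadcast
auto_failback on            # move resources back when the primary recovers
node server1.a.com
node server2.a.com
```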

vim /etc/ha.d/haresources

(screenshot not preserved)
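The haresources screenshot is lost. Given the virtual IP 192.168.145.103, the DRBD resource nfs, the mount point /mnt/nfs, and the killnfsd helper created below, the line was most likely of this form (a reconstruction, not verbatim):

```
server1.a.com IPaddr::192.168.145.103/24/eth0 drbddisk::nfs Filesystem::/dev/drbd0::/mnt/nfs::ext3 killnfsd
```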

vim /etc/ha.d/authkeys

(screenshot not preserved)
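authkeys holds the shared key that authenticates the heartbeat link; the screenshot is lost, but the file has this general form (the method and key below are illustrative):

```
auth 1
1 sha1 SomeSharedSecretKey
```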

[root@node1 heartbeat-2.1.4]# cd /etc/ha.d/
[root@node1 ha.d]# echo "killall -9 nfsd; /etc/init.d/nfs restart; exit 0" >>resource.d/killnfsd
[root@node1 ha.d]# chmod 600 /etc/ha.d/authkeys
[root@node1 ha.d]# chmod 755 /etc/ha.d/resource.d/killnfsd
[root@node1 ha.d]# scp ha.cf authkeys  haresources
haresources: Not a directory
[root@node1 ha.d]# hostname server1.a.com
[root@node1 ha.d]# vim /hosts
[root@node1 ha.d]# vim /etc/hosts
[root@node1 ha.d]# vim /etc/sysconfig/net
netconsole       network          networking/      network-scripts/
[root@node1 ha.d]# vim /etc/sysconfig/network
[root@node1 ha.d]# scp ha.cf authkeys  haresources server2.a.com:/etc/ha.d/
The authenticity of host 'server2.a.com (192.168.145.102)' can't be established.
RSA key fingerprint is 0c:84:f6:dd:8b:bd:9c:5b:b0:4c:cc:5c:6a:a6:32:f5.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'server2.a.com,192.168.145.102' (RSA) to the list of known hosts.
root@server2.a.com's password:
ha.cf                                                                        100%   10KB  10.3KB/s   00:00   
authkeys                                                                     100%  643     0.6KB/s   00:00   
haresources                                                                  100% 6014     5.9KB/s   00:00   
[root@server1 ha.d]# vim /hosts

[root@server1 ~]# cd /etc/ha.d/
[root@server1 ha.d]# scp resource.d/killnfsd server2.a.com:/etc/ha.d/resource.d/killnfsd
root@server2.a.com's password:
killnfsd                                                                     100%   49     0.1KB/s   00:00   
[root@server1 ha.d]#

On server2.a.com:

vim ha.cf   (same contents as on server1; screenshot not preserved)

Configure server1's address:

(screenshot not preserved)

[root@server1 ha.d]# service heartbeat restart
Stopping High-Availability services:
                                                           [确定]
Waiting to allow resource takeover to complete:
                                                           [确定]
Starting High-Availability services:
2012/10/10_03:39:13 INFO:  Resource is stopped
                                                           [确定]

[root@server1 ha.d]# ifconfig eth0:0
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:13:81:77 
          inet addr:192.168.145.103  Bcast:192.168.145.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:193 Base address:0x2000

 

Part 5: Testing

1. On a test client, mount 192.168.145.103:/mnt/nfs at the local /data directory:

[root@client ~]# mkdir /data

[root@client ~]# mount 192.168.145.103:/mnt/nfs/ /data/

[root@client ~]# cd /data/

[root@client data]# ll

total 20

-rw-r--r-- 1 root root 4 Feb 8 17:41 f1

drwx------ 2 root root 16384 Feb 8 14:57 lost+found

[root@client data]# touch f-client-1

[root@client data]# ll

total 20

-rw-r--r-- 1 root root 0 Feb 8 19:50 f-client-1

-rw-r--r-- 1 root root 4 Feb 8 17:41 f1

drwx------ 2 root root 16384 Feb 8 14:57 lost+found

[root@client data]# cd

[root@client ~]#

2. On the test client, create a test shell script that probes the mount once per second:

(screenshot of the script not preserved)
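The script itself was only shown as a screenshot; judging from the output it produces in step 4 below, a reconstruction along these lines fits (the function name and error branch are my additions):

```shell
#!/bin/sh
# nfs.sh -- hypothetical reconstruction of the test script: touch a file on
# the NFS mount and log each attempt, so a failover gap shows up in the
# output as a pause or an error instead of a silent hang.

probe() {
    # one touch attempt against the directory given as $1
    echo "---> trying touch x : $(date)"
    touch "$1/x" || { echo "!! touch failed : $(date)"; return 1; }
    echo "<----- done touch x : $(date)"
}

# Uncomment to run once per second against the NFS mount:
# while true; do probe /data; sleep 1; done
```

Running it during the failover in step 3 shows whether NFS service is interrupted from the client's point of view.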

3. Stop the heartbeat service on the primary node node1; the standby node node2 takes over the service:

[root@node1 ha.d]# service heartbeat stop

Stopping High-Availability services:

[ OK ]

[root@node1 ha.d]# drbd-overview

0:nfs Connected Secondary/Primary UpToDate/UpToDate C r----

[root@node1 ha.d]# ifconfig eth0:0

eth0:0 Link encap:Ethernet HWaddr 00:0C:29:AE:83:D1

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Interrupt:67 Base address:0x2000

[root@node1 ha.d]#

[root@node2 ha.d]# drbd-overview

0:nfs Connected Primary/Secondary UpToDate/UpToDate C r---- /mnt/nfs ext3 950M 18M 885M 2%

[root@node2 ha.d]# ifconfig eth0:0

eth0:0 Link encap:Ethernet HWaddr 00:0C:29:D1:D4:32

inet addr:192.168.145.103 Bcast:192.168.145.254 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Interrupt:67 Base address:0x2000

[root@node2 ha.d]#

4. The nfs.sh test script running on the client keeps printing output like this throughout the failover:

[root@client ~]# ./nfs.sh

---> trying touch x : Wed Feb 8 20:00:58 CST 2012

<----- done touch x : Wed Feb 8 20:00:58 CST 2012

---> trying touch x : Wed Feb 8 20:00:59 CST 2012

<----- done touch x : Wed Feb 8 20:00:59 CST 2012

---> trying touch x : Wed Feb 8 20:01:00 CST 2012

<----- done touch x : Wed Feb 8 20:01:00 CST 2012

---> trying touch x : Wed Feb 8 20:01:01 CST 2012

<----- done touch x : Wed Feb 8 20:01:01 CST 2012

---> trying touch x : Wed Feb 8 20:01:02 CST 2012

<----- done touch x : Wed Feb 8 20:01:02 CST 2012

---> trying touch x : Wed Feb 8 20:01:03 CST 2012

<----- done touch x : Wed Feb 8 20:01:03 CST 2012

---> trying touch x : Wed Feb 8 20:01:04 CST 2012

<----- done touch x : Wed Feb 8 20:01:04 CST 2012

---> trying touch x : Wed Feb 8 20:01:05 CST 2012

<----- done touch x : Wed Feb 8 20:01:05 CST 2012

5. The client's mount information shows the disk is still usable:

[root@client ~]# mount

/dev/sda2 on / type ext3 (rw)

proc on /proc type proc (rw)

sysfs on /sys type sysfs (rw)

devpts on /dev/pts type devpts (rw,gid=5,mode=620)

/dev/sda1 on /boot type ext3 (rw)

tmpfs on /dev/shm type tmpfs (rw)

none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)

nfsd on /proc/fs/nfsd type nfsd (rw)

192.168.101.210:/mnt/nfs on /data type nfs (rw,addr=192.168.101.210)

[root@client ~]#

[root@client ~]# ll /data/

total 20

-rw-r--r-- 1 root root 0 Feb 8 19:50 f-client-1

-rw-r--r-- 1 root root 4 Feb 8 17:41 f1

drwx------ 2 root root 16384 Feb 8 14:57 lost+found

[root@client ~]#

At this point node2 has taken over the service successfully and the setup does what it should; you can also create files in the NFS-mounted directory by hand while switching the DRBD service back and forth between node1 and node2 to test.

6. Restore node1 as the primary node

[root@node1 ha.d]# service heartbeat start

Starting High-Availability services:

2012/02/08_20:04:49 INFO: Resource is stopped

[ OK ]

[root@node1 ha.d]# drbd-overview

0:nfs Connected Primary/Secondary UpToDate/UpToDate C r---- /mnt/nfs ext3 950M 18M 885M 2%

[root@node1 ha.d]#

DRBD+Heartbeat+NFS is now working end to end!