MHA+keepalive高可用环境搭建

MHA+keepalive高可用环境搭建

2017年02月17日 14:05:57

阅读数:2582

MHA(Master HighAvailability)目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton(现就职于Facebook公司)开发,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。在MySQL故障切换过程中,MHA能做到在0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。

该软件由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点)。

MHA Node运行在每台MySQL服务器上,MHAManager会定时探测集群中的master节点,当master出现故障时,它可以自动将最新数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。

实验环境

MHA+keepalive高可用环境搭建

10.192.203.201, 10.192.203.101上面安装了keepalive,vip是108

实验步骤

 

1 修改/etc/hosts

在mha管理节点和数据节点所在服务器上添加这些服务器ip和主机名对应关系。

如:

cat /etc/hosts

10.192.203.201 pc2

10.192.203.101 slave1

10.192.203.102 PC

2 Mysql主从复制环境搭建

搭建过程略,可以参考参考:http://blog.****.net/yabingshi_tech/article/details/45192599

要确保两个从库设置read_only。

要确保master和备选master为主主互备模式,否则后面配置过程有可能会发生错误。

 

3 配置主机信任关系 

在所有节点生成密码文件,然后将其拷贝到本机及其他服务器上,这里以10.192.203.201为例:

 # ssh-******

# ssh-copy-id  [email protected]

 # ssh-copy-id  [email protected]

 # ssh-copy-id  [email protected]  

然后ssh验证下,是否可以免密码登录。

 

4 安装MHA

点击这里进行下载:

http://download.****.net/download/yabignshi/8974251

http://download.****.net/detail/yabignshi/8974265 

在所有数据节点上安装:

yum install perl-DBD-MySQL -y

rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm

 安装完成后会在/usr/bin目录下生成以下脚本文件(这些工具通常由MHAManager的脚本触发,无需人为操作):

save_binary_logs              //保存和复制master的二进制日志
apply_diff_relay_logs          //识别差异的中继日志事件并将其差异的事件应用于其他的slave
filter_mysqlbinlog             //去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
purge_relay_logs               //清除中继日志(不会阻塞SQL线程)

在管理节点上安装:

yum install perl-DBD-MySQL -y(由于本实验环境管理节点和数据节点部署在了同一台服务器上,所以这里不用重复安装)

yum install perl-Config-Tiny -y

yum install epel-release -y

yum install perl-Log-Dispatch -y

yum install perl-Parallel-ForkManager -y

rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm(由于本实验环境管理节点和数据节点部署在了同一台服务器上,所以这里不用重复安装)

rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm 

安装完成后会在/usr/bin目录下生成以下脚本文件:

 

 
  1. -rwxr-xr-x. 1 root root 1995 Apr 1 2014 masterha_check_repl

  2. -rwxr-xr-x. 1 root root 1779 Apr 1 2014 masterha_check_ssh

  3. -rwxr-xr-x. 1 root root 1865 Apr 1 2014 masterha_check_status

  4. -rwxr-xr-x. 1 root root 3201 Apr 1 2014 masterha_conf_host

  5. -rwxr-xr-x. 1 root root 2517 Apr 1 2014 masterha_manager

  6. -rwxr-xr-x. 1 root root 2165 Apr 1 2014 masterha_master_monitor

  7. -rwxr-xr-x. 1 root root 2373 Apr 1 2014 masterha_master_switch

  8. -rwxr-xr-x. 1 root root 5171 Apr 1 2014 masterha_secondary_check

  9. -rwxr-xr-x. 1 root root 1739 Apr 1 2014 masterha_stop

  10. -rwxr-xr-x. 1 root root 4807 Apr 1 2014 filter_mysqlbinlog

  11. -rwxr-xr-x. 1 root root 7525 Apr 1 2014 save_binary_logs

  12. -rwxr-xr-x. 1 root root 8261 Apr 1 2014 purge_relay_logs

  13. -rwxr-xr-x. 1 root root 16367 Apr 1 2014 apply_diff_relay_logs

 

5 安装配置keepalive

 

5.1 安装keepalive

在master和备选master上安装keepalive:

 

 
  1. yum install -y popt-devel

  2. cd /usr/local/src

  3. wgethttp://www.keepalived.org/software/keepalived-1.2.2.tar.gz

  4. tar zxvf keepalived-1.2.2.tar.gz

  5. cd keepalived-1.2.2

  6. ./configure --prefix=/

  7. make

  8. make install

 

5.2 修改配置文件

vi/etc/keepalived/keepalived.conf

master和备选master配置文件内容相同。

 

 
  1. #ConfigurationFile for keepalived

  2. global_defs {

  3. notification_email { ######定义接受邮件的邮箱

  4. [email protected]

  5. }

  6. notification_email_from [email protected] ######定义发送邮件的邮箱

  7. smtp_server mail.tuge.com

  8. smtp_connect_timeout 10

  9. }

  10. vrrp_instance vrrptest { ######定义vrrptest实例

  11. state BACKUP ######服务器状态

  12. interface eth0 ######使用的接口

  13. virtual_router_id 51 ######虚拟路由的标志,一组lvs的虚拟路由标识必须相同,这样才能切换

  14. priority 150 ######服务启动优先级,值越大,优先级越高,BACKUP 不能大于MASTER

  15. advert_int 1 ######服务器之间的存活检查时间

  16. authentication {

  17. auth_type PASS ######认证类型

  18. auth_pass ufsoft ######认证密码,一组lvs 服务器的认证密码必须一致

  19. }

  20. virtual_ipaddress { ######虚拟IP地址

  21. 10.192.203.108

  22. }

  23. }

这里master服务器的state不配置成MASTER,且配置的优先级一样,是期望在master宕机后再恢复时,不主动将MASTER状态抢过来,避免MySQL服务的波动。

这里没有配置vrrp_script,在后面会让mha实现vip的自动漂移。

5.3  vi /etc/sysconfig/iptables

#注意,在两台机器上都要修改。

添加:

-A INPUT-d 10.192.203.108/32 -j ACCEPT

-A INPUT-d 224.0.0.18 -j ACCEPT #添加VRRP通讯支持

注意:第一行中的10.192.203.108需要改成你自己的vip。

serviceiptables restart

5.4 启动keepalive

service keepalived start

分别执行ip addr命令,可以在其中一台机器上看到虚拟IP

5.5 测试

停止master服务器keepalived,检查VIP是否切换到备选master服务器(用ip addr命令验证即可);

6 配置Mha

 

6.1 添加管理账号

#在数据节点上执行以下操作

grant all privileges on *.* TO [email protected]'10.192.%' IDENTIFIED BY 'test'; 

flush privileges;

6.2 配置/etc/mha/app1.cnf

 #只在管理端做

 mkdir /etc/mha

 mkdir -p /var/log/mha/app1 

vi /etc/mha/app1.cnf

添加:

 

 
  1. [server default]

  2. manager_log=/var/log/mha/app1/manager.log

  3. manager_workdir=/var/log/mha/app1.log

  4. master_binlog_dir=/data/mysql/data

  5. master_ip_failover_script= /usr/bin/master_ip_failover

  6. master_ip_online_change_script=/usr/bin/master_ip_online_change

  7. report_script=/usr/bin/send_report

  8. user=mha

  9. password=test

  10. ping_interval=2

  11. repl_password=beijing

  12. repl_user=rep_user

  13. ssh_user=root

  14.  
  15. [server1]

  16. hostname=10.192.203.201

  17. port=3306

  18.  
  19. [server2]

  20. candidate_master=1

  21. check_repl_delay=0

  22. hostname=10.192.203.101

  23. port=3306

  24.  
  25. [server3]

  26. hostname=10.192.203.102

  27. port=3306

在server default中的配置,是三台数据节点共同的配置,也可以放到具体的server中进行定制。
/*
参数概念解释:

master_binlog_dir=/data/mysql/data #设置master 保存binlog的位置,以便MHA可以找到master的日志

master_ip_failover_script=/usr/bin/master_ip_failover         #设置自动failover时候的切换脚本

master_ip_online_change_script=/usr/bin/master_ip_online_change  #设置手动切换时候的切换脚本

report_script=/usr/bin/send_report                            //设置发生切换后发送的报警的脚本

ping_interval=2 #设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行failover

candidate_master=1 #设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中事件最新的slave

check_repl_delay=0 #默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master,因为对于这个slave的恢复需要花费很长时间,通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master

*/ 

6.3 master_ip_failover脚本代码

要想把keepalived服务引入MHA,我们只需要修改切换是触发的脚本文件master_ip_failover即可,在该脚本中添加在master发生宕机时对keepalived的处理。

在管理节点编辑脚本/usr/bin/master_ip_failover,修改后如下:

 

 
  1. #!/usr/bin/env perl

  2. use strict;

  3. use warnings FATAL => 'all';

  4.  
  5. use Getopt::Long;

  6.  
  7. my (

  8. $command, $ssh_user, $orig_master_host, $orig_master_ip,

  9. $orig_master_port, $new_master_host, $new_master_ip, $new_master_port

  10. );

  11.  
  12. my $vip = '10.192.203.108';

  13. my $ssh_start_vip ="/etc/init.d/keepalived start";

  14. my $ssh_stop_vip ="/etc/init.d/keepalived stop";

  15.  
  16. GetOptions(

  17. 'command=s' =>\$command,

  18. 'ssh_user=s' =>\$ssh_user,

  19. 'orig_master_host=s' => \$orig_master_host,

  20. 'orig_master_ip=s' =>\$orig_master_ip,

  21. 'orig_master_port=i' => \$orig_master_port,

  22. 'new_master_host=s' =>\$new_master_host,

  23. 'new_master_ip=s' =>\$new_master_ip,

  24. 'new_master_port=i' =>\$new_master_port,

  25. );

  26.  
  27. exit &main();

  28.  
  29. sub main {

  30.  
  31. print "\n\nIN SCRIPTTEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

  32.  
  33. if ( $command eq "stop" || $command eq "stopssh" ) {

  34.  
  35. my $exit_code = 1;

  36. eval {

  37. print "Disabling the VIP on old master: $orig_master_host \n";

  38. &stop_vip();

  39. $exit_code = 0;

  40. };

  41. if ([email protected]) {

  42. warn "Got Error: [email protected]\n";

  43. exit $exit_code;

  44. }

  45. exit $exit_code;

  46. }

  47. elsif ( $command eq "start" ) {

  48.  
  49. my $exit_code = 10;

  50. eval {

  51. print "Enabling the VIP - $vip on the new master - $new_master_host\n";

  52. &start_vip();

  53. $exit_code = 0;

  54. };

  55. if ([email protected]) {

  56. warn [email protected];

  57. exit $exit_code;

  58. }

  59. exit $exit_code;

  60. }

  61. elsif ( $command eq "status" ) {

  62. print "Checking the Status of the script.. OK \n";

  63. exit 0;

  64. }

  65. else {

  66. &usage();

  67. exit 1;

  68. }

  69. }

  70. sub start_vip() {

  71. `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;

  72. }

  73. # A simple system call that disable the VIPon the old_master

  74. sub stop_vip() {

  75. `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;

  76. }

  77.  
  78. sub usage {

  79. print

  80. "Usage: master_ip_failover --command=start|stop|stopssh|status--orig_master_host=host --orig_master_ip=ip --orig_master_port=port--new_master_host=host --new_master_ip=ip --new_master_port=port\n";

  81. }


注意:my $vip = '10.192.203.108';这一行中的vip请根据情况改成自己的虚拟IP。

chmod +x /usr/bin/master_ip_failover

6.4 master_ip_online_change脚本代码

在管理节点编辑脚本/usr/bin/master_ip_online_change

 

 
  1. #!/usr/bin/env perl

  2.  
  3. # Copyright (C) 2011 DeNA Co.,Ltd.

  4. #

  5. # This program is free software; you can redistribute it and/or modify

  6. # itunder the terms of the GNU General Public License as published by

  7. # the Free Software Foundation; either version 2 of the License, or

  8. # (at your option) any later version.

  9. #

  10. # This program is distributed in the hope that it will be useful,

  11. # but WITHOUT ANY WARRANTY; without even the implied warranty of

  12. # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

  13. # GNU General Public License for more details.

  14. #

  15. # You should have received a copy of the GNU General Public License

  16. # along with this program; if not, write to the Free Software

  17. # Foundation, Inc.,

  18. # 51Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

  19.  
  20. ## Note: This is a sample script and is notcomplete. Modify the script based on your environment.

  21.  
  22. use strict;

  23. use warnings FATAL => 'all';

  24.  
  25. use Getopt::Long;

  26. use MHA::DBHelper;

  27. use MHA::NodeUtil;

  28. use Time::HiRes qw( sleep gettimeofdaytv_interval );

  29. use Data::Dumper;

  30.  
  31. my $_tstart;

  32. my $_running_interval = 0.1;

  33. my (

  34. $command, $orig_master_is_new_slave, $orig_master_host,

  35. $orig_master_ip, $orig_master_port, $orig_master_user,

  36. $orig_master_password, $orig_master_ssh_user, $new_master_host,

  37. $new_master_ip, $new_master_port, $new_master_user,

  38. $new_master_password, $new_master_ssh_user

  39. );

  40. my $vip = '10.192.203.108/32';

  41. my $key = '1';

  42. my $ssh_start_vip = "/etc/init.d/keepalivedstart";

  43. my $ssh_stop_vip = "/etc/init.d/keepalivedstop";

  44. my $orig_master_ssh_port = 22;

  45. my $new_master_ssh_port = 22;

  46. GetOptions(

  47. 'command=s' =>\$command,

  48. 'orig_master_is_new_slave' => \$orig_master_is_new_slave,

  49. 'orig_master_host=s' =>\$orig_master_host,

  50. 'orig_master_ip=s' =>\$orig_master_ip,

  51. 'orig_master_port=i' =>\$orig_master_port,

  52. 'orig_master_user=s' =>\$orig_master_user,

  53. 'orig_master_password=s' =>\$orig_master_password,

  54. 'orig_master_ssh_user=s' =>\$orig_master_ssh_user,

  55. 'new_master_host=s' =>\$new_master_host,

  56. 'new_master_ip=s' =>\$new_master_ip,

  57. 'new_master_port=i' =>\$new_master_port,

  58. 'new_master_user=s' =>\$new_master_user,

  59. 'new_master_password=s' =>\$new_master_password,

  60. 'new_master_ssh_user=s' =>\$new_master_ssh_user,

  61. 'orig_master_ssh_port=i' =>\$orig_master_ssh_port,

  62. 'new_master_ssh_port=i' =>\$new_master_ssh_port,

  63. );

  64.  
  65. exit &main();

  66.  
  67. sub current_time_us {

  68. my( $sec, $microsec ) = gettimeofday();

  69. my$curdate = localtime($sec);

  70. return $curdate . " " . sprintf( "%06d", $microsec);

  71. }

  72.  
  73. sub sleep_until {

  74. my$elapsed = tv_interval($_tstart);

  75. if( $_running_interval > $elapsed ) {

  76. sleep( $_running_interval - $elapsed );

  77. }

  78. }

  79.  
  80. sub get_threads_util {

  81. my$dbh = shift;

  82. my$my_connection_id = shift;

  83. my$running_time_threshold = shift;

  84. my$type = shift;

  85. $running_time_threshold = 0 unless ($running_time_threshold);

  86. $type = 0 unless($type);

  87. [email protected];

  88.  
  89. my$sth = $dbh->prepare("SHOW PROCESSLIST");

  90. $sth->execute();

  91.  
  92. while ( my $ref = $sth->fetchrow_hashref() ) {

  93. my $id = $ref->{Id};

  94. my $user = $ref->{User};

  95. my $host = $ref->{Host};

  96. my $command =$ref->{Command};

  97. my $state = $ref->{State};

  98. my $query_time = $ref->{Time};

  99. my $info = $ref->{Info};

  100. $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);

  101. next if ( $my_connection_id == $id );

  102. next if ( defined($query_time) && $query_time <$running_time_threshold );

  103. next if ( defined($command) && $command eq "Binlog Dump" );

  104. next if ( defined($user) && $user eq "system user" );

  105. next

  106. if ( defined($command)

  107. && $command eq "Sleep"

  108. && defined($query_time)

  109. && $query_time >= 1 );

  110.  
  111. if ( $type >= 1 ) {

  112. next if ( defined($command) && $command eq "Sleep" );

  113. next if ( defined($command) && $command eq "Connect");

  114. }

  115.  
  116. if ( $type >= 2 ) {

  117. next if ( defined($info) && $info =~ m/^select/i );

  118. next if ( defined($info) && $info =~ m/^show/i );

  119. }

  120.  
  121. push @threads, $ref;

  122. }

  123. return @threads;

  124. }

  125.  
  126. sub main {

  127. if( $command eq "stop" ) {

  128. ## Gracefully killing connections on the current master

  129. #1. Set read_only= 1 on the new master

  130. #2. DROP USER so that no app user can establish new connections

  131. #3. Set read_only= 1 on the current master

  132. #4. Kill current queries

  133. #* Any database access failure will result in script die.

  134. my $exit_code = 1;

  135. eval {

  136. ## Setting read_only=1 on the new master (to avoid accident)

  137. my $new_master_handler = new MHA::DBHelper();

  138.  
  139. # args: hostname, port, user, password, raise_error(die_on_error)_or_not

  140. $new_master_handler->connect( $new_master_ip, $new_master_port,

  141. $new_master_user, $new_master_password, 1 );

  142. print current_time_us() . " Set read_only on the new master..";

  143. $new_master_handler->enable_read_only();

  144. if ( $new_master_handler->is_read_only() ) {

  145. print "ok.\n";

  146. }

  147. else {

  148. die "Failed!\n";

  149. }

  150. $new_master_handler->disconnect();

  151.  
  152. # Connecting to the orig master, die if any database error happens

  153. my $orig_master_handler = new MHA::DBHelper();

  154. $orig_master_handler->connect( $orig_master_ip, $orig_master_port,

  155. $orig_master_user, $orig_master_password, 1 );

  156.  
  157. ## Drop application user so that nobody can connect. Disablingper-session binlog beforehand

  158. $orig_master_handler->disable_log_bin_local();

  159. print current_time_us() . " Drpping appuser on the orig master..\n";

  160. #FIXME_xxx_drop_app_user($orig_master_handler);

  161.  
  162. ## Waiting for N * 100 milliseconds so that current connections can exit

  163. my $time_until_read_only = 15;

  164. $_tstart = [gettimeofday];

  165. my @threads = get_threads_util( $orig_master_handler->{dbh},

  166. $orig_master_handler->{connection_id} );

  167. while ( $time_until_read_only > 0 && $#threads >= 0 ) {

  168. if ( $time_until_read_only % 5 == 0 ) {

  169. printf

  170. "%s Waiting all running %d threads aredisconnected.. (max %d milliseconds)\n",

  171. current_time_us(), $#threads + 1, $time_until_read_only * 100;

  172. if ( $#threads < 5 ) {

  173. print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump ."\n"

  174. foreach (@threads);

  175. }

  176. }

  177. sleep_until();

  178. $_tstart = [gettimeofday];

  179. $time_until_read_only--;

  180. @threads = get_threads_util( $orig_master_handler->{dbh},

  181. $orig_master_handler->{connection_id} );

  182. }

  183.  
  184. ## Setting read_only=1 on the current master so that nobody(exceptSUPER) can write

  185. print current_time_us() . " Set read_only=1 on the orig master..";

  186. $orig_master_handler->enable_read_only();

  187. if ( $orig_master_handler->is_read_only() ) {

  188. print "ok.\n";

  189. }

  190. else {

  191. die "Failed!\n";

  192. }

  193.  
  194. ## Waiting for M * 100 milliseconds so that current update queries cancomplete

  195. my $time_until_kill_threads = 5;

  196. @threads = get_threads_util( $orig_master_handler->{dbh},

  197. $orig_master_handler->{connection_id} );

  198. while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {

  199. if ( $time_until_kill_threads % 5 == 0 ) {

  200. printf

  201. "%s Waiting all running %d queries aredisconnected.. (max %d milliseconds)\n",

  202. current_time_us(), $#threads + 1, $time_until_kill_threads * 100;

  203. if ( $#threads < 5 ) {

  204. print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump ."\n"

  205. foreach (@threads);

  206. }

  207. }

  208. sleep_until();

  209. $_tstart = [gettimeofday];

  210. $time_until_kill_threads--;

  211. @threads = get_threads_util( $orig_master_handler->{dbh},

  212. $orig_master_handler->{connection_id} );

  213. }

  214.  
  215. ## Terminating all threads

  216. print current_time_us() . " Killing all applicationthreads..\n";

  217. $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0);

  218. print current_time_us() . " done.\n";

  219. $orig_master_handler->enable_log_bin_local();

  220. $orig_master_handler->disconnect();

  221.  
  222. ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK

  223. eval {

  224. `ssh -p$orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host\" $ssh_stop_vip \"`;

  225. };

  226. if ([email protected]) {

  227. warn [email protected];

  228. }

  229. $exit_code = 0;

  230. };

  231. if ([email protected]) {

  232. warn "Got Error: [email protected]\n";

  233. exit $exit_code;

  234. }

  235. exit $exit_code;

  236. }

  237. elsif ( $command eq "start" ) {

  238. ## Activating master ip on the new master

  239. #1. Create app user with write privileges

  240. #2. Moving backup script if needed

  241. #3. Register new master's ip to the catalog database

  242.  
  243. # We don't return error even thoughactivating updatable accounts/ip failed so that we don't interrupt slaves'recovery.

  244. # If exit code is 0 or 10, MHA does notabort

  245. my $exit_code = 10;

  246. eval {

  247. my $new_master_handler = new MHA::DBHelper();

  248.  
  249. # args: hostname, port, user, password, raise_error_or_not

  250. $new_master_handler->connect( $new_master_ip, $new_master_port,

  251. $new_master_user, $new_master_password, 1 );

  252.  
  253. ## Set read_only=0 on the new master

  254. $new_master_handler->disable_log_bin_local();

  255. print current_time_us() . " Setread_only=0 on the new master.\n";

  256. $new_master_handler->disable_read_only();

  257.  
  258. ## Creating an app user on the new master

  259. print current_time_us() . " Creating app user on the newmaster..\n";

  260. #FIXME_xxx_create_app_user($new_master_handler);

  261. $new_master_handler->enable_log_bin_local();

  262. $new_master_handler->disconnect();

  263.  
  264. ## Update master ip on the catalog database, etc

  265. `ssh -p$new_master_ssh_port $new_master_ssh_user\@$new_master_host\" $ssh_start_vip \"`;

  266. $exit_code = 0;

  267. };

  268. if ([email protected]) {

  269. warn "Got Error: [email protected]\n";

  270. exit $exit_code;

  271. }

  272. exit $exit_code;

  273. }

  274. elsif ( $command eq "status" ) {

  275.  
  276. #do nothing

  277. exit 0;

  278. }

  279. else {

  280. &usage();

  281. exit 1;

  282. }

  283. }

  284.  
  285. sub usage {

  286. print

  287. "Usage: master_ip_online_change--command=start|stop|status --orig_master_host=host --orig_master_ip=ip--orig_master_port=port --new_master_host=host --new_master_ip=ip--new_master_port=port\n";

  288. die;

  289. }

注意:需要把my $vip = '10.192.203.108/32';改成自己的vip即可。

chmod +x /usr/bin/master_ip_online_change

6.5 send_report脚本代码

在管理节点编辑脚本/usr/bin/send_report

 

 
  1. #!/usr/bin/perl

  2.  
  3. # Copyright (C) 2011 DeNA Co.,Ltd.

  4. #

  5. # This program is free software; you can redistribute it and/or modify

  6. # itunder the terms of the GNU General Public License as published by

  7. # the Free Software Foundation; either version 2 of the License, or

  8. # (at your option) any later version.

  9. #

  10. # This program is distributed in the hope that it will be useful,

  11. # but WITHOUT ANY WARRANTY; without even the implied warranty of

  12. # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

  13. # GNU General Public License for more details.

  14. #

  15. # You should have received a copy of the GNU General Public License

  16. # along with this program; if not, write to the Free Software

  17. # Foundation, Inc.,

  18. # 51Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

  19.  
  20. ## Note: This is a sample script and is notcomplete. Modify the script based on your environment.

  21.  
  22. use strict;

  23. use warnings FATAL => 'all';

  24. use Mail::Sender;

  25. use Getopt::Long;

  26.  
  27. #new_master_host and new_slave_hosts areset only when recovering master succeeded

  28. my ( $dead_master_host, $new_master_host, $new_slave_hosts,$subject, $body );

  29. my $smtp='smtp.163.com';

  30. my $mail_from='[email protected]';

  31. my $mail_user='[email protected]';

  32. my $mail_pass='Password';

  33. my$mail_to=['[email protected]','[email protected]'];

  34. GetOptions(

  35. 'orig_master_host=s' => \$dead_master_host,

  36. 'new_master_host=s' =>\$new_master_host,

  37. 'new_slave_hosts=s' =>\$new_slave_hosts,

  38. 'subject=s' =>\$subject,

  39. 'body=s' => \$body,

  40. );

  41.  
  42. mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

  43.  
  44. sub mailToContacts {

  45. my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;

  46. open my $DEBUG, "> /tmp/monitormail.log"

  47. or die "Can't open the debug file:$!\n";

  48. my $sender = new Mail::Sender {

  49. ctype => 'text/plain;charset=utf-8',

  50. encoding => 'utf-8',

  51. smtp => $smtp,

  52. from => $mail_from,

  53. auth => 'LOGIN',

  54. TLS_allowed => '0',

  55. authid => $user,

  56. authpwd => $passwd,

  57. to => $mail_to,

  58. subject => $subject,

  59. debug => $DEBUG

  60. };

  61.  
  62. $sender->MailMsg(

  63. { msg => $msg,

  64. debug => $DEBUG

  65. }

  66. )or print $Mail::Sender::Error;

  67. return 1;

  68. }

  69.  
  70.  
  71.  
  72. # Do whatever you want here

  73.  
  74. exit 0;

  75.  
  76. 注意:需要修改下以下几行内容:

  77. my $smtp='smtp.163.com';

  78. my $mail_from='[email protected]';

  79. my $mail_user='[email protected]';

  80. my $mail_pass='Password';

  81. my$mail_to=['[email protected]','[email protected]'];

#f赋予执行权限

chmod +x /usr/bin/send_report 

注意:需要确保管理节点服务器可以正常发送邮件。可以先用sendEmail命令试下。

6.6 设置relay log的清除方式(在每个slave节点上)

从服务器配置文件要确保relay_log_purge=0,否则在masterha_check_repl时会报错warning,relay_log_purge=0 is not set on slave

主库将来发生切换变成一个从库后,记得在原先的主库上也执行该操作。

MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF,采用手动清除relay log的方式。

在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。

但是在MHA环境中,这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。

定期清除中继日志需要考虑到复制延时的问题。在ext3的文件系统下,删除大的文件需要一定的时间,会导致严重的复制延时。

为了避免复制延时,需要暂时为中继日志创建硬链接,因为在linux系统中通过硬链接删除大文件速度会很快。

(在mysql数据库中,删除大表时,通常也采用建立硬链接的方式)

 

7 检查mha manage是不是配置成功 

7.1 检查ssh登录 

 

 
  1. [[email protected]]# masterha_check_ssh--conf=/etc/mha/app1.cnf

  2. Mon Feb 13 15:35:25 2017 - [warning] Globalconfiguration file /etc/masterha_default.cnf not found. Skipping.

  3. Mon Feb 13 15:35:25 2017 - [info] Readingapplication default configuration from /etc/mha/app1.cnf..

  4. Mon Feb 13 15:35:25 2017 - [info] Readingserver configuration from /etc/mha/app1.cnf..

  5. Mon Feb 13 15:35:25 2017 - [info] StartingSSH connection tests..

  6. Mon Feb 13 15:35:26 2017 - [debug]

  7. Mon Feb 13 15:35:25 2017 - [debug] Connecting via SSH [email protected](10.192.203.201:22) [email protected](10.192.203.101:22)..

  8. Mon Feb 13 15:35:25 2017 - [debug] ok.

  9. Mon Feb 13 15:35:25 2017 - [debug] Connecting via SSH [email protected](10.192.203.201:22) [email protected](10.192.203.102:22)..

  10. Mon Feb 13 15:35:26 2017 - [debug] ok.

  11. Mon Feb 13 15:35:26 2017 - [debug]

  12. Mon Feb 13 15:35:25 2017 - [debug] Connecting via SSH [email protected](10.192.203.101:22) [email protected](10.192.203.201:22)..

  13. Mon Feb 13 15:35:25 2017 - [debug] ok.

  14. Mon Feb 13 15:35:25 2017 - [debug] Connecting via SSH [email protected](10.192.203.101:22) [email protected](10.192.203.102:22)..

  15. Mon Feb 13 15:35:26 2017 - [debug] ok.

  16. Mon Feb 13 15:35:27 2017 - [debug]

  17. Mon Feb 13 15:35:26 2017 - [debug] Connecting via SSH [email protected](10.192.203.102:22) [email protected](10.192.203.201:22)..

  18. Mon Feb 13 15:35:26 2017 - [debug] ok.

  19. Mon Feb 13 15:35:26 2017 - [debug] Connecting via SSH [email protected](10.192.203.102:22) [email protected](10.192.203.101:22)..

  20. Mon Feb 13 15:35:27 2017 - [debug] ok.

  21. Mon Feb 13 15:35:27 2017 - [info] All SSHconnection tests passed successfully.

假如ssh可以免密码登录其他几台服务器,但检查ssh时却报错:

 

 
  1. [[email protected] .ssh]# masterha_check_ssh--conf=/etc/mha/app1.cnf

  2. Fri Feb 10 10:04:35 2017 - [warning] Globalconfiguration file /etc/masterha_default.cnf not found. Skipping.

  3. Fri Feb 10 10:04:35 2017 - [info] Readingapplication default configuration from /etc/mha/app1.cnf..

  4. Fri Feb 10 10:04:35 2017 - [info] Readingserver configuration from /etc/mha/app1.cnf..

  5. Fri Feb 10 10:04:35 2017 - [info] StartingSSH connection tests..

  6. Fri Feb 10 10:04:36 2017 -[error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]

  7. Fri Feb 10 10:04:35 2017 - [debug] Connecting via SSH [email protected](10.192.203.201:22) [email protected](10.192.203.101:22)..

  8. Warning: Permanently added '10.192.203.201'(RSA) to the list of known hosts.

  9. Permission denied(publickey,gssapi-keyex,gssapi-with-mic,password).

  10. Fri Feb 10 10:04:35 2017 -[error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connectionfrom [email protected](10.192.203.201:22) [email protected](10.192.203.101:22) failed!

  11. Fri Feb 10 10:04:37 2017 - [debug]

  12. Fri Feb 10 10:04:36 2017 - [debug] Connecting via SSH [email protected](10.192.203.101:22) [email protected](10.192.203.201:22)..

  13. Fri Feb 10 10:04:36 2017 - [debug] ok.

  14. Fri Feb 10 10:04:36 2017 - [debug] Connecting via SSH [email protected](10.192.203.101:22) [email protected](10.192.203.102:22)..

  15. Fri Feb 10 10:04:37 2017 - [debug] ok.

  16. Fri Feb 10 10:04:38 2017 - [debug]

  17. Fri Feb 10 10:04:36 2017 - [debug] Connecting via SSH [email protected](10.192.203.102:22) [email protected](10.192.203.201:22)..

  18. Fri Feb 10 10:04:37 2017 - [debug] ok.

  19. Fri Feb 10 10:04:37 2017 - [debug] Connecting via SSH [email protected](10.192.203.102:22) to [email protected](10.192.203.101:22)..

  20. Fri Feb 10 10:04:38 2017 - [debug] ok.

  21. SSH Configuration Check Failed!

  22. at/usr/bin/masterha_check_ssh line 44

 需要检查下:

①   /etc/hosts文件是否配置正确;

② 密码文件是否ssh-copy-id到了自身服务器。

假如遇到这个错误,修正后,将.ssh下的内容全部清空,然后重新认证即可。

 

7.2 检查mysql replication是否配置成功

[[email protected] app1]# masterha_check_repl--conf=/etc/mha/app1.cnf

 

 
  1. Fri Feb 10 11:25:05 2017 - [warning] Globalconfiguration file /etc/masterha_default.cnf not found. Skipping.

  2. Fri Feb 10 11:25:05 2017 - [info] Readingapplication default configuration from /etc/mha/app1.cnf..

  3. Fri Feb 10 11:25:05 2017 - [info] Readingserver configuration from /etc/mha/app1.cnf..

  4. Fri Feb 10 11:25:05 2017 - [info]MHA::MasterMonitor version 0.56.

  5. Fri Feb 10 11:25:06 2017 -[error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln193] There is noalive slave. We can't do failover

  6. Fri Feb 10 11:25:06 2017 -[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Errorhappened on checking configurations. at/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326

  7. Fri Feb 10 11:25:06 2017 -[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Errorhappened on monitoring servers.

  8. Fri Feb 10 11:25:06 2017 - [info] Got exitcode 1 (Not master dead).

  9.  
  10. MySQL Replication Health is NOT OK!

明明我检查了复制状态都是正常的,为什么通不过呢?

解决办法:

需要确保master和备选master设置为双主复制。即让master也指向备选master。

 

再次验证,假如报错:

[[email protected] mysql]#  masterha_check_repl --conf=/etc/mha/app1.cnf

 

 
  1. Mon Feb 13 17:21:52 2017 - [info] Connecting [email protected](10.192.203.101:22)..

  2. Checking slave recovery environment settings..

  3. Opening /data/mysql/relay-log.info ... ok.

  4. Relay log found at /data/mysql, up to slave1-relay-bin.000011

  5. Temporary relay log file is /data/mysql/slave1-relay-bin.000011

  6. Testing mysql connection and privileges..ERROR 1045 (28000): Accessdenied for user 'mha'@'slave1' (using password: YES)

  7. mysql command failed with rc 1:0!

  8. at/usr/bin/apply_diff_relay_logs line 375

  9. main::check()called at /usr/bin/apply_diff_relay_logs line 497

  10. eval{...} called at /usr/bin/apply_diff_relay_logs line 475

  11. main::main()called at /usr/bin/apply_diff_relay_logs line 120

  12. Mon Feb 13 17:21:53 2017 -[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln205] Slavessettings check failed!

  13. Mon Feb 13 17:21:53 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm,ln413] Slave configuration failed.

  14. Mon Feb 13 17:21:53 2017 -[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Errorhappened on checking configurations. at/usr/bin/masterha_check_repl line 48

  15. Mon Feb 13 17:21:53 2017 -[error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Errorhappened on monitoring servers.

  16. Mon Feb 13 17:21:53 2017 - [info] Got exitcode 1 (Not master dead).

  17.  
  18. MySQL Replication Health is NOT OK!

解决办法:

在slave1上创建用户:

mysql> grant  ALL PRIVILEGES ON *.* TO 'mha'@'slave1'identified by 'test';

Query OK, 0 rows affected (0.00 sec) 

mysql> flush privileges;

Query OK, 0 rows affected (0.00 sec)

同理,在PC上创建用户:

mysql> grant  ALL PRIVILEGES ON *.* TO 'mha'@'PC' identifiedby 'test';

Query OK, 0 rows affected (0.00 sec) 

mysql> flush privileges;

Query OK, 0 rows affected (0.00 sec)

  

再次验证时,假如报错:

Can't exec "mysqlbinlog": No suchfile or directory at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line106.

mysqlbinlog version command failed with rc1:0, please verify PATH, LD_LIBRARY_PATH, and client options

解决办法:

在所有数据节点上都创建一下软连接:

 ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog

再次验证,报错:

Testing mysql connection andprivileges..sh: mysql: command not found

mysql command failed with rc 127:0!

解决办法:

在所有数据节点上建立软连接:

 ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql

最后,一切正常:

[[email protected] mysql]#  masterha_check_repl --conf=/etc/mha/app1.cnf

 

 
  1. Mon Feb 13 17:32:49 2017 - [warning] Globalconfiguration file /etc/masterha_default.cnf not found. Skipping.

  2. Mon Feb 13 17:32:49 2017 - [info] Readingapplication default configuration from /etc/mha/app1.cnf..

  3. Mon Feb 13 17:32:49 2017 - [info] Readingserver configuration from /etc/mha/app1.cnf..

  4. Mon Feb 13 17:32:49 2017 - [info]MHA::MasterMonitor version 0.56.

  5. Mon Feb 13 17:32:49 2017 - [info]Multi-master configuration is detected. Current primary(writable) master is10.192.203.201(10.192.203.201:3306)

  6. Mon Feb 13 17:32:49 2017 - [info] Masterconfigurations are as below:

  7. Master 10.192.203.101(10.192.203.101:3306),replicating from 10.192.203.201(10.192.203.201:3306), read-only

  8. Master 10.192.203.201(10.192.203.201:3306),replicating from 10.192.203.101(10.192.203.101:3306)

  9.  
  10. Mon Feb 13 17:32:49 2017 - [info] GTID failovermode = 0

  11. Mon Feb 13 17:32:49 2017 - [info] DeadServers:

  12. Mon Feb 13 17:32:49 2017 - [info] AliveServers:

  13. Mon Feb 13 17:32:49 2017 - [info] 10.192.203.201(10.192.203.201:3306)

  14. Mon Feb 13 17:32:49 2017 - [info] 10.192.203.101(10.192.203.101:3306)

  15. Mon Feb 13 17:32:49 2017 - [info] 10.192.203.102(10.192.203.102:3306)

  16. Mon Feb 13 17:32:49 2017 - [info] AliveSlaves:

  17. Mon Feb 13 17:32:49 2017 - [info] 10.192.203.101(10.192.203.101:3306) Version=5.5.19-log (oldest major versionbetween slaves) log-bin:enabled

  18. Mon Feb 13 17:32:49 2017 - [info] Replicating from10.192.203.201(10.192.203.201:3306)

  19. Mon Feb 13 17:32:49 2017 - [info] Primary candidate for the new Master(candidate_master is set)

  20. Mon Feb 13 17:32:49 2017 - [info] 10.192.203.102(10.192.203.102:3306) Version=5.5.19-log (oldest major versionbetween slaves) log-bin:enabled

  21. Mon Feb 13 17:32:49 2017 - [info] Replicating from10.192.203.201(10.192.203.201:3306)

  22. Mon Feb 13 17:32:49 2017 - [info] CurrentAlive Master: 10.192.203.201(10.192.203.201:3306)

  23. Mon Feb 13 17:32:49 2017 - [info] Checkingslave configurations..

  24. Mon Feb 13 17:32:49 2017 - [info] Checkingreplication filtering settings..

  25. Mon Feb 13 17:32:49 2017 - [info] binlog_do_db= , binlog_ignore_db=

  26. Mon Feb 13 17:32:49 2017 - [info] Replication filtering check ok.

  27. Mon Feb 13 17:32:49 2017 - [info] GTID(with auto-pos) is not supported

  28. Mon Feb 13 17:32:49 2017 - [info] StartingSSH connection tests..

  29. Mon Feb 13 17:32:51 2017 - [info] All SSHconnection tests passed successfully.

  30. Mon Feb 13 17:32:51 2017 - [info] CheckingMHA Node version..

  31. Mon Feb 13 17:32:52 2017 - [info] Version check ok.

  32. Mon Feb 13 17:32:52 2017 - [info] CheckingSSH publickey authentication settings on the current master..

  33. Mon Feb 13 17:32:52 2017 - [info]HealthCheck: SSH to 10.192.203.201 is reachable.

  34. Mon Feb 13 17:32:52 2017 - [info] MasterMHA Node version is 0.56.

  35. Mon Feb 13 17:32:52 2017 - [info] Checkingrecovery script configurations on 10.192.203.201(10.192.203.201:3306)..

  36. Mon Feb 13 17:32:52 2017 - [info] Executing command: save_binary_logs--command=test --start_pos=4 --binlog_dir=/data/mysql--output_file=/var/tmp/save_binary_logs_test --manager_version=0.56--start_file=single-mysql-bin.000018

  37. Mon Feb 13 17:32:52 2017 - [info] Connecting [email protected](10.192.203.201:22)..

  38. Creating /var/tmp if not exists.. ok.

  39. Checking output directory is accessible or not..

  40. ok.

  41. Binlog found at /data/mysql, up to single-mysql-bin.000018

  42. Mon Feb 13 17:32:52 2017 - [info] Binlogsetting check done.

  43. Mon Feb 13 17:32:52 2017 - [info] CheckingSSH publickey authentication and checking recovery script configurations on allalive slave servers..

  44. Mon Feb 13 17:32:52 2017 - [info] Executing command : apply_diff_relay_logs--command=test --slave_user='mha' --slave_host=10.192.203.101--slave_ip=10.192.203.101 --slave_port=3306 --workdir=/var/tmp--target_version=5.5.19-log --manager_version=0.56--relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx

  45. Mon Feb 13 17:32:52 2017 - [info] Connecting [email protected](10.192.203.101:22)..

  46. Checking slave recovery environment settings..

  47. Opening /data/mysql/relay-log.info ... ok.

  48. Relay log found at /data/mysql, up to slave1-relay-bin.000011

  49. Temporaryrelay log file is /data/mysql/slave1-relay-bin.000011

  50. Testing mysql connection and privileges.. done.

  51. Testing mysqlbinlog output.. done.

  52. Cleaning up test file(s).. done.

  53. Mon Feb 13 17:32:52 2017 - [info] Executing command : apply_diff_relay_logs--command=test --slave_user='mha' --slave_host=10.192.203.102--slave_ip=10.192.203.102 --slave_port=3306 --workdir=/var/tmp--target_version=5.5.19-log --manager_version=0.56--relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx

  54. Mon Feb 13 17:32:52 2017 - [info] Connecting [email protected](10.192.203.102:22)..

  55. Checking slave recovery environment settings..

  56. Opening /data/mysql/relay-log.info ... ok.

  57. Relay log found at /data/mysql, up to PC-relay-bin.000011

  58. Temporary relay log file is /data/mysql/PC-relay-bin.000011

  59. Testing mysql connection and privileges.. done.

  60. Testing mysqlbinlog output.. done.

  61. Cleaning up test file(s).. done.

  62. Mon Feb 13 17:32:53 2017 - [info] Slavessettings check done.

  63. Mon Feb 13 17:32:53 2017 - [info]

  64. 10.192.203.201(10.192.203.201:3306)(current master)

  65. +--10.192.203.101(10.192.203.101:3306)

  66. +--10.192.203.102(10.192.203.102:3306)

  67.  
  68. Mon Feb 13 17:32:53 2017 - [info] Checkingreplication health on 10.192.203.101..

  69. Mon Feb 13 17:32:53 2017 - [info] ok.

  70. Mon Feb 13 17:32:53 2017 - [info] Checkingreplication health on 10.192.203.102..

  71. Mon Feb 13 17:32:53 2017 - [info] ok.

  72. Mon Feb 13 17:32:53 2017 - [info] Checkingmaster_ip_failover_script status:

  73. Mon Feb 13 17:32:53 2017 - [info] /usr/bin/master_ip_failover --command=status--ssh_user=root --orig_master_host=10.192.203.201--orig_master_ip=10.192.203.201 --orig_master_port=3306

  74.  
  75.  
  76. IN SCRIPT TEST====/etc/init.d/keepalivedstop==/etc/init.d/keepalived start===

  77.  
  78. Checking the Status of the script.. OK

  79. Mon Feb 13 17:32:53 2017 - [info] OK.

  80. Mon Feb 13 17:32:53 2017 - [warning]shutdown_script is not defined.

  81. Mon Feb 13 17:32:53 2017 - [info] Got exitcode 0 (Not master dead).

  82.  
  83. MySQL Replication Health is OK.

 

8 在管理端启动监控

nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 & 

 #查看状态

masterha_check_status --conf=/etc/mha/app1.cnf  

app1 (pid:3349) is running(0:PING_OK),master:10.192.203.201

9 验证故障转移

在10.192.203.201上关闭Mysql实例:

 [[email protected]]# service mysqld stop

Shutting downMySQL... SUCCESS!

 在管理节点上查看日志:

tail -f /var/log/mha/app1/manager.log  

会看到:

 

 
  1. Started automated(non-interactive)failover.

  2. Invalidated master IP address on10.192.203.201(10.192.203.201:3306)

  3. The latest slave10.192.203.101(10.192.203.101:3306) has all relay logs for recovery.

  4. Selected10.192.203.101(10.192.203.101:3306) as a new master.

  5. 10.192.203.101(10.192.203.101:3306): OK:Applying all logs succeeded.

  6. 10.192.203.101(10.192.203.101:3306): OK:Activated master IP address.

  7. 10.192.203.102(10.192.203.102:3306): Thishost has the latest relay log events.

  8. Generating relay diff files from the latestslave succeeded.

  9. 10.192.203.102(10.192.203.102:3306): OK:Applying all logs succeeded. Slave started, replicating from10.192.203.101(10.192.203.101:3306)

  10. 10.192.203.101(10.192.203.101:3306):Resetting slave info succeeded.

  11. Master failover to10.192.203.101(10.192.203.101:3306) completed successfully.

  12. Mon Feb 13 17:44:28 2017 - [info] Sendingmail..

  13. Unknown option: conf

也收到了报警邮件:

MHA+keepalive高可用环境搭建

Vip也漂移过来了:

 

 
  1. [[email protected] mysql]# ip addr

  2. 1: lo: <LOOPBACK,UP,LOWER_UP> mtu16436 qdisc noqueue state UNKNOWN

  3. link/loopback 00:00:00:00:00:00 brd00:00:00:00:00:00

  4. inet 127.0.0.1/8 scope host lo

  5. inet6 ::1/128 scope host

  6. valid_lft forever preferred_lft forever

  7. 2: eth0:<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen1000

  8. link/ether 08:00:27:04:05:16 brd ff:ff:ff:ff:ff:ff

  9. inet 10.192.203.101/24 brd 10.192.203.255 scope global eth0

  10. inet 10.192.203.108/32 scope global eth0

  11. inet6 fe80::a00:27ff:fe04:516/64 scope link tentative dadfailed

  12. valid_lft forever preferred_lft forever

  13. 3: eth1:<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen1000

  14. link/ether 08:00:27:3a:ec:3c brd ff:ff:ff:ff:ff:ff

  15. inet 10.0.10.5/24 brd 10.0.10.255 scope global eth1

  16. inet6 fe80::a00:27ff:fe3a:ec3c/64 scope link tentative dadfailed

  17. valid_lft forever preferred_lft forever

在PC上查看下slave状态:

mysql> show slave status \G;

*************************** 1. row***************************

               Slave_IO_State: Waiting formaster to send event

                  Master_Host: 10.192.203.101

                  Master_User: rep_user

                  Master_Port: 3306

可以看到10.192.203.102复制自动指向了10.192.203.101了。 

在现在的master上查看变量read_only,发现被自动关闭了,说明之前的slave现在可写了:

mysql> showvariables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | OFF   |
+---------------+-------+
1 row in set (0.01 sec) 

切换完成后,manager进程会自动挂掉:

[[email protected] mysql]#  masterha_check_status --conf=/etc/mha/app1.cnf

app1 is stopped(2:NOT_RUNNING).

mha配置文件被自己修改了(原主库的配置信息被删除掉了):

[[email protected] mha]# cat app1.cnf

 

 
  1. [server default]

  2. manager_log=/var/log/mha/app1/manager.log

  3. manager_workdir=/var/log/mha/app1.log

  4. master_binlog_dir=/data/mysql

  5. master_ip_failover_script=/usr/bin/master_ip_failover

  6. master_ip_online_change_script=/usr/bin/master_ip_online_change

  7. password=test

  8. ping_interval=2

  9. repl_password=beijing

  10. repl_user=rep_user

  11. report_script=/usr/bin/send_report

  12. ssh_user=root

  13. user=mha

  14.  
  15. [server2]

  16. candidate_master=1

  17. check_repl_delay=0

  18. hostname=10.192.203.101

  19. port=3306

  20.  
  21. [server3]

  22. hostname=10.192.203.102

  23. port=3306

现在的主库10.192.203.101上已经没有它原先的主从复制信息:

mysql> showslave status \G;

Empty set (0.00sec)

 

ERROR:

No queryspecified

 

接下来我们该做的事情:

①     找出主库挂到的原因,并修复之

②     确保原先的主库现在已追赶上现在的主库,并继续正确复制

③     在现在的主库上change master to指向原来的主库开始复制,为将来故障切换做好准备。

④     检查当前主从复制状态是否正常(masterha_check_repl--conf=/etc/mha/app1.cnf)

⑤     启动原先主库keepalive进程

⑥     确保原先主库配置文件(read_only设置为1,relay_log_purge=0)

⑦     修改mha配置文件,将原先的主库信息重新加入

⑧     启动manager进程,检查mha状态是否正常

10 在线切换

在许多情况下,需要将现有的主服务器迁移到另外一台服务器上。比如主服务器硬件故障,RAID 控制卡需要重建,将主服务器移到性能更好的服务器上等等。维护主服务器引起性能下降,导致停机时间至少无法写入数据。另外, 阻塞或杀掉当前运行的会话会导致主主之间数据不一致的问题发生。 MHA 提供快速切换和优雅的阻

塞写入,这个切换过程只需要 0.5-2s 的时间,这段时间内数据是无法写入的。在很多情况下,0.5-2s 的阻塞写入是可以接受的。因此切换主服务器不需要计划分配维护时间窗口。

MHA在线切换的大概过程:

(1)检测复制设置和确定当前主服务器

(2)确定新的主服务器

(3)阻塞写入到当前主服务器

(4)等待所有从服务器赶上复制

(5)授予写入到新的主服务器

(6)重新设置从服务器

 

假如现在我想从10.192.203.101切换回10.192.203.201,通过在线切换的方式完成。

10.1 停掉MHA监控

masterha_stop --conf=/etc/mha/app1.cnf

10.2 在线切换

[[email protected] bin]# masterha_master_switch--conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=10.192.203.201--new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

 

 
  1. Tue Feb 14 02:44:20 2017 - [info]MHA::MasterRotate version 0.56.

  2. Tue Feb 14 02:44:20 2017 - [info] Startingonline master switch..

  3. Tue Feb 14 02:44:20 2017 - [info]

  4. Tue Feb 14 02:44:20 2017 - [info] * Phase1: Configuration Check Phase..

  5. Tue Feb 14 02:44:20 2017 - [info]

  6. Tue Feb 14 02:44:20 2017 - [warning] Globalconfiguration file /etc/masterha_default.cnf not found. Skipping.

  7. Tue Feb 14 02:44:20 2017 - [info] Readingapplication default configuration from /etc/mha/app1.cnf..

  8. Tue Feb 14 02:44:20 2017 - [info] Readingserver configuration from /etc/mha/app1.cnf..

  9. Tue Feb 14 02:44:20 2017 - [info]Multi-master configuration is detected. Current primary(writable) master is10.192.203.101(10.192.203.101:3306)

  10. Tue Feb 14 02:44:20 2017 - [info] Masterconfigurations are as below:

  11. Master 10.192.203.101(10.192.203.101:3306),replicating from 10.192.203.201(10.192.203.201:3306)

  12. Master 10.192.203.201(10.192.203.201:3306),replicating from 10.192.203.101(10.192.203.101:3306), read-only

  13.  
  14. Tue Feb 14 02:44:20 2017 - [info] GTIDfailover mode = 0

  15. Tue Feb 14 02:44:20 2017 - [info] CurrentAlive Master: 10.192.203.101(10.192.203.101:3306)

  16. Tue Feb 14 02:44:20 2017 - [info] AliveSlaves:

  17. Tue Feb 14 02:44:20 2017 - [info] 10.192.203.201(10.192.203.201:3306) Version=5.5.19-log (oldest major versionbetween slaves) log-bin:enabled

  18. Tue Feb 14 02:44:20 2017 - [info] Replicating from10.192.203.101(10.192.203.101:3306)

  19. Tue Feb 14 02:44:20 2017 - [info] 10.192.203.102(10.192.203.102:3306) Version=5.5.19-log (oldest major versionbetween slaves) log-bin:enabled

  20. Tue Feb 14 02:44:20 2017 - [info] Replicating from10.192.203.101(10.192.203.101:3306)

  21.  
  22. It is better to execute FLUSHNO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to executeon 10.192.203.101(10.192.203.101:3306)? (YES/no): yes

  23. Tue Feb 14 02:45:03 2017 - [info] ExecutingFLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..

  24. Tue Feb 14 02:45:03 2017 - [info] ok.

  25. Tue Feb 14 02:45:03 2017 - [info] CheckingMHA is not monitoring or doing failover..

  26. Tue Feb 14 02:45:03 2017 - [info] Checkingreplication health on 10.192.203.201..

  27. Tue Feb 14 02:45:03 2017 - [info] ok.

  28. Tue Feb 14 02:45:03 2017 - [info] Checkingreplication health on 10.192.203.102..

  29. Tue Feb 14 02:45:03 2017 - [info] ok.

  30. Tue Feb 14 02:45:03 2017 - [info]10.192.203.201 can be new master.

  31. Tue Feb 14 02:45:03 2017 - [info]

  32. From:

  33. 10.192.203.101(10.192.203.101:3306)(current master)

  34. +--10.192.203.201(10.192.203.201:3306)

  35. +--10.192.203.102(10.192.203.102:3306)

  36.  
  37. To:

  38. 10.192.203.201(10.192.203.201:3306) (newmaster)

  39. +--10.192.203.102(10.192.203.102:3306)

  40. +--10.192.203.101(10.192.203.101:3306)

  41.  
  42. Starting master switch from10.192.203.101(10.192.203.101:3306) to 10.192.203.201(10.192.203.201:3306)?(yes/NO): yes

  43. Tue Feb 14 02:45:09 2017 - [info] Checkingwhether 10.192.203.201(10.192.203.201:3306) is ok for the new master..

  44. Tue Feb 14 02:45:09 2017 - [info] ok.

  45. Tue Feb 14 02:45:09 2017 - [info] ** Phase1: Configuration Check Phase completed.

  46. Tue Feb 14 02:45:09 2017 - [info]

  47. Tue Feb 14 02:45:09 2017 - [info] * Phase2: Rejecting updates Phase..

  48. Tue Feb 14 02:45:09 2017 - [info]

  49. Tue Feb 14 02:45:09 2017 - [info] Executingmaster ip online change script to disable write on the current master:

  50. Tue Feb 14 02:45:09 2017 - [info] /usr/bin/master_ip_online_change--command=stop --orig_master_host=10.192.203.101--orig_master_ip=10.192.203.101 --orig_master_port=3306--orig_master_user='mha' --orig_master_password='test'--new_master_host=10.192.203.201 --new_master_ip=10.192.203.201--new_master_port=3306 --new_master_user='mha' --new_master_password='test' --orig_master_ssh_user=root--new_master_ssh_user=root --orig_master_is_new_slave

  51. Tue Feb 14 02:45:09 2017 728241 Setread_only on the new master.. ok.

  52. Tue Feb 14 02:45:09 2017 733969 Drpping appuser on the orig master..

  53. Tue Feb 14 02:45:09 2017 735861 Set read_only=1on the orig master.. ok.

  54. Tue Feb 14 02:45:09 2017 739394 Killing allapplication threads..

  55. Tue Feb 14 02:45:09 2017 739431 done.

  56. SIOCSIFFLAGS: Cannot assign requestedaddress

  57. Tue Feb 14 02:45:09 2017 - [info] ok.

  58. Tue Feb 14 02:45:09 2017 - [info] Lockingall tables on the orig master to reject updates from everybody (includingroot):

  59. Tue Feb 14 02:45:09 2017 - [info] ExecutingFLUSH TABLES WITH READ LOCK..

  60. Tue Feb 14 02:45:09 2017 - [info] ok.

  61. Tue Feb 14 02:45:09 2017 - [info] Origmaster binlog:pos is single-mysql-bin.000009:997.

  62. Tue Feb 14 02:45:09 2017 - [info] Waiting to execute all relay logs on10.192.203.201(10.192.203.201:3306)..

  63. Tue Feb 14 02:45:09 2017 - [info] master_pos_wait(single-mysql-bin.000009:997)completed on 10.192.203.201(10.192.203.201:3306). Executed 0 events.

  64. Tue Feb 14 02:45:09 2017 - [info] done.

  65. Tue Feb 14 02:45:09 2017 - [info] Gettingnew master's binlog name and position..

  66. Tue Feb 14 02:45:09 2017 - [info] single-mysql-bin.000022:107

  67. Tue Feb 14 02:45:09 2017 - [info] All other slaves should start replicationfrom here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.192.203.201',MASTER_PORT=3306, MASTER_LOG_FILE='single-mysql-bin.000022', MASTER_LOG_POS=107,MASTER_USER='rep_user', MASTER_PASSWORD='xxx';

  68. Tue Feb 14 02:45:09 2017 - [info] Executingmaster ip online change script to allow write on the new master:

  69. Tue Feb 14 02:45:09 2017 - [info] /usr/bin/master_ip_online_change--command=start --orig_master_host=10.192.203.101 --orig_master_ip=10.192.203.101--orig_master_port=3306 --orig_master_user='mha' --orig_master_password='test'--new_master_host=10.192.203.201 --new_master_ip=10.192.203.201--new_master_port=3306 --new_master_user='mha' --new_master_password='test'--orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave

  70. Tue Feb 14 02:45:09 2017 946501 Setread_only=0 on the new master.

  71. Tue Feb 14 02:45:09 2017 947009 Creatingapp user on the new master..

  72. Tue Feb 14 02:45:10 2017 - [info] ok.

  73. Tue Feb 14 02:45:10 2017 - [info]

  74. Tue Feb 14 02:45:10 2017 - [info] *Switching slaves in parallel..

  75. Tue Feb 14 02:45:10 2017 - [info]

  76. Tue Feb 14 02:45:10 2017 - [info] -- Slaveswitch on host 10.192.203.102(10.192.203.102:3306) started, pid: 8353

  77. Tue Feb 14 02:45:10 2017 - [info]

  78. Tue Feb 14 02:45:10 2017 - [info] Logmessages from 10.192.203.102 ...

  79. Tue Feb 14 02:45:10 2017 - [info]

  80. Tue Feb 14 02:45:10 2017 - [info] Waiting to execute all relay logs on10.192.203.102(10.192.203.102:3306)..

  81. Tue Feb 14 02:45:10 2017 - [info] master_pos_wait(single-mysql-bin.000009:997)completed on 10.192.203.102(10.192.203.102:3306). Executed 0 events.

  82. Tue Feb 14 02:45:10 2017 - [info] done.

  83. Tue Feb 14 02:45:10 2017 - [info] Resetting slave10.192.203.102(10.192.203.102:3306) and starting replication from the newmaster 10.192.203.201(10.192.203.201:3306)..

  84. Tue Feb 14 02:45:10 2017 - [info] Executed CHANGE MASTER.

  85. Tue Feb 14 02:45:10 2017 - [info] Slave started.

  86. Tue Feb 14 02:45:10 2017 - [info] End oflog messages from 10.192.203.102 ...

  87. Tue Feb 14 02:45:10 2017 - [info]

  88. Tue Feb 14 02:45:10 2017 - [info] -- Slaveswitch on host 10.192.203.102(10.192.203.102:3306) succeeded.

  89. Tue Feb 14 02:45:10 2017 - [info] Unlockingall tables on the orig master:

  90. Tue Feb 14 02:45:10 2017 - [info] ExecutingUNLOCK TABLES..

  91. Tue Feb 14 02:45:10 2017 - [info] ok.

  92. Tue Feb 14 02:45:10 2017 - [info] Startingorig master as a new slave..

  93. Tue Feb 14 02:45:10 2017 - [info] Resetting slave10.192.203.101(10.192.203.101:3306) and starting replication from the newmaster 10.192.203.201(10.192.203.201:3306)..

  94. Tue Feb 14 02:45:10 2017 - [info] Executed CHANGE MASTER.

  95. Tue Feb 14 02:45:10 2017 - [info] Slave started.

  96. Tue Feb 14 02:45:10 2017 - [info] All newslave servers switched successfully.

  97. Tue Feb 14 02:45:10 2017 - [info]

  98. Tue Feb 14 02:45:10 2017 - [info] * Phase5: New master cleanup phase..

  99. Tue Feb 14 02:45:10 2017 - [info]

  100. Tue Feb 14 02:45:10 2017 - [info] 10.192.203.201: Resetting slave infosucceeded.

  101. Tue Feb 14 02:45:10 2017 - [info] Switchingmaster to 10.192.203.201(10.192.203.201:3306) completed successfully.

中间需要手动输入两个yes。

 

10.3 检查是否切换成功

执行masterha_check_repl --conf=/etc/mha/app1.cnf验证下当前主从配置是否正确。

登录各从库,验证下现在复制是否已指向新的主库。

现在10.192.203.101, 10.192.203.102复制都指向10.192.203.201.

10.192.203.201上没有复制信息:

mysql> show slave status \G;

Empty set (0.00 sec)

 

ERROR:

No query specified

 

发现在线切换后有如下特点:

①  :在线切换完成后,其他库复制都会指向现在的主库。现在的主库上原先的复制信息被清除

②  :mha配置文件里原先主库的信息没有被删除。

③  :原先的主库自动变成只读,现在的主库自动变成可写。

④  :原来主库上的vip自动漂移到了现在的主库上。

10.4 后续操作

①:在线切换完成后,现在的主库仍然需要change master to原先的主库,然后开始复制。

②  :启动原先主库的keepalive进程

③:启动mha监控

 

我发现当备用主库宕机后,假如主库也宕掉了,此时mha无法自动切换到其他slave上。错误日志:

Wed Feb 15 21:42:04 2017 -[error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln492]  Server 10.192.203.101(10.192.203.101:3306) isdead, but must be alive! Check server settings.

Wed Feb 15 21:42:04 2017 -[error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm line 268