oracle12.2 dataguard 丢失归档日志恢复
今天有个邮件服务器的库做的adg,由于文件系统满了,导致主备一直有问题,没有同步数据,因为主库一直是可用状态,过了好久也没管。今天要用库了,就解决了下这个主备不同步的问题。
查看主库的问题日志
但是备库日志才到85号日志文件:
查询备库,是gap了:
86-97号日志文件丢了,因为文件系统满了,我把归档直接删了。
oracle的dg如果日志无法应用的话,是没有办法像mysql那样跳过去的。
要解决这个问题,目前我只能想到两个办法:
- 主库还在,直接把备库铲了,重新初始化备库。
- 根据备库当前的scn号,去主库做增量备份,根据主库的增量备份在备库进行应用。
我这里采用第二种办法,做增量备份,然后在备库还原。
下面是详细步骤:
- 查询备库scn号。这个地方注意,如果备库跟主库是同步的,那么备库的scn号跟主库应该是一样的,因为备库的scn号就是根据主库传来的归档日志产生的。
=======这个地方的查询是主备已经恢复正常后的查询,只是为了说明主备scn一致=======
SQL> select SEQUENCE#,FIRST_CHANGE#,NEXT_CHANGE#,APPLIED,STATUS from v$archived_log t ;
...
SEQUENCE# FIRST_CHANGE# NEXT_CHANGE# APPLIED S
---------- ------------- ------------ --------- -
904 14063537 14064375 YES A
901 14060851 14062030 YES A
905 14064375 14065005 YES A
907 14065323 14066064 YES A
908 14066064 14066284 YES A
909 14066284 14067751 YES A
比如在备库查的应用到最近的一个日志是909,他的下一个scn号是14067751,而在主库查询
SQL> select SEQUENCE#,FIRST_CHANGE#,NEXT_CHANGE#,APPLIED,STATUS from v$archived_log t ;
...
SEQUENCE# FIRST_CHANGE# NEXT_CHANGE# APPLIED S
---------- ------------- ------------ --------- -
907 14065323 14066064 YES A
908 14066064 14066284 NO A
908 14066064 14066284 YES A
909 14066284 14067751 NO A
909 14066284 14067751 YES A
909号next scn号也是一样,所以可以认为主备在绝对一致的时候scn是一样的。
===============================================================================
查询备库scn
SQL> select current_scn from v$database;
CURRENT_SCN
-----------
7315442
SQL>
这个scn号要记下来,后面我们在主库做增量备份的时候就要从这个scn号开始。
2、取消掉备库上的日志应用
alter database recover managed standby database cancel;
- 主库上进行备份
export ORACLE_SID=cc
rman target /
backup device type disk incremental from scn 7315442 database format '/ccarch/incre_bak/lost_archive_%u_%s_%p.bk';
注意增量备份的文件名不能起固定的名字,因为增量备的时候会产生多个备份文件,如果起成一个,备份的时候会报文件名已存在的错误,备份失败。
- 将增量备份的文件传到备库上面
scp /ccarch/incre_bak/lost_*.bk [email protected]备库ip:/ccarch/incre_bak/
- 在备库进行恢复
export ORACLE_SID=tcc
rman target /
catalog backuppiece '/ccarch/incre_bak/lost_archive_0kte1mnq_20_1.bk' ;
catalog backuppiece '/ccarch/incre_bak/lost_archive_0lte1mnt_21_1.bk' ;
我这个库因为一直都没有变动,所以恢复起来很快,几秒就恢复完了。
- 备库改成mount状态
export ORACLE_SID=tcc
sqlplus / as sysdba
shutdown immediate ;
startup mount ;
注意:如果你的备库当前是mount阶段,此步略过。
- 在备库执行恢复
recover DATABASE noredo;
注意:这个是接上面第四步,因为我当时做的时候备库是open read only的状态,所以报错了。
- 备库启动同步
export ORACLE_SID=tcc
sqlplus / as sysdba
alter database recover managed standby database parallel 4 using current logfile disconnect from session;
=========可以看下备库当前的状态和日志应用情况==============================
备库状态是mounted,角色是物理库
SQL> SELECT database_role,open_mode FROM v$database;
DATABASE_ROLE OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY MOUNTED
备库归档位置
SQL> select process,status,thread#,sequence#,block#,blocks from V$managed_standby;
PROCESS STATUS THREAD# SEQUENCE# BLOCK# BLOCKS
--------- ------------ ---------- ---------- ---------- ----------
ARCH CONNECTED 0 0 0 0
DGRD ALLOCATED 0 0 0 0
DGRD ALLOCATED 0 0 0 0
ARCH CONNECTED 0 0 0 0
ARCH CONNECTED 0 0 0 0
ARCH CONNECTED 0 0 0 0
MRP0 WAIT_FOR_GAP 1 86 0 0
7 rows selected.
SELECT MAX(al.SEQUENCE#) "Last Seq Recieved",MAX(lh.SEQUENCE#) "Last Seq Applied"
FROM v$archived_log al, v$log_history lh;
SQL> SELECT MAX(al.SEQUENCE#) "Last Seq Recieved",MAX(lh.SEQUENCE#) "Last Seq Applied"
2 FROM v$archived_log al, v$log_history lh;
Last Seq Recieved Last Seq Applied
----------------- ----------------
100 85
显示有GAP
SQL> SELECT * FROM v$archive_gap;
THREAD# LOW_SEQUENCE# HIGH_SEQUENCE# CON_ID
---------- ------------- -------------- ----------
1 86 97 1
===============================================================================
- 查询主库归档位置
SELECT MAX(al.SEQUENCE#) "Last Seq Recieved",MAX(lh.SEQUENCE#) "Last Seq Applied"
FROM v$archived_log al, v$log_history lh;
SELECT database_role,open_mode FROM v$database;
主库切换日志:
ALTER system switch logfile;
--切换后在次查看日志序号增加1个
SELECT MAX(al.SEQUENCE#) "Last Seq Recieved",MAX(lh.SEQUENCE#) "Last Seq Applied"
FROM v$archived_log al, v$log_history lh;
- 备库停同步
SQL> ALTER DATABASE recover managed standby DATABASE cancel;
Database altered.
- 查备库控制文件和数据文件scn号是否一致
SQL> SELECT file#,checkpoint_change# FROM v$datafile ORDER BY 1;
FILE# CHECKPOINT_CHANGE#
---------- ------------------
1 7315443
3 7315443
4 7315443
7 7315443
SQL> SELECT file#,checkpoint_change# FROM v$datafile_header ORDER BY 1;
FILE# CHECKPOINT_CHANGE#
---------- ------------------
1 13907046
3 13907046
4 13907046
7 13907046
SQL>
很遗憾,虽然做了恢复,但是备库的控制文件和数据文件的scn不一致。
gap说明:
尽管在恢复两个备份片后 备库日志还说有GAP:
备库归档显示也是缺少归档的:
注意:v$datafile信息来自于控制文件,而v$datafile_header信息是来自于数据文件头。
- 尝试打开备库(试错步骤)
此时备库执行
alter database open read only;
是打不开的,会报:
SQL> alter database open read only;
alter database open read only
*
ERROR at line 1:
ORA-10458: standby database requires recovery
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: '/tccdata/tcc/system01.dbf'
- 从主库生成备份控制文件
ALTER DATABASE CREATE standby controlfile AS '/ccarch/standby_lostarchive.ctl';
- 将主库生成的备份控制文件传到备库
scp /ccarch/standby_lostarchive.ctl [email protected]备库ip:/ccarch/
- 停备库后启动到nomount阶段
SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
启动备库到nomount阶段
SQL> startup nomount
ORACLE instance started.
Total System Global Area 2147483648 bytes
Fixed Size 8622776 bytes
Variable Size 603983176 bytes
Database Buffers 1526726656 bytes
Redo Buffers 8151040 bytes
SQL>
- rman连接备库进行恢复控制文件
export ORACLE_SID=cc
rman target /
RMAN> restore controlfile from '/ccarch/standby_lostarchive.ctl';
Starting restore at 26-SEP-18
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=19 device type=DISK
channel ORA_DISK_1: copied control file copy
output file name=/ccdata/tcc/control01.dbf
output file name=/ccdata/tcc/control02.dbf
output file name=/ccredo/tcc/control03.dbf
Finished restore at 26-SEP-18
RMAN> exit
- 启动备库到mount阶段
SQL> alter database mount standby database;
Database altered.
- 启动备库同步进程
SQL> alter database recover managed standby database parallel 4 using current logfile disconnect from session;
Database altered.
- 查询备库日志应用情况
SQL> select process,status,thread#,sequence#,block#,blocks from V$managed_standby;
PROCESS STATUS THREAD#
--------------------------- ------------------------------------ ----------
SEQUENCE# BLOCK# BLOCKS
---------- ---------- ----------
DGRD ALLOCATED 0
0 0 0
ARCH CONNECTED 0
0 0 0
DGRD ALLOCATED 0
0 0 0
PROCESS STATUS THREAD#
--------------------------- ------------------------------------ ----------
SEQUENCE# BLOCK# BLOCKS
---------- ---------- ----------
ARCH CONNECTED 0
0 0 0
ARCH CONNECTED 0
0 0 0
ARCH CONNECTED 0
0 0 0
PROCESS STATUS THREAD#
--------------------------- ------------------------------------ ----------
SEQUENCE# BLOCK# BLOCKS
---------- ---------- ----------
MRP0 WAIT_FOR_LOG 1
739 0 0
7 rows selected.
发现备库归档已经到739了,这个就是主库当前日志文件的前一个
主库
SQL> archive log list ;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /ccarch
Oldest online log sequence 738
Next log sequence to archive 740
Current log sequence 740
SQL>
- 确认主备库同步
在主库上:
SQL> alter system switch logfile;
System altered.
SQL> create table wufan.test001(id int,name varchar2(10));
Table created.
SQL> insert into wufan.test001 values(1,'AAAAAA');
1 row created.
SQL> commit ;
Commit complete.
SQL> alter system switch logfile;
System altered.
SQL> archive log list ;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /ccarch
Oldest online log sequence 907
Next log sequence to archive 909
Current log sequence 909
在备库上:
set linesize 1000
col process format a10;
col status format a10;
col thread# format 99;
col sequence# format 9999;
SQL> select process,status,thread#,sequence# from V$managed_standby;
PROCESS STATUS THREAD# SEQUENCE#
---------- ---------- ------- ---------
DGRD ALLOCATED 0 0
ARCH CONNECTED 0 0
DGRD ALLOCATED 0 0
ARCH CONNECTED 0 0
ARCH CONNECTED 0 0
ARCH CONNECTED 0 0
MRP0 WAIT_FOR_L 1 909
OG
DGRD ALLOCATED 0 0
RFS IDLE 1 909
PROCESS STATUS THREAD# SEQUENCE#
---------- ---------- ------- ---------
RFS IDLE 0 0
RFS IDLE 0 0
RFS IDLE 0 0
RFS IDLE 0 0
13 rows selected.
发现主备归档日志文件一致.
在备库上确认写入的测试数据:
SQL> select * from wufan.test001;
select * from wufan.test001
*
ERROR at line 1:
ORA-01219: database or pluggable database not open: queries allowed on fixed
tables or views only
SQL> alter database open read only ;
alter database open read only
*
ERROR at line 1:
ORA-10456: cannot open standby database; media recovery session may be in
progress
备库未打开,并且在应用日志的时候无法打开,先关闭日志应用,再打开。
备库:
关闭日志应用:
SQL> ALTER DATABASE recover managed standby DATABASE cancel;
Database altered.
打开备库:
SQL> alter database open read only;
Database altered.
再次验证数据:
select * from wufan.test001;
打开备库日志应用:
SQL> alter database recover managed standby database using current logfile disconnect from session;
Database altered.
备库日志应用状态:
SQL> select SEQUENCE#,FIRST_CHANGE#,NEXT_CHANGE#,APPLIED,STATUS from v$archived_log t ;
...
SEQUENCE# FIRST_CHANGE# NEXT_CHANGE# APPLIED STATUS
--------- ------------- ------------ --------- ----------
889 14052764 14053923 YES A
893 14055528 14055545 YES A
894 14055545 14056262 YES A
895 14056262 14057020 YES A
897 14057818 14058568 YES A
896 14057020 14057818 YES A
898 14058568 14059318 YES A
899 14059318 14060100 YES A
900 14060100 14060851 YES A
902 14062030 14062789 YES A
903 14062789 14063537 YES A
SEQUENCE# FIRST_CHANGE# NEXT_CHANGE# APPLIED STATUS
--------- ------------- ------------ --------- ----------
904 14063537 14064375 YES A
901 14060851 14062030 YES A
905 14064375 14065005 YES A
907 14065323 14066064 YES A
908 14066064 14066284 YES A
170 rows selected.
主库:
SQL> alter system switch logfile ;
System altered.
SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /ccarch
Oldest online log sequence 908
Next log sequence to archive 910
Current log sequence 910
SQL>
备库:
SQL> select SEQUENCE#,FIRST_CHANGE#,NEXT_CHANGE#,APPLIED,STATUS from v$archived_log t ;
...
SEQUENCE# FIRST_CHANGE# NEXT_CHANGE# APPLIED STATUS
--------- ------------- ------------ --------- ----------
889 14052764 14053923 YES A
893 14055528 14055545 YES A
894 14055545 14056262 YES A
895 14056262 14057020 YES A
897 14057818 14058568 YES A
896 14057020 14057818 YES A
898 14058568 14059318 YES A
899 14059318 14060100 YES A
900 14060100 14060851 YES A
902 14062030 14062789 YES A
903 14062789 14063537 YES A
SEQUENCE# FIRST_CHANGE# NEXT_CHANGE# APPLIED STATUS
--------- ------------- ------------ --------- ----------
904 14063537 14064375 YES A
901 14060851 14062030 YES A
905 14064375 14065005 YES A
907 14065323 14066064 YES A
908 14066064 14066284 YES A
909 14066284 14067751 YES A
171 rows selected.
发现备库应用正常。
查看adg工作在哪种模式下:
主库:
SQL> select database_role,protection_mode,protection_level from v$database;
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
---------------- -------------------- --------------------
PRIMARY MAXIMUM AVAILABILITY RESYNCHRONIZATION