Frequent Issues: Troubleshooting ORA-4031, Once Again | Enmotech Technical Newsletter





Experience: Oracle RAC cross-node data block access, where long-running transactions on a node aggravate gc waits

Experience: Analysis of a failure caused by IBM MQ channel connections reaching the maximum


Frequent Issues: Library Cache Lock Revisited




Enmotech Technical Newsletter collection: https://www.modb.pro/doc/topic/5927

Selected excerpt - Frequent Issues: Troubleshooting ORA-4031, Once Again  Author: 候静远

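Does your heart sink when you see an ORA-4031 error? Oracle raises this error when a process fails to allocate memory from the SGA; in most cases the failed request is against the shared pool. In severe cases the error can crash the database. This article shares a recent customer case in which ORA-4031 brought a database instance down, for your reference.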




Alert log from node 1:

Wed Sep 04 03:57:50 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(2,0)","kglsim object batch")

Wed Sep 04 03:58:10 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(1,0)","kglsim object batch")

Wed Sep 04 03:58:26 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(7,0)","kglsim object batch")

Wed Sep 04 03:58:42 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(6,0)","kglsim object batch")

Wed Sep 04 03:58:57 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.mon_mods$ set ins...","sga heap(5,0)","kglsim object batch")

Wed Sep 04 03:59:08 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_xxx0_42548.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select count(*) from sys.job...","sga heap(3,0)","kglsim object batch")

Wed Sep 04 03:59:10 2019

License high water mark = 97

USER (ospid: 28750): terminating the instance

The ORA-04031 errors on node 1 were tallied per sub-pool and duration. In sga heap(n,0), n identifies the sub-pool and 0 identifies the duration.


The alert log shows that every ORA-4031 occurred in duration 0 of a shared pool sub-pool.
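The tally described above can be reproduced with a short script. This is a minimal sketch, not the tool the author used: the function name `count_ora4031` and the regular expression are assumptions for illustration, and the sample text is two ORA-04031 lines copied from the node 1 alert log.

```python
import re
from collections import Counter

# Matches the heap descriptor of an ORA-04031 message, e.g. "sga heap(2,0)".
# Group 1 is the sub-pool number, group 2 is the duration number.
HEAP_RE = re.compile(r'ORA-04031:.*?"sga heap\((\d+),(\d+)\)"')


def count_ora4031(alert_text):
    """Count ORA-04031 lines per (sub-pool, duration) pair (hypothetical helper)."""
    counts = Counter()
    for line in alert_text.splitlines():
        match = HEAP_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# Two lines taken from the node 1 alert log above.
sample = '''
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(2,0)","kglsim object batch")
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(1,0)","kglsim object batch")
'''

for (subpool, duration), hits in sorted(count_ora4031(sample).items()):
    print(f"sub-pool {subpool}, duration {duration}: {hits} occurrence(s)")
```

In practice the whole alert log file would be read and passed to the function; grouping by the second number alone then shows at a glance whether all failures fall in duration 0.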

Summary of resize operations history:

shared pool            start   3.19 GB  now   3.19 GB  0 grows   0 shrinks

large pool             start   0.50 GB  now   0.50 GB  0 grows   0 shrinks

java pool              start   0.50 GB  now   0.50 GB  0 grows   0 shrinks

SGA Target             start  32.00 GB  now  32.00 GB  0 grows   0 shrinks

DEFAULT buffer cache   start  27.59 GB  now  27.59 GB  0 grows   0 shrinks

PGA Target             start  11.00 GB  now  11.00 GB  0 grows   0 shrinks

The resize history shows that the shared pool was never resized.
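The same check can be done mechanically when the history is long. The sketch below assumes the line layout shown in the summary above; the regular expression and the helper name `resized_components` are illustrative, not part of any Oracle tool.

```python
import re

# One row of the "Summary of resize operations history" listing, for example:
# shared pool            start   3.19 GB  now   3.19 GB  0 grows   0 shrinks
ROW_RE = re.compile(
    r"^(?P<name>.+?)\s+start\s+(?P<start>[\d.]+) GB\s+now\s+(?P<now>[\d.]+) GB"
    r"\s+(?P<grows>\d+) grows\s+(?P<shrinks>\d+) shrinks$"
)


def resized_components(history_text):
    """Return the names of components with at least one grow or shrink."""
    resized = []
    for line in history_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match and (int(match["grows"]) or int(match["shrinks"])):
            resized.append(match["name"])
    return resized


# Rows copied from the summary above.
history = """
shared pool            start   3.19 GB  now   3.19 GB  0 grows   0 shrinks
large pool             start   0.50 GB  now   0.50 GB  0 grows   0 shrinks
DEFAULT buffer cache   start  27.59 GB  now  27.59 GB  0 grows   0 shrinks
"""

print(resized_components(history) or "no component was resized")
```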




"KGLH0                     "       1103 MB 19%

"SQLA                      "       1081 MB 18%

"free memory               "        835 MB 14%

"gcs resources             "        794 MB 14%

"gcs shadows               "        550 MB  9%

"db_block_hash_buckets     "        178 MB  3%

"ASH buffers               "        160 MB  3%

"KGLHD                     "        157 MB  3%

"Checkpoint queue          "        156 MB  3%

"kglsim object batch       "         90 MB  2%

"kglsim heap               "         56 MB  1%

"ges resource              "         53 MB  1%

"ges enqueues              "         43 MB  1%

"KGLDA                     "         41 MB  1%

"dbwriter coalesce buffer  "         40 MB  1%

"dirty object counts array "         40 MB  1%

"object queue              "         35 MB  1%

"gcs res hash bucket       "         32 MB  1%

"dbktb: trace buffer       "         31 MB  1%

"FileOpenBlock             "         30 MB  1%

TOTALS ---------------------------------------

Total free memory                   830 MB

Total memory alloc.                5026 MB

Grand total                        5856 MB


Alert log from node 2:

Wed Sep 04 03:23:18 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_35378.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")

Wed Sep 04 03:23:23 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_35453.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(1,0)","kglsim object batch")

Wed Sep 04 03:23:29 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_35725.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(5,0)","kglsim object batch")

Wed Sep 04 03:23:34 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_35778.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(2,0)","kglsim object batch")

Wed Sep 04 03:23:39 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_36069.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(6,0)","kglsim object batch")

Wed Sep 04 03:23:45 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_36151.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(3,0)","kglsim object batch")

Wed Sep 04 03:23:50 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_36242.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(7,0)","kglsim object batch")

Wed Sep 04 03:23:55 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_36305.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")



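The alert log shows that here, too, every ORA-4031 occurred in duration 0 of a shared pool sub-pool. The root cause of the ORA-4031 errors is that duration 0 of the shared pool sub-pools ran out of memory.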

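When ASMM is enabled by setting sga_target, every sub-pool of the shared pool and of the streams pool is divided into four durations: instance, session, cursor, and execution. Only the fourth duration, execution, can be resized. When duration 0 runs out of memory it cannot be resized, so ORA-4031 is raised immediately.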



alter system set "_enable_shared_pool_durations"=false scope=spfile;

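With this parameter set, the four durations are merged into a single pool, so one duration can no longer be exhausted while another still has free memory. This holds for both the shared pool and the streams pool. Because the statement uses scope=spfile, the change takes effect only after the instance is restarted. Once sga_target is set, every pool transfers memory through the buffer cache in whole multiples of the granule size, and memory released by a shrink goes back to the buffer cache. There is no direct transfer from one pool to another: the buffer cache is the source and the target of every resize.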

The only downside of this setting is that, during an SGA resize, memory cannot be taken from the shared pool and given to other pools.
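The effect of merging the durations can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class `SubPool`, its `durations_enabled` flag, and the sizes in MB are hypothetical, and the model ignores fragmentation, granules, and everything else Oracle's real allocator does.

```python
class SubPool:
    """Toy model of one shared pool sub-pool (sizes in MB, illustrative only)."""

    def __init__(self, free_by_duration, durations_enabled=True):
        self.durations_enabled = durations_enabled
        if durations_enabled:
            # Each duration (instance, session, cursor, execution) has its own free space.
            self.free = list(free_by_duration)
        else:
            # Mimics "_enable_shared_pool_durations"=false: one common pool of free space.
            self.free = [sum(free_by_duration)]

    def allocate(self, duration, size):
        """Return True if `size` MB can be allocated for the given duration index."""
        index = duration if self.durations_enabled else 0
        if self.free[index] < size:
            return False  # stands in for ORA-04031
        self.free[index] -= size
        return True


# Duration 0 is exhausted while the other durations still have free memory.
free = [0, 10, 20, 100]

split = SubPool(free, durations_enabled=True)
merged = SubPool(free, durations_enabled=False)

print("durations enabled :", split.allocate(duration=0, size=4))
print("durations disabled:", merged.allocate(duration=0, size=4))
```

With separate durations the request against duration 0 fails even though the sub-pool as a whole has free memory; with the durations merged, the same request is satisfied from the common pool.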