Symptom: after a hardware power failure, the Oracle cluster fails to start.
[root@rac2 ~]# crsctl stat res -t
crs-4535: cannot communicate with cluster ready services
crs-4000: command status failed, or completed with errors.
[root@rac2 ~]# crsctl start crs
crs-4640: oracle high availability services is already active
crs-4000: command start failed, or completed with errors.
Checking the cluster components shows that ora.asm is offline:
[root@rac2 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
name target state server state_details
--------------------------------------------------------------------------------
cluster resources
--------------------------------------------------------------------------------
ora.asm
1 online offline instance shutdown
ora.cluster_interconnect.haip
1 online online rac2
ora.crf
1 online online rac2
ora.crsd
1 online offline
ora.cssd
1 online online rac2
ora.cssdmonitor
1 online online rac2
ora.ctssd
1 online online rac2 observer
ora.diskmon
1 offline offline
ora.drivers.acfs
1 online online rac2
ora.evmd
1 online intermediate rac2
ora.gipcd
1 online online rac2
ora.gpnpd
1 online online rac2
ora.mdnsd
1 online online rac2
The Grid Infrastructure alert log shows that the disk group was never mounted:
[ohasd(4329)]crs-2769:unable to failover resource 'ora.diskmon'.
2018-05-08 04:12:24.940:
[cssd(4576)]crs-1707:lease acquisition for node rac2 number 2 completed
2018-05-08 04:12:26.188:
[cssd(4576)]crs-1605:cssd voting file is online: /dev/asmdisk/oraasm-ocr_0000; details in /u01/app/11.2.0/grid/log/rac2/cssd/ocssd.log.
2018-05-08 04:12:28.723:
[cssd(4576)]crs-1601:cssd reconfiguration complete. active nodes are rac1 rac2 .
2018-05-08 04:12:30.617:
[ctssd(4660)]crs-2401:the cluster time synchronization service started on host rac2.
2018-05-08 04:12:30.617:
[ctssd(4660)]crs-2407:the new cluster time synchronization service reference node is host rac1.
2018-05-08 04:12:32.348:
[ohasd(4329)]crs-2767:resource state recovery not attempted for 'ora.diskmon' as its target state is offline
2018-05-08 04:12:32.348:
[ohasd(4329)]crs-2769:unable to failover resource 'ora.diskmon'.
The ASM alert log shows an ORA-00600 [kfrvalacd30] error:
note: gmon heartbeating for grp 2
gmon querying group 2 at 6 for pid 23, osid 5727
note: cache opening disk 0 of grp 2: data_0000 path:/dev/asmdisk/oraasm-asm_0000
note: f1x0 found on disk 0 au 2 fcn 0.0
note: cache opening disk 1 of grp 2: data_0001 path:/dev/asmdisk/oraasm-asm_0001
note: f1x0 found on disk 1 au 2 fcn 0.0
note: cache opening disk 2 of grp 2: data_0002 path:/dev/asmdisk/oraasm-asm_0002
note: f1x0 found on disk 2 au 2 fcn 0.0
note: cache opening disk 3 of grp 2: data_0003 path:/dev/asmdisk/oraasm-asm_0003
note: cache mounting (first) normal redundancy group 2/0x877a96cd (data)
* allocate domain 2, invalid = true
note: attached to recovery domain 2
note: starting recovery of thread=1 ckpt=8.390 group=2 (data)
errors in file /u01/app/grid/diag/asm/+asm/+asm2/trace/+asm2_ora_5727.trc (incident=50111):
ora-00600: internal error code, arguments: [kfrvalacd30], [data], [1], [8], [390], [9], [390], [], [], [], [], []
ora-15017: diskgroup "asm" cannot be mounted
ora-15063: asm discovered an insufficient number of disks for diskgroup "asm"
incident details in: /u01/app/grid/diag/asm/+asm/+asm2/incident/incdir_50111/+asm2_ora_5727_i50111.trc
use adrci or support workbench to package the incident.
see note 411.1 at my oracle support for error and packaging details.
errors in file /u01/app/grid/diag/asm/+asm/+asm2/trace/+asm2_ora_5727.trc:
ora-00600: internal error code, arguments: [kfrvalacd30], [data], [1], [8], [390], [9], [390], [], [], [], [], []
ora-15017: diskgroup "asm" cannot be mounted
ora-15063: asm discovered an insufficient number of disks for diskgroup "asm"
abort recovery for domain 2
note: crash recovery signalled oer-600
error: ora-600 signalled during mount of diskgroup data
note: cache dismounting (clean) group 2/0x877a96cd (data)
note: messaging ckpt to quiesce pins unix process pid: 5727, image: oracle@rac2 (tns v1-v3)
Contents of asm2_ora_5727_i50111.trc:
kfdp_query(data): 6
----- abridged call stack trace -----
ksedsts() 465<-kfdp_query() 530<-kfdpstsyncpriv() 585<-kfgfinalizemount() 1630<-kfgscfinalize() 1433<-kfgforeachkfgsc() 285<-kfgsofinalize() 135<-kfgfinalize() 398<-kfxdrvmount() 5558<-kfxdrventry() 2207<-opiexe() 20624<-opiosq0() 3932<-kpooprx() 274<-kpoal8() 842
<-opiodr() 917<-ttcpip() 2183<-opitsk() 1710<-opiino() 969<-opiodr() 917<-opidrv() 570<-sou2o() 103<-opimai_real() 133<-ssthrdmain() 265<-main() 201<-__libc_start_main() 253
----- end of abridged call stack trace -----
2018-05-17 21:54:44.321623 : start recovery for domain=2, valid=0, flags=0x4
note: starting recovery of thread=1 ckpt=8.390 group=2 (data)
incident 50111 created, dump file: /u01/app/grid/diag/asm/+asm/+asm2/incident/incdir_50111/+asm2_ora_5727_i50111.trc
ora-00600: internal error code, arguments: [kfrvalacd30], [data], [1], [8], [390], [9], [390], [], [], [], [], []
ora-15017: diskgroup "asm" cannot be mounted
ora-15063: asm discovered an insufficient number of disks for diskgroup "asm"
recovery of group data failed due to the following error(s):
ora-00600: internal error code, arguments: [kfrvalacd30], [data], [1], [8], [390], [9], [390], [], [], [], [], []
ora-15017: diskgroup "asm" cannot be mounted
ora-15063: asm discovered an insufficient number of disks for diskgroup "asm"
2018-05-17 21:54:45.016907 : abort recovery for domain 2, flags = 0x4
2018-05-17 21:54:45.017799 : kjb_abort_recovery: domain flags=0x0, valid=0
note: messaging ckpt to quiesce pins unix process pid: 5727, image: oracle@rac2 (tns v1-v3)
kfdp_dismount(): 7
----- abridged call stack trace -----
ksedsts() 465<-kfdp_dismountint() 389<-kfdp_dismount() 11<-kfgtermcache() 347<-kfgrecoverdismount() 373<-kfgrecovermount() 277<-kfgscdelete() 2742<-kssdel() 155<-kfgscfinalize() 1211<-kfgforeachkfgsc() 285<-kfgsofinalize() 135<-kfgfinalize() 398<-kfxdrvmount() 5558
<-kfxdrventry() 2207<-opiexe() 20624<-opiosq0() 3932<-kpooprx() 274<-kpoal8() 842<-opiodr() 917<-ttcpip() 2183<-opitsk() 1710<-opiino() 969<-opiodr() 917<-opidrv() 570<-sou2o() 103<-opimai_real() 133<-ssthrdmain() 265<-main() 201<-__libc_start_main() 253
----- end of abridged call stack trace -----
asm name of disk 0x9f1d9488 (2:0:data_0000:/dev/asmdisk/oraasm-asm_0000) is being cleared
asm name of disk 0x9f1d9808 (2:1:data_0001:/dev/asmdisk/oraasm-asm_0001) is being cleared
asm name of disk 0x9f1d9108 (2:2:data_0002:/dev/asmdisk/oraasm-asm_0002) is being cleared
asm name of disk 0x9f1d8d88 (2:3:data_0003:/dev/asmdisk/oraasm-asm_0003) is being cleared
note: messaging ckpt to quiesce pins unix process pid: 5727, image: oracle@rac2 (tns v1-v3)
kfdp_dismount(): 8
----- abridged call stack trace -----
ksedsts() 465<-kfdp_dismountint() 389<-kfdp_dismount() 11<-kfgtermcache() 347<-kfgrecoverdismount() 373<-kfgrecovermount() 277<-kfgscdelete() 2742<-kssdct() 258<-kfgsodelete() 721<-kssdel() 155<-kssdch() 6849<-ksures() 52<-opiosq0() 4721<-kpooprx() 274<-kpoal8() 842
<-opiodr() 917<-ttcpip() 2183<-opitsk() 1710<-opiino() 969<-opiodr() 917<-opidrv() 570<-sou2o() 103<-opimai_real() 133<-ssthrdmain() 265<-main() 201<-__libc_start_main() 253
----- end of abridged call stack trace -----
A search on MOS (My Oracle Support) also turns up several related bugs:
Bug 27288230 - ORA-600: [kfrvalacd30] in ASM SMON (Doc ID 27288230.8)
Bug 19064132 : ORA-600 [kfrvalacd30] mounting diskgroup after restarting servers
ORA-600 [kfrvalacd30] in ASM (Doc ID 2123013.1)
The bug exists in 11.2.0.1, 11.2.0.3, 11.2.0.4, 12.1, and 18.1.
Oracle attributes it to a storage or OS problem that leaves the metadata in the ASM ACD block inconsistent,
which may corrupt both the primary and secondary extents of the ASM metadata. The corruption can cause a rebalance to hang or retry endlessly, or prevent the disk group from being mounted.
strace also shows the ORA-600 [kfrvalacd30] error, but reveals no other useful information:
write(25, "string_value", 12) = 12
write(25, ">", 1) = 1
write(25, "ora 600 [kfrvalacd30]", 21) = 21
write(25, "", 1) = 1
write(25, "", 1) = 1
write(25, "\n", 1) = 1
write(25, "<", 1) = 1
write(25, "creatm_dbgristih", 16) = 16
write(25, ">", 1) = 1
write(25, "<", 1) = 1
write(25, "time", 4) = 4
write(25, ">", 1) = 1
ASM's Active Change Directory (ACD), simply put, is the equivalent of a redo log for ASM metadata. Note that each ASM instance has its own ACD thread; in a two-node RAC there is therefore 84 MB of ACD in total.
The ACD records information such as the thread number, block number, opcode, sequence number, and record length.
When a running ASM instance suddenly crashes, the restarted instance uses the ACD to perform instance recovery, which is what allows the ASM instance to start normally.
If the ACD is corrupted, the disk group cannot be mounted.
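The consistency check that fails here can be sketched as follows. This is an illustrative model only, not Oracle's on-disk ACD layout; the field names simply mirror what the ACD is described to record (thread, block, opcode, sequence, record length):

```python
from dataclasses import dataclass

# Illustrative sketch only: NOT Oracle's actual on-disk ACD structures.
@dataclass
class AcdCheckpoint:
    seq: int   # checkpoint sequence number
    blk: int   # checkpoint block number

@dataclass
class AcdRecord:
    thread: int
    seq: int
    blk: int
    opcode: int
    length: int

def validate_recovery_start(ckpt, rec):
    # During instance recovery, ASM reads the ACD from the checkpoint onward;
    # if the log block it finds does not carry the sequence the checkpoint
    # promises, recovery must abort -- conceptually what the ORA-600
    # [kfrvalacd30] arguments [8],[390],[9],[390] appear to express.
    if (rec.seq, rec.blk) != (ckpt.seq, ckpt.blk):
        raise ValueError(
            f"ACD mismatch: expected {ckpt.seq}.{ckpt.blk}, "
            f"found {rec.seq}.{rec.blk}")
```

In this case a checkpoint of 8.390 meets a block stamped with sequence 9, so recovery of the disk group aborts.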
Use kfed to read the ASM metadata. First, locate the AU that holds the Active Change Directory:
[grid@rac2 ~]$ kfed read /dev/dm-0 aun=2 blkn=3|grep .xptr.au |head -1
kfffde[0].xptr.au: 4 ; 0x4a0: 0x00000004
The output above indicates that the ACD metadata is located in AU 4.
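Extracting that AU number from kfed output can be sketched with a tiny parser. This is an assumption-laden convenience, not part of kfed; it grabs only the first `xptr.au` match, just as the `head -1` in the command above does:

```python
import re

# Minimal sketch: pull the AU number out of a kfed "xptr.au" line like the
# one shown above. Real kfed output has many kfffde[] slots; this returns
# only the first match.
def xptr_au(kfed_line):
    m = re.search(r"xptr\.au:\s*(\d+)", kfed_line)
    if m is None:
        raise ValueError("no xptr.au field in input")
    return int(m.group(1))

print(xptr_au("kfffde[0].xptr.au: 4 ; 0x4a0: 0x00000004"))  # prints 4
```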
Examine AU 4:
[grid@rac2 ~]$ kfed read /dev/dm-0 aun=4 blkn=1|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 8 ; 0x002: kfbtyp_chngdir
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 1 ; 0x004: blk=1
kfbh.block.obj: 3 ; 0x008: file=3
kfbh.check: 17400326 ; 0x00c: 0x01098206
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfracdb.aba.seq: 2 ; 0x000: 0x00000002
kfracdb.aba.blk: 0 ; 0x004: 0x00000000
kfracdb.ents: 2 ; 0x008: 0x0002
kfracdb.ub2spare: 0 ; 0x00a: 0x0000
kfracdb.lge[0].valid: 1 ; 0x00c: v=1 b=0 m=0
kfracdb.lge[0].chgcount: 1 ; 0x00d: 0x01
kfracdb.lge[0].len: 52 ; 0x00e: 0x0034
kfracdb.lge[0].kfcn.base: 1 ; 0x010: 0x00000001
kfracdb.lge[0].kfcn.wrap: 0 ; 0x014: 0x00000000
kfracdb.lge[0].bcd[0].kfbl.blk: 0 ; 0x018: blk=0
kfracdb.lge[0].bcd[0].kfbl.obj: 4 ; 0x01c: file=4
kfracdb.lge[0].bcd[0].kfcn.base: 0 ; 0x020: 0x00000000
kfracdb.lge[0].bcd[0].kfcn.wrap: 0 ; 0x024: 0x00000000
kfracdb.lge[0].bcd[0].oplen: 4 ; 0x028: 0x0004 -- record length, analogous to len in a logfile dump
kfracdb.lge[0].bcd[0].blkindex: 0 ; 0x02a: 0x0000
kfracdb.lge[0].bcd[0].flags: 28 ; 0x02c: f=0 n=0 f=1 l=1 v=1 a=0 c=0
kfracdb.lge[0].bcd[0].opcode: 212 ; 0x02e: 0x00d4 -- opcode, analogous to the opcode of an update/delete/insert in a database instance
kfracdb.lge[0].bcd[0].kfbtyp: 9 ; 0x030: kfbtyp_cod_bgo -- operation type, analogous to update/delete/insert in a database instance
kfracdb.lge[0].bcd[0].redund: 17 ; 0x031: sche=0x1 numb=0x1 -- redundancy level: 17 = unprotected, 18 = mirrored, 19 = high
kfracdb.lge[0].bcd[0].pad: 63903 ; 0x032: 0xf99f
kfracdb.lge[0].bcd[0].kfrcod_crash: 1 ; 0x034: 0x00000001
kfracdb.lge[0].bcd[0].au[0]: 46 ; 0x038: 0x0000002e
kfracdb.lge[0].bcd[0].disks[0]: 0 ; 0x03c: 0x0000
kfracdb.lge[1].valid: 1 ; 0x040: v=1 b=0 m=0
kfracdb.lge[1].chgcount: 1 ; 0x041: 0x01
Fields such as check are hash values, updated every 3 seconds; most likely the sudden power loss meant the cached copy was never flushed to disk. Compare the ACD checkpoint recorded on the two disks:
[grid@rac2 ~]$ kfed read /dev/dm-2 aun=4 blkn=0|grep ckpt
kfracdc.ckpt.seq: 8 ; 0x018: 0x00000008
kfracdc.ckpt.blk: 390 ; 0x01c: 0x00000186
[grid@rac2 ~]$ kfed read /dev/dm-0 aun=4 blkn=0|grep ckpt
kfracdc.ckpt.seq: 70 ; 0x018: 0x00000046
kfracdc.ckpt.blk: 255 ; 0x01c: 0x000000ff
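The comparison of the two dumps can be done programmatically. A minimal sketch, assuming the kfed text format shown above (it reads only the decimal column):

```python
import re

# Sketch: parse kfracdc.ckpt.seq / kfracdc.ckpt.blk out of kfed output
# (decimal column) and compare the two copies dumped above.
def parse_ckpt(kfed_text):
    seq = int(re.search(r"kfracdc\.ckpt\.seq:\s*(\d+)", kfed_text).group(1))
    blk = int(re.search(r"kfracdc\.ckpt\.blk:\s*(\d+)", kfed_text).group(1))
    return seq, blk

dm2 = ("kfracdc.ckpt.seq: 8 ; 0x018: 0x00000008\n"
       "kfracdc.ckpt.blk: 390 ; 0x01c: 0x00000186\n")
dm0 = ("kfracdc.ckpt.seq: 70 ; 0x018: 0x00000046\n"
       "kfracdc.ckpt.blk: 255 ; 0x01c: 0x000000ff\n")

print(parse_ckpt(dm2), parse_ckpt(dm0))  # prints (8, 390) (70, 255)
```

Note that (8, 390) matches the [8],[390] in the ORA-600 arguments, so the copy on /dev/dm-2 carries the checkpoint that recovery expects.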
Recover by manually fixing the block with kfed merge:
kfed read /dev/dm-2 aun=4 blkn=0 >acd.txt
kfed merge /dev/dm-2 aun=4 blkn=0 text=acd.txt
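If the ckpt values in the dumped acd.txt need adjusting before it is merged back, a text patch like the following can help. This is a hypothetical helper, not part of kfed; it edits only the decimal column of the two fields, on the assumption that kfed merge honors that column. Verify the edited dump by eye, and always keep an untouched backup of the original block before merging:

```python
import re

# Hypothetical helper: rewrite kfracdc.ckpt.seq / kfracdc.ckpt.blk in a kfed
# text dump (acd.txt) before feeding it back to `kfed merge`.
# Assumption: merge reads the decimal value after the field name.
def patch_ckpt(dump, seq, blk):
    dump = re.sub(r"(kfracdc\.ckpt\.seq:\s*)\d+", rf"\g<1>{seq}", dump)
    dump = re.sub(r"(kfracdc\.ckpt\.blk:\s*)\d+", rf"\g<1>{blk}", dump)
    return dump
```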
Manually mount the disk group:
sql> alter diskgroup data mount;
alter diskgroup data mount
*
error at line 1:
ora-00600: internal error code, arguments: [kfrvalacd30], [data], [1], [8],
[390], [9], [390], [], [], [], [], []
sql> alter diskgroup data mount;
diskgroup altered.