Exadata: shut down or reboot a cell without impacting ASM

Redundancy is an important feature of any system, and Exadata is no exception. At the storage level, Exadata relies on ASM for software redundancy based on failgroups: by default each cell is its own failgroup, so the mirror copies of the data stored on one cell always live on other cells. This is what makes it possible to patch the Exadata storage online and to perform hardware maintenance online (replacing a failed DIMM, a flash card on models older than X6, a motherboard, etc.). This post shows a real-life example of how to shut down or reboot a cell without impacting ASM (the official Oracle note about this is 1188080.1).

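First of all, let's check the status of the disks before starting the maintenance (you can also use cell-status.sh for this). Every grid disk must show asmdeactivationoutcome = Yes, which means ASM can take it offline without having to dismount a diskgroup: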
[root@exa01db01 ~]# ssh exa01cel04
Last login: Tue Oct 13 20:14:28 AEDT 2020 from 10.248.6.210 on pts/0
Last login: Tue Oct 13 20:14:54 2020 from 10.248.6.210
[root@exa01cel04 ~]# cellcli
CellCLI: Release 19.2.4.0.0 - Production on Tue Oct 13 20:14:57 AEDT 2020
Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.
CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
         DATA_exa01C1_CD_00_exa01cel04     ONLINE  Yes
         DATA_exa01C1_CD_01_exa01cel04     ONLINE  Yes
         . . .
         DBFS_DGC2_CD_02_exa01cel04       ONLINE  Yes
         DBFS_DGC2_CD_03_exa01cel04       ONLINE  Yes
         . . .
         RECO_exa01C1_CD_00_exa01cel04     ONLINE  Yes
         RECO_exa01C1_CD_01_exa01cel04     ONLINE  Yes
         . . .
CellCLI>
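The check above can also be scripted so that the maintenance stops early if a disk is not safe to deactivate. This is only a minimal sketch: safe_to_deactivate is a hypothetical helper name (not an Oracle tool), and it assumes the three-column output shown above, with the disk name first and asmdeactivationoutcome third.

```shell
# Hypothetical helper: reads the output of
#   cellcli -e "list griddisk attributes name,asmmodestatus,asmdeactivationoutcome"
# on stdin. Exit codes: 0 = every grid disk reports "Yes",
# 1 = at least one disk reports something else, 2 = no grid disk found in the input.
safe_to_deactivate() {
    awk '
        NF >= 3 {
            seen++
            if ($3 != "Yes") {
                print "not safe to deactivate: " $1
                bad++
            }
        }
        END {
            if (seen == 0) {
                print "no grid disks found"
                exit 2
            }
            exit (bad > 0)
        }
    '
}
```

On the cell, it could be used as `cellcli -e "list griddisk attributes name,asmmodestatus,asmdeactivationoutcome" | safe_to_deactivate && echo "OK to deactivate"`.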
Now we need to deactivate the grid disks:
CellCLI> alter griddisk all inactive
         GridDisk DATA_exa01C1_CD_00_exa01cel04 successfully altered
         GridDisk DATA_exa01C1_CD_01_exa01cel04 successfully altered
         . . .
         GridDisk RECO_exa01C2_CD_10_exa01cel04 successfully altered
         GridDisk RECO_exa01C2_CD_11_exa01cel04 successfully altered
CellCLI> 
The grid disks are now inactive:
CellCLI> list griddisk
         DATA_exa01C1_CD_00_exa01cel04     inactive
         DATA_exa01C1_CD_01_exa01cel04     inactive
         . . .
         RECO_exa01C2_CD_10_exa01cel04     inactive
         RECO_exa01C2_CD_11_exa01cel04     inactive
CellCLI> 
If you want to shut down the cell:
[root@exa01cel04 ~]# shutdown -h now
Connection to exa01cel04 closed by remote host.
Connection to exa01cel04 closed.
[root@exa01db01 ~]#
If you want to reboot the cell:
[root@exa01cel04 ~]#  shutdown -Fr now
Once the cell is back up, the grid disks are still inactive:
CellCLI> list griddisk
         DATA_exa01C1_CD_00_exa01cel04     inactive
         DATA_exa01C1_CD_01_exa01cel04     inactive
         . . .
         RECO_exa01C2_CD_10_exa01cel04     inactive
         RECO_exa01C2_CD_11_exa01cel04     inactive
CellCLI> 
Now activate the grid disks:
CellCLI> alter griddisk all active
         GridDisk DATA_exa01C1_CD_00_exa01cel04 successfully altered
         GridDisk DATA_exa01C1_CD_01_exa01cel04 successfully altered
         . . .
         GridDisk RECO_exa01C2_CD_10_exa01cel04 successfully altered
         GridDisk RECO_exa01C2_CD_11_exa01cel04 successfully altered
CellCLI>
Disks are now SYNCING (this is the fast mirror resync feature; it should be quick, as it resyncs only the data modified during the maintenance, which is usually not much):
CellCLI>  list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
         DATA_exa01C1_CD_00_exa01cel04     SYNCING         Yes
         DATA_exa01C1_CD_01_exa01cel04     SYNCING         Yes
         DATA_exa01C1_CD_02_exa01cel04     SYNCING         Yes
         DATA_exa01C1_CD_03_exa01cel04     SYNCING         Yes
         DATA_exa01C1_CD_04_exa01cel04     SYNCING         Yes
         . . .
CellCLI> 
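Rather than re-running the list command by hand, the resync can be polled until every disk is back to ONLINE. The sketch below is only an illustration: all_online and wait_for_resync are hypothetical helper names, and the parsing assumes the disk name is in the first column and asmmodestatus in the second, as in the output above.

```shell
# Hypothetical helper: succeeds (exit 0) only when every grid disk listed on
# stdin has asmmodestatus ONLINE; fails on SYNCING, OFFLINE or empty input.
all_online() {
    awk '
        NF >= 2 { seen++; if ($2 != "ONLINE") pending++ }
        END { exit (seen == 0 || pending > 0) }
    '
}

# Hypothetical helper: re-runs a listing command until all disks are ONLINE.
# $1 = seconds to sleep between polls, remaining arguments = command to run.
wait_for_resync() {
    interval=$1
    shift
    until "$@" | all_online; do
        sleep "$interval"
    done
}
```

For example, `wait_for_resync 60 cellcli -e "list griddisk attributes name,asmmodestatus"` would check once a minute and return as soon as nothing is left in SYNCING.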
Once the disks have finished SYNCING and are back to ONLINE, you are all done!
If you shut the cell down to replace a flash disk, verify that the new device is present:
CellCLI> list physicaldisk
         8:0             ABXDNV          normal
         8:1             ABV1XV          normal
         8:2             ABT38V          normal
         8:3             ABYU6V          normal
         8:4             ABPRAV          normal
         8:5             ABA8DV          normal
         8:6             PX6JKV          normal
         8:7             ABW08V          normal
         8:8             AV0SHV          normal
         8:9             BCVZ3V          normal
         8:10            ABD6LV          normal
         8:11            ABYM1V          normal
         FLASH_1_1       S2T7NHJK800123  normal
         FLASH_2_1       S2T7NHJK800456  normal    <=== In my example, we have replaced this flash disk
         FLASH_4_1       S2T7NHJK800789  normal
         FLASH_5_1       S2T7NHJK800246  normal
CellCLI> 
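With this many disks, a small filter can help spot any device that did not come back in a clean state. This is a sketch under assumptions: abnormal_physicaldisks is a hypothetical helper name, and it expects the raw three-column output of list physicaldisk (name, serial, status) without the annotation I added in the listing above.

```shell
# Hypothetical helper: reads "cellcli -e list physicaldisk" output on stdin and
# prints the name of every disk whose status is anything other than exactly
# "normal". Exits 1 if at least one such disk is found, 0 otherwise.
abnormal_physicaldisks() {
    awk '
        NF >= 3 && !(NF == 3 && $3 == "normal") { print $1; found = 1 }
        END { exit (found + 0) }
    '
}
```

Running `cellcli -e list physicaldisk | abnormal_physicaldisks` prints nothing and returns 0 when all the disks are normal.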
Check the new disk details:
CellCLI> list physicaldisk FLASH_2_1 detail
         name:                   FLASH_2_1
         deviceName:             /dev/nvme2n1
         diskType:               FlashDisk
         luns:                   2_1
         makeModel:              "Oracle Flash Accelerator F320 PCIe Card"
         physicalFirmware:       KPYAIR3Q
         physicalInsertTime:     2020-10-13T20:40:11+11:00
         physicalSerial:         S2T7NHJK800456
         physicalSize:           2.910957656800746917724609375T
         slotNumber:             "PCI Slot: 2; FDOM: 1"
         status:                 normal
CellCLI>
One more useful procedure for doing even more maintenance online!
