Redundancy is an important feature of any system, and Exadata is no exception. At the storage level, Exadata relies on ASM to provide software redundancy based on failure groups (failgroups). This is what lets us patch the Exadata storage online and also perform some maintenance online (replacing a failed DIMM, a flash card on models older than X6, a motherboard, etc.). This post shows a real-life example of how to shut down or reboot a cell without impacting ASM (the official Oracle note on this is 1188080.1).
First of all, let's check the status of the disks before starting the maintenance (you can also use cell-status.sh for this):
[root@exa01db01 ~]# ssh exa01cel04
Last login: Tue Oct 13 20:14:28 AEDT 2020 from 10.248.6.210 on pts/0
Last login: Tue Oct 13 20:14:54 2020 from 10.248.6.210
[root@exa01cel04 ~]# cellcli
CellCLI: Release 19.2.4.0.0 - Production on Tue Oct 13 20:14:57 AEDT 2020

Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.

CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
DATA_exa01C1_CD_00_exa01cel04 ONLINE Yes
DATA_exa01C1_CD_01_exa01cel04 ONLINE Yes
. . .
DBFS_DGC2_CD_02_exa01cel04 ONLINE Yes
DBFS_DGC2_CD_03_exa01cel04 ONLINE Yes
. . .
RECO_exa01C1_CD_00_exa01cel04 ONLINE Yes
RECO_exa01C1_CD_01_exa01cel04 ONLINE Yes
. . .
CellCLI>

Now we need to deactivate the grid disks:
CellCLI> alter griddisk all inactive
GridDisk DATA_exa01C1_CD_00_exa01cel04 successfully altered
GridDisk DATA_exa01C1_CD_01_exa01cel04 successfully altered
. . .
GridDisk RECO_exa01C2_CD_10_exa01cel04 successfully altered
GridDisk RECO_exa01C2_CD_11_exa01cel04 successfully altered
CellCLI>

The grid disks are now inactive:
CellCLI> list griddisk
DATA_exa01C1_CD_00_exa01cel04 inactive
DATA_exa01C1_CD_01_exa01cel04 inactive
. . .
RECO_exa01C2_CD_10_exa01cel04 inactive
RECO_exa01C2_CD_11_exa01cel04 inactive
CellCLI>

If you want to shut down the cell:
[root@exa01cel04 ~]# shutdown -h now
Connection to exa01cel04 closed by remote host.
Connection to exa01cel04 closed.
[root@exa01db01 ~]#

If you want to reboot the cell:
[root@exa01cel04 ~]# shutdown -Fr now

Once the cell is back, the disks are inactive:
CellCLI> list griddisk
DATA_exa01C1_CD_00_exa01cel04 inactive
DATA_exa01C1_CD_01_exa01cel04 inactive
. . .
RECO_exa01C2_CD_10_exa01cel04 inactive
RECO_exa01C2_CD_11_exa01cel04 inactive
CellCLI>

Now activate the grid disks:
CellCLI> alter griddisk all active
GridDisk DATA_exa01C1_CD_00_exa01cel04 successfully altered
GridDisk DATA_exa01C1_CD_01_exa01cel04 successfully altered
. . .
GridDisk RECO_exa01C2_CD_10_exa01cel04 successfully altered
GridDisk RECO_exa01C2_CD_11_exa01cel04 successfully altered
CellCLI>

Disks are now SYNCING (this is the fast mirror resync feature; it should be quick, as it resyncs only the data modified during the maintenance, which is usually not much):
CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
DATA_exa01C1_CD_00_exa01cel04 SYNCING Yes
DATA_exa01C1_CD_01_exa01cel04 SYNCING Yes
DATA_exa01C1_CD_02_exa01cel04 SYNCING Yes
DATA_exa01C1_CD_03_exa01cel04 SYNCING Yes
DATA_exa01C1_CD_04_exa01cel04 SYNCING Yes
. . .
CellCLI>

Once the disks have finished SYNCING (asmmodestatus is back to ONLINE), you are all done!
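The two status checks in this procedure (before deactivating the grid disks and after reactivating them) are easy to get wrong by eye on a cell with dozens of grid disks. Below is a minimal Python sketch that applies them to captured output of `list griddisk attributes name,asmmodestatus,asmdeactivationoutcome`. The function names (`parse_griddisks`, `safe_to_deactivate`, `resync_complete`) are hypothetical helpers of mine, not part of any Oracle tooling, and the sketch assumes you have already saved the command output as text (for example from `cellcli -e`); it never talks to the cell itself.

```python
from typing import List, NamedTuple


class GridDisk(NamedTuple):
    name: str
    asm_mode_status: str
    asm_deactivation_outcome: str


def parse_griddisks(output: str) -> List[GridDisk]:
    """Parse rows of `list griddisk attributes name,asmmodestatus,asmdeactivationoutcome`."""
    disks = []
    for line in output.splitlines():
        # Split into at most 3 fields so a multi-word outcome message stays whole.
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        name, status, outcome = fields
        # Assumption: asmmodestatus is a single upper-case word (ONLINE, SYNCING, ...).
        # This also filters out prompts, banners and ". . ." elision lines.
        if status.isalpha() and status.isupper():
            disks.append(GridDisk(name, status, outcome.strip()))
    return disks


def safe_to_deactivate(disks: List[GridDisk]) -> bool:
    """True only if every grid disk is ONLINE and reports asmdeactivationoutcome = Yes."""
    return bool(disks) and all(
        d.asm_mode_status == "ONLINE" and d.asm_deactivation_outcome == "Yes"
        for d in disks
    )


def resync_complete(disks: List[GridDisk]) -> bool:
    """True only if every grid disk is back to ONLINE (none still SYNCING)."""
    return bool(disks) and all(d.asm_mode_status == "ONLINE" for d in disks)


# Sample rows taken from the outputs shown in this post.
sample_before = """\
CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
DATA_exa01C1_CD_00_exa01cel04 ONLINE Yes
RECO_exa01C1_CD_00_exa01cel04 ONLINE Yes
"""
sample_syncing = """\
DATA_exa01C1_CD_00_exa01cel04 SYNCING Yes
DATA_exa01C1_CD_01_exa01cel04 SYNCING Yes
"""

print(safe_to_deactivate(parse_griddisks(sample_before)))  # True
print(resync_complete(parse_griddisks(sample_syncing)))    # False
```

Both checks deliberately return False on empty input, so a failed or truncated capture is never mistaken for a green light.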
If you shut the cell down to replace a flash disk, verify that the new device is present:
CellCLI> list physicaldisk
8:0 ABXDNV normal
8:1 ABV1XV normal
8:2 ABT38V normal
8:3 ABYU6V normal
8:4 ABPRAV normal
8:5 ABA8DV normal
8:6 PX6JKV normal
8:7 ABW08V normal
8:8 AV0SHV normal
8:9 BCVZ3V normal
8:10 ABD6LV normal
8:11 ABYM1V normal
FLASH_1_1 S2T7NHJK800123 normal
FLASH_2_1 S2T7NHJK800456 normal <=== In my example, we have replaced this flash disk
FLASH_4_1 S2T7NHJK800789 normal
FLASH_5_1 S2T7NHJK800246 normal
CellCLI>

Check the new disk details:
CellCLI> list physicaldisk FLASH_2_1 detail
name: FLASH_2_1
deviceName: /dev/nvme2n1
diskType: FlashDisk
luns: 2_1
makeModel: "Oracle Flash Accelerator F320 PCIe Card"
physicalFirmware: KPYAIR3Q
physicalInsertTime: 2020-10-13T20:40:11+11:00
physicalSerial: S2T7NHJK800456
physicalSize: 2.910957656800746917724609375T
slotNumber: "PCI Slot: 2; FDOM: 1"
status: normal
CellCLI>

One more useful procedure for some more online maintenance!
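As a postscript, the flash disk verification can also be done by comparing a `list physicaldisk` capture taken before the maintenance with one taken afterwards: the replaced device should show a new serial number and a `normal` status. The following is a rough Python sketch of that comparison. The helper names are hypothetical, the parser assumes the default three-column output (name, serial, status) shown in this post, and the "before" serial for FLASH_2_1 is a made-up placeholder, since the old serial does not appear in my example.

```python
from typing import Dict, List, Tuple


def parse_physicaldisks(output: str) -> Dict[str, Tuple[str, str]]:
    """Map disk name -> (serial, status) from default `list physicaldisk` output."""
    disks = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows carry at least name, serial and status; skip prompts and blanks.
        # Anything after the third field (e.g. a trailing note) is ignored.
        if len(fields) >= 3 and not fields[0].startswith("CellCLI"):
            disks[fields[0]] = (fields[1], fields[2])
    return disks


def replaced_disks(before: Dict[str, Tuple[str, str]],
                   after: Dict[str, Tuple[str, str]]) -> List[str]:
    """Names of disks present in both captures whose serial number changed."""
    return sorted(
        name for name in before
        if name in after and before[name][0] != after[name][0]
    )


def not_normal(disks: Dict[str, Tuple[str, str]]) -> List[str]:
    """Names of disks whose status is anything other than `normal`."""
    return sorted(name for name, (_, status) in disks.items() if status != "normal")


# "Before" capture: the FLASH_2_1 serial here is a placeholder, not a real value.
before = parse_physicaldisks("""\
FLASH_1_1 S2T7NHJK800123 normal
FLASH_2_1 OLDSERIAL_PLACEHOLDER normal
""")

# "After" capture: rows taken from the output shown in this post.
after = parse_physicaldisks("""\
CellCLI> list physicaldisk
FLASH_1_1 S2T7NHJK800123 normal
FLASH_2_1 S2T7NHJK800456 normal <=== In my example, we have replaced this flash disk
""")

print(replaced_disks(before, after))  # ['FLASH_2_1']
print(not_normal(after))              # []
```

Looking at the detail view of the device, as done above, is still worthwhile: physicalInsertTime confirms when the new card was actually inserted.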