
ASM: Why you must use HIGH redundancy in production

HIGH redundancy is the ASM redundancy level recommended for production systems: with HIGH redundancy, each primary file extent has 2 mirrored copies in 2 different failgroups, while with NORMAL redundancy each primary file extent has only 1 mirrored copy in another failgroup. This basically means that NORMAL redundancy protects against a single disk failure while HIGH redundancy protects against 2 simultaneous disk failures. Note that, in the specific case of Exadata, each failgroup is located on a different storage server.
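As a reminder, the redundancy level is chosen when the diskgroup is created and cannot be changed later on (you have to create a new diskgroup and move your data into it). Below is a minimal sketch of a HIGH redundancy diskgroup creation to run against the ASM instance; the diskgroup name, failgroup names and disk paths are made up for the illustration, so adapt them to your own system:

sqlplus -s / as sysasm <<'EOF'
-- one failgroup per storage server; HIGH redundancy needs at least 3 failgroups
CREATE DISKGROUP DATA1 HIGH REDUNDANCY
  FAILGROUP FG1 DISK '/dev/mapper/cell01_disk1', '/dev/mapper/cell01_disk2'
  FAILGROUP FG2 DISK '/dev/mapper/cell02_disk1', '/dev/mapper/cell02_disk2'
  FAILGROUP FG3 DISK '/dev/mapper/cell03_disk1', '/dev/mapper/cell03_disk2'
  ATTRIBUTE 'compatible.asm' = '12.1.0.2';
EOF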

A picture is worth a thousand words, so let's have a look at a simple illustration:

We can see in the above image that:
  • 3 failgroups are defined here (so on 3 different cells if this is Exadata)
  • The DATA1 diskgroup (in blue) has been created with HIGH redundancy, so each of its file extents has 3 copies, one in each failgroup (the blue squares)
  • The DATA2 diskgroup (in red) has been created with NORMAL redundancy, so each of its file extents has only 2 copies, in 2 of the 3 failgroups (the red squares); a quick way to check the redundancy of your own diskgroups is shown right after this list
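If you wonder what your own diskgroups use, asmcmd shows it right away (run it from the grid infrastructure environment; the Type column reports EXTERN, NORMAL or HIGH, and the same information is in the TYPE column of v$asm_diskgroup):

[oracle@exadatadb01 ~]$ asmcmd lsdg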

To clarify the features and limitations of such configurations, let's go through a scenario:
  • 1/ We lose all the disks of Failgroup3 (or the whole storage cell if this is Exadata)
  • Let's see what happens to our 2 diskgroups DATA1 and DATA2:
    • DATA1 has no issue as it still has 2 copies of each file extent, on Failgroup1 and Failgroup2
    • DATA2 has no issue either as it still has 1 copy of each file extent on Failgroup2
  • 2/ Now we also lose all the disks of Failgroup2 (or the whole storage cell if this is Exadata)
  • And let's see what happens:
    • DATA1 still survives as it still has 1 copy of each file extent on Failgroup1
    • DATA2 cannot survive this second failure and has to be dismounted: everything running on it crashes (a quick way to check how much failure headroom a diskgroup has left is shown right after this list)
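To know whether a diskgroup still has enough headroom to survive and re-mirror after losing a failgroup, v$asm_diskgroup gives the numbers; here is a minimal sketch to run against the ASM instance (a negative USABLE_FILE_MB means ASM could not fully restore the redundancy after a failure):

sqlplus -s / as sysasm <<'EOF'
set lines 200 pages 100
col name format a20
select name, type, offline_disks, required_mirror_free_mb, usable_file_mb
from   v$asm_diskgroup;
EOF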

You'll tell me "yeah Fred, OK, but it would be really unfortunate to hit such a scenario, it never happens!" -- I wouldn't be so sure, as I have myself found Exadata storage servers in very bad health when I developed cell-status.sh, with asmDeactivationOutcome not set to Yes due to undetected bad partner disks, for example. It can also happen when patching Exadata, as I will show with a real-life example below.
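You can easily check these two attributes yourself before (and during) any maintenance; a simple sketch using dcli, assuming a ~/cell_group file listing your storage servers (any grid disk with an asmDeactivationOutcome different from "Yes" means that taking its cell offline would put a diskgroup at risk):

[root@exadatadb01 ~]# dcli -g ~/cell_group -l root "cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome"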

Let's have a look at the patchmgr session log of this cell patching:
2018-11-30 14:28:52 -0600 Do exadatacel12 :Working: Execute plugin check for Patching ...
2018-11-30 14:28:53 -0600 Done exadatacel12 :SUCCESS: Execute plugin check for Patching.
2018-11-30 14:28:53 -0600 3 Do exadatacel12 :Working: Cell will reboot. Up to 5 minutes ...
2018-11-30 14:28:55 -0600 3 Done exadatacel12 :SUCCESS: Finalize patch on cell.
2018-11-30 14:29:12 -0600 4 Do exadatacel12 :Working: Wait for cell to reboot and come online. Between 35 minutes and 600 minutes.
2018-11-30 14:29:12 -0600        :INFO   : exadatacel12 Wait for patch finalization and reboot
2018-11-30 14:55:54 -0600 4 Done exadatacel12 :SUCCESS: Wait for cell to reboot and come online.
2018-11-30 14:55:54 -0600 5 of 5 :Working: DO: Check the state of patch on cells. Up to 5 minutes ...
2018-11-30 14:56:10 -0600 5 Done exadatacel12 :SUCCESS: Check the state of patch on cell.
2018-11-30 14:56:10 -0600 Do exadatacel12 :Working: Execute plugin check for Pre Disk Activation ...
2018-11-30 14:56:11 -0600 Done exadatacel12 :SUCCESS: Execute plugin check for Pre Disk Activation.
2018-11-30 14:56:11 -0600        :Working: DO: Activate grid disks...  Up to  600 minutes ...
2018-11-30 14:56:12 -0600        :INFO   : Wait for checking and activating grid disks
2018-11-30 15:52:52 -0600        :SUCCESS: DONE: Activate grid disks.
2018-11-30 15:52:55 -0600        :Working: DO: Execute plugin check for Post Patch ...
2018-11-30 15:52:56 -0600 Done exadatacel12 :SUCCESS: Execute plugin check for Post Patch.
2018-11-30 15:52:56 -0600 Do exadatacel13 :Working: Execute plugin check for Patching ...
2018-11-30 15:52:57 -0600 Done exadatacel13 :SUCCESS: Execute plugin check for Patching.
2018-11-30 15:52:57 -0600 3 Do exadatacel13 :Working: Cell will reboot. Up to 5 minutes ...
2018-11-30 15:52:59 -0600 3 Done exadatacel13 :SUCCESS: Finalize patch on cell.
2018-11-30 15:53:43 -0600 4 Do exadatacel13 :Working: Wait for cell to reboot and come online. Between 35 minutes and 600 minutes.
2018-11-30 15:53:43 -0600        :INFO   : exadatacel13 Wait for patch finalization and reboot
2018-11-30 16:20:12 -0600 4 Done exadatacel13 :SUCCESS: Wait for cell to reboot and come online.
2018-11-30 16:20:12 -0600 5 of 5 :Working: DO: Check the state of patch on cells. Up to 5 minutes ...
2018-11-30 16:20:25 -0600 5 Done exadatacel13 :SUCCESS: Check the state of patch on cell.
2018-11-30 16:20:25 -0600 Do exadatacel13 :Working: Execute plugin check for Pre Disk Activation ...
2018-11-30 16:20:26 -0600 Done exadatacel13 :SUCCESS: Execute plugin check for Pre Disk Activation.
2018-11-30 16:20:26 -0600        :Working: DO: Activate grid disks...  Up to  600 minutes ...
2018-11-30 16:20:27 -0600        :INFO   : Wait for checking and activating grid disks
||||| 2018-11-30 18:43:25 -0600 Minutes left 457
||||| 2018-11-30 20:51:24 -0600 Minutes left 329
If you look at the exadatacel12 timestamps, it took about one hour to reactivate its grid disks, while exadatacel13 was stuck activating its grid disks for hours. Weird, right? Why would those cel13 disks be stuck when 12 cells had already been patched successfully (and we patch this Exadata on a regular basis)?

A closer look at the disks showed the grid disks of the DATA diskgroup in an UNKNOWN status:
CellCLI> list griddisk attributes name,asmmodestatus
         DATA_CD_00_exadatacel13      UNKNOWN
         DATA_CD_01_exadatacel13      UNKNOWN
         DATA_CD_02_exadatacel13      UNKNOWN
         DATA_CD_03_exadatacel13      UNKNOWN
         DATA_CD_04_exadatacel13      UNKNOWN
         DATA_CD_05_exadatacel13      UNKNOWN
         DATA_CD_06_exadatacel13      UNKNOWN
         DATA_CD_07_exadatacel13      UNKNOWN
         DATA_CD_08_exadatacel13      UNKNOWN
         DATA_CD_09_exadatacel13      UNKNOWN
         DATA_CD_10_exadatacel13      UNKNOWN
         DATA_CD_11_exadatacel13      UNKNOWN
         RECO_CD_00_exadatacel13      ONLINE
         RECO_CD_01_exadatacel13      ONLINE
         RECO_CD_02_exadatacel13      ONLINE
         RECO_CD_03_exadatacel13      ONLINE
         RECO_CD_04_exadatacel13      ONLINE
         RECO_CD_05_exadatacel13      ONLINE
         RECO_CD_06_exadatacel13      ONLINE
         RECO_CD_07_exadatacel13      ONLINE
         RECO_CD_08_exadatacel13      ONLINE
         RECO_CD_09_exadatacel13      ONLINE
         RECO_CD_10_exadatacel13      ONLINE
         RECO_CD_11_exadatacel13      ONLINE
CellCLI>
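As a side note, when waiting for grid disks to come back after a cell restart, a small loop is handier than staring at patchmgr; here is a minimal sketch to run as root on the cell (the 60 second interval is arbitrary):

# loop until every grid disk reports an ONLINE asmmodestatus
while cellcli -e "list griddisk attributes name,asmmodestatus" | grep -v ONLINE > /dev/null
do
    date
    cellcli -e "list griddisk attributes name,asmmodestatus" | grep -v ONLINE
    sleep 60
done
echo "All grid disks are back ONLINE"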

Using rac-status.sh, we then found out that all the databases were down (I have shrunk the output below for better visibility; the real case was more dramatic as there were 80 databases running on this Full Rack, plus standbys, many more listeners, etc.):
[oracle@exadatadb01]$ ./rac-status.sh

                Cluster exadata is a  X4-2 Full Rack HC 4TB

    Listener   |      Port     |     db01     |     db02     |     db03     |     db04     |     Type     |
-----------------------------------------------------------------------------------------------------------
    LISTENER   | TCP:1522      |    Online    |    Online    |    Online    |    Online    |   Listener   |
 LISTENER_SCAN1| TCP:1521      |       -      |    Online    |       -      |       -      |     SCAN     |
 LISTENER_SCAN2| TCP:1521      |       -      |       -      |    Online    |       -      |     SCAN     | 
 LISTENER_SCAN3| TCP:1521      |       -      |       -      |       -      |    Online    |     SCAN     |
-----------------------------------------------------------------------------------------------------------

       DB      |    Version    |     db01     |     db02     |     db03     |     db04     |    DB Type   |
-----------------------------------------------------------------------------------------------------------
  db_01        | 11.2.0.4  (1) |   Shutdown   |   Shutdown   |   Shutdown   |   Shutdown   |    RAC (P)   |
  db_02        | 11.2.0.4  (2) |   Shutdown   |   Shutdown   |   Shutdown   |   Shutdown   |    RAC (P)   |
  db_03        | 11.2.0.4  (1) |   Shutdown   |   Shutdown   |   Shutdown   |   Shutdown   |    RAC (P)   |
  db_04        | 12.1.0.2  (3) |   Shutdown   |   Shutdown   |   Shutdown   |   Shutdown   |    RAC (P)   |
  db_05        | 12.1.0.2  (3) |   Shutdown   |   Shutdown   |   Shutdown   |   Shutdown   |    RAC (P)   |
  db_06        | 12.1.0.2  (4) |   Shutdown   |   Shutdown   |   Shutdown   |   Shutdown   |    RAC (P)   |
  db_07        | 12.1.0.2  (5) |   Shutdown   |   Shutdown   |   Shutdown   |   Shutdown   |    RAC (P)   |
  db_08        | 12.1.0.2  (3) |   Shutdown   |   Shutdown   |   Shutdown   |   Shutdown   |    RAC (P)   |
  db_09        | 11.2.0.4  (2) |   Shutdown   |   Shutdown   |   Shutdown   |   Shutdown   |    RAC (P)   |
  db_10        | 11.2.0.4  (1) |   Shutdown   |   Shutdown   |   Shutdown   |   Shutdown   |    RAC (P)   |
. . .

Wow, but why were these databases down (crashed?) when the DATA diskgroup hosting their datafiles is a NORMAL redundancy one, which should survive a single cell being offline for patching?
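A quick way to confirm what happened on the ASM side is to look at the diskgroup states (a DISMOUNTED state for DATA would explain the crashes); a minimal sketch against the ASM instance:

sqlplus -s / as sysasm <<'EOF'
select name, state, type, offline_disks from v$asm_diskgroup;
EOF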

And... here was the accident:
[root@exadatacel11 ~]# cellcli -e list alerthistory
                  5       2018-11-30T16:18:57-06:00       critical        "Disk controller was hung. Cell was power cycled to restore access to the cell."
[root@exadatacel11 ~]#
Yeah, the cel11 disk controller died while cel13 was down for patching => 2 simultaneous disk failures against a NORMAL redundancy diskgroup => the diskgroup is dismounted => everything running on it crashes (it worked as designed :))!
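This is also a good reason to sweep the critical alerts of every cell on a regular basis and before any maintenance; a simple sketch with dcli, again assuming a ~/cell_group file listing the storage servers (dcli prefixes each line with the cell it comes from):

[root@exadatadb01 ~]# dcli -g ~/cell_group -l root "cellcli -e list alerthistory" | grep -i critical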

Bad luck here? Maybe, and I am not saying it happens often, but I personally do not want to rely on luck or on controller failure statistics in production. This is why I strongly recommend going with HIGH redundancy in production!

5 comments:

  1. Yes, but you need to be careful about the DBFS diskgroup on quarter/eighth racks. If you want to use HIGH redundancy, check the quorum disks too.

    1. Indeed. From https://www.oracle.com/technetwork/database/exadata/maa-exadata-asm-cloud-3656632.pdf
      "Starting with the Exadata storage server software 12.1.2.3.0, quorum disks enable users to deploy and leverage disks on database servers to achieve highest redundancy in quarter rack or smaller configurations. Quorum disks are created on the database servers and added into the quorum failure group. This allows the use of 3 storage servers in HIGH redundancy configurations for the eighth and quarter rack Exadata configurations"

  2. Thanks for the detailed explanation. Our diskgroups were created with NORMAL redundancy at deployment time; this past weekend we migrated our last set of databases to HIGH redundancy. It took weeks of effort to move 50+ databases, but it was worth it :-)

  3. Very good information, nice blog.

