An Unknown DBA blog: How to Patch / Upgrade Exadata to any version -- Part 6 -- Timing

6/ Timing

Now that we know how to patch every component and the different options available to do so (rolling, non-rolling), which one is best? How much time does it take?
The answer is obviously "it depends", but I will try to bring a few insights so you have a clear answer ready when you are inevitably asked "How long will that patch take?"
Here is a summary of how long the patch application takes in a rolling fashion and in a non-rolling fashion (as well as the downtime for each method). Please note that I put in green what I recommend.

6.1/ Cells

Rolling : 1h30 x number of cells
Rolling downtime : 0 minute
Non-rolling : 2h (1h30 to patch a cell + 30 minutes to stop and start everything before and after the patch)
Non-rolling downtime : 2h

Note : the time to patch a cell greatly depends on the activity of the databases running on the system when you apply the patch. 1h30 is usually what I see when I patch the cells during a quiet period. I have seen 3 hours per cell on a busy production system (times 14 cells, that is 42 hours to patch the cells -- but everything stays online, you just have to let patchmgr do the job).

Note : Refer to my notes at the end of this page about this choice
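
The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above (1h30 per cell, 30 minutes of stop/start overhead for non-rolling); the function and constant names are hypothetical, and the per-cell duration is something you should replace with what you observe on your own systems.

```python
from datetime import timedelta

# Figures quoted above for a quiet period (assumptions, adjust to your own observations).
CELL_PATCH = timedelta(hours=1, minutes=30)
STOP_START_OVERHEAD = timedelta(minutes=30)


def cell_patch_estimate(cells: int, per_cell: timedelta = CELL_PATCH) -> dict:
    """Return elapsed time and downtime for both cell patching methods."""
    if cells < 1:
        raise ValueError("cells must be at least 1")
    # Rolling: cells are patched one after the other, databases stay online.
    rolling = per_cell * cells
    # Non-rolling: all cells are patched at once, plus the stop/start overhead.
    non_rolling = per_cell + STOP_START_OVERHEAD
    return {
        "rolling": {"elapsed": rolling, "downtime": timedelta(0)},
        "non_rolling": {"elapsed": non_rolling, "downtime": non_rolling},
    }


def fmt(duration: timedelta) -> str:
    """Format a duration the way this post writes it, e.g. 1h30."""
    minutes = int(duration.total_seconds() // 60)
    return f"{minutes // 60}h{minutes % 60:02d}"


if __name__ == "__main__":
    for method, times in cell_patch_estimate(cells=14).items():
        print(method, "elapsed", fmt(times["elapsed"]), "downtime", fmt(times["downtime"]))
```

With 14 cells at 1h30 each, the rolling method runs for 21 hours with no downtime, while the non-rolling method takes 2 hours, all of it downtime. Pass a longer `per_cell` value to see how quickly the rolling duration grows on a busy system.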

6.2/ IB Switches

Rolling : 45 minutes per switch, so 1h30 in total for two switches
Rolling downtime : 0 minute
Non-rolling : not available
Non-rolling downtime : not available

Note: There is no non-rolling method for the IB Switches, so the choice here is an easy one!

6.3/ Database Servers

Rolling : 45 minutes to 1h per node
Rolling downtime : It can be 0 minutes if you make good use of the Oracle services (as described here for the Grid patching; you can apply the same concept to the database servers patching as well)
Non-rolling : 1h
Non-rolling downtime : 1h

Note: Refer to my notes at the end of this page about this choice

6.4/ Grid

Rolling : 30 - 45 minutes per node
Rolling downtime: Can be 0 minute if you make a good use of the Oracle services as described in this paragraph
Non-rolling : 30 - 45 minutes
Non-rolling downtime : 30 - 45 minutes for all the instances running on the node you patch

Note: No green color here? To patch the Grid, I recommend going for a mix like this:

Rebalance the services away from node 1

Patch node 1

Verify that everything has restarted properly on node 1

Move all the services to node 1 (provided a single node can handle the whole activity -- but we usually patch during a quiet period)

Apply the patch in a non-rolling manner on the remaining nodes (for the Grid, this means launching the patch manually in parallel on those nodes)

Once the Grid has been patched on all the nodes, restart all the services as they were before the patch
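
To get a feel for what this mix buys you, here is a rough Python sketch comparing it with a fully rolling Grid patch. It uses the 30 - 45 minutes per node quoted above and assumes that patching the remaining nodes in parallel takes about as long as patching a single node; that assumption, like the function names, is mine and should be checked against your own runs.

```python
from typing import Tuple

# Per-node Grid patching time quoted above, in minutes (best case, worst case).
PER_NODE_MINUTES = (30, 45)


def grid_rolling_window(nodes: int, per_node: Tuple[int, int] = PER_NODE_MINUTES) -> Tuple[int, int]:
    """Elapsed minutes when every node is patched one after the other."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    low, high = per_node
    return nodes * low, nodes * high


def grid_mixed_window(nodes: int, per_node: Tuple[int, int] = PER_NODE_MINUTES) -> Tuple[int, int]:
    """Elapsed minutes for the mix: node 1 alone, then all other nodes in parallel."""
    if nodes < 2:
        raise ValueError("the mixed approach needs at least 2 nodes")
    low, high = per_node
    # Phase 1: node 1 is patched while the other nodes carry the services.
    # Phase 2: the remaining nodes are patched in parallel while node 1 carries the services.
    # Assumption: the parallel phase lasts about as long as one node.
    return 2 * low, 2 * high


if __name__ == "__main__":
    nodes = 4  # hypothetical cluster size
    print("rolling:", grid_rolling_window(nodes), "minutes")
    print("mixed:  ", grid_mixed_window(nodes), "minutes")
```

For a hypothetical 4-node cluster this gives 120 to 180 minutes for a fully rolling patch against 60 to 90 minutes for the mix, and the mix does not get longer as you add nodes. The price is that node 1 has to carry the whole activity during the second phase.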

7/ Notes on my recommendations

Yes, I always prefer the rolling method for the Infrastructure components (Grid and Database Servers). This is because I can mitigate the outage, and I also avoid any outage caused by the patch itself or by anything that prevents, for example, a reboot (we do not reboot those servers frequently).

Imagine you go for a rolling cell upgrade and one cell does not reboot after the patch. You will have no issue here, as the patch will stop automatically; everything will work as before with one cell down, no one will notice anything, and you are still supported, as it is supported to run different versions across different servers. You can then quietly check the troubleshooting section of this blog or go to the pool while Oracle finds a solution for you.
It happened to us on production (it had not happened on the DEV or QA Exadatas before...). We warned the client, and it took Oracle a few days to provide an action plan. Everything ran perfectly for a week with a cell down; we then applied the Oracle action plan during the next weekend and could properly finish the patch. The result is that we applied the patch successfully. We had an issue that caused neither an outage nor a performance degradation, and we still fit in the maintenance window - a very good job from a client and process point of view!

But if you go for non-rolling cell patching and all your cells (or a few of them) do not reboot after the patch, then you are in trouble, and you will lose ten times the time you thought you would save by patching in a non-rolling manner. You will most likely have a failed patch outside of the maintenance window and a Root Cause Analysis to provide to the process guys, and you probably won't patch this Exadata again for a while, as the client will be... hmmm... a bit chilly about that question in the future.
And this risk is the same for the database servers.
I am not saying that the Bundle won't work and will create a big outage (I have applied a lot of them and it works pretty well); it is just all about risk mitigation. And remember: "highest level of patch = highest level of bug" :)

If you've reached this point, I hope that you enjoyed this Odyssey into the Exadata patching world as much as I enjoy working with it on a daily basis!

Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6
