6/ Timing
Now that we know how to patch every component and the different options available to do so (rolling or non-rolling), which one is best? And how long does it take?
The answer is obviously "it depends", but I will try to bring a few insights so you have a sound answer when you are inevitably asked "How long will that patch take?"
Here is a summary of how long it takes to apply the patch in a rolling and in a non-rolling fashion, as well as the downtime for each method. Please note that I put what I recommend in green.
6.1/ Cells
- Rolling : 1h30 x number of cells
- Rolling downtime : 0 minutes
- Non-rolling : 2h (1h30 to patch a cell + 30 minutes to stop and start everything before and after the patch)
- Non-rolling downtime : 2h
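To make the cell figures concrete, here is a small Python sketch that turns them into an estimate for a given number of cells. The function names are hypothetical, and the 1h30 and 30-minute constants are simply the figures above, not guaranteed durations.

```python
from datetime import timedelta

# Figures quoted in section 6.1 (observed durations, not guarantees).
CELL_PATCH = timedelta(hours=1, minutes=30)   # time to patch one cell
STOP_START_ALL = timedelta(minutes=30)        # stop + restart everything around a non-rolling patch


def cell_rolling(num_cells: int) -> tuple[timedelta, timedelta]:
    """Return (elapsed time, downtime) for a rolling cell patch: one cell after the other."""
    return CELL_PATCH * num_cells, timedelta(0)


def cell_non_rolling(num_cells: int) -> tuple[timedelta, timedelta]:
    """Return (elapsed time, downtime) for a non-rolling cell patch: all cells at once."""
    # All cells are patched in parallel, so num_cells does not change the duration.
    total = CELL_PATCH + STOP_START_ALL
    return total, total


if __name__ == "__main__":
    print("rolling, 3 cells:", cell_rolling(3))
    print("non-rolling, 3 cells:", cell_non_rolling(3))
```

With 3 cells, for example, the rolling patch takes about 4h30 with no downtime, versus 2h of downtime for the non-rolling one: the rolling method trades elapsed time for availability.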
6.2/ IB Switches
- Rolling : 45 minutes per switch, so 1h30 in total for two switches
- Rolling downtime : 0 minutes
- Non-rolling : not available
- Non-rolling downtime : not available
6.3/ Database Servers
- Rolling : 45 minutes to 1h per node
- Rolling downtime : It can be 0 minutes if you make good use of the Oracle services (as described in the Grid patching section; you can apply the same concept to the database servers patching as well)
- Non-rolling : 1h
- Non-rolling downtime : 1h
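Since a rolling patch is mostly a matter of adding up per-unit durations, a rough rolling estimate for cells, IB switches and database servers can be sketched as below. It assumes (my assumption, not a rule) that the components are patched one after the other, it uses the low and high figures from sections 6.1 to 6.3, and the `rolling_estimate` helper and the component keys are hypothetical.

```python
from datetime import timedelta

MINUTE = timedelta(minutes=1)

# (low, high) rolling duration per unit, taken from sections 6.1 to 6.3.
# These are rough field figures, not guaranteed durations.
ROLLING_PER_UNIT = {
    "cell": (90 * MINUTE, 90 * MINUTE),
    "ib_switch": (45 * MINUTE, 45 * MINUTE),
    "db_server": (45 * MINUTE, 60 * MINUTE),
}


def rolling_estimate(counts: dict[str, int]) -> tuple[timedelta, timedelta]:
    """Sum the (low, high) elapsed time, assuming components are patched one after the other."""
    low = high = timedelta(0)
    for component, count in counts.items():
        unit_low, unit_high = ROLLING_PER_UNIT[component]  # KeyError for an unknown component
        low += unit_low * count
        high += unit_high * count
    return low, high


if __name__ == "__main__":
    print(rolling_estimate({"cell": 3, "ib_switch": 2, "db_server": 2}))
```

For 3 cells, 2 IB switches and 2 database servers, this gives roughly 7h30 to 8h of elapsed time, with no downtime if the Oracle services are used properly.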
6.4/ Grid
- Rolling : 30 - 45 minutes per node
- Rolling downtime : Can be 0 minutes if you make good use of the Oracle services as described in this paragraph
- Non-rolling : 30 - 45 minutes
- Non-rolling downtime : 30 - 45 minutes for all the instances running on the node you patch
Note: no green here? To patch the Grid, I recommend a mix of both methods:
- Rebalance the services away from node 1
- Patch node 1
- Verify that everything has restarted properly on node 1
- Move all the services to node 1 (assuming a single node can handle the whole workload; we usually patch during a quiet period anyway)
- Apply the patch in a non-rolling manner on the remaining nodes (for the Grid, this means launching the patch manually and in parallel on those nodes)
- Once the Grid has been patched on all the nodes, put all the services back where they were before the patch
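The benefit of this mix is easiest to see with the 30 to 45 minutes per node quoted above. The sketch below compares a pure rolling Grid patch with the mixed approach; the function names are hypothetical, and it ignores the time needed to relocate the services (an assumption).

```python
from datetime import timedelta

MINUTE = timedelta(minutes=1)
GRID_PER_NODE = (30 * MINUTE, 45 * MINUTE)  # section 6.4: 30 to 45 minutes per node


def pure_rolling(nodes: int) -> tuple[timedelta, timedelta]:
    """(low, high) elapsed time when the nodes are patched one after the other."""
    low, high = GRID_PER_NODE
    return low * nodes, high * nodes


def mixed(nodes: int) -> tuple[timedelta, timedelta]:
    """(low, high) elapsed time: node 1 alone, then all remaining nodes in parallel."""
    if nodes < 2:
        raise ValueError("the mixed approach needs at least two nodes")
    low, high = GRID_PER_NODE
    # Two sequential steps whatever the node count; service relocation time is ignored.
    return low * 2, high * 2


if __name__ == "__main__":
    print("pure rolling, 8 nodes:", pure_rolling(8))
    print("mixed, 8 nodes:", mixed(8))
```

On an 8-node cluster, pure rolling takes 4h to 6h, whereas the mixed approach takes about 1h to 1h30 whatever the number of nodes, since the remaining nodes are patched at the same time while node 1 carries the services.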
7/ Notes on my recommendations
Yes, I always prefer the rolling method for the infrastructure components (Grid and Database Servers). It lets me mitigate the outage, and it protects me from any outage caused by the patch itself or by anything else, for example a server that does not come back after a reboot, which is a real risk as we do not reboot those servers frequently.
Imagine you go for a rolling cell upgrade and one cell does not come back after the reboot. This is not an issue: the patch stops automatically, everything keeps working as before with one cell down, no one notices anything, and you are still supported, as running different versions across different servers is supported. You can then quietly check the troubleshooting section of this blog, or go to the pool while Oracle finds a solution for you.
It happened to us on production (it didn't happen on the DEV and QA Exadatas before...). We warned the client and it took Oracle a few days to provide an action plan. Everything ran perfectly for a week with a cell down; we then applied Oracle's action plan the following weekend and could properly finish the patch. The result: we applied the patch successfully, we had an issue that caused neither an outage nor any performance degradation, and we still fit within the maintenance window. A very good outcome from a client and process point of view!
But if you go for a non-rolling cell patching and all your cells (or a few of them) do not come back after the reboot, then you are in trouble, and you will lose ten times the time you thought you would save by going non-rolling. You will most likely end up with a failed patch outside of the maintenance window, a Root Cause Analysis to provide to the process guys, and you probably won't patch this Exadata again for a while, as the client will be... hmmm... a bit chilly about the subject in the future.
And this risk is the same for the database servers.
I am not saying that the Bundle will fail and create a big outage (I have applied many of them and it works pretty well); it is all about risk mitigation. And remember: "highest level of patch = highest level of bug" :)
If you've reached this point, I hope that you enjoyed this Odyssey into the Exadata patching world as much as I enjoy working with it on a daily basis!
Best step by step instructions on Exadata patching out there. Kudos and thanks.
Denis .. I second it. Great stuff, Denis!!!