
How to Patch / Upgrade Exadata to any version -- Part 6 -- Timing


6/ Timing

Now that we know how to patch every component and the different options available to do so (rolling, non-rolling), which one is the best? How much time does it take?
The answer is obviously "it depends", but I will try to share a few insights so you have a clear answer ready when you are inevitably asked "How long will that patch take?"
Here is a summary of the length of the patch application in a Rolling fashion and in a Non-Rolling fashion (as well as the downtime for each method). Please note that I put in green what I recommend.


6.1/ Cells

  • Rolling : 1h30 x number of cells
  • Rolling downtime : 0 minute
  • Non-rolling : 2h (1h30 to patch a cell + 30 minutes to stop and start everything before and after the patch)
  • Non-rolling downtime : 2h
Note: the time to patch a cell depends greatly on the activity of the databases running on the system when you apply the patch. 1h30 is what I usually see when I patch the cells during a quiet period. I have seen up to 4 hours per cell on a busy production system (with 14 cells, that came to 42 hours to patch the cells, but everything stays online; you just have to let patchmgr do the job).
Note: Refer to my notes at the end of this page about this choice
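
The arithmetic behind the cell figures above can be sketched in a few lines of Python. This is only an illustration using the numbers quoted in this section (1h30 per cell on a quiet system, 30 minutes to stop and start everything); the function names `cell_patch_estimate` and `as_hours` are hypothetical helpers, not part of patchmgr or any Oracle tool, and real timings will vary with database activity.

```python
def cell_patch_estimate(n_cells, minutes_per_cell=90, stop_start_minutes=30):
    """Estimate elapsed time and downtime (in minutes) for both methods.

    Defaults are the quiet-period figures quoted in the post; they are
    assumptions to adjust for your own system.
    """
    # Rolling: cells are patched one after another, databases stay online.
    rolling_elapsed = minutes_per_cell * n_cells
    # Non-rolling: all cells are patched in parallel, plus the time to
    # stop and start everything before and after the patch.
    non_rolling_elapsed = minutes_per_cell + stop_start_minutes
    return {
        "rolling": {"elapsed": rolling_elapsed, "downtime": 0},
        "non_rolling": {"elapsed": non_rolling_elapsed,
                        "downtime": non_rolling_elapsed},
    }


def as_hours(minutes):
    """Format a number of minutes in the '1h30' style used in this post."""
    hours, rest = divmod(minutes, 60)
    return f"{hours}h{rest:02d}"


if __name__ == "__main__":
    estimate = cell_patch_estimate(7)  # hypothetical 7-cell system
    for method, values in estimate.items():
        print(method, as_hours(values["elapsed"]),
              "downtime", as_hours(values["downtime"]))
```

Raising `minutes_per_cell` shows how quickly a rolling run stretches on a busy system, while its downtime stays at zero.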

6.2/ IB Switches

  • Rolling : 45 minutes per switch, so 1h30 in total for two switches
  • Rolling downtime : 0 minute
  • Non-rolling : not available
  • Non-rolling downtime : not available
Note: There's no non-rolling method for the IB Switches, so the choice here is an easy one!

6.3/ Database Servers

Note: Refer to my notes at the end of this page about this choice

6.4/ Grid


Note: No green color here? To patch the grid, I recommend going for a mix such as:
  • Rebalance the services away from node 1
  • Patch node 1
  • Verify that everything has restarted properly on node 1
  • Move all the services to node 1 (if a single node can handle the whole activity, which is usually the case as we patch during a quiet period)
  • Apply the patch in a non-rolling manner (for the Grid, this means launching the patch manually in parallel on the remaining nodes)
  • Once the grid has been patched on all the nodes, restart all the services as they were before the patch
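
The order of the mixed approach above can be written down as a small checklist generator. This is a minimal sketch, assuming a simple list of node names; `grid_patch_plan` is a hypothetical helper that only builds text steps and does not call any Oracle tool.

```python
def grid_patch_plan(nodes):
    """Return the ordered steps of the mixed approach for the given nodes.

    The first node is patched alone; the remaining nodes are patched in
    parallel once the services run on the first node.
    """
    if len(nodes) < 2:
        raise ValueError("the mixed approach needs at least two nodes")
    first, remaining = nodes[0], nodes[1:]
    return [
        f"rebalance services away from {first}",
        f"patch {first}",
        f"verify everything restarted on {first}",
        f"move all services to {first}",
        "patch in parallel: " + ", ".join(remaining),
        "restore services to their pre-patch placement",
    ]


if __name__ == "__main__":
    # Node names below are made up for the example.
    for number, step in enumerate(grid_patch_plan(["node1", "node2", "node3"]), 1):
        print(f"{number}. {step}")
```

Keeping the plan as data makes it easy to paste into a change request and tick off during the maintenance window.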



7/ Notes on my recommendations

Yes, I always prefer the rolling method for the Infrastructure components (Grid and Database Servers). It lets me mitigate the outage, and it also protects me from an outage caused by the patch itself or by anything that prevents, for example, a reboot, since we do not reboot those servers frequently.


Imagine you go for a rolling cell upgrade and one cell does not reboot after the patch. You'll have no issue here as the patch will stop automatically; everything will work as before with one cell down, no one will notice anything, and you are still supported, since running different versions across different servers is supported. You can then quietly check the troubleshooting section of this blog or go to the pool while Oracle finds a solution for you.
It happened to us on production (it didn't happen on the DEV or QA Exadatas before...). We warned the client, and it took Oracle a few days to provide an action plan. Everything ran perfectly for a week with a cell down; we then applied the Oracle action plan during the next weekend and could properly finish the patch. The result is that we applied the patch successfully. We had an issue that caused neither an outage nor a performance degradation, and we still fit in the maintenance window: a very good job from a client and process point of view!

But if you go for non-rolling cell patching and all your cells (or a few of them) do not reboot after the patch, then you are in trouble and you will lose ten times the time you thought you would save by going non-rolling. You will most likely have a failed patch outside of the maintenance window, a Root Cause Analysis to provide to the process guys, and you probably won't patch this Exadata again for a while, as the client will be... hmmm... a bit chilly about that question in the future.
And this risk is the same for the database servers.
I am not saying that the Bundle won't work or that it will create a big outage (I have applied a lot of them and it works pretty well); it is all about risk mitigation. And remember: "highest level of patch = highest level of bug" :)


If you've reached this point, I hope that you enjoyed this Odyssey into the Exadata patching world as much as I enjoy working with it on a daily basis!




Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6

2 comments:

  1. Best step by step instructions on Exadata patching out there. Kudos and thanks.

  2. Denis .. I second it . Great stuff , Denis !!!

