
How to Patch / Upgrade Exadata to any version -- Part 5 -- Troubleshooting

It does not seem possible to me to write an exhaustive blog about troubleshooting an Exadata patching session that goes bad (or it would be incredibly pretentious). Indeed, an Exadata stack is a complete and complex mix of software and hardware which can, on top of that, be configured very differently depending on each company's needs, norms, compliance rules, etc.

The best way, in my humble opinion, to be able to efficiently troubleshoot a failure during an Exadata patching session is to:
  • Know vi and grep to check the logfiles :)
  • Have the full picture of the Exadata patching procedure
  • Keep in mind that, in case of rolling patches, even a crash or a non-responsive server does not impact the uptime of the applications, as everything is (at least) redundant, so take your time to troubleshoot and stay cool (as a cucumber)
  • Have access to well-documented, real-life procedures that help set up and/or manage Exadata components; indeed, the main reasons for problems during Exadata patching sessions are:
    • Failed pre-requisites:
      - Due to hardware alerts; open a SR to have the failed hardware fixed
      - Due to a misconfiguration of one of the components; you'll find the procedure below
    • A crash / timeout / non-responsive server during the patching:
      - A space issue, although these are usually detected by the pre-requisites
      - Usually, rebooting the server will be needed; you'll find the procedure below
      - If a SR has to be opened, an ILOM snapshot will be needed; you'll find the procedure below

Again, this is not exhaustive, and this is the beauty of it! This is why this blog is a living blog and will be updated when new issues and solutions appear.


Troubleshooting a failed patch starts with checking the logfiles to get more information about the issue you just faced; below is a list of the most common logfiles:
  • patchmgr.log -- the main patchmgr logfile
  • patchmgr.trc -- a more detailed patchmgr output
  • nodename.log --
  • /var/log/cellos/dbnodeupdate.log -- located on the node (not in the patchmgr directory), detailed log of the patch application on this specific node
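
If you would rather not grep each of these logfiles by hand, a first pass can be scripted. The snippet below is only a minimal sketch: the keyword list (error, fail, timeout) is my assumption and not an official list of patchmgr messages, the scan_log helper is a hypothetical name, and patchmgr.log and patchmgr.trc are assumed to be in the current (patchmgr) directory. Adjust all of this to what you actually see on your systems.

```python
import re
from pathlib import Path

# Assumed keyword list, NOT an official list of patchmgr messages;
# adapt it to the wording found in your own logfiles.
ERROR_PATTERN = re.compile(r"error|fail|timeout", re.IGNORECASE)


def scan_log(path, pattern=ERROR_PATTERN):
    """Return (line_number, line) pairs for the lines matching pattern.

    A missing logfile returns an empty list instead of raising, as not
    every logfile exists on every node.
    """
    hits = []
    log = Path(path)
    if not log.is_file():
        return hits
    # errors="replace" avoids crashing on unexpected bytes in a logfile
    with log.open(errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # The first two names are relative to the patchmgr directory (assumption);
    # the last one is on the patched node itself.
    logfiles = (
        "patchmgr.log",
        "patchmgr.trc",
        "/var/log/cellos/dbnodeupdate.log",
    )
    for name in logfiles:
        for number, line in scan_log(name):
            print(f"{name}:{number}: {line}")
```

This only points you to the suspicious lines; you still have to open the logfile (with vi :)) and read what happened around them.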

Each procedure listed below has been executed on real-life production Exadatas at least once (many have been used far more than once).
Hope it helps!



Quick links to Part 1 / Part 2 / Part 3 / Part 4 / Part 5 / Part 6
