How to install a brand new Exadata (X2 to X7) -- Part 1


This post is a rebranded post originally posted in August 2016 here; it will be maintained here in the future.

When you buy an Exadata machine, you can ask Oracle to proceed with the installation or let some brilliant DBA do it :) I will be showing in this blog how to install a brand new Exadata; this procedure comes from real life and has been applied on X2 to X7 Exadatas that are now running in production. As the procedure is quite long, I have split it into two distinct parts:

  • Part 1 (this one) is about the settings that are needed to proceed with the installation
  • Part 2 is about the installation procedure itself


0 / Anonymization

The whole procedure has been anonymized using the below conventions:
  • I have replaced the name of the Exadata cluster with "mycluster"
  • "a_server" is a hostname outside the Exadata that I used to reach it (a jump server or an administration server, for example)
  • "a_user" is my own non-privileged user that I use to reach the Exadata servers from this "jump server" (I could have written "fdenis" instead)
  • Client-specific IPs are shown like this one: "IP_of_an_IB_Switch"
  • Sometimes the customer-specific IPs appear as "10.XX.YY.ZZ" (no secret here, 10.x.x.x being the class A private network)
  • All the clear text IPs are the Exadata defaults
Please let me know in the comments if I missed one or if something is not clear.

1/ What is already configured ?

When Oracle delivers an Exadata, they take care of the cabling and the first boot. During this first boot, a few things are configured:
  • The Ethernet Switch
  • The InfiniBand Switches (IB Switches)
  • The PDUs
  • An access to the ILOM of the first database server

2/ What is needed to go further


2.1/ <client>-<cluster>.xml

The DBA needs some customer-specific information before heading on to the network configuration and the software installation of the new Exadata machine, such as the hostnames, the IP addresses of each component, the default gateway, etc.
Since Exadata X2, Oracle has been kind enough to provide an easy way to achieve this with a piece of software named Oracle Exadata Deployment Assistant (OEDA). The client downloads OEDA on their laptop, provides all the needed information and then generates an XML file (named <client>-<cluster>.xml) which contains all the information required for the DBA to proceed with the installation. This file is vital as the whole installation procedure relies on it.

2.2/ OEDA for Linux

OEDA is not only used to fill in the <client>-<cluster>.xml file, it is also used to proceed with the setup itself (and as OEDA is updated quite often, it is NOT provided with the Exadata machine). We therefore need to upload the OEDA zip file for Linux and the XML configuration file to the first database server. There are 2 ways described in the official documentation to achieve this goal; usually, the client plugs in a USB key and copies the OEDA zip file and the XML configuration file to, let's say, /root/config_files (this is the easiest way).
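Here is a minimal sketch of that USB copy, assuming the key shows up as /dev/sdb1 on the first database server (check dmesg after plugging it in to confirm the device name; file names are examples only):
# Mount the USB key and copy the OEDA zip and the XML configuration file to /root/config_files
mkdir -p /mnt/usb /root/config_files
mount /dev/sdb1 /mnt/usb
cp /mnt/usb/Ocmd-*-linux-x64.zip /mnt/usb/*.xml /root/config_files/
umount /mnt/usb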

3/ Exadata set up


3.1/ What's in it ?

First, let's have a look at what's in this new Exadata! We can easily see what is in the box by using the ibhosts command from any part of the IB network (here, I did it from an IB switch as there are not many servers reachable at this point):
a_server:a_user:/home/a_user> ssh root@IP_of_an_IB_Switch
root@IP_of_an_IB_Switch's password:
Last login: Fri Feb 26 15:36:23 2016 from a_server_ip
You are now logged in to the root shell.
It is recommended to use ILOM shell instead of root shell.
All usage should be restricted to documented commands and documented
config files.
To view the list of documented commands, use "help" at linux prompt.
[root@mycluster01sw-ib2 ~]# ibhosts
Ca      : 0x0010e00001888818 ports 2 "zfsp01ctl2 PCIe 5"
Ca      : 0x0010e00001889588 ports 2 "zfsp01ctl1 PCIe 5"
Ca      : 0x0010e0000187e6f8 ports 2 "node12 elasticNode 172.16.2.48,172.16.2.48 ETH0"
Ca      : 0x0010e00001754cf0 ports 2 "node10 elasticNode 172.16.2.46,172.16.2.46 ETH0"
Ca      : 0x0010e0000187e318 ports 2 "node9 elasticNode 172.16.2.45,172.16.2.45 ETH0"
Ca      : 0x0010e000014bab90 ports 2 "node8 elasticNode 172.16.2.44,172.16.2.44 ETH0"
Ca      : 0x0010e0000188a918 ports 2 "node6 elasticNode 172.16.2.42,172.16.2.42 ETH0"
Ca      : 0x0010e0000188a9b8 ports 2 "node4 elasticNode 172.16.2.40,172.16.2.40 ETH0"
Ca      : 0x0010e00001884898 ports 2 "node3 elasticNode 172.16.2.39,172.16.2.39 ETH0"
Ca      : 0x0010e00001887818 ports 2 "node2 elasticNode 172.16.2.38,172.16.2.38 ETH0"
Ca      : 0x0010e0000188a088 ports 2 "node1 elasticNode 172.16.2.37,172.16.2.37 ETH0"
[root@mycluster01sw-ib2 ~]#
We can see 5 cells (5 storage servers):
   node1 elasticNode 172.16.2.37
   node2 elasticNode 172.16.2.38
   node3 elasticNode 172.16.2.39
   node4 elasticNode 172.16.2.40
   node6 elasticNode 172.16.2.42
And 4 database servers:
   node8  elasticNode 172.16.2.44
   node9  elasticNode 172.16.2.45
   node10 elasticNode 172.16.2.46
   node12 elasticNode 172.16.2.48
As no network configuration has been done yet, all components have the default factory hostnames and IPs. Oracle uses the private 172.16 network for the factory IP settings: the last octet is the InfiniBand port number the node is plugged into plus 36. The first database node is plugged into port number 8, so 36 + 8 = 44 and its default IP is 172.16.2.44.
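As a quick illustration of this convention (the port number below is just the example from this rack):
# Factory IP convention: last octet = InfiniBand port number + 36
ib_port=8                              # the first database node is on IB port 8
echo "172.16.2.$(( ib_port + 36 ))"    # prints 172.16.2.44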

3.2/ Network Configuration

Let's configure these IPs and hostnames with the customer's requirements in order to be able to access the database and storage servers from the customer network.

3.2.1/ /etc/hosts of the IB Switches

There's not much to configure on the IB Switches; just check that the IP and the hostname are properly defined in the /etc/hosts file, and add them if it's not already done (which is unlikely):
[root@mycluster01sw-ib2 ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
::1                     localhost6.localdomain6 localhost6
10.XX.YY.ZZ             mycluster01sw-ib2.mydomain.com mycluster01sw-ib2
[root@mycluster01sw-ib2 ~]#
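If the entry were missing, a minimal way to add it (using the anonymized IP and hostname from above) would be:
# Append the switch's management IP and hostname to /etc/hosts (only if missing)
echo "10.XX.YY.ZZ    mycluster01sw-ib2.mydomain.com mycluster01sw-ib2" >> /etc/hosts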

3.2.2/ ILOM console

To start with, we have to connect to the ILOM of the first database server (its IP is reachable from the customer network).
a_server:a_user:/home/a_user> ssh root@ILOM_IP
Password:
Oracle(R) Integrated Lights Out Manager
Version 3.2.4.52 r101649
Copyright (c) 2015, Oracle and/or its affiliates. All rights reserved.
Hostname: ORACLESP-1518NM117R
->
As we don't need to do much on the ILOM itself, let's head on to the first database server using the ILOM console:
-> start /SP/console
Are you sure you want to start /SP/console (y/n)? y
Serial console started. To stop, type ESC (
Then press <ENTER> and log in as root:
node8.my.company.com login: root
Password:
Last login: Fri Mar 18 13:08:45 on tty1
[root@node8 ~]#
As we saw previously, the first database server is <node8> and a Linux OS is pre-installed:
[root@node8 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
                       30G  4.3G   24G  16% /
tmpfs                 252G  4.0K  252G   1% /dev/shm
/dev/sda1             504M   81M  398M  17% /boot
/dev/mapper/VGExaDbOra-LVDbOra1
                       99G  188M   94G   1% /u01
[root@node8 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.7 (Santiago)
[root@node8 ~]#

3.3/ ReclaimSpace

Each Exadata database server comes with Linux pre-installed, but it is also possible to switch to Oracle VM, which is also pre-installed on disk. In order not to waste any disk space, we can reclaim that space and any other unused space by executing the below command on every database server (a list of IPs for the database servers can be taken from the ibhosts command as shown earlier):
[root@node8 ~]# /opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim
Model is ORACLE SERVER X5-2
...
[INFO     ] Mount /dev/mapper/VGExaDb-LVDbOra1 to /u01
[INFO     ] Logical volume LVDbSys2 exists in volume group VGExaDb
[root@node8 ~]#
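If you prefer to launch it on all the database servers in one go, here is a minimal sketch assuming the factory IPs seen in the ibhosts output above (the root SSH equivalence is only set up in section 3.7, so you will be prompted for each root password):
# Run reclaimdisks.sh on each database server over SSH
# (IPs are the factory defaults from ibhosts; adjust the list to your own output)
for ip in 172.16.2.44 172.16.2.45 172.16.2.46 172.16.2.48; do
  ssh root@${ip} "/opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim"
done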

3.4/ OneCommand

Create a OneCommand directory, then copy and unzip OEDA and the XML configuration file provided by the client into it.
[root@node8 ~]# mkdir -p /u01/oracle.SupportTools/onecommand
[root@node8 ~]# cd /root/config_files/
[root@node8 config_files]# ls -ltr
total 68664
-rw------- 1 root root 70186224 Mar 18 14:00 Ocmd-16.049-OTN-linux-x64.zip
-rw------- 1 root root   114077 Mar 18 14:01 -mycluster.xml
-rw------- 1 root root     6105 Mar 18 14:01 databasemachine.xml
[root@node8 config_files]# cd /u01/oracle.SupportTools/onecommand
[root@node8 onecommand]# unzip Ocmd-16.049-OTN-linux-x64.zip
Archive:  Ocmd-16.049-OTN-linux-x64.zip
   creating: linux-x64/
  inflating: linux-x64/config.sh
  inflating: linux-x64/README.txt
   creating: linux-x64/Lib/
….
[root@node8 onecommand]# cd linux-x64/
[root@node8 linux-x64]# pwd
/u01/oracle.SupportTools/onecommand/linux-x64
[root@node8 linux-x64]# mv ../-mycluster.xml .
[root@node8 linux-x64]#
Please note that OEDA cannot be run from the root file system (such as /opt), hence the use of /u01 to unzip it. Here is the error you would face if you launched OEDA from /opt:
Invoke OEDA from a non root file system. Current directory /opt/oracle.SupportTools/onecommand/linux-x64 is part of the root file system
 Non-root file systems with required space are:
 File System /u01 free space 95485 MB

3.5/ ApplyElasticConfig

We can now proceed with the network configuration (the applyElasticConfig.sh shell script is provided by OEDA and is located under /u01/oracle.SupportTools/onecommand/linux-x64):
[root@node8 linux-x64]# ./applyElasticConfig.sh -cf -mycluster.xml
 Applying Elastic Config...
 Applying Elastic configuration...
 Searching Subnet 172.16.2.x
 10 live IPs in  172.16.2.x
 Exadata node found 172.16.2.38
 Configuring node : 172.16.2.48
 Done Configuring node : 172.16.2.48
 Configuring node : 172.16.2.46
 Done Configuring node : 172.16.2.46
 Configuring node : 172.16.2.45
 Done Configuring node : 172.16.2.45
 Configuring node : 172.16.2.42
 Done Configuring node : 172.16.2.42
 Configuring node : 172.16.2.41
 Done Configuring node : 172.16.2.41
 Configuring node : 172.16.2.40
 \\\\\
….
This will apply all the customer's specifications to the database and storage servers, and all servers will be rebooted at the end of their configuration. Once the setup is applied, you will be able to connect to each database and storage server from the customer network using their specific 10.x.y.z IPs, or their hostnames if they have already been defined in the corporate DNS.
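A quick way to check this (the hostnames below are only examples following the naming used in this post; use your own DNS names or 10.x.y.z IPs):
# Check that each server answers on the customer network after the reboots
for h in myclusterdb01 myclusterdb02 myclustercel01 myclustercel02 myclustercel03; do
  ping -c 1 ${h} > /dev/null 2>&1 && echo "${h} is reachable" || echo "${h} is NOT reachable"
done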

3.5.1/ An Issue with the Cells IP config

It has happened that, after the previous step, the cells were not properly configured: the cell configuration was different from the real network configuration shown by the ifconfig command. We have to correct that manually to avoid any issue later on, especially when applying patches; here are the error messages you could face if your cells were misconfigured:
CELL-01533: Unable to validate the IP addresses from the cellinit.ora file because the IP addresses may be down or misconfigured
ORA-00700: soft internal error, arguments: [main_6a], [3], [IP addresses in cellinit.ora not operational], [], [], [], [], [], [], [], [], []
Here is the output of the ifconfig command on one cell (only the information we need is pasted here for readability):
[root@myclustercel01 ~]# ifconfig ib0
          ...
          inet addr:192.168.12.17  Bcast:192.168.12.255  Mask:255.255.255.0
          ...
[root@myclustercel01 ~]# ifconfig ib1
          ...
          inet addr:192.168.12.18  Bcast:192.168.12.255  Mask:255.255.255.0
          ...
[root@myclustercel01 ~]#
Here is the (bad) configuration of the same cell as reported by CellCLI (only the information we need is pasted here for readability):
[root@myclustercel01 ~]# cellcli
CellCLI> list cell detail
         name:                   ru02
         ...
         ipaddress1:             192.168.10.1/24
         ipaddress2:             192.168.10.2/24
         ...
CellCLI>
Note: /24 is the network mask; it should be set to /22 in the following step. Who's right then? Well, ibhosts doesn't lie:
[root@myclusterdb01 ~]# ibhosts | grep cel01
Ca      : 0x0010e00001884a08 ports 2 "myclustercel01 C 192.168.12.17,192.168.12.18 HCA-1"
[root@myclusterdb01 ~]#
We then need to update the configuration manually (even the hostname has to be set) and restart the cells:
CellCLI> alter cell name=myclustercel01,ipaddress1='192.168.12.17/22',ipaddress2='192.168.12.18/22'
Network configuration altered. Please issue the following commands as root to restart the network and open IB stack:
service openibd restart
service network restart
A restart of all services is required to put new network configuration into effect. MS-CELLSRV communication may be hampered until restart.
Cell myclustercel01 successfully altered

CellCLI> alter cell restart services all

Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of services was successful.
Starting the RS, CELLSRV, and MS services...
Getting the state of RS services...  running
Starting CELLSRV services...
The STARTUP of CELLSRV services was successful.
Starting MS services...
The STARTUP of MS services was successful.

CellCLI>
And then a quick check of the new config :
CellCLI> list cell detail
         name:                   myclustercel01
         …
         ipaddress1:             192.168.12.17/22
         ipaddress2:             192.168.12.18/22
         ...
CellCLI>
Note: this procedure has to be executed on the misconfigured cells only.
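To spot the misconfigured cells quickly, a simple check to run on each cell (using the same CellCLI attributes and interfaces as above) could be:
# Compare what CellCLI believes with what is really configured on the IB interfaces
cellcli -e "list cell attributes name, ipaddress1, ipaddress2"
ifconfig ib0 | grep "inet addr"
ifconfig ib1 | grep "inet addr"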

3.6/ dbs_group, cell_group and all_group

You have probably already seen many Exadata documents mentioning files named dbs_group, cell_group and all_group. But where do these files come from? Well, you, the DBA, have to create them :) As these files will be used in our daily administration tasks, it is a good idea to put them in a well-known directory. Please find below a few commands to create them quickly -- I also like to have an ib_group file with the names of my IB Switches, which will be very useful when you patch the IB Switches later on (it would be sad to stop at the installation, wouldn't it?).
[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep db | awk '{print $6}' | sort > /root/dbs_group
[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep cel | awk '{print $6}' | sort > /root/cell_group
[root@myclusterdb01 ~]# cat /root/dbs_group /root/cell_group > /root/all_group
[root@myclusterdb01 ~]# ibswitches | awk '{print $10}' | sort > /root/ib_group
[root@myclusterdb01 ~]#
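A quick sanity check is to verify that the line counts match what ibhosts and ibswitches reported (here: 4 database servers, 5 cells, 9 hosts in total, plus your IB Switches):
# The counts should match the number of database servers, cells and IB switches
wc -l /root/dbs_group /root/cell_group /root/all_group /root/ib_group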

3.7/ Root SSH equivalence

It is better to have a root SSH equivalence to ease all the installation and administration tasks. We will choose one server (I usually choose DB server node 1, aka "myclusterdb01") and deploy its SSH key to all the other DB nodes, the cell nodes and the IB Switches as well.
First, we need to generate root's SSH key pair (press <ENTER> after each question):
[root@myclusterdb01 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
05:bb:f3:3b:a7:d0:f5:7b:c2:71:10:05:2e:7e:6e:11 root@myclusterdb01.mydomain.com
The key's randomart image is:
+--[ RSA 2048]----+
|        .     .o.|
|         o   ..  |
|        . . . E. |
|         o . ... |
|        S   o o. |
|         + . +...|
|        . o  .+o |
|         .....o..|
|          o+  .o |
+-----------------+
[root@myclusterdb01 ~]#
And we will use your future new best friend dcli to deploy the SSH keys to all the database and storage servers from database node 1:
[root@myclusterdb01 ~]# dcli -g /root/all_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclusterdb01's password:
root@myclusterdb03's password:[root@myclusterdb01 ~]#
Note: you will be prompted for each server's root password; don't mess up, the root user is locked after the first failed attempt :) Let's do the same for the IB Switches:
[root@myclusterdb01 ~]# dcli -g /root/ib_group -l root -k -s '-o StrictHostKeyChecking=no'[root@myclusterdb01 ~]#
You are now able to connect from the first DB server to any host with no password. You can quickly test it with the below command:
# dcli -g ~/all_group -l root date

All is now ready to jump to the installation itself, which is detailed in part 2 of this blog!


