When you buy an Exadata machine, you can ask Oracle to proceed with the installation or let some brilliant DBA do it :) In this blog, I will show how to install a brand new Exadata. This procedure comes from real life: it has been applied on X2 to X7 Exadatas that are now running in production. As the procedure is quite long, I have split it into two distinct parts:
- Part 1 (this one) is about the settings that are needed to proceed with the installation
- Part 2 is about the installation procedure itself
0/ Anonymization
The whole procedure has been anonymized using the conventions below:
- I have replaced the name of the Exadata cluster with "mycluster"
- "a_server" is a hostname outside the Exadata that I use to reach it (a jump server or an administration server, for example)
- "a_user" is my own non-privileged user that I use to reach the Exadata servers from this jump server (I could have written "fdenis" instead)
- Client-specific IPs are shown like "IP_of_an_IB_Switch"
- Sometimes the customer-specific IPs appear as "10.XX.YY.ZZ" (no secret here, 10.x.x.x being the class A private network)
- All the clear-text IPs are the Exadata factory defaults
1/ What is already configured?
When Oracle delivers an Exadata, they take care of the cabling and of the first boot. During this first boot, a few things are configured:
- The Ethernet Switch
- The InfiniBand Switches (IB Switches)
- The PDUs
- An access to the ILOM of the first database server
2/ What is needed to go further
2.1/ <client>-<cluster>.xml
The DBA needs some customer-specific information before heading on to the network configuration and the software installation of the new Exadata machine, such as the hostnames, the IP addresses of each component, the default gateway, etc.
Since Exadata X2, Oracle has been kind enough to provide an easy way to achieve this with a tool named Oracle Exadata Deployment Assistant (OEDA). The client downloads OEDA onto their laptop, provides all the needed information and generates an XML file (named <client>-<cluster>.xml) which contains all the information the DBA requires to proceed with the installation. This file is vital, as the whole installation procedure relies on it.
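Before going further, a quick sanity check that the file is well-formed XML can save time (a minimal sketch, assuming xmllint is available and the file has been uploaded under /root/config_files as shown later in this post):

# no output means the XML is well-formed; any parse error is reported immediately
xmllint --noout /root/config_files/-mycluster.xml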
2.2/ OEDA for Linux
OEDA is not only used to fill in the <client>-<cluster>.xml file; it is also used to proceed with the setup itself. As OEDA is updated quite often, it is NOT provided with the Exadata machine, so we need to upload the following components to the first database server (an upload sketch follows this list):
- OEDA for Linux
- The <client>-<cluster>.xml file
- The databasemachine.xml file is nice to have as well, as we can then use it to build an Exadata status script, for example
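Uploading these three components from the jump server could look like the sketch below (a sketch only: "first_db_ip" is a placeholder for whichever IP of the first database server is reachable from your network, and the staging directory is an assumption):

# create a staging directory on the first database server and push the three files to it
ssh root@first_db_ip "mkdir -p /root/config_files"
scp Ocmd-16.049-OTN-linux-x64.zip ./-mycluster.xml databasemachine.xml root@first_db_ip:/root/config_files/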
3/ Exadata set up
3.1/ What's in it?
First, let's have a look at what's inside this new Exadata! We can easily find out what is in the box by using the ibhosts command from anywhere on the IB network (here, I ran it from an IB switch, as not many servers are reachable at this point):
a_server:a_user:/home/a_user> ssh root@IP_of_an_IB_Switch
root@IP_of_an_IB_Switch's password:
Last login: Fri Feb 26 15:36:23 2016 from a_server_ip

You are now logged in to the root shell.
It is recommended to use ILOM shell instead of root shell.
All usage should be restricted to documented commands and documented config files.
To view the list of documented commands, use "help" at linux prompt.
[root@mycluster01sw-ib2 ~]# ibhosts
Ca : 0x0010e00001888818 ports 2 "zfsp01ctl2 PCIe 5"
Ca : 0x0010e00001889588 ports 2 "zfsp01ctl1 PCIe 5"
Ca : 0x0010e0000187e6f8 ports 2 "node12 elasticNode 172.16.2.48,172.16.2.48 ETH0"
Ca : 0x0010e00001754cf0 ports 2 "node10 elasticNode 172.16.2.46,172.16.2.46 ETH0"
Ca : 0x0010e0000187e318 ports 2 "node9 elasticNode 172.16.2.45,172.16.2.45 ETH0"
Ca : 0x0010e000014bab90 ports 2 "node8 elasticNode 172.16.2.44,172.16.2.44 ETH0"
Ca : 0x0010e0000188a918 ports 2 "node6 elasticNode 172.16.2.42,172.16.2.42 ETH0"
Ca : 0x0010e0000188a9b8 ports 2 "node4 elasticNode 172.16.2.40,172.16.2.40 ETH0"
Ca : 0x0010e00001884898 ports 2 "node3 elasticNode 172.16.2.39,172.16.2.39 ETH0"
Ca : 0x0010e00001887818 ports 2 "node2 elasticNode 172.16.2.38,172.16.2.38 ETH0"
Ca : 0x0010e0000188a088 ports 2 "node1 elasticNode 172.16.2.37,172.16.2.37 ETH0"
[root@mycluster01sw-ib2 ~]#

We can see 5 cells (5 storage servers):
node1 elasticNode 172.16.2.37
node2 elasticNode 172.16.2.38
node3 elasticNode 172.16.2.39
node4 elasticNode 172.16.2.40
node6 elasticNode 172.16.2.42

And 4 database servers:
node8 elasticNode 172.16.2.44
node9 elasticNode 172.16.2.45
node10 elasticNode 172.16.2.46
node12 elasticNode 172.16.2.48

As no network configuration has been done yet, all components have the factory default hostnames and IPs. Oracle uses the private 172.16 network for the factory IP settings, starting with the first database node at 172.16.2.44. The last octet is simply the InfiniBand port number the node is plugged into, plus 36: the first database node is plugged into port number 8, so 36 + 8 = 44 and its default IP is 172.16.2.44.
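A tiny loop illustrates this numbering scheme (a sketch only; the port numbers are those of the database nodes seen in the ibhosts output above):

# factory IP of a node = 172.16.2.(IB port number + 36)
for port in 8 9 10 12; do
  echo "node${port} -> 172.16.2.$((port + 36))"
done
# prints: node8 -> 172.16.2.44 ... node12 -> 172.16.2.48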
3.2/ Network Configuration
Let's now configure these IPs and hostnames according to the customer's requirements, so that the database and storage servers can be accessed from the customer network.
3.2.1/ /etc/hosts of the IB Switches
There's not much to configure on the IB Switches; we just have to check that the IP and the hostname are properly defined in the /etc/hosts file, and add them if it's not already done (which is unlikely):
[root@mycluster01sw-ib2 ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.XX.YY.ZZ mycluster01sw-ib2.mydomain.com mycluster01sw-ib2
[root@mycluster01sw-ib2 ~]#
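Should the entry be missing, adding it is a one-liner (a sketch, reusing the anonymized IP and hostname of this post; run it on each IB switch that needs it):

# append the switch's entry to /etc/hosts only if it is not already there
grep -q mycluster01sw-ib2 /etc/hosts || \
  echo "10.XX.YY.ZZ mycluster01sw-ib2.mydomain.com mycluster01sw-ib2" >> /etc/hosts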
3.2.2/ ILOM console
To start with, we have to connect to the ILOM of the first database server (its IP is reachable from the customer network).
a_server:a_user:/home/a_user> ssh root@ILOM_IP
Password:

Oracle(R) Integrated Lights Out Manager
Version 3.2.4.52 r101649
Copyright (c) 2015, Oracle and/or its affiliates. All rights reserved.
Hostname: ORACLESP-1518NM117R
->

As we don't need to do much on the ILOM itself, let's head on to the first database server using the ILOM console:
-> start /SP/console
Are you sure you want to start /SP/console (y/n)? y
Serial console started. To stop, type ESC (

Then press <ENTER> here and log in as root:

node8.my.company.com login: root
Password:
Last login: Fri Mar 18 13:08:45 on tty1
[root@node8 ~]#

As we saw previously, the first database server is node8, and a Linux OS is pre-installed:
[root@node8 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1      30G  4.3G   24G  16% /
tmpfs                            252G  4.0K  252G   1% /dev/shm
/dev/sda1                        504M   81M  398M  17% /boot
/dev/mapper/VGExaDbOra-LVDbOra1   99G  188M   94G   1% /u01
[root@node8 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.7 (Santiago)
[root@node8 ~]#
3.3/ ReclaimSpace
Each Exadata database server comes with Linux pre-installed, but it is also possible to switch to Oracle VM, which is pre-installed on disk as well. In order not to waste any disk space, we can reclaim that space and any other unused space by executing the command below on every database server (a list of the database servers' IPs can be taken from the ibhosts command, as shown earlier; see the loop sketch after the example):
[root@node8 ~]# /opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim
Model is ORACLE SERVER X5-2
...
[INFO ] Mount /dev/mapper/VGExaDb-LVDbOra1 to /u01
[INFO ] Logical volume LVDbSys2 exists in volume group VGExaDb
[root@node8 ~]#
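As root SSH equivalence is not in place yet at this stage, a simple loop over the factory IPs is one way to run it on every database server (a sketch; the IPs are those returned by ibhosts earlier, and you will be prompted for each root password):

# run the reclaim on each database server through its factory IP
for ip in 172.16.2.44 172.16.2.45 172.16.2.46 172.16.2.48; do
  ssh root@${ip} /opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim
done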
3.4/ OneCommand
Create a onecommand directory, then copy and unzip OEDA and the XML configuration file provided by the client into it. Please note that it is not allowed to run OEDA from a root (/opt) directory, hence the use of /u01.
[root@node8 ~]# mkdir -p /u01/oracle.SupportTools/onecommand
[root@node8 ~]# cd /root/config_files/
[root@node8 config_files]# ls -ltr
total 68664
-rw------- 1 root root 70186224 Mar 18 14:00 Ocmd-16.049-OTN-linux-x64.zip
-rw------- 1 root root   114077 Mar 18 14:01 -mycluster.xml
-rw------- 1 root root     6105 Mar 18 14:01 databasemachine.xml
[root@node8 config_files]# cd /u01/oracle.SupportTools/onecommand
[root@node8 onecommand]# unzip Ocmd-16.049-OTN-linux-x64.zip
Archive: Ocmd-16.049-OTN-linux-x64.zip
   creating: linux-x64/
  inflating: linux-x64/config.sh
  inflating: linux-x64/README.txt
   creating: linux-x64/Lib/
...
[root@node8 onecommand]# cd linux-x64/
[root@node8 linux-x64]# pwd
/u01/oracle.SupportTools/onecommand/linux-x64
[root@node8 linux-x64]# mv ../-mycluster.xml .
[root@node8 linux-x64]#

Here is the error you would face if you launched OEDA from /opt:
Invoke OEDA from a non root file system. Current directory /opt/oracle.SupportTools/onecommand/linux-x64 is part of the root file system.
Non-root file systems with required space are:
File System /u01 free space 95485 MB
3.5/ ApplyElasticConfig
We can now proceed with the network configuration. The applyElasticConfig.sh shell script is provided by OEDA and is located in the directory we just unzipped, /u01/oracle.SupportTools/onecommand/linux-x64:
[root@node8 linux-x64]# ./applyElasticConfig.sh -cf -mycluster.xml
 Applying Elastic Config...
 Applying Elastic configuration...
 Searching Subnet 172.16.2.x
 10 live IPs in 172.16.2.x
 Exadata node found 172.16.2.38
 Configuring node : 172.16.2.48
 Done Configuring node : 172.16.2.48
 Configuring node : 172.16.2.46
 Done Configuring node : 172.16.2.46
 Configuring node : 172.16.2.45
 Done Configuring node : 172.16.2.45
 Configuring node : 172.16.2.42
 Done Configuring node : 172.16.2.42
 Configuring node : 172.16.2.41
 Done Configuring node : 172.16.2.41
 Configuring node : 172.16.2.40
...

This applies all the customer's specifications to the database and storage servers, and each server is rebooted at the end of its configuration. Once the setup has been applied, you will be able to connect to each database and storage server from the customer network, using their 10.x.y.z specific IPs or their hostnames if these have already been defined in the corporate DNS.
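Once the servers are back up, a quick reachability check from the jump server is a good habit (a sketch; the hostnames are the anonymized post-configuration ones and are assumed to resolve in the corporate DNS):

# ping each freshly configured server by its new hostname
for h in myclusterdb01 myclusterdb02 myclustercel01 myclustercel02; do
  ping -c 1 -W 2 ${h}.mydomain.com > /dev/null && echo "${h}: OK" || echo "${h}: unreachable"
done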
3.5.1/ An issue with the cells' IP configuration
It has happened that, after the previous step, the cells were not properly configured: the cell configuration was different from the actual network configuration shown by the ifconfig command. We have to correct this manually to avoid any issue later on, especially when applying patches. Here are the error messages you could face if your cells were misconfigured:
CELL-01533: Unable to validate the IP addresses from the cellinit.ora file because the IP addresses may be down or misconfigured

ORA-00700: soft internal error, arguments: [main_6a], [3], [IP addresses in cellinit.ora not operational], [], [], [], [], [], [], [], [], []

Here is the output of the ifconfig command on one cell (only the information we need is pasted here for visibility):
[root@myclustercel01 ~]# ifconfig ib0
...
inet addr:192.168.12.17 Bcast:192.168.12.255 Mask:255.255.255.0
...
[root@myclustercel01 ~]# ifconfig ib1
...
inet addr:192.168.12.18 Bcast:192.168.12.255 Mask:255.255.255.0
...
[root@myclustercel01 ~]#

And here is the (bad) configuration stored on the same cell (only the information we need is pasted here for visibility):
[root@myclustercel01 ~]# cellcli
CellCLI> list cell detail
         name:       ru02
         ...
         ipaddress1: 192.168.10.1/24
         ipaddress2: 192.168.10.2/24
         ...
CellCLI>

Note: /24 is the network mask; it should be set to /22 in the following step.

Who's right, then? Well, ibhosts doesn't lie:
[root@myclusterdb01 ~]# ibhosts | grep cel01
Ca : 0x0010e00001884a08 ports 2 "myclustercel01 C 192.168.12.17,192.168.12.18 HCA-1"
[root@myclusterdb01 ~]#

We then need to update the configuration manually (even the hostname has to be set) and restart the cell services:
CellCLI> alter cell name=myclustercel01,ipaddress1='192.168.12.17/22',ipaddress2='192.168.12.18/22'
Network configuration altered. Please issue the following commands as root to restart the network and open IB stack:
service openibd restart
service network restart
A restart of all services is required to put new network configuration into effect. MS-CELLSRV communication may be hampered until restart.
Cell myclustercel01 successfully altered

CellCLI> alter cell restart services all

Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of services was successful.
Starting the RS, CELLSRV, and MS services...
Getting the state of RS services... running
Starting CELLSRV services...
The STARTUP of CELLSRV services was successful.
Starting MS services...
The STARTUP of MS services was successful.
CellCLI>

And then a quick check of the new configuration:
CellCLI> list cell detail
         name:       myclustercel01
         ...
         ipaddress1: 192.168.12.17/22
         ipaddress2: 192.168.12.18/22
         ...
CellCLI>

Note: this procedure has to be executed on each misconfigured cell only.
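To spot the misconfigured cells without logging in to each one interactively, a loop like the one below can help (a sketch; it assumes the cells are reachable over SSH as root and simply puts the stored configuration next to the live ib0/ib1 addresses for comparison):

# show the stored cell configuration next to the live InfiniBand addresses
for cel in myclustercel01 myclustercel02 myclustercel03 myclustercel04 myclustercel05; do
  echo "--- ${cel} ---"
  ssh root@${cel} "cellcli -e list cell attributes name,ipaddress1,ipaddress2; \
                   ifconfig ib0 | grep 'inet addr'; ifconfig ib1 | grep 'inet addr'"
done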
3.6/ dbs_group, cell_group and all_group
You have probably already seen many Exadata documents mentioning files named dbs_group, cell_group and all_group.
But where do these files come from? Well, you, the DBA, have to create them :)
As these files will be used in our daily administration tasks, it is a good idea to put them in a well-known directory.
Please find below a few commands to create them quickly. I also like to have an ib_group file with the names of my IB Switches; it will be very useful when patching the IB Switches later on (it would be sad to stop at the installation, wouldn't it?).
[root@myclusterdb01 ~]# ibhosts | sed 's/"//' | grep db | awk '{print $6}' | sort > /root/dbs_group
[root@myclusterdb01 ~]# ibhosts | sed 's/"//' | grep cel | awk '{print $6}' | sort > /root/cell_group
[root@myclusterdb01 ~]# cat /root/dbs_group /root/cell_group > /root/all_group
[root@myclusterdb01 ~]# ibswitches | awk '{print $10}' | sort > /root/ib_group
[root@myclusterdb01 ~]#
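A quick look at the resulting files confirms that they contain what we expect (a sketch; the expected counts are those of the 4 database node / 5 cell configuration used in this post):

# each file should contain one hostname per line
wc -l /root/dbs_group /root/cell_group /root/all_group /root/ib_group
# expected here: 4, 5, 9 and 2 lines respectively
cat /root/all_group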
3.7/ Root SSH equivalence
It is better to have root SSH equivalence in place to ease the installation and all the administration tasks. We will choose one server (I usually choose DB server node 1, aka "myclusterdb01") and deploy its SSH key to all the other DB nodes, the cell nodes and the IB Switches as well.
First, we need to generate root's SSH key pair (press <ENTER> at each question):
[root@myclusterdb01 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
05:bb:f3:3b:a7:d0:f5:7b:c2:71:10:05:2e:7e:6e:11 root@myclusterdb01.mydomain.com
The key's randomart image is:
+--[ RSA 2048]----+
| . .o.|
| o .. |
| . . . E. |
| o . ... |
| S o o. |
| + . +...|
| . o .+o |
| .....o..|
| o+ .o |
+-----------------+
[root@myclusterdb01 ~]#

And we will use your future new best friend, dcli, to deploy the SSH keys to all the database and storage servers from database node 1:
[root@myclusterdb01 ~]# dcli -g /root/all_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclusterdb01's password:
root@myclusterdb03's password:
...
[root@myclusterdb01 ~]#

Note: you will be prompted for each server's root password. Don't mess up, the root user is locked after the first failed attempt :)

Let's do the same for the IB Switches:
[root@myclusterdb01 ~]# dcli -g /root/ib_group -l root -k -s '-o StrictHostKeyChecking=no'
...
[root@myclusterdb01 ~]#

You are now able to connect from the first DB server to any host with no password. You can quickly test it with the command below:
# dcli -g ~/all_group -l root date
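And as a small teaser of what these group files enable day to day (a sketch of typical dcli one-liners):

# uptime of every server in the rack
dcli -g ~/all_group -l root uptime
# status of every storage cell
dcli -g ~/cell_group -l root "cellcli -e list cell attributes name,status"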
All is now ready to jump to the installation itself, which is detailed in part 2 of this blog!