I thought that having to manually remove shared memory segments was an old problem. I remember having this kind of issue with Oracle 7 and 8 but not much after that. It seems, however, that it is happening again on VMs from time to time, for example after some instances crash (never on bare metal as far as I remember; well, nothing crashes on bare metal).
A typical scenario: an Oracle instance crashes on a VM and cannot restart because there are not enough Huge Pages (the instance has use_large_pages=only), so we need to find the stuck shared memory segments and clean them up to free this memory.
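To see how much Huge Pages memory is actually free before and after the cleanup, the counters in /proc/meminfo can be turned into a byte count. This is only a minimal sketch: hugepages_free_bytes is a helper name I made up, and it simply multiplies HugePages_Free by Hugepagesize without taking HugePages_Rsvd into account.

```shell
#!/usr/bin/env bash
# Sketch only: hugepages_free_bytes is a hypothetical helper, not an OS or Oracle tool.
# It reads /proc/meminfo-style text on stdin and prints the free Huge Pages memory in bytes.
hugepages_free_bytes() {
  awk '
    $1 == "HugePages_Free:" { free = $2 }      # number of free huge pages
    $1 == "Hugepagesize:"   { size_kb = $2 }   # size of one huge page, reported in kB
    END { printf "%.0f\n", free * size_kb * 1024 }
  '
}

# Usage on a live Linux host:
#   hugepages_free_bytes < /proc/meminfo
```

Comparing this number with the size of the SGA gives a rough idea of whether the instance has a chance to start with use_large_pages=only.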
Fortunately, the stuck segments are easy to identify using the ipcs -ma command and the dest status:
# ipcs -ma

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes        nattch     status
0x00000000 333381638  oracle     600        40802189312  7245
0x00000000 333414407  oracle     600        140509184    2415
0x35854b58 333447176  oracle     600        2097152      2415
0x00000000 333971465  oracle     600        6291456      11         dest
0x00000000 334004234  oracle     600        40265318400  33         dest
0x00000000 334037003  oracle     600        140509184    11         dest
0x00000000 334069772  oracle     600        2097152      11         dest
0x00000000 334102541  oracle     600        6291456      1575
0x00000000 334135310  oracle     600        39191576576  4725
0x00000000 1741914130 oracle     600        5335154688   774
0x00000000 333774870  oracle     600        40265318400  4455
#

The dest status means that the segment is marked for destruction: the system will free it as soon as the last process detaches from it. So why are these segments not freed? Because processes are still attached to them, and their number is shown in the nattch column. We then have to identify these processes and kill them, which is also easy thanks to lsof.
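When there are many segments, the shmids with the dest status do not have to be picked out by eye. Here is a small sketch that filters them from the ipcs output; dest_shmids is a helper name I made up, and it assumes the column layout shown above (key, shmid, owner, perms, bytes, nattch, then the status).

```shell
#!/usr/bin/env bash
# Sketch only: dest_shmids is a hypothetical helper name.
# It reads `ipcs -m` style output on stdin and prints the shmid of every
# segment whose status (field 7 or later) contains "dest".
dest_shmids() {
  awk '
    $1 ~ /^0x/ {                       # data rows start with the hexadecimal key
      for (i = 7; i <= NF; i++)
        if ($i == "dest") { print $2; break }
    }
  '
}

# Usage on a live host:
#   ipcs -m | dest_shmids
# Joined with "|" to build an egrep pattern:
#   ipcs -m | dest_shmids | paste -sd'|' -
```

The resulting shmids are the ones to look for in the lsof output: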
# lsof | grep <shmid>

For the 4 shared memory segments to clean up above, this could look like the command below (all in one command is better than 4 commands):
# lsof | egrep '333971465|334004234|334037003|334069772'

As you may have a lot of processes to kill, and knowing that the PID is the second column of the lsof output, you can generate all the kill commands automatically:
# lsof | egrep '333971465|334004234|334037003|334069772' | awk '{print "kill -9 "$2 }' | sort | uniq
kill -9 15692
kill -9 27132
kill -9 33037
kill -9 33725
kill -9 384913
kill -9 388064
kill -9 4602
kill -9 53630
kill -9 63605
#

You can even pipe this output to bash (| bash) to kill all the processes in one command. The OS will then automatically release this stuck memory, as the ipcs output below shows (no more dest status).
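One caveat with a plain egrep: it matches the shmid digits anywhere on the line, including inside a longer number. A slightly more defensive sketch is to keep only the lines where a whole field equals one of the shmids. pids_for_shmids is a helper name I made up, and it assumes that the shmid appears as its own whitespace-separated field in the lsof output and that the PID is the second column, as above.

```shell
#!/usr/bin/env bash
# Sketch only: pids_for_shmids is a hypothetical helper name.
# $1    : shmids separated by "|", e.g. '333971465|334004234'
# stdin : lsof output
# Prints the unique PIDs (column 2) of the lines where a field after the PID
# is exactly one of the shmids.
pids_for_shmids() {
  awk -v ids="$1" '
    BEGIN { n = split(ids, a, "|"); for (i = 1; i <= n; i++) want[a[i]] = 1 }
    { for (f = 3; f <= NF; f++) if ($f in want) { print $2; break } }
  ' | sort -un
}

# Usage on a live host (review the list before killing anything):
#   lsof | pids_for_shmids '333971465|334004234|334037003|334069772'
```

The list of PIDs can be reviewed first and then handed to kill. Once the attached processes are gone, ipcs -ma no longer lists any dest segment: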
# ipcs -ma

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes        nattch     status
0x00000000 333381638  oracle     600        40802189312  7245
0x00000000 333414407  oracle     600        140509184    2415
0x35854b58 333447176  oracle     600        2097152      2415
0x00000000 334102541  oracle     600        6291456      1575
0x00000000 334135310  oracle     600        39191576576  4725
0x00000000 1741914130 oracle     600        5335154688   774
0x00000000 333774870  oracle     600        40265318400  4455
#

That's all for today!