Shared memory segments cleanup

I thought that having to remove shared memory segments manually was an old problem; I remember this kind of issue with Oracle 7 and 8 but not much after that. It seems to be happening again on VMs from time to time, for example after an instance crash (never on bare metal as far as I remember; well, nothing crashes on bare metal).
A typical scenario: an Oracle instance crashes on a VM and cannot restart because there are not enough Huge Pages (the instance has use_large_pages=only), so we need to find the stuck shared memory segments and clean them up to free this memory.
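Before hunting for stuck segments, it can help to confirm how much Huge Page memory is actually free. This is only a minimal sketch, assuming a Linux /proc/meminfo with the usual HugePages_Free and Hugepagesize lines; the function name hugepages_free_mib is made up for this example:

```shell
# Print the free Huge Page memory in MiB.
# Reads /proc/meminfo style text on stdin (Hugepagesize is reported in kB).
hugepages_free_mib() {
    awk '/^HugePages_Free:/ { free = $2 }
         /^Hugepagesize:/   { size_kb = $2 }
         END { printf "%d\n", free * size_kb / 1024 }'
}

# Usage: hugepages_free_mib < /proc/meminfo
```

If the value is well below the SGA size of the instance that refuses to start, stuck segments are a likely suspect.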

Fortunately, they are easy to identify with the ipcs -ma command and the dest status:
# ipcs -ma
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 333381638  oracle     600        40802189312 7245
0x00000000 333414407  oracle     600        140509184  2415
0x35854b58 333447176  oracle     600        2097152    2415
0x00000000 333971465  oracle     600        6291456     11         dest
0x00000000 334004234  oracle     600        40265318400 33         dest
0x00000000 334037003  oracle     600        140509184   11         dest
0x00000000 334069772  oracle     600        2097152     11         dest
0x00000000 334102541  oracle     600        6291456    1575
0x00000000 334135310  oracle     600        39191576576 4725
0x00000000 1741914130 oracle     600        5335154688 774
0x00000000 333774870  oracle     600        40265318400 4455
#
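On a busy host the list can be long, so extracting only the shmids flagged dest may be handy. A small sketch, assuming the Linux ipcs column layout shown above (status in the 7th column); dest_shmids is a hypothetical helper name:

```shell
# Print the shmid of every shared memory segment whose status is "dest".
# Reads `ipcs -m` style output on stdin; header and separator lines are
# skipped because only data lines start with a 0x... key.
dest_shmids() {
    awk '$1 ~ /^0x/ && $7 == "dest" { print $2 }'
}

# Usage (as root): ipcs -m | dest_shmids
```

With the output above, this would list the four shmids that carry the dest status.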
The dest status means that the segment has been marked for destruction: the system will free it as soon as the last process detaches from it. So why are these segments not freed? Because processes are still attached to them; the nattch column shows how many. We then have to identify these processes and kill them, which is also easy thanks to lsof:
# lsof | grep <shmid>
Which could look like this for the 4 shared memory segments to clean up above (one command is better than 4):
# lsof | egrep '333971465|334004234|334037003|334069772'
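Typing the alternation by hand gets tedious with many segments. As a sketch, a list of shmids (one per line) can be joined into such a pattern with paste; shmid_pattern is a hypothetical helper name:

```shell
# Join shmids (one per line on stdin) into an egrep alternation: id1|id2|...
shmid_pattern() {
    paste -sd '|' -
}

# Usage (as root), assuming the status is the 7th column of `ipcs -m`:
#   pat=$(ipcs -m | awk '$7 == "dest" { print $2 }' | shmid_pattern)
#   [ -n "$pat" ] && lsof | egrep "$pat"
```

The emptiness check matters: with no dest segment the pattern would be empty, and an empty egrep pattern matches every line of the lsof output.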
As you may have a lot of processes to kill, and knowing that the PID is in the second column of the lsof output, you can generate all the kill commands automatically:
# lsof | egrep '333971465|334004234|334037003|334069772' | awk '{print "kill -9 "$2 }' | sort | uniq
kill -9 15692
kill -9 27132
kill -9 33037
kill -9 33725
kill -9 384913
kill -9 388064
kill -9 4602
kill -9 53630
kill -9 63605
# 
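A plain egrep matches the shmid anywhere in the line, so a longer number that merely contains it would also produce a kill command. A slightly stricter sketch is given here, assuming only that the PID is the second column and that the shmid appears as a whole field somewhere after it; kill_cmds_for is a hypothetical helper name:

```shell
# Print one "kill -9 <pid>" line per process attached to the given shmids.
# $1    : alternation of shmids, e.g. '333971465|334004234'
# stdin : lsof output
kill_cmds_for() {
    awk -v pat="^($1)\$" '{
        # Only accept a field that is exactly one of the shmids
        for (i = 3; i <= NF; i++)
            if ($i ~ pat) { print "kill -9 " $2; break }
    }' | sort -u
}

# Usage (as root): lsof | kill_cmds_for '333971465|334004234|334037003|334069772'
```

The commands are only printed, which leaves a chance to review the list first; kill -9 gives the processes no opportunity to clean up.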
You can even pipe the output to bash (| bash) to kill all the processes in one command; the OS will then automatically release this stuck memory (see below, no more dest status):
# ipcs -ma
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 333381638  oracle     600        40802189312 7245
0x00000000 333414407  oracle     600        140509184  2415
0x35854b58 333447176  oracle     600        2097152    2415
0x00000000 334102541  oracle     600        6291456    1575
0x00000000 334135310  oracle     600        39191576576 4725
0x00000000 1741914130 oracle     600        5335154688 774
0x00000000 333774870  oracle     600        40265318400 4455
#
That's all for today!
