If a server hangs but /var/log/messages shows no useful errors, the way to find the cause is to install and configure kdump so that it writes a kernel core dump file when the server hangs. kdump may not work on Xen PV machines, but it is worth trying in any case because it does a good job on regular servers.
Here are the steps to install and configure kdump:
- Install kexec-tools:
yum install kexec-tools
- Edit /etc/kdump.conf and set the path variable to point to a directory with enough space to hold the kernel dump file (the default location is /var/crash/). The file will be about the size of the server's RAM plus 1 GB.
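As a rough pre-flight check, the size estimate above (RAM plus 1 GB) can be compared with the free space in the dump directory. The sketch below is only an illustration: the helper names required_kb, has_room, and check_dump_space are made up for this example and are not part of kexec-tools, and it assumes a Linux system with /proc/meminfo and a POSIX df.

```shell
# Hypothetical helpers for illustration; not part of kexec-tools.

# Estimated dump size in kB: RAM (in kB) plus 1 GB.
required_kb() {
    echo $(( $1 + 1024 * 1024 ))
}

# Succeeds when the available space ($2, kB) covers the estimate ($1, kB).
has_room() {
    [ "$2" -ge "$1" ]
}

# Compare the estimate with the free space in the given directory.
check_dump_space() {
    dir=${1:-/var/crash}
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    need_kb=$(required_kb "$mem_kb")
    if has_room "$need_kb" "$avail_kb"; then
        echo "OK: $dir has ${avail_kb} kB free, about ${need_kb} kB needed"
    else
        echo "WARNING: $dir has ${avail_kb} kB free, about ${need_kb} kB needed"
    fi
}
```

Calling check_dump_space with the directory set as path in /etc/kdump.conf (for example, check_dump_space /var/crash) prints whether that directory looks large enough.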
- Edit /etc/grub.conf.
For CloudLinux 5, add the following boot parameter to the kernel line, or modify the existing one:
crashkernel=160M@12M
For CloudLinux 6, add the following boot parameter to the kernel line, or modify the existing one:
crashkernel=160M
For CloudLinux 7, edit /etc/default/grub and add crashkernel=160M to the GRUB_CMDLINE_LINUX parameter (or modify the existing value) so that it looks like:
GRUB_CMDLINE_LINUX="crashkernel=160M rhgb quiet"
Then regenerate grub.cfg with the following command:
grub2-mkconfig -o /boot/grub2/grub.cfg
For CloudLinux 5 and 6, add kdump to chkconfig and enable it at boot:
chkconfig --add kdump
chkconfig kdump on
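For CloudLinux 7, the change to GRUB_CMDLINE_LINUX can be previewed on a single line before /etc/default/grub is edited. This is a minimal sketch assuming the 160M value used above. set_crashkernel is a hypothetical helper name, and it only prints the rewritten line rather than modifying any file.

```shell
# Hypothetical helper: print a GRUB_CMDLINE_LINUX line with crashkernel=160M set.
# An existing crashkernel= value is replaced; otherwise the parameter is
# inserted right after the opening quote.
set_crashkernel() {
    printf '%s\n' "$1" | sed \
        -e '/crashkernel=/s/crashkernel=[^ "]*/crashkernel=160M/' \
        -e '/crashkernel=/!s/^GRUB_CMDLINE_LINUX="/&crashkernel=160M /'
}

# Preview the result for a line that has no crashkernel parameter yet.
set_crashkernel 'GRUB_CMDLINE_LINUX="rhgb quiet"'
```

Printing the line first makes it easy to compare the result with the current contents of /etc/default/grub before changing the file by hand.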
- Modify /etc/sysctl.conf and add the following block to catch all possible panic states:
# Enable reboots on panic so that kdump can make dumps
kernel.sysrq = 1
kernel.hung_task_panic = 1
kernel.panic = 1
kernel.panic_on_io_nmi = 1
kernel.panic_on_oops = 1
kernel.panic_on_stackoverflow = 1
kernel.panic_on_unrecovered_nmi = 1
kernel.softlockup_panic = 1
kernel.unknown_nmi_panic = 1
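Once the block is in /etc/sysctl.conf, the settings can be loaded without a reboot by running sysctl -p as root, and the result can be inspected through /proc/sys. The sketch below assumes a Linux /proc filesystem; sysctl_path and report_panic_sysctls are names invented for this example. Not every kernel provides every key, so missing ones are reported rather than treated as errors.

```shell
# Hypothetical helper: map a sysctl key to its /proc/sys path,
# e.g. kernel.panic -> /proc/sys/kernel/panic
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Print the current value of each key from the sysctl.conf block,
# one line per key.
report_panic_sysctls() {
    for key in kernel.sysrq kernel.hung_task_panic kernel.panic \
               kernel.panic_on_io_nmi kernel.panic_on_oops \
               kernel.panic_on_stackoverflow kernel.panic_on_unrecovered_nmi \
               kernel.softlockup_panic kernel.unknown_nmi_panic; do
        path=$(sysctl_path "$key")
        if [ -r "$path" ]; then
            echo "$key = $(cat "$path")"
        else
            echo "$key: not available on this kernel"
        fi
    done
}
```

Running report_panic_sysctls after sysctl -p shows at a glance which of the panic triggers are actually active.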
After the server reboots, check whether kdump is running with:
service kdump status
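Besides the service status, two kernel-side checks can help confirm that the setup took effect: the crashkernel parameter should appear in /proc/cmdline, and kernels that expose /sys/kernel/kexec_crash_loaded report 1 there when a crash kernel is loaded. A minimal sketch, with has_crashkernel and verify_kdump as hypothetical helper names:

```shell
# Hypothetical helper: succeeds if a kernel command line contains crashkernel=
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the running kernel; assumes Linux /proc and /sys.
verify_kdump() {
    if has_crashkernel "$(cat /proc/cmdline 2>/dev/null)"; then
        echo "crashkernel parameter is present"
    else
        echo "crashkernel parameter is missing from the kernel command line"
    fi
    if [ "$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null)" = "1" ]; then
        echo "crash kernel is loaded"
    else
        echo "crash kernel is not loaded (or the sysfs file is not available)"
    fi
}
```

If verify_kdump reports the parameter as missing, the boot loader change most likely did not take effect or the server has not been rebooted since.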
How to obtain a core dump when the server hangs is described here.