My HP ProLiant MicroServer Gen8 G1610T VirtualBox host has been experiencing random restarts.
To investigate, let's first take a look at the crash dumps:
sudo cat /var/crash/127.0.0.1-2016-12-29-16\:39\:32/vmcore-dmesg.txt
...
[2928954.556168] Call Trace:
[2928954.556182] dump_stack+0x19/0x1b
[2928954.556227] panic+0xd8/0x1e7
[2928954.556256] hpwdt_pretimeout+0xdd/0xe0 [hpwdt]
[2928954.556287] nmi_handle.isra.0+0x69/0xb0
[2928954.556316] do_nmi+0x126/0x340
[2928954.556340] end_repeat_nmi+0x1e/0x2e
[2928954.556368] ? intel_idle+0xd7/0x160
[2928954.556395] ? intel_idle+0xd7/0x160
[2928954.556421] ? intel_idle+0xd7/0x160
[2928954.556446] cpuidle_enter_state+0x40/0xc0
[2928954.556483] cpuidle_idle_call+0xd9/0x210
[2928954.556513] arch_cpu_idle+0xe/0x30
[2928954.556540] cpu_startup_entry+0x245/0x290
[2928954.556570] rest_init+0x77/0x80
[2928954.556595] start_kernel+0x429/0x44a
[2928954.556622] ? repair_env_string+0x5c/0x5c
[2928954.556652] ? early_idt_handlers+0x120/0x120
[2928954.556681] x86_64_start_reservations+0x2a/0x2c
[2928954.556712] x86_64_start_kernel+0x152/0x175
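The call trace shows the panic being raised from hpwdt_pretimeout, which is called from the kernel's NMI handling path (do_nmi, nmi_handle). This implicates the hpwdt module, the HP hardware watchdog timer driver. An identical box that is used as an NFS server has been up and running without any problems for many weeks now, so the problem is probably caused by an interaction between VirtualBox and hpwdt.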
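With several dumps in /var/crash it gets tedious to check each one by hand. Below is a minimal shell sketch, assuming the same kdump layout as above (one directory per crash, each holding a vmcore-dmesg.txt). The function name find_hpwdt_panics is made up for illustration; it simply reports which dumps contain a hpwdt_pretimeout frame.

```shell
# find_hpwdt_panics [crash_root]
# Hypothetical helper: prints the directory of every kdump crash whose
# vmcore-dmesg.txt contains a hpwdt_pretimeout frame in its call trace.
# Assumes the layout <crash_root>/<host>-<date>/vmcore-dmesg.txt.
find_hpwdt_panics() {
    local crash_root="${1:-/var/crash}"
    local f
    for f in "$crash_root"/*/vmcore-dmesg.txt; do
        # If the glob matched nothing it stays literal; skip it.
        [ -e "$f" ] || continue
        if grep -q 'hpwdt_pretimeout' "$f"; then
            dirname "$f"
        fi
    done
}
```

Run it as root (for example from the sudo -s shell), since /var/crash is normally readable only by root. A match only shows that the hpwdt NMI handler raised the panic. It does not say what caused the NMI in the first place.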
A workaround is to blacklist the hpwdt module so that it is no longer loaded:
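# start a root shell; the following commands are typed into it
sudo -s
# stop hpwdt from being loaded at boot
echo "blacklist hpwdt" >> /etc/modprobe.d/blacklist-hp.conf
# unload it from the running kernel
rmmod hpwdt
# rebuild the initramfs so it picks up the new blacklist entry
dracut -f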
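Appending with >> adds a duplicate line every time the commands are re-run. Below is a sketch of two hypothetical shell helpers (blacklist_module and module_loaded are my own names, not system tools). The first appends the blacklist entry only once. The second checks /proc/modules, which lists one loaded module per line with its name first, to confirm whether hpwdt is still loaded, for example after a reboot.

```shell
# blacklist_module <module> [conf_file]
# Hypothetical helper: appends "blacklist <module>" to conf_file only if
# that exact line is not already there, so re-running it is harmless.
blacklist_module() {
    local module="$1"
    local conf="${2:-/etc/modprobe.d/blacklist-hp.conf}"
    # -s: ignore a missing file, -x: whole-line match, -F: fixed string
    grep -qsxF "blacklist $module" "$conf" || echo "blacklist $module" >> "$conf"
}

# module_loaded <module> [modules_file]
# Hypothetical helper: succeeds if <module> is the first field of any
# line in modules_file (default /proc/modules), i.e. it is loaded.
module_loaded() {
    local module="$1"
    local modules_file="${2:-/proc/modules}"
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}
```

From the root shell, typical use would be blacklist_module hpwdt, followed after a reboot by module_loaded hpwdt && echo "hpwdt still loaded". As with the original commands, dracut -f still has to be run after changing the blacklist so the initramfs is rebuilt with it.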
Disabling hpwdt has so far eliminated the random restarts, and I have not noticed any side effects. If anybody can pinpoint the exact cause, please let me know.
It has been suggested (see the comments and links below) that an alternative solution is to disable ASR (Automatic Server Recovery) in the BIOS. I have not tried this, but it would likely work as well.
Also see:
Ubuntu Bug #141758: HP Proliant Servers Advices for Ubuntu Linux (cmdline, panics, firmware options)
Ubuntu Bug #1318551: Kernel Panic – not syncing: An NMI occurred, please see the Integrated Management Log for details.
Ubuntu Bug #1432837: HP Proliant Servers – Kernel Panic – NMI – DL360 & DL380 – HPWDT module loaded
virtualbox.org: Kernel Panic VirtualBoxc with HP proliant DL360G7
virtualbox.org: Intermittent Hang/Crash
community.hpe.com: Disable Automatic Server Restart (ASR) in BIOS
You have to disable ASR in the HP BIOS.
Thanks for the suggestion. I have not tried disabling ASR (Automatic Server Recovery), but I agree that this would also solve the random restart problem. Having said that, it still seems to me that it is the hpwdt module that is not working as it should, so I am inclined to ‘disable’ that rather than ASR. With CentOS 7 as the OS, is there anything apart from hpwdt that could trigger ASR? If so, we may want to keep ASR enabled for those potentially useful trigger events.