My HP ProLiant MicroServer Gen8 G1610T VirtualBox host has been experiencing random restarts.
To investigate, let’s first take a look at the crash dumps:
sudo cat /var/crash/127.0.0.1-2016-12-29-16\:39\:32/vmcore-dmesg.txt
...
[2928954.556168] Call Trace:
[2928954.556182] [ ] dump_stack+0x19/0x1b
[2928954.556227] [ ] panic+0xd8/0x1e7
[2928954.556256] [ ] hpwdt_pretimeout+0xdd/0xe0 [hpwdt]
[2928954.556287] [ ] nmi_handle.isra.0+0x69/0xb0
[2928954.556316] [ ] do_nmi+0x126/0x340
[2928954.556340] [ ] end_repeat_nmi+0x1e/0x2e
[2928954.556368] [ ] ? intel_idle+0xd7/0x160
[2928954.556395] [ ] ? intel_idle+0xd7/0x160
[2928954.556421] [ ] ? intel_idle+0xd7/0x160
[2928954.556446] < > [ ] cpuidle_enter_state+0x40/0xc0
[2928954.556483] [ ] cpuidle_idle_call+0xd9/0x210
[2928954.556513] [ ] arch_cpu_idle+0xe/0x30
[2928954.556540] [ ] cpu_startup_entry+0x245/0x290
[2928954.556570] [ ] rest_init+0x77/0x80
[2928954.556595] [ ] start_kernel+0x429/0x44a
[2928954.556622] [ ] ? repair_env_string+0x5c/0x5c
[2928954.556652] [ ] ? early_idt_handlers+0x120/0x120
[2928954.556681] [ ] x86_64_start_reservations+0x2a/0x2c
[2928954.556712] [ ] x86_64_start_kernel+0x152/0x175
An identical box that is used as an NFS server has been up and running without any problems for many weeks now. The crash dump shows the panic being raised from hpwdt_pretimeout in the hpwdt module, so the problem is probably caused by an interaction between VirtualBox and hpwdt.
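To check whether every recorded crash points at the same module, the dumps can be searched in one pass. This is only a minimal sketch: it assumes each dump lives in its own subdirectory of /var/crash with a vmcore-dmesg.txt file, as in the path above, and the function name crashes_mentioning is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): print the crash logs
# that mention a given symbol.
# Usage: crashes_mentioning SYMBOL [CRASH_DIR]
crashes_mentioning() {
    symbol=$1
    crash_dir=${2:-/var/crash}   # assumed default dump location
    # -l: print only the names of matching files
    # -s: stay quiet about missing or unreadable files
    # -F: treat the symbol as a fixed string, not a regular expression
    grep -lsF -- "$symbol" "$crash_dir"/*/vmcore-dmesg.txt
}
```

Run as root, `crashes_mentioning hpwdt_pretimeout` lists the dumps whose trace contains that function. If the list covers all dumps, that supports the hpwdt theory; if not, some restarts may have another cause.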
A workaround is to blacklist the hpwdt module:
sudo -s
echo "blacklist hpwdt" >> /etc/modprobe.d/blacklist-hp.conf
rmmod hpwdt
dracut -f
Disabling hpwdt has so far eliminated the random restarts and I have not noticed any side effects. If anybody can pinpoint what exactly the problem is, please let me know.
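To confirm that the workaround is in effect, it may help to check both the blacklist entry and the list of loaded modules. The following is a rough sketch, not a tested procedure from this setup: the helper names is_blacklisted and is_loaded are invented for illustration, and the defaults assume the usual /etc/modprobe.d and /proc/modules locations.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MODULE has a "blacklist MODULE" line
# in any *.conf file under CONF_DIR.
# Usage: is_blacklisted MODULE [CONF_DIR]
is_blacklisted() {
    module=$1
    conf_dir=${2:-/etc/modprobe.d}
    grep -Eqs "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*\$" "$conf_dir"/*.conf
}

# Hypothetical helper: succeed if MODULE is currently loaded.
# Each line of /proc/modules starts with the module name and a space.
# Usage: is_loaded MODULE [MODULES_FILE]
is_loaded() {
    module=$1
    modules_file=${2:-/proc/modules}
    grep -qs "^${module} " "$modules_file"
}
```

After a reboot, `is_blacklisted hpwdt` should succeed and `is_loaded hpwdt` should fail. If the module is still loaded, something else is pulling it in and the blacklist alone is not enough.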
It has been suggested (see comments and links below) that an alternative solution could be to disable ASR (Automatic Server Restart) in the BIOS. I have not tried this, but it would likely work as well.
Also see:
Ubuntu Bug #141758: HP Proliant Servers Advices for Ubuntu Linux (cmdline, panics, firmware options)
Ubuntu Bug #1318551: Kernel Panic – not syncing: An NMI occurred, please see the Integrated Management Log for details.
Ubuntu Bug #1432837: HP Proliant Servers – Kernel Panic – NMI – DL360 & DL380 – HPWDT module loaded
virtualbox.org: Kernel Panic VirtualBoxc with HP proliant DL360G7
virtualbox.org: Intermittent Hang/Crash
community.hpe.com: Disable Automatic Server Restart (ASR) in BIOS