KVM - CPU cores are disabling and enabling in a loop


#1

On our KVM virtualisation server, CPU cores are being disabled and re-enabled in a loop after 10 minutes (every disable results in a 15-second hang for all virtual machines).

It has been happening since a thunderstorm a week ago, when all virtual servers hung because of a data disk error (the system disk was OK). So we replaced the data disk. Next, we tried upgrading the host system from Ubuntu Natty (kernel 2.6) to Ubuntu Precise (kernel 3.2), with no change.

I found only one forum thread about it, without a solution: ubuntuforums.org/showthread.php?p=12071553

I tried switching on KVM debugging and found the exact place by kernel time in syslog, but I don't understand the log and didn't see any important difference.

I think it could be some bad signal from the motherboard. The disk error may have damaged something on the motherboard, but I don't know how to find out.

Here is the part of syslog covering one disable/enable loop.

Thank you for any advice on how to trace this and where to look to resolve the problem.

Jul 14 15:36:44 node-01 kernel: [56713.568733] kvm: disabling virtualization on CPU1
Jul 14 15:36:44 node-01 kernel: [56713.668842] CPU 1 is now offline
Jul 14 15:36:44 node-01 kernel: [56713.670835] CPU 3 MCA banks CMCI:2 CMCI:3 CMCI:5
Jul 14 15:36:44 node-01 kernel: [56713.673771] kvm: disabling virtualization on CPU2
Jul 14 15:36:44 node-01 kernel: [56713.674492] CPU 2 is now offline
Jul 14 15:36:44 node-01 kernel: [56713.680172] kvm: disabling virtualization on CPU3
Jul 14 15:36:44 node-01 kernel: [56713.681114] CPU 3 is now offline
Jul 14 15:36:44 node-01 kernel: [56713.681119] SMP alternatives: switching to UP code
Jul 14 15:36:44 node-01 kernel: [56713.701971] init: anacron main process (3613) killed by TERM signal
Jul 14 15:36:44 node-01 kernel: [56713.709803] r8169 0000:01:00.0: eth0: link down
Jul 14 15:36:44 node-01 kernel: [56713.710421] br0: port 1(eth0) entering forwarding state
Jul 14 15:36:47 node-01 kernel: [56716.675313] r8169 0000:01:00.0: eth0: link up
Jul 14 15:36:47 node-01 kernel: [56716.676438] br0: port 1(eth0) entering forwarding state
Jul 14 15:36:47 node-01 kernel: [56716.676454] br0: port 1(eth0) entering forwarding state
Jul 14 15:36:56 node-01 kernel: [56725.666787] br0: port 1(eth0) entering forwarding state
Jul 14 15:37:02 node-01 kernel: [56730.815937] SMP alternatives: switching to SMP code
Jul 14 15:37:02 node-01 kernel: [56730.825021] Booting Node 0 Processor 1 APIC 0x4
Jul 14 15:37:02 node-01 kernel: [56730.825025] smpboot cpu 1: start_ip = 9a000
Jul 14 15:37:02 node-01 kernel: [56730.836033] Calibrating delay loop (skipped) already calibrated this CPU
Jul 14 15:37:02 node-01 kernel: [56730.837012] kvm: enabling virtualization on CPU1
Jul 14 15:37:02 node-01 kernel: [56730.858555] NMI watchdog enabled, takes one hw-pmu counter.
Jul 14 15:37:02 node-01 kernel: [56730.862547] Booting Node 0 Processor 2 APIC 0x1
Jul 14 15:37:02 node-01 kernel: [56730.862551] smpboot cpu 2: start_ip = 9a000
Jul 14 15:37:02 node-01 kernel: [56730.873460] Calibrating delay loop (skipped) already calibrated this CPU
Jul 14 15:37:02 node-01 kernel: [56730.874453] kvm: enabling virtualization on CPU2
Jul 14 15:37:02 node-01 kernel: [56730.896371] NMI watchdog enabled, takes one hw-pmu counter.
Jul 14 15:37:02 node-01 kernel: [56730.898581] Booting Node 0 Processor 3 APIC 0x5
Jul 14 15:37:02 node-01 kernel: [56730.898586] smpboot cpu 3: start_ip = 9a000
Jul 14 15:37:02 node-01 kernel: [56730.909496] Calibrating delay loop (skipped) already calibrated this CPU
Jul 14 15:37:02 node-01 kernel: [56730.910227] kvm: enabling virtualization on CPU3
Jul 14 15:37:02 node-01 kernel: [56730.930644] NMI watchdog enabled, takes one hw-pmu counter.
Jul 14 15:37:02 node-01 kernel: [56730.963737] r8169 0000:01:00.0: eth0: link down
Jul 14 15:37:02 node-01 kernel: [56730.964069] br0: port 1(eth0) entering forwarding state
Jul 14 15:37:04 node-01 kernel: [56733.432535] r8169 0000:01:00.0: eth0: link up
Jul 14 15:37:04 node-01 kernel: [56733.433808] br0: port 1(eth0) entering forwarding state
Jul 14 15:37:04 node-01 kernel: [56733.433823] br0: port 1(eth0) entering forwarding state
Jul 14 15:37:13 node-01 kernel: [56742.424751] br0: port 1(eth0) entering forwarding state
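
To compare loops against each other, it may help to pull the offline/online events out of syslog and measure how long each core stays offline. Below is a rough Python sketch of that idea. It assumes the log lines look like the excerpt above ("CPU N is now offline" and "kvm: enabling virtualization on CPUN"); the function name cpu_offline_windows is only an illustrative name, not part of any tool.

```python
import re

# Kernel lines in syslog, e.g.
# "Jul 14 15:36:44 node-01 kernel: [56713.668842] CPU 1 is now offline"
LINE_RE = re.compile(r"kernel: \[\s*(?P<ktime>\d+\.\d+)\] (?P<msg>.*)$")
OFFLINE_RE = re.compile(r"^CPU (\d+) is now offline")
ONLINE_RE = re.compile(r"^kvm: enabling virtualization on CPU(\d+)")


def cpu_offline_windows(lines):
    """Return a list of (cpu, offline_ktime, online_ktime, seconds) tuples.

    A window starts at "CPU N is now offline" and ends at the next
    "kvm: enabling virtualization on CPUN" for the same core.
    """
    went_offline = {}  # cpu number -> kernel time it went offline
    windows = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a kernel line in the expected format
        ktime = float(m.group("ktime"))
        msg = m.group("msg").strip()
        off = OFFLINE_RE.match(msg)
        on = ONLINE_RE.match(msg)
        if off:
            went_offline[int(off.group(1))] = ktime
        elif on:
            cpu = int(on.group(1))
            if cpu in went_offline:
                start = went_offline.pop(cpu)
                windows.append((cpu, start, ktime, round(ktime - start, 3)))
    return windows
```

It can be fed the lines of the syslog file (for example open("/var/log/syslog"), assuming that is where the kernel messages end up on the host) and the resulting windows compared between loops to see whether the timing is regular.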