Major KVM issues with kernel 4.5 on the host

Marc Haber-22
Hi,

I have a (semi-productive[1]) system ("host") running Debian unstable.
On this system, a few VMs (Debian unstable, Debian testing) ("vm1",
"vm2", "vm3") are running. I roll my own kernels and take vanilla
upstream sources. No distribution patches.

Since host was updated to Kernel 4.5, the VMs have started acting up.
All of them. The range of strangeness begins with "relocation error,
system halted" on system startup, corrupted data files on disk,
filesystems remounted read-only, libraries rejected with "invalid ELF
format", binaries segfaulting all of a sudden. Downgrading host to
kernel 4.4.5 magically fixed all those issues.

Going back to 4.5 makes the issues reappear. Here, for example, are ext4 fs
errors logged in one of the VMs:

Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #415065: comm aide: deleted inode referenced: 546538
Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #415065: comm aide: deleted inode referenced: 546530
Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546543: comm aide: bad extra_isize (44800 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4466: inode #546568: comm aide: bogus i_mode (144)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #546548: comm aide: deleted inode referenced: 546564
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #546548: comm aide: deleted inode referenced: 546562
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546563: comm aide: bad extra_isize (6464 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4466: inode #546561: comm aide: bogus i_mode (0)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546529: comm aide: bad extra_isize (1152 != 256)
Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_xattr_block_get:297: inode #546359: comm aide: bad block 677784
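
For a quick overview of a log like the one above, the error lines can be grouped by the ext4 function that reported them. This is only a minimal sketch, assuming syslog-style lines in the format shown here; the name `tally_ext4_errors` and the regular expression are illustrative and not part of any tool mentioned in this thread.

```python
import re
from collections import Counter

# Matches e.g. "EXT4-fs error (device dm-0): ext4_lookup:1602: inode #415065: ..."
# The pattern is an assumption derived from the log lines quoted in this thread.
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): (?P<func>\w+):(?P<line>\d+):"
)


def tally_ext4_errors(lines):
    """Count EXT4-fs error lines per (device, reporting kernel function)."""
    counts = Counter()
    for text in lines:
        match = EXT4_ERROR.search(text)
        if match:
            counts[(match.group("dev"), match.group("func"))] += 1
    return counts
```

Feeding it the guest's syslog (for example `tally_ext4_errors(open(path))`, with `path` being wherever the log lives) gives a rough picture of whether directory lookups, inode reads, or xattr blocks are hit most often.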

I'm going to try reproducing the issue on a less "important" machine
so that bisecting is less painful, but maybe you guys have an idea
what's going wrong here.

jftr, kernel 4.5 in guest and in standalone systems seems to be
unproblematic.

Greetings
Marc


[1] my main workstation, running enough services for the local network
that disturbances in its operation cause reasonable discomfort, but not the
Enterprise kind of "productive"

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

Re: Major KVM issues with kernel 4.5 on the host

Borislav Petkov-3
+ kvm ML.

Do you have any funky messages in host's dmesg ? Can you upload a full
dmesg from both a good and a bad host kernel?

On Thu, Mar 17, 2016 at 05:54:35PM +0100, Marc Haber wrote:

> Hi,
>
> I have a (semi-productive[1]) system ("host") running Debian unstable.
> On this system, a few VMs (Debian unstable, Debian testing) ("vm1",
> "vm2", "vm3") are running. I roll my own kernels and take vanilla
> upstream sources. No distribution patches.
>
> Since host was updated to Kernel 4.5, the VMs have started acting up.
> All of them. The range of strangeness begins with "relocation error,
> system halted" on system startup, corrupted data files on disk,
> filesystems remounted read-only, libraries rejected with "invalid ELF
> format", binaries segfaulting all of a sudden. Downgrading host to
> kernel 4.4.5 magically fixed all those issues.
>
> Going back to 4.5 lets the issues reappear. Here, for example, ext4 fs
> errors, logged in one of the VMs:
>
> Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #415065: comm aide: deleted inode referenced: 546538
> Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #415065: comm aide: deleted inode referenced: 546530
> Mar 17 17:39:57 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546543: comm aide: bad extra_isize (44800 != 256)
> Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4466: inode #546568: comm aide: bogus i_mode (144)
> Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #546548: comm aide: deleted inode referenced: 546564
> Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_lookup:1602: inode #546548: comm aide: deleted inode referenced: 546562
> Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546563: comm aide: bad extra_isize (6464 != 256)
> Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4466: inode #546561: comm aide: bogus i_mode (0)
> Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_iget:4269: inode #546529: comm aide: bad extra_isize (1152 != 256)
> Mar 17 17:39:58 spinturn kernel: EXT4-fs error (device dm-0): ext4_xattr_block_get:297: inode #546359: comm aide: bad block 677784
>
> I'm going to try reproducing the issue on a less "important" machine
> so that bisecting is less painful, but maybe you guys have an idea
> what's going wrong here.
>
> jftr, kernel 4.5 in guest and in standalone systems seems to be
> unproblematic.
>
> Greetings
> Marc
>
>
> [1] my main workstation, running enough services for the local network
> that disturbances in its operation cause reasonable discomfort, but not the
> Enterprise kind of "productive"
>
> --
> -----------------------------------------------------------------------------
> Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
> Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
> Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
>

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

Re: Major KVM issues with kernel 4.5 on the host

Paolo Bonzini


On 17/03/2016 19:11, Borislav Petkov wrote:
> I'm going to try reproducing the issue on a less "important" machine
> so that bisecting is less painful, but maybe you guys have an idea
> what's going wrong here.

No idea, sorry. :(  Bisecting would be great.  I'll also try reproducing
and bisecting next week, in the meanwhile just having the host dmesg
would help a lot.
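
Bisecting here amounts to a binary search over the commits between the last known-good host kernel and the first known-bad one. What `git bisect` automates can be sketched roughly as below. The commit list and the `is_bad` callback are hypothetical stand-ins for "build this kernel, boot the host, run the guests, and check for corruption".

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are in chronological order, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad (the same assumption git bisect makes).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # the culprit is at mid or earlier
        else:
            good = mid     # the culprit is after mid
    return commits[bad]
```

In a kernel tree the equivalent would presumably be `git bisect start`, `git bisect bad v4.5`, `git bisect good v4.4`, followed by marking each tested build as good or bad. Each step halves the remaining range, which is why even the thousands of commits in a kernel release need only a dozen or so test boots.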

Paolo

Re: Major KVM issues with kernel 4.5 on the host

Marc Haber-22
In reply to this post by Borislav Petkov-3
Hi Borislav,

On Thu, Mar 17, 2016 at 07:11:28PM +0100, Borislav Petkov wrote:
> Do you have any funky messages in host's dmesg ?

Not that I see.

> Can you upload a full dmesg from both a good and a bad host kernel?

http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.4.5
http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.5

Hope this helps.

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

Re: Major KVM issues with kernel 4.5 on the host

Borislav Petkov-3
On Fri, Mar 18, 2016 at 07:49:29PM +0100, Marc Haber wrote:
> http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.4.5

This one I got.

> http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.5

This one doesn't want:

HTTP request sent, awaiting response... 403 Forbidden
2016-03-18 22:57:46 ERROR 403: Forbidden.

So I have a similar system to yours, I'll try to reproduce on it with
4.5.

Anything special you're doing to cause the host kernel to barf which I
should do here?

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

Re: Major KVM issues with kernel 4.5 on the host

Marc Haber-22
Hi Borislav,

On Fri, Mar 18, 2016 at 11:04:29PM +0100, Borislav Petkov wrote:

> On Fri, Mar 18, 2016 at 07:49:29PM +0100, Marc Haber wrote:
> > http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.4.5
>
> This one I got.
>
> > http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.5
>
> This one doesn't want:
>
> HTTP request sent, awaiting response... 403 Forbidden
> 2016-03-18 22:57:46 ERROR 403: Forbidden.

Idiot me. File permissions fixed.

> Anything special you're doing to cause the host kernel to barf which I
> should do here?

Booting Debian Linux, apt-get update, apt-get upgrade, and run aide
(which builds checksums for the entire filesystem, a rather disk-bound
activity).

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

Re: Major KVM issues with kernel 4.5 on the host

Borislav Petkov-3
On Sat, Mar 19, 2016 at 01:08:37AM +0100, Marc Haber wrote:
> Booting Debian Linux, apt-get update, apt-get upgrade, and run aide
> (which builds checksums for the entire filesystem, a rather disk-bound
> activity).

So I did that and aide ran a whole init and check all the way through
and all fine. I don't see anything out of the ordinary in your dmesg
outputs either.

The next things we should look at are:

* diff .configs - there might be something there

* try to reproduce on debian testing or even stable. I have had similar
issues with debian unstable in the past.

* something else which I'm not thinking of right now.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

Re: Major KVM issues with kernel 4.5 on the host

Andrey Korolyov
On Sun, Mar 20, 2016 at 4:31 PM, Borislav Petkov <[hidden email]> wrote:

> On Sat, Mar 19, 2016 at 01:08:37AM +0100, Marc Haber wrote:
>> Booting Debian Linux, apt-get update, apt-get upgrade, and run aide
>> (which builds checksums for the entire filesystem, a rather disk-bound
>> activity).
>
> So I did that and aide ran a whole init and check all the way through
> and all fine. I don't see anything out of the ordinary in your dmesg
> outputs either.
>
> The next things we should look like is:
>
> * diff .configs - there might be something there
>
> * try to reproduce on debian testing or even stable. I have had similar
> issues with debian unstable in the past.
>
> * something else which I'm not thinking of it right now.
>
> --
> Regards/Gruss,
>     Boris.
>

Kinda naive question - do you run same ucode version as Marc on his device?

Re: Major KVM issues with kernel 4.5 on the host

Borislav Petkov-3
On Sun, Mar 20, 2016 at 08:14:58PM +0300, Andrey Korolyov wrote:
> Kinda naive question - do you run same ucode version as Marc on his device?

Yeah, we both have 0x010000dc.

In case you're referring to the recent faulty AMD microcode patch -
it doesn't apply here. The boxes in question are family 0x10 and the
microcode patch is for family 0x15.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

Re: Major KVM issues with kernel 4.5 on the host

Andrey Korolyov
On Sun, Mar 20, 2016 at 9:25 PM, Borislav Petkov <[hidden email]> wrote:
> On Sun, Mar 20, 2016 at 08:14:58PM +0300, Andrey Korolyov wrote:
>> Kinda naive question - do you run same ucode version as Marc on his device?
>
> Yeah, we both have 0x010000dc.
>
> In case you're referring to the recent faulty AMD microcode patch -
> it doesn't apply here. The boxes in question are family 0x10 and the
> microcode patch is for family 0x15.
>

Yes, I was suggesting that the issue could also affect a different
family and show up as explicit corruption of guest pages (as opposed to
the generic corruption in the known case). Since there is no direct
evidence of what exactly (data or pgt) is getting corrupted, would
disabling npt for testing purposes be helpful?

Re: Major KVM issues with kernel 4.5 on the host

Borislav Petkov-3
On Sun, Mar 20, 2016 at 09:42:15PM +0300, Andrey Korolyov wrote:
> Yes, I suggested that the issue could fall over a different family as
> well to expose explicit corruption of a guest pages (as opposed to a
> generic corruption in a known case).

Probably, but I don't think it is microcode patch related.

> Since there is no direct evidence of what exactly (data or pgt) is
> getting corrupted, would disabling npt for a testing purposes be
> helpful?

So I'm not sure what even happens here yet. I haven't seen anything out
of the ordinary in Marc's dmesg and I wasn't able to reproduce either.
So would it be good to try with "npt=0"? Sure, why not.

Marc, you could give that a try to see if it changes anything...

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

Re: Major KVM issues with kernel 4.5 on the host

Paolo Bonzini
In reply to this post by Marc Haber-22


On 19/03/2016 01:08, Marc Haber wrote:

>>> http://q.bofh.de/~mh/stuff/20160317-fan-syslog-kvm-4.5
>>
>> This one doesn't want:
>>
>> HTTP request sent, awaiting response... 403 Forbidden
>> 2016-03-18 22:57:46 ERROR 403: Forbidden.
>
> Idiot me. File permissions fixed.
>
>> Anything special you're doing to cause the host kernel to barf which I
>> should do here?
>
> Booting Debian Linux, apt-get update, apt-get upgrade, and run aide
> (which builds checksums for the entire filesystem, a rather disk-bound
> activity).

Ok, so this is AMD.  I'll take a look.

Paolo

Re: Major KVM issues with kernel 4.5 on the host

Marc Haber-22
In reply to this post by Borislav Petkov-3
On Sun, Mar 20, 2016 at 02:31:58PM +0100, Borislav Petkov wrote:

> On Sat, Mar 19, 2016 at 01:08:37AM +0100, Marc Haber wrote:
> > Booting Debian Linux, apt-get update, apt-get upgrade, and run aide
> > (which builds checksums for the entire filesystem, a rather disk-bound
> > activity).
>
> So I did that and aide ran a whole init and check all the way through
> and all fine. I don't see anything out of the ordinary in your dmesg
> outputs either.
>
> The next things we should look like is:
>
> * diff .configs - there might be something there

Here we go:

[2/501]mh@fan:~$ diff -u0 /boot/config-4.4.6-zgws1 /boot/config-4.5.1-zgws1
--- /boot/config-4.4.6-zgws1    2016-03-28 15:50:36.000000000 +0200
+++ /boot/config-4.5.1-zgws1    2016-04-13 08:32:44.000000000 +0200
@@ -3 +3 @@
-# Linux/x86_64 4.4.6 Kernel Configuration
+# Linux/x86_64 4.5.1 Kernel Configuration
@@ -14 +13,0 @@
-CONFIG_HAVE_LATENCYTOP_SUPPORT=y
@@ -15,0 +15,4 @@
+CONFIG_ARCH_MMAP_RND_BITS_MIN=28
+CONFIG_ARCH_MMAP_RND_BITS_MAX=32
+CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
+CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
@@ -147,7 +149,0 @@
-# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_FREEZER=y
-CONFIG_CGROUP_PIDS=y
-CONFIG_CGROUP_DEVICE=y
-CONFIG_CPUSETS=y
-CONFIG_PROC_PID_CPUSET=y
-CONFIG_CGROUP_CPUACCT=y
@@ -158,3 +154,3 @@
-# CONFIG_MEMCG_KMEM is not set
-# CONFIG_CGROUP_HUGETLB is not set
-CONFIG_CGROUP_PERF=y
+CONFIG_BLK_CGROUP=y
+# CONFIG_DEBUG_BLK_CGROUP is not set
+CONFIG_CGROUP_WRITEBACK=y
@@ -165,3 +161,9 @@
-CONFIG_BLK_CGROUP=y
-# CONFIG_DEBUG_BLK_CGROUP is not set
-CONFIG_CGROUP_WRITEBACK=y
+CONFIG_CGROUP_PIDS=y
+CONFIG_CGROUP_FREEZER=y
+# CONFIG_CGROUP_HUGETLB is not set
+CONFIG_CPUSETS=y
+CONFIG_PROC_PID_CPUSET=y
+CONFIG_CGROUP_DEVICE=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_CGROUP_PERF=y
+# CONFIG_CGROUP_DEBUG is not set
@@ -254 +255,0 @@
-CONFIG_HAVE_DMA_ATTRS=y
@@ -288,0 +290,4 @@
+CONFIG_HAVE_ARCH_MMAP_RND_BITS=y
+CONFIG_ARCH_MMAP_RND_BITS=28
+CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS=y
+CONFIG_ARCH_MMAP_RND_COMPAT_BITS=8
@@ -377,0 +383 @@
+CONFIG_X86_FAST_FEATURE_TESTS=y
@@ -383 +389 @@
-CONFIG_IOSF_MBI=m
+CONFIG_IOSF_MBI=y
@@ -390,0 +397 @@
+# CONFIG_QUEUED_LOCK_STAT is not set
@@ -769,0 +777 @@
+# CONFIG_VMD is not set
@@ -772,0 +781 @@
+CONFIG_NET_EGRESS=y
@@ -824,0 +834 @@
+# CONFIG_INET_DIAG_DESTROY is not set
@@ -945,0 +956,3 @@
+CONFIG_NF_DUP_NETDEV=m
+CONFIG_NFT_DUP_NETDEV=m
+CONFIG_NFT_FWD_NETDEV=m
@@ -1252,0 +1266 @@
+# CONFIG_6LOWPAN_DEBUGFS is not set
@@ -1344,0 +1359 @@
+CONFIG_SOCK_CGROUP_DATA=y
@@ -1411 +1425,0 @@
-CONFIG_WEXT_SPY=y
@@ -1423,5 +1437 @@
-CONFIG_LIB80211=m
-CONFIG_LIB80211_CRYPT_WEP=m
-CONFIG_LIB80211_CRYPT_CCMP=m
-CONFIG_LIB80211_CRYPT_TKIP=m
-# CONFIG_LIB80211_DEBUG is not set
+# CONFIG_LIB80211 is not set
@@ -1469 +1479,2 @@
-# CONFIG_NFC_ST_NCI is not set
+# CONFIG_NFC_ST_NCI_I2C is not set
+# CONFIG_NFC_ST_NCI_SPI is not set
@@ -1616,2 +1627,2 @@
-CONFIG_PARPORT_PC=m
-CONFIG_PARPORT_SERIAL=m
+CONFIG_PARPORT_PC=y
+CONFIG_PARPORT_SERIAL=y
@@ -1619 +1630 @@
-CONFIG_PARPORT_PC_SUPERIO=y
+# CONFIG_PARPORT_PC_SUPERIO is not set
@@ -1968,0 +1980 @@
+# CONFIG_DM_DEBUG_BLOCK_STACK_TRACING is not set
@@ -1971 +1982,0 @@
-# CONFIG_DM_DEBUG_BLOCK_STACK_TRACING is not set
@@ -2131,0 +2143 @@
+# CONFIG_NET_VENDOR_NETRONOME is not set
@@ -2263,43 +2275,6 @@
-# CONFIG_PCMCIA_RAYCS is not set
-# CONFIG_LIBERTAS_THINFIRM is not set
-# CONFIG_AIRO is not set
-# CONFIG_ATMEL is not set
-# CONFIG_AT76C50X_USB is not set
-# CONFIG_AIRO_CS is not set
-# CONFIG_PCMCIA_WL3501 is not set
-# CONFIG_PRISM54 is not set
-# CONFIG_USB_ZD1201 is not set
-# CONFIG_USB_NET_RNDIS_WLAN is not set
-# CONFIG_ADM8211 is not set
-# CONFIG_RTL8180 is not set
-# CONFIG_RTL8187 is not set
-# CONFIG_MAC80211_HWSIM is not set
-# CONFIG_MWL8K is not set
-# CONFIG_ATH_CARDS is not set
-CONFIG_B43=m
-CONFIG_B43_BCMA=y
-CONFIG_B43_SSB=y
-CONFIG_B43_BUSES_BCMA_AND_SSB=y
-# CONFIG_B43_BUSES_BCMA is not set
-# CONFIG_B43_BUSES_SSB is not set
-CONFIG_B43_PCI_AUTOSELECT=y
-CONFIG_B43_PCICORE_AUTOSELECT=y
-CONFIG_B43_SDIO=y
-CONFIG_B43_BCMA_PIO=y
-CONFIG_B43_PIO=y
-CONFIG_B43_PHY_G=y
-CONFIG_B43_PHY_N=y
-CONFIG_B43_PHY_LP=y
-CONFIG_B43_PHY_HT=y
-CONFIG_B43_LEDS=y
-CONFIG_B43_HWRNG=y
-# CONFIG_B43_DEBUG is not set
-# CONFIG_B43LEGACY is not set
-# CONFIG_BRCMSMAC is not set
-# CONFIG_BRCMFMAC is not set
-CONFIG_HOSTAP=m
-CONFIG_HOSTAP_FIRMWARE=y
-# CONFIG_HOSTAP_FIRMWARE_NVRAM is not set
-CONFIG_HOSTAP_PLX=m
-CONFIG_HOSTAP_PCI=m
-CONFIG_HOSTAP_CS=m
+# CONFIG_WLAN_VENDOR_ADMTEK is not set
+# CONFIG_WLAN_VENDOR_ATH is not set
+# CONFIG_WLAN_VENDOR_ATMEL is not set
+# CONFIG_WLAN_VENDOR_BROADCOM is not set
+# CONFIG_WLAN_VENDOR_CISCO is not set
+CONFIG_WLAN_VENDOR_INTEL=y
@@ -2307,0 +2283,2 @@
+# CONFIG_IWL4965 is not set
+# CONFIG_IWL3945 is not set
@@ -2321,14 +2298,13 @@
-# CONFIG_IWL4965 is not set
-# CONFIG_IWL3945 is not set
-# CONFIG_LIBERTAS is not set
-# CONFIG_HERMES is not set
-# CONFIG_P54_COMMON is not set
-# CONFIG_RT2X00 is not set
-# CONFIG_WL_MEDIATEK is not set
-# CONFIG_RTL_CARDS is not set
-# CONFIG_RTL8XXXU is not set
-# CONFIG_WL_TI is not set
-# CONFIG_ZD1211RW is not set
-# CONFIG_MWIFIEX is not set
-# CONFIG_CW1200 is not set
-# CONFIG_RSI_91X is not set
+# CONFIG_WLAN_VENDOR_INTERSIL is not set
+# CONFIG_WLAN_VENDOR_MARVELL is not set
+# CONFIG_WLAN_VENDOR_MEDIATEK is not set
+# CONFIG_WLAN_VENDOR_RALINK is not set
+# CONFIG_WLAN_VENDOR_REALTEK is not set
+# CONFIG_WLAN_VENDOR_RSI is not set
+# CONFIG_WLAN_VENDOR_ST is not set
+# CONFIG_WLAN_VENDOR_TI is not set
+# CONFIG_WLAN_VENDOR_ZYDAS is not set
+# CONFIG_PCMCIA_RAYCS is not set
+# CONFIG_PCMCIA_WL3501 is not set
+# CONFIG_MAC80211_HWSIM is not set
+# CONFIG_USB_NET_RNDIS_WLAN is not set
@@ -2466,0 +2443 @@
+# CONFIG_TOUCHSCREEN_EGALAX_SERIAL is not set
@@ -2612 +2589 @@
-CONFIG_PRINTER=m
+CONFIG_PRINTER=y
@@ -2614 +2591 @@
-CONFIG_PPDEV=m
+CONFIG_PPDEV=y
@@ -2766,0 +2744 @@
+# CONFIG_SPI_LOOPBACK_TEST is not set
@@ -2826,0 +2805 @@
+# CONFIG_GPIO_104_IDI_48 is not set
@@ -2993 +2971,0 @@
-# CONFIG_SENSORS_HTU21 is not set
@@ -3090,0 +3069 @@
+CONFIG_WATCHDOG_SYSFS=y
@@ -3096,0 +3076 @@
+# CONFIG_ZIIRAVE_WATCHDOG is not set
@@ -3151 +3130,0 @@
-CONFIG_SSB_BLOCKIO=y
@@ -3154 +3133 @@
-CONFIG_SSB_B43_PCI_BRIDGE=y
+# CONFIG_SSB_B43_PCI_BRIDGE is not set
@@ -3159 +3137,0 @@
-# CONFIG_SSB_HOST_SOC is not set
@@ -3171 +3148,0 @@
-CONFIG_BCMA_BLOCKIO=y
@@ -3256,0 +3234,2 @@
+# CONFIG_REGULATOR_PV88060 is not set
+# CONFIG_REGULATOR_PV88090 is not set
@@ -3565,0 +3545 @@
+# CONFIG_VIDEO_CS3308 is not set
@@ -3875 +3854,0 @@
-# CONFIG_DRM_RADEON_UMS is not set
@@ -3878,0 +3858 @@
+CONFIG_DRM_AMD_POWERPLAY=y
@@ -3917,0 +3898 @@
+CONFIG_FB_NOTIFY=y
@@ -4621,0 +4603 @@
+# CONFIG_RTC_DRV_RX8010 is not set
@@ -4842 +4823,0 @@
-# CONFIG_IIO_SIMPLE_DUMMY is not set
@@ -4870 +4851,2 @@
-# CONFIG_WILC1000_DRIVER is not set
+# CONFIG_WILC1000_SDIO is not set
+# CONFIG_WILC1000_SPI is not set
@@ -4896,0 +4879 @@
+# CONFIG_ASUS_WIRELESS is not set
@@ -4901,0 +4885 @@
+CONFIG_INTEL_HID_EVENT=m
@@ -4912,0 +4897 @@
+CONFIG_INTEL_PUNIT_IPC=m
@@ -4921,0 +4907,2 @@
+# CONFIG_COMMON_CLK_CS2000_CP is not set
+# CONFIG_COMMON_CLK_NXP is not set
@@ -4978,0 +4966 @@
+# CONFIG_IIO_CONFIGFS is not set
@@ -4980,0 +4969 @@
+# CONFIG_IIO_SW_TRIGGER is not set
@@ -4989,0 +4979,2 @@
+# CONFIG_MMA7455_I2C is not set
+# CONFIG_MMA7455_SPI is not set
@@ -4993,0 +4985 @@
+# CONFIG_MXC6255 is not set
@@ -5010,0 +5003 @@
+# CONFIG_INA2XX_ADC is not set
@@ -5028,0 +5022 @@
+# CONFIG_IAQCORE is not set
@@ -5061,0 +5056,5 @@
+# IIO dummy driver
+#
+# CONFIG_IIO_SIMPLE_DUMMY is not set
+
+#
@@ -5087,0 +5087,5 @@
+# Health sensors
+#
+# CONFIG_MAX30100 is not set
+
+#
@@ -5188,0 +5193,2 @@
+CONFIG_ARM_GIC_MAX_NR=1
+# CONFIG_TS4800_IRQ is not set
@@ -5297,0 +5304 @@
+# CONFIG_MANDATORY_FILE_LOCKING is not set
@@ -5574,0 +5582 @@
+# CONFIG_WQ_WATCHDOG is not set
@@ -5616,0 +5625 @@
+# CONFIG_DEBUG_WQ_FORCE_RR_CPU is not set
@@ -5692,0 +5702,3 @@
+CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
+# CONFIG_UBSAN is not set
+CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
@@ -5693,0 +5706 @@
+CONFIG_IO_STRICT_DEVMEM=y
@@ -5749,0 +5763 @@
+# CONFIG_INTEGRITY_TRUSTED_KEYRING is not set
@@ -5930,0 +5945,2 @@
+CONFIG_CRYPTO_DEV_QAT_C3XXX=m
+CONFIG_CRYPTO_DEV_QAT_C62X=m
@@ -5931,0 +5948,2 @@
+CONFIG_CRYPTO_DEV_QAT_C3XXXVF=m
+CONFIG_CRYPTO_DEV_QAT_C62XVF=m
@@ -6040,0 +6059 @@
+# CONFIG_IRQ_POLL is not set
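
Much of a diff like this is noise, because options that were only moved within the file show up as a removal plus an addition. The sketch below filters a unified diff down to options whose value actually differs. It assumes the usual `.config` line formats (`CONFIG_X=value` and `# CONFIG_X is not set`); the function name `changed_options` is made up for illustration.

```python
import re

SET = re.compile(r"^([+-])(CONFIG_\w+)=(.*)$")
UNSET = re.compile(r"^([+-])# (CONFIG_\w+) is not set$")


def changed_options(diff_lines):
    """Return {option: (old, new)} for options whose value differs.

    "n" stands for "is not set"; None means the option does not appear on
    that side of the diff. Options that merely moved within the file (same
    value on both sides) are dropped.
    """
    old, new = {}, {}
    for line in diff_lines:
        line = line.rstrip("\n")
        m = SET.match(line)
        if m:
            sign, name, value = m.groups()
        else:
            m = UNSET.match(line)
            if not m:
                continue  # hunk headers, file headers, section comments
            sign, name = m.groups()
            value = "n"
        (old if sign == "-" else new)[name] = value
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

Applied to the diff above, this would for example keep CONFIG_IOSF_MBI (m to y) while dropping the cgroup options that were only reordered.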


> * try to reproduce on debian testing or even stable. I have had similar
> issues with debian unstable in the past.

Gut feeling is that this is not the case. Why would it only appear on
VM hosts then? Debian unstable and testing are pretty close together
these days, and the issue is around for a month now in current
unstable. Btw, I am a DD, I know my way around debian and my gut
feeling is pretty well calibrated here.

I can try stable in the VM, but I'd rather not take the host out of
business. Would that help?

My CPU is also rather old:

processor       : 5
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 10
model name      : AMD Phenom(tm) II X6 1090T Processor
stepping        : 0
microcode       : 0x10000dc
cpu MHz         : 1600.000
cache size      : 512 KB
physical id     : 0
siblings        : 6
core id         : 5
cpu cores       : 6
apicid          : 5
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr cpb hw_pstate vmmcall npt lbrv svm_lock nrip_save pausefilter
bugs            : tlb_mmatch apic_c1e fxsave_leak sysret_ss_attrs
bogomips        : 6428.52
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate cpb

afaik, this CPU is not affected by the current microcode issues, is
it? I do have the amd64-microcode package installed, which is supposed
to do everything automatically, and my initramfs doesn't have a
microcode "partition" prepended, gunzip | cpio -i gives the plain
initramfs contents directly.
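
To compare two boxes on the points discussed here (CPU family 0x10 versus 0x15, and the loaded microcode revision), the relevant fields can be pulled out of /proc/cpuinfo. A small sketch, assuming the x86 field names shown in the dump above; `cpu_identity` is an illustrative name, not an existing tool.

```python
def cpu_identity(cpuinfo_text):
    """Return the set of (vendor_id, family, microcode) seen in cpuinfo text.

    Family and microcode are returned as integers so that "16" and "0x10",
    or "0x10000dc" and "0x010000dc", compare equal.
    """
    found = set()
    # /proc/cpuinfo separates the per-CPU blocks with an empty line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "cpu family" in fields:
            found.add((
                fields.get("vendor_id"),
                int(fields["cpu family"]),
                int(fields.get("microcode", "0x0"), 16),
            ))
    return found
```

On a live system this would be called as `cpu_identity(open("/proc/cpuinfo").read())`. A result with more than one element would mean that not all cores report the same microcode revision.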

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

Re: Major KVM issues with kernel 4.5 on the host

Marc Haber-22
In reply to this post by Borislav Petkov-3
On Sun, Mar 20, 2016 at 07:58:13PM +0100, Borislav Petkov wrote:
> So I'm not sure what even happens here yet. I haven't seen anything out
> of the ordinary in Marc's dmesg and I wasn't able to reproduce either.
> So would it be good to try with "npt=0"? Sure, why not.

npt=0 goes on the kernel command line of the host or of the guest? Or
is it a KVM option?

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

Re: Major KVM issues with kernel 4.5 on the host

Marc Haber-22
In reply to this post by Paolo Bonzini
On Fri, Mar 18, 2016 at 11:01:46AM +0100, Paolo Bonzini wrote:
> On 17/03/2016 19:11, Borislav Petkov wrote:
> > I'm going to try reproducing the issue on a less "important" machine
> > so that bisecting is less painful, but maybe you guys have an idea
> > what's going wrong here.
>
> No idea, sorry. :(  Bisecting would be great.

Working on that now.

>   I'll also try reproducing and bisecting next week, in the meanwhile
>   just having the host dmesg would help a lot.

Attached. I hope the message will get through to the list.

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

dmesg.fan.4.5.1 (75K) Download Attachment

Re: Major KVM issues with kernel 4.5 on the host

Paolo Bonzini


On 13/04/2016 20:37, Marc Haber wrote:

> On Fri, Mar 18, 2016 at 11:01:46AM +0100, Paolo Bonzini wrote:
>> On 17/03/2016 19:11, Borislav Petkov wrote:
>>> I'm going to try reproducing the issue on a less "important" machine
>>> so that bisecting is less painful, but maybe you guys have an idea
>>> what's going wrong here.
>>
>> No idea, sorry. :(  Bisecting would be great.
>
> Working on that now.
>
>>   I'll also try reproducing and bisecting next week, in the meanwhile
>>   just having the host dmesg would help a lot.
>
> Attached. I hope the message will get through to the list.

Didn't help, but a fresh look at the list of 4.5 patches helped.
What the hell was I thinking? I missed write_rdtscp_aux, which
obviously uses MSR_TSC_AUX.

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 31346a3f20a5..1481dea15844 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -39,6 +39,7 @@
 #include <asm/kvm_para.h>
 
 #include <asm/virtext.h>
+#include <asm/vgtod.h>
 #include "trace.h"
 
 #define __ex(x) __kvm_handle_fault_on_reboot(x)
@@ -1240,9 +1241,6 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			wrmsrl(MSR_AMD64_TSC_RATIO, tsc_ratio);
 		}
 	}
-	/* This assumes that the kernel never uses MSR_TSC_AUX */
-	if (static_cpu_has(X86_FEATURE_RDTSCP))
-		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
@@ -3847,6 +3845,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 	svm->vmcb->save.cr2 = vcpu->arch.cr2;
 
 	clgi();
+	if (static_cpu_has(X86_FEATURE_RDTSCP))
+		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
 	local_irq_enable();
 
@@ -3923,6 +3923,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
 		);
 
+	if (static_cpu_has(X86_FEATURE_RDTSCP))
+		wrmsrl(MSR_TSC_AUX, __getcpu());
 #ifdef CONFIG_X86_64
 	wrmsrl(MSR_GS_BASE, svm->host.gs_base);
 #else
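
As background for the patch above: the host kernel keeps its own value in MSR_TSC_AUX so that code using RDTSCP can tell which CPU it is running on. If a guest's value is still in the MSR while host code runs, that lookup returns whatever the guest put there. The sketch below shows the packing as I understand it from the x86 vgetcpu code of this era (CPU number in the low 12 bits, NUMA node above). That layout is an assumption brought in for illustration and is not stated in this thread.

```python
CPU_BITS = 12                     # assumed split: low 12 bits hold the CPU number
CPU_MASK = (1 << CPU_BITS) - 1    # 0xfff


def encode_tsc_aux(cpu, node):
    """Pack a CPU number and NUMA node the way the host is assumed to."""
    if not 0 <= cpu <= CPU_MASK:
        raise ValueError("cpu does not fit into %d bits" % CPU_BITS)
    return (node << CPU_BITS) | cpu


def decode_tsc_aux(value):
    """Return (cpu, node) from a TSC_AUX value under the assumed layout."""
    return value & CPU_MASK, value >> CPU_BITS
```

Under this layout, a guest value that happens to decode to a different CPU number than the one the host code is really on would make per-CPU lookups based on it go to the wrong place.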


Paolo

Re: Major KVM issues with kernel 4.5 on the host

Paolo Bonzini
In reply to this post by Marc Haber-22


On 13/04/2016 20:22, Marc Haber wrote:
>> So I'm not sure what even happens here yet. I haven't seen anything out
>> of the ordinary in Marc's dmesg and I wasn't able to reproduce either.
>> So would it be good to try with "npt=0"? Sure, why not.
>
> npt=0 goes on the kernel command line of the host or of the guest? Or
> is it a KVM option?

It is an option to the kvm-amd module, but I think I found it.
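
For reference, a module option like this is normally set on the host when the module is loaded, e.g. `modprobe kvm-amd npt=0` with no guests running, or persistently via an `options kvm-amd npt=0` line in a file under /etc/modprobe.d/. Whether it took effect can be read back from sysfs. This is a minimal sketch, assuming the parameter is exported under /sys/module/kvm_amd/parameters/; the helper name `read_module_param` is made up.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return the current value of a loaded module's parameter, or None.

    Module directories under /sys/module use underscores even when the
    module is referred to as "kvm-amd".
    """
    path = Path(sysfs_root) / module.replace("-", "_") / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, parameter not exported, or not readable.
        return None
```

Calling `read_module_param("kvm-amd", "npt")` on the host before and after reloading the module shows whether nested paging is currently enabled.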

Paolo

Re: Major KVM issues with kernel 4.5 on the host

Marc Haber-22
In reply to this post by Paolo Bonzini
On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote:
> Didn't help, but a fresh look at the list of 4.5 patches helped.
> What the hell was I thinking, I missed write_rdtscp_aux who
> obviously uses MSR_TSC_AUX.

So you want me to apply that to 4.5 or 4.5.1 and try that?

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

Re: Major KVM issues with kernel 4.5 on the host

Marc Haber-22
In reply to this post by Paolo Bonzini
On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote:
> Didn't help, but a fresh look at the list of 4.5 patches helped.
> What the hell was I thinking, I missed write_rdtscp_aux who
> obviously uses MSR_TSC_AUX.

I applied this patch to 4.5. It didn't apply cleanly, so I had to do it
manually, and there is no change in behavior. Sometimes the VM just
crashes, but most times the filesystem is remounted ro.

[   84.658968] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27903
[   84.664877] Aborting journal on device dm-0-8.
[   84.667992] EXT4-fs (dm-0): Remounting filesystem read-only
[   84.670972] EXT4-fs error (device dm-0): ext4_journal_check_start:56: Detected aborted journal
[   84.763331] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27898
[   84.825412] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27895
[   84.907959] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27893
[   84.915187] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27900
[   84.961062] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27889
[   84.983700] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27891
[   98.315538] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #23567: comm aide: deleted inode referenced: 27897
[   98.323606] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #23567: comm aide: deleted inode referenced: 27904
[   99.889927] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27892
[   99.893823] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27901
[   99.901140] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27890
[   99.904898] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27896
[   99.909758] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27899
[   99.914394] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27894
[  207.132045] serial8250: too much work for irq4
[  207.220043] serial8250: too much work for irq4
[  207.312028] serial8250: too much work for irq4
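
A log like the one above is easier to compare between test runs once it
is condensed. The following is a minimal sketch, assuming kernel
messages in the format shown here; the function name
summarize_ext4_errors and the regular expression are illustrative
assumptions, not part of any existing tool.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt in this thread:
#   EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: <message>
# The "inode" and "comm" parts are optional, since some errors lack them.
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): "
    r"(?P<func>\w+):(?P<line>\d+): "
    r"(?:inode #(?P<inode>\d+): )?"
    r"(?:comm (?P<comm>[^:]+): )?"
    r"(?P<msg>.*)"
)


def summarize_ext4_errors(lines):
    """Count EXT4-fs errors per reporting function and per inode.

    Hypothetical helper: returns two Counters, one keyed by the kernel
    function name and one keyed by the inode number named in the error.
    Lines that are not EXT4-fs error lines are ignored.
    """
    by_func = Counter()
    by_inode = Counter()
    for line in lines:
        match = EXT4_ERROR.search(line)
        if match is None:
            continue
        by_func[match.group("func")] += 1
        if match.group("inode") is not None:
            by_inode[int(match.group("inode"))] += 1
    return by_func, by_inode
```

Fed the excerpt above, the per-inode counter would contain the three
inodes named after "inode #" (7669, 23567 and 4650), which makes it
quick to see whether a later run hits the same directories.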


Greetings
Marc


Re: Major KVM issues with kernel 4.5 on the host

Paolo Bonzini


On 14/04/2016 00:29, Marc Haber wrote:
> On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote:
>> Didn't help, but a fresh look at the list of 4.5 patches helped.
>> What the hell was I thinking, I missed write_rdtscp_aux who
>> obviously uses MSR_TSC_AUX.
>
> I applied this patch to 4.5. It didn't apply cleanly, so I had to do it
> manually, and there is no change in behavior. Sometimes the VM just
> crashes, but most of the time the filesystem is remounted read-only.

OK, then I guess bisection is needed.  Please first try commit
45bdbcfdf241.  If it fails, then the bug came in with KVM's merge
window changes for 4.5-rc1.  Please apply the patch I sent here whenever
the bisection is past 46896c73c1a4dde527c3a3cc43379deeb41985a1 (which
probably means that this should be the second commit you try; the
bisection then becomes much easier).
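
As a rough illustration of why testing a well-chosen commit first
shortens the search: bisection halves the range of candidate commits
with every test. The sketch below assumes a linear history ordered
oldest to newest, a known-good first entry and a known-bad last entry;
first_bad and is_bad are hypothetical names, and in practice git bisect
does this work on the real commit graph.

```python
def first_bad(commits, is_bad):
    """Return the earliest commit for which is_bad(commit) is true.

    Assumptions (illustrative only): commits is ordered oldest to newest
    and has at least two entries, commits[0] is known good, commits[-1]
    is known bad, and once a commit is bad every later one is bad too.
    is_bad stands in for "build this kernel, boot the VMs, watch for
    filesystem errors".
    """
    good, bad = 0, len(commits) - 1  # indices of known-good / known-bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad]
```

Each call to is_bad costs one kernel build and one test run, so the
number of builds grows only with the logarithm of the number of
commits in the range.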

Thanks,

Paolo
