[PATCH 0/5] vfio-pci: Add support for mmapping MSI-X table

classic Classic list List threaded Threaded
39 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[PATCH 0/5] vfio-pci: Add support for mmapping MSI-X table

Yongji Xie
Current vfio-pci implementation disallows to mmap the page
containing MSI-X table in case that users can write to MSI-X
table and generate an incorrect MSIs.

However, this will cause some performance issue when there
are some critical device registers in the same page as the
MSI-X table. We have to handle the mmio access to these
registers in QEMU emulation rather than in guest.

To solve this issue, this series allows to mmap MSI-X table
when hardware supports the capability of interrupt remapping
which can ensure that a given pci device can only shoot the
MSIs assigned for it. And we introduce a new bus_flags
PCI_BUS_FLAGS_MSI_REMAP to test this capability on PCI side
for different archs.

The patch 3 are based on the proposed patchset[1].

[1] http://www.spinics.net/lists/kvm/msg130256.html

Yongji Xie (5):
  PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag
  iommu: Set PCI_BUS_FLAGS_MSI_REMAP if IOMMU have capability of IRQ remapping
  PCI: Set PCI_BUS_FLAGS_MSI_REMAP if MSI controller supports IRQ remapping
  pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge
  vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

 arch/powerpc/platforms/powernv/pci-ioda.c |    8 ++++++++
 drivers/iommu/iommu.c                     |   15 +++++++++++++++
 drivers/pci/msi.c                         |   12 ++++++++++++
 drivers/pci/probe.c                       |    3 +++
 drivers/vfio/pci/vfio_pci.c               |    7 +++++--
 drivers/vfio/pci/vfio_pci_rdwr.c          |    3 ++-
 include/linux/msi.h                       |    6 +++++-
 include/linux/pci.h                       |    1 +
 8 files changed, 51 insertions(+), 4 deletions(-)

--
1.7.9.5

Reply | Threaded
Open this post in threaded view
|

[PATCH 2/5] iommu: Set PCI_BUS_FLAGS_MSI_REMAP if IOMMU have capability of IRQ remapping

Yongji Xie
The capability of IRQ remapping is abstracted on IOMMU side on
some archs. There is a existing flag IOMMU_CAP_INTR_REMAP for this.

To have a universal flag to test this capability for different
archs on PCI side, we set PCI_BUS_FLAGS_MSI_REMAP for PCI buses
when IOMMU_CAP_INTR_REMAP is set.

Signed-off-by: Yongji Xie <[hidden email]>
---
 drivers/iommu/iommu.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0e3b009..5d2b6f6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -813,6 +813,16 @@ struct iommu_group *pci_device_group(struct device *dev)
  return group;
 }
 
+static void pci_check_msi_remapping(struct pci_dev *pdev,
+ const struct iommu_ops *ops)
+{
+ struct pci_bus *bus = pdev->bus;
+
+ if (ops->capable(IOMMU_CAP_INTR_REMAP) &&
+ !(bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP))
+ bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+}
+
 /**
  * iommu_group_get_for_dev - Find or create the IOMMU group for a device
  * @dev: target device
@@ -871,6 +881,9 @@ static int add_iommu_group(struct device *dev, void *data)
  const struct iommu_ops *ops = cb->ops;
  int ret;
 
+ if (dev_is_pci(dev) && ops->capable)
+ pci_check_msi_remapping(to_pci_dev(dev), ops);
+
  if (!ops->add_device)
  return 0;
 
@@ -913,6 +926,8 @@ static int iommu_bus_notifier(struct notifier_block *nb,
  * result in ADD/DEL notifiers to group->notifier
  */
  if (action == BUS_NOTIFY_ADD_DEVICE) {
+ if (dev_is_pci(dev) && ops->capable)
+ pci_check_msi_remapping(to_pci_dev(dev), ops);
  if (ops->add_device)
  return ops->add_device(dev);
  } else if (action == BUS_NOTIFY_REMOVED_DEVICE) {
--
1.7.9.5

Reply | Threaded
Open this post in threaded view
|

[PATCH 3/5] PCI: Set PCI_BUS_FLAGS_MSI_REMAP if MSI controller supports IRQ remapping

Yongji Xie
In reply to this post by Yongji Xie
On ARM HW the capability of IRQ remapping is abstracted on
MSI controller side. MSI_FLAG_IRQ_REMAPPING is used to advertise
this [1].

To have a universal flag to test this capability for different
archs on PCI side, we set PCI_BUS_FLAGS_MSI_REMAP for PCI buses
when MSI_FLAG_IRQ_REMAPPING is set.

[1] http://www.spinics.net/lists/kvm/msg130256.html

Signed-off-by: Yongji Xie <[hidden email]>
---
 drivers/pci/msi.c   |   12 ++++++++++++
 drivers/pci/probe.c |    3 +++
 include/linux/msi.h |    6 +++++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a080f44..1661cdf 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1134,6 +1134,18 @@ void *msi_desc_to_pci_sysdata(struct msi_desc *desc)
 }
 EXPORT_SYMBOL_GPL(msi_desc_to_pci_sysdata);
 
+void pci_bus_check_msi_remapping(struct pci_bus *bus,
+ struct irq_domain *domain)
+{
+#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
+ struct msi_domain_info *info;
+
+ info = msi_get_domain_info(domain);
+ if (info->flags & MSI_FLAG_IRQ_REMAPPING)
+ bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+#endif
+}
+
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
 /**
  * pci_msi_domain_write_msg - Helper to write MSI message to PCI config space
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6d7ab9b..25cf1b1 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -696,6 +696,9 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus)
  if (!d)
  d = pci_host_bridge_msi_domain(b);
 
+ if (d && b == bus)
+ pci_bus_check_msi_remapping(bus, d);
+
  dev_set_msi_domain(&bus->dev, d);
 }
 
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 03eda72..b4c649e 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -15,6 +15,8 @@ extern int pci_msi_ignore_mask;
 struct irq_data;
 struct msi_desc;
 struct pci_dev;
+struct pci_bus;
+struct irq_domain;
 struct platform_msi_priv_data;
 void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
 void get_cached_msi_msg(unsigned int irq, struct msi_msg *msg);
@@ -155,6 +157,9 @@ void arch_restore_msi_irqs(struct pci_dev *dev);
 void default_teardown_msi_irqs(struct pci_dev *dev);
 void default_restore_msi_irqs(struct pci_dev *dev);
 
+void pci_bus_check_msi_remapping(struct pci_bus *bus,
+ struct irq_domain *domain);
+
 struct msi_controller {
  struct module *owner;
  struct device *dev;
@@ -173,7 +178,6 @@ struct msi_controller {
 #include <linux/irqhandler.h>
 #include <asm/msi.h>
 
-struct irq_domain;
 struct irq_domain_ops;
 struct irq_chip;
 struct device_node;
--
1.7.9.5

Reply | Threaded
Open this post in threaded view
|

[PATCH 1/5] PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag

Yongji Xie
In reply to this post by Yongji Xie
We introduce a new pci_bus_flags, PCI_BUS_FLAGS_MSI_REMAP
which indicates all devices on the bus are protected by the
hardware which supports IRQ remapping(intel naming).

This flag will be used to know whether it's safe to expose
MSI-X tables of PCI BARs to userspace. Because the capability
of IRQ remapping can guarantee the PCI device cannot trigger
MSIs that correspond to interrupt IDs of other devices.

There is a existing flag for this in the IOMMU space:

enum iommu_cap {
        IOMMU_CAP_CACHE_COHERENCY,
---> IOMMU_CAP_INTR_REMAP,
        IOMMU_CAP_NOEXEC,
};

and Eric also posted a patchset [1] to abstract this
capability on MSI controller side for ARM. But it would
make sense to have a more common flag like
PCI_BUS_FLAGS_MSI_REMAP in this patch so that we can use
a universal flag to test this capability on PCI side for
different archs.

[1] http://www.spinics.net/lists/kvm/msg130256.html

Signed-off-by: Yongji Xie <[hidden email]>
---
 include/linux/pci.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 27df4a6..d619228 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -193,6 +193,7 @@ typedef unsigned short __bitwise pci_bus_flags_t;
 enum pci_bus_flags {
  PCI_BUS_FLAGS_NO_MSI   = (__force pci_bus_flags_t) 1,
  PCI_BUS_FLAGS_NO_MMRBC = (__force pci_bus_flags_t) 2,
+ PCI_BUS_FLAGS_MSI_REMAP = (__force pci_bus_flags_t) 4,
 };
 
 /* These values come from the PCI Express Spec */
--
1.7.9.5

Reply | Threaded
Open this post in threaded view
|

[PATCH 4/5] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge

Yongji Xie
In reply to this post by Yongji Xie
Any IODA host bridge have the capability of IRQ remapping.
So we set PCI_BUS_FLAGS_MSI_REMAP when this kind of host birdge
is detected.

Signed-off-by: Yongji Xie <[hidden email]>
---
 arch/powerpc/platforms/powernv/pci-ioda.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f90dc04..9557638 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3080,6 +3080,12 @@ static void pnv_pci_ioda_fixup(void)
  pnv_npu_ioda_fixup();
 }
 
+int pnv_pci_ioda_root_bridge_prepare(struct pci_host_bridge *bridge)
+{
+ bridge->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+ return 0;
+}
+
 /*
  * Returns the alignment for I/O or memory windows for P2P
  * bridges. That actually depends on how PEs are segmented.
@@ -3364,6 +3370,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
  */
  ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
 
+ ppc_md.pcibios_root_bridge_prepare = pnv_pci_ioda_root_bridge_prepare;
+
  if (phb->type == PNV_PHB_NPU)
  hose->controller_ops = pnv_npu_ioda_controller_ops;
  else
--
1.7.9.5

Reply | Threaded
Open this post in threaded view
|

[PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Yongji Xie
In reply to this post by Yongji Xie
This patch enables mmapping MSI-X tables if hardware supports
interrupt remapping which can ensure that a given pci device
can only shoot the MSIs assigned for it.

With MSI-X table mmapped, we also need to expose the
read/write interface which will be used to access MSI-X table.

Signed-off-by: Yongji Xie <[hidden email]>
---
 drivers/vfio/pci/vfio_pci.c      |    7 +++++--
 drivers/vfio/pci/vfio_pci_rdwr.c |    3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 98059df..7eba77d 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -591,7 +591,9 @@ static long vfio_pci_ioctl(void *device_data,
     pci_resource_flags(pdev, info.index) &
     IORESOURCE_MEM && info.size >= PAGE_SIZE) {
  info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
- if (info.index == vdev->msix_bar) {
+ if (info.index == vdev->msix_bar &&
+ !(pdev->bus->bus_flags &
+ PCI_BUS_FLAGS_MSI_REMAP)) {
  ret = msix_sparse_mmap_cap(vdev, &caps);
  if (ret)
  return ret;
@@ -1023,7 +1025,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
  if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
  return -EINVAL;
 
- if (index == vdev->msix_bar) {
+ if (index == vdev->msix_bar &&
+ !(pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP)) {
  /*
  * Disallow mmaps overlapping the MSI-X table; users don't
  * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 5ffd1d9..dbf9cd0 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -164,7 +164,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
  } else
  io = vdev->barmap[bar];
 
- if (bar == vdev->msix_bar) {
+ if (bar == vdev->msix_bar &&
+ !(pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP)) {
  x_start = vdev->msix_offset;
  x_end = vdev->msix_offset + vdev->msix_size;
  }
--
1.7.9.5

Reply | Threaded
Open this post in threaded view
|

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Tian, Kevin
> From: Yongji Xie
> Sent: Wednesday, April 27, 2016 8:43 PM
>
> This patch enables mmapping MSI-X tables if hardware supports
> interrupt remapping which can ensure that a given pci device
> can only shoot the MSIs assigned for it.
>
> With MSI-X table mmapped, we also need to expose the
> read/write interface which will be used to access MSI-X table.
>
> Signed-off-by: Yongji Xie <[hidden email]>

A curious question here. Does "allow to mmap MSI-X" essentially
mean that KVM guest can directly read/write physical MSI-X
structure then?

Thanks
Kevin
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Yongji Xie
On 2016/5/3 13:34, Tian, Kevin wrote:

>> From: Yongji Xie
>> Sent: Wednesday, April 27, 2016 8:43 PM
>>
>> This patch enables mmapping MSI-X tables if hardware supports
>> interrupt remapping which can ensure that a given pci device
>> can only shoot the MSIs assigned for it.
>>
>> With MSI-X table mmapped, we also need to expose the
>> read/write interface which will be used to access MSI-X table.
>>
>> Signed-off-by: Yongji Xie <[hidden email]>
> A curious question here. Does "allow to mmap MSI-X" essentially
> mean that KVM guest can directly read/write physical MSI-X
> structure then?
>
> Thanks
> Kevin
>

Here we just allow to mmap MSI-X table in kernel. It doesn't
mean all KVM guest can directly read/write physical MSI-X
structure. This should be decided by QEMU. For PPC64
platform, we would allow to passthrough the MSI-X table
because we know guest kernel would not write physical
MSI-X structure when enabling MSI.

Thanks,
Yongji

Reply | Threaded
Open this post in threaded view
|

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Tian, Kevin
> From: Yongji Xie [mailto:[hidden email]]
> Sent: Tuesday, May 03, 2016 2:08 PM
>
> On 2016/5/3 13:34, Tian, Kevin wrote:
>
> >> From: Yongji Xie
> >> Sent: Wednesday, April 27, 2016 8:43 PM
> >>
> >> This patch enables mmapping MSI-X tables if hardware supports
> >> interrupt remapping which can ensure that a given pci device
> >> can only shoot the MSIs assigned for it.
> >>
> >> With MSI-X table mmapped, we also need to expose the
> >> read/write interface which will be used to access MSI-X table.
> >>
> >> Signed-off-by: Yongji Xie <[hidden email]>
> > A curious question here. Does "allow to mmap MSI-X" essentially
> > mean that KVM guest can directly read/write physical MSI-X
> > structure then?
> >
> > Thanks
> > Kevin
> >
>
> Here we just allow to mmap MSI-X table in kernel. It doesn't
> mean all KVM guest can directly read/write physical MSI-X
> structure. This should be decided by QEMU. For PPC64
> platform, we would allow to passthrough the MSI-X table
> because we know guest kernel would not write physical
> MSI-X structure when enabling MSI.
>

A bit confused here. If guest kernel doesn't need to write
physical MSI-X structure, what's the point of passing through
the table then?

I think the key whether MSI-X table can be passed through
is related to where hypervisor control is deployed. At least
for x86:

- When irq remapping is not enabled, host/hypervisor needs
to control physical interrupt message including vector/dest/etc.
directly in MSI-X structure, so we cannot allow a guest to
access it;

- when irq remapping is enabled, host/hypervisor can control
interrupt routing in irq remapping table. However MSI-X
also needs to be configured as remappable format. In this
manner we also cannot allow direct access from guest.

The only sane case to pass through MSI-X structure, is a
mechanism similar to irq remapping but w/o need to change
original MSI-X format so direct access from guest side is
safe. Is it the case in PPC64?

Thanks
Kevin
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Yongji Xie
On 2016/5/3 14:22, Tian, Kevin wrote:

>> From: Yongji Xie [mailto:[hidden email]]
>> Sent: Tuesday, May 03, 2016 2:08 PM
>>
>> On 2016/5/3 13:34, Tian, Kevin wrote:
>>
>>>> From: Yongji Xie
>>>> Sent: Wednesday, April 27, 2016 8:43 PM
>>>>
>>>> This patch enables mmapping MSI-X tables if hardware supports
>>>> interrupt remapping which can ensure that a given pci device
>>>> can only shoot the MSIs assigned for it.
>>>>
>>>> With MSI-X table mmapped, we also need to expose the
>>>> read/write interface which will be used to access MSI-X table.
>>>>
>>>> Signed-off-by: Yongji Xie <[hidden email]>
>>> A curious question here. Does "allow to mmap MSI-X" essentially
>>> mean that KVM guest can directly read/write physical MSI-X
>>> structure then?
>>>
>>> Thanks
>>> Kevin
>>>
>> Here we just allow to mmap MSI-X table in kernel. It doesn't
>> mean all KVM guest can directly read/write physical MSI-X
>> structure. This should be decided by QEMU. For PPC64
>> platform, we would allow to passthrough the MSI-X table
>> because we know guest kernel would not write physical
>> MSI-X structure when enabling MSI.
>>
> A bit confused here. If guest kernel doesn't need to write
> physical MSI-X structure, what's the point of passing through
> the table then?

We want to allow the MSI-X table because there may be
some critical registers in the same page as the MSI-X table.
We have to handle the mmio access to these register in QEMU
rather than in guest if mmapping MSI-X table is disallowed.

> I think the key whether MSI-X table can be passed through
> is related to where hypervisor control is deployed. At least
> for x86:
>
> - When irq remapping is not enabled, host/hypervisor needs
> to control physical interrupt message including vector/dest/etc.
> directly in MSI-X structure, so we cannot allow a guest to
> access it;
>
> - when irq remapping is enabled, host/hypervisor can control
> interrupt routing in irq remapping table. However MSI-X
> also needs to be configured as remappable format. In this
> manner we also cannot allow direct access from guest.
>
> The only sane case to pass through MSI-X structure, is a
> mechanism similar to irq remapping but w/o need to change
> original MSI-X format so direct access from guest side is
> safe. Is it the case in PPC64?
>
> Thanks
> Kevin

Acutually, we are not aimed at accessing MSI-X table from
guest. So I think it's safe to passthrough MSI-X table if we
can make sure guest kernel would not touch MSI-X table in
normal code path such as para-virtualized guest kernel on PPC64.

Thanks,
Yongji

Reply | Threaded
Open this post in threaded view
|

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Tian, Kevin
> From: Yongji Xie
> Sent: Tuesday, May 03, 2016 3:34 PM
>
> On 2016/5/3 14:22, Tian, Kevin wrote:
>
> >> From: Yongji Xie [mailto:[hidden email]]
> >> Sent: Tuesday, May 03, 2016 2:08 PM
> >>
> >> On 2016/5/3 13:34, Tian, Kevin wrote:
> >>
> >>>> From: Yongji Xie
> >>>> Sent: Wednesday, April 27, 2016 8:43 PM
> >>>>
> >>>> This patch enables mmapping MSI-X tables if hardware supports
> >>>> interrupt remapping which can ensure that a given pci device
> >>>> can only shoot the MSIs assigned for it.
> >>>>
> >>>> With MSI-X table mmapped, we also need to expose the
> >>>> read/write interface which will be used to access MSI-X table.
> >>>>
> >>>> Signed-off-by: Yongji Xie <[hidden email]>
> >>> A curious question here. Does "allow to mmap MSI-X" essentially
> >>> mean that KVM guest can directly read/write physical MSI-X
> >>> structure then?
> >>>
> >>> Thanks
> >>> Kevin
> >>>
> >> Here we just allow to mmap MSI-X table in kernel. It doesn't
> >> mean all KVM guest can directly read/write physical MSI-X
> >> structure. This should be decided by QEMU. For PPC64
> >> platform, we would allow to passthrough the MSI-X table
> >> because we know guest kernel would not write physical
> >> MSI-X structure when enabling MSI.
> >>
> > A bit confused here. If guest kernel doesn't need to write
> > physical MSI-X structure, what's the point of passing through
> > the table then?
>
> We want to allow the MSI-X table because there may be
> some critical registers in the same page as the MSI-X table.
> We have to handle the mmio access to these register in QEMU
> rather than in guest if mmapping MSI-X table is disallowed.

So you mean critical registers in same MMIO BAR as MSI-X
table, instead of two MMIO BARs in same page (the latter I
suppose with your whole patchset it won't happen then)?

>
> > I think the key whether MSI-X table can be passed through
> > is related to where hypervisor control is deployed. At least
> > for x86:
> >
> > - When irq remapping is not enabled, host/hypervisor needs
> > to control physical interrupt message including vector/dest/etc.
> > directly in MSI-X structure, so we cannot allow a guest to
> > access it;
> >
> > - when irq remapping is enabled, host/hypervisor can control
> > interrupt routing in irq remapping table. However MSI-X
> > also needs to be configured as remappable format. In this
> > manner we also cannot allow direct access from guest.
> >
> > The only sane case to pass through MSI-X structure, is a
> > mechanism similar to irq remapping but w/o need to change
> > original MSI-X format so direct access from guest side is
> > safe. Is it the case in PPC64?
> >
> > Thanks
> > Kevin
>
> Acutually, we are not aimed at accessing MSI-X table from
> guest. So I think it's safe to passthrough MSI-X table if we
> can make sure guest kernel would not touch MSI-X table in
> normal code path such as para-virtualized guest kernel on PPC64.
>

Then how do you prevent malicious guest kernel accessing it?

Thanks
Kevin
Reply | Threaded
Open this post in threaded view
|

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

David Laight-3
From: Tian, Kevin
> Sent: 05 May 2016 10:37
...
> > Acutually, we are not aimed at accessing MSI-X table from
> > guest. So I think it's safe to passthrough MSI-X table if we
> > can make sure guest kernel would not touch MSI-X table in
> > normal code path such as para-virtualized guest kernel on PPC64.
> >
>
> Then how do you prevent malicious guest kernel accessing it?

Or a malicious guest driver for an ethernet card setting up
the receive buffer ring to contain a single word entry that
contains the address associated with an MSI-X interrupt and
then using a loopback mode to cause a specific packet be
received that writes the required word through that address.

Remember the PCIe cycle for an interrupt is a normal memory write
cycle.

        David

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Yongji Xie
Hi David and Kevin,

On 2016/5/5 17:54, David Laight wrote:

> From: Tian, Kevin
>> Sent: 05 May 2016 10:37
> ...
>>> Acutually, we are not aimed at accessing MSI-X table from
>>> guest. So I think it's safe to passthrough MSI-X table if we
>>> can make sure guest kernel would not touch MSI-X table in
>>> normal code path such as para-virtualized guest kernel on PPC64.
>>>
>> Then how do you prevent malicious guest kernel accessing it?
> Or a malicious guest driver for an ethernet card setting up
> the receive buffer ring to contain a single word entry that
> contains the address associated with an MSI-X interrupt and
> then using a loopback mode to cause a specific packet be
> received that writes the required word through that address.
>
> Remember the PCIe cycle for an interrupt is a normal memory write
> cycle.
>
> David
>

If we have enough permission to load a malicious driver or
kernel, we can easily break the guest without exposed
MSI-X table.

I think it should be safe to expose MSI-X table if we can
make sure that malicious guest driver/kernel can't use
the MSI-X table to break other guest or host. The
capability of IRQ remapping could provide this
kind of protection.

Thanks,
Yongji

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Yongji Xie
In reply to this post by Tian, Kevin
On 2016/5/5 17:36, Tian, Kevin wrote:

>> From: Yongji Xie
>> Sent: Tuesday, May 03, 2016 3:34 PM
>>
>> On 2016/5/3 14:22, Tian, Kevin wrote:
>>
>>>> From: Yongji Xie [mailto:[hidden email]]
>>>> Sent: Tuesday, May 03, 2016 2:08 PM
>>>>
>>>> On 2016/5/3 13:34, Tian, Kevin wrote:
>>>>
>>>>>> From: Yongji Xie
>>>>>> Sent: Wednesday, April 27, 2016 8:43 PM
>>>>>>
>>>>>> This patch enables mmapping MSI-X tables if hardware supports
>>>>>> interrupt remapping which can ensure that a given pci device
>>>>>> can only shoot the MSIs assigned for it.
>>>>>>
>>>>>> With MSI-X table mmapped, we also need to expose the
>>>>>> read/write interface which will be used to access MSI-X table.
>>>>>>
>>>>>> Signed-off-by: Yongji Xie <[hidden email]>
>>>>> A curious question here. Does "allow to mmap MSI-X" essentially
>>>>> mean that KVM guest can directly read/write physical MSI-X
>>>>> structure then?
>>>>>
>>>>> Thanks
>>>>> Kevin
>>>>>
>>>> Here we just allow to mmap MSI-X table in kernel. It doesn't
>>>> mean all KVM guest can directly read/write physical MSI-X
>>>> structure. This should be decided by QEMU. For PPC64
>>>> platform, we would allow to passthrough the MSI-X table
>>>> because we know guest kernel would not write physical
>>>> MSI-X structure when enabling MSI.
>>>>
>>> A bit confused here. If guest kernel doesn't need to write
>>> physical MSI-X structure, what's the point of passing through
>>> the table then?
>> We want to allow the MSI-X table because there may be
>> some critical registers in the same page as the MSI-X table.
>> We have to handle the mmio access to these register in QEMU
>> rather than in guest if mmapping MSI-X table is disallowed.
> So you mean critical registers in same MMIO BAR as MSI-X
> table, instead of two MMIO BARs in same page (the latter I
> suppose with your whole patchset it won't happen then)?

Yes. That's what I mean!

Thanks,
Yongji

Reply | Threaded
Open this post in threaded view
|

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Tian, Kevin
In reply to this post by Yongji Xie
> From: Yongji Xie [mailto:[hidden email]]
> Sent: Thursday, May 05, 2016 7:43 PM
>
> Hi David and Kevin,
>
> On 2016/5/5 17:54, David Laight wrote:
>
> > From: Tian, Kevin
> >> Sent: 05 May 2016 10:37
> > ...
> >>> Acutually, we are not aimed at accessing MSI-X table from
> >>> guest. So I think it's safe to passthrough MSI-X table if we
> >>> can make sure guest kernel would not touch MSI-X table in
> >>> normal code path such as para-virtualized guest kernel on PPC64.
> >>>
> >> Then how do you prevent malicious guest kernel accessing it?
> > Or a malicious guest driver for an ethernet card setting up
> > the receive buffer ring to contain a single word entry that
> > contains the address associated with an MSI-X interrupt and
> > then using a loopback mode to cause a specific packet be
> > received that writes the required word through that address.
> >
> > Remember the PCIe cycle for an interrupt is a normal memory write
> > cycle.
> >
> > David
> >
>
> If we have enough permission to load a malicious driver or
> kernel, we can easily break the guest without exposed
> MSI-X table.
>
> I think it should be safe to expose MSI-X table if we can
> make sure that malicious guest driver/kernel can't use
> the MSI-X table to break other guest or host. The
> capability of IRQ remapping could provide this
> kind of protection.
>

With IRQ remapping it doesn't mean you can pass through MSI-X
structure to guest. I know actual IRQ remapping might be platform
specific, but at least for Intel VT-d specification, MSI-X entry must
be configured with a remappable format by host kernel which
contains an index into IRQ remapping table. The index will find a
IRQ remapping entry which controls interrupt routing for a specific
device. If you allow a malicious program random index into MSI-X
entry of assigned device, the hole is obvious...

Above might make sense only for a IRQ remapping implementation
which doesn't rely on extended MSI-X format (e.g. simply based on
BDF). If that's the case for PPC, then you should build MSI-X
passthrough based on this fact instead of general IRQ remapping
enabled or not.

Thanks
Kevin
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Yongji Xie
On 2016/5/5 20:15, Tian, Kevin wrote:

>> From: Yongji Xie [mailto:[hidden email]]
>> Sent: Thursday, May 05, 2016 7:43 PM
>>
>> Hi David and Kevin,
>>
>> On 2016/5/5 17:54, David Laight wrote:
>>
>>> From: Tian, Kevin
>>>> Sent: 05 May 2016 10:37
>>> ...
>>>>> Acutually, we are not aimed at accessing MSI-X table from
>>>>> guest. So I think it's safe to passthrough MSI-X table if we
>>>>> can make sure guest kernel would not touch MSI-X table in
>>>>> normal code path such as para-virtualized guest kernel on PPC64.
>>>>>
>>>> Then how do you prevent malicious guest kernel accessing it?
>>> Or a malicious guest driver for an ethernet card setting up
>>> the receive buffer ring to contain a single word entry that
>>> contains the address associated with an MSI-X interrupt and
>>> then using a loopback mode to cause a specific packet be
>>> received that writes the required word through that address.
>>>
>>> Remember the PCIe cycle for an interrupt is a normal memory write
>>> cycle.
>>>
>>> David
>>>
>> If we have enough permission to load a malicious driver or
>> kernel, we can easily break the guest without exposed
>> MSI-X table.
>>
>> I think it should be safe to expose MSI-X table if we can
>> make sure that malicious guest driver/kernel can't use
>> the MSI-X table to break other guest or host. The
>> capability of IRQ remapping could provide this
>> kind of protection.
>>
> With IRQ remapping it doesn't mean you can pass through MSI-X
> structure to guest. I know actual IRQ remapping might be platform
> specific, but at least for Intel VT-d specification, MSI-X entry must
> be configured with a remappable format by host kernel which
> contains an index into IRQ remapping table. The index will find a
> IRQ remapping entry which controls interrupt routing for a specific
> device. If you allow a malicious program random index into MSI-X
> entry of assigned device, the hole is obvious...

Do you mean we can trigger MSIs that correspond to interrupt
IDs of other devices by writing to MSI-X table although IRQ
remapping is enabled?

On PPC64, there is a mapping between MSIs and PE num
which can be used to identify a PCI device on PHB. So the
hardware can ensure a given pci device can only shoot the
MSIs assigned for it.  Isn't there a similar mapping in IRQ
remapping table on Intel.

Thanks,
Yongji

> Above might make sense only for a IRQ remapping implementation
> which doesn't rely on extended MSI-X format (e.g. simply based on
> BDF). If that's the case for PPC, then you should build MSI-X
> passthrough based on this fact instead of general IRQ remapping
> enabled or not.
>
> Thanks
> Kevin

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Alex Williamson
In reply to this post by Tian, Kevin
On Thu, 5 May 2016 12:15:46 +0000
"Tian, Kevin" <[hidden email]> wrote:

> > From: Yongji Xie [mailto:[hidden email]]
> > Sent: Thursday, May 05, 2016 7:43 PM
> >
> > Hi David and Kevin,
> >
> > On 2016/5/5 17:54, David Laight wrote:
> >  
> > > From: Tian, Kevin  
> > >> Sent: 05 May 2016 10:37  
> > > ...  
> > >>> Acutually, we are not aimed at accessing MSI-X table from
> > >>> guest. So I think it's safe to passthrough MSI-X table if we
> > >>> can make sure guest kernel would not touch MSI-X table in
> > >>> normal code path such as para-virtualized guest kernel on PPC64.
> > >>>  
> > >> Then how do you prevent malicious guest kernel accessing it?  
> > > Or a malicious guest driver for an ethernet card setting up
> > > the receive buffer ring to contain a single word entry that
> > > contains the address associated with an MSI-X interrupt and
> > > then using a loopback mode to cause a specific packet be
> > > received that writes the required word through that address.
> > >
> > > Remember the PCIe cycle for an interrupt is a normal memory write
> > > cycle.
> > >
> > > David
> > >  
> >
> > If we have enough permission to load a malicious driver or
> > kernel, we can easily break the guest without exposed
> > MSI-X table.
> >
> > I think it should be safe to expose MSI-X table if we can
> > make sure that malicious guest driver/kernel can't use
> > the MSI-X table to break other guest or host. The
> > capability of IRQ remapping could provide this
> > kind of protection.
> >  
>
> With IRQ remapping it doesn't mean you can pass through MSI-X
> structure to guest. I know actual IRQ remapping might be platform
> specific, but at least for Intel VT-d specification, MSI-X entry must
> be configured with a remappable format by host kernel which
> contains an index into IRQ remapping table. The index will find a
> IRQ remapping entry which controls interrupt routing for a specific
> device. If you allow a malicious program random index into MSI-X
> entry of assigned device, the hole is obvious...
>
> Above might make sense only for a IRQ remapping implementation
> which doesn't rely on extended MSI-X format (e.g. simply based on
> BDF). If that's the case for PPC, then you should build MSI-X
> passthrough based on this fact instead of general IRQ remapping
> enabled or not.

I don't think anyone is expecting that we can expose the MSI-X vector
table to the guest and the guest can make direct use of it.  The end
goal here is that the guest on a power system is already
paravirtualized to not program the device MSI-X by directly writing to
the MSI-X vector table.  They have hypercalls for this since they
always run virtualized.  Therefore a) they never intend to touch the
MSI-X vector table and b) they have sufficient isolation that a guest
can only hurt itself by doing so.

On x86 we don't have a), our method of programming the MSI-X vector
table is to directly write to it. Therefore we will always require QEMU
to place a MemoryRegion over the vector table to intercept those
accesses.  However with interrupt remapping, we do have b) on x86, which
means that we don't need to be so strict in disallowing user accesses
to the MSI-X vector table.  It's not useful for configuring MSI-X on
the device, but the user should only be able to hurt themselves by
writing it directly.  x86 doesn't really get anything out of this
change, but it helps this special case on power pretty significantly
aiui.  Thanks,

Alex
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 4/5] pci-ioda: Set PCI_BUS_FLAGS_MSI_REMAP for IODA host bridge

Alexey Kardashevskiy
In reply to this post by Yongji Xie
On 04/27/2016 10:43 PM, Yongji Xie wrote:
> Any IODA host bridge have the capability of IRQ remapping.
> So we set PCI_BUS_FLAGS_MSI_REMAP when this kind of host birdge
> is detected.
>
> Signed-off-by: Yongji Xie <[hidden email]>


Reviewed-by: Alexey Kardashevskiy <[hidden email]>


> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index f90dc04..9557638 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3080,6 +3080,12 @@ static void pnv_pci_ioda_fixup(void)
>   pnv_npu_ioda_fixup();
>  }
>
> +int pnv_pci_ioda_root_bridge_prepare(struct pci_host_bridge *bridge)
> +{
> + bridge->bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
> + return 0;
> +}
> +
>  /*
>   * Returns the alignment for I/O or memory windows for P2P
>   * bridges. That actually depends on how PEs are segmented.
> @@ -3364,6 +3370,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
>   */
>   ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
>
> + ppc_md.pcibios_root_bridge_prepare = pnv_pci_ioda_root_bridge_prepare;
> +
>   if (phb->type == PNV_PHB_NPU)
>   hose->controller_ops = pnv_npu_ioda_controller_ops;
>   else
>


--
Alexey
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Alexey Kardashevskiy
In reply to this post by Alex Williamson
On 05/06/2016 01:05 AM, Alex Williamson wrote:

> On Thu, 5 May 2016 12:15:46 +0000
> "Tian, Kevin" <[hidden email]> wrote:
>
>>> From: Yongji Xie [mailto:[hidden email]]
>>> Sent: Thursday, May 05, 2016 7:43 PM
>>>
>>> Hi David and Kevin,
>>>
>>> On 2016/5/5 17:54, David Laight wrote:
>>>
>>>> From: Tian, Kevin
>>>>> Sent: 05 May 2016 10:37
>>>> ...
>>>>>> Acutually, we are not aimed at accessing MSI-X table from
>>>>>> guest. So I think it's safe to passthrough MSI-X table if we
>>>>>> can make sure guest kernel would not touch MSI-X table in
>>>>>> normal code path such as para-virtualized guest kernel on PPC64.
>>>>>>
>>>>> Then how do you prevent malicious guest kernel accessing it?
>>>> Or a malicious guest driver for an ethernet card setting up
>>>> the receive buffer ring to contain a single word entry that
>>>> contains the address associated with an MSI-X interrupt and
>>>> then using a loopback mode to cause a specific packet be
>>>> received that writes the required word through that address.
>>>>
>>>> Remember the PCIe cycle for an interrupt is a normal memory write
>>>> cycle.
>>>>
>>>> David
>>>>
>>>
>>> If we have enough permission to load a malicious driver or
>>> kernel, we can easily break the guest without exposed
>>> MSI-X table.
>>>
>>> I think it should be safe to expose MSI-X table if we can
>>> make sure that malicious guest driver/kernel can't use
>>> the MSI-X table to break other guest or host. The
>>> capability of IRQ remapping could provide this
>>> kind of protection.
>>>
>>
>> With IRQ remapping it doesn't mean you can pass through MSI-X
>> structure to guest. I know actual IRQ remapping might be platform
>> specific, but at least for Intel VT-d specification, MSI-X entry must
>> be configured with a remappable format by host kernel which
>> contains an index into IRQ remapping table. The index will find a
>> IRQ remapping entry which controls interrupt routing for a specific
>> device. If you allow a malicious program random index into MSI-X
>> entry of assigned device, the hole is obvious...
>>
>> Above might make sense only for a IRQ remapping implementation
>> which doesn't rely on extended MSI-X format (e.g. simply based on
>> BDF). If that's the case for PPC, then you should build MSI-X
>> passthrough based on this fact instead of general IRQ remapping
>> enabled or not.
>
> I don't think anyone is expecting that we can expose the MSI-X vector
> table to the guest and the guest can make direct use of it.  The end
> goal here is that the guest on a power system is already
> paravirtualized to not program the device MSI-X by directly writing to
> the MSI-X vector table.  They have hypercalls for this since they
> always run virtualized.  Therefore a) they never intend to touch the
> MSI-X vector table and b) they have sufficient isolation that a guest
> can only hurt itself by doing so.
>
> On x86 we don't have a), our method of programming the MSI-X vector
> table is to directly write to it. Therefore we will always require QEMU
> to place a MemoryRegion over the vector table to intercept those
> accesses.  However with interrupt remapping, we do have b) on x86, which
> means that we don't need to be so strict in disallowing user accesses
> to the MSI-X vector table.  It's not useful for configuring MSI-X on
> the device, but the user should only be able to hurt themselves by
> writing it directly.  x86 doesn't really get anything out of this
> change, but it helps this special case on power pretty significantly
> aiui.  Thanks,

Excellent short overview, saved :)

How do we proceed with these patches? Nobody seems objecting them but also
nobody seems taking them either...




--
Alexey
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

Alex Williamson
On Fri, 6 May 2016 16:35:38 +1000
Alexey Kardashevskiy <[hidden email]> wrote:

> On 05/06/2016 01:05 AM, Alex Williamson wrote:
> > On Thu, 5 May 2016 12:15:46 +0000
> > "Tian, Kevin" <[hidden email]> wrote:
> >  
> >>> From: Yongji Xie [mailto:[hidden email]]
> >>> Sent: Thursday, May 05, 2016 7:43 PM
> >>>
> >>> Hi David and Kevin,
> >>>
> >>> On 2016/5/5 17:54, David Laight wrote:
> >>>  
> >>>> From: Tian, Kevin  
> >>>>> Sent: 05 May 2016 10:37  
> >>>> ...  
> >>>>>> Acutually, we are not aimed at accessing MSI-X table from
> >>>>>> guest. So I think it's safe to passthrough MSI-X table if we
> >>>>>> can make sure guest kernel would not touch MSI-X table in
> >>>>>> normal code path such as para-virtualized guest kernel on PPC64.
> >>>>>>  
> >>>>> Then how do you prevent malicious guest kernel accessing it?  
> >>>> Or a malicious guest driver for an ethernet card setting up
> >>>> the receive buffer ring to contain a single word entry that
> >>>> contains the address associated with an MSI-X interrupt and
> >>>> then using a loopback mode to cause a specific packet be
> >>>> received that writes the required word through that address.
> >>>>
> >>>> Remember the PCIe cycle for an interrupt is a normal memory write
> >>>> cycle.
> >>>>
> >>>> David
> >>>>  
> >>>
> >>> If we have enough permission to load a malicious driver or
> >>> kernel, we can easily break the guest without exposed
> >>> MSI-X table.
> >>>
> >>> I think it should be safe to expose MSI-X table if we can
> >>> make sure that malicious guest driver/kernel can't use
> >>> the MSI-X table to break other guest or host. The
> >>> capability of IRQ remapping could provide this
> >>> kind of protection.
> >>>  
> >>
> >> With IRQ remapping it doesn't mean you can pass through MSI-X
> >> structure to guest. I know actual IRQ remapping might be platform
> >> specific, but at least for Intel VT-d specification, MSI-X entry must
> >> be configured with a remappable format by host kernel which
> >> contains an index into IRQ remapping table. The index will find a
> >> IRQ remapping entry which controls interrupt routing for a specific
> >> device. If you allow a malicious program random index into MSI-X
> >> entry of assigned device, the hole is obvious...
> >>
> >> Above might make sense only for a IRQ remapping implementation
> >> which doesn't rely on extended MSI-X format (e.g. simply based on
> >> BDF). If that's the case for PPC, then you should build MSI-X
> >> passthrough based on this fact instead of general IRQ remapping
> >> enabled or not.  
> >
> > I don't think anyone is expecting that we can expose the MSI-X vector
> > table to the guest and the guest can make direct use of it.  The end
> > goal here is that the guest on a power system is already
> > paravirtualized to not program the device MSI-X by directly writing to
> > the MSI-X vector table.  They have hypercalls for this since they
> > always run virtualized.  Therefore a) they never intend to touch the
> > MSI-X vector table and b) they have sufficient isolation that a guest
> > can only hurt itself by doing so.
> >
> > On x86 we don't have a), our method of programming the MSI-X vector
> > table is to directly write to it. Therefore we will always require QEMU
> > to place a MemoryRegion over the vector table to intercept those
> > accesses.  However with interrupt remapping, we do have b) on x86, which
> > means that we don't need to be so strict in disallowing user accesses
> > to the MSI-X vector table.  It's not useful for configuring MSI-X on
> > the device, but the user should only be able to hurt themselves by
> > writing it directly.  x86 doesn't really get anything out of this
> > change, but it helps this special case on power pretty significantly
> > aiui.  Thanks,  
>
> Excellent short overview, saved :)
>
> How do we proceed with these patches? Nobody seems objecting them but also
> nobody seems taking them either...

Well, this series is still based on some non-upstream patches, so...
Once that dependency is resolved this series should probably be split
into functional areas for acceptance by the appropriate subsystem
maintainers.
12