IO errors after "block: remove bio_get_nr_vecs()"

classic Classic list List threaded Threaded
45 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Linus Torvalds-2
On Sun, Dec 20, 2015 at 8:43 PM, Artem S. Tashkinov <[hidden email]> wrote:
>
> In the past I happily ran an x86_64 bit kernel together with 32bit userland
> for quite some time but then I hit a wall: VirtualBox expects its kernel
> modules to have the same bitness as the application itself so I had to
> revert back to an i686 PAE setup.

Ugh, ok. That kind of forces your hand, yes.

Although:

> t's probably high time to try qemu however last time I looked at it a few
> years ago it lacked several crucial features I need from a VM.

kvm-qemu really ends up working pretty well.. Give it a try.

That said, we obviously need to figure out this current problem
regardless first..

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Linus Torvalds-2
In reply to this post by Tejun Heo-2
On Sun, Dec 20, 2015 at 8:26 PM, Tejun Heo <[hidden email]> wrote:
>
> I wonder whether ahci is screwing up command / sg table setup in a way
> that e.g. if there are too many segments the sg table overflows into
> the neighboring one which is now being exposed by upper layer being
> fixed to send down larger commands.  Looking into it.

That would explain the

  Corrupted low memory at c0001000 ...

that Artem also saw.

Anyway, it would be lovely to have some verification in the ATA
routines that the passed-on IO actually h9onors the limits it set.
Could you add a WARN_ON_ONCE(check_io_limits())" or similar, and maybe
we could catch whatever causes the overflow red-handed?

On a totally separate issue:

Just looking at some of the merging code, and I have to say that it
strikes me as insane. This in particular:

  #define __BIO_SEG_BOUNDARY(addr1, addr2, mask) \
        (((addr1) | (mask)) == (((addr2) - 1) | (mask)))
  #define BIOVEC_SEG_BOUNDARY(q, b1, b2) \
        __BIO_SEG_BOUNDARY(bvec_to_phys((b1)), bvec_to_phys((b2)) +
(b2)->bv_len, queue_segment_boundary((q)))

seems just *stupid*.

Why does it do that "bvec_to_phys((b2)) + (b2)->bv_len -1" on the
second bvec? That's the :"physical address of the last byte of the
second bvec".

I understand the "round both addresses up by the mask, and we want to
make sure that they are in the same segment" part.

But since an individual bvec had better be fully inside one segment
(since we split at bvec boundaries anyway, so if ). why do all that
crap anyway? The end address doesn't matter, you could just use the
beginning.

So remove the "-1" and remove the "+bv_len".

At which it would become just

  #define __BIO_SEG_BOUNDARY(addr1, addr2, mask) \
        ((addr1) | (mask) == (addr2)|(mask))
  #define BIOVEC_SEG_BOUNDARY(q, b1, b2) \
        __BIO_SEG_BOUNDARY(bvec_to_phys((b1)), bvec_to_phys((b2)),
queue_segment_boundary((q)))

which seems simpler and more understandable. "Are the beginning
addresses in within the same segment"

Or are there ever bv_len == 0 things at the boundary that we want to
merge. Because then the "-1+bv_len" case migth make sense.

Anyway, that shouldn't change the end result in any way, so that
doesn't all *matter*, but it worries me when things look more
complicated than I think they should be.

Is there something I'm missing?

               Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Linus Torvalds-2
In reply to this post by Linus Torvalds-2
On Sun, Dec 20, 2015 at 8:47 PM, Linus Torvalds
<[hidden email]> wrote:
>
> That said, we obviously need to figure out this current problem
> regardless first..

... although maybe it *would* be interesting to hear what happens if
you just compile a 64-bit kernel instead?

Do you still see the problem? Because if not, then we should look very
specifically for some 32-bit PAE issue.

For example, maybe we use "unsigned long" somewhere where we should
use "phys_addr_t". On x86-64, they obviously end up being the same. On
normal non-PAE x86-32, they are also the same. But ..

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Tejun Heo-2
In reply to this post by Linus Torvalds-2
Artem, can you please reproduce the issue with the following patch
applied and attach the kernel log?

Thanks.

---
 drivers/ata/libahci.c     |   40 ++++++++++++++++++++++++++++++++++++++--
 drivers/ata/libata-eh.c   |    4 ++++
 drivers/ata/libata-scsi.c |    1 +
 3 files changed, 43 insertions(+), 2 deletions(-)

--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -2278,7 +2278,7 @@ static int ahci_port_start(struct ata_po
  struct ahci_host_priv *hpriv = ap->host->private_data;
  struct device *dev = ap->host->dev;
  struct ahci_port_priv *pp;
- void *mem;
+ void *mem, *base;
  dma_addr_t mem_dma;
  size_t dma_sz, rx_fis_sz;
 
@@ -2319,7 +2319,9 @@ static int ahci_port_start(struct ata_po
  rx_fis_sz = AHCI_RX_FIS_SZ;
  }
 
- mem = dmam_alloc_coherent(dev, dma_sz, &mem_dma, GFP_KERNEL);
+ base = mem = dmam_alloc_coherent(dev, dma_sz, &mem_dma, GFP_KERNEL);
+ printk("XXX port %d dma_sz=%zu mem=%p mem_dma=%p",
+       ap->port_no, dma_sz, mem, (void *)mem_dma);
  if (!mem)
  return -ENOMEM;
  memset(mem, 0, dma_sz);
@@ -2331,6 +2333,8 @@ static int ahci_port_start(struct ata_po
  pp->cmd_slot = mem;
  pp->cmd_slot_dma = mem_dma;
 
+ pr_cont(" cmd_slot=%zu", mem - base);
+
  mem += AHCI_CMD_SLOT_SZ;
  mem_dma += AHCI_CMD_SLOT_SZ;
 
@@ -2340,6 +2344,8 @@ static int ahci_port_start(struct ata_po
  pp->rx_fis = mem;
  pp->rx_fis_dma = mem_dma;
 
+ pr_cont(" rx_fis=%zu", mem - base);
+
  mem += rx_fis_sz;
  mem_dma += rx_fis_sz;
 
@@ -2350,6 +2356,8 @@ static int ahci_port_start(struct ata_po
  pp->cmd_tbl = mem;
  pp->cmd_tbl_dma = mem_dma;
 
+ pr_cont(" cmd_tbl=%zu\n", mem - base);
+
  /*
  * Save off initial list of interrupts to be enabled.
  * This could be changed later
@@ -2540,6 +2548,34 @@ int ahci_host_activate(struct ata_host *
 }
 EXPORT_SYMBOL_GPL(ahci_host_activate);
 
+void ahci_dump_dma(struct ata_queued_cmd *qc)
+{
+ struct ata_port *ap = qc->ap;
+ struct ahci_port_priv *pp = ap->private_data;
+ struct ahci_cmd_hdr *cmd = &pp->cmd_slot[qc->tag];
+ void *cmd_tbl = pp->cmd_tbl + qc->tag * AHCI_CMD_TBL_SZ;
+ u32 *fis = cmd_tbl;
+ struct ahci_sg *ahci_sg = cmd_tbl + AHCI_CMD_TBL_HDR_SZ;
+ int prdtl = (cmd->opts & 0xffff0000) >> 16;
+ int i;
+
+ printk("XXX cmd=%p cmd_tbl=%p ahci_sg=%p\n", cmd, cmd_tbl, ahci_sg);
+ printk("XXX opts=%x st=%x addr=%x addr_hi=%x rsvd=%x:%x:%x:%x\n",
+       cmd->opts, cmd->status, cmd->tbl_addr, cmd->tbl_addr_hi,
+       cmd->reserved[0], cmd->reserved[1], cmd->reserved[2], cmd->reserved[3]);
+ printk("XXX fis=%08x:%08x:%08x:%08x %08x:%08x:%08x:%08x\n",
+       fis[0], fis[1], fis[2], fis[3],
+       fis[4], fis[5], fis[6], fis[7]);
+
+ printk("XXX qc->n_elem=%d fis_len=%d prdtl=%d\n",
+       qc->n_elem, cmd->opts & 0xf, prdtl);
+
+ for (i = 0; i < prdtl; i++)
+ printk("XXX sg[%d] = %x %x %x (%d)\n",
+       i, ahci_sg[i].addr, ahci_sg[i].addr_hi, ahci_sg[i].flags_size,
+       (ahci_sg[i].flags_size & 0x7fffffff) + 1);
+}
+
 MODULE_AUTHOR("Jeff Garzik");
 MODULE_DESCRIPTION("Common AHCI SATA low-level routines");
 MODULE_LICENSE("GPL");
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1059,6 +1059,7 @@ static int ata_do_link_abort(struct ata_
 
  if (qc && (!link || qc->dev->link == link)) {
  qc->flags |= ATA_QCFLAG_FAILED;
+ qc->err_mask = AC_ERR_DEV;
  ata_qc_complete(qc);
  nr_aborted++;
  }
@@ -2416,6 +2417,8 @@ const char *ata_get_cmd_descript(u8 comm
 }
 EXPORT_SYMBOL_GPL(ata_get_cmd_descript);
 
+void ahci_dump_dma(struct ata_queued_cmd *qc);
+
 /**
  * ata_eh_link_report - report error handling to user
  * @link: ATA link EH is going on
@@ -2590,6 +2593,7 @@ static void ata_eh_link_report(struct at
   res->feature & ATA_IDNF ? "IDNF " : "",
   res->feature & ATA_ABORTED ? "ABRT " : "");
 #endif
+ ahci_dump_dma(qc);
  }
 }
 
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4035,6 +4035,7 @@ int ata_scsi_user_scan(struct Scsi_Host
  }
 
  if (rc == 0) {
+ ata_port_freeze(ap);
  ata_port_schedule_eh(ap);
  spin_unlock_irqrestore(ap->lock, flags);
  ata_port_wait_eh(ap);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
On 2015-12-21 11:55, Tejun Heo wrote:
> Artem, can you please reproduce the issue with the following patch
> applied and attach the kernel log?
>
> Thanks.
>

I've applied this patch on top of vanilla 4.3.3 kernel (without Linus'es
revert). Hopefully it's how you intended it to be.

Here's the result (I skipped the beginning of dmesg - it's the same as
always - see bugzilla).

dmesg.log (109K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
In reply to this post by Linus Torvalds-2
On 2015-12-21 10:23, Linus Torvalds wrote:

> On Sun, Dec 20, 2015 at 8:47 PM, Linus Torvalds
> <[hidden email]> wrote:
>>
>> That said, we obviously need to figure out this current problem
>> regardless first..
>
> ... although maybe it *would* be interesting to hear what happens if
> you just compile a 64-bit kernel instead?
>
> Do you still see the problem? Because if not, then we should look very
> specifically for some 32-bit PAE issue.
>
> For example, maybe we use "unsigned long" somewhere where we should
> use "phys_addr_t". On x86-64, they obviously end up being the same. On
> normal non-PAE x86-32, they are also the same. But ..
>

Let's wait for what Tejun Heo might say - I've applied his debugging
patch and sent back the results.

Building x86_64 kernel here involves installing a 64bit Linux VM, so I'd
like it to be the last resort.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Dan Aloni-5
In reply to this post by Linus Torvalds-2
On Sun, Dec 20, 2015 at 10:41:44AM -0800, Linus Torvalds wrote:
>[..]
> Sadly, without CONFIG_LOCALVERSION_AUTO, there's no way to match up
> the dmesg files (in the same bisection tar-file as the bisection log)
> with the actual versions.

Perhaps we can print the Git revision in a manner independent of
CONFIG_LOCALVERSION_AUTO, using the attached patch. It will be emitted
in the dmesg Linux banner (though not in /proc/version, that's more
interface-ish and may break things).

--
Dan Aloni

0001-init-version.c-add-SCM_REVISION.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Tejun Heo-2
In reply to this post by Artem S. Tashkinov
Hello, Artem.

On Mon, Dec 21, 2015 at 12:25:06PM +0500, Artem S. Tashkinov wrote:
> I've applied this patch on top of vanilla 4.3.3 kernel (without Linus'es
> revert). Hopefully it's how you intended it to be.
>
> Here's the result (I skipped the beginning of dmesg - it's the same as
> always - see bugzilla).

I added some debug messages during init.  It isn't critical but it'd
be great if you attach the full log from now on.  Something seemingly
unrelated surprisingly often turns out to be an important clue.

> [   60.387407] Corrupted low memory at c0001000 (1000 phys) = cba3d25f
...
> [   60.389131] Corrupted low memory at c0001ffc (1ffc phys) = 2712322a

It looks like the controller shat on the entire second page, which is
really puzzling.  Looks like the controller is being fed corrupt DMA
SG table.

...
> [   74.367608] ata1.00: exception Emask 0x3 SAct 0x180000 SErr 0x3040400 action 0x6 frozen
> [   74.367613] ata1.00: irq_stat 0x45000008

The intresting bit here is that the controller is indicating OVERFLOW
which means that it consumed all PRD entries (ahci's DMA SG table) for
the command but the disk is still sending data to the host.

> [   74.367617] ata1: SError: { Proto CommWake TrStaTrns UnrecFIS }
> [   74.367621] ata1.00: failed command: READ FPDMA QUEUED
> [   74.367627] ata1.00: cmd 60/00:98:00:89:21/07:00:04:00:00/40 tag 19 ncq 917504 in
> [   74.367627]          res 40/00:98:00:89:21/00:00:04:00:00/40 Emask 0x3 (HSM violation)
> [   74.367630] ata1.00: status: { DRDY }

The followings are the data fed to the controller as seen from the
CPU.

> [   74.367632] XXX cmd=ee9e0260 cmd_tbl=ee9ed600 ahci_sg=ee9ed680
> [   74.367634] XXX opts=140005 st=0 addr=2e9ed600 addr_hi=0 rsvd=0:0:0:0
> [   74.367637] XXX fis=00608027:40218900:07000004:08000098 00000000:00000000:00000000:00001fff
> [   74.367639] XXX qc->n_elem=20 fis_len=5 prdtl=20
> [   74.367641] XXX sg[0] = 29007000 0 8fff (36864)
> [   74.367643] XXX sg[1] = 29218000 0 7fff (32768)
> [   74.367645] XXX sg[2] = 29298000 0 7fff (32768)
> [   74.367646] XXX sg[3] = 29ad8000 0 7fff (32768)
> [   74.367648] XXX sg[4] = 29338000 0 7fff (32768)
> [   74.367650] XXX sg[5] = 29370000 0 2fff (12288)
> [   74.367652] XXX sg[6] = 219000 0 2fff (12288)
> [   74.367653] XXX sg[7] = 230000 0 3fff (16384)
> [   74.367655] XXX sg[8] = 29373000 0 4fff (20480)
> [   74.367657] XXX sg[9] = 29130000 0 ffff (65536)
> [   74.367659] XXX sg[10] = 29170000 0 ffff (65536)
> [   74.367660] XXX sg[11] = 29280000 0 ffff (65536)
> [   74.367662] XXX sg[12] = 29200000 0 ffff (65536)
> [   74.367664] XXX sg[13] = 29320000 0 ffff (65536)
> [   74.367666] XXX sg[14] = 29360000 0 ffff (65536)
> [   74.367667] XXX sg[15] = 29340000 0 ffff (65536)
> [   74.367669] XXX sg[16] = 29350000 0 ffff (65536)
> [   74.367671] XXX sg[17] = 29300000 0 ffff (65536)
> [   74.367672] XXX sg[18] = 29310000 0 ffff (65536)
> [   74.367674] XXX sg[19] = 29020000 0 7fff (32768)

And everything checks out.  Data lenghts are consistent and all the
addresses look kosher - at least nothing should upset the data
transfer itself.

> [   74.367677] ata1.00: failed command: READ FPDMA QUEUED
> [   74.367682] ata1.00: cmd 60/90:a0:c0:fe:23/00:00:04:00:00/40 tag 20 ncq 73728 in
> [   74.367682]          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x7 (timeout)

This one looks like a collateral damage.

...

> [   74.763895] ata1.00: exception Emask 0x3 SAct 0x800000 SErr 0x3040400 action 0x6
> [   74.763900] ata1.00: irq_stat 0x45000008
> [   74.763903] ata1: SError: { Proto CommWake TrStaTrns UnrecFIS }
> [   74.763907] ata1.00: failed command: READ FPDMA QUEUED
> [   74.763913] ata1.00: cmd 60/98:b8:00:c9:20/03:00:04:00:00/40 tag 23 ncq 471040 in
> [   74.763913]          res 40/00:b8:00:c9:20/00:00:04:00:00/40 Emask 0x3 (HSM violation)
> [   74.763916] ata1.00: status: { DRDY }
> [   74.763918] XXX cmd=ee9e02e0 cmd_tbl=ee9f0200 ahci_sg=ee9f0280
> [   74.763921] XXX opts=160005 st=0 addr=2e9f0200 addr_hi=0 rsvd=0:0:0:0
> [   74.763924] XXX fis=98608027:4020c900:03000004:080000b8 00000000:00000000:00000000:00000fff
> [   74.763925] XXX qc->n_elem=22 fis_len=5 prdtl=22
> [   74.763928] XXX sg[0] = 2ab1b000 0 fff (4096)
...
> [   74.763964] XXX sg[21] = 29170000 0 dfff (57344)

This is a separate failure and shares the same pattern as before.
Everything looks proper.

The thing is ahci doesn't have much restrictions in terms of its DMA
capabilities.  It can digest pretty much anything.  The only
restriction is that each entry can't be larger than 4M - but our
segment maximum is 64k.  There's no noticeable boundary crossing
happening both in target DMA regions and command tables.  All
addresses are in linear mapped normal area.

If the controller is seeing what the host is seeing in the command
area, I can't see why it would be declaring overflow or dumping stuff
into the lowest pages.

Ming Lei reported a similar issue on 32bit ARM w/ PAE.  I don't
understand why PAE is making any difference.  Native 64bit machines
should be generating IOs just as large.  Maybe iommu is hiding the
issue?

I'm afraid we'll have to go brute force with the problem - dump more
information on both 32 and 64bits and see where the differences lie.
At the moment, given that the DMA table looks completely fine and ahci
isn't picky about the shape of data areas, I think it's more likely
some obscure issue in address mapping under PAE or a silly bug in ahci
than block layer screwing up merging.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Tejun Heo-2
Hello, Artem.

Can you please apply the following patch on top and see whether
anything changes?  If it does make the issue go away, can you please
revert the ".can_queue" part and test again?

Thanks.

---
 drivers/ata/ahci.h    |    2 +-
 drivers/ata/libahci.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/ata/ahci.h
+++ b/drivers/ata/ahci.h
@@ -365,7 +365,7 @@ extern struct device_attribute *ahci_sde
  */
 #define AHCI_SHT(drv_name) \
  ATA_NCQ_SHT(drv_name), \
- .can_queue = AHCI_MAX_CMDS - 1, \
+ .can_queue = 1/*AHCI_MAX_CMDS - 1*/, \
  .sg_tablesize = AHCI_MAX_SG, \
  .dma_boundary = AHCI_DMA_BOUNDARY, \
  .shost_attrs = ahci_shost_attrs, \
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -420,7 +420,7 @@ void ahci_save_initial_config(struct dev
  hpriv->saved_cap2 = cap2 = 0;
 
  /* some chips have errata preventing 64bit use */
- if ((cap & HOST_CAP_64) && (hpriv->flags & AHCI_HFLAG_32BIT_ONLY)) {
+ if ((cap & HOST_CAP_64)/* && (hpriv->flags & AHCI_HFLAG_32BIT_ONLY)*/) {
  dev_info(dev, "controller can't do 64bit DMA, forcing 32bit\n");
  cap &= ~HOST_CAP_64;
  }

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Tejun Heo-2
Hello, again.

On Mon, Dec 21, 2015 at 03:07:21PM -0500, Tejun Heo wrote:
> Hello, Artem.
>
> Can you please apply the following patch on top and see whether
> anything changes?  If it does make the issue go away, can you please
> revert the ".can_queue" part and test again?

If the patch doesn't change anything, can you please try the
followings and see which one makes difference?

1. Exclude memory above 4G line with boot param "max_addr=4G".
2. Disable highmem with "highmem=0".
3. Try booting 64bit kernel.

At the moment, the only thing I can think of which can explain the PAE
+ bio_get_nr_vecs() situation is that the bio split code which is
activated by the bio_get_nr_vecs() somehow messes up 64bit or high
addresses on 32bit kernels.  I scanned for the obvious but at bio
layer, memory is represented by struct page, so nothing obvious seems
broken.

Note that for now I'm ignoring the debug dumps from the ahci debug
patch which indicates that the passed in addresses are all fine.  It
is possible that the controller gets confused with certain receiving
addresses and reports failure on later commands or maybe there is a
different sequence of events which can encompass both that we don't
know of yet.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Ming Lei-5
In reply to this post by Tejun Heo-2
On Tue, Dec 22, 2015 at 3:35 AM, Tejun Heo <[hidden email]> wrote:

>
>> [   74.367632] XXX cmd=ee9e0260 cmd_tbl=ee9ed600 ahci_sg=ee9ed680
>> [   74.367634] XXX opts=140005 st=0 addr=2e9ed600 addr_hi=0 rsvd=0:0:0:0
>> [   74.367637] XXX fis=00608027:40218900:07000004:08000098 00000000:00000000:00000000:00001fff
>> [   74.367639] XXX qc->n_elem=20 fis_len=5 prdtl=20
>> [   74.367641] XXX sg[0] = 29007000 0 8fff (36864)
>> [   74.367643] XXX sg[1] = 29218000 0 7fff (32768)
>> [   74.367645] XXX sg[2] = 29298000 0 7fff (32768)
>> [   74.367646] XXX sg[3] = 29ad8000 0 7fff (32768)
>> [   74.367648] XXX sg[4] = 29338000 0 7fff (32768)
>> [   74.367650] XXX sg[5] = 29370000 0 2fff (12288)
>> [   74.367652] XXX sg[6] = 219000 0 2fff (12288)
>> [   74.367653] XXX sg[7] = 230000 0 3fff (16384)
>> [   74.367655] XXX sg[8] = 29373000 0 4fff (20480)
>> [   74.367657] XXX sg[9] = 29130000 0 ffff (65536)
>> [   74.367659] XXX sg[10] = 29170000 0 ffff (65536)
>> [   74.367660] XXX sg[11] = 29280000 0 ffff (65536)
>> [   74.367662] XXX sg[12] = 29200000 0 ffff (65536)
>> [   74.367664] XXX sg[13] = 29320000 0 ffff (65536)
>> [   74.367666] XXX sg[14] = 29360000 0 ffff (65536)
>> [   74.367667] XXX sg[15] = 29340000 0 ffff (65536)
>> [   74.367669] XXX sg[16] = 29350000 0 ffff (65536)
>> [   74.367671] XXX sg[17] = 29300000 0 ffff (65536)
>> [   74.367672] XXX sg[18] = 29310000 0 ffff (65536)
>> [   74.367674] XXX sg[19] = 29020000 0 7fff (32768)
>
> And everything checks out.  Data lenghts are consistent and all the
> addresses look kosher - at least nothing should upset the data
> transfer itself.

Maybe we can check more, such as if the sg element is correctly
merged from bvec, and the following code should be useful to check
that:

+static void ahci_dump_req(struct ata_queued_cmd *qc)
+{
+       struct scsi_cmnd *cmd = qc->scsicmd;
+       struct request *req = cmd->request;
+       struct req_iterator iter;
+       struct bio_vec bv;
+       int i = 0;
+       phys_addr_t paddr;
+
+       printk("%s: \n", __func__);
+       rq_for_each_segment(bv, req, iter) {
+               paddr = page_to_phys(bv.bv_page);
+               printk("\t %3d: %x-%x %x %u\n", i++,
+                       (unsigned)paddr & 0xffffffff,
+                       (unsigned)(paddr >> 32),
+                       bv.bv_offset,
+                       bv.bv_len);
+       }
+}

Thanks,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Kent Overstreet
In reply to this post by Tejun Heo-2
I just reproduced it - Artem, I'll let you know when we have a possible fix but
hopefully there won't be any need for you to beat up your hardware any more :)

On Mon, Dec 21, 2015 at 04:08:11PM -0500, Tejun Heo wrote:

> Hello, again.
>
> On Mon, Dec 21, 2015 at 03:07:21PM -0500, Tejun Heo wrote:
> > Hello, Artem.
> >
> > Can you please apply the following patch on top and see whether
> > anything changes?  If it does make the issue go away, can you please
> > revert the ".can_queue" part and test again?
>
> If the patch doesn't change anything, can you please try the
> followings and see which one makes difference?
>
> 1. Exclude memory above 4G line with boot param "max_addr=4G".
> 2. Disable highmem with "highmem=0".
> 3. Try booting 64bit kernel.
>
> At the moment, the only thing I can think of which can explain the PAE
> + bio_get_nr_vecs() situation is that the bio split code which is
> activated by the bio_get_nr_vecs() somehow messes up 64bit or high
> addresses on 32bit kernels.  I scanned for the obvious but at bio
> layer, memory is represented by struct page, so nothing obvious seems
> broken.
>
> Note that for now I'm ignoring the debug dumps from the ahci debug
> patch which indicates that the passed in addresses are all fine.  It
> is possible that the controller gets confused with certain receiving
> addresses and reports failure on later commands or maybe there is a
> different sequence of events which can encompass both that we don't
> know of yet.

It repros pretty easily, I should be able to just write some test code that
sends down single specific IOs so no matter what the controller is doing we know
exactly what IO triggered the error.

I'm checking all the 64 bit/pae/etc. stuff now.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Kent Overstreet
In reply to this post by Tejun Heo-2
On Mon, Dec 21, 2015 at 04:08:11PM -0500, Tejun Heo wrote:

reproduced it with 32 bit pae:

> 1. Exclude memory above 4G line with boot param "max_addr=4G".

doesn't work - max_addr=1G doesn't work either

> 2. Disable highmem with "highmem=0".

works!

> 3. Try booting 64bit kernel.

works

Ok, so maybe it actually is PAE specific... but like you noted the block layer
works entirely in terms of pages so...

The one idea I can think of is - maybe BIOVEC_PHYS_MERGEABLE() is broken in PAE
mode? I am unfamiliar with anything PAE though.

Where does the ahci code consume the sglist?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
In reply to this post by Linus Torvalds-2
On 2015-12-21 10:23, Linus Torvalds wrote:
> On Sun, Dec 20, 2015 at 8:47 PM, Linus Torvalds
> <[hidden email]> wrote:
>>
>> That said, we obviously need to figure out this current problem
>> regardless first..
>
> ... although maybe it *would* be interesting to hear what happens if
> you just compile a 64-bit kernel instead?

Under x86-64 I cannot reproduce this problem. It seems like it's PAE
specific (Kent Overstreet says he has reproduced it).

>
> Do you still see the problem? Because if not, then we should look very
> specifically for some 32-bit PAE issue.
>
> For example, maybe we use "unsigned long" somewhere where we should
> use "phys_addr_t". On x86-64, they obviously end up being the same. On
> normal non-PAE x86-32, they are also the same. But ..
>
>                  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Kent Overstreet
In reply to this post by Tejun Heo-2
So what I _really_ want to know is - how the fuck is the actual ATA command
itself malformed?

You told me at one point that the error code indicated the controller was
claiming it overran the end of the sglist - well, if that's the case we ought to
be able to prove it with an assertion (I already tried; qc->nbytes does match
the sglist, checking from ata_sg_setup()).

Ignore bvec merging, PAE, all that crap - the controller doesn't know about any
of that. _WHAT_ are we feeding it that it doesn't like?

There shouldn't even be that much stuff to check, since in theory the only
possible thing that could be at fault is the sglist. Maybe they're too big, too
small, misaligned, too many of them, god knows what, but somehow the sglist
we're feeding the device has to be at fault, right?

But the sglists in Artem's debugging output look pretty uninteresting (one
of them has _no_ merged segments - it's just 18 4k pages - how could THAT be an
issue?)

Gonna apply your debugging patch and start throwing stuff at the wall next, I
guess...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
In reply to this post by Tejun Heo-2
On 2015-12-22 01:07, Tejun Heo wrote:

> Hello, Artem.
>
> Can you please apply the following patch on top and see whether
> anything changes?  If it does make the issue go away, can you please
> revert the ".can_queue" part and test again?
>
> Thanks.
>
> ---
>  drivers/ata/ahci.h    |    2 +-
>  drivers/ata/libahci.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> --- a/drivers/ata/ahci.h
> +++ b/drivers/ata/ahci.h
> @@ -365,7 +365,7 @@ extern struct device_attribute *ahci_sde
>   */
>  #define AHCI_SHT(drv_name) \
>   ATA_NCQ_SHT(drv_name), \
> - .can_queue = AHCI_MAX_CMDS - 1, \
> + .can_queue = 1/*AHCI_MAX_CMDS - 1*/, \
>   .sg_tablesize = AHCI_MAX_SG, \
>   .dma_boundary = AHCI_DMA_BOUNDARY, \
>   .shost_attrs = ahci_shost_attrs, \
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -420,7 +420,7 @@ void ahci_save_initial_config(struct dev
>   hpriv->saved_cap2 = cap2 = 0;
>
>   /* some chips have errata preventing 64bit use */
> - if ((cap & HOST_CAP_64) && (hpriv->flags & AHCI_HFLAG_32BIT_ONLY)) {
> + if ((cap & HOST_CAP_64)/* && (hpriv->flags &
> AHCI_HFLAG_32BIT_ONLY)*/) {
>   dev_info(dev, "controller can't do 64bit DMA, forcing 32bit\n");
>   cap &= ~HOST_CAP_64;
>   }

This patch fixes the issue for me. Now rechecking without .can_queue
part.

BTW, since I left debugging on, here's the part you wanted:

[    0.613851] XXX port 0 dma_sz=91392 mem=c0020000 mem_dma=00020000
cmd_slot=0 rx_fis=1024 cmd_tbl=1280
[    0.613865] XXX port 1 dma_sz=91392 mem=eea00000 mem_dma=2ea00000
cmd_slot=0 rx_fis=1024 cmd_tbl=1280
[    0.620464] XXX port 2 dma_sz=91392 mem=eea20000 mem_dma=2ea20000
cmd_slot=0 rx_fis=1024 cmd_tbl=1280
[    0.627121] XXX port 3 dma_sz=91392 mem=eea40000 mem_dma=2ea40000
cmd_slot=0 rx_fis=1024 cmd_tbl=1280
[    0.633791] XXX port 4 dma_sz=91392 mem=eea60000 mem_dma=2ea60000
cmd_slot=0 rx_fis=1024 cmd_tbl=1280
[    0.640445] XXX port 5 dma_sz=91392 mem=eea80000 mem_dma=2ea80000
cmd_slot=0 rx_fis=1024 cmd_tbl=1280

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
In reply to this post by Tejun Heo-2
On 2015-12-22 01:07, Tejun Heo wrote:

> Hello, Artem.
>
> Can you please apply the following patch on top and see whether
> anything changes?  If it does make the issue go away, can you please
> revert the ".can_queue" part and test again?
>
> Thanks.
>
> ---
>  drivers/ata/ahci.h    |    2 +-
>  drivers/ata/libahci.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> --- a/drivers/ata/ahci.h
> +++ b/drivers/ata/ahci.h
> @@ -365,7 +365,7 @@ extern struct device_attribute *ahci_sde
>   */
>  #define AHCI_SHT(drv_name) \
>   ATA_NCQ_SHT(drv_name), \
> - .can_queue = AHCI_MAX_CMDS - 1, \
> + .can_queue = 1/*AHCI_MAX_CMDS - 1*/, \
>   .sg_tablesize = AHCI_MAX_SG, \
>   .dma_boundary = AHCI_DMA_BOUNDARY, \
>   .shost_attrs = ahci_shost_attrs, \
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -420,7 +420,7 @@ void ahci_save_initial_config(struct dev
>   hpriv->saved_cap2 = cap2 = 0;
>
>   /* some chips have errata preventing 64bit use */
> - if ((cap & HOST_CAP_64) && (hpriv->flags & AHCI_HFLAG_32BIT_ONLY)) {
> + if ((cap & HOST_CAP_64)/* && (hpriv->flags &
> AHCI_HFLAG_32BIT_ONLY)*/) {
>   dev_info(dev, "controller can't do 64bit DMA, forcing 32bit\n");
>   cap &= ~HOST_CAP_64;
>   }
With the ".can_queue" part left intact the bug resurfaced. Full dmesg
output is attached.

dmesg.xz (18K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Junichi Nomura
In reply to this post by Kent Overstreet
On 12/22/15 12:59, Kent Overstreet wrote:

> reproduced it with 32 bit pae:
>
>> 1. Exclude memory above 4G line with boot param "max_addr=4G".
>
> doesn't work - max_addr=1G doesn't work either
>
>> 2. Disable highmem with "highmem=0".
>
> works!
>
>> 3. Try booting 64bit kernel.
>
> works

blk_queue_bio() does split then bounce, which makes the segment
counting based on pages before bouncing and could go wrong.

What do you think of a patch like this?

--
Jun'ichi Nomura, NEC Corporation

diff --git a/block/blk-core.c b/block/blk-core.c
index 5131993b..1d1c3c7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1689,8 +1689,6 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
  struct request *req;
  unsigned int request_count = 0;
 
- blk_queue_split(q, &bio, q->bio_split);
-
  /*
  * low level driver can indicate that it wants pages above a
  * certain limit bounced to low memory (ie for highmem, or even
@@ -1698,6 +1696,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
  */
  blk_queue_bounce(q, &bio);
 
+ blk_queue_split(q, &bio, q->bio_split);
+
  if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
  bio->bi_error = -EIO;
  bio_endio(bio);--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Kent Overstreet
On Tue, Dec 22, 2015 at 05:26:12AM +0000, Junichi Nomura wrote:

> On 12/22/15 12:59, Kent Overstreet wrote:
> > reproduced it with 32 bit pae:
> >
> >> 1. Exclude memory above 4G line with boot param "max_addr=4G".
> >
> > doesn't work - max_addr=1G doesn't work either
> >
> >> 2. Disable highmem with "highmem=0".
> >
> > works!
> >
> >> 3. Try booting 64bit kernel.
> >
> > works
>
> blk_queue_bio() does split then bounce, which makes the segment
> counting based on pages before bouncing and could go wrong.
>
> What do you think of a patch like this?

Shit, you nailed it. Can't believe I didn't think to check that.

>
> --
> Jun'ichi Nomura, NEC Corporation
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5131993b..1d1c3c7 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1689,8 +1689,6 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>   struct request *req;
>   unsigned int request_count = 0;
>  
> - blk_queue_split(q, &bio, q->bio_split);
> -
>   /*
>   * low level driver can indicate that it wants pages above a
>   * certain limit bounced to low memory (ie for highmem, or even
> @@ -1698,6 +1696,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>   */
>   blk_queue_bounce(q, &bio);
>  
> + blk_queue_split(q, &bio, q->bio_split);
> +
>   if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
>   bio->bi_error = -EIO;
>   bio_endio(bio);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Kent Overstreet
In reply to this post by Junichi Nomura
On Tue, Dec 22, 2015 at 05:26:12AM +0000, Junichi Nomura wrote:

> On 12/22/15 12:59, Kent Overstreet wrote:
> > reproduced it with 32 bit pae:
> >
> >> 1. Exclude memory above 4G line with boot param "max_addr=4G".
> >
> > doesn't work - max_addr=1G doesn't work either
> >
> >> 2. Disable highmem with "highmem=0".
> >
> > works!
> >
> >> 3. Try booting 64bit kernel.
> >
> > works
>
> blk_queue_bio() does split then bounce, which makes the segment
> counting based on pages before bouncing and could go wrong.
>
> What do you think of a patch like this?

Artem, can you give this patch a try?

>
> --
> Jun'ichi Nomura, NEC Corporation
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5131993b..1d1c3c7 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1689,8 +1689,6 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>   struct request *req;
>   unsigned int request_count = 0;
>  
> - blk_queue_split(q, &bio, q->bio_split);
> -
>   /*
>   * low level driver can indicate that it wants pages above a
>   * certain limit bounced to low memory (ie for highmem, or even
> @@ -1698,6 +1696,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>   */
>   blk_queue_bounce(q, &bio);
>  
> + blk_queue_split(q, &bio, q->bio_split);
> +
>   if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
>   bio->bi_error = -EIO;
>   bio_endio(bio);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
123