IO errors after "block: remove bio_get_nr_vecs()"

classic Classic list List threaded Threaded
45 messages Options
123
Reply | Threaded
Open this post in threaded view
|

IO errors after "block: remove bio_get_nr_vecs()"

Linus Torvalds-2
Kent, Jens, Christoph et al,
 please see this bugzilla:

  https://bugzilla.kernel.org/show_bug.cgi?id=109661

where Artem Tashkinov bisected his problems with 4.3 down to commit
b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
signed off on.

(Also Tejun - maybe you can see what's up - maybe that error message
tells you something)

I'm not sure what's up with his machine, the disk doesn't seem to be
anyuthing particularly unusual, it looks like a 1TB Seagate Barracuda:

  ata1.00: ATA-8: ST1000DM003-1CH162, CC44, max UDMA/133

which doesn't strike me as odd.

Looking at the dmesg, it also looks like it's a pretty normal
Sandybridge setup with Intel chipset. Artem, can you confirm? The PCI
ID for the AHCI chip seems to be (INTEL, 0x1c02).

Any ideas? Anybody?

                       Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Christoph Hellwig-2
On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
> Kent, Jens, Christoph et al,
>  please see this bugzilla:
>
>   https://bugzilla.kernel.org/show_bug.cgi?id=109661
>
> where Artem Tashkinov bisected his problems with 4.3 down to commit
> b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
> signed off on.

Artem,

can you re-check the commits around this series again?  I would be
extremtly surprised if it's really this particular commit and not
one just before it causing the problem - it just allocates bios
to the biggest possible instead of only allocating up to what
bio_add_page would accept.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Linus Torvalds-2
On Sun, Dec 20, 2015 at 10:18 AM, Christoph Hellwig <[hidden email]> wrote:
>
> Artem,
>
> can you re-check the commits around this series again?  I would be
> extremtly surprised if it's really this particular commit and not
> one just before it causing the problem - it just allocates bios
> to the biggest possible instead of only allocating up to what
> bio_add_page would accept.

Judging by Artem's bisect log, the last commit he tested before the
bad one was the commit before: commit 6cf66b4caf9c ("fs: use helper
bio_add_page() instead of open coding on bi_io_vec") and he marked
that one good.

Sadly, without CONFIG_LOCALVERSION_AUTO, there's no way to match up
the dmesg files (in the same bisection tar-file as the bisection log)
with the actual versions. Also, Artem's bisect.log isn't actually the
.git/BISECT_LOG file that contains the full information about what was
marked good and bad, so it's a bit hard to read (ie I can tell that
Artem had to mark commit 6cf66b4caf9c as "good" not because his log
says so, but because that explains the next commit to be tested).

Of course, it's fairly easy to make a mistake while bisecting (just
doing a thinko), but usually bisection miistakes end up causing you to
go into some "all good" or "all bad" region of commits, and the fact
that Artem seems to have marked the previous commit good and the final
commit bad does seem to imply the bisection was successful.

But yes, it is always nice to double-check the bisection results. The
best way to do it is generally to try to revert the bad commit and
verify that things work after that, but that commit doesn't revert
cleanly on top of 4.3 due to other changes.

Attached is a *COMPLETELY*UNTESTED* revertish patch for 4.3. It's
basically a revert of b54ffb73cadc, but with a few fixups to make the
revert work on top of 4.3.

So Artem, if you can test whether 4.3 works with that revert, and/or
double-check booting that b54ffb73cadc again (to verify that it's
really bad), and its parent (to double-check that it's really good),
that would be a good way to verify that yes, it is really that *one*
commit that breaks things for you.

                Linus

patch.diff (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Kent Overstreet
In reply to this post by Christoph Hellwig-2
On Sun, Dec 20, 2015 at 07:18:01PM +0100, Christoph Hellwig wrote:

> On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
> > Kent, Jens, Christoph et al,
> ie  please see this bugzilla:
> >o
> >   httpps://bugzilla.kernel.org/show_bug.cgi?id=109661
> >
> > where Artem Tashkinov bisected his problems with 4.3 down to commit
> > b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
> > signed off on.
>
> Artem,
>
> can you re-check the commits around this series again?  I would be
> extremtly surprised if it's really this particular commit and not
> one just before it causing the problem - it just allocates bios
> to the biggest possible instead of only allocating up to what
> bio_add_page would accept.

pretty sure it's something with how blk_bio_segment_split() decides what
segments are mergable and not. bio_get_nr_vecs() was just returning nr_pages ==
queue_max_segments (ignoring sectors for the moment) - so wait, wtf? that's
basically assuming no segment merging can ever happen, if it does then this was
causing us to send smaller requests to the device than we could have been.

so actually two possibilities I can see:
 - in blk_bio_segment_split(), something's screwed up with how it decides what
   segments are going to be mergable or not. but I don't think that's likely
   since it's doing the exact same thing the rest of the segment merging code
   does.
 - or, the driver was lying in its queue limits, using queue_max_segments for
   "the maximum number of pages I can possibly take", and that bug lurked
   undiscovered because of the screwed-upness in bio_get_nr_vecs().

Offhand I don't know where to start digging in the driver code to look into the
second theory though. Tejun, you got any ideas?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
In reply to this post by Linus Torvalds-2
On 2015-12-20 22:51, Linus Torvalds wrote:

> Kent, Jens, Christoph et al,
>  please see this bugzilla:
>
>   https://bugzilla.kernel.org/show_bug.cgi?id=109661
>
> where Artem Tashkinov bisected his problems with 4.3 down to commit
> b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
> signed off on.
>
> (Also Tejun - maybe you can see what's up - maybe that error message
> tells you something)
>
> I'm not sure what's up with his machine, the disk doesn't seem to be
> anyuthing particularly unusual, it looks like a 1TB Seagate Barracuda:
>
>   ata1.00: ATA-8: ST1000DM003-1CH162, CC44, max UDMA/133
>
> which doesn't strike me as odd.
>
> Looking at the dmesg, it also looks like it's a pretty normal
> Sandybridge setup with Intel chipset. Artem, can you confirm? The PCI
> ID for the AHCI chip seems to be (INTEL, 0x1c02).
>
> Any ideas? Anybody?
>

That's correct. That's a very usual Asus P8P67 Pro motherboard (Intel
P67 chipset) in AHCI mode and run of the mill HDD which is the one you
identified.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
In reply to this post by Christoph Hellwig-2
On 2015-12-20 23:18, Christoph Hellwig wrote:

> On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
>> Kent, Jens, Christoph et al,
>>  please see this bugzilla:
>>
>>   https://bugzilla.kernel.org/show_bug.cgi?id=109661
>>
>> where Artem Tashkinov bisected his problems with 4.3 down to commit
>> b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
>> signed off on.
>
> Artem,
>
> can you re-check the commits around this series again?  I would be
> extremtly surprised if it's really this particular commit and not
> one just before it causing the problem - it just allocates bios
> to the biggest possible instead of only allocating up to what
> bio_add_page would accept.

I'm positive about this particular commit. Of course, it might be
another
GCC 4.7.4 miscompilation which causes the errors which shouldn't be
there but
I'm not an expert, so.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
In reply to this post by Linus Torvalds-2
On 2015-12-20 23:41, Linus Torvalds wrote:

> On Sun, Dec 20, 2015 at 10:18 AM, Christoph Hellwig wrote:
>>
>> Artem,
>>
>> can you re-check the commits around this series again?  I would be
>> extremtly surprised if it's really this particular commit and not
>> one just before it causing the problem - it just allocates bios
>> to the biggest possible instead of only allocating up to what
>> bio_add_page would accept.
>
> Judging by Artem's bisect log, the last commit he tested before the
> bad one was the commit before: commit 6cf66b4caf9c ("fs: use helper
> bio_add_page() instead of open coding on bi_io_vec") and he marked
> that one good.
>
> Sadly, without CONFIG_LOCALVERSION_AUTO, there's no way to match up
> the dmesg files (in the same bisection tar-file as the bisection log)
> with the actual versions. Also, Artem's bisect.log isn't actually the
> .git/BISECT_LOG file that contains the full information about what was
> marked good and bad, so it's a bit hard to read (ie I can tell that
> Artem had to mark commit 6cf66b4caf9c as "good" not because his log
> says so, but because that explains the next commit to be tested).
>
> Of course, it's fairly easy to make a mistake while bisecting (just
> doing a thinko), but usually bisection miistakes end up causing you to
> go into some "all good" or "all bad" region of commits, and the fact
> that Artem seems to have marked the previous commit good and the final
> commit bad does seem to imply the bisection was successful.
>
> But yes, it is always nice to double-check the bisection results. The
> best way to do it is generally to try to revert the bad commit and
> verify that things work after that, but that commit doesn't revert
> cleanly on top of 4.3 due to other changes.
>
> Attached is a *COMPLETELY*UNTESTED* revertish patch for 4.3. It's
> basically a revert of b54ffb73cadc, but with a few fixups to make the
> revert work on top of 4.3.
>
> So Artem, if you can test whether 4.3 works with that revert, and/or
> double-check booting that b54ffb73cadc again (to verify that it's
> really bad), and its parent (to double-check that it's really good),
> that would be a good way to verify that yes, it is really that *one*
> commit that breaks things for you.
>

After reverting (applying) this patch on top of 4.3.3 everything is back
to normal. It's indeed a guilty commit.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
In reply to this post by Kent Overstreet
On 2015-12-20 23:44, Kent Overstreet wrote:

> On Sun, Dec 20, 2015 at 07:18:01PM +0100, Christoph Hellwig wrote:
>> On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
>> > Kent, Jens, Christoph et al,
>> ie  please see this bugzilla:
>> >o
>> >   httpps://bugzilla.kernel.org/show_bug.cgi?id=109661
>> >
>> > where Artem Tashkinov bisected his problems with 4.3 down to commit
>> > b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
>> > signed off on.
>>
>> Artem,
>>
>> can you re-check the commits around this series again?  I would be
>> extremtly surprised if it's really this particular commit and not
>> one just before it causing the problem - it just allocates bios
>> to the biggest possible instead of only allocating up to what
>> bio_add_page would accept.
>
> pretty sure it's something with how blk_bio_segment_split() decides
> what
> segments are mergable and not. bio_get_nr_vecs() was just returning
> nr_pages ==
> queue_max_segments (ignoring sectors for the moment) - so wait, wtf?
> that's
> basically assuming no segment merging can ever happen, if it does then
> this was
> causing us to send smaller requests to the device than we could have
> been.
>
> so actually two possibilities I can see:
>  - in blk_bio_segment_split(), something's screwed up with how it
> decides what
>    segments are going to be mergable or not. but I don't think that's
> likely
>    since it's doing the exact same thing the rest of the segment
> merging code
>    does.
>  - or, the driver was lying in its queue limits, using
> queue_max_segments for
>    "the maximum number of pages I can possibly take", and that bug
> lurked
>    undiscovered because of the screwed-upness in bio_get_nr_vecs().
>
> Offhand I don't know where to start digging in the driver code to look
> into the
> second theory though. Tejun, you got any ideas?

Here's an actual bisect log which Linus was missing:

git bisect start
# bad: [6a13feb9c82803e2b815eca72fa7a9f5561d7861] Linux 4.3
git bisect bad 6a13feb9c82803e2b815eca72fa7a9f5561d7861
# good: [64291f7db5bd8150a74ad2036f1037e6a0428df2] Linux 4.2
git bisect good 64291f7db5bd8150a74ad2036f1037e6a0428df2
# bad: [807249d3ada1ff28a47c4054ca4edd479421b671] Merge branch
'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
git bisect bad 807249d3ada1ff28a47c4054ca4edd479421b671
# good: [102178108e2246cb4b329d3fb7872cd3d7120205] Merge tag
'armsoc-drivers' of
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 102178108e2246cb4b329d3fb7872cd3d7120205
# good: [62da98656b62a5ca57f22263705175af8ded5aa1] netfilter:
nf_conntrack: make nf_ct_zone_dflt built-in
git bisect good 62da98656b62a5ca57f22263705175af8ded5aa1
# good: [f1a3c0b933e7ff856223d6fcd7456d403e54e4e5] Merge tag
'devicetree-for-4.3' of
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect good f1a3c0b933e7ff856223d6fcd7456d403e54e4e5
# bad: [9cbf22b37ae0592dea809cb8d424990774c21786] Merge tag 'dlm-4.3' of
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
git bisect bad 9cbf22b37ae0592dea809cb8d424990774c21786
# good: [8bdc69b764013a9b5ebeef7df8f314f1066c5d79] Merge branch
'for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
git bisect good 8bdc69b764013a9b5ebeef7df8f314f1066c5d79
# good: [df910390e2db07a76c87f258475f6c96253cee6c] Merge tag 'scsi-misc'
of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect good df910390e2db07a76c87f258475f6c96253cee6c
# bad: [d975f309a8b250e67b66eabeb56be6989c783629] Merge branch
'for-4.3/sg' of git://git.kernel.dk/linux-block
git bisect bad d975f309a8b250e67b66eabeb56be6989c783629
# bad: [89e2a8404e4415da1edbac6ca4f7332b4a74fae2] crypto/omap-sham:
remove an open coded access to ->page_link
git bisect bad 89e2a8404e4415da1edbac6ca4f7332b4a74fae2
# good: [0e28997ec476bad4c7dbe0a08775290051325f53] btrfs: remove bio
splitting and merge_bvec_fn() calls
git bisect good 0e28997ec476bad4c7dbe0a08775290051325f53
# bad: [2ec3182f9c20a9eef0dacc0512cf2ca2df7be5ad] Documentation: update
notes in biovecs about arbitrarily sized bios
git bisect bad 2ec3182f9c20a9eef0dacc0512cf2ca2df7be5ad
# good: [7140aafce2fc14c5af02fdb7859b6bea0108be3d] md/raid5: get rid of
bio_fits_rdev()
git bisect good 7140aafce2fc14c5af02fdb7859b6bea0108be3d
# good: [6cf66b4caf9c71f64a5486cadbd71ab58d0d4307] fs: use helper
bio_add_page() instead of open coding on bi_io_vec
git bisect good 6cf66b4caf9c71f64a5486cadbd71ab58d0d4307
# bad: [b54ffb73cadcdcff9cc1ae0e11f502407e3e2e4c] block: remove
bio_get_nr_vecs()
git bisect bad b54ffb73cadcdcff9cc1ae0e11f502407e3e2e4c

And like he said since the step before the last one was good and the
very last one was bad there was no way I could have made a mistake.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Kent Overstreet
In reply to this post by Artem S. Tashkinov
On Mon, Dec 21, 2015 at 04:25:12AM +0500, Artem S. Tashkinov wrote:

> On 2015-12-20 23:18, Christoph Hellwig wrote:
> >On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
> >>Kent, Jens, Christoph et al,
> >> please see this bugzilla:
> >>
> >>  https://bugzilla.kernel.org/show_bug.cgi?id=109661
> >>
> >>where Artem Tashkinov bisected his problems with 4.3 down to commit
> >>b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
> >>signed off on.
> >
> >Artem,
> >
> >can you re-check the commits around this series again?  I would be
> >extremtly surprised if it's really this particular commit and not
> >one just before it causing the problem - it just allocates bios
> >to the biggest possible instead of only allocating up to what
> >bio_add_page would accept.
>
> I'm positive about this particular commit. Of course, it might be another
> GCC 4.7.4 miscompilation which causes the errors which shouldn't be there
> but
> I'm not an expert, so.

I believe you on the commit, and I doubt this has anything to do with gcc - the
errors you're getting are exactly what you normally get when you send the device
an sglist to dma to/from that it doesn't like.

The queue limits stuff is annoyingly fragile, you'd think we'd be able to check
directly in the driver that the stuff we're sending the device is sane but we
don't.

If I came up with a debug patch could you try it out? I don't have any ideas for
one yet, but if someone who knows the ATA code doesn't jump in I'll call up
Tejun and make him walk me through it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
On 2015-12-21 04:42, Kent Overstreet wrote:

> On Mon, Dec 21, 2015 at 04:25:12AM +0500, Artem S. Tashkinov wrote:
>> On 2015-12-20 23:18, Christoph Hellwig wrote:
>> >On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
>> >>Kent, Jens, Christoph et al,
>> >> please see this bugzilla:
>> >>
>> >>  https://bugzilla.kernel.org/show_bug.cgi?id=109661
>> >>
>> >>where Artem Tashkinov bisected his problems with 4.3 down to commit
>> >>b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
>> >>signed off on.
>> >
>> >Artem,
>> >
>> >can you re-check the commits around this series again?  I would be
>> >extremtly surprised if it's really this particular commit and not
>> >one just before it causing the problem - it just allocates bios
>> >to the biggest possible instead of only allocating up to what
>> >bio_add_page would accept.
>>
>> I'm positive about this particular commit. Of course, it might be
>> another
>> GCC 4.7.4 miscompilation which causes the errors which shouldn't be
>> there
>> but
>> I'm not an expert, so.
>
> I believe you on the commit, and I doubt this has anything to do with
> gcc - the
> errors you're getting are exactly what you normally get when you send
> the device
> an sglist to dma to/from that it doesn't like.
>
> The queue limits stuff is annoyingly fragile, you'd think we'd be able
> to check
> directly in the driver that the stuff we're sending the device is sane
> but we
> don't.
>
> If I came up with a debug patch could you try it out? I don't have any
> ideas for
> one yet, but if someone who knows the ATA code doesn't jump in I'll
> call up
> Tejun and make him walk me through it.

No problem, I just hope that this particular access mode (and you debug
patch) won't decrease the lifespan of my HDD. Seagate HDDs have been
very fragile (read atrociously unreliable) for the past five years.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Ming Lei-5
In reply to this post by Linus Torvalds-2
On Mon, Dec 21, 2015 at 1:51 AM, Linus Torvalds
<[hidden email]> wrote:

> Kent, Jens, Christoph et al,
>  please see this bugzilla:
>
>   https://bugzilla.kernel.org/show_bug.cgi?id=109661
>
> where Artem Tashkinov bisected his problems with 4.3 down to commit
> b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
> signed off on.
>
> (Also Tejun - maybe you can see what's up - maybe that error message
> tells you something)
>
> I'm not sure what's up with his machine, the disk doesn't seem to be
> anyuthing particularly unusual, it looks like a 1TB Seagate Barracuda:
>
>   ata1.00: ATA-8: ST1000DM003-1CH162, CC44, max UDMA/133
>
> which doesn't strike me as odd.
>
> Looking at the dmesg, it also looks like it's a pretty normal
> Sandybridge setup with Intel chipset. Artem, can you confirm? The PCI
> ID for the AHCI chip seems to be (INTEL, 0x1c02).
>
> Any ideas? Anybody?

BTW, I have posted very similar issue in the link:

http://marc.info/?l=linux-ide&m=145066119623811&w=2

Artem, I noticed from bugzillar that the hardware is i386, just
wondering if PAE is enabled?  If yes, I am more confident
that both the two kinds of report are similar or same.

Thanks,

>
>                        Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
On 2015-12-21 06:38, Ming Lei wrote:

> On Mon, Dec 21, 2015 at 1:51 AM, Linus Torvalds wrote:
>> Kent, Jens, Christoph et al,
>>  please see this bugzilla:
>>
>>   https://bugzilla.kernel.org/show_bug.cgi?id=109661
>>
>> where Artem Tashkinov bisected his problems with 4.3 down to commit
>> b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
>> signed off on.
>>
>> (Also Tejun - maybe you can see what's up - maybe that error message
>> tells you something)
>>
>> I'm not sure what's up with his machine, the disk doesn't seem to be
>> anyuthing particularly unusual, it looks like a 1TB Seagate Barracuda:
>>
>>   ata1.00: ATA-8: ST1000DM003-1CH162, CC44, max UDMA/133
>>
>> which doesn't strike me as odd.
>>
>> Looking at the dmesg, it also looks like it's a pretty normal
>> Sandybridge setup with Intel chipset. Artem, can you confirm? The PCI
>> ID for the AHCI chip seems to be (INTEL, 0x1c02).
>>
>> Any ideas? Anybody?
>
> BTW, I have posted very similar issue in the link:
>
> http://marc.info/?l=linux-ide&m=145066119623811&w=2
>
> Artem, I noticed from bugzillar that the hardware is i386, just
> wondering if PAE is enabled?  If yes, I am more confident
> that both the two kinds of report are similar or same.
>

Yes, I'm on i686 with PAE (16GB of RAM here) - it's specifically
mentioned in the corresponding bug report.

P.S. I know Linus doesn't condone PAE but I still find it more
preferrable than running a mixed environment with almost zero benefit in
regard to performance and quite obvious performance regressions related
to an increased number of libraries being loaded (i686 + x86_64) and
slightly bloated code which sometimes cannot fit in the CPU cache. Call
me old fashioned but I won't upgrade to x86_64 until most of the things
that I run locally are available for x86_64 and that won't happen any
time soon.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Ming Lei-5
On Mon, Dec 21, 2015 at 9:50 AM, Artem S. Tashkinov <[hidden email]> wrote:

>> BTW, I have posted very similar issue in the link:
>>
>> http://marc.info/?l=linux-ide&m=145066119623811&w=2
>>
>> Artem, I noticed from bugzillar that the hardware is i386, just
>> wondering if PAE is enabled?  If yes, I am more confident
>> that both the two kinds of report are similar or same.
>>
>
> Yes, I'm on i686 with PAE (16GB of RAM here) - it's specifically mentioned
> in the corresponding bug report.

OK, could you dump value of the following files under /sys/block/sdN/queue/ ?

            max_hw_sectors_kb
            max_sectors_kb
            max_segments
            max_segment_size

'sdN' is the faulted disk name.

Thanks,
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
On 2015-12-21 07:18, Ming Lei wrote:

> On Mon, Dec 21, 2015 at 9:50 AM, Artem S. Tashkinov wrote:
>>> BTW, I have posted very similar issue in the link:
>>>
>>> http://marc.info/?l=linux-ide&m=145066119623811&w=2
>>>
>>> Artem, I noticed from bugzillar that the hardware is i386, just
>>> wondering if PAE is enabled?  If yes, I am more confident
>>> that both the two kinds of report are similar or same.
>>>
>>
>> Yes, I'm on i686 with PAE (16GB of RAM here) - it's specifically
>> mentioned
>> in the corresponding bug report.
>
> OK, could you dump value of the following files under
> /sys/block/sdN/queue/ ?
>
>             max_hw_sectors_kb
>             max_sectors_kb
>             max_segments
>             max_segment_size
>
> 'sdN' is the faulted disk name.
>

# cat
/sys/block/sda/queue/{max_hw_sectors_kb,max_sectors_kb,max_segments,max_segment_size}
32767
32767
168
65536
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Kent Overstreet
In reply to this post by Artem S. Tashkinov
On Mon, Dec 21, 2015 at 06:50:21AM +0500, Artem S. Tashkinov wrote:

> On 2015-12-21 06:38, Ming Lei wrote:
> >On Mon, Dec 21, 2015 at 1:51 AM, Linus Torvalds wrote:
> >>Kent, Jens, Christoph et al,
> >> please see this bugzilla:
> >>
> >>  https://bugzilla.kernel.org/show_bug.cgi?id=109661
> >>
> >>where Artem Tashkinov bisected his problems with 4.3 down to commit
> >>b54ffb73cadc ("block: remove bio_get_nr_vecs()") that you've all
> >>signed off on.
> >>
> >>(Also Tejun - maybe you can see what's up - maybe that error message
> >>tells you something)
> >>
> >>I'm not sure what's up with his machine, the disk doesn't seem to be
> >>anyuthing particularly unusual, it looks like a 1TB Seagate Barracuda:
> >>
> >>  ata1.00: ATA-8: ST1000DM003-1CH162, CC44, max UDMA/133
> >>
> >>which doesn't strike me as odd.
> >>
> >>Looking at the dmesg, it also looks like it's a pretty normal
> >>Sandybridge setup with Intel chipset. Artem, can you confirm? The PCI
> >>ID for the AHCI chip seems to be (INTEL, 0x1c02).
> >>
> >>Any ideas? Anybody?
> >
> >BTW, I have posted very similar issue in the link:
> >
> >http://marc.info/?l=linux-ide&m=145066119623811&w=2
> >
> >Artem, I noticed from bugzillar that the hardware is i386, just
> >wondering if PAE is enabled?  If yes, I am more confident
> >that both the two kinds of report are similar or same.
> >
>
> Yes, I'm on i686 with PAE (16GB of RAM here) - it's specifically mentioned
> in the corresponding bug report.
>
> P.S. I know Linus doesn't condone PAE but I still find it more preferrable
> than running a mixed environment with almost zero benefit in regard to
> performance and quite obvious performance regressions related to an
> increased number of libraries being loaded (i686 + x86_64) and slightly
> bloated code which sometimes cannot fit in the CPU cache. Call me old
> fashioned but I won't upgrade to x86_64 until most of the things that I run
> locally are available for x86_64 and that won't happen any time soon.

oy vey. WTF's been happening in blk-merge.c?

Theyy're not the same bug. The bug in your thread was introduced by Jens in
5014c311ba "block: fix bogus compiler warnings in blk-merge.c", where he screwed
up the bvprv handling - but that patch comes after the patch Artem bisected to.

blk_bio_segment_split() looks correct in b54ffb73ca.

What we need to do is:
 in the _driver_, immediately before handing the sglist off to the device, walk
 the sglist and verify it obeys all the restrictions for that particular device
 - and if it's not, print out exactly what we screwed up.

I don't know where that code lives in the ahci driver, and more importantly I
don't know where the dma restrictions come from, but if someone who knows the
driver code can walk me through it I'll write the patch.

--------------

Also - Ming, Christoph, anyone else who might be working on this stuff in the
future:

The way all the queue limits stuff works is still way too fragile; this has been
a recurring source of bugs. There's way too many different restrictions
different devices need, and it's easy for a driver to specify the restrictions
incorrectly in a way that just happens to work, but for the wrong reasons - e.g.
"I can't handle more than x segments, but saying I can't handle more than x
sectors happens to work for now because of some other bug in the upper layers" -
and then when we have to debug that later, we're screwed.

My intent when I was working on this was to eventually push the implementation
of the limits down as much as possible to the actual drivers - i.e. there the
limitations come from, so the driver can say, for example:

"ok, my device can only do scatter/gather dma to max 20 different addresses, so
I'll allocate sglists with 20 entries, and it doesn't matter if the bio or
request or whatever is bigger because when I call blk_rq_map_sg() it's just
going to map as much of the request as will fit in a given sglist and requests
will get processed incrementally until they're finished - and if a particular sg
entry can only be a particular size, or has alignment restrictions or whatever,
I'll just pass that directly to blk_rq_map_sg()"

so that the driver is ideally specifying _only_ its real restrictions, and
they're being specified in the code exactly where they're being used.

-------

Basically, blk_queue_split() was only meant to be an interim solution, so I'd
suggest that instead of doing performance optimizations on that codepath a
better use of time and effort would be to work towards ripping it out entirely.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Ming Lei-5
On Mon, Dec 21, 2015 at 10:25 AM, Artem S. Tashkinov <[hidden email]> wrote:
> # cat
> /sys/block/sda/queue/{max_hw_sectors_kb,max_sectors_kb,max_segments,max_segment_size}
> 32767
> 32767
> 168
> 65536

Looks it is fine, then maybe it is related with BIOVEC_PHYS_MERGEABLE(),
BIOVEC_SEG_BOUNDARY() or sort of thing, because dma_addr_t and
phys_addr_t turn to 64-bit with PAE, but 'unsigned long' and 'void *'
is still 32bit.

It was confirmed that there isn't the issue if PAE is disabled.

Dumping both sata/ahci hw sg table and bio's bvec might be helpful.

On Mon, Dec 21, 2015 at 10:32 AM, Kent Overstreet
<[hidden email]> wrote:
>
> oy vey. WTF's been happening in blk-merge.c?
>
> Theyy're not the same bug. The bug in your thread was introduced by Jens in
> 5014c311ba "block: fix bogus compiler warnings in blk-merge.c", where he screwed
> up the bvprv handling - but that patch comes after the patch Artem bisected to.
>
> blk_bio_segment_split() looks correct in b54ffb73ca.

Yes, that is why reverting 578270bfb(block: fix segment split) can make the
issue disappear, because 5014c311ba "block: fix bogus compiler
warnings in blk-merge.c" basically disables sg-merge and prevents the
issue from being
triggered.



Thanks,
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
On 2015-12-21 08:21, Ming Lei wrote:

> On Mon, Dec 21, 2015 at 10:25 AM, Artem S. Tashkinov wrote:
>> # cat
>> /sys/block/sda/queue/{max_hw_sectors_kb,max_sectors_kb,max_segments,max_segment_size}
>> 32767
>> 32767
>> 168
>> 65536
>
> Looks it is fine, then maybe it is related with
> BIOVEC_PHYS_MERGEABLE(),
> BIOVEC_SEG_BOUNDARY() or sort of thing, because dma_addr_t and
> phys_addr_t turn to 64-bit with PAE, but 'unsigned long' and 'void *'
> is still 32bit.
>
> It was confirmed that there isn't the issue if PAE is disabled.
>
> Dumping both sata/ahci hw sg table and bio's bvec might be helpful.

Um, sorry, what exact variables/files do you want to see? I'm not an
expert in /sys.

>
> On Mon, Dec 21, 2015 at 10:32 AM, Kent Overstreet wrote:
>>
>> oy vey. WTF's been happening in blk-merge.c?
>>
>> Theyy're not the same bug. The bug in your thread was introduced by
>> Jens in
>> 5014c311ba "block: fix bogus compiler warnings in blk-merge.c", where
>> he screwed
>> up the bvprv handling - but that patch comes after the patch Artem
>> bisected to.
>>
>> blk_bio_segment_split() looks correct in b54ffb73ca.
>
> Yes, that is why reverting 578270bfb(block: fix segment split) can make
> the
> issue disappear, because 5014c311ba "block: fix bogus compiler
> warnings in blk-merge.c" basically disables sg-merge and prevents the
> issue from being
> triggered.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Tejun Heo-2
In reply to this post by Linus Torvalds-2
Hello, Linus.

On Sun, Dec 20, 2015 at 09:51:14AM -0800, Linus Torvalds wrote:
...
> (Also Tejun - maybe you can see what's up - maybe that error message
> tells you something)

Hmmm... all it says is that something went wrong on the PCI side.

> I'm not sure what's up with his machine, the disk doesn't seem to be
> anyuthing particularly unusual, it looks like a 1TB Seagate Barracuda:
>
>   ata1.00: ATA-8: ST1000DM003-1CH162, CC44, max UDMA/133
>
> which doesn't strike me as odd.
>
> Looking at the dmesg, it also looks like it's a pretty normal
> Sandybridge setup with Intel chipset. Artem, can you confirm? The PCI
> ID for the AHCI chip seems to be (INTEL, 0x1c02).
>
> Any ideas? Anybody?

I wonder whether ahci is screwing up command / sg table setup in a way
that e.g. if there are too many segments the sg table overflows into
the neighboring one which is now being exposed by upper layer being
fixed to send down larger commands.  Looking into it.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Linus Torvalds-2
In reply to this post by Artem S. Tashkinov
On Sun, Dec 20, 2015 at 5:50 PM, Artem S. Tashkinov <[hidden email]> wrote:
>
> P.S. I know Linus doesn't condone PAE but I still find it more preferrable
> than running a mixed environment with almost zero benefit in regard to
> performance and quite obvious performance regressions related to an
> increased number of libraries being loaded (i686 + x86_64) and slightly
> bloated code which sometimes cannot fit in the CPU cache. Call me old
> fashioned but I won't upgrade to x86_64 until most of the things that I run
> locally are available for x86_64 and that won't happen any time soon.

Don't upgrade *user* land. User land doesn't use the braindamage that is PAE.

Just run a 64-bit kernel. Keep all your 32-bit userland apps and libraries.

Trust me, that *will* be faster. PAE works really horribly badly,
because all your really important data structures like your inodes and
directory cache will all be in the low 1GB even if you have 16BG of
RAM.

Of course, I'd also like more people to run things that way just to
get more coverage of the whole "yes, we do all the compat stuff
correctly". So I have some other reasons to prefer people running
64-bit kernels with 32-bit user land. But PAE really is a disaster.

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: IO errors after "block: remove bio_get_nr_vecs()"

Artem S. Tashkinov
On 2015-12-21 09:32, Linus Torvalds wrote:

> On Sun, Dec 20, 2015 at 5:50 PM, Artem S. Tashkinov wrote:
>>
>> P.S. I know Linus doesn't condone PAE but I still find it more
>> preferrable
>> than running a mixed environment with almost zero benefit in regard to
>> performance and quite obvious performance regressions related to an
>> increased number of libraries being loaded (i686 + x86_64) and
>> slightly
>> bloated code which sometimes cannot fit in the CPU cache. Call me old
>> fashioned but I won't upgrade to x86_64 until most of the things that
>> I run
>> locally are available for x86_64 and that won't happen any time soon.
>
> Don't upgrade *user* land. User land doesn't use the braindamage that
> is PAE.
>
> Just run a 64-bit kernel. Keep all your 32-bit userland apps and
> libraries.
>
> Trust me, that *will* be faster. PAE works really horribly badly,
> because all your really important data structures like your inodes and
> directory cache will all be in the low 1GB even if you have 16BG of
> RAM.
>
> Of course, I'd also like more people to run things that way just to
> get more coverage of the whole "yes, we do all the compat stuff
> correctly". So I have some other reasons to prefer people running
> 64-bit kernels with 32-bit user land. But PAE really is a disaster.
>

In the past I happily ran an x86_64 bit kernel together with 32bit
userland for quite some time but then I hit a wall: VirtualBox expects
its kernel modules to have the same bitness as the application itself so
I had to revert back to an i686 PAE setup. It's probably high time to
try qemu however last time I looked at it a few years ago it lacked
several crucial features I need from a VM.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
123