[PATCH v4 00/11] simplify block layer based on immutable biovecs

classic Classic list List threaded Threaded
62 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 08/11] block: kill merge_bvec_fn() completely

Christoph Hellwig-2
On Fri, May 22, 2015 at 11:18:40AM -0700, Ming Lin wrote:
> From: Kent Overstreet <[hidden email]>
>
> As generic_make_request() is now able to handle arbitrarily sized bios,
> it's no longer necessary for each individual block driver to define its
> own ->merge_bvec_fn() callback. Remove every invocation completely.

It might be good to replace patch 1 and this one by a patch per driver
to remove the merge_bvec_fn instance and add the blk_queue_split call
for all those drivers that actually had a ->merge_bvec_fn.  As some
of them were non-trivial attention from the maintainers would be helpful,
and a patch per driver might help with that.

> -/* This is called by bio_add_page().
> - *
> - * q->max_hw_sectors and other global limits are already enforced there.
> - *
> - * We need to call down to our lower level device,
> - * in case it has special restrictions.
> - *
> - * We also may need to enforce configured max-bio-bvecs limits.
> - *
> - * As long as the BIO is empty we have to allow at least one bvec,
> - * regardless of size and offset, so no need to ask lower levels.
> - */
> -int drbd_merge_bvec(struct request_queue *q, struct bvec_merge_data *bvm, struct bio_vec *bvec)


This just checks the lower device, so it looks obviously fine.

> -static int pkt_merge_bvec(struct request_queue *q, struct bvec_merge_data *bmd,
> -  struct bio_vec *bvec)
> -{
> - struct pktcdvd_device *pd = q->queuedata;
> - sector_t zone = get_zone(bmd->bi_sector, pd);
> - int used = ((bmd->bi_sector - zone) << 9) + bmd->bi_size;
> - int remaining = (pd->settings.size << 9) - used;
> - int remaining2;
> -
> - /*
> - * A bio <= PAGE_SIZE must be allowed. If it crosses a packet
> - * boundary, pkt_make_request() will split the bio.
> - */
> - remaining2 = PAGE_SIZE - bmd->bi_size;
> - remaining = max(remaining, remaining2);
> -
> - BUG_ON(remaining < 0);
> - return remaining;
> -}

As mentioned in the comment pkt_make_request will split the bio so pkt
looks fine.

> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index ec6c5c6..f50edb3 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -3440,52 +3440,6 @@ static int rbd_queue_rq(struct blk_mq_hw_ctx *hctx,
>   return BLK_MQ_RQ_QUEUE_OK;
>  }
>  
> -/*
> - * a queue callback. Makes sure that we don't create a bio that spans across
> - * multiple osd objects. One exception would be with a single page bios,
> - * which we handle later at bio_chain_clone_range()
> - */
> -static int rbd_merge_bvec(struct request_queue *q, struct bvec_merge_data *bmd,
> -  struct bio_vec *bvec)

It seems rbd handles requests spanning objects just fine, so I don't
really understand why rbd_merge_bvec even exists.  Getting some form
of ACK from the ceph folks would be useful.

> -/*
> - * We assume I/O is going to the origin (which is the volume
> - * more likely to have restrictions e.g. by being striped).
> - * (Looking up the exact location of the data would be expensive
> - * and could always be out of date by the time the bio is submitted.)
> - */
> -static int cache_bvec_merge(struct dm_target *ti,
> -    struct bvec_merge_data *bvm,
> -    struct bio_vec *biovec, int max_size)
> -{

DM seems to have the most complex merge functions of all drivers, so
I'd really love to see an ACK from Mike.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 06/11] md/raid5: get rid of bio_fits_rdev()

Christoph Hellwig-2
In reply to this post by NeilBrown
On Mon, May 25, 2015 at 05:54:14PM +1000, NeilBrown wrote:
> Did I write that?  I guess I did :-(
> I meant *after*.   Don't get rid of bio_fits_rdev until split_bio is in
> chunk_aligned_read().

I suspect the whole series could use some reordering.

patch 1:

 add ->bio_split and blk_queue_split

patch 2..n:

 one for each non-trivial driver that implements ->merge_bvec_fn to
 remove it and instead split bios in ->make_request.  The md patch
 to do the right thing in chunk_aligned_read goes into the general
 md patch here.  The bcache patch also goes into this series.

patch n+1:

 - add blk_queue_split calls for remaining trivial drivers

patch n+2:

 - remove ->merge_bvec_fn and checking of max_sectors a for all
   drivers, simplify bio_add_page

patch n+2:

 - remove splitting in blkdev_issue_discard

patch n+3

 - remove bio_fits_rdev

patch n+4

 - remove bio_get_nr_vecs

patch n+4

 - use bio_add_page

patch n+5

 - update documentation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 08/11] block: kill merge_bvec_fn() completely

Ilya Dryomov
In reply to this post by Christoph Hellwig-2
On Mon, May 25, 2015 at 5:04 PM, Christoph Hellwig <[hidden email]> wrote:

> On Fri, May 22, 2015 at 11:18:40AM -0700, Ming Lin wrote:
>> From: Kent Overstreet <[hidden email]>
>>
>> As generic_make_request() is now able to handle arbitrarily sized bios,
>> it's no longer necessary for each individual block driver to define its
>> own ->merge_bvec_fn() callback. Remove every invocation completely.
>
> It might be good to replace patch 1 and this one by a patch per driver
> to remove the merge_bvec_fn instance and add the blk_queue_split call
> for all those drivers that actually had a ->merge_bvec_fn.  As some
> of them were non-trivial attention from the maintainers would be helpful,
> and a patch per driver might help with that.
>
>> -/* This is called by bio_add_page().
>> - *
>> - * q->max_hw_sectors and other global limits are already enforced there.
>> - *
>> - * We need to call down to our lower level device,
>> - * in case it has special restrictions.
>> - *
>> - * We also may need to enforce configured max-bio-bvecs limits.
>> - *
>> - * As long as the BIO is empty we have to allow at least one bvec,
>> - * regardless of size and offset, so no need to ask lower levels.
>> - */
>> -int drbd_merge_bvec(struct request_queue *q, struct bvec_merge_data *bvm, struct bio_vec *bvec)
>
>
> This just checks the lower device, so it looks obviously fine.
>
>> -static int pkt_merge_bvec(struct request_queue *q, struct bvec_merge_data *bmd,
>> -                       struct bio_vec *bvec)
>> -{
>> -     struct pktcdvd_device *pd = q->queuedata;
>> -     sector_t zone = get_zone(bmd->bi_sector, pd);
>> -     int used = ((bmd->bi_sector - zone) << 9) + bmd->bi_size;
>> -     int remaining = (pd->settings.size << 9) - used;
>> -     int remaining2;
>> -
>> -     /*
>> -      * A bio <= PAGE_SIZE must be allowed. If it crosses a packet
>> -      * boundary, pkt_make_request() will split the bio.
>> -      */
>> -     remaining2 = PAGE_SIZE - bmd->bi_size;
>> -     remaining = max(remaining, remaining2);
>> -
>> -     BUG_ON(remaining < 0);
>> -     return remaining;
>> -}
>
> As mentioned in the comment pkt_make_request will split the bio so pkt
> looks fine.
>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index ec6c5c6..f50edb3 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -3440,52 +3440,6 @@ static int rbd_queue_rq(struct blk_mq_hw_ctx *hctx,
>>       return BLK_MQ_RQ_QUEUE_OK;
>>  }
>>
>> -/*
>> - * a queue callback. Makes sure that we don't create a bio that spans across
>> - * multiple osd objects. One exception would be with a single page bios,
>> - * which we handle later at bio_chain_clone_range()
>> - */
>> -static int rbd_merge_bvec(struct request_queue *q, struct bvec_merge_data *bmd,
>> -                       struct bio_vec *bvec)
>
> It seems rbd handles requests spanning objects just fine, so I don't
> really understand why rbd_merge_bvec even exists.  Getting some form
> of ACK from the ceph folks would be useful.

I'm not Alex, but yeah, we have all the clone/split machinery and so we
can handle a spanning case just fine.  I think rbd_merge_bvec() exists
to make sure we don't have to do that unless it's really necessary -
like when a single page gets submitted at an inconvenient offset.

I have a patch that adds a blk_queue_chunk_sectors(object_size) call to
rbd_init_disk() but I haven't had a chance to play with it yet.  In any
case, we should be fine with getting rid of rbd_merge_bvec().  If this
ends up a per-driver patchset, I can make rbd_merge_bvec() ->
blk_queue_chunk_sectors() a single patch and push it through
ceph-client.git.

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 08/11] block: kill merge_bvec_fn() completely

Christoph Hellwig-2
On Mon, May 25, 2015 at 06:02:30PM +0300, Ilya Dryomov wrote:

> I'm not Alex, but yeah, we have all the clone/split machinery and so we
> can handle a spanning case just fine.  I think rbd_merge_bvec() exists
> to make sure we don't have to do that unless it's really necessary -
> like when a single page gets submitted at an inconvenient offset.
>
> I have a patch that adds a blk_queue_chunk_sectors(object_size) call to
> rbd_init_disk() but I haven't had a chance to play with it yet.  In any
> case, we should be fine with getting rid of rbd_merge_bvec().  If this
> ends up a per-driver patchset, I can make rbd_merge_bvec() ->
> blk_queue_chunk_sectors() a single patch and push it through
> ceph-client.git.

Hmm, looks like the new blk_queue_split_bio ignore the chunk_sectors
value, another thing that needs updating.  I forgot how many weird
merging hacks we had to add for nvme..

While I'd like to see per-driver patches we'd still need to merge
them together through the block tree.  Note that with this series
there won't be any benefit of using blk_queue_chunk_sectors over just
doing the split in rbd.  Maybe we can even remove it again and do
that work in the drivers in the future.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 08/11] block: kill merge_bvec_fn() completely

Ilya Dryomov
On Mon, May 25, 2015 at 6:08 PM, Christoph Hellwig <[hidden email]> wrote:

> On Mon, May 25, 2015 at 06:02:30PM +0300, Ilya Dryomov wrote:
>> I'm not Alex, but yeah, we have all the clone/split machinery and so we
>> can handle a spanning case just fine.  I think rbd_merge_bvec() exists
>> to make sure we don't have to do that unless it's really necessary -
>> like when a single page gets submitted at an inconvenient offset.
>>
>> I have a patch that adds a blk_queue_chunk_sectors(object_size) call to
>> rbd_init_disk() but I haven't had a chance to play with it yet.  In any
>> case, we should be fine with getting rid of rbd_merge_bvec().  If this
>> ends up a per-driver patchset, I can make rbd_merge_bvec() ->
>> blk_queue_chunk_sectors() a single patch and push it through
>> ceph-client.git.
>
> Hmm, looks like the new blk_queue_split_bio ignore the chunk_sectors
> value, another thing that needs updating.  I forgot how many weird
> merging hacks we had to add for nvme..
>
> While I'd like to see per-driver patches we'd still need to merge
> them together through the block tree.  Note that with this series
> there won't be any benefit of using blk_queue_chunk_sectors over just
> doing the split in rbd.  Maybe we can even remove it again and do
> that work in the drivers in the future.

OK, I'll drop it, especially if it's potentially on its way out.  With
the fancy striping support, which I'll hopefully get to sometime, the
striping pattern will become much more complicated anyway, so relying
on rbd doing bio splitting is right in the long run as well.

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 08/11] block: kill merge_bvec_fn() completely

Alex Elder-5
In reply to this post by Ilya Dryomov
On 05/25/2015 10:02 AM, Ilya Dryomov wrote:

> On Mon, May 25, 2015 at 5:04 PM, Christoph Hellwig <[hidden email]> wrote:
>> On Fri, May 22, 2015 at 11:18:40AM -0700, Ming Lin wrote:
>>> From: Kent Overstreet <[hidden email]>
>>>
>>> As generic_make_request() is now able to handle arbitrarily sized bios,
>>> it's no longer necessary for each individual block driver to define its
>>> own ->merge_bvec_fn() callback. Remove every invocation completely.
>>
>> It might be good to replace patch 1 and this one by a patch per driver
>> to remove the merge_bvec_fn instance and add the blk_queue_split call
>> for all those drivers that actually had a ->merge_bvec_fn.  As some
>> of them were non-trivial attention from the maintainers would be helpful,
>> and a patch per driver might help with that.
>>
>>> -/* This is called by bio_add_page().
>>> - *
>>> - * q->max_hw_sectors and other global limits are already enforced there.
>>> - *
>>> - * We need to call down to our lower level device,
>>> - * in case it has special restrictions.
>>> - *
>>> - * We also may need to enforce configured max-bio-bvecs limits.
>>> - *
>>> - * As long as the BIO is empty we have to allow at least one bvec,
>>> - * regardless of size and offset, so no need to ask lower levels.
>>> - */
>>> -int drbd_merge_bvec(struct request_queue *q, struct bvec_merge_data *bvm, struct bio_vec *bvec)
>>
>>
>> This just checks the lower device, so it looks obviously fine.
>>
>>> -static int pkt_merge_bvec(struct request_queue *q, struct bvec_merge_data *bmd,
>>> -                       struct bio_vec *bvec)
>>> -{
>>> -     struct pktcdvd_device *pd = q->queuedata;
>>> -     sector_t zone = get_zone(bmd->bi_sector, pd);
>>> -     int used = ((bmd->bi_sector - zone) << 9) + bmd->bi_size;
>>> -     int remaining = (pd->settings.size << 9) - used;
>>> -     int remaining2;
>>> -
>>> -     /*
>>> -      * A bio <= PAGE_SIZE must be allowed. If it crosses a packet
>>> -      * boundary, pkt_make_request() will split the bio.
>>> -      */
>>> -     remaining2 = PAGE_SIZE - bmd->bi_size;
>>> -     remaining = max(remaining, remaining2);
>>> -
>>> -     BUG_ON(remaining < 0);
>>> -     return remaining;
>>> -}
>>
>> As mentioned in the comment pkt_make_request will split the bio so pkt
>> looks fine.
>>
>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>> index ec6c5c6..f50edb3 100644
>>> --- a/drivers/block/rbd.c
>>> +++ b/drivers/block/rbd.c
>>> @@ -3440,52 +3440,6 @@ static int rbd_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>       return BLK_MQ_RQ_QUEUE_OK;
>>>  }
>>>
>>> -/*
>>> - * a queue callback. Makes sure that we don't create a bio that spans across
>>> - * multiple osd objects. One exception would be with a single page bios,
>>> - * which we handle later at bio_chain_clone_range()
>>> - */
>>> -static int rbd_merge_bvec(struct request_queue *q, struct bvec_merge_data *bmd,
>>> -                       struct bio_vec *bvec)
>>
>> It seems rbd handles requests spanning objects just fine, so I don't
>> really understand why rbd_merge_bvec even exists.  Getting some form
>> of ACK from the ceph folks would be useful.
>
> I'm not Alex, but yeah, we have all the clone/split machinery and so we
> can handle a spanning case just fine.  I think rbd_merge_bvec() exists
> to make sure we don't have to do that unless it's really necessary -
> like when a single page gets submitted at an inconvenient offset.

I am Alex.  This is something I never removed.  I haven't
looked at it closely now, but it seems to me that after I
created a function that split stuff properly up *before*
the BIO layer got to it (which has since been replaced by
code related to Kent's immutable BIO work), there has been
no need for this function.  Removing this was on a long-ago
to-do list--but I didn't want to do it without spending some
time ensuring it wouldn't break anything.

If you want me to work through it in more detail so I can
give a more certain response, let me know and I will do so.

                                        -Alex

> I have a patch that adds a blk_queue_chunk_sectors(object_size) call to
> rbd_init_disk() but I haven't had a chance to play with it yet.  In any
> case, we should be fine with getting rid of rbd_merge_bvec().  If this
> ends up a per-driver patchset, I can make rbd_merge_bvec() ->
> blk_queue_chunk_sectors() a single patch and push it through
> ceph-client.git.
>
> Thanks,
>
>                 Ilya
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 06/11] md/raid5: get rid of bio_fits_rdev()

Ming Lin-2
In reply to this post by Christoph Hellwig-2
On Mon, May 25, 2015 at 7:17 AM, Christoph Hellwig <[hidden email]> wrote:
> On Mon, May 25, 2015 at 05:54:14PM +1000, NeilBrown wrote:
>> Did I write that?  I guess I did :-(
>> I meant *after*.   Don't get rid of bio_fits_rdev until split_bio is in
>> chunk_aligned_read().
>
> I suspect the whole series could use some reordering.

Nice reordering.
I'll do this.

Thanks.

>
> patch 1:
>
>  add ->bio_split and blk_queue_split
>
> patch 2..n:
>
>  one for each non-trivial driver that implements ->merge_bvec_fn to
>  remove it and instead split bios in ->make_request.  The md patch
>  to do the right thing in chunk_aligned_read goes into the general
>  md patch here.  The bcache patch also goes into this series.
>
> patch n+1:
>
>  - add blk_queue_split calls for remaining trivial drivers
>
> patch n+2:
>
>  - remove ->merge_bvec_fn and checking of max_sectors a for all
>    drivers, simplify bio_add_page
>
> patch n+2:
>
>  - remove splitting in blkdev_issue_discard
>
> patch n+3
>
>  - remove bio_fits_rdev
>
> patch n+4
>
>  - remove bio_get_nr_vecs
>
> patch n+4
>
>  - use bio_add_page
>
> patch n+5
>
>  - update documentation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 01/11] block: make generic_make_request handle arbitrarily sized bios

Mike Snitzer
In reply to this post by Ming Lin-2
On Fri, May 22 2015 at  2:18pm -0400,
Ming Lin <[hidden email]> wrote:

> From: Kent Overstreet <[hidden email]>
>
> The way the block layer is currently written, it goes to great lengths
> to avoid having to split bios; upper layer code (such as bio_add_page())
> checks what the underlying device can handle and tries to always create
> bios that don't need to be split.
>
> But this approach becomes unwieldy and eventually breaks down with
> stacked devices and devices with dynamic limits, and it adds a lot of
> complexity. If the block layer could split bios as needed, we could
> eliminate a lot of complexity elsewhere - particularly in stacked
> drivers. Code that creates bios can then create whatever size bios are
> convenient, and more importantly stacked drivers don't have to deal with
> both their own bio size limitations and the limitations of the
> (potentially multiple) devices underneath them.  In the future this will
> let us delete merge_bvec_fn and a bunch of other code.

This series doesn't take any steps to train upper layers
(e.g. filesystems) to size their bios larger (which is defined as
"whatever size bios are convenient" above).

bio_add_page(), and merge_bvec_fn, served as the means for upper layers
(and direct IO) to build up optimally sized bios.  Without a replacement
(that I can see anyway) how is this patchset making forward progress
(getting Acks, etc)!?

I like the idea of reduced complexity associated with these late bio
splitting changes I'm just not seeing how this is ready given there are
no upper layer changes that speak to building larger bios..

What am I missing?

Please advise, thanks!
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 01/11] block: make generic_make_request handle arbitrarily sized bios

Ming Lin-2
On Tue, May 26, 2015 at 7:36 AM, Mike Snitzer <[hidden email]> wrote:

> On Fri, May 22 2015 at  2:18pm -0400,
> Ming Lin <[hidden email]> wrote:
>
>> From: Kent Overstreet <[hidden email]>
>>
>> The way the block layer is currently written, it goes to great lengths
>> to avoid having to split bios; upper layer code (such as bio_add_page())
>> checks what the underlying device can handle and tries to always create
>> bios that don't need to be split.
>>
>> But this approach becomes unwieldy and eventually breaks down with
>> stacked devices and devices with dynamic limits, and it adds a lot of
>> complexity. If the block layer could split bios as needed, we could
>> eliminate a lot of complexity elsewhere - particularly in stacked
>> drivers. Code that creates bios can then create whatever size bios are
>> convenient, and more importantly stacked drivers don't have to deal with
>> both their own bio size limitations and the limitations of the
>> (potentially multiple) devices underneath them.  In the future this will
>> let us delete merge_bvec_fn and a bunch of other code.
>
> This series doesn't take any steps to train upper layers
> (e.g. filesystems) to size their bios larger (which is defined as
> "whatever size bios are convenient" above).
>
> bio_add_page(), and merge_bvec_fn, served as the means for upper layers
> (and direct IO) to build up optimally sized bios.  Without a replacement
> (that I can see anyway) how is this patchset making forward progress
> (getting Acks, etc)!?
>
> I like the idea of reduced complexity associated with these late bio
> splitting changes I'm just not seeing how this is ready given there are
> no upper layer changes that speak to building larger bios..
>
> What am I missing?

See: [PATCH v4 02/11] block: simplify bio_add_page()
https://lkml.org/lkml/2015/5/22/754

Now bio_add_page() can build lager bios.
And blk_queue_split() can split the bios in ->make_request() if needed.

Thanks.

>
> Please advise, thanks!
> Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 01/11] block: make generic_make_request handle arbitrarily sized bios

Alasdair G Kergon
On Tue, May 26, 2015 at 08:02:08AM -0700, Ming Lin wrote:
> Now bio_add_page() can build lager bios.
> And blk_queue_split() can split the bios in ->make_request() if needed.

But why not try to make the bio the right size in the first place so you
don't have to incur the performance impact of splitting?

What performance testing have you yet done to demonstrate the *actual* impact
of this patchset in situations where merge_bvec_fn is currently a net benefit?

Alasdair

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 01/11] block: make generic_make_request handle arbitrarily sized bios

Mike Snitzer
In reply to this post by Ming Lin-2
On Tue, May 26 2015 at 11:02am -0400,
Ming Lin <[hidden email]> wrote:

> On Tue, May 26, 2015 at 7:36 AM, Mike Snitzer <[hidden email]> wrote:
> > On Fri, May 22 2015 at  2:18pm -0400,
> > Ming Lin <[hidden email]> wrote:
> >
> >> From: Kent Overstreet <[hidden email]>
> >>
> >> The way the block layer is currently written, it goes to great lengths
> >> to avoid having to split bios; upper layer code (such as bio_add_page())
> >> checks what the underlying device can handle and tries to always create
> >> bios that don't need to be split.
> >>
> >> But this approach becomes unwieldy and eventually breaks down with
> >> stacked devices and devices with dynamic limits, and it adds a lot of
> >> complexity. If the block layer could split bios as needed, we could
> >> eliminate a lot of complexity elsewhere - particularly in stacked
> >> drivers. Code that creates bios can then create whatever size bios are
> >> convenient, and more importantly stacked drivers don't have to deal with
> >> both their own bio size limitations and the limitations of the
> >> (potentially multiple) devices underneath them.  In the future this will
> >> let us delete merge_bvec_fn and a bunch of other code.
> >
> > This series doesn't take any steps to train upper layers
> > (e.g. filesystems) to size their bios larger (which is defined as
> > "whatever size bios are convenient" above).
> >
> > bio_add_page(), and merge_bvec_fn, served as the means for upper layers
> > (and direct IO) to build up optimally sized bios.  Without a replacement
> > (that I can see anyway) how is this patchset making forward progress
> > (getting Acks, etc)!?
> >
> > I like the idea of reduced complexity associated with these late bio
> > splitting changes I'm just not seeing how this is ready given there are
> > no upper layer changes that speak to building larger bios..
> >
> > What am I missing?
>
> See: [PATCH v4 02/11] block: simplify bio_add_page()
> https://lkml.org/lkml/2015/5/22/754
>
> Now bio_add_page() can build lager bios.
> And blk_queue_split() can split the bios in ->make_request() if needed.

That'll result in quite large bios and always needing splitting.

As Alasdair asked: please provide some performance data that justifies
these changes.  E.g use a setup like: XFS on a DM striped target.  We
can iterate on more complex setups once we have established some basic
tests.

If you're just punting to reviewers to do the testing for you that isn't
going to instill _any_ confidence in me for this patchset as a suitabe
replacement relative to performance.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 01/11] block: make generic_make_request handle arbitrarily sized bios

Ming Lin-2
On Tue, May 26, 2015 at 9:04 AM, Mike Snitzer <[hidden email]> wrote:

> On Tue, May 26 2015 at 11:02am -0400,
> Ming Lin <[hidden email]> wrote:
>
>> On Tue, May 26, 2015 at 7:36 AM, Mike Snitzer <[hidden email]> wrote:
>> > On Fri, May 22 2015 at  2:18pm -0400,
>> > Ming Lin <[hidden email]> wrote:
>> >
>> >> From: Kent Overstreet <[hidden email]>
>> >>
>> >> The way the block layer is currently written, it goes to great lengths
>> >> to avoid having to split bios; upper layer code (such as bio_add_page())
>> >> checks what the underlying device can handle and tries to always create
>> >> bios that don't need to be split.
>> >>
>> >> But this approach becomes unwieldy and eventually breaks down with
>> >> stacked devices and devices with dynamic limits, and it adds a lot of
>> >> complexity. If the block layer could split bios as needed, we could
>> >> eliminate a lot of complexity elsewhere - particularly in stacked
>> >> drivers. Code that creates bios can then create whatever size bios are
>> >> convenient, and more importantly stacked drivers don't have to deal with
>> >> both their own bio size limitations and the limitations of the
>> >> (potentially multiple) devices underneath them.  In the future this will
>> >> let us delete merge_bvec_fn and a bunch of other code.
>> >
>> > This series doesn't take any steps to train upper layers
>> > (e.g. filesystems) to size their bios larger (which is defined as
>> > "whatever size bios are convenient" above).
>> >
>> > bio_add_page(), and merge_bvec_fn, served as the means for upper layers
>> > (and direct IO) to build up optimally sized bios.  Without a replacement
>> > (that I can see anyway) how is this patchset making forward progress
>> > (getting Acks, etc)!?
>> >
>> > I like the idea of reduced complexity associated with these late bio
>> > splitting changes I'm just not seeing how this is ready given there are
>> > no upper layer changes that speak to building larger bios..
>> >
>> > What am I missing?
>>
>> See: [PATCH v4 02/11] block: simplify bio_add_page()
>> https://lkml.org/lkml/2015/5/22/754
>>
>> Now bio_add_page() can build lager bios.
>> And blk_queue_split() can split the bios in ->make_request() if needed.
>
> That'll result in quite large bios and always needing splitting.
>
> As Alasdair asked: please provide some performance data that justifies
> these changes.  E.g use a setup like: XFS on a DM striped target.  We
> can iterate on more complex setups once we have established some basic
> tests.

I'll test XFS on DM and also what Christoph suggested:
https://lkml.org/lkml/2015/5/25/226

>
> If you're just punting to reviewers to do the testing for you that isn't
> going to instill _any_ confidence in me for this patchset as a suitabe
> replacement relative to performance.

Kent's Direct IO rewrite patch depends on this series.
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-dio-rewrite

I did test the dio patch on a 2 sockets(48 logical CPUs) server and
saw 40% improvement with 48 null_blks.
Here is the fio data of 4k read.

4.1-rc2
----------
Test 1: bw=50509MB/s, iops=12930K
Test 2: bw=49745MB/s, iops=12735K
Test 3: bw=50297MB/s, iops=12876K,
Average: bw=50183MB/s, iops=12847K

4.1-rc2-dio-rewrite
------------------------
Test 1: bw=70269MB/s, iops=17989K
Test 2: bw=70097MB/s, iops=17945K
Test 3: bw=70907MB/s, iops=18152K
Average: bw=70424MB/s, iops=18028K
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 06/11] md/raid5: get rid of bio_fits_rdev()

Ming Lin-2
In reply to this post by Ming Lin-2
On Tue, May 26, 2015 at 7:33 AM, Ming Lin <[hidden email]> wrote:

> On Mon, May 25, 2015 at 7:17 AM, Christoph Hellwig <[hidden email]> wrote:
>> On Mon, May 25, 2015 at 05:54:14PM +1000, NeilBrown wrote:
>>> Did I write that?  I guess I did :-(
>>> I meant *after*.   Don't get rid of bio_fits_rdev until split_bio is in
>>> chunk_aligned_read().
>>
>> I suspect the whole series could use some reordering.
>
> Nice reordering.
> I'll do this.

Here is the reordering.
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req

I'll post it if you are OK.

[PATCH 01/15] block: add blk_queue_split()
[PATCH 02/15] md: remove ->merge_bvec_fn
[PATCH 03/15] dm: remov merge functions
[PATCH 04/15] drbd: remove ->merge_bvec_fn
[PATCH 05/15] pktcdvd: remove ->merge_bvec_fn
[PATCH 06/15] rbd: remove ->merge_bvec_fn
[PATCH 07/15] bcache: remove driver private bio splitting code
[PATCH 08/15] btrfs: remove bio splitting and merge_bvec_fn() calls
[PATCH 09/15] block: call blk_queue_split() in make_request functions
[PATCH 10/15] block: kill ->merge_bvec_fn and simplify bio_add_page
[PATCH 11/15] block: remove split code in blkdev_issue_discard
[PATCH 12/15] md/raid5: get rid of bio_fits_rdev()
[PATCH 13/15] block: remove bio_get_nr_vecs()
[PATCH 14/15] fs: use helper bio_add_page() instead of open coding on
[PATCH 15/15] Documentation: update notes in biovecs about

>
> Thanks.
>
>>
>> patch 1:
>>
>>  add ->bio_split and blk_queue_split
>>
>> patch 2..n:
>>
>>  one for each non-trivial driver that implements ->merge_bvec_fn to
>>  remove it and instead split bios in ->make_request.  The md patch
>>  to do the right thing in chunk_aligned_read goes into the general
>>  md patch here.  The bcache patch also goes into this series.
>>
>> patch n+1:
>>
>>  - add blk_queue_split calls for remaining trivial drivers
>>
>> patch n+2:
>>
>>  - remove ->merge_bvec_fn and checking of max_sectors a for all
>>    drivers, simplify bio_add_page
>>
>> patch n+2:
>>
>>  - remove splitting in blkdev_issue_discard
>>
>> patch n+3
>>
>>  - remove bio_fits_rdev
>>
>> patch n+4
>>
>>  - remove bio_get_nr_vecs
>>
>> patch n+4
>>
>>  - use bio_add_page
>>
>> patch n+5
>>
>>  - update documentation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 06/11] md/raid5: get rid of bio_fits_rdev()

NeilBrown
On Tue, 26 May 2015 15:32:38 -0700 Ming Lin <[hidden email]> wrote:

> On Tue, May 26, 2015 at 7:33 AM, Ming Lin <[hidden email]> wrote:
> > On Mon, May 25, 2015 at 7:17 AM, Christoph Hellwig <[hidden email]> wrote:
> >> On Mon, May 25, 2015 at 05:54:14PM +1000, NeilBrown wrote:
> >>> Did I write that?  I guess I did :-(
> >>> I meant *after*.   Don't get rid of bio_fits_rdev until split_bio is in
> >>> chunk_aligned_read().
> >>
> >> I suspect the whole series could use some reordering.
> >
> > Nice reordering.
> > I'll do this.
>
> Here is the reordering.
> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
>
> I'll post it if you are OK.
>
> [PATCH 01/15] block: add blk_queue_split()
> [PATCH 02/15] md: remove ->merge_bvec_fn
> [PATCH 03/15] dm: remov merge functions
> [PATCH 04/15] drbd: remove ->merge_bvec_fn
> [PATCH 05/15] pktcdvd: remove ->merge_bvec_fn
> [PATCH 06/15] rbd: remove ->merge_bvec_fn
> [PATCH 07/15] bcache: remove driver private bio splitting code
> [PATCH 08/15] btrfs: remove bio splitting and merge_bvec_fn() calls
> [PATCH 09/15] block: call blk_queue_split() in make_request functions
> [PATCH 10/15] block: kill ->merge_bvec_fn and simplify bio_add_page
> [PATCH 11/15] block: remove split code in blkdev_issue_discard
> [PATCH 12/15] md/raid5: get rid of bio_fits_rdev()
> [PATCH 13/15] block: remove bio_get_nr_vecs()
> [PATCH 14/15] fs: use helper bio_add_page() instead of open coding on
> [PATCH 15/15] Documentation: update notes in biovecs about
The changes to dm.c and dm.h should be in the "dm:" patch, not "md:".

But I don't think the sequence is right.

You cannot remove ->merge_bvec_fn for *any* stacked device until *all* devices
make use of blk_queue_split() (or otherwise handle arbitrarily large bios).

I think it would be easiest to:
 - add blk_queue_split() and call it from common code before ->make_request_fn
   is called.  The ensure all devices can accept arbitrarily large bios.
 - driver-by-driver remove merge_bvec_fn and make sure the the driver can cope
   with arbitrary bios themselve, calling blk_queue_split in the make_request
   function only if needed
 - finally remove the call to blk_queue_split from the common code.

Does that make sense to others?

Thanks,
NeilBrown

>
> >
> > Thanks.
> >
> >>
> >> patch 1:
> >>
> >>  add ->bio_split and blk_queue_split
> >>
> >> patch 2..n:
> >>
> >>  one for each non-trivial driver that implements ->merge_bvec_fn to
> >>  remove it and instead split bios in ->make_request.  The md patch
> >>  to do the right thing in chunk_aligned_read goes into the general
> >>  md patch here.  The bcache patch also goes into this series.
> >>
> >> patch n+1:
> >>
> >>  - add blk_queue_split calls for remaining trivial drivers
> >>
> >> patch n+2:
> >>
> >>  - remove ->merge_bvec_fn and checking of max_sectors a for all
> >>    drivers, simplify bio_add_page
> >>
> >> patch n+2:
> >>
> >>  - remove splitting in blkdev_issue_discard
> >>
> >> patch n+3
> >>
> >>  - remove bio_fits_rdev
> >>
> >> patch n+4
> >>
> >>  - remove bio_get_nr_vecs
> >>
> >> patch n+4
> >>
> >>  - use bio_add_page
> >>
> >> patch n+5
> >>
> >>  - update documentation


attachment0 (828 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 01/11] block: make generic_make_request handle arbitrarily sized bios

NeilBrown
In reply to this post by Alasdair G Kergon
On Tue, 26 May 2015 16:34:14 +0100 Alasdair G Kergon <[hidden email]> wrote:

> On Tue, May 26, 2015 at 08:02:08AM -0700, Ming Lin wrote:
> > Now bio_add_page() can build lager bios.
> > And blk_queue_split() can split the bios in ->make_request() if needed.
>
> But why not try to make the bio the right size in the first place so you
> don't have to incur the performance impact of splitting?

Because we don't know what the "right" size is.  And the "right" size can
change when array reconfiguration happens.

Splitting has to happen somewhere, if only in bio_addpage where it decides to
create a new bio rather than add another page to the current one.  So moving
the split to a different level of the stack shouldn't necessarily change the
performance profile.

Obviously testing is important to confirm that.

NeilBrown

>
> What performance testing have you yet done to demonstrate the *actual* impact
> of this patchset in situations where merge_bvec_fn is currently a net benefit?
>
> Alasdair
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


attachment0 (828 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 06/11] md/raid5: get rid of bio_fits_rdev()

Ming Lin-2
In reply to this post by NeilBrown
On Tue, May 26, 2015 at 4:03 PM, NeilBrown <[hidden email]> wrote:

> On Tue, 26 May 2015 15:32:38 -0700 Ming Lin <[hidden email]> wrote:
>
>> On Tue, May 26, 2015 at 7:33 AM, Ming Lin <[hidden email]> wrote:
>> > On Mon, May 25, 2015 at 7:17 AM, Christoph Hellwig <[hidden email]> wrote:
>> >> On Mon, May 25, 2015 at 05:54:14PM +1000, NeilBrown wrote:
>> >>> Did I write that?  I guess I did :-(
>> >>> I meant *after*.   Don't get rid of bio_fits_rdev until split_bio is in
>> >>> chunk_aligned_read().
>> >>
>> >> I suspect the whole series could use some reordering.
>> >
>> > Nice reordering.
>> > I'll do this.
>>
>> Here is the reordering.
>> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
>>
>> I'll post it if you are OK.
>>
>> [PATCH 01/15] block: add blk_queue_split()
>> [PATCH 02/15] md: remove ->merge_bvec_fn
>> [PATCH 03/15] dm: remov merge functions
>> [PATCH 04/15] drbd: remove ->merge_bvec_fn
>> [PATCH 05/15] pktcdvd: remove ->merge_bvec_fn
>> [PATCH 06/15] rbd: remove ->merge_bvec_fn
>> [PATCH 07/15] bcache: remove driver private bio splitting code
>> [PATCH 08/15] btrfs: remove bio splitting and merge_bvec_fn() calls
>> [PATCH 09/15] block: call blk_queue_split() in make_request functions
>> [PATCH 10/15] block: kill ->merge_bvec_fn and simplify bio_add_page
>> [PATCH 11/15] block: remove split code in blkdev_issue_discard
>> [PATCH 12/15] md/raid5: get rid of bio_fits_rdev()
>> [PATCH 13/15] block: remove bio_get_nr_vecs()
>> [PATCH 14/15] fs: use helper bio_add_page() instead of open coding on
>> [PATCH 15/15] Documentation: update notes in biovecs about
>
> The changes to dm.c and dm.h should be in the "dm:" patch, not "md:".

Will move it.

>
> But I don't think the sequence is right.
>
> You cannot remove ->merge_bvec_fn for *any* stacked device until *all* devices
> make use of blk_queue_split() (or otherwise handle arbitrarily large bios).
>
> I think it would be easiest to:
>  - add blk_queue_split() and call it from common code before ->make_request_fn
>    is called.  The ensure all devices can accept arbitrarily large bios.

For "common code", do you mean "generic_make_request()"

diff --git a/block/blk-core.c b/block/blk-core.c
index fbbb337..bb6455b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1942,6 +1942,7 @@ void generic_make_request(struct bio *bio)
        do {
                struct request_queue *q = bdev_get_queue(bio->bi_bdev);

+               blk_queue_split(q, &bio, q->bio_split);
                q->make_request_fn(q, bio);

                bio = bio_list_pop(current->bio_list);

>  - driver-by-driver remove merge_bvec_fn and make sure the the driver can cope
>    with arbitrary bios themselve, calling blk_queue_split in the make_request
>    function only if needed
>  - finally remove the call to blk_queue_split from the common code.
>
> Does that make sense to others?
>
> Thanks,
> NeilBrown
>
>>
>> >
>> > Thanks.
>> >
>> >>
>> >> patch 1:
>> >>
>> >>  add ->bio_split and blk_queue_split
>> >>
>> >> patch 2..n:
>> >>
>> >>  one for each non-trivial driver that implements ->merge_bvec_fn to
>> >>  remove it and instead split bios in ->make_request.  The md patch
>> >>  to do the right thing in chunk_aligned_read goes into the general
>> >>  md patch here.  The bcache patch also goes into this series.
>> >>
>> >> patch n+1:
>> >>
>> >>  - add blk_queue_split calls for remaining trivial drivers
>> >>
>> >> patch n+2:
>> >>
>> >>  - remove ->merge_bvec_fn and checking of max_sectors a for all
>> >>    drivers, simplify bio_add_page
>> >>
>> >> patch n+2:
>> >>
>> >>  - remove splitting in blkdev_issue_discard
>> >>
>> >> patch n+3
>> >>
>> >>  - remove bio_fits_rdev
>> >>
>> >> patch n+4
>> >>
>> >>  - remove bio_get_nr_vecs
>> >>
>> >> patch n+4
>> >>
>> >>  - use bio_add_page
>> >>
>> >> patch n+5
>> >>
>> >>  - update documentation
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 06/11] md/raid5: get rid of bio_fits_rdev()

NeilBrown
On Tue, 26 May 2015 16:42:35 -0700 Ming Lin <[hidden email]> wrote:

> On Tue, May 26, 2015 at 4:03 PM, NeilBrown <[hidden email]> wrote:
> > On Tue, 26 May 2015 15:32:38 -0700 Ming Lin <[hidden email]> wrote:
> >
> >> On Tue, May 26, 2015 at 7:33 AM, Ming Lin <[hidden email]> wrote:
> >> > On Mon, May 25, 2015 at 7:17 AM, Christoph Hellwig <[hidden email]> wrote:
> >> >> On Mon, May 25, 2015 at 05:54:14PM +1000, NeilBrown wrote:
> >> >>> Did I write that?  I guess I did :-(
> >> >>> I meant *after*.   Don't get rid of bio_fits_rdev until split_bio is in
> >> >>> chunk_aligned_read().
> >> >>
> >> >> I suspect the whole series could use some reordering.
> >> >
> >> > Nice reordering.
> >> > I'll do this.
> >>
> >> Here is the reordering.
> >> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
> >>
> >> I'll post it if you are OK.
> >>
> >> [PATCH 01/15] block: add blk_queue_split()
> >> [PATCH 02/15] md: remove ->merge_bvec_fn
> >> [PATCH 03/15] dm: remov merge functions
> >> [PATCH 04/15] drbd: remove ->merge_bvec_fn
> >> [PATCH 05/15] pktcdvd: remove ->merge_bvec_fn
> >> [PATCH 06/15] rbd: remove ->merge_bvec_fn
> >> [PATCH 07/15] bcache: remove driver private bio splitting code
> >> [PATCH 08/15] btrfs: remove bio splitting and merge_bvec_fn() calls
> >> [PATCH 09/15] block: call blk_queue_split() in make_request functions
> >> [PATCH 10/15] block: kill ->merge_bvec_fn and simplify bio_add_page
> >> [PATCH 11/15] block: remove split code in blkdev_issue_discard
> >> [PATCH 12/15] md/raid5: get rid of bio_fits_rdev()
> >> [PATCH 13/15] block: remove bio_get_nr_vecs()
> >> [PATCH 14/15] fs: use helper bio_add_page() instead of open coding on
> >> [PATCH 15/15] Documentation: update notes in biovecs about
> >
> > The changes to dm.c and dm.h should be in the "dm:" patch, not "md:".
>
> Will move it.
>
> >
> > But I don't think the sequence is right.
> >
> > You cannot remove ->merge_bvec_fn for *any* stacked device until *all* devices
> > make use of blk_queue_split() (or otherwise handle arbitrarily large bios).
> >
> > I think it would be easiest to:
> >  - add blk_queue_split() and call it from common code before ->make_request_fn
> >    is called.  The ensure all devices can accept arbitrarily large bios.
>
> For "common code", do you mean "generic_make_request()"
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index fbbb337..bb6455b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1942,6 +1942,7 @@ void generic_make_request(struct bio *bio)
>         do {
>                 struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>
> +               blk_queue_split(q, &bio, q->bio_split);
>                 q->make_request_fn(q, bio);
>
>                 bio = bio_list_pop(current->bio_list);
Yes, that is what I mean (assuming that is the only place that calls
->make_request_fn).

Thanks,
NeilBrown



attachment0 (828 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [dm-devel] [PATCH v4 01/11] block: make generic_make_request handle arbitrarily sized bios

Alasdair G Kergon
In reply to this post by NeilBrown
On Wed, May 27, 2015 at 09:06:40AM +1000, Neil Brown wrote:
> Because we don't know what the "right" size is.  And the "right" size can
> change when array reconfiguration happens.
 
In certain configurations today, device-mapper does report back a sensible
maximum bio size smaller than would otherwise be used and thereby avoids
retrospective splitting.  (In tests, the overhead of the duplicate calculation
was found to be negligible so we never restructured the code to optimise it away.)

> Splitting has to happen somewhere, if only in bio_addpage where it decides to
> create a new bio rather than add another page to the current one.  So moving
> the split to a different level of the stack shouldn't necessarily change the
> performance profile.
 
It does sometimes make a significant difference to device-mapper stacks.
DM only uses it for performance reasons - it can already split bios when
it needs to.  I tried to remove merge_bvec_fn from DM several years ago but
couldn't because of the adverse performance impact of lots of splitting activity.

The overall cost of splitting ought to be less in many (but not necessarily
all) cases now as a result of all these patches, so exactly where the best
balance lies now needs to be reassessed empirically.  It is hard to reach
conclusions theoretically because of the complex interplay between the various
factors at different levels.

Alasdair

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v4 06/11] md/raid5: get rid of bio_fits_rdev()

Christoph Hellwig-2
In reply to this post by NeilBrown
On Wed, May 27, 2015 at 09:03:09AM +1000, NeilBrown wrote:

> But I don't think the sequence is right.
>
> You cannot remove ->merge_bvec_fn for *any* stacked device until *all* devices
> make use of blk_queue_split() (or otherwise handle arbitrarily large bios).
>
> I think it would be easiest to:
>  - add blk_queue_split() and call it from common code before ->make_request_fn
>    is called.  The ensure all devices can accept arbitrarily large bios.
>  - driver-by-driver remove merge_bvec_fn and make sure the the driver can cope
>    with arbitrary bios themselve, calling blk_queue_split in the make_request
>    function only if needed
>  - finally remove the call to blk_queue_split from the common code.
>
> Does that make sense to others?

Ok, sorry for leading in the wrong direction.  Because we stack
->merge_bvec_fn calls we do indeed need it until the end.

In that case I think it's better to just go back to something like the
original order and not split the patches up, everything else is just
getting too complicated unfrotunately.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [dm-devel] [PATCH v4 01/11] block: make generic_make_request handle arbitrarily sized bios

Christoph Hellwig-2
In reply to this post by Alasdair G Kergon
On Wed, May 27, 2015 at 01:40:22AM +0100, Alasdair G Kergon wrote:
> It does sometimes make a significant difference to device-mapper stacks.
> DM only uses it for performance reasons - it can already split bios when
> it needs to.  I tried to remove merge_bvec_fn from DM several years ago but
> couldn't because of the adverse performance impact of lots of splitting activity.

Does it still?  Since the move to immutable biovecs the bio splits are
pretty cheap now, but I'd really like to see this verified by benchmarks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
1234