bpf: use-after-free in array_map_alloc

bpf: use-after-free in array_map_alloc

Sasha Levin
Hi all,

I've hit the following while fuzzing with syzkaller inside a KVM tools guest
running the latest -next kernel:

[ 2590.845375] ==================================================================

[ 2590.845445] BUG: KASAN: use-after-free in pcpu_extend_area_map+0x8a/0x130 at addr ffff88035452a3cc

[ 2590.845457] Read of size 4 by task syz-executor/31307

[ 2590.845464] =============================================================================

[ 2590.845476] BUG kmalloc-128 (Tainted: G        W      ): kasan: bad access detected

[ 2590.845479] -----------------------------------------------------------------------------

[ 2590.845479]

[ 2590.845485] Disabling lock debugging due to kernel taint

[ 2590.845496] INFO: Allocated in 0xbbbbbbbbbbbbbbbb age=18446609615465671625 cpu=0 pid=0

[ 2590.845504] pcpu_mem_zalloc+0x7e/0xc0

[ 2590.845521] ___slab_alloc+0x7af/0x870

[ 2590.845528] __slab_alloc.isra.22+0xf4/0x130

[ 2590.845535] __kmalloc+0x1fe/0x340

[ 2590.845543] pcpu_mem_zalloc+0x7e/0xc0

[ 2590.845551] pcpu_create_chunk+0x79/0x600

[ 2590.845558] pcpu_alloc+0x5d4/0xe10

[ 2590.845567] __alloc_percpu_gfp+0x27/0x30

[ 2590.845582] array_map_alloc+0x595/0x710

[ 2590.845590] SyS_bpf+0x336/0xba0

[ 2590.845605] do_syscall_64+0x2a6/0x4a0

[ 2590.845639] return_from_SYSCALL_64+0x0/0x6a

[ 2590.845647] INFO: Freed in 0x10022ebb3 age=18446628393062689737 cpu=0 pid=0

[ 2590.845653] kvfree+0x45/0x50

[ 2590.845660] __slab_free+0x6a/0x2f0

[ 2590.845665] kfree+0x22c/0x270

[ 2590.845671] kvfree+0x45/0x50

[ 2590.845680] pcpu_balance_workfn+0x11a1/0x1280

[ 2590.845693] process_one_work+0x973/0x10b0

[ 2590.845700] worker_thread+0xcfd/0x1160

[ 2590.845708] kthread+0x2e7/0x300

[ 2590.845716] ret_from_fork+0x22/0x40

[ 2590.845724] INFO: Slab 0xffffea000d514a00 objects=35 used=33 fp=0xffff88035452b740 flags=0x2fffff80004080

[ 2590.845730] INFO: Object 0xffff88035452a3a0 @offset=9120 fp=0xbbbbbbbbbbbbbbbb

[ 2590.845730]

[ 2590.845743] Redzone ffff88035452a398: 00 00 00 00 00 00 00 00                          ........

[ 2590.845751] Object ffff88035452a3a0: bb bb bb bb bb bb bb bb 00 00 00 00 00 00 00 00  ................

[ 2590.845758] Object ffff88035452a3b0: b0 7b 17 3f 03 88 ff ff 00 00 08 00 00 00 08 00  .{.?............

[ 2590.845765] Object ffff88035452a3c0: 00 00 e0 f9 ff e8 ff ff 01 00 00 00 10 00 00 00  ................

[ 2590.845775] Object ffff88035452a3d0: 18 83 2c 3f 03 88 ff ff e0 ff ff ff 0f 00 00 00  ..,?............

[ 2590.845783] Object ffff88035452a3e0: e0 a3 52 54 03 88 ff ff e0 a3 52 54 03 88 ff ff  ..RT......RT....

[ 2590.845790] Object ffff88035452a3f0: 90 96 6b 9f ff ff ff ff e8 a7 13 3f 03 88 ff ff  ..k........?....

[ 2590.845797] Object ffff88035452a400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

[ 2590.845804] Object ffff88035452a410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

[ 2590.845811] Redzone ffff88035452a420: 00 00 00 00 00 00 00 00                          ........

[ 2590.845818] Padding ffff88035452a558: b5 eb 22 00 01 00 00 00                          ..".....

[ 2590.845833] CPU: 0 PID: 31307 Comm: syz-executor Tainted: G    B   W       4.6.0-rc3-next-20160412-sasha-00023-g0b02d6d-dirty #2998

[ 2590.845851]  0000000000000000 00000000a66f8039 ffff880354f8fa60 ffffffffa0fc9d01

[ 2590.845861]  ffffffff00000000 fffffbfff57ad2a0 0000000041b58ab3 ffffffffab65eee0

[ 2590.845870]  ffffffffa0fc9b88 00000000a66f8039 ffff88035e9d4000 ffffffffab67cede

[ 2590.845872] Call Trace:

[ 2590.845903] dump_stack (lib/dump_stack.c:53)
[ 2590.845939] print_trailer (mm/slub.c:668)
[ 2590.845948] object_err (mm/slub.c:675)
[ 2590.845958] kasan_report_error (mm/kasan/report.c:180 mm/kasan/report.c:276)
[ 2590.846007] __asan_report_load4_noabort (mm/kasan/report.c:318)
[ 2590.846028] pcpu_extend_area_map (mm/percpu.c:445)
[ 2590.846038] pcpu_alloc (mm/percpu.c:940)
[ 2590.846128] __alloc_percpu_gfp (mm/percpu.c:1068)
[ 2590.846140] array_map_alloc (kernel/bpf/arraymap.c:36 kernel/bpf/arraymap.c:99)
[ 2590.846150] SyS_bpf (kernel/bpf/syscall.c:35 kernel/bpf/syscall.c:183 kernel/bpf/syscall.c:830 kernel/bpf/syscall.c:787)
[ 2590.846203] do_syscall_64 (arch/x86/entry/common.c:350)
[ 2590.846214] entry_SYSCALL64_slow_path (arch/x86/entry/entry_64.S:251)
[ 2590.846217] Memory state around the buggy address:

[ 2590.846224]  ffff88035452a280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

[ 2590.846230]  ffff88035452a300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

[ 2590.846237] >ffff88035452a380: fc fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb

[ 2590.846240]                                               ^

[ 2590.846247]  ffff88035452a400: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc

[ 2590.846253]  ffff88035452a480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

[ 2590.846256] ==================================================================


Re: bpf: use-after-free in array_map_alloc

Alexei Starovoitov-2
On Sun, Apr 17, 2016 at 12:58:21PM -0400, Sasha Levin wrote:
> Hi all,
>
> I've hit the following while fuzzing with syzkaller inside a KVM tools guest
> running the latest -next kernel:

Thanks for the report. Adding Tejun...
If I read the report correctly, it's not about bpf but rather points to an
issue inside the percpu logic: first __alloc_percpu_gfp() is called, then
the memory is freed with free_percpu(), which triggers the async
pcpu_balance_work, and then pcpu_extend_area_map() hits the use-after-free.
I guess the bpf percpu array map is stressing this logic the most.
Any simpler steps to reproduce?

> [ full KASAN report snipped; identical to the one above ]

Re: bpf: use-after-free in array_map_alloc

Sasha Levin
On 04/17/2016 01:29 PM, Alexei Starovoitov wrote:

> On Sun, Apr 17, 2016 at 12:58:21PM -0400, Sasha Levin wrote:
>> > Hi all,
>> >
>> > I've hit the following while fuzzing with syzkaller inside a KVM tools guest
>> > running the latest -next kernel:
> thanks for the report. Adding Tejun...
> if I read the report correctly it's not about bpf, but rather points to
> the issue inside percpu logic.
> First __alloc_percpu_gfp() is called, then the memory is freed with
> free_percpu() which triggers async pcpu_balance_work and then
> pcpu_extend_area_map is hitting use-after-free.
> I guess bpf percpu array map is stressing this logic the most.
> Any simpler steps to reproduce ?

No simple way to reproduce. I blamed bpf because I saw a few traces, and
it was only bpf that was causing it; there was no other reasoning behind
it.


Thanks,
Sasha


Re: bpf: use-after-free in array_map_alloc

Vlastimil Babka
In reply to this post by Alexei Starovoitov-2
[+CC Christoph, linux-mm]

On 04/17/2016 07:29 PM, Alexei Starovoitov wrote:
> On Sun, Apr 17, 2016 at 12:58:21PM -0400, Sasha Levin wrote:
>> Hi all,
>>
>> I've hit the following while fuzzing with syzkaller inside a KVM tools guest
>> running the latest -next kernel:
>
> thanks for the report. Adding Tejun...

Looks like this report died, and meanwhile there's a CVE for it,
including a reproducer:

http://seclists.org/oss-sec/2016/q2/332

So maybe we should fix it now? :)

> if I read the report correctly it's not about bpf, but rather points to
> the issue inside percpu logic.
> First __alloc_percpu_gfp() is called, then the memory is freed with
> free_percpu() which triggers async pcpu_balance_work and then
> pcpu_extend_area_map is hitting use-after-free.
> I guess bpf percpu array map is stressing this logic the most.

I've been staring at it for a while (not knowing the code at all) and
the first thing that struck me is that pcpu_extend_area_map() is done
outside of pcpu_lock. So what prevents the chunk from being freed during
the extend?

> Any simpler steps to reproduce ?
>
>> [ KASAN report snipped; the interesting frame is the following ]
>> [ 2590.846028] pcpu_extend_area_map (mm/percpu.c:445)

This line is:

        if (new_alloc <= chunk->map_alloc)

i.e. the first time the function touches some part of the chunk object.


>> [ rest of trace snipped ]


Re: bpf: use-after-free in array_map_alloc

Vlastimil Babka
On 05/23/2016 02:01 PM, Vlastimil Babka wrote:

>> if I read the report correctly it's not about bpf, but rather points to
>> the issue inside percpu logic.
>> First __alloc_percpu_gfp() is called, then the memory is freed with
>> free_percpu() which triggers async pcpu_balance_work and then
>> pcpu_extend_area_map is hitting use-after-free.
>> I guess bpf percpu array map is stressing this logic the most.
>
> I've been staring at it for a while (not knowing the code at all) and
> the first thing that struck me is that pcpu_extend_area_map() is done
> outside of pcpu_lock. So what prevents the chunk from being freed during
> the extend?

Erm, to be precise: pcpu_lock is unlocked just before calling
pcpu_extend_area_map(), which relocks it after an allocation and assumes
the chunk still exists at that point. Unless I'm missing something, that's
an unlocked window where the chunk can be destroyed by the workfn, as the
report suggests?

Re: bpf: use-after-free in array_map_alloc

Tejun Heo-2
Hello,

Can you please test whether this patch resolves the issue?  While
adding support for atomic allocations, I reduced alloc_mutex covered
region too much.

Thanks.

diff --git a/mm/percpu.c b/mm/percpu.c
index 0c59684..bd2df70 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -162,7 +162,7 @@ static struct pcpu_chunk *pcpu_reserved_chunk;
 static int pcpu_reserved_chunk_limit;
 
 static DEFINE_SPINLOCK(pcpu_lock); /* all internal data structures */
-static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop */
+static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, map extension */
 
 static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
 
@@ -435,6 +435,8 @@ static int pcpu_extend_area_map(struct pcpu_chunk *chunk, int new_alloc)
  size_t old_size = 0, new_size = new_alloc * sizeof(new[0]);
  unsigned long flags;
 
+ lockdep_assert_held(&pcpu_alloc_mutex);
+
  new = pcpu_mem_zalloc(new_size);
  if (!new)
  return -ENOMEM;
@@ -895,6 +897,9 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
  return NULL;
  }
 
+ if (!is_atomic)
+ mutex_lock(&pcpu_alloc_mutex);
+
  spin_lock_irqsave(&pcpu_lock, flags);
 
  /* serve reserved allocations from the reserved chunk if available */
@@ -967,12 +972,11 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
  if (is_atomic)
  goto fail;
 
- mutex_lock(&pcpu_alloc_mutex);
+ lockdep_assert_held(&pcpu_alloc_mutex);
 
  if (list_empty(&pcpu_slot[pcpu_nr_slots - 1])) {
  chunk = pcpu_create_chunk();
  if (!chunk) {
- mutex_unlock(&pcpu_alloc_mutex);
  err = "failed to allocate new chunk";
  goto fail;
  }
@@ -983,7 +987,6 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
  spin_lock_irqsave(&pcpu_lock, flags);
  }
 
- mutex_unlock(&pcpu_alloc_mutex);
  goto restart;
 
 area_found:
@@ -993,8 +996,6 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
  if (!is_atomic) {
  int page_start, page_end, rs, re;
 
- mutex_lock(&pcpu_alloc_mutex);
-
  page_start = PFN_DOWN(off);
  page_end = PFN_UP(off + size);
 
@@ -1005,7 +1006,6 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 
  spin_lock_irqsave(&pcpu_lock, flags);
  if (ret) {
- mutex_unlock(&pcpu_alloc_mutex);
  pcpu_free_area(chunk, off, &occ_pages);
  err = "failed to populate";
  goto fail_unlock;
@@ -1045,6 +1045,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
  /* see the flag handling in pcpu_blance_workfn() */
  pcpu_atomic_alloc_failed = true;
  pcpu_schedule_balance_work();
+ } else {
+ mutex_unlock(&pcpu_alloc_mutex);
  }
  return NULL;
 }
@@ -1137,6 +1139,8 @@ static void pcpu_balance_workfn(struct work_struct *work)
  list_for_each_entry_safe(chunk, next, &to_free, list) {
  int rs, re;
 
+ cancel_work_sync(&chunk->map_extend_work);
+
  pcpu_for_each_pop_region(chunk, rs, re, 0, pcpu_unit_pages) {
  pcpu_depopulate_chunk(chunk, rs, re);
  spin_lock_irq(&pcpu_lock);

Re: bpf: use-after-free in array_map_alloc

Alexei Starovoitov-2
On Mon, May 23, 2016 at 05:35:01PM -0400, Tejun Heo wrote:
> Hello,
>
> Can you please test whether this patch resolves the issue?  While
> adding support for atomic allocations, I reduced alloc_mutex covered
> region too much.

after the patch the use-after-free is no longer seen.
Tested-by: Alexei Starovoitov <[hidden email]>

>
> Thanks.
>
> [ patch quoted in full; snipped ]

Re: bpf: use-after-free in array_map_alloc

Vlastimil Babka
In reply to this post by Tejun Heo-2
[+CC Marco who reported the CVE, forgot that earlier]

On 05/23/2016 11:35 PM, Tejun Heo wrote:
> Hello,
>
> Can you please test whether this patch resolves the issue?  While
> adding support for atomic allocations, I reduced alloc_mutex covered
> region too much.
>
> Thanks.

Ugh, this makes the code even more head-spinning than it was.

> diff --git a/mm/percpu.c b/mm/percpu.c
> index 0c59684..bd2df70 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -162,7 +162,7 @@ static struct pcpu_chunk *pcpu_reserved_chunk;
>   static int pcpu_reserved_chunk_limit;
>
>   static DEFINE_SPINLOCK(pcpu_lock); /* all internal data structures */
> -static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop */
> +static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, map extension */
>
>   static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
>
> @@ -435,6 +435,8 @@ static int pcpu_extend_area_map(struct pcpu_chunk *chunk, int new_alloc)
>   size_t old_size = 0, new_size = new_alloc * sizeof(new[0]);
>   unsigned long flags;
>
> + lockdep_assert_held(&pcpu_alloc_mutex);

I don't see where the mutex gets locked when called via
pcpu_map_extend_workfn? (except via the new cancel_work_sync() call below?)

Also what protects chunks with scheduled work items from being removed?

> +
>   new = pcpu_mem_zalloc(new_size);
>   if (!new)
>   return -ENOMEM;
> @@ -895,6 +897,9 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
>   return NULL;
>   }
>
> + if (!is_atomic)
> + mutex_lock(&pcpu_alloc_mutex);

BTW I noticed that
        bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;

this is too pessimistic IMHO. Reclaim is possible even without __GFP_FS
and __GFP_IO. Could you just use gfpflags_allow_blocking(gfp) here?

> +
>   spin_lock_irqsave(&pcpu_lock, flags);
>
>   /* serve reserved allocations from the reserved chunk if available */
> @@ -967,12 +972,11 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
>   if (is_atomic)
>   goto fail;
>
> - mutex_lock(&pcpu_alloc_mutex);
> + lockdep_assert_held(&pcpu_alloc_mutex);
>
>   if (list_empty(&pcpu_slot[pcpu_nr_slots - 1])) {
>   chunk = pcpu_create_chunk();
>   if (!chunk) {
> - mutex_unlock(&pcpu_alloc_mutex);
>   err = "failed to allocate new chunk";
>   goto fail;
>   }
> @@ -983,7 +987,6 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
>   spin_lock_irqsave(&pcpu_lock, flags);
>   }
>
> - mutex_unlock(&pcpu_alloc_mutex);
>   goto restart;
>
>   area_found:
> @@ -993,8 +996,6 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
>   if (!is_atomic) {
>   int page_start, page_end, rs, re;
>
> - mutex_lock(&pcpu_alloc_mutex);
> -
>   page_start = PFN_DOWN(off);
>   page_end = PFN_UP(off + size);
>
> @@ -1005,7 +1006,6 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
>
>   spin_lock_irqsave(&pcpu_lock, flags);
>   if (ret) {
> - mutex_unlock(&pcpu_alloc_mutex);
>   pcpu_free_area(chunk, off, &occ_pages);
>   err = "failed to populate";
>   goto fail_unlock;
> @@ -1045,6 +1045,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
>   /* see the flag handling in pcpu_blance_workfn() */
>   pcpu_atomic_alloc_failed = true;
>   pcpu_schedule_balance_work();
> + } else {
> + mutex_unlock(&pcpu_alloc_mutex);
>   }
>   return NULL;
>   }
> @@ -1137,6 +1139,8 @@ static void pcpu_balance_workfn(struct work_struct *work)
>   list_for_each_entry_safe(chunk, next, &to_free, list) {
>   int rs, re;
>
> + cancel_work_sync(&chunk->map_extend_work);

This deserves some comment?

> +
>   pcpu_for_each_pop_region(chunk, rs, re, 0, pcpu_unit_pages) {
>   pcpu_depopulate_chunk(chunk, rs, re);
>   spin_lock_irq(&pcpu_lock);
>


Re: bpf: use-after-free in array_map_alloc

Tejun Heo-2
Hello,

On Tue, May 24, 2016 at 10:40:54AM +0200, Vlastimil Babka wrote:

> [+CC Marco who reported the CVE, forgot that earlier]
>
> On 05/23/2016 11:35 PM, Tejun Heo wrote:
> > Hello,
> >
> > Can you please test whether this patch resolves the issue?  While
> > adding support for atomic allocations, I reduced alloc_mutex covered
> > region too much.
> >
> > Thanks.
>
> Ugh, this makes the code even more head-spinning than it was.

Locking-wise, it isn't complicated.  It used to be a single mutex
protecting everything.  Atomic alloc support required putting core
allocation parts under spinlock.  It is messy because the two paths
are mixed in the same function.  If we break out the core part to a
separate function and let the sleepable path call into that, it should
look okay, but that's for another patch.

Also, I think protecting chunk's lifetime w/ alloc_mutex is making it
a bit nasty.  Maybe we should do per-chunk "extending" completion and
let pcpu_alloc_mutex just protect populating chunks.

> > @@ -435,6 +435,8 @@ static int pcpu_extend_area_map(struct pcpu_chunk *chunk, int new_alloc)
> >   size_t old_size = 0, new_size = new_alloc * sizeof(new[0]);
> >   unsigned long flags;
> >
> > + lockdep_assert_held(&pcpu_alloc_mutex);
>
> I don't see where the mutex gets locked when called via
> pcpu_map_extend_workfn? (except via the new cancel_work_sync() call below?)

Ah, right.

> Also what protects chunks with scheduled work items from being removed?

cancel_work_sync(), which now obviously should be called outside
alloc_mutex.

> > @@ -895,6 +897,9 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
> >   return NULL;
> >   }
> >
> > + if (!is_atomic)
> > + mutex_lock(&pcpu_alloc_mutex);
>
> BTW I noticed that
> bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
>
> this is too pessimistic IMHO. Reclaim is possible even without __GFP_FS and
> __GFP_IO. Could you just use gfpflags_allow_blocking(gfp) here?

vmalloc hardcodes GFP_KERNEL, so getting more relaxed doesn't buy us
much.

Thanks.

--
tejun

Re: bpf: use-after-free in array_map_alloc

Tejun Heo-2
Hello,

Alexei, can you please verify this patch?  Map extension got rolled
into balance work so that there are no sync issues between the two
async operations.

Thanks.

Index: work/mm/percpu.c
===================================================================
--- work.orig/mm/percpu.c
+++ work/mm/percpu.c
@@ -112,7 +112,7 @@ struct pcpu_chunk {
  int map_used; /* # of map entries used before the sentry */
  int map_alloc; /* # of map entries allocated */
  int *map; /* allocation map */
- struct work_struct map_extend_work;/* async ->map[] extension */
+ struct list_head map_extend_list;/* on pcpu_map_extend_chunks */
 
  void *data; /* chunk data */
  int first_free; /* no free below this */
@@ -162,10 +162,13 @@ static struct pcpu_chunk *pcpu_reserved_
 static int pcpu_reserved_chunk_limit;
 
 static DEFINE_SPINLOCK(pcpu_lock); /* all internal data structures */
-static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop */
+static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, map ext */
 
 static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
 
+/* chunks which need their map areas extended, protected by pcpu_lock */
+static LIST_HEAD(pcpu_map_extend_chunks);
+
 /*
  * The number of empty populated pages, protected by pcpu_lock.  The
  * reserved chunk doesn't contribute to the count.
@@ -395,13 +398,19 @@ static int pcpu_need_to_extend(struct pc
 {
  int margin, new_alloc;
 
+ lockdep_assert_held(&pcpu_lock);
+
  if (is_atomic) {
  margin = 3;
 
  if (chunk->map_alloc <
-    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
-    pcpu_async_enabled)
- schedule_work(&chunk->map_extend_work);
+    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
+ if (list_empty(&chunk->map_extend_list)) {
+ list_add_tail(&chunk->map_extend_list,
+      &pcpu_map_extend_chunks);
+ pcpu_schedule_balance_work();
+ }
+ }
  } else {
  margin = PCPU_ATOMIC_MAP_MARGIN_HIGH;
  }
@@ -435,6 +444,8 @@ static int pcpu_extend_area_map(struct p
  size_t old_size = 0, new_size = new_alloc * sizeof(new[0]);
  unsigned long flags;
 
+ lockdep_assert_held(&pcpu_alloc_mutex);
+
  new = pcpu_mem_zalloc(new_size);
  if (!new)
  return -ENOMEM;
@@ -467,20 +478,6 @@ out_unlock:
  return 0;
 }
 
-static void pcpu_map_extend_workfn(struct work_struct *work)
-{
- struct pcpu_chunk *chunk = container_of(work, struct pcpu_chunk,
- map_extend_work);
- int new_alloc;
-
- spin_lock_irq(&pcpu_lock);
- new_alloc = pcpu_need_to_extend(chunk, false);
- spin_unlock_irq(&pcpu_lock);
-
- if (new_alloc)
- pcpu_extend_area_map(chunk, new_alloc);
-}
-
 /**
  * pcpu_fit_in_area - try to fit the requested allocation in a candidate area
  * @chunk: chunk the candidate area belongs to
@@ -740,7 +737,7 @@ static struct pcpu_chunk *pcpu_alloc_chu
  chunk->map_used = 1;
 
  INIT_LIST_HEAD(&chunk->list);
- INIT_WORK(&chunk->map_extend_work, pcpu_map_extend_workfn);
+ INIT_LIST_HEAD(&chunk->map_extend_list);
  chunk->free_size = pcpu_unit_size;
  chunk->contig_hint = pcpu_unit_size;
 
@@ -895,6 +892,9 @@ static void __percpu *pcpu_alloc(size_t
  return NULL;
  }
 
+ if (!is_atomic)
+ mutex_lock(&pcpu_alloc_mutex);
+
  spin_lock_irqsave(&pcpu_lock, flags);
 
  /* serve reserved allocations from the reserved chunk if available */
@@ -967,12 +967,9 @@ restart:
  if (is_atomic)
  goto fail;
 
- mutex_lock(&pcpu_alloc_mutex);
-
  if (list_empty(&pcpu_slot[pcpu_nr_slots - 1])) {
  chunk = pcpu_create_chunk();
  if (!chunk) {
- mutex_unlock(&pcpu_alloc_mutex);
  err = "failed to allocate new chunk";
  goto fail;
  }
@@ -983,7 +980,6 @@ restart:
  spin_lock_irqsave(&pcpu_lock, flags);
  }
 
- mutex_unlock(&pcpu_alloc_mutex);
  goto restart;
 
 area_found:
@@ -993,8 +989,6 @@ area_found:
  if (!is_atomic) {
  int page_start, page_end, rs, re;
 
- mutex_lock(&pcpu_alloc_mutex);
-
  page_start = PFN_DOWN(off);
  page_end = PFN_UP(off + size);
 
@@ -1005,7 +999,6 @@ area_found:
 
  spin_lock_irqsave(&pcpu_lock, flags);
  if (ret) {
- mutex_unlock(&pcpu_alloc_mutex);
  pcpu_free_area(chunk, off, &occ_pages);
  err = "failed to populate";
  goto fail_unlock;
@@ -1045,6 +1038,8 @@ fail:
  /* see the flag handling in pcpu_blance_workfn() */
  pcpu_atomic_alloc_failed = true;
  pcpu_schedule_balance_work();
+ } else {
+ mutex_unlock(&pcpu_alloc_mutex);
  }
  return NULL;
 }
@@ -1129,6 +1124,7 @@ static void pcpu_balance_workfn(struct w
  if (chunk == list_first_entry(free_head, struct pcpu_chunk, list))
  continue;
 
+ list_del_init(&chunk->map_extend_list);
  list_move(&chunk->list, &to_free);
  }
 
@@ -1146,6 +1142,24 @@ static void pcpu_balance_workfn(struct w
  pcpu_destroy_chunk(chunk);
  }
 
+ do {
+ int new_alloc = 0;
+
+ spin_lock_irq(&pcpu_lock);
+
+ chunk = list_first_entry_or_null(&pcpu_map_extend_chunks,
+ struct pcpu_chunk, map_extend_list);
+ if (chunk) {
+ list_del_init(&chunk->map_extend_list);
+ new_alloc = pcpu_need_to_extend(chunk, false);
+ }
+
+ spin_unlock_irq(&pcpu_lock);
+
+ if (new_alloc)
+ pcpu_extend_area_map(chunk, new_alloc);
+ } while (chunk);
+
  /*
  * Ensure there are certain number of free populated pages for
  * atomic allocs.  Fill up from the most packed so that atomic
@@ -1644,7 +1658,7 @@ int __init pcpu_setup_first_chunk(const
  */
  schunk = memblock_virt_alloc(pcpu_chunk_struct_size, 0);
  INIT_LIST_HEAD(&schunk->list);
- INIT_WORK(&schunk->map_extend_work, pcpu_map_extend_workfn);
+ INIT_LIST_HEAD(&schunk->map_extend_list);
  schunk->base_addr = base_addr;
  schunk->map = smap;
  schunk->map_alloc = ARRAY_SIZE(smap);
@@ -1673,7 +1687,7 @@ int __init pcpu_setup_first_chunk(const
  if (dyn_size) {
  dchunk = memblock_virt_alloc(pcpu_chunk_struct_size, 0);
  INIT_LIST_HEAD(&dchunk->list);
- INIT_WORK(&dchunk->map_extend_work, pcpu_map_extend_workfn);
+ INIT_LIST_HEAD(&dchunk->map_extend_list);
  dchunk->base_addr = base_addr;
  dchunk->map = dmap;
  dchunk->map_alloc = ARRAY_SIZE(dmap);

Re: bpf: use-after-free in array_map_alloc

Alexei Starovoitov-2
On Tue, May 24, 2016 at 12:04 PM, Tejun Heo <[hidden email]> wrote:
> Hello,
>
> Alexei, can you please verify this patch?  Map extension got rolled
> into balance work so that there's no sync issues between the two async
> operations.

Tests look good. No UAF, and the basic bpf tests that exercise the per-cpu map pass fine.

>
> Thanks.
>
> Index: work/mm/percpu.c
> ===================================================================
> --- work.orig/mm/percpu.c
> +++ work/mm/percpu.c
> @@ -112,7 +112,7 @@ struct pcpu_chunk {
>         int                     map_used;       /* # of map entries used before the sentry */
>         int                     map_alloc;      /* # of map entries allocated */
>         int                     *map;           /* allocation map */
> -       struct work_struct      map_extend_work;/* async ->map[] extension */
> +       struct list_head        map_extend_list;/* on pcpu_map_extend_chunks */
>
>         void                    *data;          /* chunk data */
>         int                     first_free;     /* no free below this */
> @@ -162,10 +162,13 @@ static struct pcpu_chunk *pcpu_reserved_
>  static int pcpu_reserved_chunk_limit;
>
>  static DEFINE_SPINLOCK(pcpu_lock);     /* all internal data structures */
> -static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop */
> +static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, map ext */
>
>  static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
>
> +/* chunks which need their map areas extended, protected by pcpu_lock */
> +static LIST_HEAD(pcpu_map_extend_chunks);
> +
>  /*
>   * The number of empty populated pages, protected by pcpu_lock.  The
>   * reserved chunk doesn't contribute to the count.
> @@ -395,13 +398,19 @@ static int pcpu_need_to_extend(struct pc
>  {
>         int margin, new_alloc;
>
> +       lockdep_assert_held(&pcpu_lock);
> +
>         if (is_atomic) {
>                 margin = 3;
>
>                 if (chunk->map_alloc <
> -                   chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
> -                   pcpu_async_enabled)
> -                       schedule_work(&chunk->map_extend_work);
> +                   chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
> +                       if (list_empty(&chunk->map_extend_list)) {
> +                               list_add_tail(&chunk->map_extend_list,
> +                                             &pcpu_map_extend_chunks);
> +                               pcpu_schedule_balance_work();
> +                       }
> +               }
>         } else {
>                 margin = PCPU_ATOMIC_MAP_MARGIN_HIGH;
>         }
> @@ -435,6 +444,8 @@ static int pcpu_extend_area_map(struct p
>         size_t old_size = 0, new_size = new_alloc * sizeof(new[0]);
>         unsigned long flags;
>
> +       lockdep_assert_held(&pcpu_alloc_mutex);
> +
>         new = pcpu_mem_zalloc(new_size);
>         if (!new)
>                 return -ENOMEM;
> @@ -467,20 +478,6 @@ out_unlock:
>         return 0;
>  }
>
> -static void pcpu_map_extend_workfn(struct work_struct *work)
> -{
> -       struct pcpu_chunk *chunk = container_of(work, struct pcpu_chunk,
> -                                               map_extend_work);
> -       int new_alloc;
> -
> -       spin_lock_irq(&pcpu_lock);
> -       new_alloc = pcpu_need_to_extend(chunk, false);
> -       spin_unlock_irq(&pcpu_lock);
> -
> -       if (new_alloc)
> -               pcpu_extend_area_map(chunk, new_alloc);
> -}
> -
>  /**
>   * pcpu_fit_in_area - try to fit the requested allocation in a candidate area
>   * @chunk: chunk the candidate area belongs to
> @@ -740,7 +737,7 @@ static struct pcpu_chunk *pcpu_alloc_chu
>         chunk->map_used = 1;
>
>         INIT_LIST_HEAD(&chunk->list);
> -       INIT_WORK(&chunk->map_extend_work, pcpu_map_extend_workfn);
> +       INIT_LIST_HEAD(&chunk->map_extend_list);
>         chunk->free_size = pcpu_unit_size;
>         chunk->contig_hint = pcpu_unit_size;
>
> @@ -895,6 +892,9 @@ static void __percpu *pcpu_alloc(size_t
>                 return NULL;
>         }
>
> +       if (!is_atomic)
> +               mutex_lock(&pcpu_alloc_mutex);
> +
>         spin_lock_irqsave(&pcpu_lock, flags);
>
>         /* serve reserved allocations from the reserved chunk if available */
> @@ -967,12 +967,9 @@ restart:
>         if (is_atomic)
>                 goto fail;
>
> -       mutex_lock(&pcpu_alloc_mutex);
> -
>         if (list_empty(&pcpu_slot[pcpu_nr_slots - 1])) {
>                 chunk = pcpu_create_chunk();
>                 if (!chunk) {
> -                       mutex_unlock(&pcpu_alloc_mutex);
>                         err = "failed to allocate new chunk";
>                         goto fail;
>                 }
> @@ -983,7 +980,6 @@ restart:
>                 spin_lock_irqsave(&pcpu_lock, flags);
>         }
>
> -       mutex_unlock(&pcpu_alloc_mutex);
>         goto restart;
>
>  area_found:
> @@ -993,8 +989,6 @@ area_found:
>         if (!is_atomic) {
>                 int page_start, page_end, rs, re;
>
> -               mutex_lock(&pcpu_alloc_mutex);
> -
>                 page_start = PFN_DOWN(off);
>                 page_end = PFN_UP(off + size);
>
> @@ -1005,7 +999,6 @@ area_found:
>
>                         spin_lock_irqsave(&pcpu_lock, flags);
>                         if (ret) {
> -                               mutex_unlock(&pcpu_alloc_mutex);
>                                 pcpu_free_area(chunk, off, &occ_pages);
>                                 err = "failed to populate";
>                                 goto fail_unlock;
> @@ -1045,6 +1038,8 @@ fail:
>                 /* see the flag handling in pcpu_blance_workfn() */
>                 pcpu_atomic_alloc_failed = true;
>                 pcpu_schedule_balance_work();
> +       } else {
> +               mutex_unlock(&pcpu_alloc_mutex);
>         }
>         return NULL;
>  }
> @@ -1129,6 +1124,7 @@ static void pcpu_balance_workfn(struct w
>                 if (chunk == list_first_entry(free_head, struct pcpu_chunk, list))
>                         continue;
>
> +               list_del_init(&chunk->map_extend_list);
>                 list_move(&chunk->list, &to_free);
>         }
>
> @@ -1146,6 +1142,24 @@ static void pcpu_balance_workfn(struct w
>                 pcpu_destroy_chunk(chunk);
>         }
>
> +       do {
> +               int new_alloc = 0;
> +
> +               spin_lock_irq(&pcpu_lock);
> +
> +               chunk = list_first_entry_or_null(&pcpu_map_extend_chunks,
> +                                       struct pcpu_chunk, map_extend_list);
> +               if (chunk) {
> +                       list_del_init(&chunk->map_extend_list);
> +                       new_alloc = pcpu_need_to_extend(chunk, false);
> +               }
> +
> +               spin_unlock_irq(&pcpu_lock);
> +
> +               if (new_alloc)
> +                       pcpu_extend_area_map(chunk, new_alloc);
> +       } while (chunk);
> +
>         /*
>          * Ensure there are certain number of free populated pages for
>          * atomic allocs.  Fill up from the most packed so that atomic
> @@ -1644,7 +1658,7 @@ int __init pcpu_setup_first_chunk(const
>          */
>         schunk = memblock_virt_alloc(pcpu_chunk_struct_size, 0);
>         INIT_LIST_HEAD(&schunk->list);
> -       INIT_WORK(&schunk->map_extend_work, pcpu_map_extend_workfn);
> +       INIT_LIST_HEAD(&schunk->map_extend_list);
>         schunk->base_addr = base_addr;
>         schunk->map = smap;
>         schunk->map_alloc = ARRAY_SIZE(smap);
> @@ -1673,7 +1687,7 @@ int __init pcpu_setup_first_chunk(const
>         if (dyn_size) {
>                 dchunk = memblock_virt_alloc(pcpu_chunk_struct_size, 0);
>                 INIT_LIST_HEAD(&dchunk->list);
> -               INIT_WORK(&dchunk->map_extend_work, pcpu_map_extend_workfn);
> +               INIT_LIST_HEAD(&dchunk->map_extend_list);
>                 dchunk->base_addr = base_addr;
>                 dchunk->map = dmap;
>                 dchunk->map_alloc = ARRAY_SIZE(dmap);

[PATCH percpu/for-4.7-fixes 1/2] percpu: fix synchronization between chunk->map_extend_work and chunk destruction

Tejun Heo-2
Atomic allocations can trigger async map extensions which are serviced
by chunk->map_extend_work.  pcpu_balance_work, which is responsible for
destroying idle chunks, wasn't synchronizing properly against
chunk->map_extend_work and could end up freeing the chunk while the work
item was still in flight.

This patch fixes the bug by rolling async map extension operations
into pcpu_balance_work.

Signed-off-by: Tejun Heo <[hidden email]>
Reported-and-tested-by: Alexei Starovoitov <[hidden email]>
Reported-by: Vlastimil Babka <[hidden email]>
Reported-by: Sasha Levin <[hidden email]>
Cc: [hidden email] # v3.18+
Fixes: 9c824b6a172c ("percpu: make sure chunk->map array has available space")
---
 mm/percpu.c |   57 ++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 21 deletions(-)

--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -112,7 +112,7 @@ struct pcpu_chunk {
  int map_used; /* # of map entries used before the sentry */
  int map_alloc; /* # of map entries allocated */
  int *map; /* allocation map */
- struct work_struct map_extend_work;/* async ->map[] extension */
+ struct list_head map_extend_list;/* on pcpu_map_extend_chunks */
 
  void *data; /* chunk data */
  int first_free; /* no free below this */
@@ -166,6 +166,9 @@ static DEFINE_MUTEX(pcpu_alloc_mutex); /
 
 static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
 
+/* chunks which need their map areas extended, protected by pcpu_lock */
+static LIST_HEAD(pcpu_map_extend_chunks);
+
 /*
  * The number of empty populated pages, protected by pcpu_lock.  The
  * reserved chunk doesn't contribute to the count.
@@ -395,13 +398,19 @@ static int pcpu_need_to_extend(struct pc
 {
  int margin, new_alloc;
 
+ lockdep_assert_held(&pcpu_lock);
+
  if (is_atomic) {
  margin = 3;
 
  if (chunk->map_alloc <
-    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
-    pcpu_async_enabled)
- schedule_work(&chunk->map_extend_work);
+    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
+ if (list_empty(&chunk->map_extend_list)) {
+ list_add_tail(&chunk->map_extend_list,
+      &pcpu_map_extend_chunks);
+ pcpu_schedule_balance_work();
+ }
+ }
  } else {
  margin = PCPU_ATOMIC_MAP_MARGIN_HIGH;
  }
@@ -467,20 +476,6 @@ out_unlock:
  return 0;
 }
 
-static void pcpu_map_extend_workfn(struct work_struct *work)
-{
- struct pcpu_chunk *chunk = container_of(work, struct pcpu_chunk,
- map_extend_work);
- int new_alloc;
-
- spin_lock_irq(&pcpu_lock);
- new_alloc = pcpu_need_to_extend(chunk, false);
- spin_unlock_irq(&pcpu_lock);
-
- if (new_alloc)
- pcpu_extend_area_map(chunk, new_alloc);
-}
-
 /**
  * pcpu_fit_in_area - try to fit the requested allocation in a candidate area
  * @chunk: chunk the candidate area belongs to
@@ -740,7 +735,7 @@ static struct pcpu_chunk *pcpu_alloc_chu
  chunk->map_used = 1;
 
  INIT_LIST_HEAD(&chunk->list);
- INIT_WORK(&chunk->map_extend_work, pcpu_map_extend_workfn);
+ INIT_LIST_HEAD(&chunk->map_extend_list);
  chunk->free_size = pcpu_unit_size;
  chunk->contig_hint = pcpu_unit_size;
 
@@ -1129,6 +1124,7 @@ static void pcpu_balance_workfn(struct w
  if (chunk == list_first_entry(free_head, struct pcpu_chunk, list))
  continue;
 
+ list_del_init(&chunk->map_extend_list);
  list_move(&chunk->list, &to_free);
  }
 
@@ -1146,6 +1142,25 @@ static void pcpu_balance_workfn(struct w
  pcpu_destroy_chunk(chunk);
  }
 
+ /* service chunks which requested async area map extension */
+ do {
+ int new_alloc = 0;
+
+ spin_lock_irq(&pcpu_lock);
+
+ chunk = list_first_entry_or_null(&pcpu_map_extend_chunks,
+ struct pcpu_chunk, map_extend_list);
+ if (chunk) {
+ list_del_init(&chunk->map_extend_list);
+ new_alloc = pcpu_need_to_extend(chunk, false);
+ }
+
+ spin_unlock_irq(&pcpu_lock);
+
+ if (new_alloc)
+ pcpu_extend_area_map(chunk, new_alloc);
+ } while (chunk);
+
  /*
  * Ensure there are certain number of free populated pages for
  * atomic allocs.  Fill up from the most packed so that atomic
@@ -1644,7 +1659,7 @@ int __init pcpu_setup_first_chunk(const
  */
  schunk = memblock_virt_alloc(pcpu_chunk_struct_size, 0);
  INIT_LIST_HEAD(&schunk->list);
- INIT_WORK(&schunk->map_extend_work, pcpu_map_extend_workfn);
+ INIT_LIST_HEAD(&schunk->map_extend_list);
  schunk->base_addr = base_addr;
  schunk->map = smap;
  schunk->map_alloc = ARRAY_SIZE(smap);
@@ -1673,7 +1688,7 @@ int __init pcpu_setup_first_chunk(const
  if (dyn_size) {
  dchunk = memblock_virt_alloc(pcpu_chunk_struct_size, 0);
  INIT_LIST_HEAD(&dchunk->list);
- INIT_WORK(&dchunk->map_extend_work, pcpu_map_extend_workfn);
+ INIT_LIST_HEAD(&dchunk->map_extend_list);
  dchunk->base_addr = base_addr;
  dchunk->map = dmap;
  dchunk->map_alloc = ARRAY_SIZE(dmap);

[PATCH percpu/for-4.7-fixes 2/2] percpu: fix synchronization between synchronous map extension and chunk destruction

Tejun Heo-2
In reply to this post by Alexei Starovoitov-2
For non-atomic allocations, pcpu_alloc() can try to extend the area
map synchronously after dropping pcpu_lock; however, the extension
wasn't synchronized against chunk destruction and the chunk might get
freed while extension is in progress.

This patch fixes the bug by putting most of non-atomic allocations
under pcpu_alloc_mutex to synchronize against pcpu_balance_work which
is responsible for async chunk management including destruction.

Signed-off-by: Tejun Heo <[hidden email]>
Reported-and-tested-by: Alexei Starovoitov <[hidden email]>
Reported-by: Vlastimil Babka <[hidden email]>
Reported-by: Sasha Levin <[hidden email]>
Cc: [hidden email] # v3.18+
Fixes: 1a4d76076cda ("percpu: implement asynchronous chunk population")
---
Hello,

I'll send both patches mainline in a couple days through the percpu
tree.

Thanks.

 mm/percpu.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -162,7 +162,7 @@ static struct pcpu_chunk *pcpu_reserved_
 static int pcpu_reserved_chunk_limit;
 
 static DEFINE_SPINLOCK(pcpu_lock); /* all internal data structures */
-static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop */
+static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, map ext */
 
 static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
 
@@ -444,6 +444,8 @@ static int pcpu_extend_area_map(struct p
  size_t old_size = 0, new_size = new_alloc * sizeof(new[0]);
  unsigned long flags;
 
+ lockdep_assert_held(&pcpu_alloc_mutex);
+
  new = pcpu_mem_zalloc(new_size);
  if (!new)
  return -ENOMEM;
@@ -890,6 +892,9 @@ static void __percpu *pcpu_alloc(size_t
  return NULL;
  }
 
+ if (!is_atomic)
+ mutex_lock(&pcpu_alloc_mutex);
+
  spin_lock_irqsave(&pcpu_lock, flags);
 
  /* serve reserved allocations from the reserved chunk if available */
@@ -962,12 +967,9 @@ restart:
  if (is_atomic)
  goto fail;
 
- mutex_lock(&pcpu_alloc_mutex);
-
  if (list_empty(&pcpu_slot[pcpu_nr_slots - 1])) {
  chunk = pcpu_create_chunk();
  if (!chunk) {
- mutex_unlock(&pcpu_alloc_mutex);
  err = "failed to allocate new chunk";
  goto fail;
  }
@@ -978,7 +980,6 @@ restart:
  spin_lock_irqsave(&pcpu_lock, flags);
  }
 
- mutex_unlock(&pcpu_alloc_mutex);
  goto restart;
 
 area_found:
@@ -988,8 +989,6 @@ area_found:
  if (!is_atomic) {
  int page_start, page_end, rs, re;
 
- mutex_lock(&pcpu_alloc_mutex);
-
  page_start = PFN_DOWN(off);
  page_end = PFN_UP(off + size);
 
@@ -1000,7 +999,6 @@ area_found:
 
  spin_lock_irqsave(&pcpu_lock, flags);
  if (ret) {
- mutex_unlock(&pcpu_alloc_mutex);
  pcpu_free_area(chunk, off, &occ_pages);
  err = "failed to populate";
  goto fail_unlock;
@@ -1040,6 +1038,8 @@ fail:
  /* see the flag handling in pcpu_blance_workfn() */
  pcpu_atomic_alloc_failed = true;
  pcpu_schedule_balance_work();
+ } else {
+ mutex_unlock(&pcpu_alloc_mutex);
  }
  return NULL;
 }

Re: [PATCH percpu/for-4.7-fixes 1/2] percpu: fix synchronization between chunk->map_extend_work and chunk destruction

Vlastimil Babka
In reply to this post by Tejun Heo-2
On 05/25/2016 05:44 PM, Tejun Heo wrote:

> Atomic allocations can trigger async map extensions which are serviced
> by chunk->map_extend_work.  pcpu_balance_work, which is responsible for
> destroying idle chunks, wasn't synchronizing properly against
> chunk->map_extend_work and could end up freeing the chunk while the work
> item was still in flight.
>
> This patch fixes the bug by rolling async map extension operations
> into pcpu_balance_work.
>
> Signed-off-by: Tejun Heo <[hidden email]>
> Reported-and-tested-by: Alexei Starovoitov <[hidden email]>
> Reported-by: Vlastimil Babka <[hidden email]>
> Reported-by: Sasha Levin <[hidden email]>
> Cc: [hidden email] # v3.18+
> Fixes: 9c824b6a172c ("percpu: make sure chunk->map array has available space")

I didn't spot issues, but I'm not that familiar with the code, so it doesn't
mean much. Just one question below:

> ---
>  mm/percpu.c |   57 ++++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 36 insertions(+), 21 deletions(-)
>
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -112,7 +112,7 @@ struct pcpu_chunk {
>   int map_used; /* # of map entries used before the sentry */
>   int map_alloc; /* # of map entries allocated */
>   int *map; /* allocation map */
> - struct work_struct map_extend_work;/* async ->map[] extension */
> + struct list_head map_extend_list;/* on pcpu_map_extend_chunks */
>
>   void *data; /* chunk data */
>   int first_free; /* no free below this */
> @@ -166,6 +166,9 @@ static DEFINE_MUTEX(pcpu_alloc_mutex); /
>
>  static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
>
> +/* chunks which need their map areas extended, protected by pcpu_lock */
> +static LIST_HEAD(pcpu_map_extend_chunks);
> +
>  /*
>   * The number of empty populated pages, protected by pcpu_lock.  The
>   * reserved chunk doesn't contribute to the count.
> @@ -395,13 +398,19 @@ static int pcpu_need_to_extend(struct pc
>  {
>   int margin, new_alloc;
>
> + lockdep_assert_held(&pcpu_lock);
> +
>   if (is_atomic) {
>   margin = 3;
>
>   if (chunk->map_alloc <
> -    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
> -    pcpu_async_enabled)
> - schedule_work(&chunk->map_extend_work);
> +    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
> + if (list_empty(&chunk->map_extend_list)) {

So why this list_empty condition? Doesn't it deserve a comment then? And isn't
using a list overkill in that case?

Thanks.

> + list_add_tail(&chunk->map_extend_list,
> +      &pcpu_map_extend_chunks);
> + pcpu_schedule_balance_work();
> + }
> + }
>   } else {
>   margin = PCPU_ATOMIC_MAP_MARGIN_HIGH;
>   }

[...]

Re: [PATCH percpu/for-4.7-fixes 2/2] percpu: fix synchronization between synchronous map extension and chunk destruction

Vlastimil Babka
In reply to this post by Tejun Heo-2
On 05/25/2016 05:45 PM, Tejun Heo wrote:

> For non-atomic allocations, pcpu_alloc() can try to extend the area
> map synchronously after dropping pcpu_lock; however, the extension
> wasn't synchronized against chunk destruction and the chunk might get
> freed while extension is in progress.
>
> This patch fixes the bug by putting most of non-atomic allocations
> under pcpu_alloc_mutex to synchronize against pcpu_balance_work which
> is responsible for async chunk management including destruction.
>
> Signed-off-by: Tejun Heo <[hidden email]>
> Reported-and-tested-by: Alexei Starovoitov <[hidden email]>
> Reported-by: Vlastimil Babka <[hidden email]>
> Reported-by: Sasha Levin <[hidden email]>
> Cc: [hidden email] # v3.18+
> Fixes: 1a4d76076cda ("percpu: implement asynchronous chunk population")

Didn't spot any problems this time.

Thanks


Re: [PATCH percpu/for-4.7-fixes 1/2] percpu: fix synchronization between chunk->map_extend_work and chunk destruction

Tejun Heo-2
In reply to this post by Vlastimil Babka
Hello,

On Thu, May 26, 2016 at 11:19:06AM +0200, Vlastimil Babka wrote:
> >   if (is_atomic) {
> >   margin = 3;
> >
> >   if (chunk->map_alloc <
> > -    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW &&
> > -    pcpu_async_enabled)
> > - schedule_work(&chunk->map_extend_work);
> > +    chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
> > + if (list_empty(&chunk->map_extend_list)) {

> So why this list_empty condition? Doesn't it deserve a comment then? And

Because doing list_add() twice corrupts the list.  I'm not sure that
deserves a comment.  We can do list_move() instead but that isn't
necessarily better.

> isn't using a list an overkill in that case?

That would require the rebalance work to scan all chunks whenever it's
scheduled, and if a lot of atomic allocations are taking place, that could
become expensive with a lot of chunks.

Thanks.

--
tejun