[regression] cross core scheduling frequency drop bisected to 0c313cb20732


Mike Galbraith-5
Greetings,

While measuring current NO_HZ cost to light tasks jabbering cross core
at high frequency (~7% max), I noticed that master lost an improvement
for same acquired in 4.5, so bisected it.

4.5.0
homer:~ # taskset 0xc pipe-test 1
2.367681 usecs/loop -- avg 2.367681 844.7 KHz
2.372502 usecs/loop -- avg 2.368163 844.5 KHz
2.342506 usecs/loop -- avg 2.365597 845.5 KHz
2.383029 usecs/loop -- avg 2.367341 844.8 KHz
2.321859 usecs/loop -- avg 2.362792 846.5 KHz   1.00

master
homer:~ # taskset 0xc pipe-test 1
2.797656 usecs/loop -- avg 2.797656 714.9 KHz
2.804518 usecs/loop -- avg 2.798342 714.7 KHz
2.804206 usecs/loop -- avg 2.798929 714.6 KHz
2.802887 usecs/loop -- avg 2.799324 714.5 KHz
2.801577 usecs/loop -- avg 2.799550 714.4 KHz   0.84

master 0c313cb20732 reverted
homer:~ # !taskset
homer:~ # taskset 0xc pipe-test 1
2.277494 usecs/loop -- avg 2.277494 878.2 KHz
2.320979 usecs/loop -- avg 2.281843 876.5 KHz
2.272750 usecs/loop -- avg 2.280933 876.8 KHz
2.272209 usecs/loop -- avg 2.280061 877.2 KHz
2.277279 usecs/loop -- avg 2.279783 877.3 KHz   1.03
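
For reference, the KHz column in the runs above is consistent with 2000 divided by the running average in usecs/loop, and the trailing 1.00 / 0.84 / 1.03 column is each run's final rate relative to 4.5.0. The factor of two events per loop is inferred from the printed numbers, not stated in the post. A minimal Python sketch of that arithmetic:

```python
# Final running averages (usecs/loop) copied from the three runs above.
avg_usecs = {
    "4.5.0": 2.362792,
    "master": 2.799550,
    "master, 0c313cb20732 reverted": 2.279783,
}


def rate_khz(usecs_per_loop, events_per_loop=2):
    """Convert a loop time in microseconds to an event rate in KHz.

    events_per_loop=2 is an assumption inferred from the printed KHz values.
    """
    return events_per_loop * 1000.0 / usecs_per_loop


baseline = rate_khz(avg_usecs["4.5.0"])
for name, usecs in avg_usecs.items():
    khz = rate_khz(usecs)
    print(f"{name}: {khz:.1f} KHz, {khz / baseline:.3f} of 4.5.0")
```

The ratios come out near 0.844 and 1.036, which the post shows as 0.84 and 1.03.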

0c313cb207326f759a58f486214288411b25d4cf is the first bad commit
commit 0c313cb207326f759a58f486214288411b25d4cf
Author: Rafael J. Wysocki <[hidden email]>
Date:   Sun Mar 20 01:33:35 2016 +0100

    cpuidle: menu: Fall back to polling if next timer event is near
   
    Commit a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable
    polling) changed the behavior of the fallback state selection part
    of menu_select() so it looks at interactivity_req instead of
    data->next_timer_us when it makes its decision.  That effectively
    caused polling to be used more often as fallback idle which led to
    significant increases of energy consumption in some cases.
   
    Commit e132b9b3bc7f (cpuidle: menu: use high confidence factors
    only when considering polling) changed that logic again to be more
    predictable, but that didn't help with the increased energy
    consumption problem.
   
    For this reason, go back to making decisions on which state to fall
    back to based on data->next_timer_us which is the time we know for
    sure something will happen rather than a prediction (which may be
    inaccurate and turns out to be so often enough to be problematic).
    However, take the target residency of the first proper idle state
    (C1) into account, so that state is not used as the fallback one
    if its target residency is greater than data->next_timer_us.
   
    Fixes: a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable polling)
    Signed-off-by: Rafael J. Wysocki <[hidden email]>
    Reported-and-tested-by: Doug Smythies <[hidden email]>
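
The fallback rule described in the commit message above can be pictured with the following simplified Python sketch. It is not the kernel code: the function name, the state labels, and the example values are hypothetical, and the real menu_select() is more involved than this single comparison.

```python
POLL = "poll"  # busy-wait fallback (hypothetical label)
C1 = "C1"      # first proper idle state (hypothetical label)


def fallback_state(next_timer_us, c1_target_residency_us):
    """Pick the fallback idle state from the known time to the next timer event.

    Per the commit message: C1 is not used as the fallback if its target
    residency is greater than the time until the next timer event.
    """
    if c1_target_residency_us > next_timer_us:
        return POLL
    return C1


# Example values are made up for illustration only.
print(fallback_state(next_timer_us=1, c1_target_residency_us=2))
print(fallback_state(next_timer_us=500, c1_target_residency_us=2))
```

The point of the change is that the decision uses a known quantity (the next timer event) rather than a prediction of idle duration.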


Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Peter Zijlstra-5
On Fri, Apr 08, 2016 at 07:20:54AM +0200, Mike Galbraith wrote:

> Greetings,
>
> While measuring current NO_HZ cost to light tasks jabbering cross core
> at high frequency (~7% max), I noticed that master lost an improvement
> for same acquired in 4.5, so bisected it.
>
> 4.5.0
> homer:~ # taskset 0xc pipe-test 1
> 2.367681 usecs/loop -- avg 2.367681 844.7 KHz
> 2.372502 usecs/loop -- avg 2.368163 844.5 KHz
> 2.342506 usecs/loop -- avg 2.365597 845.5 KHz
> 2.383029 usecs/loop -- avg 2.367341 844.8 KHz
> 2.321859 usecs/loop -- avg 2.362792 846.5 KHz   1.00
>
> master
> homer:~ # taskset 0xc pipe-test 1
> 2.797656 usecs/loop -- avg 2.797656 714.9 KHz
> 2.804518 usecs/loop -- avg 2.798342 714.7 KHz
> 2.804206 usecs/loop -- avg 2.798929 714.6 KHz
> 2.802887 usecs/loop -- avg 2.799324 714.5 KHz
> 2.801577 usecs/loop -- avg 2.799550 714.4 KHz   0.84
>
> master 0c313cb20732 reverted
> homer:~ # !taskset
> homer:~ # taskset 0xc pipe-test 1
> 2.277494 usecs/loop -- avg 2.277494 878.2 KHz
> 2.320979 usecs/loop -- avg 2.281843 876.5 KHz
> 2.272750 usecs/loop -- avg 2.280933 876.8 KHz
> 2.272209 usecs/loop -- avg 2.280061 877.2 KHz
> 2.277279 usecs/loop -- avg 2.279783 877.3 KHz   1.03
>
> 0c313cb207326f759a58f486214288411b25d4cf is the first bad commit
> commit 0c313cb207326f759a58f486214288411b25d4cf
> Author: Rafael J. Wysocki <[hidden email]>
> Date:   Sun Mar 20 01:33:35 2016 +0100
>
>     cpuidle: menu: Fall back to polling if next timer event is near
>    

Cute, I thought you used governor=performance for your runs?

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Mike Galbraith-5
On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:

> Cute, I thought you used governor=performance for your runs?

I do, and those numbers are with it thus set.

        -Mike

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Rafael J. Wysocki-4
On Friday, April 08, 2016 08:50:54 AM Mike Galbraith wrote:
> On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:
>
> > Cute, I thought you used governor=performance for your runs?
>
> I do, and those numbers are with it thus set.

Well, this is a trade-off.

4.5 introduced a power regression here so this one goes back to the previous
state of things.

Thanks,
Rafael


RE: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Doug Smythies
On 2016.04.08 14:00 Rafael J. Wysocki wrote:
> On Friday, April 08, 2016 08:50:54 AM Mike Galbraith wrote:
>> On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:
>>
>>> Cute, I thought you used governor=performance for your runs?
>>
>> I do, and those numbers are with it thus set.

> Well, this is a trade-off.
>
> 4.5 introduced a power regression here so this one goes back to the previous
> state of things.

Mike:

Could you send me, or point me to, the program "pipe-test"?
So far, I have only found one, but it is both old and not
the same program you are running (based on print statements).

I realize I might not be able to recreate your problem scenario anyhow,
I just want to try.

... Doug



Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Mike Galbraith-5
In reply to this post by Rafael J. Wysocki-4
On Fri, 2016-04-08 at 22:59 +0200, Rafael J. Wysocki wrote:

> On Friday, April 08, 2016 08:50:54 AM Mike Galbraith wrote:
> > On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:
> >
> > > Cute, I thought you used governor=performance for your runs?
> >
> > I do, and those numbers are with it thus set.
>
> Well, this is a trade-off.
>
> 4.5 introduced a power regression here so this one goes back to the previous
> state of things.

That sounds somewhat reasonable.  Too bad I don't have a super duper
watt meter handy.. seeing that you really really are saving me money
would perhaps make me less fond of those prettier numbers.

        -Mike

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Mike Galbraith-5
In reply to this post by Doug Smythies
On Fri, 2016-04-08 at 15:19 -0700, Doug Smythies wrote:

> Could you send me, or point me to, the program "pipe-test"?
> So far, I have only found one, but it is both old and not
> the same program you are running (based on print statements).

It's the same old pipe-test, just bent up a little to suit my usage.

        -Mike

RE: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Doug Smythies
In reply to this post by Doug Smythies
On 2016.04.08 15:19 Doug Smythies wrote:
> On 2016.04.08 14:00 Rafael J. Wysocki wrote:
>> On Friday, April 08, 2016 08:50:54 AM Mike Galbraith wrote:
>>> On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:
>>>
>>>> Cute, I thought you used governor=performance for your runs?
>>>
>>> I do, and those numbers are with it thus set.

>> Well, this is a trade-off.
>>
>> 4.5 introduced a power regression here so this one goes back to the previous
>> state of things.

> Mike:
>
> Could you send me, or point me to, the program "pipe-test"?
> So far, I have only found one, but it is both old and not
> the same program you are running (based on print statements).
>
> I realize I might not be able to recreate your problem scenario anyhow,
> I just want to try.

I still didn't find the exact same program, but I think I found some
earlier version of the correct test.

I get (long term average):
Kernel 4.4.0-17: Powersave 3.93 usecs/loop ; Performance 3.93 usecs/loop 0.89
Kernel 4.5-rc7: Powersave 3.47 usecs/loop ; Performance 3.51 usecs/loop  1.00
Kernel 4.6-rc1: Powersave 3.84 usecs/loop ; Performance 3.88 usecs/loop  0.90

So, similar results (so far, I didn't try reverted yet).

... Doug



Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Mike Galbraith-5
On Sat, 2016-04-09 at 00:17 -0700, Doug Smythies wrote:

> I still didn't find the exact same program, but I think I found some
> earlier version of the correct test.
>
> I get (long term average):
> Kernel 4.4.0-17: Powersave 3.93 usecs/loop ; Performance 3.93 usecs/loop 0.89
> Kernel 4.5-rc7: Powersave 3.47 usecs/loop ; Performance 3.51 usecs/loop  1.00
> Kernel 4.6-rc1: Powersave 3.84 usecs/loop ; Performance 3.88 usecs/loop  0.90
>
> So, similar results (so far, I didn't try reverted yet).

I likely see a bit more go missing because I throttle no_hz when idle
is being hammered at high frequency.

        -Mike

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Peter Zijlstra-5
In reply to this post by Rafael J. Wysocki-4
On Fri, Apr 08, 2016 at 10:59:59PM +0200, Rafael J. Wysocki wrote:

> On Friday, April 08, 2016 08:50:54 AM Mike Galbraith wrote:
> > On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:
> >
> > > Cute, I thought you used governor=performance for your runs?
> >
> > I do, and those numbers are with it thus set.
>
> Well, this is a trade-off.
>
> 4.5 introduced a power regression here so this one goes back to the previous
> state of things.

Just for my elucidation; how can gov=performance have a 'power'
regression?

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Peter Zijlstra-5
In reply to this post by Doug Smythies
On Fri, Apr 08, 2016 at 03:19:14PM -0700, Doug Smythies wrote:
> Could you send me, or point me to, the program "pipe-test"?
> So far, I have only found one, but it is both old and not
> the same program you are running (based on print statements).

The latest public one lives as: perf bench sched pipe
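
For readers unfamiliar with this kind of benchmark, here is a minimal Python sketch of a pipe ping-pong between two processes. It is neither Mike's pipe-test nor the perf implementation. The function name and loop count are hypothetical, and interpreter overhead dominates, so the absolute numbers are not comparable to the figures in this thread.

```python
import os
import time


def pipe_ping_pong(loops):
    """Return average microseconds per round trip between parent and child."""
    p2c_r, p2c_w = os.pipe()  # parent -> child
    c2p_r, c2p_w = os.pipe()  # child -> parent
    pid = os.fork()
    if pid == 0:
        # Child: echo one byte back for every byte received, then exit.
        os.close(p2c_w)
        os.close(c2p_r)
        try:
            for _ in range(loops):
                if not os.read(p2c_r, 1):
                    break
                os.write(c2p_w, b"x")
        finally:
            os._exit(0)
    # Parent: send a byte, wait for the echo, repeat.
    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(loops):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    elapsed = time.perf_counter() - start
    os.close(p2c_w)
    os.close(c2p_r)
    os.waitpid(pid, 0)
    return elapsed / loops * 1e6


if __name__ == "__main__":
    print(f"{pipe_ping_pong(10000):.3f} usecs/loop")
```

Each round trip forces both tasks to sleep and be woken once, which is why the result is sensitive to how quickly an idle CPU can be brought back.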


Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Rafael J. Wysocki-5
In reply to this post by Peter Zijlstra-5
On Sat, Apr 9, 2016 at 1:07 PM, Peter Zijlstra <[hidden email]> wrote:

> On Fri, Apr 08, 2016 at 10:59:59PM +0200, Rafael J. Wysocki wrote:
>> On Friday, April 08, 2016 08:50:54 AM Mike Galbraith wrote:
>> > On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:
>> >
>> > > Cute, I thought you used governor=performance for your runs?
>> >
>> > I do, and those numbers are with it thus set.
>>
>> Well, this is a trade-off.
>>
>> 4.5 introduced a power regression here so this one goes back to the previous
>> state of things.
>
> Just for my elucidation; how can gov=performance have a 'power'
> regression?

Because of what is used as the "default" idle state most of the time.

C1 was used before 4.5 and that changed to polling in 4.5.

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Rafael J. Wysocki-5
In reply to this post by Mike Galbraith-5
On Sat, Apr 9, 2016 at 8:40 AM, Mike Galbraith <[hidden email]> wrote:

> On Fri, 2016-04-08 at 22:59 +0200, Rafael J. Wysocki wrote:
>> On Friday, April 08, 2016 08:50:54 AM Mike Galbraith wrote:
>> > On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:
>> >
>> > > Cute, I thought you used governor=performance for your runs?
>> >
>> > I do, and those numbers are with it thus set.
>>
>> Well, this is a trade-off.
>>
>> 4.5 introduced a power regression here so this one goes back to the previous
>> state of things.
>
> That sounds somewhat reasonable.  Too bad I don't have a super duper
> watt meter handy.. seeing that you really really are saving me money
> would perhaps make me less fond of those prettier numbers.

You can look at the turbostat Watts numbers ("turbostat --debug" and
the last three columns of the output in turbostat as included in the
kernel source).

That requires an Intel CPU with RAPL.
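
On Linux systems that expose RAPL through the powercap sysfs interface, package energy can also be sampled directly. The sketch below is a rough alternative to reading turbostat output, under stated assumptions: the intel-rapl:0 path is assumed to be the package 0 domain on the machine in question, reading it may require root, and the helper names are hypothetical.

```python
import time
from pathlib import Path

# Assumption: package 0 RAPL domain as exposed by the powercap framework.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def joules_between(start_uj, end_uj, max_range_uj):
    """Energy in joules between two counter readings, allowing one wraparound."""
    delta = end_uj - start_uj
    if delta < 0:
        # The counter wrapped once between the two readings.
        delta += max_range_uj
    return delta / 1e6


def average_watts(seconds=30.0):
    """Average package power over an interval, read from energy_uj."""
    max_range = int((RAPL_DIR / "max_energy_range_uj").read_text())
    start = int((RAPL_DIR / "energy_uj").read_text())
    time.sleep(seconds)
    end = int((RAPL_DIR / "energy_uj").read_text())
    return joules_between(start, end, max_range) / seconds
```

Calling average_watts() while a benchmark runs gives a figure that plays the same role as the package power column discussed here.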

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Mike Galbraith-5
On Sat, 2016-04-09 at 14:33 +0200, Rafael J. Wysocki wrote:

> On Sat, Apr 9, 2016 at 8:40 AM, Mike Galbraith <[hidden email]> wrote:
> > On Fri, 2016-04-08 at 22:59 +0200, Rafael J. Wysocki wrote:
> > > On Friday, April 08, 2016 08:50:54 AM Mike Galbraith wrote:
> > > > On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:
> > > >
> > > > > Cute, I thought you used governor=performance for your runs?
> > > >
> > > > I do, and those numbers are with it thus set.
> > >
> > > Well, this is a trade-off.
> > >
> > > 4.5 introduced a power regression here so this one goes back to the previous
> > > state of things.
> >
> > That sounds somewhat reasonable.  Too bad I don't have a super duper
> > watt meter handy.. seeing that you really really are saving me money
> > would perhaps make me less fond of those prettier numbers.
>
> You can look at the turbostat Watts numbers ("turbostat --debug" and
> the last three columns of the output in turbostat as included in the
> kernel source).

Hm.  I think I want my prettier numbers back.

714KHz/877KHz = 0.81
25Watt/30Watt = 0.83

        -Mike



Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Mike Galbraith-5

Hm, setting gov=performance, and taking the average of 3 30 second
interval PkgWatt samples as pipe-test runs..

714KHz/28.03Ws = 25.46
877KHz/30.28Ws = 28.96

..for pipe-test, the tradeoff looks a bit more like red than green.

        -Mike

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Rafael J. Wysocki-5
On Sat, Apr 9, 2016 at 6:39 PM, Mike Galbraith <[hidden email]> wrote:
>
> Hm, setting gov=performance, and taking the average of 3 30 second
> interval PkgWatt samples as pipe-test runs..
>
> 714KHz/28.03Ws = 25.46
> 877KHz/30.28Ws = 28.96
>
> ..for pipe-test, the tradeoff looks a bit more like red than green.

Well, fair enough, but that's just pipe-test, and what about the
people who don't see the performance gain and see the energy loss,
like Doug?

Essentially, this trades performance gains in somewhat special
workloads for increased energy consumption in idle.  Those workloads
need not be run by everybody, but idle is.

That said I applied the patch you're complaining about mostly because
the commit that introduced the change in question in 4.5 claimed that
it wouldn't affect idle power on systems with reasonably fast C1, but
that didn't pass the reality test.  I'm not totally against restoring
that change, but it would need to be based on very solid evidence.

RE: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Doug Smythies
On 2016.04.09 20:45 Rafael J. Wysocki wrote:

>On Sat, Apr 9, 2016 at 6:39 PM, Mike Galbraith wrote:
>>
>> Hm, setting gov=performance, and taking the average of 3 30 second
>> interval PkgWatt samples as pipe-test runs..
>>
>> 714KHz/28.03Ws = 25.46
>> 877KHz/30.28Ws = 28.96
>>
>> ..for pipe-test, the tradeoff looks a bit more like red than green.
>
> Well, fair enough, but that's just pipe-test, and what about the
> people who don't see the performance gain and see the energy loss,
> like Doug?

Some numbers from my computer:

Pipe-test (100 seconds):

Kernel 4.6-rc2 gov=powersave:
Stock: 3.86 uSecs/loop and 3148.05 Joules
Reverted: 3.34 uSecs/loop and 3567.43 Joules

Reverted is 13% faster at a cost of 13% more energy.

Idle stats (done separately and for 20e6 loops)

State   k46rc2-ps (sec)   k46rc2-rev-ps (sec)
0                  0.01                  4.09
1                 38.68                  0.00
2                  0.46                  0.27
3                  0.01                  0.00
4                464.23                380.23

total            503.38                384.60

Kernel 4.6-rc2 gov=performance:
Stock: 3.89 uSecs/loop and 3154.72 Joules
Reverted: 3.25 uSecs/loop and 3445.90 Joules

Reverted is 16% faster at a cost of 9% more energy.

Idle stats (done separately and for 20e6 loops)

State   k46rc2-pf (sec)   k46rc2-rev-pf (sec)
0                  0.00                  1.43
1                 38.89                  0.04
2                  2.08                  0.03
3                  0.01                  0.01
4                463.05                381.54

total            504.03                383.05

9 incremental kernel compiles, with no changes:
(the reference test from last cycle):
(2000 seconds turbostat package energy sample time):
There is no detectable consistent change in compile times:

Kernel 4.6-rc2 gov=powersave:
Stock: 48557 Joules
Reverted: 65439 Joules

Reverted costs 34% more energy.
(note: this result is unusually high. There are variations test to test)

Kernel 4.6-rc2 gov=performance:
Stock: 49965 Joules
Reverted: 59232 Joules

Reverted costs 19% more energy.
(note: never tested gov=performance before)

Idle stats not re-done (we had several samples last cycle).
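
The percentages above can be rechecked from the raw figures. In this minimal Python sketch, the "faster" figures appear to correspond to the reduction in loop time; that reading is inferred from the numbers and not stated in the post. The helper name is hypothetical.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# (stock, reverted) pairs copied from the results above.
runs = {
    "pipe-test, powersave": {"usecs": (3.86, 3.34), "joules": (3148.05, 3567.43)},
    "pipe-test, performance": {"usecs": (3.89, 3.25), "joules": (3154.72, 3445.90)},
    "kernel compiles, powersave": {"joules": (48557, 65439)},
    "kernel compiles, performance": {"joules": (49965, 59232)},
}

for name, data in runs.items():
    parts = []
    if "usecs" in data:
        stock, reverted = data["usecs"]
        parts.append(f"loop time {pct_change(stock, reverted):+.1f}%")
    stock, reverted = data["joules"]
    parts.append(f"energy {pct_change(stock, reverted):+.1f}%")
    print(f"{name}: " + ", ".join(parts))
```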

> Essentially, this trades performance gains in somewhat special
> workloads for increased energy consumption in idle.  Those workloads
> need not be run by everybody, but idle is.
>
> That said I applied the patch you're complaining about mostly because
> the commit that introduced the change in question in 4.5 claimed that
> it wouldn't affect idle power on systems with reasonably fast C1, but
> that didn't pass the reality test.  I'm not totally against restoring
> that change, but it would need to be based on very solid evidence.

... Doug



Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Mike Galbraith-5
In reply to this post by Rafael J. Wysocki-5
On Sun, 2016-04-10 at 05:44 +0200, Rafael J. Wysocki wrote:

> On Sat, Apr 9, 2016 at 6:39 PM, Mike Galbraith <
> [hidden email]> wrote:
> >
> > Hm, setting gov=performance, and taking the average of 3 30 second
> > interval PkgWatt samples as pipe-test runs..
> >
> > 714KHz/28.03Ws = 25.46
> > 877KHz/30.28Ws = 28.96
> >
> > ..for pipe-test, the tradeoff looks a bit more like red than green.
>
> Well, fair enough, but that's just pipe-test, and what about the
> people who don't see the performance gain and see the energy loss,
> like Doug?

Perhaps Doug sees increased power because he's not throttling no_hz,
whereas I am, so he burns more power getting _to_ idle?  Dunno, maybe
he'll try the attached.  If it's a general case energy loser, so be it,
numbers talk, bs walks and all that ;-)

> Essentially, this trades performance gains in somewhat special
> workloads for increased energy consumption in idle.  Those workloads
> need not be run by everybody, but idle is.

Cross core scheduling is routine business, we do truckloads of that for
good reason, and lots of stuff does wakeups at high frequency.

> That said I applied the patch you're complaining about mostly because
> the commit that introduced the change in question in 4.5 claimed that
> it wouldn't affect idle power on systems with reasonably fast C1, but
> that didn't pass the reality test.  I'm not totally against restoring
> that change, but it would need to be based on very solid evidence.

Understood.  My box seems to be saying we can hug the trees hardest by
telling the CPU to get work done as quickly as possible, but I don't have
much experience at tree hugging measurement.  Performance wise, tasks
talking via localhost is definitely not special.

tbench     1      2      4     8
base     752   1283   2250  3362

select_idle_sibling() off
         735   1344   2080  2884
delta   .977  1.047   .924  .857

select_idle_sibling() on, 0c313cb20732 reverted
         816   1317   2240  3388
delta  1.085  1.026   .995 1.007 vs base
delta  1.110   .979  1.076 1.174 vs off
               (^hm)
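
The delta rows above are plain throughput ratios per client count. The minimal Python sketch below recomputes them. Truncating to three decimals, rather than rounding, is an assumption that happens to reproduce the figures shown; the helper name is hypothetical.

```python
import math

clients = (1, 2, 4, 8)
base = (752, 1283, 2250, 3362)
sis_off = (735, 1344, 2080, 2884)    # select_idle_sibling() off
reverted = (816, 1317, 2240, 3388)   # select_idle_sibling() on, 0c313cb20732 reverted


def delta(new, old, digits=3):
    """Throughput ratio new/old, truncated (not rounded) to `digits` decimals."""
    scale = 10 ** digits
    return math.floor(new / old * scale) / scale


print("off vs base     ", [delta(n, o) for n, o in zip(sis_off, base)])
print("reverted vs base", [delta(n, o) for n, o in zip(reverted, base)])
print("reverted vs off ", [delta(n, o) for n, o in zip(reverted, sis_off)])
```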

        -Mike

[attachment: sched-throttle-nohz.patch (2K)]

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Mike Galbraith-5
On Sun, 2016-04-10 at 11:35 +0200, Mike Galbraith wrote:

> On Sun, 2016-04-10 at 05:44 +0200, Rafael J. Wysocki wrote:
> > On Sat, Apr 9, 2016 at 6:39 PM, Mike Galbraith <
> > [hidden email]> wrote:
> > >
> > > Hm, setting gov=performance, and taking the average of 3 30 second
> > > interval PkgWatt samples as pipe-test runs..
> > >
> > > 714KHz/28.03Ws = 25.46
> > > 877KHz/30.28Ws = 28.96
> > >
> > > ..for pipe-test, the tradeoff looks a bit more like red than green.
> >
> > Well, fair enough, but that's just pipe-test, and what about the
> > people who don't see the performance gain and see the energy loss,
> > like Doug?
>
> Perhaps Doug sees increased power because he's not throttling no_hz,
> whereas I am, so he burns more power getting _to_ idle?  Dunno, maybe
> he'll try the attached.  If it's a general case energy loser, so be it,
> numbers talk, bs walks and all that ;-)

And here are the rest of my numbers..

> tbench     1      2      4     8
> base     752   1283   2250  3362
>
> select_idle_sibling() off
>          735   1344   2080  2884
> delta   .977  1.047   .924  .857
>
> select_idle_sibling() on, 0c313cb20732 reverted
>          816   1317   2240  3388
> delta  1.085  1.026   .995 1.007 vs base
> delta  1.110   .979  1.076 1.174 vs off
>                (^hm)

tbench 2 turboboost off
base          1215  1.00   1215/32.24=37.68
revert        1252  1.03   1252/35.82=34.95=loser

tbench 2 throughput hm is apparently a turboboost oddity, and..

tbench (turboboost back on)
power      1      2      4     8
base   23.88  37.41  54.64 62.25
revert 31.25  42.53  55.11 62.66

MB/s/Ws    1      2      4     8
base   31.49  34.29  41.17 54.00
revert 26.11  30.96  40.64 54.06

..while single pipe-test pair said green/green, tbench numbers say
throughput green, but energy efficiency red across the board.
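
The MB/s/Ws rows match the tbench throughput figures quoted earlier in this message divided by the power figures. That pairing is inferred from the numbers, not stated in the post. A minimal Python sketch of the calculation, with hypothetical names:

```python
clients = (1, 2, 4, 8)

# tbench throughput per client count, from the quoted table.
throughput = {
    "base": (752, 1283, 2250, 3362),
    "revert": (816, 1317, 2240, 3388),
}

# Package power per client count, from the "power" rows above.
power = {
    "base": (23.88, 37.41, 54.64, 62.25),
    "revert": (31.25, 42.53, 55.11, 62.66),
}


def efficiency(kernel):
    """Throughput divided by power for each client count."""
    return [t / p for t, p in zip(throughput[kernel], power[kernel])]


for kernel in ("base", "revert"):
    row = "  ".join(f"{e:6.2f}" for e in efficiency(kernel))
    print(f"{kernel:<7}{row}")
```

The printed values can differ from the table in the last digit, which looks like truncation rather than rounding in the original.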

        -Mike

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

Mike Galbraith-5
In reply to this post by Rafael J. Wysocki-5
On Sat, 2016-04-09 at 14:31 +0200, Rafael J. Wysocki wrote:

> On Sat, Apr 9, 2016 at 1:07 PM, Peter Zijlstra <[hidden email]>
> wrote:
> > On Fri, Apr 08, 2016 at 10:59:59PM +0200, Rafael J. Wysocki wrote:
> > > On Friday, April 08, 2016 08:50:54 AM Mike Galbraith wrote:
> > > > On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:
> > > >
> > > > > Cute, I thought you used governor=performance for your runs?
> > > >
> > > > I do, and those numbers are with it thus set.
> > >
> > > Well, this is a trade-off.
> > >
> > > 4.5 introduced a power regression here so this one goes back to
> > > the previous
> > > state of things.
> >
> > Just for my elucidation; how can gov=performance have a 'power'
> > regression?
>
> Because of what is used as the "default" idle state most of the time.
>
> C1 was used before 4.5 and that changed to polling in 4.5.

Should the default idle state not then be governor dependent?  When I
set gov=performance, I'm expecting box to go just as fast as it can go
without melting.  Does polling risk CPU -> lava conversion?

        -Mike