S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]


Ville Syrjälä-2
Hi,

I have a Lenovo Ideapad S10-3t machine here (Atom N450, 1 core, 2 HT)
which fails to resume from S3 on 4.6-rc releases. I bisected it down to

commit 1cf4f629d9d246519a1e76c021806f2a51ddba4d
Author: Thomas Gleixner <[hidden email]>
Date:   Fri Feb 26 18:43:39 2016 +0000

    cpu/hotplug: Move online calls to hotplugged cpu

Unfortunately that won't revert cleanly, and neither does the merge
commit, so I was unable to see if that is the only problematic commit
in 4.6.

Any ideas?

Oh, and this was with acpi_idle. This machine already failed to
resume from S3 with intel_idle since forever, as detailed in
https://bugzilla.kernel.org/show_bug.cgi?id=107151
but acpi_idle worked fine until now.

My .config is attached.

--
Ville Syrjälä
Intel OTC

.config (97K) Download Attachment

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Sebastian Andrzej Siewior-4
On 05/11/2016 12:19 PM, Ville Syrjälä wrote:
> Hi,
Hi,

> I have a Lenovo Ideapad S10-3t machine here (Atom N450, 1 core, 2 HT)
> which fails to resume from S3 on 4.6-rc releases. I bisected it down to
>
> commit 1cf4f629d9d246519a1e76c021806f2a51ddba4d
> Author: Thomas Gleixner <[hidden email]>
> Date:   Fri Feb 26 18:43:39 2016 +0000
>
>     cpu/hotplug: Move online calls to hotplugged cpu
>
> Unfortunately that won't revert cleanly, and neither does the merge
> commit, so I was unable to see if that is the only problematic commit
> in 4.6.
>
> Any ideas?

do you have a backtrace or anything or is it just not working and you
end up with a blank screen?

Sebastian

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Ville Syrjälä-2
On Wed, May 11, 2016 at 02:11:29PM +0200, Sebastian Andrzej Siewior wrote:

> On 05/11/2016 12:19 PM, Ville Syrjälä wrote:
> > Hi,
> Hi,
>
> > I have a Lenovo Ideapad S10-3t machine here (Atom N450, 1 core, 2 HT)
> > which fails to resume from S3 on 4.6-rc releases. I bisected it down to
> >
> > commit 1cf4f629d9d246519a1e76c021806f2a51ddba4d
> > Author: Thomas Gleixner <[hidden email]>
> > Date:   Fri Feb 26 18:43:39 2016 +0000
> >
> >     cpu/hotplug: Move online calls to hotplugged cpu
> >
> > Unfortunately that won't revert cleanly, and neither does the merge
> > commit, so I was unable to see if that is the only problematic commit
> > in 4.6.
> >
> > Any ideas?
>
> do you have a backtrace or anything or is it just not working and you
> end up with a blank screen?

Yeah can't get anything from the machine at that point. netconsole
didn't help either, and no serial on this machine. And IIRC I've
tried ramoops on this thing in the past but unfortunately the memory
got cleared on reboot.

--
Ville Syrjälä
Intel OTC

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Sebastian Andrzej Siewior-4
On 05/11/2016 02:21 PM, Ville Syrjälä wrote:
> Yeah can't get anything from the machine at that point. netconsole
> didn't help either, and no serial on this machine. And IIRC I've
> tried ramoops on this thing in the past but unfortunately the memory
> got cleared on reboot.

efi + pstore maybe?

Sebastian

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Ville Syrjälä-2
On Wed, May 11, 2016 at 02:24:51PM +0200, Sebastian Andrzej Siewior wrote:
> On 05/11/2016 02:21 PM, Ville Syrjälä wrote:
> > Yeah can't get anything from the machine at that point. netconsole
> > didn't help either, and no serial on this machine. And IIRC I've
> > tried ramoops on this thing in the past but unfortunately the memory
> > got cleared on reboot.
>
> efi + pstore maybe?

I think you might have the wrong decade in mind.

--
Ville Syrjälä
Intel OTC

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Arjan van de Ven
In reply to this post by Ville Syrjälä-2
On 5/11/2016 3:19 AM, Ville Syrjälä wrote:

> Oh, and this was with acpi_idle. This machine already failed to
> resume from S3 with intel_idle since forever, as detailed in
> https://bugzilla.kernel.org/show_bug.cgi?id=107151
> but acpi_idle worked fine until now.

this is the important clue part afaics.

some of these very old Atoms had issues (BIOS?) with S3 if the cores were in a too-deep C state,
and at some point there was a workaround (I forgot where in the code) to ban those deep
C states around S3 on those cpus. I wonder if moving things around has made said workaround
ineffective.....


Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Steven Rostedt
In reply to this post by Ville Syrjälä-2
On Wed, 11 May 2016 15:21:16 +0300
Ville Syrjälä <[hidden email]> wrote:

> Yeah can't get anything from the machine at that point. netconsole
> didn't help either, and no serial on this machine. And IIRC I've
> tried ramoops on this thing in the past but unfortunately the memory
> got cleared on reboot.
>

Can you look at the documentation in the kernel tree at

Documentation/power/basic-pm-debugging.txt

and follow the procedures for testing suspend to RAM (although it
requires mostly running the same tests as for hibernation suspending).

You can also use the tool s2ram for this as well.

See Documentation/power/s2ram.txt

Perhaps this can give us a bit more light onto the problem.

Basically the above does partial suspend and resume, and can pinpoint
problem areas down to a more select location.


Thanks!

-- Steve

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Rafael J. Wysocki-3
In reply to this post by Ville Syrjälä-2
On 5/11/2016 2:21 PM, Ville Syrjälä wrote:

> On Wed, May 11, 2016 at 02:11:29PM +0200, Sebastian Andrzej Siewior wrote:
>> On 05/11/2016 12:19 PM, Ville Syrjälä wrote:
>>> Hi,
>> Hi,
>>
>>> I have a Lenovo Ideapad S10-3t machine here (Atom N450, 1 core, 2 HT)
>>> which fails to resume from S3 on 4.6-rc releases. I bisected it down to
>>>
>>> commit 1cf4f629d9d246519a1e76c021806f2a51ddba4d
>>> Author: Thomas Gleixner <[hidden email]>
>>> Date:   Fri Feb 26 18:43:39 2016 +0000
>>>
>>>      cpu/hotplug: Move online calls to hotplugged cpu
>>>
>>> Unfortunately that won't revert cleanly, and neither does the merge
>>> commit, so I was unable to see if that is the only problematic commit
>>> in 4.6.
>>>
>>> Any ideas?
>> do you have a backtrace or anything or is it just not working and you
>> end up with a blank screen?
> Yeah can't get anything from the machine at that point. netconsole
> didn't help either, and no serial on this machine. And IIRC I've
> tried ramoops on this thing in the past but unfortunately the memory
> got cleared on reboot.
>

Please try

# echo processors > /sys/power/pm_test

and then suspend (it should simulate a suspend, wait for approx. 5 sec
and then resume, see Documentation/power/basic-pm-debugging.txt for
details).  See if that works or if you can get any traces etc.
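
A minimal sketch of the pm_test procedure described above, assuming a
kernel built with CONFIG_PM_DEBUG and root access. The level names are
the ones listed in Documentation/power/basic-pm-debugging.txt; the
helper names run_pm_test and run_all_levels and the sys_power parameter
are hypothetical and exist only for illustration.

```python
import os

# pm_test levels from Documentation/power/basic-pm-debugging.txt,
# ordered from the shallowest test to the deepest one.
PM_TEST_LEVELS = ["freezer", "devices", "platform", "processors", "core"]


def run_pm_test(level, sys_power="/sys/power", state="mem"):
    """Arm one pm_test level, trigger a test suspend, then disarm it."""
    pm_test = os.path.join(sys_power, "pm_test")
    with open(pm_test, "w") as f:
        f.write(level)
    try:
        # With pm_test armed, this write is expected to return once the
        # kernel has gone through the test cycle and resumed.
        with open(os.path.join(sys_power, "state"), "w") as f:
            f.write(state)
    finally:
        # Restore normal suspend behaviour (not reached if the box hangs).
        with open(pm_test, "w") as f:
            f.write("none")


def run_all_levels(sys_power="/sys/power", state="mem"):
    """Walk every level in order; the last level printed before a hang
    points at the phase that breaks."""
    for level in PM_TEST_LEVELS:
        print("testing pm_test level:", level, flush=True)
        run_pm_test(level, sys_power, state)
```

Calling run_pm_test("processors") corresponds to the echo command above
followed by a suspend request. Passing a different sys_power directory
lets the logic be exercised without touching the real sysfs files.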

Thanks,
Rafael


Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Ville Syrjälä-2
In reply to this post by Steven Rostedt
On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:

> On Wed, 11 May 2016 15:21:16 +0300
> Ville Syrjälä <[hidden email]> wrote:
>
> > Yeah can't get anything from the machine at that point. netconsole
> > didn't help either, and no serial on this machine. And IIRC I've
> > tried ramoops on this thing in the past but unfortunately the memory
> > got cleared on reboot.
> >
>
> Can you look at the documentation in the kernel code at
>
> Documentation/power/basic-pm-debugging.txt And follow the procedures
> for testing suspend to RAM (although it requires mostly running the
> same tests as for hibernation suspending).
>
> You can also use the tool s2ram for this as well.
>
> See Documentation/power/s2ram.txt
>
> Perhaps this can give us a bit more light onto the problem.
>
> Basically the above does partial suspend and resume, and can pinpoint
> problem areas down to a more select location.

All the pm_test modes work fine. The only difference between them was
that 'platform' required me to manually wake up the machine (hitting a
key was sufficient), whereas the others woke up without help.

pm_trace gave me
[    1.306633]   Magic number: 0:185:178
[    1.322880]   hash matches ../drivers/base/power/main.c:1070
[    1.339270] acpi device:0e: hash matches
[    1.355414]  platform: hash matches

which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help
there.
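
For anyone repeating this, here is a small sketch that pulls the
pm_trace results out of a saved dmesg log. It assumes the line format
shown in the output above; the function name parse_pm_trace and the
regular expressions are illustrative guesses at that format, not part
of any kernel tool.

```python
import re

MAGIC_RE = re.compile(r"Magic number:\s*(\d+):(\d+):(\d+)")
FILE_RE = re.compile(r"hash matches\s+(\S+):(\d+)")
# Optional "[ timestamp ]" prefix, then "<device>: hash matches".
DEVICE_RE = re.compile(r"^(?:\[[^\]]*\])?\s*(.+?):\s*hash matches\s*$")


def parse_pm_trace(dmesg_text):
    """Collect the magic number, matching file:line pairs and matching
    device names from pm_trace lines in dmesg output."""
    result = {"magic": None, "locations": [], "devices": []}
    for line in dmesg_text.splitlines():
        m = MAGIC_RE.search(line)
        if m:
            result["magic"] = tuple(int(x) for x in m.groups())
            continue
        m = FILE_RE.search(line)
        if m:
            result["locations"].append((m.group(1), int(m.group(2))))
            continue
        m = DEVICE_RE.match(line)
        if m:
            result["devices"].append(m.group(1).strip())
    return result
```

The "locations" entries are the TRACE_SUSPEND/TRACE_RESUME call sites
whose hash matched, so extra trace points added by hand show up there
with their own file and line.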

I guess I could try to sprinkle more TRACE_RESUMEs around into some
early resume code. If anyone has good ideas where to put them it
might speed things up a bit.

--
Ville Syrjälä
Intel OTC

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Jim Bos
In reply to this post by Rafael J. Wysocki-3
On 05/11/2016 03:36 PM, Rafael J. Wysocki wrote:

> On 5/11/2016 2:21 PM, Ville Syrjälä wrote:
>> On Wed, May 11, 2016 at 02:11:29PM +0200, Sebastian Andrzej Siewior
>> wrote:
>>> On 05/11/2016 12:19 PM, Ville Syrjälä wrote:
>>>> Hi,
>>> Hi,
>>>
>>>> I have a Lenovo Ideapad S10-3t machine here (Atom N450, 1 core, 2 HT)
>>>> which fails to resume from S3 on 4.6-rc releases. I bisected it down to
>>>>
>>>> commit 1cf4f629d9d246519a1e76c021806f2a51ddba4d
>>>> Author: Thomas Gleixner <[hidden email]>
>>>> Date:   Fri Feb 26 18:43:39 2016 +0000
>>>>
>>>>      cpu/hotplug: Move online calls to hotplugged cpu
>>>>
>>>> Unfortunately that won't revert cleanly, and neither does the merge
>>>> commit, so I was unable to see if that is the only problematic commit
>>>> in 4.6.
>>>>
>>>> Any ideas?
>>> do you have a backtrace or anything or is it just not working and you
>>> end up with a blank screen?
>> Yeah can't get anything from the machine at that point. netconsole
>> didn't help either, and no serial on this machine. And IIRC I've
>> tried ramoops on this thing in the past but unfortunately the memory
>> got cleared on reboot.
>>
>
> Please try
>
> # echo processors > /sys/power/pm_test
>
> and then suspend (it should simulate a suspend, wait for approx. 5 sec
> and then resume, see
> Documentation/power/basic-pm-debugging.txt for details).  See if that
> works or if you can get any
> traces etc.
>
> Thanks,
> Rafael
>

Hmm, I thought I had some resume issue but ignored that, so I just tried
again.
On 4.6.0-rc1 all is fine but on 4.6.0-rc7 on resume the machine locks up
totally. No response on ping or sysrq-B, only hard reset works.

Tried this 'echo processors > /sys/power/pm_test' followed by pm-suspend

and did find a lot of ACPI errors (attached) in the log which are
definitely not present after normal boot.

This is on an Intel(R) Pentium(R) CPU G3220 @ 3.00GHz

So not sure if this is same issue but just wanted to mention it.

_
Jim


dmesg.gz (4K) Download Attachment

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Arjan van de Ven
In reply to this post by Ville Syrjälä-2

> Oh, and this was with acpi_idle. This machine already failed to
> resume from S3 with intel_idle since forever, as detailed in
> https://bugzilla.kernel.org/show_bug.cgi?id=107151
> but acpi_idle worked fine until now.

can you disable (in sysfs) all C states other than C0/C1 and see if that makes it go away?
that would point at the problem pretty clearly...
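
A minimal sketch of the sysfs experiment suggested above. It uses the
per-state "name" and "disable" files under
/sys/devices/system/cpu/cpuN/cpuidle/stateM/. The function name
disable_deep_cstates and the default keep set are assumptions: state
names differ between acpi_idle and intel_idle, so check the "name"
files on the machine first. Needs root.

```python
import glob
import os


def disable_deep_cstates(keep=("POLL", "C1"),
                         cpu_root="/sys/devices/system/cpu"):
    """Write 1 to the 'disable' file of every cpuidle state whose name
    is not in `keep`. Returns the list of files that were written."""
    touched = []
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpuidle", "state[0-9]*")
    for state_dir in sorted(glob.glob(pattern)):
        with open(os.path.join(state_dir, "name")) as f:
            name = f.read().strip()
        if name in keep:
            continue  # leave the shallow states enabled
        disable = os.path.join(state_dir, "disable")
        with open(disable, "w") as f:
            f.write("1")
        touched.append(disable)
    return touched
```

Passing keep=("POLL", "C1", "C2") would disable only C3 on a machine
whose states carry those names.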



Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Rafael J. Wysocki-5
In reply to this post by Jim Bos
On Wed, May 11, 2016 at 5:25 PM, Jim Bos <[hidden email]> wrote:

> On 05/11/2016 03:36 PM, Rafael J. Wysocki wrote:
>> On 5/11/2016 2:21 PM, Ville Syrjälä wrote:
>>> On Wed, May 11, 2016 at 02:11:29PM +0200, Sebastian Andrzej Siewior
>>> wrote:
>>>> On 05/11/2016 12:19 PM, Ville Syrjälä wrote:
>>>>> Hi,
>>>> Hi,
>>>>
>>>>> I have a Lenovo Ideapad S10-3t machine here (Atom N450, 1 core, 2 HT)
>>>>> which fails to resume from S3 on 4.6-rc releases. I bisected it down to
>>>>>
>>>>> commit 1cf4f629d9d246519a1e76c021806f2a51ddba4d
>>>>> Author: Thomas Gleixner <[hidden email]>
>>>>> Date:   Fri Feb 26 18:43:39 2016 +0000
>>>>>
>>>>>      cpu/hotplug: Move online calls to hotplugged cpu
>>>>>
>>>>> Unfortunately that won't revert cleanly, and neither does the merge
>>>>> commit, so I was unable to see if that is the only problematic commit
>>>>> in 4.6.
>>>>>
>>>>> Any ideas?
>>>> do you have a backtrace or anything or is it just not working and you
>>>> end up with a blank screen?
>>> Yeah can't get anything from the machine at that point. netconsole
>>> didn't help either, and no serial on this machine. And IIRC I've
>>> tried ramoops on this thing in the past but unfortunately the memory
>>> got cleared on reboot.
>>>
>>
>> Please try
>>
>> # echo processors > /sys/power/pm_test
>>
>> and then suspend (it should simulate a suspend, wait for approx. 5 sec
>> and then resume, see
>> Documentation/power/basic-pm-debugging.txt for details).  See if that
>> works or if you can get any
>> traces etc.
>>
>> Thanks,
>> Rafael
>>
>
>
> Hmm, I thought I had some resume issue but ignored that, so I just tried
> again.
> On 4.6.0-rc1 all is fine but on 4.6.0-rc7 on resume the machine locks up
> totally. No response on ping or sysrq-B, only hard reset works.
>
> Tried this 'echo processors > /sys/power/pm_test' followed by pm-suspend
>
> and did find a lot of ACPI errors (attached) in the log which are
> definitely not present after normal boot.
>
> This is on an Intel(R) Pentium(R) CPU G3220 @ 3.00GHz
>
> So not sure if this is same issue but just wanted to mention it.

If the problem is reproducible, you should be able to identify the
commit that broke things for you.

Have you tried to check if this is the same commit reported in this thread?

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Sebastian Andrzej Siewior-4
On 05/11/2016 06:19 PM, Rafael J. Wysocki wrote:
> Have you tried to check if this is the same commit reported in this thread?

The commit in this thread is part of v4.6-rc1 and he is saying that rc1
is working fine.

Sebastian

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Rafael J. Wysocki-5
On Wed, May 11, 2016 at 6:21 PM, Sebastian Andrzej Siewior
<[hidden email]> wrote:
> On 05/11/2016 06:19 PM, Rafael J. Wysocki wrote:
>> Have you tried to check if this is the same commit reported in this thread?
>
> The commit in this thread is part of v4.6-rc1 and he is saying that rc1
> is working fine.

I see.  That is a different problem then.

Jim, can you please start a new thread (with a CC to linux-pm) or just
file a bug at bugzilla.kernel.org to avoid confusing things?

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Ville Syrjälä-2
In reply to this post by Arjan van de Ven
On Wed, May 11, 2016 at 08:26:58AM -0700, Arjan van de Ven wrote:
>
> > Oh, and this was with acpi_idle. This machine already failed to
> > resume from S3 with intel_idle since forever, as detailed in
> > https://bugzilla.kernel.org/show_bug.cgi?id=107151
> > but acpi_idle worked fine until now.
>
> can you disable (in sysfs) all C states other than C0/C1 and see if that makes it go away?
> that would point at the problem pretty clearly...

No help there it seems.

However, as a sanity check I also tested that trick on the parent commit,
and disabling ACPI C2-C3 makes that one fail as well. Disabling just C3
is OK apparently.

--
Ville Syrjälä
Intel OTC

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Ville Syrjälä-2
In reply to this post by Ville Syrjälä-2
On Wed, May 11, 2016 at 04:34:06PM +0300, Ville Syrjälä wrote:

> On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:
> > On Wed, 11 May 2016 15:21:16 +0300
> > Ville Syrjälä <[hidden email]> wrote:
> >
> > > Yeah can't get anything from the machine at that point. netconsole
> > > didn't help either, and no serial on this machine. And IIRC I've
> > > tried ramoops on this thing in the past but unfortunately the memory
> > > got cleared on reboot.
> > >
> >
> > Can you look at the documentation in the kernel code at
> >
> > Documentation/power/basic-pm-debugging.txt And follow the procedures
> > for testing suspend to RAM (although it requires mostly running the
> > same tests as for hibernation suspending).
> >
> > You can also use the tool s2ram for this as well.
> >
> > See Documentation/power/s2ram.txt
> >
> > Perhaps this can give us a bit more light onto the problem.
> >
> > Basically the above does partial suspend and resume, and can pinpoint
> > problem areas down to a more select location.
>
> All the pm_test modes work fine. The only difference between them was
> that 'platform' required me to manually wake up the machine (hitting a
> key was sufficient), whereas the others woke up without help.
>
> pm_trace gave me
> [    1.306633]   Magic number: 0:185:178
> [    1.322880]   hash matches ../drivers/base/power/main.c:1070
> [    1.339270] acpi device:0e: hash matches
> [    1.355414]  platform: hash matches
>
> which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help
> there.
>
> I guess I could try to sprinkle more TRACE_RESUMEs around into some
> early resume code. If anyone has good ideas where to put them it
> might speed things up a bit.

So I did a bunch of that and found that it gets stuck somewhere
around executing the _WAK method:
platform_resume_noirq
 acpi_pm_finish
  acpi_leave_sleep_state
   acpi_hw_sleep_dispatch
    acpi_hw_legacy_wake
     acpi_hw_execute_sleep_method
      acpi_evaluate_object
       acpi_ns_evaluate
        acpi_ps_execute_method
         acpi_ps_parse_aml

It also seems that adding a few TRACE_RESUME()s or an msleep() right
after enable_nonboot_cpus() can avoid the hang, sometimes.

I've attached the DSDT in case anyone is interested in looking at it.

--
Ville Syrjälä
Intel OTC

dsdt.dat (41K) Download Attachment

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Rafael J. Wysocki-3
On 5/16/2016 9:39 PM, Ville Syrjälä wrote:

> On Wed, May 11, 2016 at 04:34:06PM +0300, Ville Syrjälä wrote:
>> On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:
>>> On Wed, 11 May 2016 15:21:16 +0300
>>> Ville Syrjälä <[hidden email]> wrote:
>>>
>>>> Yeah can't get anything from the machine at that point. netconsole
>>>> didn't help either, and no serial on this machine. And IIRC I've
>>>> tried ramoops on this thing in the past but unfortunately the memory
>>>> got cleared on reboot.
>>>>
>>> Can you look at the documentation in the kernel code at
>>>
>>> Documentation/power/basic-pm-debugging.txt And follow the procedures
>>> for testing suspend to RAM (although it requires mostly running the
>>> same tests as for hibernation suspending).
>>>
>>> You can also use the tool s2ram for this as well.
>>>
>>> See Documentation/power/s2ram.txt
>>>
>>> Perhaps this can give us a bit more light onto the problem.
>>>
>>> Basically the above does partial suspend and resume, and can pinpoint
>>> problem areas down to a more select location.
>> All the pm_test modes work fine. The only difference between them was
>> that 'platform' required me to manually wake up the machine (hitting a
>> key was sufficient), whereas the others woke up without help.
>>
>> pm_trace gave me
>> [    1.306633]   Magic number: 0:185:178
>> [    1.322880]   hash matches ../drivers/base/power/main.c:1070
>> [    1.339270] acpi device:0e: hash matches
>> [    1.355414]  platform: hash matches
>>
>> which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help
>> there.
>>
>> I guess I could try to sprinkle more TRACE_RESUMEs around into some
>> early resume code. If anyone has good ideas where to put them it
>> might speed things up a bit.
> So I did a bunch of that and found that it gets stuck somewhere
> around executing the _WAK method:
> platform_resume_noirq
>   acpi_pm_finish
>    acpi_leave_sleep_state
>     acpi_hw_sleep_dispatch
>      acpi_hw_legacy_wake
>       acpi_hw_execute_sleep_method
>        acpi_evaluate_object
>         acpi_ns_evaluate
>          acpi_ps_execute_method
>           acpi_ps_parse_aml
>
> It also seems that adding a few TRACE_RESUME()s or an msleep() right
> after enable_nonboot_cpus() can avoid the hang, sometimes.
>
> I've attached the DSDT in case anyone is interested in looking at it.
>

What if you comment out the execution of _WAK (line 318 of
drivers/acpi/acpica/hwsleep.c in 4.6)?  Does that make any difference?


Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Ville Syrjälä-2
On Wed, May 18, 2016 at 01:14:42AM +0200, Rafael J. Wysocki wrote:

> On 5/16/2016 9:39 PM, Ville Syrjälä wrote:
> > On Wed, May 11, 2016 at 04:34:06PM +0300, Ville Syrjälä wrote:
> >> On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:
> >>> On Wed, 11 May 2016 15:21:16 +0300
> >>> Ville Syrjälä <[hidden email]> wrote:
> >>>
> >>>> Yeah can't get anything from the machine at that point. netconsole
> >>>> didn't help either, and no serial on this machine. And IIRC I've
> >>>> tried ramoops on this thing in the past but unfortunately the memory
> >>>> got cleared on reboot.
> >>>>
> >>> Can you look at the documentation in the kernel code at
> >>>
> >>> Documentation/power/basic-pm-debugging.txt And follow the procedures
> >>> for testing suspend to RAM (although it requires mostly running the
> >>> same tests as for hibernation suspending).
> >>>
> >>> You can also use the tool s2ram for this as well.
> >>>
> >>> See Documentation/power/s2ram.txt
> >>>
> >>> Perhaps this can give us a bit more light onto the problem.
> >>>
> >>> Basically the above does partial suspend and resume, and can pinpoint
> >>> problem areas down to a more select location.
> >> All the pm_test modes work fine. The only difference between them was
> >> that 'platform' required me to manually wake up the machine (hitting a
> >> key was sufficient), whereas the others woke up without help.
> >>
> >> pm_trace gave me
> >> [    1.306633]   Magic number: 0:185:178
> >> [    1.322880]   hash matches ../drivers/base/power/main.c:1070
> >> [    1.339270] acpi device:0e: hash matches
> >> [    1.355414]  platform: hash matches
> >>
> >> which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help
> >> there.
> >>
> >> I guess I could try to sprinkle more TRACE_RESUMEs around into some
> >> early resume code. If anyone has good ideas where to put them it
> >> might speed things up a bit.
> > So I did a bunch of that and found that it gets stuck somewhere
> > around executing the _WAK method:
> > platform_resume_noirq
> >   acpi_pm_finish
> >    acpi_leave_sleep_state
> >     acpi_hw_sleep_dispatch
> >      acpi_hw_legacy_wake
> >       acpi_hw_execute_sleep_method
> >        acpi_evaluate_object
> >         acpi_ns_evaluate
> >          acpi_ps_execute_method
> >           acpi_ps_parse_aml
> >
> > It also seems that adding a few TRACE_RESUME()s or an msleep() right
> > after enable_nonboot_cpus() can avoid the hang, sometimes.
> >
> > I've attached the DSDT in case anyone is interested in looking at it.
> >
>
> What if you comment out the execution of _WAK (line 318 of
> drivers/acpi/acpica/hwsleep.c in 4.6)?  Does that make any difference?

Indeed it does. Tried with acpi_idle and intel_idle, and both appear to
resume just fine with that hack.

-       acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
+       //acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
+       printk(KERN_CRIT "skipping _WAK\n");

--
Ville Syrjälä
Intel OTC

Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

Ville Syrjälä-2
On Wed, May 18, 2016 at 10:24:24AM +0300, Ville Syrjälä wrote:

> On Wed, May 18, 2016 at 01:14:42AM +0200, Rafael J. Wysocki wrote:
> > On 5/16/2016 9:39 PM, Ville Syrjälä wrote:
> > > On Wed, May 11, 2016 at 04:34:06PM +0300, Ville Syrjälä wrote:
> > >> On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:
> > >>> On Wed, 11 May 2016 15:21:16 +0300
> > >>> Ville Syrjälä <[hidden email]> wrote:
> > >>>
> > >>>> Yeah can't get anything from the machine at that point. netconsole
> > >>>> didn't help either, and no serial on this machine. And IIRC I've
> > >>>> tried ramoops on this thing in the past but unfortunately the memory
> > >>>> got cleared on reboot.
> > >>>>
> > >>> Can you look at the documentation in the kernel code at
> > >>>
> > >>> Documentation/power/basic-pm-debugging.txt And follow the procedures
> > >>> for testing suspend to RAM (although it requires mostly running the
> > >>> same tests as for hibernation suspending).
> > >>>
> > >>> You can also use the tool s2ram for this as well.
> > >>>
> > >>> See Documentation/power/s2ram.txt
> > >>>
> > >>> Perhaps this can give us a bit more light onto the problem.
> > >>>
> > >>> Basically the above does partial suspend and resume, and can pinpoint
> > >>> problem areas down to a more select location.
> > >> All the pm_test modes work fine. The only difference between them was
> > >> that 'platform' required me to manually wake up the machine (hitting a
> > >> key was sufficient), whereas the others woke up without help.
> > >>
> > >> pm_trace gave me
> > >> [    1.306633]   Magic number: 0:185:178
> > >> [    1.322880]   hash matches ../drivers/base/power/main.c:1070
> > >> [    1.339270] acpi device:0e: hash matches
> > >> [    1.355414]  platform: hash matches
> > >>
> > >> which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help
> > >> there.
> > >>
> > >> I guess I could try to sprinkle more TRACE_RESUMEs around into some
> > >> early resume code. If anyone has good ideas where to put them it
> > >> might speed things up a bit.
> > > So I did a bunch of that and found that it gets stuck somewhere
> > > around executing the _WAK method:
> > > platform_resume_noirq
> > >   acpi_pm_finish
> > >    acpi_leave_sleep_state
> > >     acpi_hw_sleep_dispatch
> > >      acpi_hw_legacy_wake
> > >       acpi_hw_execute_sleep_method
> > >        acpi_evaluate_object
> > >         acpi_ns_evaluate
> > >          acpi_ps_execute_method
> > >           acpi_ps_parse_aml
> > >
> > > It also seems that adding a few TRACE_RESUME()s or an msleep() right
> > > after enable_nonboot_cpus() can avoid the hang, sometimes.
> > >
> > > I've attached the DSDT in case anyone is interested in looking at it.
> > >
> >
> > What if you comment out the execution of _WAK (line 318 of
> > drivers/acpi/acpica/hwsleep.c in 4.6)?  Does that make any difference?
>
> Indeed it does. Tried with acpi_idle and intel_idle, and both appear to
> resume just fine with that hack.
>
> -       acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
> +       //acpi_hw_execute_sleep_method(METHOD_PATHNAME__WAK, sleep_state);
> +       printk(KERN_CRIT "skipping _WAK\n");

Continuing with my detective work, I decided to hack the DSDT a bit to
see if I could narrow it down further, and it looks like I found it on
the first guess. The following change stops it from hanging.

@@ -5056,7 +5056,7 @@
         If (LEqual (Arg0, 0x03))
         {
             Store (0x01, \SPNF)
-            TRAP (0x46)
+            //TRAP (0x46)
             P8XH (0x00, 0x03)
         }

So what does that do? Let's see:

    OperationRegion (IO_T, SystemIO, 0x0800, 0x10)
    Field (IO_T, ByteAcc, NoLock, Preserve)
    {
        Offset (0x08),
        TRP0,   8
    }

    OperationRegion (GNVS, SystemMemory, 0x3F5E0C7C, 0x0200)
    Field (GNVS, AnyAcc, Lock, Preserve)
    {
        OSYS,   16,
        SMIF,   8,
    ...

    Method (TRAP, 1, Serialized)
    {
        Store (Arg0, SMIF) /* \SMIF */
        Store (0x00, TRP0) /* \TRP0 */
        Return (SMIF) /* \SMIF */
    }

and a dump of the IOTR registers shows:

0x1e80: 0x0000fe01
0x1e84: 0x00020001
0x1e98: 0x000c0801
0x1e9c: 0x000200f0

which seems to be telling me that ports 0x800-0x80f and
0xfe00-0xfe03 would trigger an SMI.
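
The decoding above can be sketched as follows. The field positions are
an assumption based on the ICH-family I/O trap register (IOTRn) layout
and should be checked against the datasheet for this chipset. The
function name decode_iotr and the dictionary keys are hypothetical.

```python
def decode_iotr(lo, hi):
    """Decode one 64-bit I/O trap register given as two 32-bit halves.

    Assumed layout: bit 0 = trap/SMI enable, bits 15:2 = dword-aligned
    I/O address, bits 23:18 = mask over address bits 7:2, bits 35:32 =
    byte enables, bits 39:36 = byte enable mask, bit 49 = trap both
    reads and writes.
    """
    base = lo & 0xFFFC
    addr_mask = ((lo >> 18) & 0x3F) << 2   # masked-out address bits
    return {
        "enabled": bool(lo & 0x1),
        # Bounding range; exact only when the mask bits are contiguous.
        "first_port": base & ~addr_mask & 0xFFFF,
        "last_port": base | addr_mask | 0x3,
        "byte_enables": hi & 0xF,
        "byte_enable_mask": (hi >> 4) & 0xF,
        "reads_and_writes": bool(hi & (1 << 17)),
    }


# The two register pairs from the dump above (offset, low, high).
for offset, lo, hi in [(0x1E80, 0x0000FE01, 0x00020001),
                       (0x1E98, 0x000C0801, 0x000200F0)]:
    r = decode_iotr(lo, hi)
    print("0x%04x: ports 0x%04x-0x%04x enabled=%s" % (
        offset, r["first_port"], r["last_port"], r["enabled"]))
```

Under that layout the 0x1e98 pair covers the IO_T operation region at
0x0800, so the write to TRP0 (offset 0x08) in TRAP falls inside the
trapped range.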

So the next question is how do the idle drivers and cpu hotplug
fit into this picture. Do we need to force the second HT into
a specific C state before the SMI or something?

--
Ville Syrjälä
Intel OTC