linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* ACPI issues on cold power on [bisected]
@ 2017-12-08 15:11 Jonathan McDowell
  2017-12-22  0:21 ` Joonsoo Kim
  2018-01-03 11:49 ` [PATCH] ACPI / WMI: Call acpi_wmi_init() later Rafael J. Wysocki
  0 siblings, 2 replies; 13+ messages in thread
From: Jonathan McDowell @ 2017-12-08 15:11 UTC (permalink / raw)
  To: linux-acpi, linux-kernel, linux-mm; +Cc: Joonsoo Kim

I've been sitting on this for a while and should have spent time to
investigate sooner, but it's been an odd failure mode that wasn't quite
obvious.

In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
don't see anything after grub says its booting. In 4.10 onwards the
laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
(no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
taken from 4.12 as that's the most recent error dmesg I have saved but
also seen back in 4.10. It's always address 0x30 for the dereference.

Rebooting the laptop does not lead to these problems; it's *only* from a
complete cold boot that they arise (which didn't help me in terms of
being able to reliably bisect). Once I realised that I was able to
bisect, but it leads me to an odd commit:

86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
(mm/slab: fix kmemcg cache creation delayed issue)

If I revert this then I can cold boot without problems.

Also I don't see the problem with a stock Debian kernel, I think because
the ACPI support is modularised.

Config, dmesg + bisect log at:

https://the.earth.li/~noodles/acpi-problem/

-------
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: netlink_broadcast_filtered+0x1d/0x3e0
PGD 0 
P4D 0 

Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 41 Comm: kworker/0:1 Not tainted 4.12.0 #1
Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A21 05/08/2017
Workqueue: kacpi_notify acpi_os_execute_deferred
task: ffff914e4c321240 task.stack: ffffa3bd4017c000
RIP: 0010:netlink_broadcast_filtered+0x1d/0x3e0
RSP: 0000:ffffa3bd4017fd90 EFLAGS: 00010286
RAX: 0000000000000001 RBX: ffff914e4c82b300 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000001080020 RDI: ffff914e4c82b300
RBP: ffff914e4c305614 R08: 0000000001080020 R09: 0000000000000000
R10: 0000000000000014 R11: ffffffffb8a31d40 R12: 0000000000000000
R13: 0000000000000000 R14: ffff914e4c305614 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff914e5ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 0000000236c09000 CR4: 00000000001406f0
Call Trace:
 ? __kmalloc_reserve.isra.37+0x24/0x70
 ? __nlmsg_put+0x63/0x80
 ? netlink_broadcast+0xa/0x10
 ? acpi_bus_generate_netlink_event+0x10d/0x150
 ? acpi_ev_notify_dispatch+0x37/0x4c
 ? acpi_os_execute_deferred+0xb/0x20
 ? process_one_work+0x1cf/0x3c0
 ? worker_thread+0x42/0x3c0
 ? __schedule+0x26c/0x660
 ? kthread+0xf7/0x130
 ? create_worker+0x190/0x190
 ? kthread_create_on_node+0x40/0x40
 ? ret_from_fork+0x22/0x30
Code: c8 c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 89 cf 41 56 41 55 49 89 fd 48 89 f7 44 89 c6 41 54 41 89 d4 55 53 48 83 ec 38 <49> 8b 6d 30 44 89 44 24 24 4c 89 4c 24 28 e8 a0 ec ff ff 48 c7 
RIP: netlink_broadcast_filtered+0x1d/0x3e0 RSP: ffffa3bd4017fd90
CR2: 0000000000000030
---[ end trace f8e25281792d4743 ]---

J.

-- 
/-\                             | 101 things you can't have too much
|@/  Debian GNU/Linux Developer |       of : 47 - More coffee.
\-                              |

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ACPI issues on cold power on [bisected]
  2017-12-08 15:11 ACPI issues on cold power on [bisected] Jonathan McDowell
@ 2017-12-22  0:21 ` Joonsoo Kim
  2017-12-29 16:36   ` Jonathan McDowell
  2018-01-03 11:49 ` [PATCH] ACPI / WMI: Call acpi_wmi_init() later Rafael J. Wysocki
  1 sibling, 1 reply; 13+ messages in thread
From: Joonsoo Kim @ 2017-12-22  0:21 UTC (permalink / raw)
  To: Jonathan McDowell; +Cc: linux-acpi, linux-kernel, linux-mm

On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote:
> I've been sitting on this for a while and should have spent time to
> investigate sooner, but it's been an odd failure mode that wasn't quite
> obvious.
> 
> In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
> don't see anything after grub says its booting. In 4.10 onwards the
> laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
> (no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
> taken from 4.12 as that's the most recent error dmesg I have saved but
> also seen back in 4.10. It's always address 0x30 for the dereference.
> 
> Rebooting the laptop does not lead to these problems; it's *only* from a
> complete cold boot that they arise (which didn't help me in terms of
> being able to reliably bisect). Once I realised that I was able to
> bisect, but it leads me to an odd commit:
> 
> 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
> (mm/slab: fix kmemcg cache creation delayed issue)
> 
> If I revert this then I can cold boot without problems.
> 
> Also I don't see the problem with a stock Debian kernel, I think because
> the ACPI support is modularised.

Hello,

Sorry for late response. I was on a long vacation.

I have tried to solve the problem however I don't find any clue yet.

>From my analysis, oops report shows that 'struct sock *ssk' passed to
netlink_broadcast_filtered() is NULL. It means that some of
netlink_kernel_create() returns NULL. Maybe, it is due to slab
allocation failure. Could you check it by inserting some log on that
part? The issue cannot be reproducible in my side so I need your help.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ACPI issues on cold power on [bisected]
  2017-12-22  0:21 ` Joonsoo Kim
@ 2017-12-29 16:36   ` Jonathan McDowell
  2018-01-02  2:54     ` Joonsoo Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan McDowell @ 2017-12-29 16:36 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-acpi, linux-kernel, linux-mm, netdev

On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote:
> On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote:
> > I've been sitting on this for a while and should have spent time to
> > investigate sooner, but it's been an odd failure mode that wasn't quite
> > obvious.
> > 
> > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
> > don't see anything after grub says its booting. In 4.10 onwards the
> > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
> > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
> > taken from 4.12 as that's the most recent error dmesg I have saved but
> > also seen back in 4.10. It's always address 0x30 for the dereference.
> > 
> > Rebooting the laptop does not lead to these problems; it's *only* from a
> > complete cold boot that they arise (which didn't help me in terms of
> > being able to reliably bisect). Once I realised that I was able to
> > bisect, but it leads me to an odd commit:
> > 
> > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
> > (mm/slab: fix kmemcg cache creation delayed issue)
> > 
> > If I revert this then I can cold boot without problems.
> > 
> > Also I don't see the problem with a stock Debian kernel, I think because
> > the ACPI support is modularised.
> 
> Sorry for late response. I was on a long vacation.

No problem. I've been trying to get around to diagnosing this for a
while now anyway and this isn't a great time of year for fast responses.

> I have tried to solve the problem however I don't find any clue yet.
> 
> >From my analysis, oops report shows that 'struct sock *ssk' passed to
> netlink_broadcast_filtered() is NULL. It means that some of
> netlink_kernel_create() returns NULL. Maybe, it is due to slab
> allocation failure. Could you check it by inserting some log on that
> part? The issue cannot be reproducible in my side so I need your help.

I've added some debug in acpi_bus_generate_netlink_event +
genlmsg_multicast and the problem seems to be that genlmsg_multicast is
getting called when init_net.genl_sock has not yet been initialised,
leading to the NULL deference.

Full dmesg output from a cold 4.14.8 boot at:

https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken

And the same kernel after a reboot ("shutdown -r now"):

https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working

Patch that I've applied is at

https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff

The interesting difference seems to be:

 PCI: Using ACPI for IRQ routing
+ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492)
+ERROR: init_net.genl_sock is NULL
+BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
+IP: netlink_broadcast_filtered+0x20/0x3d0
+PGD 0 P4D 0 
+Oops: 0000 [#1] SMP
+Modules linked in:
+CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1
+Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017
+Workqueue: kacpi_notify acpi_os_execute_deferred

9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and
there's no visible event for it on a reboot, just on a cold power on.
Some sort of ordering issues such that genl_sock is being initialised
later with the slab change?

J.

-- 
  Hail Eris. All hail Discordia.   |  .''`.  Debian GNU/Linux Developer
              Fnord?               | : :' :  Happy to accept PGP signed
                                   | `. `'   or encrypted mail - RSA
                                   |   `-    key on the keyservers.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ACPI issues on cold power on [bisected]
  2017-12-29 16:36   ` Jonathan McDowell
@ 2018-01-02  2:54     ` Joonsoo Kim
  2018-01-02 10:25       ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Joonsoo Kim @ 2018-01-02  2:54 UTC (permalink / raw)
  To: Jonathan McDowell; +Cc: linux-acpi, linux-kernel, linux-mm, netdev

On Fri, Dec 29, 2017 at 04:36:59PM +0000, Jonathan McDowell wrote:
> On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote:
> > On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote:
> > > I've been sitting on this for a while and should have spent time to
> > > investigate sooner, but it's been an odd failure mode that wasn't quite
> > > obvious.
> > > 
> > > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
> > > don't see anything after grub says its booting. In 4.10 onwards the
> > > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
> > > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
> > > taken from 4.12 as that's the most recent error dmesg I have saved but
> > > also seen back in 4.10. It's always address 0x30 for the dereference.
> > > 
> > > Rebooting the laptop does not lead to these problems; it's *only* from a
> > > complete cold boot that they arise (which didn't help me in terms of
> > > being able to reliably bisect). Once I realised that I was able to
> > > bisect, but it leads me to an odd commit:
> > > 
> > > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
> > > (mm/slab: fix kmemcg cache creation delayed issue)
> > > 
> > > If I revert this then I can cold boot without problems.
> > > 
> > > Also I don't see the problem with a stock Debian kernel, I think because
> > > the ACPI support is modularised.
> > 
> > Sorry for late response. I was on a long vacation.
> 
> No problem. I've been trying to get around to diagnosing this for a
> while now anyway and this isn't a great time of year for fast responses.
> 
> > I have tried to solve the problem however I don't find any clue yet.
> > 
> > >From my analysis, oops report shows that 'struct sock *ssk' passed to
> > netlink_broadcast_filtered() is NULL. It means that some of
> > netlink_kernel_create() returns NULL. Maybe, it is due to slab
> > allocation failure. Could you check it by inserting some log on that
> > part? The issue cannot be reproducible in my side so I need your help.
> 
> I've added some debug in acpi_bus_generate_netlink_event +
> genlmsg_multicast and the problem seems to be that genlmsg_multicast is
> getting called when init_net.genl_sock has not yet been initialised,
> leading to the NULL deference.
> 
> Full dmesg output from a cold 4.14.8 boot at:
> 
> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken
> 
> And the same kernel after a reboot ("shutdown -r now"):
> 
> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working
> 
> Patch that I've applied is at
> 
> https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff
> 

Thanks for testing! It's very helpful.

> The interesting difference seems to be:
> 
>  PCI: Using ACPI for IRQ routing
> +ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492)
> +ERROR: init_net.genl_sock is NULL
> +BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
> +IP: netlink_broadcast_filtered+0x20/0x3d0
> +PGD 0 P4D 0 
> +Oops: 0000 [#1] SMP
> +Modules linked in:
> +CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1
> +Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017
> +Workqueue: kacpi_notify acpi_os_execute_deferred
> 
> 9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and
> there's no visible event for it on a reboot, just on a cold power on.
> Some sort of ordering issues such that genl_sock is being initialised
> later with the slab change?

I have checked that there is an ordering issue.

genl_init() which initializes init_net->genl_sock is called on
subsys_initcall().

acpi_wmi_init() which schedules acpi_wmi_notify_handler() to the
workqueue is called on subsys_initcall(), too.
(acpi_wmi_notify_handler() -> acpi_bus_generate_netlink_event() ->
netlink_broadcast())

In my system, acpi_wmi_init() is called before the genl_init().
Therefore, if the worker is scheduled before genl_init() is done, NULL
derefence would happen.

Although slab change revealed this problem, I think that problem is on
ACPI side and need to be fixed there.

Anyway, I'm not sure why it doesn't happen before. These ACPI
initialization code looks not changed for a long time. Could you test
this problem with the slub?

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ACPI issues on cold power on [bisected]
  2018-01-02  2:54     ` Joonsoo Kim
@ 2018-01-02 10:25       ` Rafael J. Wysocki
  2018-01-03  2:11         ` Joonsoo Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2018-01-02 10:25 UTC (permalink / raw)
  To: Joonsoo Kim, Jonathan McDowell
  Cc: ACPI Devel Maling List, Linux Kernel Mailing List,
	Linux Memory Management List, netdev

On Tue, Jan 2, 2018 at 3:54 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Fri, Dec 29, 2017 at 04:36:59PM +0000, Jonathan McDowell wrote:
>> On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote:
>> > On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote:
>> > > I've been sitting on this for a while and should have spent time to
>> > > investigate sooner, but it's been an odd failure mode that wasn't quite
>> > > obvious.
>> > >
>> > > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
>> > > don't see anything after grub says its booting. In 4.10 onwards the
>> > > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
>> > > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
>> > > taken from 4.12 as that's the most recent error dmesg I have saved but
>> > > also seen back in 4.10. It's always address 0x30 for the dereference.
>> > >
>> > > Rebooting the laptop does not lead to these problems; it's *only* from a
>> > > complete cold boot that they arise (which didn't help me in terms of
>> > > being able to reliably bisect). Once I realised that I was able to
>> > > bisect, but it leads me to an odd commit:
>> > >
>> > > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
>> > > (mm/slab: fix kmemcg cache creation delayed issue)
>> > >
>> > > If I revert this then I can cold boot without problems.
>> > >
>> > > Also I don't see the problem with a stock Debian kernel, I think because
>> > > the ACPI support is modularised.
>> >
>> > Sorry for late response. I was on a long vacation.
>>
>> No problem. I've been trying to get around to diagnosing this for a
>> while now anyway and this isn't a great time of year for fast responses.
>>
>> > I have tried to solve the problem however I don't find any clue yet.
>> >
>> > >From my analysis, oops report shows that 'struct sock *ssk' passed to
>> > netlink_broadcast_filtered() is NULL. It means that some of
>> > netlink_kernel_create() returns NULL. Maybe, it is due to slab
>> > allocation failure. Could you check it by inserting some log on that
>> > part? The issue cannot be reproducible in my side so I need your help.
>>
>> I've added some debug in acpi_bus_generate_netlink_event +
>> genlmsg_multicast and the problem seems to be that genlmsg_multicast is
>> getting called when init_net.genl_sock has not yet been initialised,
>> leading to the NULL deference.
>>
>> Full dmesg output from a cold 4.14.8 boot at:
>>
>> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken
>>
>> And the same kernel after a reboot ("shutdown -r now"):
>>
>> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working
>>
>> Patch that I've applied is at
>>
>> https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff
>>
>
> Thanks for testing! It's very helpful.
>
>> The interesting difference seems to be:
>>
>>  PCI: Using ACPI for IRQ routing
>> +ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492)
>> +ERROR: init_net.genl_sock is NULL
>> +BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
>> +IP: netlink_broadcast_filtered+0x20/0x3d0
>> +PGD 0 P4D 0
>> +Oops: 0000 [#1] SMP
>> +Modules linked in:
>> +CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1
>> +Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017
>> +Workqueue: kacpi_notify acpi_os_execute_deferred
>>
>> 9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and
>> there's no visible event for it on a reboot, just on a cold power on.
>> Some sort of ordering issues such that genl_sock is being initialised
>> later with the slab change?
>
> I have checked that there is an ordering issue.
>
> genl_init() which initializes init_net->genl_sock is called on
> subsys_initcall().
>
> acpi_wmi_init() which schedules acpi_wmi_notify_handler() to the
> workqueue is called on subsys_initcall(), too.
> (acpi_wmi_notify_handler() -> acpi_bus_generate_netlink_event() ->
> netlink_broadcast())
>
> In my system, acpi_wmi_init() is called before the genl_init().
> Therefore, if the worker is scheduled before genl_init() is done, NULL
> derefence would happen.

Does it help to change the subsys_initcall() in wmi.c to subsys_initcall_sync()?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ACPI issues on cold power on [bisected]
  2018-01-02 10:25       ` Rafael J. Wysocki
@ 2018-01-03  2:11         ` Joonsoo Kim
  2018-01-03 10:38           ` Jonathan McDowell
  0 siblings, 1 reply; 13+ messages in thread
From: Joonsoo Kim @ 2018-01-03  2:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jonathan McDowell, ACPI Devel Maling List,
	Linux Kernel Mailing List, Linux Memory Management List, netdev

On Tue, Jan 02, 2018 at 11:25:01AM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 2, 2018 at 3:54 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > On Fri, Dec 29, 2017 at 04:36:59PM +0000, Jonathan McDowell wrote:
> >> On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote:
> >> > On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote:
> >> > > I've been sitting on this for a while and should have spent time to
> >> > > investigate sooner, but it's been an odd failure mode that wasn't quite
> >> > > obvious.
> >> > >
> >> > > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
> >> > > don't see anything after grub says its booting. In 4.10 onwards the
> >> > > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
> >> > > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
> >> > > taken from 4.12 as that's the most recent error dmesg I have saved but
> >> > > also seen back in 4.10. It's always address 0x30 for the dereference.
> >> > >
> >> > > Rebooting the laptop does not lead to these problems; it's *only* from a
> >> > > complete cold boot that they arise (which didn't help me in terms of
> >> > > being able to reliably bisect). Once I realised that I was able to
> >> > > bisect, but it leads me to an odd commit:
> >> > >
> >> > > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
> >> > > (mm/slab: fix kmemcg cache creation delayed issue)
> >> > >
> >> > > If I revert this then I can cold boot without problems.
> >> > >
> >> > > Also I don't see the problem with a stock Debian kernel, I think because
> >> > > the ACPI support is modularised.
> >> >
> >> > Sorry for late response. I was on a long vacation.
> >>
> >> No problem. I've been trying to get around to diagnosing this for a
> >> while now anyway and this isn't a great time of year for fast responses.
> >>
> >> > I have tried to solve the problem however I don't find any clue yet.
> >> >
> >> > >From my analysis, oops report shows that 'struct sock *ssk' passed to
> >> > netlink_broadcast_filtered() is NULL. It means that some of
> >> > netlink_kernel_create() returns NULL. Maybe, it is due to slab
> >> > allocation failure. Could you check it by inserting some log on that
> >> > part? The issue cannot be reproducible in my side so I need your help.
> >>
> >> I've added some debug in acpi_bus_generate_netlink_event +
> >> genlmsg_multicast and the problem seems to be that genlmsg_multicast is
> >> getting called when init_net.genl_sock has not yet been initialised,
> >> leading to the NULL deference.
> >>
> >> Full dmesg output from a cold 4.14.8 boot at:
> >>
> >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken
> >>
> >> And the same kernel after a reboot ("shutdown -r now"):
> >>
> >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working
> >>
> >> Patch that I've applied is at
> >>
> >> https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff
> >>
> >
> > Thanks for testing! It's very helpful.
> >
> >> The interesting difference seems to be:
> >>
> >>  PCI: Using ACPI for IRQ routing
> >> +ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492)
> >> +ERROR: init_net.genl_sock is NULL
> >> +BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
> >> +IP: netlink_broadcast_filtered+0x20/0x3d0
> >> +PGD 0 P4D 0
> >> +Oops: 0000 [#1] SMP
> >> +Modules linked in:
> >> +CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1
> >> +Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017
> >> +Workqueue: kacpi_notify acpi_os_execute_deferred
> >>
> >> 9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and
> >> there's no visible event for it on a reboot, just on a cold power on.
> >> Some sort of ordering issues such that genl_sock is being initialised
> >> later with the slab change?
> >
> > I have checked that there is an ordering issue.
> >
> > genl_init() which initializes init_net->genl_sock is called on
> > subsys_initcall().
> >
> > acpi_wmi_init() which schedules acpi_wmi_notify_handler() to the
> > workqueue is called on subsys_initcall(), too.
> > (acpi_wmi_notify_handler() -> acpi_bus_generate_netlink_event() ->
> > netlink_broadcast())
> >
> > In my system, acpi_wmi_init() is called before the genl_init().
> > Therefore, if the worker is scheduled before genl_init() is done, NULL
> > derefence would happen.
> 
> Does it help to change the subsys_initcall() in wmi.c to subsys_initcall_sync()?

I guess that it would work. I cannot reproduce the issue so it needs
to be checked by Jonathan. Jonathan, could you check the problem
is disappeared with above change?

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ACPI issues on cold power on [bisected]
  2018-01-03  2:11         ` Joonsoo Kim
@ 2018-01-03 10:38           ` Jonathan McDowell
  2018-01-03 11:29             ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan McDowell @ 2018-01-03 10:38 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Rafael J. Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List, Linux Memory Management List, netdev

On Wed, Jan 03, 2018 at 11:11:29AM +0900, Joonsoo Kim wrote:
> On Tue, Jan 02, 2018 at 11:25:01AM +0100, Rafael J. Wysocki wrote:
> > On Tue, Jan 2, 2018 at 3:54 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > On Fri, Dec 29, 2017 at 04:36:59PM +0000, Jonathan McDowell wrote:
> > >> On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote:
> > >> > On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote:
> > >> > > I've been sitting on this for a while and should have spent time to
> > >> > > investigate sooner, but it's been an odd failure mode that wasn't quite
> > >> > > obvious.
> > >> > >
> > >> > > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
> > >> > > don't see anything after grub says its booting. In 4.10 onwards the
> > >> > > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
> > >> > > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
> > >> > > taken from 4.12 as that's the most recent error dmesg I have saved but
> > >> > > also seen back in 4.10. It's always address 0x30 for the dereference.
> > >> > >
> > >> > > Rebooting the laptop does not lead to these problems; it's *only* from a
> > >> > > complete cold boot that they arise (which didn't help me in terms of
> > >> > > being able to reliably bisect). Once I realised that I was able to
> > >> > > bisect, but it leads me to an odd commit:
> > >> > >
> > >> > > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
> > >> > > (mm/slab: fix kmemcg cache creation delayed issue)
> > >> > >
> > >> > > If I revert this then I can cold boot without problems.
> > >> > >
> > >> > > Also I don't see the problem with a stock Debian kernel, I think because
> > >> > > the ACPI support is modularised.
> > >> >
> > >> > Sorry for late response. I was on a long vacation.
> > >>
> > >> No problem. I've been trying to get around to diagnosing this for a
> > >> while now anyway and this isn't a great time of year for fast responses.
> > >>
> > >> > I have tried to solve the problem however I don't find any clue yet.
> > >> >
> > >> > >From my analysis, oops report shows that 'struct sock *ssk' passed to
> > >> > netlink_broadcast_filtered() is NULL. It means that some of
> > >> > netlink_kernel_create() returns NULL. Maybe, it is due to slab
> > >> > allocation failure. Could you check it by inserting some log on that
> > >> > part? The issue cannot be reproducible in my side so I need your help.
> > >>
> > >> I've added some debug in acpi_bus_generate_netlink_event +
> > >> genlmsg_multicast and the problem seems to be that genlmsg_multicast is
> > >> getting called when init_net.genl_sock has not yet been initialised,
> > >> leading to the NULL deference.
> > >>
> > >> Full dmesg output from a cold 4.14.8 boot at:
> > >>
> > >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken
> > >>
> > >> And the same kernel after a reboot ("shutdown -r now"):
> > >>
> > >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working
> > >>
> > >> Patch that I've applied is at
> > >>
> > >> https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff
> > >>
> > >
> > > Thanks for testing! It's very helpful.
> > >
> > >> The interesting difference seems to be:
> > >>
> > >>  PCI: Using ACPI for IRQ routing
> > >> +ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492)
> > >> +ERROR: init_net.genl_sock is NULL
> > >> +BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
> > >> +IP: netlink_broadcast_filtered+0x20/0x3d0
> > >> +PGD 0 P4D 0
> > >> +Oops: 0000 [#1] SMP
> > >> +Modules linked in:
> > >> +CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1
> > >> +Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017
> > >> +Workqueue: kacpi_notify acpi_os_execute_deferred
> > >>
> > >> 9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and
> > >> there's no visible event for it on a reboot, just on a cold power on.
> > >> Some sort of ordering issues such that genl_sock is being initialised
> > >> later with the slab change?
> > >
> > > I have checked that there is an ordering issue.
> > >
> > > genl_init() which initializes init_net->genl_sock is called on
> > > subsys_initcall().
> > >
> > > acpi_wmi_init() which schedules acpi_wmi_notify_handler() to the
> > > workqueue is called on subsys_initcall(), too.
> > > (acpi_wmi_notify_handler() -> acpi_bus_generate_netlink_event() ->
> > > netlink_broadcast())
> > >
> > > In my system, acpi_wmi_init() is called before the genl_init().
> > > Therefore, if the worker is scheduled before genl_init() is done, NULL
> > > derefence would happen.
> > 
> > Does it help to change the subsys_initcall() in wmi.c to subsys_initcall_sync()?
> 
> I guess that it would work. I cannot reproduce the issue so it needs
> to be checked by Jonathan. Jonathan, could you check the problem
> is disappeared with above change?

I have confirmed that the problem also occurs when using SLUB instead of
SLAB, and that switching drivers/platform/x86/wmi.c to use
subsys_initcall_sync() instead of subsys_initcall() fixes the problem
for both. Weirdly I don't see the ACPI 208 event at boot time being
raised once that patch is in place.

J.

-- 
] https://www.earth.li/~noodles/ [] Synonym: word used when you can't  [
]  PGP/GPG Key @ the.earth.li    []       spell the one you want       [
] via keyserver, web or email.   []                                    [
] RSA: 4096/0x94FA372B2DA8B985   []                                    [

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ACPI issues on cold power on [bisected]
  2018-01-03 10:38           ` Jonathan McDowell
@ 2018-01-03 11:29             ` Rafael J. Wysocki
  0 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2018-01-03 11:29 UTC (permalink / raw)
  To: Jonathan McDowell
  Cc: Joonsoo Kim, Rafael J. Wysocki, ACPI Devel Maling List,
	Linux Kernel Mailing List, Linux Memory Management List, netdev

On Wednesday, January 3, 2018 11:38:12 AM CET Jonathan McDowell wrote:
> On Wed, Jan 03, 2018 at 11:11:29AM +0900, Joonsoo Kim wrote:
> > On Tue, Jan 02, 2018 at 11:25:01AM +0100, Rafael J. Wysocki wrote:
> > > On Tue, Jan 2, 2018 at 3:54 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > On Fri, Dec 29, 2017 at 04:36:59PM +0000, Jonathan McDowell wrote:
> > > >> On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote:
> > > >> > On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote:
> > > >> > > I've been sitting on this for a while and should have spent time to
> > > >> > > investigate sooner, but it's been an odd failure mode that wasn't quite
> > > >> > > obvious.
> > > >> > >
> > > >> > > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
> > > >> > > don't see anything after grub says its booting. In 4.10 onwards the
> > > >> > > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
> > > >> > > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
> > > >> > > taken from 4.12 as that's the most recent error dmesg I have saved but
> > > >> > > also seen back in 4.10. It's always address 0x30 for the dereference.
> > > >> > >
> > > >> > > Rebooting the laptop does not lead to these problems; it's *only* from a
> > > >> > > complete cold boot that they arise (which didn't help me in terms of
> > > >> > > being able to reliably bisect). Once I realised that I was able to
> > > >> > > bisect, but it leads me to an odd commit:
> > > >> > >
> > > >> > > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
> > > >> > > (mm/slab: fix kmemcg cache creation delayed issue)
> > > >> > >
> > > >> > > If I revert this then I can cold boot without problems.
> > > >> > >
> > > >> > > Also I don't see the problem with a stock Debian kernel, I think because
> > > >> > > the ACPI support is modularised.
> > > >> >
> > > >> > Sorry for late response. I was on a long vacation.
> > > >>
> > > >> No problem. I've been trying to get around to diagnosing this for a
> > > >> while now anyway and this isn't a great time of year for fast responses.
> > > >>
> > > >> > I have tried to solve the problem however I don't find any clue yet.
> > > >> >
> > > >> > >From my analysis, oops report shows that 'struct sock *ssk' passed to
> > > >> > netlink_broadcast_filtered() is NULL. It means that some of
> > > >> > netlink_kernel_create() returns NULL. Maybe, it is due to slab
> > > >> > allocation failure. Could you check it by inserting some log on that
> > > >> > part? The issue cannot be reproducible in my side so I need your help.
> > > >>
> > > >> I've added some debug in acpi_bus_generate_netlink_event +
> > > >> genlmsg_multicast and the problem seems to be that genlmsg_multicast is
> > > >> getting called when init_net.genl_sock has not yet been initialised,
> > > >> leading to the NULL deference.
> > > >>
> > > >> Full dmesg output from a cold 4.14.8 boot at:
> > > >>
> > > >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken
> > > >>
> > > >> And the same kernel after a reboot ("shutdown -r now"):
> > > >>
> > > >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working
> > > >>
> > > >> Patch that I've applied is at
> > > >>
> > > >> https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff
> > > >>
> > > >
> > > > Thanks for testing! It's very helpful.
> > > >
> > > >> The interesting difference seems to be:
> > > >>
> > > >>  PCI: Using ACPI for IRQ routing
> > > >> +ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492)
> > > >> +ERROR: init_net.genl_sock is NULL
> > > >> +BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
> > > >> +IP: netlink_broadcast_filtered+0x20/0x3d0
> > > >> +PGD 0 P4D 0
> > > >> +Oops: 0000 [#1] SMP
> > > >> +Modules linked in:
> > > >> +CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1
> > > >> +Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017
> > > >> +Workqueue: kacpi_notify acpi_os_execute_deferred
> > > >>
> > > >> 9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and
> > > >> there's no visible event for it on a reboot, just on a cold power on.
> > > >> Some sort of ordering issues such that genl_sock is being initialised
> > > >> later with the slab change?
> > > >
> > > > I have checked that there is an ordering issue.
> > > >
> > > > genl_init() which initializes init_net->genl_sock is called on
> > > > subsys_initcall().
> > > >
> > > > acpi_wmi_init() which schedules acpi_wmi_notify_handler() to the
> > > > workqueue is called on subsys_initcall(), too.
> > > > (acpi_wmi_notify_handler() -> acpi_bus_generate_netlink_event() ->
> > > > netlink_broadcast())
> > > >
> > > > In my system, acpi_wmi_init() is called before the genl_init().
> > > > Therefore, if the worker is scheduled before genl_init() is done, NULL
> > > > derefence would happen.
> > > 
> > > Does it help to change the subsys_initcall() in wmi.c to subsys_initcall_sync()?
> > 
> > I guess that it would work. I cannot reproduce the issue so it needs
> > to be checked by Jonathan. Jonathan, could you check the problem
> > is disappeared with above change?
> 
> I have confirmed that the problem also occurs when using SLUB instead of
> SLAB, and that switching drivers/platform/x86/wmi.c to use
> subsys_initcall_sync() instead of subsys_initcall() fixes the problem
> for both. Weirdly I don't see the ACPI 208 event at boot time being
> raised once that patch is in place.

Interesting.

Anyway, let me send this change as a proper patch.

Thanks,
Rafael

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ACPI / WMI: Call acpi_wmi_init() later
  2017-12-08 15:11 ACPI issues on cold power on [bisected] Jonathan McDowell
  2017-12-22  0:21 ` Joonsoo Kim
@ 2018-01-03 11:49 ` Rafael J. Wysocki
  2018-01-05 23:30   ` Rafael J. Wysocki
  1 sibling, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2018-01-03 11:49 UTC (permalink / raw)
  To: Andy Shevchenko, Darren Hart
  Cc: Jonathan McDowell, linux-acpi, linux-kernel, linux-mm,
	Joonsoo Kim, platform-driver-x86, Andy Lutomirski

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
issues to appear on some systems and they are difficult to reproduce,
because there is no guaranteed ordering between subsys_initcall()
calls, so they may occur in different orders on different systems.

In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache
creation delayed issue) exposed one of these issues where genl_init()
and acpi_wmi_init() are both called at the same initcall level, but
the former must run before the latter so as to avoid a NULL pointer
dereference.

For this reason, move the acpi_wmi_init() invocation to the
initcall_sync level which should still be early enough for things
to work correctly in the WMI land.

Link: https://marc.info/?t=151274596700002&r=1&w=2
Reported-by: Jonathan McDowell <noodles@earth.li>
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/platform/x86/wmi.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-pm/drivers/platform/x86/wmi.c
===================================================================
--- linux-pm.orig/drivers/platform/x86/wmi.c
+++ linux-pm/drivers/platform/x86/wmi.c
@@ -1458,5 +1458,5 @@ static void __exit acpi_wmi_exit(void)
 	class_unregister(&wmi_bus_class);
 }
 
-subsys_initcall(acpi_wmi_init);
+subsys_initcall_sync(acpi_wmi_init);
 module_exit(acpi_wmi_exit);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ACPI / WMI: Call acpi_wmi_init() later
  2018-01-03 11:49 ` [PATCH] ACPI / WMI: Call acpi_wmi_init() later Rafael J. Wysocki
@ 2018-01-05 23:30   ` Rafael J. Wysocki
  2018-01-06  1:16     ` Darren Hart
  2018-01-06 11:02     ` Jonathan McDowell
  0 siblings, 2 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2018-01-05 23:30 UTC (permalink / raw)
  To: Andy Shevchenko, Darren Hart
  Cc: Jonathan McDowell, ACPI Devel Maling List,
	Linux Kernel Mailing List, Linux Memory Management List,
	Joonsoo Kim, Platform Driver, Andy Lutomirski

On Wed, Jan 3, 2018 at 12:49 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
> issues to appear on some systems and they are difficult to reproduce,
> because there is no guaranteed ordering between subsys_initcall()
> calls, so they may occur in different orders on different systems.
>
> In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache
> creation delayed issue) exposed one of these issues where genl_init()
> and acpi_wmi_init() are both called at the same initcall level, but
> the former must run before the latter so as to avoid a NULL pointer
> dereference.
>
> For this reason, move the acpi_wmi_init() invocation to the
> initcall_sync level which should still be early enough for things
> to work correctly in the WMI land.
>
> Link: https://marc.info/?t=151274596700002&r=1&w=2
> Reported-by: Jonathan McDowell <noodles@earth.li>
> Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Tested-by: Jonathan McDowell <noodles@earth.li>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Guys, this fixes a crash on boot.

If there are no concerns/objections I will just take it through the ACPI tree.

> ---
>  drivers/platform/x86/wmi.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-pm/drivers/platform/x86/wmi.c
> ===================================================================
> --- linux-pm.orig/drivers/platform/x86/wmi.c
> +++ linux-pm/drivers/platform/x86/wmi.c
> @@ -1458,5 +1458,5 @@ static void __exit acpi_wmi_exit(void)
>         class_unregister(&wmi_bus_class);
>  }
>
> -subsys_initcall(acpi_wmi_init);
> +subsys_initcall_sync(acpi_wmi_init);
>  module_exit(acpi_wmi_exit);
>
> --

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ACPI / WMI: Call acpi_wmi_init() later
  2018-01-05 23:30   ` Rafael J. Wysocki
@ 2018-01-06  1:16     ` Darren Hart
  2018-01-06 11:02     ` Jonathan McDowell
  1 sibling, 0 replies; 13+ messages in thread
From: Darren Hart @ 2018-01-06  1:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andy Shevchenko, Jonathan McDowell, ACPI Devel Maling List,
	Linux Kernel Mailing List, Linux Memory Management List,
	Joonsoo Kim, Platform Driver, Andy Lutomirski

On Sat, Jan 06, 2018 at 12:30:23AM +0100, Rafael J. Wysocki wrote:
> On Wed, Jan 3, 2018 at 12:49 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
> > issues to appear on some systems and they are difficult to reproduce,
> > because there is no guaranteed ordering between subsys_initcall()
> > calls, so they may occur in different orders on different systems.
> >
> > In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache
> > creation delayed issue) exposed one of these issues where genl_init()
> > and acpi_wmi_init() are both called at the same initcall level, but
> > the former must run before the latter so as to avoid a NULL pointer
> > dereference.
> >
> > For this reason, move the acpi_wmi_init() invocation to the
> > initcall_sync level which should still be early enough for things
> > to work correctly in the WMI land.
> >
> > Link: https://marc.info/?t=151274596700002&r=1&w=2
> > Reported-by: Jonathan McDowell <noodles@earth.li>
> > Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Tested-by: Jonathan McDowell <noodles@earth.li>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Guys, this fixes a crash on boot.
> 
> If there are no concerns/objections I will just take it through the ACPI tree.

Queued up and running through tests now. I'll have it in for-next as soon as
those complete assuming to issues.

-- 
Darren Hart
VMware Open Source Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ACPI / WMI: Call acpi_wmi_init() later
  2018-01-05 23:30   ` Rafael J. Wysocki
  2018-01-06  1:16     ` Darren Hart
@ 2018-01-06 11:02     ` Jonathan McDowell
  2018-01-06 22:59       ` Darren Hart
  1 sibling, 1 reply; 13+ messages in thread
From: Jonathan McDowell @ 2018-01-06 11:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andy Shevchenko, Darren Hart, ACPI Devel Maling List,
	Linux Kernel Mailing List, Linux Memory Management List,
	Joonsoo Kim, Platform Driver, Andy Lutomirski

On Sat, Jan 06, 2018 at 12:30:23AM +0100, Rafael J. Wysocki wrote:
> On Wed, Jan 3, 2018 at 12:49 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
> > issues to appear on some systems and they are difficult to reproduce,
> > because there is no guaranteed ordering between subsys_initcall()
> > calls, so they may occur in different orders on different systems.
> >
> > In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache
> > creation delayed issue) exposed one of these issues where genl_init()
> > and acpi_wmi_init() are both called at the same initcall level, but
> > the former must run before the latter so as to avoid a NULL pointer
> > dereference.
> >
> > For this reason, move the acpi_wmi_init() invocation to the
> > initcall_sync level which should still be early enough for things
> > to work correctly in the WMI land.
> >
> > Link: https://marc.info/?t=151274596700002&r=1&w=2
> > Reported-by: Jonathan McDowell <noodles@earth.li>
> > Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Tested-by: Jonathan McDowell <noodles@earth.li>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Guys, this fixes a crash on boot.
> 
> If there are no concerns/objections I will just take it through the ACPI tree.

Note that I first started seeing it in v4.9 so would ideally hit the
appropriate stable trees too.

> > ---
> >  drivers/platform/x86/wmi.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Index: linux-pm/drivers/platform/x86/wmi.c
> > ===================================================================
> > --- linux-pm.orig/drivers/platform/x86/wmi.c
> > +++ linux-pm/drivers/platform/x86/wmi.c
> > @@ -1458,5 +1458,5 @@ static void __exit acpi_wmi_exit(void)
> >         class_unregister(&wmi_bus_class);
> >  }
> >
> > -subsys_initcall(acpi_wmi_init);
> > +subsys_initcall_sync(acpi_wmi_init);
> >  module_exit(acpi_wmi_exit);
> >
> > --

J.

-- 
/-\                             | 101 things you can't have too much
|@/  Debian GNU/Linux Developer |    of : 36 - Spare video tapes.
\-                              |

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] ACPI / WMI: Call acpi_wmi_init() later
  2018-01-06 11:02     ` Jonathan McDowell
@ 2018-01-06 22:59       ` Darren Hart
  0 siblings, 0 replies; 13+ messages in thread
From: Darren Hart @ 2018-01-06 22:59 UTC (permalink / raw)
  To: Jonathan McDowell
  Cc: Rafael J. Wysocki, Andy Shevchenko, ACPI Devel Maling List,
	Linux Kernel Mailing List, Linux Memory Management List,
	Joonsoo Kim, Platform Driver, Andy Lutomirski

On Sat, Jan 06, 2018 at 11:02:27AM +0000, Jonathan McDowell wrote:
> On Sat, Jan 06, 2018 at 12:30:23AM +0100, Rafael J. Wysocki wrote:
> > On Wed, Jan 3, 2018 at 12:49 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
> > > issues to appear on some systems and they are difficult to reproduce,
> > > because there is no guaranteed ordering between subsys_initcall()
> > > calls, so they may occur in different orders on different systems.
> > >
> > > In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache
> > > creation delayed issue) exposed one of these issues where genl_init()
> > > and acpi_wmi_init() are both called at the same initcall level, but
> > > the former must run before the latter so as to avoid a NULL pointer
> > > dereference.
> > >
> > > For this reason, move the acpi_wmi_init() invocation to the
> > > initcall_sync level which should still be early enough for things
> > > to work correctly in the WMI land.
> > >
> > > Link: https://marc.info/?t=151274596700002&r=1&w=2
> > > Reported-by: Jonathan McDowell <noodles@earth.li>
> > > Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > Tested-by: Jonathan McDowell <noodles@earth.li>
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > Guys, this fixes a crash on boot.
> > 
> > If there are no concerns/objections I will just take it through the ACPI tree.
> 
> Note that I first started seeing it in v4.9 so would ideally hit the
> appropriate stable trees too.

Thanks, I'll take care of that.

-- 
Darren Hart
VMware Open Source Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-01-06 22:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-08 15:11 ACPI issues on cold power on [bisected] Jonathan McDowell
2017-12-22  0:21 ` Joonsoo Kim
2017-12-29 16:36   ` Jonathan McDowell
2018-01-02  2:54     ` Joonsoo Kim
2018-01-02 10:25       ` Rafael J. Wysocki
2018-01-03  2:11         ` Joonsoo Kim
2018-01-03 10:38           ` Jonathan McDowell
2018-01-03 11:29             ` Rafael J. Wysocki
2018-01-03 11:49 ` [PATCH] ACPI / WMI: Call acpi_wmi_init() later Rafael J. Wysocki
2018-01-05 23:30   ` Rafael J. Wysocki
2018-01-06  1:16     ` Darren Hart
2018-01-06 11:02     ` Jonathan McDowell
2018-01-06 22:59       ` Darren Hart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox