* ACPI issues on cold power on [bisected]
@ 2017-12-08 15:11 Jonathan McDowell
  2017-12-22  0:21 ` Joonsoo Kim
  2018-01-03 11:49 ` [PATCH] ACPI / WMI: Call acpi_wmi_init() later Rafael J. Wysocki
  0 siblings, 2 replies; 13+ messages in thread
From: Jonathan McDowell @ 2017-12-08 15:11 UTC
  To: linux-acpi, linux-kernel, linux-mm; +Cc: Joonsoo Kim

I've been sitting on this for a while and should have spent time to
investigate sooner, but it's been an odd failure mode that wasn't quite
obvious.

In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
don't see anything after grub says it's booting. In 4.10 onwards the
laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
(no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
taken from 4.12 as that's the most recent error dmesg I have saved but
also seen back in 4.10. It's always address 0x30 for the dereference.

Rebooting the laptop does not lead to these problems; it's *only* from a
complete cold boot that they arise (which didn't help me in terms of
being able to reliably bisect). Once I realised that I was able to
bisect, but it leads me to an odd commit:

86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
(mm/slab: fix kmemcg cache creation delayed issue)

If I revert this then I can cold boot without problems.

Also I don't see the problem with a stock Debian kernel, I think because
the ACPI support is modularised.

Config, dmesg + bisect log at:

https://the.earth.li/~noodles/acpi-problem/

-------
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: netlink_broadcast_filtered+0x1d/0x3e0
PGD 0 P4D 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 41 Comm: kworker/0:1 Not tainted 4.12.0 #1
Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A21 05/08/2017
Workqueue: kacpi_notify acpi_os_execute_deferred
task: ffff914e4c321240 task.stack: ffffa3bd4017c000
RIP: 0010:netlink_broadcast_filtered+0x1d/0x3e0
RSP: 0000:ffffa3bd4017fd90 EFLAGS: 00010286
RAX: 0000000000000001 RBX: ffff914e4c82b300 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000001080020 RDI: ffff914e4c82b300
RBP: ffff914e4c305614 R08: 0000000001080020 R09: 0000000000000000
R10: 0000000000000014 R11: ffffffffb8a31d40 R12: 0000000000000000
R13: 0000000000000000 R14: ffff914e4c305614 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff914e5ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 0000000236c09000 CR4: 00000000001406f0
Call Trace:
 ? __kmalloc_reserve.isra.37+0x24/0x70
 ? __nlmsg_put+0x63/0x80
 ? netlink_broadcast+0xa/0x10
 ? acpi_bus_generate_netlink_event+0x10d/0x150
 ? acpi_ev_notify_dispatch+0x37/0x4c
 ? acpi_os_execute_deferred+0xb/0x20
 ? process_one_work+0x1cf/0x3c0
 ? worker_thread+0x42/0x3c0
 ? __schedule+0x26c/0x660
 ? kthread+0xf7/0x130
 ? create_worker+0x190/0x190
 ? kthread_create_on_node+0x40/0x40
 ? ret_from_fork+0x22/0x30
Code: c8 c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 89 cf 41 56 41 55 49 89 fd 48 89 f7 44 89 c6 41 54 41 89 d4 55 53 48 83 ec 38 <49> 8b 6d 30 44 89 44 24 24 4c 89 4c 24 28 e8 a0 ec ff ff 48 c7
RIP: netlink_broadcast_filtered+0x1d/0x3e0 RSP: ffffa3bd4017fd90
CR2: 0000000000000030
---[ end trace f8e25281792d4743 ]---

J.

-- 
/-\                             | 101 things you can't have too much
|@/ Debian GNU/Linux Developer  |     of : 47 - More coffee.
\-                              |

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: ACPI issues on cold power on [bisected]
  2017-12-08 15:11 ACPI issues on cold power on [bisected] Jonathan McDowell
@ 2017-12-22  0:21 ` Joonsoo Kim
  2017-12-29 16:36   ` Jonathan McDowell
  2018-01-03 11:49 ` [PATCH] ACPI / WMI: Call acpi_wmi_init() later Rafael J. Wysocki
  1 sibling, 1 reply; 13+ messages in thread
From: Joonsoo Kim @ 2017-12-22  0:21 UTC
  To: Jonathan McDowell; +Cc: linux-acpi, linux-kernel, linux-mm

On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote:
> I've been sitting on this for a while and should have spent time to
> investigate sooner, but it's been an odd failure mode that wasn't quite
> obvious.
>
> In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
> don't see anything after grub says its booting. In 4.10 onwards the
> laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
> (no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
> taken from 4.12 as that's the most recent error dmesg I have saved but
> also seen back in 4.10. It's always address 0x30 for the dereference.
>
> Rebooting the laptop does not lead to these problems; it's *only* from a
> complete cold boot that they arise (which didn't help me in terms of
> being able to reliably bisect). Once I realised that I was able to
> bisect, but it leads me to an odd commit:
>
> 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
> (mm/slab: fix kmemcg cache creation delayed issue)
>
> If I revert this then I can cold boot without problems.
>
> Also I don't see the problem with a stock Debian kernel, I think because
> the ACPI support is modularised.

Hello,

Sorry for the late response. I was on a long vacation.

I have tried to solve the problem, but I haven't found any clue yet.

From my analysis, the oops report shows that the 'struct sock *ssk'
passed to netlink_broadcast_filtered() is NULL. That means one of the
netlink_kernel_create() calls returned NULL, maybe due to a slab
allocation failure. Could you check this by adding some logging around
that part? I cannot reproduce the issue on my side, so I need your help.

Thanks.
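The NULL 'ssk' reading can be cross-checked against the Code: line of the oops in the first message. The byte in angle brackets is where RIP points, so `49 8b 6d 30` is the faulting instruction. The sketch below is a minimal decoder for that single instruction form only (REX.W MOV with an 8-bit displacement); `decode_mov_load` is a hypothetical helper written for this note, not a kernel or library function. It suggests the load goes through R13, which the register dump shows as zero, and the earlier bytes `49 89 fd` appear to copy the first argument (RDI) into R13.

```python
# Hypothetical helper: decodes only "REX.W 8B /r" with a disp8 ModRM byte,
# i.e. the form of the faulting instruction in the oops (49 8b 6d 30).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code):
    """Return (dest, base, disp) for a 'mov disp8(%base),%dest' encoding."""
    rex, opcode, modrm, disp = code[:4]
    assert rex & 0xF0 == 0x40 and rex & 0x08, "expects a REX.W prefix"
    assert opcode == 0x8B, "expects MOV r64, r/m64"
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 0b01 and rm != 0b100, "expects [base + disp8] without SIB"
    dest = REG64[reg | ((rex & 0x04) << 1)]  # REX.R extends the reg field
    base = REG64[rm | ((rex & 0x01) << 3)]   # REX.B extends the r/m field
    if disp >= 0x80:                         # disp8 is sign-extended
        disp -= 0x100
    return dest, base, disp


faulting = bytes.fromhex("498b6d30")         # bytes at RIP, from the Code: line
dest, base, disp = decode_mov_load(faulting)

registers = {"r13": 0x0}                     # R13 from the oops register dump
fault_address = registers[base] + disp       # should match CR2 in the oops

print(f"mov {disp:#x}(%{base}),%{dest} -> fault at {fault_address:#018x}")
```

A NULL base plus a fixed offset would also explain why the report always shows address 0x30.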
* Re: ACPI issues on cold power on [bisected] 2017-12-22 0:21 ` Joonsoo Kim @ 2017-12-29 16:36 ` Jonathan McDowell 2018-01-02 2:54 ` Joonsoo Kim 0 siblings, 1 reply; 13+ messages in thread From: Jonathan McDowell @ 2017-12-29 16:36 UTC (permalink / raw) To: Joonsoo Kim; +Cc: linux-acpi, linux-kernel, linux-mm, netdev On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote: > On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote: > > I've been sitting on this for a while and should have spent time to > > investigate sooner, but it's been an odd failure mode that wasn't quite > > obvious. > > > > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I > > don't see anything after grub says its booting. In 4.10 onwards the > > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy > > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below; > > taken from 4.12 as that's the most recent error dmesg I have saved but > > also seen back in 4.10. It's always address 0x30 for the dereference. > > > > Rebooting the laptop does not lead to these problems; it's *only* from a > > complete cold boot that they arise (which didn't help me in terms of > > being able to reliably bisect). Once I realised that I was able to > > bisect, but it leads me to an odd commit: > > > > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6 > > (mm/slab: fix kmemcg cache creation delayed issue) > > > > If I revert this then I can cold boot without problems. > > > > Also I don't see the problem with a stock Debian kernel, I think because > > the ACPI support is modularised. > > Sorry for late response. I was on a long vacation. No problem. I've been trying to get around to diagnosing this for a while now anyway and this isn't a great time of year for fast responses. > I have tried to solve the problem however I don't find any clue yet. > > >From my analysis, oops report shows that 'struct sock *ssk' passed to > netlink_broadcast_filtered() is NULL. 
It means that some of
> netlink_kernel_create() returns NULL. Maybe, it is due to slab
> allocation failure. Could you check it by inserting some log on that
> part? The issue cannot be reproducible in my side so I need your help.

I've added some debug output to acpi_bus_generate_netlink_event +
genlmsg_multicast and the problem seems to be that genlmsg_multicast is
getting called when init_net.genl_sock has not yet been initialised,
leading to the NULL dereference.

Full dmesg output from a cold 4.14.8 boot at:

https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken

And the same kernel after a reboot ("shutdown -r now"):

https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working

Patch that I've applied is at

https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff

The interesting difference seems to be:

 PCI: Using ACPI for IRQ routing
+ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492)
+ERROR: init_net.genl_sock is NULL
+BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
+IP: netlink_broadcast_filtered+0x20/0x3d0
+PGD 0 P4D 0
+Oops: 0000 [#1] SMP
+Modules linked in:
+CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1
+Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017
+Workqueue: kacpi_notify acpi_os_execute_deferred

9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and
there's no visible event for it on a reboot, just on a cold power on.
Is there some sort of ordering issue such that genl_sock is being
initialised later with the slab change?

J.

-- 
  Hail Eris.  All hail Discordia.   |  .''`.  Debian GNU/Linux Developer
             Fnord?                 | : :' :  Happy to accept PGP signed
                                    | `. `'   or encrypted mail - RSA
                                    |   `-    key on the keyservers.
* Re: ACPI issues on cold power on [bisected] 2017-12-29 16:36 ` Jonathan McDowell @ 2018-01-02 2:54 ` Joonsoo Kim 2018-01-02 10:25 ` Rafael J. Wysocki 0 siblings, 1 reply; 13+ messages in thread From: Joonsoo Kim @ 2018-01-02 2:54 UTC (permalink / raw) To: Jonathan McDowell; +Cc: linux-acpi, linux-kernel, linux-mm, netdev On Fri, Dec 29, 2017 at 04:36:59PM +0000, Jonathan McDowell wrote: > On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote: > > On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote: > > > I've been sitting on this for a while and should have spent time to > > > investigate sooner, but it's been an odd failure mode that wasn't quite > > > obvious. > > > > > > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I > > > don't see anything after grub says its booting. In 4.10 onwards the > > > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy > > > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below; > > > taken from 4.12 as that's the most recent error dmesg I have saved but > > > also seen back in 4.10. It's always address 0x30 for the dereference. > > > > > > Rebooting the laptop does not lead to these problems; it's *only* from a > > > complete cold boot that they arise (which didn't help me in terms of > > > being able to reliably bisect). Once I realised that I was able to > > > bisect, but it leads me to an odd commit: > > > > > > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6 > > > (mm/slab: fix kmemcg cache creation delayed issue) > > > > > > If I revert this then I can cold boot without problems. > > > > > > Also I don't see the problem with a stock Debian kernel, I think because > > > the ACPI support is modularised. > > > > Sorry for late response. I was on a long vacation. > > No problem. I've been trying to get around to diagnosing this for a > while now anyway and this isn't a great time of year for fast responses. 
> > > I have tried to solve the problem however I don't find any clue yet. > > > > >From my analysis, oops report shows that 'struct sock *ssk' passed to > > netlink_broadcast_filtered() is NULL. It means that some of > > netlink_kernel_create() returns NULL. Maybe, it is due to slab > > allocation failure. Could you check it by inserting some log on that > > part? The issue cannot be reproducible in my side so I need your help. > > I've added some debug in acpi_bus_generate_netlink_event + > genlmsg_multicast and the problem seems to be that genlmsg_multicast is > getting called when init_net.genl_sock has not yet been initialised, > leading to the NULL deference. > > Full dmesg output from a cold 4.14.8 boot at: > > https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken > > And the same kernel after a reboot ("shutdown -r now"): > > https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working > > Patch that I've applied is at > > https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff > Thanks for testing! It's very helpful. > The interesting difference seems to be: > > PCI: Using ACPI for IRQ routing > +ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492) > +ERROR: init_net.genl_sock is NULL > +BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 > +IP: netlink_broadcast_filtered+0x20/0x3d0 > +PGD 0 P4D 0 > +Oops: 0000 [#1] SMP > +Modules linked in: > +CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1 > +Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017 > +Workqueue: kacpi_notify acpi_os_execute_deferred > > 9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and > there's no visible event for it on a reboot, just on a cold power on. > Some sort of ordering issues such that genl_sock is being initialised > later with the slab change? I have checked that there is an ordering issue. genl_init() which initializes init_net->genl_sock is called on subsys_initcall(). 
acpi_wmi_init(), which schedules acpi_wmi_notify_handler() to the
workqueue, is called from subsys_initcall(), too.
(acpi_wmi_notify_handler() -> acpi_bus_generate_netlink_event() ->
netlink_broadcast())

On my system, acpi_wmi_init() is called before genl_init().
Therefore, if the worker runs before genl_init() has finished, a NULL
dereference would happen.

Although the slab change revealed this problem, I think the problem is
on the ACPI side and needs to be fixed there. Anyway, I'm not sure why
it didn't happen before; this ACPI initialization code does not appear
to have changed for a long time. Could you test this problem with SLUB?

Thanks.
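The ordering problem described above can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions: `boot()` is a hypothetical function, initcalls are plain strings, and a pending firmware event is assumed to be handled as soon as acpi_wmi_init() has run (the worst case for the race). Event number 208 is the one from Jonathan's cold-boot dmesg; the model assumes it is pending only on a cold power on, as reported.

```python
def boot(initcall_order, pending_events):
    """Run initcalls in the given order and log what happens to WMI events.

    Assumption: any event pending at boot is handled right after
    acpi_wmi_init() runs, before later initcalls at the same level.
    """
    genl_sock = None                     # models init_net.genl_sock
    log = []
    for initcall in initcall_order:
        if initcall == "genl_init":
            genl_sock = object()         # stand-in for the netlink socket
        elif initcall == "acpi_wmi_init":
            for event in pending_events:
                if genl_sock is None:
                    log.append(f"oops: event {event} with genl_sock NULL")
                else:
                    log.append(f"event {event} delivered")
    return log


wmi_first = ["acpi_wmi_init", "genl_init"]   # order seen on Joonsoo's system

print(boot(wmi_first, pending_events=[208]))  # cold power on: event pending
print(boot(wmi_first, pending_events=[]))     # reboot: no event, no oops
print(boot(["genl_init", "acpi_wmi_init"], pending_events=[208]))
```

In this model the same initcall order is harmless on a reboot and fatal on a cold boot, which matches the reported symptom that only a cold power on triggers the oops.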
* Re: ACPI issues on cold power on [bisected] 2018-01-02 2:54 ` Joonsoo Kim @ 2018-01-02 10:25 ` Rafael J. Wysocki 2018-01-03 2:11 ` Joonsoo Kim 0 siblings, 1 reply; 13+ messages in thread From: Rafael J. Wysocki @ 2018-01-02 10:25 UTC (permalink / raw) To: Joonsoo Kim, Jonathan McDowell Cc: ACPI Devel Maling List, Linux Kernel Mailing List, Linux Memory Management List, netdev On Tue, Jan 2, 2018 at 3:54 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote: > On Fri, Dec 29, 2017 at 04:36:59PM +0000, Jonathan McDowell wrote: >> On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote: >> > On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote: >> > > I've been sitting on this for a while and should have spent time to >> > > investigate sooner, but it's been an odd failure mode that wasn't quite >> > > obvious. >> > > >> > > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I >> > > don't see anything after grub says its booting. In 4.10 onwards the >> > > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy >> > > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below; >> > > taken from 4.12 as that's the most recent error dmesg I have saved but >> > > also seen back in 4.10. It's always address 0x30 for the dereference. >> > > >> > > Rebooting the laptop does not lead to these problems; it's *only* from a >> > > complete cold boot that they arise (which didn't help me in terms of >> > > being able to reliably bisect). Once I realised that I was able to >> > > bisect, but it leads me to an odd commit: >> > > >> > > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6 >> > > (mm/slab: fix kmemcg cache creation delayed issue) >> > > >> > > If I revert this then I can cold boot without problems. >> > > >> > > Also I don't see the problem with a stock Debian kernel, I think because >> > > the ACPI support is modularised. >> > >> > Sorry for late response. I was on a long vacation. >> >> No problem. 
I've been trying to get around to diagnosing this for a >> while now anyway and this isn't a great time of year for fast responses. >> >> > I have tried to solve the problem however I don't find any clue yet. >> > >> > >From my analysis, oops report shows that 'struct sock *ssk' passed to >> > netlink_broadcast_filtered() is NULL. It means that some of >> > netlink_kernel_create() returns NULL. Maybe, it is due to slab >> > allocation failure. Could you check it by inserting some log on that >> > part? The issue cannot be reproducible in my side so I need your help. >> >> I've added some debug in acpi_bus_generate_netlink_event + >> genlmsg_multicast and the problem seems to be that genlmsg_multicast is >> getting called when init_net.genl_sock has not yet been initialised, >> leading to the NULL deference. >> >> Full dmesg output from a cold 4.14.8 boot at: >> >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken >> >> And the same kernel after a reboot ("shutdown -r now"): >> >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working >> >> Patch that I've applied is at >> >> https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff >> > > Thanks for testing! It's very helpful. > >> The interesting difference seems to be: >> >> PCI: Using ACPI for IRQ routing >> +ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492) >> +ERROR: init_net.genl_sock is NULL >> +BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 >> +IP: netlink_broadcast_filtered+0x20/0x3d0 >> +PGD 0 P4D 0 >> +Oops: 0000 [#1] SMP >> +Modules linked in: >> +CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1 >> +Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017 >> +Workqueue: kacpi_notify acpi_os_execute_deferred >> >> 9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and >> there's no visible event for it on a reboot, just on a cold power on. 
>> Some sort of ordering issues such that genl_sock is being initialised >> later with the slab change? > > I have checked that there is an ordering issue. > > genl_init() which initializes init_net->genl_sock is called on > subsys_initcall(). > > acpi_wmi_init() which schedules acpi_wmi_notify_handler() to the > workqueue is called on subsys_initcall(), too. > (acpi_wmi_notify_handler() -> acpi_bus_generate_netlink_event() -> > netlink_broadcast()) > > In my system, acpi_wmi_init() is called before the genl_init(). > Therefore, if the worker is scheduled before genl_init() is done, NULL > derefence would happen. Does it help to change the subsys_initcall() in wmi.c to subsys_initcall_sync()? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: ACPI issues on cold power on [bisected] 2018-01-02 10:25 ` Rafael J. Wysocki @ 2018-01-03 2:11 ` Joonsoo Kim 2018-01-03 10:38 ` Jonathan McDowell 0 siblings, 1 reply; 13+ messages in thread From: Joonsoo Kim @ 2018-01-03 2:11 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jonathan McDowell, ACPI Devel Maling List, Linux Kernel Mailing List, Linux Memory Management List, netdev On Tue, Jan 02, 2018 at 11:25:01AM +0100, Rafael J. Wysocki wrote: > On Tue, Jan 2, 2018 at 3:54 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote: > > On Fri, Dec 29, 2017 at 04:36:59PM +0000, Jonathan McDowell wrote: > >> On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote: > >> > On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote: > >> > > I've been sitting on this for a while and should have spent time to > >> > > investigate sooner, but it's been an odd failure mode that wasn't quite > >> > > obvious. > >> > > > >> > > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I > >> > > don't see anything after grub says its booting. In 4.10 onwards the > >> > > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy > >> > > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below; > >> > > taken from 4.12 as that's the most recent error dmesg I have saved but > >> > > also seen back in 4.10. It's always address 0x30 for the dereference. > >> > > > >> > > Rebooting the laptop does not lead to these problems; it's *only* from a > >> > > complete cold boot that they arise (which didn't help me in terms of > >> > > being able to reliably bisect). Once I realised that I was able to > >> > > bisect, but it leads me to an odd commit: > >> > > > >> > > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6 > >> > > (mm/slab: fix kmemcg cache creation delayed issue) > >> > > > >> > > If I revert this then I can cold boot without problems. 
> >> > > > >> > > Also I don't see the problem with a stock Debian kernel, I think because > >> > > the ACPI support is modularised. > >> > > >> > Sorry for late response. I was on a long vacation. > >> > >> No problem. I've been trying to get around to diagnosing this for a > >> while now anyway and this isn't a great time of year for fast responses. > >> > >> > I have tried to solve the problem however I don't find any clue yet. > >> > > >> > >From my analysis, oops report shows that 'struct sock *ssk' passed to > >> > netlink_broadcast_filtered() is NULL. It means that some of > >> > netlink_kernel_create() returns NULL. Maybe, it is due to slab > >> > allocation failure. Could you check it by inserting some log on that > >> > part? The issue cannot be reproducible in my side so I need your help. > >> > >> I've added some debug in acpi_bus_generate_netlink_event + > >> genlmsg_multicast and the problem seems to be that genlmsg_multicast is > >> getting called when init_net.genl_sock has not yet been initialised, > >> leading to the NULL deference. > >> > >> Full dmesg output from a cold 4.14.8 boot at: > >> > >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken > >> > >> And the same kernel after a reboot ("shutdown -r now"): > >> > >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working > >> > >> Patch that I've applied is at > >> > >> https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff > >> > > > > Thanks for testing! It's very helpful. > > > >> The interesting difference seems to be: > >> > >> PCI: Using ACPI for IRQ routing > >> +ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492) > >> +ERROR: init_net.genl_sock is NULL > >> +BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 > >> +IP: netlink_broadcast_filtered+0x20/0x3d0 > >> +PGD 0 P4D 0 > >> +Oops: 0000 [#1] SMP > >> +Modules linked in: > >> +CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1 > >> +Hardware name: Dell Inc. 
Latitude E7240/07RPNV, BIOS A22 10/18/2017
> >> +Workqueue: kacpi_notify acpi_os_execute_deferred
> >>
> >> 9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and
> >> there's no visible event for it on a reboot, just on a cold power on.
> >> Some sort of ordering issues such that genl_sock is being initialised
> >> later with the slab change?
> >
> > I have checked that there is an ordering issue.
> >
> > genl_init() which initializes init_net->genl_sock is called on
> > subsys_initcall().
> >
> > acpi_wmi_init() which schedules acpi_wmi_notify_handler() to the
> > workqueue is called on subsys_initcall(), too.
> > (acpi_wmi_notify_handler() -> acpi_bus_generate_netlink_event() ->
> > netlink_broadcast())
> >
> > In my system, acpi_wmi_init() is called before the genl_init().
> > Therefore, if the worker is scheduled before genl_init() is done, NULL
> > derefence would happen.
>
> Does it help to change the subsys_initcall() in wmi.c to subsys_initcall_sync()?

I guess that it would work. I cannot reproduce the issue, so it needs
to be checked by Jonathan. Jonathan, could you check whether the
problem disappears with the above change?

Thanks.
* Re: ACPI issues on cold power on [bisected] 2018-01-03 2:11 ` Joonsoo Kim @ 2018-01-03 10:38 ` Jonathan McDowell 2018-01-03 11:29 ` Rafael J. Wysocki 0 siblings, 1 reply; 13+ messages in thread From: Jonathan McDowell @ 2018-01-03 10:38 UTC (permalink / raw) To: Joonsoo Kim Cc: Rafael J. Wysocki, ACPI Devel Maling List, Linux Kernel Mailing List, Linux Memory Management List, netdev On Wed, Jan 03, 2018 at 11:11:29AM +0900, Joonsoo Kim wrote: > On Tue, Jan 02, 2018 at 11:25:01AM +0100, Rafael J. Wysocki wrote: > > On Tue, Jan 2, 2018 at 3:54 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote: > > > On Fri, Dec 29, 2017 at 04:36:59PM +0000, Jonathan McDowell wrote: > > >> On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote: > > >> > On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote: > > >> > > I've been sitting on this for a while and should have spent time to > > >> > > investigate sooner, but it's been an odd failure mode that wasn't quite > > >> > > obvious. > > >> > > > > >> > > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I > > >> > > don't see anything after grub says its booting. In 4.10 onwards the > > >> > > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy > > >> > > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below; > > >> > > taken from 4.12 as that's the most recent error dmesg I have saved but > > >> > > also seen back in 4.10. It's always address 0x30 for the dereference. > > >> > > > > >> > > Rebooting the laptop does not lead to these problems; it's *only* from a > > >> > > complete cold boot that they arise (which didn't help me in terms of > > >> > > being able to reliably bisect). 
Once I realised that I was able to > > >> > > bisect, but it leads me to an odd commit: > > >> > > > > >> > > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6 > > >> > > (mm/slab: fix kmemcg cache creation delayed issue) > > >> > > > > >> > > If I revert this then I can cold boot without problems. > > >> > > > > >> > > Also I don't see the problem with a stock Debian kernel, I think because > > >> > > the ACPI support is modularised. > > >> > > > >> > Sorry for late response. I was on a long vacation. > > >> > > >> No problem. I've been trying to get around to diagnosing this for a > > >> while now anyway and this isn't a great time of year for fast responses. > > >> > > >> > I have tried to solve the problem however I don't find any clue yet. > > >> > > > >> > >From my analysis, oops report shows that 'struct sock *ssk' passed to > > >> > netlink_broadcast_filtered() is NULL. It means that some of > > >> > netlink_kernel_create() returns NULL. Maybe, it is due to slab > > >> > allocation failure. Could you check it by inserting some log on that > > >> > part? The issue cannot be reproducible in my side so I need your help. > > >> > > >> I've added some debug in acpi_bus_generate_netlink_event + > > >> genlmsg_multicast and the problem seems to be that genlmsg_multicast is > > >> getting called when init_net.genl_sock has not yet been initialised, > > >> leading to the NULL deference. > > >> > > >> Full dmesg output from a cold 4.14.8 boot at: > > >> > > >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken > > >> > > >> And the same kernel after a reboot ("shutdown -r now"): > > >> > > >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working > > >> > > >> Patch that I've applied is at > > >> > > >> https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff > > >> > > > > > > Thanks for testing! It's very helpful. 
> > > > > >> The interesting difference seems to be: > > >> > > >> PCI: Using ACPI for IRQ routing > > >> +ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492) > > >> +ERROR: init_net.genl_sock is NULL > > >> +BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 > > >> +IP: netlink_broadcast_filtered+0x20/0x3d0 > > >> +PGD 0 P4D 0 > > >> +Oops: 0000 [#1] SMP > > >> +Modules linked in: > > >> +CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1 > > >> +Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017 > > >> +Workqueue: kacpi_notify acpi_os_execute_deferred > > >> > > >> 9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and > > >> there's no visible event for it on a reboot, just on a cold power on. > > >> Some sort of ordering issues such that genl_sock is being initialised > > >> later with the slab change? > > > > > > I have checked that there is an ordering issue. > > > > > > genl_init() which initializes init_net->genl_sock is called on > > > subsys_initcall(). > > > > > > acpi_wmi_init() which schedules acpi_wmi_notify_handler() to the > > > workqueue is called on subsys_initcall(), too. > > > (acpi_wmi_notify_handler() -> acpi_bus_generate_netlink_event() -> > > > netlink_broadcast()) > > > > > > In my system, acpi_wmi_init() is called before the genl_init(). > > > Therefore, if the worker is scheduled before genl_init() is done, NULL > > > derefence would happen. > > > > Does it help to change the subsys_initcall() in wmi.c to subsys_initcall_sync()? > > I guess that it would work. I cannot reproduce the issue so it needs > to be checked by Jonathan. Jonathan, could you check the problem > is disappeared with above change? I have confirmed that the problem also occurs when using SLUB instead of SLAB, and that switching drivers/platform/x86/wmi.c to use subsys_initcall_sync() instead of subsys_initcall() fixes the problem for both. 
Weirdly, I don't see the ACPI 208 event being raised at boot time once
that patch is in place.

J.

-- 
] https://www.earth.li/~noodles/ []  Synonym: word used when you can't [
]  PGP/GPG Key @ the.earth.li    []       spell the one you want       [
]  via keyserver, web or email.  []                                    [
]  RSA: 4096/0x94FA372B2DA8B985  []                                    [
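For readers wondering why subsys_initcall_sync() changes the order: in include/linux/init.h, subsys_initcall() registers at initcall level "4" and subsys_initcall_sync() at level "4s", and all level 4 entries run before any level 4s entry. The sketch below models only that excerpt of the level table. `run_order()` is a hypothetical helper, and the link order within a level (acpi_wmi_init before genl_init) is an assumption taken from what Joonsoo observed on his system.

```python
# Excerpt of the initcall levels, in execution order (assumed subset only).
LEVEL_ORDER = ["4", "4s", "5"]
LEVEL_OF = {
    "subsys_initcall": "4",
    "subsys_initcall_sync": "4s",
    "fs_initcall": "5",
}


def run_order(registrations):
    """Order (function, macro) pairs by level, keeping link order per level.

    sorted() is stable, so entries at the same level keep their input order,
    which stands in for link order here.
    """
    ranked = sorted(registrations,
                    key=lambda reg: LEVEL_ORDER.index(LEVEL_OF[reg[1]]))
    return [name for name, _macro in ranked]


before = run_order([("acpi_wmi_init", "subsys_initcall"),
                    ("genl_init", "subsys_initcall")])
after = run_order([("acpi_wmi_init", "subsys_initcall_sync"),
                   ("genl_init", "subsys_initcall")])

print("before:", before)
print("after: ", after)
```

With the change, genl_init() runs first regardless of link order, which would be consistent with the fix working on both SLAB and SLUB.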
* Re: ACPI issues on cold power on [bisected]
  2018-01-03 10:38 ` Jonathan McDowell
@ 2018-01-03 11:29 ` Rafael J. Wysocki
From: Rafael J. Wysocki @ 2018-01-03 11:29 UTC
To: Jonathan McDowell
Cc: Joonsoo Kim, Rafael J. Wysocki, ACPI Devel Maling List, Linux Kernel Mailing List, Linux Memory Management List, netdev

On Wednesday, January 3, 2018 11:38:12 AM CET Jonathan McDowell wrote:
> On Wed, Jan 03, 2018 at 11:11:29AM +0900, Joonsoo Kim wrote:
> > On Tue, Jan 02, 2018 at 11:25:01AM +0100, Rafael J. Wysocki wrote:
> > > On Tue, Jan 2, 2018 at 3:54 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > On Fri, Dec 29, 2017 at 04:36:59PM +0000, Jonathan McDowell wrote:
> > > >> On Fri, Dec 22, 2017 at 09:21:09AM +0900, Joonsoo Kim wrote:
> > > >> > On Fri, Dec 08, 2017 at 03:11:59PM +0000, Jonathan McDowell wrote:
> > > >> > > I've been sitting on this for a while and should have spent time to
> > > >> > > investigate sooner, but it's been an odd failure mode that wasn't quite
> > > >> > > obvious.
> > > >> > >
> > > >> > > In 4.9 if I cold power on my laptop (Dell E7240) it fails to boot - I
> > > >> > > don't see anything after grub says its booting. In 4.10 onwards the
> > > >> > > laptop boots, but I get an Oops as part of the boot and ACPI is unhappy
> > > >> > > (no suspend, no clean poweroff, no ACPI buttons). The Oops is below;
> > > >> > > taken from 4.12 as that's the most recent error dmesg I have saved but
> > > >> > > also seen back in 4.10. It's always address 0x30 for the dereference.
> > > >> > >
> > > >> > > Rebooting the laptop does not lead to these problems; it's *only* from a
> > > >> > > complete cold boot that they arise (which didn't help me in terms of
> > > >> > > being able to reliably bisect). Once I realised that I was able to
> > > >> > > bisect, but it leads me to an odd commit:
> > > >> > >
> > > >> > > 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6
> > > >> > > (mm/slab: fix kmemcg cache creation delayed issue)
> > > >> > >
> > > >> > > If I revert this then I can cold boot without problems.
> > > >> > >
> > > >> > > Also I don't see the problem with a stock Debian kernel, I think because
> > > >> > > the ACPI support is modularised.
> > > >> >
> > > >> > Sorry for late response. I was on a long vacation.
> > > >>
> > > >> No problem. I've been trying to get around to diagnosing this for a
> > > >> while now anyway and this isn't a great time of year for fast responses.
> > > >>
> > > >> > I have tried to solve the problem however I don't find any clue yet.
> > > >> >
> > > >> > From my analysis, oops report shows that 'struct sock *ssk' passed to
> > > >> > netlink_broadcast_filtered() is NULL. It means that some of
> > > >> > netlink_kernel_create() returns NULL. Maybe, it is due to slab
> > > >> > allocation failure. Could you check it by inserting some log on that
> > > >> > part? The issue cannot be reproducible in my side so I need your help.
> > > >>
> > > >> I've added some debug in acpi_bus_generate_netlink_event +
> > > >> genlmsg_multicast and the problem seems to be that genlmsg_multicast is
> > > >> getting called when init_net.genl_sock has not yet been initialised,
> > > >> leading to the NULL dereference.
> > > >>
> > > >> Full dmesg output from a cold 4.14.8 boot at:
> > > >>
> > > >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-broken
> > > >>
> > > >> And the same kernel after a reboot ("shutdown -r now"):
> > > >>
> > > >> https://the.earth.li/~noodles/acpi-problem/dmesg-4.14.8-working
> > > >>
> > > >> Patch that I've applied is at
> > > >>
> > > >> https://the.earth.li/~noodles/acpi-problem/debug-acpi.diff
> > > >>
> > > >
> > > > Thanks for testing! It's very helpful.
> > > >
> > > >> The interesting difference seems to be:
> > > >>
> > > >>  PCI: Using ACPI for IRQ routing
> > > >> +ACPI: Generating event type 208 (:9DBB5994-A997-11DA-B012-B622A1EF5492)
> > > >> +ERROR: init_net.genl_sock is NULL
> > > >> +BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
> > > >> +IP: netlink_broadcast_filtered+0x20/0x3d0
> > > >> +PGD 0 P4D 0
> > > >> +Oops: 0000 [#1] SMP
> > > >> +Modules linked in:
> > > >> +CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.14.8+ #1
> > > >> +Hardware name: Dell Inc. Latitude E7240/07RPNV, BIOS A22 10/18/2017
> > > >> +Workqueue: kacpi_notify acpi_os_execute_deferred
> > > >>
> > > >> 9DBB5994-A997-11DA-B012-B622A1EF5492 is the Dell WMI event GUID and
> > > >> there's no visible event for it on a reboot, just on a cold power on.
> > > >> Some sort of ordering issues such that genl_sock is being initialised
> > > >> later with the slab change?
> > > >
> > > > I have checked that there is an ordering issue.
> > > >
> > > > genl_init() which initializes init_net->genl_sock is called on
> > > > subsys_initcall().
> > > >
> > > > acpi_wmi_init() which schedules acpi_wmi_notify_handler() to the
> > > > workqueue is called on subsys_initcall(), too.
> > > > (acpi_wmi_notify_handler() -> acpi_bus_generate_netlink_event() ->
> > > > netlink_broadcast())
> > > >
> > > > In my system, acpi_wmi_init() is called before the genl_init().
> > > > Therefore, if the worker is scheduled before genl_init() is done, NULL
> > > > dereference would happen.
> > >
> > > Does it help to change the subsys_initcall() in wmi.c to subsys_initcall_sync()?
> >
> > I guess that it would work. I cannot reproduce the issue so it needs
> > to be checked by Jonathan. Jonathan, could you check the problem
> > is disappeared with above change?
>
> I have confirmed that the problem also occurs when using SLUB instead of
> SLAB, and that switching drivers/platform/x86/wmi.c to use
> subsys_initcall_sync() instead of subsys_initcall() fixes the problem
> for both. Weirdly I don't see the ACPI 208 event at boot time being
> raised once that patch is in place.

Interesting.

Anyway, let me send this change as a proper patch.

Thanks,
Rafael
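A note on the fault address for readers following the analysis: the Oops always reports address 0x30, which is what one would expect if a member located 0x30 bytes into a structure is read through a NULL pointer. The user-space sketch below only illustrates that arithmetic. The structure `fake_sock` and its member names are hypothetical stand-ins, not the kernel's real `struct sock` layout, and `fault_address()` is an illustrative helper rather than anything from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; NOT the kernel's real struct sock layout. */
struct fake_sock {
    unsigned char earlier_members[0x30]; /* 48 bytes of preceding members */
    void *member_at_0x30;                /* the member a faulting load would read */
};

/* Compile-time check that the illustrative member really sits at 0x30. */
static_assert(offsetof(struct fake_sock, member_at_0x30) == 0x30,
              "illustrative member must be at offset 0x30");

/*
 * Address a CPU would access for base->member_at_0x30. It is computed
 * arithmetically, so nothing is actually dereferenced here.
 */
uintptr_t fault_address(uintptr_t base)
{
    return base + offsetof(struct fake_sock, member_at_0x30);
}
```

With a NULL base, `fault_address(0)` evaluates to 0x30, matching the address in the report. This is consistent with the socket pointer itself being NULL, as opposed to a corrupted non-NULL pointer.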
* [PATCH] ACPI / WMI: Call acpi_wmi_init() later
  2017-12-08 15:11 ACPI issues on cold power on [bisected] Jonathan McDowell
  2017-12-22  0:21 ` Joonsoo Kim
@ 2018-01-03 11:49 ` Rafael J. Wysocki
  2018-01-05 23:30   ` Rafael J. Wysocki
From: Rafael J. Wysocki @ 2018-01-03 11:49 UTC
To: Andy Shevchenko, Darren Hart
Cc: Jonathan McDowell, linux-acpi, linux-kernel, linux-mm, Joonsoo Kim, platform-driver-x86, Andy Lutomirski

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
issues to appear on some systems and they are difficult to reproduce,
because there is no guaranteed ordering between subsys_initcall()
calls, so they may occur in different orders on different systems.

In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache
creation delayed issue) exposed one of these issues where genl_init()
and acpi_wmi_init() are both called at the same initcall level, but
the former must run before the latter so as to avoid a NULL pointer
dereference.

For this reason, move the acpi_wmi_init() invocation to the
initcall_sync level which should still be early enough for things
to work correctly in the WMI land.

Link: https://marc.info/?t=151274596700002&r=1&w=2
Reported-by: Jonathan McDowell <noodles@earth.li>
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/platform/x86/wmi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-pm/drivers/platform/x86/wmi.c
===================================================================
--- linux-pm.orig/drivers/platform/x86/wmi.c
+++ linux-pm/drivers/platform/x86/wmi.c
@@ -1458,5 +1458,5 @@ static void __exit acpi_wmi_exit(void)
         class_unregister(&wmi_bus_class);
 }
 
-subsys_initcall(acpi_wmi_init);
+subsys_initcall_sync(acpi_wmi_init);
 module_exit(acpi_wmi_exit);
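The ordering argument in the changelog can be illustrated with a small user-space model. This is only a sketch under stated assumptions and not kernel code. It models two levels (subsys and subsys_sync), runs same-level calls in registration order as a stand-in for link order, and treats the WMI init call itself as the point where the generic netlink socket is first needed. In the real report the socket was needed slightly later, in deferred notify work. All names prefixed `model_` and the helper `wmi_sees_socket()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified model: only the two initcall levels discussed in the thread. */
enum level { LEVEL_SUBSYS = 0, LEVEL_SUBSYS_SYNC = 1 };

typedef void (*initcall_fn)(void);

struct initcall {
    enum level level;
    int link_order;      /* registration order stands in for link order */
    initcall_fn fn;
};

#define MAX_CALLS 8
static struct initcall calls[MAX_CALLS];
static size_t n_calls;

static int genl_sock_ready;    /* stands in for init_net.genl_sock != NULL */
static int wmi_saw_ready_sock; /* what the modelled WMI init observed */

static void model_genl_init(void)     { genl_sock_ready = 1; }
static void model_acpi_wmi_init(void) { wmi_saw_ready_sock = genl_sock_ready; }

static void register_initcall(enum level level, initcall_fn fn)
{
    assert(n_calls < MAX_CALLS);
    calls[n_calls].level = level;
    calls[n_calls].link_order = (int)n_calls;
    calls[n_calls].fn = fn;
    n_calls++;
}

/* Lower levels first; within one level, earlier registration first. */
static int by_level_then_link_order(const void *a, const void *b)
{
    const struct initcall *x = a, *y = b;

    if (x->level != y->level)
        return (x->level > y->level) - (x->level < y->level);
    return (x->link_order > y->link_order) - (x->link_order < y->link_order);
}

static void run_initcalls(void)
{
    qsort(calls, n_calls, sizeof(calls[0]), by_level_then_link_order);
    for (size_t i = 0; i < n_calls; i++)
        calls[i].fn();
}

/*
 * Register the modelled WMI init first (the unlucky ordering reported in
 * the thread) at the given level, then genl init at the subsys level.
 * Returns 1 if the WMI init ran after the socket was set up, else 0.
 */
int wmi_sees_socket(enum level wmi_level)
{
    n_calls = 0;
    genl_sock_ready = 0;
    wmi_saw_ready_sock = 0;
    register_initcall(wmi_level, model_acpi_wmi_init);
    register_initcall(LEVEL_SUBSYS, model_genl_init);
    run_initcalls();
    return wmi_saw_ready_sock;
}
```

In this model `wmi_sees_socket(LEVEL_SUBSYS)` returns 0, because the outcome depends on which call happens to come first, while `wmi_sees_socket(LEVEL_SUBSYS_SYNC)` returns 1 regardless of registration order. That is the property the one-line change relies on.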
* Re: [PATCH] ACPI / WMI: Call acpi_wmi_init() later
  2018-01-03 11:49 ` [PATCH] ACPI / WMI: Call acpi_wmi_init() later Rafael J. Wysocki
@ 2018-01-05 23:30 ` Rafael J. Wysocki
  2018-01-06  1:16   ` Darren Hart
  2018-01-06 11:02   ` Jonathan McDowell
From: Rafael J. Wysocki @ 2018-01-05 23:30 UTC
To: Andy Shevchenko, Darren Hart
Cc: Jonathan McDowell, ACPI Devel Maling List, Linux Kernel Mailing List, Linux Memory Management List, Joonsoo Kim, Platform Driver, Andy Lutomirski

On Wed, Jan 3, 2018 at 12:49 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
> issues to appear on some systems and they are difficult to reproduce,
> because there is no guaranteed ordering between subsys_initcall()
> calls, so they may occur in different orders on different systems.
>
> In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache
> creation delayed issue) exposed one of these issues where genl_init()
> and acpi_wmi_init() are both called at the same initcall level, but
> the former must run before the latter so as to avoid a NULL pointer
> dereference.
>
> For this reason, move the acpi_wmi_init() invocation to the
> initcall_sync level which should still be early enough for things
> to work correctly in the WMI land.
>
> Link: https://marc.info/?t=151274596700002&r=1&w=2
> Reported-by: Jonathan McDowell <noodles@earth.li>
> Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Tested-by: Jonathan McDowell <noodles@earth.li>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Guys, this fixes a crash on boot.

If there are no concerns/objections I will just take it through the ACPI tree.

> ---
>  drivers/platform/x86/wmi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-pm/drivers/platform/x86/wmi.c
> ===================================================================
> --- linux-pm.orig/drivers/platform/x86/wmi.c
> +++ linux-pm/drivers/platform/x86/wmi.c
> @@ -1458,5 +1458,5 @@ static void __exit acpi_wmi_exit(void)
>          class_unregister(&wmi_bus_class);
>  }
>
> -subsys_initcall(acpi_wmi_init);
> +subsys_initcall_sync(acpi_wmi_init);
>  module_exit(acpi_wmi_exit);
* Re: [PATCH] ACPI / WMI: Call acpi_wmi_init() later
  2018-01-05 23:30 ` Rafael J. Wysocki
@ 2018-01-06  1:16 ` Darren Hart
  2018-01-06 11:02 ` Jonathan McDowell
From: Darren Hart @ 2018-01-06 1:16 UTC
To: Rafael J. Wysocki
Cc: Andy Shevchenko, Jonathan McDowell, ACPI Devel Maling List, Linux Kernel Mailing List, Linux Memory Management List, Joonsoo Kim, Platform Driver, Andy Lutomirski

On Sat, Jan 06, 2018 at 12:30:23AM +0100, Rafael J. Wysocki wrote:
> On Wed, Jan 3, 2018 at 12:49 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
> > issues to appear on some systems and they are difficult to reproduce,
> > because there is no guaranteed ordering between subsys_initcall()
> > calls, so they may occur in different orders on different systems.
> >
> > In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache
> > creation delayed issue) exposed one of these issues where genl_init()
> > and acpi_wmi_init() are both called at the same initcall level, but
> > the former must run before the latter so as to avoid a NULL pointer
> > dereference.
> >
> > For this reason, move the acpi_wmi_init() invocation to the
> > initcall_sync level which should still be early enough for things
> > to work correctly in the WMI land.
> >
> > Link: https://marc.info/?t=151274596700002&r=1&w=2
> > Reported-by: Jonathan McDowell <noodles@earth.li>
> > Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Tested-by: Jonathan McDowell <noodles@earth.li>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Guys, this fixes a crash on boot.
>
> If there are no concerns/objections I will just take it through the ACPI tree.

Queued up and running through tests now. I'll have it in for-next as soon as
those complete, assuming no issues.

-- 
Darren Hart
VMware Open Source Technology Center
* Re: [PATCH] ACPI / WMI: Call acpi_wmi_init() later
  2018-01-05 23:30 ` Rafael J. Wysocki
  2018-01-06  1:16 ` Darren Hart
@ 2018-01-06 11:02 ` Jonathan McDowell
  2018-01-06 22:59   ` Darren Hart
From: Jonathan McDowell @ 2018-01-06 11:02 UTC
To: Rafael J. Wysocki
Cc: Andy Shevchenko, Darren Hart, ACPI Devel Maling List, Linux Kernel Mailing List, Linux Memory Management List, Joonsoo Kim, Platform Driver, Andy Lutomirski

On Sat, Jan 06, 2018 at 12:30:23AM +0100, Rafael J. Wysocki wrote:
> On Wed, Jan 3, 2018 at 12:49 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
> > issues to appear on some systems and they are difficult to reproduce,
> > because there is no guaranteed ordering between subsys_initcall()
> > calls, so they may occur in different orders on different systems.
> >
> > In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache
> > creation delayed issue) exposed one of these issues where genl_init()
> > and acpi_wmi_init() are both called at the same initcall level, but
> > the former must run before the latter so as to avoid a NULL pointer
> > dereference.
> >
> > For this reason, move the acpi_wmi_init() invocation to the
> > initcall_sync level which should still be early enough for things
> > to work correctly in the WMI land.
> >
> > Link: https://marc.info/?t=151274596700002&r=1&w=2
> > Reported-by: Jonathan McDowell <noodles@earth.li>
> > Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Tested-by: Jonathan McDowell <noodles@earth.li>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Guys, this fixes a crash on boot.
>
> If there are no concerns/objections I will just take it through the ACPI tree.

Note that I first started seeing it in v4.9 so would ideally hit the
appropriate stable trees too.

> > ---
> >  drivers/platform/x86/wmi.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Index: linux-pm/drivers/platform/x86/wmi.c
> > ===================================================================
> > --- linux-pm.orig/drivers/platform/x86/wmi.c
> > +++ linux-pm/drivers/platform/x86/wmi.c
> > @@ -1458,5 +1458,5 @@ static void __exit acpi_wmi_exit(void)
> >          class_unregister(&wmi_bus_class);
> >  }
> >
> > -subsys_initcall(acpi_wmi_init);
> > +subsys_initcall_sync(acpi_wmi_init);
> >  module_exit(acpi_wmi_exit);

J.

-- 
/-\                             |  101 things you can't have too much
|@/  Debian GNU/Linux Developer |       of : 36 - Spare video tapes.
\-                              |
* Re: [PATCH] ACPI / WMI: Call acpi_wmi_init() later
  2018-01-06 11:02 ` Jonathan McDowell
@ 2018-01-06 22:59 ` Darren Hart
From: Darren Hart @ 2018-01-06 22:59 UTC
To: Jonathan McDowell
Cc: Rafael J. Wysocki, Andy Shevchenko, ACPI Devel Maling List, Linux Kernel Mailing List, Linux Memory Management List, Joonsoo Kim, Platform Driver, Andy Lutomirski

On Sat, Jan 06, 2018 at 11:02:27AM +0000, Jonathan McDowell wrote:
> On Sat, Jan 06, 2018 at 12:30:23AM +0100, Rafael J. Wysocki wrote:
> > On Wed, Jan 3, 2018 at 12:49 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > Calling acpi_wmi_init() at the subsys_initcall() level causes ordering
> > > issues to appear on some systems and they are difficult to reproduce,
> > > because there is no guaranteed ordering between subsys_initcall()
> > > calls, so they may occur in different orders on different systems.
> > >
> > > In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache
> > > creation delayed issue) exposed one of these issues where genl_init()
> > > and acpi_wmi_init() are both called at the same initcall level, but
> > > the former must run before the latter so as to avoid a NULL pointer
> > > dereference.
> > >
> > > For this reason, move the acpi_wmi_init() invocation to the
> > > initcall_sync level which should still be early enough for things
> > > to work correctly in the WMI land.
> > >
> > > Link: https://marc.info/?t=151274596700002&r=1&w=2
> > > Reported-by: Jonathan McDowell <noodles@earth.li>
> > > Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > Tested-by: Jonathan McDowell <noodles@earth.li>
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Guys, this fixes a crash on boot.
> >
> > If there are no concerns/objections I will just take it through the ACPI tree.
>
> Note that I first started seeing it in v4.9 so would ideally hit the
> appropriate stable trees too.

Thanks, I'll take care of that.

-- 
Darren Hart
VMware Open Source Technology Center
end of thread, other threads:[~2018-01-06 22:59 UTC]

Thread overview: 13+ messages
2017-12-08 15:11 ACPI issues on cold power on [bisected] Jonathan McDowell
2017-12-22  0:21 ` Joonsoo Kim
2017-12-29 16:36   ` Jonathan McDowell
2018-01-02  2:54     ` Joonsoo Kim
2018-01-02 10:25       ` Rafael J. Wysocki
2018-01-03  2:11         ` Joonsoo Kim
2018-01-03 10:38           ` Jonathan McDowell
2018-01-03 11:29             ` Rafael J. Wysocki
2018-01-03 11:49 ` [PATCH] ACPI / WMI: Call acpi_wmi_init() later Rafael J. Wysocki
2018-01-05 23:30   ` Rafael J. Wysocki
2018-01-06  1:16     ` Darren Hart
2018-01-06 11:02     ` Jonathan McDowell
2018-01-06 22:59       ` Darren Hart