From: Michal Hocko <mhocko@suse.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: "Petr Mladek" <pmladek@suse.com>,
"Patrick Daly" <quic_pdaly@quicinc.com>,
"Mel Gorman" <mgorman@techsingularity.net>,
"David Hildenbrand" <david@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Sergey Senozhatsky" <senozhatsky@chromium.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
"John Ogness" <john.ogness@linutronix.de>,
syzkaller-bugs@googlegroups.com,
"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>,
linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH v2] mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
Date: Tue, 4 Apr 2023 17:20:35 +0200
Message-ID: <ZCxAQ++B3Io/dk6E@dhcp22.suse.cz>
In-Reply-To: <8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp>
On Tue 04-04-23 23:31:58, Tetsuo Handa wrote:
> syzbot is reporting circular locking dependency which involves
> zonelist_update_seq seqlock [1], for this lock is checked by memory
> allocation requests which do not need to be retried.
>
> One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
>
>   CPU0
>   ----
>   __build_all_zonelists() {
>     write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
>     // e.g. timer interrupt handler runs at this moment
>       some_timer_func() {
>         kmalloc(GFP_ATOMIC) {
>           __alloc_pages_slowpath() {
>             read_seqbegin(&zonelist_update_seq) {
>               // spins forever because zonelist_update_seq.seqcount is odd
>             }
>           }
>         }
>       }
>     // e.g. timer interrupt handler finishes
>     write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
>   }
>
> This deadlock scenario can be easily eliminated by not calling
> read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
> requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
> requests. But Michal Hocko does not know whether we should go with this
> approach.
It would have been more useful to explain why that is not preferred or
desirable.
> Another deadlock scenario which syzbot is reporting is a race between
> kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer()
> with port->lock held and printk() from __build_all_zonelists() with
> zonelist_update_seq held.
>
> CPU0 CPU1
> ---- ----
> pty_write() {
> tty_insert_flip_string_and_push_buffer() {
> __build_all_zonelists() {
> write_seqlock(&zonelist_update_seq);
> build_zonelists() {
> printk() {
> vprintk() {
> vprintk_default() {
> vprintk_emit() {
> console_unlock() {
> console_flush_all() {
> console_emit_next_record() {
> con->write() = serial8250_console_write() {
> spin_lock_irqsave(&port->lock, flags);
> tty_insert_flip_string() {
> tty_insert_flip_string_fixed_flag() {
> __tty_buffer_request_room() {
> tty_buffer_alloc() {
> kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
> __alloc_pages_slowpath() {
> zonelist_iter_begin() {
> read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
> spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
> }
> }
> }
> }
> }
> }
> }
> }
> spin_unlock_irqrestore(&port->lock, flags);
> // message is printed to console
> spin_unlock_irqrestore(&port->lock, flags);
> }
> }
> }
> }
> }
> }
> }
> }
> }
> write_sequnlock(&zonelist_update_seq);
> }
> }
> }
>
> This deadlock scenario can be eliminated by
>
> preventing interrupt context from calling kmalloc(GFP_ATOMIC)
>
> and
>
> preventing printk() from calling console_flush_all()
>
> while zonelist_update_seq.seqcount is odd.
>
> Since Petr Mladek thinks that __build_all_zonelists() can become a
> candidate for deferring printk() [2], let's address this problem by
>
> disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
>
> and
>
> disabling synchronous printk() in order to avoid console_flush_all()
>
> .
>
> As a side effect of minimizing duration of zonelist_update_seq.seqcount
> being odd by disabling synchronous printk(), latency at
> read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
> __GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from
> lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
> do not record unnecessary locking dependency) from interrupt context is
> still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC) inside
> write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
> section...
I have a really hard time wrapping my head around this changelog. I would
rephrase it as follows.
syzbot has reported the following deadlock scenario [1]:
CPU0                                            CPU1
pty_write() {
  tty_insert_flip_string_and_push_buffer() {
                                                __build_all_zonelists() {
                                                  write_seqlock(&zonelist_update_seq); (A)
                                                  build_zonelists() {
                                                    printk() {
                                                      vprintk() {
                                                        vprintk_default() {
                                                          vprintk_emit() {
                                                            console_unlock() {
                                                              console_flush_all() {
                                                                console_emit_next_record() {
                                                                  con->write() = serial8250_console_write() {
    spin_lock_irqsave(&port->lock, flags); (B)
                                                                    spin_lock_irqsave(&port->lock, flags); <<< spinning on (B)
    tty_insert_flip_string() {
      tty_insert_flip_string_fixed_flag() {
        __tty_buffer_request_room() {
          tty_buffer_alloc() {
            kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
              __alloc_pages_slowpath() {
                zonelist_iter_begin() {
                  read_seqbegin(&zonelist_update_seq); <<< spinning on (A)
This can happen during a memory hotplug operation. It means that code
holding write_seqlock on zonelist_update_seq must not call into the
synchronous printk code path. This can be avoided by using a deferred
printk context.
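
To illustrate the idea of a deferred printk context, here is a minimal
userspace sketch. It is a model only: struct demo_log and every demo_*
helper are hypothetical names made up for this example, and the kernel's
real printk_deferred_enter()/printk_deferred_exit() work differently
internally (this sketch simply flushes when the deferred section ends).
The point it shows is that nothing reaches the console write path, and so
no console lock is taken, while the deferred section is active.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_LOG_SLOTS 8
#define DEMO_LOG_LEN   64

/* Hypothetical userspace model of a deferred logging context. */
struct demo_log {
	int deferred_depth;	/* > 0 while output must be deferred */
	int pending;		/* messages queued but not yet emitted */
	int emitted;		/* messages handed to the "console" */
	char queue[DEMO_LOG_SLOTS][DEMO_LOG_LEN];
};

/* Stands in for the console driver's write path (the part that takes locks). */
static void demo_emit(struct demo_log *log, const char *msg)
{
	fputs(msg, stderr);
	log->emitted++;
}

/* Emit everything that was queued while output was deferred. */
static void demo_flush(struct demo_log *log)
{
	for (int i = 0; i < log->pending; i++)
		demo_emit(log, log->queue[i]);
	log->pending = 0;
}

static void demo_deferred_enter(struct demo_log *log)
{
	log->deferred_depth++;
}

/* Assumption of this sketch: flush as soon as the outermost section ends. */
static void demo_deferred_exit(struct demo_log *log)
{
	if (--log->deferred_depth == 0)
		demo_flush(log);
}

static void demo_printk(struct demo_log *log, const char *msg)
{
	if (log->deferred_depth > 0) {
		/* Only store the message; a full queue drops it in this sketch. */
		if (log->pending < DEMO_LOG_SLOTS) {
			snprintf(log->queue[log->pending], DEMO_LOG_LEN, "%s", msg);
			log->pending++;
		}
		return;
	}
	demo_emit(log, msg);
}
```

A caller would wrap the critical section with demo_deferred_enter() and
demo_deferred_exit(); any demo_printk() issued in between is queued and
only emitted once the section has ended.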
This is not the only problematic scenario though. Another one would be:
__build_all_zonelists() {
  write_seqlock(&zonelist_update_seq); <<< (A)
  <IRQ>
    kmalloc(GFP_ATOMIC) {
      __alloc_pages_slowpath() {
        read_seqbegin(&zonelist_update_seq) <<< spinning on (A)
Allocations from (soft)IRQ contexts are quite common. This can be
avoided by disabling interrupts on this path so that we won't self-livelock.
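
The self-livelock can be modelled in userspace with a few lines of C11.
This is a sketch under stated assumptions, not the kernel implementation:
struct demo_seqcount and the demo_* functions are hypothetical stand-ins
for the seqlock API, and the reader gives up after a bounded number of
spins so that the stuck case can be observed (the real read_seqbegin()
has no such bound).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a sequence counter. */
struct demo_seqcount {
	atomic_uint sequence;
};

/* Writer side: an odd value means "update in progress". */
static void demo_write_begin(struct demo_seqcount *s)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_acq_rel);
}

/* Writer side: back to an even value, readers may proceed. */
static void demo_write_end(struct demo_seqcount *s)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
}

/*
 * Reader side: wait until the value is even. Unlike the real thing, this
 * sketch returns false after max_spins attempts instead of spinning forever.
 */
static bool demo_read_begin(struct demo_seqcount *s, unsigned int max_spins,
			    unsigned int *seq)
{
	for (unsigned int i = 0; i < max_spins; i++) {
		unsigned int v = atomic_load_explicit(&s->sequence,
						      memory_order_acquire);
		if (!(v & 1)) {
			*seq = v;
			return true;
		}
	}
	return false;
}

/*
 * Models an allocation from an interrupt handler that runs on the writer's
 * CPU. The writer cannot resume until the handler returns, so nothing can
 * make the value even while the handler is waiting for it.
 */
static bool demo_irq_alloc(struct demo_seqcount *s)
{
	unsigned int seq;

	return demo_read_begin(s, 1000, &seq);
}
```

Calling demo_irq_alloc() between demo_write_begin() and demo_write_end()
on the same thread returns false however large the spin budget is, which
is the situation that disabling interrupts around the write section rules
out.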
> Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>
> Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
> Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
> Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Petr Mladek <pmladek@suse.com>
Anyway, the patch is correct.
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> Changes in v2:
> Update patch description and comment.
>
> mm/page_alloc.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7136c36c5d01..e8b4f294d763 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6632,7 +6632,21 @@ static void __build_all_zonelists(void *data)
>  	int nid;
>  	int __maybe_unused cpu;
>  	pg_data_t *self = data;
> +	unsigned long flags;
>
> +	/*
> +	 * Explicitly disable this CPU's interrupts before taking seqlock
> +	 * to prevent any IRQ handler from calling into the page allocator
> +	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
> +	 */
> +	local_irq_save(flags);
> +	/*
> +	 * Explicitly disable this CPU's synchronous printk() before taking
> +	 * seqlock to prevent any printk() from trying to hold port->lock, for
> +	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
> +	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
> +	 */
> +	printk_deferred_enter();
>  	write_seqlock(&zonelist_update_seq);
>
>  #ifdef CONFIG_NUMA
> @@ -6671,6 +6685,8 @@ static void __build_all_zonelists(void *data)
>  	}
>
>  	write_sequnlock(&zonelist_update_seq);
> +	printk_deferred_exit();
> +	local_irq_restore(flags);
>  }
>
> static noinline void __init
> --
> 2.34.1
--
Michal Hocko
SUSE Labs