From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>,
Andrew Morton <akpm@linux-foundation.org>,
Petr Mladek <pmladek@suse.com>, linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
"Luis Claudio R. Goncalves" <lgoncalv@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>, Ingo Molnar <mingo@redhat.com>,
John Ogness <john.ogness@linutronix.de>,
Mel Gorman <mgorman@techsingularity.net>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <longman@redhat.com>, Will Deacon <will@kernel.org>
Subject: Re: [PATCH v2] mm/page_alloc: don't check zonelist_update_seq from atomic allocations
Date: Thu, 10 Aug 2023 09:26:37 +0200
Message-ID: <20230810072637.6Sc3UU3R@linutronix.de>
In-Reply-To: <6cc13636-eda6-6a95-6564-db1c9ae76bb6@I-love.SAKURA.ne.jp>

On 2023-08-09 20:03:00 [+0900], Tetsuo Handa wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7d3460c7a480..5557d9a2ff2c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3638,26 +3638,44 @@ EXPORT_SYMBOL_GPL(fs_reclaim_release);
…
> -static DEFINE_SEQLOCK(zonelist_update_seq);
> +static unsigned int zonelist_update_seq;
>
> static unsigned int zonelist_iter_begin(void)
> {
> if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
> - return read_seqbegin(&zonelist_update_seq);
> + /* See comment above. */
> + return data_race(READ_ONCE(zonelist_update_seq));

This open-codes raw_read_seqcount(), whereas it should have been
raw_seqcount_begin().

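
A minimal userspace sketch (C11, not kernel code) of why the choice of
helper matters. The model_* names are hypothetical stand-ins, and the
acquire load only approximates the kernel's READ_ONCE() plus smp_rmb().
The assumption modelled here is that raw_seqcount_begin() behaves like
raw_read_seqcount() with the low bit cleared, so a snapshot taken while a
writer is active can never pass a plain inequality retry check:

```c
#include <stdatomic.h>

/* Userspace stand-in for a sequence counter (hypothetical name). */
struct model_seqcount {
	atomic_uint sequence;
};

/* Models raw_read_seqcount(): return the counter as-is, odd or even. */
static unsigned int model_raw_read(struct model_seqcount *s)
{
	return atomic_load_explicit(&s->sequence, memory_order_acquire);
}

/*
 * Models raw_seqcount_begin(): clear the low bit, so an odd (writer
 * active) snapshot differs from whatever the counter holds later.
 */
static unsigned int model_raw_begin(struct model_seqcount *s)
{
	return model_raw_read(s) & ~1u;
}

/* Models read_seqcount_retry(): a plain inequality test. */
static int model_retry(struct model_seqcount *s, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->sequence,
				    memory_order_relaxed) != start;
}
```

With the counter at an odd value, model_raw_read() hands back a snapshot
that still equals the counter, so the inequality test alone would not ask
for a retry; model_raw_begin() does not have that problem.
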
> return 0;
> }
>
> -static unsigned int check_retry_zonelist(unsigned int seq)
> +static unsigned int check_retry_zonelist(gfp_t gfp, unsigned int seq)
> {
> - if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
> - return read_seqretry(&zonelist_update_seq, seq);
> + if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE) && (gfp & __GFP_DIRECT_RECLAIM)) {
> + /* See comment above. */
> + unsigned int seq2 = data_race(READ_ONCE(zonelist_update_seq));
>
> - return seq;
> + /*
> + * "seq != seq2" indicates that __build_all_zonelists() has
> + * started or has finished rebuilding zonelists, hence retry.
> + * "seq == seq2 && (seq2 & 1)" indicates that
> + * __build_all_zonelists() is still rebuilding zonelists
> + * with context switching disabled, hence retry.
> + * "seq == seq2 && !(seq2 & 1)" indicates that
> + * __build_all_zonelists() did not rebuild zonelists, hence
> + * no retry.
> + */
> + return unlikely(seq != seq2 || (seq2 & 1));

This open-codes read_seqcount_retry().

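
As a rough check of that claim, the sketch below compares the quoted
condition with the "clear the low bit at begin, then test for inequality"
pattern. It is plain C and models only the arithmetic, not the memory
ordering. All function names are hypothetical, and it assumes the counter
only increases between the two reads (wrap-around is ignored):

```c
#include <stdbool.h>

/* Retry test as in the quoted patch: raw snapshot plus odd check. */
static bool patch_style_retry(unsigned int seq, unsigned int seq2)
{
	return seq != seq2 || (seq2 & 1);
}

/*
 * Retry test in the seqcount style: the snapshot has its low bit
 * cleared (as at begin time), then a plain inequality is used.
 */
static bool seqcount_style_retry(unsigned int seq, unsigned int seq2)
{
	return (seq & ~1u) != seq2;
}

/*
 * Return true if both tests agree for every snapshot in [0, limit)
 * and every later counter value up to `ahead` increments further on.
 */
static bool retry_tests_agree(unsigned int limit, unsigned int ahead)
{
	for (unsigned int seq = 0; seq < limit; seq++)
		for (unsigned int d = 0; d <= ahead; d++)
			if (patch_style_retry(seq, seq + d) !=
			    seqcount_style_retry(seq, seq + d))
				return false;
	return true;
}
```

Under that monotonic-counter assumption the two conditions give the same
answer, which is why the explicit "(seq2 & 1)" test adds nothing over the
existing helpers.
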
> + }
> +
> + return 0;
> }
>
> /* Perform direct synchronous page reclaim */
> @@ -5136,22 +5154,17 @@ static void __build_all_zonelists(void *data)
> int nid;
> int __maybe_unused cpu;
> pg_data_t *self = data;
> + static DEFINE_SPINLOCK(lock);
> unsigned long flags;
>
> - /*
> - * Explicitly disable this CPU's interrupts before taking seqlock
> - * to prevent any IRQ handler from calling into the page allocator
> - * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
> - */
> - local_irq_save(flags);
> - /*
> - * Explicitly disable this CPU's synchronous printk() before taking
> - * seqlock to prevent any printk() from trying to hold port->lock, for
> - * tty_insert_flip_string_and_push_buffer() on other CPU might be
> - * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
> - */
> - printk_deferred_enter();
> - write_seqlock(&zonelist_update_seq);
> +#ifdef CONFIG_PREEMPT_RT
> + migrate_disable()
> +#endif

There is no justification or explanation for why migrate_disable() is
needed here on PREEMPT_RT, and I don't see one.

There are two changes here:
- The replacement of seqlock_t with something open coded.
- A logic change in when a retry is needed (the gfp mask is now considered).

I am not a big fan of open coding things, especially when it is not needed,
and then there is this ifdef, which is not needed either. I am not
commenting on the logic change.

Can we please put an end to this?

> + /* Serialize increments of zonelist_update_seq. */
> + spin_lock_irqsave(&lock, flags);
> + zonelist_update_seq++;
> + /* Tell check_retry_zonelist() that we started rebuilding zonelists. */
> + smp_wmb();
>
> #ifdef CONFIG_NUMA
> memset(node_load, 0, sizeof(node_load));
Sebastian