From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>,
Andrew Morton <akpm@linux-foundation.org>,
maple-tree@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org,
linux-renesas-soc@vger.kernel.org,
Shanker Donthineni <sdonthineni@nvidia.com>
Subject: Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
Date: Mon, 11 Sep 2023 19:54:52 -0400
Message-ID: <20230911235452.xhtnt7ply7ayr53x@revolver>
In-Reply-To: <495849d6-1dc6-4f38-bce7-23c50df3a99f@paulmck-laptop>
* Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > >
> > > > Apologies on the late response, I was away and have been struggling to
> > > > get a working PPC32 test environment.
> > > >
> > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > Hi Liam,
> > > > >
> > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > incorrect ranges to be returned to a reader during an update. Although
> > > > > > this has not been reported or seen, disable the append write operation
> > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > >
> > > > ...
> > > > > >
...
> > > > > RCU-related configs:
> > > > >
> > > > > $ grep RCU .config
> > > > > # RCU Subsystem
> > > > > CONFIG_TINY_RCU=y
> > > > > # CONFIG_RCU_EXPERT is not set
> > > > > CONFIG_TINY_SRCU=y
> > > > > # end of RCU Subsystem
> > > > > # RCU Debugging
> > > > > # CONFIG_RCU_SCALE_TEST is not set
> > > > > # CONFIG_RCU_TORTURE_TEST is not set
> > > > > # CONFIG_RCU_REF_SCALE_TEST is not set
> > > > > # CONFIG_RCU_TRACE is not set
> > > > > # CONFIG_RCU_EQS_DEBUG is not set
> > > > > # end of RCU Debugging
> > > >
> > > > I used the configuration from debian 8 and ran 'make oldconfig' to build
> > > > my kernel. I have attached the configuration.
...
> > > > It appears to be something to do with struct maple_tree sparse_irqs. If
> > > > you drop the rcu flag from that maple tree, then my configuration boots
> > > > without the warning.
> > > >
> > > > I *think* this is because we will reuse a lot more nodes. And I *think*
> > > > the rcu flag is not needed, since there is a comment about reading the
> > > > tree being protected by the mutex sparse_irq_lock within the
> > > > kernel/irq/irqdesc.c file. Shanker, can you comment on that?
> > > >
> > > > I wonder if there is a limit to the number of RCU free events before
> > > > something is triggered to flush them out which could trigger IRQ
> > > > enabling? Paul, could this be the case?
> > >
> > > Are you asking if call_rcu() will re-enable interrupts in the following
> > > use case?
> > >
> > > local_irq_disable();
> > > call_rcu(&p->rh, my_cb_func);
> > > local_irq_enable();
I am not.
...
> > >
> > > Or am I missing your point?
> >
> > This is very early in the boot sequence when interrupts have not been
> > enabled. What we are seeing is a WARN_ON() that is triggered by
> > interrupts being enabled before they should be enabled.
> >
> > I was wondering if, for example, I called call_rcu() a lot *before*
> > interrupts were enabled, that something could trigger that would either
> > enable interrupts or indicate the task needs rescheduling?
>
> You aren't doing call_rcu() enough to hit OOM, are you? The actual RCU
> callback invocations won't happen until some time after the scheduler
> starts up.
I am not; the only symptom is the detection of IRQs being enabled early.
>
> > Specifically the rescheduling part is suspect. I tracked down the call
> > to a mutex_lock() which calls cond_resched(), so could rcu be
> > 'encouraging' the rcu window by a reschedule request?
>
> During boot before interrupts are enabled, RCU has not yet spawned any of
> its kthreads. Therefore, all of its attempts to do wakeups would notice
> a NULL task_struct pointer and refrain from actually doing the wakeup.
> If it did do the wakeup, you would see a NULL-pointer exception. See
> for example, invoke_rcu_core_kthread(), though that won't happen unless
> you booted with rcutree.use_softirq=0.
>
> Besides, since when did doing a wakeup enable interrupts? That would
> make it hard to do wakeups from hardware interrupt handlers, not?

Taking the mutex lock in kernel/irq/manage.c __setup_irq() calls
cond_resched().

From what Michael said [1] in this thread, since something has already
set TIF_NEED_RESCHED, it will eventually enable interrupts on us.

I've traced this to call_rcu() in kernel/rcu/tiny.c. There,
is_idle_task(current) is true, which means rcu runs:

	/* force scheduling for rcu_qs() */
	resched_cpu(0);

The task is set idle in sched_init() -> init_idle() and never changed,
afaict.
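
To make that chain concrete, here is a rough userspace model of what I
think is happening. It is only a sketch: every model_* name is made up
for illustration, none of it is kernel code, and the step where the
schedule path returns with IRQs enabled is an assumption taken from
Michael's explanation in [1], not something the model proves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: every model_* name is hypothetical, not a kernel symbol. */
struct model_task {
	bool is_idle;       /* stands in for is_idle_task(current) */
	bool need_resched;  /* stands in for TIF_NEED_RESCHED */
};

static bool model_irqs_enabled;

/* Tiny-RCU path as described above: an idle caller requests a reschedule. */
static void model_call_rcu_tiny(struct model_task *cur)
{
	assert(cur != NULL);
	/* Callback queueing is omitted from this model. */
	if (cur->is_idle)
		cur->need_resched = true;  /* stands in for resched_cpu(0) */
}

/* cond_resched(): schedules only when a reschedule is pending. */
static bool model_cond_resched(struct model_task *cur)
{
	if (!cur->need_resched)
		return false;
	cur->need_resched = false;
	/* Assumption from [1]: the schedule path comes back with IRQs on. */
	model_irqs_enabled = true;
	return true;
}

/* mutex_lock() as seen from __setup_irq(): it calls cond_resched(). */
static void model_mutex_lock(struct model_task *cur)
{
	assert(cur != NULL);
	model_cond_resched(cur);
}

/* Returns true if IRQs end up enabled during the modelled early boot. */
static bool model_early_boot(bool tree_uses_rcu)
{
	/* sched_init() -> init_idle() leaves the boot task marked idle. */
	struct model_task boot = { .is_idle = true, .need_resched = false };

	model_irqs_enabled = false;
	if (tree_uses_rcu)
		model_call_rcu_tiny(&boot);  /* maple tree frees a node */
	model_mutex_lock(&boot);
	return model_irqs_enabled;
}
```

In this model, model_early_boot(true) reports IRQs enabled and
model_early_boot(false) does not, which mirrors what I see when the RCU
flag is dropped from the tree.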

Removing the RCU option from the maple tree in kernel/irq/irqdesc.c
fixes the issue, because the maple tree then never runs call_rcu(). I
am not sure about the locking of the tree, so I feel this change may
cause other issues. It also happens before lockdep_init(), so any issue
I introduce may not be detected.

When CONFIG_DEBUG_ATOMIC_SLEEP is configured, it seems that rcu does the
same thing, but the IRQs are not enabled on return. So resched_cpu(0)
is called, but the warning about IRQs being enabled isn't triggered. I
have not found the reason why.

I am not entirely sure what makes ppc32 different from other platforms,
such that the initial task is configured as an idle task and the first
call to call_rcu() (tiny!) causes the observed behaviour.

Non-tiny rcu uses call_rcu() from kernel/rcu/tree.c (as I am sure you
know, but others may not), which in turn calls __call_rcu_common().
That function is far more complex than the tiny version. Maybe it's
part of why we see different behaviour based on platforms? I don't see
an idle check in that version of call_rcu().

Or maybe PPC32 has something set incorrectly to cause this failure in
early boot and I've just found something that needs to be set
differently?
>
> But why not put some WARN_ON_ONCE(!irqs_disabled()) calls in the areas
> of greatest suspicion, starting from the stack trace generated by that
> mutex_lock()? A stray interrupt-enable could be pretty much anywhere.
>
> But where are those call_rcu() invocations? Before rcu_init()?
During init_IRQ(), which is after rcu_init() but before rcu_init_nohz(),
srcu_init(), and softirq_init() in init/main.c start_kernel().
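
For anyone following along, the WARN_ON_ONCE(!irqs_disabled()) approach
can be modelled in userspace roughly as below. This is only a sketch:
the step table, the model_* names, and the choice of which step flips
the IRQ state are hypothetical and chosen for illustration. The real
start_kernel() has many more steps than the three listed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool model_irqs_disabled = true;
static int model_warn_count;

/* Userspace stand-in for WARN_ON_ONCE(): fires at most once per call site. */
#define MODEL_WARN_ON_ONCE(cond, where) do {                          \
		static bool warned_;                                   \
		if ((cond) && !warned_) {                              \
			warned_ = true;                                \
			model_warn_count++;                            \
			fprintf(stderr, "WARN: IRQs enabled after %s\n", \
				(where));                              \
		}                                                      \
	} while (0)

struct model_step {
	const char *name;
	void (*fn)(void);
};

static void model_rcu_init(void) { }
/* Hypothetical culprit: the step during which call_rcu() is reached. */
static void model_init_irq(void) { model_irqs_disabled = false; }
static void model_softirq_init(void) { }

static const struct model_step model_steps[] = {
	{ "rcu_init",     model_rcu_init },
	{ "init_IRQ",     model_init_irq },
	{ "softirq_init", model_softirq_init },
};

/* Runs the steps in order; returns the index of the first step after
 * which IRQs are found enabled, or -1 if they stay disabled. */
static int model_first_offender(void)
{
	size_t n = sizeof(model_steps) / sizeof(model_steps[0]);
	int first = -1;

	assert(n > 0);
	model_irqs_disabled = true;
	for (size_t i = 0; i < n; i++) {
		model_steps[i].fn();
		if (!model_irqs_disabled && first < 0)
			first = (int)i;
		MODEL_WARN_ON_ONCE(!model_irqs_disabled, model_steps[i].name);
	}
	return first;
}
```

Because the check sits after every step and warns only once, the single
warning names the first step that left IRQs enabled, which is the point
of sprinkling such checks along the suspect path.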
> Presumably before init is spawned and the early_init() calls.
>
> And what is the RCU-related Kconfig and boot-parameter setup?
The .config was attached to the email I sent, and it matches what was
quoted above in the "RCU-related configs" section.
[1] https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/