linux-mm.kvack.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: paulmck@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	willy@infradead.org, mgorman@suse.de, rostedt@goodmis.org,
	jon.grimm@amd.com, bharata@amd.com, raghavendra.kt@amd.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
	jgross@suse.com, andrew.cooper3@citrix.com,
	Frederic Weisbecker <fweisbec@gmail.com>
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
Date: Wed, 18 Oct 2023 15:16:12 +0200
Message-ID: <87pm1c3wbn.ffs@tglx>
In-Reply-To: <a375674b-de27-4965-a4bf-e0679229e28e@paulmck-laptop>

Paul!

On Tue, Oct 17 2023 at 18:03, Paul E. McKenney wrote:
> Belatedly calling out some RCU issues.  Nothing fatal, just a
> (surprisingly) few adjustments that will need to be made.  The key thing
> to note is that from RCU's viewpoint, with this change, all kernels
> are preemptible, though rcu_read_lock() readers remain
> non-preemptible.

Why? Either I'm confused or you or both of us :)

With this approach the kernel is by definition fully preemptible, which
means rcu_read_lock() is preemptible too. That's pretty much the
same situation as with PREEMPT_DYNAMIC.

For throughput's sake this fully preemptible kernel provides a mechanism
to delay preemption for SCHED_OTHER tasks, i.e. instead of setting
NEED_RESCHED the scheduler sets NEED_RESCHED_LAZY.

That means the preemption points in preempt_enable() and return from
interrupt to kernel will not see NEED_RESCHED and the tasks can run to
completion either to the point where they call schedule() or when they
return to user space. That's pretty much what PREEMPT_NONE does today.

The difference to NONE/VOLUNTARY is that the explicit cond_resched()
points are no longer required because the scheduler can preempt the
long running task by setting NEED_RESCHED instead.

That preemption might be suboptimal in some cases compared to
cond_resched(), but from my initial experimentation that's not really an
issue.
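
To make the above concrete, here is a minimal user-space sketch of the
two-bit scheme, not kernel code. All struct and function names are
hypothetical, and keying the lazy decision on the running task's policy
is a simplifying assumption. It only shows which preemption points look
at which bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; the bit names follow the mail, the rest is made up. */
#define NEED_RESCHED		(1u << 0)
#define NEED_RESCHED_LAZY	(1u << 1)

enum model_policy { POLICY_OTHER, POLICY_RT };

struct model_task {
	enum model_policy policy;
	unsigned int flags;
	int preempt_count;
};

/*
 * The scheduler wants the running task off the CPU. Assumption: a
 * SCHED_OTHER task only gets the lazy bit, anything else gets the
 * immediate bit.
 */
static void model_request_resched(struct model_task *curr)
{
	assert(curr);
	if (curr->policy == POLICY_OTHER)
		curr->flags |= NEED_RESCHED_LAZY;
	else
		curr->flags |= NEED_RESCHED;
}

/* preempt_enable() and return from interrupt to kernel: NEED_RESCHED only. */
static bool model_kernel_preempt_point(const struct model_task *t)
{
	return t->preempt_count == 0 && (t->flags & NEED_RESCHED) != 0;
}

/* Return to user space: either bit causes a reschedule. */
static bool model_exit_to_user(const struct model_task *t)
{
	return (t->flags & (NEED_RESCHED | NEED_RESCHED_LAZY)) != 0;
}
```

In this model a SCHED_OTHER task with only the lazy bit set runs through
every in-kernel preemption point and is rescheduled on the way back to
user space, which is the PREEMPT_NONE-like behaviour described above.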

> With that:
>
> 1.	As an optimization, given that preempt_count() would always give
> 	good information, the scheduling-clock interrupt could sense RCU
> 	readers for new-age CONFIG_PREEMPT_NONE=y kernels.  As might the
> 	IPI handlers for expedited grace periods.  A nice optimization.
> 	Except that...
>
> 2.	The quiescent-state-forcing code currently relies on the presence
> 	of cond_resched() in CONFIG_PREEMPT_RCU=n kernels.  One fix
> 	would be to do resched_cpu() more quickly, but some workloads
> 	might not love the additional IPIs.  Another approach is to do #1
> 	above to replace the quiescent states from cond_resched() with
> 	scheduler-tick-interrupt-sensed quiescent states.

Right. The tick can see either the lazy resched bit "ignored" or some
magic "RCU needs a quiescent state" and force a reschedule.
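
A rough user-space sketch of that tick-side escalation, under stated
assumptions: the per-CPU struct, the rcu_needs_qs flag and the
LAZY_TICK_LIMIT threshold are all hypothetical and only illustrate the
idea of upgrading an ignored lazy bit to NEED_RESCHED.

```c
#include <assert.h>
#include <stdbool.h>

#define NEED_RESCHED		(1u << 0)
#define NEED_RESCHED_LAZY	(1u << 1)

/* Hypothetical: how many ticks a pending lazy bit may be ignored. */
#define LAZY_TICK_LIMIT		1u

struct model_cpu {
	unsigned int flags;
	unsigned int lazy_ticks;	/* ticks seen with the lazy bit pending */
	bool rcu_needs_qs;		/* stand-in for "RCU needs a quiescent state" */
};

/*
 * Called once per scheduling-clock tick. Returns true when the tick
 * decided to force a reschedule by setting NEED_RESCHED.
 */
static bool model_tick(struct model_cpu *cpu)
{
	bool force;

	assert(cpu);
	force = cpu->rcu_needs_qs;

	if (cpu->flags & NEED_RESCHED_LAZY) {
		if (++cpu->lazy_ticks > LAZY_TICK_LIMIT)
			force = true;
	} else {
		cpu->lazy_ticks = 0;
	}

	if (force)
		cpu->flags |= NEED_RESCHED;
	return force;
}

/* A context switch clears both bits and the bookkeeping. */
static void model_schedule(struct model_cpu *cpu)
{
	cpu->flags &= ~(NEED_RESCHED | NEED_RESCHED_LAZY);
	cpu->lazy_ticks = 0;
	cpu->rcu_needs_qs = false;
}
```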

> 	Plus...
>
> 3.	For nohz_full CPUs that run for a long time in the kernel,
> 	there are no scheduling-clock interrupts.  RCU reaches for
> 	the resched_cpu() hammer a few jiffies into the grace period.
> 	And it sets the ->rcu_urgent_qs flag so that the holdout CPU's
> 	interrupt-entry code will re-enable its scheduling-clock interrupt
> 	upon receiving the resched_cpu() IPI.

You can spare the IPI by setting NEED_RESCHED on the remote CPU which
will cause it to preempt.

> 	So nohz_full CPUs should be OK as far as RCU is concerned.
> 	Other subsystems might have other opinions.
>
> 4.	As another optimization, kvfree_rcu() could unconditionally
> 	check preempt_count() to sense a clean environment suitable for
> 	memory allocation.

Correct. All the limitations of preempt count being useless are gone.
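
As an illustration only, a user-space model of what an always-valid
preempt count buys you. The model_* names are hypothetical; the
condition mirrors the kernel's preemptible() check (preempt count zero
and interrupts enabled), and treating that as "clean enough to
allocate" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU context; not a kernel structure. */
struct model_ctx {
	int preempt_count;
	bool irqs_disabled;
};

static void model_preempt_disable(struct model_ctx *c)
{
	c->preempt_count++;
}

static void model_preempt_enable(struct model_ctx *c)
{
	/* Unbalanced enable would be a bug in the caller. */
	assert(c->preempt_count > 0);
	c->preempt_count--;
}

/* Same shape as preemptible(): count is zero and interrupts are on. */
static bool model_preemptible(const struct model_ctx *c)
{
	return c->preempt_count == 0 && !c->irqs_disabled;
}

/*
 * Assumption for this sketch: a caller such as kvfree_rcu() may take
 * the allocating path only from a preemptible context.
 */
static bool model_can_allocate(const struct model_ctx *c)
{
	return model_preemptible(c);
}
```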

> 5.	Kconfig files with "select TASKS_RCU if PREEMPTION" must
> 	instead say "select TASKS_RCU".  This means that the #else
> 	in include/linux/rcupdate.h that defines TASKS_RCU in terms of
> 	vanilla RCU must go.  There might be some fallout if something
> 	fails to select TASKS_RCU, builds only with CONFIG_PREEMPT_NONE=y,
> 	and expects call_rcu_tasks(), synchronize_rcu_tasks(), or
> 	rcu_tasks_classic_qs() to do something useful.

In the end there is no CONFIG_PREEMPT_XXX anymore. The only knob
remaining would be CONFIG_PREEMPT_RT, which should be renamed to
CONFIG_RT or such as it does not really change the preemption
model itself. RT just reduces the preemption disabled sections with the
lock conversions, forced interrupt threading and some more.

> 6.	You might think that RCU Tasks (as opposed to RCU Tasks Trace
> 	or RCU Tasks Rude) would need those pesky cond_resched() calls
> 	to stick around.  The reason is that RCU Tasks readers are ended
> 	only by voluntary context switches.  This means that although a
> 	preemptible infinite loop in the kernel won't inconvenience a
> 	real-time task (nor a non-real-time task for all that long),
> 	and won't delay grace periods for the other flavors of RCU,
> 	it would indefinitely delay an RCU Tasks grace period.
>
> 	However, RCU Tasks grace periods seem to be finite in preemptible
> 	kernels today, so they should remain finite in limited-preemptible
> 	kernels tomorrow.  Famous last words...

That's an issue which you have today with preempt FULL, right? So if it
turns out to be a problem then it's not a problem of the new model.

> 7.	RCU Tasks Trace, RCU Tasks Rude, and SRCU shouldn't notice
> 	any algorithmic difference from this change.
>
> 8.	As has been noted elsewhere, in this new limited-preemption
> 	mode of operation, rcu_read_lock() readers remain preemptible.
> 	This means that most of the CONFIG_PREEMPT_RCU #ifdefs remain.

Why? You fundamentally have a preemptible kernel with PREEMPT_RCU, no?

> 9.	The rcu_preempt_depth() macro could do something useful in
> 	limited-preemption kernels.  Its current lack of ability in
> 	CONFIG_PREEMPT_NONE=y kernels has caused trouble in the past.

Correct.

> 10.	The cond_resched_rcu() function must remain because we still
> 	have non-preemptible rcu_read_lock() readers.

Where?

> 11.	My guess is that the IPVS_EST_TICK_CHAINS heuristic remains
> 	unchanged, but I must defer to the include/net/ip_vs.h people.

*blink*

> 12.	I need to check with the BPF folks on the BPF verifier's
> 	definition of BTF_ID(func, rcu_read_unlock_strict).
>
> 13.	The kernel/locking/rtmutex.c file's rtmutex_spin_on_owner()
> 	function might have some redundancy across the board instead
> 	of just on CONFIG_PREEMPT_RCU=y.  Or might not.
>
> 14.	The kernel/trace/trace_osnoise.c file's run_osnoise() function
> 	might need to do something for non-preemptible RCU to make
> 	up for the lack of cond_resched() calls.  Maybe just drop the
> 	"IS_ENABLED()" and execute the body of the current "if" statement
> 	unconditionally.

Again. There is no non-preemptible RCU with this model, unless I'm
missing something important here.

> 15.	I must defer to others on the mm/pgtable-generic.c file's
> 	#ifdef that depends on CONFIG_PREEMPT_RCU.

All those ifdefs should die :)

> While in the area, I noted that KLP seems to depend on cond_resched(),
> but on this I must defer to the KLP people.

Yeah, KLP needs some thoughts, but that's not rocket science to fix IMO.

> I am sure that I am missing something, but I have not yet seen any
> show-stoppers.  Just some needed adjustments.

Right. If it works out as I think it can work out, the main adjustments
are to remove a large amount of #ifdef maze and related gunk :)

Thanks,

        tglx


