From: Thomas Gleixner <tglx@linutronix.de>
To: paulmck@kernel.org, Ankur Arora
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, torvalds@linux-foundation.org, linux-mm@kvack.org, x86@kernel.org, akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org, mgorman@suse.de, jon.grimm@amd.com, bharata@amd.com, raghavendra.kt@amd.com, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com, jgross@suse.com, andrew.cooper3@citrix.com, mingo@kernel.org, bristot@kernel.org, mathieu.desnoyers@efficios.com, geert@linux-m68k.org, glaubitz@physik.fu-berlin.de, anton.ivanov@cambridgegreys.com, mattst88@gmail.com, krypton@ulrich-teichert.org, rostedt@goodmis.org, David.Laight@aculab.com, richard@nod.at, mjguzik@gmail.com
Subject: Re: [RFC PATCH 48/86] rcu: handle quiescent states for PREEMPT_RCU=n
In-Reply-To: <2027da00-273d-41cf-b9e7-460776181083@paulmck-laptop>
References: <20231107215742.363031-1-ankur.a.arora@oracle.com> <20231107215742.363031-49-ankur.a.arora@oracle.com> <2027da00-273d-41cf-b9e7-460776181083@paulmck-laptop>
Date: Tue, 28 Nov 2023 18:04:33 +0100
Message-ID: <87v89lzu5a.ffs@tglx>

Paul!

On Mon, Nov 20 2023 at 16:38, Paul E. McKenney wrote:
> But...
>
> Suppose we have a long-running loop in the kernel that regularly
> enables preemption, but only momentarily.  Then the added
> rcu_flavor_sched_clock_irq() check would almost always fail, making
> for extremely long grace periods.  Or did I miss a change that causes
> preempt_enable() to help RCU out?
So first of all this is not any different from today. Even with RCU_PREEMPT=y a tight loop:

    do {
        preempt_disable();
        do_stuff();
        preempt_enable();
    }

will not allow rcu_flavor_sched_clock_irq() to detect a QS reliably. All it can do is to force a reschedule/preemption after some time, which in turn ends up in a QS.

The current NONE/VOLUNTARY models, which imply RCU_PREEMPT=n, cannot do that at all because preempt_enable() is a NOOP and there is no preemption point at return from interrupt to kernel:

    do {
        do_stuff();
    }

So the only thing which makes that "work" is slapping a cond_resched() into the loop:

    do {
        do_stuff();
        cond_resched();
    }

But the whole concept behind LAZY is that the loop will always be:

    do {
        preempt_disable();
        do_stuff();
        preempt_enable();
    }

and the preempt_enable() will always be a functional preemption point.

So let's look at the simple case where more than one task is runnable on a given CPU:

    loop()

      preempt_disable();

      --> tick interrupt
          set LAZY_NEED_RESCHED

      preempt_enable()    -> Does nothing because NEED_RESCHED is not set

      preempt_disable();

      --> tick interrupt
          set NEED_RESCHED

      preempt_enable()
        preempt_schedule()
          schedule()
            report_QS()

which means that on the second tick a quiescent state is reported. Whether a full tick is really granted before that happens is a scheduler decision and an implementation detail, and not really relevant for discussing the concept.

Now the problematic case is when there is only one task runnable on a given CPU, because then the tick interrupt will set neither of the preemption bits. Which is fine from a scheduler perspective, but not so much from an RCU perspective.

But the whole point of LAZY is to be able to enforce rescheduling at the next possible preemption point. So RCU can utilize that too:

    rcu_flavor_sched_clock_irq(bool user)
    {
        if (user || rcu_is_cpu_rrupt_from_idle() ||
            !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
            rcu_qs();
            return;
        }

        if (this_cpu_read(rcu_data.rcu_urgent_qs))
            set_need_resched();
    }

So:

    loop()

      preempt_disable();

      --> tick interrupt
          rcu_flavor_sched_clock_irq()
            sets NEED_RESCHED

      preempt_enable()
        preempt_schedule()
          schedule()
            report_QS()

See? No magic nonsense in preempt_enable(), no cond_resched(), nothing.

The above rcu_flavor_sched_clock_irq() check for rcu_data.rcu_urgent_qs is not really fundamentally different from the check in rcu_all_qs(). The main difference is that it is bound to the tick, so the detection/action might be delayed by a tick. If that turns out to be a problem, then this stuff has far more serious issues underneath.

So now you might argue that for a loop like this:

    do {
        mutex_lock();
        do_stuff();
        mutex_unlock();
    }

the ideal preemption point is post mutex_unlock(), which is where someone would mindfully (*cough*) place a cond_resched(), right? So if that turns out to matter in reality and not just by academic inspection, then we are far better off to annotate such code with:

    do {
        preempt_lazy_disable();
        mutex_lock();
        do_stuff();
        mutex_unlock();
        preempt_lazy_enable();
    }

and let preempt_lazy_enable() evaluate the NEED_RESCHED_LAZY bit.
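For illustration only, a rough sketch of what such a preempt_lazy_enable() could end up doing. Nothing of this exists today: need_resched_lazy() is made up here and just mirrors the NEED_RESCHED_LAZY naming above, while need_resched() and preempt_schedule() are the existing helpers:

    static __always_inline void preempt_lazy_enable(void)
    {
        /*
         * Evaluate the lazy bit as well, so a pending NEED_RESCHED_LAZY
         * turns the end of the annotated section into a preemption point.
         */
        if (need_resched_lazy() || need_resched())
            preempt_schedule();
    }

That keeps the annotation cheap for the common case and only drops into the scheduler when the tick already asked for it.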
Then rcu_flavor_sched_clock_irq(bool user) can use a two-stage approach like the scheduler:

    rcu_flavor_sched_clock_irq(bool user)
    {
        if (user || rcu_is_cpu_rrupt_from_idle() ||
            !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
            rcu_qs();
            return;
        }

        if (this_cpu_read(rcu_data.rcu_urgent_qs)) {
            /* First tick: set the lazy bit. Second tick: enforce it. */
            if (!need_resched_lazy())
                set_need_resched_lazy();
            else
                set_need_resched();
        }
    }

But for a start I would just use the trivial

    if (this_cpu_read(rcu_data.rcu_urgent_qs))
        set_need_resched();

approach and see where this gets us.

With the approach I suggested to Ankur, i.e. having PREEMPT_AUTO (or LAZY) as a config option, we can work on the details of the AUTO and RCU_PREEMPT=n flavour up to the point where we are happy to get rid of the whole zoo of config options altogether.

Just insisting that RCU_PREEMPT=n requires cond_resched() and whatsoever is not really getting us anywhere.

Thanks,

        tglx