linux-mm.kvack.org archive mirror
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Gabriele Monaco <gmonaco@redhat.com>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	linux-mm@kvack.org
Cc: Ingo Molnar <mingo@kernel.org>, Shuah Khan <shuah@kernel.org>
Subject: Re: [PATCH v8 1/2] sched: Move task_mm_cid_work to mm work_struct
Date: Thu, 20 Feb 2025 09:42:46 -0500
Message-ID: <c9026605-da1b-4631-b0dd-68ae0700ec87@efficios.com>
In-Reply-To: <20250220102639.141314-2-gmonaco@redhat.com>

On 2025-02-20 05:26, Gabriele Monaco wrote:
> Currently, the task_mm_cid_work function is called in a task work
> triggered by a scheduler tick to frequently compact the mm_cids of each
> process. This can delay the execution of the corresponding thread for
> the entire duration of the function, negatively affecting the response
> in case of real time tasks. In practice, we observe task_mm_cid_work
> increasing the latency of 30-35us on a 128 cores system, this order of
> magnitude is meaningful under PREEMPT_RT.
> 
> Run the task_mm_cid_work in a new work_struct connected to the
> mm_struct rather than in the task context before returning to
> userspace.
> 
> This work_struct is initialised with the mm and disabled before freeing
> it. The queuing of the work happens while returning to userspace in
> __rseq_handle_notify_resume, maintaining the checks to avoid running
> more frequently than MM_CID_SCAN_DELAY.
> To make sure this happens predictably also on long running tasks, we
> trigger a call to __rseq_handle_notify_resume also from the scheduler
> tick (which in turn will also schedule the work item).
> 
> The main advantage of this change is that the function can be offloaded
> to a different CPU and even preempted by RT tasks.
> 
> Moreover, this new behaviour is more predictable with periodic tasks
> with short runtime, which may rarely run during a scheduler tick.
> Now, the work is always scheduled when the task returns to userspace.
> 
> The work is disabled during mmdrop, since the function cannot sleep in
> all kernel configurations, we cannot wait for possibly running work
> items to terminate. We make sure the mm is valid in case the task is
> terminating by reserving it with mmgrab/mmdrop, returning prematurely if
> we are really the last user while the work gets to run.
> This situation is unlikely since we don't schedule the work for exiting
> tasks, but we cannot rule it out.
> 
> Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
[...]
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9aecd914ac691..363e51dd25175 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5663,7 +5663,7 @@ void sched_tick(void)
>   		resched_latency = cpu_resched_latency(rq);
>   	calc_global_load_tick(rq);
>   	sched_core_tick(rq);
> -	task_tick_mm_cid(rq, donor);
> +	rseq_preempt(donor);
>   	scx_tick(rq);
>   
>   	rq_unlock(rq, &rf);

There is one small but important detail worth discussing here: I wonder
whether executing __rseq_handle_notify_resume() on return to userspace
after every scheduler tick will cause a noticeable performance
degradation?

I think we can mitigate the impact if we can quickly compute the amount
of contiguous unpreempted runtime since the last preemption. We could
then issue rseq_preempt() from the tick only when the task has run
unpreempted for at least some minimum amount of time. Otherwise, the
rseq_preempt() already issued at preemption is enough.

I'm not entirely sure how to compute this "contiguous unpreempted
runtime" value within sched_tick(), though. Any ideas?
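
To make the question concrete, here is a rough userspace model of one
possible approach, not a tested kernel change. It assumes we keep a
snapshot of the task's accumulated runtime at each switch-in and compare
it against the current total at tick time, similar in spirit to the
sum_exec_runtime / prev_sum_exec_runtime pair in struct sched_entity.
The struct, the model_*() helpers and the threshold value are all
hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical threshold: minimum contiguous runtime before the tick
 * forces an rseq notification. The value is illustrative only.
 */
#define UNPREEMPTED_THRESHOLD_NS (100ULL * 1000 * 1000)

/* Userspace model of the per-task accounting; not a kernel structure. */
struct task_model {
	uint64_t sum_exec_runtime;      /* total runtime accumulated so far */
	uint64_t prev_sum_exec_runtime; /* snapshot taken at last switch-in */
	unsigned int notify_count;      /* stands in for rseq_preempt() calls */
};

/* Task is scheduled in: the preemption path already notified rseq. */
static void model_switch_in(struct task_model *t)
{
	t->prev_sum_exec_runtime = t->sum_exec_runtime;
}

/* Account delta_ns of execution time to the task. */
static void model_account(struct task_model *t, uint64_t delta_ns)
{
	t->sum_exec_runtime += delta_ns;
}

/* Tick hook: notify only after enough contiguous unpreempted runtime. */
static bool model_tick(struct task_model *t)
{
	uint64_t rtime;

	assert(t->sum_exec_runtime >= t->prev_sum_exec_runtime);
	rtime = t->sum_exec_runtime - t->prev_sum_exec_runtime;
	if (rtime <= UNPREEMPTED_THRESHOLD_NS)
		return false;

	t->notify_count++; /* the kernel would call rseq_preempt() here */
	return true;
}
```

In this model, a task that is preempted often never reaches the
threshold, so the tick adds no extra notifications for it. A task that
keeps running past the threshold is notified on every following tick
until it is switched out, so any rate limiting would still have to come
from the existing MM_CID_SCAN_DELAY check.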

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

