From: Peter Zijlstra <peterz@infradead.org>
To: Peter Oskolkov <posk@posk.io>
Cc: Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-api@vger.kernel.org, Paul Turner <pjt@google.com>,
	Ben Segall <bsegall@google.com>, Peter Oskolkov <posk@google.com>,
	Andrei Vagin <avagin@google.com>, Jann Horn <jannh@google.com>,
	Thierry Delisle <tdelisle@uwaterloo.ca>
Subject: Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
Date: Mon, 29 Nov 2021 17:41:05 +0100	[thread overview]
Message-ID: <YaUCoe07Wl9Stlch@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <CAFTs51XnN+N74i1XHvRUAUWd04-Fs9uV6ouXo=CQSQs8MaEM5A@mail.gmail.com>

On Sun, Nov 28, 2021 at 04:29:11PM -0800, Peter Oskolkov wrote:

> wait_wake_only is not needed if you have both next_tid and server_tid,
> as your patch does. In my version of the patch, next_tid is the same as
> server_tid, so the flag is needed to indicate to the kernel that
> next_tid is the wakee, not the server.

Ah, okay.

> re: (idle_)server_tid_ptr: it seems that you assume that blocked
> workers keep their servers, while in my patch they "lose them" once
> they block, and so there should be a global idle server pointer to
> wake the server in my scheme (if there is an idle one). The main
> difference is that in my approach a server has only a single running
> worker assigned to it, while in your approach it can have a number of
> blocked/idle workers to take care of as well.

Correct; I've been thinking in analogy with the way we schedule CPUs.
Each CPU has a ready/run queue along with the current task.
Fundamentally, the RUNNABLE tasks need to go somewhere when all servers
are busy, so at that point the previous server is as good a place as
any.
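
As a purely illustrative sketch of that analogy (struct server /
struct worker and everything below are made-up userspace names, not part
of any proposed ABI): a per-server lock-free list that RUNNABLE workers
park themselves on, much like a CPU's run queue:

#include <stdatomic.h>
#include <stddef.h>

struct worker {
	_Atomic(struct worker *) next;
	int tid;
};

struct server {
	_Atomic(struct worker *) runnable;	/* lock-free LIFO of RUNNABLE workers */
	int tid;
};

/* A worker that became RUNNABLE parks itself on its previous server. */
static void enqueue_runnable(struct server *srv, struct worker *w)
{
	struct worker *head = atomic_load_explicit(&srv->runnable,
						   memory_order_relaxed);
	do {
		atomic_store_explicit(&w->next, head, memory_order_relaxed);
	} while (!atomic_compare_exchange_weak_explicit(&srv->runnable, &head, w,
							memory_order_release,
							memory_order_relaxed));
}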

Now, I sympathise with a blocked task not having a relation; I often
argue the same, since we have wakeup balancing etc. And I've not really
thought about how best to do wakeup-balancing; also see below.

> The main difference between our approaches, as I see it: in my
> approach if a worker is running, its server is sleeping, period. If we
> have N servers, and N running workers, there are no servers to wake
> when a previously blocked worker finishes its blocking op. In your
> approach, it seems that each of the N servers has a bunch of workers
> pointing at it, and a single worker running. If a previously blocked
> worker wakes up, it wakes the server it was assigned to previously,

Right; it does that. It can check the ::state of its current task,
possibly set TF_PREEMPT or just go back to sleep.
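
Roughly, that server-side check could look like the below; a minimal
sketch only, where the state encoding (the UMCG_* constants and the
struct layout) is a placeholder for illustration, not the actual ABI:

#include <stdatomic.h>
#include <stdint.h>

#define UMCG_TASK_RUNNING	1u
#define UMCG_TASK_MASK		0xffu
#define UMCG_TF_PREEMPT		0x100u

struct umcg_task_sketch {
	_Atomic uint32_t state;
};

/* Returns non-zero if the server asked its current worker to be preempted. */
static int maybe_preempt_current(struct umcg_task_sketch *cur)
{
	uint32_t old = atomic_load_explicit(&cur->state, memory_order_acquire);

	/* Only meaningful while the worker is actually RUNNING. */
	while ((old & UMCG_TASK_MASK) == UMCG_TASK_RUNNING) {
		if (atomic_compare_exchange_weak_explicit(&cur->state, &old,
							  old | UMCG_TF_PREEMPT,
							  memory_order_release,
							  memory_order_acquire))
			return 1;
	}
	return 0;	/* not running; the server can just go back to sleep */
}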

> and so now we have more than N physical tasks/threads running: N
> workers and the woken server. This is not ideal: if the process is
> affined to only N CPUs, that means a worker will be preempted to let
> the woken server run, which is somewhat against the goal of letting
> the workers run more or less uninterrupted. This is not a deal breaker,
> but maybe something to keep in mind.

I suppose it's easy enough to make this behaviour configurable though;
simply enqueue and not wake... Hmm, how would this worker know whether
the server is 'busy' or not? The whole 'current' thing is a user-space
construct. I suppose that's what your pointer was for? It puts an actual
idle server in there, if there is one. Let me ponder that a bit.
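
Something like this, as a minimal sketch (the names below are made up
for illustration, not anything in the patch): a single global slot that
an idle server publishes its tid into, and that a waking worker claims
with an exchange; 0 means nobody is idle and the worker just enqueues:

#include <stdatomic.h>

static _Atomic int idle_server_tid;	/* 0 means: no idle server */

/* Idle-server side: advertise yourself before going to sleep. */
static void server_advertise_idle(int my_tid)
{
	atomic_store_explicit(&idle_server_tid, my_tid, memory_order_release);
}

/* Waking-worker side: claim the idle server, if any; 0 means just enqueue. */
static int claim_idle_server(void)
{
	return atomic_exchange_explicit(&idle_server_tid, 0, memory_order_acq_rel);
}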

However, do note this whole scheme fundamentally has some of that: the
window from the moment a syscall unblocks until sys_exit is 'unmanaged'
runtime for all tasks; they can consume however much time the syscall
needs there.

Also, a timeout on sys_umcg_wait() gets you the exact same situation (or
worse: multiple running workers).

> Another big concern I have is that you removed UMCG_TF_LOCKED. I

Ooh yes, I forgot to mention that. I couldn't figure out what it was
supposed to do.

> definitely needed it to guard workers during "sched work" in the
> userspace in my approach. I'm not sure if the flag is absolutely
> needed with your approach, but most likely it is - the kernel-side
> scheduler does lock tasks and runqueues and disables interrupts and
> migrations and other things so that the scheduling logic is not
> hijacked by concurrent stuff. Why do you assume that the userspace
> scheduling code does not need similar protections?

I've not yet come across a case where this is needed. Migration, for
instance, is possible while RUNNABLE: simply write ::server_tid before
::state. Userspace just needs to keep track of who actually owns the
task, but it can do that outside of this state.
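
As a sketch of that ordering (field names follow the discussion, the
values and layout are placeholders): store the new ::server_tid first,
then publish RUNNABLE with a release store, so a server that observes
RUNNABLE with an acquire load also sees the new ::server_tid:

#include <stdatomic.h>
#include <stdint.h>

#define UMCG_TASK_RUNNABLE	2u

struct umcg_task_sketch {
	_Atomic uint32_t state;
	_Atomic uint32_t server_tid;
};

static void migrate_runnable(struct umcg_task_sketch *t, uint32_t new_server)
{
	/* 1) point the task at its new server ... */
	atomic_store_explicit(&t->server_tid, new_server, memory_order_relaxed);
	/*
	 * 2) ... then publish RUNNABLE; the release store pairs with the
	 * server's acquire load of ::state, so whoever sees RUNNABLE cannot
	 * see a stale ::server_tid.
	 */
	atomic_store_explicit(&t->state, UMCG_TASK_RUNNABLE, memory_order_release);
}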

But like I said, I've not yet done the userspace part (and I lost most
of today trying to install a new machine), so perhaps I'll run into it
soon enough.



