From: Peter Zijlstra <peterz@infradead.org>
To: Peter Oskolkov <posk@posk.io>
Cc: mingo@redhat.com, tglx@linutronix.de, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-api@vger.kernel.org, x86@kernel.org,
pjt@google.com, posk@google.com, avagin@google.com,
jannh@google.com, tdelisle@uwaterloo.ca
Subject: Re: [RFC][PATCH 3/3] sched: User Mode Concurency Groups
Date: Tue, 18 Jan 2022 11:09:11 +0100 [thread overview]
Message-ID: <YeaRx9oDp08ABvyU@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <YeGEM7TP3tekBVEh@hirez.programming.kicks-ass.net>
On Fri, Jan 14, 2022 at 03:09:55PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 21, 2021 at 05:19:00PM +0000, Peter Oskolkov wrote:
> > What does worry me is that in this wakeup the server calls sys_umcg_wait()
> > with another worker in next_tid, so now the server will have two
> > workers running: the current kernel API seems to allow this to happen.
> > In my patchset the invariant that no more than one worker running
> > per server was enforced by the kernel.
>
> So one of the things I've started, but didn't finished, is to forward
> port the Proxy-Execution patches to a current kernel and play with the
> PE+UMCG interaction.
>
> Thinking about that interaction I've ran into that exact problem.
>
> The 'nice' solution is to actually block the worker, but that's also the
> slow solution :/
>
> The other solution seems to be to keep kernel state; track the current
> worker associated with the server. I haven't (so far) done that due to
> my futex trauma.
>
> So yes, the current API can be used to do the wrong thing, but the
> kernel doesn't care and you get to keep the pieces in userspace. And I
> much prefer user pieces over kernel pieces.
So I think I came up with a 3rd option; since the TID range is 30 bits
(per FUTEX_TID_MASK) we have those same two top bits to play with.
So we write into server::next_tid the tid of the worker we want to wake;
and we currently have it cleared such that we can distinguish between
the case where sys_umcg_wait() returned an error before or after waking
it.
However; we can use one of the 2 remaining bits to indicate the worker
is woken, let's say bit 31. This then has server::next_tid always
containing the current tid, even when the server has a 'spurious' wakeup
for other things.
Then all we need to do is modify the state check when the bit is set to
ensure we don't wake the worker again if we race sys_umcg_wait() with a
worker blocking.
A bit like this, I suppose... (incompete patch in that it relies on a
whole pile of local changes that might or might not live).
--- a/include/uapi/linux/umcg.h
+++ b/include/uapi/linux/umcg.h
@@ -94,6 +94,8 @@ struct umcg_task {
*/
__u32 state; /* r/w */
+#define UMCG_TID_RUNNING 0x80000000U
+#define UMCG_TID_MASK 0x3fffffffU
/**
* @next_tid: the TID of the UMCG task that should be context-switched
* into in sys_umcg_wait(). Can be zero, in which case
--- a/kernel/sched/umcg.c
+++ b/kernel/sched/umcg.c
@@ -20,7 +20,7 @@ static struct task_struct *umcg_get_task
if (tid) {
rcu_read_lock();
- tsk = find_task_by_vpid(tid);
+ tsk = find_task_by_vpid(tid & UMCG_TID_MASK);
if (tsk && current->mm == tsk->mm && tsk->umcg_task)
get_task_struct(tsk);
else
@@ -289,27 +291,6 @@ static int umcg_wake_task(struct task_st
return 0;
}
-static int umcg_wake_next(struct task_struct *tsk)
-{
- int ret = umcg_wake_task(tsk->umcg_next, tsk->umcg_next_task);
- if (ret)
- return ret;
-
- /*
- * If userspace sets umcg_task::next_tid, it needs to remove
- * that task from the ready-queue to avoid another server
- * selecting it. However, that also means it needs to put it
- * back in case it went unused.
- *
- * By clearing the field on use, userspace can detect this case
- * and DTRT.
- */
- if (put_user(0u, &tsk->umcg_task->next_tid))
- return -EFAULT;
-
- return 0;
-}
-
static int umcg_wake_server(struct task_struct *tsk)
{
int ret = umcg_wake_task(tsk->umcg_server, tsk->umcg_server_task);
@@ -637,6 +599,48 @@ SYSCALL_DEFINE2(umcg_kick, u32, flags, p
return 0;
}
+static int umcg_wake_next(struct task_struct *tsk, struct umcg_task __user *self)
+{
+ struct umcg_task __user *next_task;
+ struct task_struct *next;
+ u32 next_tid, state;
+ int ret;
+
+ if (get_user(next_tid, &self->next_tid))
+ return -EFAULT;
+
+ next = umcg_get_task(next_tid);
+ if (!next)
+ return -ESRCH;
+
+ next_task = READ_ONCE(next->umcg_task);
+
+ if (next_tid & UMCG_TID_RUNNING) {
+ ret = -EFAULT;
+ if (get_user(state, &next_task->state))
+ goto put_next;
+
+ ret = 0;
+ if ((state & UMCG_TASK_MASK) != UMCG_TASK_RUNNING)
+ ret = -EAGAIN;
+
+ } else {
+ ret = umcg_wake_task(next, next_task);
+ if (ret)
+ goto put_next;
+
+ ret = -EFAULT;
+ if (put_user(next_tid | UMCG_TID_RUNNING, &self->next_tid))
+ goto put_next;
+
+ ret = 0;
+ }
+
+put_next:
+ put_task_struct(next);
+ return ret;
+}
+
/**
* sys_umcg_wait: transfer running context
*
And once we have this, we can add sanity checks that server::next_tid is
what it should be for ever worker moving away from RUNNING state (which
depends on the assumption that all threads are in the same PID
namespace).
Does this make sense?
next prev parent reply other threads:[~2022-01-18 10:09 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-14 20:44 [RFC][PATCH 0/3] sched: User Managed Concurrency Groups Peter Zijlstra
2021-12-14 20:44 ` [RFC][PATCH 1/3] sched/umcg: add WF_CURRENT_CPU and externise ttwu Peter Zijlstra
2021-12-14 20:44 ` [RFC][PATCH 2/3] x86/uaccess: Implement unsafe_try_cmpxchg_user() Peter Zijlstra
2021-12-20 17:30 ` Sean Christopherson
2021-12-21 11:17 ` Peter Zijlstra
2021-12-14 20:44 ` [RFC][PATCH 3/3] sched: User Mode Concurency Groups Peter Zijlstra
2021-12-21 17:19 ` Peter Oskolkov
2022-01-14 14:09 ` Peter Zijlstra
2022-01-14 15:16 ` Peter Zijlstra
2022-01-14 17:15 ` Peter Zijlstra
2022-01-17 11:35 ` Peter Zijlstra
2022-01-17 12:22 ` Peter Zijlstra
2022-01-17 12:12 ` Peter Zijlstra
2022-01-18 10:09 ` Peter Zijlstra [this message]
2022-01-18 18:19 ` Peter Oskolkov
2022-01-19 8:47 ` Peter Zijlstra
2022-01-19 17:33 ` Peter Oskolkov
2022-01-19 8:51 ` Peter Zijlstra
2022-01-19 8:59 ` Peter Zijlstra
2022-01-19 17:52 ` Peter Oskolkov
2022-01-20 10:37 ` Peter Zijlstra
2022-01-17 13:04 ` Peter Zijlstra
2021-12-24 11:27 ` Peter Zijlstra
2021-12-14 21:00 ` [RFC][PATCH 0/3] sched: User Managed Concurrency Groups Peter Zijlstra
2021-12-15 3:46 ` Peter Oskolkov
2021-12-15 10:06 ` Peter Zijlstra
2021-12-15 13:03 ` Peter Zijlstra
2021-12-15 17:56 ` Peter Oskolkov
2021-12-15 18:18 ` Peter Zijlstra
2021-12-15 19:49 ` Peter Oskolkov
2021-12-15 22:25 ` Peter Zijlstra
2021-12-15 23:26 ` Peter Oskolkov
2021-12-16 13:23 ` Thomas Gleixner
2021-12-15 18:25 ` Peter Zijlstra
2021-12-15 21:04 ` Peter Oskolkov
2021-12-15 23:16 ` Peter Zijlstra
2021-12-15 23:31 ` Peter Oskolkov
2021-12-15 10:44 ` Peter Zijlstra
2021-12-15 13:49 ` Matthew Wilcox
2021-12-15 17:54 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YeaRx9oDp08ABvyU@hirez.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=avagin@google.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=jannh@google.com \
--cc=juri.lelli@redhat.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=pjt@google.com \
--cc=posk@google.com \
--cc=posk@posk.io \
--cc=rostedt@goodmis.org \
--cc=tdelisle@uwaterloo.ca \
--cc=tglx@linutronix.de \
--cc=vincent.guittot@linaro.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox