linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: mingo@redhat.com, tglx@linutronix.de, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-api@vger.kernel.org, x86@kernel.org,
	pjt@google.com, posk@google.com, avagin@google.com,
	jannh@google.com, tdelisle@uwaterloo.ca, posk@posk.io
Subject: Re: [RFC][PATCH v2 5/5] sched: User Mode Concurency Groups
Date: Mon, 24 Jan 2022 10:48:46 +0100	[thread overview]
Message-ID: <20220124094846.GN20638@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <Yerl+ZrZ2qflIMyg@FVFF77S0Q05N>

On Fri, Jan 21, 2022 at 04:57:29PM +0000, Mark Rutland wrote:
> On Thu, Jan 20, 2022 at 04:55:22PM +0100, Peter Zijlstra wrote:
> > User Managed Concurrency Groups is an M:N threading toolkit that allows
> > constructing user space schedulers designed to efficiently manage
> > heterogeneous in-process workloads while maintaining high CPU
> > utilization (95%+).
> > 
> > XXX moar changelog explaining how this is moar awesome than
> > traditional user-space threading.
> 
> Awaiting a commit message that I can parse, I'm just looking at the entry bits
> for now. TBH I have no idea what this is actually trying to do...

Ha! yes.. I knew I was going to have to do that eventually :-)

It's basically a user-space scheduler that is subservient to the kernel
scheduler (hierarchical scheduling were a user task is a server for
other user tasks), where a server thread is in charge of selecting which
of it's worker threads gets to run. The original idea was that each
server only ever runs a single worker, but PeterO is currently trying to
reconsider that.

The *big* feature here, over traditional N:M scheduling, is that threads
can block, while traditional userspace threading is limited to
non-blocking system calls (and per later, page-faults).

In order to make that happen we must ovbiously hook schedule() for
these worker threads and inform userspace (the server thread) when this
happens such that it can select another worker thread to go vroom.

Meanwhile, a worker task getting woken from schedule() must not continue
running; instead it must enter the server's ready-queue and await it's
turn again. Instead of dealing with arbitrary delays deep inside the
kernel block chain, we punt and let the task complete until
return-to-user and block it there. The time between schedule() and
return-to-user is unmanaged time.

Now, since we can't readily poke at userspace memory from schedule(), we
could be holding mmap_sem etc., we pin the worker and server page on
sys-enter such that when we hit schedule() we can update state and then
unpin the pages such that page pin time is from sys-enter to first
schedule(), or sys-exit which ever comes first. This ensures the
page-pin is *short* term.

Additionally we must deal with signals :-(, the currnt approach is to
let them bust boundaries and run them as unmanaged time. UMCG userspace
can obviously control this by using pthread_sigmask() and friends.

Now, the reason for irqentry_irq_enable() is mostly #PF.  When a worker
faults and blocks we want the same things to happen.

Anyway, so workers have 3 layers of hooks:

		sys_enter
				schedule()
		sys_exit

	return-to-user

There's a bunch of paths through this:

 - sys_enter -> sys_exit:

	no blocking; nothing changes:
	  - sys_enter:
	    * pin pages

	  - sys_exit:
	    * unpin pages

 - sys_enter -> schedule() -> sys_exit:

	we did block:
	  - sys_enter:
	    * pin pages

	  - schedule():
	    * mark worker BLOCKED
	    * wake server (it will observe it's current worker !RUNNING
	      and select a new worker or idles)
	    * unpin pages

	  - sys_exit():
	    * mark worker RUNNABLE
	    * enqueue worker on server's runnable_list
	    * wake server (which will observe a new runnable task, add
	      it to whatever and if it was idle goes run, otherwise goes
	      back to sleep to let it's current worker finish)
	    * block until RUNNING

 - sys_enter -> schedule() -> sys_exit -> return_to_user:

	As above; except now we got a signal while !RUNNING. sys_exit()
	terminates and return-to-user takes over running the signal and
	on return from the signal we'll again block until RUNNING, or do
	the whole signal dance again if so required.


Does this clarify things a little?


  reply	other threads:[~2022-01-24  9:49 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-20 15:55 [RFC][PATCH v2 0/5] sched: User Managed Concurrency Groups Peter Zijlstra
2022-01-20 15:55 ` [RFC][PATCH v2 1/5] mm: Avoid unmapping pinned pages Peter Zijlstra
2022-01-20 18:03   ` Nadav Amit
2022-01-21  7:59     ` Peter Zijlstra
2022-01-20 18:25   ` David Hildenbrand
2022-01-21  7:51     ` Peter Zijlstra
2022-01-21  8:22       ` David Hildenbrand
2022-01-21  8:59       ` Peter Zijlstra
2022-01-21  9:04         ` David Hildenbrand
2022-01-21 11:40           ` Peter Zijlstra
2022-01-21 12:04             ` David Hildenbrand
2022-01-20 15:55 ` [RFC][PATCH v2 2/5] entry,x86: Create common IRQ operations for exceptions Peter Zijlstra
2022-01-21 16:34   ` Mark Rutland
2022-01-20 15:55 ` [RFC][PATCH v2 3/5] sched/umcg: add WF_CURRENT_CPU and externise ttwu Peter Zijlstra
2022-01-20 15:55 ` [RFC][PATCH v2 4/5] x86/uaccess: Implement unsafe_try_cmpxchg_user() Peter Zijlstra
2022-01-27  2:17   ` Sean Christopherson
2022-01-27  6:36     ` Sean Christopherson
2022-01-27  9:56       ` Peter Zijlstra
2022-01-27 23:33         ` Sean Christopherson
2022-01-28  0:17           ` Nick Desaulniers
2022-01-28 16:29             ` Sean Christopherson
2022-01-27  9:55     ` Peter Zijlstra
2022-01-20 15:55 ` [RFC][PATCH v2 5/5] sched: User Mode Concurency Groups Peter Zijlstra
2022-01-21 11:47   ` Peter Zijlstra
2022-01-21 15:18     ` Peter Zijlstra
2022-01-24 14:29       ` Peter Zijlstra
2022-01-24 16:44         ` Peter Zijlstra
2022-01-24 17:06           ` Peter Oskolkov
2022-01-25 14:59         ` Peter Zijlstra
2022-01-24 13:59     ` Peter Zijlstra
2022-01-21 12:26   ` Peter Zijlstra
2022-01-21 16:57   ` Mark Rutland
2022-01-24  9:48     ` Peter Zijlstra [this message]
2022-01-24 10:03     ` Peter Zijlstra
2022-01-24 10:07       ` Peter Zijlstra
2022-01-24 10:27         ` Mark Rutland
2022-01-24 14:46   ` Tao Zhou
2022-01-27 12:19     ` Peter Zijlstra
2022-01-27 18:33       ` Tao Zhou
2022-01-27 12:25     ` Peter Zijlstra
2022-01-27 18:47       ` Tao Zhou
2022-01-27 12:26     ` Peter Zijlstra
2022-01-27 18:31   ` Tao Zhou
2022-01-20 17:28 ` [RFC][PATCH v2 0/5] sched: User Managed Concurrency Groups Peter Oskolkov
2022-01-21  8:01   ` Peter Zijlstra
2022-01-21 18:01 ` Steven Rostedt
2022-01-24  8:20   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220124094846.GN20638@worktop.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=avagin@google.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=jannh@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=pjt@google.com \
    --cc=posk@google.com \
    --cc=posk@posk.io \
    --cc=rostedt@goodmis.org \
    --cc=tdelisle@uwaterloo.ca \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox