From: Bernd Schubert <bschubert@ddn.com>
To: Hillf Danton <hdanton@sina.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Miklos Szeredi <miklos@szeredi.hu>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Andrei Vagin <avagin@gmail.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: fuse uring / wake_up on the same core
Date: Fri, 28 Apr 2023 21:54:51 +0000 [thread overview]
Message-ID: <d954ca54-2a3f-b111-7ba5-41169de473ce@ddn.com> (raw)
In-Reply-To: <20230428014443.2539-1-hdanton@sina.com>
On 4/28/23 03:44, Hillf Danton wrote:
> On 27 Apr 2023 13:35:31 +0000 Bernd Schubert <bschubert@ddn.com>
>> Btw, a very hackish way to 'solve' the issue is this
>>
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index cd7aa679c3ee..dd32effb5010 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -373,6 +373,26 @@ static void request_wait_answer(struct fuse_req *req)
>> int err;
>> int prev_cpu = task_cpu(current);
>>
>> +	/* When running over uring with core-affined userspace threads, we
>> +	 * do not want the request-submitting process to be migrated away.
>> +	 * The issue is that even after waking up on the right core, processes
>> +	 * that have submitted requests might get migrated away, because
>> +	 * the ring thread is still doing a bit of work or is in the process
>> +	 * of going to sleep. The assumption here is that processes are
>> +	 * started on the right core (i.e. idle cores) and can then stay on
>> +	 * that core when they come in and do file system requests.
>> +	 * An alternative would be to set SCHED_IDLE for ring threads,
>> +	 * but that would be a problem if there are other processes keeping
>> +	 * the cpu busy.
>> +	 * SCHED_IDLE or this hack here result in roughly a factor of 3.5
>> +	 * in max meta request performance.
>> +	 *
>> +	 * Ideally we would tell the scheduler that ring threads are not
>> +	 * disturbing, so that migration away from them should happen very
>> +	 * rarely.
>> +	 */
>> + if (fc->ring.ready)
>> + migrate_disable();
>> +
>> if (!fc->no_interrupt) {
>> /* Any signal may interrupt this */
>> err = wait_event_interruptible(req->waitq,
>>
> If I understand it correctly, the seesaw workload hint to scheduler looks
> like the diff below, leaving scheduler free to pull the two players apart
> across CPU and to migrate anyone.
Thanks a lot, Hillf! I had a day off / family day today; the kernel is
now finally compiling.
>
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -421,6 +421,7 @@ static void __fuse_request_send(struct f
> /* acquire extra reference, since request is still needed
> after fuse_request_end() */
> __fuse_get_request(req);
> + current->seesaw = 1;
> queue_request_and_unlock(fiq, req);
>
> request_wait_answer(req);
> @@ -1229,6 +1230,7 @@ static ssize_t fuse_dev_do_read(struct f
> fc->max_write))
> return -EINVAL;
>
> + current->seesaw = 1;
fuse_dev_do_read is plain /dev/fuse (with read/write), and there we don't
know on which cores these IO threads are running or which of them to wake
up when an application comes in with a request.
There is a patch to use __wake_up_sync to wake the IO thread, and reports
that it helps performance, but I don't see the improvement and I think
Miklos doesn't either. For direct-io reads I had also already tested
disabling migration - it didn't show any effect - so we'd better not set
current->seesaw = 1 in fuse_dev_do_read for now.
With my fuse-uring patches (https://lwn.net/Articles/926773/) things are
clearer: there is one IO thread per core, and the libfuse side binds each
of these threads to a single core.
nproc   /dev/fuse    /dev/fuse    fuse uring   fuse uring
        migrate on   migrate off  migrate on   migrate off
    1        2023         1652         1151         3998
    2        3375         2805         2221         7950
    4        3823         4193         4540        15022
    8        7796         8161         7846        22591
   16        8520         8518        12235        27864
   24        8361         8084         9415        27864
   32        8361         8084         9124        12971

(in MiB/s)
So core affinity really matters, and with core affinity fuse-uring is
always faster than the existing code.
For single-threaded metadata workloads (file creates/stat/unlink) the
difference between migrate on/off is rather similar. I am going to run
with multiple processes during the next days.
For paged (async) IO it behaves a bit differently, as here uring can show
its strength and multiple requests can be combined during CQE processing -
so it is better to choose an idle ring thread on another core. I actually
have a question about that as well - later.
> restart:
> for (;;) {
> spin_lock(&fiq->lock);
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -953,6 +953,7 @@ struct task_struct {
> /* delay due to memory thrashing */
> unsigned in_thrashing:1;
> #endif
> + unsigned seesaw:1;
>
> unsigned long atomic_flags; /* Flags requiring atomic access. */
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7424,6 +7424,8 @@ select_task_rq_fair(struct task_struct *
> if (wake_flags & WF_TTWU) {
> record_wakee(p);
>
> + if (p->seesaw && current->seesaw)
> + return cpu;
> if (sched_energy_enabled()) {
> new_cpu = find_energy_efficient_cpu(p, prev_cpu);
> if (new_cpu >= 0)
Hmm, WF_CURRENT_CPU works rather similarly, except that it also tests
whether cpu is in cpus_ptr? The combination of both patches results in

	if (p->seesaw && current->seesaw)
		return cpu;
	if ((wake_flags & WF_CURRENT_CPU) &&
	    cpumask_test_cpu(cpu, p->cpus_ptr))
		return cpu;
While writing this mail the kernel compilation finished, but it has
gotten late; I will test in the morning.
Thanks again,
Bernd