From: Hillf Danton
To: Bernd Schubert
Cc: Peter Zijlstra, Miklos Szeredi, K Prateek Nayak, Andrei Vagin,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: fuse uring / wake_up on the same core
Date: Fri, 28 Apr 2023 09:44:43 +0800
Message-Id: <20230428014443.2539-1-hdanton@sina.com>
In-Reply-To: <3c0facd0-e3c7-0aa1-8b2e-961120d4f43d@ddn.com>
References: <20230327102845.GB7701@hirez.programming.kicks-ass.net>
	<20230427122417.2452-1-hdanton@sina.com>

On 27 Apr 2023 13:35:31 +0000 Bernd Schubert wrote:
> Btw, a very hackish way to 'solve' the issue is this
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index cd7aa679c3ee..dd32effb5010 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -373,6 +373,26 @@ static void request_wait_answer(struct fuse_req *req)
>  	int err;
>  	int prev_cpu = task_cpu(current);
>
> +	/* When running over uring and core affined userspace threads, we
> +	 * do not want to let migrate away the request submitting process.
> +	 * Issue is that even after waking up on the right core, processes
> +	 * that have submitted requests might get migrated away, because
> +	 * the ring thread is still doing a bit of work or is in the process
> +	 * to go to sleep. Assumption here is that processes are started on
> +	 * the right core (i.e. idle cores) and can then stay on that core
> +	 * when they come and do file system requests.
> +	 * Another alternative way is to set SCHED_IDLE for ring threads,
> +	 * but that would have an issue if there are other processes keeping
> +	 * the cpu busy.
> +	 * SCHED_IDLE or this hack here result in about factor 3.5 for
> +	 * max meta request performance.
> +	 *
> +	 * Ideal would to tell the scheduler that ring threads are not disturbing
> +	 * that migration away from it should very very rarely happen.
> +	 */
> +	if (fc->ring.ready)
> +		migrate_disable();
> +
>  	if (!fc->no_interrupt) {
>  		/* Any signal may interrupt this */
>  		err = wait_event_interruptible(req->waitq,
>

If I understand it correctly, the seesaw workload hint to the scheduler
looks like the diff below. It leaves the scheduler free to pull the two
players apart across CPUs and to migrate either of them.

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -421,6 +421,7 @@ static void __fuse_request_send(struct f
 		/* acquire extra reference, since request is still needed
 		   after fuse_request_end() */
 		__fuse_get_request(req);
+		current->seesaw = 1;
 		queue_request_and_unlock(fiq, req);
 
 		request_wait_answer(req);
@@ -1229,6 +1230,7 @@ static ssize_t fuse_dev_do_read(struct f
 			   fc->max_write))
 		return -EINVAL;
 
+	current->seesaw = 1;
  restart:
 	for (;;) {
 		spin_lock(&fiq->lock);
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -953,6 +953,7 @@ struct task_struct {
 	/* delay due to memory thrashing */
 	unsigned			in_thrashing:1;
 #endif
+	unsigned seesaw:1;
 
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
 
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7424,6 +7424,8 @@ select_task_rq_fair(struct task_struct *
 	if (wake_flags & WF_TTWU) {
 		record_wakee(p);
+		if (p->seesaw && current->seesaw)
+			return cpu;
 
 		if (sched_energy_enabled()) {
 			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
 			if (new_cpu >= 0)
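
For anyone who wants to poke at the decision outside the kernel, here is a
rough userspace model of the fair.c hunk above. It is only a sketch:
struct task_model, select_cpu_model() and the fallback_cpu argument are
made-up names for illustration, and it assumes that cpu in
select_task_rq_fair() is the CPU the waker is running on. Nothing here is
kernel API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the single task_struct bit the diff adds. */
struct task_model {
	bool seesaw;	/* set on the fuse send and read paths in the diff */
};

/*
 * Model of the wakeup placement with the hint applied.
 * waker_cpu:    CPU the waker runs on (assumed to be "cpu" in the diff).
 * fallback_cpu: whatever the regular selection would have picked; the
 *               real code computes this, here it is simply passed in.
 */
static int select_cpu_model(const struct task_model *waker,
			    const struct task_model *wakee,
			    int waker_cpu, int fallback_cpu)
{
	/* Both players marked: place the wakee on the waker's CPU. */
	if (wakee->seesaw && waker->seesaw)
		return waker_cpu;

	/* Otherwise the hint does not apply and selection is unchanged. */
	return fallback_cpu;
}
```

The shortcut is taken only when both ends of the wakeup carry the bit, so
a marked task woken by an unrelated task still goes through the normal
selection. The model also makes no statement about later load balancing,
which stays free to move either task.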