References: <20250414222210.3995795-1-joannelkoong@gmail.com>
From: Joanne Koong
Date: Mon, 14 Apr 2025 16:36:58 -0700
Subject: Re: [PATCH v8 0/2] fuse: remove temp page copies in writeback
To: Shakeel Butt
Cc: miklos@szeredi.hu, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jefflexu@linux.alibaba.com, david@redhat.com, bernd.schubert@fastmail.fm, ziy@nvidia.com, jlayton@kernel.org, kernel-team@meta.com

On Mon, Apr 14, 2025 at 3:47 PM Shakeel Butt wrote:
>
> On Mon, Apr 14, 2025 at 03:22:08PM -0700, Joanne Koong wrote:
> > The purpose of this patchset is to help make writeback in FUSE filesystems as
> > fast as possible.
> >
> > In the current FUSE writeback design (see commit 3be5a52b30aa
> > ("fuse: support writable mmap")), a temp page is allocated for every dirty
> > page to be written back, the contents of the dirty page are copied over to the
> > temp page, and the temp page gets handed to the server to write back. This is
> > done so that writeback may be immediately cleared on the dirty page, and this
> > in turn is done in order to mitigate the following deadlock scenario that may
> > arise if reclaim waits on writeback on the dirty page to complete (more details
> > can be found in this thread [1]):
> > * single-threaded FUSE server is in the middle of handling a request
> >   that needs a memory allocation
> > * memory allocation triggers direct reclaim
> > * direct reclaim waits on a folio under writeback
> > * the FUSE server can't write back the folio since it's stuck in
> >   direct reclaim
> >
> > Allocating and copying dirty pages to temp pages is the biggest performance
> > bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> > altogether (which will also allow us to get rid of the internal FUSE rbtree
> > that is needed to keep track of writeback status on the temp pages).
> > Benchmarks show approximately a 20% improvement in throughput for 4k
> > block-size writes and a 45% improvement for 1M block-size writes.
> >
> > In the current reclaim code, there is one scenario where writeback is waited
> > on, which is the case where the system is running legacy cgroup v1 and reclaim
> > encounters a folio that already has the reclaim flag set and the caller did
> > not have __GFP_FS (or __GFP_IO if swap) set.
> >
> > This patchset adds a new mapping flag, AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM,
> > which filesystems may set on their inode mappings to indicate that reclaim
> > should not wait on writeback. FUSE will set this flag on its mappings. Reclaim
> > for the legacy cgroup v1 case described above will skip reclaim of folios with
> > that flag set. With this flag set, now FUSE can remove temp pages altogether.
> >
> > With this change, writeback state is now only cleared on the dirty page after
> > the server has written it back to disk. If the server is deliberately
> > malicious or well-intentioned but buggy, this may stall sync(2) and page
> > migration, but for sync(2), a malicious server may already stall this by not
> > replying to the FUSE_SYNCFS request and for page migration, there are already
> > many easier ways to stall this by having FUSE permanently hold the folio lock.
> > A fuller discussion on this can be found in [2]. Long-term, there needs to be
> > a more comprehensive solution for addressing migration of FUSE pages that
> > handles all scenarios where FUSE may permanently hold the lock, but that is
> > outside the scope of this patchset and will be done as future work. Please
> > also note that this change now ensures that when sync(2) returns, FUSE
> > filesystems will have persisted writeback changes.
> >
> > For this patchset, it would be ideal if the first patch could be taken by
> > Andrew to the mm tree and the second patch could be taken by Miklos into the
> > fuse tree, as the fuse large folios patchset [3] depends on the second patch.
>
> Why not take both patches through FUSE tree? Second patch has dependency
> on first patch, so there is no need to keep them separate.

If that's possible, that sounds great to me too. The patchset went
through Andrew's mm tree last time, so I'm not sure if the protocol is
that any/all mm changes need to go through Andrew's tree.

Thanks,
Joanne