From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F5EDC369A2 for ; Wed, 9 Apr 2025 23:47:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 27E912800B2; Wed, 9 Apr 2025 19:47:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 255B62800AF; Wed, 9 Apr 2025 19:47:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 11DFF2800B2; Wed, 9 Apr 2025 19:47:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id E96732800AF for ; Wed, 9 Apr 2025 19:47:56 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id D5658AFA37 for ; Wed, 9 Apr 2025 23:47:57 +0000 (UTC) X-FDA: 83316145794.13.17FECB9 Received: from mail-qt1-f178.google.com (mail-qt1-f178.google.com [209.85.160.178]) by imf30.hostedemail.com (Postfix) with ESMTP id 2FA108000B for ; Wed, 9 Apr 2025 23:47:56 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=cn5ti90A; spf=pass (imf30.hostedemail.com: domain of joannelkoong@gmail.com designates 209.85.160.178 as permitted sender) smtp.mailfrom=joannelkoong@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1744242476; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=sH6s526b/H/uV5A/JlFAjOiPD/1m+0j9a+DTV4AyZK8=; b=fEDcY+9C20VjAwF3gL36vpcxV/MCnOImlML09BnCq7hs0tXOL+mepj4PZFgblEYo+BTQN+ 1pgftZ2+BXDJJH8hDaG1C9IqzBrw5IAnu5d2gATVwqCdHt08cJXRL4QnnnxQjh6X9/C4vU ejPcLSjLWjSNonXLMs9Znq+2QjI9HrI= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1744242476; a=rsa-sha256; cv=none; b=HbrfyqP5WPGiQO05v7peJUJGAeeRBn0WYazOUk58/WI8NMYtUT58kdEojs15INmXK4fZQT kxvR5SxzBX0QWVO7kU/VhGA9zSOLprEgNodJaFbeXOQ6m0v88jbzEMCPbOyr7Wp9BDt4vj v5b2KJjhCGbZ1sBWbWKByFMr8SUN7o4= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=cn5ti90A; spf=pass (imf30.hostedemail.com: domain of joannelkoong@gmail.com designates 209.85.160.178 as permitted sender) smtp.mailfrom=joannelkoong@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-qt1-f178.google.com with SMTP id d75a77b69052e-476964b2c1dso3070371cf.3 for ; Wed, 09 Apr 2025 16:47:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1744242475; x=1744847275; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=sH6s526b/H/uV5A/JlFAjOiPD/1m+0j9a+DTV4AyZK8=; b=cn5ti90AbZgxQ7UwOcnLQjCmggRt8LVn6Cw+ZChgY4ahONp+26uNKwZVt57lNi46Dm HCrBnTSM2qRwoHO8u84RM2nlMcbjt5ERe6ibQc/2MhzoRQ3yri0RQJKUNxYC+AuaJgEQ LlvZjAXxekNTUSVd7kW2AmbsDarEcbbIgHWam6+4pgKSLSVKQ/YIy8UkFa92s0VUATlN Dhd7IbzSow2poLWAtj1HJ7EqZocpFedswsVVo6LmWPRcXWKm9S7HxkqCgrzGmCojxYVo tq4WJFjOChbbVKVHa/kXDaLct6E/zag0vSXcFHDtayJMoE9IHfVwqlimjxdHDBspjA86 Oqjw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744242475; x=1744847275; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=sH6s526b/H/uV5A/JlFAjOiPD/1m+0j9a+DTV4AyZK8=; b=ol8hQ1EPQNMI7FbFOxrrMljU1St8c/AA37IYjszvTe3ZeoRBN7xPfyCgOkb5KZp95W JpUXL3aBSOgVbIWFnN4u5XGQXOVQ9hdNL65neBkjKLHEhVyGTYkIzuq/A5KK/+ZSCAyt +QsgKyWCOvxAsjy03lm0elIJoBWOSHnPcGJWp9fzhVxDQoj3p7E+HY4qqdoq9h4PmjQy 762nthxJXVKJfSLhSK+ZsFByyDYwf0dUem2cCNUDJKyM+leuRHUL6onbgiqXnEMGC49W UdDbt+W0qEO6pbLkFLQyO5bLjcxlHxrQUsL2JApB7lN1gR4K1MFIE9aGyeN/TERuKLT1 AeuQ== X-Forwarded-Encrypted: i=1; AJvYcCVXN8KJxUQ09erIzpVCLcYvXZIXKdTPbYql8DaawTza3q8h5uYr+QZJYWsC2ak6maoUZk6E2nmjNQ==@kvack.org X-Gm-Message-State: AOJu0YwXD2FnRkW/WSCyivuRf8ixAQ7b6f9N/LVCMMAVAVPTugHcy0iE c1GVAMMHhDvK6nryRgFLhlpiZE3DPPSUl7INfLWsdKYWMJQlHYXaEMZkDiFL9ahnLTEu9rmifEl HBQyjfHnaGuXzUvkTQ8ZQ0rTqSMM= X-Gm-Gg: ASbGnct80kbpP88m/vUMmLNuJI6aNBnFFH5Cy9BC5w90bYnuF6hwKjXyOSfSBcAa1Ys ox8uzOIs7P1PnO93gWam06+cwfkEh3mN4wNbTwk2wMOP2cU7FwvrHa8WSh/pRNdKw8lS59H1e37 mNiQfauMmnuEeyie0dkBvrpGs4+qUIfrVWXmEyLQ== X-Google-Smtp-Source: AGHT+IERGk9wevWZ+l8YN4KmfhTtuWPlo1z7SWNtw+b0FP/EVbjZvk/hiqxi/Dtw/Rfy6TIJou3oyTAlEtYj4nRO3Rk= X-Received: by 2002:a05:622a:302:b0:476:ad9d:d4f0 with SMTP id d75a77b69052e-4796cd2b6a5mr11020881cf.48.1744242475178; Wed, 09 Apr 2025 16:47:55 -0700 (PDT) MIME-Version: 1.0 References: <20250404181443.1363005-1-joannelkoong@gmail.com> <20250404181443.1363005-4-joannelkoong@gmail.com> In-Reply-To: From: Joanne Koong Date: Wed, 9 Apr 2025 16:47:43 -0700 X-Gm-Features: ATxdqUH1fd4rusWjsAd3F0sRWY5U7a_9O4LHaVYFSacQdPCkdCpyUId4hSVdGBo Message-ID: Subject: Re: [PATCH v7 3/3] fuse: remove tmp folio for writebacks and internal rb tree To: Jingbo Xu Cc: miklos@szeredi.hu, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, shakeel.butt@linux.dev, david@redhat.com, bernd.schubert@fastmail.fm, ziy@nvidia.com, jlayton@kernel.org, kernel-team@meta.com, Miklos Szeredi Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 2FA108000B X-Stat-Signature: 39oisyw3g45spoixx5fbf73btwdjuonk X-HE-Tag: 1744242476-217475 X-HE-Meta: U2FsdGVkX1/OaIPE82PQNF78DE/qboGeuaSKi7yKcWSyeWoiU7w50Pgc3kb8DLR7hvsnTsvdg+UnalCZmpAox7O4iZYSXgYvZoup0OIUTJDHyi9csJdQZJgtv/8FiPW4c2cBQ2p0hVdKQOrnj6cT3KLX+1fN7HlDIaRpeMPSKPTk6JNgpN1JKr+Oxm3KddwjdVj8mZbY6mMLJ/7wB6LRAWr6hHVGJ9knbvm0JsKV7VPyX3PlAsqHNxxUSYgE0875nHF6rQ+Xj2RG/A3szePBIUyEaGOm8P2BeioL4hHACHMLZ83LgQqNO+97Zo6xnl+5AFYwmwbtiIPBguVpNXo3YoxAUdwWZ/742vCTcJjvR6+H0peK8AGuPI/6fKAaBWe7ilpgh8AvlmM3vnlXdL5HRbx+YoqPP/ugkl4TDAB7G5U+qQCL1LXZ6a38kdebmDQjwpaDLUQDMKyL//lWSjVn7qylW5PXaVo4dxAVDLfqfO9tBAfRX6foq0SThR/TZeoFVUesfe2u2GaqlbxbME5TnN+eift0LCh1KkxvnYgk7qBghig/GsWa8GG/2JIJBm0qpIlIsrsn4htcKEG/0ZvLfBPUaebY4/8A0lRAyC/PqATdGjP5/zDHvzTZLsNf90/S+ynqC9tFYMaLx9zR35oUWbMM4T+Mj1sQ+hnIyP8LLssSIBkXxukgr7g4ID7AUrpgPP71M5AjvVtohMn9ps9xNcSD9/AQWOIVsiQd2VqwX+XfJI5G0bzZHFVrJqGbl2oEKG4Z6+ML/IOTl6r0beB9kw1a2+3xifu30UUIyuhVfRpwanKZ3FiaJGFw5yc81uEB0Lk6r90n67QnvvluaVN0gEbpQny7Kek3IMyvExSHkAleo5XB+MtzAee1zlp3V9aMhY5ODSuMY6qf5LAm//IEoiF8v+q6CCdFPrtxvj/M6GvTfenofOINk7PJZu8wMJwlvpEt8PgS040= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Apr 8, 2025 at 7:43=E2=80=AFPM Jingbo Xu wrote: > > Hi Joanne, > > On 4/5/25 2:14 AM, Joanne Koong wrote: > > In the current FUSE writeback design (see commit 3be5a52b30aa > > ("fuse: support writable mmap")), a temp page is allocated for every > > dirty page to be written back, the contents of the dirty page are copie= d over > > to the temp page, and the temp page gets handed to the server to write = back. > > > > This is done so that writeback may be immediately cleared on the dirty = page, > > and this in turn is done in order to mitigate the following deadlock sc= enario > > that may arise if reclaim waits on writeback on the dirty page to compl= ete: > > * single-threaded FUSE server is in the middle of handling a request > > that needs a memory allocation > > * memory allocation triggers direct reclaim > > * direct reclaim waits on a folio under writeback > > * the FUSE server can't write back the folio since it's stuck in > > direct reclaim > > > > With a recent change that added AS_WRITEBACK_INDETERMINATE and mitigate= s > > the situations described above, FUSE writeback does not need to use > > temp pages if it sets AS_WRITEBACK_INDETERMINATE on its inode mappings. > > > > This commit sets AS_WRITEBACK_INDETERMINATE on the inode mappings > > and removes the temporary pages + extra copying and the internal rb > > tree. > > > > fio benchmarks -- > > (using averages observed from 10 runs, throwing away outliers) > > > > Setup: > > sudo mount -t tmpfs -o size=3D30G tmpfs ~/tmp_mount > > ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=3D4= -o source=3D~/tmp_mount ~/fuse_mount > > > > fio --name=3Dwriteback --ioengine=3Dsync --rw=3Dwrite --bs=3D{1k,4k,1M}= --size=3D2G > > --numjobs=3D2 --ramp_time=3D30 --group_reporting=3D1 --directory=3D/roo= t/fuse_mount > > > > bs =3D 1k 4k 1M > > Before 351 MiB/s 1818 MiB/s 1851 MiB/s > > After 341 MiB/s 2246 MiB/s 2685 MiB/s > > % diff -3% 23% 45% > > > > Signed-off-by: Joanne Koong > > Reviewed-by: Jingbo Xu > > Acked-by: Miklos Szeredi > Hi Jingbo, Thanks for sharing your analysis for this. > Overall this patch LGTM. > > Apart from that, IMO the fi->writectr and fi->queued_writes mechanism is > also unneeded then, at least the DIRECT IO routine (i.e. I took a look at fi->writectr and fi->queued_writes and my understanding is that we do still need this. For example, for truncates (I'm looking at fuse_do_setattr()), I think we still need to prevent concurrent writeback or else the setattr request and the writeback request could race which would result in a mismatch between the file's reported size and the actual data written to disk. > fuse_direct_io()) doesn't need fuse_sync_writes() anymore. That is > because after removing the temp page, the DIRECT IO routine has already > been waiting for all inflight WRITE requests, see > > # DIRECT read > generic_file_read_iter > kiocb_write_and_wait > filemap_write_and_wait_range Where do you see generic_file_read_iter() getting called for direct io read= s? For direct io reads, I'm only seeing fuse_direct_IO() __fuse_direct_read() fuse_direct_io() and fuse_file_read_iter() fuse_direct_read_iter() fuse_direct_IO() / __fuse_direct_read() > > # DIRECT write > generic_file_write_iter > generic_file_direct_write > kiocb_invalidate_pages > filemap_invalidate_pages > filemap_write_and_wait_range Similarly, where do you see generic_file_write_iter() getting called for direct io writes? My understanding is that it'd either go through fuse_file_write_iter() -> fuse_direct_write_iter() or through the fuse_direct_IO() callback. > > The DIRECT write routine will also invalidate the page cache in the > range that is written to, so that the following buffer write needs to > read the page cache back first. The writeback following the buffer write > is much likely after the DIRECT write, so that the writeback won't > conflict with the DIRECT write (i.e. there won't be duplicate WRITE > requests for the same page that are initiated from DIRECT write and > writeback at the same time), which is exactly why fi->writectr and > fi->queued_writes are introduced. Where do you see fi->writectr / fi->queued-writes preventing this race? It looks to me like in the existing code, this race condition you described of direct write invalidating the page cache, then another buffer write reads the page cache and dirties it, then writeback is called on that, and the 2 write requests racing, could still happen? > However it seems that the writeback > won't wait for previous inflight DIRECT WRITE requests, so I'm not much > sure about that. Maybe other folks could offer more insights... My understanding is that these lines if (!cuse && filemap_range_has_writeback(...)) { ... fuse_sync_writes(inode); ... } in fuse_direct_io() is what waits on previous inflight direct write requests to complete before the direct io happens. > > Also fuse_sync_writes() is not needed in fuse_flush() anymore, with > which I'm pretty sure. Why don't we still need this for fuse_flush()? If a caller calls close(), this will call filp_close() filp_flush() filp->f_op->flush() fuse_flush() it seems like we should still be waiting for all writebacks to finish before sending the fuse server the fuse_flush request, no? > > The potential cleanup for fi->writectr and fi->queued_writes could be > offered as following separate patches (if any). > Thanks, Joanne > > > --- > > fs/fuse/file.c | 360 ++++------------------------------------------- > > fs/fuse/fuse_i.h | 3 - > > 2 files changed, 28 insertions(+), 335 deletions(-) > > > > diff --git a/fs/fuse/file.c b/fs/fuse/file.c > > index 754378dd9f71..91ada0208863 100644 > > --- a/fs/fuse/file.c > > +++ b/fs/fuse/file.c > > @@ -415,89 +415,11 @@ u64 fuse_lock_owner_id(struct fuse_conn *fc, fl_o= wner_t id) > > > > struct fuse_writepage_args { > > struct fuse_io_args ia; > > - struct rb_node writepages_entry; > > struct list_head queue_entry; > > - struct fuse_writepage_args *next; > > struct inode *inode; > > struct fuse_sync_bucket *bucket; > > }; > > > > -static struct fuse_writepage_args *fuse_find_writeback(struct fuse_ino= de *fi, > > - pgoff_t idx_from, pgoff_t idx= _to) > > -{ > > - struct rb_node *n; > > - > > - n =3D fi->writepages.rb_node; > > - > > - while (n) { > > - struct fuse_writepage_args *wpa; > > - pgoff_t curr_index; > > - > > -- > Thanks, > Jingbo