From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7430AC369AB for ; Tue, 15 Apr 2025 15:59:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 30DC5280073; Tue, 15 Apr 2025 11:59:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2B5ED280064; Tue, 15 Apr 2025 11:59:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0B9F7280073; Tue, 15 Apr 2025 11:59:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id D1F34280064 for ; Tue, 15 Apr 2025 11:59:48 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 0DB035F12E for ; Tue, 15 Apr 2025 15:59:49 +0000 (UTC) X-FDA: 83336738898.27.6A5BE4C Received: from mail-qt1-f178.google.com (mail-qt1-f178.google.com [209.85.160.178]) by imf17.hostedemail.com (Postfix) with ESMTP id 209FE40014 for ; Tue, 15 Apr 2025 15:59:46 +0000 (UTC) Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=bZqGZESM; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf17.hostedemail.com: domain of joannelkoong@gmail.com designates 209.85.160.178 as permitted sender) smtp.mailfrom=joannelkoong@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1744732787; a=rsa-sha256; cv=none; b=axZ8pmBkf7HBV9q6tWnmi8ebK08LgL/2MJi7UJ738XO1E+ULsQJ9FFVMQdWjkvFLx4B+Qd QK8JriOMI/u/rIN0YUCF373QPae4AvvII4J6R/vikgdruGBKsFyGySg+f4N8S3NqI1KaCP amB/yu//9Fva5dfJ/JiQ6GRpZZ0lMqA= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=bZqGZESM; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf17.hostedemail.com: domain of joannelkoong@gmail.com designates 209.85.160.178 as permitted sender) smtp.mailfrom=joannelkoong@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1744732787; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=arh8IkZ87TlXMPTVH/FNxuR9f4DY9b3/C3NAqBFAa9c=; b=u9JFG6LI50nvmxN+FbCzH2+N3XBURna4zP5NUtznnkq15POtx68zkjUiZ3zaOI7lSQdSZW 8lZluKEqWDiQqn7ZycESzoPjSHRwAeQDUexolmLuHK8ViGWRFIOruiukvbBbe0P5/rurUE 0ULVEqaZNaNJehAMaqC9IE23G6yD/Co= Received: by mail-qt1-f178.google.com with SMTP id d75a77b69052e-4772f48f516so63394091cf.1 for ; Tue, 15 Apr 2025 08:59:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1744732786; x=1745337586; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=arh8IkZ87TlXMPTVH/FNxuR9f4DY9b3/C3NAqBFAa9c=; b=bZqGZESMsp0t/rKKBgqMPdLQvCq0NHV9ErVOZq0UKKU7sIr7BZBPO6YCJXv7W0NqzM c0IW+Qo/y9yYx8laeAxktZHIs+ZuLanARpD29VhQJf5mcYq0kg8kX1gqZJ0l7yd/0w0C 4LvXOeVw+H/niEpL2MFZgJBr5rtQpbCHQ0P9QYJMh4fmtS+eLmQaA2hGwZJ6juVlNNja JCI0hPBPGqZrzLrj5SfY9yv7aNqdZ4H1SxbzcKalrT8ksu+YuoCtZX5ItG0bLKEy3jaE KjbiqYO+rzIO+94tYD9XFa09METZLx4gcpJhBOSPde3OPotZlSCUEdX/Wty/VZpoxBTM 98lA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744732786; x=1745337586; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=arh8IkZ87TlXMPTVH/FNxuR9f4DY9b3/C3NAqBFAa9c=; b=bk4ltGT3wm24rgHNdHY1ckbxqv+wPZtvcmH6HvOeK40HJa3Kv+sV+4RgKXgSDoimb9 KIjVrMqbYBcGE1GJ/hQAdTNwGiYkMi2d8C9G2vkLmcJ01BPVxGpiyp3z/tJwhEHDfnvy eF2CgrdhEHwkx3k5hHbg4o39TPe0hoX5Tu8nYApjs8IP3whkWJe+8g6PUBvUXAXoQUm3 2Yub0JhhJAVOu7SozLZIbl1N8bIVt6KSh2im/s77yzet8CGd1R30tPxN9cW9pJqAkKGN +SsQhfR5e3pgTWxUm5FEHbVlPVL18dsiGoSt8VF5JA4F9yArB38fjFAs6NICAk96pHUL UCcQ== X-Forwarded-Encrypted: i=1; AJvYcCWLQb2v0WcgSqRsF84X55kBbyyubPzJBj9al3OIUudLF6ejJDrRKjyqib69079abHBb/iXpiLf7SA==@kvack.org X-Gm-Message-State: AOJu0Yx9EJx2lpGy56qEXyAwn1iwKQiK1G3H0lyKYMAhqGAeqU19NX5i bQ0nJEaTB0uHn0oO6EhLqX7ptz1naReVdXQbl2lu3RzvBwbSKRxFyaLUgzvK3HDD+idQ036Rw4k 9R/lRLD+lrfLT3klEEWcAuD73p6Q= X-Gm-Gg: ASbGnctorfWJ5RDNgbUJk5zngFvKIVXyggf1vxdbUl78UMlCitszmCqTOuCP1u5VUhI LrfFebGDKOcjAMKMb4M1k1vJgKoPMJ68H7H/qasXupzFICEj7J0SAo0lQWfOj2DOwmwKVo9qTWc XA1nfIz+9CCEIXi0yQXQEq4ndwpVHHRwOy7hyXyA== X-Google-Smtp-Source: AGHT+IGtKI5ZMHMn2uAXp0tTyih3ObFA9uINH2KRKQ3F2B3zr+7NjUkRrsiPjeL8d27TgFKXRCjLJX8GQyMo/0AMIJ4= X-Received: by 2002:ac8:590f:0:b0:477:6e32:aae2 with SMTP id d75a77b69052e-47ad34dce36mr1156381cf.0.1744732786110; Tue, 15 Apr 2025 08:59:46 -0700 (PDT) MIME-Version: 1.0 References: <20250404181443.1363005-1-joannelkoong@gmail.com> <20250404181443.1363005-4-joannelkoong@gmail.com> <7e9b1a40-4708-42a8-b8fc-44fa50227e5b@linux.alibaba.com> <9a3cfb55-faae-4551-9bef-b9650432848a@linux.alibaba.com> In-Reply-To: From: Joanne Koong Date: Tue, 15 Apr 2025 08:59:34 -0700 X-Gm-Features: ATxdqUHM-7cPRPuM5ineWxM-iffBsxHZuzqztvNExJta1io3pILv8v00sqSnPXE Message-ID: Subject: Re: [PATCH v7 3/3] fuse: remove tmp folio for writebacks and internal rb tree To: Jingbo Xu Cc: miklos@szeredi.hu, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, shakeel.butt@linux.dev, david@redhat.com, bernd.schubert@fastmail.fm, ziy@nvidia.com, jlayton@kernel.org, kernel-team@meta.com, Miklos Szeredi Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 209FE40014 X-Rspam-User: X-Rspamd-Server: rspam07 X-Stat-Signature: iwg8tpqq3oswz84pjigya3d5yo3gc3mq X-HE-Tag: 1744732786-65317 X-HE-Meta: U2FsdGVkX1/GglCKj6u3R6g1ofu+WExpLXfNfUZ8+P9K7yh5/HfItRNXPeOXCbLIZh+sFF3eGHi2KzfBk8fquk5S/2lEdbEH3oqYxjFkY4J9RO/IfevLxY7i/CI9kOPKnoJfldLOfvO64zdzUqzDA36jCyuz7kJwXvyUzMIFwrlw+aaucozlf+q9UyvJwMHl1iPfghDIc1DtX7QC6PcmW3hAsY3CrwYNK5yVDVNS9LqQw8NkiOaw8qujLttC9azKQAi+R7FJ7TwX+WYFvsSpUE67czpPS8JDM9TFjqXhPJmN9jJjsQ+snmpZV11dxx3+j7jIyuUOqK4PjCBleiddQAWV+YEjNJ5AXpQmq420L6pVi9UgeWBc8db/G6slHbXVjZYbZKg/QIsn9UnjxOg/fj1jgUL87jRCHmzUP7QGzBcdNkhFvAQ+rKngS9qXBGCF4tEHTrE0nQTwDNpEcfWl2DgMEtkBNu8Y7net/AElZcK96Gdp+UqXiYgx1VIM5vUGAosIyOITuRnWHvOOcWRk/MW96nEhUk43ukqGx9hiXromGBxTh4PES9K0xRABjjoNdvF12rBOzpudosdYNDHJbrdkDz3bVFux8s0xBm3XlGF+6S1Ctl0L34YwZT2vtX0AcAzJQDuxafacgwnufZMnYNqfB0HaUCvo+v5yaBr4IfZWrvKvnvuWBw90RWg2gbHGh1djl9qAQG2Zh5LP5Ihb1OC/QXsefIbGq1b/ye+JHV0aKJWF+y6HfBIHaZjbF9Rlef3Ms6/jMN8j85D2ASGePcFizd2r2wp3Z1fy4NIvVo2+N89U3SmGb8YWoarV7+YQPZpv/Ouu95puUefJBXL8LalJVm/zBHuzkp/C0GvziXqIIa9KCcEOzlsMsRy+P5hwkA562IO4+ZuSKmGmvd0dVImm1D0PgL+2XZZZ9sg3kuF7eX+NReEp49InrmI0O0w7pcezm/xSYz1+p0w6EPV +/rHVoAC FWjnXEycybZ9G2q/h3CjMf7jgu/NM2Ev/T9Cq8TmzIvoYC1rhOLQvJ7/UFR+2aDHJk9j36gLIirdb3/AsBpV871U3y/pGaKv9tMBlzmhF2eSPbkB4bDW6iZN28cpocg9v2W0DtGPg6MDUxBNjjcMWJ3fWKhaOCYvrHLRLzhE6UvymEbZ0ttGC1zxFPzUZ0BcJo0bPplMaIrbttZUOZrjnCo2pXV6H2nsvdhMAuMVQbPp2LHlD3xQ+pFveQYAuEAogCQuDB6WCeoek6cxQ9BgRFLa4kwOtvgltJGTs8O5+WUVCO2CqAHIAfqOkSzsKGfss18asm0u7jGHwhETX8ntN+7Q+RLRAvLqJpXJWcCcFffD3kZxOrNszB772F4bawo4IzP8WL6h2Ube153w= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Apr 15, 2025 at 12:49=E2=80=AFAM Jingbo Xu wrote: > > Hi Joanne, > > Sorry for the late reply... Hi Jingbo, No worries at all. > > > On 4/11/25 12:11 AM, Joanne Koong wrote: > > On Thu, Apr 10, 2025 at 8:11=E2=80=AFAM Jingbo Xu wrote: > >> > >> On 4/10/25 11:07 PM, Joanne Koong wrote: > >>> On Wed, Apr 9, 2025 at 7:12=E2=80=AFPM Jingbo Xu wrote: > >>>> > >>>> > >>>> > >>>> On 4/10/25 7:47 AM, Joanne Koong wrote: > >>>>> On Tue, Apr 8, 2025 at 7:43=E2=80=AFPM Jingbo Xu wrote: > >>>>>> > >>>>>> Hi Joanne, > >>>>>> > >>>>>> On 4/5/25 2:14 AM, Joanne Koong wrote: > >>>>>>> In the current FUSE writeback design (see commit 3be5a52b30aa > >>>>>>> ("fuse: support writable mmap")), a temp page is allocated for ev= ery > >>>>>>> dirty page to be written back, the contents of the dirty page are= copied over > >>>>>>> to the temp page, and the temp page gets handed to the server to = write back. > >>>>>>> > >>>>>>> This is done so that writeback may be immediately cleared on the = dirty page, > >>>>>>> and this in turn is done in order to mitigate the following deadl= ock scenario > >>>>>>> that may arise if reclaim waits on writeback on the dirty page to= complete: > >>>>>>> * single-threaded FUSE server is in the middle of handling a requ= est > >>>>>>> that needs a memory allocation > >>>>>>> * memory allocation triggers direct reclaim > >>>>>>> * direct reclaim waits on a folio under writeback > >>>>>>> * the FUSE server can't write back the folio since it's stuck in > >>>>>>> direct reclaim > >>>>>>> > >>>>>>> With a recent change that added AS_WRITEBACK_INDETERMINATE and mi= tigates > >>>>>>> the situations described above, FUSE writeback does not need to u= se > >>>>>>> temp pages if it sets AS_WRITEBACK_INDETERMINATE on its inode map= pings. > >>>>>>> > >>>>>>> This commit sets AS_WRITEBACK_INDETERMINATE on the inode mappings > >>>>>>> and removes the temporary pages + extra copying and the internal = rb > >>>>>>> tree. > >>>>>>> > >>>>>>> fio benchmarks -- > >>>>>>> (using averages observed from 10 runs, throwing away outliers) > >>>>>>> > >>>>>>> Setup: > >>>>>>> sudo mount -t tmpfs -o size=3D30G tmpfs ~/tmp_mount > >>>>>>> ./libfuse/build/example/passthrough_ll -o writeback -o max_threa= ds=3D4 -o source=3D~/tmp_mount ~/fuse_mount > >>>>>>> > >>>>>>> fio --name=3Dwriteback --ioengine=3Dsync --rw=3Dwrite --bs=3D{1k,= 4k,1M} --size=3D2G > >>>>>>> --numjobs=3D2 --ramp_time=3D30 --group_reporting=3D1 --directory= =3D/root/fuse_mount > >>>>>>> > >>>>>>> bs =3D 1k 4k 1M > >>>>>>> Before 351 MiB/s 1818 MiB/s 1851 MiB/s > >>>>>>> After 341 MiB/s 2246 MiB/s 2685 MiB/s > >>>>>>> % diff -3% 23% 45% > >>>>>>> > >>>>>>> Signed-off-by: Joanne Koong > >>>>>>> Reviewed-by: Jingbo Xu > >>>>>>> Acked-by: Miklos Szeredi > >>>>>> > >>>>> > >>>>> Hi Jingbo, > >>>>> > >>>>> Thanks for sharing your analysis for this. > >>>>> > >>>>>> Overall this patch LGTM. > >>>>>> > >>>>>> Apart from that, IMO the fi->writectr and fi->queued_writes mechan= ism is > >>>>>> also unneeded then, at least the DIRECT IO routine (i.e. > >>>>> > >>>>> I took a look at fi->writectr and fi->queued_writes and my > >>>>> understanding is that we do still need this. For example, for > >>>>> truncates (I'm looking at fuse_do_setattr()), I think we still need= to > >>>>> prevent concurrent writeback or else the setattr request and the > >>>>> writeback request could race which would result in a mismatch betwe= en > >>>>> the file's reported size and the actual data written to disk. > >>>> > >>>> I haven't looked into the truncate routine yet. I will see it later= . > >>>> > >>>>> > >>>>>> fuse_direct_io()) doesn't need fuse_sync_writes() anymore. That i= s > >>>>>> because after removing the temp page, the DIRECT IO routine has al= ready > >>>>>> been waiting for all inflight WRITE requests, see > >>>>>> > >>>>>> # DIRECT read > >>>>>> generic_file_read_iter > >>>>>> kiocb_write_and_wait > >>>>>> filemap_write_and_wait_range > >>>>> > >>>>> Where do you see generic_file_read_iter() getting called for direct= io reads? > >>>> > >>>> # DIRECT read > >>>> fuse_file_read_iter > >>>> fuse_cache_read_iter > >>>> generic_file_read_iter > >>>> kiocb_write_and_wait > >>>> filemap_write_and_wait_range > >>>> a_ops->direct_IO(),i.e. fuse_direct_IO() > >>>> > >>> > >>> Oh I see, I thought files opened with O_DIRECT automatically call the > >>> .direct_IO handler for reads/writes but you're right, it first goes > >>> through .read_iter / .write_iter handlers, and the .direct_IO handler > >>> only gets invoked through generic_file_read_iter() / > >>> generic_file_direct_write() in mm/filemap.c > >>> > >>> There's two paths for direct io in FUSE: > >>> a) fuse server sets fi->direct_io =3D true when a file is opened, whi= ch > >>> will set the FOPEN_DIRECT_IO bit in ff->open_flags on the kernel side > >>> b) fuse server doesn't set fi->direct_io =3D true, but the client ope= ns > >>> the file with O_DIRECT > >>> > >>> We only go through the stack trace you listed above for the b) case. > >>> For the a) case, we'll hit > >>> > >>> if (ff->open_flags & FOPEN_DIRECT_IO) > >>> return fuse_direct_read_iter(iocb, to); > >>> > >>> and > >>> > >>> if (ff->open_flags & FOPEN_DIRECT_IO) > >>> return fuse_direct_write_iter(iocb, from); > >>> > >>> which will invoke fuse_direct_IO() / fuse_direct_io() without going > >>> through the kiocb_write_and_wait() -> filemap_write_and_wait_range() = / > >>> kiocb_invalidate_pages() -> filemap_write_and_wait_range() you listed > >>> above. > >>> > >>> So for the a) case I think we'd still need the fuse_sync_writes() in > >>> case there's still pending writeback. > >>> > >>> Do you agree with this analysis or am I missing something here? > >> > >> Yeah, that's true. But instead of calling fuse_sync_writes(), we can > >> call filemap_wait_range() or something similar here. > >> > > > > Agreed. Actually, the more I look at this, the more I think we can > > replace all fuse_sync_writes() and get rid of it entirely. > > > I have seen your latest reply that this cleaning up won't be included in > this series, which is okay. > > > > fuse_sync_writes() is called in: > > > > fuse_fsync(): > > /* > > * Start writeback against all dirty pages of the inode, then > > * wait for all outstanding writes, before sending the FSYNC > > * request. > > */ > > err =3D file_write_and_wait_range(file, start, end); > > if (err) > > goto out; > > > > fuse_sync_writes(inode); > > > > /* > > * Due to implementation of fuse writeback > > * file_write_and_wait_range() does not catch errors. > > * We have to do this directly after fuse_sync_writes() > > */ > > err =3D file_check_and_advance_wb_err(file); > > if (err) > > goto out; > > > > > > We can get rid of the fuse_sync_writes() and > > file_check_and_advance_wb_err() entirely since now without temp pages, > > the file_write_and_wait_range() call actually ensures that writeback > > is completed > > > > > > > > fuse_writeback_range(): > > static int fuse_writeback_range(struct inode *inode, loff_t > > start, loff_t end) > > { > > int err =3D > > filemap_write_and_wait_range(inode->i_mapping, start, LLONG_MAX); > > > > if (!err) > > fuse_sync_writes(inode); > > > > return err; > > } > > > > > > We can replace fuse_writeback_range() entirely with > > filemap_write_and_wait_range(). > > > > > > > > fuse_direct_io(): > > if (fopen_direct_io && fc->direct_io_allow_mmap) { > > res =3D filemap_write_and_wait_range(mapping, pos, pos = + > > count - 1); > > if (res) { > > fuse_io_free(ia); > > return res; > > } > > } > > if (!cuse && filemap_range_has_writeback(mapping, pos, (pos + > > count - 1))) { > > if (!write) > > inode_lock(inode); > > fuse_sync_writes(inode); > > if (!write) > > inode_unlock(inode); > > } > > > > > > I think this can just replaced with > > if (fopen_direct_io && (fc->direct_io_allow_mmap || !cu= se)) { > > res =3D filemap_write_and_wait_range(mapping, > > pos, pos + count - 1); > > if (res) { > > fuse_io_free(ia); > > return res; > > } > > } > > Alright. But I would prefer doing this filemap_write_and_wait_range() in > fuse_direct_write_iter() rather than fuse_direct_io() if possible. > > > since for the !fopen_direct_io case, it will already go through > > filemap_write_and_wait_range(), as you mentioned in your previous > > message. I think this also fixes a bug (?) in the original code - in > > the fopen_direct_io && !fc->direct_io_allow_mmap case, I think we > > still need to write out dirty pages first, which we don't currently > > do. > > Nope. In case of fopen_direct_io && !fc->direct_io_allow_mmap, there > won't be any page cache at all, right? > Isn't there still a page cache if the file was previously opened without direct io and then the client opens another handle to that file with direct io? In that case, the pages could still be dirty in the page cache and would need to be written back first, no? Thanks, Joanne > > > -- > Thanks, > Jingbo