From: Joanne Koong
Date: Mon, 14 Apr 2025 13:28:47 -0700
Subject: Re: [PATCH v7 0/3] fuse: remove temp page copies in writeback
To: Jeff Layton
Cc: miklos@szeredi.hu, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, jefflexu@linux.alibaba.com, shakeel.butt@linux.dev, david@redhat.com, bernd.schubert@fastmail.fm, ziy@nvidia.com, kernel-team@meta.com
In-Reply-To: <0e00e8b306620c781868f375a462127d72b26289.camel@kernel.org>
References: <20250404181443.1363005-1-joannelkoong@gmail.com> <0e00e8b306620c781868f375a462127d72b26289.camel@kernel.org>

On Mon, Apr 14, 2025 at 9:21 AM Jeff Layton wrote:
>
> On Fri, 2025-04-04 at 11:14 -0700, Joanne Koong wrote:
> > The purpose of this patchset is to help make writeback in FUSE filesystems as
> > fast as possible.
> >
> > In the current FUSE writeback design (see commit 3be5a52b30aa
> > ("fuse: support writable mmap")), a temp page is allocated for every dirty
> > page to be written back, the contents of the dirty page are copied over to the
> > temp page, and the temp page gets handed to the server to write back. This is
> > done so that writeback may be immediately cleared on the dirty page, and this
> > in turn is done in order to mitigate the following deadlock scenario that may
> > arise if reclaim waits on writeback on the dirty page to complete (more details
> > can be found in this thread [1]):
> > * single-threaded FUSE server is in the middle of handling a request
> >   that needs a memory allocation
> > * memory allocation triggers direct reclaim
> > * direct reclaim waits on a folio under writeback
> > * the FUSE server can't write back the folio since it's stuck in
> >   direct reclaim
> >
> > Allocating and copying dirty pages to temp pages is the biggest performance
> > bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> > altogether (which will also allow us to get rid of the internal FUSE rbtree
> > that is needed to keep track of writeback status on the temp pages).
> > Benchmarks show approximately a 20% improvement in throughput for 4k
> > block-size writes and a 45% improvement for 1M block-size writes.
> >
> > In the current reclaim code, there is one scenario where writeback is waited
> > on, which is the case where the system is running legacy cgroup v1 and reclaim
> > encounters a folio that already has the reclaim flag set and the caller did
> > not have __GFP_FS (or __GFP_IO if swap) set.
> >
> > This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which
> > filesystems may set on their inode mappings to indicate that writeback
> > operations may take an indeterminate amount of time to complete. FUSE will set
> > this flag on its mappings.
> > Reclaim for the legacy cgroup v1 case described
> > above will skip reclaim of folios with that flag set.
> >
> > With this change, writeback state is now only cleared on the dirty page after
> > the server has written it back to disk. If the server is deliberately
> > malicious or well-intentioned but buggy, this may stall sync(2) and page
> > migration. However, a malicious server can already stall sync(2) by not
> > replying to the FUSE_SYNCFS request, and there are already many easier ways
> > to stall page migration, such as having FUSE permanently hold the folio lock.
> > A fuller discussion on this can be found in [2]. Long-term, there needs to be
> > a more comprehensive solution for addressing migration of FUSE pages that
> > handles all scenarios where FUSE may permanently hold the lock, but that is
> > outside the scope of this patchset and will be done as future work. Please
> > also note that this change now ensures that when sync(2) returns, FUSE
> > filesystems will have persisted writeback changes.
> >
> > [1] https://lore.kernel.org/linux-kernel/495d2400-1d96-4924-99d3-8b2952e05fc3@linux.alibaba.com/
> > [2] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@gmail.com/
> >
> > Changelog
> > ---------
> > v6:
> > https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@gmail.com/
> > Changes from v6 -> v7:
> > * Drop migration and sync patches, as they are useless if a server is
> >   determined to be malicious
> >
> > v5:
> > https://lore.kernel.org/linux-fsdevel/20241115224459.427610-1-joannelkoong@gmail.com/
> > Changes from v5 -> v6:
> > * Add Shakeel and Jingbo's reviewed-bys
> > * Move folio_end_writeback() to fuse_writepage_finish() (Jingbo)
> > * Embed fuse_writepage_finish_stat() logic inline (Jingbo)
> > * Remove node_stat NR_WRITEBACK inc/sub (Jingbo)
> >
> > v4:
> > https://lore.kernel.org/linux-fsdevel/20241107235614.3637221-1-joannelkoong@gmail.com/
> > Changes from v4 -> v5:
> > * AS_WRITEBACK_MAY_BLOCK -> AS_WRITEBACK_INDETERMINATE (Shakeel)
> > * Drop memory hotplug patch (David and Shakeel)
> > * Remove some more unnecessary writeback waits in fuse code (Jingbo)
> > * Make commit message for reclaim patch more concise - drop part about
> >   deadlock and just focus on how it may stall waits
> >
> > v3:
> > https://lore.kernel.org/linux-fsdevel/20241107191618.2011146-1-joannelkoong@gmail.com/
> > Changes from v3 -> v4:
> > * Use filemap_fdatawait_range() instead of filemap_range_has_writeback() in
> >   readahead
> >
> > v2:
> > https://lore.kernel.org/linux-fsdevel/20241014182228.1941246-1-joannelkoong@gmail.com/
> > Changes from v2 -> v3:
> > * Account for sync and page migration cases as well (Miklos)
> > * Change AS_NO_WRITEBACK_RECLAIM to the more generic AS_WRITEBACK_MAY_BLOCK
> > * For fuse inodes, set mapping_writeback_may_block only if fc->writeback_cache
> >   is enabled
> >
> > v1:
> > https://lore.kernel.org/linux-fsdevel/20241011223434.1307300-1-joannelkoong@gmail.com/T/#t
> > Changes from v1 -> v2:
> > * Have flag in "enum mapping_flags" instead of creating asop_flags (Shakeel)
> > * Set fuse inodes to use AS_NO_WRITEBACK_RECLAIM (Shakeel)
> >
> > Joanne Koong (3):
> >   mm: add AS_WRITEBACK_INDETERMINATE mapping flag
> >   mm: skip reclaiming folios in legacy memcg writeback indeterminate
> >     contexts
> >   fuse: remove tmp folio for writebacks and internal rb tree
> >
> >  fs/fuse/file.c          | 360 ++++------------------------------------
> >  fs/fuse/fuse_i.h        |   3 -
> >  include/linux/pagemap.h |  11 ++
> >  mm/vmscan.c             |  10 +-
> >  4 files changed, 46 insertions(+), 338 deletions(-)
> >
>
> This looks sane, and I love that diffstat.
>
> I also agree with David about changing the flag name to something more
> specific. As a kernel engineer, anything with "INDETERMINATE" in the
> name gives me the ick.
>
> Assuming that the only real change in v8 will be the flag name change,
> you can add:
>
> Reviewed-by: Jeff Layton
>
> Assuming others are ok with this, how do you see this going in? Maybe
> Andrew could pick up the mm bits and Miklos could take the FUSE patch?

Thanks for the review. The only things I plan to change for v8 are the
flag name and removing the unneeded fuse_sync_writes() call in
fuse_flush() that Jingbo pointed out.

With v8, I'm hoping the mm bits (first 2 patches) could be picked up by
Andrew and that the 3rd patch (the one with FUSE changes) could be taken
by Miklos, as the FUSE large folios patchset [1] I will be resending will
depend on patch 3.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20241213221818.322371-1-joannelkoong@gmail.com/
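The reclaim rule described in the quoted cover letter (a filesystem marks its mappings, and reclaim skips rather than waits on writeback for folios in such mappings in the legacy cgroup v1 case) can be illustrated with a small userspace model. This is a hypothetical sketch, not the kernel implementation: `struct mapping_model`, the helper names, the `wb_folio_ctx` fields and the bit number 9 are all assumptions for illustration. The real helpers live in include/linux/pagemap.h and mm/vmscan.c, and the flag is expected to be renamed in v8. The GFP-mask details of the legacy cgroup v1 wait case are folded into a single boolean.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdbool.h>
#include <stddef.h>   /* NULL */

/* Hypothetical bit number; the real enum mapping_flags value may differ. */
enum { AS_WRITEBACK_INDETERMINATE_BIT = 9 };

static_assert(AS_WRITEBACK_INDETERMINATE_BIT < CHAR_BIT * sizeof(unsigned long),
              "flag bit must fit in the mapping flags word");

/* Simplified stand-in for struct address_space: only the flags word. */
struct mapping_model {
	unsigned long flags;
};

/* A filesystem (FUSE, in this patchset) marks its inode mappings. */
void mapping_model_set_writeback_indeterminate(struct mapping_model *m)
{
	m->flags |= 1UL << AS_WRITEBACK_INDETERMINATE_BIT;
}

bool mapping_model_writeback_indeterminate(const struct mapping_model *m)
{
	return (m->flags & (1UL << AS_WRITEBACK_INDETERMINATE_BIT)) != 0;
}

/* What reclaim knows when it meets a folio that is under writeback. */
struct wb_folio_ctx {
	bool legacy_v1_wait_case;            /* the legacy cgroup v1 case that waits */
	const struct mapping_model *mapping; /* may be NULL */
};

enum wb_action { WB_SKIP, WB_WAIT };

enum wb_action reclaim_writeback_action(const struct wb_folio_ctx *c)
{
	/* Outside the legacy cgroup v1 case, reclaim does not wait. */
	if (!c->legacy_v1_wait_case)
		return WB_SKIP;
	/* New rule: never block on writeback that may take indefinitely long. */
	if (c->mapping && mapping_model_writeback_indeterminate(c->mapping))
		return WB_SKIP;
	return WB_WAIT;
}
```

Storing the hint as a per-mapping bit keeps reclaim filesystem-agnostic: it only tests the flag, and any filesystem whose writeback completion time is unbounded could opt in the same way.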