From: Joanne Koong
Date: Thu, 12 Dec 2024 13:55:14 -0800
Subject: Re: [PATCH v6 0/5] fuse: remove temp page copies in writeback
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com, bernd.schubert@fastmail.fm, linux-mm@kvack.org, kernel-team@meta.com
In-Reply-To: <20241122232359.429647-1-joannelkoong@gmail.com>

On Fri, Nov 22, 2024 at 3:24 PM Joanne Koong wrote:
>
> The purpose of this patchset is to help make writeback-cache write
> performance in FUSE filesystems as fast as possible.
>
> In the current FUSE writeback design (see commit 3be5a52b30aa
> ("fuse: support writable mmap")), a temp page is allocated for every dirty
> page to be written back, the contents of the dirty page are copied over to the
> temp page, and the temp page gets handed to the server to write back. This is
> done so that writeback may be immediately cleared on the dirty page, and this
> in turn is done for two reasons:
> a) to mitigate the following deadlock scenario that may arise if
> reclaim waits for writeback on the dirty page to complete (more details can be
> found in this thread [1]):
> * a single-threaded FUSE server is in the middle of handling a request
>   that needs a memory allocation
> * the memory allocation triggers direct reclaim
> * direct reclaim waits on a folio under writeback
> * the FUSE server can't write back the folio since it's stuck in
>   direct reclaim
> b) to unblock internal waits on writeback (e.g. sync, page compaction)
> without needing the server to complete writing back to disk, which may take
> an indeterminate amount of time.
>
> Allocating and copying dirty pages to temp pages is the biggest performance
> bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> altogether (which will also allow us to get rid of the internal FUSE rb tree
> that is needed to keep track of writeback status on the temp pages).
> Benchmarks show approximately a 20% improvement in throughput for 4k
> block-size writes and a 45% improvement for 1M block-size writes.
>
> With the temp page removed, writeback state is only cleared on the dirty
> page after the server has written it back to disk. This may take an
> indeterminate amount of time. There is also the possibility of malicious,
> or well-intentioned but buggy, servers where writeback may, in the worst
> case, never complete.
> This means that any folio_wait_writeback() on a dirty page belonging to a
> FUSE filesystem needs to be carefully audited.
>
> In particular, these are the cases that need to be accounted for:
> * potentially deadlocking in reclaim, as mentioned above
> * potentially stalling sync(2)
> * potentially stalling page migration / compaction
>
> This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which
> filesystems may set on their inode mappings to indicate that writeback
> operations may take an indeterminate amount of time to complete. FUSE will set
> this flag on its mappings. This patchset adds checks to the critical parts of
> reclaim, sync, and page migration logic where writeback may be waited on.
>
> Please note the following:
> * For sync(2), waiting on writeback will be skipped for FUSE, but this has no
>   effect on existing behavior. Dirty FUSE pages are already not guaranteed to
>   be written to disk by the time sync(2) returns (e.g. writeback is cleared on
>   the dirty page but the server may not have written out the temp page to disk
>   yet). If the caller wishes to ensure the data has actually been synced to
>   disk, they should use fsync(2)/fdatasync(2) instead.
> * AS_WRITEBACK_INDETERMINATE does not indicate that the folios should never be
>   waited on when in writeback. There are some cases where the wait is
>   desirable. For example, for the sync_file_range() syscall, it is fine to
>   wait on the writeback since the caller passes in an fd for the operation.
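The flag's semantics can be made concrete with a minimal user-space model in C. It shows how a per-mapping flag like AS_WRITEBACK_INDETERMINATE can gate writeback waits at the call sites listed above. This is a sketch under stated assumptions, not the kernel code from this series:

* struct address_space is reduced to a single flags word.
* The bit number is arbitrary.
* The helpers mapping_set_writeback_indeterminate(), mapping_writeback_indeterminate() and should_wait_on_writeback() are hypothetical.
* enum wait_site is hypothetical.

The per-site policy follows the cover letter: reclaim (legacy memcg), sync(2) and migration skip the wait for flagged mappings, while sync_file_range() still waits.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified user-space model of a per-mapping writeback flag.
 * NOT kernel code: struct address_space is reduced to a flags word and
 * the bit number below is arbitrary (hypothetical).
 */
enum mapping_flags {
	AS_WRITEBACK_INDETERMINATE = 0, /* bit index within flags */
};

struct address_space {
	unsigned long flags;
};

static_assert(AS_WRITEBACK_INDETERMINATE < 8 * sizeof(unsigned long),
	      "flag bit must fit in the flags word");

/* Hypothetical helper: a filesystem such as FUSE marks its mapping. */
static inline void mapping_set_writeback_indeterminate(struct address_space *m)
{
	m->flags |= 1UL << AS_WRITEBACK_INDETERMINATE;
}

/* Hypothetical helper: query the flag before blocking on writeback. */
static inline bool mapping_writeback_indeterminate(const struct address_space *m)
{
	return (m->flags & (1UL << AS_WRITEBACK_INDETERMINATE)) != 0;
}

/* Call sites that may wait on a folio under writeback (model only). */
enum wait_site {
	SITE_RECLAIM,         /* legacy memcg reclaim */
	SITE_SYNC,            /* sync(2), via wait_sb_inodes() */
	SITE_MIGRATION,       /* page migration / compaction */
	SITE_SYNC_FILE_RANGE, /* sync_file_range(2): caller named this fd */
};

/*
 * Return true if the caller may block until writeback completes, false if
 * it should skip the folio (or skip the wait) instead.
 */
static inline bool should_wait_on_writeback(const struct address_space *m,
					    enum wait_site site)
{
	/* Mappings without the flag keep the existing behavior: wait. */
	if (!mapping_writeback_indeterminate(m))
		return true;

	switch (site) {
	case SITE_RECLAIM:
	case SITE_SYNC:
	case SITE_MIGRATION:
		/* Writeback may never finish; don't stall these paths. */
		return false;
	case SITE_SYNC_FILE_RANGE:
		/* The wait was explicitly requested on this file. */
		return true;
	}
	return true;
}
```

In this model, a flagged (FUSE-like) mapping returns false for SITE_SYNC, so a sync(2)-style walk moves on instead of blocking on a server that may never reply. An unflagged mapping still returns true at every site.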
>
> [1]
> https://lore.kernel.org/linux-kernel/495d2400-1d96-4924-99d3-8b2952e05fc3@linux.alibaba.com/
>
> Changelog
> ---------
> v5:
> https://lore.kernel.org/linux-fsdevel/20241115224459.427610-1-joannelkoong@gmail.com/
> Changes from v5 -> v6:
> * Add Shakeel and Jingbo's reviewed-bys
> * Move folio_end_writeback() to fuse_writepage_finish() (Jingbo)
> * Embed fuse_writepage_finish_stat() logic inline (Jingbo)
> * Remove node_stat NR_WRITEBACK inc/sub (Jingbo)
>
> v4:
> https://lore.kernel.org/linux-fsdevel/20241107235614.3637221-1-joannelkoong@gmail.com/
> Changes from v4 -> v5:
> * AS_WRITEBACK_MAY_BLOCK -> AS_WRITEBACK_INDETERMINATE (Shakeel)
> * Drop memory hotplug patch (David and Shakeel)
> * Remove some more unnecessary writeback waits in fuse code (Jingbo)
> * Make commit message for reclaim patch more concise - drop part about
>   deadlock and just focus on how it may stall waits
>
> v3:
> https://lore.kernel.org/linux-fsdevel/20241107191618.2011146-1-joannelkoong@gmail.com/
> Changes from v3 -> v4:
> * Use filemap_fdatawait_range() instead of filemap_range_has_writeback() in
>   readahead
>
> v2:
> https://lore.kernel.org/linux-fsdevel/20241014182228.1941246-1-joannelkoong@gmail.com/
> Changes from v2 -> v3:
> * Account for sync and page migration cases as well (Miklos)
> * Change AS_NO_WRITEBACK_RECLAIM to the more generic AS_WRITEBACK_MAY_BLOCK
> * For fuse inodes, set mapping_writeback_may_block only if fc->writeback_cache
>   is enabled
>
> v1:
> https://lore.kernel.org/linux-fsdevel/20241011223434.1307300-1-joannelkoong@gmail.com/T/#t
> Changes from v1 -> v2:
> * Have flag in "enum mapping_flags" instead of creating asop_flags (Shakeel)
> * Set fuse inodes to use AS_NO_WRITEBACK_RECLAIM (Shakeel)
>
> Joanne Koong (5):
>   mm: add AS_WRITEBACK_INDETERMINATE mapping flag
>   mm: skip reclaiming folios in legacy memcg writeback indeterminate
>     contexts
>   fs/writeback: in wait_sb_inodes(), skip wait for
>     AS_WRITEBACK_INDETERMINATE mappings
>   mm/migrate: skip migrating folios under writeback with
>     AS_WRITEBACK_INDETERMINATE mappings
>   fuse: remove tmp folio for writebacks and internal rb tree
>
>  fs/fs-writeback.c       |   3 +
>  fs/fuse/file.c          | 360 ++++------------------------------------
>  fs/fuse/fuse_i.h        |   3 -
>  include/linux/pagemap.h |  11 ++
>  mm/migrate.c            |   5 +-
>  mm/vmscan.c             |  10 +-
>  6 files changed, 53 insertions(+), 339 deletions(-)
>

Miklos, may I get your thoughts on this patchset?

Thanks,
Joanne