From: Joanne Koong
Date: Wed, 18 Dec 2024 09:37:37 -0800
Subject: Re: [PATCH v6 0/5] fuse: remove temp page copies in writeback
To: Shakeel Butt
Cc: Miklos Szeredi, Andrew Morton, linux-fsdevel@vger.kernel.org, jefflexu@linux.alibaba.com, josef@toxicpanda.com, bernd.schubert@fastmail.fm, linux-mm@kvack.org, kernel-team@meta.com
References: <20241122232359.429647-1-joannelkoong@gmail.com>

On Fri, Dec 13, 2024 at 8:47 AM Shakeel Butt wrote:
>
> +Andrew
>
> On Fri, Dec 13, 2024 at 12:52:44PM +0100, Miklos Szeredi wrote:
> > On Sat, 23 Nov 2024 at 00:24, Joanne Koong wrote:
> > >
> > > The purpose of this patchset is to help make writeback-cache write
> > > performance in FUSE filesystems as fast as possible.
> > >
> > > In the current FUSE writeback design (see commit 3be5a52b30aa
> > > ("fuse: support writable mmap")), a temp page is allocated for every dirty
> > > page to be written back, the contents of the dirty page are copied over to the
> > > temp page, and the temp page gets handed to the server to write back. This is
> > > done so that writeback may be immediately cleared on the dirty page, and this
> > > in turn is done for two reasons:
> > > a) in order to mitigate the following deadlock scenario that may arise if
> > > reclaim waits on writeback on the dirty page to complete (more details can be
> > > found in this thread [1]):
> > > * single-threaded FUSE server is in the middle of handling a request
> > >   that needs a memory allocation
> > > * memory allocation triggers direct reclaim
> > > * direct reclaim waits on a folio under writeback
> > > * the FUSE server can't write back the folio since it's stuck in
> > >   direct reclaim
> > > b) in order to unblock internal (e.g. sync, page compaction) waits on writeback
> > > without needing the server to complete writing back to disk, which may take
> > > an indeterminate amount of time.
> > >
> > > Allocating and copying dirty pages to temp pages is the biggest performance
> > > bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> > > altogether (which will also allow us to get rid of the internal FUSE rb tree
> > > that is needed to keep track of writeback status on the temp pages).
> > > Benchmarks show approximately a 20% improvement in throughput for 4k
> > > block-size writes and a 45% improvement for 1M block-size writes.
> > >
> > > With the temp page removed, writeback state is now only cleared on the dirty
> > > page after the server has written it back to disk. This may take an
> > > indeterminate amount of time.
> > > As well, there is the possibility of malicious or well-intentioned but
> > > buggy servers where writeback may, in the worst-case scenario, never
> > > complete. This means that any folio_wait_writeback() on a dirty page
> > > belonging to a FUSE filesystem needs to be carefully audited.
> > >
> > > In particular, these are the cases that need to be accounted for:
> > > * potentially deadlocking in reclaim, as mentioned above
> > > * potentially stalling sync(2)
> > > * potentially stalling page migration / compaction
> > >
> > > This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which
> > > a filesystem may set on its inode mappings to indicate that writeback
> > > operations may take an indeterminate amount of time to complete. FUSE will set
> > > this flag on its mappings. This patchset adds checks to the critical parts of
> > > reclaim, sync, and page migration logic where writeback may be waited on.
> > >
> > > Please note the following:
> > > * For sync(2), waiting on writeback will be skipped for FUSE, but this has no
> > >   effect on existing behavior. Dirty FUSE pages are already not guaranteed to
> > >   be written to disk by the time sync(2) returns (e.g. writeback is cleared on
> > >   the dirty page but the server may not have written out the temp page to disk
> > >   yet). If the caller wishes to ensure the data has actually been synced to
> > >   disk, they should use fsync(2)/fdatasync(2) instead.
> > > * AS_WRITEBACK_INDETERMINATE does not indicate that the folios should never be
> > >   waited on when in writeback. There are some cases where the wait is
> > >   desirable. For example, for the sync_file_range() syscall, it is fine to
> > >   wait on the writeback since the caller passes in an fd for the operation.
> >
> > Looks good, thanks.
> >
> > Acked-by: Miklos Szeredi
> >
> > I think this should go via the mm tree.
>
> Andrew, can you please pick this series up, or should Joanne send an updated
> version with all Ack/Review tags collected? Let us know what you prefer.
>

Hi Andrew,

Could you let us know your preference or if there's anything else you
need from us to proceed?

Thanks,
Joanne

> Thanks,
> Shakeel
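
For readers following the thread, the wait policy described in the quoted
cover letter can be modelled in userspace C. This is only a minimal sketch
under stated assumptions, not code from the patchset: every name below
(model_mapping, MODEL_AS_WRITEBACK_INDETERMINATE, model_should_wait, the
wait_context enum) is hypothetical, and the real kernel checks sit in the
reclaim, sync, and migration paths rather than in a single helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_AS_WRITEBACK_INDETERMINATE (1u << 0)

struct model_mapping {
    unsigned int flags;                  /* per-inode mapping flags */
};

struct model_folio {
    const struct model_mapping *mapping; /* mapping the folio belongs to */
    bool under_writeback;                /* writeback not yet completed */
};

/* Code paths that may want to wait for writeback to finish. */
enum wait_context {
    CTX_RECLAIM,          /* direct reclaim */
    CTX_SYNC,             /* sync(2) */
    CTX_MIGRATION,        /* page migration / compaction */
    CTX_SYNC_FILE_RANGE,  /* caller explicitly passed an fd */
};

/* True if the mapping marks writeback as possibly never completing. */
bool model_writeback_indeterminate(const struct model_mapping *m)
{
    assert(m != NULL);
    return (m->flags & MODEL_AS_WRITEBACK_INDETERMINATE) != 0;
}

/*
 * Decide whether a caller in the given context should block on the folio.
 * Ordinary mappings: always wait while writeback is in flight.
 * Indeterminate mappings: skip the wait in reclaim, sync, and migration,
 * but still wait where the caller asked for this file explicitly.
 */
bool model_should_wait(const struct model_folio *folio, enum wait_context ctx)
{
    assert(folio != NULL);
    if (!folio->under_writeback)
        return false;
    if (!model_writeback_indeterminate(folio->mapping))
        return true;
    return ctx == CTX_SYNC_FILE_RANGE;
}
```

In this model a folio on a flagged mapping is skipped by reclaim, sync(2),
and migration but is still waited on for sync_file_range(), which mirrors the
point that the flag does not mean "never wait".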