Date: Mon, 14 Apr 2025 17:06:57 -0700
From: Shakeel Butt
To: Joanne Koong
Cc: miklos@szeredi.hu, akpm@linux-foundation.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 jefflexu@linux.alibaba.com, david@redhat.com,
 bernd.schubert@fastmail.fm, ziy@nvidia.com, jlayton@kernel.org,
 kernel-team@meta.com
Subject: Re: [PATCH v8 0/2] fuse: remove temp page copies in writeback
Message-ID: <57pojgb4bsesfvbbeit3ohjre5sorcafqs62zszrdgfeyp3qaz@k732xugk53lm>
References: <20250414222210.3995795-1-joannelkoong@gmail.com>

On Mon, Apr 14, 2025 at 04:36:58PM -0700, Joanne Koong wrote:
> On Mon, Apr 14, 2025 at 3:47 PM Shakeel Butt wrote:
> >
> > On Mon, Apr 14, 2025 at 03:22:08PM -0700, Joanne Koong wrote:
> > > The purpose of this patchset is to help make writeback in FUSE filesystems as
> > > fast as possible.
> > >
> > > In the current FUSE writeback design (see commit 3be5a52b30aa
> > > ("fuse: support writable mmap")), a temp page is allocated for every dirty
> > > page to be written back, the contents of the dirty page are copied over to the
> > > temp page, and the temp page gets handed to the server to write back. This is
> > > done so that writeback may be immediately cleared on the dirty page, and this
> > > in turn is done in order to mitigate the following deadlock scenario that may
> > > arise if reclaim waits on writeback on the dirty page to complete (more
> > > details can be found in this thread [1]):
> > > * single-threaded FUSE server is in the middle of handling a request
> > >   that needs a memory allocation
> > > * memory allocation triggers direct reclaim
> > > * direct reclaim waits on a folio under writeback
> > > * the FUSE server can't write back the folio since it's stuck in
> > >   direct reclaim
> > >
> > > Allocating and copying dirty pages to temp pages is the biggest performance
> > > bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> > > altogether (which will also allow us to get rid of the internal FUSE rb tree
> > > that is needed to keep track of writeback status on the temp pages).
> > > Benchmarks show approximately a 20% improvement in throughput for 4k
> > > block-size writes and a 45% improvement for 1M block-size writes.
> > >
> > > In the current reclaim code, there is one scenario where writeback is waited
> > > on, which is the case where the system is running legacy cgroup v1 and reclaim
> > > encounters a folio that already has the reclaim flag set and the caller did
> > > not have __GFP_FS (or __GFP_IO if swap) set.
> > >
> > > This patchset adds a new mapping flag, AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM,
> > > which filesystems may set on their inode mappings to indicate that reclaim
> > > should not wait on writeback. FUSE will set this flag on its mappings. Reclaim
> > > for the legacy cgroup v1 case described above will skip reclaim of folios with
> > > that flag set. With this flag set, now FUSE can remove temp pages altogether.
> > >
> > > With this change, writeback state is now only cleared on the dirty page after
> > > the server has written it back to disk. If the server is deliberately
> > > malicious or well-intentioned but buggy, this may stall sync(2) and page
> > > migration, but for sync(2), a malicious server may already stall this by not
> > > replying to the FUSE_SYNCFS request and for page migration, there are already
> > > many easier ways to stall this by having FUSE permanently hold the folio lock.
> > > A fuller discussion on this can be found in [2]. Long-term, there needs to be
> > > a more comprehensive solution for addressing migration of FUSE pages that
> > > handles all scenarios where FUSE may permanently hold the lock, but that is
> > > outside the scope of this patchset and will be done as future work. Please
> > > also note that this change now ensures that when sync(2) returns, FUSE
> > > filesystems will have persisted writeback changes.
> > >
> > > For this patchset, it would be ideal if the first patch could be taken by
> > > Andrew to the mm tree and the second patch could be taken by Miklos into the
> > > fuse tree, as the fuse large folios patchset [3] depends on the second patch.
> >
> > Why not take both patches through FUSE tree? Second patch has dependency
> > on first patch, so there is no need to keep them separate.
>
> If that's possible, that sounds great to me too. The patchset went
> through Andrew's mm tree last time, so I'm not sure if the protocol is
> that any/all mm changes need to go through Andrew's tree.
This series can go through either the mm tree or the fuse tree, but it
seems like you plan to do follow-up fuse work that requires this series,
so I would suggest going through the fuse tree. Just let Andrew know; he
is mostly fine with that.