From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 14 Apr 2025 15:47:09 -0700
From: Shakeel Butt
To: Joanne Koong
Cc: miklos@szeredi.hu, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, jefflexu@linux.alibaba.com, david@redhat.com,
	bernd.schubert@fastmail.fm, ziy@nvidia.com, jlayton@kernel.org,
	kernel-team@meta.com
Subject: Re: [PATCH v8 0/2] fuse: remove temp page copies in writeback
References: <20250414222210.3995795-1-joannelkoong@gmail.com>
In-Reply-To: <20250414222210.3995795-1-joannelkoong@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Apr 14, 2025 at 03:22:08PM -0700, Joanne Koong wrote:
> The purpose of this patchset is to help make writeback in FUSE filesystems as
> fast as possible.
>
> In the current FUSE writeback design (see commit 3be5a52b30aa
> ("fuse: support writable mmap"))), a temp page is allocated for every dirty
> page to be written back, the contents of the dirty page are copied over to the
> temp page, and the temp page gets handed to the server to write back. This is
> done so that writeback may be immediately cleared on the dirty page, and this
> in turn is done in order to mitigate the following deadlock scenario that may
> arise if reclaim waits on writeback on the dirty page to complete (more details
> can be found in this thread [1]):
> * single-threaded FUSE server is in the middle of handling a request
>   that needs a memory allocation
> * memory allocation triggers direct reclaim
> * direct reclaim waits on a folio under writeback
> * the FUSE server can't write back the folio since it's stuck in
>   direct reclaim
>
> Allocating and copying dirty pages to temp pages is the biggest performance
> bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> altogether (which will also allow us to get rid of the internal FUSE rb tree
> that is needed to keep track of writeback status on the temp pages).
> Benchmarks show approximately a 20% improvement in throughput for 4k
> block-size writes and a 45% improvement for 1M block-size writes.
>
> In the current reclaim code, there is one scenario where writeback is waited
> on, which is the case where the system is running legacy cgroupv1 and reclaim
> encounters a folio that already has the reclaim flag set and the caller did
> not have __GFP_FS (or __GFP_IO if swap) set.
>
> This patchset adds a new mapping flag, AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM,
> which filesystems may set on its inode mappings to indicate that reclaim
> should not wait on writeback. FUSE will set this flag on its mappings. Reclaim
> for the legacy cgroup v1 case described above will skip reclaim of folios with
> that flag set. With this flag set, now FUSE can remove temp pages altogether.
>
> With this change, writeback state is now only cleared on the dirty page after
> the server has written it back to disk. If the server is deliberately
> malicious or well-intentioned but buggy, this may stall sync(2) and page
> migration, but for sync(2), a malicious server may already stall this by not
> replying to the FUSE_SYNCFS request and for page migration, there are already
> many easier ways to stall this by having FUSE permanently hold the folio lock.
> A fuller discussion on this can be found in [2]. Long-term, there needs to be
> a more comprehensive solution for addressing migration of FUSE pages that
> handles all scenarios where FUSE may permanently hold the lock, but that is
> outside the scope of this patchset and will be done as future work. Please
> also note that this change also now ensures that when sync(2) returns, FUSE
> filesystems will have persisted writeback changes.
>
> For this patchset, it would be ideal if the first patch could be taken by
> Andrew to the mm tree and the second patch could be taken by Miklos into the
> fuse tree, as the fuse large folios patchset [3] depends on the second patch.

Why not take both patches through the FUSE tree? The second patch depends on
the first, so there is no need to keep them separate.
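
For anyone following the thread, the reclaim behavior described in the cover
letter can be sketched as a small userspace model. This is only an
illustration under stated assumptions, not the code from the patch: every
type and function name below (model_mapping, model_folio,
model_reclaim_action) is hypothetical, and the "legacy cgroup v1 wait case"
is passed in as a plain boolean rather than derived from real reclaim state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model. None of these names are kernel definitions. */

struct model_mapping {
	/* Stands in for AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM set on the mapping. */
	bool writeback_may_deadlock_on_reclaim;
};

struct model_folio {
	bool under_writeback;
	const struct model_mapping *mapping;	/* NULL if there is no mapping */
};

enum model_action {
	MODEL_RECLAIM,	/* folio is not under writeback; reclaim may proceed */
	MODEL_SKIP,	/* leave the folio alone and keep scanning */
	MODEL_WAIT,	/* block until writeback on the folio completes */
};

/*
 * legacy_v1_wait_case: true when reclaim is in the one scenario from the
 * cover letter where it would otherwise wait on writeback (assumption:
 * the caller has already evaluated that condition).
 */
static enum model_action
model_reclaim_action(const struct model_folio *folio, bool legacy_v1_wait_case)
{
	assert(folio != NULL);

	if (!folio->under_writeback)
		return MODEL_RECLAIM;

	/* Waiting here could deadlock a server that is stuck in direct reclaim. */
	if (folio->mapping && folio->mapping->writeback_may_deadlock_on_reclaim)
		return MODEL_SKIP;

	return legacy_v1_wait_case ? MODEL_WAIT : MODEL_SKIP;
}
```

In this model a folio whose mapping carries the flag is never waited on, which
is the property that lets writeback stay set on the dirty page itself instead
of on a temp copy.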