Date: Fri, 13 Dec 2024 08:47:38 -0800
From: Shakeel Butt
To: Miklos Szeredi, Andrew Morton
Cc: Joanne Koong, linux-fsdevel@vger.kernel.org, jefflexu@linux.alibaba.com,
    josef@toxicpanda.com, bernd.schubert@fastmail.fm, linux-mm@kvack.org,
    kernel-team@meta.com
Subject: Re: [PATCH v6 0/5] fuse: remove temp page copies in writeback
References: <20241122232359.429647-1-joannelkoong@gmail.com>

+Andrew

On Fri, Dec 13, 2024 at 12:52:44PM +0100, Miklos Szeredi wrote:
> On Sat, 23 Nov 2024 at 00:24, Joanne Koong wrote:
> >
> > The purpose of this patchset is to help make writeback-cache write
> > performance in FUSE filesystems as fast as possible.
> >
> > In the current FUSE writeback design (see commit 3be5a52b30aa
> > ("fuse: support writable mmap")), a temp page is allocated for every dirty
> > page to be written back, the contents of the dirty page are copied over to the
> > temp page, and the temp page gets handed to the server to write back. This is
> > done so that writeback may be immediately cleared on the dirty page, and this
> > in turn is done for two reasons:
> > a) in order to mitigate the following deadlock scenario that may arise if
> > reclaim waits on writeback on the dirty page to complete (more details can be
> > found in this thread [1]):
> > * single-threaded FUSE server is in the middle of handling a request
> >   that needs a memory allocation
> > * memory allocation triggers direct reclaim
> > * direct reclaim waits on a folio under writeback
> > * the FUSE server can't write back the folio since it's stuck in
> >   direct reclaim
> > b) in order to unblock internal (e.g. sync, page compaction) waits on writeback
> > without needing the server to complete writing back to disk, which may take
> > an indeterminate amount of time.
> >
> > Allocating and copying dirty pages to temp pages is the biggest performance
> > bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> > altogether (which will also allow us to get rid of the internal FUSE rb tree
> > that is needed to keep track of writeback status on the temp pages).
> > Benchmarks show approximately a 20% improvement in throughput for 4k
> > block-size writes and a 45% improvement for 1M block-size writes.
> >
> > With the temp page removed, writeback state is now only cleared on the dirty
> > page after the server has written it back to disk. This may take an
> > indeterminate amount of time. There is also the possibility of
> > malicious, or well-intentioned but buggy, servers where writeback may, in the
> > worst case, never complete. This means that any
> > folio_wait_writeback() on a dirty page belonging to a FUSE filesystem needs to
> > be carefully audited.
> >
> > In particular, these are the cases that need to be accounted for:
> > * potentially deadlocking in reclaim, as mentioned above
> > * potentially stalling sync(2)
> > * potentially stalling page migration / compaction
> >
> > This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which
> > filesystems may set on their inode mappings to indicate that writeback
> > operations may take an indeterminate amount of time to complete. FUSE will set
> > this flag on its mappings. This patchset adds checks to the critical parts of
> > reclaim, sync, and page migration logic where writeback may be waited on.
> >
> > Please note the following:
> > * For sync(2), waiting on writeback will be skipped for FUSE, but this has no
> >   effect on existing behavior. Dirty FUSE pages are already not guaranteed to
> >   be written to disk by the time sync(2) returns (e.g. writeback is cleared on
> >   the dirty page but the server may not have written out the temp page to disk
> >   yet). If the caller wishes to ensure the data has actually been synced to
> >   disk, they should use fsync(2)/fdatasync(2) instead.
> > * AS_WRITEBACK_INDETERMINATE does not indicate that the folios should never be
> >   waited on when in writeback. There are some cases where the wait is
> >   desirable. For example, for the sync_file_range() syscall, it is fine to
> >   wait on the writeback since the caller passes in an fd for the operation.
>
> Looks good, thanks.
>
> Acked-by: Miklos Szeredi
>
> I think this should go via the mm tree.

Andrew, could you please pick up this series? Alternatively, Joanne can send
an updated version with all Acked-by/Reviewed-by tags collected. Let us know
what you prefer.

Thanks,
Shakeel