Date: Thu, 29 May 2025 20:37:08 -0700
From: Andrew Morton
To: Kundan Kumar
Cc: jaegeuk@kernel.org, chao@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, miklos@szeredi.hu, agruenba@redhat.com, trondmy@kernel.org, anna@kernel.org, willy@infradead.org, mcgrof@kernel.org, clm@meta.com, david@fromorbit.com, amir73il@gmail.com, axboe@kernel.dk, hch@lst.de, ritesh.list@gmail.com, djwong@kernel.org, dave@stgolabs.net, p.raghav@samsung.com, da.gomez@samsung.com, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, gfs2@lists.linux.dev, linux-nfs@vger.kernel.org, linux-mm@kvack.org, gost.dev@samsung.com
Subject: Re: [PATCH 00/13] Parallelizing filesystem writeback
Message-Id: <20250529203708.9afe27783b218ad2d2babb0c@linux-foundation.org>
In-Reply-To: <20250529111504.89912-1-kundan.kumar@samsung.com>

On Thu, 29 May 2025 16:44:51 +0530 Kundan Kumar wrote:

> Currently, pagecache
> writeback is performed by a single thread. Inodes
> are added to a dirty list, and delayed writeback is triggered. The single
> writeback thread then iterates through the dirty inode list, and executes
> the writeback.
>
> This series parallelizes the writeback by allowing multiple writeback
> contexts per backing device (bdi). These writeback contexts are executed
> as separate, independent threads, improving overall parallelism.
>
> Would love to hear feedback in order to move this effort forward.
>
> Design Overview
> ===============
> Following Jan Kara's suggestion [1], we have introduced a new bdi
> writeback context within the backing_dev_info structure. Specifically,
> we have created a new structure, bdi_writeback_ctx, which contains
> its own set of members for each writeback context.
>
> struct bdi_writeback_ctx {
>         struct bdi_writeback wb;
>         struct list_head wb_list; /* list of all wbs */
>         struct radix_tree_root cgwb_tree;
>         struct rw_semaphore wb_switch_rwsem;
>         wait_queue_head_t wb_waitq;
> };
>
> There can be multiple writeback contexts in a bdi, which helps in
> achieving writeback parallelism.
>
> struct backing_dev_info {
> ...
>         int nr_wb_ctx;
>         struct bdi_writeback_ctx **wb_ctx_arr;

I don't think the "_arr" adds value. bdi->wb_contexts[i]?

> ...
> };
>
> FS geometry and filesystem fragmentation
> ========================================
> The community was concerned that parallelizing writeback would impact
> delayed allocation and increase filesystem fragmentation.
> Our analysis of XFS delayed allocation behavior showed that merging of
> extents occurs within a specific inode. Earlier experiments with multiple
> writeback contexts [2] resulted in increased fragmentation due to the
> same inode being processed by different threads.
>
> To address this, we now affine an inode to a specific writeback context,
> ensuring that delayed allocation works effectively.
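To make the affinity idea concrete, here is a minimal userspace model, not kernel code. Every name in it is hypothetical (struct bdi_model stands in for backing_dev_info, struct wb_ctx_model for bdi_writeback_ctx, and the array is called wb_contexts following the suggestion above), dirty lists are fixed arrays rather than list_heads, and the "inode number modulo nr_wb_ctx" mapping is an assumption that may not match how the series actually picks a context.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. Hypothetical names throughout; the modulo mapping
 * in inode_to_wb_ctx() is an assumed policy, not taken from the patches.
 */
#define MAX_WB_CTX 8
#define MAX_DIRTY  64

struct wb_ctx_model {
	unsigned long dirty[MAX_DIRTY];	/* inode numbers awaiting writeback */
	int nr_dirty;
};

struct bdi_model {
	int nr_wb_ctx;			/* 1 models today's single-threaded case */
	struct wb_ctx_model wb_contexts[MAX_WB_CTX];
};

/* The same inode number always yields the same context index. */
static int inode_to_wb_ctx(const struct bdi_model *bdi, unsigned long ino)
{
	assert(bdi->nr_wb_ctx >= 1 && bdi->nr_wb_ctx <= MAX_WB_CTX);
	return (int)(ino % (unsigned long)bdi->nr_wb_ctx);
}

/*
 * Queue a dirty inode on its affine context. An inode that is already
 * queued is not added again, so only one context ever writes it back.
 * Returns false only if that context's list is full.
 */
static bool queue_dirty_inode(struct bdi_model *bdi, unsigned long ino)
{
	struct wb_ctx_model *ctx = &bdi->wb_contexts[inode_to_wb_ctx(bdi, ino)];

	for (int i = 0; i < ctx->nr_dirty; i++)
		if (ctx->dirty[i] == ino)
			return true;	/* already queued on this context */
	if (ctx->nr_dirty == MAX_DIRTY)
		return false;
	ctx->dirty[ctx->nr_dirty++] = ino;
	return true;
}
```

With nr_wb_ctx = 4, dirtying inodes 12, 13, 12 and 16 leaves 12 and 16 on context 0 and 13 on context 1: the repeated dirtying of inode 12 never moves it to another context, which is the property delayed allocation relies on.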
>
> Number of writeback contexts
> ============================
> The plan is to keep nr_wb_ctx as 1, ensuring default single-threaded
> behavior. However, we set the number of writeback contexts equal to
> the number of CPUs in the current version.

Makes sense. It would be good to test this on a non-SMP machine, if you
can find one ;)

> Later we will make it configurable
> using a mount option, allowing filesystems to choose the optimal number
> of writeback contexts.
>
> IOPS and throughput
> ===================
> We see significant improvement in IOPS across several filesystems on both
> PMEM and NVMe devices.
>
> Performance gains:
>   - On PMEM:
>         Base XFS                : 544 MiB/s
>         Parallel Writeback XFS  : 1015 MiB/s (+86%)
>         Base EXT4               : 536 MiB/s
>         Parallel Writeback EXT4 : 1047 MiB/s (+95%)
>
>   - On NVMe:
>         Base XFS                : 651 MiB/s
>         Parallel Writeback XFS  : 808 MiB/s (+24%)
>         Base EXT4               : 494 MiB/s
>         Parallel Writeback EXT4 : 797 MiB/s (+61%)
>
> We also see that there is no increase in filesystem fragmentation.
> # of extents:
>   - On XFS (on PMEM):
>         Base XFS                : 1964
>         Parallel Writeback XFS  : 1384
>
>   - On EXT4 (on PMEM):
>         Base EXT4               : 21
>         Parallel Writeback EXT4 : 11

Please test the performance on spinning disks, and with more filesystems?
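The quoted percentages can be re-derived from the raw MiB/s figures. A small sketch, assuming the cover letter truncates rather than rounds (only the XFS-on-PMEM case, 86.6%, tells the two apart); the helper and struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Gain of a parallel-writeback run over its baseline, in whole percent.
 * Assumes the reported figures are truncated: integer division of these
 * positive values truncates toward zero.
 */
static int pct_gain(int base_mibs, int parallel_mibs)
{
	assert(base_mibs > 0);
	return (parallel_mibs - base_mibs) * 100 / base_mibs;
}

/* Throughput figures (MiB/s) exactly as quoted in the cover letter. */
struct result {
	const char *setup;
	int base_mibs;
	int parallel_mibs;
	int reported_pct;
};

static const struct result results[] = {
	{ "XFS on PMEM",  544, 1015, 86 },
	{ "EXT4 on PMEM", 536, 1047, 95 },
	{ "XFS on NVMe",  651,  808, 24 },
	{ "EXT4 on NVMe", 494,  797, 61 },
};

/* Count how many reported percentages the raw figures reproduce. */
static int count_matching_results(void)
{
	int matches = 0;

	for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
		if (pct_gain(results[i].base_mibs, results[i].parallel_mibs) ==
		    results[i].reported_pct)
			matches++;
	return matches;
}
```

All four reported gains are reproduced under the truncation assumption; rounding would instead give +87% for XFS on PMEM.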