From: Kundan Kumar
To: jaegeuk@kernel.org, chao@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, miklos@szeredi.hu, agruenba@redhat.com, trondmy@kernel.org, anna@kernel.org, akpm@linux-foundation.org, willy@infradead.org, mcgrof@kernel.org, clm@meta.com, david@fromorbit.com, amir73il@gmail.com, axboe@kernel.dk, hch@lst.de, ritesh.list@gmail.com, djwong@kernel.org, dave@stgolabs.net, p.raghav@samsung.com, da.gomez@samsung.com
Cc: linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, gfs2@lists.linux.dev, linux-nfs@vger.kernel.org, linux-mm@kvack.org, gost.dev@samsung.com, Kundan Kumar
Subject: [PATCH 00/13] Parallelizing filesystem writeback
Date: Thu, 29 May 2025 16:44:51 +0530
Message-Id: <20250529111504.89912-1-kundan.kumar@samsung.com>

Currently, pagecache writeback is performed by a single thread. Inodes
are added to a dirty list and delayed writeback is triggered. The single
writeback thread then iterates through the dirty inode list and performs
the writeback.

This series parallelizes writeback by allowing multiple writeback
contexts per backing device (bdi). These writeback contexts run as
separate, independent threads, improving overall parallelism.

Feedback would be much appreciated to help move this effort forward.
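
The following is a small userspace model, written under stated assumptions, that contrasts one flusher thread with several. It is not kernel code. struct bdi_model, struct wb_ctx_model and the model_* helpers are hypothetical names, and pthreads stand in for the kernel's flusher workers. The inode-to-context rule used here (inode number modulo nr_wb_ctx) is an illustrative assumption and may not match the mapping the series actually uses. In the model, each context owns its own dirty list and its own thread, and a given inode is always queued on the same context.

```c
/*
 * Userspace model of a bdi with several writeback contexts.
 * All names here are hypothetical; this is not the kernel API.
 */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MODEL_MAX_INODES 64

/* One writeback context: a private dirty list plus a private flusher thread. */
struct wb_ctx_model {
	unsigned long dirty[MODEL_MAX_INODES];	/* inode numbers queued here */
	int nr_dirty;
	int nr_written;
	int started;				/* set if the thread was created */
	pthread_t thread;
};

/* Loosely mirrors nr_wb_ctx / wb_ctx_arr in struct backing_dev_info. */
struct bdi_model {
	int nr_wb_ctx;
	struct wb_ctx_model *wb_ctx_arr;
};

static int model_bdi_init(struct bdi_model *bdi, int nr_wb_ctx)
{
	assert(nr_wb_ctx > 0);
	bdi->nr_wb_ctx = nr_wb_ctx;
	bdi->wb_ctx_arr = calloc((size_t)nr_wb_ctx, sizeof(*bdi->wb_ctx_arr));
	return bdi->wb_ctx_arr ? 0 : -1;
}

static void model_bdi_free(struct bdi_model *bdi)
{
	free(bdi->wb_ctx_arr);
	bdi->wb_ctx_arr = NULL;
	bdi->nr_wb_ctx = 0;
}

/* Assumed affinity rule: the same inode number always picks the same context. */
static int model_wb_ctx_for_inode(const struct bdi_model *bdi, unsigned long ino)
{
	return (int)(ino % (unsigned long)bdi->nr_wb_ctx);
}

/* Queue a dirty inode on its context; an inode already queued is not added twice. */
static int model_mark_inode_dirty(struct bdi_model *bdi, unsigned long ino)
{
	struct wb_ctx_model *ctx = &bdi->wb_ctx_arr[model_wb_ctx_for_inode(bdi, ino)];

	for (int i = 0; i < ctx->nr_dirty; i++)
		if (ctx->dirty[i] == ino)
			return 0;
	if (ctx->nr_dirty == MODEL_MAX_INODES)
		return -1;
	ctx->dirty[ctx->nr_dirty++] = ino;
	return 0;
}

/* Flusher body: drains only its own context, so flushers never share a list. */
static void *model_flusher(void *arg)
{
	struct wb_ctx_model *ctx = arg;

	while (ctx->nr_dirty > 0) {
		ctx->nr_dirty--;	/* stands in for writing back one inode */
		ctx->nr_written++;
	}
	return NULL;
}

/* Run one flusher per context in parallel; returns the number of inodes written. */
static int model_bdi_writeback_all(struct bdi_model *bdi)
{
	int total = 0;

	for (int i = 0; i < bdi->nr_wb_ctx; i++) {
		struct wb_ctx_model *ctx = &bdi->wb_ctx_arr[i];

		ctx->started = pthread_create(&ctx->thread, NULL, model_flusher, ctx) == 0;
		if (!ctx->started)
			model_flusher(ctx);	/* fall back to running it inline */
	}
	for (int i = 0; i < bdi->nr_wb_ctx; i++) {
		struct wb_ctx_model *ctx = &bdi->wb_ctx_arr[i];

		if (ctx->started)
			pthread_join(ctx->thread, NULL);
		total += ctx->nr_written;
	}
	return total;
}
```

Each flusher walks only its own context's list, so the threads do not need to coordinate with one another. Pinning an inode to a single context means all of that inode's dirty pages are still written back by one thread. That is the property the inode affinity described below is meant to preserve for delayed allocation.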

Design Overview
===============

Following Jan Kara's suggestion [1], we have introduced a new bdi
writeback context within the backing_dev_info structure. Specifically,
we have created a new structure, bdi_writeback_ctx, which holds its own
set of members for each writeback context.

	struct bdi_writeback_ctx {
		struct bdi_writeback wb;
		struct list_head wb_list; /* list of all wbs */
		struct radix_tree_root cgwb_tree;
		struct rw_semaphore wb_switch_rwsem;
		wait_queue_head_t wb_waitq;
	};

A bdi can have multiple writeback contexts, which is what enables
writeback parallelism.

	struct backing_dev_info {
		...
		int nr_wb_ctx;
		struct bdi_writeback_ctx **wb_ctx_arr;
		...
	};

FS geometry and filesystem fragmentation
========================================

The community was concerned that parallelizing writeback would hurt
delayed allocation and increase filesystem fragmentation. Our analysis
of XFS delayed allocation behavior showed that extents are merged within
a single inode. Earlier experiments with multiple writeback contexts [2]
increased fragmentation because the same inode was processed by
different threads.

To address this, we now affine each inode to a specific writeback
context, so that delayed allocation continues to work effectively.

Number of writeback contexts
============================

The plan is to keep nr_wb_ctx at 1 by default, preserving the existing
single-threaded behavior. In the current version, however, the number
of writeback contexts is set equal to the number of online CPUs. Later,
we will make it configurable through a mount option, so that the
optimal number of writeback contexts can be chosen per filesystem.

IOPS and throughput
===================

We see significant throughput improvements across several filesystems
on both PMEM and NVMe devices.

Performance gains:

  - On PMEM:
      Base XFS                : 544 MiB/s
      Parallel Writeback XFS  : 1015 MiB/s (+86%)
      Base EXT4               : 536 MiB/s
      Parallel Writeback EXT4 : 1047 MiB/s (+95%)

  - On NVMe:
      Base XFS                : 651 MiB/s
      Parallel Writeback XFS  : 808 MiB/s (+24%)
      Base EXT4               : 494 MiB/s
      Parallel Writeback EXT4 : 797 MiB/s (+61%)

We also see no increase in filesystem fragmentation (number of extents):

  - On XFS (on PMEM):
      Base XFS                : 1964
      Parallel Writeback XFS  : 1384

  - On EXT4 (on PMEM):
      Base EXT4               : 21
      Parallel Writeback EXT4 : 11

[1] Jan Kara suggestion:
    https://lore.kernel.org/all/gamxtewl5yzg4xwu7lpp7obhp44xh344swvvf7tmbiknvbd3ww@jowphz4h4zmb/
[2] Writeback using unaffined N (# of CPUs) threads:
    https://lore.kernel.org/all/20250414102824.9901-1-kundan.kumar@samsung.com/

Kundan Kumar (13):
  writeback: add infra for parallel writeback
  writeback: add support to initialize and free multiple writeback ctxs
  writeback: link bdi_writeback to its corresponding bdi_writeback_ctx
  writeback: affine inode to a writeback ctx within a bdi
  writeback: modify bdi_writeback search logic to search across all wb ctxs
  writeback: invoke all writeback contexts for flusher and dirtytime writeback
  writeback: modify sync related functions to iterate over all writeback contexts
  writeback: add support to collect stats for all writeback ctxs
  f2fs: add support in f2fs to handle multiple writeback contexts
  fuse: add support for multiple writeback contexts in fuse
  gfs2: add support in gfs2 to handle multiple writeback contexts
  nfs: add support in nfs to handle multiple writeback contexts
  writeback: set the num of writeback contexts to number of online cpus

 fs/f2fs/node.c                   |  11 +-
 fs/f2fs/segment.h                |   7 +-
 fs/fs-writeback.c                | 146 +++++++++++++-------
 fs/fuse/file.c                   |   9 +-
 fs/gfs2/super.c                  |  11 +-
 fs/nfs/internal.h                |   4 +-
 fs/nfs/write.c                   |   5 +-
 include/linux/backing-dev-defs.h |  32 +++--
 include/linux/backing-dev.h      |  45 +++++--
 include/linux/fs.h               |   1 -
 mm/backing-dev.c                 | 225 ++++++++++++++++++++-----------
 mm/page-writeback.c              |   5 +-
 12 files changed, 333 insertions(+), 168 deletions(-)

-- 
2.25.1