Date: Wed, 2 Jul 2025 11:44:39 -0700
From: "Darrick J. Wong" <djwong@kernel.org>
To: Kundan Kumar
Cc: Anuj Gupta, Christoph Hellwig, Anuj Gupta/Anuj Gupta, Kundan Kumar,
	jaegeuk@kernel.org, chao@kernel.org, viro@zeniv.linux.org.uk,
	brauner@kernel.org, jack@suse.cz, miklos@szeredi.hu,
	agruenba@redhat.com, trondmy@kernel.org, anna@kernel.org,
	akpm@linux-foundation.org, willy@infradead.org, mcgrof@kernel.org,
	clm@meta.com, david@fromorbit.com, amir73il@gmail.com,
	axboe@kernel.dk, ritesh.list@gmail.com, dave@stgolabs.net,
	p.raghav@samsung.com, da.gomez@samsung.com,
	linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org,
	gfs2@lists.linux.dev, linux-nfs@vger.kernel.org, linux-mm@kvack.org,
	gost.dev@samsung.com
Subject: Re: [PATCH 00/13] Parallelizing filesystem writeback
Message-ID: <20250702184439.GD9991@frogsfrogsfrogs>
References: <20250529111504.89912-1-kundan.kumar@samsung.com>
 <20250602141904.GA21996@lst.de>
 <20250603132434.GA10865@lst.de>
 <20250611155144.GD6138@frogsfrogsfrogs>

On Tue, Jun 24, 2025 at 11:29:28AM +0530, Kundan Kumar wrote:
> On Wed, Jun 11, 2025 at 9:21 PM Darrick J. Wong wrote:
> >
> > On Wed, Jun 04, 2025 at 02:52:34PM +0530, Kundan Kumar wrote:
> > > > > > For xfs used this command:
> > > > > > xfs_io -c "stat" /mnt/testfile
> > > > > > And for ext4 used this:
> > > > > > filefrag /mnt/testfile
> > > > >
> > > > > filefrag merges contiguous extents, and only counts up for discontiguous
> > > > > mappings, while fsxattr.nextents counts all extents even if they are
> > > > > contiguous.  So you probably want to use filefrag for both cases.
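(As an aside, for anyone reproducing this: a minimal way to see both
counts side by side, assuming the same test file as in the quoted
commands, is

    filefrag /mnt/testfile                           # merged mapping count
    xfs_io -c "stat" /mnt/testfile | grep nextents   # raw fsxattr.nextents

where the grep just pulls the "fsxattr.nextents = N" line out of xfs_io's
stat output; the filefrag number should be no larger than that one.)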
> > > >
> > > > Got it — thanks for the clarification. We'll switch to using filefrag
> > > > and will share updated extent count numbers accordingly.
> > > Using filefrag, we recorded extent counts on xfs and ext4 at three
> > > stages:
> > > a. Just after a 1G random write,
> > > b. After a 30-second wait,
> > > c. After unmounting and remounting the filesystem.
> > >
> > > xfs
> > > Base
> > > a. 6251 b. 2526 c. 2526
> > > Parallel writeback
> > > a. 6183 b. 2326 c. 2326
> >
> > Interesting that the mapping record count goes down...
> >
> > I wonder, you said the xfs filesystem has 4 AGs and 12 cores, so I guess
> > wb_ctx_arr[] is 12?  I wonder, do you see a knee point in writeback
> > throughput when the # of wb contexts exceeds the AG count?
> >
> > Though I guess for the (hopefully common) case of pure overwrites, we
> > don't have to do any metadata updates so we wouldn't really hit a
> > scaling limit due to ag count or log contention or whatever.  Does that
> > square with what you see?
> >
> 
> Hi Darrick,
> 
> We analyzed AG count vs. number of writeback contexts to identify any
> knee point. Earlier, wb_ctx_arr[] was fixed at 12; now we varied
> nr_wb_ctx and measured the impact.
> 
> We implemented a configurable number of writeback contexts to measure
> throughput more easily. This feature will be exposed in the next series.
> To configure it, we used: echo > /sys/class/bdi/259:2/nwritebacks.
> 
> In our test, writing 1G across 12 directories, bandwidth improved up to
> the number of allocation groups (AGs), which is roughly where the knee
> point sits, and gains tapered off beyond that. We also see a good
> increase in bandwidth, about 16x from base to nr_wb_ctx = 6.
> 
> Base (single threaded)              : 9799KiB/s
> Parallel Writeback (nr_wb_ctx = 1)  : 9727KiB/s
> Parallel Writeback (nr_wb_ctx = 2)  : 18.1MiB/s
> Parallel Writeback (nr_wb_ctx = 3)  : 46.4MiB/s
> Parallel Writeback (nr_wb_ctx = 4)  : 135MiB/s
> Parallel Writeback (nr_wb_ctx = 5)  : 160MiB/s
> Parallel Writeback (nr_wb_ctx = 6)  : 163MiB/s

Heh, nice!

> Parallel Writeback (nr_wb_ctx = 7)  : 162MiB/s
> Parallel Writeback (nr_wb_ctx = 8)  : 154MiB/s
> Parallel Writeback (nr_wb_ctx = 9)  : 152MiB/s
> Parallel Writeback (nr_wb_ctx = 10) : 145MiB/s
> Parallel Writeback (nr_wb_ctx = 11) : 145MiB/s
> Parallel Writeback (nr_wb_ctx = 12) : 138MiB/s
> 
> 
> System config
> =============
> Number of CPUs = 12
> System RAM = 9G
> For XFS, number of AGs = 4
> Device: 3.84 TB NVMe SSD (Enterprise SSD PM1733a)
> 
> Script
> ======
> mkfs.xfs -f /dev/nvme0n1
> mount /dev/nvme0n1 /mnt
> echo > /sys/class/bdi/259:2/nwritebacks
> sync
> echo 3 > /proc/sys/vm/drop_caches
> 
> for i in {1..12}; do
>     mkdir -p /mnt/dir$i
> done
> 
> fio job_nvme.fio
> 
> umount /mnt
> echo 3 > /proc/sys/vm/drop_caches
> sync
> 
> fio job
> =======
> [global]
> bs=4k
> iodepth=1
> rw=randwrite
> ioengine=io_uring
> nrfiles=12
> numjobs=1           # Each job writes to a different file
> size=1g
> direct=0            # Buffered I/O to trigger writeback
> group_reporting=1
> create_on_open=1
> name=test
> 
> [job1]
> directory=/mnt/dir1
> 
> [job2]
> directory=/mnt/dir2
> ...
> ...
> [job12]
> directory=/mnt/dir1
> 
> > > ext4
> > > Base
> > > a. 7080 b. 7080 c. 11
> > > Parallel writeback
> > > a. 5961 b. 5961 c. 11
> >
> > Hum, that's particularly ... interesting.  I wonder what the mapping
> > count behaviors are when you turn off delayed allocation?
> >
> > --D
> >
> 
> I attempted to disable delayed allocation by setting allocsize=4096
> during mount (mount -o allocsize=4096 /dev/pmem0 /mnt), but still
> observed a reduction in file fragments after a delay. Is there something
> I'm overlooking?

Not that I know of.  Maybe we should just take the win. :)

--D

> -Kundan
> 