From: Kundan Kumar <kundan.kumar@samsung.com>
To: jaegeuk@kernel.org, chao@kernel.org, viro@zeniv.linux.org.uk,
brauner@kernel.org, jack@suse.cz, miklos@szeredi.hu,
agruenba@redhat.com, trondmy@kernel.org, anna@kernel.org,
akpm@linux-foundation.org, willy@infradead.org,
mcgrof@kernel.org, clm@meta.com, david@fromorbit.com,
amir73il@gmail.com, axboe@kernel.dk, hch@lst.de,
ritesh.list@gmail.com, djwong@kernel.org, dave@stgolabs.net,
p.raghav@samsung.com, da.gomez@samsung.com
Cc: linux-f2fs-devel@lists.sourceforge.net,
linux-fsdevel@vger.kernel.org, gfs2@lists.linux.dev,
linux-nfs@vger.kernel.org, linux-mm@kvack.org,
gost.dev@samsung.com, Kundan Kumar <kundan.kumar@samsung.com>
Subject: [PATCH 00/13] Parallelizing filesystem writeback
Date: Thu, 29 May 2025 16:44:51 +0530
Message-ID: <20250529111504.89912-1-kundan.kumar@samsung.com>
In-Reply-To: <CGME20250529113215epcas5p2edd67e7b129621f386be005fdba53378@epcas5p2.samsung.com>
Currently, pagecache writeback is performed by a single thread. Inodes
are added to a dirty list, and delayed writeback is triggered. The single
writeback thread then iterates through the dirty inode list, and executes
the writeback.
This series parallelizes the writeback by allowing multiple writeback
contexts per backing device (bdi). These writeback contexts are executed
as separate, independent threads, improving overall parallelism.
Would love to hear feedback in order to move this effort forward.
Design Overview
================
Following Jan Kara's suggestion [1], we have introduced a new bdi
writeback context within the backing_dev_info structure. Specifically,
we have created a new structure, bdi_writeback_ctx, which contains
its own set of members for each writeback context.
struct bdi_writeback_ctx {
	struct bdi_writeback wb;
	struct list_head wb_list; /* list of all wbs */
	struct radix_tree_root cgwb_tree;
	struct rw_semaphore wb_switch_rwsem;
	wait_queue_head_t wb_waitq;
};
There can be multiple writeback contexts in a bdi, which helps in
achieving writeback parallelism.
struct backing_dev_info {
	...
	int nr_wb_ctx;
	struct bdi_writeback_ctx **wb_ctx_arr;
	...
};
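To make the layout concrete, here is a minimal userspace sketch of a bdi that
owns an array of writeback contexts and walks all of them. It is not code from
the series: struct wb_ctx_model, struct bdi_model, the nr_dirty_inodes counter
and the three helper functions are hypothetical stand-ins, and only the
nr_wb_ctx and wb_ctx_arr field names come from the structure above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bdi_writeback_ctx; keeps one counter. */
struct wb_ctx_model {
	unsigned long nr_dirty_inodes;
};

/* Hypothetical stand-in for the two fields added to backing_dev_info. */
struct bdi_model {
	int nr_wb_ctx;
	struct wb_ctx_model **wb_ctx_arr;
};

/* Release every context and the pointer array itself. */
static void bdi_model_free(struct bdi_model *bdi)
{
	for (int i = 0; i < bdi->nr_wb_ctx; i++)
		free(bdi->wb_ctx_arr[i]);	/* free(NULL) is a no-op */
	free(bdi->wb_ctx_arr);
	bdi->wb_ctx_arr = NULL;
	bdi->nr_wb_ctx = 0;
}

/* Allocate nr zeroed contexts; returns 0 on success, -1 on failure. */
static int bdi_model_init(struct bdi_model *bdi, int nr)
{
	assert(nr > 0);
	bdi->nr_wb_ctx = 0;
	bdi->wb_ctx_arr = calloc((size_t)nr, sizeof(*bdi->wb_ctx_arr));
	if (!bdi->wb_ctx_arr)
		return -1;
	bdi->nr_wb_ctx = nr;
	for (int i = 0; i < nr; i++) {
		bdi->wb_ctx_arr[i] = calloc(1, sizeof(**bdi->wb_ctx_arr));
		if (!bdi->wb_ctx_arr[i]) {
			bdi_model_free(bdi);
			return -1;
		}
	}
	return 0;
}

/* Anything bdi-wide now has to visit every context, e.g. to sum stats. */
static unsigned long bdi_model_total_dirty(const struct bdi_model *bdi)
{
	unsigned long total = 0;

	for (int i = 0; i < bdi->nr_wb_ctx; i++)
		total += bdi->wb_ctx_arr[i]->nr_dirty_inodes;
	return total;
}
```

The point of the sketch is the loop in bdi_model_total_dirty(): once a bdi has
several contexts, bdi-wide operations such as sync or stats collection must
iterate over the whole array instead of touching a single wb.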
FS geometry and filesystem fragmentation
========================================
The community was concerned that parallelizing writeback would impact
delayed allocation and increase filesystem fragmentation.
Our analysis of XFS delayed allocation behavior showed that merging of
extents occurs within a specific inode. Earlier experiments with multiple
writeback contexts [2] resulted in increased fragmentation due to the
same inode being processed by different threads.
To address this, we now affine an inode to a specific writeback context,
so that a given inode is always processed by the same thread and delayed
allocation works effectively.
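As an illustration of what affining an inode means, the sketch below maps an
inode number to a context index. The cover letter does not say which mapping
is used, so the modulo rule and the function name inode_to_wb_ctx_idx() are
assumptions made purely for this example.

```c
#include <assert.h>

/*
 * Hypothetical mapping: derive the context index from the inode number.
 * The same inode number always yields the same index, so all dirty pages
 * of one inode would be written back by the same context.
 */
static int inode_to_wb_ctx_idx(unsigned long ino, int nr_wb_ctx)
{
	assert(nr_wb_ctx > 0);
	return (int)(ino % (unsigned long)nr_wb_ctx);
}
```

Any deterministic function of the inode would give the same property; what
matters for delayed allocation is that the result is stable for the lifetime
of the inode. With nr_wb_ctx equal to 1 every inode maps to index 0, which
matches the single-threaded behavior.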
Number of writeback contexts
============================
The plan is to default nr_wb_ctx to 1, preserving the existing
single-threaded behavior. In the current version, however, we set the
number of writeback contexts equal to the number of CPUs. Later we will
make it configurable using a mount option, allowing filesystems to choose
the optimal number of writeback contexts.
IOPS and throughput
===================
We see significant improvement in IOPS across several filesystems on both
PMEM and NVMe devices.
Performance gains:
  - On PMEM:
	Base XFS                : 544 MiB/s
	Parallel Writeback XFS  : 1015 MiB/s (+86%)
	Base EXT4               : 536 MiB/s
	Parallel Writeback EXT4 : 1047 MiB/s (+95%)
  - On NVMe:
	Base XFS                : 651 MiB/s
	Parallel Writeback XFS  : 808 MiB/s (+24%)
	Base EXT4               : 494 MiB/s
	Parallel Writeback EXT4 : 797 MiB/s (+61%)
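For readers who want to check the percentages, the helper below recomputes a
gain from a base and a parallel-writeback throughput. The function name
pct_gain() is hypothetical, and truncating the ratio to a whole percent is an
assumption about how the figures were rounded; it does reproduce all four
quoted values.

```c
#include <assert.h>

/*
 * Percentage gain of 'after' over 'base', both in the same unit (MiB/s
 * here). Integer division truncates the ratio to a whole percent.
 */
static int pct_gain(unsigned int base, unsigned int after)
{
	assert(base > 0);
	return (int)((after * 100u) / base) - 100;
}
```

For example, pct_gain(544, 1015) evaluates 101500 / 544 = 186 and returns 86,
matching the XFS on PMEM line.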
We also see that there is no increase in filesystem fragmentation.
# of extents:
  - On XFS (on PMEM):
	Base XFS                : 1964
	Parallel Writeback XFS  : 1384
  - On EXT4 (on PMEM):
	Base EXT4               : 21
	Parallel Writeback EXT4 : 11
[1] Jan Kara suggestion :
https://lore.kernel.org/all/gamxtewl5yzg4xwu7lpp7obhp44xh344swvvf7tmbiknvbd3ww@jowphz4h4zmb/
[2] Writeback using unaffined N (# of CPUs) threads :
https://lore.kernel.org/all/20250414102824.9901-1-kundan.kumar@samsung.com/
Kundan Kumar (13):
  writeback: add infra for parallel writeback
  writeback: add support to initialize and free multiple writeback ctxs
  writeback: link bdi_writeback to its corresponding bdi_writeback_ctx
  writeback: affine inode to a writeback ctx within a bdi
  writeback: modify bdi_writeback search logic to search across all wb
    ctxs
  writeback: invoke all writeback contexts for flusher and dirtytime
    writeback
  writeback: modify sync related functions to iterate over all writeback
    contexts
  writeback: add support to collect stats for all writeback ctxs
  f2fs: add support in f2fs to handle multiple writeback contexts
  fuse: add support for multiple writeback contexts in fuse
  gfs2: add support in gfs2 to handle multiple writeback contexts
  nfs: add support in nfs to handle multiple writeback contexts
  writeback: set the num of writeback contexts to number of online cpus
fs/f2fs/node.c | 11 +-
fs/f2fs/segment.h | 7 +-
fs/fs-writeback.c | 146 +++++++++++++-------
fs/fuse/file.c | 9 +-
fs/gfs2/super.c | 11 +-
fs/nfs/internal.h | 4 +-
fs/nfs/write.c | 5 +-
include/linux/backing-dev-defs.h | 32 +++--
include/linux/backing-dev.h | 45 +++++--
include/linux/fs.h | 1 -
mm/backing-dev.c | 225 ++++++++++++++++++++-----------
mm/page-writeback.c | 5 +-
12 files changed, 333 insertions(+), 168 deletions(-)
--
2.25.1