From: Jens Axboe <axboe@kernel.dk>
To: Bharata B Rao <bharata@amd.com>
Cc: bfoster@redhat.com, clm@meta.com, hannes@cmpxchg.org,
	kirill@shutemov.name, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	willy@infradead.org
Subject: Re: [PATCHSET v6 0/12] Uncached buffered IO
Date: Thu, 12 Dec 2024 08:46:41 -0700
Message-ID: <6e6475c2-7a5f-4742-b4ae-e1eef5f2c508@kernel.dk>
In-Reply-To: <20241210094842.204504-1-bharata@amd.com>

On 12/10/24 2:48 AM, Bharata B Rao wrote:
> Hi Jens,
> 
> I ran a couple of FIO variants to check how this patchset affects the
> FIO numbers. My other motivation for testing this patchset is to check
> whether it improves the scalability issues that were discussed in [1]
> and [2]. But for now, here are some initial numbers.

Thanks for testing!

> Also note that I am using your buffered-uncached.8 branch from
> https://git.kernel.dk/cgit/linux/log/?h=buffered-uncached.8 that has
> changes to enable uncached buffered IO for EXT4 and block devices.
> 
> In the numbers reported below,
> 'base' means a kernel from the buffered-uncached.8 branch and
> 'patched' means a kernel from the buffered-uncached.8 branch + the FIO change shown above
> 
> FIO on EXT4 partitions
> ======================
> nvme1n1     259:12   0   3.5T  0 disk 
> ├─nvme1n1p1 259:13   0 894.3G  0 part /mnt1
> ├─nvme1n1p2 259:14   0 894.3G  0 part /mnt2
> ├─nvme1n1p3 259:15   0 894.3G  0 part /mnt3
> └─nvme1n1p4 259:16   0 894.1G  0 part /mnt4
> 
> fio -directory=/mnt4/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -directory=/mnt3/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -directory=/mnt1/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -directory=/mnt2/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> 
> Four partitions of the NVMe device are formatted with EXT4 and four
> parallel FIO instances are run on them with the options shown above.
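
For context: the pvsync2 ioengine used above goes through preadv2()/pwritev2(),
and patch 07/12 of this series adds an RWF_UNCACHED flag to that interface. A
minimal standalone sketch of an uncached buffered write follows; it is
illustrative only (the flag value and the /mnt1 test path are assumptions, and
this is not the fio change referenced above):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_UNCACHED
#define RWF_UNCACHED 0x00000080	/* assumed value; check the series' uapi patch */
#endif

int main(void)
{
	static char buf[64 * 1024];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	int fd = open("/mnt1/uncached-test", O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 0xaa, sizeof(buf));

	/* Normal buffered write; on a patched kernel the pages are dropped
	 * from the page cache once writeback completes. */
	if (pwritev2(fd, &iov, 1, 0, RWF_UNCACHED) < 0)
		perror("pwritev2(RWF_UNCACHED)");

	close(fd);
	return 0;
}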
> 
> FIO output looks like this:
> 
> base:
>    READ: bw=1233MiB/s (1293MB/s), 1233MiB/s-1233MiB/s (1293MB/s-1293MB/s), io=4335GiB (4654GB), run=3600097-3600097msec
>   WRITE: bw=529MiB/s (554MB/s), 529MiB/s-529MiB/s (554MB/s-554MB/s), io=1858GiB (1995GB), run=3600097-3600097msec
>    READ: bw=1248MiB/s (1308MB/s), 1248MiB/s-1248MiB/s (1308MB/s-1308MB/s), io=4387GiB (4710GB), run=3600091-3600091msec
>   WRITE: bw=535MiB/s (561MB/s), 535MiB/s-535MiB/s (561MB/s-561MB/s), io=1880GiB (2019GB), run=3600091-3600091msec
>    READ: bw=1235MiB/s (1294MB/s), 1235MiB/s-1235MiB/s (1294MB/s-1294MB/s), io=4340GiB (4660GB), run=3600094-3600094msec
>   WRITE: bw=529MiB/s (555MB/s), 529MiB/s-529MiB/s (555MB/s-555MB/s), io=1860GiB (1997GB), run=3600094-3600094msec
>    READ: bw=1234MiB/s (1294MB/s), 1234MiB/s-1234MiB/s (1294MB/s-1294MB/s), io=4337GiB (4657GB), run=3600093-3600093msec
>   WRITE: bw=529MiB/s (554MB/s), 529MiB/s-529MiB/s (554MB/s-554MB/s), io=1859GiB (1996GB), run=3600093-3600093msec
> 
> patched:
>    READ: bw=1400MiB/s (1469MB/s), 1400MiB/s-1400MiB/s (1469MB/s-1469MB/s), io=4924GiB (5287GB), run=3600100-3600100msec
>   WRITE: bw=600MiB/s (629MB/s), 600MiB/s-600MiB/s (629MB/s-629MB/s), io=2110GiB (2266GB), run=3600100-3600100msec
>    READ: bw=1395MiB/s (1463MB/s), 1395MiB/s-1395MiB/s (1463MB/s-1463MB/s), io=4904GiB (5266GB), run=3600148-3600148msec
>   WRITE: bw=598MiB/s (627MB/s), 598MiB/s-598MiB/s (627MB/s-627MB/s), io=2102GiB (2257GB), run=3600148-3600148msec
>    READ: bw=1385MiB/s (1452MB/s), 1385MiB/s-1385MiB/s (1452MB/s-1452MB/s), io=4868GiB (5227GB), run=3600136-3600136msec
>   WRITE: bw=594MiB/s (622MB/s), 594MiB/s-594MiB/s (622MB/s-622MB/s), io=2087GiB (2241GB), run=3600136-3600136msec
>    READ: bw=1376MiB/s (1443MB/s), 1376MiB/s-1376MiB/s (1443MB/s-1443MB/s), io=4837GiB (5194GB), run=3600145-3600145msec
>   WRITE: bw=590MiB/s (618MB/s), 590MiB/s-590MiB/s (618MB/s-618MB/s), io=2073GiB (2226GB), run=3600145-3600145msec
> 
> FIO on block devices
> ====================
> nvme1n1     259:12   0   3.5T  0 disk 
> ├─nvme1n1p1 259:13   0 894.3G  0 part
> ├─nvme1n1p2 259:14   0 894.3G  0 part
> ├─nvme1n1p3 259:15   0 894.3G  0 part
> └─nvme1n1p4 259:16   0 894.1G  0 part
> 
> fio -filename=/dev/nvme1n1p4 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -filename=/dev/nvme1n1p2 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -filename=/dev/nvme1n1p1 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -filename=/dev/nvme1n1p3 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> 
> Four FIO instances are run directly on the four partitions of the NVMe
> device, used as raw block devices, with the options shown above.
> 
> base:
>    READ: bw=8712MiB/s (9135MB/s), 8712MiB/s-8712MiB/s (9135MB/s-9135MB/s), io=29.9TiB (32.9TB), run=3600011-3600011msec
>   WRITE: bw=3734MiB/s (3915MB/s), 3734MiB/s-3734MiB/s (3915MB/s-3915MB/s), io=12.8TiB (14.1TB), run=3600011-3600011msec
>    READ: bw=8727MiB/s (9151MB/s), 8727MiB/s-8727MiB/s (9151MB/s-9151MB/s), io=30.0TiB (32.9TB), run=3600005-3600005msec
>   WRITE: bw=3740MiB/s (3922MB/s), 3740MiB/s-3740MiB/s (3922MB/s-3922MB/s), io=12.8TiB (14.1TB), run=3600005-3600005msec
>    READ: bw=8701MiB/s (9123MB/s), 8701MiB/s-8701MiB/s (9123MB/s-9123MB/s), io=29.9TiB (32.8TB), run=3600004-3600004msec
>   WRITE: bw=3729MiB/s (3910MB/s), 3729MiB/s-3729MiB/s (3910MB/s-3910MB/s), io=12.8TiB (14.1TB), run=3600004-3600004msec
>    READ: bw=8706MiB/s (9128MB/s), 8706MiB/s-8706MiB/s (9128MB/s-9128MB/s), io=29.9TiB (32.9TB), run=3600005-3600005msec
>   WRITE: bw=3731MiB/s (3913MB/s), 3731MiB/s-3731MiB/s (3913MB/s-3913MB/s), io=12.8TiB (14.1TB), run=3600005-3600005msec
> 
> patched:
>    READ: bw=1844MiB/s (1933MB/s), 1844MiB/s-1844MiB/s (1933MB/s-1933MB/s), io=6500GiB (6980GB), run=3610641-3610641msec
>   WRITE: bw=790MiB/s (828MB/s), 790MiB/s-790MiB/s (828MB/s-828MB/s), io=2786GiB (2991GB), run=3610642-3610642msec
>    READ: bw=1753MiB/s (1838MB/s), 1753MiB/s-1753MiB/s (1838MB/s-1838MB/s), io=6235GiB (6695GB), run=3641973-3641973msec
>   WRITE: bw=751MiB/s (788MB/s), 751MiB/s-751MiB/s (788MB/s-788MB/s), io=2672GiB (2869GB), run=3641969-3641969msec
>    READ: bw=1078MiB/s (1130MB/s), 1078MiB/s-1078MiB/s (1130MB/s-1130MB/s), io=3788GiB (4068GB), run=3600007-3600007msec
>   WRITE: bw=462MiB/s (484MB/s), 462MiB/s-462MiB/s (484MB/s-484MB/s), io=1624GiB (1743GB), run=3600007-3600007msec
>    READ: bw=1752MiB/s (1838MB/s), 1752MiB/s-1752MiB/s (1838MB/s-1838MB/s), io=6234GiB (6694GB), run=3642657-3642657msec
>   WRITE: bw=751MiB/s (788MB/s), 751MiB/s-751MiB/s (788MB/s-788MB/s), io=2672GiB (2869GB), run=3642622-3642622msec
> 
> While FIO on the filesystem shows an improvement, FIO on the block devices
> shows the numbers going down. Is this expected, or do I need to enable
> something else for the block device case?

The fs side looks as expected - that's a nice win. On the bdev side, I
deliberately did not post the patch that enables uncached buffered IO on
a raw block device, as it's just a hack for testing. It currently needs
punting, similar to the dirtying of pages, and it's not optimized at
all. For this to pan out, the raw bdev ops really need to move fully to
iomap and stop depending on buffer_heads, so the most likely outcome is
that uncached buffered IO on raw block devices won't really be supported
until someone (probably Christoph) does that work.

I don't think this is a big issue; I can't imagine buffered IO on raw
block devices being THAT interesting of a use case.
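
For applications that want to use the flag opportunistically anyway, a rough
sketch follows. It assumes (not confirmed in this thread) that a file whose
file_operations lack FOP_UNCACHED fails the flag with -EOPNOTSUPP, as other
RWF_* flags do; the helper name and flag value are made up for illustration:

#define _GNU_SOURCE
#include <errno.h>
#include <sys/uio.h>

#ifndef RWF_UNCACHED
#define RWF_UNCACHED 0x00000080	/* assumed value; check the series' uapi patch */
#endif

/* Hypothetical helper: try an uncached buffered write, and fall back to a
 * plain buffered write (clearing *uncached so later calls skip the probe)
 * if the file does not support the flag, e.g. a raw block device. */
static ssize_t write_maybe_uncached(int fd, struct iovec *iov, int cnt,
				    off_t off, int *uncached)
{
	if (*uncached) {
		ssize_t ret = pwritev2(fd, iov, cnt, off, RWF_UNCACHED);

		if (ret >= 0 || errno != EOPNOTSUPP)
			return ret;
		*uncached = 0;
	}
	return pwritev2(fd, iov, cnt, off, 0);
}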

-- 
Jens Axboe


Thread overview: 43+ messages
2024-12-03 15:31 Jens Axboe
2024-12-03 15:31 ` [PATCH 01/12] mm/filemap: change filemap_create_folio() to take a struct kiocb Jens Axboe
2024-12-10 11:13   ` Christoph Hellwig
2024-12-12 15:49     ` Jens Axboe
2024-12-03 15:31 ` [PATCH 02/12] mm/readahead: add folio allocation helper Jens Axboe
2024-12-03 15:31 ` [PATCH 03/12] mm: add PG_uncached page flag Jens Axboe
2024-12-03 15:31 ` [PATCH 04/12] mm/readahead: add readahead_control->uncached member Jens Axboe
2024-12-03 15:31 ` [PATCH 05/12] mm/filemap: use page_cache_sync_ra() to kick off read-ahead Jens Axboe
2024-12-10 11:15   ` Christoph Hellwig
2024-12-03 15:31 ` [PATCH 06/12] mm/truncate: add folio_unmap_invalidate() helper Jens Axboe
2024-12-10 11:21   ` Christoph Hellwig
2024-12-12 20:19     ` Jens Axboe
2024-12-03 15:31 ` [PATCH 07/12] fs: add RWF_UNCACHED iocb and FOP_UNCACHED file_operations flag Jens Axboe
2024-12-06 17:35   ` Darrick J. Wong
2024-12-10 11:22   ` Christoph Hellwig
2024-12-12 19:42     ` Jens Axboe
2024-12-03 15:31 ` [PATCH 08/12] mm/filemap: add read support for RWF_UNCACHED Jens Axboe
2024-12-03 15:31 ` [PATCH 09/12] mm/filemap: drop uncached pages when writeback completes Jens Axboe
2024-12-03 15:31 ` [PATCH 10/12] mm/filemap: add filemap_fdatawrite_range_kick() helper Jens Axboe
2024-12-03 15:31 ` [PATCH 11/12] mm/filemap: make buffered writes work with RWF_UNCACHED Jens Axboe
2024-12-06 17:17   ` Darrick J. Wong
2024-12-06 18:22     ` Jens Axboe
2024-12-10 11:31       ` Christoph Hellwig
2024-12-12 15:51         ` Jens Axboe
2024-12-03 15:31 ` [PATCH 12/12] mm: add FGP_UNCACHED folio creation flag Jens Axboe
2024-12-03 18:23 ` [PATCHSET v6 0/12] Uncached buffered IO Christoph Lameter (Ampere)
2024-12-03 21:06   ` Jens Axboe
2024-12-03 22:16     ` Christoph Lameter (Ampere)
2024-12-03 22:41       ` Jens Axboe
2024-12-04  5:52         ` Darrick J. Wong
2024-12-04 16:36           ` Jens Axboe
2024-12-10 11:11           ` Christoph Hellwig
2024-12-12 15:48             ` Jens Axboe
2024-12-12 16:59               ` Christoph Lameter (Ampere)
2024-12-12 19:14                 ` Jens Axboe
2024-12-12 19:35                   ` Matthew Wilcox
2024-12-12 19:36                     ` Jens Axboe
2024-12-12 20:06                     ` Christoph Lameter (Ampere)
2024-12-13  5:04                     ` Johannes Weiner
2024-12-13 14:49                       ` Jens Axboe
2024-12-06 17:37 ` Darrick J. Wong
2024-12-10  9:48 ` Bharata B Rao
2024-12-12 15:46   ` Jens Axboe [this message]
