From: Mateusz Guzik <mjguzik@gmail.com>
To: Bharata B Rao <bharata@amd.com>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
nikunj@amd.com, willy@infradead.org, vbabka@suse.cz,
david@redhat.com, akpm@linux-foundation.org, yuzhao@google.com,
axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org,
jack@suse.cz, joshdon@google.com, clm@meta.com
Subject: Re: [RFC PATCH 0/1] Large folios in block buffered IO path
Date: Thu, 28 Nov 2024 05:31:38 +0100
Message-ID: <CAGudoHHo4sLNpoVw-WTGVCc-gL0xguYWfUWfV1CSsQo6-bGnFg@mail.gmail.com>
In-Reply-To: <CAGudoHHBu663RSjQUwi14_d+Ln6mw_ESvYCc6dTec-O0Wi1-Eg@mail.gmail.com>
On Thu, Nov 28, 2024 at 5:22 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Thu, Nov 28, 2024 at 5:02 AM Bharata B Rao <bharata@amd.com> wrote:
> >
> > The contention with inode_lock is gone after your above changes. The new
> > top 10 contention data looks like this now:
> >
> >  contended   total wait     max wait     avg wait         type   caller
> >
> > 2441494015     172.15 h      1.72 ms    253.83 us     spinlock   folio_wait_bit_common+0xd5
> >                         0xffffffffadbf60a3  native_queued_spin_lock_slowpath+0x1f3
> >                         0xffffffffadbf5d01  _raw_spin_lock_irq+0x51
> >                         0xffffffffacdd1905  folio_wait_bit_common+0xd5
> >                         0xffffffffacdd2d0a  filemap_get_pages+0x68a
> >                         0xffffffffacdd2e73  filemap_read+0x103
> >                         0xffffffffad1d67ba  blkdev_read_iter+0x6a
> >                         0xffffffffacf06937  vfs_read+0x297
> >                         0xffffffffacf07653  ksys_read+0x73
> >   25269947       1.58 h      1.72 ms    225.44 us     spinlock   folio_wake_bit+0x62
> >                         0xffffffffadbf60a3  native_queued_spin_lock_slowpath+0x1f3
> >                         0xffffffffadbf537c  _raw_spin_lock_irqsave+0x5c
> >                         0xffffffffacdcf322  folio_wake_bit+0x62
> >                         0xffffffffacdd2ca7  filemap_get_pages+0x627
> >                         0xffffffffacdd2e73  filemap_read+0x103
> >                         0xffffffffad1d67ba  blkdev_read_iter+0x6a
> >                         0xffffffffacf06937  vfs_read+0x297
> >                         0xffffffffacf07653  ksys_read+0x73
> >   44757761       1.05 h      1.55 ms     84.41 us     spinlock   folio_wake_bit+0x62
> >                         0xffffffffadbf60a3  native_queued_spin_lock_slowpath+0x1f3
> >                         0xffffffffadbf537c  _raw_spin_lock_irqsave+0x5c
> >                         0xffffffffacdcf322  folio_wake_bit+0x62
> >                         0xffffffffacdcf7bc  folio_end_read+0x2c
> >                         0xffffffffacf6d4cf  mpage_read_end_io+0x6f
> >                         0xffffffffad1d8abb  bio_endio+0x12b
> >                         0xffffffffad1f07bd  blk_mq_end_request_batch+0x12d
> >                         0xffffffffc05e4e9b  nvme_pci_complete_batch+0xbb
> [snip]
> > However a point of concern is that FIO bandwidth comes down drastically
> > after the change.
> >
>
> Nicely put :)
>
> >               default                      inode_lock-fix
> > rw=30%
> > Instance 1    r=55.7GiB/s,w=23.9GiB/s      r=9616MiB/s,w=4121MiB/s
> > Instance 2    r=38.5GiB/s,w=16.5GiB/s      r=8482MiB/s,w=3635MiB/s
> > Instance 3    r=37.5GiB/s,w=16.1GiB/s      r=8609MiB/s,w=3690MiB/s
> > Instance 4    r=37.4GiB/s,w=16.0GiB/s      r=8486MiB/s,w=3637MiB/s
> >
>
> This means that the folio waiting stuff has poor scalability, but
> without digging into it I have no idea what can be done. The easy way
> out would be to speculatively spin before buggering off, but one would
> have to check what happens in real workloads -- presumably the lock
> owner can be off cpu for a long time (I presume there is no way to
> store the owner).
>
> The now-removed lock uses rwsems which behave better when contested
> and was pulling contention away from folios, artificially *helping*
> performance by having the folio bottleneck be exercised less.
>
> The right thing to do in the long run is still to whack the llseek
> lock acquire, but in the light of the above it can probably wait for
> better times.
Willy mentioned the folio wait queue hash table could be grown; you
can find it in mm/filemap.c:
#define PAGE_WAIT_TABLE_BITS 8
#define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)
static wait_queue_head_t folio_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;

static wait_queue_head_t *folio_waitqueue(struct folio *folio)
{
	return &folio_wait_table[hash_ptr(folio, PAGE_WAIT_TABLE_BITS)];
}
Can you collect off-CPU time? offcputime-bpfcc -K > /tmp/out
On Debian this ships with the bpfcc-tools package.
--
Mateusz Guzik <mjguzik gmail.com>