linux-mm.kvack.org archive mirror
* [PATCH] block: Skip the folio lock if the folio is already dirty
@ 2025-01-24 22:48 Matthew Wilcox (Oracle)
  2025-01-24 22:49 ` Matthew Wilcox
  2025-01-27  7:52 ` Hannes Reinecke
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-01-24 22:48 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-block, linux-fsdevel, Andres Freund

Postgres sees significant contention on the hashed folio waitqueue lock
when performing direct I/O to 1GB hugetlb pages.  This is because we
mark the destination pages as dirty, and the locks end up 512x more
contended with 1GB pages than with 2MB pages.

We can skip the locking if the folio is already marked as dirty.
The writeback path clears the dirty flag before commencing writeback,
so if we see the dirty flag set, the data written to the folio will be
written back.

In one test, throughput increased from 18GB/s to 20GB/s and moved the
bottleneck elsewhere.

Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 block/bio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index f0c416e5931d..e8d18a0fecb5 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1404,6 +1404,8 @@ void bio_set_pages_dirty(struct bio *bio)
 	struct folio_iter fi;
 
 	bio_for_each_folio_all(fi, bio) {
+		if (folio_test_dirty(fi.folio))
+			continue;
 		folio_lock(fi.folio);
 		folio_mark_dirty(fi.folio);
 		folio_unlock(fi.folio);
-- 
2.45.2



* [PATCH] block: Skip the folio lock if the folio is already dirty
@ 2025-01-24 22:51 Matthew Wilcox (Oracle)
  2025-01-28  5:58 ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-01-24 22:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-block, linux-fsdevel, Andres Freund

Postgres sees significant contention on the hashed folio waitqueue lock
when performing direct I/O to 1GB hugetlb pages.  This is because we
mark the destination pages as dirty, and the locks end up 512x more
contended with 1GB pages than with 2MB pages.

We can skip the locking if the folio is already marked as dirty.
The writeback path clears the dirty flag before commencing writeback,
so if we see the dirty flag set, the data written to the folio will be
written back.

In one test, throughput increased from 18GB/s to 20GB/s and moved the
bottleneck elsewhere.

Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 block/bio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index f0c416e5931d..58d30b1dc08e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1404,6 +1404,8 @@ void bio_set_pages_dirty(struct bio *bio)
 	struct folio_iter fi;
 
 	bio_for_each_folio_all(fi, bio) {
+		if (folio_test_dirty(fi.folio))
+			continue;
 		folio_lock(fi.folio);
 		folio_mark_dirty(fi.folio);
 		folio_unlock(fi.folio);
-- 
2.45.2




end of thread, other threads:[~2025-01-28  5:58 UTC | newest]

Thread overview: 5+ messages
2025-01-24 22:48 [PATCH] block: Skip the folio lock if the folio is already dirty Matthew Wilcox (Oracle)
2025-01-24 22:49 ` Matthew Wilcox
2025-01-27  7:52 ` Hannes Reinecke
2025-01-24 22:51 Matthew Wilcox (Oracle)
2025-01-28  5:58 ` Christoph Hellwig
