From: Matthew Wilcox <willy@infradead.org>
To: linux-mm@kvack.org
Cc: linux-block@vger.kernel.org, Muchun Song <muchun.song@linux.dev>,
	Jane Chu <jane.chu@oracle.com>
Subject: Direct I/O performance problems with 1GB pages
Date: Sun, 26 Jan 2025 00:46:45 +0000	[thread overview]
Message-ID: <Z5WF9cA-RZKZ5lDN@casper.infradead.org> (raw)

The Postgres developers are experimenting with doing direct I/O to 1GB
hugetlb pages.  Andres has gathered performance data showing significantly
worse performance with 1GB pages than with 2MB pages.  I recently sent a
patch which improves matters [1], but problems remain.

The primary problem we've identified is contention on folio->_refcount,
with strong secondary contention on folio->_pincount.  It comes from
this call chain:

iov_iter_extract_pages ->
  gup_fast_fallback ->
    try_grab_folio_fast
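
To make the contention concrete, here is a simplified sketch of what
that path boils down to (not the actual mm/gup.c code, which also has
to handle speculative references and failure): every CPU doing GUP-fast
against the same 1GB folio does atomic RMW operations on the same two
fields, i.e. the same shared cachelines.

#include <linux/mm.h>

/* Simplified sketch, not the real try_grab_folio_fast(): */
static void grab_folio_sketch(struct folio *folio, int refs)
{
	/* every CPU does an atomic RMW on the same cacheline */
	folio_ref_add(folio, refs);

	/* and, for FOLL_PIN on a large folio, on a second one */
	atomic_add(refs, &folio->_pincount);
}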

Obviously we can fix this by sharding the counts.  We could do that by
address, since there's no observed performance problem with 2MB pages.
But I think we'd do better to shard by CPU.  We have percpu-refcount.h
already, and I think it'll work.
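
As a rough sketch of that direction (the pcpu_ref field and the
folio_pcpu_release()/folio_pcpu_init() helpers below are hypothetical,
just to show how the existing percpu_ref API would slot in):

#include <linux/percpu-refcount.h>
#include <linux/hugetlb.h>

/* hypothetical: a percpu_ref embedded in (or hanging off) the folio */
static void folio_pcpu_release(struct percpu_ref *ref)
{
	struct folio *folio = container_of(ref, struct folio, pcpu_ref);

	/* only reachable after percpu_ref_kill(); the count hit zero */
	free_huge_folio(folio);
}

static int folio_pcpu_init(struct folio *folio)
{
	/* starts in percpu mode: gets/puts touch only this CPU's counter */
	return percpu_ref_init(&folio->pcpu_ref, folio_pcpu_release,
			       0, GFP_KERNEL);
}

try_grab_folio_fast() would then call
percpu_ref_get_many(&folio->pcpu_ref, refs) instead of bumping the
shared atomics, and unpinning would do the matching
percpu_ref_put_many().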

The key to percpu refcounts is knowing at what point you need to start
caring about whether the refcount has hit zero (we don't care if the
refcount oscillates between 1 and 2, but we very much care when it
hits 0).

I think the point at which we call percpu_ref_kill() is when we remove a
folio from the page cache.  Before that point, the refcount is guaranteed
to always be positive.  After that point, once the refcount hits zero,
we must free the folio.
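
In code, again using the hypothetical pcpu_ref field from the sketch
above, the removal path would do something like:

/* hypothetical hook on the page-cache removal path */
static void folio_removed_from_cache(struct folio *folio)
{
	/*
	 * percpu_ref_kill() drains the per-CPU counters into a single
	 * atomic counter and makes every subsequent put check for zero;
	 * once it hits zero, folio_pcpu_release() frees the folio.
	 */
	percpu_ref_kill(&folio->pcpu_ref);
}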

It's pretty rare to remove a hugetlb page from the page cache while it's
still mapped.  So we don't need to worry about scalability at that point.

Any volunteers to prototype this?  Andres is a delight to work with,
but I just don't have time to take on this project right now.

[1] https://lore.kernel.org/linux-block/20250124225104.326613-1-willy@infradead.org/


Thread overview: 18+ messages
2025-01-26  0:46 Matthew Wilcox [this message]
2025-01-27 14:09 ` David Hildenbrand
2025-01-27 16:02   ` Matthew Wilcox
2025-01-27 16:09     ` David Hildenbrand
2025-01-27 16:20       ` David Hildenbrand
2025-01-27 16:56         ` Matthew Wilcox
2025-01-27 16:59           ` David Hildenbrand
2025-01-27 18:21       ` Andres Freund
2025-01-27 18:54         ` Jens Axboe
2025-01-27 19:07           ` David Hildenbrand
2025-01-27 21:32           ` Pavel Begunkov
2025-01-27 16:24     ` Keith Busch
2025-01-27 17:25   ` Andres Freund
2025-01-27 19:20     ` David Hildenbrand
2025-01-27 19:36       ` Andres Freund
2025-01-28  5:56 ` Christoph Hellwig
2025-01-28  9:47   ` David Hildenbrand
2025-01-29  6:03     ` Christoph Hellwig
