From: Gregory Price <gourry@gourry.net>
To: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
nehagholkar@meta.com, abhishekd@meta.com, kernel-team@meta.com,
david@redhat.com, nphamcs@gmail.com, akpm@linux-foundation.org,
hannes@cmpxchg.org, kbusch@meta.com
Subject: Re: [RFC v2 PATCH 0/5] Promotion of Unmapped Page Cache Folios.
Date: Tue, 31 Dec 2024 02:32:55 -0500
Message-ID: <Z3OeJ9KGLQOt1KOI@gourry-fedora-PF4VCD3F>
In-Reply-To: <Z29yxfeZMowr27ZZ@gourry-fedora-PF4VCD3F>
On Fri, Dec 27, 2024 at 10:38:45PM -0500, Gregory Price wrote:
> On Fri, Dec 27, 2024 at 02:09:50PM -0500, Gregory Price wrote:
>
> This seems to imply that the overhead we're seeing from read() even
> when filecache is on the remote node isn't actually related to the
> memory speed, but instead likely related to some kind of stale
> metadata in the filesystem or filecache layers.
>
> ~Gregory
Mystery solved
> +void promotion_candidate(struct folio *folio)
> +{
... snip ...
> + list_add(&folio->lru, promo_list);
> +}
read(file, length) does a linear read, and promotion_candidate() adds
those folios to the head of the promotion list, resulting in a reversed
promotion order: if you read folios [1,2,3,4], they are promoted in
[4,3,2,1] order.
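To make the ordering effect concrete, here is a minimal userspace sketch that re-creates only the list_add()/list_add_tail() semantics of the kernel's include/linux/list.h (it is not the kernel header itself). `struct fake_folio`, `list_insert_between()` and `promotion_order()` are hypothetical names: the last one simulates a linear read that queues folios 1..4 as promotion candidates, then walks the list from head to tail, which is the order they would be handed to migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace re-creation of the kernel's circular doubly-linked list. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Link @entry between two known consecutive entries. */
static void list_insert_between(struct list_head *entry,
				struct list_head *prev,
				struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert right after @head: the list behaves like a stack (LIFO). */
static void list_add(struct list_head *entry, struct list_head *head)
{
	list_insert_between(entry, head, head->next);
}

/* Insert right before @head, i.e. at the tail: FIFO order. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	list_insert_between(entry, head->prev, head);
}

/* Hypothetical stand-in for a folio: an index plus its lru linkage. */
struct fake_folio {
	int index;
	struct list_head lru;
};

/*
 * Hypothetical: queue folios 1..4 in read order, using list_add_tail()
 * if @use_tail is true and list_add() otherwise, then record the order
 * seen when walking the promotion list from head to tail.
 * Returns the number of entries written to @out (at most 4).
 */
static int promotion_order(bool use_tail, int out[4])
{
	struct fake_folio folios[4];
	struct list_head promo_list;
	int n = 0;

	INIT_LIST_HEAD(&promo_list);
	for (int i = 0; i < 4; i++) {
		folios[i].index = i + 1;	/* linear read: 1, 2, 3, 4 */
		if (use_tail)
			list_add_tail(&folios[i].lru, &promo_list);
		else
			list_add(&folios[i].lru, &promo_list);
	}

	for (struct list_head *p = promo_list.next; p != &promo_list;
	     p = p->next) {
		assert(n < 4);
		out[n++] = container_of(p, struct fake_folio, lru)->index;
	}
	return n;
}
```

With `use_tail = false` (list_add) the walk yields 4 3 2 1; with `use_tail = true` (list_add_tail) it yields 1 2 3 4, so the read order is preserved.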
The result, on an unloaded system, is that pages end up in essentially
the worst possible layout for the prefetcher, and therefore for TLB hits.
I figured this out because I was seeing the additional ~30% overhead
show up purely in `copy_page_to_iter()` (i.e. copy_to_user).
Swapping list_add() for list_add_tail() gives the following test results:
initializing
Read loop took 9.41 seconds <- reading from CXL
Read loop took 31.74 seconds <- migration enabled
Read loop took 10.31 seconds
Read loop took 7.71 seconds <- migration finished
Read loop took 7.71 seconds
Read loop took 7.70 seconds
Read loop took 7.75 seconds
Read loop took 19.34 seconds <- dropped caches
Read loop took 13.68 seconds <- cache refilling to DRAM
Read loop took 7.37 seconds
Read loop took 7.68 seconds
Read loop took 7.65 seconds <- back to DRAM baseline
On our CXL devices, we're seeing a 22-27% performance penalty for a file
being hosted entirely out of CXL. When we promote this file out of CXL,
we see a 22-27% performance boost.
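As a rough check, using only the single run above (9.41 s while the file was served from CXL, 7.71 s once it was resident in DRAM); the 27% upper bound cannot be derived from this log:

```latex
\frac{t_{\mathrm{CXL}}}{t_{\mathrm{DRAM}}} = \frac{9.41\,\mathrm{s}}{7.71\,\mathrm{s}} \approx 1.22,
\qquad
1 - \frac{t_{\mathrm{DRAM}}}{t_{\mathrm{CXL}}} = 1 - \frac{7.71}{9.41} \approx 0.18
```

So the ~22% figure is the CXL run time relative to the DRAM baseline (equivalently, a ~22% throughput gain after promotion), while the wall-clock time of the read loop itself drops by about 18%.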
list_add_tail() is probably right here, but since files only *tend to*
be read linearly with `read()`, this will only *tend toward* optimal.
That said, we can probably make this more reliable by adding a batch
migration function, `mpol_migrate_misplaced_batch()`, which also tries
to do bulk allocation of destination folios. This will probably also
save us a bunch of invalidation overhead.
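The batching idea can be modeled in userspace as below. This is only an analogy under stated assumptions: `mpol_migrate_misplaced_batch()` is merely proposed in this mail and is not modeled directly; `struct promo_batch`, `batch_add()`, `batch_flush()`, `PAGE_SZ` and `BATCH_MAX` are hypothetical names; malloc() stands in for a bulk destination-folio allocation and memcpy() for the per-folio copy. It shows the two properties the proposal is after: candidates stay in read order, and all destinations come from one allocation rather than one per folio.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ   4096	/* assumed page size for this model */
#define BATCH_MAX 16	/* hypothetical batch capacity */

/* Userspace model of a promotion batch: sources queued in read order. */
struct promo_batch {
	const unsigned char *src[BATCH_MAX];
	size_t nr;
};

/* Queue one source page; returns 0 on success, -1 if the batch is full. */
static int batch_add(struct promo_batch *b, const unsigned char *page)
{
	if (b->nr == BATCH_MAX)
		return -1;
	b->src[b->nr++] = page;	/* append at the tail to keep read order */
	return 0;
}

/*
 * "Migrate" the whole batch: one bulk allocation for all destinations,
 * then copy each source into consecutive destination slots and empty
 * the batch. Returns the destination region (caller frees), or NULL if
 * the batch was empty or the allocation failed.
 */
static unsigned char *batch_flush(struct promo_batch *b)
{
	unsigned char *dst;

	if (b->nr == 0)
		return NULL;
	dst = malloc(b->nr * PAGE_SZ);
	if (!dst)
		return NULL;
	for (size_t i = 0; i < b->nr; i++)
		memcpy(dst + i * PAGE_SZ, b->src[i], PAGE_SZ);
	b->nr = 0;
	return dst;
}
```

Queueing pages filled with 1, 2 and 3 and then flushing yields a region whose consecutive PAGE_SZ slots hold those pages in the same order.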
I'm also noticing that the migration limit (256mbps) is not being
respected, probably because we're migrating 1 folio at a time instead of
a batch. I will probably look at changing promotion_candidate() to limit
the number of pages selected for promotion per read() call.
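A per-read() cap could look roughly like the following. `select_for_promotion()` is a hypothetical helper, and also charging the selection against a remaining rate-limit budget in bytes is an assumption about how such a cap might interact with the migration limit, not what the series currently does. Folios beyond the returned count would presumably just not be queued on this call.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: of the @nr_touched folios a single read() call
 * touched (in file order), return how many to queue for promotion.
 * The result is capped both by @max_per_call and by how many folios of
 * @folio_bytes each still fit in the remaining @budget_bytes of the
 * (assumed) migration rate-limit window.
 */
static size_t select_for_promotion(size_t nr_touched, size_t max_per_call,
				   size_t budget_bytes, size_t folio_bytes)
{
	size_t fit = folio_bytes ? budget_bytes / folio_bytes : 0;
	size_t n = nr_touched;

	if (n > max_per_call)
		n = max_per_call;	/* per-call cap */
	if (n > fit)
		n = fit;		/* never exceed the remaining budget */
	return n;
}
```

For example, a read() touching 32 folios with a cap of 8 and plenty of budget queues 8; with only 8 KiB of budget left and 4 KiB folios it queues 2.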
---
diff --git a/mm/migrate.c b/mm/migrate.c
index f965814b7d40..99b584f22bcb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2675,7 +2675,7 @@ void promotion_candidate(struct folio *folio)
 		folio_putback_lru(folio);
 		return;
 	}
-	list_add(&folio->lru, promo_list);
+	list_add_tail(&folio->lru, promo_list);
 	return;
 }