From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: lsf-pc@lists.linux-foundation.org, Peter Xu <peterx@redhat.com>,
Muchun Song <muchun.song@linux.dev>,
linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] HugeTLB generic pagewalk
Date: Wed, 5 Feb 2025 10:33:17 +0100
Message-ID: <Z6MwXUuQhJHDxa6N@localhost.localdomain>
In-Reply-To: <74ecaa8b-9e94-4ba8-a2f0-a312607516ba@redhat.com>
On Tue, Feb 04, 2025 at 09:40:16PM +0100, David Hildenbrand wrote:
> Unfortunately not that easy. You can only have parts of a large folio
> mapped. It would be PAGE_SIZE * nr_ptes when batching.
I see, you are right.
> I think this is the wrong approach. We should replace that pagewalk API
> usage by something better that hides all the batching.
>
> The interface would look similar to folio_walk (no callbacks, handling of
> locking), but (a) work on ranges; (b) work also on non-folio entries; and
> (c) batch all suitable entries in the range.
>
> Something like a pt_range_walk_start() that returns a "type" (folio range,
> migration entries, swap range, ...) + stores other details (range, level,
> ptep, ...) in a structure like folio_walk, to then provide mechanisms to
> continue (pt_walk_continue()) to walk or abort (pt_walk_done()) it. Similar
> to page_vma_mapped_walk(), but not specific to a given page/folio.
>
> Then, we would simply process the output of that. With the hope that, for
> hugetlb it will just batch all cont-pte / cont-pmd entries into a single
> return value.
>
> That will make the R/O walking as in task_mmu.c easier, hopefully.
>
> Not so much with PTE/PMD modifications, like damon_mkold_ops ... :( But
> maybe that just has to be special-cased for hugetlb, somehow ...
Ok, let me see if we are on the same page.
You are basically saying that we should replace the existing pagewalk API with
something similar to what you described above.
I have to confess that when you first mentioned this back in July, when I
posted the RFC, I felt disheartened, because it implies an even bigger
surgery.
But having felt the mess that dealing with cont-{pmd,pud}s is, and the
inability of the existing API to handle them in a clean way (without having
to teach each and every function about them), maybe it is the
only way to do this 1) right and 2) cleanly.
I thought that maybe we could get away with doing the batching somehow
before calling into the callbacks, e.g. at the walk_{pud,pmd,pte}_range
level, but I am not sure 1) whether that is possible and 2) how ugly it
would look.
So, given that this is not an urgent matter that needs to be fixed asap,
maybe the way to go is to create an API that can deal with all that,
abstracting all these details.
I am willing to take a shot at this, if we are clear that it makes sense
to go down this road.
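To make the shape of such an interface concrete, here is a minimal
userspace sketch of a callback-free, batching range walk. It is a toy
model under loud assumptions, not kernel code: the "page table" is a flat
array, and only the names pt_range_walk_start(), pt_walk_continue() and
pt_walk_done() come from the proposal quoted above. The struct layout,
the enum, the signatures and the helpers pt_walk_fill() and
count_entries() are hypothetical. A real implementation would also have
to handle locking, page table levels and folio boundaries.

```c
/* Userspace toy model, NOT kernel code: the "page table" is a flat array. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry types that a range walk could report. */
enum pt_type { PT_NONE, PT_FOLIO, PT_MIGRATION, PT_SWAP };

/* Hypothetical walk state, loosely modelled on the idea of folio_walk. */
struct pt_range_walk {
	const enum pt_type *table;	/* toy stand-in for the page tables */
	size_t end;			/* end of the walked range (exclusive) */
	size_t start;			/* first entry of the current batch */
	size_t nr;			/* number of entries in the batch */
	enum pt_type type;		/* what the current batch contains */
	bool active;			/* a real walk would hold a lock here */
};

/* Hypothetical helper: batch all adjacent same-type entries from @pos. */
bool pt_walk_fill(struct pt_range_walk *w, size_t pos)
{
	if (pos >= w->end) {
		w->active = false;
		return false;
	}
	w->start = pos;
	w->type = w->table[pos];
	w->nr = 1;
	while (pos + w->nr < w->end && w->table[pos + w->nr] == w->type)
		w->nr++;
	return true;
}

/* Begin walking [start, end); returns false if the range is empty. */
bool pt_range_walk_start(struct pt_range_walk *w, const enum pt_type *table,
			 size_t start, size_t end)
{
	assert(start <= end);
	w->table = table;
	w->end = end;
	w->active = true;
	return pt_walk_fill(w, start);
}

/* Advance to the next batch; returns false once the range is exhausted. */
bool pt_walk_continue(struct pt_range_walk *w)
{
	return w->active && pt_walk_fill(w, w->start + w->nr);
}

/* Abort or finish the walk; a real version would drop the lock here. */
void pt_walk_done(struct pt_range_walk *w)
{
	w->active = false;
}

/* Example caller: count entries of @type in [start, end), no callbacks. */
size_t count_entries(const enum pt_type *table, size_t start, size_t end,
		     enum pt_type type)
{
	struct pt_range_walk w;
	size_t total = 0;
	bool more;

	for (more = pt_range_walk_start(&w, table, start, end); more;
	     more = pt_walk_continue(&w))
		if (w.type == type)
			total += w.nr;	/* one step covers the whole batch */
	pt_walk_done(&w);
	return total;
}
```

The caller only ever sees (type, start, nr) tuples, so a run of
cont-pte / cont-pmd entries would ideally show up as a single batch and
read-only users would not need to know how it is laid out.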
--
Oscar Salvador
SUSE Labs