From: David Hildenbrand <david@redhat.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>,
Oscar Salvador <osalvador@suse.de>,
lsf-pc@lists.linux-foundation.org
Cc: Peter Xu <peterx@redhat.com>, Muchun Song <muchun.song@linux.dev>,
linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] HugeTLB generic pagewalk
Date: Tue, 4 Feb 2025 21:19:16 +0100
Message-ID: <8bd74d6f-6086-41d2-97ec-98bf1b9cb07e@redhat.com>
In-Reply-To: <660f6ee7-f474-4f72-b442-5f048a2ff8bb@csgroup.eu>
>>
>> commit 0549e76663730235a10395a7af7ad3d3ce6e2402
>> Author: Christophe Leroy <christophe.leroy@csgroup.eu>
>> Date: Tue Jul 2 15:51:25 2024 +0200
>>
>> powerpc/8xx: rework support for 8M pages using contiguous PTE entries
>>
>> In order to fit better with standard Linux page tables layout, add
>> support for 8M pages using contiguous PTE entries in a standard page
>> table. Page tables will then be populated with 1024 similar entries
>> and two PMD entries will point to that page table.
>>
>> The PMD entries also get a flag to tell it is addressing an 8M page,
>> this is required for the HW tablewalk assistance.
>>
>> Where we are walking a PTE table, but actually there is another PTE
>> table we have to modify in the same go.
>>
>>
>> Very hard to make that non-hugetlb aware, as it's simply completely
>> different compared to ordinary page table walking/modifications today.
>>
>> Maybe there are ideas to tackle that, and I'd be very interested in them.
>>
>
>
> But at least that 8xx change allowed us to get rid of huge page
> directories (hugepd) which was even more painful IIUC.
Yes, don't get me wrong, it was a clear win to get rid of hugepd,
allowing for GUP and folio_walk to work in a non-hugetlb fashion: at
least, when all we want to do is look up which page is mapped at a given
address.
Unfortunately, that's not what all page table walkers do.
>
> Nevertheless, can't we turn that into a standard walk in one way or another?
>
> While we walk we reach a PMD entry which is marked as a CONT-PMD, but it
> is not tagged as a leaf entry, so there is a page table below. PMD_SIZE
> is 4M but the page size is 8M so once you've walked the page table
> entirely you know you still have 4M to go so you have to walk the second
> PMD and the page table it points to.
We would somehow have to fake that it is a PMD leaf, and realize that
they both are cont, so we can batch both PMDs. The PTE page table
handling is a bit of a pain, though.
... and modifying entries is a bit of a pain as well; unless we can
hide all that somehow in the powerpc pmd setters.
Hm, far from ideal, at least at this stage, because we don't really
support cont-pmd outside of hugetlb, and a lot of page table walkers
must be taught to deal with cont-pmd.
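
To make the lookup-vs-modify difference concrete, here is a toy userspace
model in C (not kernel code). It assumes 4K base pages, a 4M PMD_SIZE and
1024 entries per PTE table as described above, and it further assumes that
each of the two PMD entries points to its own PTE table. Every name in it
(toy_pmd_t, toy_lookup, toy_set_huge, the cont flag) is made up for
illustration and does not correspond to any real kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy userspace model; all names are hypothetical, not kernel API. */
#define TOY_PAGE_SHIFT   12                    /* 4K base pages (assumed) */
#define TOY_PMD_SHIFT    22                    /* one PMD entry covers 4M */
#define TOY_PTRS_PER_PTE 1024                  /* 4M / 4K */
#define TOY_PMD_SIZE     (1UL << TOY_PMD_SHIFT)
#define TOY_HPAGE_SIZE   (2 * TOY_PMD_SIZE)    /* 8M page spans two PMDs */

typedef struct { uint32_t val; } toy_pte_t;

typedef struct {
	toy_pte_t *table;   /* not a leaf: points to a PTE table */
	bool cont;          /* stand-in for the "addresses an 8M page" flag */
} toy_pmd_t;

/*
 * Lookup only: an ordinary two-level walk finds the right entry without
 * knowing anything about 8M pages, because every PTE slot is populated.
 */
static toy_pte_t toy_lookup(const toy_pmd_t *pmds, unsigned long addr)
{
	const toy_pmd_t *pmd = &pmds[addr >> TOY_PMD_SHIFT];

	assert(pmd->table != NULL);
	return pmd->table[(addr >> TOY_PAGE_SHIFT) & (TOY_PTRS_PER_PTE - 1)];
}

/*
 * Modification: every entry backing the 8M page has to change together,
 * so the walker must treat both cont PMDs (and both PTE tables) as one
 * batch. Returns the number of PTE slots written, or 0 if the two PMDs
 * are not both marked cont.
 */
static size_t toy_set_huge(toy_pmd_t *pmds, unsigned long addr, toy_pte_t pte)
{
	size_t first = (addr & ~(TOY_HPAGE_SIZE - 1)) >> TOY_PMD_SHIFT;
	size_t written = 0;

	if (!pmds[first].cont || !pmds[first + 1].cont)
		return 0;

	for (size_t p = first; p < first + 2; p++)
		for (size_t i = 0; i < TOY_PTRS_PER_PTE; i++, written++)
			pmds[p].table[i] = pte;

	return written;
}
```

In this model toy_lookup() needs no special casing, while toy_set_huge()
has to know about the pairing of the two PMDs; that pairing is the part
that would need to be hidden in the setters or taught to each walker.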
>
> By the way, don't know if it can help or make things worse, but indeed from
> a HW point of view there is no need to replicate 1024 times the PTE
> entry. Here we used a standard page table because it looked more generic
> from kernel point of view, but all the HW needs is a single PTE located
> at a page aligned address. That's what we had when we used huge page
> directories (hugepd). It was even easier because both PMD entries were
> pointing to the same hugepd entry hence no need of CONT-PTE-like
> management at PTE level.
Ah, I see. I'll have to think about that a bit ... far from trivial.
--
Cheers,
David / dhildenb