linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>,
	Oscar Salvador <osalvador@suse.de>,
	lsf-pc@lists.linux-foundation.org
Cc: Peter Xu <peterx@redhat.com>, Muchun Song <muchun.song@linux.dev>,
	linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] HugeTLB generic pagewalk
Date: Tue, 4 Feb 2025 21:19:16 +0100
Message-ID: <8bd74d6f-6086-41d2-97ec-98bf1b9cb07e@redhat.com>
In-Reply-To: <660f6ee7-f474-4f72-b442-5f048a2ff8bb@csgroup.eu>

>>
>> commit 0549e76663730235a10395a7af7ad3d3ce6e2402
>> Author: Christophe Leroy <christophe.leroy@csgroup.eu>
>> Date:   Tue Jul 2 15:51:25 2024 +0200
>>
>>       powerpc/8xx: rework support for 8M pages using contiguous PTE entries
>>
>>       In order to fit better with standard Linux page tables layout, add
>>       support for 8M pages using contiguous PTE entries in a standard page
>>       table. Page tables will then be populated with 1024 similar entries
>>       and two PMD entries will point to that page table.
>>
>>       The PMD entries also get a flag to tell it is addressing an 8M page,
>>       this is required for the HW tablewalk assistance.
>>
>> Here we are walking a PTE table, but there is actually another PTE
>> table we have to modify in the same go.
>>
>> That is very hard to handle without being hugetlb-aware, as it's simply
>> completely different from ordinary page table walking/modification today.
>>
>> Maybe there are ideas to tackle that, and I'd be very interested in them.
>>
> 
> 
> But at least that 8xx change allowed us to get rid of huge page
> directories (hugepd), which were even more painful IIUC.

Yes, don't get me wrong, it was a clear win to get rid of hugepd,
allowing GUP and folio_walk to work in a non-hugetlb fashion: at
least when all we want to do is look up which page is mapped at a
given address.

Unfortunately, that's not what all page table walkers do.

> 
> Nevertheless, can't we turn that into a standard walk one way or another?
> 
> While walking, we reach a PMD entry which is marked as CONT-PMD but is
> not tagged as a leaf entry, so there is a page table below it. PMD_SIZE
> is 4M but the page size is 8M, so once you've walked the page table
> entirely you know you still have 4M to go, and you have to walk the
> second PMD and the page table it points to.

We would somehow have to fake that it is a PMD leaf, and realize that 
they both are cont, so we can batch both PMDs. The PTE page table 
handling is a bit of a pain, though.
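A toy model may make the two options easier to compare. This is a sketch under stated assumptions, not the powerpc 8xx code: `struct toy_pmd`, its `is_8m` marker and both walker functions are hypothetical, and only the sizes (4K base pages, 4M per PMD, 8M huge page) come from the discussion. `walk_per_pte()` descends into the table under every PMD, i.e. it finishes one table and then continues with the second PMD; `walk_batched()` pretends that an 8M-aligned pair of marked PMDs is a single leaf and reports it once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes from the thread: 4K base pages, 4M per PMD, 8M huge pages. */
#define TOY_PAGE_SIZE    (4UL << 10)
#define TOY_PMD_SIZE     (4UL << 20)
#define TOY_HUGE_SIZE    (8UL << 20)
#define TOY_PTRS_PER_PTE (TOY_PMD_SIZE / TOY_PAGE_SIZE)
#define TOY_PMDS_PER_8M  (TOY_HUGE_SIZE / TOY_PMD_SIZE)

_Static_assert(TOY_PTRS_PER_PTE == 1024, "1024 PTE slots per table");
_Static_assert(TOY_PMDS_PER_8M == 2, "an 8M page spans two PMDs");

/* Hypothetical PMD: never a leaf, but may carry an "8M page" marker. */
struct toy_pmd {
	bool present;
	bool is_8m;
};

struct toy_walk_stats {
	unsigned long pte_slots;   /* PTE slots visited one by one */
	unsigned long leaves_8m;   /* 8M mappings reported in one go */
};

/* Option 1: ordinary walk. Every present PMD is a table, so an 8M page
 * costs a full table's worth of PTE visits per 4M half. */
static void walk_per_pte(const struct toy_pmd *pmds, size_t nr,
			 struct toy_walk_stats *st)
{
	for (size_t i = 0; i < nr; i++)
		if (pmds[i].present)
			st->pte_slots += TOY_PTRS_PER_PTE;
}

/* Option 2: fake a leaf for an aligned pair of marked PMDs and batch
 * both of them; all other PMDs are walked as usual. */
static void walk_batched(const struct toy_pmd *pmds, size_t nr,
			 struct toy_walk_stats *st)
{
	for (size_t i = 0; i < nr; i++) {
		if (!pmds[i].present)
			continue;
		if (pmds[i].is_8m) {
			/* The first half must be 8M aligned and have its partner. */
			assert(i % TOY_PMDS_PER_8M == 0 && i + 1 < nr &&
			       pmds[i + 1].is_8m);
			st->leaves_8m++;
			i += TOY_PMDS_PER_8M - 1;  /* skip the second 4M half */
			continue;
		}
		st->pte_slots += TOY_PTRS_PER_PTE;
	}
}
```

With two 8M-marked PMDs followed by one ordinary PMD, the per-PTE walk visits 3072 slots while the batched walk reports one 8M leaf plus 1024 slots. The cost moves elsewhere: every walker that expects either a PMD leaf or a PTE table now has to accept a leaf that covers two PMDs.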

... and modifying entries is a bit of a pain as well, unless we can
somehow hide all that in the powerpc pmd setters.
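One possible reading of "hide all that in the powerpc pmd setters" is that generic code issues a single PMD-level update and the architecture fans it out to both 4M halves. The sketch below only illustrates that reading: `toy_pmd_t`, `TOY_PMD_8M` and `toy_set_pmd_8m()` are made-up names rather than existing kernel or powerpc interfaces. Following the quoted commit message, both PMD entries receive the same value, and the PTE table contents are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PMD encoding: the low bit marks an 8M page. */
typedef uint32_t toy_pmd_t;
#define TOY_PMD_8M       0x1u
#define TOY_PMDS_PER_8M  2u   /* 8M page / 4M per PMD */

/* Generic code calls this once per 8M mapping; the "arch" side writes
 * both PMD halves so callers never handle the second entry themselves.
 * Returns the number of PMD entries written. */
static size_t toy_set_pmd_8m(toy_pmd_t *pmd_table, size_t idx, toy_pmd_t val)
{
	/* The first half of an 8M page must sit on an 8M boundary. */
	assert(idx % TOY_PMDS_PER_8M == 0);

	for (size_t i = 0; i < TOY_PMDS_PER_8M; i++)
		pmd_table[idx + i] = val | TOY_PMD_8M;
	return TOY_PMDS_PER_8M;
}
```

This keeps the writes in one place, but reads remain the harder half: a walker that lands on the second PMD by itself still has to recognise it as part of an 8M page.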

Hm, that's far from ideal, at least at this stage, because we don't
really support cont-pmd outside of hugetlb, and a lot of page table
walkers would have to be taught to deal with cont-pmd.

> 
> By the way, I don't know whether it helps or makes things worse, but from
> a HW point of view there is no need to replicate the PTE entry 1024
> times. Here we used a standard page table because it looked more generic
> from the kernel point of view, but all the HW needs is a single PTE
> located at a page-aligned address. That's what we had when we used huge
> page directories (hugepd). It was even easier because both PMD entries
> pointed to the same hugepd entry, hence no need for CONT-PTE-like
> management at PTE level.

Ah, I see. I'll have to think about that a bit ... far from trivial.
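To put a number on the replication Christophe mentions, here is a toy comparison of the two layouts for one 8M mapping. It follows the quoted commit message literally for the cont-PTE case (one table of 1024 similar entries, stored as identical values here for simplicity, referenced by both PMDs) and Christophe's description for the hugepd-style case (both PMDs point at a single entry). All names are hypothetical, and everything else a real update involves is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT    12u  /* 4K base pages */
#define TOY_PMD_SHIFT     22u  /* 4M covered per PMD */
#define TOY_PTRS_PER_PTE  (1u << (TOY_PMD_SHIFT - TOY_PAGE_SHIFT))  /* 1024 */

typedef uint32_t toy_pte_t;

/* Hypothetical 8M mapping: two PMD slots pointing at PTE storage. */
struct toy_8m_map {
	toy_pte_t *pmd[2];
	size_t stores;   /* PTE stores needed to install the mapping */
	int single;      /* 1 = hugepd-style single entry */
};

/* Cont-PTE style: fill the whole table, both PMDs point at it. */
static void toy_install_cont(struct toy_8m_map *m, toy_pte_t *table,
			     toy_pte_t pte)
{
	for (size_t i = 0; i < TOY_PTRS_PER_PTE; i++)
		table[i] = pte;
	m->pmd[0] = m->pmd[1] = table;
	m->stores = TOY_PTRS_PER_PTE;
	m->single = 0;
}

/* hugepd style: one entry, both PMDs point at it. */
static void toy_install_single(struct toy_8m_map *m, toy_pte_t *entry,
			       toy_pte_t pte)
{
	*entry = pte;
	m->pmd[0] = m->pmd[1] = entry;
	m->stores = 1;
	m->single = 1;
}

/* Resolve an offset inside the 8M range: bit 22 selects the PMD; the
 * cont layout then indexes by the 4K page, the single layout does not. */
static toy_pte_t toy_lookup(const struct toy_8m_map *m, uint32_t off)
{
	const toy_pte_t *tbl = m->pmd[(off >> TOY_PMD_SHIFT) & 1u];

	if (m->single)
		return tbl[0];
	return tbl[(off >> TOY_PAGE_SHIFT) & (TOY_PTRS_PER_PTE - 1u)];
}
```

Both layouts resolve every offset to the same PTE value, but the cont-PTE one needs 1024 stores (and 1024 entries to keep consistent on every modification) where the single-entry one needs one.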

-- 
Cheers,

David / dhildenb




Thread overview: 11+ messages
2025-01-30 21:36 Oscar Salvador
2025-01-30 22:45 ` Peter Xu
2025-01-30 22:46 ` Matthew Wilcox
2025-01-30 23:19 ` David Hildenbrand
2025-01-31 15:42   ` Christophe Leroy
2025-02-04 20:19     ` David Hildenbrand [this message]
2025-02-03 10:10   ` Oscar Salvador
2025-02-04 20:40     ` David Hildenbrand
2025-02-05  9:33       ` Oscar Salvador
2025-02-11 13:31         ` David Hildenbrand
2025-02-12  9:13           ` Oscar Salvador
