linux-mm.kvack.org archive mirror
From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: "peterx@redhat.com" <peterx@redhat.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	James Houghton <jthoughton@google.com>,
	David Hildenbrand <david@redhat.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Yang Shi <shy828301@gmail.com>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Aneesh Kumar K . V" <aneesh.kumar@kernel.org>,
	Rik van Riel <riel@surriel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mike Rapoport <rppt@kernel.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Andrew Jones <andrew.jones@linux.dev>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <muchun.song@linux.dev>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Christoph Hellwig <hch@infradead.org>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v2 06/13] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing
Date: Tue, 16 Jan 2024 18:32:32 +0000	[thread overview]
Message-ID: <44e450cb-5d3f-407e-97a3-024eb936f74b@csgroup.eu> (raw)
In-Reply-To: <20240116123138.GZ734935@nvidia.com>


On 16/01/2024 at 13:31, Jason Gunthorpe wrote:
> On Tue, Jan 16, 2024 at 06:30:39AM +0000, Christophe Leroy wrote:
>>
>>
>> On 15/01/2024 at 19:37, Jason Gunthorpe wrote:
>>> On Wed, Jan 03, 2024 at 05:14:16PM +0800, peterx@redhat.com wrote:
>>>> From: Peter Xu <peterx@redhat.com>
>>>>
>>>> Hugepd format for GUP is only used in PowerPC with hugetlbfs.  There are
>>>> some kernel users of hugepd (see hugepd_populate_kernel() for
>>>> PPC_8XX), but those pages are not candidates for GUP.
>>>>
>>>> Commit a6e79df92e4a ("mm/gup: disallow FOLL_LONGTERM GUP-fast writing to
>>>> file-backed mappings") added a check to fail gup-fast if there's potential
>>>> risk of violating GUP over writeback file systems.  That should never apply
>>>> to hugepd.  Considering that hugepd is an old format (and even
>>>> software-only), there's no plan to extend hugepd to other file-backed
>>>> memory types that are prone to the same issue.
>>>
>>> I didn't dig into the ppc stuff too deeply, but this looks to me like
>>> it is the same thing as ARM's contig bits?
>>>
>>> i.e. a chunk of PMD/etc entries are all managed together as though they
>>> are a virtual larger entry and we use the hugepte_addr_end() stuff to
>>> iterate over each sub entry.
>>
>> As far as I understand ARM's contig stuff, hugepd on powerpc is
>> something different.
>>
>> hugepd is a page directory dedicated to huge pages, where you have huge
>> pages listed instead of regular pages. For instance, on powerpc 32 with
>> each PGD entry covering 4 Mbytes, a regular page table has 1024 PTEs. A
>> hugepd for 512k is a page table with 8 entries.
>>
>> And for 8 Mbyte pages, the hugepd is a page table with only one entry,
>> and 2 consecutive PGD entries point to the same hugepd to cover the
>> entire 8 Mbytes.
> 
> That still sounds a lot like the ARM thing - except ARM replicates the
> entry. You also said PPC replicates the entry like ARM to get to the
> 8M?

Is it like ARM? Not sure. The PTE is not in the PGD; it must be in an L2
directory, even for 8M.

You can see in the attached picture what the hardware expects.

> 
> I guess the difference is in how the table memory is laid out? ARM
> marks the size in the same entry that has the physical address so the
> entries are self-describing and then replicated. It kind of sounds
> like PPC is marking the size in the prior level and then reconfiguring
> the layout of the lower level? Otherwise it surely must do the same
> replication to make a radix index work.

Yes, that's how it works on powerpc. For 8xx we used to do that for both
8M and 512k pages. Now for 512k pages we do roughly what ARM does (which
means replicating the entry 128 times), as that's needed to allow mixing
different page sizes under a given PGD entry.

But for 8M pages that would mean replicating the entry 2048 times.
That's a bit too much, isn't it?
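
To make the arithmetic concrete, here is a minimal sketch in C. It assumes
a 4k base page (which follows from a 4 Mbyte PGD entry holding 1024 PTEs).
The macro and function names are made up for illustration and are not
kernel APIs.

```c
#include <assert.h>

/* Assumption: 4k base pages, one PGD entry covers 4 Mbytes (1024 PTEs). */
#define TOY_BASE_PAGE_SIZE  (4UL * 1024)
#define TOY_PGD_ENTRY_SPAN  (4UL * 1024 * 1024)

/* Number of base-page PTE slots filled if a huge page entry is replicated. */
static unsigned long replicated_ptes(unsigned long huge_size)
{
	return huge_size / TOY_BASE_PAGE_SIZE;
}

/* Number of PGD entries a single huge page spans (at least one). */
static unsigned long pgd_entries_spanned(unsigned long huge_size)
{
	if (huge_size <= TOY_PGD_ENTRY_SPAN)
		return 1;
	return huge_size / TOY_PGD_ENTRY_SPAN;
}
```

Under these assumptions replicated_ptes() gives 128 for a 512k page and
2048 for an 8M page, and an 8M page spans 2 PGD entries.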

> 
> If yes, I guess that is the main problem: the mm APIs don't have a way
> today to convey data from the pgd level to understand how to parse the
> pmd level?
> 
>>> It seems to me we should see ARM and PPC agree on what the API is for
>>> this and then get rid of hugepd by making both use the same page table
>>> walker API. Is that too hopeful?
>>
>> Can't see the similarity between ARM contig PTE and PPC huge page
>> directories.
> 
> Well, they are both variable sized entries.
> 
> So if you imagine a pmd_leaf(), pmd_leaf_size() and a pte_leaf_size()
> that would return enough information for both.

pmd_leaf()? Unless I'm missing something, I can't do a leaf at PMD (PGD)
level. It must be a two-level process even for pages bigger than a PMD
entry.
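
As a rough illustration of that two-level constraint, here is a toy model
in C. It is not the kernel's data structures and not the 8xx hardware
format: struct pgd_entry, toy_translate() and the field names are
hypothetical. It only shows the size being recorded at the upper level
while the translation itself always lives in an L2 table, with two PGD
entries sharing a one-entry table for 8M.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PGD_SHIFT  22u                               /* 4 Mbytes per PGD entry */
#define TOY_PGD_MASK   ((UINT32_C(1) << TOY_PGD_SHIFT) - 1)

/* Hypothetical upper-level entry: points to an L2 table and records the page size. */
struct pgd_entry {
	const uint32_t *l2;   /* L2 table of physical page bases, NULL if not present */
	unsigned page_shift;  /* 12 = 4k, 19 = 512k, 23 = 8M */
};

static struct pgd_entry pgd[1024];

/* Always two steps: pick the PGD entry, then read the entry in its L2 table. */
static int toy_translate(uint32_t va, uint32_t *pa)
{
	const struct pgd_entry *e = &pgd[va >> TOY_PGD_SHIFT];
	uint32_t idx = 0;

	if (e->l2 == NULL)
		return -1;

	/* Pages smaller than the PGD span are indexed within the L2 table. */
	if (e->page_shift < TOY_PGD_SHIFT)
		idx = (va & TOY_PGD_MASK) >> e->page_shift;

	/* For 8M, idx stays 0: the table has a single entry. */
	*pa = e->l2[idx] + (va & ((UINT32_C(1) << e->page_shift) - 1));
	return 0;
}
```

In this model an 8M page is set up by pointing two consecutive pgd[]
slots at the same one-entry table, so nothing is replicated 2048 times,
but there is also no leaf at the PGD level.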

Christophe

[-- Attachment #2: MPC8xx_page_tables.png --]
[-- Type: image/png, Size: 126859 bytes --]
