From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Ryan Roberts <ryan.roberts@arm.com>, Peter Xu <peterx@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Christoph Hellwig <hch@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	James Houghton <jthoughton@google.com>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	David Hildenbrand <david@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	John Hubbard <jhubbard@nvidia.com>,
	Yang Shi <shy828301@gmail.com>, Rik van Riel <riel@surriel.com>,
	Hugh Dickins <hughd@google.com>, Jason Gunthorpe <jgg@nvidia.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	Mike Rapoport <rppt@kernel.org>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing
Date: Mon, 4 Dec 2023 11:25:16 +0000	[thread overview]
Message-ID: <97c21205-f3e6-4634-82e6-c7bbd81d1835@csgroup.eu> (raw)
In-Reply-To: <01aad92f-b1e0-4f31-b905-8b1c2012ebab@arm.com>



On 04/12/2023 at 12:11, Ryan Roberts wrote:
> On 03/12/2023 13:33, Christophe Leroy wrote:
>>
>>
>> On 30/11/2023 at 22:30, Peter Xu wrote:
>>> On Fri, Nov 24, 2023 at 11:07:51AM -0500, Peter Xu wrote:
>>>> On Fri, Nov 24, 2023 at 09:06:01AM +0000, Ryan Roberts wrote:
>>>>> I don't have any micro-benchmarks for GUP though, if that's your question. Is
>>>>> there an easy-to-use test I can run to get some numbers? I'd be happy to try it out.
>>>>
>>>> Thanks Ryan.  Then nothing needs to be tested if gup is not yet touched
>>>> from your side, afaict.  I'll see whether I can provide some rough numbers
>>>> instead in the next post (I'll probably only be able to test it in a VM,
>>>> though, but hopefully that should still mostly reflect the truth).
>>>
>>> An update: I finished a round of 64K cont_pte tests; in the slow gup micro
>>> benchmark I see ~15% perf degradation with this patchset applied, on a VM
>>> on top of an Apple M1.
>>>
>>> Frankly that's even less than I expected, considering not only how slow gup
>>> on THP used to be, but also the fact that this is a tight loop over slow
>>> gup, which in normal cases shouldn't happen: "present" ptes normally go
>>> to fast-gup, while !present ones go into a fault following it.  I assume
>>> that's why nobody cared about slow gup for THP before.  I think adding
>>> cont_pte support shouldn't be very hard, but that would include making the
>>> cont_pte idea global, just for arm64 and riscv Svnapot.
>>
>> Is there any documentation on what cont_pte is? I have always wondered
>> whether it could also fit the powerpc 8xx's needs.
> 
> pte_cont() (and pte_mkcont() and pte_mknoncont()) test and manipulate the
> "contiguous bit" in the arm64 PTE entries. Those helpers are arm64-specific
> (AFAIK). The contiguous bit is a hint to the HW telling it that a block of
> PTEs maps a physically contiguous and naturally aligned piece of memory. The
> HW can use this to coalesce entries in the TLB. When using 4K base pages, the
> contpte size is 64K (16 PTEs). For 16K base pages, it's 2M (128 PTEs), and
> for 64K base pages, it's 2M (32 PTEs).
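
For reference, those helpers look roughly like this - a simplified sketch of
the arm64 definitions in arch/arm64/include/asm/pgtable.h; the real code goes
through set_pte_bit()/clear_pte_bit() and exact details vary by kernel
version:

/*
 * PTE_CONT is bit 52 of the arm64 descriptor.  It is only a hint, so
 * toggling it does not change the translation; it just allows the TLB
 * to cache a single entry for the whole naturally aligned block.
 */
#define PTE_CONT	(_AT(pteval_t, 1) << 52)

static inline int pte_cont(pte_t pte)
{
	return !!(pte_val(pte) & PTE_CONT);
}

static inline pte_t pte_mkcont(pte_t pte)
{
	return __pte(pte_val(pte) | PTE_CONT);
}

static inline pte_t pte_mknoncont(pte_t pte)
{
	return __pte(pte_val(pte) & ~PTE_CONT);
}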
> 
>>
>> On powerpc, for 16k pages, we have to define 4 consecutive PTEs. All 4
>> PTEs are flagged with the SPS bit, telling the HW it's a 16k page, but for
>> TLB misses the HW needs one entry for each 4k fragment.
> 
>  From that description, it sounds like the SPS bit might be similar to the
> arm64 contiguous bit? Although it sounds like you are currently using it in
> a slightly different way - telling the kernel that the base page is 16K but
> mapping each 16K page with 4x 4K entries (plus the SPS bit set)?

Yes it's both.

When the base page is 16k, there are 4x 4k entries (with the SPS bit set) in
the page table, and pte_t is an array of 4 'unsigned long's.

When the base page is 4k, a 16k hugepage size is provided, which uses the
same 4x 4k entries with the SPS bit set.

So it looks similar to the contiguous bit.
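
To give a concrete idea - this is only an illustrative sketch from memory,
not the exact kernel code, and set_pte_16k() is a hypothetical helper name -
the 16k-page layout is essentially:

typedef unsigned long pte_basic_t;

/*
 * With 16k base pages on the 8xx, one pte_t spans the 4 consecutive
 * 4k slots of the page table.
 */
typedef struct {
	pte_basic_t pte, pte1, pte2, pte3;
} pte_t;

/*
 * Write the same value (with the SPS bit set in 'val') into all 4 slots,
 * so that each 4k TLB miss finds a valid entry.
 */
static inline void set_pte_16k(pte_basic_t *slots, pte_basic_t val)
{
	int i;

	for (i = 0; i < 4; i++)
		slots[i] = val;
}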


And by extension, the same principle is used for 512k hugepages: the
_PAGE_HUGE bit is copied by the TLB miss handler into the lower bit of PS,
PS being encoded as follows:
- 00 Small (4 Kbyte or 16 Kbyte)
- 01 512 Kbyte
- 10 Reserved
- 11 8 Mbyte

So, as the PMD size is 4M, a 512k page is 128 identical consecutive entries
in the page table.
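
Spelling out the arithmetic (illustrative defines, the names are mine):

#define SZ_4K		0x00001000UL
#define SZ_512K		0x00080000UL
#define SZ_4M		0x00400000UL	/* PMD coverage */

#define PTES_PER_512K	(SZ_512K / SZ_4K)		/* = 128 identical 4k entries */
#define PTES_PER_PMD	(SZ_4M / SZ_4K)			/* = 1024 slots per PMD       */
#define HPAGES_PER_PMD	(PTES_PER_PMD / PTES_PER_512K)	/* = 8 x 512k pages per PMD   */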

I wish I could have THP with 16k or 512k pages.

Christophe


Thread overview: 58+ messages
2023-11-16  1:28 [PATCH RFC 00/12] mm/gup: Unify hugetlb, part 2 Peter Xu
2023-11-16  1:28 ` [PATCH RFC 01/12] mm/hugetlb: Export hugetlbfs_pagecache_present() Peter Xu
2023-11-23  7:23   ` Christoph Hellwig
2023-11-23 16:05     ` Peter Xu
2023-11-16  1:28 ` [PATCH RFC 02/12] mm: Provide generic pmd_thp_or_huge() Peter Xu
2023-11-16  1:28 ` [PATCH RFC 03/12] mm: Export HPAGE_PXD_* macros even if !THP Peter Xu
2023-11-23  7:23   ` Christoph Hellwig
2023-11-23  9:53     ` Mike Rapoport
2023-11-23 15:27       ` Peter Xu
2023-11-16  1:29 ` [PATCH RFC 04/12] mm: Introduce vma_pgtable_walk_{begin|end}() Peter Xu
2023-11-23  7:24   ` Christoph Hellwig
2023-11-23 16:11     ` Peter Xu
2023-11-24  4:02   ` Aneesh Kumar K.V
2023-11-24 15:34     ` Peter Xu
2023-11-16  1:29 ` [PATCH RFC 05/12] mm/gup: Fix follow_devmap_p[mu]d() to return even if NULL Peter Xu
2023-11-23  7:25   ` Christoph Hellwig
2023-11-23 17:59     ` Peter Xu
2023-11-16  1:29 ` [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing Peter Xu
2023-11-20  8:26   ` Christoph Hellwig
2023-11-21 15:59     ` Peter Xu
2023-11-22  8:00       ` Christoph Hellwig
2023-11-22 15:22         ` Peter Xu
2023-11-23  7:21           ` Christoph Hellwig
2023-11-23 16:10             ` Peter Xu
2023-11-23 18:22           ` Christophe Leroy
2023-11-23 19:37             ` Peter Xu
2023-11-24  5:28               ` Aneesh Kumar K.V
2023-11-24  7:03               ` Christophe Leroy
2023-11-24  1:06           ` Michael Ellerman
2023-11-23 15:47         ` Matthew Wilcox
2023-11-23 17:22           ` Peter Xu
2023-11-23 19:11             ` Ryan Roberts
2023-11-23 19:46               ` Peter Xu
2023-11-24  9:06                 ` Ryan Roberts
2023-11-24 16:07                   ` Peter Xu
2023-11-30 21:30                     ` Peter Xu
2023-12-03 13:33                       ` Christophe Leroy
2023-12-04 11:11                         ` Ryan Roberts
2023-12-04 11:25                           ` Christophe Leroy [this message]
2023-12-04 11:46                             ` Ryan Roberts
2023-12-04 11:57                               ` Christophe Leroy
2023-12-04 12:02                                 ` Ryan Roberts
2023-12-04 16:48                           ` Peter Xu
2023-11-16  1:29 ` [PATCH RFC 07/12] mm/gup: Refactor record_subpages() to find 1st small page Peter Xu
2023-11-16 14:51   ` Matthew Wilcox
2023-11-16 19:40     ` Peter Xu
2023-11-16 19:41       ` Matthew Wilcox
2023-11-16  1:29 ` [PATCH RFC 08/12] mm/gup: Handle hugetlb for no_page_table() Peter Xu
2023-11-23  7:26   ` Christoph Hellwig
2023-11-16  1:29 ` [PATCH RFC 09/12] mm/gup: Handle huge pud for follow_pud_mask() Peter Xu
2023-11-23  7:28   ` Christoph Hellwig
2023-11-23 16:19     ` Peter Xu
2023-11-16  1:29 ` [PATCH RFC 10/12] mm/gup: Handle huge pmd for follow_pmd_mask() Peter Xu
2023-11-16  1:29 ` [PATCH RFC 11/12] mm/gup: Handle hugepd for follow_page() Peter Xu
2023-11-16  1:29 ` [PATCH RFC 12/12] mm/gup: Merge hugetlb into generic mm code Peter Xu
2023-11-23  7:29   ` Christoph Hellwig
2023-11-23 16:21     ` Peter Xu
2023-11-22 14:51 ` [PATCH RFC 00/12] mm/gup: Unify hugetlb, part 2 Jason Gunthorpe
