From: Peter Xu <peterx@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
James Houghton <jthoughton@google.com>,
David Hildenbrand <david@redhat.com>,
"Kirill A . Shutemov" <kirill@shutemov.name>,
Yang Shi <shy828301@gmail.com>,
linux-riscv@lists.infradead.org,
Andrew Morton <akpm@linux-foundation.org>,
"Aneesh Kumar K . V" <aneesh.kumar@kernel.org>,
Rik van Riel <riel@surriel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Mike Rapoport <rppt@kernel.org>,
John Hubbard <jhubbard@nvidia.com>,
Vlastimil Babka <vbabka@suse.cz>,
Michael Ellerman <mpe@ellerman.id.au>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Andrew Jones <andrew.jones@linux.dev>,
linuxppc-dev@lists.ozlabs.org,
Mike Kravetz <mike.kravetz@oracle.com>,
Muchun Song <muchun.song@linux.dev>,
linux-arm-kernel@lists.infradead.org,
Christoph Hellwig <hch@infradead.org>,
Lorenzo Stoakes <lstoakes@gmail.com>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v2 03/13] mm: Provide generic pmd_thp_or_huge()
Date: Wed, 21 Feb 2024 17:37:37 +0800
Message-ID: <ZdXEYfs_xhS_9gRo@x1n>
In-Reply-To: <20240115175551.GP734935@nvidia.com>
On Mon, Jan 15, 2024 at 01:55:51PM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 03, 2024 at 05:14:13PM +0800, peterx@redhat.com wrote:
> > From: Peter Xu <peterx@redhat.com>
> >
> > ARM defines pmd_thp_or_huge(), detecting either a THP or a huge PMD. It
> > can be a helpful helper if we want to merge more THP and hugetlb code
> > paths. Make it a generic default implementation, only exist when
> > CONFIG_MMU. Arch can overwrite it by defining its own version.
> >
> > For example, ARM's pgtable-2level.h defines it to always return false.
> >
> > Keep the macro declared with all config, it should be optimized to a false
> > anyway if !THP && !HUGETLB.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > include/linux/pgtable.h | 4 ++++
> > mm/gup.c | 3 +--
> > 2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> > index 466cf477551a..2b42e95a4e3a 100644
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -1362,6 +1362,10 @@ static inline int pmd_write(pmd_t pmd)
> > #endif /* pmd_write */
> > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >
> > +#ifndef pmd_thp_or_huge
> > +#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))
> > +#endif
>
> Why not just use pmd_leaf() ?
>
> This GUP case seems to me exactly like what pmd_leaf() should really
> do and be used for..
I mostly agree with you, and these APIs are indeed confusing. IMHO the
challenge is the risk of breaking other users through small changes in the
details, which is where the evil resides.
>
> eg x86 does:
>
> #define pmd_leaf pmd_large
> static inline int pmd_large(pmd_t pte)
> return pmd_flags(pte) & _PAGE_PSE;
>
> static inline int pmd_trans_huge(pmd_t pmd)
> return (pmd_val(pmd) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE;
>
> int pmd_huge(pmd_t pmd)
> return !pmd_none(pmd) &&
> (pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
For example, here I don't think pmd_huge() is strictly pmd_leaf(): pmd_huge()
will return true if PRESENT=0 && PSE=0 (as long as a none pmd is ruled out
first), while pmd_leaf() will return false. I think that came from
cbef8478bee5. I'm not sure whether that is the best solution; e.g., at
first glance it seems better to me to process swap entries separately
(including both migration and poisoned entries).
Sparc has something similar, and in that case I'm not sure whether a
direct replacement is always safe.
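To make the x86 difference concrete, here is a minimal userspace sketch (not
kernel code). It assumes pmd_none() is a plain compare against zero and uses
x86-like bit positions for PRESENT and PSE; the x86_model_* names are made up
for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions modelled on x86; not taken from kernel headers. */
#define X86_MODEL_PRESENT (UINT64_C(1) << 0)
#define X86_MODEL_PSE     (UINT64_C(1) << 7)

typedef uint64_t x86_model_pmdval_t;

/* Models the quoted pmd_large()/pmd_leaf(): true whenever PSE is set. */
static inline bool x86_model_pmd_leaf(x86_model_pmdval_t v)
{
	return (v & X86_MODEL_PSE) != 0;
}

/*
 * Models the quoted pmd_huge(): the entry is not none (assumed: v != 0),
 * and it is not the "PRESENT set, PSE clear" pattern of a page table pointer.
 */
static inline bool x86_model_pmd_huge(x86_model_pmdval_t v)
{
	return v != 0 &&
	       (v & (X86_MODEL_PRESENT | X86_MODEL_PSE)) != X86_MODEL_PRESENT;
}
```

With this model, a non-zero value that has both PRESENT and PSE clear (for
example 0x1000, standing in for a swap-style entry) is "huge" but not "leaf",
while PRESENT|PSE is both and PRESENT alone is neither.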
Besides that, there are other cases where it's not clear that such a direct
replacement is safe until investigated further. E.g., arm-3level has:
  #define pmd_leaf(pmd)		pmd_sect(pmd)
  #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
				 PMD_TYPE_SECT)
  #define PMD_TYPE_SECT		(_AT(pmdval_t, 1) << 0)

While pmd_huge() there relies on PMD_TABLE_BIT:

  int pmd_huge(pmd_t pmd)
  {
	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
  }
  #define PMD_TABLE_BIT		(_AT(pmdval_t, 1) << 1)
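A rough userspace model of the two arm-3level checks shows where they can
diverge. This is only a sketch: ARM_MODEL_TYPE_MASK is assumed to cover the
low two bits (it is not quoted above), and the arm_model_* names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t arm_model_pmdval_t;

/* Assumed: the type field is the low two bits of the entry. */
#define ARM_MODEL_TYPE_MASK ((arm_model_pmdval_t)3 << 0)
/* As quoted above: section type is bit 0, table bit is bit 1. */
#define ARM_MODEL_TYPE_SECT ((arm_model_pmdval_t)1 << 0)
#define ARM_MODEL_TABLE_BIT ((arm_model_pmdval_t)1 << 1)

/* Models pmd_leaf()/pmd_sect(): the type field must equal SECT exactly. */
static inline bool arm_model_pmd_leaf(arm_model_pmdval_t v)
{
	return (v & ARM_MODEL_TYPE_MASK) == ARM_MODEL_TYPE_SECT;
}

/* Models pmd_huge(): any non-zero entry with the table bit clear. */
static inline bool arm_model_pmd_huge(arm_model_pmdval_t v)
{
	return v != 0 && !(v & ARM_MODEL_TABLE_BIT);
}
```

In this model the two agree whenever bit 0 or bit 1 is set, and differ only
for a non-zero value whose low two bits are both clear: pmd_huge() says true,
pmd_leaf() says false.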
These are exactly the kind of small details I wanted to avoid touching in
this series, so that the hugetlb issue can be resolved separately from the
rest.
The new pmd_thp_or_huge() is not ideal, but it easily isolates all these
details / evils out of the picture, so that we can tackle them one by one.
It is strictly an OR of huge||thp, so it should hopefully not break
anything in that regard.
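As a sanity sketch of the "strictly an OR" point, the old and new GUP
conditions can be modelled in userspace with three independent booleans.
Everything here (struct gup_model_pmd and the gup_model_* helpers) is
hypothetical and only mirrors the shape of the patch, including the #ifndef
guard that lets an arch supply its own definition first.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a pmd: each predicate is an independent flag. */
struct gup_model_pmd {
	bool hugetlb;
	bool thp;
	bool devmap;
};

static inline bool gup_model_pmd_huge(struct gup_model_pmd p)       { return p.hugetlb; }
static inline bool gup_model_pmd_trans_huge(struct gup_model_pmd p) { return p.thp; }
static inline bool gup_model_pmd_devmap(struct gup_model_pmd p)     { return p.devmap; }

/* Generic default, used only when no earlier (arch) definition exists. */
#ifndef gup_model_pmd_thp_or_huge
#define gup_model_pmd_thp_or_huge(p) \
	(gup_model_pmd_huge(p) || gup_model_pmd_trans_huge(p))
#endif

/* Condition in gup_pmd_range() before the patch. */
static inline bool gup_model_cond_old(struct gup_model_pmd p)
{
	return gup_model_pmd_trans_huge(p) || gup_model_pmd_huge(p) ||
	       gup_model_pmd_devmap(p);
}

/* Condition in gup_pmd_range() after the patch. */
static inline bool gup_model_cond_new(struct gup_model_pmd p)
{
	return gup_model_pmd_thp_or_huge(p) || gup_model_pmd_devmap(p);
}
```

With the generic default, the two conditions agree for all eight flag
combinations. That no longer has to hold once an arch overrides the macro,
which is where the per-arch details come back in.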
>
> I spot checked a couple arches and it looks like it holds up.
>
> Further, it looks to me like this site in GUP is the only core code
> caller..
>
> So, I'd suggest a small series to go arch by arch and convert the arch
> to use pmd_huge() == pmd_leaf(). Then retire pmd_huge() as a public
> API.
>
> > diff --git a/mm/gup.c b/mm/gup.c
> > index df83182ec72d..eebae70d2465 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -3004,8 +3004,7 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo
> > if (!pmd_present(pmd))
> > return 0;
> >
> > - if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
> > - pmd_devmap(pmd))) {
> > + if (unlikely(pmd_thp_or_huge(pmd) || pmd_devmap(pmd))) {
> > /* See gup_pte_range() */
> > if (pmd_protnone(pmd))
> > return 0;
>
> And the devmap thing here doesn't make any sense either. The arch
> should ensure that pmd_devmap() implies pmd_leaf(). Since devmap is a
> purely SW construct it almost certainly does already anyhow.
Yep, but only if pmd_leaf() is safe to put here. A pmd devmap should
indeed always imply pmd_leaf().
Thanks,
--
Peter Xu