From: James Houghton <jthoughton@google.com>
To: Mina Almasry <almasrymina@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
Muchun Song <songmuchun@bytedance.com>,
Peter Xu <peterx@redhat.com>,
David Hildenbrand <david@redhat.com>,
David Rientjes <rientjes@google.com>,
Axel Rasmussen <axelrasmussen@google.com>,
"Zach O'Keefe" <zokeefe@google.com>,
Manish Mishra <manish.mishra@nutanix.com>,
Naoya Horiguchi <naoya.horiguchi@nec.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Yang Shi <shy828301@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 10/47] hugetlb: add hugetlb_pte to track HugeTLB page table entries
Date: Fri, 9 Dec 2022 11:02:35 -0500
Message-ID: <CADrL8HXiNHw2MdgCWmi1JpK=dckJ=D-5-Wm5Ofv0L6Uh7nvqRg@mail.gmail.com>
In-Reply-To: <CAHS8izPYvrviLbtVNkg+bnSXt5zvaXfJJV9+CAZ_0qESyMimBw@mail.gmail.com>
On Wed, Dec 7, 2022 at 7:46 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Fri, Oct 21, 2022 at 9:37 AM James Houghton <jthoughton@google.com> wrote:
> >
> > After high-granularity mapping, page table entries for HugeTLB pages can
> > be of any size/type. (For example, we can have a 1G page mapped with a
> > mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
> > PTE after we have done a page table walk.
> >
> > Without this, we'd have to pass around the "size" of the PTE everywhere.
> > We effectively did this before; it could be fetched from the hstate,
> > which we pass around pretty much everywhere.
> >
> > hugetlb_pte_present_leaf is included here as a helper function that will
> > be used frequently later on.
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> > include/linux/hugetlb.h | 88 +++++++++++++++++++++++++++++++++++++++++
> > mm/hugetlb.c | 29 ++++++++++++++
> > 2 files changed, 117 insertions(+)
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index db3ed6095b1c..d30322108b34 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -50,6 +50,75 @@ enum {
> > __NR_USED_SUBPAGE,
> > };
> >
> > +enum hugetlb_level {
> > + HUGETLB_LEVEL_PTE = 1,
> > + /*
> > + * We always include PMD, PUD, and P4D in this enum definition so that,
> > + * when logged as an integer, we can easily tell which level it is.
> > + */
> > + HUGETLB_LEVEL_PMD,
> > + HUGETLB_LEVEL_PUD,
> > + HUGETLB_LEVEL_P4D,
> > + HUGETLB_LEVEL_PGD,
> > +};
> > +
>
> Don't we need to support CONTIG_PTE/PMD levels here for ARM64?
Yeah, which is why shift and level aren't quite the same thing.
Contiguous PMDs would be HUGETLB_LEVEL_PMD but have shift =
CONT_PMD_SHIFT, whereas regular PMDs would have shift = PMD_SHIFT.
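A minimal userspace sketch of the distinction follows. It assumes arm64 with a 4K granule, where a contiguous block is 16 entries, so CONT_PTE_SHIFT = 16 and CONT_PMD_SHIFT = 25. `struct hpte_desc` and `hpte_desc_size()` are hypothetical stand-ins for `struct hugetlb_pte` and `hugetlb_pte_size()`, not code from this series:

```c
/*
 * Sketch only: assumed arm64 values for a 4K granule, where a
 * contiguous block is 16 entries.
 */
#define CONT_PTE_SHIFT 16 /* 16 contiguous PTEs -> 64K */
#define PMD_SHIFT      21 /* one PMD            -> 2M  */
#define CONT_PMD_SHIFT 25 /* 16 contiguous PMDs -> 32M */

enum hugetlb_level {
	HUGETLB_LEVEL_PTE = 1,
	HUGETLB_LEVEL_PMD,
	HUGETLB_LEVEL_PUD,
	HUGETLB_LEVEL_P4D,
	HUGETLB_LEVEL_PGD,
};

/* Hypothetical, trimmed-down stand-in for struct hugetlb_pte. */
struct hpte_desc {
	unsigned int shift;       /* how much address space the entry maps */
	enum hugetlb_level level; /* which p?d_t type the entry pointer is */
};

/* Regular and contiguous PMDs share a level, so the same accessors apply... */
static const struct hpte_desc pmd_desc      = { PMD_SHIFT,      HUGETLB_LEVEL_PMD };
static const struct hpte_desc cont_pmd_desc = { CONT_PMD_SHIFT, HUGETLB_LEVEL_PMD };
static const struct hpte_desc cont_pte_desc = { CONT_PTE_SHIFT, HUGETLB_LEVEL_PTE };

/* ...but the mapped size can only be derived from shift. */
static unsigned long hpte_desc_size(const struct hpte_desc *d)
{
	return 1UL << d->shift;
}
```

Both PMD descriptors report HUGETLB_LEVEL_PMD, so the same pmd_t helpers apply to them, while the mapped size (2M vs 32M) comes only from `shift`.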
>
> > +struct hugetlb_pte {
> > + pte_t *ptep;
> > + unsigned int shift;
> > + enum hugetlb_level level;
>
> Is shift + level redundant? When would those diverge?
Peter asked a very similar question. `shift` can be used to determine
`level` if no levels are being folded. In the case of folded levels,
you might have a single shift that corresponds to multiple "levels".
That isn't necessarily a problem, as folding a level just means
casting your p?d_t* differently, but I think it's good to *know* that,
if the hugetlb_pte was populated with a pud_t*, we always treat it
like a pud_t*.
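To illustrate the folding case, here is a sketch assuming a 4-level configuration in which P4D is folded into PGD, so P4D_SHIFT equals PGDIR_SHIFT (39, as on 4-level x86-64). `levels_for_shift()` is a hypothetical helper, not a kernel API; it counts how many levels a given shift could refer to:

```c
/* Assumed shifts; P4D is folded, so it shares PGDIR_SHIFT. */
#define PAGE_SHIFT  12
#define PMD_SHIFT   21
#define PUD_SHIFT   30
#define PGDIR_SHIFT 39
#define P4D_SHIFT   PGDIR_SHIFT

enum hugetlb_level {
	HUGETLB_LEVEL_PTE = 1,
	HUGETLB_LEVEL_PMD,
	HUGETLB_LEVEL_PUD,
	HUGETLB_LEVEL_P4D,
	HUGETLB_LEVEL_PGD,
};

static const struct {
	unsigned int shift;
	enum hugetlb_level level;
} level_shifts[] = {
	{ PAGE_SHIFT,  HUGETLB_LEVEL_PTE },
	{ PMD_SHIFT,   HUGETLB_LEVEL_PMD },
	{ PUD_SHIFT,   HUGETLB_LEVEL_PUD },
	{ P4D_SHIFT,   HUGETLB_LEVEL_P4D },
	{ PGDIR_SHIFT, HUGETLB_LEVEL_PGD },
};

/*
 * Hypothetical helper: how many levels could an entry with this shift
 * belong to? A result above 1 means the shift alone cannot say which
 * p?d_t type the entry pointer was populated with.
 */
static unsigned int levels_for_shift(unsigned int shift)
{
	unsigned int i, n = 0;

	for (i = 0; i < sizeof(level_shifts) / sizeof(level_shifts[0]); i++)
		if (level_shifts[i].shift == shift)
			n++;
	return n;
}
```

With an explicit `level`, the walker records which type it actually had in hand instead of guessing from a shift that matches both P4D and PGD.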
If `ptep` were instead a union, then `level` would be the tag. Perhaps
it should be written that way.
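A sketch of the union layout, under stated assumptions: the page table types below are minimal userspace stand-ins (the kernel's real definitions are architecture-specific), and `struct hugetlb_pte_u`, `hugetlb_pte_u_populate_pud()`, and `hugetlb_pte_u_val()` are hypothetical names:

```c
#include <stdint.h>

/* Minimal userspace stand-ins for the kernel's page table entry types. */
typedef struct { uint64_t val; } pte_t;
typedef struct { uint64_t val; } pmd_t;
typedef struct { uint64_t val; } pud_t;
typedef struct { uint64_t val; } p4d_t;
typedef struct { uint64_t val; } pgd_t;

enum hugetlb_level {
	HUGETLB_LEVEL_PTE = 1,
	HUGETLB_LEVEL_PMD,
	HUGETLB_LEVEL_PUD,
	HUGETLB_LEVEL_P4D,
	HUGETLB_LEVEL_PGD,
};

/* Hypothetical layout: `level` is the tag saying which member is valid. */
struct hugetlb_pte_u {
	union {
		pte_t *ptep;
		pmd_t *pmdp;
		pud_t *pudp;
		p4d_t *p4dp;
		pgd_t *pgdp;
	};
	unsigned int shift;
	enum hugetlb_level level;
};

/* Populating with a pud_t* fixes the tag, so readers always use pudp. */
static void hugetlb_pte_u_populate_pud(struct hugetlb_pte_u *hpte,
				       pud_t *pudp, unsigned int shift)
{
	hpte->pudp = pudp;
	hpte->shift = shift;
	hpte->level = HUGETLB_LEVEL_PUD;
}

/* Read the raw entry through the member selected by the tag. */
static uint64_t hugetlb_pte_u_val(const struct hugetlb_pte_u *hpte)
{
	switch (hpte->level) {
	case HUGETLB_LEVEL_PTE: return hpte->ptep->val;
	case HUGETLB_LEVEL_PMD: return hpte->pmdp->val;
	case HUGETLB_LEVEL_PUD: return hpte->pudp->val;
	case HUGETLB_LEVEL_P4D: return hpte->p4dp->val;
	case HUGETLB_LEVEL_PGD: return hpte->pgdp->val;
	}
	return 0; /* unknown tag */
}
```

The tag makes "populated with a pud_t*, always read as a pud_t*" a property of the type rather than a convention callers must remember.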
>
> > + spinlock_t *ptl;
> > +};
> > +
> > +static inline
> > +void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
> > + unsigned int shift, enum hugetlb_level level)
> > +{
> > + WARN_ON_ONCE(!ptep);
> > + hpte->ptep = ptep;
> > + hpte->shift = shift;
> > + hpte->level = level;
> > + hpte->ptl = NULL;
> > +}
> > +
> > +static inline
> > +unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
> > +{
> > + WARN_ON_ONCE(!hpte->ptep);
> > + return 1UL << hpte->shift;
> > +}
> > +
> > +static inline
> > +unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
> > +{
> > + WARN_ON_ONCE(!hpte->ptep);
> > + return ~(hugetlb_pte_size(hpte) - 1);
> > +}
> > +
> > +static inline
> > +unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte)
> > +{
> > + WARN_ON_ONCE(!hpte->ptep);
> > + return hpte->shift;
> > +}
> > +
> > +static inline
> > +enum hugetlb_level hugetlb_pte_level(const struct hugetlb_pte *hpte)
> > +{
> > + WARN_ON_ONCE(!hpte->ptep);
> > + return hpte->level;
> > +}
> > +
> > +static inline
> > +void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
> > +{
> > + dest->ptep = src->ptep;
> > + dest->shift = src->shift;
> > + dest->level = src->level;
> > + dest->ptl = src->ptl;
> > +}
> > +
> > +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte);
> > +
> > struct hugepage_subpool {
> > spinlock_t lock;
> > long count;
> > @@ -1210,6 +1279,25 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
> > return ptl;
> > }
> >
> > +static inline
> > +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
> > +{
> > +
> > + BUG_ON(!hpte->ptep);
>
> I think BUG_ON()s will be frowned upon. This function also doesn't
> really need ptep. Maybe let hugetlb_pte_shift() decide to BUG_ON() if
> necessary.
Right. I'll remove this (and others that aren't really necessary).
Peter's suggestion to just let the kernel take a #pf and crash
(thereby logging more info) SGTM.
>
>
> > + if (hpte->ptl)
> > + return hpte->ptl;
> > + return huge_pte_lockptr(hugetlb_pte_shift(hpte), mm, hpte->ptep);
>
> I don't know if this fallback to huge_pte_lockptr() should be obvious
> to the reader. If not, a comment would help.
I'll clean this up a little for the next version. If something like
this branch stays, I'll add a comment.
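One way the fallback could be documented, shown as a userspace model rather than kernel code. `spinlock_t`, `struct mm_model`, `struct hpte_model`, and `lockptr_from_shift()` are stand-ins, and the shift rule (PMD-sized entries use a split PMD-table lock, everything else uses `mm->page_table_lock`) is an assumption modelled on what `huge_pte_lockptr()` does upstream for PMD- and PUD-sized hugepages:

```c
#include <stddef.h>

#define PMD_SHIFT 21 /* assumed */

/* Userspace stand-in; a real spinlock_t is an opaque kernel type. */
typedef struct { int held; } spinlock_t;

/* Hypothetical model of the two locks a hugetlb entry can map to. */
struct mm_model {
	spinlock_t page_table_lock; /* mm-wide lock */
	spinlock_t pmd_split_lock;  /* stand-in for pmd_lockptr()'s result */
};

struct hpte_model {
	void *ptep;
	unsigned int shift;
	spinlock_t *ptl; /* set by the page table walk, or NULL */
};

/*
 * Assumed rule, modelled on huge_pte_lockptr(): PMD-sized entries use
 * the split PMD lock, everything else uses the mm-wide lock.
 */
static spinlock_t *lockptr_from_shift(struct mm_model *mm, unsigned int shift)
{
	if (shift == PMD_SHIFT)
		return &mm->pmd_split_lock;
	return &mm->page_table_lock;
}

static spinlock_t *hpte_lockptr(struct mm_model *mm, struct hpte_model *hpte)
{
	/*
	 * Prefer the lock the walker recorded when it found this entry.
	 * If none was recorded (ptl == NULL), fall back to the lock implied
	 * by the entry's size, as huge_pte_lockptr() would compute it.
	 */
	if (hpte->ptl)
		return hpte->ptl;
	return lockptr_from_shift(mm, hpte->shift);
}
```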
>
> > +}
> > +
> > +static inline
> > +spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
> > +{
> > + spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
> > +
> > + spin_lock(ptl);
> > + return ptl;
> > +}
> > +
> > #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
> > extern void __init hugetlb_cma_reserve(int order);
> > #else
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index ef7662bd0068..a0e46d35dabc 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1127,6 +1127,35 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
> > return false;
> > }
> >
> > +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte)
>
> I also don't know if this is obvious to other readers, but I'm quite
> confused that we pass both hugetlb_pte and pte_t here, especially when
> hpte has a pte_t inside of it. Maybe a comment would help.
It's possible for the value of the pte to change if we haven't locked
the PTL; we only store a pte_t* in hugetlb_pte, not the value itself,
so the caller passes in the value it actually read.
Thinking about this... we *do* store `shift`, which technically depends
on the value of the PTE. If the PTE is pte_none, the true `shift` of
the PTE is ambiguous, and so we just provide what the user asked for.
That could lead to a scenario where UFFDIO_CONTINUE(some 4K page)
followed by UFFDIO_CONTINUE(CONT_PTE_SIZE range around that page) both
succeed, because we merely check whether the *first* PTE in the
contiguous bunch is none/has changed.
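A userspace model of that scenario, assuming arm64 with a 4K granule (16 PTEs per contiguous block) and treating a zero value as pte_none(). `first_pte_none()` models the check as described here, not actual code from the series, and `every_pte_none()` is a hypothetical name:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CONT_PTES 16 /* assumed: arm64, 4K granule -> 64K contiguous block */

/* Userspace model of a PTE value; 0 stands for pte_none(). */
typedef uint64_t model_pte_t;

/* The check as described: only the first entry of the block is examined. */
static bool first_pte_none(const model_pte_t *ptes)
{
	return ptes[0] == 0;
}

/* The check that is actually required: every entry must still be none. */
static bool every_pte_none(const model_pte_t *ptes, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ptes[i] != 0)
			return false;
	return true;
}
```

If the first UFFDIO_CONTINUE populated, say, entry 5, `first_pte_none()` still returns true, so the CONT_PTE_SIZE request would overwrite that 4K mapping, while `every_pte_none()` correctly returns false.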
So, in the case of a contiguous PTE where we *think* we're overwriting
a bunch of none PTEs, we need to check that each PTE we're overwriting
is still none while holding the PTL. That means that the PTL we use
for cont PTEs and non-cont PTEs of the same level must be the same.
So for the next version, I'll:
- add some requirement that contiguous and non-contiguous PTEs on the
same level must use the same PTL
- think up some kind of API like all_contig_ptes_none(), but it only
really applies to arm64, so I think actually adding it can wait.
I'll at least put a comment in hugetlb_mcopy_atomic_pte and
hugetlb_no_page (near the final huge_pte_none() and pte_same()
checks).
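A sketch of what an all_contig_ptes_none()-style check might look like, in a userspace model (zero means none, 16 PTEs per contiguous block on an assumed arm64 4K granule). Both `all_contig_ptes_none()` and `install_contig_ptes()` are hypothetical; the -EEXIST return is meant to mirror what hugetlb_mcopy_atomic_pte() returns when the destination entry is not none:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CONT_PTES 16 /* assumed arm64 4K-granule value */

typedef uint64_t model_pte_t; /* 0 stands for pte_none() */

/* Hypothetical helper named after the one floated in this mail. */
static bool all_contig_ptes_none(const model_pte_t *ptep, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ptep[i] != 0)
			return false;
	return true;
}

/*
 * Hypothetical install path for a contiguous block. The caller is
 * assumed to hold the same PTL that non-contiguous installs on this
 * level take, so no 4K entry can appear between the check and the
 * writes below.
 */
static int install_contig_ptes(model_pte_t *ptep, size_t n,
			       model_pte_t first, model_pte_t step)
{
	size_t i;

	if (!all_contig_ptes_none(ptep, n))
		return -EEXIST; /* leave every entry untouched */
	for (i = 0; i < n; i++)
		ptep[i] = first + (model_pte_t)i * step;
	return 0;
}
```

The check and the writes only form one atomic step with respect to 4K installs if both paths take the same PTL, which is the requirement in the first bullet.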
>
> > +{
> > + pgd_t pgd;
> > + p4d_t p4d;
> > + pud_t pud;
> > + pmd_t pmd;
> > +
> > + WARN_ON_ONCE(!hpte->ptep);
> > + switch (hugetlb_pte_level(hpte)) {
> > + case HUGETLB_LEVEL_PGD:
> > + pgd = __pgd(pte_val(pte));
> > + return pgd_present(pgd) && pgd_leaf(pgd);
> > + case HUGETLB_LEVEL_P4D:
> > + p4d = __p4d(pte_val(pte));
> > + return p4d_present(p4d) && p4d_leaf(p4d);
> > + case HUGETLB_LEVEL_PUD:
> > + pud = __pud(pte_val(pte));
> > + return pud_present(pud) && pud_leaf(pud);
> > + case HUGETLB_LEVEL_PMD:
> > + pmd = __pmd(pte_val(pte));
> > + return pmd_present(pmd) && pmd_leaf(pmd);
> > + case HUGETLB_LEVEL_PTE:
> > + return pte_present(pte);
> > + default:
> > + WARN_ON_ONCE(1);
> > + return false;
> > + }
> > +}
> > +
> > static void enqueue_huge_page(struct hstate *h, struct page *page)
> > {
> > int nid = page_to_nid(page);
> > --
> > 2.38.0.135.g90850a2211-goog
> >
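For reference, the size and mask helpers in the quoted patch reduce to shift arithmetic. The sketch below reproduces that arithmetic in userspace with a hypothetical `struct hpte_shift_model` that carries only `shift`; `model_start()` is an illustrative addition showing how the mask finds the start of the range an entry maps:

```c
/* Hypothetical model carrying only the shift of a hugetlb_pte. */
struct hpte_shift_model {
	unsigned int shift;
};

/* Same arithmetic as hugetlb_pte_size(): bytes mapped by the entry. */
static unsigned long model_size(const struct hpte_shift_model *h)
{
	return 1UL << h->shift;
}

/* Same arithmetic as hugetlb_pte_mask(): clears the in-entry offset bits. */
static unsigned long model_mask(const struct hpte_shift_model *h)
{
	return ~(model_size(h) - 1);
}

/* Illustrative: start of the range the entry maps, for an address inside it. */
static unsigned long model_start(const struct hpte_shift_model *h,
				 unsigned long addr)
{
	return addr & model_mask(h);
}
```

For example, with shift 21 (a 2M PMD) the address 0x40123456 falls in the entry starting at 0x40000000, while with shift 12 it falls in the 4K entry starting at 0x40123000.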