From: James Houghton <jthoughton@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>,
Peter Xu <peterx@redhat.com>,
David Hildenbrand <david@redhat.com>,
David Rientjes <rientjes@google.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Mina Almasry <almasrymina@google.com>,
"Zach O'Keefe" <zokeefe@google.com>,
Manish Mishra <manish.mishra@nutanix.com>,
Naoya Horiguchi <naoya.horiguchi@nec.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Yang Shi <shy828301@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 08/47] hugetlb: add HGM enablement functions
Date: Tue, 13 Dec 2022 10:49:32 -0500
Message-ID: <CADrL8HU9sQuh_W3Qx4dvGV44VLYNbt300cpWLU--BqLo3Xxgpw@mail.gmail.com>
In-Reply-To: <Y5fDwH6XiM808oUM@monkey>
On Mon, Dec 12, 2022 at 7:14 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 10/21/22 16:36, James Houghton wrote:
> > Currently it is possible for all shared VMAs to use HGM, but it must be
> > enabled first. This is because with HGM, we lose PMD sharing, and page
> > table walks require additional synchronization (we need to take the VMA
> > lock).
>
> Not sure yet, but I expect Peter's series will help with locking for
> hugetlb specific page table walks.
It should make things a little cleaner in this series; I'll rebase
HGM on top of those patches this week (and hopefully get a v1 out
soon).

I don't think it's possible to implement MADV_COLLAPSE with RCU alone
(at least as RCU is used in Peter's series); we still need the VMA lock
so that collapsing can't invalidate other threads' page table walks.
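The locking rule behind that can be shown with a small userspace model. This is a sketch under stated assumptions, not the kernel's rw_semaphore: the hypothetical struct model_vma_lock only counts readers, page table walkers take it for reading, and a collapse, which frees the lower-level page tables, needs it for writing, so it can never run while a walk is in progress. None of these names exist in the kernel or in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the hugetlb VMA lock. It only tracks
 * readers and a writer flag so the exclusion rule can be checked without
 * threads; it is not kernel code. */
struct model_vma_lock {
	int readers;
	bool writer;
};

/* A page table walker takes the lock for reading. */
static bool model_walk_begin(struct model_vma_lock *l)
{
	if (l->writer)
		return false;	/* a collapse is running; the walker would wait */
	l->readers++;
	return true;
}

static void model_walk_end(struct model_vma_lock *l)
{
	l->readers--;
}

/* A collapse frees the high-granularity page table pages, so it must
 * exclude every walker by taking the lock for writing. */
static bool model_collapse_try(struct model_vma_lock *l, int *pt_pages)
{
	if (l->writer || l->readers > 0)
		return false;	/* in the kernel this would sleep instead */
	l->writer = true;
	*pt_pages = 0;		/* "free" the lower-level page tables */
	l->writer = false;
	return true;
}
```

In the kernel the collapse would block on the semaphore rather than return false; the model returns false so the rule can be exercised deterministically.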
>
> >
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > ---
> > include/linux/hugetlb.h | 22 +++++++++++++
> > mm/hugetlb.c | 69 +++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 91 insertions(+)
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 534958499ac4..6e0c36b08a0c 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -123,6 +123,9 @@ struct hugetlb_vma_lock {
> >
> > struct hugetlb_shared_vma_data {
> > struct hugetlb_vma_lock vma_lock;
> > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> > + bool hgm_enabled;
> > +#endif
> > };
> >
> > extern struct resv_map *resv_map_alloc(void);
> > @@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node)
> > }
> > #endif /* CONFIG_HUGETLB_PAGE */
> >
> > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
> > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma);
> > +int enable_hugetlb_hgm(struct vm_area_struct *vma);
> > +#else
> > +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> > +{
> > + return false;
> > +}
> > +static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
> > +{
> > + return false;
> > +}
> > +static inline int enable_hugetlb_hgm(struct vm_area_struct *vma)
> > +{
> > + return -EINVAL;
> > +}
> > +#endif
> > +
> > static inline spinlock_t *huge_pte_lock(struct hstate *h,
> > struct mm_struct *mm, pte_t *pte)
> > {
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 5ae8bc8c928e..a18143add956 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma)
> > #ifdef CONFIG_USERFAULTFD
> > if (uffd_disable_huge_pmd_share(vma))
> > return false;
> > +#endif
> > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> > + if (hugetlb_hgm_enabled(vma))
> > + return false;
> > #endif
> > /*
> > * Only shared VMAs can share PMDs.
> > @@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma)
> > kref_init(&data->vma_lock.refs);
> > init_rwsem(&data->vma_lock.rw_sema);
> > data->vma_lock.vma = vma;
> > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> > + data->hgm_enabled = false;
> > +#endif
> > vma->vm_private_data = data;
> > return 0;
> > }
> > @@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h)
> >
> > #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
> >
> > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
> > +{
> > + /*
> > + * All shared VMAs may have HGM.
> > + *
> > + * HGM requires using the VMA lock, which only exists for shared VMAs.
> > + * To make HGM work for private VMAs, we would need to use another
> > + * scheme to prevent collapsing/splitting from invalidating other
> > + * threads' page table walks.
> > + */
> > + return vma && (vma->vm_flags & VM_MAYSHARE);
>
> I am not yet 100% convinced you can/will take care of all possible code
> paths where hugetlb_vma_data allocation may fail. If not, then you
> should be checking vm_private_data here as well.
I think the check here makes sense: if a VMA is shared, then it is
eligible for HGM, but we might still fail to enable HGM because we
can't allocate the VMA lock. I'll reword the comment to say this clearly.
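The difference between "eligible" and "enabled" can be sketched as a userspace model, assuming (hypothetically) that the VMA data is allocated lazily and that the allocation can fail. struct model_vma, MODEL_VM_MAYSHARE and model_alloc_fails are illustrative stand-ins, not kernel names. The model also reads vm_private_data only after the NULL check on the VMA; hugetlb_hgm_enabled() in the quoted patch loads vma->vm_private_data before it tests !vma.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_VM_MAYSHARE 0x80UL /* hypothetical stand-in for VM_MAYSHARE */

struct model_shared_data { bool hgm_enabled; };
struct model_vma { unsigned long vm_flags; void *vm_private_data; };

static bool model_alloc_fails; /* hypothetical failure injection */

/* Eligibility depends only on the mapping being shared. */
static bool model_hgm_eligible(const struct model_vma *vma)
{
	return vma && (vma->vm_flags & MODEL_VM_MAYSHARE);
}

static bool model_hgm_enabled(const struct model_vma *vma)
{
	const struct model_shared_data *data;

	if (!model_hgm_eligible(vma))
		return false;
	data = vma->vm_private_data;	/* safe: vma is known non-NULL here */
	return data && data->hgm_enabled;
}

/* An eligible VMA can still fail to enable HGM if allocation fails. */
static int model_enable_hgm(struct model_vma *vma)
{
	struct model_shared_data *data;

	if (!model_hgm_eligible(vma))
		return -EINVAL;
	if (model_hgm_enabled(vma))
		return 0;
	if (!vma->vm_private_data) {
		if (model_alloc_fails)
			return -ENOMEM;
		data = calloc(1, sizeof(*data));
		if (!data)
			return -ENOMEM;
		vma->vm_private_data = data;
	}
	data = vma->vm_private_data;
	data->hgm_enabled = true;
	return 0;
}
```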

There is the problem of splitting, though: if a VMA contains
high-granularity-mapped PTEs and that VMA gets split, the new VMA needs
to remember that HGM was enabled even if allocating its VMA lock fails;
otherwise things get out of sync. How does PMD sharing handle the
splitting case?

The easy way for HGM to handle this would be to disallow splitting, but
I think we can do better. If we fail to allocate the VMA lock, then we
can no longer MADV_COLLAPSE safely, but everything else can proceed as
normal, so some hugetlb_hgm_enabled() checks can be removed or changed.
This should also make things easier when we have to handle (some bits
of) HGM for private mappings. I'll make some improvements here for v1.
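One possible shape for that, assuming HGM state is recorded in the VMA itself rather than only in the lock-bearing vm_private_data: the new VMA always inherits the HGM bit, and only MADV_COLLAPSE is refused when its lock could not be allocated. All names here (struct model_vma, model_split(), model_can_collapse()) are hypothetical and are not part of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: the HGM bit lives in the VMA, the lock is optional. */
struct model_lock { int dummy; };
struct model_vma {
	unsigned long start, end;
	bool hgm;		/* may contain high-granularity PTEs */
	struct model_lock *lock;	/* NULL if allocation failed */
};

static bool model_alloc_fails; /* hypothetical failure injection */

static struct model_lock *model_lock_alloc(void)
{
	return model_alloc_fails ? NULL : calloc(1, sizeof(struct model_lock));
}

/* Split vma at addr; the new upper half always inherits the HGM bit. */
static bool model_split(struct model_vma *vma, unsigned long addr,
			struct model_vma *new)
{
	if (addr <= vma->start || addr >= vma->end)
		return false;
	new->start = addr;
	new->end = vma->end;
	new->hgm = vma->hgm;		/* kept even if the lock is not */
	new->lock = model_lock_alloc();
	vma->end = addr;
	return true;
}

/* Collapsing frees page tables and needs the lock; other paths only
 * need to know that hgm is set. */
static bool model_can_collapse(const struct model_vma *vma)
{
	return vma->hgm && vma->lock != NULL;
}
```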
>
> > +}
> > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> > +{
> > + struct hugetlb_shared_vma_data *data = vma->vm_private_data;
> > +
> > + if (!vma || !(vma->vm_flags & VM_MAYSHARE))
> > + return false;
> > +
> > + return data && data->hgm_enabled;
> > +}
> > +
> > +/*
> > + * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM
> > + * cannot be turned off.
> > + *
> > + * PMDs cannot be shared in HGM VMAs.
> > + */
> > +int enable_hugetlb_hgm(struct vm_area_struct *vma)
> > +{
> > + int ret;
> > + struct hugetlb_shared_vma_data *data;
> > +
> > + if (!hugetlb_hgm_eligible(vma))
> > + return -EINVAL;
> > +
> > + if (hugetlb_hgm_enabled(vma))
> > + return 0;
> > +
> > + /*
> > + * We must hold the mmap lock for writing so that callers can rely on
> > + * hugetlb_hgm_enabled returning a consistent result while holding
> > + * the mmap lock for reading.
> > + */
> > + mmap_assert_write_locked(vma->vm_mm);
> > +
> > + /* HugeTLB HGM requires the VMA lock to synchronize collapsing. */
> > + ret = hugetlb_vma_data_alloc(vma);
> > + if (ret)
> > + return ret;
> > +
> > + data = vma->vm_private_data;
> > + BUG_ON(!data);
>
> Would rather have hugetlb_hgm_eligible check for vm_private_data as
> suggested above instead of the BUG here.
I don't think we'd ever actually hit the BUG() here. Please correct me
if I'm wrong, but if we are eligible for HGM, then
hugetlb_vma_data_alloc() only succeeds if the VMA data/lock has
actually been allocated, so vma->vm_private_data should never be NULL;
the BUG_ON is only there to tell the reader that. Maybe I should just
drop the BUG_ON()?
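If the check is kept at all, one alternative to BUG_ON() is the common kernel pattern of WARN_ON_ONCE() followed by an error return, so a broken invariant fails the call instead of crashing the machine. The userspace sketch below models that; model_warn_on_once() is a hypothetical stand-in for WARN_ON_ONCE(), and model_finish_enable() only mirrors the tail of enable_hugetlb_hgm().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for WARN_ON_ONCE(): report the first violation,
 * then return the condition so the caller can recover. */
static int model_warned;

static bool model_warn_on_once(bool cond, const char *what)
{
	if (cond && !model_warned) {
		fprintf(stderr, "WARNING: %s\n", what);
		model_warned = 1;
	}
	return cond;
}

struct model_data { bool hgm_enabled; };

/* After a successful allocation the data should exist; if a bug elsewhere
 * breaks that, return an error instead of crashing. */
static int model_finish_enable(int alloc_ret, struct model_data *data)
{
	if (alloc_ret)
		return alloc_ret;
	if (model_warn_on_once(!data, "VMA data missing after successful alloc"))
		return -EINVAL;
	data->hgm_enabled = true;
	return 0;
}
```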
>
> --
> Mike Kravetz
>
> > + data->hgm_enabled = true;
> > +
> > + /* We don't support PMD sharing with HGM. */
> > + hugetlb_unshare_all_pmds(vma);
> > + return 0;
> > +}
> > +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
> > +
> > /*
> > * These functions are overwritable if your architecture needs its own
> > * behavior.
> > --
> > 2.38.0.135.g90850a2211-goog
> >