From: "manish.mishra" <manish.mishra@nutanix.com>
To: James Houghton <jthoughton@google.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Muchun Song <songmuchun@bytedance.com>,
Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>,
David Rientjes <rientjes@google.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Mina Almasry <almasrymina@google.com>, Jue Wang <juew@google.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality
Date: Mon, 27 Jun 2022 19:20:58 +0530
Message-ID: <142229ce-fb50-992b-3b11-a1fb5f9175f9@nutanix.com>
In-Reply-To: <20220624173656.2033256-13-jthoughton@google.com>

On 24/06/22 11:06 pm, James Houghton wrote:
> The new function, hugetlb_split_to_shift, will optimally split the page
> table to map a particular address at a particular granularity.
>
> This is useful for punching a hole in the mapping and for mapping small
> sections of a HugeTLB page (via UFFDIO_CONTINUE, for example).
>
> Signed-off-by: James Houghton <jthoughton@google.com>
> ---
> mm/hugetlb.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 122 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3ec2a921ee6f..eaffe7b4f67c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -102,6 +102,18 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
> /* Forward declaration */
> static int hugetlb_acct_memory(struct hstate *h, long delta);
>
> +/*
> + * Find the subpage that corresponds to `addr` in `hpage`.
> + */
> +static struct page *hugetlb_find_subpage(struct hstate *h, struct page *hpage,
> +                                         unsigned long addr)
> +{
> +        size_t idx = (addr & ~huge_page_mask(h))/PAGE_SIZE;
> +
> +        BUG_ON(idx >= pages_per_huge_page(h));
> +        return &hpage[idx];
> +}
> +
> static inline bool subpool_is_free(struct hugepage_subpool *spool)
> {
>         if (spool->count)
> @@ -7044,6 +7056,116 @@ static unsigned int __shift_for_hstate(struct hstate *h)
>         for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
>                         (tmp_h) <= &hstates[hugetlb_max_hstate]; \
>                         (tmp_h)++)
> +
> +/*
> + * Given a particular address, split the HugeTLB PTE that currently maps it
> + * so that, for the given address, the PTE that maps it is `desired_shift`.
> + * This function will always split the HugeTLB PTE optimally.
> + *
> + * For example, given a HugeTLB 1G page that is mapped from VA 0 to 1G, calling
> + * this function with addr=0 and desired_shift=PAGE_SHIFT will result in these
> + * changes to the page table:
> + * 1. The PUD will be split into 2M PMDs.
> + * 2. The first PMD will be split again into 4K PTEs.
> + */
> +static int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
> +                                  const struct hugetlb_pte *hpte,
> +                                  unsigned long addr, unsigned long desired_shift)
> +{
> +        unsigned long start, end, curr;
> +        unsigned long desired_sz = 1UL << desired_shift;
> +        struct hstate *h = hstate_vma(vma);
> +        int ret;
> +        struct hugetlb_pte new_hpte;
> +        struct mmu_notifier_range range;
> +        struct page *hpage = NULL;
> +        struct page *subpage;
> +        pte_t old_entry;
> +        struct mmu_gather tlb;
> +
> +        BUG_ON(!hpte->ptep);
> +        BUG_ON(hugetlb_pte_size(hpte) == desired_sz);
Can this be BUG_ON(hugetlb_pte_size(hpte) <= desired_sz)?
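A split should always go to a strictly smaller size. With, say, a 2M
mapping and a desired_shift of 1G (hypothetical numbers), start and end
below would only be 2M apart, so at least one of the IS_ALIGNED checks
further down would trip anyway; asserting it up front seems clearer,
e.g. (just a sketch of what I mean):

        /* A split must always go to a strictly smaller size. */
        BUG_ON(hugetlb_pte_size(hpte) <= desired_sz);
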
> +
> +        start = addr & hugetlb_pte_mask(hpte);
> +        end = start + hugetlb_pte_size(hpte);
> +
> +        i_mmap_assert_write_locked(vma->vm_file->f_mapping);
As this is just changing mappings, is holding the f_mapping lock required? I
mean, is there any plan or way to use some per-process sub-lock for this in
the future?
> +
> +        BUG_ON(!hpte->ptep);
> +        /* This function only works if we are looking at a leaf-level PTE. */
> +        BUG_ON(!hugetlb_pte_none(hpte) && !hugetlb_pte_present_leaf(hpte));
> +
> +        /*
> +         * Clear the PTE so that we will allocate the PT structures when
> +         * walking the page table.
> +         */
> +        old_entry = huge_ptep_get_and_clear(mm, start, hpte->ptep);
> +
> +        if (!huge_pte_none(old_entry))
> +                hpage = pte_page(old_entry);
> +
> +        BUG_ON(!IS_ALIGNED(start, desired_sz));
> +        BUG_ON(!IS_ALIGNED(end, desired_sz));
> +
> +        for (curr = start; curr < end;) {
> +                struct hstate *tmp_h;
> +                unsigned int shift;
> +
> +                for_each_hgm_shift(h, tmp_h, shift) {
> +                        unsigned long sz = 1UL << shift;
> +
> +                        if (!IS_ALIGNED(curr, sz) || curr + sz > end)
> +                                continue;
> +                        /*
> +                         * If we are including `addr`, we need to make sure
> +                         * we are splitting down to the correct size. Go to
> +                         * a smaller size if we are not.
> +                         */
> +                        if (curr <= addr && curr + sz > addr &&
> +                            shift > desired_shift)
> +                                continue;
> +
> +                        /*
> +                         * Continue the page table walk to the level we want,
> +                         * allocate PT structures as we go.
> +                         */
As I understand it, this for_each_hgm_shift loop is just there to find the
right shift size, so the code below this line could be moved out of the loop.
No strong feelings, but it looks more proper and may make the code easier to
understand; see the sketch below.
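Roughly something like this (an untested sketch reusing the names from this
patch; `found` is a new local just for illustration):

        for (curr = start; curr < end;) {
                struct hstate *tmp_h;
                unsigned int shift;
                bool found = false;

                /* First, only search for the right size to use at curr. */
                for_each_hgm_shift(h, tmp_h, shift) {
                        unsigned long sz = 1UL << shift;

                        if (!IS_ALIGNED(curr, sz) || curr + sz > end)
                                continue;
                        if (curr <= addr && curr + sz > addr &&
                            shift > desired_shift)
                                continue;
                        found = true;
                        break;
                }
                /* We couldn't find a size that worked. */
                BUG_ON(!found);

                /* Then walk/allocate/set once, outside the search loop. */
                hugetlb_pte_copy(&new_hpte, hpte);
                ret = hugetlb_walk_to(mm, &new_hpte, curr, 1UL << shift,
                                      /*stop_at_none=*/false);
                if (ret)
                        goto err;
                BUG_ON(hugetlb_pte_size(&new_hpte) != 1UL << shift);
                /* ... make_huge_pte_with_shift()/set_huge_pte_at() as above ... */
                curr += 1UL << shift;
        }

This would also let the next: label and the goto go away.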
> +                        hugetlb_pte_copy(&new_hpte, hpte);
> +                        ret = hugetlb_walk_to(mm, &new_hpte, curr, sz,
> +                                              /*stop_at_none=*/false);
> +                        if (ret)
> +                                goto err;
> +                        BUG_ON(hugetlb_pte_size(&new_hpte) != sz);
> +                        if (hpage) {
> +                                pte_t new_entry;
> +
> +                                subpage = hugetlb_find_subpage(h, hpage, curr);
> +                                new_entry = make_huge_pte_with_shift(vma, subpage,
> +                                                huge_pte_write(old_entry),
> +                                                shift);
> +                                set_huge_pte_at(mm, curr, new_hpte.ptep, new_entry);
> +                        }
> +                        curr += sz;
> +                        goto next;
> +                }
> +                /* We couldn't find a size that worked. */
> +                BUG();
> +next:
> +                continue;
> +        }
> +
> +        mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> +                                start, end);
> +        mmu_notifier_invalidate_range_start(&range);
Sorry, I did not understand where the TLB flush will be taken care of in the
success case. I see that set_huge_pte_at() does not do it internally by
itself.
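On the success path I was expecting something like this (just to illustrate
the question; assuming flush_hugetlb_tlb_range() is the right helper here,
and that a matching mmu_notifier_invalidate_range_end() is needed as well):

        mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
                                start, end);
        mmu_notifier_invalidate_range_start(&range);
        flush_hugetlb_tlb_range(vma, start, end);
        mmu_notifier_invalidate_range_end(&range);
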
> +        return 0;
> +err:
> +        tlb_gather_mmu(&tlb, mm);
> +        /* Free any newly allocated page table entries. */
> +        hugetlb_free_range(&tlb, hpte, start, curr);
> +        /* Restore the old entry. */
> +        set_huge_pte_at(mm, start, hpte->ptep, old_entry);
> +        tlb_finish_mmu(&tlb);
> +        return ret;
> +}
> #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
>
> /*