linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	 Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>,
	David Rientjes <rientjes@google.com>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Mina Almasry <almasrymina@google.com>,
	 Jue Wang <juew@google.com>,
	Manish Mishra <manish.mishra@nutanix.com>,
	 "Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 James Houghton <jthoughton@google.com>
Subject: [RFC PATCH 13/26] hugetlb: add huge_pte_alloc_high_granularity
Date: Fri, 24 Jun 2022 17:36:43 +0000	[thread overview]
Message-ID: <20220624173656.2033256-14-jthoughton@google.com> (raw)
In-Reply-To: <20220624173656.2033256-1-jthoughton@google.com>

This function is to be used to do a HugeTLB page table walk where we may
need to split a leaf-level huge PTE into a new page table level.

Consider the case where we want to install 4K inside an empty 1G page:
1. We walk to the PUD and notice that it is pte_none.
2. We split the PUD by calling `hugetlb_split_to_shift`, creating a
   standard PUD that points to PMDs that are all pte_none.
3. We continue the PT walk to find the PMD. We split it just like we
   split the PUD.
4. We find the PTE and give it back to the caller.

To avoid concurrent splitting operations on the same page table entry,
we require that the mapping rwsem is held for writing while collapsing
and for reading when doing a high-granularity PT walk.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h | 23 ++++++++++++++
 mm/hugetlb.c            | 67 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 605aa19d8572..321f5745d87f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1176,14 +1176,37 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 }
 #endif	/* CONFIG_HUGETLB_PAGE */
 
+enum split_mode {
+	HUGETLB_SPLIT_NEVER   = 0,
+	HUGETLB_SPLIT_NONE    = 1 << 0,
+	HUGETLB_SPLIT_PRESENT = 1 << 1,
+	HUGETLB_SPLIT_ALWAYS  = HUGETLB_SPLIT_NONE | HUGETLB_SPLIT_PRESENT,
+};
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
 /* If HugeTLB high-granularity mappings are enabled for this VMA. */
 bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
+				    struct mm_struct *mm,
+				    struct vm_area_struct *vma,
+				    unsigned long addr,
+				    unsigned int desired_sz,
+				    enum split_mode mode,
+				    bool write_locked);
 #else
 static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
 	return false;
 }
+static inline int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
+					   struct mm_struct *mm,
+					   struct vm_area_struct *vma,
+					   unsigned long addr,
+					   unsigned int desired_sz,
+					   enum split_mode mode,
+					   bool write_locked)
+{
+	return -EINVAL;
+}
 #endif
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index eaffe7b4f67c..6e0c5fbfe32c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7166,6 +7166,73 @@ static int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *v
 	tlb_finish_mmu(&tlb);
 	return ret;
 }
+
+/*
+ * Similar to huge_pte_alloc except that this can be used to create or walk
+ * high-granularity mappings. It will automatically split existing HugeTLB PTEs
+ * if required by @mode. The resulting HugeTLB PTE will be returned in @hpte.
+ *
+ * There are three options for @mode:
+ *  - HUGETLB_SPLIT_NEVER   - Never split.
+ *  - HUGETLB_SPLIT_NONE    - Split empty PTEs.
+ *  - HUGETLB_SPLIT_PRESENT - Split present PTEs.
+ *  - HUGETLB_SPLIT_ALWAYS  - Split both empty and present PTEs.
+ */
+int huge_pte_alloc_high_granularity(struct hugetlb_pte *hpte,
+				    struct mm_struct *mm,
+				    struct vm_area_struct *vma,
+				    unsigned long addr,
+				    unsigned int desired_shift,
+				    enum split_mode mode,
+				    bool write_locked)
+{
+	struct address_space *mapping = vma->vm_file->f_mapping;
+	bool has_write_lock = write_locked;
+	unsigned long desired_sz = 1UL << desired_shift;
+	int ret;
+
+	BUG_ON(!hpte);
+
+	if (has_write_lock)
+		i_mmap_assert_write_locked(mapping);
+	else
+		i_mmap_assert_locked(mapping);
+
+retry:
+	ret = 0;
+	hugetlb_pte_init(hpte);
+
+	ret = hugetlb_walk_to(mm, hpte, addr, desired_sz,
+			      !(mode & HUGETLB_SPLIT_NONE));
+	if (ret || hugetlb_pte_size(hpte) == desired_sz)
+		goto out;
+
+	if (
+		((mode & HUGETLB_SPLIT_NONE) && hugetlb_pte_none(hpte)) ||
+		((mode & HUGETLB_SPLIT_PRESENT) &&
+		  hugetlb_pte_present_leaf(hpte))
+	   ) {
+		if (!has_write_lock) {
+			i_mmap_unlock_read(mapping);
+			i_mmap_lock_write(mapping);
+			has_write_lock = true;
+			goto retry;
+		}
+		ret = hugetlb_split_to_shift(mm, vma, hpte, addr,
+					     desired_shift);
+	}
+
+out:
+	if (has_write_lock && !write_locked) {
+		/* Drop the write lock. */
+		i_mmap_unlock_write(mapping);
+		i_mmap_lock_read(mapping);
+		has_write_lock = false;
+		goto retry;
+	}
+
+	return ret;
+}
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*
-- 
2.37.0.rc0.161.g10f37bed90-goog



  parent reply	other threads:[~2022-06-24 17:37 UTC|newest]

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-24 17:36 [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping James Houghton
2022-06-24 17:36 ` [RFC PATCH 01/26] hugetlb: make hstate accessor functions const James Houghton
2022-06-24 18:43   ` Mina Almasry
     [not found]   ` <e55f90f5-ba14-5d6e-8f8f-abf731b9095e@nutanix.com>
2022-06-27 12:09     ` manish.mishra
2022-06-28 17:08       ` James Houghton
2022-06-29  6:18   ` Muchun Song
2022-06-24 17:36 ` [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates James Houghton
2022-06-24 18:51   ` Mina Almasry
2022-06-27 12:08   ` manish.mishra
2022-06-28 15:35     ` James Houghton
2022-06-27 18:42   ` Mike Kravetz
2022-06-28 15:40     ` James Houghton
2022-06-29  6:39       ` Muchun Song
2022-06-29 21:06         ` Mike Kravetz
2022-06-29 21:13           ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 03/26] hugetlb: add make_huge_pte_with_shift James Houghton
2022-06-24 19:01   ` Mina Almasry
2022-06-27 12:13   ` manish.mishra
2022-06-24 17:36 ` [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument James Houghton
2022-06-27 12:26   ` manish.mishra
2022-06-27 20:51   ` Mike Kravetz
2022-06-28 15:29     ` James Houghton
2022-06-29  6:09     ` Muchun Song
2022-06-29 21:03       ` Mike Kravetz
2022-06-29 21:39         ` James Houghton
2022-06-29 22:24           ` Mike Kravetz
2022-06-30  9:35             ` Muchun Song
2022-06-30 16:23               ` James Houghton
2022-06-30 17:40                 ` Mike Kravetz
2022-07-01  3:32                 ` Muchun Song
2022-06-24 17:36 ` [RFC PATCH 05/26] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING James Houghton
2022-06-27 12:28   ` manish.mishra
2022-06-28 20:03     ` Mina Almasry
2022-06-24 17:36 ` [RFC PATCH 06/26] mm: make free_p?d_range functions public James Houghton
2022-06-27 12:31   ` manish.mishra
2022-06-28 20:35   ` Mike Kravetz
2022-07-12 20:52     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries James Houghton
2022-06-27 12:47   ` manish.mishra
2022-06-29 16:28     ` James Houghton
2022-06-28 20:25   ` Mina Almasry
2022-06-29 16:42     ` James Houghton
2022-06-28 20:44   ` Mike Kravetz
2022-06-29 16:24     ` James Houghton
2022-07-11 23:32   ` Mike Kravetz
2022-07-12  9:42     ` Dr. David Alan Gilbert
2022-07-12 17:51       ` Mike Kravetz
2022-07-15 16:35       ` Peter Xu
2022-07-15 21:52         ` Axel Rasmussen
2022-07-15 23:03           ` Peter Xu
2022-09-08 17:38   ` Peter Xu
2022-09-08 17:54     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 08/26] hugetlb: add hugetlb_free_range to free PT structures James Houghton
2022-06-27 12:52   ` manish.mishra
2022-06-28 20:27   ` Mina Almasry
2022-06-24 17:36 ` [RFC PATCH 09/26] hugetlb: add hugetlb_hgm_enabled James Houghton
2022-06-27 12:55   ` manish.mishra
2022-06-28 20:33   ` Mina Almasry
2022-09-08 18:07   ` Peter Xu
2022-09-08 18:13     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 10/26] hugetlb: add for_each_hgm_shift James Houghton
2022-06-27 13:01   ` manish.mishra
2022-06-28 21:58   ` Mina Almasry
2022-07-07 21:39     ` Mike Kravetz
2022-07-08 15:52     ` James Houghton
2022-07-09 21:55       ` Mina Almasry
2022-06-24 17:36 ` [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks James Houghton
2022-06-27 13:07   ` manish.mishra
2022-07-07 23:03     ` Mike Kravetz
2022-09-08 18:20   ` Peter Xu
2022-06-24 17:36 ` [RFC PATCH 12/26] hugetlb: add HugeTLB splitting functionality James Houghton
2022-06-27 13:50   ` manish.mishra
2022-06-29 16:10     ` James Houghton
2022-06-29 14:33   ` manish.mishra
2022-06-29 16:20     ` James Houghton
2022-06-24 17:36 ` James Houghton [this message]
2022-06-29 14:11   ` [RFC PATCH 13/26] hugetlb: add huge_pte_alloc_high_granularity manish.mishra
2022-06-24 17:36 ` [RFC PATCH 14/26] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page James Houghton
2022-06-29 14:40   ` manish.mishra
2022-06-29 15:56     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 15/26] hugetlb: make unmapping compatible with high-granularity mappings James Houghton
2022-07-19 10:19   ` manish.mishra
2022-07-19 15:58     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 16/26] hugetlb: make hugetlb_change_protection compatible with HGM James Houghton
2022-06-24 17:36 ` [RFC PATCH 17/26] hugetlb: update follow_hugetlb_page to support HGM James Houghton
2022-07-19 10:48   ` manish.mishra
2022-07-19 16:19     ` James Houghton
2022-06-24 17:36 ` [RFC PATCH 18/26] hugetlb: use struct hugetlb_pte for walk_hugetlb_range James Houghton
2022-06-24 17:36 ` [RFC PATCH 19/26] hugetlb: add HGM support for copy_hugetlb_page_range James Houghton
2022-07-11 23:41   ` Mike Kravetz
2022-07-12 17:19     ` James Houghton
2022-07-12 18:06       ` Mike Kravetz
2022-07-15 21:39         ` Axel Rasmussen
2022-06-24 17:36 ` [RFC PATCH 20/26] hugetlb: add support for high-granularity UFFDIO_CONTINUE James Houghton
2022-07-15 16:21   ` Peter Xu
2022-07-15 16:58     ` James Houghton
2022-07-15 17:20       ` Peter Xu
2022-07-20 20:58         ` James Houghton
2022-07-21 19:09           ` Peter Xu
2022-07-21 19:44             ` James Houghton
2022-07-21 19:53               ` Peter Xu
2022-06-24 17:36 ` [RFC PATCH 21/26] hugetlb: add hugetlb_collapse James Houghton
2022-06-24 17:36 ` [RFC PATCH 22/26] madvise: add uapi for HugeTLB HGM collapse: MADV_COLLAPSE James Houghton
2022-06-24 17:36 ` [RFC PATCH 23/26] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM James Houghton
2022-06-24 17:36 ` [RFC PATCH 24/26] arm64/hugetlb: add support for high-granularity mappings James Houghton
2022-06-24 17:36 ` [RFC PATCH 25/26] selftests: add HugeTLB HGM to userfaultfd selftest James Houghton
2022-06-24 17:36 ` [RFC PATCH 26/26] selftests: add HugeTLB HGM to KVM demand paging selftest James Houghton
2022-06-24 18:29 ` [RFC PATCH 00/26] hugetlb: Introduce HugeTLB high-granularity mapping Matthew Wilcox
2022-06-27 16:36   ` James Houghton
2022-06-27 17:56     ` Dr. David Alan Gilbert
2022-06-27 20:31       ` James Houghton
2022-06-28  0:04         ` Nadav Amit
2022-06-30 19:21           ` Peter Xu
2022-07-01  5:54             ` Nadav Amit
2022-06-28  8:20         ` Dr. David Alan Gilbert
2022-06-30 16:09           ` Peter Xu
2022-06-24 18:41 ` Mina Almasry
2022-06-27 16:27   ` James Houghton
2022-06-28 14:17     ` Muchun Song
2022-06-28 17:26     ` Mina Almasry
2022-06-28 17:56       ` Dr. David Alan Gilbert
2022-06-29 18:31         ` James Houghton
2022-06-29 20:39       ` Axel Rasmussen
2022-06-24 18:47 ` Matthew Wilcox
2022-06-27 16:48   ` James Houghton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220624173656.2033256-14-jthoughton@google.com \
    --to=jthoughton@google.com \
    --cc=almasrymina@google.com \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=juew@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=manish.mishra@nutanix.com \
    --cc=mike.kravetz@oracle.com \
    --cc=peterx@redhat.com \
    --cc=rientjes@google.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox