[RFC PATCH 0/3] Reduce dependence on vmas deep in hugetlb allocation code

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH 0/3] Reduce dependence on vmas deep in hugetlb allocation code
@ 2024-10-11 23:22 Ackerley Tng
  2024-10-11 23:22 ` [RFC PATCH 1/3] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma() Ackerley Tng
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Ackerley Tng @ 2024-10-11 23:22 UTC (permalink / raw)
  To: muchun.song, peterx, akpm, rientjes, fvdl, jthoughton, david
  Cc: isaku.yamahata, zhiquan1.li, fan.du, jun.miao, tabba,
	quic_eberman, roypat, jgg, jhubbard, seanjc, pbonzini,
	erdemaktas, vannapurve, ackerleytng, pgonda, linux-kernel,
	linux-mm

I hope to use these 3 patches to start a discussion on eventually
removing the need to pass a struct vma pointer when taking a folio
from the global pool (i.e. dequeue_hugetlb_folio_vma()).

Why eliminate passing the struct vma pointer?

VMAs are more related to mapping into userspace, and it would be cleaner if the
HugeTLB folio allocation process could just focus on returning a folio.

Currently, the vma struct is a convenient struct that holds pieces of
information required in the allocation process, but dequeuing should not depend
on the VMA concept.

If the vma is needed deep in the allocation process, then allocation could
become awkward, such as in HugeTLBfs's fallocate, where there is no vma (yet)
and a pseudo-vma has to be created.

Separation will help with HugeTLB unification. Taking reference from the buddy
allocator, __alloc_pages_noprof() is conceptually separate from VMAs.

I started looking into this because we want to use HugeTLB folios in guest_memfd
[1], and then I found that the HugeTLB folio allocation process is tightly
coupled with VMAs. This makes it hard to use HugeTLB folios in guest_memfd,
which does not have VMAs for private pages.

Then, I watched Peter Xu's talk at LSFMM [2] about HugeTLB unifications and
thought that these patches could also contribute to the unification effort.

As discussed at LPC 2024 [3], the general preference is for guest_memfd to use
HugeTLB folios. While that is being worked out, I hope these patches can be
separately considered and merged. I believe the patches are still useful in
improving understandability of the resv_map/subpool/hstate reservation system in
HugeTLB, and there are no functionality changes intended.

---

Why use HugeTLB folios in guest_memfd?

HugeTLB is *the* source of 1G pages in the kernel today and it would be best for
all 1G page users (HugeTLB, HugeTLBfs, or guest_memfd) on a host to draw from
the same pool of 1G pages.

This allows central tracking of all 1G pages, a precious resource on a machine.

Having a separate 1G page allocator would not only require rebuilding
of features that HugeTLB has, but also cause a split 1G pool. If both
allocators are used on a machine, it would be complicated to

(a) predetermine how many pages to put in each allocator's pool or
(b) transfer pages between the pools at runtime.

---

[1] https://lore.kernel.org/all/cover.1726009989.git.ackerleytng@google.com/T/
[2] https://youtu.be/7k-m2gTDu2k?si=ghWZ6qa1GAdaHOFP
[3] https://youtu.be/PVTjLLEpozE?si=HvdDlUc_4ElVXu5R

Ackerley Tng (3):
  mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma()
  mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv()
  mm: hugetlb: Remove unnecessary check for avoid_reserve

 mm/hugetlb.c | 57 +++++++++++++++++++++-------------------------------
 1 file changed, 23 insertions(+), 34 deletions(-)

--
2.47.0.rc1.288.g06298d1525-goog

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH 1/3] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma()
  2024-10-11 23:22 [RFC PATCH 0/3] Reduce dependence on vmas deep in hugetlb allocation code Ackerley Tng
@ 2024-10-11 23:22 ` Ackerley Tng
  2024-10-30 14:31   ` Sean Christopherson
                     ` (2 more replies)
  2024-10-11 23:22 ` [RFC PATCH 2/3] mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv() Ackerley Tng
  2024-10-11 23:22 ` [RFC PATCH 3/3] mm: hugetlb: Remove unnecessary check for avoid_reserve Ackerley Tng
  2 siblings, 3 replies; 9+ messages in thread
From: Ackerley Tng @ 2024-10-11 23:22 UTC (permalink / raw)
  To: muchun.song, peterx, akpm, rientjes, fvdl, jthoughton, david
  Cc: isaku.yamahata, zhiquan1.li, fan.du, jun.miao, tabba,
	quic_eberman, roypat, jgg, jhubbard, seanjc, pbonzini,
	erdemaktas, vannapurve, ackerleytng, pgonda, linux-kernel,
	linux-mm

Replace arguments avoid_reserve and chg in dequeue_hugetlb_folio_vma()
so dequeue_hugetlb_folio_vma() is more understandable.

The new argument, use_hstate_resv, indicates whether the folio to be
dequeued should be taken from reservations in hstate.

If use_hstate_resv is true, the folio to be dequeued should be taken
from reservations in hstate and hence h->resv_huge_pages is
decremented, and the folio is marked so that the reservation is
restored.

If use_hstate_resv is false, then a folio needs to be taken from the
pool and hence there must exist available_huge_pages(h), failing
which, goto err.

The bool use_hstate_resv can be reused within
dequeue_hugetlb_folio_vma()'s caller, alloc_hugetlb_folio().

No functional changes are intended.

As proof, the original two if conditions

!vma_has_reserves(vma, chg) && !available_huge_pages(h)

and

avoid_reserve && !available_huge_pages(h)

can be combined into

(avoid_reserve || !vma_has_reserves(vma, chg))
&& !available_huge_pages(h).

Applying de Morgan's theorem on

avoid_reserve || !vma_has_reserves(vma, chg)

yields

!avoid_reserve && vma_has_reserves(vma, chg),

hence the simplification is correct.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 mm/hugetlb.c | 33 +++++++++++----------------------
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 190fa05635f4..73165c670739 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1281,8 +1281,9 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 	}

 	/*
-	 * Only the process that called mmap() has reserves for
-	 * private mappings.
+	 * Only the process that called mmap() has reserves for private
+	 * mappings. A child process with MAP_PRIVATE mappings created by their
+	 * parent have no page reserves.
 	 */
 	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
 		/*
@@ -1394,8 +1395,7 @@ static unsigned long available_huge_pages(struct hstate *h)

 static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 				struct vm_area_struct *vma,
-				unsigned long address, int avoid_reserve,
-				long chg)
+				unsigned long address, bool use_hstate_resv)
 {
 	struct folio *folio = NULL;
 	struct mempolicy *mpol;
@@ -1403,16 +1403,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 	nodemask_t *nodemask;
 	int nid;

-	/*
-	 * A child process with MAP_PRIVATE mappings created by their parent
-	 * have no page reserves. This check ensures that reservations are
-	 * not "stolen". The child may still get SIGKILLed
-	 */
-	if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
-		goto err;
-
-	/* If reserves cannot be used, ensure enough pages are in the pool */
-	if (avoid_reserve && !available_huge_pages(h))
+	if (!use_hstate_resv && !available_huge_pages(h))
 		goto err;

 	gfp_mask = htlb_alloc_mask(h);
@@ -1430,7 +1421,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
 							nid, nodemask);

-	if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
+	if (folio && use_hstate_resv) {
 		folio_set_hugetlb_restore_reserve(folio);
 		h->resv_huge_pages--;
 	}
@@ -2973,6 +2964,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	struct mem_cgroup *memcg;
 	bool deferred_reserve;
 	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
+	bool use_hstate_resv;

 	memcg = get_mem_cgroup_from_current();
 	memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages);
@@ -3033,20 +3025,17 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	if (ret)
 		goto out_uncharge_cgroup_reservation;

+	use_hstate_resv = !avoid_reserve && vma_has_reserves(vma, gbl_chg);
+
 	spin_lock_irq(&hugetlb_lock);
-	/*
-	 * glb_chg is passed to indicate whether or not a page must be taken
-	 * from the global free pool (global change).  gbl_chg == 0 indicates
-	 * a reservation exists for the allocation.
-	 */
-	folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
+	folio = dequeue_hugetlb_folio_vma(h, vma, addr, use_hstate_resv);
 	if (!folio) {
 		spin_unlock_irq(&hugetlb_lock);
 		folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
-		if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
+		if (use_hstate_resv) {
 			folio_set_hugetlb_restore_reserve(folio);
 			h->resv_huge_pages--;
 		}
--
2.47.0.rc1.288.g06298d1525-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 1/3] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma()
  2024-10-11 23:22 ` [RFC PATCH 1/3] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma() Ackerley Tng
@ 2024-10-30 14:31   ` Sean Christopherson
  2024-11-05 17:10   ` Peter Xu
  2024-11-06 10:13   ` Oscar Salvador
  2 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-10-30 14:31 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: muchun.song, peterx, akpm, rientjes, fvdl, jthoughton, david,
	isaku.yamahata, zhiquan1.li, fan.du, jun.miao, tabba,
	quic_eberman, roypat, jgg, jhubbard, pbonzini, erdemaktas,
	vannapurve, pgonda, linux-kernel, linux-mm

On Fri, Oct 11, 2024, Ackerley Tng wrote:
> Replace arguments avoid_reserve and chg in dequeue_hugetlb_folio_vma()
> so dequeue_hugetlb_folio_vma() is more understandable.
> 
> The new argument, use_hstate_resv, indicates whether the folio to be
> dequeued should be taken from reservations in hstate.
> 
> If use_hstate_resv is true, the folio to be dequeued should be taken
> from reservations in hstate and hence h->resv_huge_pages is
> decremented, and the folio is marked so that the reservation is
> restored.
> 
> If use_hstate_resv is false, then a folio needs to be taken from the
> pool and hence there must exist available_huge_pages(h), failing
> which, goto err.
> 
> The bool use_hstate_resv can be reused within
> dequeue_hugetlb_folio_vma()'s caller, alloc_hugetlb_folio().
> 
> No functional changes are intended.
> 
> As proof, the original two if conditions
> 
> !vma_has_reserves(vma, chg) && !available_huge_pages(h)
> 
> and
> 
> avoid_reserve && !available_huge_pages(h)
> 
> can be combined into
> 
> (avoid_reserve || !vma_has_reserves(vma, chg))
> && !available_huge_pages(h).

The period here, and the comma later, are weird.

> Applying de Morgan's theorem on
> 
> avoid_reserve || !vma_has_reserves(vma, chg)
> 
> yields
> 
> !avoid_reserve && vma_has_reserves(vma, chg),
> 
> hence the simplification is correct.

Whitespace exists for a reason :-)

---

As proof, the original two if conditions

  !vma_has_reserves(vma, chg) && !available_huge_pages(h)

and

  avoid_reserve && !available_huge_pages(h)

can be combined into

  (avoid_reserve || !vma_has_reserves(vma, chg)) && !available_huge_pages(h)

Applying de Morgan's theorem on

  avoid_reserve || !vma_has_reserves(vma, chg)

yields

  !avoid_reserve && vma_has_reserves(vma, chg)

hence the simplification is correct.

---


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 1/3] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma()
  2024-10-11 23:22 ` [RFC PATCH 1/3] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma() Ackerley Tng
  2024-10-30 14:31   ` Sean Christopherson
@ 2024-11-05 17:10   ` Peter Xu
  2024-11-06 10:13   ` Oscar Salvador
  2 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2024-11-05 17:10 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: muchun.song, akpm, rientjes, fvdl, jthoughton, david,
	isaku.yamahata, zhiquan1.li, fan.du, jun.miao, tabba,
	quic_eberman, roypat, jgg, jhubbard, seanjc, pbonzini,
	erdemaktas, vannapurve, pgonda, linux-kernel, linux-mm

On Fri, Oct 11, 2024 at 11:22:36PM +0000, Ackerley Tng wrote:
> Replace arguments avoid_reserve and chg in dequeue_hugetlb_folio_vma()
> so dequeue_hugetlb_folio_vma() is more understandable.
> 
> The new argument, use_hstate_resv, indicates whether the folio to be
> dequeued should be taken from reservations in hstate.
> 
> If use_hstate_resv is true, the folio to be dequeued should be taken
> from reservations in hstate and hence h->resv_huge_pages is
> decremented, and the folio is marked so that the reservation is
> restored.
> 
> If use_hstate_resv is false, then a folio needs to be taken from the
> pool and hence there must exist available_huge_pages(h), failing
> which, goto err.
> 
> The bool use_hstate_resv can be reused within
> dequeue_hugetlb_folio_vma()'s caller, alloc_hugetlb_folio().
> 
> No functional changes are intended.
> 
> As proof, the original two if conditions
> 
> !vma_has_reserves(vma, chg) && !available_huge_pages(h)
> 
> and
> 
> avoid_reserve && !available_huge_pages(h)
> 
> can be combined into
> 
> (avoid_reserve || !vma_has_reserves(vma, chg))
> && !available_huge_pages(h).
> 
> Applying de Morgan's theorem on
> 
> avoid_reserve || !vma_has_reserves(vma, chg)
> 
> yields
> 
> !avoid_reserve && vma_has_reserves(vma, chg),
> 
> hence the simplification is correct.

Some spacing is definitely good.. as Sean pointed out.

> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
>  mm/hugetlb.c | 33 +++++++++++----------------------
>  1 file changed, 11 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 190fa05635f4..73165c670739 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1281,8 +1281,9 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
>  	}
> 
>  	/*
> -	 * Only the process that called mmap() has reserves for
> -	 * private mappings.
> +	 * Only the process that called mmap() has reserves for private
> +	 * mappings. A child process with MAP_PRIVATE mappings created by their
> +	 * parent have no page reserves.
>  	 */
>  	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
>  		/*
> @@ -1394,8 +1395,7 @@ static unsigned long available_huge_pages(struct hstate *h)
> 
>  static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
>  				struct vm_area_struct *vma,
> -				unsigned long address, int avoid_reserve,
> -				long chg)
> +				unsigned long address, bool use_hstate_resv)

Here "avoid_reserve" + "chg" is indeed confusing, especially with the prior
"if (avoid_reserve) gbl_chg = 1;".  The new flag can make it slightly
easier to understand indeed for dequeue_hugetlb_folio_vma() alone.

I still feel like there can be something more to be cleaned up here even
after your patch 2-3, but I suppose this could be seen as a small-step
forward, considering one patch change will be harder to review.  Feel free
to take:

Acked-by: Peter Xu <peterx@redhat.com>

>  {
>  	struct folio *folio = NULL;
>  	struct mempolicy *mpol;
> @@ -1403,16 +1403,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
>  	nodemask_t *nodemask;
>  	int nid;
> 
> -	/*
> -	 * A child process with MAP_PRIVATE mappings created by their parent
> -	 * have no page reserves. This check ensures that reservations are
> -	 * not "stolen". The child may still get SIGKILLed
> -	 */
> -	if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
> -		goto err;
> -
> -	/* If reserves cannot be used, ensure enough pages are in the pool */
> -	if (avoid_reserve && !available_huge_pages(h))
> +	if (!use_hstate_resv && !available_huge_pages(h))
>  		goto err;
> 
>  	gfp_mask = htlb_alloc_mask(h);
> @@ -1430,7 +1421,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
>  		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
>  							nid, nodemask);
> 
> -	if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
> +	if (folio && use_hstate_resv) {
>  		folio_set_hugetlb_restore_reserve(folio);
>  		h->resv_huge_pages--;
>  	}
> @@ -2973,6 +2964,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  	struct mem_cgroup *memcg;
>  	bool deferred_reserve;
>  	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
> +	bool use_hstate_resv;
> 
>  	memcg = get_mem_cgroup_from_current();
>  	memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages);
> @@ -3033,20 +3025,17 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  	if (ret)
>  		goto out_uncharge_cgroup_reservation;
> 
> +	use_hstate_resv = !avoid_reserve && vma_has_reserves(vma, gbl_chg);
> +
>  	spin_lock_irq(&hugetlb_lock);
> -	/*
> -	 * glb_chg is passed to indicate whether or not a page must be taken
> -	 * from the global free pool (global change).  gbl_chg == 0 indicates
> -	 * a reservation exists for the allocation.
> -	 */
> -	folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
> +	folio = dequeue_hugetlb_folio_vma(h, vma, addr, use_hstate_resv);
>  	if (!folio) {
>  		spin_unlock_irq(&hugetlb_lock);
>  		folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
>  		if (!folio)
>  			goto out_uncharge_cgroup;
>  		spin_lock_irq(&hugetlb_lock);
> -		if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
> +		if (use_hstate_resv) {
>  			folio_set_hugetlb_restore_reserve(folio);
>  			h->resv_huge_pages--;
>  		}
> --
> 2.47.0.rc1.288.g06298d1525-goog
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 1/3] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma()
  2024-10-11 23:22 ` [RFC PATCH 1/3] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma() Ackerley Tng
  2024-10-30 14:31   ` Sean Christopherson
  2024-11-05 17:10   ` Peter Xu
@ 2024-11-06 10:13   ` Oscar Salvador
  2 siblings, 0 replies; 9+ messages in thread
From: Oscar Salvador @ 2024-11-06 10:13 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: muchun.song, peterx, akpm, rientjes, fvdl, jthoughton, david,
	isaku.yamahata, zhiquan1.li, fan.du, jun.miao, tabba,
	quic_eberman, roypat, jgg, jhubbard, seanjc, pbonzini,
	erdemaktas, vannapurve, pgonda, linux-kernel, linux-mm

On Fri, Oct 11, 2024 at 11:22:36PM +0000, Ackerley Tng wrote:
> Replace arguments avoid_reserve and chg in dequeue_hugetlb_folio_vma()
> so dequeue_hugetlb_folio_vma() is more understandable.
> 
> The new argument, use_hstate_resv, indicates whether the folio to be
> dequeued should be taken from reservations in hstate.
> 
> If use_hstate_resv is true, the folio to be dequeued should be taken
> from reservations in hstate and hence h->resv_huge_pages is
> decremented, and the folio is marked so that the reservation is
> restored.
> 
> If use_hstate_resv is false, then a folio needs to be taken from the
> pool and hence there must exist available_huge_pages(h), failing
> which, goto err.
> 
> The bool use_hstate_resv can be reused within
> dequeue_hugetlb_folio_vma()'s caller, alloc_hugetlb_folio().
> 
> No functional changes are intended.
> 
> As proof, the original two if conditions
> 
> !vma_has_reserves(vma, chg) && !available_huge_pages(h)
> 
> and
> 
> avoid_reserve && !available_huge_pages(h)
> 
> can be combined into
> 
> (avoid_reserve || !vma_has_reserves(vma, chg))
> && !available_huge_pages(h).
> 
> Applying de Morgan's theorem on
> 
> avoid_reserve || !vma_has_reserves(vma, chg)
> 
> yields
> 
> !avoid_reserve && vma_has_reserves(vma, chg),
> 
> hence the simplification is correct.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

I like this, it is a nice simplification and hugetlb revervation
mechanism is already hard enough to follow.

As already mentioned, Sean's changes look easier to follow.

Reviewed-by: Oscar Salvador <osalvador@suse.de>

On a side note, we might want to convert the 'avoid_reserve' param from
alloc_hugetlb_folio into a bool, as we are using it exactly like that,
so it would seem more natutal.

> ---
>  mm/hugetlb.c | 33 +++++++++++----------------------
>  1 file changed, 11 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 190fa05635f4..73165c670739 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1281,8 +1281,9 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
>  	}
> 
>  	/*
> -	 * Only the process that called mmap() has reserves for
> -	 * private mappings.
> +	 * Only the process that called mmap() has reserves for private
> +	 * mappings. A child process with MAP_PRIVATE mappings created by their
> +	 * parent have no page reserves.

'page reserves' looks odd. I would say 'reservations'.


-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH 2/3] mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv()
  2024-10-11 23:22 [RFC PATCH 0/3] Reduce dependence on vmas deep in hugetlb allocation code Ackerley Tng
  2024-10-11 23:22 ` [RFC PATCH 1/3] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma() Ackerley Tng
@ 2024-10-11 23:22 ` Ackerley Tng
  2024-11-05 18:46   ` Peter Xu
  2024-10-11 23:22 ` [RFC PATCH 3/3] mm: hugetlb: Remove unnecessary check for avoid_reserve Ackerley Tng
  2 siblings, 1 reply; 9+ messages in thread
From: Ackerley Tng @ 2024-10-11 23:22 UTC (permalink / raw)
  To: muchun.song, peterx, akpm, rientjes, fvdl, jthoughton, david
  Cc: isaku.yamahata, zhiquan1.li, fan.du, jun.miao, tabba,
	quic_eberman, roypat, jgg, jhubbard, seanjc, pbonzini,
	erdemaktas, vannapurve, ackerleytng, pgonda, linux-kernel,
	linux-mm

With the addition of the chg parameter, vma_has_reserves() no longer
just determines whether the vma has reserves.

The comment in the vma->vm_flags & VM_NORESERVE block indicates that
this function actually computes whether or not the reserved count
should be decremented.

This refactoring also takes into account the allocation's request
parameter avoid_reserve, which helps to further simplify the calling
function alloc_hugetlb_folio().

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 mm/hugetlb.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 73165c670739..47c421eba112 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1246,9 +1246,19 @@ void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
 	hugetlb_dup_vma_private(vma);
 }

-/* Returns true if the VMA has associated reserve pages */
-static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
+/*
+ * Returns true if this allocation should use (debit) hstate reservations, based on
+ *
+ * @vma: VMA config
+ * @chg: Whether the page requirement can be satisfied using subpool reservations
+ * @avoid_reserve: Whether allocation was requested to avoid using reservations
+ */
+static bool should_use_hstate_resv(struct vm_area_struct *vma, long chg,
+				   bool avoid_reserve)
 {
+	if (avoid_reserve)
+		return false;
+
 	if (vma->vm_flags & VM_NORESERVE) {
 		/*
 		 * This address is already reserved by other process(chg == 0),
@@ -3025,7 +3035,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	if (ret)
 		goto out_uncharge_cgroup_reservation;

-	use_hstate_resv = !avoid_reserve && vma_has_reserves(vma, gbl_chg);
+	use_hstate_resv = should_use_hstate_resv(vma, gbl_chg, avoid_reserve);

 	spin_lock_irq(&hugetlb_lock);
 	folio = dequeue_hugetlb_folio_vma(h, vma, addr, use_hstate_resv);
--
2.47.0.rc1.288.g06298d1525-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 2/3] mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv()
  2024-10-11 23:22 ` [RFC PATCH 2/3] mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv() Ackerley Tng
@ 2024-11-05 18:46   ` Peter Xu
  2024-11-11  9:19     ` Oscar Salvador
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Xu @ 2024-11-05 18:46 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: muchun.song, akpm, rientjes, fvdl, jthoughton, david,
	isaku.yamahata, zhiquan1.li, fan.du, jun.miao, tabba,
	quic_eberman, roypat, jgg, jhubbard, seanjc, pbonzini,
	erdemaktas, vannapurve, pgonda, linux-kernel, linux-mm

On Fri, Oct 11, 2024 at 11:22:37PM +0000, Ackerley Tng wrote:
> With the addition of the chg parameter, vma_has_reserves() no longer
> just determines whether the vma has reserves.
> 
> The comment in the vma->vm_flags & VM_NORESERVE block indicates that
> this function actually computes whether or not the reserved count
> should be decremented.
> 
> This refactoring also takes into account the allocation's request
> parameter avoid_reserve, which helps to further simplify the calling
> function alloc_hugetlb_folio().
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

I wonder if this patch could be merged with the 3rd, IIUC this can
fundamentally be seen as a movement of what patch 3 was removed.

Furthermore, I do feel like should_use_hstate_resv() could be redundant on
its own on many things.

Let me try to justify.  Firstly, after 3 patches applied, now it looks like
this (I removed all comments to make things shorter..):

static bool should_use_hstate_resv(struct vm_area_struct *vma, long chg,
				   bool avoid_reserve)
{
	if (avoid_reserve)
		return false;

	if (vma->vm_flags & VM_NORESERVE) {
		if (vma->vm_flags & VM_MAYSHARE && chg == 0)
			return true;
		else
			return false;
	}

	if (vma->vm_flags & VM_MAYSHARE) {
		if (chg)
			return false;
		else
			return true;
	}

	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
		if (chg)
			return false;
		else
			return true;
	}

	return false;
}

Then let's look at chg==0 processing all above: what does it say?  I
suppose it means "we don't need another global reservation".  It means if
chg==0 we always will use an existing reservation.  From math POV it also
is true, so it can already be moved out ahead, IIUC, like this:

static bool should_use_hstate_resv(struct vm_area_struct *vma, long chg,
				   bool avoid_reserve)
{
	if (avoid_reserve)
		return false;

        if (chg == 0)
                return true;

	if (vma->vm_flags & VM_NORESERVE)
                return false;

	if (vma->vm_flags & VM_MAYSHARE)
                return false;

	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER))
                return false;

	return false;             <--------------------- [1]
}

Move on.  If I read it right, above [1] is exactly for avoid_reserve==1
case, where it basically says "it's !NORESERVE, private, and it's not the
vma resv owner, either fork() or CoW".  If my reading is correct, it means
after your patch 2, [1] should never be reachable at all.. I would hope
adding a panic() right above would be ok.

And irrelevant of whether my understanding is correct.. math-wise above is
also already the same as:

static bool should_use_hstate_resv(struct vm_area_struct *vma, long chg,
				   bool avoid_reserve)
{
	if (avoid_reserve)
		return false;

        if (chg == 0)
                return true;

	return false;
}

Then it makes a lot more sense now, because afaict, gbl_chg is exactly what
should_use_hstate_resv() is looking for.. only except avoid_reserve==true.

Would above make sense to you?

In short, it's about whether a patch on top of your series would further
simply this whole thing, like below:

===8<===
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 60e72214d5bf..4b1c5c4ed7be 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1245,80 +1245,6 @@ void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
 	hugetlb_dup_vma_private(vma);
 }
 
-/*
- * Returns true if this allocation should use (debit) hstate reservations, based on
- *
- * @vma: VMA config
- * @chg: Whether the page requirement can be satisfied using subpool reservations
- * @avoid_reserve: Whether allocation was requested to avoid using reservations
- */
-static bool should_use_hstate_resv(struct vm_area_struct *vma, long chg,
-				   bool avoid_reserve)
-{
-	if (avoid_reserve)
-		return false;
-
-	if (vma->vm_flags & VM_NORESERVE) {
-		/*
-		 * This address is already reserved by other process(chg == 0),
-		 * so, we should decrement reserved count. Without decrementing,
-		 * reserve count remains after releasing inode, because this
-		 * allocated page will go into page cache and is regarded as
-		 * coming from reserved pool in releasing step.  Currently, we
-		 * don't have any other solution to deal with this situation
-		 * properly, so add work-around here.
-		 */
-		if (vma->vm_flags & VM_MAYSHARE && chg == 0)
-			return true;
-		else
-			return false;
-	}
-
-	/* Shared mappings always use reserves */
-	if (vma->vm_flags & VM_MAYSHARE) {
-		/*
-		 * We know VM_NORESERVE is not set.  Therefore, there SHOULD
-		 * be a region map for all pages.  The only situation where
-		 * there is no region map is if a hole was punched via
-		 * fallocate.  In this case, there really are no reserves to
-		 * use.  This situation is indicated if chg != 0.
-		 */
-		if (chg)
-			return false;
-		else
-			return true;
-	}
-
-	/*
-	 * Only the process that called mmap() has reserves for private
-	 * mappings. A child process with MAP_PRIVATE mappings created by their
-	 * parent have no page reserves.
-	 */
-	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
-		/*
-		 * Like the shared case above, a hole punch or truncate
-		 * could have been performed on the private mapping.
-		 * Examine the value of chg to determine if reserves
-		 * actually exist or were previously consumed.
-		 * Very Subtle - The value of chg comes from a previous
-		 * call to vma_needs_reserves().  The reserve map for
-		 * private mappings has different (opposite) semantics
-		 * than that of shared mappings.  vma_needs_reserves()
-		 * has already taken this difference in semantics into
-		 * account.  Therefore, the meaning of chg is the same
-		 * as in the shared case above.  Code could easily be
-		 * combined, but keeping it separate draws attention to
-		 * subtle differences.
-		 */
-		if (chg)
-			return false;
-		else
-			return true;
-	}
-
-	return false;
-}
-
 static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
 	int nid = folio_nid(folio);
@@ -3255,7 +3181,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	}
 
-	use_hstate_resv = should_use_hstate_resv(vma, gbl_chg, avoid_reserve);
+	use_hstate_resv = avoid_reserve || !gbl_chg;
 
 	/*
 	 * charge_cgroup_reservation if this allocation is not consuming a
===8<===

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 2/3] mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv()
  2024-11-05 18:46   ` Peter Xu
@ 2024-11-11  9:19     ` Oscar Salvador
  0 siblings, 0 replies; 9+ messages in thread
From: Oscar Salvador @ 2024-11-11  9:19 UTC (permalink / raw)
  To: Peter Xu
  Cc: Ackerley Tng, muchun.song, akpm, rientjes, fvdl, jthoughton,
	david, isaku.yamahata, zhiquan1.li, fan.du, jun.miao, tabba,
	quic_eberman, roypat, jgg, jhubbard, seanjc, pbonzini,
	erdemaktas, vannapurve, pgonda, linux-kernel, linux-mm

On Tue, Nov 05, 2024 at 01:46:39PM -0500, Peter Xu wrote:
> I wonder if this patch could be merged with the 3rd, IIUC this can
> fundamentally be seen as a movement of what patch 3 was removed.

I think it makes sense to merge it, yes.

> Furthermore, I do feel like should_use_hstate_resv() could be redundant on
> its own on many things.

...
 
> Then let's look at chg==0 processing all above: what does it say?  I
> suppose it means "we don't need another global reservation".  It means if
> chg==0 we always will use an existing reservation.  From math POV it also
> is true, so it can already be moved out ahead, IIUC, like this:
> 
> static bool should_use_hstate_resv(struct vm_area_struct *vma, long chg,
> 				   bool avoid_reserve)
> {
> 	if (avoid_reserve)
> 		return false;
> 
>         if (chg == 0)
>                 return true;
> 
> 	if (vma->vm_flags & VM_NORESERVE)
>                 return false;
> 
> 	if (vma->vm_flags & VM_MAYSHARE)
>                 return false;
> 
> 	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER))
>                 return false;
> 
> 	return false;             <--------------------- [1]
> }
> 
> Move on.  If I read it right, above [1] is exactly for avoid_reserve==1
> case, where it basically says "it's !NORESERVE, private, and it's not the
> vma resv owner, either fork() or CoW".  If my reading is correct, it means
> after your patch 2, [1] should never be reachable at all.. I would hope
> adding a panic() right above would be ok.
> 
> And irrelevant of whether my understanding is correct.. math-wise above is
> also already the same as:
> 
> static bool should_use_hstate_resv(struct vm_area_struct *vma, long chg,
> 				   bool avoid_reserve)
> {
> 	if (avoid_reserve)
> 		return false;
> 
>         if (chg == 0)
>                 return true;
> 
> 	return false;
> }

I have been looking into this because hugetlb reservations always make
me uneasy, but I think you are right.

CoW and fork both set avoid_reserve to 1,

 copy_hugetlb_range
  ...
  alloc_hugetlb_folio(dst_vma, addr, 1)

 hugetlb_wp
  outside_reserve = 1
  alloc_hugetlb_folio(..., outside_reserve)

So I think you are right and this can be simplified.

I would not add a panic though, maybe some kind of warning (VM_DEBUG?).


-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH 3/3] mm: hugetlb: Remove unnecessary check for avoid_reserve
  2024-10-11 23:22 [RFC PATCH 0/3] Reduce dependence on vmas deep in hugetlb allocation code Ackerley Tng
  2024-10-11 23:22 ` [RFC PATCH 1/3] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma() Ackerley Tng
  2024-10-11 23:22 ` [RFC PATCH 2/3] mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv() Ackerley Tng
@ 2024-10-11 23:22 ` Ackerley Tng
  2 siblings, 0 replies; 9+ messages in thread
From: Ackerley Tng @ 2024-10-11 23:22 UTC (permalink / raw)
  To: muchun.song, peterx, akpm, rientjes, fvdl, jthoughton, david
  Cc: isaku.yamahata, zhiquan1.li, fan.du, jun.miao, tabba,
	quic_eberman, roypat, jgg, jhubbard, seanjc, pbonzini,
	erdemaktas, vannapurve, ackerleytng, pgonda, linux-kernel,
	linux-mm

If avoid_reserve is true, gbl_chg is not used anyway, so there is no
point in setting gbl_chg.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 mm/hugetlb.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 47c421eba112..a2e2b770a018 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3009,16 +3009,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		if (gbl_chg < 0)
 			goto out_end_reservation;

-		/*
-		 * Even though there was no reservation in the region/reserve
-		 * map, there could be reservations associated with the
-		 * subpool that can be used.  This would be indicated if the
-		 * return value of hugepage_subpool_get_pages() is zero.
-		 * However, if avoid_reserve is specified we still avoid even
-		 * the subpool reservations.
-		 */
-		if (avoid_reserve)
-			gbl_chg = 1;
 	}

 	/* If this allocation is not consuming a reservation, charge it now.
--
2.47.0.rc1.288.g06298d1525-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-11-11  9:19 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-11 23:22 [RFC PATCH 0/3] Reduce dependence on vmas deep in hugetlb allocation code Ackerley Tng
2024-10-11 23:22 ` [RFC PATCH 1/3] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma() Ackerley Tng
2024-10-30 14:31   ` Sean Christopherson
2024-11-05 17:10   ` Peter Xu
2024-11-06 10:13   ` Oscar Salvador
2024-10-11 23:22 ` [RFC PATCH 2/3] mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv() Ackerley Tng
2024-11-05 18:46   ` Peter Xu
2024-11-11  9:19     ` Oscar Salvador
2024-10-11 23:22 ` [RFC PATCH 3/3] mm: hugetlb: Remove unnecessary check for avoid_reserve Ackerley Tng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox