* [PATCH 0/3] memcg/hugetlb: Rework memcg hugetlb charging
@ 2024-11-08 21:29 Joshua Hahn
  2024-11-08 21:29 ` [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb Joshua Hahn
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Joshua Hahn @ 2024-11-08 21:29 UTC (permalink / raw)
  To: shakeel.butt
  Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
	linux-mm, linux-kernel, kernel-team

This series cleans up memcg's hugetlb charging logic by deprecating the
current memcg hugetlb try-charge + {commit, cancel} logic present in
alloc_hugetlb_folio. A single function, mem_cgroup_charge_hugetlb, takes
its place. This makes the code more maintainable by simplifying the
error path and reducing memcg's footprint in the hugetlb code.

This series introduces a few changes in the hugetlb folio allocation
error path:
(a) Instead of having multiple return points, we consolidate them into
    two: one for reaching the memcg limit or running out of memory
    (-ENOMEM), and one for hugetlb allocation failure or the hugetlb
    limit being reached (-ENOSPC).
(b) Previously, the memcg limit was checked before the folio was
    acquired, meaning the hugetlb folio was never acquired if the limit
    had been reached. Now the charge is attempted after the folio is
    acquired, so if memcg's limit is reached, the acquired folio is
    freed right away (see the sketch below).
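
For illustration, the charging tail of alloc_hugetlb_folio() after this
series reads roughly as follows (condensed from patch 2; locking and the
hugetlb_cgroup bookkeeping are elided, so this is a sketch rather than
the literal diff):

	ret = mem_cgroup_charge_hugetlb(folio, gfp);
	if (ret == -ENOMEM) {
		/* memcg limit hit: release the already-acquired hugetlb folio */
		free_huge_folio(folio);
		return ERR_PTR(-ENOMEM);
	}
	if (!ret)
		/* charged; -EOPNOTSUPP (accounting disabled) skips the stat */
		lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));

	return folio;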

This series builds on two earlier patch series: [2], which adds memcg
hugetlb counters, and [3], which deprecates charge moving and removes the
last references to mem_cgroup_cancel_charge. The request for this cleanup
can be found in [2].

Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

[1] https://lore.kernel.org/all/20231006184629.155543-1-nphamcs@gmail.com/
[2] https://lore.kernel.org/all/20241101204402.1885383-1-joshua.hahnjy@gmail.com/
[3] https://lore.kernel.org/linux-mm/20241025012304.2473312-1-shakeel.butt@linux.dev/

---
Changelog
v2:
  * Removed declaration of memcg_accounts_hugetlb from memcontrol.h
  * Moved second call to memcg_accounts_hugetlb from 2nd patch to 1st
  * Changed error behavior in alloc_hugetlb_folio: v1 included a bug
    that uncharged hugetlb_cgroup twice when memcg's limit was reached
  * mem_cgroup_charge_hugetlb no longer called with hugetlb_lock held
  * Moved mem_cgroup_hugetlb_try_charge deprecation to patch 3
  * mem_cgroup_charge_hugetlb always decrements memcg's refcount
  * Fully cleaned up mem_cgroup_{cancel,commit}_charge
  * Fixed typos
Joshua Hahn (3):
  memcg/hugetlb: Introduce memcg_accounts_hugetlb
  memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
  memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol

 include/linux/memcontrol.h | 22 +---------
 mm/hugetlb.c               | 34 +++++----------
 mm/memcontrol.c            | 89 +++++++++++++-------------------------
 3 files changed, 43 insertions(+), 102 deletions(-)

-- 
2.43.5




* [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb
  2024-11-08 21:29 [PATCH 0/3] memcg/hugetlb: Rework memcg hugetlb charging Joshua Hahn
@ 2024-11-08 21:29 ` Joshua Hahn
  2024-11-08 22:21   ` Shakeel Butt
  2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
  2024-11-08 21:29 ` [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol Joshua Hahn
  2 siblings, 1 reply; 18+ messages in thread
From: Joshua Hahn @ 2024-11-08 21:29 UTC (permalink / raw)
  To: shakeel.butt
  Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
	linux-mm, linux-kernel, kernel-team

This patch isolates the check for whether memcg accounts hugetlb.
This condition can only be true if the memcg mount option
memory_hugetlb_accounting is on, which includes hugetlb usage
in memory.current.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

---
 mm/memcontrol.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f3a9653cef0e..97f63ec9c9fb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1425,6 +1425,9 @@ unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item)
 		memcg_page_state_output_unit(item);
 }
 
+/* Forward declaration */
+bool memcg_accounts_hugetlb(void);
+
 static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 {
 	int i;
@@ -1446,7 +1449,7 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 
 #ifdef CONFIG_HUGETLB_PAGE
 		if (unlikely(memory_stats[i].idx == NR_HUGETLB) &&
-		    !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
+			!memcg_accounts_hugetlb())
 			continue;
 #endif
 		size = memcg_page_state_output(memcg, memory_stats[i].idx);
@@ -4483,6 +4486,15 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
 	return ret;
 }
 
+bool memcg_accounts_hugetlb(void)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+	return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
+#else
+	return false;
+#endif
+}
+
 /**
  * mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio
  * @memcg: memcg to charge.
@@ -4508,8 +4520,7 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
 	 * but do not attempt to commit charge later (or cancel on error) either.
 	 */
 	if (mem_cgroup_disabled() || !memcg ||
-		!cgroup_subsys_on_dfl(memory_cgrp_subsys) ||
-		!(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
+		!cgroup_subsys_on_dfl(memory_cgrp_subsys) || !memcg_accounts_hugetlb())
 		return -EOPNOTSUPP;
 
 	if (try_charge(memcg, gfp, nr_pages))
-- 
2.43.5




* [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
  2024-11-08 21:29 [PATCH 0/3] memcg/hugetlb: Rework memcg hugetlb charging Joshua Hahn
  2024-11-08 21:29 ` [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb Joshua Hahn
@ 2024-11-08 21:29 ` Joshua Hahn
  2024-11-08 22:42   ` Shakeel Butt
                     ` (3 more replies)
  2024-11-08 21:29 ` [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol Joshua Hahn
  2 siblings, 4 replies; 18+ messages in thread
From: Joshua Hahn @ 2024-11-08 21:29 UTC (permalink / raw)
  To: shakeel.butt
  Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
	linux-mm, linux-kernel, kernel-team

This patch introduces mem_cgroup_charge_hugetlb, which combines the
logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
memcg in hugetlb code, and also consolidates the error path that memcg
can take into just one point.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

---
 include/linux/memcontrol.h |  2 ++
 mm/hugetlb.c               | 34 ++++++++++++----------------------
 mm/memcontrol.c            | 19 +++++++++++++++++++
 3 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bb49e0d4b377..d90c1ac791f1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -678,6 +678,8 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
 int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
 		long nr_pages);
 
+int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp);
+
 int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
 				  gfp_t gfp, swp_entry_t entry);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fbb10e52d7ea..6f6841483039 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2967,21 +2967,13 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
 	struct folio *folio;
-	long map_chg, map_commit, nr_pages = pages_per_huge_page(h);
+	long map_chg, map_commit;
 	long gbl_chg;
-	int memcg_charge_ret, ret, idx;
+	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
-	struct mem_cgroup *memcg;
 	bool deferred_reserve;
 	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
 
-	memcg = get_mem_cgroup_from_current();
-	memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages);
-	if (memcg_charge_ret == -ENOMEM) {
-		mem_cgroup_put(memcg);
-		return ERR_PTR(-ENOMEM);
-	}
-
 	idx = hstate_index(h);
 	/*
 	 * Examine the region/reserve map to determine if the process
@@ -2989,12 +2981,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 * code of zero indicates a reservation exists (no change).
 	 */
 	map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
-	if (map_chg < 0) {
-		if (!memcg_charge_ret)
-			mem_cgroup_cancel_charge(memcg, nr_pages);
-		mem_cgroup_put(memcg);
+	if (map_chg < 0)
 		return ERR_PTR(-ENOMEM);
-	}
 
 	/*
 	 * Processes that did not create the mapping will have no
@@ -3092,10 +3080,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		}
 	}
 
-	if (!memcg_charge_ret)
-		mem_cgroup_commit_charge(folio, memcg);
-	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
-	mem_cgroup_put(memcg);
+	ret = mem_cgroup_charge_hugetlb(folio, gfp);
+	if (ret == -ENOMEM) {
+		spin_unlock_irq(&hugetlb_lock);
+		free_huge_folio(folio);
+		return ERR_PTR(-ENOMEM);
+	}
+	else if (!ret)
+		lruvec_stat_mod_folio(folio, NR_HUGETLB,
+		      pages_per_huge_page(h));
 
 	return folio;
 
@@ -3110,9 +3103,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		hugepage_subpool_put_pages(spool, 1);
 out_end_reservation:
 	vma_end_reservation(h, vma, addr);
-	if (!memcg_charge_ret)
-		mem_cgroup_cancel_charge(memcg, nr_pages);
-	mem_cgroup_put(memcg);
 	return ERR_PTR(-ENOSPC);
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 97f63ec9c9fb..95ee77fe27af 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4529,6 +4529,25 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
 	return 0;
 }
 
+int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
+{
+	struct mem_cgroup *memcg = get_mem_cgroup_from_current();
+	int ret = 0;
+
+	if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
+		!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	if (charge_memcg(folio, memcg, gfp))
+		ret = -ENOMEM;
+
+out:
+	mem_cgroup_put(memcg);
+	return ret;
+}
+
 /**
  * mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin.
  * @folio: folio to charge.
-- 
2.43.5




* [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol
  2024-11-08 21:29 [PATCH 0/3] memcg/hugetlb: Rework memcg hugetlb charging Joshua Hahn
  2024-11-08 21:29 ` [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb Joshua Hahn
  2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
@ 2024-11-08 21:29 ` Joshua Hahn
  2024-11-08 22:43   ` Shakeel Butt
  2024-11-08 23:07   ` Yosry Ahmed
  2 siblings, 2 replies; 18+ messages in thread
From: Joshua Hahn @ 2024-11-08 21:29 UTC (permalink / raw)
  To: shakeel.butt
  Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
	linux-mm, linux-kernel, kernel-team

This patch fully deprecates the mem_cgroup_{try, commit, cancel} charge
functions, as well as their hugetlb variants. Please note that this
patch relies on [1], which removes the last references (from memcg-v1)
to some of these functions.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

[1] https://lore.kernel.org/linux-mm/20241025012304.2473312-1-shakeel.butt@linux.dev/

---
 include/linux/memcontrol.h | 22 -------------
 mm/memcontrol.c            | 65 ++------------------------------------
 2 files changed, 3 insertions(+), 84 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d90c1ac791f1..75f15c4efe73 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -649,8 +649,6 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
 		page_counter_read(&memcg->memory);
 }
 
-void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg);
-
 int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);
 
 /**
@@ -675,9 +673,6 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
 	return __mem_cgroup_charge(folio, mm, gfp);
 }
 
-int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
-		long nr_pages);
-
 int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp);
 
 int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
@@ -708,7 +703,6 @@ static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
 	__mem_cgroup_uncharge_folios(folios);
 }
 
-void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages);
 void mem_cgroup_replace_folio(struct folio *old, struct folio *new);
 void mem_cgroup_migrate(struct folio *old, struct folio *new);
 
@@ -1167,23 +1161,12 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
 	return false;
 }
 
-static inline void mem_cgroup_commit_charge(struct folio *folio,
-		struct mem_cgroup *memcg)
-{
-}
-
 static inline int mem_cgroup_charge(struct folio *folio,
 		struct mm_struct *mm, gfp_t gfp)
 {
 	return 0;
 }
 
-static inline int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg,
-		gfp_t gfp, long nr_pages)
-{
-	return 0;
-}
-
 static inline int mem_cgroup_swapin_charge_folio(struct folio *folio,
 			struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
 {
@@ -1202,11 +1185,6 @@ static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
 {
 }
 
-static inline void mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
-		unsigned int nr_pages)
-{
-}
-
 static inline void mem_cgroup_replace_folio(struct folio *old,
 		struct folio *new)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 95ee77fe27af..17126d8d263d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2351,21 +2351,6 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	return 0;
 }
 
-/**
- * mem_cgroup_cancel_charge() - cancel an uncommitted try_charge() call.
- * @memcg: memcg previously charged.
- * @nr_pages: number of pages previously charged.
- */
-void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	if (mem_cgroup_is_root(memcg))
-		return;
-
-	page_counter_uncharge(&memcg->memory, nr_pages);
-	if (do_memsw_account())
-		page_counter_uncharge(&memcg->memsw, nr_pages);
-}
-
 static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
 {
 	VM_BUG_ON_FOLIO(folio_memcg_charged(folio), folio);
@@ -2379,18 +2364,6 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
 	folio->memcg_data = (unsigned long)memcg;
 }
 
-/**
- * mem_cgroup_commit_charge - commit a previously successful try_charge().
- * @folio: folio to commit the charge to.
- * @memcg: memcg previously charged.
- */
-void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg)
-{
-	css_get(&memcg->css);
-	commit_charge(folio, memcg);
-	memcg1_commit_charge(folio, memcg);
-}
-
 static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg,
 				       struct pglist_data *pgdat,
 				       enum node_stat_item idx, int nr)
@@ -4469,7 +4442,9 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
 	if (ret)
 		goto out;
 
-	mem_cgroup_commit_charge(folio, memcg);
+	css_get(&memcg->css);
+	commit_charge(folio, memcg);
+	memcg1_commit_charge(folio, memcg);
 out:
 	return ret;
 }
@@ -4495,40 +4470,6 @@ bool memcg_accounts_hugetlb(void)
 #endif
 }
 
-/**
- * mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio
- * @memcg: memcg to charge.
- * @gfp: reclaim mode.
- * @nr_pages: number of pages to charge.
- *
- * This function is called when allocating a huge page folio to determine if
- * the memcg has the capacity for it. It does not commit the charge yet,
- * as the hugetlb folio itself has not been obtained from the hugetlb pool.
- *
- * Once we have obtained the hugetlb folio, we can call
- * mem_cgroup_commit_charge() to commit the charge. If we fail to obtain the
- * folio, we should instead call mem_cgroup_cancel_charge() to undo the effect
- * of try_charge().
- *
- * Returns 0 on success. Otherwise, an error code is returned.
- */
-int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
-			long nr_pages)
-{
-	/*
-	 * If hugetlb memcg charging is not enabled, do not fail hugetlb allocation,
-	 * but do not attempt to commit charge later (or cancel on error) either.
-	 */
-	if (mem_cgroup_disabled() || !memcg ||
-		!cgroup_subsys_on_dfl(memory_cgrp_subsys) || !memcg_accounts_hugetlb())
-		return -EOPNOTSUPP;
-
-	if (try_charge(memcg, gfp, nr_pages))
-		return -ENOMEM;
-
-	return 0;
-}
-
 int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
 {
 	struct mem_cgroup *memcg = get_mem_cgroup_from_current();
-- 
2.43.5




* Re: [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb
  2024-11-08 21:29 ` [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb Joshua Hahn
@ 2024-11-08 22:21   ` Shakeel Butt
  2024-11-08 23:03     ` Yosry Ahmed
  0 siblings, 1 reply; 18+ messages in thread
From: Shakeel Butt @ 2024-11-08 22:21 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
	linux-mm, linux-kernel, kernel-team

On Fri, Nov 08, 2024 at 01:29:44PM -0800, Joshua Hahn wrote:
> This patch isolates the check for whether memcg accounts hugetlb.
> This condition can only be true if the memcg mount option
> memory_hugetlb_accounting is on, which includes hugetlb usage
> in memory.current.
> 
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> 
> ---
>  mm/memcontrol.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f3a9653cef0e..97f63ec9c9fb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1425,6 +1425,9 @@ unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item)
>  		memcg_page_state_output_unit(item);
>  }
>  
> +/* Forward declaration */
> +bool memcg_accounts_hugetlb(void);

No need for forward declaration. Just define it here and make it static.

> +
>  static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
>  {
>  	int i;
> @@ -1446,7 +1449,7 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
>  
>  #ifdef CONFIG_HUGETLB_PAGE
>  		if (unlikely(memory_stats[i].idx == NR_HUGETLB) &&
> -		    !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
> +			!memcg_accounts_hugetlb())
>  			continue;
>  #endif
>  		size = memcg_page_state_output(memcg, memory_stats[i].idx);
> @@ -4483,6 +4486,15 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
>  	return ret;
>  }
>  
> +bool memcg_accounts_hugetlb(void)
> +{
> +#ifdef CONFIG_HUGETLB_PAGE
> +	return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
> +#else
> +	return false;
> +#endif
> +}
> +
>  /**
>   * mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio
>   * @memcg: memcg to charge.
> @@ -4508,8 +4520,7 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
>  	 * but do not attempt to commit charge later (or cancel on error) either.
>  	 */
>  	if (mem_cgroup_disabled() || !memcg ||
> -		!cgroup_subsys_on_dfl(memory_cgrp_subsys) ||
> -		!(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
> +		!cgroup_subsys_on_dfl(memory_cgrp_subsys) || !memcg_accounts_hugetlb())
>  		return -EOPNOTSUPP;
>  
>  	if (try_charge(memcg, gfp, nr_pages))
> -- 
> 2.43.5
> 



* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
  2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
@ 2024-11-08 22:42   ` Shakeel Butt
  2024-11-09 18:58     ` Joshua Hahn
  2024-11-09  1:03   ` SeongJae Park
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Shakeel Butt @ 2024-11-08 22:42 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
	linux-mm, linux-kernel, kernel-team

On Fri, Nov 08, 2024 at 01:29:45PM -0800, Joshua Hahn wrote:
> This patch introduces mem_cgroup_charge_hugetlb, which combines the
> logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
> memcg in hugetlb code, and also consolidates the error path that memcg
> can take into just one point.
> 
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> 
> ---
>  include/linux/memcontrol.h |  2 ++
>  mm/hugetlb.c               | 34 ++++++++++++----------------------
>  mm/memcontrol.c            | 19 +++++++++++++++++++
>  3 files changed, 33 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index bb49e0d4b377..d90c1ac791f1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -678,6 +678,8 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
>  int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
>  		long nr_pages);
>  
> +int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp);
> +
>  int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
>  				  gfp_t gfp, swp_entry_t entry);
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fbb10e52d7ea..6f6841483039 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2967,21 +2967,13 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  	struct hugepage_subpool *spool = subpool_vma(vma);
>  	struct hstate *h = hstate_vma(vma);
>  	struct folio *folio;
> -	long map_chg, map_commit, nr_pages = pages_per_huge_page(h);
> +	long map_chg, map_commit;
>  	long gbl_chg;
> -	int memcg_charge_ret, ret, idx;
> +	int ret, idx;
>  	struct hugetlb_cgroup *h_cg = NULL;
> -	struct mem_cgroup *memcg;
>  	bool deferred_reserve;
>  	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
>  
> -	memcg = get_mem_cgroup_from_current();
> -	memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages);
> -	if (memcg_charge_ret == -ENOMEM) {
> -		mem_cgroup_put(memcg);
> -		return ERR_PTR(-ENOMEM);
> -	}
> -
>  	idx = hstate_index(h);
>  	/*
>  	 * Examine the region/reserve map to determine if the process
> @@ -2989,12 +2981,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  	 * code of zero indicates a reservation exists (no change).
>  	 */
>  	map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
> -	if (map_chg < 0) {
> -		if (!memcg_charge_ret)
> -			mem_cgroup_cancel_charge(memcg, nr_pages);
> -		mem_cgroup_put(memcg);
> +	if (map_chg < 0)
>  		return ERR_PTR(-ENOMEM);
> -	}
>  
>  	/*
>  	 * Processes that did not create the mapping will have no
> @@ -3092,10 +3080,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  		}
>  	}
>  
> -	if (!memcg_charge_ret)
> -		mem_cgroup_commit_charge(folio, memcg);
> -	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
> -	mem_cgroup_put(memcg);
> +	ret = mem_cgroup_charge_hugetlb(folio, gfp);
> +	if (ret == -ENOMEM) {
> +		spin_unlock_irq(&hugetlb_lock);

spin_unlock_irq??

> +		free_huge_folio(folio);

free_huge_folio() will call lruvec_stat_mod_folio() unconditionally but
you are only calling it on success. This may underflow the metric.

> +		return ERR_PTR(-ENOMEM);
> +	}
> +	else if (!ret)
> +		lruvec_stat_mod_folio(folio, NR_HUGETLB,
> +		      pages_per_huge_page(h));
>  
>  	return folio;
>  
> @@ -3110,9 +3103,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  		hugepage_subpool_put_pages(spool, 1);
>  out_end_reservation:
>  	vma_end_reservation(h, vma, addr);
> -	if (!memcg_charge_ret)
> -		mem_cgroup_cancel_charge(memcg, nr_pages);
> -	mem_cgroup_put(memcg);
>  	return ERR_PTR(-ENOSPC);
>  }
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 97f63ec9c9fb..95ee77fe27af 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4529,6 +4529,25 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
>  	return 0;
>  }
>  
> +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
> +{
> +	struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> +	int ret = 0;
> +
> +	if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
> +		!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> +		ret = -EOPNOTSUPP;

why EOPNOTSUPP? You need to return 0 here. We do want
lruvec_stat_mod_folio() to be called.
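
i.e., roughly (untested illustration, not a literal patch):

	if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
	    !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
		goto out;	/* ret stays 0: charging simply not enabled */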

> +		goto out;
> +	}
> +
> +	if (charge_memcg(folio, memcg, gfp))
> +		ret = -ENOMEM;
> +
> +out:
> +	mem_cgroup_put(memcg);
> +	return ret;
> +}
> +
>  /**
>   * mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin.
>   * @folio: folio to charge.
> -- 
> 2.43.5
> 



* Re: [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol
  2024-11-08 21:29 ` [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol Joshua Hahn
@ 2024-11-08 22:43   ` Shakeel Butt
  2024-11-08 23:07   ` Yosry Ahmed
  1 sibling, 0 replies; 18+ messages in thread
From: Shakeel Butt @ 2024-11-08 22:43 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
	linux-mm, linux-kernel, kernel-team

On Fri, Nov 08, 2024 at 01:29:46PM -0800, Joshua Hahn wrote:
> This patch fully deprecates the mem_cgroup_{try, commit, cancel} charge
> functions, as well as their hugetlb variants. Please note that this
> patch relies on [1], which removes the last references (from memcg-v1)
> to some of these functions.
> 
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> 

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>




* Re: [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb
  2024-11-08 22:21   ` Shakeel Butt
@ 2024-11-08 23:03     ` Yosry Ahmed
  2024-11-09 18:31       ` Joshua Hahn
  0 siblings, 1 reply; 18+ messages in thread
From: Yosry Ahmed @ 2024-11-08 23:03 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Joshua Hahn, hannes, mhocko, roman.gushchin, muchun.song, akpm,
	cgroups, linux-mm, linux-kernel, kernel-team

On Fri, Nov 8, 2024 at 2:21 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Fri, Nov 08, 2024 at 01:29:44PM -0800, Joshua Hahn wrote:
> > This patch isolates the check for whether memcg accounts hugetlb.
> > This condition can only be true if the memcg mount option
> > memory_hugetlb_accounting is on, which includes hugetlb usage
> > in memory.current.
> >
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> >
> > ---
> >  mm/memcontrol.c | 17 ++++++++++++++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f3a9653cef0e..97f63ec9c9fb 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1425,6 +1425,9 @@ unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item)
> >               memcg_page_state_output_unit(item);
> >  }
> >
> > +/* Forward declaration */
> > +bool memcg_accounts_hugetlb(void);
>
> No need for forward declaration. Just define it here and make it static.

Also please pull the #ifdef outside the function definition, e.g.

#ifdef CONFIG_HUGETLB_PAGE
static bool memcg_accounts_hugetlb(void)
{
     return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
}
#else /* CONFIG_HUGETLB_PAGE */
static bool memcg_accounts_hugetlb(void)
{
     return false;
}
#endif /* CONFIG_HUGETLB_PAGE */

>
> > +
> >  static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
> >  {
> >       int i;
> > @@ -1446,7 +1449,7 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
> >
> >  #ifdef CONFIG_HUGETLB_PAGE
> >               if (unlikely(memory_stats[i].idx == NR_HUGETLB) &&
> > -                 !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
> > +                     !memcg_accounts_hugetlb())
> >                       continue;
> >  #endif
> >               size = memcg_page_state_output(memcg, memory_stats[i].idx);
> > @@ -4483,6 +4486,15 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
> >       return ret;
> >  }
> >
> > +bool memcg_accounts_hugetlb(void)
> > +{
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +     return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
> > +#else
> > +     return false;
> > +#endif
> > +}
> > +
> >  /**
> >   * mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio
> >   * @memcg: memcg to charge.
> > @@ -4508,8 +4520,7 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
> >        * but do not attempt to commit charge later (or cancel on error) either.
> >        */
> >       if (mem_cgroup_disabled() || !memcg ||
> > -             !cgroup_subsys_on_dfl(memory_cgrp_subsys) ||
> > -             !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
> > +             !cgroup_subsys_on_dfl(memory_cgrp_subsys) || !memcg_accounts_hugetlb())
> >               return -EOPNOTSUPP;
> >
> >       if (try_charge(memcg, gfp, nr_pages))
> > --
> > 2.43.5
> >
>



* Re: [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol
  2024-11-08 21:29 ` [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol Joshua Hahn
  2024-11-08 22:43   ` Shakeel Butt
@ 2024-11-08 23:07   ` Yosry Ahmed
  2024-11-09 18:34     ` Joshua Hahn
  1 sibling, 1 reply; 18+ messages in thread
From: Yosry Ahmed @ 2024-11-08 23:07 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: shakeel.butt, hannes, mhocko, roman.gushchin, muchun.song, akpm,
	cgroups, linux-mm, linux-kernel, kernel-team

On Fri, Nov 8, 2024 at 1:30 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> This patch fully deprecates the mem_cgroup_{try, commit, cancel} charge
> functions, as well as their hugetlb variants. Please note that this
> patch relies on [1], which removes the last references (from memcg-v1)
> to some of these functions.

Nit: We are not really "deprecating" them, we are removing them.
Deprecation is usually tied to user-visible APIs that we cannot just
remove, at least not right away. Please rephrase the subject and
commit log accordingly.

>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
>
> [1] https://lore.kernel.org/linux-mm/20241025012304.2473312-1-shakeel.butt@linux.dev/
>
> ---
>  include/linux/memcontrol.h | 22 -------------
>  mm/memcontrol.c            | 65 ++------------------------------------
>  2 files changed, 3 insertions(+), 84 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d90c1ac791f1..75f15c4efe73 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -649,8 +649,6 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
>                 page_counter_read(&memcg->memory);
>  }
>
> -void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg);
> -
>  int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);
>
>  /**
> @@ -675,9 +673,6 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
>         return __mem_cgroup_charge(folio, mm, gfp);
>  }
>
> -int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
> -               long nr_pages);
> -
>  int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp);
>
>  int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
> @@ -708,7 +703,6 @@ static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
>         __mem_cgroup_uncharge_folios(folios);
>  }
>
> -void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages);
>  void mem_cgroup_replace_folio(struct folio *old, struct folio *new);
>  void mem_cgroup_migrate(struct folio *old, struct folio *new);
>
> @@ -1167,23 +1161,12 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
>         return false;
>  }
>
> -static inline void mem_cgroup_commit_charge(struct folio *folio,
> -               struct mem_cgroup *memcg)
> -{
> -}
> -
>  static inline int mem_cgroup_charge(struct folio *folio,
>                 struct mm_struct *mm, gfp_t gfp)
>  {
>         return 0;
>  }
>
> -static inline int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg,
> -               gfp_t gfp, long nr_pages)
> -{
> -       return 0;
> -}
> -
>  static inline int mem_cgroup_swapin_charge_folio(struct folio *folio,
>                         struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
>  {
> @@ -1202,11 +1185,6 @@ static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
>  {
>  }
>
> -static inline void mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
> -               unsigned int nr_pages)
> -{
> -}
> -
>  static inline void mem_cgroup_replace_folio(struct folio *old,
>                 struct folio *new)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 95ee77fe27af..17126d8d263d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2351,21 +2351,6 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
>         return 0;
>  }
>
> -/**
> - * mem_cgroup_cancel_charge() - cancel an uncommitted try_charge() call.
> - * @memcg: memcg previously charged.
> - * @nr_pages: number of pages previously charged.
> - */
> -void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
> -{
> -       if (mem_cgroup_is_root(memcg))
> -               return;
> -
> -       page_counter_uncharge(&memcg->memory, nr_pages);
> -       if (do_memsw_account())
> -               page_counter_uncharge(&memcg->memsw, nr_pages);
> -}
> -
>  static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
>  {
>         VM_BUG_ON_FOLIO(folio_memcg_charged(folio), folio);
> @@ -2379,18 +2364,6 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
>         folio->memcg_data = (unsigned long)memcg;
>  }
>
> -/**
> - * mem_cgroup_commit_charge - commit a previously successful try_charge().
> - * @folio: folio to commit the charge to.
> - * @memcg: memcg previously charged.
> - */
> -void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg)
> -{
> -       css_get(&memcg->css);
> -       commit_charge(folio, memcg);
> -       memcg1_commit_charge(folio, memcg);
> -}
> -
>  static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg,
>                                        struct pglist_data *pgdat,
>                                        enum node_stat_item idx, int nr)
> @@ -4469,7 +4442,9 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
>         if (ret)
>                 goto out;
>
> -       mem_cgroup_commit_charge(folio, memcg);
> +       css_get(&memcg->css);
> +       commit_charge(folio, memcg);
> +       memcg1_commit_charge(folio, memcg);
>  out:
>         return ret;
>  }
> @@ -4495,40 +4470,6 @@ bool memcg_accounts_hugetlb(void)
>  #endif
>  }
>
> -/**
> - * mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio
> - * @memcg: memcg to charge.
> - * @gfp: reclaim mode.
> - * @nr_pages: number of pages to charge.
> - *
> - * This function is called when allocating a huge page folio to determine if
> - * the memcg has the capacity for it. It does not commit the charge yet,
> - * as the hugetlb folio itself has not been obtained from the hugetlb pool.
> - *
> - * Once we have obtained the hugetlb folio, we can call
> - * mem_cgroup_commit_charge() to commit the charge. If we fail to obtain the
> - * folio, we should instead call mem_cgroup_cancel_charge() to undo the effect
> - * of try_charge().
> - *
> - * Returns 0 on success. Otherwise, an error code is returned.
> - */
> -int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
> -                       long nr_pages)
> -{
> -       /*
> -        * If hugetlb memcg charging is not enabled, do not fail hugetlb allocation,
> -        * but do not attempt to commit charge later (or cancel on error) either.
> -        */
> -       if (mem_cgroup_disabled() || !memcg ||
> -               !cgroup_subsys_on_dfl(memory_cgrp_subsys) || !memcg_accounts_hugetlb())
> -               return -EOPNOTSUPP;
> -
> -       if (try_charge(memcg, gfp, nr_pages))
> -               return -ENOMEM;
> -
> -       return 0;
> -}
> -
>  int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
>  {
>         struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> --
> 2.43.5
>
>



* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
  2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
  2024-11-08 22:42   ` Shakeel Butt
@ 2024-11-09  1:03   ` SeongJae Park
  2024-11-09 18:41     ` Joshua Hahn
  2024-11-09  3:42   ` kernel test robot
  2024-11-09  5:24   ` kernel test robot
  3 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2024-11-09  1:03 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: SeongJae Park, shakeel.butt, hannes, mhocko, roman.gushchin,
	muchun.song, akpm, cgroups, linux-mm, linux-kernel, kernel-team

Hi Joshua,

On Fri, 8 Nov 2024 13:29:45 -0800 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:

> This patch introduces mem_cgroup_charge_hugetlb, which combines the
> logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of

Nit.  Seems the regular expression is not technically correct?

> memcg in hugetlb code, and also consolidates the error path that memcg
> can take into just one point.
> 
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> 
> ---
>  include/linux/memcontrol.h |  2 ++
>  mm/hugetlb.c               | 34 ++++++++++++----------------------
>  mm/memcontrol.c            | 19 +++++++++++++++++++
>  3 files changed, 33 insertions(+), 22 deletions(-)
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 97f63ec9c9fb..95ee77fe27af 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4529,6 +4529,25 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
>  	return 0;
>  }
>  
> +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)

Can we add a kernel-doc comment for this function?  Maybe that for
mem_cgroup_hugetlb_try_charge() can be stolen with only small updates?
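
Something like the below (illustrative wording only, adapted from the
mem_cgroup_hugetlb_try_charge() comment) would probably do:

/**
 * mem_cgroup_charge_hugetlb - charge the memcg for a hugetlb folio
 * @folio: folio to charge.
 * @gfp: reclaim mode.
 *
 * This function is called after the hugetlb folio has been obtained from
 * the hugetlb pool, and charges it to the memcg of the current task.
 *
 * Returns 0 on success. Otherwise, an error code is returned.
 */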

> +{
> +	struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> +	int ret = 0;
> +
> +	if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
> +		!memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> +		ret = -EOPNOTSUPP;
> +		goto out;
> +	}
> +
> +	if (charge_memcg(folio, memcg, gfp))
> +		ret = -ENOMEM;
> +
> +out:
> +	mem_cgroup_put(memcg);
> +	return ret;
> +}
> +
>  /**
>   * mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin.
>   * @folio: folio to charge.
> -- 
> 2.43.5


Thanks,
SJ



* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
  2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
  2024-11-08 22:42   ` Shakeel Butt
  2024-11-09  1:03   ` SeongJae Park
@ 2024-11-09  3:42   ` kernel test robot
  2024-11-09  5:24   ` kernel test robot
  3 siblings, 0 replies; 18+ messages in thread
From: kernel test robot @ 2024-11-09  3:42 UTC (permalink / raw)
  To: Joshua Hahn, shakeel.butt
  Cc: oe-kbuild-all, hannes, mhocko, roman.gushchin, muchun.song, akpm,
	cgroups, linux-mm, linux-kernel, kernel-team

Hi Joshua,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on next-20241108]
[cannot apply to linus/master v6.12-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Joshua-Hahn/memcg-hugetlb-Introduce-memcg_accounts_hugetlb/20241109-053045
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20241108212946.2642085-3-joshua.hahnjy%40gmail.com
patch subject: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20241109/202411091159.y6yDCkdf-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241109/202411091159.y6yDCkdf-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411091159.y6yDCkdf-lkp@intel.com/

All errors (new ones prefixed by >>):

   mm/hugetlb.c: In function 'alloc_hugetlb_folio':
>> mm/hugetlb.c:3083:15: error: implicit declaration of function 'mem_cgroup_charge_hugetlb'; did you mean 'mem_cgroup_charge_skmem'? [-Werror=implicit-function-declaration]
    3083 |         ret = mem_cgroup_charge_hugetlb(folio, gfp);
         |               ^~~~~~~~~~~~~~~~~~~~~~~~~
         |               mem_cgroup_charge_skmem
   cc1: some warnings being treated as errors


vim +3083 mm/hugetlb.c

  2963	
  2964	struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
  2965					    unsigned long addr, int avoid_reserve)
  2966	{
  2967		struct hugepage_subpool *spool = subpool_vma(vma);
  2968		struct hstate *h = hstate_vma(vma);
  2969		struct folio *folio;
  2970		long map_chg, map_commit;
  2971		long gbl_chg;
  2972		int ret, idx;
  2973		struct hugetlb_cgroup *h_cg = NULL;
  2974		bool deferred_reserve;
  2975		gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
  2976	
  2977		idx = hstate_index(h);
  2978		/*
  2979		 * Examine the region/reserve map to determine if the process
  2980		 * has a reservation for the page to be allocated.  A return
  2981		 * code of zero indicates a reservation exists (no change).
  2982		 */
  2983		map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
  2984		if (map_chg < 0)
  2985			return ERR_PTR(-ENOMEM);
  2986	
  2987		/*
  2988		 * Processes that did not create the mapping will have no
  2989		 * reserves as indicated by the region/reserve map. Check
  2990		 * that the allocation will not exceed the subpool limit.
  2991		 * Allocations for MAP_NORESERVE mappings also need to be
  2992		 * checked against any subpool limit.
  2993		 */
  2994		if (map_chg || avoid_reserve) {
  2995			gbl_chg = hugepage_subpool_get_pages(spool, 1);
  2996			if (gbl_chg < 0)
  2997				goto out_end_reservation;
  2998	
  2999			/*
  3000			 * Even though there was no reservation in the region/reserve
  3001			 * map, there could be reservations associated with the
  3002			 * subpool that can be used.  This would be indicated if the
  3003			 * return value of hugepage_subpool_get_pages() is zero.
  3004			 * However, if avoid_reserve is specified we still avoid even
  3005			 * the subpool reservations.
  3006			 */
  3007			if (avoid_reserve)
  3008				gbl_chg = 1;
  3009		}
  3010	
  3011		/* If this allocation is not consuming a reservation, charge it now.
  3012		 */
  3013		deferred_reserve = map_chg || avoid_reserve;
  3014		if (deferred_reserve) {
  3015			ret = hugetlb_cgroup_charge_cgroup_rsvd(
  3016				idx, pages_per_huge_page(h), &h_cg);
  3017			if (ret)
  3018				goto out_subpool_put;
  3019		}
  3020	
  3021		ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
  3022		if (ret)
  3023			goto out_uncharge_cgroup_reservation;
  3024	
  3025		spin_lock_irq(&hugetlb_lock);
  3026		/*
  3027		 * glb_chg is passed to indicate whether or not a page must be taken
  3028		 * from the global free pool (global change).  gbl_chg == 0 indicates
  3029		 * a reservation exists for the allocation.
  3030		 */
  3031		folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
  3032		if (!folio) {
  3033			spin_unlock_irq(&hugetlb_lock);
  3034			folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
  3035			if (!folio)
  3036				goto out_uncharge_cgroup;
  3037			spin_lock_irq(&hugetlb_lock);
  3038			if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
  3039				folio_set_hugetlb_restore_reserve(folio);
  3040				h->resv_huge_pages--;
  3041			}
  3042			list_add(&folio->lru, &h->hugepage_activelist);
  3043			folio_ref_unfreeze(folio, 1);
  3044			/* Fall through */
  3045		}
  3046	
  3047		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
  3048		/* If allocation is not consuming a reservation, also store the
  3049		 * hugetlb_cgroup pointer on the page.
  3050		 */
  3051		if (deferred_reserve) {
  3052			hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
  3053							  h_cg, folio);
  3054		}
  3055	
  3056		spin_unlock_irq(&hugetlb_lock);
  3057	
  3058		hugetlb_set_folio_subpool(folio, spool);
  3059	
  3060		map_commit = vma_commit_reservation(h, vma, addr);
  3061		if (unlikely(map_chg > map_commit)) {
  3062			/*
  3063			 * The page was added to the reservation map between
  3064			 * vma_needs_reservation and vma_commit_reservation.
  3065			 * This indicates a race with hugetlb_reserve_pages.
  3066			 * Adjust for the subpool count incremented above AND
  3067			 * in hugetlb_reserve_pages for the same page.  Also,
  3068			 * the reservation count added in hugetlb_reserve_pages
  3069			 * no longer applies.
  3070			 */
  3071			long rsv_adjust;
  3072	
  3073			rsv_adjust = hugepage_subpool_put_pages(spool, 1);
  3074			hugetlb_acct_memory(h, -rsv_adjust);
  3075			if (deferred_reserve) {
  3076				spin_lock_irq(&hugetlb_lock);
  3077				hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
  3078						pages_per_huge_page(h), folio);
  3079				spin_unlock_irq(&hugetlb_lock);
  3080			}
  3081		}
  3082	
> 3083		ret = mem_cgroup_charge_hugetlb(folio, gfp);
  3084		if (ret == -ENOMEM) {
  3085			spin_unlock_irq(&hugetlb_lock);
  3086			free_huge_folio(folio);
  3087			return ERR_PTR(-ENOMEM);
  3088		}
  3089		else if (!ret)
  3090			lruvec_stat_mod_folio(folio, NR_HUGETLB,
  3091			      pages_per_huge_page(h));
  3092	
  3093		return folio;
  3094	
  3095	out_uncharge_cgroup:
  3096		hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
  3097	out_uncharge_cgroup_reservation:
  3098		if (deferred_reserve)
  3099			hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
  3100							    h_cg);
  3101	out_subpool_put:
  3102		if (map_chg || avoid_reserve)
  3103			hugepage_subpool_put_pages(spool, 1);
  3104	out_end_reservation:
  3105		vma_end_reservation(h, vma, addr);
  3106		return ERR_PTR(-ENOSPC);
  3107	}
  3108	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
  2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
                     ` (2 preceding siblings ...)
  2024-11-09  3:42   ` kernel test robot
@ 2024-11-09  5:24   ` kernel test robot
  3 siblings, 0 replies; 18+ messages in thread
From: kernel test robot @ 2024-11-09  5:24 UTC (permalink / raw)
  To: Joshua Hahn, shakeel.butt
  Cc: llvm, oe-kbuild-all, hannes, mhocko, roman.gushchin, muchun.song,
	akpm, cgroups, linux-mm, linux-kernel, kernel-team

Hi Joshua,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on next-20241108]
[cannot apply to linus/master v6.12-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Joshua-Hahn/memcg-hugetlb-Introduce-memcg_accounts_hugetlb/20241109-053045
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20241108212946.2642085-3-joshua.hahnjy%40gmail.com
patch subject: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
config: x86_64-buildonly-randconfig-002-20241109 (https://download.01.org/0day-ci/archive/20241109/202411091344.51lAbqmY-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241109/202411091344.51lAbqmY-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411091344.51lAbqmY-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from mm/hugetlb.c:8:
   In file included from include/linux/mm.h:2211:
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from mm/hugetlb.c:37:
   include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      47 |         __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages);
         |                                    ~~~~~~~~~~~ ^ ~~~
   include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      49 |                                 NR_ZONE_LRU_BASE + lru, nr_pages);
         |                                 ~~~~~~~~~~~~~~~~ ^ ~~~
>> mm/hugetlb.c:3083:8: error: call to undeclared function 'mem_cgroup_charge_hugetlb'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    3083 |         ret = mem_cgroup_charge_hugetlb(folio, gfp);
         |               ^
   mm/hugetlb.c:3083:8: note: did you mean 'mem_cgroup_charge_skmem'?
   include/linux/memcontrol.h:1613:6: note: 'mem_cgroup_charge_skmem' declared here
    1613 | bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
         |      ^
   3 warnings and 1 error generated.


vim +/mem_cgroup_charge_hugetlb +3083 mm/hugetlb.c

  2963	
  2964	struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
  2965					    unsigned long addr, int avoid_reserve)
  2966	{
  2967		struct hugepage_subpool *spool = subpool_vma(vma);
  2968		struct hstate *h = hstate_vma(vma);
  2969		struct folio *folio;
  2970		long map_chg, map_commit;
  2971		long gbl_chg;
  2972		int ret, idx;
  2973		struct hugetlb_cgroup *h_cg = NULL;
  2974		bool deferred_reserve;
  2975		gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
  2976	
  2977		idx = hstate_index(h);
  2978		/*
  2979		 * Examine the region/reserve map to determine if the process
  2980		 * has a reservation for the page to be allocated.  A return
  2981		 * code of zero indicates a reservation exists (no change).
  2982		 */
  2983		map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
  2984		if (map_chg < 0)
  2985			return ERR_PTR(-ENOMEM);
  2986	
  2987		/*
  2988		 * Processes that did not create the mapping will have no
  2989		 * reserves as indicated by the region/reserve map. Check
  2990		 * that the allocation will not exceed the subpool limit.
  2991		 * Allocations for MAP_NORESERVE mappings also need to be
  2992		 * checked against any subpool limit.
  2993		 */
  2994		if (map_chg || avoid_reserve) {
  2995			gbl_chg = hugepage_subpool_get_pages(spool, 1);
  2996			if (gbl_chg < 0)
  2997				goto out_end_reservation;
  2998	
  2999			/*
  3000			 * Even though there was no reservation in the region/reserve
  3001			 * map, there could be reservations associated with the
  3002			 * subpool that can be used.  This would be indicated if the
  3003			 * return value of hugepage_subpool_get_pages() is zero.
  3004			 * However, if avoid_reserve is specified we still avoid even
  3005			 * the subpool reservations.
  3006			 */
  3007			if (avoid_reserve)
  3008				gbl_chg = 1;
  3009		}
  3010	
  3011		/* If this allocation is not consuming a reservation, charge it now.
  3012		 */
  3013		deferred_reserve = map_chg || avoid_reserve;
  3014		if (deferred_reserve) {
  3015			ret = hugetlb_cgroup_charge_cgroup_rsvd(
  3016				idx, pages_per_huge_page(h), &h_cg);
  3017			if (ret)
  3018				goto out_subpool_put;
  3019		}
  3020	
  3021		ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
  3022		if (ret)
  3023			goto out_uncharge_cgroup_reservation;
  3024	
  3025		spin_lock_irq(&hugetlb_lock);
  3026		/*
  3027		 * glb_chg is passed to indicate whether or not a page must be taken
  3028		 * from the global free pool (global change).  gbl_chg == 0 indicates
  3029		 * a reservation exists for the allocation.
  3030		 */
  3031		folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
  3032		if (!folio) {
  3033			spin_unlock_irq(&hugetlb_lock);
  3034			folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
  3035			if (!folio)
  3036				goto out_uncharge_cgroup;
  3037			spin_lock_irq(&hugetlb_lock);
  3038			if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
  3039				folio_set_hugetlb_restore_reserve(folio);
  3040				h->resv_huge_pages--;
  3041			}
  3042			list_add(&folio->lru, &h->hugepage_activelist);
  3043			folio_ref_unfreeze(folio, 1);
  3044			/* Fall through */
  3045		}
  3046	
  3047		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
  3048		/* If allocation is not consuming a reservation, also store the
  3049		 * hugetlb_cgroup pointer on the page.
  3050		 */
  3051		if (deferred_reserve) {
  3052			hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
  3053							  h_cg, folio);
  3054		}
  3055	
  3056		spin_unlock_irq(&hugetlb_lock);
  3057	
  3058		hugetlb_set_folio_subpool(folio, spool);
  3059	
  3060		map_commit = vma_commit_reservation(h, vma, addr);
  3061		if (unlikely(map_chg > map_commit)) {
  3062			/*
  3063			 * The page was added to the reservation map between
  3064			 * vma_needs_reservation and vma_commit_reservation.
  3065			 * This indicates a race with hugetlb_reserve_pages.
  3066			 * Adjust for the subpool count incremented above AND
  3067			 * in hugetlb_reserve_pages for the same page.  Also,
  3068			 * the reservation count added in hugetlb_reserve_pages
  3069			 * no longer applies.
  3070			 */
  3071			long rsv_adjust;
  3072	
  3073			rsv_adjust = hugepage_subpool_put_pages(spool, 1);
  3074			hugetlb_acct_memory(h, -rsv_adjust);
  3075			if (deferred_reserve) {
  3076				spin_lock_irq(&hugetlb_lock);
  3077				hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
  3078						pages_per_huge_page(h), folio);
  3079				spin_unlock_irq(&hugetlb_lock);
  3080			}
  3081		}
  3082	
> 3083		ret = mem_cgroup_charge_hugetlb(folio, gfp);
  3084		if (ret == -ENOMEM) {
  3085			spin_unlock_irq(&hugetlb_lock);
  3086			free_huge_folio(folio);
  3087			return ERR_PTR(-ENOMEM);
  3088		}
  3089		else if (!ret)
  3090			lruvec_stat_mod_folio(folio, NR_HUGETLB,
  3091			      pages_per_huge_page(h));
  3092	
  3093		return folio;
  3094	
  3095	out_uncharge_cgroup:
  3096		hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
  3097	out_uncharge_cgroup_reservation:
  3098		if (deferred_reserve)
  3099			hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
  3100							    h_cg);
  3101	out_subpool_put:
  3102		if (map_chg || avoid_reserve)
  3103			hugepage_subpool_put_pages(spool, 1);
  3104	out_end_reservation:
  3105		vma_end_reservation(h, vma, addr);
  3106		return ERR_PTR(-ENOSPC);
  3107	}
  3108	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb
  2024-11-08 23:03     ` Yosry Ahmed
@ 2024-11-09 18:31       ` Joshua Hahn
  0 siblings, 0 replies; 18+ messages in thread
From: Joshua Hahn @ 2024-11-09 18:31 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Shakeel Butt, hannes, mhocko, roman.gushchin, muchun.song, akpm,
	cgroups, linux-mm, linux-kernel, kernel-team

On Fri, Nov 8, 2024 at 6:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Fri, Nov 8, 2024 at 2:21 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Fri, Nov 08, 2024 at 01:29:44PM -0800, Joshua Hahn wrote:
> > > This patch isolates the check for whether memcg accounts hugetlb.
> > > This condition can only be true if the memcg mount option
> > > memory_hugetlb_accounting is on, which includes hugetlb usage
> > > in memory.current.
> > >
> > > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> > >
> > > ---
> > >  mm/memcontrol.c | 17 ++++++++++++++---
> > >  1 file changed, 14 insertions(+), 3 deletions(-)
> > > +/* Forward declaration */
> > > +bool memcg_accounts_hugetlb(void);
> >
> > No need for forward declaration. Just define it here and make it static.
>
> Also please pull the #ifdef outside the function definition, e.g.
>
> #ifdef CONFIG_HUGETLB_PAGE
> static bool memcg_accounts_hugetlb(void)
> {
>      return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
> }
> #else /* CONFIG_HUGETLB_PAGE */
> static bool memcg_accounts_hugetlb(void)
> {
>      return false;
> }
> #endif /* CONFIG_HUGETLB_PAGE */
>

Hello Shakeel and Yosry,

Thank you for taking the time to review my patch.
Yes -- I will just define the function there and make it static. My
intention was to group the new memcg charging functions together,
which is why I added the forward declaration above. However, I agree
that it makes the code look a bit messier, which is against the
spirit of this patch series!

And Yosry, thank you for your feedback, I will separate the
definitions based on the #ifdef.

Thank you both, I hope you have a great day!
Joshua


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol
  2024-11-08 23:07   ` Yosry Ahmed
@ 2024-11-09 18:34     ` Joshua Hahn
  0 siblings, 0 replies; 18+ messages in thread
From: Joshua Hahn @ 2024-11-09 18:34 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: shakeel.butt, hannes, mhocko, roman.gushchin, muchun.song, akpm,
	cgroups, linux-mm, linux-kernel, kernel-team

On Fri, Nov 8, 2024 at 6:08 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Fri, Nov 8, 2024 at 1:30 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> >
> > This patch fully deprecates the mem_cgroup_{try, commit, cancel} charge
> > functions, as well as their hugetlb variants. Please note that this
> > patch relies on [1], which removes the last references (from memcg-v1)
> > to some of these functions.
>
> Nit: We are not really "deprecating" them, we are removing them.
> Deprecation is usually tied to user-visible APIs that we cannot just
> remove, at least not right away. Please rephrase the subject and
> commit log accordingly.
>
> >
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> >
> > [1] https://lore.kernel.org/linux-mm/20241025012304.2473312-1-shakeel.butt@linux.dev/

Hi Yosry,

Thank you for letting me know. To be completely honest, I think
I have been misusing the word here. You are correct: the goal is
not to change any user-visible functionality, so "removing" is the
better word, as you suggested. I will make this change in v3!

Have a great day,
Joshua


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
  2024-11-09  1:03   ` SeongJae Park
@ 2024-11-09 18:41     ` Joshua Hahn
  2024-11-09 20:52       ` SeongJae Park
  0 siblings, 1 reply; 18+ messages in thread
From: Joshua Hahn @ 2024-11-09 18:41 UTC (permalink / raw)
  To: SeongJae Park
  Cc: shakeel.butt, hannes, mhocko, roman.gushchin, muchun.song, akpm,
	cgroups, linux-mm, linux-kernel, kernel-team

Hello SJ, thank you for reviewing my patch!

On Fri, Nov 8, 2024 at 8:03 PM SeongJae Park <sj@kernel.org> wrote:
>
> Hi Joshua,
>
> On Fri, 8 Nov 2024 13:29:45 -0800 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> > This patch introduces mem_cgroup_charge_hugetlb, which combines the
> > logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
>
> Nit.  Seems the regular expression is not technically correct?

I see, I will expand it out to name both functions. What I meant to
say is that it combines the functionality of both functions, but there
was a typo there. I will spell both names out so that it is clearer
to readers!

> > +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
>
> Can we add a kernel-doc comment for this function?  Maybe that for
> mem_cgroup_hugetlb_try_charge() can be stolen with only small updates?

Yes, I can definitely add a kernel-doc comment for this function.
Would you mind expanding on the "stolen with only small updates"
part? Do you mean that instead of writing a completely new comment,
I can reuse the existing one, changing the function name and
modifying small parts of the description?
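
Something like the following is roughly what I have in mind (only a
sketch; the wording and the return-value description are placeholders
that I would refine for v3):

/**
 * mem_cgroup_charge_hugetlb - charge the current task's memcg for a hugetlb folio
 * @folio: folio being charged
 * @gfp: reclaim mode
 *
 * Charges an already-allocated hugetlb folio to the memcg of the
 * current task, if the memory_hugetlb_accounting mount option is
 * enabled.
 *
 * Returns 0 on success. Returns -ENOMEM if the charge fails, in which
 * case the caller is expected to free the folio.
 */
int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp);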

> Thanks,
> SJ

Thank you for your time! I hope you have a good weekend!
Joshua


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
  2024-11-08 22:42   ` Shakeel Butt
@ 2024-11-09 18:58     ` Joshua Hahn
  2024-11-11  6:33       ` Shakeel Butt
  0 siblings, 1 reply; 18+ messages in thread
From: Joshua Hahn @ 2024-11-09 18:58 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
	linux-mm, linux-kernel, kernel-team

Hello Shakeel, thank you for reviewing my patch!

On Fri, Nov 8, 2024 at 5:43 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Fri, Nov 08, 2024 at 01:29:45PM -0800, Joshua Hahn wrote:
> > This patch introduces mem_cgroup_charge_hugetlb, which combines the
> > logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
> > memcg in hugetlb code, and also consolidates the error path that memcg
> > can take into just one point.
> >
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> > -     if (!memcg_charge_ret)
> > -             mem_cgroup_commit_charge(folio, memcg);
> > -     lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
> > -     mem_cgroup_put(memcg);
> > +     ret = mem_cgroup_charge_hugetlb(folio, gfp);
> > +     if (ret == -ENOMEM) {
> > +             spin_unlock_irq(&hugetlb_lock);
>
> spin_unlock_irq??

Thank you for the catch. I completely missed this when I moved
mem_cgroup_charge_hugetlb so that it is called without the lock
held. I will fix this.
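
Concretely, since the charge now happens after hugetlb_lock has
already been dropped, the error branch should not touch the lock at
all. Something like this is what I intend (just a sketch; the exact
placement may still shift in v3):

	ret = mem_cgroup_charge_hugetlb(folio, gfp);
	if (ret == -ENOMEM) {
		/* hugetlb_lock was already dropped above; no unlock here. */
		free_huge_folio(folio);
		return ERR_PTR(-ENOMEM);
	}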

> > +             free_huge_folio(folio);
>
> free_huge_folio() will call lruvec_stat_mod_folio() unconditionally but
> you are only calling it on success. This may underflow the metric.

I was actually thinking about this too, and wondering what would
make the most sense -- in the original draft of this patch, I had the
stat increment called unconditionally as well. The idea was that even
though it does not make sense to increment the stat when there is an
error, it would eventually be corrected by free_huge_folio's
decrement. However, because nothing stops the user from reading the
stat in that window, they may temporarily see the value incremented
in memory.stat even though they were not able to obtain the page.

With that said, maybe it makes sense to increment unconditionally
if free also decrements unconditionally. This race is unlikely to
cause a real problem, although users relying on userspace monitors
of memory.stat for memory management may see some noise.

Alternatively, it might make the most sense to make both the
increment and the decrement conditional.
Thank you for your feedback, I will iterate on this for the next version!

> > +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
> > +{
> > +     struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> > +     int ret = 0;
> > +
> > +     if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
> > +             !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > +             ret = -EOPNOTSUPP;
>
> why EOPNOTSUPP? You need to return 0 here. We do want
> lruvec_stat_mod_folio() to be called.

In this case, I was just preserving the original code's return statements.
That is, in mem_cgroup_hugetlb_try_charge, the intended behavior
was to return -EOPNOTSUPP if any of these conditions were met.
If I understand the code correctly, calling lruvec_stat_mod_folio()
on this failure will be a noop, since either memcg doesn't account
hugetlb folios / there is no memcg / memcg is disabled.

With all of this said, I think your feedback makes the most sense
here, given the new semantics of the function: if there is no
memcg or memcg doesn't account hugetlb, then there is no
way that the limit can be reached! I will go forward with returning 0,
and calling lruvec_stat_mod_folio (which will be a noop).

Thank you for your detailed feedback. I wish I had caught these
errors myself, thank you for your time in reviewing my patch.

I hope you have a great rest of your weekend!
Joshua


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
  2024-11-09 18:41     ` Joshua Hahn
@ 2024-11-09 20:52       ` SeongJae Park
  0 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2024-11-09 20:52 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: SeongJae Park, shakeel.butt, hannes, mhocko, roman.gushchin,
	muchun.song, akpm, cgroups, linux-mm, linux-kernel, kernel-team

On Sat, 9 Nov 2024 13:41:31 -0500 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:

> Hello SJ, thank you for reviewing my patch!
> 
> On Fri, Nov 8, 2024 at 8:03 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > Hi Joshua,
> >
> > On Fri, 8 Nov 2024 13:29:45 -0800 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> >
> > > This patch introduces mem_cgroup_charge_hugetlb, which combines the
> > > logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
> >
> > Nit.  Seems the regular expression is not technically correct?
> 
> I see, I will change it expand it out to include both. What I meant to
> say is that it combines the functionality of both the functions, but
> I think there was a typo there. I will just expand it out so that it is
> more clear to readers!

Thank you :)

> 
> > > +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
> >
> > Can we add a kernel-doc comment for this function?  Maybe that for
> > mem_cgroup_hugetlb_try_charge() can be stolen with only small updates?
> 
> Yes, I can definitely add a kernel-doc for this function. Would
> you mind expanding on the "stolen only with small updates" part?
> Do you mean that instead of writing a completely new section
> in the kernel-doc, I can just change the name of the section
> and modify small parts of the description?

You're right.  I just thought that might save you some time, if that makes
sense.  I don't really mind either way, so do whatever you prefer :)


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
  2024-11-09 18:58     ` Joshua Hahn
@ 2024-11-11  6:33       ` Shakeel Butt
  0 siblings, 0 replies; 18+ messages in thread
From: Shakeel Butt @ 2024-11-11  6:33 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
	linux-mm, linux-kernel, kernel-team

On Sat, Nov 09, 2024 at 01:58:46PM -0500, Joshua Hahn wrote:
[...]
> 
> > > +             free_huge_folio(folio);
> >
> > free_huge_folio() will call lruvec_stat_mod_folio() unconditionally but
> > you are only calling it on success. This may underflow the metric.
> 
> I was actually thinking about this too. I was wondering what would
> make sense -- in the original draft of this patch, I had the charge
> increment be called unconditionally as well. The idea was that
> even though it would not make sense to have the stat incremented
> when there is an error, it would eventually be corrected by
> free_huge_folio's decrement. However, because there is nothing
> stopping the user from checking the stat in this period, they may
> temporarily see that the value is incremented in memory.stat,
> even though they were not able to obtain this page.
> 

On the alloc path, unconditionally calling lruvec_stat_mod_folio() after
mem_cgroup_charge_hugetlb() but before free_huge_folio() is fine. Please
note that if mem_cgroup_charge_hugetlb() has failed,
lruvec_stat_mod_folio() will not increment the memcg stat. The system
level stats may get elevated for a small time window, but that is fine.
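
Something along these lines, just to illustrate the ordering (not the
exact final code):

	ret = mem_cgroup_charge_hugetlb(folio, gfp);
	/*
	 * Unconditional: on charge failure the folio has no memcg, so only
	 * the system level NR_HUGETLB counter is briefly elevated until
	 * free_huge_folio() decrements it again.
	 */
	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
	if (ret == -ENOMEM) {
		free_huge_folio(folio);
		return ERR_PTR(-ENOMEM);
	}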

> With that said, maybe it makes sense to increment unconditionally,
> if free also decrements unconditionally. This race condition is
> not something that will cause a huge problem for the user,
> although users relying on userspace monitors for memory.stat
> to handle memory management may see some problems.
> 
> Maybe what would make the most sense is to do both
> incrementing & decrementing conditionally as well.
> Thank you for your feedback, I will iterate on this for the next version!
> 
> > > +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
> > > +{
> > > +     struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> > > +     int ret = 0;
> > > +
> > > +     if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
> > > +             !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > > +             ret = -EOPNOTSUPP;
> >
> > why EOPNOTSUPP? You need to return 0 here. We do want
> > lruvec_stat_mod_folio() to be called.
> 
> In this case, I was just preserving the original code's return statements.
> That is, in mem_cgroup_hugetlb_try_charge, the intended behavior
> was to return -EOPNOTSUPP if any of these conditions were met.
> If I understand the code correctly, calling lruvec_stat_mod_folio()
> on this failure will be a noop, since either memcg doesn't account
> hugetlb folios / there is no memcg / memcg is disabled.
> 

lruvec_stat_mod_folio() does manipulate system level stats, so even if
memcg accounting of hugetlb is not enabled, we do want the system level
stats to get updated.
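
In other words, something like the following shape for the helper (a
sketch only, assuming charge_memcg() remains the underlying charging
helper; the details are up to you):

int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
{
	struct mem_cgroup *memcg = get_mem_cgroup_from_current();
	int ret = 0;

	/*
	 * Nothing to charge: return 0 so that the caller still calls
	 * lruvec_stat_mod_folio() and the system level stat is updated.
	 */
	if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
	    !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
		goto out;

	if (charge_memcg(folio, memcg, gfp))
		ret = -ENOMEM;
out:
	mem_cgroup_put(memcg);
	return ret;
}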



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-11-11  6:33 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-08 21:29 [PATCH 0/3] memcg/hugetlb: Rework memcg hugetlb charging Joshua Hahn
2024-11-08 21:29 ` [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb Joshua Hahn
2024-11-08 22:21   ` Shakeel Butt
2024-11-08 23:03     ` Yosry Ahmed
2024-11-09 18:31       ` Joshua Hahn
2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
2024-11-08 22:42   ` Shakeel Butt
2024-11-09 18:58     ` Joshua Hahn
2024-11-11  6:33       ` Shakeel Butt
2024-11-09  1:03   ` SeongJae Park
2024-11-09 18:41     ` Joshua Hahn
2024-11-09 20:52       ` SeongJae Park
2024-11-09  3:42   ` kernel test robot
2024-11-09  5:24   ` kernel test robot
2024-11-08 21:29 ` [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol Joshua Hahn
2024-11-08 22:43   ` Shakeel Butt
2024-11-08 23:07   ` Yosry Ahmed
2024-11-09 18:34     ` Joshua Hahn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox