* [PATCH 0/3] memcg/hugetlb: Rework memcg hugetlb charging
@ 2024-11-08 21:29 Joshua Hahn
2024-11-08 21:29 ` [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb Joshua Hahn
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Joshua Hahn @ 2024-11-08 21:29 UTC (permalink / raw)
To: shakeel.butt
Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
linux-mm, linux-kernel, kernel-team
This series cleans up memcg's hugetlb charging logic by deprecating the
current memcg hugetlb try-charge + {commit, cancel} logic present in
alloc_hugetlb_folio, replacing it with a single function,
mem_cgroup_charge_hugetlb. This makes the code more maintainable by
simplifying the error path, and it reduces memcg's footprint in the
hugetlb code.
This series introduces a few changes in the hugetlb folio allocation
error path:
(a) Instead of having multiple return points, we consolidate them to
two: one for reaching the memcg limit or running out of memory
(-ENOMEM), and one for the hugetlb allocation failing or the hugetlb
limit being reached (-ENOSPC).
(b) Previously, the memcg limit was checked before the folio was
acquired, meaning the hugetlb folio was never acquired if the limit
had been reached. This series performs the charging after the folio
is acquired, meaning that if memcg's limit is reached, the acquired
folio is freed right away.
This series builds on two earlier patch series: [2], which adds memcg
hugetlb counters, and [3], which deprecates charge moving and removes
the last references to mem_cgroup_cancel_charge. The request for this
cleanup can be found in [2].
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
[1] https://lore.kernel.org/all/20231006184629.155543-1-nphamcs@gmail.com/
[2] https://lore.kernel.org/all/20241101204402.1885383-1-joshua.hahnjy@gmail.com/
[3] https://lore.kernel.org/linux-mm/20241025012304.2473312-1-shakeel.butt@linux.dev/
---
Changelog
v2:
* Removed declaration of memcg_accounts_hugetlb from memcontrol.h
* Moved second call to memcg_accounts_hugetlb from 2nd patch to 1st
* Changed error behavior in alloc_hugetlb_folio: v1 included a bug
that uncharged hugetlb_cgroup twice when memcg's limit was reached
* mem_cgroup_charge_hugetlb no longer called with hugetlb_lock held
* Moved mem_cgroup_hugetlb_try_charge deprecation to patch 3
* mem_cgroup_charge_hugetlb always decrements memcg's refcount
* Fully cleaned up mem_cgroup_{cancel,commit}_charge
* Fixed typos
Joshua Hahn (3):
memcg/hugetlb: Introduce memcg_accounts_hugetlb
memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol
include/linux/memcontrol.h | 22 +---------
mm/hugetlb.c | 34 +++++----------
mm/memcontrol.c | 89 +++++++++++++-------------------------
3 files changed, 43 insertions(+), 102 deletions(-)
--
2.43.5
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb
2024-11-08 21:29 [PATCH 0/3] memcg/hugetlb: Rework memcg hugetlb charging Joshua Hahn
@ 2024-11-08 21:29 ` Joshua Hahn
2024-11-08 22:21 ` Shakeel Butt
2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
2024-11-08 21:29 ` [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol Joshua Hahn
2 siblings, 1 reply; 18+ messages in thread
From: Joshua Hahn @ 2024-11-08 21:29 UTC (permalink / raw)
To: shakeel.butt
Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
linux-mm, linux-kernel, kernel-team
This patch isolates the check for whether memcg accounts hugetlb into
a helper, memcg_accounts_hugetlb. The condition can only be true if
the memcg mount option memory_hugetlb_accounting is on, which includes
hugetlb usage in memory.current.
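For context, memory_hugetlb_accounting is a cgroup v2 mount option; a
sketch of enabling it and observing the effect (the cgroup path is an
assumption about a typical setup, and remounting requires root):

```shell
# Remount the cgroup v2 hierarchy with hugetlb accounting enabled.
mount -o remount,memory_hugetlb_accounting none /sys/fs/cgroup

# With the option on, hugetlb usage is included in memory.current:
cat /sys/fs/cgroup/<your-cgroup>/memory.current
```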
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
mm/memcontrol.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f3a9653cef0e..97f63ec9c9fb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1425,6 +1425,9 @@ unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item)
memcg_page_state_output_unit(item);
}
+/* Forward declaration */
+bool memcg_accounts_hugetlb(void);
+
static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
{
int i;
@@ -1446,7 +1449,7 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
#ifdef CONFIG_HUGETLB_PAGE
if (unlikely(memory_stats[i].idx == NR_HUGETLB) &&
- !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
+ !memcg_accounts_hugetlb())
continue;
#endif
size = memcg_page_state_output(memcg, memory_stats[i].idx);
@@ -4483,6 +4486,15 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
return ret;
}
+bool memcg_accounts_hugetlb(void)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+ return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
+#else
+ return false;
+#endif
+}
+
/**
* mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio
* @memcg: memcg to charge.
@@ -4508,8 +4520,7 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
* but do not attempt to commit charge later (or cancel on error) either.
*/
if (mem_cgroup_disabled() || !memcg ||
- !cgroup_subsys_on_dfl(memory_cgrp_subsys) ||
- !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
+ !cgroup_subsys_on_dfl(memory_cgrp_subsys) || !memcg_accounts_hugetlb())
return -EOPNOTSUPP;
if (try_charge(memcg, gfp, nr_pages))
--
2.43.5
* [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
2024-11-08 21:29 [PATCH 0/3] memcg/hugetlb: Rework memcg hugetlb charging Joshua Hahn
2024-11-08 21:29 ` [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb Joshua Hahn
@ 2024-11-08 21:29 ` Joshua Hahn
2024-11-08 22:42 ` Shakeel Butt
` (3 more replies)
2024-11-08 21:29 ` [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol Joshua Hahn
2 siblings, 4 replies; 18+ messages in thread
From: Joshua Hahn @ 2024-11-08 21:29 UTC (permalink / raw)
To: shakeel.butt
Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
linux-mm, linux-kernel, kernel-team
This patch introduces mem_cgroup_charge_hugetlb, which combines the
logic of mem_cgroup_hugetlb_try_charge and mem_cgroup_commit_charge.
This reduces memcg's footprint in the hugetlb code, and it also
consolidates the error paths that memcg can take into a single point.
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
include/linux/memcontrol.h | 2 ++
mm/hugetlb.c | 34 ++++++++++++----------------------
mm/memcontrol.c | 19 +++++++++++++++++++
3 files changed, 33 insertions(+), 22 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bb49e0d4b377..d90c1ac791f1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -678,6 +678,8 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
long nr_pages);
+int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp);
+
int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
gfp_t gfp, swp_entry_t entry);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fbb10e52d7ea..6f6841483039 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2967,21 +2967,13 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
struct hugepage_subpool *spool = subpool_vma(vma);
struct hstate *h = hstate_vma(vma);
struct folio *folio;
- long map_chg, map_commit, nr_pages = pages_per_huge_page(h);
+ long map_chg, map_commit;
long gbl_chg;
- int memcg_charge_ret, ret, idx;
+ int ret, idx;
struct hugetlb_cgroup *h_cg = NULL;
- struct mem_cgroup *memcg;
bool deferred_reserve;
gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
- memcg = get_mem_cgroup_from_current();
- memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages);
- if (memcg_charge_ret == -ENOMEM) {
- mem_cgroup_put(memcg);
- return ERR_PTR(-ENOMEM);
- }
-
idx = hstate_index(h);
/*
* Examine the region/reserve map to determine if the process
@@ -2989,12 +2981,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
* code of zero indicates a reservation exists (no change).
*/
map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
- if (map_chg < 0) {
- if (!memcg_charge_ret)
- mem_cgroup_cancel_charge(memcg, nr_pages);
- mem_cgroup_put(memcg);
+ if (map_chg < 0)
return ERR_PTR(-ENOMEM);
- }
/*
* Processes that did not create the mapping will have no
@@ -3092,10 +3080,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
}
}
- if (!memcg_charge_ret)
- mem_cgroup_commit_charge(folio, memcg);
- lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
- mem_cgroup_put(memcg);
+ ret = mem_cgroup_charge_hugetlb(folio, gfp);
+ if (ret == -ENOMEM) {
+ spin_unlock_irq(&hugetlb_lock);
+ free_huge_folio(folio);
+ return ERR_PTR(-ENOMEM);
+ }
+ else if (!ret)
+ lruvec_stat_mod_folio(folio, NR_HUGETLB,
+ pages_per_huge_page(h));
return folio;
@@ -3110,9 +3103,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
hugepage_subpool_put_pages(spool, 1);
out_end_reservation:
vma_end_reservation(h, vma, addr);
- if (!memcg_charge_ret)
- mem_cgroup_cancel_charge(memcg, nr_pages);
- mem_cgroup_put(memcg);
return ERR_PTR(-ENOSPC);
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 97f63ec9c9fb..95ee77fe27af 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4529,6 +4529,25 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
return 0;
}
+int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
+{
+ struct mem_cgroup *memcg = get_mem_cgroup_from_current();
+ int ret = 0;
+
+ if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
+ !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
+ if (charge_memcg(folio, memcg, gfp))
+ ret = -ENOMEM;
+
+out:
+ mem_cgroup_put(memcg);
+ return ret;
+}
+
/**
* mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin.
* @folio: folio to charge.
--
2.43.5
* [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol
2024-11-08 21:29 [PATCH 0/3] memcg/hugetlb: Rework memcg hugetlb charging Joshua Hahn
2024-11-08 21:29 ` [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb Joshua Hahn
2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
@ 2024-11-08 21:29 ` Joshua Hahn
2024-11-08 22:43 ` Shakeel Butt
2024-11-08 23:07 ` Yosry Ahmed
2 siblings, 2 replies; 18+ messages in thread
From: Joshua Hahn @ 2024-11-08 21:29 UTC (permalink / raw)
To: shakeel.butt
Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
linux-mm, linux-kernel, kernel-team
This patch fully deprecates the mem_cgroup_{try, commit, cancel} charge
functions, as well as their hugetlb variants. Please note that this
patch relies on [1], which removes the last references (from memcg-v1)
to some of these functions.
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
[1] https://lore.kernel.org/linux-mm/20241025012304.2473312-1-shakeel.butt@linux.dev/
---
include/linux/memcontrol.h | 22 -------------
mm/memcontrol.c | 65 ++------------------------------------
2 files changed, 3 insertions(+), 84 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d90c1ac791f1..75f15c4efe73 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -649,8 +649,6 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
page_counter_read(&memcg->memory);
}
-void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg);
-
int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);
/**
@@ -675,9 +673,6 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
return __mem_cgroup_charge(folio, mm, gfp);
}
-int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
- long nr_pages);
-
int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp);
int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
@@ -708,7 +703,6 @@ static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
__mem_cgroup_uncharge_folios(folios);
}
-void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages);
void mem_cgroup_replace_folio(struct folio *old, struct folio *new);
void mem_cgroup_migrate(struct folio *old, struct folio *new);
@@ -1167,23 +1161,12 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
return false;
}
-static inline void mem_cgroup_commit_charge(struct folio *folio,
- struct mem_cgroup *memcg)
-{
-}
-
static inline int mem_cgroup_charge(struct folio *folio,
struct mm_struct *mm, gfp_t gfp)
{
return 0;
}
-static inline int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg,
- gfp_t gfp, long nr_pages)
-{
- return 0;
-}
-
static inline int mem_cgroup_swapin_charge_folio(struct folio *folio,
struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
{
@@ -1202,11 +1185,6 @@ static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
{
}
-static inline void mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
- unsigned int nr_pages)
-{
-}
-
static inline void mem_cgroup_replace_folio(struct folio *old,
struct folio *new)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 95ee77fe27af..17126d8d263d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2351,21 +2351,6 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
return 0;
}
-/**
- * mem_cgroup_cancel_charge() - cancel an uncommitted try_charge() call.
- * @memcg: memcg previously charged.
- * @nr_pages: number of pages previously charged.
- */
-void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
- if (mem_cgroup_is_root(memcg))
- return;
-
- page_counter_uncharge(&memcg->memory, nr_pages);
- if (do_memsw_account())
- page_counter_uncharge(&memcg->memsw, nr_pages);
-}
-
static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
{
VM_BUG_ON_FOLIO(folio_memcg_charged(folio), folio);
@@ -2379,18 +2364,6 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
folio->memcg_data = (unsigned long)memcg;
}
-/**
- * mem_cgroup_commit_charge - commit a previously successful try_charge().
- * @folio: folio to commit the charge to.
- * @memcg: memcg previously charged.
- */
-void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg)
-{
- css_get(&memcg->css);
- commit_charge(folio, memcg);
- memcg1_commit_charge(folio, memcg);
-}
-
static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg,
struct pglist_data *pgdat,
enum node_stat_item idx, int nr)
@@ -4469,7 +4442,9 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
if (ret)
goto out;
- mem_cgroup_commit_charge(folio, memcg);
+ css_get(&memcg->css);
+ commit_charge(folio, memcg);
+ memcg1_commit_charge(folio, memcg);
out:
return ret;
}
@@ -4495,40 +4470,6 @@ bool memcg_accounts_hugetlb(void)
#endif
}
-/**
- * mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio
- * @memcg: memcg to charge.
- * @gfp: reclaim mode.
- * @nr_pages: number of pages to charge.
- *
- * This function is called when allocating a huge page folio to determine if
- * the memcg has the capacity for it. It does not commit the charge yet,
- * as the hugetlb folio itself has not been obtained from the hugetlb pool.
- *
- * Once we have obtained the hugetlb folio, we can call
- * mem_cgroup_commit_charge() to commit the charge. If we fail to obtain the
- * folio, we should instead call mem_cgroup_cancel_charge() to undo the effect
- * of try_charge().
- *
- * Returns 0 on success. Otherwise, an error code is returned.
- */
-int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
- long nr_pages)
-{
- /*
- * If hugetlb memcg charging is not enabled, do not fail hugetlb allocation,
- * but do not attempt to commit charge later (or cancel on error) either.
- */
- if (mem_cgroup_disabled() || !memcg ||
- !cgroup_subsys_on_dfl(memory_cgrp_subsys) || !memcg_accounts_hugetlb())
- return -EOPNOTSUPP;
-
- if (try_charge(memcg, gfp, nr_pages))
- return -ENOMEM;
-
- return 0;
-}
-
int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
{
struct mem_cgroup *memcg = get_mem_cgroup_from_current();
--
2.43.5
* Re: [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb
2024-11-08 21:29 ` [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb Joshua Hahn
@ 2024-11-08 22:21 ` Shakeel Butt
2024-11-08 23:03 ` Yosry Ahmed
0 siblings, 1 reply; 18+ messages in thread
From: Shakeel Butt @ 2024-11-08 22:21 UTC (permalink / raw)
To: Joshua Hahn
Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
linux-mm, linux-kernel, kernel-team
On Fri, Nov 08, 2024 at 01:29:44PM -0800, Joshua Hahn wrote:
> This patch isolates the check for whether memcg accounts hugetlb.
> This condition can only be true if the memcg mount option
> memory_hugetlb_accounting is on, which includes hugetlb usage
> in memory.current.
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
>
> ---
> mm/memcontrol.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f3a9653cef0e..97f63ec9c9fb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1425,6 +1425,9 @@ unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item)
> memcg_page_state_output_unit(item);
> }
>
> +/* Forward declaration */
> +bool memcg_accounts_hugetlb(void);
No need for forward declaration. Just define it here and make it static.
> +
> static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
> {
> int i;
> @@ -1446,7 +1449,7 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
>
> #ifdef CONFIG_HUGETLB_PAGE
> if (unlikely(memory_stats[i].idx == NR_HUGETLB) &&
> - !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
> + !memcg_accounts_hugetlb())
> continue;
> #endif
> size = memcg_page_state_output(memcg, memory_stats[i].idx);
> @@ -4483,6 +4486,15 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
> return ret;
> }
>
> +bool memcg_accounts_hugetlb(void)
> +{
> +#ifdef CONFIG_HUGETLB_PAGE
> + return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
> +#else
> + return false;
> +#endif
> +}
> +
> /**
> * mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio
> * @memcg: memcg to charge.
> @@ -4508,8 +4520,7 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
> * but do not attempt to commit charge later (or cancel on error) either.
> */
> if (mem_cgroup_disabled() || !memcg ||
> - !cgroup_subsys_on_dfl(memory_cgrp_subsys) ||
> - !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
> + !cgroup_subsys_on_dfl(memory_cgrp_subsys) || !memcg_accounts_hugetlb())
> return -EOPNOTSUPP;
>
> if (try_charge(memcg, gfp, nr_pages))
> --
> 2.43.5
>
* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
@ 2024-11-08 22:42 ` Shakeel Butt
2024-11-09 18:58 ` Joshua Hahn
2024-11-09 1:03 ` SeongJae Park
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: Shakeel Butt @ 2024-11-08 22:42 UTC (permalink / raw)
To: Joshua Hahn
Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
linux-mm, linux-kernel, kernel-team
On Fri, Nov 08, 2024 at 01:29:45PM -0800, Joshua Hahn wrote:
> This patch introduces mem_cgroup_charge_hugetlb, which combines the
> logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
> memcg in hugetlb code, and also consolidates the error path that memcg
> can take into just one point.
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
>
> ---
> include/linux/memcontrol.h | 2 ++
> mm/hugetlb.c | 34 ++++++++++++----------------------
> mm/memcontrol.c | 19 +++++++++++++++++++
> 3 files changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index bb49e0d4b377..d90c1ac791f1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -678,6 +678,8 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
> int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
> long nr_pages);
>
> +int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp);
> +
> int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
> gfp_t gfp, swp_entry_t entry);
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fbb10e52d7ea..6f6841483039 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2967,21 +2967,13 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> struct hugepage_subpool *spool = subpool_vma(vma);
> struct hstate *h = hstate_vma(vma);
> struct folio *folio;
> - long map_chg, map_commit, nr_pages = pages_per_huge_page(h);
> + long map_chg, map_commit;
> long gbl_chg;
> - int memcg_charge_ret, ret, idx;
> + int ret, idx;
> struct hugetlb_cgroup *h_cg = NULL;
> - struct mem_cgroup *memcg;
> bool deferred_reserve;
> gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
>
> - memcg = get_mem_cgroup_from_current();
> - memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages);
> - if (memcg_charge_ret == -ENOMEM) {
> - mem_cgroup_put(memcg);
> - return ERR_PTR(-ENOMEM);
> - }
> -
> idx = hstate_index(h);
> /*
> * Examine the region/reserve map to determine if the process
> @@ -2989,12 +2981,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> * code of zero indicates a reservation exists (no change).
> */
> map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
> - if (map_chg < 0) {
> - if (!memcg_charge_ret)
> - mem_cgroup_cancel_charge(memcg, nr_pages);
> - mem_cgroup_put(memcg);
> + if (map_chg < 0)
> return ERR_PTR(-ENOMEM);
> - }
>
> /*
> * Processes that did not create the mapping will have no
> @@ -3092,10 +3080,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> }
> }
>
> - if (!memcg_charge_ret)
> - mem_cgroup_commit_charge(folio, memcg);
> - lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
> - mem_cgroup_put(memcg);
> + ret = mem_cgroup_charge_hugetlb(folio, gfp);
> + if (ret == -ENOMEM) {
> + spin_unlock_irq(&hugetlb_lock);
spin_unlock_irq??
> + free_huge_folio(folio);
free_huge_folio() will call lruvec_stat_mod_folio() unconditionally but
you are only calling it on success. This may underflow the metric.
> + return ERR_PTR(-ENOMEM);
> + }
> + else if (!ret)
> + lruvec_stat_mod_folio(folio, NR_HUGETLB,
> + pages_per_huge_page(h));
>
> return folio;
>
> @@ -3110,9 +3103,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> hugepage_subpool_put_pages(spool, 1);
> out_end_reservation:
> vma_end_reservation(h, vma, addr);
> - if (!memcg_charge_ret)
> - mem_cgroup_cancel_charge(memcg, nr_pages);
> - mem_cgroup_put(memcg);
> return ERR_PTR(-ENOSPC);
> }
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 97f63ec9c9fb..95ee77fe27af 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4529,6 +4529,25 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
> return 0;
> }
>
> +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
> +{
> + struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> + int ret = 0;
> +
> + if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
> + !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> + ret = -EOPNOTSUPP;
why EOPNOTSUPP? You need to return 0 here. We do want
lruvec_stat_mod_folio() to be called.
> + goto out;
> + }
> +
> + if (charge_memcg(folio, memcg, gfp))
> + ret = -ENOMEM;
> +
> +out:
> + mem_cgroup_put(memcg);
> + return ret;
> +}
> +
> /**
> * mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin.
> * @folio: folio to charge.
> --
> 2.43.5
>
* Re: [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol
2024-11-08 21:29 ` [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol Joshua Hahn
@ 2024-11-08 22:43 ` Shakeel Butt
2024-11-08 23:07 ` Yosry Ahmed
1 sibling, 0 replies; 18+ messages in thread
From: Shakeel Butt @ 2024-11-08 22:43 UTC (permalink / raw)
To: Joshua Hahn
Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
linux-mm, linux-kernel, kernel-team
On Fri, Nov 08, 2024 at 01:29:46PM -0800, Joshua Hahn wrote:
> This patch fully deprecates the mem_cgroup_{try, commit, cancel} charge
> functions, as well as their hugetlb variants. Please note that this
> patch relies on [1], which removes the last references (from memcg-v1)
> to some of these functions.
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
* Re: [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb
2024-11-08 22:21 ` Shakeel Butt
@ 2024-11-08 23:03 ` Yosry Ahmed
2024-11-09 18:31 ` Joshua Hahn
0 siblings, 1 reply; 18+ messages in thread
From: Yosry Ahmed @ 2024-11-08 23:03 UTC (permalink / raw)
To: Shakeel Butt
Cc: Joshua Hahn, hannes, mhocko, roman.gushchin, muchun.song, akpm,
cgroups, linux-mm, linux-kernel, kernel-team
On Fri, Nov 8, 2024 at 2:21 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Fri, Nov 08, 2024 at 01:29:44PM -0800, Joshua Hahn wrote:
> > This patch isolates the check for whether memcg accounts hugetlb.
> > This condition can only be true if the memcg mount option
> > memory_hugetlb_accounting is on, which includes hugetlb usage
> > in memory.current.
> >
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> >
> > ---
> > mm/memcontrol.c | 17 ++++++++++++++---
> > 1 file changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f3a9653cef0e..97f63ec9c9fb 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1425,6 +1425,9 @@ unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item)
> > memcg_page_state_output_unit(item);
> > }
> >
> > +/* Forward declaration */
> > +bool memcg_accounts_hugetlb(void);
>
> No need for forward declaration. Just define it here and make it static.
Also please pull the #ifdef outside the function definition, e.g.
#ifdef CONFIG_HUGETLB_PAGE
static bool memcg_accounts_hugetlb(void)
{
	return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
}
#else /* CONFIG_HUGETLB_PAGE */
static bool memcg_accounts_hugetlb(void)
{
	return false;
}
#endif /* CONFIG_HUGETLB_PAGE */
>
> > +
> > static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
> > {
> > int i;
> > @@ -1446,7 +1449,7 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
> >
> > #ifdef CONFIG_HUGETLB_PAGE
> > if (unlikely(memory_stats[i].idx == NR_HUGETLB) &&
> > - !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
> > + !memcg_accounts_hugetlb())
> > continue;
> > #endif
> > size = memcg_page_state_output(memcg, memory_stats[i].idx);
> > @@ -4483,6 +4486,15 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
> > return ret;
> > }
> >
> > +bool memcg_accounts_hugetlb(void)
> > +{
> > +#ifdef CONFIG_HUGETLB_PAGE
> > + return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
> > +#else
> > + return false;
> > +#endif
> > +}
> > +
> > /**
> > * mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio
> > * @memcg: memcg to charge.
> > @@ -4508,8 +4520,7 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
> > * but do not attempt to commit charge later (or cancel on error) either.
> > */
> > if (mem_cgroup_disabled() || !memcg ||
> > - !cgroup_subsys_on_dfl(memory_cgrp_subsys) ||
> > - !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
> > + !cgroup_subsys_on_dfl(memory_cgrp_subsys) || !memcg_accounts_hugetlb())
> > return -EOPNOTSUPP;
> >
> > if (try_charge(memcg, gfp, nr_pages))
> > --
> > 2.43.5
> >
>
* Re: [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol
2024-11-08 21:29 ` [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol Joshua Hahn
2024-11-08 22:43 ` Shakeel Butt
@ 2024-11-08 23:07 ` Yosry Ahmed
2024-11-09 18:34 ` Joshua Hahn
1 sibling, 1 reply; 18+ messages in thread
From: Yosry Ahmed @ 2024-11-08 23:07 UTC (permalink / raw)
To: Joshua Hahn
Cc: shakeel.butt, hannes, mhocko, roman.gushchin, muchun.song, akpm,
cgroups, linux-mm, linux-kernel, kernel-team
On Fri, Nov 8, 2024 at 1:30 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> This patch fully deprecates the mem_cgroup_{try, commit, cancel} charge
> functions, as well as their hugetlb variants. Please note that this
> patch relies on [1], which removes the last references (from memcg-v1)
> to some of these functions.
Nit: We are not really "deprecating" them, we are removing them.
Deprecation is usually tied to user-visible APIs that we cannot just
remove, at least not right away. Please rephrase the subject and
commit log accordingly.
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
>
> [1] https://lore.kernel.org/linux-mm/20241025012304.2473312-1-shakeel.butt@linux.dev/
>
> ---
> include/linux/memcontrol.h | 22 -------------
> mm/memcontrol.c | 65 ++------------------------------------
> 2 files changed, 3 insertions(+), 84 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d90c1ac791f1..75f15c4efe73 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -649,8 +649,6 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
> page_counter_read(&memcg->memory);
> }
>
> -void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg);
> -
> int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);
>
> /**
> @@ -675,9 +673,6 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
> return __mem_cgroup_charge(folio, mm, gfp);
> }
>
> -int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
> - long nr_pages);
> -
> int mem_cgroup_charge_hugetlb(struct folio* folio, gfp_t gfp);
>
> int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
> @@ -708,7 +703,6 @@ static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
> __mem_cgroup_uncharge_folios(folios);
> }
>
> -void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages);
> void mem_cgroup_replace_folio(struct folio *old, struct folio *new);
> void mem_cgroup_migrate(struct folio *old, struct folio *new);
>
> @@ -1167,23 +1161,12 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
> return false;
> }
>
> -static inline void mem_cgroup_commit_charge(struct folio *folio,
> - struct mem_cgroup *memcg)
> -{
> -}
> -
> static inline int mem_cgroup_charge(struct folio *folio,
> struct mm_struct *mm, gfp_t gfp)
> {
> return 0;
> }
>
> -static inline int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg,
> - gfp_t gfp, long nr_pages)
> -{
> - return 0;
> -}
> -
> static inline int mem_cgroup_swapin_charge_folio(struct folio *folio,
> struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
> {
> @@ -1202,11 +1185,6 @@ static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
> {
> }
>
> -static inline void mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
> - unsigned int nr_pages)
> -{
> -}
> -
> static inline void mem_cgroup_replace_folio(struct folio *old,
> struct folio *new)
> {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 95ee77fe27af..17126d8d263d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2351,21 +2351,6 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
> return 0;
> }
>
> -/**
> - * mem_cgroup_cancel_charge() - cancel an uncommitted try_charge() call.
> - * @memcg: memcg previously charged.
> - * @nr_pages: number of pages previously charged.
> - */
> -void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
> -{
> - if (mem_cgroup_is_root(memcg))
> - return;
> -
> - page_counter_uncharge(&memcg->memory, nr_pages);
> - if (do_memsw_account())
> - page_counter_uncharge(&memcg->memsw, nr_pages);
> -}
> -
> static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
> {
> VM_BUG_ON_FOLIO(folio_memcg_charged(folio), folio);
> @@ -2379,18 +2364,6 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
> folio->memcg_data = (unsigned long)memcg;
> }
>
> -/**
> - * mem_cgroup_commit_charge - commit a previously successful try_charge().
> - * @folio: folio to commit the charge to.
> - * @memcg: memcg previously charged.
> - */
> -void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg)
> -{
> - css_get(&memcg->css);
> - commit_charge(folio, memcg);
> - memcg1_commit_charge(folio, memcg);
> -}
> -
> static inline void __mod_objcg_mlstate(struct obj_cgroup *objcg,
> struct pglist_data *pgdat,
> enum node_stat_item idx, int nr)
> @@ -4469,7 +4442,9 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
> if (ret)
> goto out;
>
> - mem_cgroup_commit_charge(folio, memcg);
> + css_get(&memcg->css);
> + commit_charge(folio, memcg);
> + memcg1_commit_charge(folio, memcg);
> out:
> return ret;
> }
> @@ -4495,40 +4470,6 @@ bool memcg_accounts_hugetlb(void)
> #endif
> }
>
> -/**
> - * mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio
> - * @memcg: memcg to charge.
> - * @gfp: reclaim mode.
> - * @nr_pages: number of pages to charge.
> - *
> - * This function is called when allocating a huge page folio to determine if
> - * the memcg has the capacity for it. It does not commit the charge yet,
> - * as the hugetlb folio itself has not been obtained from the hugetlb pool.
> - *
> - * Once we have obtained the hugetlb folio, we can call
> - * mem_cgroup_commit_charge() to commit the charge. If we fail to obtain the
> - * folio, we should instead call mem_cgroup_cancel_charge() to undo the effect
> - * of try_charge().
> - *
> - * Returns 0 on success. Otherwise, an error code is returned.
> - */
> -int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
> - long nr_pages)
> -{
> - /*
> - * If hugetlb memcg charging is not enabled, do not fail hugetlb allocation,
> - * but do not attempt to commit charge later (or cancel on error) either.
> - */
> - if (mem_cgroup_disabled() || !memcg ||
> - !cgroup_subsys_on_dfl(memory_cgrp_subsys) || !memcg_accounts_hugetlb())
> - return -EOPNOTSUPP;
> -
> - if (try_charge(memcg, gfp, nr_pages))
> - return -ENOMEM;
> -
> - return 0;
> -}
> -
> int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
> {
> struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> --
> 2.43.5
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
2024-11-08 22:42 ` Shakeel Butt
@ 2024-11-09 1:03 ` SeongJae Park
2024-11-09 18:41 ` Joshua Hahn
2024-11-09 3:42 ` kernel test robot
2024-11-09 5:24 ` kernel test robot
3 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2024-11-09 1:03 UTC (permalink / raw)
To: Joshua Hahn
Cc: SeongJae Park, shakeel.butt, hannes, mhocko, roman.gushchin,
muchun.song, akpm, cgroups, linux-mm, linux-kernel, kernel-team
Hi Joshua,
On Fri, 8 Nov 2024 13:29:45 -0800 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> This patch introduces mem_cgroup_charge_hugetlb, which combines the
> logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
Nit. Seems the regular expression is not technically correct?
> memcg in hugetlb code, and also consolidates the error path that memcg
> can take into just one point.
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
>
> ---
> include/linux/memcontrol.h | 2 ++
> mm/hugetlb.c | 34 ++++++++++++----------------------
> mm/memcontrol.c | 19 +++++++++++++++++++
> 3 files changed, 33 insertions(+), 22 deletions(-)
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 97f63ec9c9fb..95ee77fe27af 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4529,6 +4529,25 @@ int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
> return 0;
> }
>
> +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
Can we add a kernel-doc comment for this function? Maybe that for
mem_cgroup_hugetlb_try_charge() can be stolen with only small updates?
> +{
> + struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> + int ret = 0;
> +
> + if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
> + !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> + ret = -EOPNOTSUPP;
> + goto out;
> + }
> +
> + if (charge_memcg(folio, memcg, gfp))
> + ret = -ENOMEM;
> +
> +out:
> + mem_cgroup_put(memcg);
> + return ret;
> +}
> +
> /**
> * mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin.
> * @folio: folio to charge.
> --
> 2.43.5
Thanks,
SJ
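For what it's worth, a kernel-doc along the lines SJ requests might look like the sketch below. The wording is illustrative only, adapted from the mem_cgroup_hugetlb_try_charge() comment this series removes, and the return values reflect the function as posted in this version:

```c
/**
 * mem_cgroup_charge_hugetlb() - charge the current memcg for a hugetlb folio.
 * @folio: folio to charge; already obtained from the hugetlb pool.
 * @gfp: reclaim mode.
 *
 * Unlike the old try/commit/cancel protocol, this charges and commits in a
 * single step, after the folio has been acquired. On -ENOMEM the caller is
 * expected to free the folio.
 *
 * Returns 0 on success, -EOPNOTSUPP if the memcg does not account hugetlb
 * pages, and -ENOMEM if charging the memcg fails.
 */
```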
* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
2024-11-08 22:42 ` Shakeel Butt
2024-11-09 1:03 ` SeongJae Park
@ 2024-11-09 3:42 ` kernel test robot
2024-11-09 5:24 ` kernel test robot
3 siblings, 0 replies; 18+ messages in thread
From: kernel test robot @ 2024-11-09 3:42 UTC (permalink / raw)
To: Joshua Hahn, shakeel.butt
Cc: oe-kbuild-all, hannes, mhocko, roman.gushchin, muchun.song, akpm,
cgroups, linux-mm, linux-kernel, kernel-team
Hi Joshua,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on next-20241108]
[cannot apply to linus/master v6.12-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Joshua-Hahn/memcg-hugetlb-Introduce-memcg_accounts_hugetlb/20241109-053045
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20241108212946.2642085-3-joshua.hahnjy%40gmail.com
patch subject: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20241109/202411091159.y6yDCkdf-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241109/202411091159.y6yDCkdf-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411091159.y6yDCkdf-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/hugetlb.c: In function 'alloc_hugetlb_folio':
>> mm/hugetlb.c:3083:15: error: implicit declaration of function 'mem_cgroup_charge_hugetlb'; did you mean 'mem_cgroup_charge_skmem'? [-Werror=implicit-function-declaration]
3083 | ret = mem_cgroup_charge_hugetlb(folio, gfp);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
| mem_cgroup_charge_skmem
cc1: some warnings being treated as errors
vim +3083 mm/hugetlb.c
2963
2964 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
2965 unsigned long addr, int avoid_reserve)
2966 {
2967 struct hugepage_subpool *spool = subpool_vma(vma);
2968 struct hstate *h = hstate_vma(vma);
2969 struct folio *folio;
2970 long map_chg, map_commit;
2971 long gbl_chg;
2972 int ret, idx;
2973 struct hugetlb_cgroup *h_cg = NULL;
2974 bool deferred_reserve;
2975 gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
2976
2977 idx = hstate_index(h);
2978 /*
2979 * Examine the region/reserve map to determine if the process
2980 * has a reservation for the page to be allocated. A return
2981 * code of zero indicates a reservation exists (no change).
2982 */
2983 map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
2984 if (map_chg < 0)
2985 return ERR_PTR(-ENOMEM);
2986
2987 /*
2988 * Processes that did not create the mapping will have no
2989 * reserves as indicated by the region/reserve map. Check
2990 * that the allocation will not exceed the subpool limit.
2991 * Allocations for MAP_NORESERVE mappings also need to be
2992 * checked against any subpool limit.
2993 */
2994 if (map_chg || avoid_reserve) {
2995 gbl_chg = hugepage_subpool_get_pages(spool, 1);
2996 if (gbl_chg < 0)
2997 goto out_end_reservation;
2998
2999 /*
3000 * Even though there was no reservation in the region/reserve
3001 * map, there could be reservations associated with the
3002 * subpool that can be used. This would be indicated if the
3003 * return value of hugepage_subpool_get_pages() is zero.
3004 * However, if avoid_reserve is specified we still avoid even
3005 * the subpool reservations.
3006 */
3007 if (avoid_reserve)
3008 gbl_chg = 1;
3009 }
3010
3011 /* If this allocation is not consuming a reservation, charge it now.
3012 */
3013 deferred_reserve = map_chg || avoid_reserve;
3014 if (deferred_reserve) {
3015 ret = hugetlb_cgroup_charge_cgroup_rsvd(
3016 idx, pages_per_huge_page(h), &h_cg);
3017 if (ret)
3018 goto out_subpool_put;
3019 }
3020
3021 ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
3022 if (ret)
3023 goto out_uncharge_cgroup_reservation;
3024
3025 spin_lock_irq(&hugetlb_lock);
3026 /*
3027 * glb_chg is passed to indicate whether or not a page must be taken
3028 * from the global free pool (global change). gbl_chg == 0 indicates
3029 * a reservation exists for the allocation.
3030 */
3031 folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
3032 if (!folio) {
3033 spin_unlock_irq(&hugetlb_lock);
3034 folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
3035 if (!folio)
3036 goto out_uncharge_cgroup;
3037 spin_lock_irq(&hugetlb_lock);
3038 if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
3039 folio_set_hugetlb_restore_reserve(folio);
3040 h->resv_huge_pages--;
3041 }
3042 list_add(&folio->lru, &h->hugepage_activelist);
3043 folio_ref_unfreeze(folio, 1);
3044 /* Fall through */
3045 }
3046
3047 hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
3048 /* If allocation is not consuming a reservation, also store the
3049 * hugetlb_cgroup pointer on the page.
3050 */
3051 if (deferred_reserve) {
3052 hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
3053 h_cg, folio);
3054 }
3055
3056 spin_unlock_irq(&hugetlb_lock);
3057
3058 hugetlb_set_folio_subpool(folio, spool);
3059
3060 map_commit = vma_commit_reservation(h, vma, addr);
3061 if (unlikely(map_chg > map_commit)) {
3062 /*
3063 * The page was added to the reservation map between
3064 * vma_needs_reservation and vma_commit_reservation.
3065 * This indicates a race with hugetlb_reserve_pages.
3066 * Adjust for the subpool count incremented above AND
3067 * in hugetlb_reserve_pages for the same page. Also,
3068 * the reservation count added in hugetlb_reserve_pages
3069 * no longer applies.
3070 */
3071 long rsv_adjust;
3072
3073 rsv_adjust = hugepage_subpool_put_pages(spool, 1);
3074 hugetlb_acct_memory(h, -rsv_adjust);
3075 if (deferred_reserve) {
3076 spin_lock_irq(&hugetlb_lock);
3077 hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
3078 pages_per_huge_page(h), folio);
3079 spin_unlock_irq(&hugetlb_lock);
3080 }
3081 }
3082
> 3083 ret = mem_cgroup_charge_hugetlb(folio, gfp);
3084 if (ret == -ENOMEM) {
3085 spin_unlock_irq(&hugetlb_lock);
3086 free_huge_folio(folio);
3087 return ERR_PTR(-ENOMEM);
3088 }
3089 else if (!ret)
3090 lruvec_stat_mod_folio(folio, NR_HUGETLB,
3091 pages_per_huge_page(h));
3092
3093 return folio;
3094
3095 out_uncharge_cgroup:
3096 hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
3097 out_uncharge_cgroup_reservation:
3098 if (deferred_reserve)
3099 hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
3100 h_cg);
3101 out_subpool_put:
3102 if (map_chg || avoid_reserve)
3103 hugepage_subpool_put_pages(spool, 1);
3104 out_end_reservation:
3105 vma_end_reservation(h, vma, addr);
3106 return ERR_PTR(-ENOSPC);
3107 }
3108
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
` (2 preceding siblings ...)
2024-11-09 3:42 ` kernel test robot
@ 2024-11-09 5:24 ` kernel test robot
3 siblings, 0 replies; 18+ messages in thread
From: kernel test robot @ 2024-11-09 5:24 UTC (permalink / raw)
To: Joshua Hahn, shakeel.butt
Cc: llvm, oe-kbuild-all, hannes, mhocko, roman.gushchin, muchun.song,
akpm, cgroups, linux-mm, linux-kernel, kernel-team
Hi Joshua,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on next-20241108]
[cannot apply to linus/master v6.12-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Joshua-Hahn/memcg-hugetlb-Introduce-memcg_accounts_hugetlb/20241109-053045
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20241108212946.2642085-3-joshua.hahnjy%40gmail.com
patch subject: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
config: x86_64-buildonly-randconfig-002-20241109 (https://download.01.org/0day-ci/archive/20241109/202411091344.51lAbqmY-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241109/202411091344.51lAbqmY-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411091344.51lAbqmY-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from mm/hugetlb.c:8:
In file included from include/linux/mm.h:2211:
include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
In file included from mm/hugetlb.c:37:
include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
47 | __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages);
| ~~~~~~~~~~~ ^ ~~~
include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
49 | NR_ZONE_LRU_BASE + lru, nr_pages);
| ~~~~~~~~~~~~~~~~ ^ ~~~
>> mm/hugetlb.c:3083:8: error: call to undeclared function 'mem_cgroup_charge_hugetlb'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
3083 | ret = mem_cgroup_charge_hugetlb(folio, gfp);
| ^
mm/hugetlb.c:3083:8: note: did you mean 'mem_cgroup_charge_skmem'?
include/linux/memcontrol.h:1613:6: note: 'mem_cgroup_charge_skmem' declared here
1613 | bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
| ^
3 warnings and 1 error generated.
vim +/mem_cgroup_charge_hugetlb +3083 mm/hugetlb.c
* Re: [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb
2024-11-08 23:03 ` Yosry Ahmed
@ 2024-11-09 18:31 ` Joshua Hahn
0 siblings, 0 replies; 18+ messages in thread
From: Joshua Hahn @ 2024-11-09 18:31 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Shakeel Butt, hannes, mhocko, roman.gushchin, muchun.song, akpm,
cgroups, linux-mm, linux-kernel, kernel-team
On Fri, Nov 8, 2024 at 6:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Fri, Nov 8, 2024 at 2:21 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Fri, Nov 08, 2024 at 01:29:44PM -0800, Joshua Hahn wrote:
> > > This patch isolates the check for whether memcg accounts hugetlb.
> > > This condition can only be true if the memcg mount option
> > > memory_hugetlb_accounting is on, which includes hugetlb usage
> > > in memory.current.
> > >
> > > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> > >
> > > ---
> > > mm/memcontrol.c | 17 ++++++++++++++---
> > > 1 file changed, 14 insertions(+), 3 deletions(-)
> > > +/* Forward declaration */
> > > +bool memcg_accounts_hugetlb(void);
> >
> > No need for forward declaration. Just define it here and make it static.
>
> Also please pull the #ifdef outside the function definition, e.g.
>
> #ifdef CONFIG_HUGETLB_PAGE
> static bool memcg_accounts_hugetlb(void)
> {
> return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
> }
> #else /* CONFIG_HUGETLB_PAGE */
> static bool memcg_accounts_hugetlb(void)
> {
> return false;
> }
> #endif /* CONFIG_HUGETLB_PAGE */
>
Hello Shakeel and Yosry,
Thank you for taking the time to review my patch.
Yes -- I will just declare the function & make it static. It was my
intention to group the new memcg charging functions together,
and in that effort I just made a forward declaration above.
However, I think that it does make the code look a bit
messier, which is against the spirit of this patch series!
And Yosry, thank you for your feedback, I will separate the
definitions based on the #ifdef.
Thank you both, I hope you have a great day!
Joshua
* Re: [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol
2024-11-08 23:07 ` Yosry Ahmed
@ 2024-11-09 18:34 ` Joshua Hahn
0 siblings, 0 replies; 18+ messages in thread
From: Joshua Hahn @ 2024-11-09 18:34 UTC (permalink / raw)
To: Yosry Ahmed
Cc: shakeel.butt, hannes, mhocko, roman.gushchin, muchun.song, akpm,
cgroups, linux-mm, linux-kernel, kernel-team
On Fri, Nov 8, 2024 at 6:08 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Fri, Nov 8, 2024 at 1:30 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> >
> > This patch fully deprecates the mem_cgroup_{try, commit, cancel} charge
> > functions, as well as their hugetlb variants. Please note that this
> > patch relies on [1], which removes the last references (from memcg-v1)
> > to some of these functions.
>
> Nit: We are not really "deprecating" them, we are removing them.
> Deprecation is usually tied to user-visible APIs that we cannot just
> remove, at least not right away. Please rephrase the subject and
> commit log accordingly.
>
> >
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> >
> > [1] https://lore.kernel.org/linux-mm/20241025012304.2473312-1-shakeel.butt@linux.dev/
Hi Yosry,
Thank you for letting me know. To be completely honest, I think
I have been misusing the word in that case. You are correct,
the goal was to try and not change any functionality from the
user perspective, so I think removing is a better word, as you
suggested. I will make this change in the v3!
Have a great day,
Joshua
* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
2024-11-09 1:03 ` SeongJae Park
@ 2024-11-09 18:41 ` Joshua Hahn
2024-11-09 20:52 ` SeongJae Park
0 siblings, 1 reply; 18+ messages in thread
From: Joshua Hahn @ 2024-11-09 18:41 UTC (permalink / raw)
To: SeongJae Park
Cc: shakeel.butt, hannes, mhocko, roman.gushchin, muchun.song, akpm,
cgroups, linux-mm, linux-kernel, kernel-team
Hello SJ, thank you for reviewing my patch!
On Fri, Nov 8, 2024 at 8:03 PM SeongJae Park <sj@kernel.org> wrote:
>
> Hi Joshua,
>
> On Fri, 8 Nov 2024 13:29:45 -0800 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> > This patch introduces mem_cgroup_charge_hugetlb, which combines the
> > logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
>
> Nit. Seems the regular expression is not technically correct?
I see, I will expand it out to include both names. What I meant to
say is that it combines the functionality of both functions, but
I think there was a typo there. I will just spell both out so that it is
clearer to readers!
> > +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
>
> Can we add a kernel-doc comment for this function? Maybe that for
> mem_cgroup_hugetlb_try_charge() can be stolen with only small updates?
Yes, I can definitely add a kernel-doc for this function. Would
you mind expanding on the "stolen with only small updates" part?
Do you mean that instead of writing a completely new section
in the kernel-doc, I can just change the name of the section
and modify small parts of the description?
> Thanks,
> SJ
Thank you for your time! I hope you have a good weekend!
Joshua
* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
2024-11-08 22:42 ` Shakeel Butt
@ 2024-11-09 18:58 ` Joshua Hahn
2024-11-11 6:33 ` Shakeel Butt
0 siblings, 1 reply; 18+ messages in thread
From: Joshua Hahn @ 2024-11-09 18:58 UTC (permalink / raw)
To: Shakeel Butt
Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
linux-mm, linux-kernel, kernel-team
Hello Shakeel, thank you for reviewing my patch!
On Fri, Nov 8, 2024 at 5:43 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Fri, Nov 08, 2024 at 01:29:45PM -0800, Joshua Hahn wrote:
> > This patch introduces mem_cgroup_charge_hugetlb, which combines the
> > logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
> > memcg in hugetlb code, and also consolidates the error path that memcg
> > can take into just one point.
> >
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> > - if (!memcg_charge_ret)
> > - mem_cgroup_commit_charge(folio, memcg);
> > - lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
> > - mem_cgroup_put(memcg);
> > + ret = mem_cgroup_charge_hugetlb(folio, gfp);
> > + if (ret == -ENOMEM) {
> > + spin_unlock_irq(&hugetlb_lock);
>
> spin_unlock_irq??
Thank you for the catch. I completely missed this after I
swapped the position of mem_cgroup_charge_hugetlb
to be called without the lock. I will fix this.
> > + free_huge_folio(folio);
>
> free_huge_folio() will call lruvec_stat_mod_folio() unconditionally but
> you are only calling it on success. This may underflow the metric.
I was actually thinking about this too. I was wondering what would
make sense -- in the original draft of this patch, I had the charge
increment be called unconditionally as well. The idea was that
even though it would not make sense to have the stat incremented
when there is an error, it would eventually be corrected by
free_huge_folio's decrement. However, because there is nothing
stopping the user from checking the stat in this period, they may
temporarily see that the value is incremented in memory.stat,
even though they were not able to obtain this page.
With that said, maybe it makes sense to increment unconditionally,
if free also decrements unconditionally. This race condition is
not something that will cause a huge problem for the user,
although users relying on userspace monitors for memory.stat
to handle memory management may see some problems.
Maybe what would make the most sense is to make both the
increment and the decrement conditional.
Thank you for your feedback, I will iterate on this for the next version!
> > +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
> > +{
> > + struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> > + int ret = 0;
> > +
> > + if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
> > + !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > + ret = -EOPNOTSUPP;
>
> why EOPNOTSUPP? You need to return 0 here. We do want
> lruvec_stat_mod_folio() to be called.
In this case, I was just preserving the original code's return statements.
That is, in mem_cgroup_hugetlb_try_charge, the intended behavior
was to return -EOPNOTSUPP if any of these conditions were met.
If I understand the code correctly, calling lruvec_stat_mod_folio()
on this failure will be a noop, since either memcg doesn't account
hugetlb folios / there is no memcg / memcg is disabled.
With all of this said, I think your feedback makes the most sense
here, given the new semantics of the function: if there is no
memcg or memcg doesn't account hugetlb, then there is no
way that the limit can be reached! I will go forward with returning 0,
and calling lruvec_stat_mod_folio (which will be a noop).
Thank you for your detailed feedback. I wish I had caught these
errors myself, thank you for your time in reviewing my patch.
I hope you have a great rest of your weekend!
Joshua
* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
2024-11-09 18:41 ` Joshua Hahn
@ 2024-11-09 20:52 ` SeongJae Park
0 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2024-11-09 20:52 UTC (permalink / raw)
To: Joshua Hahn
Cc: SeongJae Park, shakeel.butt, hannes, mhocko, roman.gushchin,
muchun.song, akpm, cgroups, linux-mm, linux-kernel, kernel-team
On Sat, 9 Nov 2024 13:41:31 -0500 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> Hello SJ, thank you for reviewing my patch!
>
> On Fri, Nov 8, 2024 at 8:03 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > Hi Joshua,
> >
> > On Fri, 8 Nov 2024 13:29:45 -0800 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> >
> > > This patch introduces mem_cgroup_charge_hugetlb, which combines the
> > > logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
> >
> > Nit. Seems the regular expression is not technically correct?
>
> I see, I will expand it out to include both names. What I meant to
> say is that it combines the functionality of both functions, but
> I think there was a typo there. I will just spell both out so that it is
> clearer to readers!
Thank you :)
>
> > > +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
> >
> > Can we add a kernel-doc comment for this function? Maybe that for
> > mem_cgroup_hugetlb_try_charge() can be stolen with only small updates?
>
> Yes, I can definitely add a kernel-doc for this function. Would
> you mind expanding on the "stolen only with small updates" part?
> Do you mean that instead of writing a completely new section
> in the kernel-doc, I can just change the name of the section
> and modify small parts of the description?
You're right. I just thought that might save some of your time if that makes
sense. I don't really mind either way, so do whatever you prefer :)
Thanks,
SJ
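For what it's worth, a kernel-doc block adapted from the one for
mem_cgroup_hugetlb_try_charge() might look something like the sketch
below (illustrative only; the exact wording and error semantics should
follow whatever the final version of the function actually does):

```c
/**
 * mem_cgroup_charge_hugetlb - charge the current memcg for a hugetlb folio
 * @folio: folio to charge
 * @gfp: reclaim mode
 *
 * Charges an already-allocated hugetlb folio to the memcg of the
 * current task. On failure, the caller is expected to free the folio.
 *
 * Returns 0 on success. Otherwise, an error code is returned.
 */
int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp);
```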
[...]
* Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb
2024-11-09 18:58 ` Joshua Hahn
@ 2024-11-11 6:33 ` Shakeel Butt
0 siblings, 0 replies; 18+ messages in thread
From: Shakeel Butt @ 2024-11-11 6:33 UTC (permalink / raw)
To: Joshua Hahn
Cc: hannes, mhocko, roman.gushchin, muchun.song, akpm, cgroups,
linux-mm, linux-kernel, kernel-team
On Sat, Nov 09, 2024 at 01:58:46PM -0500, Joshua Hahn wrote:
[...]
>
> > > + free_huge_folio(folio);
> >
> > free_huge_folio() will call lruvec_stat_mod_folio() unconditionally but
> > you are only calling it on success. This may underflow the metric.
>
> I was actually thinking about this too. I was wondering what would
> make sense -- in the original draft of this patch, I had the charge
> increment be called unconditionally as well. The idea was that
> even though it would not make sense to have the stat incremented
> when there is an error, it would eventually be corrected by
> free_huge_folio's decrement. However, because there is nothing
> stopping the user from checking the stat in this period, they may
> temporarily see that the value is incremented in memory.stat,
> even though they were not able to obtain this page.
>
On the alloc path, unconditionally calling lruvec_stat_mod_folio() after
mem_cgroup_charge_hugetlb() but before free_huge_folio() is fine. Please
note that if mem_cgroup_charge_hugetlb() has failed,
lruvec_stat_mod_folio() will not be incrementing the memcg stat. The
system level stats may get elevated for a small time window, but that is
fine.
> With that said, maybe it makes sense to increment unconditionally,
> if free also decrements unconditionally. This race condition is
> not something that will cause a huge problem for the user,
> although users relying on userspace monitors for memory.stat
> to handle memory management may see some problems.
>
> Maybe what would make the most sense is to do both
> incrementing & decrementing conditionally as well.
> Thank you for your feedback, I will iterate on this for the next version!
>
> > > +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
> > > +{
> > > + struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> > > + int ret = 0;
> > > +
> > > + if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
> > > + !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > > + ret = -EOPNOTSUPP;
> >
> > why EOPNOTSUPP? You need to return 0 here. We do want
> > lruvec_stat_mod_folio() to be called.
>
> In this case, I was just preserving the original code's return statements.
> That is, in mem_cgroup_hugetlb_try_charge, the intended behavior
> was to return -EOPNOTSUPP if any of these conditions were met.
> If I understand the code correctly, calling lruvec_stat_mod_folio()
> on this failure will be a noop, since either memcg doesn't account
> hugetlb folios / there is no memcg / memcg is disabled.
>
lruvec_stat_mod_folio() does manipulate system level stats, so even if
memcg accounting of hugetlb is not enabled, we do want the system level
stats to get updated.
end of thread, other threads:[~2024-11-11 6:33 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
2024-11-08 21:29 [PATCH 0/3] memcg/hugetlb: Rework memcg hugetlb charging Joshua Hahn
2024-11-08 21:29 ` [PATCH 1/3] memcg/hugetlb: Introduce memcg_accounts_hugetlb Joshua Hahn
2024-11-08 22:21 ` Shakeel Butt
2024-11-08 23:03 ` Yosry Ahmed
2024-11-09 18:31 ` Joshua Hahn
2024-11-08 21:29 ` [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb Joshua Hahn
2024-11-08 22:42 ` Shakeel Butt
2024-11-09 18:58 ` Joshua Hahn
2024-11-11 6:33 ` Shakeel Butt
2024-11-09 1:03 ` SeongJae Park
2024-11-09 18:41 ` Joshua Hahn
2024-11-09 20:52 ` SeongJae Park
2024-11-09 3:42 ` kernel test robot
2024-11-09 5:24 ` kernel test robot
2024-11-08 21:29 ` [PATCH 3/3] memcg/hugetlb: Deprecate memcg hugetlb try-commit-cancel protocol Joshua Hahn
2024-11-08 22:43 ` Shakeel Butt
2024-11-08 23:07 ` Yosry Ahmed
2024-11-09 18:34 ` Joshua Hahn