* [PATCH mm-unstable v1 0/4] Improve mem_cgroup_iter()
@ 2024-07-24 19:02 Kinsey Ho
2024-07-24 19:02 ` [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal Kinsey Ho
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Kinsey Ho @ 2024-07-24 19:02 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-kernel, Yosry Ahmed, Roman Gushchin, Kinsey Ho
Incremental cgroup iteration is being used again [1], but incremental
cgroup iteration was introduced for cgroup v1. It hasn't been fully
maintained for many years. This patchset improves the reliability of
mem_cgroup_iter(), along with improving simplicity and code readability.
[1] https://lore.kernel.org/20240514202641.2821494-1-hannes@cmpxchg.org/
Kinsey Ho (4):
mm: don't hold css->refcnt during traversal
mm: increment gen # before restarting traversal
mm: restart if multiple traversals raced
mm: clean up mem_cgroup_iter()
include/linux/memcontrol.h | 6 +--
mm/memcontrol.c | 84 +++++++++++++++-----------------------
2 files changed, 37 insertions(+), 53 deletions(-)
--
2.45.2.1089.g2a221341d9-goog
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal
2024-07-24 19:02 [PATCH mm-unstable v1 0/4] Improve mem_cgroup_iter() Kinsey Ho
@ 2024-07-24 19:02 ` Kinsey Ho
2024-07-25 16:33 ` kernel test robot
2024-07-25 20:43 ` Johannes Weiner
2024-07-24 19:02 ` [PATCH mm-unstable v1 2/4] mm: increment gen # before restarting traversal Kinsey Ho
` (3 subsequent siblings)
4 siblings, 2 replies; 15+ messages in thread
From: Kinsey Ho @ 2024-07-24 19:02 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-kernel, Yosry Ahmed, Roman Gushchin, Kinsey Ho
To obtain the pointer to the saved memcg position, mem_cgroup_iter()
currently holds css->refcnt during memcg traversal only to put
css->refcnt at the end of the routine. This isn't necessary as an
rcu_read_lock is already held throughout the function.
Remove css->refcnt usage during traversal by leveraging RCU.
Signed-off-by: Kinsey Ho <kinseyho@google.com>
---
include/linux/memcontrol.h | 2 +-
mm/memcontrol.c | 18 +-----------------
2 files changed, 2 insertions(+), 18 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7e2eb091049a..4cbab85e2e56 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -75,7 +75,7 @@ struct lruvec_stats_percpu;
struct lruvec_stats;
struct mem_cgroup_reclaim_iter {
- struct mem_cgroup *position;
+ struct mem_cgroup __rcu *position;
/* scan generation, increased every round-trip */
unsigned int generation;
};
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 960371788687..062bfeee799c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1019,20 +1019,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
else if (reclaim->generation != iter->generation)
goto out_unlock;
- while (1) {
- pos = READ_ONCE(iter->position);
- if (!pos || css_tryget(&pos->css))
- break;
- /*
- * css reference reached zero, so iter->position will
- * be cleared by ->css_released. However, we should not
- * rely on this happening soon, because ->css_released
- * is called from a work queue, and by busy-waiting we
- * might block it. So we clear iter->position right
- * away.
- */
- (void)cmpxchg(&iter->position, pos, NULL);
- }
+ pos = rcu_dereference(iter->position);
} else if (prev) {
pos = prev;
}
@@ -1073,9 +1060,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
*/
(void)cmpxchg(&iter->position, pos, memcg);
- if (pos)
- css_put(&pos->css);
-
if (!memcg)
iter->generation++;
}
--
2.45.2.1089.g2a221341d9-goog
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH mm-unstable v1 2/4] mm: increment gen # before restarting traversal
2024-07-24 19:02 [PATCH mm-unstable v1 0/4] Improve mem_cgroup_iter() Kinsey Ho
2024-07-24 19:02 ` [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal Kinsey Ho
@ 2024-07-24 19:02 ` Kinsey Ho
2024-07-25 20:53 ` Johannes Weiner
2024-07-24 19:02 ` [PATCH mm-unstable v1 3/4] mm: restart if multiple traversals raced Kinsey Ho
` (2 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Kinsey Ho @ 2024-07-24 19:02 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-kernel, Yosry Ahmed, Roman Gushchin, Kinsey Ho
The generation number in struct mem_cgroup_reclaim_iter should be
incremented on every round-trip. Currently, it is possible for a
concurrent reclaimer to jump in at the end of the hierarchy, causing a
traversal restart (resetting the iteration position) without
incrementing the generation number.
Move the traversal restart such that the generation number is
incremented before the restart.
Signed-off-by: Kinsey Ho <kinseyho@google.com>
---
mm/memcontrol.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 062bfeee799c..f672bc47c6b5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1003,7 +1003,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
root = root_mem_cgroup;
rcu_read_lock();
-
+restart:
if (reclaim) {
struct mem_cgroup_per_node *mz;
@@ -1030,14 +1030,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
for (;;) {
css = css_next_descendant_pre(css, &root->css);
if (!css) {
- /*
- * Reclaimers share the hierarchy walk, and a
- * new one might jump in right at the end of
- * the hierarchy - make sure they see at least
- * one group and restart from the beginning.
- */
- if (!prev)
- continue;
break;
}
@@ -1060,8 +1052,18 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
*/
(void)cmpxchg(&iter->position, pos, memcg);
- if (!memcg)
+ if (!memcg) {
iter->generation++;
+
+ /*
+ * Reclaimers share the hierarchy walk, and a
+ * new one might jump in right at the end of
+ * the hierarchy - make sure they see at least
+ * one group and restart from the beginning.
+ */
+ if (!prev)
+ goto restart;
+ }
}
out_unlock:
--
2.45.2.1089.g2a221341d9-goog
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH mm-unstable v1 3/4] mm: restart if multiple traversals raced
2024-07-24 19:02 [PATCH mm-unstable v1 0/4] Improve mem_cgroup_iter() Kinsey Ho
2024-07-24 19:02 ` [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal Kinsey Ho
2024-07-24 19:02 ` [PATCH mm-unstable v1 2/4] mm: increment gen # before restarting traversal Kinsey Ho
@ 2024-07-24 19:02 ` Kinsey Ho
2024-07-24 19:02 ` [PATCH mm-unstable v1 4/4] mm: clean up mem_cgroup_iter() Kinsey Ho
2024-07-25 0:25 ` [PATCH mm-unstable v1 0/4] Improve mem_cgroup_iter() Roman Gushchin
4 siblings, 0 replies; 15+ messages in thread
From: Kinsey Ho @ 2024-07-24 19:02 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-kernel, Yosry Ahmed, Roman Gushchin, Kinsey Ho
Currently, if multiple reclaimers raced on the same position, the
reclaimers which detect the race will still reclaim from the same memcg.
Instead, the reclaimers which detect the race should move on to the next
memcg in the hierarchy.
So, in the case where multiple traversals race, jump back to the start
of the mem_cgroup_iter() function to find the next memcg in the
hierarchy to reclaim from.
Signed-off-by: Kinsey Ho <kinseyho@google.com>
---
include/linux/memcontrol.h | 4 ++--
mm/memcontrol.c | 14 ++++++++++----
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4cbab85e2e56..2b354abe6d48 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -57,7 +57,7 @@ enum memcg_memory_event {
struct mem_cgroup_reclaim_cookie {
pg_data_t *pgdat;
- unsigned int generation;
+ int generation;
};
#ifdef CONFIG_MEMCG
@@ -77,7 +77,7 @@ struct lruvec_stats;
struct mem_cgroup_reclaim_iter {
struct mem_cgroup __rcu *position;
/* scan generation, increased every round-trip */
- unsigned int generation;
+ atomic_t generation;
};
/*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f672bc47c6b5..4314a2b8848d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1005,18 +1005,20 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
rcu_read_lock();
restart:
if (reclaim) {
+ int gen;
struct mem_cgroup_per_node *mz;
mz = root->nodeinfo[reclaim->pgdat->node_id];
iter = &mz->iter;
+ gen = atomic_read(&iter->generation);
/*
* On start, join the current reclaim iteration cycle.
* Exit when a concurrent walker completes it.
*/
if (!prev)
- reclaim->generation = iter->generation;
- else if (reclaim->generation != iter->generation)
+ reclaim->generation = gen;
+ else if (reclaim->generation != gen)
goto out_unlock;
pos = rcu_dereference(iter->position);
@@ -1050,10 +1052,14 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
* thread, so check that the value hasn't changed since we read
* it to avoid reclaiming from the same cgroup twice.
*/
- (void)cmpxchg(&iter->position, pos, memcg);
+ if (cmpxchg(&iter->position, pos, memcg) != pos) {
+ if (css && css != &root->css)
+ css_put(css);
+ goto restart;
+ }
if (!memcg) {
- iter->generation++;
+ atomic_inc(&iter->generation);
/*
* Reclaimers share the hierarchy walk, and a
--
2.45.2.1089.g2a221341d9-goog
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH mm-unstable v1 4/4] mm: clean up mem_cgroup_iter()
2024-07-24 19:02 [PATCH mm-unstable v1 0/4] Improve mem_cgroup_iter() Kinsey Ho
` (2 preceding siblings ...)
2024-07-24 19:02 ` [PATCH mm-unstable v1 3/4] mm: restart if multiple traversals raced Kinsey Ho
@ 2024-07-24 19:02 ` Kinsey Ho
2024-07-25 18:15 ` kernel test robot
2024-07-25 0:25 ` [PATCH mm-unstable v1 0/4] Improve mem_cgroup_iter() Roman Gushchin
4 siblings, 1 reply; 15+ messages in thread
From: Kinsey Ho @ 2024-07-24 19:02 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-kernel, Yosry Ahmed, Roman Gushchin, Kinsey Ho
A clean up to make variable names more clear and to improve code
readability.
No functional change.
Signed-off-by: Kinsey Ho <kinseyho@google.com>
---
mm/memcontrol.c | 36 ++++++++++++++----------------------
1 file changed, 14 insertions(+), 22 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4314a2b8848d..7e3e95c62122 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -992,9 +992,9 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
struct mem_cgroup_reclaim_cookie *reclaim)
{
struct mem_cgroup_reclaim_iter *iter;
- struct cgroup_subsys_state *css = NULL;
- struct mem_cgroup *memcg = NULL;
- struct mem_cgroup *pos = NULL;
+ struct cgroup_subsys_state *css;
+ struct mem_cgroup *pos;
+ struct mem_cgroup *next = NULL;
if (mem_cgroup_disabled())
return NULL;
@@ -1006,10 +1006,9 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
restart:
if (reclaim) {
int gen;
- struct mem_cgroup_per_node *mz;
+ int nid = reclaim->pgdat->node_id;
- mz = root->nodeinfo[reclaim->pgdat->node_id];
- iter = &mz->iter;
+ iter = &root->nodeinfo[nid]->iter;
gen = atomic_read(&iter->generation);
/*
@@ -1022,43 +1021,36 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
goto out_unlock;
pos = rcu_dereference(iter->position);
- } else if (prev) {
+ } else
pos = prev;
- }
- if (pos)
- css = &pos->css;
-
- for (;;) {
- css = css_next_descendant_pre(css, &root->css);
- if (!css) {
- break;
- }
+ css = pos ? &pos->css : NULL;
+ while ((css = css_next_descendant_pre(css, &root->css))) {
/*
* Verify the css and acquire a reference. The root
* is provided by the caller, so we know it's alive
* and kicking, and don't take an extra reference.
*/
- if (css == &root->css || css_tryget(css)) {
- memcg = mem_cgroup_from_css(css);
+ if (css == &root->css || css_tryget(css))
break;
- }
}
+ next = mem_cgroup_from_css(css);
+
if (reclaim) {
/*
* The position could have already been updated by a competing
* thread, so check that the value hasn't changed since we read
* it to avoid reclaiming from the same cgroup twice.
*/
- if (cmpxchg(&iter->position, pos, memcg) != pos) {
+ if (cmpxchg(&iter->position, pos, next) != pos) {
if (css && css != &root->css)
css_put(css);
goto restart;
}
- if (!memcg) {
+ if (!next) {
atomic_inc(&iter->generation);
/*
@@ -1077,7 +1069,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
if (prev && prev != root)
css_put(&prev->css);
- return memcg;
+ return next;
}
/**
--
2.45.2.1089.g2a221341d9-goog
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mm-unstable v1 0/4] Improve mem_cgroup_iter()
2024-07-24 19:02 [PATCH mm-unstable v1 0/4] Improve mem_cgroup_iter() Kinsey Ho
` (3 preceding siblings ...)
2024-07-24 19:02 ` [PATCH mm-unstable v1 4/4] mm: clean up mem_cgroup_iter() Kinsey Ho
@ 2024-07-25 0:25 ` Roman Gushchin
4 siblings, 0 replies; 15+ messages in thread
From: Roman Gushchin @ 2024-07-25 0:25 UTC (permalink / raw)
To: Kinsey Ho
Cc: Andrew Morton, linux-mm, linux-kernel, Yosry Ahmed,
Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song
cc memcg maintainers
On Wed, Jul 24, 2024 at 07:02:10PM +0000, Kinsey Ho wrote:
> Incremental cgroup iteration is being used again [1], but incremental
> cgroup iteration was introduced for cgroup v1. It hasn't been fully
> maintained for many years.
This is a bold statement :)
> This patchset improves the reliability of
> mem_cgroup_iter(), along with improving simplicity and code readability.
>
> [1] https://lore.kernel.org/20240514202641.2821494-1-hannes@cmpxchg.org/
>
> Kinsey Ho (4):
> mm: don't hold css->refcnt during traversal
> mm: increment gen # before restarting traversal
> mm: restart if multiple traversals raced
> mm: clean up mem_cgroup_iter()
But the series looks great to me!
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
for the series.
Thanks!
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal
2024-07-24 19:02 ` [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal Kinsey Ho
@ 2024-07-25 16:33 ` kernel test robot
2024-07-25 20:43 ` Johannes Weiner
1 sibling, 0 replies; 15+ messages in thread
From: kernel test robot @ 2024-07-25 16:33 UTC (permalink / raw)
To: Kinsey Ho, Andrew Morton
Cc: oe-kbuild-all, Linux Memory Management List, linux-kernel,
Yosry Ahmed, Roman Gushchin, Kinsey Ho
Hi Kinsey,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v6.10 next-20240725]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Kinsey-Ho/mm-don-t-hold-css-refcnt-during-traversal/20240725-030750
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20240724190214.1108049-2-kinseyho%40google.com
patch subject: [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal
config: x86_64-randconfig-121-20240725 (https://download.01.org/0day-ci/archive/20240726/202407260047.sIuRFLJE-lkp@intel.com/config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240726/202407260047.sIuRFLJE-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407260047.sIuRFLJE-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
>> mm/memcontrol.c:1063:23: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct mem_cgroup [noderef] __rcu *__old @@ got struct mem_cgroup *[assigned] pos @@
mm/memcontrol.c:1063:23: sparse: expected struct mem_cgroup [noderef] __rcu *__old
mm/memcontrol.c:1063:23: sparse: got struct mem_cgroup *[assigned] pos
mm/memcontrol.c:1063:23: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct mem_cgroup [noderef] __rcu *__new @@ got struct mem_cgroup *[assigned] memcg @@
mm/memcontrol.c:1063:23: sparse: expected struct mem_cgroup [noderef] __rcu *__new
mm/memcontrol.c:1063:23: sparse: got struct mem_cgroup *[assigned] memcg
>> mm/memcontrol.c:1101:17: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct mem_cgroup [noderef] __rcu *__old @@ got struct mem_cgroup *dead_memcg @@
mm/memcontrol.c:1101:17: sparse: expected struct mem_cgroup [noderef] __rcu *__old
mm/memcontrol.c:1101:17: sparse: got struct mem_cgroup *dead_memcg
mm/memcontrol.c: note: in included file (through include/linux/cgroup-defs.h):
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
mm/memcontrol.c: note: in included file:
include/linux/memcontrol.h:747:9: sparse: sparse: context imbalance in 'folio_lruvec_lock' - wrong count at exit
include/linux/memcontrol.h:747:9: sparse: sparse: context imbalance in 'folio_lruvec_lock_irq' - wrong count at exit
include/linux/memcontrol.h:747:9: sparse: sparse: context imbalance in 'folio_lruvec_lock_irqsave' - wrong count at exit
mm/memcontrol.c: note: in included file (through include/linux/cgroup-defs.h):
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
vim +1063 mm/memcontrol.c
4b569387c0d566 Nhat Pham 2023-10-06 974
5660048ccac873 Johannes Weiner 2012-01-12 975 /**
5660048ccac873 Johannes Weiner 2012-01-12 976 * mem_cgroup_iter - iterate over memory cgroup hierarchy
5660048ccac873 Johannes Weiner 2012-01-12 977 * @root: hierarchy root
5660048ccac873 Johannes Weiner 2012-01-12 978 * @prev: previously returned memcg, NULL on first invocation
5660048ccac873 Johannes Weiner 2012-01-12 979 * @reclaim: cookie for shared reclaim walks, NULL for full walks
5660048ccac873 Johannes Weiner 2012-01-12 980 *
5660048ccac873 Johannes Weiner 2012-01-12 981 * Returns references to children of the hierarchy below @root, or
5660048ccac873 Johannes Weiner 2012-01-12 982 * @root itself, or %NULL after a full round-trip.
5660048ccac873 Johannes Weiner 2012-01-12 983 *
5660048ccac873 Johannes Weiner 2012-01-12 984 * Caller must pass the return value in @prev on subsequent
5660048ccac873 Johannes Weiner 2012-01-12 985 * invocations for reference counting, or use mem_cgroup_iter_break()
5660048ccac873 Johannes Weiner 2012-01-12 986 * to cancel a hierarchy walk before the round-trip is complete.
5660048ccac873 Johannes Weiner 2012-01-12 987 *
05bdc520b3ad39 Miaohe Lin 2020-10-13 988 * Reclaimers can specify a node in @reclaim to divide up the memcgs
05bdc520b3ad39 Miaohe Lin 2020-10-13 989 * in the hierarchy among all concurrent reclaimers operating on the
05bdc520b3ad39 Miaohe Lin 2020-10-13 990 * same node.
5660048ccac873 Johannes Weiner 2012-01-12 991 */
694fbc0fe78518 Andrew Morton 2013-09-24 992 struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
9f3a0d0933de07 Johannes Weiner 2012-01-12 993 struct mem_cgroup *prev,
694fbc0fe78518 Andrew Morton 2013-09-24 994 struct mem_cgroup_reclaim_cookie *reclaim)
7d74b06f240f1b KAMEZAWA Hiroyuki 2010-10-27 995 {
3f649ab728cda8 Kees Cook 2020-06-03 996 struct mem_cgroup_reclaim_iter *iter;
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 997 struct cgroup_subsys_state *css = NULL;
9f3a0d0933de07 Johannes Weiner 2012-01-12 998 struct mem_cgroup *memcg = NULL;
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 999 struct mem_cgroup *pos = NULL;
711d3d2c9bc3fb KAMEZAWA Hiroyuki 2010-10-27 1000
694fbc0fe78518 Andrew Morton 2013-09-24 1001 if (mem_cgroup_disabled())
694fbc0fe78518 Andrew Morton 2013-09-24 1002 return NULL;
5660048ccac873 Johannes Weiner 2012-01-12 1003
9f3a0d0933de07 Johannes Weiner 2012-01-12 1004 if (!root)
9f3a0d0933de07 Johannes Weiner 2012-01-12 1005 root = root_mem_cgroup;
9f3a0d0933de07 Johannes Weiner 2012-01-12 1006
542f85f9ae4acd Michal Hocko 2013-04-29 1007 rcu_read_lock();
14067bb3e24b96 KAMEZAWA Hiroyuki 2009-04-02 1008
527a5ec9a53471 Johannes Weiner 2012-01-12 1009 if (reclaim) {
ef8f2327996b5c Mel Gorman 2016-07-28 1010 struct mem_cgroup_per_node *mz;
527a5ec9a53471 Johannes Weiner 2012-01-12 1011
a3747b53b1771a Johannes Weiner 2021-04-29 1012 mz = root->nodeinfo[reclaim->pgdat->node_id];
9da83f3fc74b80 Yafang Shao 2019-11-30 1013 iter = &mz->iter;
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1014
a9320aae68a1cd Wei Yang 2022-04-28 1015 /*
a9320aae68a1cd Wei Yang 2022-04-28 1016 * On start, join the current reclaim iteration cycle.
a9320aae68a1cd Wei Yang 2022-04-28 1017 * Exit when a concurrent walker completes it.
a9320aae68a1cd Wei Yang 2022-04-28 1018 */
a9320aae68a1cd Wei Yang 2022-04-28 1019 if (!prev)
a9320aae68a1cd Wei Yang 2022-04-28 1020 reclaim->generation = iter->generation;
a9320aae68a1cd Wei Yang 2022-04-28 1021 else if (reclaim->generation != iter->generation)
542f85f9ae4acd Michal Hocko 2013-04-29 1022 goto out_unlock;
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1023
fe6faac115aa89 Kinsey Ho 2024-07-24 1024 pos = rcu_dereference(iter->position);
89d8330ccf2ad4 Wei Yang 2022-04-28 1025 } else if (prev) {
89d8330ccf2ad4 Wei Yang 2022-04-28 1026 pos = prev;
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1027 }
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1028
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1029 if (pos)
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1030 css = &pos->css;
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1031
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1032 for (;;) {
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1033 css = css_next_descendant_pre(css, &root->css);
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1034 if (!css) {
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1035 /*
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1036 * Reclaimers share the hierarchy walk, and a
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1037 * new one might jump in right at the end of
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1038 * the hierarchy - make sure they see at least
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1039 * one group and restart from the beginning.
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1040 */
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1041 if (!prev)
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1042 continue;
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1043 break;
542f85f9ae4acd Michal Hocko 2013-04-29 1044 }
5f578161971863 Michal Hocko 2013-04-29 1045
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1046 /*
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1047 * Verify the css and acquire a reference. The root
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1048 * is provided by the caller, so we know it's alive
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1049 * and kicking, and don't take an extra reference.
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1050 */
41555dadbff8d2 Wei Yang 2022-04-28 1051 if (css == &root->css || css_tryget(css)) {
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1052 memcg = mem_cgroup_from_css(css);
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1053 break;
41555dadbff8d2 Wei Yang 2022-04-28 1054 }
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1055 }
9f3a0d0933de07 Johannes Weiner 2012-01-12 1056
527a5ec9a53471 Johannes Weiner 2012-01-12 1057 if (reclaim) {
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1058 /*
6df38689e0e9a0 Vladimir Davydov 2015-12-29 1059 * The position could have already been updated by a competing
6df38689e0e9a0 Vladimir Davydov 2015-12-29 1060 * thread, so check that the value hasn't changed since we read
6df38689e0e9a0 Vladimir Davydov 2015-12-29 1061 * it to avoid reclaiming from the same cgroup twice.
5ac8fb31ad2ebd Johannes Weiner 2014-12-10 1062 */
6df38689e0e9a0 Vladimir Davydov 2015-12-29 @1063 (void)cmpxchg(&iter->position, pos, memcg);
6df38689e0e9a0 Vladimir Davydov 2015-12-29 1064
19f39402864ea3 Michal Hocko 2013-04-29 1065 if (!memcg)
527a5ec9a53471 Johannes Weiner 2012-01-12 1066 iter->generation++;
527a5ec9a53471 Johannes Weiner 2012-01-12 1067 }
14067bb3e24b96 KAMEZAWA Hiroyuki 2009-04-02 1068
542f85f9ae4acd Michal Hocko 2013-04-29 1069 out_unlock:
542f85f9ae4acd Michal Hocko 2013-04-29 1070 rcu_read_unlock();
c40046f3ad5e87 Michal Hocko 2013-04-29 1071 if (prev && prev != root)
c40046f3ad5e87 Michal Hocko 2013-04-29 1072 css_put(&prev->css);
c40046f3ad5e87 Michal Hocko 2013-04-29 1073
9f3a0d0933de07 Johannes Weiner 2012-01-12 1074 return memcg;
9f3a0d0933de07 Johannes Weiner 2012-01-12 1075 }
14067bb3e24b96 KAMEZAWA Hiroyuki 2009-04-02 1076
5660048ccac873 Johannes Weiner 2012-01-12 1077 /**
5660048ccac873 Johannes Weiner 2012-01-12 1078 * mem_cgroup_iter_break - abort a hierarchy walk prematurely
5660048ccac873 Johannes Weiner 2012-01-12 1079 * @root: hierarchy root
5660048ccac873 Johannes Weiner 2012-01-12 1080 * @prev: last visited hierarchy member as returned by mem_cgroup_iter()
5660048ccac873 Johannes Weiner 2012-01-12 1081 */
5660048ccac873 Johannes Weiner 2012-01-12 1082 void mem_cgroup_iter_break(struct mem_cgroup *root,
9f3a0d0933de07 Johannes Weiner 2012-01-12 1083 struct mem_cgroup *prev)
9f3a0d0933de07 Johannes Weiner 2012-01-12 1084 {
711d3d2c9bc3fb KAMEZAWA Hiroyuki 2010-10-27 1085 if (!root)
711d3d2c9bc3fb KAMEZAWA Hiroyuki 2010-10-27 1086 root = root_mem_cgroup;
9f3a0d0933de07 Johannes Weiner 2012-01-12 1087 if (prev && prev != root)
9f3a0d0933de07 Johannes Weiner 2012-01-12 1088 css_put(&prev->css);
14067bb3e24b96 KAMEZAWA Hiroyuki 2009-04-02 1089 }
9f3a0d0933de07 Johannes Weiner 2012-01-12 1090
54a83d6bcbf8f4 Miles Chen 2019-08-13 1091 static void __invalidate_reclaim_iterators(struct mem_cgroup *from,
54a83d6bcbf8f4 Miles Chen 2019-08-13 1092 struct mem_cgroup *dead_memcg)
6df38689e0e9a0 Vladimir Davydov 2015-12-29 1093 {
6df38689e0e9a0 Vladimir Davydov 2015-12-29 1094 struct mem_cgroup_reclaim_iter *iter;
ef8f2327996b5c Mel Gorman 2016-07-28 1095 struct mem_cgroup_per_node *mz;
ef8f2327996b5c Mel Gorman 2016-07-28 1096 int nid;
6df38689e0e9a0 Vladimir Davydov 2015-12-29 1097
6df38689e0e9a0 Vladimir Davydov 2015-12-29 1098 for_each_node(nid) {
a3747b53b1771a Johannes Weiner 2021-04-29 1099 mz = from->nodeinfo[nid];
9da83f3fc74b80 Yafang Shao 2019-11-30 1100 iter = &mz->iter;
9da83f3fc74b80 Yafang Shao 2019-11-30 @1101 cmpxchg(&iter->position, dead_memcg, NULL);
6df38689e0e9a0 Vladimir Davydov 2015-12-29 1102 }
6df38689e0e9a0 Vladimir Davydov 2015-12-29 1103 }
54a83d6bcbf8f4 Miles Chen 2019-08-13 1104
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mm-unstable v1 4/4] mm: clean up mem_cgroup_iter()
2024-07-24 19:02 ` [PATCH mm-unstable v1 4/4] mm: clean up mem_cgroup_iter() Kinsey Ho
@ 2024-07-25 18:15 ` kernel test robot
0 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2024-07-25 18:15 UTC (permalink / raw)
To: Kinsey Ho, Andrew Morton
Cc: oe-kbuild-all, Linux Memory Management List, linux-kernel,
Yosry Ahmed, Roman Gushchin, Kinsey Ho
Hi Kinsey,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v6.10 next-20240725]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Kinsey-Ho/mm-don-t-hold-css-refcnt-during-traversal/20240725-030750
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20240724190214.1108049-5-kinseyho%40google.com
patch subject: [PATCH mm-unstable v1 4/4] mm: clean up mem_cgroup_iter()
config: x86_64-randconfig-121-20240725 (https://download.01.org/0day-ci/archive/20240726/202407260248.CBU1JMb1-lkp@intel.com/config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240726/202407260248.CBU1JMb1-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407260248.CBU1JMb1-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
mm/memcontrol.c:1049:21: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct mem_cgroup [noderef] __rcu *__old @@ got struct mem_cgroup *[assigned] pos @@
mm/memcontrol.c:1049:21: sparse: expected struct mem_cgroup [noderef] __rcu *__old
mm/memcontrol.c:1049:21: sparse: got struct mem_cgroup *[assigned] pos
>> mm/memcontrol.c:1049:21: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct mem_cgroup [noderef] __rcu *__new @@ got struct mem_cgroup *[assigned] next @@
mm/memcontrol.c:1049:21: sparse: expected struct mem_cgroup [noderef] __rcu *__new
mm/memcontrol.c:1049:21: sparse: got struct mem_cgroup *[assigned] next
mm/memcontrol.c:1049:57: sparse: sparse: incompatible types in comparison expression (different address spaces):
mm/memcontrol.c:1049:57: sparse: struct mem_cgroup [noderef] __rcu *
mm/memcontrol.c:1049:57: sparse: struct mem_cgroup *
mm/memcontrol.c:1101:17: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct mem_cgroup [noderef] __rcu *__old @@ got struct mem_cgroup *dead_memcg @@
mm/memcontrol.c:1101:17: sparse: expected struct mem_cgroup [noderef] __rcu *__old
mm/memcontrol.c:1101:17: sparse: got struct mem_cgroup *dead_memcg
mm/memcontrol.c: note: in included file (through include/linux/cgroup-defs.h):
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
mm/memcontrol.c: note: in included file:
include/linux/memcontrol.h:747:9: sparse: sparse: context imbalance in 'folio_lruvec_lock' - wrong count at exit
include/linux/memcontrol.h:747:9: sparse: sparse: context imbalance in 'folio_lruvec_lock_irq' - wrong count at exit
include/linux/memcontrol.h:747:9: sparse: sparse: context imbalance in 'folio_lruvec_lock_irqsave' - wrong count at exit
mm/memcontrol.c: note: in included file (through include/linux/cgroup-defs.h):
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
vim +1049 mm/memcontrol.c
974
975 /**
976 * mem_cgroup_iter - iterate over memory cgroup hierarchy
977 * @root: hierarchy root
978 * @prev: previously returned memcg, NULL on first invocation
979 * @reclaim: cookie for shared reclaim walks, NULL for full walks
980 *
981 * Returns references to children of the hierarchy below @root, or
982 * @root itself, or %NULL after a full round-trip.
983 *
984 * Caller must pass the return value in @prev on subsequent
985 * invocations for reference counting, or use mem_cgroup_iter_break()
986 * to cancel a hierarchy walk before the round-trip is complete.
987 *
988 * Reclaimers can specify a node in @reclaim to divide up the memcgs
989 * in the hierarchy among all concurrent reclaimers operating on the
990 * same node.
991 */
992 struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
993 struct mem_cgroup *prev,
994 struct mem_cgroup_reclaim_cookie *reclaim)
995 {
996 struct mem_cgroup_reclaim_iter *iter;
997 struct cgroup_subsys_state *css;
998 struct mem_cgroup *pos;
999 struct mem_cgroup *next = NULL;
1000
1001 if (mem_cgroup_disabled())
1002 return NULL;
1003
1004 if (!root)
1005 root = root_mem_cgroup;
1006
1007 rcu_read_lock();
1008 restart:
1009 if (reclaim) {
1010 int gen;
1011 int nid = reclaim->pgdat->node_id;
1012
1013 iter = &root->nodeinfo[nid]->iter;
1014 gen = atomic_read(&iter->generation);
1015
1016 /*
1017 * On start, join the current reclaim iteration cycle.
1018 * Exit when a concurrent walker completes it.
1019 */
1020 if (!prev)
1021 reclaim->generation = gen;
1022 else if (reclaim->generation != gen)
1023 goto out_unlock;
1024
1025 pos = rcu_dereference(iter->position);
1026 } else
1027 pos = prev;
1028
1029 css = pos ? &pos->css : NULL;
1030
1031 while ((css = css_next_descendant_pre(css, &root->css))) {
1032 /*
1033 * Verify the css and acquire a reference. The root
1034 * is provided by the caller, so we know it's alive
1035 * and kicking, and don't take an extra reference.
1036 */
1037 if (css == &root->css || css_tryget(css))
1038 break;
1039 }
1040
1041 next = mem_cgroup_from_css(css);
1042
1043 if (reclaim) {
1044 /*
1045 * The position could have already been updated by a competing
1046 * thread, so check that the value hasn't changed since we read
1047 * it to avoid reclaiming from the same cgroup twice.
1048 */
> 1049 if (cmpxchg(&iter->position, pos, next) != pos) {
1050 if (css && css != &root->css)
1051 css_put(css);
1052 goto restart;
1053 }
1054
1055 if (!next) {
1056 atomic_inc(&iter->generation);
1057
1058 /*
1059 * Reclaimers share the hierarchy walk, and a
1060 * new one might jump in right at the end of
1061 * the hierarchy - make sure they see at least
1062 * one group and restart from the beginning.
1063 */
1064 if (!prev)
1065 goto restart;
1066 }
1067 }
1068
1069 out_unlock:
1070 rcu_read_unlock();
1071 if (prev && prev != root)
1072 css_put(&prev->css);
1073
1074 return next;
1075 }
1076
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal
2024-07-24 19:02 ` [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal Kinsey Ho
2024-07-25 16:33 ` kernel test robot
@ 2024-07-25 20:43 ` Johannes Weiner
2024-07-25 21:09 ` Roman Gushchin
2024-07-25 22:33 ` Yosry Ahmed
1 sibling, 2 replies; 15+ messages in thread
From: Johannes Weiner @ 2024-07-25 20:43 UTC (permalink / raw)
To: Kinsey Ho
Cc: Andrew Morton, linux-mm, linux-kernel, Yosry Ahmed, Roman Gushchin
On Wed, Jul 24, 2024 at 07:02:11PM +0000, Kinsey Ho wrote:
> To obtain the pointer to the saved memcg position, mem_cgroup_iter()
> currently holds css->refcnt during memcg traversal only to put
> css->refcnt at the end of the routine. This isn't necessary as an
> rcu_read_lock is already held throughout the function.
>
> Remove css->refcnt usage during traversal by leveraging RCU.
Eh, I don't know about this.
RCU ensures that the css memory isn't freed.
The tryget ensures that the css is still alive and valid.
In this case, it just so happens that the sibling linkage is also rcu
protected. But accessing random css members when the refcount is 0 is
kind of sketchy. On the other hand, the refcount is guaranteed to be
valid, and rcu + tryget is a common pattern.
What does this buy us? The tryget is cheap.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mm-unstable v1 2/4] mm: increment gen # before restarting traversal
2024-07-24 19:02 ` [PATCH mm-unstable v1 2/4] mm: increment gen # before restarting traversal Kinsey Ho
@ 2024-07-25 20:53 ` Johannes Weiner
0 siblings, 0 replies; 15+ messages in thread
From: Johannes Weiner @ 2024-07-25 20:53 UTC (permalink / raw)
To: Kinsey Ho
Cc: Andrew Morton, linux-mm, linux-kernel, Yosry Ahmed, Roman Gushchin
On Wed, Jul 24, 2024 at 07:02:12PM +0000, Kinsey Ho wrote:
> The generation number in struct mem_cgroup_reclaim_iter should be
> incremented on every round-trip. Currently, it is possible for a
> concurrent reclaimer to jump in at the end of the hierarchy, causing a
> traversal restart (resetting the iteration position) without
> incrementing the generation number.
>
> Move the traversal restart such that the generation number is
> incremented before the restart.
>
> Signed-off-by: Kinsey Ho <kinseyho@google.com>
The consequence of resetting the position without bumping the
generation would be that another ongoing mem_cgroup_iter() could walk
the tree twice, which is undesirable. It would be good to spell that
out in the changelog.
Otherwise, looks good to me.
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal
2024-07-25 20:43 ` Johannes Weiner
@ 2024-07-25 21:09 ` Roman Gushchin
2024-07-25 22:33 ` Yosry Ahmed
1 sibling, 0 replies; 15+ messages in thread
From: Roman Gushchin @ 2024-07-25 21:09 UTC (permalink / raw)
To: Johannes Weiner
Cc: Kinsey Ho, Andrew Morton, linux-mm, linux-kernel, Yosry Ahmed
On Thu, Jul 25, 2024 at 04:43:46PM -0400, Johannes Weiner wrote:
> On Wed, Jul 24, 2024 at 07:02:11PM +0000, Kinsey Ho wrote:
> > To obtain the pointer to the saved memcg position, mem_cgroup_iter()
> > currently holds css->refcnt during memcg traversal only to put
> > css->refcnt at the end of the routine. This isn't necessary as an
> > rcu_read_lock is already held throughout the function.
> >
> > Remove css->refcnt usage during traversal by leveraging RCU.
>
> Eh, I don't know about this.
>
> RCU ensures that the css memory isn't freed.
>
> The tryget ensures that the css is still alive and valid.
>
> In this case, it just so happens that the sibling linkage is also rcu
> protected. But accessing random css members when the refcount is 0 is
> kind of sketchy. On the other hand, the refcount is guaranteed to be
> valid, and rcu + tryget is a common pattern.
I also spent quite some time thinking about potential bad consequences,
but _it seems_ to be safe (but I agree it feels dangerous).
>
> What does this buy us? The tryget is cheap.
To be fair, tryget is not always cheap. Offline/dying cgroups have an atomic
operation there.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal
2024-07-25 20:43 ` Johannes Weiner
2024-07-25 21:09 ` Roman Gushchin
@ 2024-07-25 22:33 ` Yosry Ahmed
2024-08-01 21:46 ` Kinsey Ho
2024-08-01 22:32 ` Kinsey Ho
1 sibling, 2 replies; 15+ messages in thread
From: Yosry Ahmed @ 2024-07-25 22:33 UTC (permalink / raw)
To: Johannes Weiner
Cc: Kinsey Ho, Andrew Morton, linux-mm, linux-kernel, Roman Gushchin
On Thu, Jul 25, 2024 at 1:43 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Jul 24, 2024 at 07:02:11PM +0000, Kinsey Ho wrote:
> > To obtain the pointer to the saved memcg position, mem_cgroup_iter()
> > currently holds css->refcnt during memcg traversal only to put
> > css->refcnt at the end of the routine. This isn't necessary as an
> > rcu_read_lock is already held throughout the function.
> >
> > Remove css->refcnt usage during traversal by leveraging RCU.
>
> Eh, I don't know about this.
>
> RCU ensures that the css memory isn't freed.
>
> The tryget ensures that the css is still alive and valid.
>
> In this case, it just so happens that the sibling linkage is also rcu
> protected. But accessing random css members when the refcount is 0 is
> kind of sketchy. On the other hand, the refcount is guaranteed to be
> valid, and rcu + tryget is a common pattern.
To be fair, the documentation of css_next_descendant_pre() mentions
that the requirements are:
- Either cgroup_mutex or RCU lock is held.
- Both @pos and @root are accessible.
- @pos is a descendant of @root.
This reads to me like it is intentional that RCU protection is enough
for @pos and @root, and that the sibling linkage is RCU protected by
design. Perhaps we could clarify this further (whether at
css_next_descendant_pre(), or above the definition of the linkage
members).
>
> What does this buy us? The tryget is cheap.
mem_cgroup_iter() is not an easy function to follow, so I personally
appreciate the simplicity gains tbh.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal
2024-07-25 22:33 ` Yosry Ahmed
@ 2024-08-01 21:46 ` Kinsey Ho
2024-08-01 22:32 ` Kinsey Ho
1 sibling, 0 replies; 15+ messages in thread
From: Kinsey Ho @ 2024-08-01 21:46 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Johannes Weiner, Andrew Morton, linux-mm, linux-kernel, Roman Gushchin
[-- Attachment #1: Type: text/plain, Size: 1093 bytes --]
Thank you Johannes, Roman, and Yosry for reviewing this patch!
On Thu, Jul 25, 2024 at 3:34 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> On Thu, Jul 25, 2024 at 1:43 PM Johannes Weiner <hannes@cmpxchg.org>
wrote:
> > What does this buy us? The tryget is cheap.
>
> mem_cgroup_iter() is not an easy function to follow, so I personally
> appreciate the simplicity gains tbh.
Yes, the main intention here was to simplify the code’s readability.
> This reads to me like it is intentional that RCU protection is enough
> for @pos and @root, and that the sibling linkage is RCU protected by
> design. Perhaps we could clarify this further (whether at
> css_next_descendant_pre(), or above the definition of the linkage
> members).
Do we want to move forward with Yosry’s suggestion to clarify that the
sibling linkage is RCU-protected by design? Perhaps this clarification can
be made in the definition of the linkage members so that the safety of the
css in this function is more clear to users. If this is sufficient, I will
make the change in a v2 patchset.
[-- Attachment #2: Type: text/html, Size: 1349 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal
2024-07-25 22:33 ` Yosry Ahmed
2024-08-01 21:46 ` Kinsey Ho
@ 2024-08-01 22:32 ` Kinsey Ho
2024-08-02 1:28 ` Johannes Weiner
1 sibling, 1 reply; 15+ messages in thread
From: Kinsey Ho @ 2024-08-01 22:32 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Johannes Weiner, Andrew Morton, linux-mm, linux-kernel, Roman Gushchin
Sorry, I replied to this email earlier but it had some issues with plain
text. Please ignore the first reply of mine (the one with HTML). I'm resending
the email below.
Thank you Johannes, Roman, and Yosry for reviewing this patch!
On Thu, Jul 25, 2024 at 3:34 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> On Thu, Jul 25, 2024 at 1:43 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > What does this buy us? The tryget is cheap.
>
> mem_cgroup_iter() is not an easy function to follow, so I personally
> appreciate the simplicity gains tbh.
Yes, the main intention here was to simplify the code's readability.
> This reads to me like it is intentional that RCU protection is enough
> for @pos and @root, and that the sibling linkage is RCU protected by
> design. Perhaps we could clarify this further (whether at
> css_next_descendant_pre(), or above the definition of the linkage
> members).
Do we want to move forward with Yosry's suggestion to clarify that the
sibling linkage is RCU-protected by design? Perhaps this clarification
can be made in the definition of the linkage members so that the
safety of the css in this function is more clear to users. If this is
sufficient, I will make the change in a v2 patchset.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal
2024-08-01 22:32 ` Kinsey Ho
@ 2024-08-02 1:28 ` Johannes Weiner
0 siblings, 0 replies; 15+ messages in thread
From: Johannes Weiner @ 2024-08-02 1:28 UTC (permalink / raw)
To: Kinsey Ho
Cc: Yosry Ahmed, Andrew Morton, linux-mm, linux-kernel, Roman Gushchin
On Thu, Aug 01, 2024 at 03:32:53PM -0700, Kinsey Ho wrote:
> Sorry, I replied to this email earlier but it had some issues with plain
> text. Please ignore the first reply of mine (the one with HTML). I'm resending
> the email below.
>
> Thank you Johannes, Roman, and Yosry for reviewing this patch!
>
> On Thu, Jul 25, 2024 at 3:34 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > On Thu, Jul 25, 2024 at 1:43 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > What does this buy us? The tryget is cheap.
> >
> > mem_cgroup_iter() is not an easy function to follow, so I personally
> > appreciate the simplicity gains tbh.
>
> Yes, the main intention here was to simplify the code's readability.
>
> > This reads to me like it is intentional that RCU protection is enough
> > for @pos and @root, and that the sibling linkage is RCU protected by
> > design. Perhaps we could clarify this further (whether at
> > css_next_descendant_pre(), or above the definition of the linkage
> > members).
>
> Do we want to move forward with Yosry's suggestion to clarify that the
> sibling linkage is RCU-protected by design? Perhaps this clarification
> can be made in the definition of the linkage members so that the
> safety of the css in this function is more clear to users. If this is
> sufficient, I will make the change in a v2 patchset.
Yes, that sounds like a good way forward to me.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-08-02 1:28 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-24 19:02 [PATCH mm-unstable v1 0/4] Improve mem_cgroup_iter() Kinsey Ho
2024-07-24 19:02 ` [PATCH mm-unstable v1 1/4] mm: don't hold css->refcnt during traversal Kinsey Ho
2024-07-25 16:33 ` kernel test robot
2024-07-25 20:43 ` Johannes Weiner
2024-07-25 21:09 ` Roman Gushchin
2024-07-25 22:33 ` Yosry Ahmed
2024-08-01 21:46 ` Kinsey Ho
2024-08-01 22:32 ` Kinsey Ho
2024-08-02 1:28 ` Johannes Weiner
2024-07-24 19:02 ` [PATCH mm-unstable v1 2/4] mm: increment gen # before restarting traversal Kinsey Ho
2024-07-25 20:53 ` Johannes Weiner
2024-07-24 19:02 ` [PATCH mm-unstable v1 3/4] mm: restart if multiple traversals raced Kinsey Ho
2024-07-24 19:02 ` [PATCH mm-unstable v1 4/4] mm: clean up mem_cgroup_iter() Kinsey Ho
2024-07-25 18:15 ` kernel test robot
2024-07-25 0:25 ` [PATCH mm-unstable v1 0/4] Improve mem_cgroup_iter() Roman Gushchin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox