[PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations

* [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations
@ 2023-09-29 18:00 Roman Gushchin
  2023-09-29 18:00 ` [PATCH v1 1/5] mm: kmem: optimize get_obj_cgroup_from_current() Roman Gushchin
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Roman Gushchin @ 2023-09-29 18:00 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Muchun Song, Dennis Zhou, Andrew Morton,
	David Rientjes, Vlastimil Babka, Roman Gushchin

This patchset improves the performance of accounted kernel memory allocations
by ~30% as measured by a micro-benchmark [1]. The benchmark is very
straightforward: 1M kmalloc() allocations of 64 bytes each.

Below are the results (in microseconds, the best of 100 runs) with kernel
memory accounting disabled, in the original state, and with this patchset
applied.

|             | Kmem disabled | Original | Patched |  Delta |
|-------------+---------------+----------+---------+--------|
| User cgroup |         29764 |    84548 |   59078 | -30.0% |
| Root cgroup |         29742 |    48342 |   31501 | -34.8% |

As we can see, the patchset removes the majority of the overhead when there is
no actual accounting (a task belongs to the root memory cgroup) and almost
halves the accounting overhead otherwise.
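
The two claims above can be checked by treating the accounting overhead as
the time spent above the kmem-disabled baseline. The helper below is only an
illustrative userspace sketch; its name is made up and it is not part of the
patchset. The inputs are the numbers from the table.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patchset: the share of the
 * accounting overhead (time above the kmem-disabled baseline) that the
 * patchset removes, in percent. All inputs are in microseconds.
 */
static double overhead_removed_pct(long disabled, long original, long patched)
{
	long before = original - disabled;	/* overhead without the patchset */
	long after = patched - disabled;	/* overhead with the patchset */

	assert(before > 0);
	return 100.0 * (double)(before - after) / (double)before;
}
```

With the table's numbers, overhead_removed_pct(29764, 84548, 59078) comes out
at roughly 46% for the user cgroup, and overhead_removed_pct(29742, 48342,
31501) at roughly 91% for the root cgroup.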

The main idea is to get rid of unnecessary memcg to objcg conversions and switch
to a scope-based protection of objcgs, which eliminates extra operations with
objcg reference counters under an RCU read lock. More details are provided in
individual commit descriptions.

v1:
	- made the objcg update fully lockless
	- fixed !CONFIG_MMU build issues
rfc:
	https://lwn.net/Articles/945722/

--
[1]:

static int memory_alloc_test(struct seq_file *m, void *v)
{
       unsigned long i, j;
       void **ptrs;
       ktime_t start, end;
       s64 delta, min_delta = LLONG_MAX;

       ptrs = kvmalloc(sizeof(void *) * 1000000, GFP_KERNEL);
       if (!ptrs)
               return -ENOMEM;

       for (j = 0; j < 100; j++) {
               start = ktime_get();
               for (i = 0; i < 1000000; i++)
                       ptrs[i] = kmalloc(64, GFP_KERNEL_ACCOUNT);
               end = ktime_get();

               delta = ktime_us_delta(end, start);
               if (delta < min_delta)
                       min_delta = delta;

               for (i = 0; i < 1000000; i++)
                       kfree(ptrs[i]);
       }

       kvfree(ptrs);
       seq_printf(m, "%lld us\n", min_delta);

       return 0;
}

--

Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>


Roman Gushchin (5):
  mm: kmem: optimize get_obj_cgroup_from_current()
  mm: kmem: add direct objcg pointer to task_struct
  mm: kmem: make memcg keep a reference to the original objcg
  mm: kmem: scoped objcg protection
  percpu: scoped objcg protection

 include/linux/memcontrol.h |  24 ++++-
 include/linux/sched.h      |   4 +
 mm/memcontrol.c            | 184 ++++++++++++++++++++++++++++++++-----
 mm/percpu.c                |   8 +-
 mm/slab.h                  |  10 +-
 5 files changed, 192 insertions(+), 38 deletions(-)

-- 
2.42.0




* [PATCH v1 1/5] mm: kmem: optimize get_obj_cgroup_from_current()
  2023-09-29 18:00 [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations Roman Gushchin
@ 2023-09-29 18:00 ` Roman Gushchin
  2023-10-03 16:48   ` Johannes Weiner
  2023-09-29 18:00 ` [PATCH v1 2/5] mm: kmem: add direct objcg pointer to task_struct Roman Gushchin
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Roman Gushchin @ 2023-09-29 18:00 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Muchun Song, Dennis Zhou, Andrew Morton,
	David Rientjes, Vlastimil Babka, Roman Gushchin

Manually inline memcg_kmem_bypass() and active_memcg() to speed up
get_obj_cgroup_from_current() by avoiding duplicate in_task() checks
and active_memcg() readings.

Also add a likely() annotation to __get_obj_cgroup_from_memcg():
obj_cgroup_tryget() should almost always succeed, except for a very
unlikely race with the memcg deletion path.
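
For readers less familiar with the tryget pattern, here is a minimal
userspace sketch of it, assuming a plain atomic counter in place of the
kernel's percpu_ref. All type and function names are stand-ins invented for
the illustration, and likely() is defined locally via the GCC/Clang
__builtin_expect builtin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define likely(x) __builtin_expect(!!(x), 1)	/* GCC/Clang builtin */

/* Userspace stand-ins; the kernel's obj_cgroup uses a percpu_ref instead. */
struct objcg_model {
	atomic_long refcnt;	/* 0 means the object is being released */
};

struct memcg_model {
	struct memcg_model *parent;	/* NULL for the root */
	struct objcg_model *objcg;	/* may be NULL for a dying memcg */
};

/* Take a reference unless the count has already dropped to zero. */
static bool objcg_tryget(struct objcg_model *objcg)
{
	long old = atomic_load(&objcg->refcnt);

	while (old > 0) {
		/* On failure 'old' is reloaded and the check repeats. */
		if (atomic_compare_exchange_weak(&objcg->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Walk towards the root until an objcg can be pinned; NULL means root. */
static struct objcg_model *get_objcg_from_memcg(struct memcg_model *memcg)
{
	assert(memcg != NULL);

	for (; memcg->parent; memcg = memcg->parent) {
		struct objcg_model *objcg = memcg->objcg;

		/* The common case: the first tryget succeeds. */
		if (likely(objcg && objcg_tryget(objcg)))
			return objcg;
	}
	return NULL;
}
```

In this model a failed tryget simply moves the lookup one level up, which is
the rare path the likely() hint steers the compiler away from.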

Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeelb@google.com>
---
 mm/memcontrol.c | 34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9741d62d0424..16ac2a5838fb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1068,19 +1068,6 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 }
 EXPORT_SYMBOL(get_mem_cgroup_from_mm);
 
-static __always_inline bool memcg_kmem_bypass(void)
-{
-	/* Allow remote memcg charging from any context. */
-	if (unlikely(active_memcg()))
-		return false;
-
-	/* Memcg to charge can't be determined. */
-	if (!in_task() || !current->mm || (current->flags & PF_KTHREAD))
-		return true;
-
-	return false;
-}
-
 /**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
@@ -3007,7 +2994,7 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
 
 	for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) {
 		objcg = rcu_dereference(memcg->objcg);
-		if (objcg && obj_cgroup_tryget(objcg))
+		if (likely(objcg && obj_cgroup_tryget(objcg)))
 			break;
 		objcg = NULL;
 	}
@@ -3016,16 +3003,23 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
 
 __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 {
-	struct obj_cgroup *objcg = NULL;
 	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 
-	if (memcg_kmem_bypass())
-		return NULL;
+	if (in_task()) {
+		memcg = current->active_memcg;
+
+		/* Memcg to charge can't be determined. */
+		if (likely(!memcg) && (!current->mm || (current->flags & PF_KTHREAD)))
+			return NULL;
+	} else {
+		memcg = this_cpu_read(int_active_memcg);
+		if (likely(!memcg))
+			return NULL;
+	}
 
 	rcu_read_lock();
-	if (unlikely(active_memcg()))
-		memcg = active_memcg();
-	else
+	if (!memcg)
 		memcg = mem_cgroup_from_task(current);
 	objcg = __get_obj_cgroup_from_memcg(memcg);
 	rcu_read_unlock();
-- 
2.42.0




* Re: [PATCH v1 1/5] mm: kmem: optimize get_obj_cgroup_from_current()
  2023-09-29 18:00 ` [PATCH v1 1/5] mm: kmem: optimize get_obj_cgroup_from_current() Roman Gushchin
@ 2023-10-03 16:48   ` Johannes Weiner
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Weiner @ 2023-10-03 16:48 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, linux-kernel, cgroups, Michal Hocko, Shakeel Butt,
	Muchun Song, Dennis Zhou, Andrew Morton, David Rientjes,
	Vlastimil Babka

On Fri, Sep 29, 2023 at 11:00:51AM -0700, Roman Gushchin wrote:
> Manually inline memcg_kmem_bypass() and active_memcg() to speed up
> get_obj_cgroup_from_current() by avoiding duplicate in_task() checks
> and active_memcg() readings.
> 
> Also add a likely() annotation to __get_obj_cgroup_from_memcg():
> obj_cgroup_tryget() should almost always succeed, except for a very
> unlikely race with the memcg deletion path.
> 
> Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
> Acked-by: Shakeel Butt <shakeelb@google.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>



* [PATCH v1 2/5] mm: kmem: add direct objcg pointer to task_struct
  2023-09-29 18:00 [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations Roman Gushchin
  2023-09-29 18:00 ` [PATCH v1 1/5] mm: kmem: optimize get_obj_cgroup_from_current() Roman Gushchin
@ 2023-09-29 18:00 ` Roman Gushchin
  2023-10-03 16:59   ` Johannes Weiner
  2023-09-29 18:00 ` [PATCH v1 3/5] mm: kmem: make memcg keep a reference to the original objcg Roman Gushchin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Roman Gushchin @ 2023-09-29 18:00 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Muchun Song, Dennis Zhou, Andrew Morton,
	David Rientjes, Vlastimil Babka, Roman Gushchin

To charge a freshly allocated kernel object to a memory cgroup, the
kernel needs to obtain an objcg pointer. Currently it does so
indirectly, by obtaining the memcg pointer first and then calling
__get_obj_cgroup_from_memcg().

Usually tasks spend their entire life belonging to the same object
cgroup. So it makes sense to save the objcg pointer on task_struct
directly, so it can be obtained faster. This requires some work on the
fork, exit and cgroup migration paths, but these paths are much colder.

To avoid any costly synchronization, the following rules are applied:
1) A task sets its objcg pointer itself.

2) If a task is being migrated to another cgroup, the least
   significant bit of the objcg pointer is set atomically.

3) On the allocation path the objcg pointer is obtained locklessly
   using the READ_ONCE() macro and the least significant bit is
   checked. If it's set, the following procedure is used to update
   it locklessly:
       - task->objcg is zeroed using cmpxchg
       - the new objcg pointer is obtained
       - task->objcg is updated using try_cmpxchg
       - the operation is repeated if try_cmpxchg fails
   This guarantees that no updates will be lost if a task migration
   is racing against an objcg pointer update. It also keeps
   both read and write paths fully lockless.
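
The rules above can be modelled in userspace with C11 atomics. This is only
a sketch under simplifying assumptions: reference counting is left out, the
pointer is kept as a uintptr_t, and every name (UPDATE_FLAG, task_model, the
demo_* helpers) is invented for the illustration. The demo lookup simulates a
migration that races with the first update attempt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define UPDATE_FLAG ((uintptr_t)0x1)	/* illustrative name for bit 0 */

struct task_model {
	_Atomic uintptr_t objcg;	/* pointer bits plus UPDATE_FLAG */
};

/* Migration side (rule 2): atomically mark the pointer as stale. */
static void set_needs_update(struct task_model *task)
{
	atomic_fetch_or(&task->objcg, UPDATE_FLAG);
}

/* Allocation side (rule 3); only the task itself runs this. */
static uintptr_t objcg_update(struct task_model *task, uintptr_t cur,
			      uintptr_t (*lookup)(void))
{
	uintptr_t new, expected;

	do {
		/* Zero the slot, which also drops the update flag. */
		expected = cur;
		bool ok = atomic_compare_exchange_strong(&task->objcg,
							 &expected, (uintptr_t)0);
		assert(ok);	/* nobody else clears or replaces the slot */
		(void)ok;

		/* Obtain the new pointer; bit 0 must be clear. */
		new = lookup();
		assert(!(new & UPDATE_FLAG));

		/*
		 * Publish it only if the slot is still zero. If the flag was
		 * set meanwhile, 'cur' receives the observed value and the
		 * whole procedure repeats.
		 */
		cur = 0;
	} while (!atomic_compare_exchange_strong(&task->objcg, &cur, new));

	return new;
}

/* Read side: one load on the fast path, update only if flagged. */
static uintptr_t current_objcg(struct task_model *task,
			       uintptr_t (*lookup)(void))
{
	uintptr_t objcg = atomic_load(&task->objcg);

	if (objcg & UPDATE_FLAG)
		objcg = objcg_update(task, objcg, lookup);
	return objcg;
}

/* Demo state: a lookup that is raced by a migration on its first call. */
static struct task_model demo_task;
static int demo_lookups;
static long demo_cgroups[2];	/* stand-ins for two objcg objects */

static uintptr_t demo_lookup(void)
{
	if (demo_lookups++ == 0) {
		set_needs_update(&demo_task);	/* concurrent migration */
		return (uintptr_t)&demo_cgroups[0];
	}
	return (uintptr_t)&demo_cgroups[1];
}
```

Starting from a slot that holds only the flag (as after fork),
current_objcg(&demo_task, demo_lookup) performs two lookups in this model:
the first result is discarded because the flag was set again in between, and
the second one is published.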

Because the task is keeping a reference to the objcg, it can't go away
while the task is alive.

This commit doesn't change the way the remote memcg charging works.

Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
---
 include/linux/memcontrol.h |  10 ++++
 include/linux/sched.h      |   4 ++
 mm/memcontrol.c            | 111 ++++++++++++++++++++++++++++++++++---
 3 files changed, 116 insertions(+), 9 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ab94ad4597d0..1c1ebb269ac1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -553,6 +553,16 @@ static inline bool folio_memcg_kmem(struct folio *folio)
 	return folio->memcg_data & MEMCG_DATA_KMEM;
 }
 
+static inline bool current_objcg_needs_update(struct obj_cgroup *objcg)
+{
+	return (struct obj_cgroup *)((unsigned long)objcg & 0x1);
+}
+
+static inline struct obj_cgroup *
+current_objcg_without_update_flag(struct obj_cgroup *objcg)
+{
+	return (struct obj_cgroup *)((unsigned long)objcg & ~0x1);
+}
 
 #else
 static inline bool folio_memcg_kmem(struct folio *folio)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7..60de42715b56 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1443,6 +1443,10 @@ struct task_struct {
 	struct mem_cgroup		*active_memcg;
 #endif
 
+#ifdef CONFIG_MEMCG_KMEM
+	struct obj_cgroup		*objcg;
+#endif
+
 #ifdef CONFIG_BLK_CGROUP
 	struct gendisk			*throttle_disk;
 #endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 16ac2a5838fb..ec28f9cfc2f0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3001,6 +3001,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
 	return objcg;
 }
 
+static struct obj_cgroup *current_objcg_update(struct obj_cgroup *old)
+{
+	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg = NULL, *tmp = old;
+
+	old = current_objcg_without_update_flag(old);
+	if (old)
+		obj_cgroup_put(old);
+
+	rcu_read_lock();
+	do {
+		/* Atomically drop the update bit, */
+		WARN_ON_ONCE(cmpxchg(&current->objcg, tmp, 0) != tmp);
+
+		/* ...obtain the new objcg pointer */
+		memcg = mem_cgroup_from_task(current);
+		for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) {
+			objcg = rcu_dereference(memcg->objcg);
+			if (objcg && obj_cgroup_tryget(objcg))
+				break;
+			objcg = NULL;
+		}
+
+		/*
+		 * ...and try atomically set up a new objcg pointer. If it
+		 * fails, it means the update flag was set concurrently, so
+		 * the whole procedure should be repeated.
+		 */
+		tmp = 0;
+	} while (!try_cmpxchg(&current->objcg, &tmp, objcg));
+	rcu_read_unlock();
+
+	return objcg;
+}
+
+static inline void current_objcg_set_needs_update(struct task_struct *task)
+{
+	/* atomically set the update bit */
+	set_bit(0, (unsigned long *)&task->objcg);
+}
+
 __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 {
 	struct mem_cgroup *memcg;
@@ -3008,19 +3049,26 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 
 	if (in_task()) {
 		memcg = current->active_memcg;
+		if (unlikely(memcg))
+			goto from_memcg;
 
-		/* Memcg to charge can't be determined. */
-		if (likely(!memcg) && (!current->mm || (current->flags & PF_KTHREAD)))
-			return NULL;
+		objcg = READ_ONCE(current->objcg);
+		if (unlikely(current_objcg_needs_update(objcg)))
+			objcg = current_objcg_update(objcg);
+
+		if (objcg) {
+			obj_cgroup_get(objcg);
+			return objcg;
+		}
 	} else {
 		memcg = this_cpu_read(int_active_memcg);
-		if (likely(!memcg))
-			return NULL;
+		if (unlikely(memcg))
+			goto from_memcg;
 	}
+	return NULL;
 
+from_memcg:
 	rcu_read_lock();
-	if (!memcg)
-		memcg = mem_cgroup_from_task(current);
 	objcg = __get_obj_cgroup_from_memcg(memcg);
 	rcu_read_unlock();
 	return objcg;
@@ -6345,6 +6393,7 @@ static void mem_cgroup_move_task(void)
 		mem_cgroup_clear_mc();
 	}
 }
+
 #else	/* !CONFIG_MMU */
 static int mem_cgroup_can_attach(struct cgroup_taskset *tset)
 {
@@ -6358,8 +6407,27 @@ static void mem_cgroup_move_task(void)
 }
 #endif
 
+#ifdef CONFIG_MEMCG_KMEM
+static void mem_cgroup_fork(struct task_struct *task)
+{
+	/*
+	 * Set the update flag to cause task->objcg to be initialized lazily
+	 * on the first allocation.
+	 */
+	task->objcg = (struct obj_cgroup *)0x1;
+}
+
+static void mem_cgroup_exit(struct task_struct *task)
+{
+	struct obj_cgroup *objcg = current_objcg_without_update_flag(task->objcg);
+
+	if (objcg)
+		obj_cgroup_put(objcg);
+}
+#endif
+
 #ifdef CONFIG_LRU_GEN
-static void mem_cgroup_attach(struct cgroup_taskset *tset)
+static void mem_cgroup_lru_gen_attach(struct cgroup_taskset *tset)
 {
 	struct task_struct *task;
 	struct cgroup_subsys_state *css;
@@ -6377,10 +6445,29 @@ static void mem_cgroup_attach(struct cgroup_taskset *tset)
 	task_unlock(task);
 }
 #else
+static void mem_cgroup_lru_gen_attach(struct cgroup_taskset *tset) {}
+#endif /* CONFIG_LRU_GEN */
+
+#ifdef CONFIG_MEMCG_KMEM
+static void mem_cgroup_kmem_attach(struct cgroup_taskset *tset)
+{
+	struct task_struct *task;
+	struct cgroup_subsys_state *css;
+
+	cgroup_taskset_for_each(task, css, tset)
+		current_objcg_set_needs_update(task);
+}
+#else
+static void mem_cgroup_kmem_attach(struct cgroup_taskset *tset) {}
+#endif /* CONFIG_MEMCG_KMEM */
+
+#if defined(CONFIG_LRU_GEN) || defined(CONFIG_MEMCG_KMEM)
 static void mem_cgroup_attach(struct cgroup_taskset *tset)
 {
+	mem_cgroup_lru_gen_attach(tset);
+	mem_cgroup_kmem_attach(tset);
 }
-#endif /* CONFIG_LRU_GEN */
+#endif
 
 static int seq_puts_memcg_tunable(struct seq_file *m, unsigned long value)
 {
@@ -6824,9 +6911,15 @@ struct cgroup_subsys memory_cgrp_subsys = {
 	.css_reset = mem_cgroup_css_reset,
 	.css_rstat_flush = mem_cgroup_css_rstat_flush,
 	.can_attach = mem_cgroup_can_attach,
+#if defined(CONFIG_LRU_GEN) || defined(CONFIG_MEMCG_KMEM)
 	.attach = mem_cgroup_attach,
+#endif
 	.cancel_attach = mem_cgroup_cancel_attach,
 	.post_attach = mem_cgroup_move_task,
+#ifdef CONFIG_MEMCG_KMEM
+	.fork = mem_cgroup_fork,
+	.exit = mem_cgroup_exit,
+#endif
 	.dfl_cftypes = memory_files,
 	.legacy_cftypes = mem_cgroup_legacy_files,
 	.early_init = 0,
-- 
2.42.0




* Re: [PATCH v1 2/5] mm: kmem: add direct objcg pointer to task_struct
  2023-09-29 18:00 ` [PATCH v1 2/5] mm: kmem: add direct objcg pointer to task_struct Roman Gushchin
@ 2023-10-03 16:59   ` Johannes Weiner
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Weiner @ 2023-10-03 16:59 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, linux-kernel, cgroups, Michal Hocko, Shakeel Butt,
	Muchun Song, Dennis Zhou, Andrew Morton, David Rientjes,
	Vlastimil Babka

On Fri, Sep 29, 2023 at 11:00:52AM -0700, Roman Gushchin wrote:
> @@ -553,6 +553,16 @@ static inline bool folio_memcg_kmem(struct folio *folio)
>  	return folio->memcg_data & MEMCG_DATA_KMEM;
>  }
>  
> +static inline bool current_objcg_needs_update(struct obj_cgroup *objcg)
> +{
> +	return (struct obj_cgroup *)((unsigned long)objcg & 0x1);
> +}
> +
> +static inline struct obj_cgroup *
> +current_objcg_without_update_flag(struct obj_cgroup *objcg)
> +{
> +	return (struct obj_cgroup *)((unsigned long)objcg & ~0x1);
> +}

I would slightly prefer naming the bit with a define, and open-coding
the bitops in the current callsites. This makes it clearer that the
actual pointer bits are overloaded in the places where the pointer is
accessed.

> @@ -3001,6 +3001,47 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
>  	return objcg;
>  }
>  
> +static struct obj_cgroup *current_objcg_update(struct obj_cgroup *old)
> +{
> +	struct mem_cgroup *memcg;
> +	struct obj_cgroup *objcg = NULL, *tmp = old;
> +
> +	old = current_objcg_without_update_flag(old);
> +	if (old)
> +		obj_cgroup_put(old);
> +
> +	rcu_read_lock();
> +	do {
> +		/* Atomically drop the update bit, */
> +		WARN_ON_ONCE(cmpxchg(&current->objcg, tmp, 0) != tmp);
> +
> +		/* ...obtain the new objcg pointer */
> +		memcg = mem_cgroup_from_task(current);
> +		for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) {
> +			objcg = rcu_dereference(memcg->objcg);
> +			if (objcg && obj_cgroup_tryget(objcg))
> +				break;
> +			objcg = NULL;
> +		}

As per the other thread, it would be great to have a comment here
explaining the scenario(s) when the tryget could fail and we'd have to
defer to an ancestor.

> +
> +		/*
> +		 * ...and try atomically set up a new objcg pointer. If it
> +		 * fails, it means the update flag was set concurrently, so
> +		 * the whole procedure should be repeated.
> +		 */
> +		tmp = 0;
> +	} while (!try_cmpxchg(&current->objcg, &tmp, objcg));
> +	rcu_read_unlock();
> +
> +	return objcg;

Overall this looks great to me.

AFAICS the rcu_read_lock() is needed for the mem_cgroup_from_task()
and tryget(). Is it possible to localize it around these operations?
Or am I missing some other effect it has?

> @@ -6358,8 +6407,27 @@ static void mem_cgroup_move_task(void)
>  }
>  #endif
>  
> +#ifdef CONFIG_MEMCG_KMEM
> +static void mem_cgroup_fork(struct task_struct *task)
> +{
> +	/*
> +	 * Set the update flag to cause task->objcg to be initialized lazily
> +	 * on the first allocation.
> +	 */
> +	task->objcg = (struct obj_cgroup *)0x1;
> +}

I like this open-coding!

Should this mention why it doesn't need to be atomic? Task is in
fork(), no concurrent modifications from allocations or migrations
possible...

None of the feedback is a blocker, though.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>



* [PATCH v1 3/5] mm: kmem: make memcg keep a reference to the original objcg
  2023-09-29 18:00 [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations Roman Gushchin
  2023-09-29 18:00 ` [PATCH v1 1/5] mm: kmem: optimize get_obj_cgroup_from_current() Roman Gushchin
  2023-09-29 18:00 ` [PATCH v1 2/5] mm: kmem: add direct objcg pointer to task_struct Roman Gushchin
@ 2023-09-29 18:00 ` Roman Gushchin
  2023-09-29 18:00 ` [PATCH v1 4/5] mm: kmem: scoped objcg protection Roman Gushchin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Roman Gushchin @ 2023-09-29 18:00 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Muchun Song, Dennis Zhou, Andrew Morton,
	David Rientjes, Vlastimil Babka, Roman Gushchin

Keep a reference to the original objcg object for the entire life
of a memcg structure.

This simplifies the synchronization on the kernel memory
allocation paths: pinning a (live) memcg will also pin the
corresponding objcg.

The memory overhead of this change is minimal because object cgroups
usually outlive their corresponding memory cgroups even without this
change, so it's only an additional pointer per memcg.
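
The lifetime rule can be illustrated with a single-threaded userspace model.
It is a sketch only: plain counters replace the kernel's percpu_ref, the
reparenting step is reduced to clearing the pointer and dropping the initial
reference (an assumption made for the illustration), and all names are
stand-ins.

```c
#include <assert.h>
#include <stddef.h>

struct objcg_model {
	long refcnt;
};

struct memcg_model {
	struct objcg_model *objcg;	/* cleared on reparenting */
	struct objcg_model *orig_objcg;	/* pinned until the memcg is freed */
};

static void objcg_get(struct objcg_model *objcg)
{
	objcg->refcnt++;
}

static void objcg_put(struct objcg_model *objcg)
{
	assert(objcg->refcnt > 0);
	objcg->refcnt--;
}

/* Models memcg_online_kmem(): publish the objcg and take the extra pin. */
static void memcg_online(struct memcg_model *memcg, struct objcg_model *objcg)
{
	memcg->objcg = objcg;
	objcg_get(objcg);
	memcg->orig_objcg = objcg;
}

/* Simplified reparenting: wipe memcg->objcg, drop the initial reference. */
static void memcg_reparent(struct memcg_model *memcg)
{
	struct objcg_model *objcg = memcg->objcg;

	memcg->objcg = NULL;
	objcg_put(objcg);
}

/* Models __mem_cgroup_free(): drop the pin taken at online time. */
static void memcg_free(struct memcg_model *memcg)
{
	if (memcg->orig_objcg)
		objcg_put(memcg->orig_objcg);
}
```

In this model the objcg stays referenced between memcg_reparent() and
memcg_free() even though memcg->objcg is already NULL, so holding the memcg
is enough to keep orig_objcg valid.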

Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
---
 include/linux/memcontrol.h | 8 +++++++-
 mm/memcontrol.c            | 5 +++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1c1ebb269ac1..e59dea9d8666 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -299,7 +299,13 @@ struct mem_cgroup {
 
 #ifdef CONFIG_MEMCG_KMEM
 	int kmemcg_id;
-	struct obj_cgroup __rcu *objcg;
+	/*
+	 * memcg->objcg is wiped out as a part of the objcg reparenting
+	 * process. memcg->orig_objcg preserves a pointer (and a reference)
+	 * to the original objcg until the end of life of memcg.
+	 */
+	struct obj_cgroup __rcu	*objcg;
+	struct obj_cgroup	*orig_objcg;
 	/* list of inherited objcgs, protected by objcg_lock */
 	struct list_head objcg_list;
 #endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ec28f9cfc2f0..e9890f6e4da7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3803,6 +3803,8 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
 
 	objcg->memcg = memcg;
 	rcu_assign_pointer(memcg->objcg, objcg);
+	obj_cgroup_get(objcg);
+	memcg->orig_objcg = objcg;
 
 	static_branch_enable(&memcg_kmem_online_key);
 
@@ -5297,6 +5299,9 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 {
 	int node;
 
+	if (memcg->orig_objcg)
+		obj_cgroup_put(memcg->orig_objcg);
+
 	for_each_node(node)
 		free_mem_cgroup_per_node_info(memcg, node);
 	kfree(memcg->vmstats);
-- 
2.42.0




* [PATCH v1 4/5] mm: kmem: scoped objcg protection
  2023-09-29 18:00 [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations Roman Gushchin
                   ` (2 preceding siblings ...)
  2023-09-29 18:00 ` [PATCH v1 3/5] mm: kmem: make memcg keep a reference to the original objcg Roman Gushchin
@ 2023-09-29 18:00 ` Roman Gushchin
  2023-09-29 18:00 ` [PATCH v1 5/5] percpu: " Roman Gushchin
  2023-10-04 18:32 ` [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations Michal Koutný
  5 siblings, 0 replies; 10+ messages in thread
From: Roman Gushchin @ 2023-09-29 18:00 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Muchun Song, Dennis Zhou, Andrew Morton,
	David Rientjes, Vlastimil Babka, Roman Gushchin

Switch to a scope-based protection of the objcg pointer on slab/kmem
allocation paths. Instead of using the get_() semantics in the
pre-allocation hook and putting the reference afterwards, let's rely
on the fact that objcg is pinned by the scope.

It's possible because:
1) if the objcg is received from the current task struct, the task is
   keeping a reference to the objcg.
2) if the objcg is received from an active memcg (remote charging),
   the memcg is pinned by the scope and has a reference to the
   corresponding objcg.
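
A single-threaded userspace sketch of the idea, assuming simplified stand-in
types: the pointer is borrowed from the task without touching the reference
counter, and a reference is taken only when the pointer is stored somewhere
that outlives the scope. The 'limit' parameter and the -1 error value are
invented for the illustration (the kernel returns an errno value).

```c
#include <assert.h>
#include <stddef.h>

struct objcg_model {
	long refcnt;
	long charged;	/* bytes charged so far */
};

struct task_model {
	struct objcg_model *objcg;	/* the task owns one reference */
};

struct page_model {
	struct objcg_model *owner;	/* outlives the allocation scope */
};

/* Borrow the task's objcg: no refcount traffic, valid within the scope. */
static struct objcg_model *current_objcg(struct task_model *task)
{
	return task->objcg;
}

/* Charge a page; returns 0 on success and -1 if the limit is exceeded. */
static int charge_page(struct task_model *task, struct page_model *page,
		       long nr_bytes, long limit)
{
	struct objcg_model *objcg = current_objcg(task);

	if (!objcg)
		return 0;	/* nothing to account */

	if (objcg->charged + nr_bytes > limit)
		return -1;	/* no put needed: no reference was taken */

	objcg->charged += nr_bytes;
	objcg->refcnt++;	/* the page now holds its own reference */
	page->owner = objcg;
	return 0;
}
```

Compared to a get/put pair around every allocation, the failure path here
does not touch the counter at all, and the success path increments it once,
on behalf of the page.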

Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
---
 include/linux/memcontrol.h |  6 +++++
 mm/memcontrol.c            | 46 ++++++++++++++++++++++++++++++++++++--
 mm/slab.h                  | 10 +++------
 3 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e59dea9d8666..5a52327ab09a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1779,6 +1779,12 @@ bool mem_cgroup_kmem_disabled(void);
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
 void __memcg_kmem_uncharge_page(struct page *page, int order);
 
+/*
+ * The returned objcg pointer is safe to use without additional
+ * protection within a scope, refer to the implementation for the
+ * additional details.
+ */
+struct obj_cgroup *current_obj_cgroup(void);
 struct obj_cgroup *get_obj_cgroup_from_current(void);
 struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e9890f6e4da7..78ab36b5899f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3074,6 +3074,48 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 	return objcg;
 }
 
+__always_inline struct obj_cgroup *current_obj_cgroup(void)
+{
+	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
+
+	if (in_task()) {
+		memcg = current->active_memcg;
+		if (unlikely(memcg))
+			goto from_memcg;
+
+		objcg = READ_ONCE(current->objcg);
+		if (unlikely(current_objcg_needs_update(objcg)))
+			objcg = current_objcg_update(objcg);
+		/*
+		 * Objcg reference is kept by the task, so it's safe
+		 * to use the objcg by the current task.
+		 */
+		return objcg;
+	} else {
+		memcg = this_cpu_read(int_active_memcg);
+		if (unlikely(memcg))
+			goto from_memcg;
+	}
+	return NULL;
+
+from_memcg:
+	for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) {
+		/*
+		 * Memcg pointer is protected by scope (see set_active_memcg())
+		 * and is pinning the corresponding objcg, so objcg can't go
+		 * away and can be used within the scope without any additional
+		 * protection.
+		 */
+		objcg = rcu_dereference_check(memcg->objcg, 1);
+		if (likely(objcg))
+			break;
+		objcg = NULL;
+	}
+
+	return objcg;
+}
+
 struct obj_cgroup *get_obj_cgroup_from_folio(struct folio *folio)
 {
 	struct obj_cgroup *objcg;
@@ -3168,15 +3210,15 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 	struct obj_cgroup *objcg;
 	int ret = 0;
 
-	objcg = get_obj_cgroup_from_current();
+	objcg = current_obj_cgroup();
 	if (objcg) {
 		ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
 		if (!ret) {
+			obj_cgroup_get(objcg);
 			page->memcg_data = (unsigned long)objcg |
 				MEMCG_DATA_KMEM;
 			return 0;
 		}
-		obj_cgroup_put(objcg);
 	}
 	return ret;
 }
diff --git a/mm/slab.h b/mm/slab.h
index 799a315695c6..8cd3294fedf5 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -484,7 +484,7 @@ static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
 	if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
 		return true;
 
-	objcg = get_obj_cgroup_from_current();
+	objcg = current_obj_cgroup();
 	if (!objcg)
 		return true;
 
@@ -497,17 +497,14 @@ static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
 		css_put(&memcg->css);
 
 		if (ret)
-			goto out;
+			return false;
 	}
 
 	if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s)))
-		goto out;
+		return false;
 
 	*objcgp = objcg;
 	return true;
-out:
-	obj_cgroup_put(objcg);
-	return false;
 }
 
 static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
@@ -542,7 +539,6 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 			obj_cgroup_uncharge(objcg, obj_full_size(s));
 		}
 	}
-	obj_cgroup_put(objcg);
 }
 
 static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
-- 
2.42.0




* [PATCH v1 5/5] percpu: scoped objcg protection
  2023-09-29 18:00 [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations Roman Gushchin
                   ` (3 preceding siblings ...)
  2023-09-29 18:00 ` [PATCH v1 4/5] mm: kmem: scoped objcg protection Roman Gushchin
@ 2023-09-29 18:00 ` Roman Gushchin
  2023-10-04 18:32 ` [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations Michal Koutný
  5 siblings, 0 replies; 10+ messages in thread
From: Roman Gushchin @ 2023-09-29 18:00 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, cgroups, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Muchun Song, Dennis Zhou, Andrew Morton,
	David Rientjes, Vlastimil Babka, Roman Gushchin

Similar to slab and kmem, switch to a scope-based protection of the
objcg pointer to avoid extra operations with objcg reference counters.

Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
---
 mm/percpu.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index a7665de8485f..f53ba692d67a 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1628,14 +1628,12 @@ static bool pcpu_memcg_pre_alloc_hook(size_t size, gfp_t gfp,
 	if (!memcg_kmem_online() || !(gfp & __GFP_ACCOUNT))
 		return true;
 
-	objcg = get_obj_cgroup_from_current();
+	objcg = current_obj_cgroup();
 	if (!objcg)
 		return true;
 
-	if (obj_cgroup_charge(objcg, gfp, pcpu_obj_full_size(size))) {
-		obj_cgroup_put(objcg);
+	if (obj_cgroup_charge(objcg, gfp, pcpu_obj_full_size(size)))
 		return false;
-	}
 
 	*objcgp = objcg;
 	return true;
@@ -1649,6 +1647,7 @@ static void pcpu_memcg_post_alloc_hook(struct obj_cgroup *objcg,
 		return;
 
 	if (likely(chunk && chunk->obj_cgroups)) {
+		obj_cgroup_get(objcg);
 		chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = objcg;
 
 		rcu_read_lock();
@@ -1657,7 +1656,6 @@ static void pcpu_memcg_post_alloc_hook(struct obj_cgroup *objcg,
 		rcu_read_unlock();
 	} else {
 		obj_cgroup_uncharge(objcg, pcpu_obj_full_size(size));
-		obj_cgroup_put(objcg);
 	}
 }
 
-- 
2.42.0




* Re: [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations
  2023-09-29 18:00 [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations Roman Gushchin
                   ` (4 preceding siblings ...)
  2023-09-29 18:00 ` [PATCH v1 5/5] percpu: " Roman Gushchin
@ 2023-10-04 18:32 ` Michal Koutný
  2023-10-04 19:02   ` Roman Gushchin
  5 siblings, 1 reply; 10+ messages in thread
From: Michal Koutný @ 2023-10-04 18:32 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Muchun Song, Dennis Zhou, Andrew Morton,
	David Rientjes, Vlastimil Babka


On Fri, Sep 29, 2023 at 11:00:50AM -0700, Roman Gushchin <roman.gushchin@linux.dev> wrote:
> This patchset improves the performance of accounted kernel memory allocations
> by ~30% as measured by a micro-benchmark [1]. The benchmark is very
> straightforward: 1M kmalloc() allocations of 64 bytes each.

Nice.
Have you tried how these +34% compose with -34% reported way back [1]
when file lock accounting was added (because your benchmark and lock1
sound quite similar)?
(BTW Is that your motivation (too)?)

Thanks,
Michal

[1]  https://lore.kernel.org/r/20210907150757.GE17617@xsang-OptiPlex-9020/



* Re: [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations
  2023-10-04 18:32 ` [PATCH v1 0/5] mm: improve performance of accounted kernel memory allocations Michal Koutný
@ 2023-10-04 19:02   ` Roman Gushchin
  0 siblings, 0 replies; 10+ messages in thread
From: Roman Gushchin @ 2023-10-04 19:02 UTC (permalink / raw)
  To: Michal Koutný
  Cc: linux-mm, linux-kernel, cgroups, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Muchun Song, Dennis Zhou, Andrew Morton,
	David Rientjes, Vlastimil Babka

On Wed, Oct 04, 2023 at 08:32:39PM +0200, Michal Koutný wrote:
> On Fri, Sep 29, 2023 at 11:00:50AM -0700, Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > This patchset improves the performance of accounted kernel memory allocations
> > by ~30% as measured by a micro-benchmark [1]. The benchmark is very
> > straightforward: 1M kmalloc() allocations of 64 bytes each.
> 
> Nice.

Thanks!

> Have you tried how these +34% compose with -34% reported way back [1]
> when file lock accounting was added (because your benchmark and lock1
> sound quite similar)?

No, I haven't. I'm kindly waiting for an automatic report here :)
But if someone can run these tests manually, I'll appreciate it a lot.

> (BTW Is that your motivation (too)?)

Not really, it was on my todo list for a long time and I just got some spare
cycles to figure out missing parts (mostly around targeted/remote charging).

I also plan to try a similar approach to speed up generic memcg charging.

Thanks!


