* [PATCH v5 0/3] Kernel stack usage histogram
@ 2024-07-24 20:33 Pasha Tatashin
  2024-07-24 20:33 ` [PATCH v5 1/3] memcg: increase the valid index range for memcg stats Pasha Tatashin
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Pasha Tatashin @ 2024-07-24 20:33 UTC (permalink / raw)
  To: akpm, jpoimboe, pasha.tatashin, kent.overstreet, peterz, nphamcs,
	cerasuolodomenico, surenb, lizhijian, willy, shakeel.butt,
	vbabka, ziy, linux-kernel, linux-mm

Provide a histogram of stack sizes for exited threads:
Example outputs:
Intel:
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0

ARM with 64K page_size:
$ grep kstack /proc/vmstat
kstack_1k 1
kstack_2k 340
kstack_4k 25212
kstack_8k 1659
kstack_16k 0
kstack_32k 0
kstack_64k 0

Changelog:
v5:
Uninlined stack_not_used() and kstack_histogram() per mailing list
discussion
Used count_vm_event() instead of this_cpu_inc()
Increased the memcg event index limit from S8_MAX to U8_MAX

Pasha Tatashin (2):
  vmstat: Kernel stack usage histogram
  task_stack: uninline stack_not_used

Shakeel Butt (1):
  memcg: increase the valid index range for memcg stats

 include/linux/sched/task_stack.h | 18 ++--------
 include/linux/vm_event_item.h    | 24 ++++++++++++++
 kernel/exit.c                    | 57 ++++++++++++++++++++++++++++++++
 kernel/sched/core.c              |  4 +--
 mm/memcontrol.c                  | 50 ++++++++++++++++------------
 mm/vmstat.c                      | 24 ++++++++++++++
 6 files changed, 137 insertions(+), 40 deletions(-)

-- 
2.45.2.1089.g2a221341d9-goog




* [PATCH v5 1/3] memcg: increase the valid index range for memcg stats
  2024-07-24 20:33 [PATCH v5 0/3] Kernel stack usage histogram Pasha Tatashin
@ 2024-07-24 20:33 ` Pasha Tatashin
  2024-07-24 23:17   ` Yosry Ahmed
  2024-07-24 20:33 ` [PATCH v5 2/3] vmstat: Kernel stack usage histogram Pasha Tatashin
  2024-07-24 20:33 ` [PATCH v5 3/3] task_stack: uninline stack_not_used Pasha Tatashin
  2 siblings, 1 reply; 10+ messages in thread
From: Pasha Tatashin @ 2024-07-24 20:33 UTC (permalink / raw)
  To: akpm, jpoimboe, pasha.tatashin, kent.overstreet, peterz, nphamcs,
	cerasuolodomenico, surenb, lizhijian, willy, shakeel.butt,
	vbabka, ziy, linux-kernel, linux-mm

From: Shakeel Butt <shakeel.butt@linux.dev>

At the moment the valid index for the indirection tables for memcg stats
and events is < S8_MAX. These indirection tables are used in
performance-critical codepaths. With the latest addition to the
vm_events, NR_VM_EVENT_ITEMS has gone over S8_MAX. One way to resolve
this is to increase the entry size of the indirection table from int8_t
to int16_t, but this would increase the potential number of cachelines
needed to access the indirection table.

This patch takes a different approach and makes the valid index
< U8_MAX. This way the size of the indirection tables remains the same,
and we only need to change the invalid index check from "less than 0"
to "equal to U8_MAX". This approach also removes a subtraction from the
performance-critical codepaths.
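
For illustration, a minimal userspace sketch of the new scheme (the
names and item count here are hypothetical, not the kernel code
itself): entries stay one byte wide, U8_MAX marks an unused slot, and
the lookup needs no subtraction.

#include <stdint.h>
#include <string.h>

#define NR_GLOBAL_ITEMS	300		/* hypothetical global item count */
#define INVALID_IDX	UINT8_MAX	/* userspace stand-in for U8_MAX */

static uint8_t index_table[NR_GLOBAL_ITEMS];

/* Mark every slot invalid, then map each tracked item to a compact index. */
static void init_table(const unsigned int *items, unsigned int nr)
{
	memset(index_table, INVALID_IDX, sizeof(index_table));
	for (unsigned int i = 0; i < nr; i++)
		index_table[items[i]] = i;	/* direct index, no +1/-1 */
}

static int lookup(unsigned int idx)
{
	return index_table[idx];	/* caller rejects INVALID_IDX */
}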

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 mm/memcontrol.c | 50 +++++++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 960371788687..2fdeece7f1f8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -320,24 +320,27 @@ static const unsigned int memcg_stat_items[] = {
 #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
 #define MEMCG_VMSTAT_SIZE (NR_MEMCG_NODE_STAT_ITEMS + \
 			   ARRAY_SIZE(memcg_stat_items))
-static int8_t mem_cgroup_stats_index[MEMCG_NR_STAT] __read_mostly;
+#define IS_INVALID(index) ((index) == U8_MAX)
+static u8 mem_cgroup_stats_index[MEMCG_NR_STAT] __read_mostly;
 
 static void init_memcg_stats(void)
 {
-	int8_t i, j = 0;
+	u8 i, j = 0;
 
-	BUILD_BUG_ON(MEMCG_NR_STAT >= S8_MAX);
+	BUILD_BUG_ON(MEMCG_NR_STAT >= U8_MAX);
 
-	for (i = 0; i < NR_MEMCG_NODE_STAT_ITEMS; ++i)
-		mem_cgroup_stats_index[memcg_node_stat_items[i]] = ++j;
+	memset(mem_cgroup_stats_index, U8_MAX, sizeof(mem_cgroup_stats_index));
 
-	for (i = 0; i < ARRAY_SIZE(memcg_stat_items); ++i)
-		mem_cgroup_stats_index[memcg_stat_items[i]] = ++j;
+	for (i = 0; i < NR_MEMCG_NODE_STAT_ITEMS; ++i, ++j)
+		mem_cgroup_stats_index[memcg_node_stat_items[i]] = j;
+
+	for (i = 0; i < ARRAY_SIZE(memcg_stat_items); ++i, ++j)
+		mem_cgroup_stats_index[memcg_stat_items[i]] = j;
 }
 
 static inline int memcg_stats_index(int idx)
 {
-	return mem_cgroup_stats_index[idx] - 1;
+	return mem_cgroup_stats_index[idx];
 }
 
 struct lruvec_stats_percpu {
@@ -369,7 +372,7 @@ unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx)
 		return node_page_state(lruvec_pgdat(lruvec), idx);
 
 	i = memcg_stats_index(idx);
-	if (WARN_ONCE(i < 0, "%s: missing stat item %d\n", __func__, idx))
+	if (WARN_ONCE(IS_INVALID(i), "%s: missing stat item %d\n", __func__, idx))
 		return 0;
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
@@ -392,7 +395,7 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 		return node_page_state(lruvec_pgdat(lruvec), idx);
 
 	i = memcg_stats_index(idx);
-	if (WARN_ONCE(i < 0, "%s: missing stat item %d\n", __func__, idx))
+	if (WARN_ONCE(IS_INVALID(i), "%s: missing stat item %d\n", __func__, idx))
 		return 0;
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
@@ -435,21 +438,24 @@ static const unsigned int memcg_vm_event_stat[] = {
 };
 
 #define NR_MEMCG_EVENTS ARRAY_SIZE(memcg_vm_event_stat)
-static int8_t mem_cgroup_events_index[NR_VM_EVENT_ITEMS] __read_mostly;
+static u8 mem_cgroup_events_index[NR_VM_EVENT_ITEMS] __read_mostly;
 
 static void init_memcg_events(void)
 {
-	int8_t i;
+	u8 i;
+
+	BUILD_BUG_ON(NR_VM_EVENT_ITEMS >= U8_MAX);
 
-	BUILD_BUG_ON(NR_VM_EVENT_ITEMS >= S8_MAX);
+	memset(mem_cgroup_events_index, U8_MAX,
+	       sizeof(mem_cgroup_events_index));
 
 	for (i = 0; i < NR_MEMCG_EVENTS; ++i)
-		mem_cgroup_events_index[memcg_vm_event_stat[i]] = i + 1;
+		mem_cgroup_events_index[memcg_vm_event_stat[i]] = i;
 }
 
 static inline int memcg_events_index(enum vm_event_item idx)
 {
-	return mem_cgroup_events_index[idx] - 1;
+	return mem_cgroup_events_index[idx];
 }
 
 struct memcg_vmstats_percpu {
@@ -621,7 +627,7 @@ unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
 	long x;
 	int i = memcg_stats_index(idx);
 
-	if (WARN_ONCE(i < 0, "%s: missing stat item %d\n", __func__, idx))
+	if (WARN_ONCE(IS_INVALID(i), "%s: missing stat item %d\n", __func__, idx))
 		return 0;
 
 	x = READ_ONCE(memcg->vmstats->state[i]);
@@ -662,7 +668,7 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 	if (mem_cgroup_disabled())
 		return;
 
-	if (WARN_ONCE(i < 0, "%s: missing stat item %d\n", __func__, idx))
+	if (WARN_ONCE(IS_INVALID(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
 
 	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
@@ -675,7 +681,7 @@ unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
 	long x;
 	int i = memcg_stats_index(idx);
 
-	if (WARN_ONCE(i < 0, "%s: missing stat item %d\n", __func__, idx))
+	if (WARN_ONCE(IS_INVALID(i), "%s: missing stat item %d\n", __func__, idx))
 		return 0;
 
 	x = READ_ONCE(memcg->vmstats->state_local[i]);
@@ -694,7 +700,7 @@ static void __mod_memcg_lruvec_state(struct lruvec *lruvec,
 	struct mem_cgroup *memcg;
 	int i = memcg_stats_index(idx);
 
-	if (WARN_ONCE(i < 0, "%s: missing stat item %d\n", __func__, idx))
+	if (WARN_ONCE(IS_INVALID(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
@@ -810,7 +816,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	if (mem_cgroup_disabled())
 		return;
 
-	if (WARN_ONCE(i < 0, "%s: missing stat item %d\n", __func__, idx))
+	if (WARN_ONCE(IS_INVALID(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
 
 	memcg_stats_lock();
@@ -823,7 +829,7 @@ unsigned long memcg_events(struct mem_cgroup *memcg, int event)
 {
 	int i = memcg_events_index(event);
 
-	if (WARN_ONCE(i < 0, "%s: missing stat item %d\n", __func__, event))
+	if (WARN_ONCE(IS_INVALID(i), "%s: missing stat item %d\n", __func__, event))
 		return 0;
 
 	return READ_ONCE(memcg->vmstats->events[i]);
@@ -833,7 +839,7 @@ unsigned long memcg_events_local(struct mem_cgroup *memcg, int event)
 {
 	int i = memcg_events_index(event);
 
-	if (WARN_ONCE(i < 0, "%s: missing stat item %d\n", __func__, event))
+	if (WARN_ONCE(IS_INVALID(i), "%s: missing stat item %d\n", __func__, event))
 		return 0;
 
 	return READ_ONCE(memcg->vmstats->events_local[i]);
-- 
2.45.2.1089.g2a221341d9-goog




* [PATCH v5 2/3] vmstat: Kernel stack usage histogram
  2024-07-24 20:33 [PATCH v5 0/3] Kernel stack usage histogram Pasha Tatashin
  2024-07-24 20:33 ` [PATCH v5 1/3] memcg: increase the valid index range for memcg stats Pasha Tatashin
@ 2024-07-24 20:33 ` Pasha Tatashin
  2024-07-24 20:49   ` Shakeel Butt
  2024-07-24 20:33 ` [PATCH v5 3/3] task_stack: uninline stack_not_used Pasha Tatashin
  2 siblings, 1 reply; 10+ messages in thread
From: Pasha Tatashin @ 2024-07-24 20:33 UTC (permalink / raw)
  To: akpm, jpoimboe, pasha.tatashin, kent.overstreet, peterz, nphamcs,
	cerasuolodomenico, surenb, lizhijian, willy, shakeel.butt,
	vbabka, ziy, linux-kernel, linux-mm

As part of the dynamic kernel stack project, we need to know the amount
of data that can be saved by reducing the default kernel stack size [1].

Provide a kernel stack usage histogram to aid in optimizing kernel stack
sizes and minimizing memory waste in large-scale environments. The
histogram divides stack usage into power-of-two buckets and reports the
results in /proc/vmstat. This information is especially valuable in
environments with millions of machines, where even small optimizations
can have a significant impact.

The histogram data is presented in /proc/vmstat with entries like
"kstack_1k", "kstack_2k", and so on, indicating the number of threads
that exited with stack usage falling within each respective bucket.
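
In outline, the accounting at thread exit looks like this (a sketch of
the flow the patch below implements):

	free = stack_not_used(current);		/* unused bytes, via canary scan */
	kstack_histogram(THREAD_SIZE - free);	/* bump the power-of-two bucket */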

Example outputs:
Intel:
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0

ARM with 64K page_size:
$ grep kstack /proc/vmstat
kstack_1k 1
kstack_2k 340
kstack_4k 25212
kstack_8k 1659
kstack_16k 0
kstack_32k 0
kstack_64k 0
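
As a rough reading of the Intel output above (assuming the default 16K
stacks on x86_64): 11582 of 11825 exited threads, about 98%, used at
most 4K, and none used more than 8K, so the histogram directly bounds
how much memory smaller or dynamically sized stacks could reclaim on
that workload.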

Note: once dynamic kernel stacks are implemented, the usability of this
feature will depend on the implementation: on hardware that supports
faults on kernel stacks, we will have other metrics that show the total
number of pages allocated for stacks. On hardware where faults are not
supported, we will most likely have some optimization where only some
threads are extended, and for those, these metrics will still be very
useful.

[1] https://lwn.net/Articles/974367

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 include/linux/vm_event_item.h | 24 ++++++++++++++++++++++
 kernel/exit.c                 | 38 +++++++++++++++++++++++++++++++++++
 mm/vmstat.c                   | 24 ++++++++++++++++++++++
 3 files changed, 86 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 747943bc8cc2..37ad1c16367a 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -154,6 +154,30 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		VMA_LOCK_RETRY,
 		VMA_LOCK_MISS,
 #endif
+#ifdef CONFIG_DEBUG_STACK_USAGE
+		KSTACK_1K,
+#if THREAD_SIZE > 1024
+		KSTACK_2K,
+#endif
+#if THREAD_SIZE > 2048
+		KSTACK_4K,
+#endif
+#if THREAD_SIZE > 4096
+		KSTACK_8K,
+#endif
+#if THREAD_SIZE > 8192
+		KSTACK_16K,
+#endif
+#if THREAD_SIZE > 16384
+		KSTACK_32K,
+#endif
+#if THREAD_SIZE > 32768
+		KSTACK_64K,
+#endif
+#if THREAD_SIZE > 65536
+		KSTACK_REST,
+#endif
+#endif /* CONFIG_DEBUG_STACK_USAGE */
 		NR_VM_EVENT_ITEMS
 };
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 7430852a8571..64bfc2bae55b 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -778,6 +778,43 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 }
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
+/* Count the maximum pages reached in kernel stacks */
+static inline void kstack_histogram(unsigned long used_stack)
+{
+#ifdef CONFIG_VM_EVENT_COUNTERS
+	if (used_stack <= 1024)
+		count_vm_event(KSTACK_1K);
+#if THREAD_SIZE > 1024
+	else if (used_stack <= 2048)
+		count_vm_event(KSTACK_2K);
+#endif
+#if THREAD_SIZE > 2048
+	else if (used_stack <= 4096)
+		count_vm_event(KSTACK_4K);
+#endif
+#if THREAD_SIZE > 4096
+	else if (used_stack <= 8192)
+		count_vm_event(KSTACK_8K);
+#endif
+#if THREAD_SIZE > 8192
+	else if (used_stack <= 16384)
+		count_vm_event(KSTACK_16K);
+#endif
+#if THREAD_SIZE > 16384
+	else if (used_stack <= 32768)
+		count_vm_event(KSTACK_32K);
+#endif
+#if THREAD_SIZE > 32768
+	else if (used_stack <= 65536)
+		count_vm_event(KSTACK_64K);
+#endif
+#if THREAD_SIZE > 65536
+	else
+		count_vm_event(KSTACK_REST);
+#endif
+#endif /* CONFIG_VM_EVENT_COUNTERS */
+}
+
 static void check_stack_usage(void)
 {
 	static DEFINE_SPINLOCK(low_water_lock);
@@ -785,6 +822,7 @@ static void check_stack_usage(void)
 	unsigned long free;
 
 	free = stack_not_used(current);
+	kstack_histogram(THREAD_SIZE - free);
 
 	if (free >= lowest_to_date)
 		return;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 73d791d1caad..6e3347789eb2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1417,6 +1417,30 @@ const char * const vmstat_text[] = {
 	"vma_lock_retry",
 	"vma_lock_miss",
 #endif
+#ifdef CONFIG_DEBUG_STACK_USAGE
+	"kstack_1k",
+#if THREAD_SIZE > 1024
+	"kstack_2k",
+#endif
+#if THREAD_SIZE > 2048
+	"kstack_4k",
+#endif
+#if THREAD_SIZE > 4096
+	"kstack_8k",
+#endif
+#if THREAD_SIZE > 8192
+	"kstack_16k",
+#endif
+#if THREAD_SIZE > 16384
+	"kstack_32k",
+#endif
+#if THREAD_SIZE > 32768
+	"kstack_64k",
+#endif
+#if THREAD_SIZE > 65536
+	"kstack_rest",
+#endif
+#endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };
 #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
-- 
2.45.2.1089.g2a221341d9-goog




* [PATCH v5 3/3] task_stack: uninline stack_not_used
  2024-07-24 20:33 [PATCH v5 0/3] Kernel stack usage histogram Pasha Tatashin
  2024-07-24 20:33 ` [PATCH v5 1/3] memcg: increase the valid index range for memcg stats Pasha Tatashin
  2024-07-24 20:33 ` [PATCH v5 2/3] vmstat: Kernel stack usage histogram Pasha Tatashin
@ 2024-07-24 20:33 ` Pasha Tatashin
  2024-07-24 20:50   ` Shakeel Butt
  2 siblings, 1 reply; 10+ messages in thread
From: Pasha Tatashin @ 2024-07-24 20:33 UTC (permalink / raw)
  To: akpm, jpoimboe, pasha.tatashin, kent.overstreet, peterz, nphamcs,
	cerasuolodomenico, surenb, lizhijian, willy, shakeel.butt,
	vbabka, ziy, linux-kernel, linux-mm

Given that stack_not_used() is not a performance-critical function,
uninline it.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 include/linux/sched/task_stack.h | 18 +++---------------
 kernel/exit.c                    | 19 +++++++++++++++++++
 kernel/sched/core.c              |  4 +---
 3 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h
index ccd72b978e1f..bf10bdb487dd 100644
--- a/include/linux/sched/task_stack.h
+++ b/include/linux/sched/task_stack.h
@@ -95,23 +95,11 @@ static inline int object_is_on_stack(const void *obj)
 extern void thread_stack_cache_init(void);
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
+unsigned long stack_not_used(struct task_struct *p);
+#else
 static inline unsigned long stack_not_used(struct task_struct *p)
 {
-	unsigned long *n = end_of_stack(p);
-
-	do { 	/* Skip over canary */
-# ifdef CONFIG_STACK_GROWSUP
-		n--;
-# else
-		n++;
-# endif
-	} while (!*n);
-
-# ifdef CONFIG_STACK_GROWSUP
-	return (unsigned long)end_of_stack(p) - (unsigned long)n;
-# else
-	return (unsigned long)n - (unsigned long)end_of_stack(p);
-# endif
+	return 0;
 }
 #endif
 extern void set_task_stack_end_magic(struct task_struct *tsk);
diff --git a/kernel/exit.c b/kernel/exit.c
index 64bfc2bae55b..45085a0e7c16 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -778,6 +778,25 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 }
 
 #ifdef CONFIG_DEBUG_STACK_USAGE
+unsigned long stack_not_used(struct task_struct *p)
+{
+	unsigned long *n = end_of_stack(p);
+
+	do {	/* Skip over canary */
+# ifdef CONFIG_STACK_GROWSUP
+		n--;
+# else
+		n++;
+# endif
+	} while (!*n);
+
+# ifdef CONFIG_STACK_GROWSUP
+	return (unsigned long)end_of_stack(p) - (unsigned long)n;
+# else
+	return (unsigned long)n - (unsigned long)end_of_stack(p);
+# endif
+}
+
 /* Count the maximum pages reached in kernel stacks */
 static inline void kstack_histogram(unsigned long used_stack)
 {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ae5ef3013a55..f5861f64e960 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7405,7 +7405,7 @@ EXPORT_SYMBOL(io_schedule);
 
 void sched_show_task(struct task_struct *p)
 {
-	unsigned long free = 0;
+	unsigned long free;
 	int ppid;
 
 	if (!try_get_task_stack(p))
@@ -7415,9 +7415,7 @@ void sched_show_task(struct task_struct *p)
 
 	if (task_is_running(p))
 		pr_cont("  running task    ");
-#ifdef CONFIG_DEBUG_STACK_USAGE
 	free = stack_not_used(p);
-#endif
 	ppid = 0;
 	rcu_read_lock();
 	if (pid_alive(p))
-- 
2.45.2.1089.g2a221341d9-goog




* Re: [PATCH v5 2/3] vmstat: Kernel stack usage histogram
  2024-07-24 20:33 ` [PATCH v5 2/3] vmstat: Kernel stack usage histogram Pasha Tatashin
@ 2024-07-24 20:49   ` Shakeel Butt
  0 siblings, 0 replies; 10+ messages in thread
From: Shakeel Butt @ 2024-07-24 20:49 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: akpm, jpoimboe, kent.overstreet, peterz, nphamcs,
	cerasuolodomenico, surenb, lizhijian, willy, vbabka, ziy,
	linux-kernel, linux-mm

On Wed, Jul 24, 2024 at 08:33:21PM GMT, Pasha Tatashin wrote:
> As part of the dynamic kernel stack project, we need to know the amount
> of data that can be saved by reducing the default kernel stack size [1].
> 
> Provide a kernel stack usage histogram to aid in optimizing kernel stack
> sizes and minimizing memory waste in large-scale environments. The
> histogram divides stack usage into power-of-two buckets and reports the
> results in /proc/vmstat. This information is especially valuable in
> environments with millions of machines, where even small optimizations
> can have a significant impact.
> 
> The histogram data is presented in /proc/vmstat with entries like
> "kstack_1k", "kstack_2k", and so on, indicating the number of threads
> that exited with stack usage falling within each respective bucket.
> 
> Example outputs:
> Intel:
> $ grep kstack /proc/vmstat
> kstack_1k 3
> kstack_2k 188
> kstack_4k 11391
> kstack_8k 243
> kstack_16k 0
> 
> ARM with 64K page_size:
> $ grep kstack /proc/vmstat
> kstack_1k 1
> kstack_2k 340
> kstack_4k 25212
> kstack_8k 1659
> kstack_16k 0
> kstack_32k 0
> kstack_64k 0
> 
> Note: once dynamic kernel stacks are implemented, the usability of this
> feature will depend on the implementation: on hardware that supports
> faults on kernel stacks, we will have other metrics that show the total
> number of pages allocated for stacks. On hardware where faults are not
> supported, we will most likely have some optimization where only some
> threads are extended, and for those, these metrics will still be very
> useful.
> 
> [1] https://lwn.net/Articles/974367
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>



* Re: [PATCH v5 3/3] task_stack: uninline stack_not_used
  2024-07-24 20:33 ` [PATCH v5 3/3] task_stack: uninline stack_not_used Pasha Tatashin
@ 2024-07-24 20:50   ` Shakeel Butt
  2024-07-25 14:30     ` Pasha Tatashin
  0 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2024-07-24 20:50 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: akpm, jpoimboe, kent.overstreet, peterz, nphamcs,
	cerasuolodomenico, surenb, lizhijian, willy, vbabka, ziy,
	linux-kernel, linux-mm

On Wed, Jul 24, 2024 at 08:33:22PM GMT, Pasha Tatashin wrote:
> Given that stack_not_used() is not a performance-critical function,
> uninline it.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>



* Re: [PATCH v5 1/3] memcg: increase the valid index range for memcg stats
  2024-07-24 20:33 ` [PATCH v5 1/3] memcg: increase the valid index range for memcg stats Pasha Tatashin
@ 2024-07-24 23:17   ` Yosry Ahmed
  2024-07-25 14:25     ` Pasha Tatashin
  0 siblings, 1 reply; 10+ messages in thread
From: Yosry Ahmed @ 2024-07-24 23:17 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: akpm, jpoimboe, kent.overstreet, peterz, nphamcs,
	cerasuolodomenico, surenb, lizhijian, willy, shakeel.butt,
	vbabka, ziy, linux-kernel, linux-mm

On Wed, Jul 24, 2024 at 1:33 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> From: Shakeel Butt <shakeel.butt@linux.dev>
>
> At the moment the valid index for the indirection tables for memcg stats
> and events is < S8_MAX. These indirection tables are used in
> performance-critical codepaths. With the latest addition to the
> vm_events, NR_VM_EVENT_ITEMS has gone over S8_MAX. One way to resolve
> this is to increase the entry size of the indirection table from int8_t
> to int16_t, but this would increase the potential number of cachelines
> needed to access the indirection table.
>
> This patch takes a different approach and makes the valid index
> < U8_MAX. This way the size of the indirection tables remains the same,
> and we only need to change the invalid index check from "less than 0"
> to "equal to U8_MAX". This approach also removes a subtraction from the
> performance-critical codepaths.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  mm/memcontrol.c | 50 +++++++++++++++++++++++++++----------------------
>  1 file changed, 28 insertions(+), 22 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 960371788687..2fdeece7f1f8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -320,24 +320,27 @@ static const unsigned int memcg_stat_items[] = {
>  #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
>  #define MEMCG_VMSTAT_SIZE (NR_MEMCG_NODE_STAT_ITEMS + \
>                            ARRAY_SIZE(memcg_stat_items))
> -static int8_t mem_cgroup_stats_index[MEMCG_NR_STAT] __read_mostly;
> +#define IS_INVALID(index) ((index) == U8_MAX)

The use of this macro extends well into this file; should we use a
more specific name (e.g. IS_VALID_STATS_IDX())?



* Re: [PATCH v5 1/3] memcg: increase the valid index range for memcg stats
  2024-07-24 23:17   ` Yosry Ahmed
@ 2024-07-25 14:25     ` Pasha Tatashin
  0 siblings, 0 replies; 10+ messages in thread
From: Pasha Tatashin @ 2024-07-25 14:25 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: akpm, jpoimboe, kent.overstreet, peterz, nphamcs,
	cerasuolodomenico, surenb, lizhijian, willy, shakeel.butt,
	vbabka, ziy, linux-kernel, linux-mm

> >  #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
> >  #define MEMCG_VMSTAT_SIZE (NR_MEMCG_NODE_STAT_ITEMS + \
> >                            ARRAY_SIZE(memcg_stat_items))
> > -static int8_t mem_cgroup_stats_index[MEMCG_NR_STAT] __read_mostly;
> > +#define IS_INVALID(index) ((index) == U8_MAX)
>
> The use of this macro extends well into this file, should we use a
> more specific name (e.g. IS_VALID_STATS_IDX())?

I will redefine it like this:
#define BAD_STAT_IDX(index) ((u32)(index) >= U8_MAX)

1. The name is descriptive yet short.
2. It checks whether the index is equal to or larger than U8_MAX, which
feels safer.
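
As a quick illustration of point 2 (a userspace stand-in, with UINT8_MAX
in place of the kernel's U8_MAX): the u32 cast makes a negative index
wrap to a large value, so both out-of-range cases trip the same
comparison.

#include <stdint.h>
#include <stdio.h>

#define BAD_STAT_IDX(index) ((uint32_t)(index) >= UINT8_MAX)

int main(void)
{
	printf("%d\n", BAD_STAT_IDX(-1));		/* 1: wraps to 0xffffffff */
	printf("%d\n", BAD_STAT_IDX(UINT8_MAX));	/* 1: the invalid marker */
	printf("%d\n", BAD_STAT_IDX(42));		/* 0: valid index */
	return 0;
}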

Pasha



* Re: [PATCH v5 3/3] task_stack: uninline stack_not_used
  2024-07-24 20:50   ` Shakeel Butt
@ 2024-07-25 14:30     ` Pasha Tatashin
  2024-07-25 15:57       ` Shakeel Butt
  0 siblings, 1 reply; 10+ messages in thread
From: Pasha Tatashin @ 2024-07-25 14:30 UTC (permalink / raw)
  To: akpm
  Cc: Shakeel Butt, jpoimboe, kent.overstreet, peterz, nphamcs,
	cerasuolodomenico, surenb, lizhijian, willy, vbabka, ziy,
	linux-kernel, linux-mm

> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

Andrew,

You took the other two patches, but not this one. Should I drop it in
the next version?

Pasha



* Re: [PATCH v5 3/3] task_stack: uninline stack_not_used
  2024-07-25 14:30     ` Pasha Tatashin
@ 2024-07-25 15:57       ` Shakeel Butt
  0 siblings, 0 replies; 10+ messages in thread
From: Shakeel Butt @ 2024-07-25 15:57 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: akpm, jpoimboe, kent.overstreet, peterz, nphamcs,
	cerasuolodomenico, surenb, lizhijian, willy, vbabka, ziy,
	linux-kernel, linux-mm

On Thu, Jul 25, 2024 at 10:30:15AM GMT, Pasha Tatashin wrote:
> > > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> >
> > Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> 
> Andrew,
> 
> You took the other two patches, but not this one. Should I drop it in
> the next version?
> 

Please keep it, as we do want this patch. Most probably it was missed
by mistake; if you look at the commit message of the first patch, it
says "This patch (of 3):".


