linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 0/7] page_owner: print stacks and their outstanding allocations
@ 2024-02-14 17:01 Oscar Salvador
  2024-02-14 17:01 ` [PATCH v9 1/7] lib/stackdepot: Fix first entry having a 0-handle Oscar Salvador
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Oscar Salvador @ 2024-02-14 17:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Marco Elver, Andrey Konovalov, Alexander Potapenko,
	Oscar Salvador

Changes v8 -> v9
     - Fix handle-0 for the very first stack_record entry
     - Collect Acked-by and Reviewed-by from Marco and Vlastimil
     - Adressed feedback from Marco and Vlastimil
     - stack_print() no longer allocates a memory buffer, prints directly
       using seq_printf: by Vlastimil
     - Added two static struct stack for dummy_handle and faiure_handle
     - add_stack_record_to_list() now filters out the gfp_mask the same way
       stackdepot does, for consistency
     - Rename set_threshold to count_threshold

Changes v7 -> v8
     - Rebased on top of -next
     - page_owner maintains its own stack_records list now
     - Kill auxiliary stackdepot function to traverse buckets
     - page_owner_stacks is now a directory with 'show_stacks'
       and 'set_threshold'
     - Update Documentation/mm/page_owner.rst
     - Adressed feedback from Marco

Changes v6 -> v7:
     - Rebased on top of Andrey Konovalov's libstackdepot patchset
     - Reformulated the changelogs

Changes v5 -> v6:
     - Rebase on top of v6.7-rc1
     - Move stack_record struct to the header
     - Addressed feedback from Vlastimil
       (some code tweaks and changelogs suggestions)

Changes v4 -> v5:
     - Addressed feedback from Alexander Potapenko

Changes v3 -> v4:
     - Rebase (long time has passed)
     - Use boolean instead of enum for action by Alexander Potapenko
     - (I left some feedback untouched because it's been long and
        would like to discuss it here now instead of re-vamping
        and old thread)

Changes v2 -> v3:
     - Replace interface in favor of seq operations
       (suggested by Vlastimil)
     - Use debugfs interface to store/read valued (suggested by Ammar)


page_owner is a great debug functionality tool that lets us know
about all pages that have been allocated/freed and their specific
stacktrace.
This comes very handy when debugging memory leaks, since with
some scripting we can see the outstanding allocations, which might point
to a memory leak.

In my experience, that is one of the most useful cases, but it can get
really tedious to screen through all pages and try to reconstruct the
stack <-> allocated/freed relationship, becoming most of the time a
daunting and slow process when we have tons of allocation/free operations. 

This patchset aims to ease that by adding a new functionality into
page_owner.
This functionality creates a new directory called 'page_owner_stacks'
under 'sys/kernel//debug' with a read-only file called 'show_stacks',
which prints out all the stacks followed by their outstanding number
of allocations (being that the times the stacktrace has allocated
but not freed yet).
This gives us a clear and a quick overview of stacks <-> allocated/free.

We take advantage of the new refcount_f field that stack_record struct
gained, and increment/decrement the stack refcount on every
__set_page_owner() (alloc operation) and __reset_page_owner (free operation)
call.

Unfortunately, we cannot use the new stackdepot api
STACK_DEPOT_FLAG_GET because it does not fulfill page_owner needs,
meaning we would have to special case things, at which point
makes more sense for page_owner to do its own {dec,inc}rementing
of the stacks.
E.g: Using STACK_DEPOT_FLAG_PUT, once the refcount reaches 0,
such stack gets evicted, so page_owner would lose information.

This patch also creates a new file called 'set_threshold' within
'page_owner_stacks' directory, and by writing a value to it, the stacks
which refcount is below such value will be filtered out.

A PoC can be found below:

 # cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks.txt
 # head -40 page_owner_full_stacks.txt 
  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  page_cache_ra_unbounded+0x96/0x180
  filemap_get_pages+0xfd/0x590
  filemap_read+0xcc/0x330
  blkdev_read_iter+0xb8/0x150
  vfs_read+0x285/0x320
  ksys_read+0xa5/0xe0
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 521



  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  __filemap_get_folio+0x14a/0x490
  ext4_write_begin+0xbd/0x4b0 [ext4]
  generic_perform_write+0xc1/0x1e0
  ext4_buffered_write_iter+0x68/0xe0 [ext4]
  ext4_file_write_iter+0x70/0x740 [ext4]
  vfs_write+0x33d/0x420
  ksys_write+0xa5/0xe0
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 4609
...
...

 # echo 5000 > /sys/kernel/debug/page_owner_stacks/set_threshold 
 # cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks_5000.txt
 # head -40 page_owner_full_stacks_5000.txt 
  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  __filemap_get_folio+0x14a/0x490
  ext4_write_begin+0xbd/0x4b0 [ext4]
  generic_perform_write+0xc1/0x1e0
  ext4_buffered_write_iter+0x68/0xe0 [ext4]
  ext4_file_write_iter+0x70/0x740 [ext4]
  vfs_write+0x33d/0x420
  ksys_pwrite64+0x75/0x90
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 6781



  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  pcpu_populate_chunk+0xec/0x350
  pcpu_balance_workfn+0x2d1/0x4a0
  process_scheduled_works+0x84/0x380
  worker_thread+0x12a/0x2a0
  kthread+0xe3/0x110
  ret_from_fork+0x30/0x50
  ret_from_fork_asm+0x1b/0x30
 stack_count: 8641

Oscar Salvador (7):
  lib/stackdepot: Fix first entry having a 0-handle
  lib/stackdepot: Move stack_record struct definition into the header
  mm,page_owner: Maintain own list of stack_records structs
  mm,page_owner: Implement the tracking of the stacks count
  mm,page_owner: Display all stacks and their count
  mm,page_owner: Filter out stacks by a threshold
  mm,page_owner: Update Documentation regarding page_owner_stacks

 Documentation/mm/page_owner.rst |  45 +++++++
 include/linux/stackdepot.h      |  58 +++++++++
 lib/stackdepot.c                |  65 +++--------
 mm/page_owner.c                 | 200 +++++++++++++++++++++++++++++++-
 4 files changed, 318 insertions(+), 50 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v9 1/7] lib/stackdepot: Fix first entry having a 0-handle
  2024-02-14 17:01 [PATCH v9 0/7] page_owner: print stacks and their outstanding allocations Oscar Salvador
@ 2024-02-14 17:01 ` Oscar Salvador
  2024-02-15 10:46   ` Vlastimil Babka
  2024-02-14 17:01 ` [PATCH v9 2/7] lib/stackdepot: Move stack_record struct definition into the header Oscar Salvador
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Oscar Salvador @ 2024-02-14 17:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Marco Elver, Andrey Konovalov, Alexander Potapenko,
	Oscar Salvador

The very first entry of stack_record gets a handle of 0, but this is wrong
because stackdepot treats a 0-handle as a non-valid one.
E.g: See the check in stack_depot_fetch()

Fix this by adding and offset of 1.

This bug has been lurking since the very beginning of stackdepot,
but no one really cared as it seems.
Because of that I am not adding a Fixes tag.

Co-developed-by: Marco Elver <elver@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 lib/stackdepot.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 4a7055a63d9f..c043a4186bc5 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -45,15 +45,16 @@
 #define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
 			       STACK_DEPOT_EXTRA_BITS)
 #define DEPOT_POOLS_CAP 8192
+/* The pool_index is offset by 1 so the first record does not have a 0 handle. */
 #define DEPOT_MAX_POOLS \
-	(((1LL << (DEPOT_POOL_INDEX_BITS)) < DEPOT_POOLS_CAP) ? \
-	 (1LL << (DEPOT_POOL_INDEX_BITS)) : DEPOT_POOLS_CAP)
+	(((1LL << (DEPOT_POOL_INDEX_BITS)) - 1 < DEPOT_POOLS_CAP) ? \
+	 (1LL << (DEPOT_POOL_INDEX_BITS)) - 1 : DEPOT_POOLS_CAP)
 
 /* Compact structure that stores a reference to a stack. */
 union handle_parts {
 	depot_stack_handle_t handle;
 	struct {
-		u32 pool_index	: DEPOT_POOL_INDEX_BITS;
+		u32 pool_index	: DEPOT_POOL_INDEX_BITS; /* pool_index is offset by 1 */
 		u32 offset	: DEPOT_OFFSET_BITS;
 		u32 extra	: STACK_DEPOT_EXTRA_BITS;
 	};
@@ -372,7 +373,7 @@ static struct stack_record *depot_pop_free_pool(void **prealloc, size_t size)
 	stack = current_pool + pool_offset;
 
 	/* Pre-initialize handle once. */
-	stack->handle.pool_index = pool_index;
+	stack->handle.pool_index = pool_index + 1;
 	stack->handle.offset = pool_offset >> DEPOT_STACK_ALIGN;
 	stack->handle.extra = 0;
 	INIT_LIST_HEAD(&stack->hash_list);
@@ -483,18 +484,19 @@ static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
 	const int pools_num_cached = READ_ONCE(pools_num);
 	union handle_parts parts = { .handle = handle };
 	void *pool;
+	u32 pool_index = parts.pool_index - 1;
 	size_t offset = parts.offset << DEPOT_STACK_ALIGN;
 	struct stack_record *stack;
 
 	lockdep_assert_not_held(&pool_lock);
 
-	if (parts.pool_index > pools_num_cached) {
+	if (pool_index > pools_num_cached) {
 		WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
-		     parts.pool_index, pools_num_cached, handle);
+		     pool_index, pools_num_cached, handle);
 		return NULL;
 	}
 
-	pool = stack_pools[parts.pool_index];
+	pool = stack_pools[pool_index];
 	if (WARN_ON(!pool))
 		return NULL;
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v9 2/7] lib/stackdepot: Move stack_record struct definition into the header
  2024-02-14 17:01 [PATCH v9 0/7] page_owner: print stacks and their outstanding allocations Oscar Salvador
  2024-02-14 17:01 ` [PATCH v9 1/7] lib/stackdepot: Fix first entry having a 0-handle Oscar Salvador
@ 2024-02-14 17:01 ` Oscar Salvador
  2024-02-15  8:16   ` Marco Elver
  2024-02-14 17:01 ` [PATCH v9 3/7] mm,page_owner: Maintain own list of stack_records structs Oscar Salvador
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Oscar Salvador @ 2024-02-14 17:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Marco Elver, Andrey Konovalov, Alexander Potapenko,
	Oscar Salvador

In order to move the heavy lifting into page_owner code, this one
needs to have access to the stack_record structure, which right now
sits in lib/stackdepot.c.
Move it to the stackdepot.h header so page_owner can access
stack_record's struct fields.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/stackdepot.h | 47 ++++++++++++++++++++++++++++++++++++++
 lib/stackdepot.c           | 45 +-----------------------------------
 2 files changed, 48 insertions(+), 44 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index adcbb8f23600..c4b5ad57c066 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -30,6 +30,53 @@ typedef u32 depot_stack_handle_t;
  */
 #define STACK_DEPOT_EXTRA_BITS 5
 
+#define DEPOT_HANDLE_BITS (sizeof(depot_stack_handle_t) * 8)
+
+#define DEPOT_POOL_ORDER 2 /* Pool size order, 4 pages */
+#define DEPOT_POOL_SIZE (1LL << (PAGE_SHIFT + DEPOT_POOL_ORDER))
+#define DEPOT_STACK_ALIGN 4
+#define DEPOT_OFFSET_BITS (DEPOT_POOL_ORDER + PAGE_SHIFT - DEPOT_STACK_ALIGN)
+#define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
+			       STACK_DEPOT_EXTRA_BITS)
+
+#ifdef CONFIG_STACKDEPOT
+/* Compact structure that stores a reference to a stack. */
+union handle_parts {
+	depot_stack_handle_t handle;
+	struct {
+		/* pool_index is offset by 1 */
+		u32 pool_index	: DEPOT_POOL_INDEX_BITS;
+		u32 offset	: DEPOT_OFFSET_BITS;
+		u32 extra	: STACK_DEPOT_EXTRA_BITS;
+	};
+};
+
+struct stack_record {
+	struct list_head hash_list;	/* Links in the hash table */
+	u32 hash;			/* Hash in hash table */
+	u32 size;			/* Number of stored frames */
+	union handle_parts handle;	/* Constant after initialization */
+	refcount_t count;
+	union {
+		unsigned long entries[CONFIG_STACKDEPOT_MAX_FRAMES];	/* Frames */
+		struct {
+			/*
+			 * An important invariant of the implementation is to
+			 * only place a stack record onto the freelist iff its
+			 * refcount is zero. Because stack records with a zero
+			 * refcount are never considered as valid, it is safe to
+			 * union @entries and freelist management state below.
+			 * Conversely, as soon as an entry is off the freelist
+			 * and its refcount becomes non-zero, the below must not
+			 * be accessed until being placed back on the freelist.
+			 */
+			struct list_head free_list;	/* Links in the freelist */
+			unsigned long rcu_state;	/* RCU cookie */
+		};
+	};
+};
+#endif
+
 typedef u32 depot_flags_t;
 
 /*
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index c043a4186bc5..4a661a6777da 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -36,55 +36,12 @@
 #include <linux/memblock.h>
 #include <linux/kasan-enabled.h>
 
-#define DEPOT_HANDLE_BITS (sizeof(depot_stack_handle_t) * 8)
-
-#define DEPOT_POOL_ORDER 2 /* Pool size order, 4 pages */
-#define DEPOT_POOL_SIZE (1LL << (PAGE_SHIFT + DEPOT_POOL_ORDER))
-#define DEPOT_STACK_ALIGN 4
-#define DEPOT_OFFSET_BITS (DEPOT_POOL_ORDER + PAGE_SHIFT - DEPOT_STACK_ALIGN)
-#define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
-			       STACK_DEPOT_EXTRA_BITS)
 #define DEPOT_POOLS_CAP 8192
-/* The pool_index is offset by 1 so the first record does not have a 0 handle. */
+/* The pool_index is offset by 1 so the first record does not have a 0 handle */
 #define DEPOT_MAX_POOLS \
 	(((1LL << (DEPOT_POOL_INDEX_BITS)) - 1 < DEPOT_POOLS_CAP) ? \
 	 (1LL << (DEPOT_POOL_INDEX_BITS)) - 1 : DEPOT_POOLS_CAP)
 
-/* Compact structure that stores a reference to a stack. */
-union handle_parts {
-	depot_stack_handle_t handle;
-	struct {
-		u32 pool_index	: DEPOT_POOL_INDEX_BITS; /* pool_index is offset by 1 */
-		u32 offset	: DEPOT_OFFSET_BITS;
-		u32 extra	: STACK_DEPOT_EXTRA_BITS;
-	};
-};
-
-struct stack_record {
-	struct list_head hash_list;	/* Links in the hash table */
-	u32 hash;			/* Hash in hash table */
-	u32 size;			/* Number of stored frames */
-	union handle_parts handle;	/* Constant after initialization */
-	refcount_t count;
-	union {
-		unsigned long entries[CONFIG_STACKDEPOT_MAX_FRAMES];	/* Frames */
-		struct {
-			/*
-			 * An important invariant of the implementation is to
-			 * only place a stack record onto the freelist iff its
-			 * refcount is zero. Because stack records with a zero
-			 * refcount are never considered as valid, it is safe to
-			 * union @entries and freelist management state below.
-			 * Conversely, as soon as an entry is off the freelist
-			 * and its refcount becomes non-zero, the below must not
-			 * be accessed until being placed back on the freelist.
-			 */
-			struct list_head free_list;	/* Links in the freelist */
-			unsigned long rcu_state;	/* RCU cookie */
-		};
-	};
-};
-
 static bool stack_depot_disabled;
 static bool __stack_depot_early_init_requested __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
 static bool __stack_depot_early_init_passed __initdata;
-- 
2.43.0



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v9 3/7] mm,page_owner: Maintain own list of stack_records structs
  2024-02-14 17:01 [PATCH v9 0/7] page_owner: print stacks and their outstanding allocations Oscar Salvador
  2024-02-14 17:01 ` [PATCH v9 1/7] lib/stackdepot: Fix first entry having a 0-handle Oscar Salvador
  2024-02-14 17:01 ` [PATCH v9 2/7] lib/stackdepot: Move stack_record struct definition into the header Oscar Salvador
@ 2024-02-14 17:01 ` Oscar Salvador
  2024-02-15 10:55   ` Vlastimil Babka
  2024-02-14 17:01 ` [PATCH v9 4/7] mm,page_owner: Implement the tracking of the stacks count Oscar Salvador
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Oscar Salvador @ 2024-02-14 17:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Marco Elver, Andrey Konovalov, Alexander Potapenko,
	Oscar Salvador

page_owner needs to increment a stack_record refcount when a new allocation
occurs, and decrement it on a free operation.
In order to do that, we need to have a way to get a stack_record from a
handle.
Implement __stack_depot_get_stack_record() which just does that, and make
it public so page_owner can use it.

Also, traversing all stackdepot buckets comes with its own complexity,
plus we would have to implement a way to mark only those stack_records
that were originated from page_owner, as those are the ones we are
interested in.
For that reason, page_owner maintains its own list of stack_records,
because traversing that list is faster than traversing all buckets
while keeping at the same time a low complexity.

For now, add to stack_list only the stack_records of dummy_handle and
failure_handle, and set their refcount of 1.

Further patches will add code to increment or decrement stack_records
count on allocation and free operation.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/stackdepot.h | 11 +++++++++++
 lib/stackdepot.c           |  8 ++++++++
 mm/page_owner.c            | 15 +++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index c4b5ad57c066..3c6caa5abc7c 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -178,6 +178,17 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 depot_stack_handle_t stack_depot_save(unsigned long *entries,
 				      unsigned int nr_entries, gfp_t gfp_flags);
 
+/**
+ * __stack_depot_get_stack_record - Get a pointer to a stack_record struct
+ *
+ * @handle: Stack depot handle
+ *
+ * This function is only for internal purposes.
+ *
+ * Return: Returns a pointer to a stack_record struct
+ */
+struct stack_record *__stack_depot_get_stack_record(depot_stack_handle_t handle);
+
 /**
  * stack_depot_fetch - Fetch a stack trace from stack depot
  *
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 4a661a6777da..3da6d7cfcdfb 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -687,6 +687,14 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
 }
 EXPORT_SYMBOL_GPL(stack_depot_save);
 
+struct stack_record *__stack_depot_get_stack_record(depot_stack_handle_t handle)
+{
+	if (!handle)
+		return NULL;
+
+	return depot_fetch_stack(handle);
+}
+
 unsigned int stack_depot_fetch(depot_stack_handle_t handle,
 			       unsigned long **entries)
 {
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 5634e5d890f8..33e342b15d9b 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -36,6 +36,14 @@ struct page_owner {
 	pid_t free_tgid;
 };
 
+struct stack {
+	struct stack_record *stack_record;
+	struct stack *next;
+};
+static struct stack dummy_stack;
+static struct stack failure_stack;
+static struct stack *stack_list;
+
 static bool page_owner_enabled __initdata;
 DEFINE_STATIC_KEY_FALSE(page_owner_inited);
 
@@ -95,6 +103,13 @@ static __init void init_page_owner(void)
 	register_early_stack();
 	static_branch_enable(&page_owner_inited);
 	init_early_allocated_pages();
+	/* Initialize dummy and failure stacks and link them to stack_list */
+	dummy_stack.stack_record = __stack_depot_get_stack_record(dummy_handle);
+	failure_stack.stack_record = __stack_depot_get_stack_record(failure_handle);
+	refcount_set(&dummy_stack.stack_record->count, 1);
+	refcount_set(&failure_stack.stack_record->count, 1);
+	dummy_stack.next = &failure_stack;
+	stack_list = &dummy_stack;
 }
 
 struct page_ext_operations page_owner_ops = {
-- 
2.43.0



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v9 4/7] mm,page_owner: Implement the tracking of the stacks count
  2024-02-14 17:01 [PATCH v9 0/7] page_owner: print stacks and their outstanding allocations Oscar Salvador
                   ` (2 preceding siblings ...)
  2024-02-14 17:01 ` [PATCH v9 3/7] mm,page_owner: Maintain own list of stack_records structs Oscar Salvador
@ 2024-02-14 17:01 ` Oscar Salvador
  2024-02-15 11:08   ` Vlastimil Babka
  2024-02-14 17:01 ` [PATCH v9 5/7] mm,page_owner: Display all stacks and their count Oscar Salvador
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Oscar Salvador @ 2024-02-14 17:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Marco Elver, Andrey Konovalov, Alexander Potapenko,
	Oscar Salvador

Implement {inc,dec}_stack_record_count() which increments or
decrements on respective allocation and free operations, via
__reset_page_owner() (free operation) and __set_page_owner() (alloc
operation).
Newly allocated stack_record structs will be added to the list stack_list
via add_stack_record_to_list().
Modifications on the list are protected via a spinlock with irqs
disabled, since this code can also be reached from IRQ context.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Marco Elver <elver@google.com>
---
 mm/page_owner.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 33e342b15d9b..df6a923af5de 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -43,6 +43,7 @@ struct stack {
 static struct stack dummy_stack;
 static struct stack failure_stack;
 static struct stack *stack_list;
+static DEFINE_SPINLOCK(stack_list_lock);
 
 static bool page_owner_enabled __initdata;
 DEFINE_STATIC_KEY_FALSE(page_owner_inited);
@@ -150,11 +151,68 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
 	return handle;
 }
 
+static void add_stack_record_to_list(struct stack_record *stack_record,
+				     gfp_t gfp_mask)
+{
+	unsigned long flags;
+	struct stack *stack;
+
+	/* Filter gfp_mask the same way stackdepot does, for consistency */
+	gfp_mask &= ~GFP_ZONEMASK;
+	gfp_mask &= (GFP_ATOMIC | GFP_KERNEL);
+	gfp_mask |= __GFP_NOWARN;
+
+	stack = kmalloc(sizeof(*stack), gfp_mask);
+	if (!stack)
+		return;
+
+	stack->stack_record = stack_record;
+	stack->next = NULL;
+
+	spin_lock_irqsave(&stack_list_lock, flags);
+	stack->next = stack_list;
+	stack_list = stack;
+	spin_unlock_irqrestore(&stack_list_lock, flags);
+}
+
+static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask)
+{
+	struct stack_record *stack_record = __stack_depot_get_stack_record(handle);
+
+	if (!stack_record)
+		return;
+
+	/*
+	 * New stack_record's that do not use STACK_DEPOT_FLAG_GET start
+	 * with REFCOUNT_SATURATED to catch spurious increments of their
+	 * refcount.
+	 * Since we do not use STACK_DEPOT_FLAG_GET API, let us
+	 * set a refcount of 1 ourselves.
+	 */
+	if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) {
+		int old = REFCOUNT_SATURATED;
+
+		if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1))
+			/* Add the new stack_record to our list */
+			add_stack_record_to_list(stack_record, gfp_mask);
+	}
+	refcount_inc(&stack_record->count);
+}
+
+static void dec_stack_record_count(depot_stack_handle_t handle)
+{
+	struct stack_record *stack_record = __stack_depot_get_stack_record(handle);
+
+	if (stack_record)
+		refcount_dec(&stack_record->count);
+}
+
 void __reset_page_owner(struct page *page, unsigned short order)
 {
 	int i;
 	struct page_ext *page_ext;
 	depot_stack_handle_t handle;
+	depot_stack_handle_t alloc_handle;
 	struct page_owner *page_owner;
 	u64 free_ts_nsec = local_clock();
 
@@ -162,17 +220,29 @@ void __reset_page_owner(struct page *page, unsigned short order)
 	if (unlikely(!page_ext))
 		return;
 
+	page_owner = get_page_owner(page_ext);
+	alloc_handle = page_owner->handle;
+
 	handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
 	for (i = 0; i < (1 << order); i++) {
 		__clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
-		page_owner = get_page_owner(page_ext);
 		page_owner->free_handle = handle;
 		page_owner->free_ts_nsec = free_ts_nsec;
 		page_owner->free_pid = current->pid;
 		page_owner->free_tgid = current->tgid;
 		page_ext = page_ext_next(page_ext);
+		page_owner = get_page_owner(page_ext);
 	}
 	page_ext_put(page_ext);
+	if (alloc_handle != early_handle)
+		/*
+		 * early_handle is being set as a handle for all those
+		 * early allocated pages. See init_pages_in_zone().
+		 * Since their refcount is not being incremented because
+		 * the machinery is not ready yet, we cannot decrement
+		 * their refcount either.
+		 */
+		dec_stack_record_count(alloc_handle);
 }
 
 static inline void __set_page_owner_handle(struct page_ext *page_ext,
@@ -214,6 +284,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
 		return;
 	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
 	page_ext_put(page_ext);
+	inc_stack_record_count(handle, gfp_mask);
 }
 
 void __set_page_owner_migrate_reason(struct page *page, int reason)
-- 
2.43.0



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v9 5/7] mm,page_owner: Display all stacks and their count
  2024-02-14 17:01 [PATCH v9 0/7] page_owner: print stacks and their outstanding allocations Oscar Salvador
                   ` (3 preceding siblings ...)
  2024-02-14 17:01 ` [PATCH v9 4/7] mm,page_owner: Implement the tracking of the stacks count Oscar Salvador
@ 2024-02-14 17:01 ` Oscar Salvador
  2024-02-15 11:10   ` Vlastimil Babka
  2024-02-14 17:01 ` [PATCH v9 6/7] mm,page_owner: Filter out stacks by a threshold Oscar Salvador
  2024-02-14 17:01 ` [PATCH v9 7/7] mm,page_owner: Update Documentation regarding page_owner_stacks Oscar Salvador
  6 siblings, 1 reply; 24+ messages in thread
From: Oscar Salvador @ 2024-02-14 17:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Marco Elver, Andrey Konovalov, Alexander Potapenko,
	Oscar Salvador

This patch adds a new directory called 'page_owner_stacks' under
/sys/kernel/debug/, with a file called 'show_stacks' in it.
Reading from that file will show all stacks that were added by page_owner
followed by their counting, giving us a clear overview of stack <-> count
relationship.

E.g:

  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  __filemap_get_folio+0x14a/0x490
  ext4_write_begin+0xbd/0x4b0 [ext4]
  generic_perform_write+0xc1/0x1e0
  ext4_buffered_write_iter+0x68/0xe0 [ext4]
  ext4_file_write_iter+0x70/0x740 [ext4]
  vfs_write+0x33d/0x420
  ksys_write+0xa5/0xe0
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 4578

The seq stack_{start,next} functions will iterate through the list
stack_list in order to print all stacks.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Marco Elver <elver@google.com>
---
 mm/page_owner.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 92 insertions(+), 1 deletion(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index df6a923af5de..5258a417f4d1 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -171,7 +171,13 @@ static void add_stack_record_to_list(struct stack_record *stack_record,
 
 	spin_lock_irqsave(&stack_list_lock, flags);
 	stack->next = stack_list;
-	stack_list = stack;
+	/*
+	 * This pairs with smp_load_acquire() from function
+	 * stack_start(). This guarantees that stack_start()
+	 * will see an updated stack_list before starting to
+	 * traverse the list.
+	 */
+	smp_store_release(&stack_list, stack);
 	spin_unlock_irqrestore(&stack_list_lock, flags);
 }
 
@@ -805,8 +811,90 @@ static const struct file_operations proc_page_owner_operations = {
 	.llseek		= lseek_page_owner,
 };
 
+static void *stack_start(struct seq_file *m, loff_t *ppos)
+{
+	struct stack *stack;
+
+	if (*ppos == -1UL)
+		return NULL;
+
+	if (!*ppos) {
+		/*
+		 * This pairs with smp_store_release() from function
+		 * add_stack_record_to_list(), so we get a consistent
+		 * value of stack_list.
+		 */
+		stack = smp_load_acquire(&stack_list);
+	} else {
+		stack = m->private;
+		stack = stack->next;
+	}
+
+	m->private = stack;
+
+	return stack;
+}
+
+static void *stack_next(struct seq_file *m, void *v, loff_t *ppos)
+{
+	struct stack *stack = v;
+
+	stack = stack->next;
+	*ppos = stack ? *ppos + 1 : -1UL;
+	m->private = stack;
+
+	return stack;
+}
+
+static int stack_print(struct seq_file *m, void *v)
+{
+	int i;
+	struct stack *stack = v;
+	unsigned long *entries;
+	unsigned long nr_entries;
+	struct stack_record *stack_record = stack->stack_record;
+
+	nr_entries = stack_record->size;
+	entries = stack_record->entries;
+
+	if (!nr_entries || nr_entries < 0 ||
+	    refcount_read(&stack_record->count) < 2)
+		return 0;
+
+	for (i = 0; i < nr_entries; i++)
+		seq_printf(m, " %pS\n", (void *)entries[i]);
+	seq_printf(m, "stack_count: %d\n\n", refcount_read(&stack_record->count));
+
+	return 0;
+}
+
+static void stack_stop(struct seq_file *m, void *v)
+{
+}
+
+static const struct seq_operations page_owner_stack_op = {
+	.start	= stack_start,
+	.next	= stack_next,
+	.stop	= stack_stop,
+	.show	= stack_print
+};
+
+static int page_owner_stack_open(struct inode *inode, struct file *file)
+{
+	return seq_open_private(file, &page_owner_stack_op, 0);
+}
+
+static const struct file_operations page_owner_stack_operations = {
+	.open		= page_owner_stack_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
+
 static int __init pageowner_init(void)
 {
+	struct dentry *dir;
+
 	if (!static_branch_unlikely(&page_owner_inited)) {
 		pr_info("page_owner is disabled\n");
 		return 0;
@@ -814,6 +902,9 @@ static int __init pageowner_init(void)
 
 	debugfs_create_file("page_owner", 0400, NULL, NULL,
 			    &proc_page_owner_operations);
+	dir = debugfs_create_dir("page_owner_stacks", NULL);
+	debugfs_create_file("show_stacks", 0400, dir, NULL,
+			    &page_owner_stack_operations);
 
 	return 0;
 }
-- 
2.43.0



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v9 6/7] mm,page_owner: Filter out stacks by a threshold
  2024-02-14 17:01 [PATCH v9 0/7] page_owner: print stacks and their outstanding allocations Oscar Salvador
                   ` (4 preceding siblings ...)
  2024-02-14 17:01 ` [PATCH v9 5/7] mm,page_owner: Display all stacks and their count Oscar Salvador
@ 2024-02-14 17:01 ` Oscar Salvador
  2024-02-15 11:12   ` Vlastimil Babka
  2024-02-14 17:01 ` [PATCH v9 7/7] mm,page_owner: Update Documentation regarding page_owner_stacks Oscar Salvador
  6 siblings, 1 reply; 24+ messages in thread
From: Oscar Salvador @ 2024-02-14 17:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Marco Elver, Andrey Konovalov, Alexander Potapenko,
	Oscar Salvador

We want to be able to filter out the stacks based on a threshold we can
can tune.
By writing to 'count_threshold' file, we can adjust the threshold value.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/page_owner.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index 5258a417f4d1..9b975f59b773 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -846,9 +846,11 @@ static void *stack_next(struct seq_file *m, void *v, loff_t *ppos)
 	return stack;
 }
 
+static unsigned long page_owner_stack_threshold;
+
 static int stack_print(struct seq_file *m, void *v)
 {
-	int i;
+	int i, stack_count;
 	struct stack *stack = v;
 	unsigned long *entries;
 	unsigned long nr_entries;
@@ -856,14 +858,15 @@ static int stack_print(struct seq_file *m, void *v)
 
 	nr_entries = stack_record->size;
 	entries = stack_record->entries;
+	stack_count = refcount_read(&stack_record->count);
 
-	if (!nr_entries || nr_entries < 0 ||
-	    refcount_read(&stack_record->count) < 2)
+	if (!nr_entries || nr_entries < 0 || stack_count < 2 ||
+	    stack_count < page_owner_stack_threshold)
 		return 0;
 
 	for (i = 0; i < nr_entries; i++)
 		seq_printf(m, " %pS\n", (void *)entries[i]);
-	seq_printf(m, "stack_count: %d\n\n", refcount_read(&stack_record->count));
+	seq_printf(m, "stack_count: %d\n\n", stack_count);
 
 	return 0;
 }
@@ -891,6 +894,22 @@ static const struct file_operations page_owner_stack_operations = {
 	.release	= seq_release,
 };
 
+static int page_owner_threshold_get(void *data, u64 *val)
+{
+	*val = READ_ONCE(page_owner_stack_threshold);
+	return 0;
+}
+
+static int page_owner_threshold_set(void *data, u64 val)
+{
+	WRITE_ONCE(page_owner_stack_threshold, val);
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(proc_page_owner_threshold, &page_owner_threshold_get,
+			&page_owner_threshold_set, "%llu");
+
+
 static int __init pageowner_init(void)
 {
 	struct dentry *dir;
@@ -905,6 +924,8 @@ static int __init pageowner_init(void)
 	dir = debugfs_create_dir("page_owner_stacks", NULL);
 	debugfs_create_file("show_stacks", 0400, dir, NULL,
 			    &page_owner_stack_operations);
+	debugfs_create_file("count_threshold", 0600, dir, NULL,
+			    &proc_page_owner_threshold);
 
 	return 0;
 }
-- 
2.43.0



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v9 7/7] mm,page_owner: Update Documentation regarding page_owner_stacks
  2024-02-14 17:01 [PATCH v9 0/7] page_owner: print stacks and their outstanding allocations Oscar Salvador
                   ` (5 preceding siblings ...)
  2024-02-14 17:01 ` [PATCH v9 6/7] mm,page_owner: Filter out stacks by a threshold Oscar Salvador
@ 2024-02-14 17:01 ` Oscar Salvador
  2024-02-15 11:13   ` Vlastimil Babka
  6 siblings, 1 reply; 24+ messages in thread
From: Oscar Salvador @ 2024-02-14 17:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Vlastimil Babka,
	Marco Elver, Andrey Konovalov, Alexander Potapenko,
	Oscar Salvador

Update page_owner documentation including the new page_owner_stacks
feature to show how it can be used.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 Documentation/mm/page_owner.rst | 45 +++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/Documentation/mm/page_owner.rst b/Documentation/mm/page_owner.rst
index 62e3f7ab23cc..0d0334cd5179 100644
--- a/Documentation/mm/page_owner.rst
+++ b/Documentation/mm/page_owner.rst
@@ -24,6 +24,11 @@ fragmentation statistics can be obtained through gfp flag information of
 each page. It is already implemented and activated if page owner is
 enabled. Other usages are more than welcome.
 
+It can also be used to show all the stacks and their outstanding
+allocations, which gives us a quick overview of where the memory is going
+without the need to screen through all the pages and match the allocation
+and free operation.
+
 page owner is disabled by default. So, if you'd like to use it, you need
 to add "page_owner=on" to your boot cmdline. If the kernel is built
 with page owner and page owner is disabled in runtime due to not enabling
@@ -68,6 +73,46 @@ Usage
 
 4) Analyze information from page owner::
 
+	cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
+	cat stacks.txt
+	 prep_new_page+0xa9/0x120
+	 get_page_from_freelist+0x7e6/0x2140
+	 __alloc_pages+0x18a/0x370
+	 new_slab+0xc8/0x580
+	 ___slab_alloc+0x1f2/0xaf0
+	 __slab_alloc.isra.86+0x22/0x40
+	 kmem_cache_alloc+0x31b/0x350
+	 __khugepaged_enter+0x39/0x100
+	 dup_mmap+0x1c7/0x5ce
+	 copy_process+0x1afe/0x1c90
+	 kernel_clone+0x9a/0x3c0
+	 __do_sys_clone+0x66/0x90
+	 do_syscall_64+0x7f/0x160
+	 entry_SYSCALL_64_after_hwframe+0x6c/0x74
+	stack_count: 234
+	...
+	...
+	echo 7000 > /sys/kernel/debug/page_owner_stacks/count_threshold
+	cat /sys/kernel/debug/page_owner_stacks/show_stacks> stacks_7000.txt
+	cat stacks_7000.txt
+	 prep_new_page+0xa9/0x120
+	 get_page_from_freelist+0x7e6/0x2140
+	 __alloc_pages+0x18a/0x370
+	 alloc_pages_mpol+0xdf/0x1e0
+	 folio_alloc+0x14/0x50
+	 filemap_alloc_folio+0xb0/0x100
+	 page_cache_ra_unbounded+0x97/0x180
+	 filemap_fault+0x4b4/0x1200
+	 __do_fault+0x2d/0x110
+	 do_pte_missing+0x4b0/0xa30
+	 __handle_mm_fault+0x7fa/0xb70
+	 handle_mm_fault+0x125/0x300
+	 do_user_addr_fault+0x3c9/0x840
+	 exc_page_fault+0x68/0x150
+	 asm_exc_page_fault+0x22/0x30
+	stack_count: 8248
+	...
+
 	cat /sys/kernel/debug/page_owner > page_owner_full.txt
 	./page_owner_sort page_owner_full.txt sorted_page_owner.txt
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 2/7] lib/stackdepot: Move stack_record struct definition into the header
  2024-02-14 17:01 ` [PATCH v9 2/7] lib/stackdepot: Move stack_record struct definition into the header Oscar Salvador
@ 2024-02-15  8:16   ` Marco Elver
  2024-02-15  8:22     ` Oscar Salvador
  2024-02-15  9:30     ` Vlastimil Babka
  0 siblings, 2 replies; 24+ messages in thread
From: Marco Elver @ 2024-02-15  8:16 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Vlastimil Babka, Andrey Konovalov, Alexander Potapenko

On Wed, 14 Feb 2024 at 18:00, Oscar Salvador <osalvador@suse.de> wrote:
>
> In order to move the heavy lifting into page_owner code, this one
> needs to have access to the stack_record structure, which right now
> sits in lib/stackdepot.c.
> Move it to the stackdepot.h header so page_owner can access
> stack_record's struct fields.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Marco Elver <elver@google.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/stackdepot.h | 47 ++++++++++++++++++++++++++++++++++++++
>  lib/stackdepot.c           | 45 +-----------------------------------
>  2 files changed, 48 insertions(+), 44 deletions(-)
>
> diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
> index adcbb8f23600..c4b5ad57c066 100644
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -30,6 +30,53 @@ typedef u32 depot_stack_handle_t;
>   */
>  #define STACK_DEPOT_EXTRA_BITS 5
>
> +#define DEPOT_HANDLE_BITS (sizeof(depot_stack_handle_t) * 8)
> +
> +#define DEPOT_POOL_ORDER 2 /* Pool size order, 4 pages */
> +#define DEPOT_POOL_SIZE (1LL << (PAGE_SHIFT + DEPOT_POOL_ORDER))
> +#define DEPOT_STACK_ALIGN 4
> +#define DEPOT_OFFSET_BITS (DEPOT_POOL_ORDER + PAGE_SHIFT - DEPOT_STACK_ALIGN)
> +#define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
> +                              STACK_DEPOT_EXTRA_BITS)
> +
> +#ifdef CONFIG_STACKDEPOT
> +/* Compact structure that stores a reference to a stack. */
> +union handle_parts {
> +       depot_stack_handle_t handle;
> +       struct {
> +               /* pool_index is offset by 1 */
> +               u32 pool_index  : DEPOT_POOL_INDEX_BITS;
> +               u32 offset      : DEPOT_OFFSET_BITS;
> +               u32 extra       : STACK_DEPOT_EXTRA_BITS;
> +       };
> +};
> +
> +struct stack_record {
> +       struct list_head hash_list;     /* Links in the hash table */
> +       u32 hash;                       /* Hash in hash table */
> +       u32 size;                       /* Number of stored frames */
> +       union handle_parts handle;      /* Constant after initialization */
> +       refcount_t count;
> +       union {
> +               unsigned long entries[CONFIG_STACKDEPOT_MAX_FRAMES];    /* Frames */
> +               struct {
> +                       /*
> +                        * An important invariant of the implementation is to
> +                        * only place a stack record onto the freelist iff its
> +                        * refcount is zero. Because stack records with a zero
> +                        * refcount are never considered as valid, it is safe to
> +                        * union @entries and freelist management state below.
> +                        * Conversely, as soon as an entry is off the freelist
> +                        * and its refcount becomes non-zero, the below must not
> +                        * be accessed until being placed back on the freelist.
> +                        */
> +                       struct list_head free_list;     /* Links in the freelist */
> +                       unsigned long rcu_state;        /* RCU cookie */
> +               };
> +       };
> +};
> +#endif
> +
>  typedef u32 depot_flags_t;
>
>  /*
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index c043a4186bc5..4a661a6777da 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -36,55 +36,12 @@
>  #include <linux/memblock.h>
>  #include <linux/kasan-enabled.h>
>
> -#define DEPOT_HANDLE_BITS (sizeof(depot_stack_handle_t) * 8)
> -
> -#define DEPOT_POOL_ORDER 2 /* Pool size order, 4 pages */
> -#define DEPOT_POOL_SIZE (1LL << (PAGE_SHIFT + DEPOT_POOL_ORDER))
> -#define DEPOT_STACK_ALIGN 4
> -#define DEPOT_OFFSET_BITS (DEPOT_POOL_ORDER + PAGE_SHIFT - DEPOT_STACK_ALIGN)
> -#define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
> -                              STACK_DEPOT_EXTRA_BITS)
>  #define DEPOT_POOLS_CAP 8192
> -/* The pool_index is offset by 1 so the first record does not have a 0 handle. */
> +/* The pool_index is offset by 1 so the first record does not have a 0 handle */

Why this comment change? We lost the '.' -- for future reference, it'd
be good to ensure unnecessary changes don't creep into the diff. This
is just nitpicking, and I've already reviewed this change, so no need
to send a v+1.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 2/7] lib/stackdepot: Move stack_record struct definition into the header
  2024-02-15  8:16   ` Marco Elver
@ 2024-02-15  8:22     ` Oscar Salvador
  2024-02-15  9:30     ` Vlastimil Babka
  1 sibling, 0 replies; 24+ messages in thread
From: Oscar Salvador @ 2024-02-15  8:22 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Vlastimil Babka, Andrey Konovalov, Alexander Potapenko

On Thu, Feb 15, 2024 at 09:16:58AM +0100, Marco Elver wrote:
> On Wed, 14 Feb 2024 at 18:00, Oscar Salvador <osalvador@suse.de> wrote:
> > -/* The pool_index is offset by 1 so the first record does not have a 0 handle. */
> > +/* The pool_index is offset by 1 so the first record does not have a 0 handle */
> 
> Why this comment change? We lost the '.' -- for future reference, it'd
> be good to ensure unnecessary changes don't creep into the diff. This
> is just nitpicking, and I've already reviewed this change, so no need
> to send a v+1.

Right, this was an oversight.

Andrew, please fold the following into the patch, thanks:

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 4a661a6777da..514b8d40ff57 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -37,7 +37,7 @@
 #include <linux/kasan-enabled.h>

 #define DEPOT_POOLS_CAP 8192
-/* The pool_index is offset by 1 so the first record does not have a 0 handle */
+/* The pool_index is offset by 1 so the first record does not have a 0 handle. */
 #define DEPOT_MAX_POOLS \
 	(((1LL << (DEPOT_POOL_INDEX_BITS)) - 1 < DEPOT_POOLS_CAP) ? \
 	 (1LL << (DEPOT_POOL_INDEX_BITS)) - 1 : DEPOT_POOLS_CAP)

-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 2/7] lib/stackdepot: Move stack_record struct definition into the header
  2024-02-15  8:16   ` Marco Elver
  2024-02-15  8:22     ` Oscar Salvador
@ 2024-02-15  9:30     ` Vlastimil Babka
  2024-02-15  9:33       ` Marco Elver
  1 sibling, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2024-02-15  9:30 UTC (permalink / raw)
  To: Marco Elver, Oscar Salvador
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko,
	Andrey Konovalov, Alexander Potapenko

On 2/15/24 09:16, Marco Elver wrote:
> On Wed, 14 Feb 2024 at 18:00, Oscar Salvador <osalvador@suse.de> wrote:
>>
>> In order to move the heavy lifting into page_owner code, this one
>> needs to have access to the stack_record structure, which right now
>> sits in lib/stackdepot.c.
>> Move it to the stackdepot.h header so page_owner can access
>> stack_record's struct fields.
>>
>> Signed-off-by: Oscar Salvador <osalvador@suse.de>
>> Reviewed-by: Marco Elver <elver@google.com>
>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>> ---

>>  #define DEPOT_POOLS_CAP 8192
>> -/* The pool_index is offset by 1 so the first record does not have a 0 handle. */
>> +/* The pool_index is offset by 1 so the first record does not have a 0 handle */
> 
> Why this comment change? We lost the '.' -- for future reference, it'd
> be good to ensure unnecessary changes don't creep into the diff. This
> is just nitpicking, 

Agree with this part.

> and I've already reviewed this change, so no need
> to send a v+1.

But confused by this remark. There is a number of nontrivial changes in the
series from v8, and IIRC v8 was dropped from mm/ meanwhile, so a v+1 of the
whole series is expected and not fixups. Which means including patches that
were already reviewed. That's the usual process.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 2/7] lib/stackdepot: Move stack_record struct definition into the header
  2024-02-15  9:30     ` Vlastimil Babka
@ 2024-02-15  9:33       ` Marco Elver
  2024-02-15 10:43         ` Vlastimil Babka
  0 siblings, 1 reply; 24+ messages in thread
From: Marco Elver @ 2024-02-15  9:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Oscar Salvador, Andrew Morton, linux-kernel, linux-mm,
	Michal Hocko, Andrey Konovalov, Alexander Potapenko

On Thu, 15 Feb 2024 at 10:30, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/15/24 09:16, Marco Elver wrote:
> > On Wed, 14 Feb 2024 at 18:00, Oscar Salvador <osalvador@suse.de> wrote:
> >>
> >> In order to move the heavy lifting into page_owner code, this one
> >> needs to have access to the stack_record structure, which right now
> >> sits in lib/stackdepot.c.
> >> Move it to the stackdepot.h header so page_owner can access
> >> stack_record's struct fields.
> >>
> >> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> >> Reviewed-by: Marco Elver <elver@google.com>
> >> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
>
> >>  #define DEPOT_POOLS_CAP 8192
> >> -/* The pool_index is offset by 1 so the first record does not have a 0 handle. */
> >> +/* The pool_index is offset by 1 so the first record does not have a 0 handle */
> >
> > Why this comment change? We lost the '.' -- for future reference, it'd
> > be good to ensure unnecessary changes don't creep into the diff. This
> > is just nitpicking,
>
> Agree with this part.
>
> > and I've already reviewed this change, so no need
> > to send a v+1.
>
> But confused by this remark. There is a number of nontrivial changes in the
> series from v8, and IIRC v8 was dropped from mm/ meanwhile, so a v+1 of the
> whole series is expected and not fixups. Which means including patches that
> were already reviewed. That's the usual process.

This is already v9. Of course, still need to look at rest of v9 and if
there are major changes needed then a v10 is needed.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 2/7] lib/stackdepot: Move stack_record struct definition into the header
  2024-02-15  9:33       ` Marco Elver
@ 2024-02-15 10:43         ` Vlastimil Babka
  0 siblings, 0 replies; 24+ messages in thread
From: Vlastimil Babka @ 2024-02-15 10:43 UTC (permalink / raw)
  To: Marco Elver
  Cc: Oscar Salvador, Andrew Morton, linux-kernel, linux-mm,
	Michal Hocko, Andrey Konovalov, Alexander Potapenko

On 2/15/24 10:33, Marco Elver wrote:
> On Thu, 15 Feb 2024 at 10:30, Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 2/15/24 09:16, Marco Elver wrote:
>> > On Wed, 14 Feb 2024 at 18:00, Oscar Salvador <osalvador@suse.de> wrote:
>> >>
>> >> In order to move the heavy lifting into page_owner code, this one
>> >> needs to have access to the stack_record structure, which right now
>> >> sits in lib/stackdepot.c.
>> >> Move it to the stackdepot.h header so page_owner can access
>> >> stack_record's struct fields.
>> >>
>> >> Signed-off-by: Oscar Salvador <osalvador@suse.de>
>> >> Reviewed-by: Marco Elver <elver@google.com>
>> >> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>> >> ---
>>
>> >>  #define DEPOT_POOLS_CAP 8192
>> >> -/* The pool_index is offset by 1 so the first record does not have a 0 handle. */
>> >> +/* The pool_index is offset by 1 so the first record does not have a 0 handle */
>> >
>> > Why this comment change? We lost the '.' -- for future reference, it'd
>> > be good to ensure unnecessary changes don't creep into the diff. This
>> > is just nitpicking,
>>
>> Agree with this part.
>>
>> > and I've already reviewed this change, so no need
>> > to send a v+1.
>>
>> But confused by this remark. There is a number of nontrivial changes in the
>> series from v8, and IIRC v8 was dropped from mm/ meanwhile, so a v+1 of the
>> whole series is expected and not fixups. Which means including patches that
>> were already reviewed. That's the usual process.
> 
> This is already v9. Of course, still need to look at rest of v9 and if
> there are major changes needed then a v10 is needed.

Ah sorry I misunderstood you completely. What you meant v10 isn't needed for
the missing "." and I thought you were saying v9 already wasn't needed (for
this particular patch).


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 1/7] lib/stackdepot: Fix first entry having a 0-handle
  2024-02-14 17:01 ` [PATCH v9 1/7] lib/stackdepot: Fix first entry having a 0-handle Oscar Salvador
@ 2024-02-15 10:46   ` Vlastimil Babka
  0 siblings, 0 replies; 24+ messages in thread
From: Vlastimil Babka @ 2024-02-15 10:46 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Marco Elver,
	Andrey Konovalov, Alexander Potapenko

On 2/14/24 18:01, Oscar Salvador wrote:
> The very first entry of stack_record gets a handle of 0, but this is wrong
> because stackdepot treats a 0-handle as a non-valid one.
> E.g: See the check in stack_depot_fetch()
> 
> Fix this by adding and offset of 1.
> 
> This bug has been lurking since the very beginning of stackdepot,
> but no one really cared as it seems.
> Because of that I am not adding a Fixes tag.
> 
> Co-developed-by: Marco Elver <elver@google.com>
> Signed-off-by: Marco Elver <elver@google.com>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Acked-by: Vlastimil Babka <vbabka@suse.cz>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 3/7] mm,page_owner: Maintain own list of stack_records structs
  2024-02-14 17:01 ` [PATCH v9 3/7] mm,page_owner: Maintain own list of stack_records structs Oscar Salvador
@ 2024-02-15 10:55   ` Vlastimil Babka
  2024-02-15 12:52     ` Marco Elver
  0 siblings, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2024-02-15 10:55 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Marco Elver,
	Andrey Konovalov, Alexander Potapenko

On 2/14/24 18:01, Oscar Salvador wrote:
> page_owner needs to increment a stack_record refcount when a new allocation
> occurs, and decrement it on a free operation.
> In order to do that, we need to have a way to get a stack_record from a
> handle.
> Implement __stack_depot_get_stack_record() which just does that, and make
> it public so page_owner can use it.
> 
> Also, traversing all stackdepot buckets comes with its own complexity,
> plus we would have to implement a way to mark only those stack_records
> that were originated from page_owner, as those are the ones we are
> interested in.
> For that reason, page_owner maintains its own list of stack_records,
> because traversing that list is faster than traversing all buckets
> while keeping at the same time a low complexity.
> 
> For now, add to stack_list only the stack_records of dummy_handle and
> failure_handle, and set their refcount of 1.
> 
> Further patches will add code to increment or decrement stack_records
> count on allocation and free operation.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 4/7] mm,page_owner: Implement the tracking of the stacks count
  2024-02-14 17:01 ` [PATCH v9 4/7] mm,page_owner: Implement the tracking of the stacks count Oscar Salvador
@ 2024-02-15 11:08   ` Vlastimil Babka
  2024-02-15 11:57     ` Oscar Salvador
  0 siblings, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2024-02-15 11:08 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Marco Elver,
	Andrey Konovalov, Alexander Potapenko

On 2/14/24 18:01, Oscar Salvador wrote:
> Implement {inc,dec}_stack_record_count() which increments or
> decrements on respective allocation and free operations, via
> __reset_page_owner() (free operation) and __set_page_owner() (alloc
> operation).
> Newly allocated stack_record structs will be added to the list stack_list
> via add_stack_record_to_list().
> Modifications on the list are protected via a spinlock with irqs
> disabled, since this code can also be reached from IRQ context.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Marco Elver <elver@google.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Note:

> +static void inc_stack_record_count(depot_stack_handle_t handle, gfp_t gfp_mask)
> +{
> +	struct stack_record *stack_record = __stack_depot_get_stack_record(handle);
> +
> +	if (!stack_record)
> +		return;
> +
> +	/*
> +	 * New stack_record's that do not use STACK_DEPOT_FLAG_GET start
> +	 * with REFCOUNT_SATURATED to catch spurious increments of their
> +	 * refcount.
> +	 * Since we do not use STACK_DEPOT_FLAG_GET API, let us
> +	 * set a refcount of 1 ourselves.
> +	 */
> +	if (refcount_read(&stack_record->count) == REFCOUNT_SATURATED) {
> +		int old = REFCOUNT_SATURATED;
> +
> +		if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1))
> +			/* Add the new stack_record to our list */
> +			add_stack_record_to_list(stack_record, gfp_mask);
			
			Not returning here...

> +	}
> +	refcount_inc(&stack_record->count);

... means we'll increase the count to 2 on the first store, so there's a
bias. Which would be consistent with the failure and dummy stacks that also
start with a refcount of 1. But then the stack count reporting should
decrement by 1 to prevent confusion? (in the following patch). Imagine
somebody debugging an allocation stack where there are not so many of them,
but the allocation is large, and being sidetracked by an off-by-one error.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 5/7] mm,page_owner: Display all stacks and their count
  2024-02-14 17:01 ` [PATCH v9 5/7] mm,page_owner: Display all stacks and their count Oscar Salvador
@ 2024-02-15 11:10   ` Vlastimil Babka
  2024-02-15 11:58     ` Oscar Salvador
  0 siblings, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2024-02-15 11:10 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Marco Elver,
	Andrey Konovalov, Alexander Potapenko

On 2/14/24 18:01, Oscar Salvador wrote:
> This patch adds a new directory called 'page_owner_stacks' under
> /sys/kernel/debug/, with a file called 'show_stacks' in it.
> Reading from that file will show all stacks that were added by page_owner
> followed by their counting, giving us a clear overview of stack <-> count
> relationship.
> 
> E.g:
> 
>   prep_new_page+0xa9/0x120
>   get_page_from_freelist+0x801/0x2210
>   __alloc_pages+0x18b/0x350
>   alloc_pages_mpol+0x91/0x1f0
>   folio_alloc+0x14/0x50
>   filemap_alloc_folio+0xb2/0x100
>   __filemap_get_folio+0x14a/0x490
>   ext4_write_begin+0xbd/0x4b0 [ext4]
>   generic_perform_write+0xc1/0x1e0
>   ext4_buffered_write_iter+0x68/0xe0 [ext4]
>   ext4_file_write_iter+0x70/0x740 [ext4]
>   vfs_write+0x33d/0x420
>   ksys_write+0xa5/0xe0
>   do_syscall_64+0x80/0x160
>   entry_SYSCALL_64_after_hwframe+0x6e/0x76
>  stack_count: 4578
> 
> The seq stack_{start,next} functions will iterate through the list
> stack_list in order to print all stacks.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Acked-by: Marco Elver <elver@google.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

...

> +static int stack_print(struct seq_file *m, void *v)
> +{
> +	int i;
> +	struct stack *stack = v;
> +	unsigned long *entries;
> +	unsigned long nr_entries;
> +	struct stack_record *stack_record = stack->stack_record;
> +
> +	nr_entries = stack_record->size;
> +	entries = stack_record->entries;
> +
> +	if (!nr_entries || nr_entries < 0 ||
> +	    refcount_read(&stack_record->count) < 2)
> +		return 0;
> +
> +	for (i = 0; i < nr_entries; i++)
> +		seq_printf(m, " %pS\n", (void *)entries[i]);
> +	seq_printf(m, "stack_count: %d\n\n", refcount_read(&stack_record->count));

So count - 1 here to report actual usage, as explained in reply to 4/7?



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 6/7] mm,page_owner: Filter out stacks by a threshold
  2024-02-14 17:01 ` [PATCH v9 6/7] mm,page_owner: Filter out stacks by a threshold Oscar Salvador
@ 2024-02-15 11:12   ` Vlastimil Babka
  2024-02-15 12:01     ` Oscar Salvador
  0 siblings, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2024-02-15 11:12 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Marco Elver,
	Andrey Konovalov, Alexander Potapenko

On 2/14/24 18:01, Oscar Salvador wrote:
> We want to be able to filter out the stacks based on a threshold we can
> can tune.
> By writing to 'count_threshold' file, we can adjust the threshold value.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
...

> ---
>  mm/page_owner.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 5258a417f4d1..9b975f59b773 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -846,9 +846,11 @@ static void *stack_next(struct seq_file *m, void *v, loff_t *ppos)
>  	return stack;
>  }
>  
> +static unsigned long page_owner_stack_threshold;
> +
>  static int stack_print(struct seq_file *m, void *v)
>  {
> -	int i;
> +	int i, stack_count;
>  	struct stack *stack = v;
>  	unsigned long *entries;
>  	unsigned long nr_entries;
> @@ -856,14 +858,15 @@ static int stack_print(struct seq_file *m, void *v)
>  
>  	nr_entries = stack_record->size;
>  	entries = stack_record->entries;
> +	stack_count = refcount_read(&stack_record->count);

Again "- 1" here.

> -	if (!nr_entries || nr_entries < 0 ||
> -	    refcount_read(&stack_record->count) < 2)
> +	if (!nr_entries || nr_entries < 0 || stack_count < 2 ||
> +	    stack_count < page_owner_stack_threshold)

Which will also correct the comparison.

>  		return 0;
>  
>  	for (i = 0; i < nr_entries; i++)
>  		seq_printf(m, " %pS\n", (void *)entries[i]);
> -	seq_printf(m, "stack_count: %d\n\n", refcount_read(&stack_record->count));
> +	seq_printf(m, "stack_count: %d\n\n", stack_count);

And no - 1 needed here then.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 7/7] mm,page_owner: Update Documentation regarding page_owner_stacks
  2024-02-14 17:01 ` [PATCH v9 7/7] mm,page_owner: Update Documentation regarding page_owner_stacks Oscar Salvador
@ 2024-02-15 11:13   ` Vlastimil Babka
  2024-02-15 12:53     ` Marco Elver
  0 siblings, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2024-02-15 11:13 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: linux-kernel, linux-mm, Michal Hocko, Marco Elver,
	Andrey Konovalov, Alexander Potapenko

On 2/14/24 18:01, Oscar Salvador wrote:
> Update page_owner documentation including the new page_owner_stacks
> feature to show how it can be used.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 4/7] mm,page_owner: Implement the tracking of the stacks count
  2024-02-15 11:08   ` Vlastimil Babka
@ 2024-02-15 11:57     ` Oscar Salvador
  0 siblings, 0 replies; 24+ messages in thread
From: Oscar Salvador @ 2024-02-15 11:57 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Marco Elver,
	Andrey Konovalov, Alexander Potapenko

On Thu, Feb 15, 2024 at 12:08:53PM +0100, Vlastimil Babka wrote:
> On 2/14/24 18:01, Oscar Salvador wrote:
> > Implement {inc,dec}_stack_record_count() which increments or
> > decrements on respective allocation and free operations, via
> > __reset_page_owner() (free operation) and __set_page_owner() (alloc
> > operation).
> > Newly allocated stack_record structs will be added to the list stack_list
> > via add_stack_record_to_list().
> > Modifications on the list are protected via a spinlock with irqs
> > disabled, since this code can also be reached from IRQ context.
> > 
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
> > Reviewed-by: Marco Elver <elver@google.com>
> 
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!


> > +		if (atomic_try_cmpxchg_relaxed(&stack_record->count.refs, &old, 1))
> > +			/* Add the new stack_record to our list */
> > +			add_stack_record_to_list(stack_record, gfp_mask);
> 			
> 			Not returning here...
> 
> > +	}
> > +	refcount_inc(&stack_record->count);
> 
> ... means we'll increase the count to 2 on the first store, so there's a
> bias. Which would be consistent with the failure and dummy stacks that also
> start with a refcount of 1. But then the stack count reporting should
> decrement by 1 to prevent confusion? (in the following patch). Imagine
> somebody debugging an allocation stack where there are not so many of them,
> but the allocation is large, and being sidetracked by an off-by-one error.

Good catch Vlastimil!
Yes, we should substract one from the total count in stack_print.

-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 5/7] mm,page_owner: Display all stacks and their count
  2024-02-15 11:10   ` Vlastimil Babka
@ 2024-02-15 11:58     ` Oscar Salvador
  0 siblings, 0 replies; 24+ messages in thread
From: Oscar Salvador @ 2024-02-15 11:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Marco Elver,
	Andrey Konovalov, Alexander Potapenko

On Thu, Feb 15, 2024 at 12:10:38PM +0100, Vlastimil Babka wrote:
> On 2/14/24 18:01, Oscar Salvador wrote:
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
> > Acked-by: Marco Elver <elver@google.com>
> 
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks

> > +	for (i = 0; i < nr_entries; i++)
> > +		seq_printf(m, " %pS\n", (void *)entries[i]);
> > +	seq_printf(m, "stack_count: %d\n\n", refcount_read(&stack_record->count));
> 
> So count - 1 here to report actual usage, as explained in reply to 4/7?

Yes, will do.


-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 6/7] mm,page_owner: Filter out stacks by a threshold
  2024-02-15 11:12   ` Vlastimil Babka
@ 2024-02-15 12:01     ` Oscar Salvador
  0 siblings, 0 replies; 24+ messages in thread
From: Oscar Salvador @ 2024-02-15 12:01 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Michal Hocko, Marco Elver,
	Andrey Konovalov, Alexander Potapenko

On Thu, Feb 15, 2024 at 12:12:56PM +0100, Vlastimil Babka wrote:
> On 2/14/24 18:01, Oscar Salvador wrote:
> > We want to be able to filter out the stacks based on a threshold we can
> > can tune.
> > By writing to 'count_threshold' file, we can adjust the threshold value.
> > 
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
> 
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks

> Again "- 1" here.

Yes, I might just instead add this assigment in the previous patch,
so no further modifications wrt. stack_count will be needed in this
patch.

Might be cleaner?
 

-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 3/7] mm,page_owner: Maintain own list of stack_records structs
  2024-02-15 10:55   ` Vlastimil Babka
@ 2024-02-15 12:52     ` Marco Elver
  0 siblings, 0 replies; 24+ messages in thread
From: Marco Elver @ 2024-02-15 12:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Oscar Salvador, Andrew Morton, linux-kernel, linux-mm,
	Michal Hocko, Andrey Konovalov, Alexander Potapenko

On Thu, 15 Feb 2024 at 11:55, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/14/24 18:01, Oscar Salvador wrote:
> > page_owner needs to increment a stack_record refcount when a new allocation
> > occurs, and decrement it on a free operation.
> > In order to do that, we need to have a way to get a stack_record from a
> > handle.
> > Implement __stack_depot_get_stack_record() which just does that, and make
> > it public so page_owner can use it.
> >
> > Also, traversing all stackdepot buckets comes with its own complexity,
> > plus we would have to implement a way to mark only those stack_records
> > that were originated from page_owner, as those are the ones we are
> > interested in.
> > For that reason, page_owner maintains its own list of stack_records,
> > because traversing that list is faster than traversing all buckets
> > while keeping at the same time a low complexity.
> >
> > For now, add to stack_list only the stack_records of dummy_handle and
> > failure_handle, and set their refcount of 1.
> >
> > Further patches will add code to increment or decrement stack_records
> > count on allocation and free operation.
> >
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>

+1

Reviewed-by: Marco Elver <elver@google.com>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 7/7] mm,page_owner: Update Documentation regarding page_owner_stacks
  2024-02-15 11:13   ` Vlastimil Babka
@ 2024-02-15 12:53     ` Marco Elver
  0 siblings, 0 replies; 24+ messages in thread
From: Marco Elver @ 2024-02-15 12:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Oscar Salvador, Andrew Morton, linux-kernel, linux-mm,
	Michal Hocko, Andrey Konovalov, Alexander Potapenko

On Thu, 15 Feb 2024 at 12:13, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/14/24 18:01, Oscar Salvador wrote:
> > Update page_owner documentation including the new page_owner_stacks
> > feature to show how it can be used.
> >
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> Thanks!

Looks good:

Reviewed-by: Marco Elver <elver@google.com>


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-02-15 12:54 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-14 17:01 [PATCH v9 0/7] page_owner: print stacks and their outstanding allocations Oscar Salvador
2024-02-14 17:01 ` [PATCH v9 1/7] lib/stackdepot: Fix first entry having a 0-handle Oscar Salvador
2024-02-15 10:46   ` Vlastimil Babka
2024-02-14 17:01 ` [PATCH v9 2/7] lib/stackdepot: Move stack_record struct definition into the header Oscar Salvador
2024-02-15  8:16   ` Marco Elver
2024-02-15  8:22     ` Oscar Salvador
2024-02-15  9:30     ` Vlastimil Babka
2024-02-15  9:33       ` Marco Elver
2024-02-15 10:43         ` Vlastimil Babka
2024-02-14 17:01 ` [PATCH v9 3/7] mm,page_owner: Maintain own list of stack_records structs Oscar Salvador
2024-02-15 10:55   ` Vlastimil Babka
2024-02-15 12:52     ` Marco Elver
2024-02-14 17:01 ` [PATCH v9 4/7] mm,page_owner: Implement the tracking of the stacks count Oscar Salvador
2024-02-15 11:08   ` Vlastimil Babka
2024-02-15 11:57     ` Oscar Salvador
2024-02-14 17:01 ` [PATCH v9 5/7] mm,page_owner: Display all stacks and their count Oscar Salvador
2024-02-15 11:10   ` Vlastimil Babka
2024-02-15 11:58     ` Oscar Salvador
2024-02-14 17:01 ` [PATCH v9 6/7] mm,page_owner: Filter out stacks by a threshold Oscar Salvador
2024-02-15 11:12   ` Vlastimil Babka
2024-02-15 12:01     ` Oscar Salvador
2024-02-14 17:01 ` [PATCH v9 7/7] mm,page_owner: Update Documentation regarding page_owner_stacks Oscar Salvador
2024-02-15 11:13   ` Vlastimil Babka
2024-02-15 12:53     ` Marco Elver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox