* [PATCH 1/3] mm/page_owner: add option 'print_handle' for 'show_stacks'
2025-09-24 17:40 [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks' Mauricio Faria de Oliveira
@ 2025-09-24 17:40 ` Mauricio Faria de Oliveira
2025-09-25 20:28 ` Joshua Hahn
2025-09-24 17:40 ` [PATCH 2/3] mm/page_owner: add option 'print_stack' " Mauricio Faria de Oliveira
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Mauricio Faria de Oliveira @ 2025-09-24 17:40 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka
Cc: Oscar Salvador, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
kernel-dev
For monitoring the memory usage per stack trace, it is more efficient to
use the handle number as unique identifier of a stack trace than to, for
example, read/hash all stack traces to uniquely identify them everytime.
The handle number is a unique identifier for stack traces in stackdepot.
This patch adds the option 'print_handle' to print the handle number of
stack traces in 'show_stacks'.
Testing:
- Without handles (default):
# cat /sys/kernel/debug/page_owner_stacks/show_stacks > f1
- With handles (new option):
# echo 1 >/sys/kernel/debug/page_owner_stacks/print_handle
# cat /sys/kernel/debug/page_owner_stacks/show_stacks > f2
- Same number of lines for 'nr_base_pages' and 'handle':
# grep -c '^nr_base_pages:' f1
873
# grep -c '^handle:' f2
873
Signed-off-by: Mauricio Faria de Oliveira <mfo@igalia.com>
---
mm/page_owner.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index c3ca21132c2c..420426749239 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -887,6 +887,7 @@ static void *stack_next(struct seq_file *m, void *v, loff_t *ppos)
}
static unsigned long page_owner_pages_threshold;
+static bool page_owner_print_handle;
static int stack_print(struct seq_file *m, void *v)
{
@@ -908,6 +909,8 @@ static int stack_print(struct seq_file *m, void *v)
for (i = 0; i < nr_entries; i++)
seq_printf(m, " %pS\n", (void *)entries[i]);
+ if (page_owner_print_handle)
+ seq_printf(m, "handle: %d\n", stack_record->handle.handle);
seq_printf(m, "nr_base_pages: %d\n\n", nr_base_pages);
return 0;
@@ -968,6 +971,8 @@ static int __init pageowner_init(void)
&page_owner_stack_operations);
debugfs_create_file("count_threshold", 0600, dir, NULL,
&proc_page_owner_threshold);
+ debugfs_create_bool("print_handle", 0600, dir,
+ &page_owner_print_handle);
return 0;
}
--
2.48.1
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH 1/3] mm/page_owner: add option 'print_handle' for 'show_stacks'
2025-09-24 17:40 ` [PATCH 1/3] mm/page_owner: add option 'print_handle' " Mauricio Faria de Oliveira
@ 2025-09-25 20:28 ` Joshua Hahn
2025-09-25 22:25 ` Mauricio Faria de Oliveira
0 siblings, 1 reply; 14+ messages in thread
From: Joshua Hahn @ 2025-09-25 20:28 UTC (permalink / raw)
To: Mauricio Faria de Oliveira
Cc: Andrew Morton, Vlastimil Babka, Oscar Salvador,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, linux-mm, linux-kernel, kernel-dev
On Wed, 24 Sep 2025 14:40:21 -0300 Mauricio Faria de Oliveira <mfo@igalia.com> wrote:
> For monitoring the memory usage per stack trace, it is more efficient to
> use the handle number as unique identifier of a stack trace than to, for
> example, read/hash all stack traces to uniquely identify them everytime.
>
> The handle number is a unique identifier for stack traces in stackdepot.
>
> This patch adds the option 'print_handle' to print the handle number of
> stack traces in 'show_stacks'.
> @@ -887,6 +887,7 @@ static void *stack_next(struct seq_file *m, void *v, loff_t *ppos)
> }
>
> static unsigned long page_owner_pages_threshold;
> +static bool page_owner_print_handle;
Hi Mauricio,
Quick nit -- If I understand your cover letter correctly, you want
page_owner_print_handle to be false by default, should we initialize this
to false?
Have a great day!
Joshua
Sent using hkml (https://github.com/sjp38/hackermail)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] mm/page_owner: add option 'print_handle' for 'show_stacks'
2025-09-25 20:28 ` Joshua Hahn
@ 2025-09-25 22:25 ` Mauricio Faria de Oliveira
0 siblings, 0 replies; 14+ messages in thread
From: Mauricio Faria de Oliveira @ 2025-09-25 22:25 UTC (permalink / raw)
To: Joshua Hahn
Cc: Andrew Morton, Vlastimil Babka, Oscar Salvador,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, linux-mm, linux-kernel, kernel-dev
On 2025-09-25 17:28, Joshua Hahn wrote:
> On Wed, 24 Sep 2025 14:40:21 -0300 Mauricio Faria de Oliveira <mfo@igalia.com> wrote:
>
>> For monitoring the memory usage per stack trace, it is more efficient to
>> use the handle number as unique identifier of a stack trace than to, for
>> example, read/hash all stack traces to uniquely identify them everytime.
>>
>> The handle number is a unique identifier for stack traces in stackdepot.
>>
>> This patch adds the option 'print_handle' to print the handle number of
>> stack traces in 'show_stacks'.
>
>> @@ -887,6 +887,7 @@ static void *stack_next(struct seq_file *m, void *v, loff_t *ppos)
>> }
>>
>> static unsigned long page_owner_pages_threshold;
>> +static bool page_owner_print_handle;
>
> Hi Mauricio,
>
> Quick nit -- If I understand your cover letter correctly, you want
> page_owner_print_handle to be false by default, should we initialize this
> to false?
Hey Joshua,
In this case, it is not needed, as static variables don't need to be
initialized to false/0. This is handled as an error in checkpatch.pl,
so it should not be added even for completeness.
Thanks for checking this.
--
Mauricio
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/3] mm/page_owner: add option 'print_stack' for 'show_stacks'
2025-09-24 17:40 [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks' Mauricio Faria de Oliveira
2025-09-24 17:40 ` [PATCH 1/3] mm/page_owner: add option 'print_handle' " Mauricio Faria de Oliveira
@ 2025-09-24 17:40 ` Mauricio Faria de Oliveira
2025-09-24 17:40 ` [PATCH 3/3] mm/page_owner: update Documentation with 'print_handle' and 'print_stack' Mauricio Faria de Oliveira
2025-09-25 16:08 ` [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks' Michal Hocko
3 siblings, 0 replies; 14+ messages in thread
From: Mauricio Faria de Oliveira @ 2025-09-24 17:40 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka
Cc: Oscar Salvador, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
kernel-dev
For monitoring the memory usage per stack trace, it is more efficient to
read _just_ the handle number of the stack traces _without_ stack traces
themselves, and their number of base pages.
The stack traces are required only at the time to associate memory usage
of a handle number with its stack trace (eg, see top-consuming handles).
Before that, it is sufficient to have just the handle number, as it can
be matched with a stack trace later (possible with the previous patch).
This patch adds the option 'print_stack' option (enabled by default) to
print stack traces in 'show_stacks'.
Testing:
- Enable handle numbers:
# echo 1 >/sys/kernel/debug/page_owner_stacks/print_handle
- With stacks (default):
# cat /sys/kernel/debug/page_owner_stacks/show_stacks > f1
- Without stacks (new option):
# echo 0 >/sys/kernel/debug/page_owner_stacks/print_stack
# cat /sys/kernel/debug/page_owner_stacks/show_stacks > f2
- Differences:
# cat f1
get_page_from_freelist+0x14ab/0x16a0
...
do_syscall_64+0xa4/0x290
handle: 15728651
nr_base_pages: 2
get_page_from_freelist+0x14ab/0x16a0
...
do_sys_openat2+0x8a/0xe0
handle: 8388619
nr_base_pages: 16
...
# cat f2
handle: 15728651
nr_base_pages: 2
handle: 8388619
nr_base_pages: 16
...
Signed-off-by: Mauricio Faria de Oliveira <mfo@igalia.com>
---
mm/page_owner.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 420426749239..25221676676d 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -888,6 +888,7 @@ static void *stack_next(struct seq_file *m, void *v, loff_t *ppos)
static unsigned long page_owner_pages_threshold;
static bool page_owner_print_handle;
+static bool page_owner_print_stack = true;
static int stack_print(struct seq_file *m, void *v)
{
@@ -900,15 +901,17 @@ static int stack_print(struct seq_file *m, void *v)
if (!stack->stack_record)
return 0;
- nr_entries = stack_record->size;
- entries = stack_record->entries;
nr_base_pages = refcount_read(&stack_record->count) - 1;
if (nr_base_pages < 1 || nr_base_pages < page_owner_pages_threshold)
return 0;
- for (i = 0; i < nr_entries; i++)
- seq_printf(m, " %pS\n", (void *)entries[i]);
+ if (page_owner_print_stack) {
+ nr_entries = stack_record->size;
+ entries = stack_record->entries;
+ for (i = 0; i < nr_entries; i++)
+ seq_printf(m, " %pS\n", (void *)entries[i]);
+ }
if (page_owner_print_handle)
seq_printf(m, "handle: %d\n", stack_record->handle.handle);
seq_printf(m, "nr_base_pages: %d\n\n", nr_base_pages);
@@ -973,6 +976,8 @@ static int __init pageowner_init(void)
&proc_page_owner_threshold);
debugfs_create_bool("print_handle", 0600, dir,
&page_owner_print_handle);
+ debugfs_create_bool("print_stack", 0600, dir,
+ &page_owner_print_stack);
return 0;
}
--
2.48.1
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH 3/3] mm/page_owner: update Documentation with 'print_handle' and 'print_stack'
2025-09-24 17:40 [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks' Mauricio Faria de Oliveira
2025-09-24 17:40 ` [PATCH 1/3] mm/page_owner: add option 'print_handle' " Mauricio Faria de Oliveira
2025-09-24 17:40 ` [PATCH 2/3] mm/page_owner: add option 'print_stack' " Mauricio Faria de Oliveira
@ 2025-09-24 17:40 ` Mauricio Faria de Oliveira
2025-09-25 16:08 ` [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks' Michal Hocko
3 siblings, 0 replies; 14+ messages in thread
From: Mauricio Faria de Oliveira @ 2025-09-24 17:40 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka
Cc: Oscar Salvador, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
kernel-dev
Describe and provide examples for 'print_handle' and 'print_stack'.
Signed-off-by: Mauricio Faria de Oliveira <mfo@igalia.com>
---
Documentation/mm/page_owner.rst | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/Documentation/mm/page_owner.rst b/Documentation/mm/page_owner.rst
index 3a45a20fc05a..fac14ff2e4a5 100644
--- a/Documentation/mm/page_owner.rst
+++ b/Documentation/mm/page_owner.rst
@@ -27,7 +27,10 @@ enabled. Other usages are more than welcome.
It can also be used to show all the stacks and their current number of
allocated base pages, which gives us a quick overview of where the memory
is going without the need to screen through all the pages and match the
-allocation and free operation.
+allocation and free operation. It's also possible to show only a numeric
+identifier of all the stacks (without stack traces) and their number of
+allocated base pages (faster to read and parse, eg, for monitoring) that
+can be matched with stacks later (options print_handle and print_stack).
page owner is disabled by default. So, if you'd like to use it, you need
to add "page_owner=on" to your boot cmdline. If the kernel is built
@@ -95,6 +98,7 @@ Usage
...
...
echo 7000 > /sys/kernel/debug/page_owner_stacks/count_threshold
+ echo 1 > /sys/kernel/debug/page_owner_stacks/print_handle
cat /sys/kernel/debug/page_owner_stacks/show_stacks> stacks_7000.txt
cat stacks_7000.txt
post_alloc_hook+0x177/0x1a0
@@ -113,6 +117,15 @@ Usage
__do_sys_finit_module+0x381/0x730
do_syscall_64+0x8d/0x150
entry_SYSCALL_64_after_hwframe+0x62/0x6a
+ handle: 42
+ nr_base_pages: 20824
+ ...
+
+ echo 1 > /sys/kernel/debug/page_owner_stacks/print_handle
+ echo 0 > /sys/kernel/debug/page_owner_stacks/print_stack
+ cat /sys/kernel/debug/page_owner_stacks/show_stacks > handles_7000.txt
+ cat handles_7000.txt
+ handle: 42
nr_base_pages: 20824
...
--
2.48.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks'
2025-09-24 17:40 [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks' Mauricio Faria de Oliveira
` (2 preceding siblings ...)
2025-09-24 17:40 ` [PATCH 3/3] mm/page_owner: update Documentation with 'print_handle' and 'print_stack' Mauricio Faria de Oliveira
@ 2025-09-25 16:08 ` Michal Hocko
2025-09-25 19:38 ` Mauricio Faria de Oliveira
3 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2025-09-25 16:08 UTC (permalink / raw)
To: Mauricio Faria de Oliveira
Cc: Andrew Morton, Vlastimil Babka, Oscar Salvador,
Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
linux-mm, linux-kernel, kernel-dev
On Wed 24-09-25 14:40:20, Mauricio Faria de Oliveira wrote:
> Problem:
>
> The use case of monitoring the memory usage per stack trace (or tracking
> a particular stack trace) requires uniquely identifying each stack trace.
>
> This has to be done for every stack trace on every sample of monitoring,
> even if tracking only one stack trace (to identify it among all others).
>
> Therefore, an approach like, for example, hashing the stack traces from
> 'show_stacks' for calculating unique identifiers can become expensive.
>
> Solution:
>
> Fortunately, the kernel can provide a unique identifier for stack traces
> in page_owner, which is the handle number in stackdepot.
>
> Additionally, with that information, the stack traces themselves are not
> needed until the time when the memory usage should be associated with a
> stack trace (say, to look at a few top consumers), using handle numbers.
>
> This eliminates hashing and reduces filtering related to stack traces in
> userspace, and reduces text emitted/copied by the kernel.
Let's see if I understand this correctly. You are suggesting trimming
down the output to effectivelly key, value pair and only resolve the key
once per debugging session because keys do not change and you do not
need the full stack traces that maps to the key. Correct?
Could you elaborate some more on why the performance really matters here?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks'
2025-09-25 16:08 ` [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks' Michal Hocko
@ 2025-09-25 19:38 ` Mauricio Faria de Oliveira
2025-09-26 6:55 ` Michal Hocko
0 siblings, 1 reply; 14+ messages in thread
From: Mauricio Faria de Oliveira @ 2025-09-25 19:38 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Vlastimil Babka, Oscar Salvador,
Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
linux-mm, linux-kernel, kernel-dev
On 2025-09-25 13:08, Michal Hocko wrote:
> On Wed 24-09-25 14:40:20, Mauricio Faria de Oliveira wrote:
>> Problem:
>>
>> The use case of monitoring the memory usage per stack trace (or tracking
>> a particular stack trace) requires uniquely identifying each stack trace.
>>
>> This has to be done for every stack trace on every sample of monitoring,
>> even if tracking only one stack trace (to identify it among all others).
>>
>> Therefore, an approach like, for example, hashing the stack traces from
>> 'show_stacks' for calculating unique identifiers can become expensive.
>>
>> Solution:
>>
>> Fortunately, the kernel can provide a unique identifier for stack traces
>> in page_owner, which is the handle number in stackdepot.
>>
>> Additionally, with that information, the stack traces themselves are not
>> needed until the time when the memory usage should be associated with a
>> stack trace (say, to look at a few top consumers), using handle numbers.
>>
>> This eliminates hashing and reduces filtering related to stack traces in
>> userspace, and reduces text emitted/copied by the kernel.
>
> Let's see if I understand this correctly. You are suggesting trimming
> down the output to effectivelly key, value pair and only resolve the key
> once per debugging session because keys do not change and you do not
> need the full stack traces that maps to the key. Correct?
Yes, exactly.
> Could you elaborate some more on why the performance really matters here?
Sure.
One reason is optimizing data processing.
Currently, the step to obtain the key of a strack trace (e.g., hashing)
incurs
a considerable work (done for all stack traces, on every sample) that
actually
is duplicated work (the same result for each stack trace, on every
sample).
That calculation is a significant overhead compared to the operation
it's done
for, which is '(calculated) key = memory usage'.
Thus, optimizing that step to just reading the key from the kernel would
save
resources (processing) and time (e.g., waiting for results to be ready,
on post
processing; or reducing the time required per sample, on live
monitoring).
Another reason is optimizing data collection.
There is some overhead in periodically waking-up, reading and storing
data, and
later in filtering it. (Admittedly, much less significant than the
above.)
However, despite being a minor improvement, it actually prevents the
production
of data that is discarded at consumption; that helps both producer and
consumer.
The cumulative improvement may be interesting over very long profiling
sessions.
Hope this addresses your question. Happy to provide more context or
details.
Thanks,
--
Mauricio
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks'
2025-09-25 19:38 ` Mauricio Faria de Oliveira
@ 2025-09-26 6:55 ` Michal Hocko
2025-09-26 16:47 ` Mauricio Faria de Oliveira
0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2025-09-26 6:55 UTC (permalink / raw)
To: Mauricio Faria de Oliveira
Cc: Andrew Morton, Vlastimil Babka, Oscar Salvador,
Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
linux-mm, linux-kernel, kernel-dev
On Thu 25-09-25 16:38:46, Mauricio Faria de Oliveira wrote:
> On 2025-09-25 13:08, Michal Hocko wrote:
[...]
> > Could you elaborate some more on why the performance really matters here?
>
> Sure.
>
> One reason is optimizing data processing.
>
> Currently, the step to obtain the key of a strack trace (e.g., hashing)
> incurs
> a considerable work (done for all stack traces, on every sample) that
> actually
> is duplicated work (the same result for each stack trace, on every
> sample).
OK, that was not really clear to me but the above seems to suggest that
by hashing you really mean hashing in the userspace when trying to
create a key so that you can watch memory consumption trends per stack
trace (hash in this case) without post processing.
Stating that more explicitly in the changelog along with an example on
how you are using this would be really helpful.
When the interface was originally introduced the primary usecase was to
examine biggest memory consumers - e.g. when memory counters do not add
up to counters that track most common users (e.g. userspace memory, slab
caches etc.). In those case you need to see those stack traces as those
are giving you the most valuable information.
I can see you are coming from a different direction and you want to
collect data repeatedly and watch for trends rather than analyzing a
particular situation. This seems like a useful usecase in itself.
My main question is whether this should squashed into the existing file
with a rather strange semantic of controling the file content depending
on a different file content. Instead, would it make more sense to add
two more files, one to display your requested key:value data and another
to resolve key -> stack trace?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks'
2025-09-26 6:55 ` Michal Hocko
@ 2025-09-26 16:47 ` Mauricio Faria de Oliveira
2025-09-30 14:02 ` Michal Hocko
0 siblings, 1 reply; 14+ messages in thread
From: Mauricio Faria de Oliveira @ 2025-09-26 16:47 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Vlastimil Babka, Oscar Salvador,
Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
linux-mm, linux-kernel, kernel-dev
On 2025-09-26 03:55, Michal Hocko wrote:
> On Thu 25-09-25 16:38:46, Mauricio Faria de Oliveira wrote:
>> On 2025-09-25 13:08, Michal Hocko wrote:
> [...]
>> > Could you elaborate some more on why the performance really matters here?
>>
>> Sure.
>>
>> One reason is optimizing data processing.
>>
>> Currently, the step to obtain the key of a strack trace (e.g., hashing)
>> incurs
>> a considerable work (done for all stack traces, on every sample) that
>> actually
>> is duplicated work (the same result for each stack trace, on every
>> sample).
>
> OK, that was not really clear to me but the above seems to suggest that
> by hashing you really mean hashing in the userspace when trying to
> create a key so that you can watch memory consumption trends per stack
> trace (hash in this case) without post processing.
Yes.
> Stating that more explicitly in the changelog along with an example on
> how you are using this would be really helpful.
Sure. Thanks for pointing that out, and making the effort to understand.
> When the interface was originally introduced the primary usecase was to
> examine biggest memory consumers - e.g. when memory counters do not add
> up to counters that track most common users (e.g. userspace memory, slab
> caches etc.). In those case you need to see those stack traces as those
> are giving you the most valuable information.
>
> I can see you are coming from a different direction and you want to
> collect data repeatedly and watch for trends rather than analyzing a
> particular situation. This seems like a useful usecase in itself.
Precisely. I can make that more explicit in the changelog as well.
> My main question is whether this should squashed into the existing file
> with a rather strange semantic of controling the file content depending
> on a different file content. Instead, would it make more sense to add
> two more files, one to display your requested key:value data and another
> to resolve key -> stack trace?
I see your point. Either way works for me, honestly.
Let me justify the current way, but it's certainly OK to change it, if
that is preferred.
The use of option files has precedents in page_owner itself
(count_threshould) and ftrace (/sys/kernel/debug/trace/options/*).
The use of output files needs more code/complexity for a similar result,
AFAICT (I actually started it this way, but changed it to minimize
changes).
The reason is debugfs_create_bool() is more specialized/simpler to
handle than debugfs_create_file().
It ends up with a similar pattern in a common "__stack_print()" to avoid
duplicate code (conditions on parameters to configure the output), and
it adds:
- 2 ops structs per file (file_operations and seq_operations, as in
'show_stacks'), for plumbing different behaviors down to different
functions, to call the common function with different parameters.
- It should be possible to reduce it with private fields (from
debugfs_create_file(data) to seq_file.private), however, since
seq_file.private is used (iterator in stack_start|next()), this needs
more code: a new struct for the private field (to store the current
iterator and add the new parameters).
So, I went for the (IMHO) simpler and smaller implementation with option
files instead of output files.
Please let me know which way is preferred, and I'll send v2 with that
(in addition to the changelog suggestions).
Thanks,
--
Mauricio
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks'
2025-09-26 16:47 ` Mauricio Faria de Oliveira
@ 2025-09-30 14:02 ` Michal Hocko
2025-09-30 14:32 ` Mauricio Faria de Oliveira
2025-10-01 10:58 ` Vlastimil Babka
0 siblings, 2 replies; 14+ messages in thread
From: Michal Hocko @ 2025-09-30 14:02 UTC (permalink / raw)
To: Mauricio Faria de Oliveira
Cc: Andrew Morton, Vlastimil Babka, Oscar Salvador,
Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
linux-mm, linux-kernel, kernel-dev
On Fri 26-09-25 13:47:15, Mauricio Faria de Oliveira wrote:
> On 2025-09-26 03:55, Michal Hocko wrote:
> > On Thu 25-09-25 16:38:46, Mauricio Faria de Oliveira wrote:
> >> On 2025-09-25 13:08, Michal Hocko wrote:
> > [...]
> >> > Could you elaborate some more on why the performance really matters here?
> >>
> >> Sure.
> >>
> >> One reason is optimizing data processing.
> >>
> >> Currently, the step to obtain the key of a strack trace (e.g., hashing)
> >> incurs
> >> a considerable work (done for all stack traces, on every sample) that
> >> actually
> >> is duplicated work (the same result for each stack trace, on every
> >> sample).
> >
> > OK, that was not really clear to me but the above seems to suggest that
> > by hashing you really mean hashing in the userspace when trying to
> > create a key so that you can watch memory consumption trends per stack
> > trace (hash in this case) without post processing.
>
> Yes.
>
> > Stating that more explicitly in the changelog along with an example on
> > how you are using this would be really helpful.
>
> Sure. Thanks for pointing that out, and making the effort to understand.
>
> > When the interface was originally introduced the primary usecase was to
> > examine biggest memory consumers - e.g. when memory counters do not add
> > up to counters that track most common users (e.g. userspace memory, slab
> > caches etc.). In those case you need to see those stack traces as those
> > are giving you the most valuable information.
> >
> > I can see you are coming from a different direction and you want to
> > collect data repeatedly and watch for trends rather than analyzing a
> > particular situation. This seems like a useful usecase in itself.
>
> Precisely. I can make that more explicit in the changelog as well.
>
> > My main question is whether this should squashed into the existing file
> > with a rather strange semantic of controling the file content depending
> > on a different file content. Instead, would it make more sense to add
> > two more files, one to display your requested key:value data and another
> > to resolve key -> stack trace?
>
> I see your point. Either way works for me, honestly.
> Let me justify the current way, but it's certainly OK to change it, if
> that is preferred.
>
> The use of option files has precedents in page_owner itself
> (count_threshould) and ftrace (/sys/kernel/debug/trace/options/*).
>
> The use of output files needs more code/complexity for a similar result,
> AFAICT (I actually started it this way, but changed it to minimize
> changes).
> The reason is debugfs_create_bool() is more specialized/simpler to
> handle than debugfs_create_file().
>
> It ends up with a similar pattern in a common "__stack_print()" to avoid
> duplicate code (conditions on parameters to configure the output), and
> it adds:
> - 2 ops structs per file (file_operations and seq_operations, as in
> 'show_stacks'), for plumbing different behaviors down to different
> functions, to call the common function with different parameters.
> - It should be possible to reduce it with private fields (from
> debugfs_create_file(data) to seq_file.private), however, since
> seq_file.private is used (iterator in stack_start|next()), this needs
> more code: a new struct for the private field (to store the current
> iterator and add the new parameters).
>
> So, I went for the (IMHO) simpler and smaller implementation with option
> files instead of output files.
>
> Please let me know which way is preferred, and I'll send v2 with that
> (in addition to the changelog suggestions).
Sure, I see. The main problem with the option file is that it is
inherently suited for a single consumer which is a hard assumption to
make at this stage. So I think it is worth having a separate 2 files
which provide the missing functionality.
Thanks!
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks'
2025-09-30 14:02 ` Michal Hocko
@ 2025-09-30 14:32 ` Mauricio Faria de Oliveira
2025-10-01 10:58 ` Vlastimil Babka
1 sibling, 0 replies; 14+ messages in thread
From: Mauricio Faria de Oliveira @ 2025-09-30 14:32 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Vlastimil Babka, Oscar Salvador,
Suren Baghdasaryan, Brendan Jackman, Johannes Weiner, Zi Yan,
linux-mm, linux-kernel, kernel-dev
On 2025-09-30 11:02, Michal Hocko wrote:
> Sure, I see. The main problem with the option file is that it is
> inherently suited for a single consumer which is a hard assumption to
> make at this stage. So I think it is worth having a separate 2 files
> which provide the missing functionality.
Ack; I'll submit v2 addressing your suggestions. Thanks!
--
Mauricio
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks'
2025-09-30 14:02 ` Michal Hocko
2025-09-30 14:32 ` Mauricio Faria de Oliveira
@ 2025-10-01 10:58 ` Vlastimil Babka
2025-10-01 17:37 ` Mauricio Faria de Oliveira
1 sibling, 1 reply; 14+ messages in thread
From: Vlastimil Babka @ 2025-10-01 10:58 UTC (permalink / raw)
To: Michal Hocko, Mauricio Faria de Oliveira
Cc: Andrew Morton, Oscar Salvador, Suren Baghdasaryan,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
kernel-dev
On 9/30/25 4:02 PM, Michal Hocko wrote:
> On Fri 26-09-25 13:47:15, Mauricio Faria de Oliveira wrote:
>>> My main question is whether this should squashed into the existing file
>>> with a rather strange semantic of controling the file content depending
>>> on a different file content. Instead, would it make more sense to add
>>> two more files, one to display your requested key:value data and another
>>> to resolve key -> stack trace?
>>
>> I see your point. Either way works for me, honestly.
>> Let me justify the current way, but it's certainly OK to change it, if
>> that is preferred.
>>
>> The use of option files has precedents in page_owner itself
>> (count_threshould) and ftrace (/sys/kernel/debug/trace/options/*).
>>
>> The use of output files needs more code/complexity for a similar result,
>> AFAICT (I actually started it this way, but changed it to minimize
>> changes).
>> The reason is debugfs_create_bool() is more specialized/simpler to
>> handle than debugfs_create_file().
>>
>> It ends up with a similar pattern in a common "__stack_print()" to avoid
>> duplicate code (conditions on parameters to configure the output), and
>> it adds:
>> - 2 ops structs per file (file_operations and seq_operations, as in
>> 'show_stacks'), for plumbing different behaviors down to different
>> functions, to call the common function with different parameters.
>> - It should be possible to reduce it with private fields (from
>> debugfs_create_file(data) to seq_file.private), however, since
>> seq_file.private is used (iterator in stack_start|next()), this needs
>> more code: a new struct for the private field (to store the current
>> iterator and add the new parameters).
>>
>> So, I went for the (IMHO) simpler and smaller implementation with option
>> files instead of output files.
>>
>> Please let me know which way is preferred, and I'll send v2 with that
>> (in addition to the changelog suggestions).
>
> Sure, I see. The main problem with the option file is that it is
> inherently suited for a single consumer which is a hard assumption to
> make at this stage. So I think it is worth having a separate 2 files
> which provide the missing functionality.
Agreed, we should prioritize a better userspace API over having simpler
kernel implementation.
Will count_threshold apply the same to the new file that prints only
handles? I guess it will?
Also the handles to stack translation file could perhaps support
"seeking" to a specific handle if you're interested in only a few
handles. Perhaps not necessary though if you plan to read it all just once.
> Thanks!
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] mm/page_owner: add options 'print_handle' and 'print_stack' for 'show_stacks'
2025-10-01 10:58 ` Vlastimil Babka
@ 2025-10-01 17:37 ` Mauricio Faria de Oliveira
0 siblings, 0 replies; 14+ messages in thread
From: Mauricio Faria de Oliveira @ 2025-10-01 17:37 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Michal Hocko, Andrew Morton, Oscar Salvador, Suren Baghdasaryan,
Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
kernel-dev
On 2025-10-01 07:58, Vlastimil Babka wrote:
> On 9/30/25 4:02 PM, Michal Hocko wrote:
>> On Fri 26-09-25 13:47:15, Mauricio Faria de Oliveira wrote:
>>>> My main question is whether this should squashed into the existing file
>>>> with a rather strange semantic of controling the file content depending
>>>> on a different file content. Instead, would it make more sense to add
>>>> two more files, one to display your requested key:value data and another
>>>> to resolve key -> stack trace?
>>>
>>> I see your point. Either way works for me, honestly.
>>> Let me justify the current way, but it's certainly OK to change it, if
>>> that is preferred.
>>>
>>> The use of option files has precedents in page_owner itself
>>> (count_threshould) and ftrace (/sys/kernel/debug/trace/options/*).
>>>
>>> The use of output files needs more code/complexity for a similar result,
>>> AFAICT (I actually started it this way, but changed it to minimize
>>> changes).
>>> The reason is debugfs_create_bool() is more specialized/simpler to
>>> handle than debugfs_create_file().
>>>
>>> It ends up with a similar pattern in a common "__stack_print()" to avoid
>>> duplicate code (conditions on parameters to configure the output), and
>>> it adds:
>>> - 2 ops structs per file (file_operations and seq_operations, as in
>>> 'show_stacks'), for plumbing different behaviors down to different
>>> functions, to call the common function with different parameters.
>>> - It should be possible to reduce it with private fields (from
>>> debugfs_create_file(data) to seq_file.private), however, since
>>> seq_file.private is used (iterator in stack_start|next()), this needs
>>> more code: a new struct for the private field (to store the current
>>> iterator and add the new parameters).
>>>
>>> So, I went for the (IMHO) simpler and smaller implementation with option
>>> files instead of output files.
>>>
>>> Please let me know which way is preferred, and I'll send v2 with that
>>> (in addition to the changelog suggestions).
>>
>> Sure, I see. The main problem with the option file is that it is
>> inherently suited for a single consumer which is a hard assumption to
>> make at this stage. So I think it is worth having a separate 2 files
>> which provide the missing functionality.
>
> Agreed, we should prioritize a better userspace API over having simpler
> kernel implementation.
Just to clarify, I agree. I personally considered option files a good
userspace API for this, but later realized the different perspective
above.
> Will count_threshold apply the same to the new file that prints only
> handles? I guess it will?
Yes. The new file that prints only handles should only differ in format,
not behavior.
This is different for the file that translates handles to stacks,
though. For that, count_threshould does not apply, as a previously
reported handle may have less pages and not make it, which would not
allow it to be resolved to a stack trace.
> Also the handles to stack translation file could perhaps support
> "seeking" to a specific handle if you're interested in only a few
> handles. Perhaps not necessary though if you plan to read it all just once.
That sounds cool. For the current usecase, it only needs to read it all
just once, indeed. I'd be happy to revisit this later, if needed.
Let's see how v2 goes; sending it out.
Thanks!
--
Mauricio
^ permalink raw reply [flat|nested] 14+ messages in thread