* [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination
@ 2026-04-13 18:52 Liew Rui Yan
2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan
2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan
0 siblings, 2 replies; 5+ messages in thread
From: Liew Rui Yan @ 2026-04-13 18:52 UTC (permalink / raw)
To: SeongJae Park; +Cc: damon, linux-mm, Liew Rui Yan
Problem
========
When kdamond terminates unexpectedly, 'enabled' remains 'Y' and
'kdamond_pid' remains stale. This prevents user from restarting DAMON
because both writing 'Y' and 'N' to 'enabled' will fail.
"Unexpected termination" here means the kdamond exits without any user
request (e.g., not by writing 'N' to 'enabled').
User Impact
===========
Once kdamond terminates this way, it cannot be restarted via sysfs
because:
1. DAMON_LRU_SORT/DAMON_RECLAIM is built into the kernel, so it cannot
be unloaded and reloaded at runtime.
2. Writing 'N' to 'enabled' fails because kdamond no longer exists;
Writing 'Y' does nothing, as 'enabled' is already Y.
As a result, the only way to restore DAMON functionality is a full
system reboot.
Solution
========
damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only
sets it to false upon successful completion. When 'maybe_corrupted'
remains true, kdamond will terminate eventually.
Therefore:
1. In damon_{lru_sort, reclaim}_turn(): Add fallback logic to reset
parameters when damon_stop() fails but kdamond is not running.
2. In damon_{lru_sort, reclaim}_apply_parameters(): Reset parameters
when damon_commit_ctx() fails, as kdamond will terminate due to
maybe_corrupted mechanism.
Changes from RFC-v1
(https://lore.kernel.org/20260330164347.12772-1-aethernet65535@gmail.com)
- Remove RFC tag.
- Remove 'damon_thread_status' structure and damon_update_thread_status()
(SJ pointed out this was too much extension of core API for a problem
that can be fixed more simply).
- Add a fallback in damon_{lru_sort, reclaim}_turn() 'N' path. If
damon_stop() fails but kdamond is not running, forcefully reset the
parameters.
- Reset 'enabled' and 'kdamond_pid' when damon_commit_ctx() fails in
damon_{lru_sort, reclaim}_apply_parameters() (kdamond will terminate
eventually in this case).
Liew Rui Yan (2):
mm/damon/lru_sort: reset thread status parameters upon kdamond
termination
mm/damon/reclaim: reset thread status parameters upon kdamond
termination
mm/damon/lru_sort.c | 13 +++++++++++--
mm/damon/reclaim.c | 13 +++++++++++--
2 files changed, 22 insertions(+), 4 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 1/2] mm/damon/lru_sort: reset thread status parameters upon kdamond termination
2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
@ 2026-04-13 18:52 ` Liew Rui Yan
2026-04-13 19:54 ` (sashiko review) " Liew Rui Yan
2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan
1 sibling, 1 reply; 5+ messages in thread
From: Liew Rui Yan @ 2026-04-13 18:52 UTC (permalink / raw)
To: SeongJae Park; +Cc: damon, linux-mm, Liew Rui Yan
Problem
========
When kdamond terminates unexpectedly, 'enabled' remains 'Y' and
'kdamond_pid' remains stale. This prevents user from restarting DAMON
because both writing 'Y' and 'N' to 'enabled' will fail.
"Unexpected termination" here means the kdamond exits without any user
request (e.g., not by writing 'N' to 'enabled').
User Impact
===========
Once kdamond terminates this way, it cannot be restarted via sysfs
because:
1. DAMON_LRU_SORT is built into the kernel, so it cannot be unloaded and
reloaded at runtime.
2. Writing 'N' to 'enabled' fails because kdamond no longer exists;
Writing 'Y' does nothing, as 'enabled' is already Y.
As a result, the only way to restore DAMON functionality is a full
system reboot.
Solution
========
damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only
sets it to false upon successful completion. When 'maybe_corrupted'
remains true, kdamond will terminate eventually.
Therefore:
1. In damon_lru_sort_turn(): Add fallback logic to reset
parameters when damon_stop() fails but kdamond is not running.
2. In damon_lru_sort_apply_parameters(): Reset parameters
when damon_commit_ctx() fails, as kdamond will terminate due to
maybe_corrupted mechanism.
Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
---
mm/damon/lru_sort.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
index 554559d72976..96c8d0dfcafd 100644
--- a/mm/damon/lru_sort.c
+++ b/mm/damon/lru_sort.c
@@ -344,6 +344,10 @@ static int damon_lru_sort_apply_parameters(void)
if (err)
goto out;
err = damon_commit_ctx(ctx, param_ctx);
+ if (err) {
+ enabled = false;
+ kdamond_pid = -1;
+ }
out:
damon_destroy_ctx(param_ctx);
return err;
@@ -388,8 +392,13 @@ static int damon_lru_sort_turn(bool on)
if (!on) {
err = damon_stop(&ctx, 1);
- if (!err)
- kdamond_pid = -1;
+ if (err) {
+ if (!damon_is_running(ctx))
+ err = 0;
+ else
+ return err;
+ }
+ kdamond_pid = -1;
return err;
}
--
2.53.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 2/2] mm/damon/reclaim: reset thread status parameters upon kdamond termination
2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan
@ 2026-04-13 18:52 ` Liew Rui Yan
2026-04-13 19:57 ` (sashiko review) " Liew Rui Yan
1 sibling, 1 reply; 5+ messages in thread
From: Liew Rui Yan @ 2026-04-13 18:52 UTC (permalink / raw)
To: SeongJae Park; +Cc: damon, linux-mm, Liew Rui Yan
Problem
========
When kdamond terminates unexpectedly, 'enabled' remains 'Y' and
'kdamond_pid' remains stale. This prevents user from restarting DAMON
because both writing 'Y' and 'N' to 'enabled' will fail.
"Unexpected termination" here means the kdamond exits without any user
request (e.g., not by writing 'N' to 'enabled').
User Impact
===========
Once kdamond terminates this way, it cannot be restarted via sysfs
because:
1. DAMON_RECLAIM is built into the kernel, so it cannot be unloaded and
reloaded at runtime.
2. Writing 'N' to 'enabled' fails because kdamond no longer exists;
Writing 'Y' does nothing, as 'enabled' is already Y.
As a result, the only way to restore DAMON functionality is a full
system reboot.
Solution
========
damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only
sets it to false upon successful completion. When 'maybe_corrupted'
remains true, kdamond will terminate eventually.
Therefore:
1. In damon_reclaim_turn(): Add fallback logic to reset parameters when
damon_stop() fails but kdamond is not running.
2. In damon_reclaim_apply_parameters(): Reset parameters when
damon_commit_ctx() fails, as kdamond will terminate due to
maybe_corrupted mechanism.
Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
---
mm/damon/reclaim.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index 86da14778658..e3e148fd80f9 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -250,6 +250,10 @@ static int damon_reclaim_apply_parameters(void)
if (err)
goto out;
err = damon_commit_ctx(ctx, param_ctx);
+ if (err) {
+ enabled = false;
+ kdamond_pid = -1;
+ }
out:
damon_destroy_ctx(param_ctx);
return err;
@@ -290,8 +294,13 @@ static int damon_reclaim_turn(bool on)
if (!on) {
err = damon_stop(&ctx, 1);
- if (!err)
- kdamond_pid = -1;
+ if (err) {
+ if (!damon_is_running(ctx))
+ err = 0;
+ else
+ return err;
+ }
+ kdamond_pid = -1;
return err;
}
--
2.53.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: (sashiko review) [PATCH v2 1/2] mm/damon/lru_sort: reset thread status parameters upon kdamond termination
2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan
@ 2026-04-13 19:54 ` Liew Rui Yan
0 siblings, 0 replies; 5+ messages in thread
From: Liew Rui Yan @ 2026-04-13 19:54 UTC (permalink / raw)
To: aethernet65535; +Cc: damon, linux-mm, sj
Sashiko found some issues, so I will send my comment for it soon.
Forwarding sashiko review for doing discussions via mails.
# review url: https://sashiko.dev/#/patchset/20260413185249.5921-2-aethernet65535@gmail.com
# start of sashiko.dev inline review
commit a943fcfe60e73956b72cad5a625120222653bdda
Author: Liew Rui Yan <aethernet65535@gmail.com>
mm/damon/lru_sort: reset thread status parameters upon kdamond termination
This commit adds logic to reset thread status parameters when kdamond
terminates unexpectedly or when context commit fails. This ensures the
module does not get stuck in a state where it cannot be restarted via sysfs.
> diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
> index 554559d729760..96c8d0dfcafd2 100644
> --- a/mm/damon/lru_sort.c
> +++ b/mm/damon/lru_sort.c
> @@ -344,6 +344,10 @@ static int damon_lru_sort_apply_parameters(void)
> if (err)
> goto out;
> err = damon_commit_ctx(ctx, param_ctx);
> + if (err) {
> + enabled = false;
> + kdamond_pid = -1;
> + }
Does updating the module parameters here require holding kernel_param_lock?
Since damon_lru_sort_apply_parameters() can be executed asynchronously by the
kdamond thread when a user writes to the commit_inputs parameter, changing
these variables locklessly might introduce a race condition.
If enabled is set to false here while kdamond is still preparing to terminate,
could a concurrent sysfs write (echo Y > enabled) read the false state and
proceed to call damon_lru_sort_turn(true) because it incorrectly assumes the
worker has completely stopped?
If so, damon_lru_sort_turn(true) would call damon_commit_ctx(ctx, ...) and
modify the shared ctx structures while the exiting kdamond worker thread is
concurrently executing its cleanup block, such as damon_destroy_targets(ctx).
Can this concurrent modification of the context lists lead to use-after-free
issues or list corruption?
[ ... ]
# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260413185249.5921-2-aethernet65535@gmail.com
# hkml [1] generated a draft of this mail. You can regenerate
# this using below command:
#
# hkml patch sashiko_dev --for_forwarding \
# 20260413185249.5921-2-aethernet65535@gmail.com
#
# [1] https://github.com/sjp38/hackermail
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: (sashiko review) [PATCH v2 2/2] mm/damon/reclaim: reset thread status parameters upon kdamond termination
2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan
@ 2026-04-13 19:57 ` Liew Rui Yan
0 siblings, 0 replies; 5+ messages in thread
From: Liew Rui Yan @ 2026-04-13 19:57 UTC (permalink / raw)
To: aethernet65535; +Cc: damon, linux-mm, sj
Sashiko found some issues, so I will send my comment for it soon.
Forwarding sashiko review for doing discussions via mails.
# review url: https://sashiko.dev/#/patchset/20260413185249.5921-3-aethernet65535@gmail.com
# start of sashiko.dev inline review
commit cefd2ed657fa3baedb834d79191ef3af7c5363ac
Author: Liew Rui Yan <aethernet65535@gmail.com>
mm/damon/reclaim: reset thread status parameters upon kdamond termination
This commit aims to fix an issue where an unexpected kdamond termination
leaves the enabled parameter as 'Y' and a stale kdamond_pid, preventing
the user from restarting DAMON without a reboot. It adds fallback logic
to reset these parameters when damon_commit_ctx() fails or when
damon_stop() fails but the thread is no longer running.
> diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
> index 86da147786583..e3e148fd80f97 100644
> --- a/mm/damon/reclaim.c
> +++ b/mm/damon/reclaim.c
[ ... ]
> @@ -250,6 +250,10 @@ static int damon_reclaim_apply_parameters(void)
> if (err)
> goto out;
> err = damon_commit_ctx(ctx, param_ctx);
> + if (err) {
> + enabled = false;
> + kdamond_pid = -1;
> + }
> out:
> damon_destroy_ctx(param_ctx);
> return err;
Can prematurely resetting enabled to false here introduce a race condition
leading to a use-after-free of the DAMON context structures?
If damon_reclaim_apply_parameters() is invoked from the kdamond worker thread
(for example, when applying commit_inputs) and damon_commit_ctx() fails,
ctx->maybe_corrupted is set to true. This signals the kdamond thread to
break its main loop and begin its teardown phase, such as executing
damon_destroy_targets() to free lists.
Because enabled is set to false asynchronously here, a concurrent sysfs write
of 'Y' to enabled will succeed and immediately trigger
damon_reclaim_turn(true). This unconditionally calls
damon_reclaim_apply_parameters() and executes damon_commit_ctx() from the
sysfs thread.
Since damon_commit_ctx() locklessly mutates and frees items in
ctx->adaptive_targets and ctx->schemes, would this race directly with the
still-exiting kdamond thread traversing and freeing those exact same lists,
resulting in list corruption and a use-after-free?
# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260413185249.5921-3-aethernet65535@gmail.com
# hkml [1] generated a draft of this mail. You can regenerate
# this using below command:
#
# hkml patch sashiko_dev --for_forwarding \
# 20260413185249.5921-3-aethernet65535@gmail.com
#
# [1] https://github.com/sjp38/hackermail
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-04-13 19:57 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan
2026-04-13 19:54 ` (sashiko review) " Liew Rui Yan
2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan
2026-04-13 19:57 ` (sashiko review) " Liew Rui Yan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox