* [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination
@ 2026-04-13 18:52 Liew Rui Yan
2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Liew Rui Yan @ 2026-04-13 18:52 UTC (permalink / raw)
To: SeongJae Park; +Cc: damon, linux-mm, Liew Rui Yan
Problem
========
When kdamond terminates unexpectedly, 'enabled' remains 'Y' and
'kdamond_pid' remains stale. This prevents user from restarting DAMON
because both writing 'Y' and 'N' to 'enabled' will fail.
"Unexpected termination" here means the kdamond exits without any user
request (e.g., not by writing 'N' to 'enabled').
User Impact
===========
Once kdamond terminates this way, it cannot be restarted via sysfs
because:
1. DAMON_LRU_SORT/DAMON_RECLAIM is built into the kernel, so it cannot
be unloaded and reloaded at runtime.
2. Writing 'N' to 'enabled' fails because kdamond no longer exists;
Writing 'Y' does nothing, as 'enabled' is already Y.
As a result, the only way to restore DAMON functionality is a full
system reboot.
Solution
========
damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only
sets it to false upon successful completion. When 'maybe_corrupted'
remains true, kdamond will terminate eventually.
Therefore:
1. In damon_{lru_sort, reclaim}_turn(): Add fallback logic to reset
parameters when damon_stop() fails but kdamond is not running.
2. In damon_{lru_sort, reclaim}_apply_parameters(): Reset parameters
when damon_commit_ctx() fails, as kdamond will terminate due to
maybe_corrupted mechanism.
Changes from RFC-v1
(https://lore.kernel.org/20260330164347.12772-1-aethernet65535@gmail.com)
- Remove RFC tag.
- Remove 'damon_thread_status' structure and damon_update_thread_status()
(SJ pointed out this was too much extension of core API for a problem
that can be fixed more simply).
- Add a fallback in damon_{lru_sort, reclaim}_turn() 'N' path. If
damon_stop() fails but kdamond is not running, forcefully reset the
parameters.
- Reset 'enabled' and 'kdamond_pid' when damon_commit_ctx() fails in
damon_{lru_sort, reclaim}_apply_parameters() (kdamond will terminate
eventually in this case).
Liew Rui Yan (2):
mm/damon/lru_sort: reset thread status parameters upon kdamond
termination
mm/damon/reclaim: reset thread status parameters upon kdamond
termination
mm/damon/lru_sort.c | 13 +++++++++++--
mm/damon/reclaim.c | 13 +++++++++++--
2 files changed, 22 insertions(+), 4 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH v2 1/2] mm/damon/lru_sort: reset thread status parameters upon kdamond termination 2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan @ 2026-04-13 18:52 ` Liew Rui Yan 2026-04-13 19:54 ` (sashiko review) " Liew Rui Yan 2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan ` (3 subsequent siblings) 4 siblings, 1 reply; 12+ messages in thread From: Liew Rui Yan @ 2026-04-13 18:52 UTC (permalink / raw) To: SeongJae Park; +Cc: damon, linux-mm, Liew Rui Yan Problem ======== When kdamond terminates unexpectedly, 'enabled' remains 'Y' and 'kdamond_pid' remains stale. This prevents user from restarting DAMON because both writing 'Y' and 'N' to 'enabled' will fail. "Unexpected termination" here means the kdamond exits without any user request (e.g., not by writing 'N' to 'enabled'). User Impact =========== Once kdamond terminates this way, it cannot be restarted via sysfs because: 1. DAMON_LRU_SORT is built into the kernel, so it cannot be unloaded and reloaded at runtime. 2. Writing 'N' to 'enabled' fails because kdamond no longer exists; Writing 'Y' does nothing, as 'enabled' is already Y. As a result, the only way to restore DAMON functionality is a full system reboot. Solution ======== damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only sets it to false upon successful completion. When 'maybe_corrupted' remains true, kdamond will terminate eventually. Therefore: 1. In damon_lru_sort_turn(): Add fallback logic to reset parameters when damon_stop() fails but kdamond is not running. 2. In damon_lru_sort_apply_parameters(): Reset parameters when damon_commit_ctx() fails, as kdamond will terminate due to maybe_corrupted mechanism. Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com> --- mm/damon/lru_sort.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c index 554559d72976..96c8d0dfcafd 100644 --- a/mm/damon/lru_sort.c +++ b/mm/damon/lru_sort.c @@ -344,6 +344,10 @@ static int damon_lru_sort_apply_parameters(void) if (err) goto out; err = damon_commit_ctx(ctx, param_ctx); + if (err) { + enabled = false; + kdamond_pid = -1; + } out: damon_destroy_ctx(param_ctx); return err; @@ -388,8 +392,13 @@ static int damon_lru_sort_turn(bool on) if (!on) { err = damon_stop(&ctx, 1); - if (!err) - kdamond_pid = -1; + if (err) { + if (!damon_is_running(ctx)) + err = 0; + else + return err; + } + kdamond_pid = -1; return err; } -- 2.53.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: (sashiko review) [PATCH v2 1/2] mm/damon/lru_sort: reset thread status parameters upon kdamond termination 2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan @ 2026-04-13 19:54 ` Liew Rui Yan 0 siblings, 0 replies; 12+ messages in thread From: Liew Rui Yan @ 2026-04-13 19:54 UTC (permalink / raw) To: aethernet65535; +Cc: damon, linux-mm, sj Sashiko found some issues, so I will send my comment for it soon. Forwarding sashiko review for doing discussions via mails. # review url: https://sashiko.dev/#/patchset/20260413185249.5921-2-aethernet65535@gmail.com # start of sashiko.dev inline review commit a943fcfe60e73956b72cad5a625120222653bdda Author: Liew Rui Yan <aethernet65535@gmail.com> mm/damon/lru_sort: reset thread status parameters upon kdamond termination This commit adds logic to reset thread status parameters when kdamond terminates unexpectedly or when context commit fails. This ensures the module does not get stuck in a state where it cannot be restarted via sysfs. > diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c > index 554559d729760..96c8d0dfcafd2 100644 > --- a/mm/damon/lru_sort.c > +++ b/mm/damon/lru_sort.c > @@ -344,6 +344,10 @@ static int damon_lru_sort_apply_parameters(void) > if (err) > goto out; > err = damon_commit_ctx(ctx, param_ctx); > + if (err) { > + enabled = false; > + kdamond_pid = -1; > + } Does updating the module parameters here require holding kernel_param_lock? Since damon_lru_sort_apply_parameters() can be executed asynchronously by the kdamond thread when a user writes to the commit_inputs parameter, changing these variables locklessly might introduce a race condition. If enabled is set to false here while kdamond is still preparing to terminate, could a concurrent sysfs write (echo Y > enabled) read the false state and proceed to call damon_lru_sort_turn(true) because it incorrectly assumes the worker has completely stopped? If so, damon_lru_sort_turn(true) would call damon_commit_ctx(ctx, ...) and modify the shared ctx structures while the exiting kdamond worker thread is concurrently executing its cleanup block, such as damon_destroy_targets(ctx). Can this concurrent modification of the context lists lead to use-after-free issues or list corruption? [ ... ] # end of sashiko.dev inline review # review url: https://sashiko.dev/#/patchset/20260413185249.5921-2-aethernet65535@gmail.com # hkml [1] generated a draft of this mail. You can regenerate # this using below command: # # hkml patch sashiko_dev --for_forwarding \ # 20260413185249.5921-2-aethernet65535@gmail.com # # [1] https://github.com/sjp38/hackermail ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 2/2] mm/damon/reclaim: reset thread status parameters upon kdamond termination 2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan 2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan @ 2026-04-13 18:52 ` Liew Rui Yan 2026-04-13 19:57 ` (sashiko review) " Liew Rui Yan 2026-04-13 22:05 ` [PATCH v2 0/2] mm/damon: " Liew Rui Yan ` (2 subsequent siblings) 4 siblings, 1 reply; 12+ messages in thread From: Liew Rui Yan @ 2026-04-13 18:52 UTC (permalink / raw) To: SeongJae Park; +Cc: damon, linux-mm, Liew Rui Yan Problem ======== When kdamond terminates unexpectedly, 'enabled' remains 'Y' and 'kdamond_pid' remains stale. This prevents user from restarting DAMON because both writing 'Y' and 'N' to 'enabled' will fail. "Unexpected termination" here means the kdamond exits without any user request (e.g., not by writing 'N' to 'enabled'). User Impact =========== Once kdamond terminates this way, it cannot be restarted via sysfs because: 1. DAMON_RECLAIM is built into the kernel, so it cannot be unloaded and reloaded at runtime. 2. Writing 'N' to 'enabled' fails because kdamond no longer exists; Writing 'Y' does nothing, as 'enabled' is already Y. As a result, the only way to restore DAMON functionality is a full system reboot. Solution ======== damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only sets it to false upon successful completion. When 'maybe_corrupted' remains true, kdamond will terminate eventually. Therefore: 1. In damon_reclaim_turn(): Add fallback logic to reset parameters when damon_stop() fails but kdamond is not running. 2. In damon_reclaim_apply_parameters(): Reset parameters when damon_commit_ctx() fails, as kdamond will terminate due to maybe_corrupted mechanism. Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com> --- mm/damon/reclaim.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c index 86da14778658..e3e148fd80f9 100644 --- a/mm/damon/reclaim.c +++ b/mm/damon/reclaim.c @@ -250,6 +250,10 @@ static int damon_reclaim_apply_parameters(void) if (err) goto out; err = damon_commit_ctx(ctx, param_ctx); + if (err) { + enabled = false; + kdamond_pid = -1; + } out: damon_destroy_ctx(param_ctx); return err; @@ -290,8 +294,13 @@ static int damon_reclaim_turn(bool on) if (!on) { err = damon_stop(&ctx, 1); - if (!err) - kdamond_pid = -1; + if (err) { + if (!damon_is_running(ctx)) + err = 0; + else + return err; + } + kdamond_pid = -1; return err; } -- 2.53.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: (sashiko review) [PATCH v2 2/2] mm/damon/reclaim: reset thread status parameters upon kdamond termination 2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan @ 2026-04-13 19:57 ` Liew Rui Yan 0 siblings, 0 replies; 12+ messages in thread From: Liew Rui Yan @ 2026-04-13 19:57 UTC (permalink / raw) To: aethernet65535; +Cc: damon, linux-mm, sj Sashiko found some issues, so I will send my comment for it soon. Forwarding sashiko review for doing discussions via mails. # review url: https://sashiko.dev/#/patchset/20260413185249.5921-3-aethernet65535@gmail.com # start of sashiko.dev inline review commit cefd2ed657fa3baedb834d79191ef3af7c5363ac Author: Liew Rui Yan <aethernet65535@gmail.com> mm/damon/reclaim: reset thread status parameters upon kdamond termination This commit aims to fix an issue where an unexpected kdamond termination leaves the enabled parameter as 'Y' and a stale kdamond_pid, preventing the user from restarting DAMON without a reboot. It adds fallback logic to reset these parameters when damon_commit_ctx() fails or when damon_stop() fails but the thread is no longer running. > diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c > index 86da147786583..e3e148fd80f97 100644 > --- a/mm/damon/reclaim.c > +++ b/mm/damon/reclaim.c [ ... ] > @@ -250,6 +250,10 @@ static int damon_reclaim_apply_parameters(void) > if (err) > goto out; > err = damon_commit_ctx(ctx, param_ctx); > + if (err) { > + enabled = false; > + kdamond_pid = -1; > + } > out: > damon_destroy_ctx(param_ctx); > return err; Can prematurely resetting enabled to false here introduce a race condition leading to a use-after-free of the DAMON context structures? If damon_reclaim_apply_parameters() is invoked from the kdamond worker thread (for example, when applying commit_inputs) and damon_commit_ctx() fails, ctx->maybe_corrupted is set to true. This signals the kdamond thread to break its main loop and begin its teardown phase, such as executing damon_destroy_targets() to free lists. Because enabled is set to false asynchronously here, a concurrent sysfs write of 'Y' to enabled will succeed and immediately trigger damon_reclaim_turn(true). This unconditionally calls damon_reclaim_apply_parameters() and executes damon_commit_ctx() from the sysfs thread. Since damon_commit_ctx() locklessly mutates and frees items in ctx->adaptive_targets and ctx->schemes, would this race directly with the still-exiting kdamond thread traversing and freeing those exact same lists, resulting in list corruption and a use-after-free? # end of sashiko.dev inline review # review url: https://sashiko.dev/#/patchset/20260413185249.5921-3-aethernet65535@gmail.com # hkml [1] generated a draft of this mail. You can regenerate # this using below command: # # hkml patch sashiko_dev --for_forwarding \ # 20260413185249.5921-3-aethernet65535@gmail.com # # [1] https://github.com/sjp38/hackermail ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination 2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan 2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan 2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan @ 2026-04-13 22:05 ` Liew Rui Yan 2026-04-14 0:28 ` SeongJae Park 2026-04-14 0:22 ` SeongJae Park 2026-04-16 0:17 ` SeongJae Park 4 siblings, 1 reply; 12+ messages in thread From: Liew Rui Yan @ 2026-04-13 22:05 UTC (permalink / raw) To: aethernet65535, sj; +Cc: damon, linux-mm Hi SeongJae, I've reviewed the Sashiko report on [PATCH v2 1/2] and [2/2]. Since the issues are essentially the same, I want to reply to them all in one email. # PATCH v2 1/2 > > diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c > > index 554559d729760..96c8d0dfcafd2 100644 > > --- a/mm/damon/lru_sort.c > > +++ b/mm/damon/lru_sort.c > > @@ -344,6 +344,10 @@ static int damon_lru_sort_apply_parameters(void) > > if (err) > > goto out; > > err = damon_commit_ctx(ctx, param_ctx); > > + if (err) { > > + enabled = false; > > + kdamond_pid = -1; > > + } > > Does updating the module parameters here require holding kernel_param_lock? > > Since damon_lru_sort_apply_parameters() can be executed asynchronously by the > kdamond thread when a user writes to the commit_inputs parameter, changing > these variables locklessly might introduce a race condition. > > If enabled is set to false here while kdamond is still preparing to terminate, > could a concurrent sysfs write (echo Y > enabled) read the false state and > proceed to call damon_lru_sort_turn(true) because it incorrectly assumes the > worker has completely stopped? > > If so, damon_lru_sort_turn(true) would call damon_commit_ctx(ctx, ...) and > modify the shared ctx structures while the exiting kdamond worker thread is > concurrently executing its cleanup block, such as damon_destroy_targets(ctx). > > Can this concurrent modification of the context lists lead to use-after-free > issues or list corruption? # PATCH v2 2/2 > > diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c > > index 86da147786583..e3e148fd80f97 100644 > > --- a/mm/damon/reclaim.c > > +++ b/mm/damon/reclaim.c > [ ... ] > > @@ -250,6 +250,10 @@ static int damon_reclaim_apply_parameters(void) > > if (err) > > goto out; > > err = damon_commit_ctx(ctx, param_ctx); > > + if (err) { > > + enabled = false; > > + kdamond_pid = -1; > > + } > > out: > > damon_destroy_ctx(param_ctx); > > return err; > > Can prematurely resetting enabled to false here introduce a race condition > leading to a use-after-free of the DAMON context structures? > > If damon_reclaim_apply_parameters() is invoked from the kdamond worker thread > (for example, when applying commit_inputs) and damon_commit_ctx() fails, > ctx->maybe_corrupted is set to true. This signals the kdamond thread to > break its main loop and begin its teardown phase, such as executing > damon_destroy_targets() to free lists. > > Because enabled is set to false asynchronously here, a concurrent sysfs write > of 'Y' to enabled will succeed and immediately trigger > damon_reclaim_turn(true). This unconditionally calls > damon_reclaim_apply_parameters() and executes damon_commit_ctx() from the > sysfs thread. > > Since damon_commit_ctx() locklessly mutates and frees items in > ctx->adaptive_targets and ctx->schemes, would this race directly with the > still-exiting kdamond thread traversing and freeing those exact same lists, > resulting in list corruption and a use-after-free? The core issue is - modifying 'enabled' and 'kdamond_pid' in the error path of damon_commit_ctx() is racy. My plan for v3: - Remove the reset code in damon_*_apply_parameters() - Keep only the fix in damon_*_turn(false) This resolves the restart issue without introducing new races. Please let me know if this direction looks good. Small changes for v3: - Delete a "=" at the bottom of "Problem" (commit message): Problem - ======== + ======= Best regards, Rui Yan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination 2026-04-13 22:05 ` [PATCH v2 0/2] mm/damon: " Liew Rui Yan @ 2026-04-14 0:28 ` SeongJae Park 0 siblings, 0 replies; 12+ messages in thread From: SeongJae Park @ 2026-04-14 0:28 UTC (permalink / raw) To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm On Tue, 14 Apr 2026 06:05:11 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote: > Hi SeongJae, > > I've reviewed the Sashiko report on [PATCH v2 1/2] and [2/2]. Since the > issues are essentially the same, I want to reply to them all in one > email. [...] > # PATCH v2 2/2 > > > diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c > > > index 86da147786583..e3e148fd80f97 100644 > > > --- a/mm/damon/reclaim.c > > > +++ b/mm/damon/reclaim.c > > [ ... ] > > > @@ -250,6 +250,10 @@ static int damon_reclaim_apply_parameters(void) > > > if (err) > > > goto out; > > > err = damon_commit_ctx(ctx, param_ctx); > > > + if (err) { > > > + enabled = false; > > > + kdamond_pid = -1; > > > + } > > > out: > > > damon_destroy_ctx(param_ctx); > > > return err; > > > > Can prematurely resetting enabled to false here introduce a race condition > > leading to a use-after-free of the DAMON context structures? > > > > If damon_reclaim_apply_parameters() is invoked from the kdamond worker thread > > (for example, when applying commit_inputs) and damon_commit_ctx() fails, > > ctx->maybe_corrupted is set to true. This signals the kdamond thread to > > break its main loop and begin its teardown phase, such as executing > > damon_destroy_targets() to free lists. > > > > Because enabled is set to false asynchronously here, a concurrent sysfs write > > of 'Y' to enabled will succeed and immediately trigger > > damon_reclaim_turn(true). This unconditionally calls > > damon_reclaim_apply_parameters() and executes damon_commit_ctx() from the > > sysfs thread. > > > > Since damon_commit_ctx() locklessly mutates and frees items in > > ctx->adaptive_targets and ctx->schemes, would this race directly with the > > still-exiting kdamond thread traversing and freeing those exact same lists, > > resulting in list corruption and a use-after-free? > > The core issue is - modifying 'enabled' and 'kdamond_pid' in the error > path of damon_commit_ctx() is racy. We simply made such racy user behaviors be prohibited [1]. So this should be fine. But, I'd prefer simpler fix, as I replied to the cover letter. > > My plan for v3: > - Remove the reset code in damon_*_apply_parameters() > - Keep only the fix in damon_*_turn(false) > > This resolves the restart issue without introducing new races. > Please let me know if this direction looks good. > > Small changes for v3: > - Delete a "=" at the bottom of "Problem" (commit message): > > Problem > - ======== > + ======= > I also added comments about above as a reply to the cover letter. Please reply there. [1] https://lkml.kernel.org/r/20260329153052.46657-2-sj@kernel.org Thanks, SJ [...] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination 2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan ` (2 preceding siblings ...) 2026-04-13 22:05 ` [PATCH v2 0/2] mm/damon: " Liew Rui Yan @ 2026-04-14 0:22 ` SeongJae Park 2026-04-14 0:34 ` SeongJae Park 2026-04-16 0:17 ` SeongJae Park 4 siblings, 1 reply; 12+ messages in thread From: SeongJae Park @ 2026-04-14 0:22 UTC (permalink / raw) To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm On Tue, 14 Apr 2026 02:52:47 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote: > Problem > ======== Let's align the underline with the subject. Also, let's add one blank line after the underline. > When kdamond terminates unexpectedly, 'enabled' remains 'Y' and > 'kdamond_pid' remains stale. This prevents user from restarting DAMON > because both writing 'Y' and 'N' to 'enabled' will fail. > > "Unexpected termination" here means the kdamond exits without any user > request (e.g., not by writing 'N' to 'enabled'). Could you please further explain when such termination can happen? > > User Impact > =========== > Once kdamond terminates this way, it cannot be restarted via sysfs > because: > > 1. DAMON_LRU_SORT/DAMON_RECLAIM is built into the kernel, so it cannot > be unloaded and reloaded at runtime. I think this is quite obvious, so may better to be dropped. > 2. Writing 'N' to 'enabled' fails because kdamond no longer exists; > Writing 'Y' does nothing, as 'enabled' is already Y. > > As a result, the only way to restore DAMON functionality is a full > system reboot. Thank you for clarifying the user impact. I think this deserves Cc-ing stable@. I think 'Problem' and 'User Impact' can be unified into one section. > > Solution > ======== > damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only > sets it to false upon successful completion. When 'maybe_corrupted' > remains true, kdamond will terminate eventually. This seems better to be explained earlier, on the problem section. > > Therefore: > 1. In damon_{lru_sort, reclaim}_turn(): Add fallback logic to reset > parameters when damon_stop() fails but kdamond is not running. > 2. In damon_{lru_sort, reclaim}_apply_parameters(): Reset parameters > when damon_commit_ctx() fails, as kdamond will terminate due to > maybe_corrupted mechanism. So the problem is that 'enable' parameter value is not trustworthy, and this series is trying to make it trustworthy. I think it is bit complicated, especially for stable@ fix. What about simply using more trustworthy information, e.g., ''' --- a/mm/damon/reclaim.c +++ b/mm/damon/reclaim.c @@ -390,7 +390,7 @@ MODULE_PARM_DESC(addr_unit, static int damon_reclaim_enabled_store(const char *val, const struct kernel_param *kp) { - bool is_enabled = enabled; + bool is_enabled = false; bool enable; int err; @@ -398,6 +398,9 @@ static int damon_reclaim_enabled_store(const char *val, if (err) return err; + if (ctx) + is_enabled = damon_is_running(ctx); + if (is_enabled == enable) return 0; ''' > > Changes from RFC-v1 > (https://lore.kernel.org/20260330164347.12772-1-aethernet65535@gmail.com) > - Remove RFC tag. When dropping RFC tag, let's start from v1 again, from the next time. Thanks, SJ [...] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination 2026-04-14 0:22 ` SeongJae Park @ 2026-04-14 0:34 ` SeongJae Park 2026-04-15 18:45 ` Liew Rui Yan 0 siblings, 1 reply; 12+ messages in thread From: SeongJae Park @ 2026-04-14 0:34 UTC (permalink / raw) To: SeongJae Park; +Cc: Liew Rui Yan, damon, linux-mm On Mon, 13 Apr 2026 17:23:03 -0700 SeongJae Park <sj@kernel.org> wrote: > On Tue, 14 Apr 2026 02:52:47 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote: > > > Problem > > ======== > > Let's align the underline with the subject. Also, let's add one blank line > after the underline. > > > When kdamond terminates unexpectedly, 'enabled' remains 'Y' and > > 'kdamond_pid' remains stale. This prevents user from restarting DAMON > > because both writing 'Y' and 'N' to 'enabled' will fail. > > > > "Unexpected termination" here means the kdamond exits without any user > > request (e.g., not by writing 'N' to 'enabled'). > > Could you please further explain when such termination can happen? > > > > > User Impact > > =========== > > Once kdamond terminates this way, it cannot be restarted via sysfs > > because: > > > > 1. DAMON_LRU_SORT/DAMON_RECLAIM is built into the kernel, so it cannot > > be unloaded and reloaded at runtime. > > I think this is quite obvious, so may better to be dropped. > > > 2. Writing 'N' to 'enabled' fails because kdamond no longer exists; > > Writing 'Y' does nothing, as 'enabled' is already Y. > > > > As a result, the only way to restore DAMON functionality is a full > > system reboot. > > Thank you for clarifying the user impact. I think this deserves Cc-ing > stable@. > > I think 'Problem' and 'User Impact' can be unified into one section. > > > > > Solution > > ======== > > damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only > > sets it to false upon successful completion. When 'maybe_corrupted' > > remains true, kdamond will terminate eventually. > > This seems better to be explained earlier, on the problem section. > > > > > Therefore: > > 1. In damon_{lru_sort, reclaim}_turn(): Add fallback logic to reset > > parameters when damon_stop() fails but kdamond is not running. > > 2. In damon_{lru_sort, reclaim}_apply_parameters(): Reset parameters > > when damon_commit_ctx() fails, as kdamond will terminate due to > > maybe_corrupted mechanism. > > So the problem is that 'enable' parameter value is not trustworthy, and this > series is trying to make it trustworthy. I think it is bit complicated, > especially for stable@ fix. What about simply using more trustworthy > information, e.g., > > ''' > --- a/mm/damon/reclaim.c > +++ b/mm/damon/reclaim.c > @@ -390,7 +390,7 @@ MODULE_PARM_DESC(addr_unit, > static int damon_reclaim_enabled_store(const char *val, > const struct kernel_param *kp) > { > - bool is_enabled = enabled; > + bool is_enabled = false; > bool enable; > int err; > > @@ -398,6 +398,9 @@ static int damon_reclaim_enabled_store(const char *val, > if (err) > return err; > > + if (ctx) > + is_enabled = damon_is_running(ctx); > + > if (is_enabled == enable) > return 0; > > ''' > > > > > Changes from RFC-v1 > > (https://lore.kernel.org/20260330164347.12772-1-aethernet65535@gmail.com) > > - Remove RFC tag. > > When dropping RFC tag, let's start from v1 again, from the next time. Also, I just found the patches don't have Fixes: and Cc: stable@. Could you please add those appripriately? Thanks, SJ > > > Thanks, > SJ > > [...] > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination 2026-04-14 0:34 ` SeongJae Park @ 2026-04-15 18:45 ` Liew Rui Yan 2026-04-15 23:55 ` SeongJae Park 0 siblings, 1 reply; 12+ messages in thread From: Liew Rui Yan @ 2026-04-15 18:45 UTC (permalink / raw) To: sj; +Cc: aethernet65535, damon, linux-mm On Mon, 13 Apr 2026 17:34:34 -0700 SeongJae Park <sj@kernel.org> wrote: > On Mon, 13 Apr 2026 17:23:03 -0700 SeongJae Park <sj@kernel.org> wrote: > > > On Tue, 14 Apr 2026 02:52:47 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote: > > > > > Problem > > > ======== > > > > Let's align the underline with the subject. Also, let's add one blank line > > after the underline. > > > > > When kdamond terminates unexpectedly, 'enabled' remains 'Y' and > > > 'kdamond_pid' remains stale. This prevents user from restarting DAMON > > > because both writing 'Y' and 'N' to 'enabled' will fail. > > > > > > "Unexpected termination" here means the kdamond exits without any user > > > request (e.g., not by writing 'N' to 'enabled'). > > > > Could you please further explain when such termination can happen? > > > > > > > > User Impact > > > =========== > > > Once kdamond terminates this way, it cannot be restarted via sysfs > > > because: > > > > > > 1. DAMON_LRU_SORT/DAMON_RECLAIM is built into the kernel, so it cannot > > > be unloaded and reloaded at runtime. > > > > I think this is quite obvious, so may better to be dropped. > > > > > 2. Writing 'N' to 'enabled' fails because kdamond no longer exists; > > > Writing 'Y' does nothing, as 'enabled' is already Y. > > > > > > As a result, the only way to restore DAMON functionality is a full > > > system reboot. > > > > Thank you for clarifying the user impact. I think this deserves Cc-ing > > stable@. > > > > I think 'Problem' and 'User Impact' can be unified into one section. > > > > > > > > Solution > > > ======== > > > damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only > > > sets it to false upon successful completion. When 'maybe_corrupted' > > > remains true, kdamond will terminate eventually. > > > > This seems better to be explained earlier, on the problem section. Okay, this is my current commit message: Problem ======= When kdamond terminates unexpectedly, 'enabled' remains 'Y' and 'kdamond_pid' remains stale. This prevents user from restarting DAMON because both writing 'Y' and 'N' to 'enabled' will fail. "Unexpected termination" here means the kdamond exits without any user request (e.g., not by writing 'N' to 'enabled'). This can happen when: - Internal error of kdamond. - Invalid parameters commit via 'commit_inputs' (e.g., addr_unit=3). The root cause is that damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only sets it to false upon successful completion. When 'maybe_corrupted' remains true, kdamond will terminate eventually. Once kdamond terminates this way, it cannot be restarted via sysfs because writing 'N' to 'enabled' fails (kdamond no longer exists) and writing 'Y' does nothing ('enabled' is already Y). As a result, the only way to restore DAMON functionality is a full system reboot. Solution ======== The problem is that 'enable' parameter value is not trustworthy. Instead of relying on the 'enabled' variable, use damon_is_running(ctx) which reflects the true kdamond state. > > [...] > > So the problem is that 'enable' parameter value is not trustworthy, and this > > series is trying to make it trustworthy. I think it is bit complicated, > > especially for stable@ fix. What about simply using more trustworthy > > information, e.g., > > > > ''' > > --- a/mm/damon/reclaim.c > > +++ b/mm/damon/reclaim.c > > @@ -390,7 +390,7 @@ MODULE_PARM_DESC(addr_unit, > > static int damon_reclaim_enabled_store(const char *val, > > const struct kernel_param *kp) > > { > > - bool is_enabled = enabled; > > + bool is_enabled = false; > > bool enable; > > int err; > > > > @@ -398,6 +398,9 @@ static int damon_reclaim_enabled_store(const char *val, > > if (err) > > return err; > > > > + if (ctx) > > + is_enabled = damon_is_running(ctx); > > + > > if (is_enabled == enable) > > return 0; > > > > ''' Thank you for the suggestion. I have tested this implementation and it works expected. I agree that its complexity is much lower and more suitable for a stable fix. > > > > > > > > Changes from RFC-v1 > > > (https://lore.kernel.org/20260330164347.12772-1-aethernet65535@gmail.com) > > > - Remove RFC tag. > > > > When dropping RFC tag, let's start from v1 again, from the next time. Regarding the versioning, I have two questions: 1. Is version number inhritance only required when converting from non-RFC to an RFC? 2. Should my next version be tagged as v3 (since the current one is v2)? > > Also, I just found the patches don't have Fixes: and Cc: stable@. Could you > please add those appripriately? Regarding the Fixes: tags, I will include the following: - Fixes: 40e983cca927 ("mm/damon: introduce DAMON-based LRU-lists Sorting") - Fixes: 059342d1dd4e ("mm/damon/reclaim: fix the timer always stays active") Best regards, Rui Yan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination 2026-04-15 18:45 ` Liew Rui Yan @ 2026-04-15 23:55 ` SeongJae Park 0 siblings, 0 replies; 12+ messages in thread From: SeongJae Park @ 2026-04-15 23:55 UTC (permalink / raw) To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm On Thu, 16 Apr 2026 02:45:12 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote: > On Mon, 13 Apr 2026 17:34:34 -0700 SeongJae Park <sj@kernel.org> wrote: > > > On Mon, 13 Apr 2026 17:23:03 -0700 SeongJae Park <sj@kernel.org> wrote: > > > > > On Tue, 14 Apr 2026 02:52:47 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote: > > > > > > > Problem > > > > ======== > > > > > > Let's align the underline with the subject. Also, let's add one blank line > > > after the underline. > > > > > > > When kdamond terminates unexpectedly, 'enabled' remains 'Y' and > > > > 'kdamond_pid' remains stale. This prevents user from restarting DAMON > > > > because both writing 'Y' and 'N' to 'enabled' will fail. > > > > > > > > "Unexpected termination" here means the kdamond exits without any user > > > > request (e.g., not by writing 'N' to 'enabled'). > > > > > > Could you please further explain when such termination can happen? > > > > > > > > > > > User Impact > > > > =========== > > > > Once kdamond terminates this way, it cannot be restarted via sysfs > > > > because: > > > > > > > > 1. DAMON_LRU_SORT/DAMON_RECLAIM is built into the kernel, so it cannot > > > > be unloaded and reloaded at runtime. > > > > > > I think this is quite obvious, so may better to be dropped. > > > > > > > 2. Writing 'N' to 'enabled' fails because kdamond no longer exists; > > > > Writing 'Y' does nothing, as 'enabled' is already Y. > > > > > > > > As a result, the only way to restore DAMON functionality is a full > > > > system reboot. > > > > > > Thank you for clarifying the user impact. I think this deserves Cc-ing > > > stable@. > > > > > > I think 'Problem' and 'User Impact' can be unified into one section. > > > > > > > > > > > Solution > > > > ======== > > > > damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only > > > > sets it to false upon successful completion. When 'maybe_corrupted' > > > > remains true, kdamond will terminate eventually. > > > > > > This seems better to be explained earlier, on the problem section. > > Okay, this is my current commit message: > > Problem > ======= > > When kdamond terminates unexpectedly, 'enabled' remains 'Y' and > 'kdamond_pid' remains stale. This prevents user from restarting DAMON > because both writing 'Y' and 'N' to 'enabled' will fail. > > "Unexpected termination" here means the kdamond exits without any user > request (e.g., not by writing 'N' to 'enabled'). This can happen when: > - Internal error of kdamond. > - Invalid parameters commit via 'commit_inputs' (e.g., addr_unit=3). > > The root cause is that damon_commit_ctx() sets 'maybe_corrupted=true' at > the beginning and only sets it to false upon successful completion. When > 'maybe_corrupted' remains true, kdamond will terminate eventually. > > Once kdamond terminates this way, it cannot be restarted via sysfs > because writing 'N' to 'enabled' fails (kdamond no longer exists) and > writing 'Y' does nothing ('enabled' is already Y). As a result, the only > way to restore DAMON functionality is a full system reboot. > > Solution > ======== > > The problem is that 'enable' parameter value is not trustworthy. > Instead of relying on the 'enabled' variable, use damon_is_running(ctx) > which reflects the true kdamond state. I don't want to be picky, but ... I still feels this is easy to make readers including future myslef be confused for some points. Given this is a bit serious bug fix, I feel I have to raise the bar, at least for future myself. What about below? Kdamonds for DAMON_RECLAIM and DAMON_LRU_SORT could stop without properly updating 'enabled' parameter value, when parameters online commit fails due to invalid user inputs or internal memory allocation failures. The enabled parameter value, which can be stale after certain kdamond stops, is used by the modules' kdamond start/stop logic, to avoid doing unnecessary start/stop works. As a result, uses cannot restart the kdamonds before the system is rebooted and 'enabled' parameter value is correctly reset. For example, the issue can be reproduced like below: <your reproducer> Fix the issue by using damon_is_running(), which should reflect the true kdamond state, instead of the 'enabled' parameter. > > > > [...] > > > So the problem is that 'enable' parameter value is not trustworthy, and this > > > series is trying to make it trustworthy. I think it is bit complicated, > > > especially for stable@ fix. What about simply using more trustworthy > > > information, e.g., > > > > > > ''' > > > --- a/mm/damon/reclaim.c > > > +++ b/mm/damon/reclaim.c > > > @@ -390,7 +390,7 @@ MODULE_PARM_DESC(addr_unit, > > > static int damon_reclaim_enabled_store(const char *val, > > > const struct kernel_param *kp) > > > { > > > - bool is_enabled = enabled; > > > + bool is_enabled = false; > > > bool enable; > > > int err; > > > > > > @@ -398,6 +398,9 @@ static int damon_reclaim_enabled_store(const char *val, > > > if (err) > > > return err; > > > > > > + if (ctx) > > > + is_enabled = damon_is_running(ctx); > > > + > > > if (is_enabled == enable) > > > return 0; > > > > > > ''' > > Thank you for the suggestion. I have tested this implementation and it > works expected. I agree that its complexity is much lower and more > suitable for a stable fix. Thank you for testing! > > > > > > > > > > > > Changes from RFC-v1 > > > > (https://lore.kernel.org/20260330164347.12772-1-aethernet65535@gmail.com) > > > > - Remove RFC tag. > > > > > > When dropping RFC tag, let's start from v1 again, from the next time. > > Regarding the versioning, I have two questions: > 1. Is version number inhritance only required when converting from > non-RFC to an RFC? I'd request such conversion when dropping RFC tag. E.g., RFC -> RFC v2 ... -- (drop RFC) -> v1 -> v2 ... > 2. Should my next version be tagged as v3 (since the current one is v2)? The ship is already sailed for this patch series. Please tag it with v3. But please follow the suggestion from next times. > > > > > Also, I just found the patches don't have Fixes: and Cc: stable@. Could you > > please add those appripriately? > > Regarding the Fixes: tags, I will include the following: > - Fixes: 40e983cca927 ("mm/damon: introduce DAMON-based LRU-lists Sorting") This seems correct, but ... > - Fixes: 059342d1dd4e ("mm/damon/reclaim: fix the timer always stays active") Is this correct? At the point of this commit, there was no commit_inputs parameter. Were you able to confirm this issue can and cannot be reproduced on this commit and its parent commit, respectively? Thanks, SJ [...] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination 2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan ` (3 preceding siblings ...) 2026-04-14 0:22 ` SeongJae Park @ 2026-04-16 0:17 ` SeongJae Park 4 siblings, 0 replies; 12+ messages in thread From: SeongJae Park @ 2026-04-16 0:17 UTC (permalink / raw) To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm On Tue, 14 Apr 2026 02:52:47 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote: [...] > > mm/damon/lru_sort.c | 13 +++++++++++-- > mm/damon/reclaim.c | 13 +++++++++++-- > 2 files changed, 22 insertions(+), 4 deletions(-) One more trivial comment. Let's use 'mm/damon/reclaim,lru_sort:' for the series subject prefix. Thanks, SJ [...] ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-04-16 0:17 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan 2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan 2026-04-13 19:54 ` (sashiko review) " Liew Rui Yan 2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan 2026-04-13 19:57 ` (sashiko review) " Liew Rui Yan 2026-04-13 22:05 ` [PATCH v2 0/2] mm/damon: " Liew Rui Yan 2026-04-14 0:28 ` SeongJae Park 2026-04-14 0:22 ` SeongJae Park 2026-04-14 0:34 ` SeongJae Park 2026-04-15 18:45 ` Liew Rui Yan 2026-04-15 23:55 ` SeongJae Park 2026-04-16 0:17 ` SeongJae Park
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox