linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination
@ 2026-04-13 18:52 Liew Rui Yan
  2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Liew Rui Yan @ 2026-04-13 18:52 UTC (permalink / raw)
  To: SeongJae Park; +Cc: damon, linux-mm, Liew Rui Yan

Problem
========
When kdamond terminates unexpectedly, 'enabled' remains 'Y' and
'kdamond_pid' remains stale. This prevents user from restarting DAMON
because both writing 'Y' and 'N' to 'enabled' will fail.

"Unexpected termination" here means the kdamond exits without any user
request (e.g., not by writing 'N' to 'enabled').

User Impact
===========
Once kdamond terminates this way, it cannot be restarted via sysfs
because:

1. DAMON_LRU_SORT/DAMON_RECLAIM is built into the kernel, so it cannot
   be unloaded and reloaded at runtime.
2. Writing 'N' to 'enabled' fails because kdamond no longer exists;
   Writing 'Y' does nothing, as 'enabled' is already Y.

As a result, the only way to restore DAMON functionality is a full
system reboot.

Solution
========
damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only
sets it to false upon successful completion. When 'maybe_corrupted'
remains true, kdamond will terminate eventually.

Therefore:
1. In damon_{lru_sort, reclaim}_turn(): Add fallback logic to reset
   parameters when damon_stop() fails but kdamond is not running.
2. In damon_{lru_sort, reclaim}_apply_parameters(): Reset parameters
   when damon_commit_ctx() fails, as kdamond will terminate due to
   maybe_corrupted mechanism.

Changes from RFC-v1
(https://lore.kernel.org/20260330164347.12772-1-aethernet65535@gmail.com)
- Remove RFC tag.
- Remove 'damon_thread_status' structure and damon_update_thread_status()
  (SJ pointed out this was too much extension of core API for a problem
  that can be fixed more simply).
- Add a fallback in damon_{lru_sort, reclaim}_turn() 'N' path. If
  damon_stop() fails but kdamond is not running, forcefully reset the
  parameters.
- Reset 'enabled' and 'kdamond_pid' when damon_commit_ctx() fails in
  damon_{lru_sort, reclaim}_apply_parameters() (kdamond will terminate
  eventually in this case).

Liew Rui Yan (2):
  mm/damon/lru_sort: reset thread status parameters upon kdamond
    termination
  mm/damon/reclaim: reset thread status parameters upon kdamond
    termination

 mm/damon/lru_sort.c | 13 +++++++++++--
 mm/damon/reclaim.c  | 13 +++++++++++--
 2 files changed, 22 insertions(+), 4 deletions(-)

-- 
2.53.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/2] mm/damon/lru_sort: reset thread status parameters upon kdamond termination
  2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
@ 2026-04-13 18:52 ` Liew Rui Yan
  2026-04-13 19:54   ` (sashiko review) " Liew Rui Yan
  2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Liew Rui Yan @ 2026-04-13 18:52 UTC (permalink / raw)
  To: SeongJae Park; +Cc: damon, linux-mm, Liew Rui Yan

Problem
========
When kdamond terminates unexpectedly, 'enabled' remains 'Y' and
'kdamond_pid' remains stale. This prevents user from restarting DAMON
because both writing 'Y' and 'N' to 'enabled' will fail.

"Unexpected termination" here means the kdamond exits without any user
request (e.g., not by writing 'N' to 'enabled').

User Impact
===========
Once kdamond terminates this way, it cannot be restarted via sysfs
because:

1. DAMON_LRU_SORT is built into the kernel, so it cannot be unloaded and
   reloaded at runtime.
2. Writing 'N' to 'enabled' fails because kdamond no longer exists;
   Writing 'Y' does nothing, as 'enabled' is already Y.

As a result, the only way to restore DAMON functionality is a full
system reboot.

Solution
========
damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only
sets it to false upon successful completion. When 'maybe_corrupted'
remains true, kdamond will terminate eventually.

Therefore:
1. In damon_lru_sort_turn(): Add fallback logic to reset
   parameters when damon_stop() fails but kdamond is not running.
2. In damon_lru_sort_apply_parameters(): Reset parameters
   when damon_commit_ctx() fails, as kdamond will terminate due to
   maybe_corrupted mechanism.

Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
---
 mm/damon/lru_sort.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
index 554559d72976..96c8d0dfcafd 100644
--- a/mm/damon/lru_sort.c
+++ b/mm/damon/lru_sort.c
@@ -344,6 +344,10 @@ static int damon_lru_sort_apply_parameters(void)
 	if (err)
 		goto out;
 	err = damon_commit_ctx(ctx, param_ctx);
+	if (err) {
+		enabled = false;
+		kdamond_pid = -1;
+	}
 out:
 	damon_destroy_ctx(param_ctx);
 	return err;
@@ -388,8 +392,13 @@ static int damon_lru_sort_turn(bool on)
 
 	if (!on) {
 		err = damon_stop(&ctx, 1);
-		if (!err)
-			kdamond_pid = -1;
+		if (err) {
+			if (!damon_is_running(ctx))
+				err = 0;
+			else
+				return err;
+		}
+		kdamond_pid = -1;
 		return err;
 	}
 
-- 
2.53.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 2/2] mm/damon/reclaim: reset thread status parameters upon kdamond termination
  2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
  2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan
@ 2026-04-13 18:52 ` Liew Rui Yan
  2026-04-13 19:57   ` (sashiko review) " Liew Rui Yan
  2026-04-13 22:05 ` [PATCH v2 0/2] mm/damon: " Liew Rui Yan
  2026-04-14  0:22 ` SeongJae Park
  3 siblings, 1 reply; 9+ messages in thread
From: Liew Rui Yan @ 2026-04-13 18:52 UTC (permalink / raw)
  To: SeongJae Park; +Cc: damon, linux-mm, Liew Rui Yan

Problem
========
When kdamond terminates unexpectedly, 'enabled' remains 'Y' and
'kdamond_pid' remains stale. This prevents user from restarting DAMON
because both writing 'Y' and 'N' to 'enabled' will fail.

"Unexpected termination" here means the kdamond exits without any user
request (e.g., not by writing 'N' to 'enabled').

User Impact
===========
Once kdamond terminates this way, it cannot be restarted via sysfs
because:

1. DAMON_RECLAIM is built into the kernel, so it cannot be unloaded and
   reloaded at runtime.
2. Writing 'N' to 'enabled' fails because kdamond no longer exists;
   Writing 'Y' does nothing, as 'enabled' is already Y.

As a result, the only way to restore DAMON functionality is a full
system reboot.

Solution
========
damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only
sets it to false upon successful completion. When 'maybe_corrupted'
remains true, kdamond will terminate eventually.

Therefore:
1. In damon_reclaim_turn(): Add fallback logic to reset parameters when
   damon_stop() fails but kdamond is not running.
2. In damon_reclaim_apply_parameters(): Reset parameters when
   damon_commit_ctx() fails, as kdamond will terminate due to
   maybe_corrupted mechanism.

Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
---
 mm/damon/reclaim.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index 86da14778658..e3e148fd80f9 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -250,6 +250,10 @@ static int damon_reclaim_apply_parameters(void)
 	if (err)
 		goto out;
 	err = damon_commit_ctx(ctx, param_ctx);
+	if (err) {
+		enabled = false;
+		kdamond_pid = -1;
+	}
 out:
 	damon_destroy_ctx(param_ctx);
 	return err;
@@ -290,8 +294,13 @@ static int damon_reclaim_turn(bool on)
 
 	if (!on) {
 		err = damon_stop(&ctx, 1);
-		if (!err)
-			kdamond_pid = -1;
+		if (err) {
+			if (!damon_is_running(ctx))
+				err = 0;
+			else
+				return err;
+		}
+		kdamond_pid = -1;
 		return err;
 	}
 
-- 
2.53.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: (sashiko review) [PATCH v2 1/2] mm/damon/lru_sort: reset thread status parameters upon kdamond termination
  2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan
@ 2026-04-13 19:54   ` Liew Rui Yan
  0 siblings, 0 replies; 9+ messages in thread
From: Liew Rui Yan @ 2026-04-13 19:54 UTC (permalink / raw)
  To: aethernet65535; +Cc: damon, linux-mm, sj

Sashiko found some issues, so I will send my comment for it soon.

Forwarding sashiko review for doing discussions via mails.

# review url: https://sashiko.dev/#/patchset/20260413185249.5921-2-aethernet65535@gmail.com
# start of sashiko.dev inline review
commit a943fcfe60e73956b72cad5a625120222653bdda
Author: Liew Rui Yan <aethernet65535@gmail.com>

mm/damon/lru_sort: reset thread status parameters upon kdamond termination

This commit adds logic to reset thread status parameters when kdamond
terminates unexpectedly or when context commit fails. This ensures the
module does not get stuck in a state where it cannot be restarted via sysfs.

> diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
> index 554559d729760..96c8d0dfcafd2 100644
> --- a/mm/damon/lru_sort.c
> +++ b/mm/damon/lru_sort.c
> @@ -344,6 +344,10 @@ static int damon_lru_sort_apply_parameters(void)
>  	if (err)
>  		goto out;
>  	err = damon_commit_ctx(ctx, param_ctx);
> +	if (err) {
> +		enabled = false;
> +		kdamond_pid = -1;
> +	}

Does updating the module parameters here require holding kernel_param_lock?

Since damon_lru_sort_apply_parameters() can be executed asynchronously by the
kdamond thread when a user writes to the commit_inputs parameter, changing
these variables locklessly might introduce a race condition.

If enabled is set to false here while kdamond is still preparing to terminate,
could a concurrent sysfs write (echo Y > enabled) read the false state and
proceed to call damon_lru_sort_turn(true) because it incorrectly assumes the
worker has completely stopped?

If so, damon_lru_sort_turn(true) would call damon_commit_ctx(ctx, ...) and
modify the shared ctx structures while the exiting kdamond worker thread is
concurrently executing its cleanup block, such as damon_destroy_targets(ctx).

Can this concurrent modification of the context lists lead to use-after-free
issues or list corruption?

[ ... ]


# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260413185249.5921-2-aethernet65535@gmail.com

# hkml [1] generated a draft of this mail.  You can regenerate
# this using below command:
#
#     hkml patch sashiko_dev --for_forwarding \
#             20260413185249.5921-2-aethernet65535@gmail.com
#
# [1] https://github.com/sjp38/hackermail


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: (sashiko review) [PATCH v2 2/2] mm/damon/reclaim: reset thread status parameters upon kdamond termination
  2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan
@ 2026-04-13 19:57   ` Liew Rui Yan
  0 siblings, 0 replies; 9+ messages in thread
From: Liew Rui Yan @ 2026-04-13 19:57 UTC (permalink / raw)
  To: aethernet65535; +Cc: damon, linux-mm, sj

Sashiko found some issues, so I will send my comment for it soon.

Forwarding sashiko review for doing discussions via mails.

# review url: https://sashiko.dev/#/patchset/20260413185249.5921-3-aethernet65535@gmail.com
# start of sashiko.dev inline review
commit cefd2ed657fa3baedb834d79191ef3af7c5363ac
Author: Liew Rui Yan <aethernet65535@gmail.com>

mm/damon/reclaim: reset thread status parameters upon kdamond termination

This commit aims to fix an issue where an unexpected kdamond termination
leaves the enabled parameter as 'Y' and a stale kdamond_pid, preventing
the user from restarting DAMON without a reboot. It adds fallback logic
to reset these parameters when damon_commit_ctx() fails or when
damon_stop() fails but the thread is no longer running.

> diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
> index 86da147786583..e3e148fd80f97 100644
> --- a/mm/damon/reclaim.c
> +++ b/mm/damon/reclaim.c
[ ... ]
> @@ -250,6 +250,10 @@ static int damon_reclaim_apply_parameters(void)
>  	if (err)
>  		goto out;
>  	err = damon_commit_ctx(ctx, param_ctx);
> +	if (err) {
> +		enabled = false;
> +		kdamond_pid = -1;
> +	}
>  out:
>  	damon_destroy_ctx(param_ctx);
>  	return err;

Can prematurely resetting enabled to false here introduce a race condition
leading to a use-after-free of the DAMON context structures?

If damon_reclaim_apply_parameters() is invoked from the kdamond worker thread
(for example, when applying commit_inputs) and damon_commit_ctx() fails,
ctx->maybe_corrupted is set to true. This signals the kdamond thread to
break its main loop and begin its teardown phase, such as executing
damon_destroy_targets() to free lists.

Because enabled is set to false asynchronously here, a concurrent sysfs write
of 'Y' to enabled will succeed and immediately trigger
damon_reclaim_turn(true). This unconditionally calls
damon_reclaim_apply_parameters() and executes damon_commit_ctx() from the
sysfs thread.

Since damon_commit_ctx() locklessly mutates and frees items in
ctx->adaptive_targets and ctx->schemes, would this race directly with the
still-exiting kdamond thread traversing and freeing those exact same lists,
resulting in list corruption and a use-after-free?


# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260413185249.5921-3-aethernet65535@gmail.com

# hkml [1] generated a draft of this mail.  You can regenerate
# this using below command:
#
#     hkml patch sashiko_dev --for_forwarding \
#             20260413185249.5921-3-aethernet65535@gmail.com
#
# [1] https://github.com/sjp38/hackermail


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
  2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan
  2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan
@ 2026-04-13 22:05 ` Liew Rui Yan
  2026-04-14  0:28   ` SeongJae Park
  2026-04-14  0:22 ` SeongJae Park
  3 siblings, 1 reply; 9+ messages in thread
From: Liew Rui Yan @ 2026-04-13 22:05 UTC (permalink / raw)
  To: aethernet65535, sj; +Cc: damon, linux-mm

Hi SeongJae,

I've reviewed the Sashiko report on [PATCH v2 1/2] and [2/2]. Since the
issues are essentially the same, I want to reply to them all in one
email.

# PATCH v2 1/2
> > diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
> > index 554559d729760..96c8d0dfcafd2 100644
> > --- a/mm/damon/lru_sort.c
> > +++ b/mm/damon/lru_sort.c
> > @@ -344,6 +344,10 @@ static int damon_lru_sort_apply_parameters(void)
> >  	if (err)
> >  		goto out;
> >  	err = damon_commit_ctx(ctx, param_ctx);
> > +	if (err) {
> > +		enabled = false;
> > +		kdamond_pid = -1;
> > +	}
> 
> Does updating the module parameters here require holding kernel_param_lock?
> 
> Since damon_lru_sort_apply_parameters() can be executed asynchronously by the
> kdamond thread when a user writes to the commit_inputs parameter, changing
> these variables locklessly might introduce a race condition.
> 
> If enabled is set to false here while kdamond is still preparing to terminate,
> could a concurrent sysfs write (echo Y > enabled) read the false state and
> proceed to call damon_lru_sort_turn(true) because it incorrectly assumes the
> worker has completely stopped?
> 
> If so, damon_lru_sort_turn(true) would call damon_commit_ctx(ctx, ...) and
> modify the shared ctx structures while the exiting kdamond worker thread is
> concurrently executing its cleanup block, such as damon_destroy_targets(ctx).
> 
> Can this concurrent modification of the context lists lead to use-after-free
> issues or list corruption?

# PATCH v2 2/2
> > diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
> > index 86da147786583..e3e148fd80f97 100644
> > --- a/mm/damon/reclaim.c
> > +++ b/mm/damon/reclaim.c
> [ ... ]
> > @@ -250,6 +250,10 @@ static int damon_reclaim_apply_parameters(void)
> >  	if (err)
> >  		goto out;
> >  	err = damon_commit_ctx(ctx, param_ctx);
> > +	if (err) {
> > +		enabled = false;
> > +		kdamond_pid = -1;
> > +	}
> >  out:
> >  	damon_destroy_ctx(param_ctx);
> >  	return err;
> 
> Can prematurely resetting enabled to false here introduce a race condition
> leading to a use-after-free of the DAMON context structures?
> 
> If damon_reclaim_apply_parameters() is invoked from the kdamond worker thread
> (for example, when applying commit_inputs) and damon_commit_ctx() fails,
> ctx->maybe_corrupted is set to true. This signals the kdamond thread to
> break its main loop and begin its teardown phase, such as executing
> damon_destroy_targets() to free lists.
> 
> Because enabled is set to false asynchronously here, a concurrent sysfs write
> of 'Y' to enabled will succeed and immediately trigger
> damon_reclaim_turn(true). This unconditionally calls
> damon_reclaim_apply_parameters() and executes damon_commit_ctx() from the
> sysfs thread.
> 
> Since damon_commit_ctx() locklessly mutates and frees items in
> ctx->adaptive_targets and ctx->schemes, would this race directly with the
> still-exiting kdamond thread traversing and freeing those exact same lists,
> resulting in list corruption and a use-after-free?

The core issue is - modifying 'enabled' and 'kdamond_pid' in the error
path of damon_commit_ctx() is racy.

My plan for v3:
- Remove the reset code in damon_*_apply_parameters()
- Keep only the fix in damon_*_turn(false)

This resolves the restart issue without introducing new races.
Please let me know if this direction looks good.

Small changes for v3:
- Delete a "=" at the bottom of "Problem" (commit message):

      Problem
-     ========
+     =======

Best regards,
Rui Yan


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
                   ` (2 preceding siblings ...)
  2026-04-13 22:05 ` [PATCH v2 0/2] mm/damon: " Liew Rui Yan
@ 2026-04-14  0:22 ` SeongJae Park
  2026-04-14  0:34   ` SeongJae Park
  3 siblings, 1 reply; 9+ messages in thread
From: SeongJae Park @ 2026-04-14  0:22 UTC (permalink / raw)
  To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm

On Tue, 14 Apr 2026 02:52:47 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> Problem
> ========

Let's align the underline with the subject.  Also, let's add one blank line
after the underline.

> When kdamond terminates unexpectedly, 'enabled' remains 'Y' and
> 'kdamond_pid' remains stale. This prevents user from restarting DAMON
> because both writing 'Y' and 'N' to 'enabled' will fail.
> 
> "Unexpected termination" here means the kdamond exits without any user
> request (e.g., not by writing 'N' to 'enabled').

Could you please further explain when such termination can happen?

> 
> User Impact
> ===========
> Once kdamond terminates this way, it cannot be restarted via sysfs
> because:
> 
> 1. DAMON_LRU_SORT/DAMON_RECLAIM is built into the kernel, so it cannot
>    be unloaded and reloaded at runtime.

I think this is quite obvious, so may better to be dropped.

> 2. Writing 'N' to 'enabled' fails because kdamond no longer exists;
>    Writing 'Y' does nothing, as 'enabled' is already Y.
> 
> As a result, the only way to restore DAMON functionality is a full
> system reboot.

Thank you for clarifying the user impact.  I think this deserves Cc-ing
stable@.

I think 'Problem' and 'User Impact' can be unified into one section.

> 
> Solution
> ========
> damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only
> sets it to false upon successful completion. When 'maybe_corrupted'
> remains true, kdamond will terminate eventually.

This seems better to be explained earlier, on the problem section.

> 
> Therefore:
> 1. In damon_{lru_sort, reclaim}_turn(): Add fallback logic to reset
>    parameters when damon_stop() fails but kdamond is not running.
> 2. In damon_{lru_sort, reclaim}_apply_parameters(): Reset parameters
>    when damon_commit_ctx() fails, as kdamond will terminate due to
>    maybe_corrupted mechanism.

So the problem is that 'enable' parameter value is not trustworthy, and this
series is trying to make it trustworthy.  I think it is bit complicated,
especially for stable@ fix.  What about simply using more trustworthy
information, e.g.,

'''
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -390,7 +390,7 @@ MODULE_PARM_DESC(addr_unit,
 static int damon_reclaim_enabled_store(const char *val,
                const struct kernel_param *kp)
 {
-       bool is_enabled = enabled;
+       bool is_enabled = false;
        bool enable;
        int err;

@@ -398,6 +398,9 @@ static int damon_reclaim_enabled_store(const char *val,
        if (err)
                return err;

+       if (ctx)
+               is_enabled = damon_is_running(ctx);
+
        if (is_enabled == enable)
                return 0;

'''

> 
> Changes from RFC-v1
> (https://lore.kernel.org/20260330164347.12772-1-aethernet65535@gmail.com)
> - Remove RFC tag.

When dropping RFC tag, let's start from v1 again, from the next time.


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-13 22:05 ` [PATCH v2 0/2] mm/damon: " Liew Rui Yan
@ 2026-04-14  0:28   ` SeongJae Park
  0 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2026-04-14  0:28 UTC (permalink / raw)
  To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm

On Tue, 14 Apr 2026 06:05:11 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> Hi SeongJae,
> 
> I've reviewed the Sashiko report on [PATCH v2 1/2] and [2/2]. Since the
> issues are essentially the same, I want to reply to them all in one
> email.
[...]
> # PATCH v2 2/2
> > > diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
> > > index 86da147786583..e3e148fd80f97 100644
> > > --- a/mm/damon/reclaim.c
> > > +++ b/mm/damon/reclaim.c
> > [ ... ]
> > > @@ -250,6 +250,10 @@ static int damon_reclaim_apply_parameters(void)
> > >  	if (err)
> > >  		goto out;
> > >  	err = damon_commit_ctx(ctx, param_ctx);
> > > +	if (err) {
> > > +		enabled = false;
> > > +		kdamond_pid = -1;
> > > +	}
> > >  out:
> > >  	damon_destroy_ctx(param_ctx);
> > >  	return err;
> > 
> > Can prematurely resetting enabled to false here introduce a race condition
> > leading to a use-after-free of the DAMON context structures?
> > 
> > If damon_reclaim_apply_parameters() is invoked from the kdamond worker thread
> > (for example, when applying commit_inputs) and damon_commit_ctx() fails,
> > ctx->maybe_corrupted is set to true. This signals the kdamond thread to
> > break its main loop and begin its teardown phase, such as executing
> > damon_destroy_targets() to free lists.
> > 
> > Because enabled is set to false asynchronously here, a concurrent sysfs write
> > of 'Y' to enabled will succeed and immediately trigger
> > damon_reclaim_turn(true). This unconditionally calls
> > damon_reclaim_apply_parameters() and executes damon_commit_ctx() from the
> > sysfs thread.
> > 
> > Since damon_commit_ctx() locklessly mutates and frees items in
> > ctx->adaptive_targets and ctx->schemes, would this race directly with the
> > still-exiting kdamond thread traversing and freeing those exact same lists,
> > resulting in list corruption and a use-after-free?
> 
> The core issue is - modifying 'enabled' and 'kdamond_pid' in the error
> path of damon_commit_ctx() is racy.

We simply made such racy user behaviors be prohibited [1].  So this should be
fine.  But, I'd prefer simpler fix, as I replied to the cover letter.

> 
> My plan for v3:
> - Remove the reset code in damon_*_apply_parameters()
> - Keep only the fix in damon_*_turn(false)
> 
> This resolves the restart issue without introducing new races.
> Please let me know if this direction looks good.
> 
> Small changes for v3:
> - Delete a "=" at the bottom of "Problem" (commit message):
> 
>       Problem
> -     ========
> +     =======
> 

I also added comments about above as a reply to the cover letter.  Please reply
there.

[1] https://lkml.kernel.org/r/20260329153052.46657-2-sj@kernel.org


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-14  0:22 ` SeongJae Park
@ 2026-04-14  0:34   ` SeongJae Park
  0 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2026-04-14  0:34 UTC (permalink / raw)
  To: SeongJae Park; +Cc: Liew Rui Yan, damon, linux-mm

On Mon, 13 Apr 2026 17:23:03 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Tue, 14 Apr 2026 02:52:47 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
> 
> > Problem
> > ========
> 
> Let's align the underline with the subject.  Also, let's add one blank line
> after the underline.
> 
> > When kdamond terminates unexpectedly, 'enabled' remains 'Y' and
> > 'kdamond_pid' remains stale. This prevents user from restarting DAMON
> > because both writing 'Y' and 'N' to 'enabled' will fail.
> > 
> > "Unexpected termination" here means the kdamond exits without any user
> > request (e.g., not by writing 'N' to 'enabled').
> 
> Could you please further explain when such termination can happen?
> 
> > 
> > User Impact
> > ===========
> > Once kdamond terminates this way, it cannot be restarted via sysfs
> > because:
> > 
> > 1. DAMON_LRU_SORT/DAMON_RECLAIM is built into the kernel, so it cannot
> >    be unloaded and reloaded at runtime.
> 
> I think this is quite obvious, so may better to be dropped.
> 
> > 2. Writing 'N' to 'enabled' fails because kdamond no longer exists;
> >    Writing 'Y' does nothing, as 'enabled' is already Y.
> > 
> > As a result, the only way to restore DAMON functionality is a full
> > system reboot.
> 
> Thank you for clarifying the user impact.  I think this deserves Cc-ing
> stable@.
> 
> I think 'Problem' and 'User Impact' can be unified into one section.
> 
> > 
> > Solution
> > ========
> > damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only
> > sets it to false upon successful completion. When 'maybe_corrupted'
> > remains true, kdamond will terminate eventually.
> 
> This seems better to be explained earlier, on the problem section.
> 
> > 
> > Therefore:
> > 1. In damon_{lru_sort, reclaim}_turn(): Add fallback logic to reset
> >    parameters when damon_stop() fails but kdamond is not running.
> > 2. In damon_{lru_sort, reclaim}_apply_parameters(): Reset parameters
> >    when damon_commit_ctx() fails, as kdamond will terminate due to
> >    maybe_corrupted mechanism.
> 
> So the problem is that 'enable' parameter value is not trustworthy, and this
> series is trying to make it trustworthy.  I think it is bit complicated,
> especially for stable@ fix.  What about simply using more trustworthy
> information, e.g.,
> 
> '''
> --- a/mm/damon/reclaim.c
> +++ b/mm/damon/reclaim.c
> @@ -390,7 +390,7 @@ MODULE_PARM_DESC(addr_unit,
>  static int damon_reclaim_enabled_store(const char *val,
>                 const struct kernel_param *kp)
>  {
> -       bool is_enabled = enabled;
> +       bool is_enabled = false;
>         bool enable;
>         int err;
> 
> @@ -398,6 +398,9 @@ static int damon_reclaim_enabled_store(const char *val,
>         if (err)
>                 return err;
> 
> +       if (ctx)
> +               is_enabled = damon_is_running(ctx);
> +
>         if (is_enabled == enable)
>                 return 0;
> 
> '''
> 
> > 
> > Changes from RFC-v1
> > (https://lore.kernel.org/20260330164347.12772-1-aethernet65535@gmail.com)
> > - Remove RFC tag.
> 
> When dropping RFC tag, let's start from v1 again, from the next time.

Also, I just found the patches don't have Fixes: and Cc: stable@.  Could you
please add those appripriately?


Thanks,
SJ

> 
> 
> Thanks,
> SJ
> 
> [...]
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-04-14  0:34 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-13 18:52 [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan
2026-04-13 19:54   ` (sashiko review) " Liew Rui Yan
2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan
2026-04-13 19:57   ` (sashiko review) " Liew Rui Yan
2026-04-13 22:05 ` [PATCH v2 0/2] mm/damon: " Liew Rui Yan
2026-04-14  0:28   ` SeongJae Park
2026-04-14  0:22 ` SeongJae Park
2026-04-14  0:34   ` SeongJae Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox