From: SeongJae Park <sj@kernel.org>
To: SeongJae Park <sj@kernel.org>
Cc: Liew Rui Yan <aethernet65535@gmail.com>,
damon@lists.linux.dev, linux-mm@kvack.org
Subject: Re: [PATCH v2 0/2] mm/damon: reset thread status parameters upon kdamond termination
Date: Wed, 15 Apr 2026 23:28:17 -0700 [thread overview]
Message-ID: <20260416062818.949-1-sj@kernel.org> (raw)
In-Reply-To: <20260415235529.86329-1-sj@kernel.org>
On Wed, 15 Apr 2026 16:55:29 -0700 SeongJae Park <sj@kernel.org> wrote:
> On Thu, 16 Apr 2026 02:45:12 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
>
> > On Mon, 13 Apr 2026 17:34:34 -0700 SeongJae Park <sj@kernel.org> wrote:
> >
> > > On Mon, 13 Apr 2026 17:23:03 -0700 SeongJae Park <sj@kernel.org> wrote:
> > >
> > > > On Tue, 14 Apr 2026 02:52:47 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
> > > >
> > > > > Problem
> > > > > ========
> > > >
> > > > Let's align the underline with the subject. Also, let's add one blank line
> > > > after the underline.
> > > >
> > > > > When kdamond terminates unexpectedly, 'enabled' remains 'Y' and
> > > > > 'kdamond_pid' remains stale. This prevents user from restarting DAMON
> > > > > because both writing 'Y' and 'N' to 'enabled' will fail.
> > > > >
> > > > > "Unexpected termination" here means the kdamond exits without any user
> > > > > request (e.g., not by writing 'N' to 'enabled').
> > > >
> > > > Could you please further explain when such termination can happen?
> > > >
> > > > >
> > > > > User Impact
> > > > > ===========
> > > > > Once kdamond terminates this way, it cannot be restarted via sysfs
> > > > > because:
> > > > >
> > > > > 1. DAMON_LRU_SORT/DAMON_RECLAIM is built into the kernel, so it cannot
> > > > > be unloaded and reloaded at runtime.
> > > >
> > > > I think this is quite obvious, so may better to be dropped.
> > > >
> > > > > 2. Writing 'N' to 'enabled' fails because kdamond no longer exists;
> > > > > Writing 'Y' does nothing, as 'enabled' is already Y.
> > > > >
> > > > > As a result, the only way to restore DAMON functionality is a full
> > > > > system reboot.
> > > >
> > > > Thank you for clarifying the user impact. I think this deserves Cc-ing
> > > > stable@.
> > > >
> > > > I think 'Problem' and 'User Impact' can be unified into one section.
> > > >
> > > > >
> > > > > Solution
> > > > > ========
> > > > > damon_commit_ctx() sets 'maybe_corrupted=true' at the beginning and only
> > > > > sets it to false upon successful completion. When 'maybe_corrupted'
> > > > > remains true, kdamond will terminate eventually.
> > > >
> > > > This seems better to be explained earlier, on the problem section.
> >
> > Okay, this is my current commit message:
> >
> > Problem
> > =======
> >
> > When kdamond terminates unexpectedly, 'enabled' remains 'Y' and
> > 'kdamond_pid' remains stale. This prevents user from restarting DAMON
> > because both writing 'Y' and 'N' to 'enabled' will fail.
> >
> > "Unexpected termination" here means the kdamond exits without any user
> > request (e.g., not by writing 'N' to 'enabled'). This can happen when:
> > - Internal error of kdamond.
> > - Invalid parameters commit via 'commit_inputs' (e.g., addr_unit=3).
> >
> > The root cause is that damon_commit_ctx() sets 'maybe_corrupted=true' at
> > the beginning and only sets it to false upon successful completion. When
> > 'maybe_corrupted' remains true, kdamond will terminate eventually.
> >
> > Once kdamond terminates this way, it cannot be restarted via sysfs
> > because writing 'N' to 'enabled' fails (kdamond no longer exists) and
> > writing 'Y' does nothing ('enabled' is already Y). As a result, the only
> > way to restore DAMON functionality is a full system reboot.
> >
> > Solution
> > ========
> >
> > The problem is that 'enable' parameter value is not trustworthy.
> > Instead of relying on the 'enabled' variable, use damon_is_running(ctx)
> > which reflects the true kdamond state.
>
> I don't want to be picky, but ... I still feels this is easy to make readers
> including future myslef be confused for some points. Given this is a bit
> serious bug fix, I feel I have to raise the bar, at least for future myself.
> What about below?
>
> Kdamonds for DAMON_RECLAIM and DAMON_LRU_SORT could stop without
> properly updating 'enabled' parameter value, when parameters online
> commit fails due to invalid user inputs or internal memory allocation
> failures. The enabled parameter value, which can be stale after
> certain kdamond stops, is used by the modules' kdamond start/stop
> logic, to avoid doing unnecessary start/stop works. As a result, uses
> cannot restart the kdamonds before the system is rebooted and 'enabled'
> parameter value is correctly reset. For example, the issue can be
> reproduced like below:
>
> <your reproducer>
>
> Fix the issue by using damon_is_running(), which should reflect the
> true kdamond state, instead of the 'enabled' parameter.
>
> >
> > > > [...]
> > > > So the problem is that 'enable' parameter value is not trustworthy, and this
> > > > series is trying to make it trustworthy. I think it is bit complicated,
> > > > especially for stable@ fix. What about simply using more trustworthy
> > > > information, e.g.,
> > > >
> > > > '''
> > > > --- a/mm/damon/reclaim.c
> > > > +++ b/mm/damon/reclaim.c
> > > > @@ -390,7 +390,7 @@ MODULE_PARM_DESC(addr_unit,
> > > > static int damon_reclaim_enabled_store(const char *val,
> > > > const struct kernel_param *kp)
> > > > {
> > > > - bool is_enabled = enabled;
> > > > + bool is_enabled = false;
> > > > bool enable;
> > > > int err;
> > > >
> > > > @@ -398,6 +398,9 @@ static int damon_reclaim_enabled_store(const char *val,
> > > > if (err)
> > > > return err;
> > > >
> > > > + if (ctx)
> > > > + is_enabled = damon_is_running(ctx);
> > > > +
> > > > if (is_enabled == enable)
> > > > return 0;
> > > >
> > > > '''
> >
> > Thank you for the suggestion. I have tested this implementation and it
> > works expected. I agree that its complexity is much lower and more
> > suitable for a stable fix.
>
> Thank you for testing!
Hmm, and now I think we migt be able to use this kind of trustworthy
information for reading part too, for not only enabled but also kdamond_pid,
like below.
'''
diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index ecfc6b58d07d2..775b33dbc952f 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -159,15 +159,6 @@ static unsigned long addr_unit __read_mostly = 1;
static bool skip_anon __read_mostly;
module_param(skip_anon, bool, 0600);
-/*
- * PID of the DAMON thread
- *
- * If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread.
- * Else, -1.
- */
-static int kdamond_pid __read_mostly = -1;
-module_param(kdamond_pid, int, 0400);
-
static struct damos_stat damon_reclaim_stat;
DEFINE_DAMON_MODULES_DAMOS_STATS_PARAMS(damon_reclaim_stat,
reclaim_tried_regions, reclaimed_regions, quota_exceeds);
@@ -343,12 +334,8 @@ static int damon_reclaim_turn(bool on)
{
int err;
- if (!on) {
- err = damon_stop(&ctx, 1);
- if (!err)
- kdamond_pid = -1;
- return err;
- }
+ if (!on)
+ return damon_stop(&ctx, 1);
err = damon_reclaim_apply_parameters();
if (err)
@@ -357,9 +344,6 @@ static int damon_reclaim_turn(bool on)
err = damon_start(&ctx, 1, true);
if (err)
return err;
- kdamond_pid = damon_kdamond_pid(ctx);
- if (kdamond_pid < 0)
- return kdamond_pid;
return damon_call(ctx, &call_control);
}
@@ -387,10 +371,16 @@ module_param_cb(addr_unit, &addr_unit_param_ops, &addr_unit, 0600);
MODULE_PARM_DESC(addr_unit,
"Scale factor for DAMON_RECLAIM to ops address conversion (default: 1)");
+static bool damon_reclaim_enabled(void)
+{
+ if (!ctx)
+ return false;
+ return damon_is_running(ctx);
+}
+
static int damon_reclaim_enabled_store(const char *val,
const struct kernel_param *kp)
{
- bool is_enabled = enabled;
bool enable;
int err;
@@ -398,7 +388,7 @@ static int damon_reclaim_enabled_store(const char *val,
if (err)
return err;
- if (is_enabled == enable)
+ if (damon_reclaim_enabled() == enable)
return 0;
/* Called before init function. The function will handle this. */
@@ -414,15 +404,48 @@ static int damon_reclaim_enabled_store(const char *val,
return err;
}
+static int damon_reclaim_enabled_load(char *buffer,
+ const struct kernel_param *kp)
+{
+ return sprintf(buffer, "%c\n", damon_reclaim_enabled() ? 'Y' : 'N');
+}
+
static const struct kernel_param_ops enabled_param_ops = {
.set = damon_reclaim_enabled_store,
- .get = param_get_bool,
+ .get = damon_reclaim_enabled_load,
};
module_param_cb(enabled, &enabled_param_ops, &enabled, 0600);
MODULE_PARM_DESC(enabled,
"Enable or disable DAMON_RECLAIM (default: disabled)");
+static int damon_reclaim_kdamond_pid_load(char *buffer,
+ const struct kernel_param *kp)
+{
+ int kdamond_pid;
+
+ if (ctx) {
+ kdamond_pid = damon_kdamond_pid(ctx);
+ if (kdamond_pid < 0)
+ kdamond_pid = -1;
+ } else {
+ kdamond_pid = -1;
+ }
+ return sprintf(buffer, "%d\n", kdamond_pid);
+}
+
+static const struct kernel_param_ops kdamond_pid_param_ops = {
+ .get = damon_reclaim_kdamond_pid_load,
+};
+
+/*
+ * PID of the DAMON thread
+ *
+ * If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread.
+ * Else, -1.
+ */
+module_param_cb(kdamond_pid, &kdamond_pid_param_ops, NULL, 0400);
+
static int __init damon_reclaim_init(void)
{
int err;
'''
This is much bigger change than your original suggestion. But I feel this is
simpler, because this eliminates the needs to keep the states. What do you
think?
Thanks,
SJ
>
> >
> > > >
> > > > >
> > > > > Changes from RFC-v1
> > > > > (https://lore.kernel.org/20260330164347.12772-1-aethernet65535@gmail.com)
> > > > > - Remove RFC tag.
> > > >
> > > > When dropping RFC tag, let's start from v1 again, from the next time.
> >
> > Regarding the versioning, I have two questions:
> > 1. Is version number inhritance only required when converting from
> > non-RFC to an RFC?
>
> I'd request such conversion when dropping RFC tag. E.g.,
>
> RFC -> RFC v2 ... -- (drop RFC) -> v1 -> v2 ...
>
> > 2. Should my next version be tagged as v3 (since the current one is v2)?
>
> The ship is already sailed for this patch series. Please tag it with v3. But
> please follow the suggestion from next times.
>
> >
> > >
> > > Also, I just found the patches don't have Fixes: and Cc: stable@. Could you
> > > please add those appripriately?
> >
> > Regarding the Fixes: tags, I will include the following:
> > - Fixes: 40e983cca927 ("mm/damon: introduce DAMON-based LRU-lists Sorting")
>
> This seems correct, but ...
>
> > - Fixes: 059342d1dd4e ("mm/damon/reclaim: fix the timer always stays active")
>
> Is this correct? At the point of this commit, there was no commit_inputs
> parameter. Were you able to confirm this issue can and cannot be reproduced on
> this commit and its parent commit, respectively?
>
>
> Thanks,
> SJ
>
> [...]
Sent using hkml (https://github.com/sjp38/hackermail)
next prev parent reply other threads:[~2026-04-16 6:28 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-13 18:52 Liew Rui Yan
2026-04-13 18:52 ` [PATCH v2 1/2] mm/damon/lru_sort: " Liew Rui Yan
2026-04-13 19:54 ` (sashiko review) " Liew Rui Yan
2026-04-13 18:52 ` [PATCH v2 2/2] mm/damon/reclaim: " Liew Rui Yan
2026-04-13 19:57 ` (sashiko review) " Liew Rui Yan
2026-04-13 22:05 ` [PATCH v2 0/2] mm/damon: " Liew Rui Yan
2026-04-14 0:28 ` SeongJae Park
2026-04-14 0:22 ` SeongJae Park
2026-04-14 0:34 ` SeongJae Park
2026-04-15 18:45 ` Liew Rui Yan
2026-04-15 23:55 ` SeongJae Park
2026-04-16 6:28 ` SeongJae Park [this message]
2026-04-16 0:17 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260416062818.949-1-sj@kernel.org \
--to=sj@kernel.org \
--cc=aethernet65535@gmail.com \
--cc=damon@lists.linux.dev \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox