From: SeongJae Park <sj@kernel.org>
To: JaeJoon Jung <rgbi3307@gmail.com>
Cc: SeongJae Park <sj@kernel.org>,
damon@lists.linux.dev, linux-mm@kvack.org, rgbi3307@nate.com
Subject: Re: [PATCH] mm/damon/sysfs: preventing duplicated list_add_tail() at the damon_call()
Date: Thu, 25 Dec 2025 11:49:58 -0800 [thread overview]
Message-ID: <20251225194959.937-1-sj@kernel.org> (raw)
In-Reply-To: <CAHOvCC4zwbiLXRTHav67GcngiVZ-5GESqkY9juLtqAfWQW1UQg@mail.gmail.com>
On Thu, 25 Dec 2025 11:35:33 +0900 JaeJoon Jung <rgbi3307@gmail.com> wrote:
> On Thu, 25 Dec 2025 at 09:32, SeongJae Park <sj@kernel.org> wrote:
> >
> > Hello JaeJoon,
> >
> > On Wed, 24 Dec 2025 18:43:58 +0900 JaeJoon Jung <rgbi3307@gmail.com> wrote:
> >
> > > cd /sys/kernel/mm/damon/admin
> > > echo "off" > kdamonds/0/state
> > >
> > > echo "commit" > kdamonds/0/state
> > > echo "commit" > kdamonds/0/state
> > >
> > > If you repeat "commit" twice with the kdamonds/0/state set to "off"
> > > with the above command, list_add corruption error occurs as follows:
> > >
> > > 4-page vmalloc region starting at 0xffffffc600a38000 allocated at
> > > kernel_clone+0x44/0x41e
> > > ------------[ cut here ]------------
> > > list_add corruption. prev->next should be next (ffffffd6c7c5a6a8),
> > > but was ffffffc600a3bcc8. (prev=ffffffc600a3bcc8).
> > > WARNING: lib/list_debug.c:32 at __list_add_valid_or_report+
> > > 0xd8/0xe2, CPU#0: bash/466
> > > Modules linked in: dwmac_starfive stmmac_platform stmmac pcs_xpcs phylink
> > > CPU: 0 UID: 0 PID: 466 Comm: bash Tainted: G W 6.19.0-rc2+ #1 PREEMPTLAZY
> > > Tainted: [W]=WARN
> > > Hardware name: StarFive VisionFive 2 v1.3B (DT)
> > > epc : __list_add_valid_or_report+0xd8/0xe2
> > > ra : __list_add_valid_or_report+0xd8/0xe2
> > > epc : ffffffff80540bce ra : ffffffff80540bce sp : ffffffc600a3bc00
> > > gp : ffffffff81caec40 tp : ffffffd6c036f080 t0 : 0000000000000000
> > > t1 : 0000000000006000 t2 : 0000000000000002 s0 : ffffffc600a3bc30
> > > s1 : ffffffc600a3bcc8 a0 : ffffffd6fbf49a40 a1 : ffffffd6c036f080
> > > a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000000
> > > a5 : 0000000000000000 a6 : 0000000020000000 a7 : 0000000000000001
> > > s2 : ffffffd6c7c5a6a8 s3 : ffffffc600a3bcc8 s4 : ffffffc600a3bcc8
> > > s5 : ffffffd6c7c5a6b8 s6 : ffffffd6c7c5a6a8 s7 : 0000003ff3f32794
> > > s8 : 0000002ab38c9118 s9 : 0000000000000065 s10: 0000003f823a5cb8
> > > s11: 0000003f823264e8 t3 : 0000000000000001 t4 : 0000000000000000
> > > t5 : 00000000fa83b2da t6 : 000000000051df90
> > > status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
> > > [<ffffffff80540bce>] __list_add_valid_or_report+0xd8/0xe2
> > > [<ffffffff80255c86>] damon_call+0x52/0xe8
> > > [<ffffffff8025c9a8>] damon_sysfs_damon_call+0x60/0x8a
> > > [<ffffffff8025daf4>] state_store+0xfc/0x294
> > > [<ffffffff80dbf1fa>] kobj_attr_store+0xe/0x1a
> > > [<ffffffff802f070c>] sysfs_kf_write+0x42/0x56
> > > [<ffffffff802eef4e>] kernfs_fop_write_iter+0xf4/0x178
> > > [<ffffffff8026545c>] vfs_write+0x1b6/0x3b2
> > > [<ffffffff80265782>] ksys_write+0x52/0xbc
> > > [<ffffffff80265800>] __riscv_sys_write+0x14/0x1c
> > > [<ffffffff80ddf124>] do_trap_ecall_u+0x19c/0x26e
> > > [<ffffffff80deaa38>] handle_exception+0x150/0x15c
> > > ---[ end trace 0000000000000000 ]---
> > > -bash: echo: write error: Invalid argument
> >
> > Thank you for finding issue!
> >
> > Also appreciate for sharing your detailed reproducer. Nevertheless, I think
> > the reproducer can be more detailed. E.g., you could explicitly explain the
> > fact that the reproduction step should be executed only after starting DAMON
> > with the kdamond, and the kernel should run with CONFIG_lIST_HARDENED to get
> > the output from the kernel log.
>
> Yes, as you said, I ran it under CONFIG_LIST_HARDENED=y condition.
Thank you for confirming.
>
> >
> > >
> > > The cause of the above error is that list_add_tail() is executed
> > > repeatedly while executing damon_call(ctx, control)
> > > in damon_sysfs_damon_call(). The execution flow is summarized below:
> > >
> > > damon_sysfs_damon_call()
> > > --> damon_call(ctx, control)
> > > list_add_tail(control, ctx->call_contols);
> > > --> /* list_add corruption error */
> > > if (!damon_is_running)
> > > return -EINVAL;
> > >
> > > If you execute damon_call() when damon_sysfs_kdamond_running() is true,
> > > you can prevent the error of duplicate execution of list_add_tail().
> >
> > The kdamond might be terminated between the damon_call() call and the
> > damon_is_running() check inside the damon_call() execution. In the case, the
> > problem may still happen.
> >
> > The problem happens because damon_call() is not removing the damon_call_control
> > object before returning the error, right? What about removing the object
> > before returning the error?
>
> damon_call() is called after damon_start() --> kdamond_fn() is executed,
> This is a problem because damon_call() also occurs when kdamond is "off"
> only in damon/sysfs. So, my first patch solved the problem, but the
> following also worked. I tested it.
>
> And it seems better to keep the existing method of releasing
> damon_call_control. Since the damon_call_control structure uses both
> static and kmalloc(), it's appropriate to release it in kdamond_fn() according
> to the condition control->canceled && control->dealloc_on_cancel.
>
> My previous suggestion regarding this:
> https://lore.kernel.org/damon/20251206224724.13832-1-rgbi3307@gmail.com/
>
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index babad37719b6..2ead0bb3c462 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -1462,6 +1462,9 @@ bool damon_is_running(struct damon_ctx *ctx)
> */
> int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
> {
> + if (!damon_is_running(ctx))
> + return -EINVAL;
> +
> if (!control->repeat)
> init_completion(&control->completion);
> control->canceled = false;
> @@ -1470,8 +1473,6 @@ int damon_call(struct damon_ctx *ctx, struct
> damon_call_control *control)
> mutex_lock(&ctx->call_controls_lock);
> list_add_tail(&control->list, &ctx->call_controls);
> mutex_unlock(&ctx->call_controls_lock);
> - if (!damon_is_running(ctx))
> - return -EINVAL;
> if (control->repeat)
> return 0;
> wait_for_completion(&control->completion);
Let's assume DAMON is terminated between the damon_is_running() and
list_add_tail(). In the case, the control->fn() will never be called back. If
control->repeat is false, this function will even inifnitely wait.
I think we should keep the damon_is_running() as is, but further check if it
was terminated without handling the control object, and remove it from the list
in the case. Like below.
diff --git a/mm/damon/core.c b/mm/damon/core.c
index c3ae55b940cc..23d8602f5a49 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1649,6 +1649,35 @@ bool damon_is_running(struct damon_ctx *ctx)
return running;
}
+/*
+ * damon_call_handle_inactive_ctx() - handle DAMON call request that added to
+ * an inactive context.
+ * @ctx: The inactive DAMON context.
+ * @control: Control variable of the call request.
+ *
+ * This function is called in a case that @control is added to @ctx but @ctx is
+ * not running (inactive). See if @ctx handled @control or not, and cleanup
+ * @control if it was not handled.
+ *
+ * Returns 0 if @control was handled by @ctx, negative error code otherwise.
+ */
+static int damon_call_handle_inactive_ctx(
+ struct damon_ctx *ctx, struct damon_call_control *control)
+{
+ struct damon_call_control *c;
+
+ mutex_lock(&ctx->call_controls_lock);
+ list_for_each_entry(c, &ctx->call_controls, list) {
+ if (c == control) {
+ list_del(&control->list);
+ mutex_unlock(&ctx->call_controls_lock);
+ return -EINVAL;
+ }
+ }
+ mutex_unlock(&ctx->call_controls_lock);
+ return 0;
+}
+
/**
* damon_call() - Invoke a given function on DAMON worker thread (kdamond).
* @ctx: DAMON context to call the function for.
@@ -1679,7 +1708,7 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
list_add_tail(&control->list, &ctx->call_controls);
mutex_unlock(&ctx->call_controls_lock);
if (!damon_is_running(ctx))
- return -EINVAL;
+ return damon_call_handle_inactive_ctx(ctx, control);
if (control->repeat)
return 0;
wait_for_completion(&control->completion);
If you don't mind, I'll post the above diff as a patch, adding a 'Reported-by:'
tag for you.
>
> >
> > >
> > > Signed-off-by: JaeJoon Jung <rgbi3307@gmail.com>
> >
> > Could you please also add Fixes: and Cc: stable@ ?
>
> I don't have much experience with this, so I'm sorry,
> but could you please give me an example about this?
No problem.
Please refer to https://docs.kernel.org/process/stable-kernel-rules.html
Thanks,
SJ
[...]
next prev parent reply other threads:[~2025-12-25 19:50 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-24 9:43 JaeJoon Jung
2025-12-25 0:32 ` SeongJae Park
2025-12-25 2:35 ` JaeJoon Jung
2025-12-25 19:49 ` SeongJae Park [this message]
2025-12-26 1:48 ` JaeJoon Jung
2025-12-26 18:41 ` SeongJae Park
2025-12-26 23:53 ` JaeJoon Jung
2025-12-27 17:42 ` SeongJae Park
2025-12-29 3:38 ` JaeJoon Jung
2025-12-29 15:14 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251225194959.937-1-sj@kernel.org \
--to=sj@kernel.org \
--cc=damon@lists.linux.dev \
--cc=linux-mm@kvack.org \
--cc=rgbi3307@gmail.com \
--cc=rgbi3307@nate.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox