From: JaeJoon Jung <rgbi3307@gmail.com>
To: SeongJae Park <sj@kernel.org>
Cc: damon@lists.linux.dev, linux-mm@kvack.org, rgbi3307@nate.com
Subject: Re: [PATCH] mm/damon/sysfs: preventing duplicated list_add_tail() at the damon_call()
Date: Fri, 26 Dec 2025 10:48:31 +0900 [thread overview]
Message-ID: <CAHOvCC6PQKAK5WojmVFqBw0V-TZ+11CM_2uq4z8TxsyEu3gLVQ@mail.gmail.com> (raw)
In-Reply-To: <20251225194959.937-1-sj@kernel.org>
On Fri, 26 Dec 2025 at 04:50, SeongJae Park <sj@kernel.org> wrote:
>
> On Thu, 25 Dec 2025 11:35:33 +0900 JaeJoon Jung <rgbi3307@gmail.com> wrote:
>
> > On Thu, 25 Dec 2025 at 09:32, SeongJae Park <sj@kernel.org> wrote:
> > >
> > > Hello JaeJoon,
> > >
> > > On Wed, 24 Dec 2025 18:43:58 +0900 JaeJoon Jung <rgbi3307@gmail.com> wrote:
> > >
> > > > cd /sys/kernel/mm/damon/admin
> > > > echo "off" > kdamonds/0/state
> > > >
> > > > echo "commit" > kdamonds/0/state
> > > > echo "commit" > kdamonds/0/state
> > > >
> > > > If you repeat "commit" twice with the kdamonds/0/state set to "off"
> > > > with the above command, list_add corruption error occurs as follows:
> > > >
> > > > 4-page vmalloc region starting at 0xffffffc600a38000 allocated at
> > > > kernel_clone+0x44/0x41e
> > > > ------------[ cut here ]------------
> > > > list_add corruption. prev->next should be next (ffffffd6c7c5a6a8),
> > > > but was ffffffc600a3bcc8. (prev=ffffffc600a3bcc8).
> > > > WARNING: lib/list_debug.c:32 at __list_add_valid_or_report+
> > > > 0xd8/0xe2, CPU#0: bash/466
> > > > Modules linked in: dwmac_starfive stmmac_platform stmmac pcs_xpcs phylink
> > > > CPU: 0 UID: 0 PID: 466 Comm: bash Tainted: G W 6.19.0-rc2+ #1 PREEMPTLAZY
> > > > Tainted: [W]=WARN
> > > > Hardware name: StarFive VisionFive 2 v1.3B (DT)
> > > > epc : __list_add_valid_or_report+0xd8/0xe2
> > > > ra : __list_add_valid_or_report+0xd8/0xe2
> > > > epc : ffffffff80540bce ra : ffffffff80540bce sp : ffffffc600a3bc00
> > > > gp : ffffffff81caec40 tp : ffffffd6c036f080 t0 : 0000000000000000
> > > > t1 : 0000000000006000 t2 : 0000000000000002 s0 : ffffffc600a3bc30
> > > > s1 : ffffffc600a3bcc8 a0 : ffffffd6fbf49a40 a1 : ffffffd6c036f080
> > > > a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000000
> > > > a5 : 0000000000000000 a6 : 0000000020000000 a7 : 0000000000000001
> > > > s2 : ffffffd6c7c5a6a8 s3 : ffffffc600a3bcc8 s4 : ffffffc600a3bcc8
> > > > s5 : ffffffd6c7c5a6b8 s6 : ffffffd6c7c5a6a8 s7 : 0000003ff3f32794
> > > > s8 : 0000002ab38c9118 s9 : 0000000000000065 s10: 0000003f823a5cb8
> > > > s11: 0000003f823264e8 t3 : 0000000000000001 t4 : 0000000000000000
> > > > t5 : 00000000fa83b2da t6 : 000000000051df90
> > > > status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
> > > > [<ffffffff80540bce>] __list_add_valid_or_report+0xd8/0xe2
> > > > [<ffffffff80255c86>] damon_call+0x52/0xe8
> > > > [<ffffffff8025c9a8>] damon_sysfs_damon_call+0x60/0x8a
> > > > [<ffffffff8025daf4>] state_store+0xfc/0x294
> > > > [<ffffffff80dbf1fa>] kobj_attr_store+0xe/0x1a
> > > > [<ffffffff802f070c>] sysfs_kf_write+0x42/0x56
> > > > [<ffffffff802eef4e>] kernfs_fop_write_iter+0xf4/0x178
> > > > [<ffffffff8026545c>] vfs_write+0x1b6/0x3b2
> > > > [<ffffffff80265782>] ksys_write+0x52/0xbc
> > > > [<ffffffff80265800>] __riscv_sys_write+0x14/0x1c
> > > > [<ffffffff80ddf124>] do_trap_ecall_u+0x19c/0x26e
> > > > [<ffffffff80deaa38>] handle_exception+0x150/0x15c
> > > > ---[ end trace 0000000000000000 ]---
> > > > -bash: echo: write error: Invalid argument
> > >
> > > Thank you for finding issue!
> > >
> > > Also appreciate for sharing your detailed reproducer. Nevertheless, I think
> > > the reproducer can be more detailed. E.g., you could explicitly explain the
> > > fact that the reproduction step should be executed only after starting DAMON
> > > with the kdamond, and the kernel should run with CONFIG_lIST_HARDENED to get
> > > the output from the kernel log.
> >
> > Yes, as you said, I ran it under CONFIG_LIST_HARDENED=y condition.
>
> Thank you for confirming.
>
> >
> > >
> > > >
> > > > The cause of the above error is that list_add_tail() is executed
> > > > repeatedly while executing damon_call(ctx, control)
> > > > in damon_sysfs_damon_call(). The execution flow is summarized below:
> > > >
> > > > damon_sysfs_damon_call()
> > > > --> damon_call(ctx, control)
> > > > list_add_tail(control, ctx->call_contols);
> > > > --> /* list_add corruption error */
> > > > if (!damon_is_running)
> > > > return -EINVAL;
> > > >
> > > > If you execute damon_call() when damon_sysfs_kdamond_running() is true,
> > > > you can prevent the error of duplicate execution of list_add_tail().
> > >
> > > The kdamond might be terminated between the damon_call() call and the
> > > damon_is_running() check inside the damon_call() execution. In the case, the
> > > problem may still happen.
> > >
> > > The problem happens because damon_call() is not removing the damon_call_control
> > > object before returning the error, right? What about removing the object
> > > before returning the error?
> >
> > damon_call() is called after damon_start() --> kdamond_fn() is executed,
> > This is a problem because damon_call() also occurs when kdamond is "off"
> > only in damon/sysfs. So, my first patch solved the problem, but the
> > following also worked. I tested it.
> >
> > And it seems better to keep the existing method of releasing
> > damon_call_control. Since the damon_call_control structure uses both
> > static and kmalloc(), it's appropriate to release it in kdamond_fn() according
> > to the condition control->canceled && control->dealloc_on_cancel.
> >
> > My previous suggestion regarding this:
> > https://lore.kernel.org/damon/20251206224724.13832-1-rgbi3307@gmail.com/
> >
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index babad37719b6..2ead0bb3c462 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> > @@ -1462,6 +1462,9 @@ bool damon_is_running(struct damon_ctx *ctx)
> > */
> > int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
> > {
> > + if (!damon_is_running(ctx))
> > + return -EINVAL;
> > +
> > if (!control->repeat)
> > init_completion(&control->completion);
> > control->canceled = false;
> > @@ -1470,8 +1473,6 @@ int damon_call(struct damon_ctx *ctx, struct
> > damon_call_control *control)
> > mutex_lock(&ctx->call_controls_lock);
> > list_add_tail(&control->list, &ctx->call_controls);
> > mutex_unlock(&ctx->call_controls_lock);
> > - if (!damon_is_running(ctx))
> > - return -EINVAL;
> > if (control->repeat)
> > return 0;
> > wait_for_completion(&control->completion);
>
> Let's assume DAMON is terminated between the damon_is_running() and
> list_add_tail(). In the case, the control->fn() will never be called back. If
> control->repeat is false, this function will even inifnitely wait.
As you said, there are cases where kdamond is terminated(stopped) in
damon_is_running() and list_add_tail(). It may be a very rare case, but
I missed this case.
>
> I think we should keep the damon_is_running() as is, but further check if it
> was terminated without handling the control object, and remove it from the list
> in the case. Like below.
>
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index c3ae55b940cc..23d8602f5a49 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -1649,6 +1649,35 @@ bool damon_is_running(struct damon_ctx *ctx)
> return running;
> }
>
> +/*
> + * damon_call_handle_inactive_ctx() - handle DAMON call request that added to
> + * an inactive context.
> + * @ctx: The inactive DAMON context.
> + * @control: Control variable of the call request.
> + *
> + * This function is called in a case that @control is added to @ctx but @ctx is
> + * not running (inactive). See if @ctx handled @control or not, and cleanup
> + * @control if it was not handled.
> + *
> + * Returns 0 if @control was handled by @ctx, negative error code otherwise.
> + */
> +static int damon_call_handle_inactive_ctx(
> + struct damon_ctx *ctx, struct damon_call_control *control)
> +{
> + struct damon_call_control *c;
> +
> + mutex_lock(&ctx->call_controls_lock);
> + list_for_each_entry(c, &ctx->call_controls, list) {
> + if (c == control) {
> + list_del(&control->list);
> + mutex_unlock(&ctx->call_controls_lock);
> + return -EINVAL;
> + }
> + }
> + mutex_unlock(&ctx->call_controls_lock);
> + return 0;
> +}
> +
> /**
> * damon_call() - Invoke a given function on DAMON worker thread (kdamond).
> * @ctx: DAMON context to call the function for.
> @@ -1679,7 +1708,7 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
> list_add_tail(&control->list, &ctx->call_controls);
> mutex_unlock(&ctx->call_controls_lock);
> if (!damon_is_running(ctx))
> - return -EINVAL;
> + return damon_call_handle_inactive_ctx(ctx, control);
> if (control->repeat)
> return 0;
> wait_for_completion(&control->completion);
>
However, the damon_call_handle_inactive_ctx() function is to post-process
the duplicate addition of control->list. Rather, it is more efficient to
prevent duplicate additions in advance, as follows:
I have tested the following and it works fine.
@@ -1467,11 +1496,14 @@ int damon_call(struct damon_ctx *ctx, struct
damon_call_control *control)
control->canceled = false;
INIT_LIST_HEAD(&control->list);
- mutex_lock(&ctx->call_controls_lock);
- list_add_tail(&control->list, &ctx->call_controls);
- mutex_unlock(&ctx->call_controls_lock);
- if (!damon_is_running(ctx))
+ if (damon_is_running(ctx)) {
+ mutex_lock(&ctx->call_controls_lock);
+ list_add_tail(&control->list, &ctx->call_controls);
+ mutex_unlock(&ctx->call_controls_lock);
+ } else {
+ /* return damon_call_handle_inactive_ctx(ctx, control); */
return -EINVAL;
+ }
if (control->repeat)
return 0;
wait_for_completion(&control->completion);
> If you don't mind, I'll post the above diff as a patch, adding a 'Reported-by:'
> tag for you.
'Reported-by:' is OK. However, please check the above again.
>
> >
> > >
> > > >
> > > > Signed-off-by: JaeJoon Jung <rgbi3307@gmail.com>
> > >
> > > Could you please also add Fixes: and Cc: stable@ ?
> >
> > I don't have much experience with this, so I'm sorry,
> > but could you please give me an example about this?
>
> No problem.
>
> Please refer to https://docs.kernel.org/process/stable-kernel-rules.html
Thanks,
JaeJoon
>
>
> Thanks,
> SJ
>
> [...]
next prev parent reply other threads:[~2025-12-26 1:48 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-24 9:43 JaeJoon Jung
2025-12-25 0:32 ` SeongJae Park
2025-12-25 2:35 ` JaeJoon Jung
2025-12-25 19:49 ` SeongJae Park
2025-12-26 1:48 ` JaeJoon Jung [this message]
2025-12-26 18:41 ` SeongJae Park
2025-12-26 23:53 ` JaeJoon Jung
2025-12-27 17:42 ` SeongJae Park
2025-12-29 3:38 ` JaeJoon Jung
2025-12-29 15:14 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAHOvCC6PQKAK5WojmVFqBw0V-TZ+11CM_2uq4z8TxsyEu3gLVQ@mail.gmail.com \
--to=rgbi3307@gmail.com \
--cc=damon@lists.linux.dev \
--cc=linux-mm@kvack.org \
--cc=rgbi3307@nate.com \
--cc=sj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox