Re: [PATCH] mm/damon/sysfs: preventing duplicated list_add_tail() at the damon_call()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: SeongJae Park <sj@kernel.org>
To: JaeJoon Jung <rgbi3307@gmail.com>
Cc: SeongJae Park <sj@kernel.org>,
	damon@lists.linux.dev, linux-mm@kvack.org, rgbi3307@nate.com
Subject: Re: [PATCH] mm/damon/sysfs: preventing duplicated list_add_tail() at the damon_call()
Date: Thu, 25 Dec 2025 11:49:58 -0800	[thread overview]
Message-ID: <20251225194959.937-1-sj@kernel.org> (raw)
In-Reply-To: <CAHOvCC4zwbiLXRTHav67GcngiVZ-5GESqkY9juLtqAfWQW1UQg@mail.gmail.com>

On Thu, 25 Dec 2025 11:35:33 +0900 JaeJoon Jung <rgbi3307@gmail.com> wrote:

> On Thu, 25 Dec 2025 at 09:32, SeongJae Park <sj@kernel.org> wrote:
> >
> > Hello JaeJoon,
> >
> > On Wed, 24 Dec 2025 18:43:58 +0900 JaeJoon Jung <rgbi3307@gmail.com> wrote:
> >
> > >  cd /sys/kernel/mm/damon/admin
> > >  echo "off" > kdamonds/0/state
> > >
> > >  echo "commit" > kdamonds/0/state
> > >  echo "commit" > kdamonds/0/state
> > >
> > > If you repeat "commit" twice with the kdamonds/0/state set to "off"
> > > with the above command, list_add corruption error occurs as follows:
> > >
> > > 4-page vmalloc region starting at 0xffffffc600a38000 allocated at
> > > kernel_clone+0x44/0x41e
> > > ------------[ cut here ]------------
> > > list_add corruption. prev->next should be next (ffffffd6c7c5a6a8),
> > >          but was ffffffc600a3bcc8. (prev=ffffffc600a3bcc8).
> > > WARNING: lib/list_debug.c:32 at __list_add_valid_or_report+
> > > 0xd8/0xe2, CPU#0: bash/466
> > > Modules linked in: dwmac_starfive stmmac_platform stmmac pcs_xpcs phylink
> > > CPU: 0 UID: 0 PID: 466 Comm: bash Tainted: G W 6.19.0-rc2+ #1 PREEMPTLAZY
> > > Tainted: [W]=WARN
> > > Hardware name: StarFive VisionFive 2 v1.3B (DT)
> > >  epc : __list_add_valid_or_report+0xd8/0xe2
> > >  ra : __list_add_valid_or_report+0xd8/0xe2
> > >  epc : ffffffff80540bce ra : ffffffff80540bce sp : ffffffc600a3bc00
> > >  gp : ffffffff81caec40 tp : ffffffd6c036f080 t0 : 0000000000000000
> > >  t1 : 0000000000006000 t2 : 0000000000000002 s0 : ffffffc600a3bc30
> > >  s1 : ffffffc600a3bcc8 a0 : ffffffd6fbf49a40 a1 : ffffffd6c036f080
> > >  a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000000
> > >  a5 : 0000000000000000 a6 : 0000000020000000 a7 : 0000000000000001
> > >  s2 : ffffffd6c7c5a6a8 s3 : ffffffc600a3bcc8 s4 : ffffffc600a3bcc8
> > >  s5 : ffffffd6c7c5a6b8 s6 : ffffffd6c7c5a6a8 s7 : 0000003ff3f32794
> > >  s8 : 0000002ab38c9118 s9 : 0000000000000065 s10: 0000003f823a5cb8
> > >  s11: 0000003f823264e8 t3 : 0000000000000001 t4 : 0000000000000000
> > >  t5 : 00000000fa83b2da t6 : 000000000051df90
> > > status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
> > > [<ffffffff80540bce>] __list_add_valid_or_report+0xd8/0xe2
> > > [<ffffffff80255c86>] damon_call+0x52/0xe8
> > > [<ffffffff8025c9a8>] damon_sysfs_damon_call+0x60/0x8a
> > > [<ffffffff8025daf4>] state_store+0xfc/0x294
> > > [<ffffffff80dbf1fa>] kobj_attr_store+0xe/0x1a
> > > [<ffffffff802f070c>] sysfs_kf_write+0x42/0x56
> > > [<ffffffff802eef4e>] kernfs_fop_write_iter+0xf4/0x178
> > > [<ffffffff8026545c>] vfs_write+0x1b6/0x3b2
> > > [<ffffffff80265782>] ksys_write+0x52/0xbc
> > > [<ffffffff80265800>] __riscv_sys_write+0x14/0x1c
> > > [<ffffffff80ddf124>] do_trap_ecall_u+0x19c/0x26e
> > > [<ffffffff80deaa38>] handle_exception+0x150/0x15c
> > > ---[ end trace 0000000000000000 ]---
> > > -bash: echo: write error: Invalid argument
> >
> > Thank you for finding issue!
> >
> > Also appreciate for sharing your detailed reproducer.  Nevertheless, I think
> > the reproducer can be more detailed.  E.g., you could explicitly explain the
> > fact that the reproduction step should be executed only after starting DAMON
> > with the kdamond, and the kernel should run with CONFIG_lIST_HARDENED to get
> > the output from the kernel log.
> 
> Yes, as you said, I ran it under CONFIG_LIST_HARDENED=y condition.

Thank you for confirming.

> 
> >
> > >
> > > The cause of the above error is that list_add_tail() is executed
> > > repeatedly while executing damon_call(ctx, control)
> > > in damon_sysfs_damon_call().  The execution flow is summarized below:
> > >
> > > damon_sysfs_damon_call()
> > >         --> damon_call(ctx, control)
> > >                 list_add_tail(control, ctx->call_contols);
> > >                         --> /* list_add corruption error */
> > >                 if (!damon_is_running)
> > >                         return -EINVAL;
> > >
> > > If you execute damon_call() when damon_sysfs_kdamond_running() is true,
> > > you can prevent the error of duplicate execution of list_add_tail().
> >
> > The kdamond might be terminated between the damon_call() call and the
> > damon_is_running() check inside the damon_call() execution.  In the case, the
> > problem may still happen.
> >
> > The problem happens because damon_call() is not removing the damon_call_control
> > object before returning the error, right?  What about removing the object
> > before returning the error?
> 
> damon_call() is called after damon_start() --> kdamond_fn() is executed,
> This is a problem because damon_call() also occurs when kdamond is "off"
> only in damon/sysfs.  So, my first patch solved the problem, but the
> following also worked. I tested it.
> 
> And it seems better to keep the existing method of releasing
> damon_call_control.  Since the damon_call_control structure uses both
> static and kmalloc(),  it's appropriate to release it in kdamond_fn() according
> to the condition control->canceled && control->dealloc_on_cancel.
> 
> My previous suggestion regarding this:
> https://lore.kernel.org/damon/20251206224724.13832-1-rgbi3307@gmail.com/
> 
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index babad37719b6..2ead0bb3c462 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -1462,6 +1462,9 @@ bool damon_is_running(struct damon_ctx *ctx)
>   */
>  int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
>  {
> +       if (!damon_is_running(ctx))
> +               return -EINVAL;
> +
>         if (!control->repeat)
>                 init_completion(&control->completion);
>         control->canceled = false;
> @@ -1470,8 +1473,6 @@ int damon_call(struct damon_ctx *ctx, struct
> damon_call_control *control)
>         mutex_lock(&ctx->call_controls_lock);
>         list_add_tail(&control->list, &ctx->call_controls);
>         mutex_unlock(&ctx->call_controls_lock);
> -       if (!damon_is_running(ctx))
> -               return -EINVAL;
>         if (control->repeat)
>                 return 0;
>         wait_for_completion(&control->completion);

Let's assume DAMON is terminated between the damon_is_running() and
list_add_tail().  In the case, the control->fn() will never be called back.  If
control->repeat is false, this function will even inifnitely wait.

I think we should keep the damon_is_running() as is, but further check if it
was terminated without handling the control object, and remove it from the list
in the case.  Like below.

diff --git a/mm/damon/core.c b/mm/damon/core.c
index c3ae55b940cc..23d8602f5a49 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1649,6 +1649,35 @@ bool damon_is_running(struct damon_ctx *ctx)
        return running;
 }

+/*
+ * damon_call_handle_inactive_ctx() - handle DAMON call request that added to
+ *                                   an inactive context.
+ * @ctx:       The inactive DAMON context.
+ * @control:   Control variable of the call request.
+ *
+ * This function is called in a case that @control is added to @ctx but @ctx is
+ * not running (inactive).  See if @ctx handled @control or not, and cleanup
+ * @control if it was not handled.
+ *
+ * Returns 0 if @control was handled by @ctx, negative error code otherwise.
+ */
+static int damon_call_handle_inactive_ctx(
+               struct damon_ctx *ctx, struct damon_call_control *control)
+{
+       struct damon_call_control *c;
+
+       mutex_lock(&ctx->call_controls_lock);
+       list_for_each_entry(c, &ctx->call_controls, list) {
+               if (c == control) {
+                       list_del(&control->list);
+                       mutex_unlock(&ctx->call_controls_lock);
+                       return -EINVAL;
+               }
+       }
+       mutex_unlock(&ctx->call_controls_lock);
+       return 0;
+}
+
 /**
  * damon_call() - Invoke a given function on DAMON worker thread (kdamond).
  * @ctx:       DAMON context to call the function for.
@@ -1679,7 +1708,7 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
        list_add_tail(&control->list, &ctx->call_controls);
        mutex_unlock(&ctx->call_controls_lock);
        if (!damon_is_running(ctx))
-               return -EINVAL;
+               return damon_call_handle_inactive_ctx(ctx, control);
        if (control->repeat)
                return 0;
        wait_for_completion(&control->completion);

If you don't mind, I'll post the above diff as a patch, adding a 'Reported-by:'
tag for you.

> 
> >
> > >
> > > Signed-off-by: JaeJoon Jung <rgbi3307@gmail.com>
> >
> > Could you please also add Fixes: and Cc: stable@ ?
> 
> I don't have much experience with this, so I'm sorry,
> but could you please give me an example about this?

No problem.

Please refer to https://docs.kernel.org/process/stable-kernel-rules.html


Thanks,
SJ

[...]

next prev parent reply	other threads:[~2025-12-25 19:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-24  9:43 JaeJoon Jung
2025-12-25  0:32 ` SeongJae Park
2025-12-25  2:35   ` JaeJoon Jung
2025-12-25 19:49     ` SeongJae Park [this message]
2025-12-26  1:48       ` JaeJoon Jung
2025-12-26 18:41         ` SeongJae Park
2025-12-26 23:53           ` JaeJoon Jung
2025-12-27 17:42             ` SeongJae Park
2025-12-29  3:38               ` JaeJoon Jung
2025-12-29 15:14                 ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251225194959.937-1-sj@kernel.org \
    --to=sj@kernel.org \
    --cc=damon@lists.linux.dev \
    --cc=linux-mm@kvack.org \
    --cc=rgbi3307@gmail.com \
    --cc=rgbi3307@nate.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox