I found a bug in damos_walk() that leaves a dangling walk_control
pointer when called on an inactive context. The pattern is
structurally identical to the bug fixed in commit f9132fbc2e83
("mm/damon/core: remove call_control in inactive contexts") for
damon_call().
## Description
damos_walk() sets ctx->walk_control to point to a caller-provided
stack-allocated control structure (core.c line 1560), then checks
if the DAMON context is running (line 1562). If the context is
inactive, it returns -EINVAL (line 1563) WITHOUT clearing
ctx->walk_control back to NULL.
This leaves a dangling pointer. Every subsequent damos_walk() call
on that context sees the stale non-NULL pointer and returns -EBUSY,
which locks the DAMOS tried_regions interface until the context is
destroyed.
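For illustration, here is a minimal user-space C model of the control
flow described above. It is a sketch, not the kernel source: the names
walk_control_model, ctx_model and walk_model are hypothetical stand-ins,
the running flag stands in for the "is the context running" check, and
locking and the wait for kdamond are omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct damos_walk_control. */
struct walk_control_model {
	void (*walk_fn)(void *data);
	void *data;
};

/* Hypothetical, simplified stand-in for the DAMON context. */
struct ctx_model {
	bool running;				/* models "context is running" */
	struct walk_control_model *walk_control;
};

/* Models the order of operations reported above (assumed, simplified). */
int walk_model(struct ctx_model *ctx, struct walk_control_model *control)
{
	if (ctx->walk_control)
		return -EBUSY;		/* a request is already registered */

	ctx->walk_control = control;	/* pointer is published first */

	if (!ctx->running)
		return -EINVAL;		/* early return, pointer not cleared */

	/*
	 * In the kernel the caller would wait here for kdamond to finish
	 * the walk; the model simply unregisters the request.
	 */
	ctx->walk_control = NULL;
	return 0;
}
```

With a context whose running flag is false, the first walk_model() call
returns -EINVAL and leaves ctx->walk_control set, so a second call with
any other control object returns -EBUSY. In the real caller the control
object lives on the stack, so the pointer left behind is also dangling.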
## Affected versions
Introduced in: commit bf0eaba0ff9c ("mm/damon/core: implement damos_walk()")
First affected release: v6.14-rc1
Affected stable releases: v6.14, v6.15, v6.16, v6.17, v6.18, v6.19
Tested on: 6.19.0 (commit ca4ee40bf13d, QEMU/KVM x86_64)
Current mainline: UNFIXED
## Reproduction (confirmed on 6.19.0, CONFIG_DAMON=y CONFIG_DAMON_SYSFS=y)
DAMON=/sys/kernel/mm/damon/admin/kdamonds
# Setup context with scheme
echo 1 > $DAMON/nr_kdamonds
echo 1 > $DAMON/0/contexts/nr_contexts
echo vaddr > $DAMON/0/contexts/0/operations
echo 1 > $DAMON/0/contexts/0/targets/nr_targets
echo $$ > $DAMON/0/contexts/0/targets/0/pid_target
echo 1 > $DAMON/0/contexts/0/schemes/nr_schemes
echo stat > $DAMON/0/contexts/0/schemes/0/action
# Start then stop (ctx stays allocated per sysfs design)
echo on > $DAMON/0/state
sleep 1
echo off > $DAMON/0/state
sleep 1
# Trigger bug: damos_walk() on inactive context
echo "update_schemes_tried_regions" > $DAMON/0/state
# Returns -EINVAL, walk_control left dangling
# Confirm: second call gets -EBUSY (dangling pointer != NULL)
echo "update_schemes_tried_regions" > $DAMON/0/state
# Returns -EBUSY -- interface permanently locked
## Tested output
First call: -EINVAL (Invalid argument)
Second call: -EBUSY (Device or resource busy) <-- BUG confirmed
## Root cause
Commit bf0eaba0ff9c ("mm/damon/core: implement damos_walk()")
introduced this function without cleanup on the -EINVAL error path.
The sibling function damon_call() had the same bug. Commit
f9132fbc2e83 fixed it by adding damon_call_handle_inactive_ctx(),
which removes the control object when the context is inactive.
damos_walk() has no equivalent cleanup.
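One possible shape for an equivalent cleanup, sketched as a user-space
C model rather than a kernel patch. All names here (walk_control_model,
ctx_model, walk_handle_inactive_ctx_model, walk_model_fixed) are
hypothetical, locking is omitted, and the helper only mirrors the idea
of the damon_call() fix, not its actual code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct walk_control_model {
	void (*walk_fn)(void *data);
	void *data;
};

struct ctx_model {
	bool running;				/* models "context is running" */
	struct walk_control_model *walk_control;
};

/*
 * Hypothetical helper: unregister the request if it is still the one
 * registered, then report that the context is inactive.
 */
static int walk_handle_inactive_ctx_model(struct ctx_model *ctx,
					  struct walk_control_model *control)
{
	if (ctx->walk_control == control)
		ctx->walk_control = NULL;
	return -EINVAL;
}

/* Same flow as the reported one, with cleanup on the inactive path. */
int walk_model_fixed(struct ctx_model *ctx,
		     struct walk_control_model *control)
{
	if (ctx->walk_control)
		return -EBUSY;

	ctx->walk_control = control;

	if (!ctx->running)
		return walk_handle_inactive_ctx_model(ctx, control);

	/* The kernel would wait for kdamond here; the model just clears. */
	ctx->walk_control = NULL;
	return 0;
}
```

In this model, repeated calls on an inactive context keep returning
-EINVAL instead of switching to -EBUSY, and a later call on a running
context succeeds. A real fix would also have to perform the check and
the reset under the lock that protects ctx->walk_control.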
## Impact
1. PERMANENT LOCKUP (DoS): After the sequence on -> off ->
update_schemes_tried_regions, every later tried_regions query
returns -EBUSY until the DAMON context is destroyed.
2. DANGLING POINTER: ctx->walk_control points into a stack frame
that no longer exists. struct damos_walk_control contains a
function pointer (walk_fn). If a DAMON API consumer reuses the
same ctx after damos_walk() returns -EINVAL and kdamond is
restarted, kdamond would dereference the dangling pointer in
damos_walk_call_walk() (which calls control->walk_fn) or in
damos_walk_cancel().
Reported-by: Raul Pazemécxas <raul_pazemecxas@hotmail.com>
Best regards,
Raul