[PATCH 0/5] selftests/damon: improve leak detection and wss estimation reliability


* [PATCH 0/5] selftests/damon: improve leak detection and wss estimation reliability
@ 2026-01-17  2:07 SeongJae Park
  2026-01-17  2:07 ` [PATCH 1/5] selftests/damon/sysfs_memcg_path_leak.sh: use kmemleak SeongJae Park
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: SeongJae Park @ 2026-01-17  2:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Shuah Khan, damon, linux-kernel, linux-kselftest,
	linux-mm

Two DAMON selftests, namely 'sysfs_memcg_path_leak' and
'sysfs_update_schemes_tried_regions_wss_estimation', frequently show
intermittent failures due to their unreliable leak detection and working
set size estimation.  Make those more reliable.

SeongJae Park (5):
  selftests/damon/sysfs_memcg_path_leak.sh: use kmemleak
  selftests/damon/wss_estimation: test for up to 160 MiB working set
    size
  selftests/damon/access_memory: add repeat mode
  selftests/damon/wss_estimation: ensure number of collected wss
  selftests/damon/wss_estimation: deduplicate failed samples output

 tools/testing/selftests/damon/access_memory.c | 29 +++++++++----
 .../selftests/damon/sysfs_memcg_path_leak.sh  | 26 ++++++------
 ...te_schemes_tried_regions_wss_estimation.py | 41 +++++++++++++++----
 3 files changed, 67 insertions(+), 29 deletions(-)


base-commit: 3944e89e2ad1bafac43daae60b56d2847227ab01
-- 
2.47.3



* [PATCH 1/5] selftests/damon/sysfs_memcg_path_leak.sh: use kmemleak
  2026-01-17  2:07 [PATCH 0/5] selftests/damon: improve leak detection and wss estimation reliability SeongJae Park
@ 2026-01-17  2:07 ` SeongJae Park
  2026-01-17  2:07 ` [PATCH 2/5] selftests/damon/wss_estimation: test for up to 160 MiB working set size SeongJae Park
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: SeongJae Park @ 2026-01-17  2:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Shuah Khan, damon, linux-kernel, linux-kselftest,
	linux-mm

sysfs_memcg_path_leak.sh determines whether a memory leak has happened
by checking whether the Slab size in /proc/meminfo increases more than
expected after an action.  Depending on the system and background
workloads, the reasonable expectation varies.  For this reason, the test
frequently shows intermittent failures.  Use kmemleak, which is much
more reliable and correct, instead.
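
For illustration only, the kmemleak-based check can be sketched in
Python as below.  This is not part of the patch.  The function name
kmemleak_report() is hypothetical, and the sketch assumes debugfs is
mounted at /sys/kernel/debug and that the caller is privileged enough
to read and write the kmemleak file.

```python
import os

# Same path the shell test uses.
KMEMLEAK = '/sys/kernel/debug/kmemleak'


def kmemleak_report(path=KMEMLEAK):
    """Return None if kmemleak is unavailable, else the report text.

    An empty string means the scan reported no suspected leak.
    """
    if not os.path.isfile(path):
        # kmemleak is not built in or debugfs is not mounted; the
        # caller should skip the test rather than fail it.
        return None
    with open(path, 'w') as f:
        f.write('scan')  # request an immediate scan
    with open(path) as f:
        return f.read()
```

A caller would treat None as "skip", an empty string as "pass", and
anything else as a suspected leak to print.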

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 .../selftests/damon/sysfs_memcg_path_leak.sh  | 26 ++++++++++---------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/damon/sysfs_memcg_path_leak.sh b/tools/testing/selftests/damon/sysfs_memcg_path_leak.sh
index 64c5d8c518a4..33a7ff43ed6c 100755
--- a/tools/testing/selftests/damon/sysfs_memcg_path_leak.sh
+++ b/tools/testing/selftests/damon/sysfs_memcg_path_leak.sh
@@ -14,6 +14,13 @@ then
 	exit $ksft_skip
 fi
 
+kmemleak="/sys/kernel/debug/kmemleak"
+if [ ! -f "$kmemleak" ]
+then
+	echo "$kmemleak not found"
+	exit $ksft_skip
+fi
+
 # ensure filter directory
 echo 1 > "$damon_sysfs/kdamonds/nr_kdamonds"
 echo 1 > "$damon_sysfs/kdamonds/0/contexts/nr_contexts"
@@ -22,22 +29,17 @@ echo 1 > "$damon_sysfs/kdamonds/0/contexts/0/schemes/0/filters/nr_filters"
 
 filter_dir="$damon_sysfs/kdamonds/0/contexts/0/schemes/0/filters/0"
 
-before_kb=$(grep Slab /proc/meminfo | awk '{print $2}')
-
-# try to leak 3000 KiB
-for i in {1..102400};
+# try to leak 128 times
+for i in {1..128};
 do
 	echo "012345678901234567890123456789" > "$filter_dir/memcg_path"
 done
 
-after_kb=$(grep Slab /proc/meminfo | awk '{print $2}')
-# expect up to 1500 KiB free from other tasks memory
-expected_after_kb_max=$((before_kb + 1500))
-
-if [ "$after_kb" -gt "$expected_after_kb_max" ]
+echo scan > "$kmemleak"
+kmemleak_report=$(cat "$kmemleak")
+if [ "$kmemleak_report" = "" ]
 then
-	echo "maybe memcg_path are leaking: $before_kb -> $after_kb"
-	exit 1
-else
 	exit 0
 fi
+echo "$kmemleak_report"
+exit 1
-- 
2.47.3



* [PATCH 2/5] selftests/damon/wss_estimation: test for up to 160 MiB working set size
  2026-01-17  2:07 [PATCH 0/5] selftests/damon: improve leak detection and wss estimation reliability SeongJae Park
  2026-01-17  2:07 ` [PATCH 1/5] selftests/damon/sysfs_memcg_path_leak.sh: use kmemleak SeongJae Park
@ 2026-01-17  2:07 ` SeongJae Park
  2026-01-17  2:07 ` [PATCH 3/5] selftests/damon/access_memory: add repeat mode SeongJae Park
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: SeongJae Park @ 2026-01-17  2:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Shuah Khan, damon, linux-kernel, linux-kselftest,
	linux-mm

DAMON reads and writes Accessed bits of page tables without manual TLB
flush for two reasons.  First, it minimizes the overhead.  Second, real
systems that need DAMON are expected to be memory intensive enough to
cause periodic TLB flushes.  For test setups that use small test
workloads, however, the system's TLB could be big enough to cover all
or most accesses of the test workload.  In this case, no page table walk
happens and DAMON cannot see any access from the test workload.

The test workload for DAMON's working set size estimation selftest is
such a case.  It accesses only a 10 MiB working set, and it turned out
that some test setups have TLBs large enough to cover the 10 MiB data
accesses.  As a result, the test fails depending on the test machine.

Make it more reliable by retrying with a larger working set when it
fails, doubling the size from 10 MiB up to 160 MiB.
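
As a rough sketch of the retry schedule (not the patch itself), the
doubling from 10 MiB up to 160 MiB can be written as below.  The names
candidate_sizes_mib() and first_passing_size() are hypothetical, and
'passes' stands in for one run of the estimation test.

```python
def candidate_sizes_mib(start=10, limit=160):
    """Yield working set sizes in MiB, doubling from start up to limit."""
    size = start
    while size <= limit:
        yield size
        size *= 2


def first_passing_size(passes, start=10, limit=160):
    """Return the first size in MiB for which passes(bytes) is true.

    Returns None if every candidate size fails.
    """
    for size in candidate_sizes_mib(start, limit):
        if passes(size * 1024 * 1024):
            return size
    return None


print(list(candidate_sizes_mib()))  # [10, 20, 40, 80, 160]
```

So the test runs at most five times before reporting failure.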

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 ...te_schemes_tried_regions_wss_estimation.py | 29 +++++++++++++++----
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py b/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
index 90ad7409a7a6..bf48ef8e5241 100755
--- a/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
+++ b/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
@@ -6,9 +6,8 @@ import time
 
 import _damon_sysfs
 
-def main():
-    # access two 10 MiB memory regions, 2 second per each
-    sz_region = 10 * 1024 * 1024
+def pass_wss_estimation(sz_region):
+    # access two regions of given size, 2 seocnds per each region
     proc = subprocess.Popen(['./access_memory', '2', '%d' % sz_region, '2000'])
     kdamonds = _damon_sysfs.Kdamonds([_damon_sysfs.Kdamond(
             contexts=[_damon_sysfs.DamonCtx(
@@ -36,20 +35,38 @@ def main():
 
         wss_collected.append(
                 kdamonds.kdamonds[0].contexts[0].schemes[0].tried_bytes)
+    err = kdamonds.stop()
+    if err is not None:
+        print('kdamond stop failed: %s' % err)
+        exit(1)
 
     wss_collected.sort()
     acceptable_error_rate = 0.2
     for percentile in [50, 75]:
         sample = wss_collected[int(len(wss_collected) * percentile / 100)]
         error_rate = abs(sample - sz_region) / sz_region
-        print('%d-th percentile (%d) error %f' %
-                (percentile, sample, error_rate))
+        print('%d-th percentile error %f (expect %d, result %d)' %
+                (percentile, error_rate, sz_region, sample))
         if error_rate > acceptable_error_rate:
             print('the error rate is not acceptable (> %f)' %
                     acceptable_error_rate)
             print('samples are as below')
             print('\n'.join(['%d' % wss for wss in wss_collected]))
-            exit(1)
+            return False
+    return True
+
+def main():
+    # DAMON doesn't flush TLB.  If the system has large TLB that can cover
+    # whole test working set, DAMON cannot see the access.  Test up to 160 MiB
+    # test working set.
+    sz_region_mb = 10
+    max_sz_region_mb = 160
+    while sz_region_mb <= max_sz_region_mb:
+        test_pass = pass_wss_estimation(sz_region_mb * 1024 * 1024)
+        if test_pass is True:
+            exit(0)
+        sz_region_mb *= 2
+    exit(1)
 
 if __name__ == '__main__':
     main()
-- 
2.47.3



* [PATCH 3/5] selftests/damon/access_memory: add repeat mode
  2026-01-17  2:07 [PATCH 0/5] selftests/damon: improve leak detection and wss estimation reliability SeongJae Park
  2026-01-17  2:07 ` [PATCH 1/5] selftests/damon/sysfs_memcg_path_leak.sh: use kmemleak SeongJae Park
  2026-01-17  2:07 ` [PATCH 2/5] selftests/damon/wss_estimation: test for up to 160 MiB working set size SeongJae Park
@ 2026-01-17  2:07 ` SeongJae Park
  2026-01-17  2:07 ` [PATCH 4/5] selftests/damon/wss_estimation: ensure number of collected wss SeongJae Park
  2026-01-17  2:07 ` [PATCH 5/5] selftests/damon/wss_estimation: deduplicate failed samples output SeongJae Park
  4 siblings, 0 replies; 6+ messages in thread
From: SeongJae Park @ 2026-01-17  2:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Shuah Khan, damon, linux-kernel, linux-kselftest,
	linux-mm

'access_memory' is an artificial memory access generator program that is
used by a few DAMON selftests.  It accesses a given number of regions
one by one, only once, and exits.  Depending on the system, the test
workload may exit sooner than expected, making the tests unreliable.
For reliable control of the artificial memory access pattern, add a
'repeat' mode that makes it keep repeating the access pattern.
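
The control flow of the repeat mode can be sketched in Python as below.
This only mirrors the loop structure and is not the C program.  The
'max_rounds' parameter is an assumption added so the sketch can
terminate; the real program in repeat mode runs until it is killed.

```python
import time


def access_regions(nr_regions, sz_region, access_time_ms,
                   repeat=False, max_rounds=None):
    """Touch each region for access_time_ms of CPU time, in order.

    Returns the number of completed rounds over all regions.
    """
    regions = [bytearray(sz_region) for _ in range(nr_regions)]
    rounds = 0
    while True:
        for i, region in enumerate(regions):
            start = time.process_time()
            # Keep rewriting the region until the CPU time budget is used.
            while (time.process_time() - start) * 1000 < access_time_ms:
                region[:] = bytes([i % 256]) * sz_region
        rounds += 1
        # Like a do-while loop: the regions are always visited once.
        if not repeat or (max_rounds is not None and rounds >= max_rounds):
            return rounds
```

For example, access_regions(2, 4096, 1) visits the two regions once,
while passing repeat=True keeps cycling through them.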

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 tools/testing/selftests/damon/access_memory.c | 29 ++++++++++++++-----
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/damon/access_memory.c b/tools/testing/selftests/damon/access_memory.c
index 56b17e8fe1be..567793b11107 100644
--- a/tools/testing/selftests/damon/access_memory.c
+++ b/tools/testing/selftests/damon/access_memory.c
@@ -8,6 +8,11 @@
 #include <string.h>
 #include <time.h>
 
+enum access_mode {
+	ACCESS_MODE_ONCE,
+	ACCESS_MODE_REPEAT,
+};
+
 int main(int argc, char *argv[])
 {
 	char **regions;
@@ -15,10 +20,12 @@ int main(int argc, char *argv[])
 	int nr_regions;
 	int sz_region;
 	int access_time_ms;
+	enum access_mode mode = ACCESS_MODE_ONCE;
+
 	int i;
 
-	if (argc != 4) {
-		printf("Usage: %s <number> <size (bytes)> <time (ms)>\n",
+	if (argc < 4) {
+		printf("Usage: %s <number> <size (bytes)> <time (ms)> [mode]\n",
 				argv[0]);
 		return -1;
 	}
@@ -27,15 +34,21 @@ int main(int argc, char *argv[])
 	sz_region = atoi(argv[2]);
 	access_time_ms = atoi(argv[3]);
 
+	if (argc > 4 && !strcmp(argv[4], "repeat"))
+		mode = ACCESS_MODE_REPEAT;
+
 	regions = malloc(sizeof(*regions) * nr_regions);
 	for (i = 0; i < nr_regions; i++)
 		regions[i] = malloc(sz_region);
 
-	for (i = 0; i < nr_regions; i++) {
-		start_clock = clock();
-		while ((clock() - start_clock) * 1000 / CLOCKS_PER_SEC <
-				access_time_ms)
-			memset(regions[i], i, sz_region);
-	}
+	do {
+		for (i = 0; i < nr_regions; i++) {
+			start_clock = clock();
+			while ((clock() - start_clock) * 1000 / CLOCKS_PER_SEC
+					< access_time_ms)
+				memset(regions[i], i, sz_region);
+		}
+	} while (mode == ACCESS_MODE_REPEAT);
+
 	return 0;
 }
-- 
2.47.3



* [PATCH 4/5] selftests/damon/wss_estimation: ensure number of collected wss
  2026-01-17  2:07 [PATCH 0/5] selftests/damon: improve leak detection and wss estimation reliability SeongJae Park
                   ` (2 preceding siblings ...)
  2026-01-17  2:07 ` [PATCH 3/5] selftests/damon/access_memory: add repeat mode SeongJae Park
@ 2026-01-17  2:07 ` SeongJae Park
  2026-01-17  2:07 ` [PATCH 5/5] selftests/damon/wss_estimation: deduplicate failed samples output SeongJae Park
  4 siblings, 0 replies; 6+ messages in thread
From: SeongJae Park @ 2026-01-17  2:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Shuah Khan, damon, linux-kernel, linux-kselftest,
	linux-mm

The DAMON selftest for working set size estimation collects DAMON's
working set size measurements of the running artificial memory access
generator program until the program finishes.  Depending on how quickly
the program finishes and how quickly DAMON starts, the number of
collected working set size measurements may vary, making the test
results unreliable.  Ensure it collects 40 measurements by using the
repeat mode of the artificial memory access generator program, and
finish measuring only after the desired number of measurements is
collected.
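
A minimal sketch of the bounded collection loop, with the DAMON and
subprocess details replaced by callables, is below.  collect_samples()
and its parameters are hypothetical names used only for illustration.

```python
def collect_samples(is_running, read_sample, nr_wanted=40,
                    wait=lambda: None):
    """Collect samples until nr_wanted are gathered or the workload exits.

    is_running() reports whether the workload is still alive,
    read_sample() returns one measurement, and wait() paces the loop.
    """
    samples = []
    while is_running() and len(samples) < nr_wanted:
        wait()
        samples.append(read_sample())
    return samples


# A workload that never exits yields exactly nr_wanted samples.
counter = iter(range(1000))
print(len(collect_samples(lambda: True, lambda: next(counter))))  # 40
```

With a workload in repeat mode, is_running() stays true, so the sample
count is decided by nr_wanted alone rather than by timing.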

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 .../sysfs_update_schemes_tried_regions_wss_estimation.py    | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py b/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
index bf48ef8e5241..cdccb9f0f855 100755
--- a/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
+++ b/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
@@ -8,7 +8,8 @@ import _damon_sysfs
 
 def pass_wss_estimation(sz_region):
     # access two regions of given size, 2 seocnds per each region
-    proc = subprocess.Popen(['./access_memory', '2', '%d' % sz_region, '2000'])
+    proc = subprocess.Popen(
+            ['./access_memory', '2', '%d' % sz_region, '2000', 'repeat'])
     kdamonds = _damon_sysfs.Kdamonds([_damon_sysfs.Kdamond(
             contexts=[_damon_sysfs.DamonCtx(
                 ops='vaddr',
@@ -26,7 +27,7 @@ def pass_wss_estimation(sz_region):
         exit(1)
 
     wss_collected = []
-    while proc.poll() == None:
+    while proc.poll() is None and len(wss_collected) < 40:
         time.sleep(0.1)
         err = kdamonds.kdamonds[0].update_schemes_tried_bytes()
         if err != None:
@@ -35,6 +36,7 @@ def pass_wss_estimation(sz_region):
 
         wss_collected.append(
                 kdamonds.kdamonds[0].contexts[0].schemes[0].tried_bytes)
+    proc.terminate()
     err = kdamonds.stop()
     if err is not None:
         print('kdamond stop failed: %s' % err)
-- 
2.47.3



* [PATCH 5/5] selftests/damon/wss_estimation: deduplicate failed samples output
  2026-01-17  2:07 [PATCH 0/5] selftests/damon: improve leak detection and wss estimation reliability SeongJae Park
                   ` (3 preceding siblings ...)
  2026-01-17  2:07 ` [PATCH 4/5] selftests/damon/wss_estimation: ensure number of collected wss SeongJae Park
@ 2026-01-17  2:07 ` SeongJae Park
  4 siblings, 0 replies; 6+ messages in thread
From: SeongJae Park @ 2026-01-17  2:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Shuah Khan, damon, linux-kernel, linux-kselftest,
	linux-mm

When the test fails, it prints all sampled working set size
measurements.  The purpose is to show the distribution of the measured
values, to let the tester know if it was just an intermittent failure.
Printing the same value multiple times is therefore unnecessary.  It was
not a big deal since the test could fail only once per run in the past.
But the test can now fail multiple times with increasing working set
sizes, until it passes or the working set size reaches the limit.  Hence
the noisy output can be quite long and annoying.  Print only the
deduplicated distribution information.
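
As a small standalone illustration of the deduplicated output, the
sketch below keeps only the last entry of each run of equal values in
a sorted list.  The helper name dedup_sorted() is hypothetical, and
the sample values are made up for the example.

```python
def dedup_sorted(samples):
    """Return 'index/total: value' lines, one per distinct value.

    'samples' must be sorted.  For each run of equal values only the
    last index is kept, so the indexes still show the distribution.
    """
    lines = []
    for idx, wss in enumerate(samples):
        if idx < len(samples) - 1 and samples[idx + 1] == wss:
            continue  # a later sample has the same value; skip this one
        lines.append('%d/%d: %d' % (idx, len(samples), wss))
    return lines


print('\n'.join(dedup_sorted([0, 0, 0, 10, 10, 12])))
# 2/6: 0
# 4/6: 10
# 5/6: 12
```

Each printed index is the position of the last sample holding that
value, so it shows how many samples are at or below it.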

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 .../sysfs_update_schemes_tried_regions_wss_estimation.py    | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py b/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
index cdccb9f0f855..35c724a63f6c 100755
--- a/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
+++ b/tools/testing/selftests/damon/sysfs_update_schemes_tried_regions_wss_estimation.py
@@ -53,7 +53,11 @@ def pass_wss_estimation(sz_region):
             print('the error rate is not acceptable (> %f)' %
                     acceptable_error_rate)
             print('samples are as below')
-            print('\n'.join(['%d' % wss for wss in wss_collected]))
+            for idx, wss in enumerate(wss_collected):
+                if idx < len(wss_collected) - 1 and \
+                        wss_collected[idx + 1] == wss:
+                    continue
+                print('%d/%d: %d' % (idx, len(wss_collected), wss))
             return False
     return True
 
-- 
2.47.3

