On 12/21/25 7:56 PM, Li Wang wrote:
On Mon, Dec 22, 2025 at 4:30 AM Waiman Long <llong@redhat.com> wrote:

On 12/21/25 7:26 AM, Li Wang wrote:
The hugetlb cgroup usage wait loops in charge_reserved_hugetlb.sh were
unbounded and could hang forever if the expected cgroup file value never
appears (e.g. when write_to_hugetlbfs fails with "Error mapping the file").

--- Error log ---
   # uname -r
   6.12.0-xxx.el10.aarch64+64k

   # ls /sys/kernel/mm/hugepages/hugepages-*
   hugepages-16777216kB/  hugepages-2048kB/  hugepages-524288kB/

   #./charge_reserved_hugetlb.sh -cgroup-v2
   # -----------------------------------------
   ...
   # nr hugepages = 10
   # writing cgroup limit: 5368709120
   # writing reseravation limit: 5368709120
   ...
   # write_to_hugetlbfs: Error mapping the file: Cannot allocate memory
   # Waiting for hugetlb memory reservation to reach size 2684354560.
   # 0
   # Waiting for hugetlb memory reservation to reach size 2684354560.
   # 0
   # Waiting for hugetlb memory reservation to reach size 2684354560.
   # 0
   # Waiting for hugetlb memory reservation to reach size 2684354560.
   # 0
   # Waiting for hugetlb memory reservation to reach size 2684354560.
   # 0
   # Waiting for hugetlb memory reservation to reach size 2684354560.
   # 0
   ...

Introduce a small helper, wait_for_file_value(), and use it for:
   - waiting for reservation usage to drop to 0,
   - waiting for reservation usage to reach a given size,
   - waiting for fault usage to reach a given size.

This makes the waits consistent and adds a hard timeout (60 tries with
1s sleep) so the test fails instead of stalling indefinitely.

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Waiman Long <longman@redhat.com>
---
  .../selftests/mm/charge_reserved_hugetlb.sh   | 51 +++++++++++--------
  1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
index fa6713892d82..447769657634 100755
--- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
@@ -100,7 +100,7 @@ function setup_cgroup() {
    echo writing cgroup limit: "$cgroup_limit"
    echo "$cgroup_limit" >$cgroup_path/$name/hugetlb.${MB}MB.$fault_limit_file

-  echo writing reseravation limit: "$reservation_limit"
+  echo writing reservation limit: "$reservation_limit"
    echo "$reservation_limit" > \
      $cgroup_path/$name/hugetlb.${MB}MB.$reservation_limit_file

@@ -112,41 +112,50 @@ function setup_cgroup() {
    fi
  }

+function wait_for_file_value() {
+  local path="$1"
+  local expect="$2"
+  local max_tries=60
+
+  if [[ ! -r "$path" ]]; then
+    echo "ERROR: cannot read '$path', missing or permission denied"
+    return 1
+  fi
+
+  for ((i=1; i<=max_tries; i++)); do
+    local cur="$(cat "$path")"
+    if [[ "$cur" == "$expect" ]]; then
+      return 0
+    fi
+    echo "Waiting for $path to become '$expect' (current: '$cur') (try $i/$max_tries)"
+    sleep 1
+  done
+
+  echo "ERROR: timeout waiting for $path to become '$expect'"
+  return 1
+}
+
  function wait_for_hugetlb_memory_to_get_depleted() {
    local cgroup="$1"
    local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
-  # Wait for hugetlbfs memory to get depleted.
-  while [ $(cat $path) != 0 ]; do
-    echo Waiting for hugetlb memory to get depleted.
-    cat $path
-    sleep 0.5
-  done
+
+  wait_for_file_value "$path" "0"
  }

  function wait_for_hugetlb_memory_to_get_reserved() {
    local cgroup="$1"
    local size="$2"
-
    local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
-  # Wait for hugetlbfs memory to get written.
-  while [ $(cat $path) != $size ]; do
-    echo Waiting for hugetlb memory reservation to reach size $size.
-    cat $path
-    sleep 0.5
-  done
+
+  wait_for_file_value "$path" "$size"
  }

  function wait_for_hugetlb_memory_to_get_written() {
    local cgroup="$1"
    local size="$2"
-
    local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$fault_usage_file"
-  # Wait for hugetlbfs memory to get written.
-  while [ $(cat $path) != $size ]; do
-    echo Waiting for hugetlb memory to reach size $size.
-    cat $path
-    sleep 0.5
-  done
+
+  wait_for_file_value "$path" "$size"
  }

  function write_hugetlbfs_and_get_usage() {
wait_for_file_value() now returns 0 on success and 1 on timeout.
However, none of the callers of the wait_for_hugetlb_memory* functions
check their return values and act accordingly. Are we expecting
that the test will show a failure because the wait didn't complete, or
should we explicitly exit with the ksft_fail (1) value?
Hmm, it seems the test shouldn't exit too early.

Since wait_for_hugetlb_memory* only polls the file value for up to 60s,
we still need to keep going after a timeout, because the test requires
CLEANUP work and exits/reports from there.

The key point of each subtest is to save the '$write_result' value and
examine it, which controls whether the whole test exits.

e.g.

This is an intentional error test:

# ./charge_reserved_hugetlb.sh -cgroup-v2
CLEANUP DONE
...
Writing to this path: /mnt/huge/test
Writing this size: 2684354560
Not populating.
Not writing to memory.
Using method=0
Shared mapping.
RESERVE mapping.
Allocating using HUGETLBFS.
write_to_hugetlbfs: Error mapping the file: Cannot allocate memory
Waiting for /sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current
to become '2684354560' (current: '0') (try 1/60)
Waiting for /sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current
to become '2684354560' (current: '0') (try 2/60)
Waiting for /sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current
to become '2684354560' (current: '0') (try 3/60)
Waiting for /sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current
to become '2684354560' (current: '0') (try 4/60)
...
Waiting for /sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current
to become '2684354560' (current: '0') (try 60/60)
ERROR: timeout waiting for
/sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current to
become '2684354560'
After write:
hugetlb_usage=0
reserved_usage=0
0
0
Memory charged to hugtlb=0
Memory charged to reservation=0
expected (2684354560) != actual (0): Reserved memory not charged to
reservation usage.
CLEANUP DONE

Thanks for running a test case. As long as the test still reports a failure, that is fine with me. I just want to note that the return value of wait_for_file_value() isn't currently used at all.
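
For illustration only, here is a minimal sketch of one way a caller could consume that return value without exiting early, so that cleanup still runs. It is not part of the patch. The wait_failed flag, the wait_or_note_failure() wrapper, the extra max_tries/interval parameters and the temporary file are all hypothetical and exist only to keep the example short and runnable outside a cgroup.

```bash
#!/bin/bash
# Hypothetical sketch, not the patch: only wait_for_file_value() mirrors
# the helper added above, and its 3rd/4th parameters are assumptions.

wait_for_file_value() {
  local path="$1" expect="$2" max_tries="${3:-60}" interval="${4:-1}"
  local i cur

  [[ -r "$path" ]] || return 1

  for ((i = 1; i <= max_tries; i++)); do
    cur="$(cat "$path")"
    [[ "$cur" == "$expect" ]] && return 0
    sleep "$interval"
  done

  return 1
}

wait_failed=0   # hypothetical flag recording that some wait timed out

# Note a timeout instead of exiting, so later cleanup code still runs.
wait_or_note_failure() {
  if ! wait_for_file_value "$@"; then
    echo "NOTE: wait did not complete for $1"
    wait_failed=1
  fi
}

# A temporary file stands in for a cgroup usage file.
usage_file="$(mktemp)"
echo 0 > "$usage_file"

wait_or_note_failure "$usage_file" 0 2 0.1            # matches at once
wait_or_note_failure "$usage_file" 2684354560 2 0.1   # never matches

rm -f "$usage_file"   # stand-in for the script's cleanup step
echo "wait_failed=$wait_failed"
```

With a wrapper like this, a subtest could look at the flag after cleanup and fold it into its existing result check, rather than relying only on the later usage comparison.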

Cheers, Longman