On 12/21/25 7:56 PM, Li Wang wrote:
> On Mon, Dec 22, 2025 at 4:30 AM Waiman Long wrote:
>>
>> On 12/21/25 7:26 AM, Li Wang wrote:
>>> The hugetlb cgroup usage wait loops in charge_reserved_hugetlb.sh were
>>> unbounded and could hang forever if the expected cgroup file value never
>>> appears (e.g. due to write_to_hugetlbfs in Error mapping).
>>>
>>> --- Error log ---
>>> # uname -r
>>> 6.12.0-xxx.el10.aarch64+64k
>>>
>>> # ls /sys/kernel/mm/hugepages/hugepages-*
>>> hugepages-16777216kB/ hugepages-2048kB/ hugepages-524288kB/
>>>
>>> #./charge_reserved_hugetlb.sh -cgroup-v2
>>> # -----------------------------------------
>>> ...
>>> # nr hugepages = 10
>>> # writing cgroup limit: 5368709120
>>> # writing reseravation limit: 5368709120
>>> ...
>>> # write_to_hugetlbfs: Error mapping the file: Cannot allocate memory
>>> # Waiting for hugetlb memory reservation to reach size 2684354560.
>>> # 0
>>> # Waiting for hugetlb memory reservation to reach size 2684354560.
>>> # 0
>>> # Waiting for hugetlb memory reservation to reach size 2684354560.
>>> # 0
>>> # Waiting for hugetlb memory reservation to reach size 2684354560.
>>> # 0
>>> # Waiting for hugetlb memory reservation to reach size 2684354560.
>>> # 0
>>> # Waiting for hugetlb memory reservation to reach size 2684354560.
>>> # 0
>>> ...
>>>
>>> Introduce a small helper, wait_for_file_value(), and use it for:
>>> - waiting for reservation usage to drop to 0,
>>> - waiting for reservation usage to reach a given size,
>>> - waiting for fault usage to reach a given size.
>>>
>>> This makes the waits consistent and adds a hard timeout (60 tries with
>>> 1s sleep) so the test fails instead of stalling indefinitely.
>>>
>>> Signed-off-by: Li Wang
>>> Cc: David Hildenbrand
>>> Cc: Mark Brown
>>> Cc: Shuah Khan
>>> Cc: Waiman Long
>>> ---
>>>  .../selftests/mm/charge_reserved_hugetlb.sh | 51 +++++++++++--------
>>>  1 file changed, 30 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
>>> index fa6713892d82..447769657634 100755
>>> --- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
>>> +++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
>>> @@ -100,7 +100,7 @@ function setup_cgroup() {
>>>    echo writing cgroup limit: "$cgroup_limit"
>>>    echo "$cgroup_limit" >$cgroup_path/$name/hugetlb.${MB}MB.$fault_limit_file
>>>
>>> -  echo writing reseravation limit: "$reservation_limit"
>>> +  echo writing reservation limit: "$reservation_limit"
>>>    echo "$reservation_limit" > \
>>>      $cgroup_path/$name/hugetlb.${MB}MB.$reservation_limit_file
>>>
>>> @@ -112,41 +112,50 @@ function setup_cgroup() {
>>>    fi
>>>  }
>>>
>>> +function wait_for_file_value() {
>>> +  local path="$1"
>>> +  local expect="$2"
>>> +  local max_tries=60
>>> +
>>> +  if [[ ! -r "$path" ]]; then
>>> +    echo "ERROR: cannot read '$path', missing or permission denied"
>>> +    return 1
>>> +  fi
>>> +
>>> +  for ((i=1; i<=max_tries; i++)); do
>>> +    local cur="$(cat "$path")"
>>> +    if [[ "$cur" == "$expect" ]]; then
>>> +      return 0
>>> +    fi
>>> +    echo "Waiting for $path to become '$expect' (current: '$cur') (try $i/$max_tries)"
>>> +    sleep 1
>>> +  done
>>> +
>>> +  echo "ERROR: timeout waiting for $path to become '$expect'"
>>> +  return 1
>>> +}
>>> +
>>>  function wait_for_hugetlb_memory_to_get_depleted() {
>>>    local cgroup="$1"
>>>    local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
>>> -  # Wait for hugetlbfs memory to get depleted.
>>> -  while [ $(cat $path) != 0 ]; do
>>> -    echo Waiting for hugetlb memory to get depleted.
>>> -    cat $path
>>> -    sleep 0.5
>>> -  done
>>> +
>>> +  wait_for_file_value "$path" "0"
>>>  }
>>>
>>>  function wait_for_hugetlb_memory_to_get_reserved() {
>>>    local cgroup="$1"
>>>    local size="$2"
>>> -
>>>    local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
>>> -  # Wait for hugetlbfs memory to get written.
>>> -  while [ $(cat $path) != $size ]; do
>>> -    echo Waiting for hugetlb memory reservation to reach size $size.
>>> -    cat $path
>>> -    sleep 0.5
>>> -  done
>>> +
>>> +  wait_for_file_value "$path" "$size"
>>>  }
>>>
>>>  function wait_for_hugetlb_memory_to_get_written() {
>>>    local cgroup="$1"
>>>    local size="$2"
>>> -
>>>    local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$fault_usage_file"
>>> -  # Wait for hugetlbfs memory to get written.
>>> -  while [ $(cat $path) != $size ]; do
>>> -    echo Waiting for hugetlb memory to reach size $size.
>>> -    cat $path
>>> -    sleep 0.5
>>> -  done
>>> +
>>> +  wait_for_file_value "$path" "$size"
>>>  }
>>>
>>>  function write_hugetlbfs_and_get_usage() {
>> wait_for_file_value() now returns 0 on success and 1 on timeout.
>> However, none of the callers of the wait_for_hugetlb_memory* are
>> checking their return values and acting accordingly. Are we expecting
>> that the test will show failure because the waiting isn't completed or
>> should we explicitly exit with ksft_fail (1) value?
> Hmm, it seems the test shouldn't exit too early.
>
> As the wait_for_hugetlb_memory* helpers only poll the file value for
> up to 60s, if they time out we still need to keep going, because the
> test requires CLEANUP work and exits/reports from there.
>
> The key point of each subtest is to save the '$write_result' value and
> examine it, which controls whether the whole test exits.
>
> e.g.
>
> This is an intentional error test:
>
> # ./charge_reserved_hugetlb.sh -cgroup-v2
> CLEANUP DONE
> ...
> Writing to this path: /mnt/huge/test
> Writing this size: 2684354560
> Not populating.
> Not writing to memory.
> Using method=0
> Shared mapping.
> RESERVE mapping.
> Allocating using HUGETLBFS.
> write_to_hugetlbfs: Error mapping the file: Cannot allocate memory
> Waiting for /sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current
> to become '2684354560' (current: '0') (try 1/60)
> Waiting for /sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current
> to become '2684354560' (current: '0') (try 2/60)
> Waiting for /sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current
> to become '2684354560' (current: '0') (try 3/60)
> Waiting for /sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current
> to become '2684354560' (current: '0') (try 4/60)
> ...
> Waiting for /sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current
> to become '2684354560' (current: '0') (try 60/60)
> ERROR: timeout waiting for
> /sys/fs/cgroup/hugetlb_cgroup_test/hugetlb.512MB.rsvd.current to
> become '2684354560'
> After write:
> hugetlb_usage=0
> reserved_usage=0
> 0
> 0
> Memory charged to hugtlb=0
> Memory charged to reservation=0
> expected (2684354560) != actual (0): Reserved memory not charged to
> reservation usage.
> CLEANUP DONE

Thanks for running a test case. As long as the test still reports a
failure, that is fine with me. I just want to note that the return
value of wait_for_file_value() isn't currently used at all.

Cheers,
Longman
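
For illustration only, here is a minimal sketch of one way the return
value could be consumed without exiting before cleanup. It is not part
of the patch. The optional third and fourth arguments (retry count and
sleep interval), the wait_failed variable, and the check_wait wrapper
are all hypothetical names, and the sketch uses plain POSIX test syntax
rather than the script's [[ ]] so that it runs under any sh.

```shell
# Sketch only: same polling logic as the helper in the patch, with the
# retry count and sleep interval made optional arguments (hypothetical).
wait_for_file_value() {
	local path="$1"
	local expect="$2"
	local max_tries="${3:-60}"   # hypothetical; the patch hard-codes 60
	local interval="${4:-1}"     # hypothetical; the patch hard-codes 1s
	local i=1
	local cur

	if [ ! -r "$path" ]; then
		echo "ERROR: cannot read '$path', missing or permission denied"
		return 1
	fi

	while [ "$i" -le "$max_tries" ]; do
		cur="$(cat "$path")"
		if [ "$cur" = "$expect" ]; then
			return 0
		fi
		echo "Waiting for $path to become '$expect' (current: '$cur') (try $i/$max_tries)"
		sleep "$interval"
		i=$((i + 1))
	done

	echo "ERROR: timeout waiting for $path to become '$expect'"
	return 1
}

# Hypothetical wrapper: remember that a wait timed out instead of
# exiting, so the caller can still run cleanup and report afterwards.
wait_failed=0
check_wait() {
	if ! wait_for_file_value "$@"; then
		wait_failed=1
	fi
}
```

A subtest could then call check_wait in place of wait_for_file_value and
test "$wait_failed" next to its existing expected-vs-actual comparison,
which keeps the CLEANUP path intact while making the timeout visible in
the result.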