linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/2] memcg: Fix test_memcg_min/low test failures
@ 2025-04-15 21:04 Waiman Long
  2025-04-15 21:04 ` [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on Waiman Long
  2025-04-15 21:04 ` [PATCH v7 2/2] selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection() Waiman Long
  0 siblings, 2 replies; 9+ messages in thread
From: Waiman Long @ 2025-04-15 21:04 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Tejun Heo, Michal Koutný,
	Shuah Khan
  Cc: linux-kernel, cgroups, linux-mm, linux-kselftest, Waiman Long

v7:
 - Skip the vmscan change as the mem_cgroup_usage() check for now as
   it is currently redundant.

v6:
 - The memcg_test_low failure is indeed due to the memory_recursiveprot
   mount option which is enabled by default in systemd cgroup v2 setting.
   So adopt Michal's suggestion to adjust the low event checking
   according to whether memory_recursiveprot is enabled or not.

v5:
 - Use mem_cgroup_usage() in patch 1 as originally suggested by Johannes.

The test_memcontrol selftest consistently fails its test_memcg_low
sub-test (with memory_recursiveprot enabled) and sporadically fails
its test_memcg_min sub-test. This patchset fixes the test_memcg_min
and test_memcg_low failures by adjusting the test_memcontrol selftest
to fix these test failures.

Waiman Long (2):
  selftests: memcg: Allow low event with no memory.low and
    memory_recursiveprot on
  selftests: memcg: Increase error tolerance of child memory.current
    check in test_memcg_protection()

 .../selftests/cgroup/test_memcontrol.c        | 20 ++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

-- 
2.49.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on
  2025-04-15 21:04 [PATCH v7 0/2] memcg: Fix test_memcg_min/low test failures Waiman Long
@ 2025-04-15 21:04 ` Waiman Long
  2025-04-16  9:25   ` Michal Koutný
  2025-04-15 21:04 ` [PATCH v7 2/2] selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection() Waiman Long
  1 sibling, 1 reply; 9+ messages in thread
From: Waiman Long @ 2025-04-15 21:04 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Tejun Heo, Michal Koutný,
	Shuah Khan
  Cc: linux-kernel, cgroups, linux-mm, linux-kselftest, Waiman Long

The test_memcontrol selftest consistently fails its test_memcg_low
sub-test due to the fact that its 3rd test child cgroup which
have a memmory.low of 0 have low event count. This happens when
memory_recursiveprot mount option is enabled which is the default
setting used by systemd to mount cgroup2 filesystem.

Modify the test_memcontrol.c to allow non-zero low event count in this
particular case with memory_recursiveprot on.

With this patch applied, the test_memcg_low sub-test finishes
successfully without failure in most cases. Though both test_memcg_low
and test_memcg_min sub-tests may still fail occasionally if the
memory.current values fall outside of the expected ranges.

The 4th test child cgroup has no memory usage and so has an effective
low of 0. It has no low event count because the mem_cgroup_below_low()
check in shrink_node_memcgs() is skipped as mem_cgroup_below_min()
returns true. If we ever change mem_cgroup_below_min() in such a way
that it no longer skips the no usage case, we will have to add code to
explicitly skip it.

Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 tools/testing/selftests/cgroup/test_memcontrol.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 16f5d74ae762..5a5dcbe57b56 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -380,10 +380,10 @@ static bool reclaim_until(const char *memcg, long goal);
  *
  * Then it checks actual memory usages and expects that:
  * A/B    memory.current ~= 50M
- * A/B/C  memory.current ~= 29M
- * A/B/D  memory.current ~= 21M
- * A/B/E  memory.current ~= 0
- * A/B/F  memory.current  = 0
+ * A/B/C  memory.current ~= 29M [memory.events:low > 0]
+ * A/B/D  memory.current ~= 21M [memory.events:low > 0]
+ * A/B/E  memory.current ~= 0   [memory.events:low == 0 if !memory_recursiveprot, > 0 otherwise]
+ * A/B/F  memory.current  = 0   [memory.events:low == 0]
  * (for origin of the numbers, see model in memcg_protection.m.)
  *
  * After that it tries to allocate more than there is
@@ -525,8 +525,14 @@ static int test_memcg_protection(const char *root, bool min)
 		goto cleanup;
 	}
 
+	/*
+	 * Child 2 has memory.low=0, but some low protection is still being
+	 * distributed down from its parent with memory.low=50M if cgroup2
+	 * memory_recursiveprot mount option is enabled. So the low event
+	 * count will be non-zero in this case.
+	 */
 	for (i = 0; i < ARRAY_SIZE(children); i++) {
-		int no_low_events_index = 1;
+		int no_low_events_index = has_recursiveprot ? 2 : 1;
 		long low, oom;
 
 		oom = cg_read_key_long(children[i], "memory.events", "oom ");
-- 
2.49.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v7 2/2] selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection()
  2025-04-15 21:04 [PATCH v7 0/2] memcg: Fix test_memcg_min/low test failures Waiman Long
  2025-04-15 21:04 ` [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on Waiman Long
@ 2025-04-15 21:04 ` Waiman Long
  1 sibling, 0 replies; 9+ messages in thread
From: Waiman Long @ 2025-04-15 21:04 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Tejun Heo, Michal Koutný,
	Shuah Khan
  Cc: linux-kernel, cgroups, linux-mm, linux-kselftest, Waiman Long

The test_memcg_protection() function is used for the test_memcg_min and
test_memcg_low sub-tests. This function generates a set of parent/child
cgroups like:

  parent:  memory.min/low = 50M
  child 0: memory.min/low = 75M,  memory.current = 50M
  child 1: memory.min/low = 25M,  memory.current = 50M
  child 2: memory.min/low = 0,    memory.current = 50M

After applying memory pressure, the function expects the following
actual memory usages.

  parent:  memory.current ~= 50M
  child 0: memory.current ~= 29M
  child 1: memory.current ~= 21M
  child 2: memory.current ~= 0

In reality, the actual memory usages can differ quite a bit from the
expected values. It uses an error tolerance of 10% with the values_close()
helper.

Both the test_memcg_min and test_memcg_low sub-tests can fail
sporadically because the actual memory usage exceeds the 10% error
tolerance. Below are a sample of the usage data of the tests runs
that fail.

  Child   Actual usage    Expected usage    %err
  -----   ------------    --------------    ----
    1       16990208         22020096      -12.9%
    1       17252352         22020096      -12.1%
    0       37699584         30408704      +10.7%
    1       14368768         22020096      -21.0%
    1       16871424         22020096      -13.2%

The current 10% error tolerenace might be right at the time
test_memcontrol.c was first introduced in v4.18 kernel, but memory
reclaim have certainly evolved quite a bit since then which may result
in a bit more run-to-run variation than previously expected.

Increase the error tolerance to 15% for child 0 and 20% for child 1 to
minimize the chance of this type of failure. The tolerance is bigger
for child 1 because an upswing in child 0 corresponds to a smaller
%err than a similar downswing in child 1 due to the way %err is used
in values_close().

Before this patch, a 100 test runs of test_memcontrol produced the
following results:

     17 not ok 1 test_memcg_min
     22 not ok 2 test_memcg_low

After applying this patch, there were no test failure for test_memcg_min
and test_memcg_low in 100 test runs. However, these tests may still fail
once in a while if the memory usage goes beyond the newly extended range.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 tools/testing/selftests/cgroup/test_memcontrol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 5a5dcbe57b56..2ef07b8fa718 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -495,10 +495,10 @@ static int test_memcg_protection(const char *root, bool min)
 	for (i = 0; i < ARRAY_SIZE(children); i++)
 		c[i] = cg_read_long(children[i], "memory.current");
 
-	if (!values_close(c[0], MB(29), 10))
+	if (!values_close(c[0], MB(29), 15))
 		goto cleanup;
 
-	if (!values_close(c[1], MB(21), 10))
+	if (!values_close(c[1], MB(21), 20))
 		goto cleanup;
 
 	if (c[3] != 0)
-- 
2.49.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on
  2025-04-15 21:04 ` [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on Waiman Long
@ 2025-04-16  9:25   ` Michal Koutný
  2025-04-20 21:48     ` Waiman Long
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Koutný @ 2025-04-16  9:25 UTC (permalink / raw)
  To: Waiman Long
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Tejun Heo, Shuah Khan, linux-kernel,
	cgroups, linux-mm, linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 1208 bytes --]

On Tue, Apr 15, 2025 at 05:04:14PM -0400, Waiman Long <longman@redhat.com> wrote:
> +	/*
> +	 * Child 2 has memory.low=0, but some low protection is still being
> +	 * distributed down from its parent with memory.low=50M if cgroup2
> +	 * memory_recursiveprot mount option is enabled. So the low event
> +	 * count will be non-zero in this case.

I say: Child 2 should have zero effective low value in this test case.
Johannes says (IIUC): One cannot argue whether there is or isn't
effective low for Child 2, it depends on siblings.
(I also say that low events should only be counted for nominal low
breaches but that's not so important here.)

But together this means no value of memory.events:low is valid or
invalid in this testcase. Hence I suggested ignoring Child 2's value in
checks.

> +	 */
>  	for (i = 0; i < ARRAY_SIZE(children); i++) {
> -		int no_low_events_index = 1;
> +		int no_low_events_index = has_recursiveprot ? 2 : 1;
>  		long low, oom;
>  
>  		oom = cg_read_key_long(children[i], "memory.events", "oom ");

But this is not what I Suggested-by: [1]

Michal

[1] https://lore.kernel.org/r/awgbdn6gwnj4kfaezsorvopgsdyoty3yahdeanqvoxstz2w2ke@xc3sv43elkz5

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on
  2025-04-16  9:25   ` Michal Koutný
@ 2025-04-20 21:48     ` Waiman Long
  2025-04-22 12:11       ` Michal Koutný
  0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2025-04-20 21:48 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Tejun Heo, Shuah Khan, linux-kernel,
	cgroups, linux-mm, linux-kselftest


On 4/16/25 5:25 AM, Michal Koutný wrote:
> On Tue, Apr 15, 2025 at 05:04:14PM -0400, Waiman Long <longman@redhat.com> wrote:
>> +	/*
>> +	 * Child 2 has memory.low=0, but some low protection is still being
>> +	 * distributed down from its parent with memory.low=50M if cgroup2
>> +	 * memory_recursiveprot mount option is enabled. So the low event
>> +	 * count will be non-zero in this case.
> I say: Child 2 should have zero effective low value in this test case.
> Johannes says (IIUC): One cannot argue whether there is or isn't
> effective low for Child 2, it depends on siblings.
> (I also say that low events should only be counted for nominal low
> breaches but that's not so important here.)
>
> But together this means no value of memory.events:low is valid or
> invalid in this testcase. Hence I suggested ignoring Child 2's value in
> checks.
I understand your point of view. What I want to do is to document the 
expected behavior and I don't see any example of ignoring a metric for a 
particular child in the test. In this particular test, I did see an elow 
of 17 for child 2.
>
>> +	 */
>>   	for (i = 0; i < ARRAY_SIZE(children); i++) {
>> -		int no_low_events_index = 1;
>> +		int no_low_events_index = has_recursiveprot ? 2 : 1;
>>   		long low, oom;
>>   
>>   		oom = cg_read_key_long(children[i], "memory.events", "oom ");
> But this is not what I Suggested-by: [1]

I was referring to the suggestion that the setting of 
memory_recursiveprot mount option has a material impact of the child 2 
test result. Roman probably didn't have memory_recursiveprot set when 
developing this selftest.

I can take out the Suggested-by tag.

Cheers,
Longman


>
> Michal
>
> [1] https://lore.kernel.org/r/awgbdn6gwnj4kfaezsorvopgsdyoty3yahdeanqvoxstz2w2ke@xc3sv43elkz5



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on
  2025-04-20 21:48     ` Waiman Long
@ 2025-04-22 12:11       ` Michal Koutný
  2025-04-22 23:58         ` Waiman Long
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Koutný @ 2025-04-22 12:11 UTC (permalink / raw)
  To: Waiman Long
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Tejun Heo, Shuah Khan, linux-kernel,
	cgroups, linux-mm, linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 718 bytes --]

On Sun, Apr 20, 2025 at 05:48:15PM -0400, Waiman Long <llong@redhat.com> wrote:
> I was referring to the suggestion that the setting of memory_recursiveprot
> mount option has a material impact of the child 2 test result. Roman
> probably didn't have memory_recursiveprot set when developing this selftest.

The patch in its v7 form is effectively a revert of
	1d09069f5313f ("selftests: memcg: expect no low events in unprotected sibling")

i.e. this would be going in circles (that commit is also a revert) hence
I suggested to exempt looking at memory.events:low entirely with
memory_recursiveprot (and check for 0 when !memory_recursiveprot) --
which is something new (and hopefully universally better :-)

Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on
  2025-04-22 12:11       ` Michal Koutný
@ 2025-04-22 23:58         ` Waiman Long
  2025-04-23 16:49           ` Michal Koutný
  0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2025-04-22 23:58 UTC (permalink / raw)
  To: Michal Koutný, Waiman Long
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Tejun Heo, Shuah Khan, linux-kernel,
	cgroups, linux-mm, linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 1138 bytes --]


On 4/22/25 8:11 AM, Michal Koutný wrote:
> On Sun, Apr 20, 2025 at 05:48:15PM -0400, Waiman Long<llong@redhat.com> wrote:
>> I was referring to the suggestion that the setting of memory_recursiveprot
>> mount option has a material impact of the child 2 test result. Roman
>> probably didn't have memory_recursiveprot set when developing this selftest.
> The patch in its v7 form is effectively a revert of
> 	1d09069f5313f ("selftests: memcg: expect no low events in unprotected sibling")
>
> i.e. this would be going in circles (that commit is also a revert) hence
> I suggested to exempt looking at memory.events:low entirely with
> memory_recursiveprot (and check for 0 when !memory_recursiveprot) --
> which is something new (and hopefully universally better :-)

Ah, you had done a lot of work for the test_memcontrol selftest. Am I 
correct to assume that the purpose of 1d09069f5313f ("selftests: memcg: 
expect no low events in unprotected sibling") is to force a failure in 
the test_memcg_low test to force a change in the current behavior? Or 
was it the case that it didn't fail when you submit your patch?

Cheers, Longman

[-- Attachment #2: Type: text/html, Size: 1839 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on
  2025-04-22 23:58         ` Waiman Long
@ 2025-04-23 16:49           ` Michal Koutný
  2025-04-23 17:03             ` Waiman Long
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Koutný @ 2025-04-23 16:49 UTC (permalink / raw)
  To: Waiman Long
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Tejun Heo, Shuah Khan, linux-kernel,
	cgroups, linux-mm, linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 769 bytes --]

On Tue, Apr 22, 2025 at 07:58:56PM -0400, Waiman Long <llong@redhat.com> wrote:
> Am I correct to assume that the purpose of 1d09069f5313f ("selftests:
> memcg: expect no low events in unprotected sibling") is to force a
> failure in the test_memcg_low test to force a change in the current
> behavior? Or was it the case that it didn't fail when you submit your
> patch?

Yes, the failure had been intended to mark unexpected mode of reclaim
(there's still a reproducer somewhere in the references). However, I
learnt that:
  a) it ain't easy to fix,
  b) the only occurence of the troublesome behavior was in the test and
     never reported by users in real life.

I've started to prefer the variant where the particular check is
indefinite since that.

HTH,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on
  2025-04-23 16:49           ` Michal Koutný
@ 2025-04-23 17:03             ` Waiman Long
  0 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2025-04-23 17:03 UTC (permalink / raw)
  To: Michal Koutný, Waiman Long
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Tejun Heo, Shuah Khan, linux-kernel,
	cgroups, linux-mm, linux-kselftest


On 4/23/25 12:49 PM, Michal Koutný wrote:
> On Tue, Apr 22, 2025 at 07:58:56PM -0400, Waiman Long <llong@redhat.com> wrote:
>> Am I correct to assume that the purpose of 1d09069f5313f ("selftests:
>> memcg: expect no low events in unprotected sibling") is to force a
>> failure in the test_memcg_low test to force a change in the current
>> behavior? Or was it the case that it didn't fail when you submit your
>> patch?
> Yes, the failure had been intended to mark unexpected mode of reclaim
> (there's still a reproducer somewhere in the references). However, I
> learnt that:
>    a) it ain't easy to fix,
>    b) the only occurence of the troublesome behavior was in the test and
>       never reported by users in real life.
>
> I've started to prefer the variant where the particular check is
> indefinite since that.

OK, I will update the patch as you had suggested. I am fine doing that, 
just that I did not understand why you wanted the result to be undefined 
in the first place.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-04-23 17:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-15 21:04 [PATCH v7 0/2] memcg: Fix test_memcg_min/low test failures Waiman Long
2025-04-15 21:04 ` [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on Waiman Long
2025-04-16  9:25   ` Michal Koutný
2025-04-20 21:48     ` Waiman Long
2025-04-22 12:11       ` Michal Koutný
2025-04-22 23:58         ` Waiman Long
2025-04-23 16:49           ` Michal Koutný
2025-04-23 17:03             ` Waiman Long
2025-04-15 21:04 ` [PATCH v7 2/2] selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection() Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox