From: Waiman Long <llong@redhat.com>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: "Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Shakeel Butt" <shakeel.butt@linux.dev>,
"Muchun Song" <muchun.song@linux.dev>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Tejun Heo" <tj@kernel.org>, "Michal Koutný" <mkoutny@suse.com>,
"Shuah Khan" <shuah@kernel.org>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v3 2/2] selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection()
Date: Sat, 12 Apr 2025 23:15:00 -0400
Message-ID: <bbc48f40-274c-44ec-9a98-7c18b64628c0@redhat.com>
In-Reply-To: <Z_Wht7kyWyk62IBU@google.com>
On 4/8/25 6:22 PM, Roman Gushchin wrote:
> On Sat, Apr 05, 2025 at 10:40:10PM -0400, Waiman Long wrote:
>> The test_memcg_protection() function is used for the test_memcg_min and
>> test_memcg_low sub-tests. This function generates a set of parent/child
>> cgroups like:
>>
>> parent: memory.min/low = 50M
>> child 0: memory.min/low = 75M, memory.current = 50M
>> child 1: memory.min/low = 25M, memory.current = 50M
>> child 2: memory.min/low = 0, memory.current = 50M
>>
>> After applying memory pressure, the function expects the following
>> actual memory usages.
>>
>> parent: memory.current ~= 50M
>> child 0: memory.current ~= 29M
>> child 1: memory.current ~= 21M
>> child 2: memory.current ~= 0
>>
>> In reality, the actual memory usages can differ quite a bit from the
>> expected values. The function uses an error tolerance of 10% with the
>> values_close() helper.
>>
>> Both the test_memcg_min and test_memcg_low sub-tests can fail
>> sporadically because the actual memory usage exceeds the 10% error
>> tolerance. Below is a sample of the usage data from the test runs
>> that failed.
>>
>> Child   Actual usage   Expected usage   %err
>> -----   ------------   --------------   ------
>>   1       16990208        22020096      -12.9%
>>   1       17252352        22020096      -12.1%
>>   0       37699584        30408704      +10.7%
>>   1       14368768        22020096      -21.0%
>>   1       16871424        22020096      -13.2%
>>
>> The current 10% error tolerance might have been right at the time
>> test_memcontrol.c was first introduced in the v4.18 kernel, but memory
>> reclaim has certainly evolved quite a bit since then, which may result
>> in a bit more run-to-run variation than previously expected.
>>
>> Increase the error tolerance to 15% for child 0 and 20% for child 1 to
>> minimize the chance of this type of failure. The tolerance is bigger
>> for child 1 because an upswing in child 0 corresponds to a smaller
>> %err than a similar downswing in child 1 due to the way %err is used
>> in values_close().
>>
>> Before this patch, 100 test runs of test_memcontrol produced the
>> following results:
>>
>> 17 not ok 1 test_memcg_min
>> 22 not ok 2 test_memcg_low
>>
>> After applying this patch, there were no test failures for test_memcg_min
>> and test_memcg_low in 100 test runs.
> Ideally we want to calculate these values dynamically based on the machine
> size (number of cpus and total memory size).
>
> We can calculate the memcg error margin and scale memcg sizes if necessary.
> It's the only way to make it pass both on a 2-CPU VM and on a 512-CPU
> physical server.
>
> Not a blocker for this patch, just an idea for the future.
Thanks for the suggestion.
As I said in a previous mail, the test works by waiting until the
memory.current of the parent is close to 50M, then it checks the
memory.current of each child to see how much usage it has. I am not
sure if the number of CPUs or the total memory size is really a factor
here. We will probably need to run some experiments to find out.
Anyway, that will be a future patch if they turn out to be a factor.
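
For reference, the %err arithmetic from the quoted patch description can be
sketched as below. This is only an illustration: it assumes values_close()
compares |a - b| against err percent of (a + b), which is consistent with
the %err column in the table above, and pct_err() is a hypothetical helper
that is not part of the selftests.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Assumed form of the selftest helper: a and b are "close" when their
 * absolute difference is at most err percent of their sum.
 */
static int values_close(long a, long b, int err)
{
	return labs(a - b) <= (a + b) / 100 * err;
}

/*
 * Hypothetical helper (illustration only): signed error in percent,
 * measured against the sum of both values, as in the %err column.
 */
static double pct_err(long actual, long expected)
{
	assert(actual + expected != 0);	/* avoid dividing by zero */
	return 100.0 * (double)(actual - expected) /
	       (double)(actual + expected);
}
```

Because the denominator is the sum of the two values, the same absolute
swing gives different percentages. If 7M moves from child 1 (expected 21M)
to child 0 (expected 29M) while the parent stays at 50M, pct_err() gives
about +10.8% for child 0 and about -20.0% for child 1, hence the larger
tolerance for child 1.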
Cheers,
Longman
>
> Thanks!
>