From: 台运方 <yunfangtai09@gmail.com>
To: hannes@cmpxchg.org
Cc: hughd@google.com, tj@kernel.org, vdavydov@parallels.com,
cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [BUG] The usage of memory cgroup is not consistent with processes when using THP
Date: Sun, 26 Sep 2021 15:35:34 +0800 [thread overview]
Message-ID: <CAHKqYaa7H=M4E-=ObO0ecj+NE2KwZN5d7QSz4_b6tXz2vOo+VA@mail.gmail.com> (raw)
[-- Attachment #1: Type: text/plain, Size: 1241 bytes --]
Hi folks,
We found that the usage counter of containers using memory cgroup v1 is
not consistent with the memory usage of their processes when THP is in
use. The problem was introduced by upstream commit 0a31bc97c80 and
still exists in Linux 5.14.5.
The root cause is that mem_cgroup_uncharge was moved to the final
put_page(). When part of a THP huge page is freed, the memory usage of
the process is updated as soon as the PTEs are unmapped, but the usage
counter of the memory cgroup is only updated when the huge page is
split in deferred_split_scan. This causes the inconsistency; in our
daily usage we have seen a difference of more than 30 GB.
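Concretely, the inconsistency we see is between the memcg counter and
the RSS of the tasks inside the cgroup, roughly like this (sketch only,
the cgroup path is just an example):

  CG=/sys/fs/cgroup/memory/thp_test               # example cgroup path
  cat "$CG/memory.usage_in_bytes"                 # what memcg v1 accounts
  # vs. what the processes actually have mapped:
  for pid in $(cat "$CG/cgroup.procs"); do
      awk '/^Rss:/ {s += $2} END {print s, "kB"}' "/proc/$pid/smaps_rollup"
  done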
It can be reproduced with the following program and script.
The program "eat_release_memory" allocates memory in 8 MB chunks and
releases the last 1 MB of each chunk using madvise(MADV_DONTNEED).
The script "test_thp.sh" creates a memory cgroup, runs
"eat_release_memory 500" in it, and repeats the procedure 10 times. The
output shows the change in the cgroup's memory usage, which in theory
should drop by about 500 MB.
When THP is enabled the output varies randomly, while running "echo 2
> /proc/sys/vm/drop_caches" before reading the counter avoids the
problem.
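For reference, the attached test_thp.sh does roughly the following (a
minimal sketch; the cgroup path and the exact sleep timings are
assumptions and may need tuning to match the allocation time):

  #!/bin/bash
  # Create a memcg v1 cgroup, run the reproducer inside it, and compare
  # memory.usage_in_bytes before and after the MADV_DONTNEED phase.
  CG=/sys/fs/cgroup/memory/thp_test
  mkdir -p "$CG"
  echo $$ > "$CG/cgroup.procs"     # move this shell (and its children) into the cgroup
  for i in $(seq 1 10); do
      ./eat_release_memory 500 &   # allocates 500 * 8 MB, then madvises away 500 * 1 MB
      sleep 3                      # rough: sample after allocation, before madvise
      before=$(cat "$CG/memory.usage_in_bytes")
      sleep 4                      # rough: sample after MADV_DONTNEED, before munmap/exit
      after=$(cat "$CG/memory.usage_in_bytes")
      echo "round $i: usage dropped by $(( (before - after) / 1024 / 1024 )) MB (expect ~500 MB)"
      wait
  done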
Are there any patches that fix this, or is this behavior by design?
Thanks,
Yunfang Tai
[-- Attachment #2: eat_release_memory.c --]
[-- Type: text/x-c-code, Size: 1175 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
    char *memindex[1000] = {0};
    int eat = 0;
    int i = 0;

    if (argc < 2) {
        printf("Usage: ./eat_release_memory <num> #allocate num * 8 MB and free num MB memory\n");
        return -1;
    }

    sscanf(argv[1], "%d", &eat);
    if (eat <= 0 || eat >= 1000) {
        printf("num should be larger than 0 and less than 1000\n");
        return -1;
    }

    printf("Allocate memory in MB size: %d\n", eat * 8);
    printf("Allocation memory Begin!\n");
    for (i = 0; i < eat; i++) {
        /* allocate an 8 MB anonymous mapping and touch it so it is faulted in */
        memindex[i] = (char *)mmap(NULL, 8*1024*1024, PROT_READ|PROT_WRITE,
                                   MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        if (memindex[i] == MAP_FAILED) {
            perror("mmap");
            return -1;
        }
        memset(memindex[i], 0, 8*1024*1024);
    }
    printf("Allocation memory Done!\n");
    sleep(2);

    printf("Now begin to madvise free memory!\n");
    for (i = 0; i < eat; i++)
        madvise(memindex[i] + 7*1024*1024, 1024*1024, MADV_DONTNEED);
    sleep(5);

    printf("Now begin to release memory!\n");
    for (i = 0; i < eat; i++)
        munmap(memindex[i], 8*1024*1024);

    return 0;
}
[-- Attachment #3: test_thp.sh --]
[-- Type: application/x-sh, Size: 598 bytes --]