From: Li Wang <liwang@redhat.com>
To: Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <guro@fb.com>, Michal Hocko <mhocko@suse.com>,
Linux-MM <linux-mm@kvack.org>
Cc: LTP List <ltp@lists.linux.it>, Martin Doucha <mdoucha@suse.cz>,
Cyril Hrubis <chrubis@suse.cz>,
Eirik Fuller <efuller@redhat.com>, Chunyu Hu <chuhu@redhat.com>,
Xinpeng Liu <liuxp11@chinatelecom.cn>,
linux-kernel <linux-kernel@vger.kernel.org>,
Paul Bunyan <pbunyan@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>
Subject: [OVERCOMMIT_GUESS] Broken mmap() for MAP_FAILED (ENOMEM)
Date: Sun, 11 Apr 2021 12:47:35 +0800
Message-ID: <CAEemH2c63GXZosG+e3=c9FgisYdgx02PnV7pknMpWg6EyjM-AQ@mail.gmail.com>

Hi Johannes, Roman, and MM experts,

Both Xinpeng and PaulB report that LTP/ioctl_sg01 always gets OOM-killed on
aarch64 (x86_64 with kernel v5.12-rc6 is confirmed to be affected as well)
when the system's MemAvailable is less than MemFree. With help from Eirik and
Chunyu, we found that the problem only occurs since the kernel commit below:

commit 8c7829b04c523cdc732cb77f59f03320e09f3386
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon, 13 May 2019 17:21:50 -0700
mm: fix false-positive OVERCOMMIT_GUESS failures

Since that commit, mmap() behaves differently in GUESS mode: userspace no
longer receives MAP_FAILED (ENOMEM) unless a single allocation is explicitly
larger than "total_ram + total_swap". That hardly looks like a heuristic
approach to memory allocation anymore.

Chunyu and I are concerned that this may cause more trouble for users when
allocating memory.

mmap2
  ksys_mmap_pgoff
    vm_mmap_pgoff
      do_mmap
        mmap_region
          // Private writable mapping: check memory availability
          security_vm_enough_memory_mm
            __vm_enough_memory
"
int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
...
	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
		if (pages > totalram_pages() + total_swap_pages)
			goto error;
		return 0;
	}
"

Because __vm_enough_memory() now uses a constant upper bound for returning
ENOMEM, the check only makes sense for a single request larger than
"total_ram + total_swap". As a result, userspace processes running in
OVERCOMMIT_GUESS mode hit the OOM killer much more easily.

Maybe a more acceptable way would be to dynamically check the available/free
memory of the running system ("free_pages + free_swap_pages"), as before.

Any thoughts or suggestions?
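
For reference, the two policies compared above can be modelled in userspace
roughly as follows. This is only a sketch and not kernel code: struct mem_state,
guess_check_total() and guess_check_free() are names made up for illustration,
all values are in pages, and the "free" variant is reduced to the
"free_pages + free_swap_pages" sum mentioned in this mail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of system memory, all values in pages. */
struct mem_state {
	size_t total_ram;
	size_t total_swap;
	size_t free_ram;
	size_t free_swap;
};

/*
 * Models the current GUESS check: only a single request larger than
 * total RAM + total swap is refused.
 */
bool guess_check_total(const struct mem_state *m, size_t pages)
{
	return pages <= m->total_ram + m->total_swap;
}

/*
 * Simplified model of a "free memory" check: the request is refused
 * unless it fits into what is currently free (RAM + swap).
 */
bool guess_check_free(const struct mem_state *m, size_t pages)
{
	return pages < m->free_ram + m->free_swap;
}
```

With a state where most memory is already in use, a modest request passes the
first check but fails the second, which is the behavior difference the
reproducer further down relies on.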
=================
To demonstrate the issue, I extracted a small C reproducer.

Without the kernel commit:
# ./mmap_failed
...
map_blocks[1493] = 0xffc525c60000
PASS: MAP_FAILED as expected
After the kernel commit:
# ./mmap_failed
...
map_blocks[1617] = 0x3c0836b0000
map_blocks[1618] = 0x3c0796b0000
Killed <===== Always Killed by OOM-Killer
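
The results above only apply to the heuristic overcommit mode. As a quick
sanity check before comparing runs, something like the following could be used
to confirm which mode a machine is in. It is a minimal sketch: the helper names
overcommit_mode_name() and read_overcommit_mode() are made up for illustration,
and it assumes the value is read from /proc/sys/vm/overcommit_memory.

```c
#include <stdio.h>

/* Map the value of vm.overcommit_memory to the kernel's policy name. */
const char *overcommit_mode_name(int mode)
{
	switch (mode) {
	case 0:
		return "OVERCOMMIT_GUESS";
	case 1:
		return "OVERCOMMIT_ALWAYS";
	case 2:
		return "OVERCOMMIT_NEVER";
	default:
		return "unknown";
	}
}

/* Read the current mode; returns -1 if the sysctl file cannot be read. */
int read_overcommit_mode(void)
{
	int mode = -1;
	FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;
	fclose(f);
	return mode;
}
```

Calling overcommit_mode_name(read_overcommit_mode()) from a test setup step
should print OVERCOMMIT_GUESS on a system where the outputs above are
comparable.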
-------------------------
# cat mmap_failed.c
#include <stdio.h>
#include <sys/sysinfo.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define BLOCKSIZE (160 * 1024 * 1024)

int main(void)
{
	size_t i, maxsize, map_count = 0, blocksize = BLOCKSIZE;
	void **map_blocks;
	struct sysinfo info;

	if (sysinfo(&info) != 0) {
		perror("sysinfo");
		return 1;
	}

	/* Try to map (and touch) as much as free RAM + free swap. */
	maxsize = (info.freeram + info.freeswap) * info.mem_unit;
	map_count = maxsize / blocksize;

	map_blocks = malloc(map_count * sizeof(void *));
	if (!map_blocks) {
		perror("malloc");
		return 1;
	}

	for (i = 0; i < map_count; i++) {
		map_blocks[i] = mmap(NULL, blocksize, PROT_READ | PROT_WRITE,
				     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		/* We'd better get MAP_FAILED and break here rather than be OOM-killed. */
		if (map_blocks[i] == MAP_FAILED) {
			map_count = i;
			printf("PASS: MAP_FAILED as expected\n");
			break;
		}

		printf("map_blocks[%zu] = %p\n", i, map_blocks[i]);
		/* Touch every page so the memory is really allocated. */
		memset(map_blocks[i], 1, blocksize);
	}

	for (i = 0; i < map_count; i++)
		munmap(map_blocks[i], blocksize);

	free(map_blocks);
	return 0;
}
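
Since the failure was reported for systems where MemAvailable is below MemFree,
it may help to check for that state before running the reproducer. Below is a
rough sketch of such a check. The helpers meminfo_kb() and
memavailable_below_memfree() are hypothetical names, and the parsing assumes
the usual "Key:   value kB" layout of /proc/meminfo.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find "Key:   12345 kB" in meminfo-style text and return the value in kB,
 * or -1 if the key is not present.
 */
long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		long val;

		if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
		    sscanf(line + klen + 1, "%ld", &val) == 1)
			return val;
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Returns 1 if MemAvailable < MemFree, 0 if not, -1 on error. */
int memavailable_below_memfree(void)
{
	char buf[8192];
	size_t n;
	long avail, memfree;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	avail = meminfo_kb(buf, "MemAvailable");
	memfree = meminfo_kb(buf, "MemFree");
	if (avail < 0 || memfree < 0)
		return -1;
	return avail < memfree;
}
```

The parsing is kept separate from the file read so it can be exercised on a
saved copy of /proc/meminfo from an affected machine.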
--
P.S. There is another issue, MemAvailable < MemFree, caused by khugepaged
reserving memory for transparent hugepage allocation, but I don't want to mix
it into this thread and complicate things. @Chunyu, if you could start a new
email thread for it, that would be appreciated.

--
Regards,
Li Wang