From: Zi Yan <ziy@nvidia.com>
To: Pedro Falcato <pfalcato@suse.de>
Cc: Felix Abecassis <fabecassis@nvidia.com>,
linux-mm@kvack.org, John Hubbard <jhubbard@nvidia.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>
Subject: Re: OOM kill of privileged processes when exhausting a single NUMA node
Date: Thu, 26 Jun 2025 19:27:36 -0400
Message-ID: <E1E5B829-72A2-425C-836F-A1C94FD17C76@nvidia.com>
In-Reply-To: <btenbxjm4eheurdl2oexxy4h5diphmfy5cugiscfv6nljqhfki@2xxhbwtfslsj>

On 26 Jun 2025, at 19:21, Pedro Falcato wrote:
> On Thu, Jun 26, 2025 at 10:27:36PM +0000, Felix Abecassis wrote:
>> Hello linux-mm team,
>>
>> I have found an interesting behavior in the Linux kernel: an unprivileged user
>> with access to user namespaces can cause privileged processes to be killed due
>> to an OOM situation on a single NUMA node, even if the system has plenty of
>> memory available on other NUMA nodes.
>>
>> This might lead to a local denial of service in some situations, so please
>> review and let me know if the current behavior is expected.
>>
>> The steps are simple:
>> 1. Use a Linux system with multiple NUMA nodes
>> 2. Enable unprivileged user namespaces (often distro dependent)
>> 3. As an unprivileged user, create a user namespace + mount namespace
>> and mount a tmpfs bound to NUMA node 1
>> 4. Attempt to fill the tmpfs with more data than it can possibly store
>> 5. The OOM killer will kill a significant amount of system daemons
>> (UID 0).
>>
>
> I somewhat agree that this is unintended tmpfs behavior, but you can
> (probably) pull this off in other ways:
>
> - use set_mempolicy()/mbind to bind to a NUMA node and use a big mmap() mapping
> - just use a lot of memory
The OOM killer will kill the app that is itself using a lot of memory, but
with tmpfs, as you mention below, it is not able to find a suitable victim
process to kill.
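
For illustration, the set_mempolicy()/mbind() route could look roughly
like the sketch below (node number and mapping size are illustrative;
build with -lnuma). In this variant the allocating process itself carries
the large RSS, so, as noted above, the OOM killer at least has an obvious
victim:

#include <numaif.h>    /* mbind(), MPOL_BIND */
#include <sys/mman.h>  /* mmap() */
#include <string.h>    /* memset() */
#include <stdio.h>     /* perror() */

int main(void)
{
    size_t len = 2UL << 30;  /* 2 GiB, more than node 1 has */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Restrict the mapping to node 1: faulting it in then has
     * nowhere else to go, and with no swap the OOM killer runs. */
    unsigned long nodemask = 1UL << 1;
    if (mbind(p, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0)) {
        perror("mbind");
        return 1;
    }

    memset(p, 0xaa, len);  /* touch every page */
    return 0;
}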
>
> and it's not limited to NUMA either.
>
> AFAIK user namespaces aren't really isolating on their own: you need a
> cgroup on top to further control software you don't trust (or want to
> limit for other reasons).
>
>
> And in this case the particular problem is that tmpfs can't track which
> process "owns" a file, even if O_TMPFILE was specified. So you can quite
> trivially run a regular Linux distro out of memory by filling up /tmp
> (if it is tmpfs, of course), since you have write permission on /tmp by
> default.
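
To make that concrete, a minimal sketch (assuming /tmp is tmpfs; the path
and buffer size are illustrative): pages written through an O_TMPFILE
descriptor are charged to shmem rather than to the writing process, so the
writer looks tiny to the OOM killer even while it fills memory.

#define _GNU_SOURCE   /* O_TMPFILE */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Unnamed tmpfs-backed file: gone when the fd closes, and its
     * pages show up as Shmem, not in this process's RSS. */
    int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
    if (fd < 0)
        return 1;

    char buf[1 << 16];
    memset(buf, 0, sizeof(buf));
    for (;;)  /* fill until tmpfs (or memory) runs out */
        if (write(fd, buf, sizeof(buf)) < 0)
            break;
    return 0;
}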
>
>
> The only alarming bit (to me) is that cgroups don't work in this case
> either. The most ad-hoc solution I have would be to limit the tmpfs size
> to memory.max. Adding the memcg folks for more comments.
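
A rough userspace approximation of that idea, for concreteness (the
cgroup path is taken from the example below, and the mount target is
illustrative; a real fix would presumably enforce this in-kernel at mount
or charge time):

#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

int main(void)
{
    char limit[64];
    FILE *f = fopen("/sys/fs/cgroup/user.slice/user-1000.slice/memory.max",
                    "r");
    if (!f || !fgets(limit, sizeof(limit), f))
        return 1;
    fclose(f);
    limit[strcspn(limit, "\n")] = '\0';  /* a byte count, or "max" */

    /* tmpfs size= takes a byte count; a literal "max" would have to
     * be mapped to 0 (unlimited) or rejected. Needs CAP_SYS_ADMIN
     * in the mount namespace. */
    char opts[96];
    snprintf(opts, sizeof(opts), "size=%s", limit);
    if (mount("tmpfs", "/mnt", "tmpfs", 0, opts))
        return 1;
    return 0;
}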
>
> --
> Pedro
>
>> The possible mitigations I currently know of are: create a swap space, disable
>> unprivileged user namespaces, or set sysctl vm.oom_kill_allocating_task=1.
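
For reference, the vm.oom_kill_allocating_task mitigation mentioned above
is a single sysctl; in C it is just a write to procfs, equivalent to
"sysctl vm.oom_kill_allocating_task=1":

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* Make the OOM killer target the allocating task instead of
     * scanning for other victims. */
    int fd = open("/proc/sys/vm/oom_kill_allocating_task", O_WRONLY);
    if (fd < 0)
        return 1;
    int ok = write(fd, "1\n", 2) == 2;
    close(fd);
    return !ok;
}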
>>
>> To be 100% clear, this does not require elevated privileges, and we are only
>> using a fraction of the total system memory.
>>
>> Below is an example on an Ubuntu 25.04 VM under qemu where I hotplugged
>> a new NUMA node with 1GB of memory. I also placed the current process
>> under a 2GB memory cgroup to show that it is not an effective
>> mitigation.
>>
>> $ uname -a
>> Linux ubuntu 6.14.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Wed May 21 15:01:51 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
>>
>> $ id -u
>> 1000
>>
>> # Enable unprivileged user namespaces (this is an Ubuntu feature)
>> $ sudo sysctl kernel.apparmor_restrict_unprivileged_userns=0
>>
>> $ sudo sh -c 'echo 2G > /sys/fs/cgroup/user.slice/user-1000.slice/memory.max'
>>
>> $ numastat -mzc
>>
>> Per-node system memory usage (in MBs):
>> Token Unaccepted not in hash table.
>> Token Unaccepted not in hash table.
>>                  Node 0 Node 1  Total
>>                  ------ ------ ------
>> MemTotal           7940   1024   8964
>> MemFree            7533   1024   8557
>> MemUsed             407      0    407
>> Active              176      0    176
>> Inactive             44      0     44
>> Active(anon)         42      0     42
>> Active(file)        134      0    134
>> Inactive(file)       44      0     44
>> Unevictable          26      0     26
>> Mlocked              26      0     26
>> Dirty                 0      0      0
>> FilePages           186      0    186
>> Mapped               57      0     57
>> AnonPages            59      0     59
>> Shmem                 1      0      1
>> KernelStack           2      0      2
>> PageTables            2      0      2
>> Slab                 84      0     84
>> SReclaimable         17      0     17
>> SUnreclaim           68      0     68
>> KReclaimable         17      0     17
>>
>> $ unshare -U -r -m sh -xc 'mount -t tmpfs -o mpol=bind:1 tmpfs /dev/shm ; dd if=/dev/zero of=/dev/shm/file bs=64K count=25000'
>> + mount -t tmpfs -o mpol=bind:1 tmpfs /dev/shm
>> + dd if=/dev/zero of=/dev/shm/file bs=64K count=25000
>> [ 294.046130] Out of memory: Killed process 1074 (systemd) total-vm:21968kB, anon-rss:2048kB, file-rss:10164kB, shmem-rss:0kB, UID:1000 pgtables:88kB oom_score_adj:100
>> [ 294.052224] Out of memory: Killed process 1076 ((sd-pam)) total-vm:21992kB, anon-rss:1772kB, file-rss:1832kB, shmem-rss:0kB, UID:1000 pgtables:76kB oom_score_adj:100
>> [ 294.058446] Out of memory: Killed process 821 (unattended-upgr) total-vm:121388kB, anon-rss:13272kB, file-rss:16004kB, shmem-rss:0kB, UID:0 pgtables:140kB oom_score_adj:0
>> [ 294.064551] Out of memory: Killed process 423 (systemd-resolve) total-vm:23200kB, anon-rss:2560kB, file-rss:11504kB, shmem-rss:0kB, UID:990 pgtables:88kB oom_score_adj:0
>> [ 294.070491] Out of memory: Killed process 789 (udisksd) total-vm:470572kB, anon-rss:1920kB, file-rss:11840kB, shmem-rss:0kB, UID:0 pgtables:136kB oom_score_adj:0
>> [ 294.076371] Out of memory: Killed process 848 (ModemManager) total-vm:391392kB, anon-rss:1792kB, file-rss:10516kB, shmem-rss:0kB, UID:0 pgtables:124kB oom_score_adj:0
>> [ 294.082350] Out of memory: Killed process 733 (systemd-network) total-vm:20804kB, anon-rss:1296kB, file-rss:10068kB, shmem-rss:0kB, UID:998 pgtables:76kB oom_score_adj:0
>> [ 294.088273] Out of memory: Killed process 1141 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8556kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [ 294.094350] Out of memory: Killed process 788 (systemd-logind) total-vm:18896kB, anon-rss:896kB, file-rss:7968kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [ 294.100461] Out of memory: Killed process 1151 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:7732kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [ 294.106462] Out of memory: Killed process 1154 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8036kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [ 294.112592] Out of memory: Killed process 1155 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8648kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [ 294.118725] Out of memory: Killed process 1161 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8648kB, shmem-rss:0kB, UID:998 pgtables:84kB oom_score_adj:0
>> [ 294.124827] Out of memory: Killed process 1165 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8484kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
>> [ 294.131138] Out of memory: Killed process 1169 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8604kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [ 294.137548] Out of memory: Killed process 1177 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8592kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [ 294.144659] Out of memory: Killed process 1187 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8800kB, shmem-rss:0kB, UID:998 pgtables:80kB oom_score_adj:0
>> [ 294.151118] Out of memory: Killed process 1179 (systemd-logind) total-vm:18728kB, anon-rss:1024kB, file-rss:7972kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0
>> [ 294.157569] Out of memory: Killed process 1194 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8596kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [ 294.163877] Out of memory: Killed process 417 (systemd-timesyn) total-vm:91608kB, anon-rss:896kB, file-rss:7132kB, shmem-rss:0kB, UID:996 pgtables:88kB oom_score_adj:0
>> [ 294.170240] Out of memory: Killed process 783 (polkitd) total-vm:306832kB, anon-rss:640kB, file-rss:7264kB, shmem-rss:0kB, UID:988 pgtables:96kB oom_score_adj:0
>> [ 294.176668] Out of memory: Killed process 1200 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:7776kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [ 294.183107] Out of memory: Killed process 1205 (9) total-vm:20136kB, anon-rss:1152kB, file-rss:6584kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [ 294.189627] Out of memory: Killed process 1210 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:7844kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [ 294.196227] Out of memory: Killed process 1209 ((d-logind)) total-vm:20140kB, anon-rss:1280kB, file-rss:7284kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [ 294.202956] Out of memory: Killed process 1212 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8568kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [ 294.209719] Out of memory: Killed process 1223 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8556kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [ 294.216356] Out of memory: Killed process 851 (rsyslogd) total-vm:220676kB, anon-rss:1280kB, file-rss:4292kB, shmem-rss:0kB, UID:101 pgtables:80kB oom_score_adj:0
>> [ 294.223146] Out of memory: Killed process 1220 (systemd-logind) total-vm:18728kB, anon-rss:1024kB, file-rss:8044kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
>> [ 294.229888] Out of memory: Killed process 1234 ((systemd)) total-vm:21992kB, anon-rss:1664kB, file-rss:8852kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:100
>> [ 294.236624] Out of memory: Killed process 952 (login) total-vm:11220kB, anon-rss:768kB, file-rss:4616kB, shmem-rss:0kB, UID:0 pgtables:64kB oom_score_adj:0
>> [ 294.243266] Out of memory: Killed process 940 (cron) total-vm:7512kB, anon-rss:256kB, file-rss:2760kB, shmem-rss:0kB, UID:0 pgtables:56kB oom_score_adj:0
>> [ 294.249871] Out of memory: Killed process 956 (agetty) total-vm:8516kB, anon-rss:128kB, file-rss:2492kB, shmem-rss:0kB, UID:0 pgtables:60kB oom_score_adj:0
Best Regards,
Yan, Zi