From: Zi Yan <ziy@nvidia.com>
To: Pedro Falcato <pfalcato@suse.de>
Cc: Felix Abecassis <fabecassis@nvidia.com>,
	linux-mm@kvack.org, John Hubbard <jhubbard@nvidia.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>
Subject: Re: OOM kill of privileged processes when exhausting a single NUMA node
Date: Thu, 26 Jun 2025 19:27:36 -0400
Message-ID: <E1E5B829-72A2-425C-836F-A1C94FD17C76@nvidia.com>
In-Reply-To: <btenbxjm4eheurdl2oexxy4h5diphmfy5cugiscfv6nljqhfki@2xxhbwtfslsj>

On 26 Jun 2025, at 19:21, Pedro Falcato wrote:

> On Thu, Jun 26, 2025 at 10:27:36PM +0000, Felix Abecassis wrote:
>> Hello linux-mm team,
>>
>> I have found an interesting behavior in the Linux kernel: an unprivileged user
>> with access to user namespaces can cause privileged processes to be killed due
>> to an OOM situation on a single NUMA node, even if the system has plenty of
>> memory available on other NUMA nodes.
>>
>> This might lead to a local denial of service in some situations, so please
>> review and let me know if the current behavior is expected.
>>
>> The steps are simple:
>> 1. Use a Linux system with multiple NUMA nodes
>> 2. Enable unprivileged user namespaces (often distro dependent)
>> 3. As an unprivileged user, create a user namespace + mount namespace
>>    and mount a tmpfs bound to NUMA node 1
>> 4. Attempt to fill the tmpfs with more data than it can possibly store
>> 5. The OOM killer will kill a significant number of system daemons
>>    (UID 0).
>>
>
> I somewhat agree that this is unintended tmpfs behavior, but you can
> (probably) pull this off in other ways:
>
> - use set_mempolicy()/mbind() to bind to a NUMA node and use a big mmap() mapping
> - just use a lot of memory

In those cases the OOM killer will kill the app that is using a lot of memory.
With tmpfs, as you mention below, the OOM killer is not able to find a suitable
victim process to kill.
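
As a rough check of the numbers in the reproducer quoted below, here is a
minimal sketch of the arithmetic. The helper name dd_write_mib is made up for
illustration, and numastat's "MBs" are assumed to be MiB. It only compares
sizes and does not touch tmpfs or NUMA at all.

```shell
#!/bin/sh
# Values copied from the reproducer; numastat's "MBs" are treated as MiB here
# (an assumption made only for this comparison).
dd_write_mib() {
    # $1 = dd block size in KiB, $2 = dd block count
    echo $(( $1 * $2 / 1024 ))
}

node1_mib=1024     # MemTotal of node 1
memcg_mib=2048     # memory.max = 2G
total_mib=8964     # MemTotal across both nodes

want=$(dd_write_mib 64 25000)
echo "dd tries to write ${want} MiB"
[ "$want" -gt "$node1_mib" ] && echo "more than node 1 holds (${node1_mib} MiB)"
[ "$want" -lt "$memcg_mib" ] && echo "less than memory.max (${memcg_mib} MiB)"
[ "$want" -lt "$total_mib" ] && echo "less than total memory (${total_mib} MiB)"
```

So the write is larger than node 1 but stays under both the 2G memory.max and
the total system memory, which is consistent with the cgroup limit not
helping in the report.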

>
> and it's not limited to NUMA either.
>
> AFAIK user namespaces aren't really isolating in the sense that you need a
> cgroup on top to further control software you don't trust (or want to limit
> for other reasons)
>
>
> And in this case the particular problem is that tmpfs really can't track
> what process "owns" a file, even if O_TMPFILE was specified. So you can quite
> trivially run out of memory in a regular Linux distro by filling up the /tmp
> (if tmpfs, of course), if you have write perms for /tmp, which by default you do.
>
>
> The only alarming bit (to me) is that cgroups don't work in this case as well.
> The most ad hoc solution I have would be to possibly limit the tmpfs size to
> memory.max. Adding the memcg folks for more comments.
>
> --
> Pedro
>
>> The possible mitigations I currently know of are: create a swap space, disable
>> unprivileged user namespaces, or set sysctl vm.oom_kill_allocating_task=1.
>>
>> To be 100% clear, this does not require elevated privileges, and we are only
>> using a fraction of the total system memory.
>>
>> Below is an example on an Ubuntu 25.04 VM under qemu where I hotplugged a new
>> NUMA node with 1GB of memory. I also placed the current process under a 2GB
>> memory cgroup to show that it's not an effective mitigation.
>>
>> $ uname -a
>> Linux ubuntu 6.14.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Wed May 21 15:01:51 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
>>
>> $ id -u
>> 1000
>>
>> # Enable unprivileged user namespaces (this is an Ubuntu feature)
>> $ sudo sysctl kernel.apparmor_restrict_unprivileged_userns=0
>>
>> $ sudo sh -c 'echo 2G > /sys/fs/cgroup/user.slice/user-1000.slice/memory.max'
>>
>> $ numastat -mzc
>>
>> Per-node system memory usage (in MBs):
>> Token Unaccepted not in hash table.
>> Token Unaccepted not in hash table.
>>                  Node 0 Node 1 Total
>>                  ------ ------ -----
>> MemTotal           7940   1024  8964
>> MemFree            7533   1024  8557
>> MemUsed             407      0   407
>> Active              176      0   176
>> Inactive             44      0    44
>> Active(anon)         42      0    42
>> Active(file)        134      0   134
>> Inactive(file)       44      0    44
>> Unevictable          26      0    26
>> Mlocked              26      0    26
>> Dirty                 0      0     0
>> FilePages           186      0   186
>> Mapped               57      0    57
>> AnonPages            59      0    59
>> Shmem                 1      0     1
>> KernelStack           2      0     2
>> PageTables            2      0     2
>> Slab                 84      0    84
>> SReclaimable         17      0    17
>> SUnreclaim           68      0    68
>> KReclaimable         17      0    17
>>
>> $ unshare -U -r -m sh -xc 'mount -t tmpfs -o mpol=bind:1 tmpfs /dev/shm ; dd if=/dev/zero of=/dev/shm/file bs=64K count=25000'
>> + mount -t tmpfs -o mpol=bind:1 tmpfs /dev/shm
>> + dd if=/dev/zero of=/dev/shm/file bs=64K count=25000
>> [  294.046130] Out of memory: Killed process 1074 (systemd) total-vm:21968kB, anon-rss:2048kB, file-rss:10164kB, shmem-rss:0kB, UID:1000 pgtables:88kB oom_score_adj:100
>> [  294.052224] Out of memory: Killed process 1076 ((sd-pam)) total-vm:21992kB, anon-rss:1772kB, file-rss:1832kB, shmem-rss:0kB, UID:1000 pgtables:76kB oom_score_adj:100
>> [  294.058446] Out of memory: Killed process 821 (unattended-upgr) total-vm:121388kB, anon-rss:13272kB, file-rss:16004kB, shmem-rss:0kB, UID:0 pgtables:140kB oom_score_adj:0
>> [  294.064551] Out of memory: Killed process 423 (systemd-resolve) total-vm:23200kB, anon-rss:2560kB, file-rss:11504kB, shmem-rss:0kB, UID:990 pgtables:88kB oom_score_adj:0
>> [  294.070491] Out of memory: Killed process 789 (udisksd) total-vm:470572kB, anon-rss:1920kB, file-rss:11840kB, shmem-rss:0kB, UID:0 pgtables:136kB oom_score_adj:0
>> [  294.076371] Out of memory: Killed process 848 (ModemManager) total-vm:391392kB, anon-rss:1792kB, file-rss:10516kB, shmem-rss:0kB, UID:0 pgtables:124kB oom_score_adj:0
>> [  294.082350] Out of memory: Killed process 733 (systemd-network) total-vm:20804kB, anon-rss:1296kB, file-rss:10068kB, shmem-rss:0kB, UID:998 pgtables:76kB oom_score_adj:0
>> [  294.088273] Out of memory: Killed process 1141 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8556kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [  294.094350] Out of memory: Killed process 788 (systemd-logind) total-vm:18896kB, anon-rss:896kB, file-rss:7968kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [  294.100461] Out of memory: Killed process 1151 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:7732kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [  294.106462] Out of memory: Killed process 1154 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8036kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [  294.112592] Out of memory: Killed process 1155 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8648kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [  294.118725] Out of memory: Killed process 1161 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8648kB, shmem-rss:0kB, UID:998 pgtables:84kB oom_score_adj:0
>> [  294.124827] Out of memory: Killed process 1165 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8484kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
>> [  294.131138] Out of memory: Killed process 1169 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8604kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [  294.137548] Out of memory: Killed process 1177 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8592kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [  294.144659] Out of memory: Killed process 1187 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8800kB, shmem-rss:0kB, UID:998 pgtables:80kB oom_score_adj:0
>> [  294.151118] Out of memory: Killed process 1179 (systemd-logind) total-vm:18728kB, anon-rss:1024kB, file-rss:7972kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0
>> [  294.157569] Out of memory: Killed process 1194 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8596kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [  294.163877] Out of memory: Killed process 417 (systemd-timesyn) total-vm:91608kB, anon-rss:896kB, file-rss:7132kB, shmem-rss:0kB, UID:996 pgtables:88kB oom_score_adj:0
>> [  294.170240] Out of memory: Killed process 783 (polkitd) total-vm:306832kB, anon-rss:640kB, file-rss:7264kB, shmem-rss:0kB, UID:988 pgtables:96kB oom_score_adj:0
>> [  294.176668] Out of memory: Killed process 1200 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:7776kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [  294.183107] Out of memory: Killed process 1205 (9) total-vm:20136kB, anon-rss:1152kB, file-rss:6584kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [  294.189627] Out of memory: Killed process 1210 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:7844kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [  294.196227] Out of memory: Killed process 1209 ((d-logind)) total-vm:20140kB, anon-rss:1280kB, file-rss:7284kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [  294.202956] Out of memory: Killed process 1212 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8568kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
>> [  294.209719] Out of memory: Killed process 1223 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8556kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
>> [  294.216356] Out of memory: Killed process 851 (rsyslogd) total-vm:220676kB, anon-rss:1280kB, file-rss:4292kB, shmem-rss:0kB, UID:101 pgtables:80kB oom_score_adj:0
>> [  294.223146] Out of memory: Killed process 1220 (systemd-logind) total-vm:18728kB, anon-rss:1024kB, file-rss:8044kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
>> [  294.229888] Out of memory: Killed process 1234 ((systemd)) total-vm:21992kB, anon-rss:1664kB, file-rss:8852kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:100
>> [  294.236624] Out of memory: Killed process 952 (login) total-vm:11220kB, anon-rss:768kB, file-rss:4616kB, shmem-rss:0kB, UID:0 pgtables:64kB oom_score_adj:0
>> [  294.243266] Out of memory: Killed process 940 (cron) total-vm:7512kB, anon-rss:256kB, file-rss:2760kB, shmem-rss:0kB, UID:0 pgtables:56kB oom_score_adj:0
>> [  294.249871] Out of memory: Killed process 956 (agetty) total-vm:8516kB, anon-rss:128kB, file-rss:2492kB, shmem-rss:0kB, UID:0 pgtables:60kB oom_score_adj:0


Best Regards,
Yan, Zi


Thread overview: 5+ messages
2025-06-26 22:27 Felix Abecassis
2025-06-26 23:21 ` Pedro Falcato
2025-06-26 23:27   ` Zi Yan [this message]
2025-06-27  3:15   ` Felix Abecassis
2025-06-27  8:17   ` Michal Hocko
