Date: Fri, 27 Jun 2025 00:21:57 +0100
From: Pedro Falcato
To: Felix Abecassis
Cc: "linux-mm@kvack.org", Zi Yan, John Hubbard, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
Subject: Re: OOM kill of privileged processes when exhausting a single NUMA node

On Thu, Jun
26, 2025 at 10:27:36PM +0000, Felix Abecassis wrote:
> Hello linux-mm team,
>
> I have found an interesting behavior in the Linux kernel: an unprivileged user
> with access to user namespaces can cause privileged processes to be killed due
> to an OOM situation on a single NUMA node, even if the system has plenty of
> memory available on other NUMA nodes.
>
> This might lead to a local denial of service in some situations, so please
> review and let me know if the current behavior is expected.
>
> The steps are simple:
> 1. Use a Linux system with multiple NUMA nodes
> 2. Enable unprivileged user namespaces (often distro dependent)
> 3. As an unprivileged user, create a user namespace + mount namespace
>    and mount a tmpfs bound to NUMA node 1
> 4. Attempt to fill the tmpfs with more data than it can possibly store
> 5. The OOM killer will kill a significant amount of system daemons
>    (UID 0).
>

I somewhat agree that this is unintended tmpfs behavior, but you can
(probably) pull this off in other ways:
- use set_mempolicy()/mbind() to bind to a NUMA node and use a big mmap()
  mapping
- just use a lot of memory

and it's not limited to NUMA either. AFAIK user namespaces aren't really
isolating in the sense that you need a cgroup on top to further control
software you don't trust (or want to limit for other reasons).

And in this case the particular problem is that tmpfs really can't track
what process "owns" a file, even if O_TMPFILE was specified. So you can
quite trivially run out of memory in a regular Linux distro by filling up
/tmp (if tmpfs, of course), if you have write perms for /tmp, which by
default you do.

The only alarming bit (to me) is that cgroups don't help in this case
either. The most ad hoc solution I have would be to possibly limit the
tmpfs size to memory.max.

Adding the memcg folks for more comments.
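
To make the memory.max idea a bit more concrete, here is a rough userspace
sketch in Python. It is not the kernel-side change suggested above, just an
approximation of it: read a cgroup v2 memory.max value and turn it into a
tmpfs size= mount option. The helper names parse_memory_max and
tmpfs_size_option are made up for illustration. Note that in the reported
setup the cap would be 2G, which is still larger than the 1GB node, so this
alone would probably not stop the node-local OOM.

```python
from typing import Optional


def parse_memory_max(text: str) -> Optional[int]:
    """Parse the contents of a cgroup v2 memory.max file.

    The file holds either the literal string "max" (no limit) or a
    byte count. Returns None when there is no limit.
    """
    value = text.strip()
    if value == "max":
        return None
    return int(value)


def tmpfs_size_option(limit_bytes: Optional[int]) -> str:
    """Build a tmpfs mount option that caps the filesystem at limit_bytes.

    With no cgroup limit this returns an empty string, which leaves
    tmpfs at its default size (hypothetical policy choice for this sketch).
    """
    if limit_bytes is None:
        return ""
    return f"size={limit_bytes}"
```

For the cgroup in the report, the input would be the contents of
/sys/fs/cgroup/user.slice/user-1000.slice/memory.max, and the resulting
string could be appended to the existing mpol=bind:1 mount options.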
-- 
Pedro

> The possible mitigations I currently know of are: create a swap space, disable
> unprivileged user namespaces, or set sysctl vm.oom_kill_allocating_task=1.
>
> To be 100% clear, this does not require elevated privileges, and we are only
> using a fraction of the total system memory.
>
> Below is an example on a Ubuntu 25.04 VM under qemu where I hotplugged a new
> NUMA node with 1GB of memory, I also place the current process under a 2GB
> memory cgroup to show that it's not an effective mitigation.
>
> $ uname -a
> Linux ubuntu 6.14.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Wed May 21 15:01:51 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
>
> $ id -u
> 1000
>
> # Enable unprivileged user namespaces (this is an Ubuntu feature)
> $ sudo sysctl kernel.apparmor_restrict_unprivileged_userns=0
>
> $ sudo sh -c 'echo 2G > /sys/fs/cgroup/user.slice/user-1000.slice/memory.max'
>
> $ numastat -mzc
>
> Per-node system memory usage (in MBs):
> Token Unaccepted not in hash table.
> Token Unaccepted not in hash table.
>                Node 0 Node 1 Total
>                ------ ------ -----
> MemTotal         7940   1024  8964
> MemFree          7533   1024  8557
> MemUsed           407      0   407
> Active            176      0   176
> Inactive           44      0    44
> Active(anon)       42      0    42
> Active(file)      134      0   134
> Inactive(file)     44      0    44
> Unevictable        26      0    26
> Mlocked            26      0    26
> Dirty               0      0     0
> FilePages         186      0   186
> Mapped             57      0    57
> AnonPages          59      0    59
> Shmem               1      0     1
> KernelStack         2      0     2
> PageTables          2      0     2
> Slab               84      0    84
> SReclaimable       17      0    17
> SUnreclaim         68      0    68
> KReclaimable       17      0    17
>
> $ unshare -U -r -m sh -xc 'mount -t tmpfs -o mpol=bind:1 tmpfs /dev/shm ; dd if=/dev/zero of=/dev/shm/file bs=64K count=25000'
> + mount -t tmpfs -o mpol=bind:1 tmpfs /dev/shm
> + dd if=/dev/zero of=/dev/shm/file bs=64K count=25000
> [ 294.046130] Out of memory: Killed process 1074 (systemd) total-vm:21968kB, anon-rss:2048kB, file-rss:10164kB, shmem-rss:0kB, UID:1000 pgtables:88kB oom_score_adj:100
> [ 294.052224] Out of memory: Killed process 1076 ((sd-pam)) total-vm:21992kB, anon-rss:1772kB, file-rss:1832kB, shmem-rss:0kB, UID:1000 pgtables:76kB oom_score_adj:100
> [ 294.058446] Out of memory: Killed process 821 (unattended-upgr) total-vm:121388kB, anon-rss:13272kB, file-rss:16004kB, shmem-rss:0kB, UID:0 pgtables:140kB oom_score_adj:0
> [ 294.064551] Out of memory: Killed process 423 (systemd-resolve) total-vm:23200kB, anon-rss:2560kB, file-rss:11504kB, shmem-rss:0kB, UID:990 pgtables:88kB oom_score_adj:0
> [ 294.070491] Out of memory: Killed process 789 (udisksd) total-vm:470572kB, anon-rss:1920kB, file-rss:11840kB, shmem-rss:0kB, UID:0 pgtables:136kB oom_score_adj:0
> [ 294.076371] Out of memory: Killed process 848 (ModemManager) total-vm:391392kB, anon-rss:1792kB, file-rss:10516kB, shmem-rss:0kB, UID:0 pgtables:124kB oom_score_adj:0
> [ 294.082350] Out of memory: Killed process 733 (systemd-network) total-vm:20804kB, anon-rss:1296kB, file-rss:10068kB, shmem-rss:0kB, UID:998 pgtables:76kB oom_score_adj:0
> [ 294.088273] Out of memory: Killed process 1141 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8556kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> [ 294.094350] Out of memory: Killed process 788 (systemd-logind) total-vm:18896kB, anon-rss:896kB, file-rss:7968kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> [ 294.100461] Out of memory: Killed process 1151 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:7732kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> [ 294.106462] Out of memory: Killed process 1154 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8036kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> [ 294.112592] Out of memory: Killed process 1155 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8648kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> [ 294.118725] Out of memory: Killed process 1161 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8648kB, shmem-rss:0kB, UID:998 pgtables:84kB oom_score_adj:0
> [ 294.124827] Out of memory: Killed process 1165 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8484kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
> [ 294.131138] Out of memory: Killed process 1169 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8604kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> [ 294.137548] Out of memory: Killed process 1177 ((resolved)) total-vm:20604kB, anon-rss:1280kB, file-rss:8592kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> [ 294.144659] Out of memory: Killed process 1187 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8800kB, shmem-rss:0kB, UID:998 pgtables:80kB oom_score_adj:0
> [ 294.151118] Out of memory: Killed process 1179 (systemd-logind) total-vm:18728kB, anon-rss:1024kB, file-rss:7972kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0
> [ 294.157569] Out of memory: Killed process 1194 ((networkd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8596kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> [ 294.163877] Out of memory: Killed process 417 (systemd-timesyn) total-vm:91608kB, anon-rss:896kB, file-rss:7132kB, shmem-rss:0kB, UID:996 pgtables:88kB oom_score_adj:0
> [ 294.170240] Out of memory: Killed process 783 (polkitd) total-vm:306832kB, anon-rss:640kB, file-rss:7264kB, shmem-rss:0kB, UID:988 pgtables:96kB oom_score_adj:0
> [ 294.176668] Out of memory: Killed process 1200 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:7776kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> [ 294.183107] Out of memory: Killed process 1205 (9) total-vm:20136kB, anon-rss:1152kB, file-rss:6584kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> [ 294.189627] Out of memory: Killed process 1210 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:7844kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> [ 294.196227] Out of memory: Killed process 1209 ((d-logind)) total-vm:20140kB, anon-rss:1280kB, file-rss:7284kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> [ 294.202956] Out of memory: Killed process 1212 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8568kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
> [ 294.209719] Out of memory: Killed process 1223 ((imesyncd)) total-vm:20604kB, anon-rss:1280kB, file-rss:8556kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
> [ 294.216356] Out of memory: Killed process 851 (rsyslogd) total-vm:220676kB, anon-rss:1280kB, file-rss:4292kB, shmem-rss:0kB, UID:101 pgtables:80kB oom_score_adj:0
> [ 294.223146] Out of memory: Killed process 1220 (systemd-logind) total-vm:18728kB, anon-rss:1024kB, file-rss:8044kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
> [ 294.229888] Out of memory: Killed process 1234 ((systemd)) total-vm:21992kB, anon-rss:1664kB, file-rss:8852kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:100
> [ 294.236624] Out of memory: Killed process 952 (login) total-vm:11220kB, anon-rss:768kB, file-rss:4616kB, shmem-rss:0kB, UID:0 pgtables:64kB oom_score_adj:0
> [ 294.243266] Out of memory: Killed process 940 (cron) total-vm:7512kB, anon-rss:256kB, file-rss:2760kB, shmem-rss:0kB, UID:0 pgtables:56kB oom_score_adj:0
> [ 294.249871] Out of memory: Killed process 956 (agetty) total-vm:8516kB, anon-rss:128kB, file-rss:2492kB, shmem-rss:0kB, UID:0 pgtables:60kB oom_score_adj:0
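
A quick sanity check of the numbers in the reproduction, as a rough Python
sketch. The dd run writes about 1.6 GB, which is more than node 1 holds but
less than the 2G memory.max, which would explain why the cgroup limit is
never hit before the node runs dry. The second half tallies the "Killed
process" lines per UID. The constant names and kills_by_uid are mine, chosen
for illustration, and the regex assumes the log format shown above.

```python
import re
from collections import Counter

# Values copied from the reproduction above.
DD_BYTES = 64 * 1024 * 25000        # dd bs=64K count=25000
NODE1_BYTES = 1024 * 1024**2        # node 1 MemTotal: 1024 MB
MEMORY_MAX_BYTES = 2 * 1024**3      # memory.max = 2G

# Matches the "Killed process" lines printed by the OOM killer.
# The command name may itself contain parentheses, e.g. "((sd-pam))".
KILL_RE = re.compile(r"Killed process (\d+) \((.*)\) total-vm:.*\bUID:(\d+)")


def kills_by_uid(log_text: str) -> Counter:
    """Count OOM kills per UID in dmesg-style text.

    Leading "> " quote markers and timestamps are ignored because the
    pattern is searched for anywhere in each line.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = KILL_RE.search(line)
        if match:
            counts[int(match.group(3))] += 1
    return counts


if __name__ == "__main__":
    print(f"dd writes {DD_BYTES / 1024**2:.1f} MiB")
    print("exceeds node 1:", DD_BYTES > NODE1_BYTES)
    print("under memory.max:", DD_BYTES < MEMORY_MAX_BYTES)
```

Feeding the quoted log into kills_by_uid() gives a per-UID breakdown, which
makes it easy to see how many of the victims were running as UID 0.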