Date: Tue, 14 Nov 2023 11:15:02 +0100
From: Michal Hocko
To: Yafang Shao
Cc: Casey Schaufler, akpm@linux-foundation.org, paul@paul-moore.com, jmorris@namei.org, serge@hallyn.com, linux-mm@kvack.org, linux-security-module@vger.kernel.org, bpf@vger.kernel.org, ligang.bdlg@bytedance.com
Subject: Re: [RFC PATCH -mm 0/4] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf
References: <20231112073424.4216-1-laoar.shao@gmail.com> <188dc90e-864f-4681-88a5-87401c655878@schaufler-ca.com>

On Mon 13-11-23 11:15:06, Yafang Shao wrote:
> On Mon, Nov 13, 2023 at 12:45 AM Casey Schaufler wrote:
> >
> > On 11/11/2023 11:34 PM, Yafang Shao wrote:
> > > Background
> > > ==========
> > >
> > > In our containerized environment, we've identified unexpected OOM events
> > > where the OOM-killer terminates tasks despite having ample free memory.
> > > This anomaly is traced back to tasks within a container using mbind(2) to
> > > bind memory to a specific NUMA node. When the allocated memory on this node
> > > is exhausted, the OOM-killer, prioritizing tasks based on oom_score,
> > > indiscriminately kills tasks. This becomes more critical with guaranteed
> > > tasks (oom_score_adj: -998) aggravating the issue.
> >
> > Is there some reason why you can't fix the callers of mbind(2)?
> > This looks like an user space configuration error rather than a
> > system security issue.
>
> It appears my initial description may have caused confusion. In this
> scenario, the caller is an unprivileged user lacking any capabilities.
> While a privileged user, such as root, experiencing this issue might
> indicate a user space configuration error, the concerning aspect is
> the potential for an unprivileged user to disrupt the system easily.
> If this is perceived as a misconfiguration, the question arises: What
> is the correct configuration to prevent an unprivileged user from
> utilizing mbind(2)?"

How is this any different from a non-NUMA (mbind) situation? An
unprivileged user can still allocate until the OOM killer triggers and
disrupt other workloads that consume more memory. Sure, the
mempolicy-based OOM is less precise and might select a victim with only
a small consumption on the target NUMA node, but fundamentally the
situation is very similar.

I do not think disallowing mbind specifically solves a real problem.
-- 
Michal Hocko
SUSE Labs