From: Yafang Shao
Date: Tue, 14 Nov 2023 19:59:53 +0800
Subject: Re: [RFC PATCH -mm 0/4] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf
To: Michal Hocko
Cc: Casey Schaufler, akpm@linux-foundation.org, paul@paul-moore.com, jmorris@namei.org, serge@hallyn.com, linux-mm@kvack.org, linux-security-module@vger.kernel.org, bpf@vger.kernel.org, ligang.bdlg@bytedance.com
References: <20231112073424.4216-1-laoar.shao@gmail.com> <188dc90e-864f-4681-88a5-87401c655878@schaufler-ca.com>

On Tue, Nov 14, 2023 at 6:15 PM Michal Hocko wrote:
>
> On Mon 13-11-23 11:15:06, Yafang Shao wrote:
> > On Mon, Nov 13, 2023 at 12:45 AM Casey Schaufler wrote:
> > >
> > > On 11/11/2023 11:34 PM, Yafang Shao wrote:
> > > > Background
> > > > ==========
> > > >
> > > > In our containerized environment, we've identified unexpected OOM events
> > > > where the OOM-killer terminates tasks despite having ample free memory.
> > > > This anomaly is traced back to tasks within a container using mbind(2) to
> > > > bind memory to a specific NUMA node. When the allocated memory on this node
> > > > is exhausted, the OOM-killer, prioritizing tasks based on oom_score,
> > > > indiscriminately kills tasks. This becomes more critical with guaranteed
> > > > tasks (oom_score_adj: -998) aggravating the issue.
> > >
> > > Is there some reason why you can't fix the callers of mbind(2)?
> > > This looks like an user space configuration error rather than a
> > > system security issue.
> >
> > It appears my initial description may have caused confusion. In this
> > scenario, the caller is an unprivileged user lacking any capabilities.
> > While a privileged user, such as root, experiencing this issue might
> > indicate a user space configuration error, the concerning aspect is
> > the potential for an unprivileged user to disrupt the system easily.
> > If this is perceived as a misconfiguration, the question arises: What
> > is the correct configuration to prevent an unprivileged user from
> > utilizing mbind(2)?"
>
> How is this any different than a non NUMA (mbind) situation?

In a UMA system, each gigabyte of memory carries the same cost.
Conversely, in a NUMA architecture, opting to confine processes within
a specific NUMA node incurs additional costs. In the worst-case
scenario, if all containers opt to bind their memory exclusively to
specific nodes, it will result in significant memory wastage.

> You can
> still have an unprivileged user to allocate just until the OOM triggers
> and disrupt other workload consuming more memory. Sure the mempolicy
> based OOM is less precise and it might select a victim with only a small
> consumption on a target NUMA node but fundamentally the situation is
> very similar. I do not think disallowing mbind specifically is solving a
> real problem.

How would you recommend addressing this more effectively?

-- 
Regards
Yafang
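
The failure mode described in the Background (a bound node runs out
while the system as a whole still has free memory) can be illustrated
with a toy model. This is only a sketch, not kernel code: the Node
class, the alloc_bound helper, and the sizes are all hypothetical, and
it assumes a strict bind where an allocation may not fall back to
another node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a NUMA node; sizes are in GiB."""
    total: int
    used: int = 0

    @property
    def free(self) -> int:
        return self.total - self.used


def alloc_bound(nodes: list[Node], node_id: int, size: int) -> bool:
    """Allocate `size` GiB strictly from one node, with no fallback.

    Returns False when the bound node cannot satisfy the request. In this
    toy model that is the point treated as an OOM condition, regardless
    of how much memory the other nodes still have free.
    """
    node = nodes[node_id]
    if node.free < size:
        return False
    node.used += size
    return True


# Two 64 GiB nodes (assumed sizes); a task binds all of its memory to node 0.
nodes = [Node(total=64), Node(total=64)]
first_ok = alloc_bound(nodes, node_id=0, size=60)   # fits on node 0
second_ok = alloc_bound(nodes, node_id=0, size=8)   # node 0 has only 4 GiB left
system_free = sum(n.free for n in nodes)

print(first_ok, second_ok, system_free)
```

In this model the second request fails even though 68 GiB remain free
across both nodes, which mirrors the "OOM despite ample free memory"
symptom. It says nothing about how the real OOM-killer picks a victim.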