From: Yafang Shao
Date: Thu, 16 Nov 2023 09:41:11 +0800
Subject: Re: [RFC PATCH -mm 0/4] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf
To: Casey Schaufler
Cc: Michal Hocko, akpm@linux-foundation.org, paul@paul-moore.com, jmorris@namei.org, serge@hallyn.com, linux-mm@kvack.org, linux-security-module@vger.kernel.org, bpf@vger.kernel.org, ligang.bdlg@bytedance.com
References: <20231112073424.4216-1-laoar.shao@gmail.com> <188dc90e-864f-4681-88a5-87401c655878@schaufler-ca.com> <22994ba0-18eb-4f9d-a399-abde52ffdc38@schaufler-ca.com>
In-Reply-To: <22994ba0-18eb-4f9d-a399-abde52ffdc38@schaufler-ca.com>

On Thu, Nov 16, 2023 at 1:09 AM Casey Schaufler wrote:
>
> On 11/15/2023 6:26 AM, Yafang Shao wrote:
> > On Wed, Nov 15, 2023 at 5:33 PM Yafang Shao wrote:
> >> On Wed, Nov 15, 2023 at 4:45 PM Michal Hocko wrote:
> >>> On Wed 15-11-23 09:52:38, Yafang Shao wrote:
> >>>> On Wed, Nov 15, 2023 at 12:58 AM Casey Schaufler wrote:
> >>>>> On 11/14/2023 3:59 AM, Yafang Shao wrote:
> >>>>>> On Tue, Nov 14, 2023 at 6:15 PM Michal Hocko wrote:
> >>>>>>> On Mon 13-11-23 11:15:06, Yafang Shao wrote:
> >>>>>>>> On Mon, Nov 13, 2023 at 12:45 AM Casey Schaufler wrote:
> >>>>>>>>> On 11/11/2023 11:34 PM, Yafang Shao wrote:
> >>>>>>>>>> Background
> >>>>>>>>>> ==========
> >>>>>>>>>>
> >>>>>>>>>> In our containerized environment, we've identified unexpected OOM events
> >>>>>>>>>> where the OOM-killer terminates tasks despite having ample free memory.
> >>>>>>>>>> This anomaly is traced back to tasks within a container using mbind(2) to
> >>>>>>>>>> bind memory to a specific NUMA node. When the allocated memory on this node
> >>>>>>>>>> is exhausted, the OOM-killer, prioritizing tasks based on oom_score,
> >>>>>>>>>> indiscriminately kills tasks. This becomes more critical with guaranteed
> >>>>>>>>>> tasks (oom_score_adj: -998) aggravating the issue.
> >>>>>>>>> Is there some reason why you can't fix the callers of mbind(2)?
> >>>>>>>>> This looks like an user space configuration error rather than a
> >>>>>>>>> system security issue.
> >>>>>>>> It appears my initial description may have caused confusion. In this
> >>>>>>>> scenario, the caller is an unprivileged user lacking any capabilities.
> >>>>>>>> While a privileged user, such as root, experiencing this issue might
> >>>>>>>> indicate a user space configuration error, the concerning aspect is
> >>>>>>>> the potential for an unprivileged user to disrupt the system easily.
> >>>>>>>> If this is perceived as a misconfiguration, the question arises: What
> >>>>>>>> is the correct configuration to prevent an unprivileged user from
> >>>>>>>> utilizing mbind(2)?"
> >>>>>>> How is this any different than a non NUMA (mbind) situation?
> >>>>>> In a UMA system, each gigabyte of memory carries the same cost.
> >>>>>> Conversely, in a NUMA architecture, opting to confine processes within
> >>>>>> a specific NUMA node incurs additional costs. In the worst-case
> >>>>>> scenario, if all containers opt to bind their memory exclusively to
> >>>>>> specific nodes, it will result in significant memory wastage.
> >>>>> That still sounds like you've misconfigured your containers such
> >>>>> that they expect to get more memory than is available, and that
> >>>>> they have more control over it than they really do.
> >>>> And again: What configuration method is suitable to limit user control
> >>>> over memory policy adjustments, besides the heavyweight seccomp
> >>>> approach?
>
> What makes seccomp "heavyweight"? The overhead? The infrastructure required?
>
> >>> This really depends on the workloads. What is the reason mbind is used
> >>> in the first place?
> >> It can improve their performance.
>
> How much? You've already demonstrated that using mbind can degrade their performance.

Pls. calm down and read the whole discussion carefully.
It is not easy to understand.

-- 
Regards
Yafang