From: Yafang Shao
Date: Wed, 15 Nov 2023 09:52:38 +0800
Subject: Re: [RFC PATCH -mm 0/4] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf
To: Casey Schaufler
Cc: Michal Hocko, akpm@linux-foundation.org, paul@paul-moore.com, jmorris@namei.org, serge@hallyn.com, linux-mm@kvack.org, linux-security-module@vger.kernel.org, bpf@vger.kernel.org, ligang.bdlg@bytedance.com

On Wed, Nov 15, 2023 at 12:58 AM Casey Schaufler wrote:
>
> On 11/14/2023 3:59 AM, Yafang Shao wrote:
> > On Tue, Nov 14, 2023 at 6:15 PM Michal Hocko wrote:
> >> On Mon 13-11-23 11:15:06, Yafang Shao wrote:
> >>> On Mon, Nov 13, 2023 at 12:45 AM Casey Schaufler wrote:
> >>>> On 11/11/2023 11:34 PM, Yafang Shao wrote:
> >>>>> Background
> >>>>> ==========
> >>>>>
> >>>>> In our containerized environment, we've identified unexpected OOM events
> >>>>> where the OOM-killer terminates tasks despite having ample free memory.
> >>>>> This anomaly is traced back to tasks within a container using mbind(2) to
> >>>>> bind memory to a specific NUMA node. When the allocated memory on this node
> >>>>> is exhausted, the OOM-killer, prioritizing tasks based on oom_score,
> >>>>> indiscriminately kills tasks. This becomes more critical with guaranteed
> >>>>> tasks (oom_score_adj: -998) aggravating the issue.
> >>>> Is there some reason why you can't fix the callers of mbind(2)?
> >>>> This looks like a user space configuration error rather than a
> >>>> system security issue.
> >>> It appears my initial description may have caused confusion. In this
> >>> scenario, the caller is an unprivileged user lacking any capabilities.
> >>> While a privileged user, such as root, experiencing this issue might
> >>> indicate a user space configuration error, the concerning aspect is
> >>> the potential for an unprivileged user to disrupt the system easily.
> >>> If this is perceived as a misconfiguration, the question arises: What
> >>> is the correct configuration to prevent an unprivileged user from
> >>> utilizing mbind(2)?
> >> How is this any different than a non NUMA (mbind) situation?
> > In a UMA system, each gigabyte of memory carries the same cost.
> > Conversely, in a NUMA architecture, opting to confine processes within
> > a specific NUMA node incurs additional costs. In the worst-case
> > scenario, if all containers opt to bind their memory exclusively to
> > specific nodes, it will result in significant memory wastage.
>
> That still sounds like you've misconfigured your containers such
> that they expect to get more memory than is available, and that
> they have more control over it than they really do.

So I'll ask again: what configuration method, other than the heavyweight seccomp approach, is suitable for limiting user control over memory policy adjustments?

--
Regards
Yafang
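For reference, here is a minimal sketch of what the seccomp approach mentioned above might look like. It is an illustration written under stated assumptions, not code from this patch series: a classic-BPF seccomp filter that a container runtime could install before exec'ing the workload, so that mbind(2) and set_mempolicy(2) fail with EPERM. The helper names deny_mempolicy_syscalls() and mbind_is_denied() are hypothetical, only x86_64 and aarch64 are handled, and the syscall list is not exhaustive (for example, set_mempolicy_home_node(2) is not covered).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Assumption: this sketch only targets x86_64 and aarch64. */
#if defined(__x86_64__)
#define FILTER_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define FILTER_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86_64 and aarch64"
#endif

/*
 * Hypothetical helper: make mbind(2) and set_mempolicy(2) fail with EPERM
 * for the calling thread and any children or exec'd programs it creates
 * afterwards. Returns 0 on success, -1 on failure (errno is set).
 *
 * Note: on x86_64 a complete filter should also reject x32 syscall
 * numbers (nr >= 0x40000000); that check is omitted for brevity.
 */
static int deny_mempolicy_syscalls(void)
{
	struct sock_filter filter[] = {
		/* Kill the process if the syscall comes from an unexpected ABI. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_ARCH, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
		/* Load the syscall number. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
		/* mbind or set_mempolicy: jump to the EPERM return below. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mbind, 2, 0),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_set_mempolicy, 1, 0),
		/* Everything else is allowed. */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
		.filter = filter,
	};

	/* Required so that an unprivileged process may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical probe: returns 1 if mbind(2) is rejected with EPERM, else 0. */
static int mbind_is_denied(void)
{
	/* The arguments do not matter once the filter matches the syscall number. */
	long ret = syscall(__NR_mbind, 0UL, 0UL, 0UL, NULL, 0UL, 0U);

	return ret == -1 && errno == EPERM;
}
```

The sketch also shows why seccomp is a coarse tool for this problem: a filter sees only the syscall number and raw argument values, so it could additionally match on mbind(2)'s integer mode argument, but it cannot inspect the nodemask, which is passed through a user-space pointer.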