From: Waiman Long
Date: Fri, 18 Apr 2025 22:06:40 -0400
Subject: Re: [PATCH v2 2/2] vmscan,cgroup: apply mems_effective to reclaim
To: Gregory Price, cgroups@vger.kernel.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org
Message-ID: <162f1ae4-2adf-4133-8de4-20f240e5469e@redhat.com>
In-Reply-To: <20250418031352.1277966-2-gourry@gourry.net>
References: <20250418031352.1277966-1-gourry@gourry.net> <20250418031352.1277966-2-gourry@gourry.net>

On 4/17/25 11:13 PM, Gregory Price wrote:
> It is possible for a reclaimer to cause demotions of an lruvec belonging
> to a cgroup with cpuset.mems set to exclude some nodes. Attempt to apply
> this limitation based on the lruvec's memcg and prevent demotion.
>
> Notably, this may still allow demotion of shared libraries or any memory
> first instantiated in another cgroup. This means cpusets still cannot
> guarantee complete isolation when demotion is enabled, and the
> docs have been updated to reflect this.
>
> This is useful for isolating workloads on a multi-tenant system from
> certain classes of memory more consistently - with the noted exceptions.
>
> Signed-off-by: Gregory Price
> ---
>  .../ABI/testing/sysfs-kernel-mm-numa | 14 ++++++++---
>  include/linux/cgroup.h               |  7 ++++++
>  include/linux/cpuset.h               |  5 +++++
>  include/linux/memcontrol.h           |  9 +++++++++
>  kernel/cgroup/cgroup.c               |  5 +++++
>  kernel/cgroup/cpuset.c               | 22 ++++++++++++++++++++++
>  mm/vmscan.c                          | 41 +++++++++++++++++++++++--------------
>  7 files changed, 82 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-numa b/Documentation/ABI/testing/sysfs-kernel-mm-numa
> index 77e559d4ed80..27cdcab901f7 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-numa
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-numa
> @@ -16,9 +16,13 @@ Description:	Enable/disable demoting pages during reclaim
>  		Allowing page migration during reclaim enables these
>  		systems to migrate pages from fast tiers to slow tiers
>  		when the fast tier is under pressure. This migration
> -		is performed before swap. It may move data to a NUMA
> -		node that does not fall into the cpuset of the
> -		allocating process which might be construed to violate
> -		the guarantees of cpusets. This should not be enabled
> -		on systems which need strict cpuset location
> +		is performed before swap if an eligible NUMA node is
> +		present in cpuset.mems for the cgroup. If cpuset.mems
> +		changes at runtime, it may move data to a NUMA node that
> +		does not fall into the cpuset of the new cpuset.mems,
> +		which might be construed to violate the guarantees of
> +		cpusets. Shared memory, such as libraries, owned by
> +		another cgroup may still be demoted and result in memory
> +		use on a node not present in cpuset.mems. This should not
> +		be enabled on systems which need strict cpuset location
>  		guarantees.
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index f8ef47f8a634..2915250a3e5e 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -632,6 +632,8 @@ static inline void cgroup_kthread_ready(void)
>
>  void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen);
>  struct cgroup *cgroup_get_from_id(u64 id);
> +
> +extern bool cgroup_node_allowed(struct cgroup *cgroup, int nid);
>  #else /* !CONFIG_CGROUPS */
>
>  struct cgroup_subsys_state;
> @@ -681,6 +683,11 @@ static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
>
>  static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
>  {}
> +
> +static inline bool cgroup_node_allowed(struct cgroup *cgroup, int nid)
> +{
> +	return true;
> +}
>  #endif /* !CONFIG_CGROUPS */
>
>  #ifdef CONFIG_CGROUPS
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index 893a4c340d48..c64b4a174456 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -171,6 +171,7 @@ static inline void set_mems_allowed(nodemask_t nodemask)
>  	task_unlock(current);
>  }
>
> +extern bool cpuset_node_allowed(struct cgroup *cgroup, int nid);
>  #else /* !CONFIG_CPUSETS */
>
>  static inline bool cpusets_enabled(void) { return false; }
> @@ -282,6 +283,10 @@ static inline bool read_mems_allowed_retry(unsigned int seq)
>  	return false;
>  }
>
> +static inline bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
> +{
> +	return false;
> +}
>  #endif /* !CONFIG_CPUSETS */
>
>  #endif /* _LINUX_CPUSET_H */
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 53364526d877..2906e4bb12e9 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1736,6 +1736,11 @@ static inline void count_objcg_events(struct obj_cgroup *objcg,
>  	rcu_read_unlock();
>  }
>
> +static inline bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid)
> +{
> +	return memcg ? cgroup_node_allowed(memcg->css.cgroup, nid) : true;
> +}
> +
>  #else
>  static inline bool mem_cgroup_kmem_disabled(void)
>  {
> @@ -1793,6 +1798,10 @@ static inline void count_objcg_events(struct obj_cgroup *objcg,
>  {
>  }
>
> +static inline bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid)
> +{
> +	return true;
> +}
>  #endif /* CONFIG_MEMCG */
>
>  #if defined(CONFIG_MEMCG) && defined(CONFIG_ZSWAP)
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index afc665b7b1fe..ba0b90cd774c 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -7038,6 +7038,11 @@ int cgroup_parse_float(const char *input, unsigned dec_shift, s64 *v)
>  	return 0;
>  }
>
> +bool cgroup_node_allowed(struct cgroup *cgroup, int nid)
> +{
> +	return cpuset_node_allowed(cgroup, nid);
> +}
> +
>  /*
>   * sock->sk_cgrp_data handling. For more info, see sock_cgroup_data
>   * definition in cgroup-defs.h.
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index d6ed3f053e62..31e4c4cbcdfc 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -4163,6 +4163,28 @@ bool cpuset_current_node_allowed(int node, gfp_t gfp_mask)
>  	return allowed;
>  }
>
> +bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
> +{
> +	struct cgroup_subsys_state *css;
> +	unsigned long flags;
> +	struct cpuset *cs;
> +	bool allowed;
> +
> +	css = cgroup_get_e_css(cgroup, &cpuset_cgrp_subsys);
> +	if (!css)
> +		return true;
> +
> +	cs = container_of(css, struct cpuset, css);
> +	spin_lock_irqsave(&callback_lock, flags);
> +	/* At least one parent must have a valid node list */
> +	while (nodes_empty(cs->effective_mems))
> +		cs = parent_cs(cs);

For cgroup v2, effective_mems should always be set, so walking up the tree
isn't necessary.
For v1, it can be empty, but the memory and cpuset controllers are unlikely
to be in the same hierarchy.

Cheers,
Longman