From: Waiman Long
Date: Mon, 21 Apr 2025 22:02:22 -0400
Subject: Re: [PATCH v4 2/2] vmscan,cgroup: apply mems_effective to reclaim
To: Gregory Price, linux-mm@kvack.org
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, tj@kernel.org, mkoutny@suse.com, akpm@linux-foundation.org
In-Reply-To: <20250422012616.1883287-3-gourry@gourry.net>

On 4/21/25 9:26 PM, Gregory Price wrote:
> It is possible for a reclaimer to cause demotions of an lruvec belonging
> to a cgroup with cpuset.mems set to exclude some nodes.
> Attempt to apply this limitation based on the lruvec's memcg and
> prevent demotion.
>
> Notably, this may still allow demotion of shared libraries or any memory
> first instantiated in another cgroup. This means cpusets still cannot
> guarantee complete isolation when demotion is enabled, and the
> docs have been updated to reflect this.
>
> This is useful for isolating workloads on a multi-tenant system from
> certain classes of memory more consistently - with the noted exceptions.
>
> Acked-by: Tejun Heo
> Signed-off-by: Gregory Price
> ---
>  .../ABI/testing/sysfs-kernel-mm-numa | 16 +++++---
>  include/linux/cpuset.h | 5 +++
>  include/linux/memcontrol.h | 6 +++
>  kernel/cgroup/cpuset.c | 26 ++++++++++++
>  mm/memcontrol.c | 6 +++
>  mm/vmscan.c | 41 +++++++++++--------
>  6 files changed, 78 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-numa b/Documentation/ABI/testing/sysfs-kernel-mm-numa
> index 77e559d4ed80..90e375ff54cb 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-numa
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-numa
> @@ -16,9 +16,13 @@ Description: Enable/disable demoting pages during reclaim
>  		Allowing page migration during reclaim enables these
>  		systems to migrate pages from fast tiers to slow tiers
>  		when the fast tier is under pressure. This migration
> -		is performed before swap. It may move data to a NUMA
> -		node that does not fall into the cpuset of the
> -		allocating process which might be construed to violate
> -		the guarantees of cpusets. This should not be enabled
> -		on systems which need strict cpuset location
> -		guarantees.
> +		is performed before swap if an eligible numa node is
> +		present in cpuset.mems for the cgroup (or if cpuset v1
> +		is being used). If cpusets.mems changes at runtime, it
> +		may move data to a NUMA node that does not fall into the
> +		cpuset of the new cpusets.mems, which might be construed
> +		to violate the guarantees of cpusets. Shared memory,
> +		such as libraries, owned by another cgroup may still be
> +		demoted and result in memory use on a node not present
> +		in cpusets.mem. This should not be enabled on systems
> +		which need strict cpuset location guarantees.
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index 893a4c340d48..5255e3fdbf62 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -171,6 +171,7 @@ static inline void set_mems_allowed(nodemask_t nodemask)
>  	task_unlock(current);
>  }
>
> +extern bool cpuset_node_allowed(struct cgroup *cgroup, int nid);
>  #else /* !CONFIG_CPUSETS */
>
>  static inline bool cpusets_enabled(void) { return false; }
> @@ -282,6 +283,10 @@ static inline bool read_mems_allowed_retry(unsigned int seq)
>  	return false;
>  }
>
> +static inline bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
> +{
> +	return true;
> +}
>  #endif /* !CONFIG_CPUSETS */
>
>  #endif /* _LINUX_CPUSET_H */
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 53364526d877..a6c4e3faf721 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1736,6 +1736,8 @@ static inline void count_objcg_events(struct obj_cgroup *objcg,
>  	rcu_read_unlock();
>  }
>
> +bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid);
> +
>  #else
>  static inline bool mem_cgroup_kmem_disabled(void)
>  {
> @@ -1793,6 +1795,10 @@ static inline void count_objcg_events(struct obj_cgroup *objcg,
>  {
>  }
>
> +static inline bool mem_cgroup_node_allowed(struct mem_cgroup *memcg, int nid)
> +{
> +	return true;
> +}
>  #endif /* CONFIG_MEMCG */
>
>  #if defined(CONFIG_MEMCG) && defined(CONFIG_ZSWAP)
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index f8e6a9b642cb..c52348bfd5db 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -4163,6 +4163,32 @@ bool cpuset_current_node_allowed(int node, gfp_t gfp_mask)
>  	return allowed;
>  }
>
> +bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
> +{
> +	struct cgroup_subsys_state *css;
> +	struct cpuset *cs;
> +	bool allowed;
> +
> +	/*
> +	 * In v1, mem_cgroup and cpuset are unlikely in the same hierarchy
> +	 * and mems_allowed is likely to be empty even if we could get to it,
> +	 * so return true to avoid taking a global lock on the empty check.
> +	 */
> +	if (!cpuset_v2())
> +		return true;
> +
> +	css = cgroup_get_e_css(cgroup, &cpuset_cgrp_subsys);
> +	if (!css)
> +		return true;
> +
> +	cs = container_of(css, struct cpuset, css);
> +	rcu_read_lock();

Sorry, I missed the fact that cgroup_get_e_css() takes a reference to
the css, so it won't go away. In that case, rcu_read_lock() isn't really
needed. However, I do want a comment saying that accessing effective_mems
should normally require taking either cpuset_mutex or callback_lock, but
that the locking is skipped here to avoid taking a global lock in the
reclaim path, at the cost of a possibly inaccurate result in some rare
cases.

Cheers,
Longman
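For illustration only, here is a small userspace model of the lookup pattern discussed above: take a reference on the effective css, test the node bit without cpuset_mutex or callback_lock, then drop the reference. The mock_cpuset type and the mock_get_e_css(), mock_css_put() and mock_node_allowed() helpers are hypothetical stand-ins for struct cpuset, cgroup_get_e_css(), css_put() and cpuset_node_allowed(), and effective_mems is modeled as a plain bitmask rather than a nodemask_t. It is a sketch under those assumptions, not the patch's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cpuset (userspace model only). */
struct mock_cpuset {
	int refcnt;                   /* models the css reference count */
	unsigned long effective_mems; /* bit n set => NUMA node n allowed */
};

/* Models cgroup_get_e_css(): returns the css with a reference held, or NULL. */
static struct mock_cpuset *mock_get_e_css(struct mock_cpuset *cs)
{
	if (cs)
		cs->refcnt++;
	return cs;
}

/* Models css_put(): drops the reference taken by mock_get_e_css(). */
static void mock_css_put(struct mock_cpuset *cs)
{
	cs->refcnt--;
}

/* Models cpuset_node_allowed() with the RCU read section dropped. */
static bool mock_node_allowed(struct mock_cpuset *cgrp_cs, bool is_v2, int nid)
{
	struct mock_cpuset *cs;
	bool allowed;

	/* v1: memcg and cpuset are unlikely to share a hierarchy. */
	if (!is_v2)
		return true;

	cs = mock_get_e_css(cgrp_cs);
	if (!cs)
		return true;

	/*
	 * Accessing effective_mems would normally require cpuset_mutex or
	 * callback_lock. The lock is skipped to keep a global lock out of
	 * the reclaim path; the reference held above keeps cs alive, but
	 * the result may be stale if cpuset.mems changes concurrently.
	 */
	allowed = (cs->effective_mems >> nid) & 1UL;
	mock_css_put(cs);
	return allowed;
}
```

With effective_mems = 0x5 (nodes 0 and 2), node 1 is rejected on v2, while the v1 path and a missing css both fall back to allowing the node, mirroring the early returns in the quoted patch. The reference count returns to its starting value after each call.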