From: Waiman Long
Date: Tue, 8 Apr 2025 14:36:16 -0400
Subject: Re: [RFC PATCH] vmscan,cgroup: apply mems_effective to reclaim
To: Gregory Price, linux-mm@kvack.org
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com, akpm@linux-foundation.org
References: <20250320210919.439964-1-gourry@gourry.net>
In-Reply-To: <20250320210919.439964-1-gourry@gourry.net>

On 3/20/25 5:09 PM, Gregory Price wrote:
> It is possible for a reclaimer to cause demotions of an lruvec belonging
> to a cgroup with cpuset.mems set to exclude some nodes. Attempt to apply
> this limitation based on the lruvec's memcg and prevent demotion.
>
> Notably, this may still allow demotion of shared libraries or any memory
> first instantiated in another cgroup. This means cpusets still cannot
> cannot guarantee complete isolation when demotion is enabled, and the
> docs have been updated to reflect this.
>
>
> Note: This is a fairly hacked up method that probably overlooks some
> cgroup/cpuset controls or designs. RFCing now for some discussion
> at LSFMM '25.
>
>
> Signed-off-by: Gregory Price
> ---
>  .../ABI/testing/sysfs-kernel-mm-numa | 14 +++++---
>  include/linux/cpuset.h               |  2 ++
>  kernel/cgroup/cpuset.c               | 10 ++++++
>  mm/vmscan.c                          | 32 ++++++++++++-------
>  4 files changed, 41 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-numa b/Documentation/ABI/testing/sysfs-kernel-mm-numa
> index 77e559d4ed80..27cdcab901f7 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-numa
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-numa
> @@ -16,9 +16,13 @@ Description: Enable/disable demoting pages during reclaim
>  		Allowing page migration during reclaim enables these
>  		systems to migrate pages from fast tiers to slow tiers
>  		when the fast tier is under pressure. This migration
> -		is performed before swap. It may move data to a NUMA
> -		node that does not fall into the cpuset of the
> -		allocating process which might be construed to violate
> -		the guarantees of cpusets. This should not be enabled
> -		on systems which need strict cpuset location
> +		is performed before swap if an eligible numa node is
> +		present in cpuset.mems for the cgroup. If cpusets.mems
> +		changes at runtime, it may move data to a NUMA node that
> +		does not fall into the cpuset of the new cpusets.mems,
> +		which might be construed to violate the guarantees of
> +		cpusets. Shared memory, such as libraries, owned by
> +		another cgroup may still be demoted and result in memory
> +		use on a node not present in cpusets.mem. This should not
> +		be enabled on systems which need strict cpuset location
>  		guarantees.
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index 835e7b793f6a..d4169f1b1719 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -171,6 +171,8 @@ static inline void set_mems_allowed(nodemask_t nodemask)
>  	task_unlock(current);
>  }
>
> +bool memcg_mems_allowed(struct mem_cgroup *memcg, int nid);
> +
>  #else /* !CONFIG_CPUSETS */
>

You should also define an inline function for the !CONFIG_CPUSETS case.

>  static inline bool cpusets_enabled(void) { return false; }
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 0f910c828973..bb9669cc105d 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -4296,3 +4296,13 @@ void cpuset_task_status_allowed(struct seq_file *m, struct task_struct *task)
>  	seq_printf(m, "Mems_allowed_list:\t%*pbl\n",
>  		   nodemask_pr_args(&task->mems_allowed));
>  }
> +
> +bool memcg_mems_allowed(struct mem_cgroup *memcg, int nid)
> +{
> +	struct cgroup_subsys_state *css;
> +	struct cpuset *cs;
> +
> +	css = cgroup_get_e_css(memcg->css.cgroup, &cpuset_cgrp_subsys);
> +	cs = css ? container_of(css, struct cpuset, css) : NULL;
> +	return cs ? node_isset(nid, cs->effective_mems) : true;

As said by Johannes, you will need to take the callback_lock to ensure
the stability of effective_mems. I also second his suggestion of
defining a cgroup_mems_allowed() here and doing the memcg to cgroup
translation outside of cpuset.c.

Cheers,
Longman
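
For illustration only, the three review points above (a fallback when
cpusets are compiled out, reading effective_mems under a lock, and
keeping the memcg to cgroup translation out of the cpuset code) can be
modelled in plain userspace C. This is a rough sketch, not kernel code
and not the actual follow-up patch: every name ending in _model, the
MODEL_NO_CPUSETS switch, the 64-bit node mask, and the pthread mutex
standing in for callback_lock are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct cpuset_model {
	uint64_t effective_mems;	/* one bit per NUMA node, nodes 0..63 */
};

struct cgroup_model {
	struct cpuset_model *cs;	/* NULL: no cpuset state for this cgroup */
};

struct memcg_model {
	struct cgroup_model *cgroup;
};

#ifndef MODEL_NO_CPUSETS
/* Stand-in for callback_lock: serializes readers against mask updates. */
static pthread_mutex_t callback_lock_model = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: replace the mask while holding the lock. */
static void set_effective_mems_model(struct cpuset_model *cs, uint64_t mask)
{
	pthread_mutex_lock(&callback_lock_model);
	cs->effective_mems = mask;
	pthread_mutex_unlock(&callback_lock_model);
}

/* Reader side: takes a cgroup, not a memcg, and samples the mask locked. */
static bool cgroup_mems_allowed_model(struct cgroup_model *cgrp, int nid)
{
	bool allowed;

	if (!cgrp || !cgrp->cs)
		return true;		/* no cpuset: nothing restricts the node */
	if (nid < 0 || nid >= 64)
		return false;		/* outside the range this model covers */

	pthread_mutex_lock(&callback_lock_model);
	allowed = (cgrp->cs->effective_mems >> nid) & 1ULL;
	pthread_mutex_unlock(&callback_lock_model);
	return allowed;
}
#else
/* Fallback when the feature is compiled out: every node is allowed. */
static inline void set_effective_mems_model(struct cpuset_model *cs,
					    uint64_t mask)
{
	(void)cs;
	(void)mask;
}

static inline bool cgroup_mems_allowed_model(struct cgroup_model *cgrp,
					     int nid)
{
	(void)cgrp;
	(void)nid;
	return true;
}
#endif

/* The memcg to cgroup translation stays with the caller (reclaim side). */
static bool memcg_can_demote_to_model(struct memcg_model *memcg, int nid)
{
	return cgroup_mems_allowed_model(memcg ? memcg->cgroup : NULL, nid);
}
```

In this model a reclaimer would call memcg_can_demote_to_model() before
picking a demotion target, and the cpuset side only ever sees a cgroup.
The result is a snapshot: the mask can change right after the lock is
dropped, which matches the caveat in the updated documentation about
cpuset.mems changing at runtime.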