Message-ID: <380c52cb-fc8d-4fbe-8d2a-f153bd179816@linux.ibm.com>
Date: Fri, 20 Mar 2026 18:47:14 +0530
Subject: Re: [PATCH 2/3] mm/memcontrol: disable demotion in memcg direct reclaim
To: Bing Jiao, linux-mm@kvack.org
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
    Muchun Song, Andrew Morton, David Rientjes, Yosry Ahmed,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Chris Li,
    Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
    Youngjun Park, David Hildenbrand, Qi Zheng, Lorenzo Stoakes,
    Axel Rasmussen, Yuanchu Xie, Wei Xu, Joshua Hahn
References: <20260317230720.990329-1-bingjiao@google.com>
    <20260317230720.990329-3-bingjiao@google.com>
From: Donet Tom <donettom@linux.ibm.com>
In-Reply-To: <20260317230720.990329-3-bingjiao@google.com>

Hi Bing

On 3/18/26 4:37 AM, Bing Jiao wrote:
> NUMA demotion counts towards reclaim targets in shrink_folio_list(), but
> it does not reduce the total memory usage of a memcg. In memcg direct
> reclaim paths (e.g., charge-triggered or manual limit writes), where
> demotion is allowed, this leads to "fake progress" where the reclaim
> loop concludes it has satisfied the memory request without actually
> reducing the cgroup's charge.
>
> This could result in inefficient reclaim loops, CPU waste, moving all
> pages to far-tier nodes, and potentially premature OOM kills when the
> cgroup is under memory pressure but demotion is still possible.
>
> Introduce the MEMCG_RECLAIM_NO_DEMOTION flag to disable demotion in
> these memcg-specific reclaim paths.
> This ensures that reclaim
> progress is only counted when memory is actually freed or swapped out.

Thanks for the patch.

With this change, are we completely disabling memory tiering in memcg?

>
> Signed-off-by: Bing Jiao
> ---
>  include/linux/swap.h |  1 +
>  mm/memcontrol-v1.c   | 10 ++++++++--
>  mm/memcontrol.c      | 16 +++++++++++-----
>  mm/vmscan.c          |  1 +
>  4 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 7a09df6977a5..e83897a6dc72 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -356,6 +356,7 @@ unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone
>
>  #define MEMCG_RECLAIM_MAY_SWAP (1 << 1)
>  #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
> +#define MEMCG_RECLAIM_NO_DEMOTION (1 << 3)
>  #define MIN_SWAPPINESS 0
>  #define MAX_SWAPPINESS 200
>
> diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
> index 433bba9dfe71..3cb600e28e5b 100644
> --- a/mm/memcontrol-v1.c
> +++ b/mm/memcontrol-v1.c
> @@ -1466,6 +1466,10 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
>  	int ret;
>  	bool limits_invariant;
>  	struct page_counter *counter = memsw ? &memcg->memsw : &memcg->memory;
> +	unsigned int reclaim_options = MEMCG_RECLAIM_NO_DEMOTION;
> +
> +	if (!memsw)
> +		reclaim_options |= MEMCG_RECLAIM_MAY_SWAP;
>
>  	do {
>  		if (signal_pending(current)) {
> @@ -1500,7 +1504,7 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
>  		}
>
>  		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
> -				memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, NULL)) {
> +				reclaim_options, NULL)) {
>  			ret = -EBUSY;
>  			break;
>  		}
> @@ -1520,6 +1524,8 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
>  static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
>  {
>  	int nr_retries = MAX_RECLAIM_RETRIES;
> +	unsigned int reclaim_options = MEMCG_RECLAIM_MAY_SWAP |
> +				       MEMCG_RECLAIM_NO_DEMOTION;
>
>  	/* we call try-to-free pages for make this cgroup empty */
>  	lru_add_drain_all();
> @@ -1532,7 +1538,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
>  			return -EINTR;
>
>  		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
> -						  MEMCG_RECLAIM_MAY_SWAP, NULL))
> +						  reclaim_options, NULL))
>  			nr_retries--;
>  	}
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 303ac622d22d..fcf1cd0da643 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2287,6 +2287,8 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
>  					  gfp_t gfp_mask)
>  {
>  	unsigned long nr_reclaimed = 0;
> +	unsigned int reclaim_options = MEMCG_RECLAIM_MAY_SWAP |
> +				       MEMCG_RECLAIM_NO_DEMOTION;
>
>  	do {
>  		unsigned long pflags;
> @@ -2300,7 +2302,7 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
>  		psi_memstall_enter(&pflags);
>  		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
>  							gfp_mask,
> -							MEMCG_RECLAIM_MAY_SWAP,
> +							reclaim_options,
>  							NULL);
>  		psi_memstall_leave(&pflags);
>  	} while ((memcg = parent_mem_cgroup(memcg)) &&
> @@ -2572,7 +2574,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	/* Avoid the refill and flush of the older stock */
>  	batch = nr_pages;
>
> -	reclaim_options = MEMCG_RECLAIM_MAY_SWAP;
> +	reclaim_options = MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_NO_DEMOTION;
>  	if (!do_memsw_account() ||
>  	    page_counter_try_charge(&memcg->memsw, batch, &counter)) {
>  		if (page_counter_try_charge(&memcg->memory, batch, &counter))
> @@ -2610,7 +2612,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
>
>  	psi_memstall_enter(&pflags);
>  	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
> -						    gfp_mask, reclaim_options, NULL);
> +						    gfp_mask, reclaim_options, NULL);
>  	psi_memstall_leave(&pflags);
>
>  	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> @@ -4638,6 +4640,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
>  	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
> +	unsigned int reclaim_options = MEMCG_RECLAIM_MAY_SWAP |
> +				       MEMCG_RECLAIM_NO_DEMOTION;
>  	bool drained = false;
>  	unsigned long high;
>  	int err;
> @@ -4669,7 +4673,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>  		}
>
>  		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
> -					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, NULL);
> +					GFP_KERNEL, reclaim_options, NULL);
>
>  		if (!reclaimed && !nr_retries--)
>  			break;
> @@ -4690,6 +4694,8 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
>  	unsigned int nr_reclaims = MAX_RECLAIM_RETRIES;
> +	unsigned int reclaim_options = MEMCG_RECLAIM_MAY_SWAP |
> +				       MEMCG_RECLAIM_NO_DEMOTION;
>  	bool drained = false;
>  	unsigned long max;
>  	int err;
> @@ -4721,7 +4727,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
>
>  		if (nr_reclaims) {
>  			if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
> -					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, NULL))
> +					GFP_KERNEL, reclaim_options, NULL))
>  				nr_reclaims--;
>  			continue;
>  		}
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 33287ba4a500..7a8617ba1748 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -6809,6 +6809,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
>  		.may_unmap = 1,
>  		.may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
>  		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
> +		.no_demotion = !!(reclaim_options & MEMCG_RECLAIM_NO_DEMOTION),
>  	};
>  	/*
>  	 * Traverse the ZONELIST_FALLBACK zonelist of the current node to put

Did you run any performance benchmarks with this patch?

This patch looks good to me. Feel free to add

Reviewed-by: Donet Tom <donettom@linux.ibm.com>

> --
> 2.53.0.851.ga537e3e6e9-goog
>
>
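
For readers following the thread, the "fake progress" described in the
commit message can be illustrated with a small userspace model. This is
only a sketch and not kernel code: struct toy_memcg, toy_reclaim_pass()
and toy_reclaim_to_limit() are hypothetical names, and the model assumes
that reclaim only scans top-tier pages and that a demoted page stays
charged to the cgroup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct toy_memcg {
	unsigned long charged;  /* pages charged to the cgroup, all tiers */
	unsigned long top_tier; /* charged pages resident on the fast tier */
};

/*
 * One reclaim pass over up to nr_to_scan top-tier pages.
 * Returns the number of pages the pass reports as reclaimed.
 */
static unsigned long toy_reclaim_pass(struct toy_memcg *cg,
				      unsigned long nr_to_scan,
				      bool no_demotion)
{
	unsigned long nr;

	assert(cg->top_tier <= cg->charged);
	nr = nr_to_scan < cg->top_tier ? nr_to_scan : cg->top_tier;

	cg->top_tier -= nr;
	if (no_demotion)
		cg->charged -= nr; /* freed or swapped out: charge drops */
	/* Otherwise the pages are demoted and remain charged. */

	return nr;
}

/* Reclaim until the charge fits under limit or a pass reports no progress. */
static bool toy_reclaim_to_limit(struct toy_memcg *cg, unsigned long limit,
				 bool no_demotion)
{
	while (cg->charged > limit) {
		unsigned long want = cg->charged - limit;

		if (toy_reclaim_pass(cg, want, no_demotion) == 0)
			break;
	}
	return cg->charged <= limit;
}
```

In this model, starting from 100 charged pages with 60 on the top tier
and a limit of 80, the demotion case reports progress on every pass until
the top tier is empty while the charge stays at 100. The no-demotion case
reaches the limit in a single pass and leaves 40 pages on the top tier.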