From: Efly Young <yangyifei03@kuaishou.com>
Subject: Re: Re: [PATCH] mm:vmscan: fix inaccurate reclaim during proactive reclaim
Date: Wed, 12 Jul 2023 15:42:52 +0800
Message-ID: <20230712074252.25894-1-yangyifei03@kuaishou.com>
In-Reply-To: <20230711152810.GA2627@cmpxchg.org>
References: <20230711152810.GA2627@cmpxchg.org>

>> With commit f53af4285d77 ("mm: vmscan: fix extreme overreclaim
>> and swap floods"), proactive reclaim
still seems inaccurate.
>>
>> Our problematic case also involves mostly anon pages. Requesting 1G
>> by writing to memory.reclaim will reclaim 1.7G, or some other amount
>> well over 1G, by swapping.
>>
>> This patch tries to fix the inaccurate reclaim problem.

> I can see how this happens. Direct and kswapd reclaim have much
> smaller nr_to_reclaim targets, so it's less noticeable when we loop a
> few times. Proactive reclaim can come in with a rather large value.
>
> What does the reproducer setup look like? Are you calling reclaim on a
> higher-level cgroup with several children? Or is the looping coming
> from having multiple zones alone?

Thank you for your comment. The process in a leaf cgroup without
children just mallocs 20G of anonymous memory and sleeps, and then
reclaim is invoked on that leaf cgroup. Before commit f53af4285d77
("mm: vmscan: fix extreme overreclaim and swap floods"), the reclaimer
could reclaim many times the requested amount. Now it should eventually
reclaim in [request, 2 * request).

>> Signed-off-by: Efly Young
>> ---
>>  mm/vmscan.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 9c1c5e8b..2aea8d9 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -6208,7 +6208,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>>  	unsigned long nr_to_scan;
>>  	enum lru_list lru;
>>  	unsigned long nr_reclaimed = 0;
>> -	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
>> +	unsigned long nr_to_reclaim = (sc->nr_to_reclaim - sc->nr_reclaimed);
>
> This can underflow. shrink_list() eats SWAP_CLUSTER_MAX batches out of
> lru_pages >> priority, and only checks reclaimed > to_reclaim
> after. This will then disable the bailout mechanism entirely.
>
> In general, I'm not sure this is the best spot to fix the problem:
>
> - During reclaim/compaction, should_continue_reclaim() may decide that
>   more reclaim is required before compaction can proceed.
>   But the
>   second cycle might not do anything now, since you remember the work
>   done by the previous one.
>
> - shrink_node_memcgs() might do the full batch against the first
>   cgroup and not touch the second one anymore. This will result in
>   super lopsided behavior when you target a tree of multiple groups.
>
> There might be other spots that break, I haven't checked.
>
> You could go through them one by one, of course. But the truth is,
> larger reclaim targets are the rare exception. Trying to support them
> at the risk of breaking all other reclaim users seems ill-advised.

I agree with your view. Your explanation covers more cases than I had
considered. Thank you again for helping me out.

> A better approach might be to just say: "don't call reclaim with large
> numbers". Have the proactive reclaim code handle the batching into
> smaller chunks:
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e8ca4bdcb03c..4b016806dcc7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6696,7 +6696,7 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
>  		lru_add_drain_all();
>
>  		reclaimed = try_to_free_mem_cgroup_pages(memcg,
> -					nr_to_reclaim - nr_reclaimed,
> +					min(nr_to_reclaim - nr_reclaimed, SWAP_CLUSTER_MAX),
>  					GFP_KERNEL, reclaim_options);
>
>  		if (!reclaimed && !nr_retries--)

This approach may solve the inaccurate proactive reclaim problem
without breaking the original balance. But might it be less efficient
than before, since each call now asks for at most SWAP_CLUSTER_MAX
pages?