Subject: Re: [PATCH v2] mm: vmscan: reclaim anon pages if there are swapcache pages
From: Liu Shixin
To: Yosry Ahmed
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Date: Wed, 23 Aug 2023 10:00:58 +0800
Message-ID: <50c49baf-d04a-f1e3-0d0e-7bb8e22c3889@huawei.com>
References: <20230822024901.2412520-1-liushixin2@huawei.com>

On 2023/8/23 0:35, Yosry Ahmed wrote:
> On Mon, Aug 21, 2023 at 6:54 PM Liu Shixin wrote:
>> When the swap devices run out of space, only file pages can be reclaimed.
>> But there are still some swapcache pages in the anon LRU list. This can
>> lead to a premature out-of-memory.
>>
>> This problem can be fixed by checking the number of swapcache pages in
>> can_reclaim_anon_pages(). For memcg v2, there is a swapcache stat that can
>> be used directly. For memcg v1, use total_swapcache_pages() instead, which
>> may not be accurate but can solve the problem.
> Interesting find. I wonder if we really don't have any handling of
> this situation.
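To make the effect of the changed condition concrete, here is a minimal user-space model of the check in can_reclaim_anon_pages() before and after the patch. The helpers can_reclaim_anon_before() and can_reclaim_anon_after() and their plain `long` parameters are hypothetical stand-ins for get_nr_swap_pages()/mem_cgroup_get_nr_swap_pages() and the swapcache counters; this is a sketch of the arithmetic only, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the check in can_reclaim_anon_pages().
 * nr_swap_free: free swap slots visible to the reclaimer (what
 *               get_nr_swap_pages() or mem_cgroup_get_nr_swap_pages()
 *               would report).
 * nr_swapcache: anon pages that are already in the swap cache.
 */

/* Before the patch: anon reclaim is allowed only if a free swap slot exists. */
static bool can_reclaim_anon_before(long nr_swap_free, long nr_swapcache)
{
	(void)nr_swapcache; /* the swap cache is not considered */
	return nr_swap_free > 0;
}

/*
 * After the patch: swap-cache pages already own a swap slot, so they can
 * still be reclaimed when no free slots are left.
 */
static bool can_reclaim_anon_after(long nr_swap_free, long nr_swapcache)
{
	assert(nr_swapcache >= 0); /* a page count cannot be negative */
	return nr_swap_free + nr_swapcache > 0;
}
```

For example, with swap exhausted (nr_swap_free == 0) and 128 pages still in the swap cache, can_reclaim_anon_before(0, 128) returns false while can_reclaim_anon_after(0, 128) returns true, which is exactly the case the patch targets.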
I have already tested this and can confirm that it is a real problem. With
9MB of swap space and a 10MB mem_cgroup limit, allocating 15MB of memory
sometimes triggers an OOM.
>
>> Signed-off-by: Liu Shixin
>> ---
>>  include/linux/swap.h |  6 ++++++
>>  mm/memcontrol.c      |  8 ++++++++
>>  mm/vmscan.c          | 12 ++++++++----
>>  3 files changed, 22 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 456546443f1f..0318e918bfa4 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -669,6 +669,7 @@ static inline void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_p
>>  }
>>
>>  extern long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg);
>> +extern long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg);
>>  extern bool mem_cgroup_swap_full(struct folio *folio);
>>  #else
>>  static inline void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry)
>> @@ -691,6 +692,11 @@ static inline long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
>>  	return get_nr_swap_pages();
>>  }
>>
>> +static inline long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg)
>> +{
>> +	return total_swapcache_pages();
>> +}
>> +
>>  static inline bool mem_cgroup_swap_full(struct folio *folio)
>>  {
>>  	return vm_swap_full();
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index e8ca4bdcb03c..3e578f41023e 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -7567,6 +7567,14 @@ long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
>>  	return nr_swap_pages;
>>  }
>>
>> +long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg)
>> +{
>> +	if (mem_cgroup_disabled() || do_memsw_account())
>> +		return total_swapcache_pages();
>> +
>> +	return memcg_page_state(memcg, NR_SWAPCACHE);
>> +}
> Is there a reason why we cannot use NR_SWAPCACHE for cgroup v1? Isn't
> that being maintained regardless of cgroup version?
> It is not exposed in cgroup v1's memory.stat, but I don't think there is a
> reason we can't do that -- if only to document that it is being used with
> cgroup v1.
Thanks for the advice; using NR_SWAPCACHE is more appropriate.
>
>
>> +
>>  bool mem_cgroup_swap_full(struct folio *folio)
>>  {
>>  	struct mem_cgroup *memcg;
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 7c33c5b653ef..bcb6279cbae7 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -609,13 +609,17 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
>>  	if (memcg == NULL) {
>>  		/*
>>  		 * For non-memcg reclaim, is there
>> -		 * space in any swap device?
>> +		 * space in any swap device or swapcache pages?
>>  		 */
>> -		if (get_nr_swap_pages() > 0)
>> +		if (get_nr_swap_pages() + total_swapcache_pages() > 0)
>>  			return true;
>>  	} else {
>> -		/* Is the memcg below its swap limit? */
>> -		if (mem_cgroup_get_nr_swap_pages(memcg) > 0)
>> +		/*
>> +		 * Is the memcg below its swap limit or is there swapcache
>> +		 * pages can be freed?
>> +		 */
>> +		if (mem_cgroup_get_nr_swap_pages(memcg) +
>> +		    mem_cgroup_get_nr_swapcache_pages(memcg) > 0)
>>  			return true;
>>  	}
> I wonder if it would be more efficient to set a bit in struct
> scan_control if we only are out of swap spaces but have swap cache
> pages, and only isolate anon pages that are in the swap cache, instead
> of isolating random anon pages. We may end up isolating pages that are
> not in the swap cache for a few iterations and wasting cycles.
Good idea. Thanks.
>
>> --
>> 2.25.1
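Yosry's suggestion can be sketched as a user-space model: when no free swap slots remain but the swap cache is non-empty, a flag in the scan control restricts isolation to anon pages that are already in the swap cache. Everything in this sketch is hypothetical: struct model_scan_control, its swapcache_only bit, struct model_page, and the helper functions are illustrative stand-ins, not the kernel's struct scan_control or its LRU isolation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a page on an LRU list. */
struct model_page {
	bool anon;         /* anonymous page */
	bool in_swapcache; /* already owns a swap slot */
};

/* Hypothetical stand-in for struct scan_control with one extra bit. */
struct model_scan_control {
	/* Set when swap is full but swap-cache pages can still be freed. */
	unsigned int swapcache_only : 1;
};

/* Decide once per reclaim pass whether to restrict anon isolation. */
static void model_prepare_scan(struct model_scan_control *sc,
			       long nr_swap_free, long nr_swapcache)
{
	assert(nr_swapcache >= 0);
	sc->swapcache_only = (nr_swap_free <= 0 && nr_swapcache > 0);
}

/* Returns true if the page is worth isolating for reclaim. */
static bool model_should_isolate(const struct model_scan_control *sc,
				 const struct model_page *page)
{
	/* An anon page outside the swap cache would need a new swap slot. */
	if (sc->swapcache_only && page->anon && !page->in_swapcache)
		return false;
	return true;
}

/* Count how many pages of an LRU-like array would be isolated. */
static size_t model_count_isolated(const struct model_scan_control *sc,
				   const struct model_page *pages, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (model_should_isolate(sc, &pages[i]))
			count++;
	return count;
}
```

For four anon pages of which two are in the swap cache, model_prepare_scan(&sc, 0, 2) sets the bit and model_count_isolated() returns 2; with free swap available, all four pages remain eligible. The point of the flag is to avoid repeatedly isolating pages that cannot be reclaimed until a swap slot frees up.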