From: "Huang, Ying"
To: Yosry Ahmed
Cc: Liu Shixin, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, wangkefeng.wang@huawei.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2] mm: vmscan: reclaim anon pages if there are swapcache pages
References: <20230822024901.2412520-1-liushixin2@huawei.com> <87wmxk6d1m.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 25 Aug 2023 08:47:15 +0800
In-Reply-To: (Yosry Ahmed's message of "Thu, 24 Aug 2023 11:31:47 -0700")
Message-ID: <87sf8854oc.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Yosry Ahmed writes:

> On Thu, Aug 24, 2023 at 1:51 AM Huang, Ying wrote:
>>
>> Yosry Ahmed writes:
>>
>> > On Mon, Aug 21, 2023 at 6:54 PM Liu Shixin wrote:
>> >>
>> >> When the space of the swap devices is exhausted, only file pages can be
>> >> reclaimed. But there are still some swapcache pages on the anon LRU
>> >> list. This can lead to a premature out-of-memory.
>> >>
>> >> This problem can be fixed by checking the number of swapcache pages in
>> >> can_reclaim_anon_pages(). For memcg v2, there is a swapcache stat that
>> >> can be used directly.
>> >> For memcg v1, use total_swapcache_pages() instead, which may not be
>> >> accurate but can solve the problem.
>> >
>> > Interesting find. I wonder if we really don't have any handling of
>> > this situation.
>> >
>> >>
>> >> Signed-off-by: Liu Shixin
>> >> ---
>> >>  include/linux/swap.h |  6 ++++++
>> >>  mm/memcontrol.c      |  8 ++++++++
>> >>  mm/vmscan.c          | 12 ++++++++----
>> >>  3 files changed, 22 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> >> index 456546443f1f..0318e918bfa4 100644
>> >> --- a/include/linux/swap.h
>> >> +++ b/include/linux/swap.h
>> >> @@ -669,6 +669,7 @@ static inline void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_p
>> >>  }
>> >>
>> >>  extern long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg);
>> >> +extern long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg);
>> >>  extern bool mem_cgroup_swap_full(struct folio *folio);
>> >>  #else
>> >>  static inline void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry)
>> >> @@ -691,6 +692,11 @@ static inline long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
>> >>  	return get_nr_swap_pages();
>> >>  }
>> >>
>> >> +static inline long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg)
>> >> +{
>> >> +	return total_swapcache_pages();
>> >> +}
>> >> +
>> >>  static inline bool mem_cgroup_swap_full(struct folio *folio)
>> >>  {
>> >>  	return vm_swap_full();
>> >>
>> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> >> index e8ca4bdcb03c..3e578f41023e 100644
>> >> --- a/mm/memcontrol.c
>> >> +++ b/mm/memcontrol.c
>> >> @@ -7567,6 +7567,14 @@ long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
>> >>  	return nr_swap_pages;
>> >>  }
>> >>
>> >> +long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg)
>> >> +{
>> >> +	if (mem_cgroup_disabled() || do_memsw_account())
>> >> +		return total_swapcache_pages();
>> >> +
>> >> +	return memcg_page_state(memcg, NR_SWAPCACHE);
>> >> +}
>> >
>> > Is there a reason why we cannot use NR_SWAPCACHE for cgroup v1? Isn't
>> > that being maintained regardless of the cgroup version? It is not
>> > exposed in cgroup v1's memory.stat, but I don't think there is a reason
>> > we can't do that -- if only to document that it is being used with
>> > cgroup v1.
>> >
>> >> +
>> >>  bool mem_cgroup_swap_full(struct folio *folio)
>> >>  {
>> >>  	struct mem_cgroup *memcg;
>> >>
>> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> >> index 7c33c5b653ef..bcb6279cbae7 100644
>> >> --- a/mm/vmscan.c
>> >> +++ b/mm/vmscan.c
>> >> @@ -609,13 +609,17 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
>> >>  	if (memcg == NULL) {
>> >>  		/*
>> >>  		 * For non-memcg reclaim, is there
>> >> -		 * space in any swap device?
>> >> +		 * space in any swap device or swapcache pages?
>> >>  		 */
>> >> -		if (get_nr_swap_pages() > 0)
>> >> +		if (get_nr_swap_pages() + total_swapcache_pages() > 0)
>> >>  			return true;
>> >>  	} else {
>> >> -		/* Is the memcg below its swap limit? */
>> >> -		if (mem_cgroup_get_nr_swap_pages(memcg) > 0)
>> >> +		/*
>> >> +		 * Is the memcg below its swap limit, or are there
>> >> +		 * swapcache pages that can be freed?
>> >> +		 */
>> >> +		if (mem_cgroup_get_nr_swap_pages(memcg) +
>> >> +		    mem_cgroup_get_nr_swapcache_pages(memcg) > 0)
>> >>  			return true;
>> >>  	}
>> >
>> > I wonder if it would be more efficient to set a bit in struct
>> > scan_control if we are only out of swap space but still have swap
>> > cache pages, and only isolate anon pages that are in the swap cache,
>> > instead of isolating random anon pages. We may end up isolating pages
>> > that are not in the swap cache for a few iterations and wasting
>> > cycles.
>>
>> Scanning the swap cache directly would make the code more complex.
>> IIUC, the possibility of the swap device being used up isn't high. If
>> so, I prefer the simpler implementation in this series.
>
> I did not mean that, sorry if I wasn't clear.
> I meant to set a bit in struct scan_control, and then in
> isolate_lru_folios() for the anon LRUs, we can skip isolating folios
> that are not in the swapcache if that bit is set.
>
> My main concern was that if we only have a few pages in the swapcache,
> we may end up wasting cycles scanning through a lot of other anonymous
> pages until we reach them. If that's too much complexity, that's
> understandable.

Sorry, I misunderstood your idea. This sounds reasonable to me. We can
check the swap space and swap cache in isolate_lru_folios(), either
directly or via a bit in scan_control.

--
Best Regards,
Huang, Ying