From: "Huang, Ying"
To: Liu Shixin
Cc: Andrew Morton, Yosry Ahmed, Sachin Sant, Michal Hocko, Johannes Weiner, Kefeng Wang
Subject: Re: [PATCH v7] mm: vmscan: try to reclaim swapcache pages if no swap space
In-Reply-To: <20231104140313.3418001-1-liushixin2@huawei.com> (Liu Shixin's message of "Sat, 4 Nov 2023 22:03:13 +0800")
References: <20231104140313.3418001-1-liushixin2@huawei.com>
Date: Mon, 06 Nov 2023 10:18:04 +0800
Message-ID: <87h6lzy68z.fsf@yhuang6-desk2.ccr.corp.intel.com>

Liu Shixin writes:

> When spaces of swap devices are exhausted, only file pages can be
> reclaimed. But there are still some swapcache pages in anon lru list.
> This can lead to a premature out-of-memory.
>
> The problem is found with such step:
>
>  Firstly, set a 9MB disk swap space, then create a cgroup with 10MB
>  memory limit, then runs an program to allocates about 15MB memory.
>
> The problem occurs occasionally, which may need about 100 times [1].
>
> Fix it by checking number of swapcache pages in can_reclaim_anon_pages().
> If the number is not zero, return true and set swapcache_only to 1.
> When scan anon lru list in swapcache_only mode, non-swapcache pages will
> be skipped to isolate in order to accelerate reclaim efficiency.
>
> However, in swapcache_only mode, the scan count still increased when scan
> non-swapcache pages because there are large number of non-swapcache pages
> and rare swapcache pages in swapcache_only mode, and if the non-swapcache
> is skipped and do not count, the scan of pages in isolate_lru_folios() can
> eventually lead to hung task, just as Sachin reported [2].
>
> By the way, since there are enough times of memory reclaim before OOM, it
> is not need to isolate too much swapcache pages in one times.
>
> [1]. https://lore.kernel.org/lkml/CAJD7tkZAfgncV+KbKr36=eDzMnT=9dZOT0dpMWcurHLr6Do+GA@mail.gmail.com/
> [2]. https://lore.kernel.org/linux-mm/CAJD7tkafz_2XAuqE8tGLPEcpLngewhUo=5US14PAtSM9tLBUQg@mail.gmail.com/
>
> Signed-off-by: Liu Shixin
> Tested-by: Yosry Ahmed
> Reviewed-by: "Huang, Ying"
> Reviewed-by: Yosry Ahmed
> ---
> v6->v7: Reset swapcache_only to zero after there are swap spaces.
> v5->v6: Fix NULL pointing derefence and hung task problem reported by Sachin.
>
>  include/linux/swap.h |  6 ++++++
>  mm/memcontrol.c      |  8 ++++++++
>  mm/vmscan.c          | 36 ++++++++++++++++++++++++++++++++++--
>  3 files changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index f6dd6575b905..3ba146ae7cf5 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -659,6 +659,7 @@ static inline void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_p
>  }
>
>  extern long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg);
> +extern long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg);
>  extern bool mem_cgroup_swap_full(struct folio *folio);
>  #else
>  static inline void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry)
> @@ -681,6 +682,11 @@ static inline long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
>  	return get_nr_swap_pages();
>  }
>
> +static inline long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg)
> +{
> +	return total_swapcache_pages();
> +}
> +
>  static inline bool mem_cgroup_swap_full(struct folio *folio)
>  {
>  	return vm_swap_full();
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5b009b233ab8..29e34c06ca83 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7584,6 +7584,14 @@ long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
>  	return nr_swap_pages;
>  }
>
> +long mem_cgroup_get_nr_swapcache_pages(struct mem_cgroup *memcg)
> +{
> +	if (mem_cgroup_disabled())
> +		return total_swapcache_pages();
> +
> +	return memcg_page_state(memcg, NR_SWAPCACHE);
> +}
> +
>  bool mem_cgroup_swap_full(struct folio *folio)
>  {
>  	struct mem_cgroup *memcg;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6f13394b112e..a5e04291662f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -137,6 +137,9 @@ struct scan_control {
>  	/* Always discard instead of demoting to lower tier memory */
>  	unsigned int no_demotion:1;
>
> +	/* Swap space is exhausted, only reclaim swapcache for anon LRU */
> +	unsigned int swapcache_only:1;
> +
>  	/* Allocation order */
>  	s8 order;
>
> @@ -602,6 +605,12 @@ static bool can_demote(int nid, struct scan_control *sc)
>  	return true;
>  }
>
> +static void set_swapcache_mode(struct scan_control *sc, bool swapcache_only)
> +{
> +	if (sc)
> +		sc->swapcache_only = swapcache_only;
> +}
> +

I think that it's unnecessary to introduce a new function. I understand
that you want to reduce the code duplication. We can add

	sc->swapcache_only = false;

at the beginning of can_reclaim_anon_pages() to reduce code duplication.
That can cover even more cases IIUC.

>  static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
>  					  int nid,
>  					  struct scan_control *sc)
> @@ -611,12 +620,26 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
>  		 * For non-memcg reclaim, is there
>  		 * space in any swap device?
>  		 */
> -		if (get_nr_swap_pages() > 0)
> +		if (get_nr_swap_pages() > 0) {
> +			set_swapcache_mode(sc, false);
>  			return true;
> +		}
> +		/* Is there any swapcache pages to reclaim? */
> +		if (total_swapcache_pages() > 0) {
> +			set_swapcache_mode(sc, true);
> +			return true;
> +		}
>  	} else {
>  		/* Is the memcg below its swap limit? */
> -		if (mem_cgroup_get_nr_swap_pages(memcg) > 0)
> +		if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
> +			set_swapcache_mode(sc, false);
>  			return true;
> +		}
> +		/* Is there any swapcache pages in memcg to reclaim? */
> +		if (mem_cgroup_get_nr_swapcache_pages(memcg) > 0) {
> +			set_swapcache_mode(sc, true);
> +			return true;
> +		}
>  	}

If can_demote() returns true, we shouldn't scan swapcache only.
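
The two comments above could be combined roughly as in the userspace model
below. It is only a sketch under stated assumptions, not kernel code:
struct reclaim_inputs and can_reclaim_anon_pages_model() are made-up names,
the three input fields stand in for the results of the swap, swapcache and
can_demote() checks in the quoted patch, and testing demotion before
swapcache is one possible ordering, not something the patch does. The NULL
check on sc is kept because set_swapcache_mode() in the patch has one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one scan_control field discussed here. */
struct scan_control {
	unsigned int swapcache_only:1;
};

/* Hypothetical inputs replacing the kernel helpers the real function calls. */
struct reclaim_inputs {
	long nr_swap_pages;      /* models get_nr_swap_pages() or the memcg variant */
	long nr_swapcache_pages; /* models total_swapcache_pages() or the memcg variant */
	bool can_demote;         /* models the result of can_demote(nid, sc) */
};

static bool can_reclaim_anon_pages_model(const struct reclaim_inputs *in,
					 struct scan_control *sc)
{
	/* Reset once up front so every return path starts from "not swapcache only". */
	if (sc)
		sc->swapcache_only = false;

	/* Free swap space: any anon page can be reclaimed. */
	if (in->nr_swap_pages > 0)
		return true;

	/* Demotion can take any anon page, so do not restrict the scan. */
	if (in->can_demote)
		return true;

	/* No swap and no demotion: only swapcache pages are worth isolating. */
	if (in->nr_swapcache_pages > 0) {
		if (sc)
			sc->swapcache_only = true;
		return true;
	}

	return false;
}
```

In this model, a non-zero swapcache count with no free swap and no demotion
target is the only case that sets swapcache_only; every other path leaves it
cleared by the reset at the top.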
--
Best Regards,
Huang, Ying

>  	/*
> @@ -2342,6 +2365,15 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>  		 */
>  		scan += nr_pages;
>
> +		/*
> +		 * Count non-swapcache too because the swapcache pages may
> +		 * be rare and it takes too much times here if not count
> +		 * the non-swapcache pages.
> +		 */
> +		if (unlikely(sc->swapcache_only && !is_file_lru(lru) &&
> +			     !folio_test_swapcache(folio)))
> +			goto move;
> +
>  		if (!folio_test_lru(folio))
>  			goto move;
>  		if (!sc->may_unmap && folio_mapped(folio))