From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 4 Jun 2024 18:12:22 +0900
From: Byungchul Park <byungchul@sk.com>
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, kernel_team@skhynix.com, hannes@cmpxchg.org,
	iamjoonsoo.kim@lge.com, rientjes@google.com
Subject: Re: [PATCH v2] mm: let kswapd work again for node that used to be
 hopeless but may not now
Message-ID: <20240604091221.GA28034@system.software.com>
References: <20240604072323.10886-1-byungchul@sk.com>
 <87bk4hcf7h.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <20240604084533.GA68919@system.software.com>
 <8734ptccgi.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <8734ptccgi.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.9.4 (2018-02-28)
On Tue, Jun 04, 2024 at 04:57:17PM +0800, Huang, Ying wrote:
> Byungchul Park <byungchul@sk.com> writes:
>
> > On Tue, Jun 04, 2024 at 03:57:54PM +0800, Huang, Ying wrote:
> >> Byungchul Park <byungchul@sk.com> writes:
> >>
> >> > Changes from v1:
> >> >	1. Don't allow kswapd to resume if the system is under memory
> >> >	   pressure that might affect direct reclaim by any chance, e.g.
> >> >	   if NR_FREE_PAGES is less than (low wmark + min wmark)/2, as
> >> >	   sketched below.
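[For illustration only: a minimal sketch of the gate described in item 1
above, in kernel style.  The helper name kswapd_may_resume() is
hypothetical, and the per-zone walk is an assumption about where such a
check could live; the actual patch may implement it differently.]

	/*
	 * Hypothetical helper (name and placement are illustrative):
	 * consider resuming kswapd only if some managed zone on the
	 * node has NR_FREE_PAGES at or above the midpoint between its
	 * min and low watermarks, so that a node under real pressure
	 * stays on the direct-reclaim path.
	 */
	static bool kswapd_may_resume(pg_data_t *pgdat)
	{
		int i;

		for (i = 0; i < MAX_NR_ZONES; i++) {
			struct zone *zone = pgdat->node_zones + i;
			unsigned long mid;

			if (!managed_zone(zone))
				continue;

			/* (low wmark + min wmark)/2, as in the change log */
			mid = (low_wmark_pages(zone) +
			       min_wmark_pages(zone)) / 2;
			if (zone_page_state(zone, NR_FREE_PAGES) >= mid)
				return true;
		}
		return false;
	}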
> >> >
> >> > --->8---
> >> > From 6c73fc16b75907f5da9e6b33aff86bf7d7c9dd64 Mon Sep 17 00:00:00 2001
> >> > From: Byungchul Park <byungchul@sk.com>
> >> > Date: Tue, 4 Jun 2024 15:27:56 +0900
> >> > Subject: [PATCH v2] mm: let kswapd work again for node that used to be
> >> >  hopeless but may not now
> >> >
> >> > A system should run with kswapd working in the background when under
> >> > memory pressure, such as when the available memory level is below the
> >> > low watermark and there are reclaimable folios.
> >> >
> >> > However, the current code leaves the system running without kswapd
> >> > once kswapd has been stopped due to more than MAX_RECLAIM_RETRIES
> >> > failures, relying on direct reclaim alone, even if there are
> >> > reclaimable folios that kswapd could reclaim.  This case was observed
> >> > in the following scenario:
> >> >
> >> >    CONFIG_NUMA_BALANCING enabled
> >> >    sysctl_numa_balancing_mode set to NUMA_BALANCING_MEMORY_TIERING
> >> >    numa node0 (500GB local DRAM, 128 CPUs)
> >> >    numa node1 (100GB CXL memory, no CPUs)
> >> >    swap off
> >> >
> >> >    1) Run a workload with big anon pages, e.g. mmap(200GB).
> >> >    2) Continue adding the same workload to the system.
> >> >    3) The anon pages are placed in node0 by promotion/demotion.
> >> >    4) kswapd0 stops because of the unreclaimable anon pages in node0.
> >> >    5) Kill the memory hoggers to restore the system.
> >> >
> >> > After restoring the system at 5), the system starts to run without
> >> > kswapd.  Even worse, the tiering mechanism is no longer able to work,
> >> > since it relies on kswapd for demotion.
> >>
> >> We have run into the situation where kswapd is kept in the failure
> >> state for a long time on a multi-tier system.  I think that your
> >> solution is too
> >
> > My solution just gives kswapd a chance to work again, even if
> > kswapd_failures >= MAX_RECLAIM_RETRIES, when there are potentially
> > reclaimable folios.  That's it.
> >
> >> limited, because OOM killing may not happen, while the access pattern of
> >
> > I don't get this.  OOM will happen as is, through direct reclaim.
>
> A system that fails to reclaim via kswapd may succeed in reclaiming via
> direct reclaim, because more CPUs are used to scan the page tables.
>
> In a system with NUMA-balancing-based page promotion and page demotion
> enabled, page promotion will wake up kswapd, but kswapd may fail in
> some situations.  However, page promotion will not trigger direct
> reclaim or OOM.
>
> >> the workloads may change.  We have a preliminary and simple solution
> >> for this as follows,
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?h=tiering-0.8&id=17a24a354e12d4d4675d78481b358f668d5a6866
> >
> > Whether tiering is involved or not, the same problem can arise if
> > kswapd gets stopped due to kswapd_failures >= MAX_RECLAIM_RETRIES.
>
> Your description is about tiering too.  Can you describe a situation

I mentioned "tiering" when describing how to reproduce the issue because
I ran into it while testing a tiering system, but I don't think tiering
is a necessary condition.

Let me ask in return: why was the logic to stop kswapd introduced in the
first place?  Because the problem had already been observed, whether or
not tiering is involved.  The same problem will arise once kswapd stops.

	Byungchul

> without tiering?
>
> --
> Best Regards,
> Huang, Ying
>
> > 	Byungchul
> >
> >> where we check every 10 seconds whether kswapd is in the failure
> >> state and, if so, try to wake it up.  This is another possible
> >> solution.
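[For illustration only: a minimal sketch of the periodic re-check idea
quoted above.  The helper name kswapd_failure_recheck() and the idea
that it is driven roughly every 10 seconds by a timer or delayed work
are assumptions based on the description, not the linked commit's
actual code.]

	/*
	 * Hypothetical helper, assumed to be called about every 10
	 * seconds: if kswapd has been parked in the failure state,
	 * forget the past failures and wake it so that the node's
	 * reclaimability is re-evaluated.
	 */
	static void kswapd_failure_recheck(pg_data_t *pgdat)
	{
		if (pgdat->kswapd_failures < MAX_RECLAIM_RETRIES)
			return;

		/* Give kswapd another chance to make progress. */
		pgdat->kswapd_failures = 0;
		if (waitqueue_active(&pgdat->kswapd_wait))
			wake_up_interruptible(&pgdat->kswapd_wait);
	}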
> >>
> >> > However, node0 has pages newly allocated after 5), which may or may
> >> > not be reclaimable.  Since those are potentially reclaimable, it's
> >> > worth trying reclaim again by allowing kswapd to resume.
> >> >
> >>
> >> [snip]
> >>
> >> --
> >> Best Regards,
> >> Huang, Ying
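[For illustration only: one way "potentially reclaimable" could be
expressed in code.  The helper name node_has_reclaimable_folios() is
hypothetical, and counting only the node's anon/file LRU pages is an
assumed simplification, not what the patch necessarily does.]

	/*
	 * Hypothetical check: are there any folios left on the node's
	 * anon/file LRU lists that kswapd could at least try to reclaim?
	 */
	static bool node_has_reclaimable_folios(pg_data_t *pgdat)
	{
		return node_page_state(pgdat, NR_ACTIVE_ANON) +
		       node_page_state(pgdat, NR_INACTIVE_ANON) +
		       node_page_state(pgdat, NR_ACTIVE_FILE) +
		       node_page_state(pgdat, NR_INACTIVE_FILE) > 0;
	}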