Date: Wed, 22 Nov 2023 14:19:37 +0100
From: Michal Hocko
To: Yosry Ahmed
Cc: Liu Shixin, Yu Zhao, Andrew Morton, Huang Ying, Sachin Sant,
 Johannes Weiner, Kefeng Wang, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10] mm: vmscan: try to reclaim swapcache pages if no swap space
References: <20231121090624.1814733-1-liushixin2@huawei.com>
 <32fe518a-e962-14ae-badc-719390386db9@huawei.com>

On Wed 22-11-23 02:39:15, Yosry Ahmed wrote:
> On Wed, Nov 22, 2023 at 2:09 AM Michal Hocko wrote:
> >
> > On Wed 22-11-23 09:52:42, Michal Hocko wrote:
> > > On Tue 21-11-23 22:44:32, Yosry Ahmed wrote:
> > > > On Tue, Nov 21, 2023 at 10:41 PM Liu Shixin wrote:
> > > > >
> > > > >
> > > > > On 2023/11/21 21:00, Michal Hocko wrote:
> > > > > > On Tue 21-11-23 17:06:24, Liu Shixin wrote:
> > > > > >
> > > > > > However, in swapcache_only mode, the scan count still increased when scan
> > > > > > non-swapcache pages because there are large number of non-swapcache pages
> > > > > > and rare swapcache pages in swapcache_only mode, and if the non-swapcache
> > > > > > is skipped and do not count, the scan of pages in isolate_lru_folios() can
> > > > > > eventually lead to hung task, just as Sachin reported [2].
> > > > > > I find this paragraph really confusing! I guess what you meant to say is
> > > > > > that a real swapcache_only is problematic because it can end up not
> > > > > > making any progress, correct?
> > > > > This paragraph is going to explain why checking swapcache_only after scan += nr_pages;
> > > > > >
> > > > > > AFAIU you have addressed that problem by making swapcache_only anon LRU
> > > > > > specific, right? That would be certainly more robust as you can still
> > > > > > reclaim from file LRUs. I cannot say I like that because swapcache_only
> > > > > > is a bit confusing and I do not think we want to grow more special
> > > > > > purpose reclaim types.
> > > > > > Would it be possible/reasonable to instead put
> > > > > > swapcache pages on the file LRU instead?
> > > > > It looks like a good idea, but I'm not sure if it's possible. I can try it, is there anything to
> > > > > pay attention to?
> > > >
> > > > I think this might be more intrusive than we think. Every time a page
> > > > is added to or removed from the swap cache, we will need to move it
> > > > between LRUs. All pages on the anon LRU will need to go through the
> > > > file LRU before being reclaimed. I think this might be too big of a
> > > > change to achieve this patch's goal.
> > > >
> > > TBH I am not really sure how complex that might turn out to be.
> > > Swapcache tends to be full of subtle issues. So you might be right but
> > > it would be better to know _why_ this is not possible before we end up
> > > fishing for a couple of swapcache pages on a potentially huge anon LRU to
> > > isolate them. Think of TB sized machines in this context.
> >
> > Forgot to mention that it is not really far fetched from comparing this
> > to MADV_FREE pages. Those are anonymous but we do not want to keep them
> > on anon LRU because we want to age them independent of the swap
> > availability as they are just dropped during reclaim. Not too much
> > different from swapcache pages. There are more constraints on those but
> > fundamentally this is the same problem, no?
>
> I agree it's not a first, but swap cache pages are more complicated
> because they can go back and forth, unlike MADV_FREE pages which
> usually go on a one way ticket AFAICT.

Yes, swapcache pages are indeed more complicated, but most of the time
they just go away as well, no? MADV_FREE pages can be reinstated as
well if they are written to. So fundamentally they are not that
different.

> Also pages going into the swap
> cache can be much more common than MADV_FREE pages for a lot of
> workloads.
> I am not sure how different reclaim heuristics will react
> to such mobility between the LRUs, and the fact that all pages will
> now only get evicted through the file LRU. The anon LRU will
> essentially become an LRU that feeds the file LRU. Also, the more
> pages we move between LRUs, the more ordering violations we introduce,
> as we may put colder pages in front of hotter pages or vice versa.

Well, traditionally the file LRU has been maintaining page cache or
easily disposable pages like MADV_FREE (which can be considered a cache
as well). Swapcache is a form of a page cache as well.

> All in all, I am not saying it's a bad idea or not possible, I am just
> saying it's probably more complicated than MADV_FREE, and adding more
> cases where pages move between LRUs could introduce problems (or make
> existing problems more visible).

Do we want to start adding a filtered anon scan for a certain type of
pages? Because this is the question here AFAICS. This might seem an
easier solution but I would argue that it is a less predictable one. It
is not unusual that a huge anon LRU would contain only very few
swapcache pages.

That being said, I might be missing some obvious or less obvious
reasons why this is a completely bad idea. Swapcache is indeed subtle.

-- 
Michal Hocko
SUSE Labs
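
To make the "filtered anon scan" concern above a bit more concrete, here
is a toy model in Python. It is not kernel code and does not mirror
isolate_lru_folios(). It assumes a per-call scan budget of 32 entries,
that every examined entry counts against that budget (matching the
"scan += nr_pages" behaviour described in the thread), and that skipped
entries are rotated to the far end of the list. The names
filtered_scan(), drain() and BATCH are made up for this illustration.
It compares how many entries have to be examined to find 10 swapcache
entries on a 100,000-entry mixed list versus on a list that holds only
reclaim candidates.

```python
from collections import deque

BATCH = 32  # assumed per-call scan budget (toy value for this sketch)


def filtered_scan(lru, nr_to_scan, wanted):
    """Examine up to nr_to_scan entries from the cold (right) end.

    Entries matching `wanted` are isolated. Every examined entry counts
    against the budget, and skipped entries are rotated to the hot (left)
    end so they are not rescanned immediately (an assumption of this model).
    """
    scanned = 0
    isolated = []
    while lru and scanned < nr_to_scan:
        folio = lru.pop()
        scanned += 1
        if wanted(folio):
            isolated.append(folio)
        else:
            lru.appendleft(folio)
    return scanned, isolated


def drain(lru, total_wanted, wanted):
    """Call filtered_scan() in batches until total_wanted entries are isolated."""
    total_scanned = 0
    batches = 0
    found = 0
    while found < total_wanted:
        scanned, isolated = filtered_scan(lru, BATCH, wanted)
        total_scanned += scanned
        batches += 1
        found += len(isolated)
    return total_scanned, batches


def is_swapcache(folio):
    return folio == "swapcache"


nr_anon, nr_swapcache = 100_000, 10

# Worst case for the filtered walk: the few swapcache entries sit at the
# hot end of a large anon list.
mixed = deque(["swapcache"] * nr_swapcache + ["anon"] * (nr_anon - nr_swapcache))
# Alternative: a list on which every entry is a reclaim candidate.
dedicated = deque(["swapcache"] * nr_swapcache)

mixed_scanned, mixed_batches = drain(mixed, nr_swapcache, is_swapcache)
dedicated_scanned, dedicated_batches = drain(dedicated, nr_swapcache, is_swapcache)

print(f"mixed list:     {mixed_scanned} entries scanned in {mixed_batches} batches")
print(f"dedicated list: {dedicated_scanned} entries scanned in {dedicated_batches} batches")
```

In this model the mixed list needs 100000 entries scanned over 3125
batches before the last swapcache entry is isolated, while the
dedicated list is drained in a single batch of 10. The gap grows
linearly with the size of the anon list, which is the TB sized machine
worry in miniature.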