From: Yosry Ahmed
Date: Wed, 22 Nov 2023 12:13:54 -0800
Subject: Re: [PATCH v10] mm: vmscan: try to reclaim swapcache pages if no swap space
To: Michal Hocko
Cc: Liu Shixin, Yu Zhao, Andrew Morton, Huang Ying, Sachin Sant, Johannes Weiner, Kefeng Wang, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20231121090624.1814733-1-liushixin2@huawei.com> <32fe518a-e962-14ae-badc-719390386db9@huawei.com>

On Wed, Nov 22, 2023 at 5:19 AM Michal Hocko wrote:
>
> On Wed 22-11-23 02:39:15, Yosry Ahmed wrote:
> > On Wed, Nov 22, 2023 at 2:09 AM Michal Hocko wrote:
> > >
> > > On Wed 22-11-23 09:52:42, Michal Hocko wrote:
> > > > On Tue 21-11-23 22:44:32, Yosry Ahmed wrote:
> > > > > On Tue, Nov 21, 2023 at 10:41 PM Liu Shixin wrote:
> > > > > >
> > > > > >
> > > > > > On 2023/11/21 21:00, Michal Hocko wrote:
> > > > > > > On Tue 21-11-23 17:06:24, Liu Shixin wrote:
> > > > > > >
> > > > > > > However, in swapcache_only mode, the scan count still increased when scan
> > > > > > > non-swapcache pages because there are large number of non-swapcache pages
> > > > > > > and rare swapcache pages in swapcache_only mode, and if the non-swapcache
> > > > > > > is skipped and do not count, the scan of pages in isolate_lru_folios() can
> > > > > > > eventually lead to hung task, just as Sachin reported [2].
> > > > > > > I find this paragraph really confusing! I guess what you meant to say is
> > > > > > > that a real swapcache_only is problematic because it can end up not
> > > > > > > making any progress, correct?
> > > > > > This paragraph is going to explain why checking swapcache_only after scan += nr_pages;
> > > > > > >
> > > > > > > AFAIU you have addressed that problem by making swapcache_only anon LRU
> > > > > > > specific, right? That would be certainly more robust as you can still
> > > > > > > reclaim from file LRUs. I cannot say I like that because swapcache_only
> > > > > > > is a bit confusing and I do not think we want to grow more special
> > > > > > > purpose reclaim types. Would it be possible/reasonable to instead put
> > > > > > > swapcache pages on the file LRU instead?
> > > > > > It looks like a good idea, but I'm not sure if it's possible. I can try it, is there anything to
> > > > > > pay attention to?
> > > > >
> > > > > I think this might be more intrusive than we think. Every time a page
> > > > > is added to or removed from the swap cache, we will need to move it
> > > > > between LRUs. All pages on the anon LRU will need to go through the
> > > > > file LRU before being reclaimed. I think this might be too big of a
> > > > > change to achieve this patch's goal.
> > > >
> > > > TBH I am not really sure how complex that might turn out to be.
> > > > Swapcache tends to be full of subtle issues. So you might be right but
> > > > it would be better to know _why_ this is not possible before we end up
> > > > fishing for couple of swapcache pages on potentially huge anon LRU to
> > > > isolate them. Think of TB sized machines in this context.
> > >
> > > Forgot to mention that it is not really far fetched from comparing this
> > > to MADV_FREE pages. Those are anonymous but we do not want to keep them
> > > on anon LRU because we want to age them independent on the swap
> > > availability as they are just dropped during reclaim. Not too much
> > > different from swapcache pages. There are more constraints on those but
> > > fundamentally this is the same problem, no?
> >
> > I agree it's not a first, but swap cache pages are more complicated
> > because they can go back and forth, unlike MADV_FREE pages which
> > usually go on a one way ticket AFAICT.
>
> Yes swapcache pages are indeed more complicated but most of the time
> they just go away as well, no? MADV_FREE can be reinitiated if they are
> written as well. So fundamentally they are not that different.
>
> > Also pages going into the swap
> > cache can be much more common than MADV_FREE pages for a lot of
> > workloads. I am not sure how different reclaim heuristics will react
> > to such mobility between the LRUs, and the fact that all pages will
> > now only get evicted through the file LRU. The anon LRU will
> > essentially become an LRU that feeds the file LRU. Also, the more
> > pages we move between LRUs, the more ordering violations we introduce,
> > as we may put colder pages in front of hotter pages or vice versa.
>
> Well, traditionally the file LRU has been maintaining page cache or
> easily disposable pages like MADV_FREE (which can be considered a cache
> as well). Swapcache is a form of a page cache as well.

If I understand correctly, when we move the MADV_FREE pages to the
file LRU, we don't have correct information about their relative
ordering compared to the pages that are already in the inactive file
LRU. IOW, we mess up the ordering of the inactive file LRU a little.
If we add more cases of moving pages to the file LRU (for the swap
cache), we may make it worse. I am also not sure how this works with
MGLRU generations.

Keep in mind that when a page is marked with MADV_FREE, it's always
cold. On the other hand, when a page is added to the swap cache, it
could be because it's under reclaim (cold), or it was just swapped in
(hot). I am not sure this practically matters, just something to think
about.

It also seems like all evictions will now be done from the file LRU,
so some heuristics (like workingset) may need to be updated
accordingly.

>
> > All in all, I am not saying it's a bad idea or not possible, I am just
> > saying it's probably more complicated than MADV_FREE, and adding more
> > cases where pages move between LRUs could introduce problems (or make
> > existing problems more visible).
>
> Do we want to start adding filtered anon scan for a certain type of
> pages? Because this is the question here AFAICS. This might seem an
> easier solution but I would argue that it is less predictable one.
> It is not unusual that a huge anon LRU would contain only very few
> swapcache pages.

I agree that it may be a problem in some situations.

>
> That being said, I might be missing some obvious or less obvious reasons
> why this is completely bad idea. Swapcache is indeed subtle.
> --
> Michal Hocko
> SUSE Labs
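
For illustration only (this is not from the thread and not kernel code):
a toy Python model of the filtered-scan concern discussed above. It
assumes an LRU represented as a plain list of booleans (True means the
page is in the swap cache) and a hypothetical helper
isolate_swapcache(). The point it sketches is why skipped
non-swapcache pages still need to be charged against the scan budget:
if they are not, a scan over a huge anon LRU with very few swapcache
pages can walk nearly the whole list.

```python
def isolate_swapcache(lru, nr_to_scan, count_skipped=True):
    """Toy model of a filtered LRU scan (hypothetical, not kernel code).

    lru: list of booleans, True means the page is in the swap cache.
    Returns (isolated, visited): pages taken and list entries walked.
    """
    isolated = 0
    scan = 0     # progress charged against the nr_to_scan budget
    visited = 0  # list entries actually touched
    for in_swapcache in lru:
        if scan >= nr_to_scan:
            break
        visited += 1
        if in_swapcache:
            scan += 1
            isolated += 1
        elif count_skipped:
            # Skipped pages still consume budget, so the walk is bounded.
            scan += 1
    return isolated, visited


# A large anon LRU with a single swapcache page at the far end.
lru = [False] * 99_999 + [True]

bounded = isolate_swapcache(lru, nr_to_scan=32, count_skipped=True)
unbounded = isolate_swapcache(lru, nr_to_scan=32, count_skipped=False)

print(bounded)    # walks only 32 entries, finds nothing
print(unbounded)  # walks the entire list to find one page
```

With the budget charged for skipped pages the walk stops after 32
entries without isolating anything; without it, the walk touches all
100,000 entries to isolate a single page, which is the kind of
unbounded work the hung-task report in the thread points at.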