Date: Tue, 24 Sep 2024 13:12:43 -0700
From: Minchan Kim
To: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand, akpm@linux-foundation.org, linux-mm@kvack.org,
	mhocko@suse.com, fengbaopeng@honor.com, gaoxu2@honor.com,
	hailong.liu@oppo.com, kaleshsingh@google.com,
	linux-kernel@vger.kernel.org, lokeshgidra@google.com,
	ngeoffray@google.com, shli@fb.com, surenb@google.com,
	yipengxiang@honor.com, yuzhao@google.com, Barry Song
Subject: Re: [PATCH RFC] mm: mglru: provide a separate list for lazyfree anon folios
References: <20240914063746.46290-1-21cnbao@gmail.com> <92f97c8e-f23d-4c6e-9f49-230fb4e96c46@redhat.com>

On Tue, Sep 24, 2024 at 10:38:37AM +1200, Barry Song wrote:
> On Tue, Sep 24, 2024 at 10:19 AM Minchan Kim wrote:
> >
> > On Fri, Sep 20, 2024 at 01:23:57PM +1200, Barry Song wrote:
> > > On Wed, Sep 18, 2024 at 12:02 AM David Hildenbrand wrote:
> > > >
> > > > On 14.09.24 08:37, Barry Song wrote:
> > > > > From: Barry Song
> > > > >
> > > > > This follows up on the discussion regarding Gaoxu's work[1]. It's
> > > > > unclear if there's still interest in implementing a separate LRU
> > > > > list for lazyfree folios, but I decided to explore it out of
> > > > > curiosity.
> > > > >
> > > > > According to Lokesh, MADV_FREE'd anon folios are expected to be
> > > > > released earlier than file folios. One option, as implemented
> > > > > by Gao Xu, is to place lazyfree anon folios at the tail of the
> > > > > file's `min_seq` generation. However, this approach results in
> > > > > lazyfree folios being released in a LIFO manner, which conflicts
> > > > > with LRU behavior, as noted by Michal.
> > > > >
> > > > > To address this, this patch proposes maintaining a separate list
> > > > > for lazyfree anon folios while keeping them classified under the
> > > > > "file" LRU type to minimize code changes. These lazyfree anon
> > > > > folios will still be counted as file folios and share the same
> > > > > generation with regular files. In the eviction path, the lazyfree
> > > > > list will be prioritized for scanning before the actual file
> > > > > LRU list.
> > > > >
> > > >
> > > > What's the downside of another LRU list? Do we have any experience on that?
> > >
> > > Essentially, the goal is to address the downsides of using a single LRU list for
> > > files and lazyfree anonymous pages - seriously more files re-faults.
> > >
> > > I'm not entirely clear on the downsides of having an additional LRU
> > > list. While it does increase complexity, it doesn't seem to be
> > > significant.
> >
> > It's not a catastrophic[1]. I prefer the idea of an additional LRU
> > because it offers flexibility for various potential use cases[2].
> >
> > orthgonal topic(but may be interest for someone)
> >
> > My main interest in a new LRU list is to enable the system to maintain a
> > quickly reclaimable memory pool and expose the size to the admin with
> > a knob to decide how many memory pool they want.
> >
> > This pool would consist of clean, unmapped pages from both the page cache
> > and/or the swap cache. This would allow the system to reclaim memory quickly
> > when free memory is low, at the cost of minor fault overhead.
>
> My current implementation only handles the MADV_FREE anonymous case. If they
> are placed in a single LRU, they should be able to be reclaimed very
> quickly, simply discarded without needing to be swapped out.
>
> I've been thinking about the issue of unmapped pagecache recently.
> These unmapped pagecaches can be reclaimed much faster than mapped ones,
> especially when the latter
> have a high mapcount and incur significant rmap costs.
> However, many pagecaches are
> inherently unmapped (e.g., from syscall read). If they are placed in a
> single LRU, the challenge would be comparing the age of unmapped
> pagecache with mapped ones.
> Currently, with the mglru tier mechanism, frequently accessed unmapped
> pagecaches have a chance to be placed in a spot where they are harder
> to reclaim.
>
> personally I am quite interested in putting unmapped pagecache
> together as right now reclamation could be like this:
>
> lru list:
> unmapped pagecache(A) - mapped pagecached(B) - unmapped pagecache(C) - mapped
> pagecached with huge mapcount(D)
>
> A and C can be reclaimed with zero cost but they have to wait for D and B.
>
> But the question is that if make two lists:
>
> list1: A - C
> list2: B - D
>
> How can we ensure that A and C won't experience many refaults, even though
> reclaiming them would be cost-free? Or that B and D might actually be
> colder than A and C?
>
> If this isn't an issue, I'd be very interested in implementing it. Any thoughts?

My proposal involves the following:

1. Introduce an "easily reclaimable" LRU list. This list would hold pages
   that can be freed quickly without significant overhead.

2. Implement a parameter to control the size of this list. This allows for
   system tuning based on available memory and performance requirements.

3. Modify kswapd behavior to utilize this list. When kswapd is woken up
   due to memory pressure, it should first drop pages from this list to
   refill free pages up to the high watermark.

4. Before kswapd goes to sleep, it should scan the tail of the LRU list and
   move cold pages to the easily reclaimable list, unmapping them from the
   page table.

5. On a page cache hit, move the page back into the evictable LRU.

This approach allows the system to maintain a pool of readily available
memory, mitigating the "aging" problem.
The trade-off is the potential for minor page faults and LRU movement
overheads if pages on the ez_reclaimable LRU are accessed again.

Furthermore, we could put some asynchronous writeback pages (e.g., swap-out
or filesystem writeback pages) onto the list, too. Currently, we rotate
those pages back to the head of the LRU and, once writeback is done, move
them to the tail of the LRU again. Instead, we can simply put the page onto
the ez_reclaimable LRU without rotating it back and forth.
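
To make the flow in steps 1-5 concrete, here is a rough user-space toy
model, not kernel code. Everything in it is an assumption made for
illustration: the class name ToyLRU, the ez_target knob, the method names
and the "reactivations" counter are hypothetical, and page tables,
watermarks and locking are not modelled at all. It only shows the order
in which pages would move between the two lists.

```python
from collections import OrderedDict


class ToyLRU:
    """Toy model of an evictable LRU plus an "easily reclaimable" (ez) list."""

    def __init__(self, ez_target):
        # First item is the coldest page (LRU tail), last item the hottest.
        self.evictable = OrderedDict()
        # Step 1: the ez list, oldest entry first.
        self.ez = OrderedDict()
        # Step 2: hypothetical knob for how many pages to keep on the ez list.
        self.ez_target = ez_target
        # Counts hits on ez pages (the minor fault / LRU movement cost).
        self.reactivations = 0

    def add(self, page):
        """A new page enters the hot end of the evictable LRU."""
        self.evictable[page] = None

    def access(self, page):
        """Step 5: a hit on an ez page moves it back to the evictable LRU."""
        if page in self.ez:
            del self.ez[page]
            self.reactivations += 1
            self.evictable[page] = None
        elif page in self.evictable:
            self.evictable.move_to_end(page)

    def kswapd_before_sleep(self):
        """Step 4: move the coldest pages to the ez list up to ez_target."""
        while len(self.ez) < self.ez_target and self.evictable:
            page, _ = self.evictable.popitem(last=False)
            self.ez[page] = None

    def kswapd_reclaim(self, nr_pages):
        """Step 3: under pressure, drop ez pages first; return freed pages."""
        freed = []
        while len(freed) < nr_pages and self.ez:
            page, _ = self.ez.popitem(last=False)
            freed.append(page)
        return freed


lru = ToyLRU(ez_target=2)
for name in ["A", "B", "C", "D"]:
    lru.add(name)

lru.kswapd_before_sleep()      # the two coldest pages, A and B, move to ez
lru.access("A")                # hit on A: it returns to the evictable LRU
freed = lru.kswapd_reclaim(2)  # only B is still on the ez list
```

In this model a reclaim request larger than the ez list simply comes back
short; falling back to the normal LRU scan in that case is left out on
purpose.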