Date: Tue, 22 Oct 2024 13:15:09 -0700
From: Minchan Kim
To: Barry Song <21cnbao@gmail.com>
Cc: Minchan Kim, Andrew Morton, yuzhao@google.com, linux-mm@kvack.org,
    david@redhat.com, fengbaopeng@honor.com, gaoxu2@honor.com,
    hailong.liu@oppo.com, kaleshsingh@google.com,
    linux-kernel@vger.kernel.org, lokeshgidra@google.com, mhocko@suse.com,
    ngeoffray@google.com, shli@fb.com, surenb@google.com,
    v-songbaohua@oppo.com, yipengxiang@honor.com, Gao Xu
Subject: Re: [PATCH v2] mm: mglru: provide a separate list for lazyfree anon folios
References: <20241016033030.36990-1-21cnbao@gmail.com> <20241016155835.8fadc58d913d9df14099514b@linux-foundation.org>

Hi Barry,

Sorry for slow response.

On Fri, Oct 18, 2024 at 06:12:01PM +1300, Barry Song wrote:
> On Fri, Oct 18, 2024 at 6:58 AM Minchan Kim wrote:
> >
> > On Thu, Oct 17, 2024 at 06:59:09PM +1300, Barry Song wrote:
> > > On Thu, Oct 17, 2024 at 11:58 AM Andrew Morton wrote:
> > > >
> > > > On Wed, 16 Oct 2024 16:30:30 +1300 Barry Song <21cnbao@gmail.com> wrote:
> > > >
> > > > > To address this, this patch proposes maintaining a separate list
> > > > > for lazyfree anon folios while keeping them classified under the
> > > > > "file" LRU type to minimize code changes.
> > > >
> > > > Thanks. I'll await input from other MGLRU developers before adding
> > > > this for testing.
> > >
> > > Thanks!
> > >
> > > Hi Minchan, Yu,
> > >
> > > Any comments? I understand that Minchan may have a broader plan
> > > to "enable the system to maintain a quickly reclaimable memory
> > > pool and provide a knob for admins to control its size." While I
> > > have no objection to that plan, I believe improving MADV_FREE
> > > performance is a more urgent priority and a low-hanging fruit at this
> > > stage.
> >
> > Hi Barry,
> >
> > I have no idea why my email didn't go through before. I sent the
> > following reply on Sep 24. Hope it works this time.
>
> Hi Minchan,
>
> I guess not. This email of yours ended up in my Gmail spam folder, and
> my oppo.com account still hasn’t received it. Any idea why?

In the end, that's a problem on my side, and I don't know when it can be
fixed. Anyway, I hope it works this time.

> >
> > ====== &< ======
> >
> > My proposal involves the following:
> >
> > 1. Introduce an "easily reclaimable" LRU list. This list would hold pages
> > that can be quickly freed without significant overhead.
>
> I assume you plan to keep both lazyfree anon pages and 'reclaimed'
> file folios (reclaimed in the normal LRU lists but still in the
> easily-reclaimable list) in this 'easily reclaimable' LRU list.
> However, I'm not sure this will work, as this patch aims to help reclaim
> lazyfree anon pages before file folios to reduce both file and anon
> refaults. If we place 'reclaimed' file folios and lazyfree anon folios in
> the same list, we may need to revisit how to reclaim lazyfree anon folios
> before reclaiming the 'reclaimed' file folios.

For those reclaimed folios, the *decision* to reclaim was already made;
they just couldn't be freed due to an *implementation issue*. So they are
strong candidates for reclaim, ahead of other candidates, as long as they
have not been accessed since then.

> >
> > 2. Implement a parameter to control the size of this list. This allows for
> > system tuning based on available memory and performance requirements.
>
> If we include only 'reclaimed' file folios in this 'easily reclaimable'
> LRU list, the parameter makes sense. However, if we also add lazyfree
> folios to the list, the parameter becomes less meaningful since we can't
> predict how many lazyfree anon folios user space might have. I still feel
> lazyfree anon folios are different from "reclaimed" file folios (I mean
> reclaimed from normal lists but still in the 'easily-reclaimable' list).

I thought the ez-reclaimable LRU doesn't need to be accurate, since we can
put other folios on it later (e.g., folios that fadvise_dontneed wanted to
drop but couldn't at that time).

> >
> > 3. Modify kswapd behavior to utilize this list. When kswapd is awakened
> > due to memory pressure, it should attempt to drop those pages first to
> > refill free pages up to the high watermark.
> >
> > 4. Before kswapd goes to sleep, it should scan the tail of the LRU list and
> > move cold pages to the easily reclaimable list, unmapping them from the
> > page table.
> >
> > 5. Whenever there is a page cache hit, move the page into the evictable
> > LRU.
> >
> > This approach allows the system to maintain a pool of readily available
> > memory, mitigating the "aging" problem.
> > The trade-off is the potential for minor page faults and LRU movement
> > overheads if these pages in the ez_reclaimable LRU are accessed again.
>
> I believe you're aware of an implementation from Samsung that uses
> cleancache. Although it was dropped from the mainline kernel, it still
> exists in the Android kernel. Samsung's rbincache, based on cleancache,
> maintains a reserved memory region for holding reclaimed file folios.
> Instead of LRU movement, rbincache uses memcpy to transfer data between
> the pool and the page cache.
>
> >
> > Furthermore, we could put some asynchronous writeback pages (e.g., swap
> > out or writeback of the fs pages) into the list, too.
> > Currently, what we are doing is rotating those pages back to the head of
> > the LRU and, once writeback is done, moving the page to the tail of the
> > LRU again. We can simply put the page into the ez_reclaimable LRU without
> > rotating back and forth.
>
> If this is about establishing a pool of easily reclaimable file folios, I
> fully support the idea and am eager to try it, especially for Android,
> where there are certainly strong use cases. However, I suspect it may
> be controversial and could take months to gain acceptance. Therefore,
> I’d prefer we first focus on landing a smaller change to address the
> madv_free performance issue and treat that idea as a separate
> incremental patch set.

I don't want to block the improvement, Barry.

The reason I suggested another LRU was actually to prevent divergence
between MGLRU and split-LRU, and to get the same behavior from both by
introducing the additional logic in a central place.

I don't think it is desirable for a userspace hint to get a different
priority depending on the admin config. Personally, I believe it would be
better to introduce a knob that changes MADV_FREE's behavior for both LRU
algorithms at the same time, instead of only one, even though we will see
the LRU inversion issue.
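
To make the five quoted steps easier to discuss, here is a toy userspace
model in Python. It is only a sketch under stated assumptions: the class
EzReclaimModel, its method names and the ez_limit parameter are all
hypothetical, pages are just strings, and nothing here corresponds to
actual kernel code or to either LRU implementation.

```python
from collections import OrderedDict


class EzReclaimModel:
    """Toy model of the proposed 'easily reclaimable' list (hypothetical)."""

    def __init__(self, ez_limit):
        self.lru = OrderedDict()  # evictable LRU, first item is the coldest
        self.ez = OrderedDict()   # step 1: easily reclaimable list
        self.ez_limit = ez_limit  # step 2: knob controlling the list size
        self.minor_faults = 0

    def add(self, page):
        # New pages enter the evictable LRU at the hot end.
        self.lru[page] = True

    def access(self, page):
        if page in self.ez:
            # Step 5: a hit moves the page back to the evictable LRU.
            # It was unmapped in step 4, so count a minor fault.
            del self.ez[page]
            self.lru[page] = True
            self.minor_faults += 1
        elif page in self.lru:
            self.lru.move_to_end(page)  # mark as recently used

    def kswapd_before_sleep(self):
        # Step 4: move cold pages from the LRU tail until the list is full.
        while self.lru and len(self.ez) < self.ez_limit:
            page, _ = self.lru.popitem(last=False)
            self.ez[page] = True

    def kswapd_reclaim(self, nr_pages):
        # Step 3: drop ez pages first, then fall back to the LRU tail.
        freed = []
        while len(freed) < nr_pages and self.ez:
            page, _ = self.ez.popitem(last=False)
            freed.append(page)
        while len(freed) < nr_pages and self.lru:
            page, _ = self.lru.popitem(last=False)
            freed.append(page)
        return freed


m = EzReclaimModel(ez_limit=2)
for p in ["a", "b", "c", "d"]:
    m.add(p)
m.kswapd_before_sleep()     # "a" and "b" become easily reclaimable
m.access("a")               # hit: "a" returns to the evictable LRU
print(m.kswapd_reclaim(2))  # prints ['b', 'c']
```

The minor_faults counter stands in for the trade-off mentioned in the
quoted proposal: every hit on a page that was already moved to the list
costs a fault plus an LRU movement.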
>
> My current patch specifically targets the issue of reclaiming lazyfree
> anon folios before reclaiming file folios. It appears your proposal is
> independent (though related) work, and I don't believe it should delay
> resolving the madv_free issue. Additionally, that pool doesn’t effectively
> address the reclamation priority between files and lazyfree anon folios.
>
> In conclusion:
>
> 1. I agree that the pool is valuable, and I’d like to develop it as an
> incremental patch set. However, this is a significant step that will
> require considerable time.
> 2. It could be quite tricky to include both lazyfree anon folios and
> reclaimed file folios (which are reclaimed in normal lists but not in
> the 'easily-reclaimable' list) in the same LRU list. I’d prefer to
> start by replacing Samsung's rbincache to reduce file folio I/O if we
> decide to implement the pool.
> 3. I believe we should first focus on landing this fix patch for the
> madv_free performance issue.
>
> What are your thoughts? I spoke with Yu, and he would like to hear
> your opinion.

Sure, I don't want to block any improvement, but please think about my
concern once more, and just go ahead with your idea if nobody except me
is concerned about it.

Thank you.
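
For readers following the thread, the ordering the quoted patch is after
(lazyfree anon folios reclaimed before file folios) can be illustrated
with a small Python sketch. This is an assumption-laden toy, not the
patch: the function reclaim(), the (name, dirty) tuples and the kept list
are hypothetical. The only behavior taken as given is the documented
MADV_FREE rule that a page written to again before it is freed is no
longer discarded.

```python
from collections import deque


def reclaim(lazyfree, file_folios, nr_to_reclaim):
    """Reclaim from the lazyfree list before touching file folios.

    Each lazyfree entry is a (name, dirty) tuple. A clean lazyfree folio
    is dropped without any I/O. A folio redirtied after MADV_FREE is
    skipped and returned in `kept`, a stand-in for however the kernel
    puts it back among ordinary anon folios.
    """
    freed, kept = [], []
    while lazyfree and len(freed) < nr_to_reclaim:
        name, dirty = lazyfree.popleft()
        (kept if dirty else freed).append(name)
    # Only when the lazyfree list cannot satisfy the request do file
    # folios get reclaimed, which is what avoids the extra file refaults.
    while file_folios and len(freed) < nr_to_reclaim:
        freed.append(file_folios.popleft())
    return freed, kept


lazyfree = deque([("lf1", False), ("lf2", True), ("lf3", False)])
files = deque(["f1", "f2"])
print(reclaim(lazyfree, files, 3))  # prints (['lf1', 'lf3', 'f1'], ['lf2'])
```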