From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <8545b4d8-ba21-4607-8217-2b7b02ccb4d8@linux.dev>
Date: Fri, 30 Aug 2024 12:52:06 +0800
MIME-Version: 1.0
Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
To: "Sridhar, Kanchana P", Nhat Pham
Cc: "linux-kernel@vger.kernel.org", "linux-mm@kvack.org",
 "hannes@cmpxchg.org", "yosryahmed@google.com", "ryan.roberts@arm.com",
 "Huang, Ying", "21cnbao@gmail.com" <21cnbao@gmail.com>,
 "akpm@linux-foundation.org", "Zou, Nanhai", "Feghali, Wajdi K",
 "Gopal, Vinodh", Usama Arif
References: <20240828093516.30228-1-kanchana.p.sridhar@intel.com>
From: Chengming Zhou
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender:
 owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org

On 2024/8/30 03:38, Sridhar, Kanchana P wrote:
> Hi Nhat,
>
>> -----Original Message-----
>> From: Nhat Pham
>> Sent: Thursday, August 29, 2024 10:11 AM
>> To: Sridhar, Kanchana P
>> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
>> hannes@cmpxchg.org; yosryahmed@google.com; ryan.roberts@arm.com;
>> Huang, Ying; 21cnbao@gmail.com; akpm@linux-foundation.org;
>> Zou, Nanhai; Feghali, Wajdi K; Gopal, Vinodh; Usama Arif;
>> Chengming Zhou
>> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
>>
>> On Wed, Aug 28, 2024 at 5:06 PM Sridhar, Kanchana P wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Nhat Pham
>>>> Sent: Wednesday, August 28, 2024 2:35 PM
>>>> To: Sridhar, Kanchana P
>>>> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
>>>> hannes@cmpxchg.org; yosryahmed@google.com; ryan.roberts@arm.com;
>>>> Huang, Ying; 21cnbao@gmail.com; akpm@linux-foundation.org;
>>>> Zou, Nanhai; Feghali, Wajdi K; Gopal, Vinodh
>>>> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
>>>>
>>>> On Wed, Aug 28, 2024 at 2:35 AM Kanchana P Sridhar wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> This patch-series enables zswap_store() to accept and store mTHP
>>>>> folios. The most significant contribution in this series is from the
>>>>> earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
>>>>> migrated to v6.11-rc3 in patch 2/4 of this series.
>>>>>
>>>>> [1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
>>>>>      https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u
>>>>>
>>>>> Additionally, there is an attempt to modularize some of the functionality
>>>>> in zswap_store(), to make it more amenable to supporting any-order
>>>>> mTHPs. For instance, the function zswap_store_entry() stores a zswap_entry
>>>>> in the xarray.
>>>>> Likewise, zswap_delete_stored_offsets() can be used to
>>>>> delete all offsets corresponding to a higher order folio stored in zswap.
>>>>>
>>>>
>>>> Will this have any conflict with mTHP swap work? Especially with mTHP
>>>> swap-in and zswap writeback.
>>>>
>>>> My understanding is from zswap's perspective, the large folio is
>>>> broken apart into independent subpages, correct? What happens when we
>>>> have partially written back mTHP (i.e some subpages are in zswap
>>>> still, whereas others are written back to swap). Would this
>>>> automatically prevent mTHP swapin?
>>>
>>> That is a good point. To begin with, this patch-series would make the default
>>> behavior for mTHP swapout/storage and swapin for ZSWAP to be on par with
>>> ZRAM. From zswap's perspective, imo this is a significant step forward towards
>>> realizing cold memory storage with mTHP folios. However, it is only a starting
>>> point that makes the behavior uniform across zswap/zram. Initially, workloads
>>> would see a one-time benefit with reclaim being able to swapout mTHP
>>> folios without splitting, to zswap. If the mTHPs were cold memory, then we
>>> would have derived latency gains towards memory savings (with zswap).
>>>
>>> However, if the mTHP were part of "not so cold" memory, this would result
>>> in a one-way mTHP conversion to 4K folios. Depending on workloads and their
>>> access patterns, we could either see individual 4K folios being swapped in,
>>> or entire chunks if not the entire (original) mTHP needing to be swapped in.
>>>
>>> It should be noted that this is more of a performance vs. cold memory
>>> preservation trade-off that needs to drive mTHP reclaim, storage, swapin and
>>> writeback policy. Different workloads could require different policies. However,
>>> even though this patch is only a starting point, it is still functionally correct
>>> by being equivalent to zram-mTHP, and compatible with the rest of mm and
>>> swap as far as mTHP.
>>> Another important functionality/data consistency decision
>>> I made in this patch series is error handling during zswap_store() of mTHP:
>>> in case of any errors, all swap offsets for the mTHP are deleted from the
>>> zswap xarray/zpool, since we know that the mTHP will now have to be stored
>>> in the backing swap device. IOW, an mTHP is either entirely stored in zswap,
>>> or entirely not stored in zswap.
>>>
>>> To answer your question, we would need to come up with what the semantics
>>> would need to be for zswap zpool storage granularity, swapin granularity,
>>> readahead granularity and writeback wrt mTHP and how the overall swap
>>> sub-system needs to "preserve" mTHP vs. splitting mTHP into 4K/lower-order
>>> folios during swapout. Once we have a good understanding of these policies,
>>> we could implement them in zswap. Alternately, develop an abstraction that is
>>> one level above zswap/zram and makes things easier and shareable between
>>> zswap and zram. By this, I mean fundamental assumptions such as consecutive
>>> swap offsets (for instance). To some extent, this implies that an mTHP as a
>>> swap entity is defined by consecutiveness of swap offsets. Maybe the policy
>>> to keep mTHPs in the system over extended duration might be to assemble
>>> them dynamically based on swapin_readahead() decisions (which is based on
>>> workload access patterns). In other words, mTHPs could be a useful abstraction
>>> that can be static or even dynamic based on working set characteristics, and
>>> cold memory preservation. This is quite a complex topic imho.
>>>
>>> As we know, Barry Song and Chuanhua Han have started the discussion on
>>> this in their zram mTHP swapin series [1].
>>
>> Yeah I'm a bit more concerned with the correctness aspect.
>> As long as it's not buggy, then we can implement mTHP zswapout first,
>> and force individual subpage (z)swapin for now (since we cannot control
>> writeback from writing individual subpages).
>
> Absolutely, this sounds like the way to go!
>
>>
>> We can discuss strategy to harmonize mTHP, zswap (with writeback) as
>> we go along.
>
> Sounds great :)
>
>>
>> BTW, I think we're not cc-ing Chengming? Is the get_maintainers script
>> not working properly... Let me manually add him in - please include
>> him in future submission and responses, as he is also a zswap reviewer
>> :)
>
> I think when I ran get_maintainers.pl, I was in v6.10. For sure, will include
> Chengming in future submissions and responses :)

I may be a little late to the party, but I will take a look ASAP.
This is interesting and great work. Thanks!

>
>>
>> Also cc-ing Usama who is interested in this work.
>
> Sounds great.
>
> Thanks,
> Kanchana
>
>>
>>>
>>> [1] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@oppo.com/T/#u
>>>
>>> Thanks,
>>> Kanchana
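
For readers following the thread, the error-handling rule quoted above ("an
mTHP is either entirely stored in zswap, or entirely not stored in zswap")
can be illustrated with a small userspace sketch. This is not code from the
patch series: toy_stored, toy_store_page(), toy_delete_offsets(),
toy_store_folio() and toy_count() are hypothetical names, a flat bool array
stands in for the zswap xarray, and the only assumptions taken from the
thread are that a large folio occupies consecutive swap offsets and that any
per-page failure removes every offset belonging to that folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 64

/* Hypothetical stand-in for the zswap xarray: one flag per swap offset. */
static bool toy_stored[TOY_SLOTS];

/*
 * Hypothetical per-page store. It fails for the offset passed in fail_at
 * (use -1 for "never fail"), which lets a caller simulate an error such as
 * a compression or allocation failure in the middle of a folio.
 */
static bool toy_store_page(size_t offset, long fail_at)
{
    if ((long)offset == fail_at)
        return false;
    toy_stored[offset] = true;
    return true;
}

/* Rollback step: drop every entry in [start, start + nr). */
static void toy_delete_offsets(size_t start, size_t nr)
{
    assert(nr <= TOY_SLOTS && start <= TOY_SLOTS - nr);
    for (size_t i = 0; i < nr; i++)
        toy_stored[start + i] = false;
}

/*
 * Store nr pages at consecutive offsets starting at start. On the first
 * failure, delete all offsets of the folio so that nothing is left behind.
 */
static bool toy_store_folio(size_t start, size_t nr, long fail_at)
{
    if (nr == 0 || nr > TOY_SLOTS || start > TOY_SLOTS - nr)
        return false;

    for (size_t i = 0; i < nr; i++) {
        if (!toy_store_page(start + i, fail_at)) {
            toy_delete_offsets(start, nr);
            return false;
        }
    }
    return true;
}

/* Count how many offsets in [start, start + nr) currently hold an entry. */
static size_t toy_count(size_t start, size_t nr)
{
    size_t n = 0;

    for (size_t i = 0; i < nr && start + i < TOY_SLOTS; i++) {
        if (toy_stored[start + i])
            n++;
    }
    return n;
}
```

Calling toy_store_folio(0, 4, -1) leaves four entries at offsets 0..3, while
toy_store_folio(8, 4, 10) fails at the third page and leaves none at offsets
8..11. In the real series the caller would then fall back to the backing
swap device, as described in the quoted text.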