From: Chengming Zhou <chengming.zhou@linux.dev>
Date: Tue, 3 Dec 2024 11:06:27 +0800
Message-ID: <045a786d-7b13-4127-82ce-57510565bd15@linux.dev>
Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications for batching.
To: "Sridhar, Kanchana P", Yosry Ahmed
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
	nphamcs@gmail.com, usamaarif642@gmail.com, ryan.roberts@arm.com,
	21cnbao@gmail.com, akpm@linux-foundation.org, "Feghali, Wajdi K",
	"Gopal, Vinodh"
References: <20241127225324.6770-1-kanchana.p.sridhar@intel.com>
	<20241127225324.6770-3-kanchana.p.sridhar@intel.com>

On 2024/12/3 09:01, Sridhar, Kanchana P wrote:
> Hi Chengming, Yosry,
>
>> -----Original Message-----
>> From: Yosry Ahmed
>> Sent: Monday, December 2, 2024 11:33 AM
>> To: Chengming Zhou
>> Cc: Sridhar, Kanchana P; linux-kernel@vger.kernel.org; linux-mm@kvack.org;
>> hannes@cmpxchg.org; nphamcs@gmail.com; usamaarif642@gmail.com;
>> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
>> Feghali, Wajdi K; Gopal, Vinodh
>> Subject: Re: [PATCH v1 2/2] mm: zswap: zswap_store_pages() simplifications
>> for batching.
>>
>> On Wed, Nov 27, 2024 at 11:00 PM Chengming Zhou wrote:
>>>
>>> On 2024/11/28 06:53, Kanchana P Sridhar wrote:
>>>> In order to set up zswap_store_pages() to enable a clean batching
>>>> implementation in [1], this patch implements the following changes:
>>>>
>>>> 1) Addition of zswap_alloc_entries() which will allocate zswap entries for
>>>>    all pages in the specified range for the folio, upfront. If this fails,
>>>>    we return an error status to zswap_store().
>>>>
>>>> 2) Addition of zswap_compress_pages() that calls zswap_compress() for each
>>>>    page, and returns false if any zswap_compress() fails, so
>>>>    zswap_store_page() can clean up the resources allocated and return an
>>>>    error status to zswap_store().
>>>>
>>>> 3) A "store_pages_failed" label that is a catch-all for all failure points
>>>>    in zswap_store_pages(). This facilitates cleaner error handling within
>>>>    zswap_store_pages(), which will become important for IAA compress
>>>>    batching in [1].
>>>>
>>>> [1]: https://patchwork.kernel.org/project/linux-mm/list/?series=911935
>>>>
>>>> Signed-off-by: Kanchana P Sridhar
>>>> ---
>>>>  mm/zswap.c | 93 +++++++++++++++++++++++++++++++++++++++++-------------
>>>>  1 file changed, 71 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/mm/zswap.c b/mm/zswap.c
>>>> index b09d1023e775..db80c66e2205 100644
>>>> --- a/mm/zswap.c
>>>> +++ b/mm/zswap.c
>>>> @@ -1409,9 +1409,56 @@ static void shrink_worker(struct work_struct *w)
>>>>    * main API
>>>>    **********************************/
>>>>
>>>> +static bool zswap_compress_pages(struct page *pages[],
>>>> +				 struct zswap_entry *entries[],
>>>> +				 u8 nr_pages,
>>>> +				 struct zswap_pool *pool)
>>>> +{
>>>> +	u8 i;
>>>> +
>>>> +	for (i = 0; i < nr_pages; ++i) {
>>>> +		if (!zswap_compress(pages[i], entries[i], pool))
>>>> +			return false;
>>>> +	}
>>>> +
>>>> +	return true;
>>>> +}
>>>
>>> How about introducing a `zswap_compress_folio()` interface which
>>> can be used by `zswap_store()`?
>>> ```
>>> zswap_store()
>>> 	nr_pages = folio_nr_pages(folio)
>>>
>>> 	entries = zswap_alloc_entries(nr_pages)
>>>
>>> 	ret = zswap_compress_folio(folio, entries, pool)
>>>
>>> 	// store entries into xarray and LRU list
>>> ```
>>>
>>> And this version of `zswap_compress_folio()` is very simple for now:
>>> ```
>>> zswap_compress_folio()
>>> 	nr_pages = folio_nr_pages(folio)
>>>
>>> 	for (index = 0; index < nr_pages; ++index) {
>>> 		struct page *page = folio_page(folio, index);
>>>
>>> 		if (!zswap_compress(page, &entries[index], pool))
>>> 			return false;
>>> 	}
>>>
>>> 	return true;
>>> ```
>>> This can be easily extended to support your "batched" version.
>>>
>>> Then the old `zswap_store_page()` could be removed.
>>>
>>> The good point is simplicity: we don't need to slice the folio
>>> into multiple batches, then repeat the common operations for each
>>> batch, like preparing entries and storing into the xarray and LRU list...
>>
>> +1
>
> Thanks for the code review comments. One question though: would
> it make sense to trade off the memory footprint cost against the code
> simplification? For instance, let's say we want to store a 64K folio.
> We would allocate memory for 16 zswap entries, and let's say one of
> the compressions fails: we would then deallocate memory for all 16
> zswap entries. Could this lead to zswap_entry kmem_cache starvation,
> and subsequent zswap_store() failures, in multiple-process scenarios?

Ah, I get your consideration. But it's the unlikely case, right?
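IIUC, the worst case you describe is roughly the following (pseudo-code
only, reusing the zswap_compress_pages() loop from the patch above;
zswap_free_entries() is a hypothetical cleanup helper, not an existing
function):

```
/* all nr_pages entries were already allocated from the kmem_cache */
for (i = 0; i < nr_pages; ++i) {
	if (!zswap_compress(pages[i], entries[i], pool)) {
		/*
		 * One failed compression throws away all nr_pages
		 * allocations, e.g. 16 entries for a 64K folio.
		 */
		zswap_free_entries(entries, nr_pages);
		return false;
	}
}
```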
If the case you mentioned above happens a lot, then I think yes, we
should optimize its memory footprint to avoid the allocation and
deallocation.

On the other hand, we should consider that a folio will be compressed
successfully in most cases, so we have to allocate all the entries
eventually anyway.

Based on your consideration, I think your way is ok too, although I
think patch 2/2 should be dropped, since it hides the pages loop in
smaller functions, as Yosry mentioned too.

>
> In other words, allocating entries in smaller batches -- more specifically,
> only the compress batch size -- seems to strike a balance in terms of
> memory footprint, while mitigating the starvation aspect, and possibly
> also helping latency (allocating a large number of zswap entries, and
> potentially deallocating them, could impact latency).

If we consider the likely case (compression succeeds), the overall
latency should be better, right? Since we can bulk-allocate all entries
at first, and bulk-insert into the xarray and LRU at last.

>
> If we agree on the merits of processing a large folio in smaller batches,
> this in turn requires that we store each smaller batch of entries in the
> xarray/LRU before moving to the next batch. Which means all the
> zswap_store() ops need to be done for a batch before moving to the next
> batch.
>

Either way is ok for me, based on your memory footprint consideration
above.

Thanks.
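P.S. To make the likely-case flow concrete, below is a rough (untested)
sketch of what I have in mind. zswap_alloc_entries() and
zswap_compress_folio() follow the pseudo-code shapes earlier in the
thread, and zswap_store_entries()/zswap_free_entries() are hypothetical
helpers for the bulk insert and the cleanup:

```
static bool zswap_store_folio(struct folio *folio,
			      struct zswap_entry *entries[],
			      struct zswap_pool *pool)
{
	long nr_pages = folio_nr_pages(folio);

	/* one bulk allocation pass for all entries... */
	if (!zswap_alloc_entries(entries, nr_pages))
		return false;

	/* ...one compression pass over the whole folio (the place
	 * where IAA batching would slot in later)... */
	if (!zswap_compress_folio(folio, entries, pool))
		goto free_entries;

	/* ...and one bulk insert into the xarray and the LRU list */
	if (!zswap_store_entries(folio, entries, nr_pages))
		goto free_entries;

	return true;

free_entries:
	zswap_free_entries(entries, nr_pages);
	return false;
}
```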