Subject: Re: [PATCH 0/3] mm: zswap: trivial folio conversions
From: Chengming Zhou <chengming.zhou@linux.dev>
To: Yosry Ahmed <yosryahmed@google.com>, Nhat Pham <nphamcs@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>, Andrew Morton <akpm@linux-foundation.org>, Johannes Weiner <hannes@cmpxchg.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Hildenbrand <david@redhat.com>, Barry Song <21cnbao@gmail.com>, Chris Li <chrisl@kernel.org>, Ryan Roberts <ryan.roberts@arm.com>, Kairui Song <kasong@tencent.com>
Date: Mon, 3 Jun 2024 14:19:17 +0800
Message-ID: <9de0ce63-3815-4c1a-91a2-11cb3d526672@linux.dev>
References: <20240524033819.1953587-1-yosryahmed@google.com>

On 2024/5/29 03:32, Yosry Ahmed wrote:
> On Tue, May 28, 2024 at 12:08 PM Nhat Pham <nphamcs@gmail.com> wrote:
>>
>> On Fri, May 24, 2024 at 4:13 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>>>
>>> On Fri, May 24, 2024 at 12:53 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>>>>
>>>> On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox <willy@infradead.org> wrote:
>>>>>
>>>>> On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
>>>>>> Some trivial folio conversions in zswap code.
>>>>>
>>>>> The three patches themselves look good.
>>>>>
>>>>>> The main reason I included a cover letter is that I wanted to get
>>>>>> feedback on what other trivial conversions can/should be done in
>>>>>> mm/zswap.c (keeping in mind that only order-0 folios are supported
>>>>>> anyway). These are the things I came across while searching for
>>>>>> 'page' in mm/zswap.c, and chose not to do anything about for now:
>>>>>
>>>>> I think there's a deeper question to answer before answering these
>>>>> questions, which is what we intend to do with large folios and
>>>>> zswap in the future. Do we intend to split them?
>>>>> Compress them as a large folio? Compress each page in a large
>>>>> folio separately? I can see an argument for choices 2 and 3, but I
>>>>> think choice 1 is going to be increasingly untenable.
>>>>
>>>> Yeah, I was kinda getting the small things out of the way so that
>>>> zswap is fully folio-ized, before we think about large folios. I
>>>> haven't given it a lot of thought, but here's what I have in mind.
>>>>
>>>> Right now, I think most configs that enable zswap will disable
>>>> CONFIG_THP_SWAP (otherwise all THPs will go straight to disk), so
>>>> let's assume that today we are splitting large folios before they
>>>> go to zswap (i.e. choice 1).
>>>>
>>>> What we do next depends on how the core swap code intends to deal
>>>> with large folios. My understanding based on recent developments is
>>>> that we intend to swap out large folios as a whole, but I saw some
>>>> discussions about splitting all large folios before swapping them
>>>> out, or leaving them whole but swapping them out in order-0 chunks.
>>>>
>>>> I assume the rationale is that there is little benefit to keeping
>>>> the folios whole because they will most likely be freed soon
>>>> anyway, but I understand not wanting to spend time on splitting
>>>> them, so swapping them out in order-0 chunks makes some sense to
>>>> me. It also dodges the whole fragmentation issue.
>>>>
>>>> If we do either of these things in the core swap code, then I think
>>>> zswap doesn't need to do anything to support large folios. If not,
>>>> then we need to make a choice between choice 2 (compress large
>>>> folios as a whole) and choice 3 (compress each page separately) as
>>>> you mentioned.
>>>>
>>>> Compressing large folios as a whole means that we need to
>>>> decompress them as a whole to read a single page, which I think
>>>> could be very inefficient in some cases or force us to swap in
>>>> large folios. Unless, of course, we end up in a world where we
>>>> mostly swap in the same large folios that we swapped out. Although
>>>> there can be additional compression savings from compressing large
>>>> folios as a whole.
>>>>
>>>> Hence, I think choice 3 is the most reasonable one, at least for
>>>> the short term. I also think this is what zram does, but I haven't
>>>> checked. Even if we all agree on this, there are still questions
>>>> that we need to answer. For example, do we allocate zswap_entry's
>>>> for each order-0 chunk right away, or do we allocate a single
>>>> zswap_entry for the entire folio, and then "split" it during swapin
>>>> if we only need to read part of the folio?
>>>>
>>>> Wondering what others think here.
>>>
>>> More thoughts that came to mind here:
>>>
>>> - Whether we go with choice 2 or 3, we may face a latency issue.
>>> Zswap compression happens synchronously in the context of reclaim,
>>> so if we start handling large folios in zswap, it may be more
>>> efficient to do it asynchronously like swap to disk.
>>
>> We've been discussing this in private as well :)
>>
>> It doesn't have to be these two extremes, right? I'm perfectly happy
>> with starting with compressing each subpage separately, but perhaps
>> we can consider managing larger folios in bigger chunks (say 64KB).
>> That way, on swap-in, we just have to bring a whole chunk in, not
>> the entire folio, and still take advantage of compression
>> efficiencies on bigger-than-one-page chunks. I'd also check with
>> other filesystems that leverage compression, to see what their unit
>> of compression is.
>
> Right.
> But I think it will be a clearer win to start with compressing each
> subpage separately, and it avoids splitting folios during reclaim to
> zswap. It also doesn't depend on the zsmalloc work.
>
> Once we have that, we can experiment with compressing folios in
> larger chunks. The tradeoffs become less clear at that point, and
> the number of variables you can tune goes up :)

Agreed, it's a good approach! And it doesn't have any decompression
amplification problem.

Thanks.
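P.S. To make choice 3 concrete, below is a rough, untested sketch of
what the per-subpage store loop could look like. zswap_store_page()
and zswap_invalidate_page() are hypothetical helpers standing in for
the existing order-0 store and invalidate logic; this only illustrates
the shape of the idea, not the actual mm/zswap.c implementation:

/*
 * Sketch only: compress each subpage of a large folio separately, so
 * that a later swapin of a single page only has to decompress that
 * page. zswap_store_page() and zswap_invalidate_page() are
 * hypothetical helpers, not current mm/zswap.c functions.
 */
static bool zswap_store_large_folio(struct folio *folio)
{
	long nr = folio_nr_pages(folio);
	long i;

	for (i = 0; i < nr; i++) {
		struct page *page = folio_page(folio, i);

		/*
		 * One zswap_entry per order-0 page, keyed by the
		 * page's own swap offset, so nothing has to be
		 * "split" at swapin time.
		 */
		if (!zswap_store_page(page, page_swap_entry(page)))
			goto unwind;
	}
	return true;

unwind:
	/* Drop the entries stored so far; the caller falls back. */
	while (i-- > 0)
		zswap_invalidate_page(folio_page(folio, i));
	return false;
}

The unwind path is hand-waved; on failure the folio would presumably
be split or written to the backing device, as happens today.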