Subject: Re: [PATCH 0/3] mm: zswap: trivial folio conversions
From: Chengming Zhou <chengming.zhou@linux.dev>
To: Yosry Ahmed <yosryahmed@google.com>, Nhat Pham <nphamcs@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>, Andrew Morton <akpm@linux-foundation.org>, Johannes Weiner <hannes@cmpxchg.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Hildenbrand <david@redhat.com>, Barry Song <21cnbao@gmail.com>, Chris Li <chrisl@kernel.org>, Ryan Roberts <ryan.roberts@arm.com>, Kairui Song <kasong@tencent.com>
Date: Mon, 3 Jun 2024 14:19:17 +0800
Message-ID: <9de0ce63-3815-4c1a-91a2-11cb3d526672@linux.dev>
References: <20240524033819.1953587-1-yosryahmed@google.com>

On 2024/5/29 03:32, Yosry Ahmed wrote:
> On Tue, May 28, 2024 at 12:08 PM Nhat Pham <nphamcs@gmail.com> wrote:
>>
>> On Fri, May 24, 2024 at 4:13 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>>>
>>> On Fri, May 24, 2024 at 12:53 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>>>>
>>>> On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox <willy@infradead.org> wrote:
>>>>>
>>>>> On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
>>>>>> Some trivial folio conversions in zswap code.
>>>>>
>>>>> The three patches themselves look good.
>>>>>
>>>>>> The main reason I included a cover letter is that I wanted to get
>>>>>> feedback on what other trivial conversions can/should be done in
>>>>>> mm/zswap.c (keeping in mind that only order-0 folios are supported
>>>>>> anyway). These are the things I came across while searching for
>>>>>> 'page' in mm/zswap.c, and chose not to do anything about for now:
>>>>>
>>>>> I think there's a deeper question to answer before answering these
>>>>> questions, which is what we intend to do with large folios and
>>>>> zswap in the future. Do we intend to split them?
>>>>> Compress them as a large folio? Compress each page in a large
>>>>> folio separately? I can see an argument for choices 2 and 3, but I
>>>>> think choice 1 is going to be increasingly untenable.
>>>>
>>>> Yeah, I was kinda getting the small things out of the way so that
>>>> zswap is fully folio-ized, before we think about large folios. I
>>>> haven't given it a lot of thought, but here's what I have in mind.
>>>>
>>>> Right now, I think most configs that enable zswap will disable
>>>> CONFIG_THP_SWAP (otherwise all THPs will go straight to disk), so
>>>> let's assume that today we are splitting large folios before they
>>>> go to zswap (i.e. choice 1).
>>>>
>>>> What we do next depends on how the core swap code intends to deal
>>>> with large folios. My understanding based on recent developments is
>>>> that we intend to swap out large folios as a whole, but I saw some
>>>> discussions about splitting all large folios before swapping them
>>>> out, or leaving them whole but swapping them out in order-0 chunks.
>>>>
>>>> I assume the rationale is that there is little benefit to keeping
>>>> the folios whole because they will most likely be freed soon
>>>> anyway, but I understand not wanting to spend time on splitting
>>>> them, so swapping them out in order-0 chunks makes some sense to
>>>> me. It also dodges the whole fragmentation issue.
>>>>
>>>> If we do either of these things in the core swap code, then I think
>>>> zswap doesn't need to do anything to support large folios. If not,
>>>> then we need to make a choice between choice 2 (compress large
>>>> folios as a whole) and choice 3 (compress each page separately) as
>>>> you mentioned.
>>>>
>>>> Compressing large folios as a whole means that we need to
>>>> decompress them as a whole to read a single page, which I think
>>>> could be very inefficient in some cases or force us to swap in
>>>> large folios. Unless, of course, we end up in a world where we
>>>> mostly swap in the same large folios that we swapped out. Although
>>>> there can be additional compression savings from compressing large
>>>> folios as a whole.
>>>>
>>>> Hence, I think choice 3 is the most reasonable one, at least for
>>>> the short term. I also think this is what zram does, but I haven't
>>>> checked. Even if we all agree on this, there are still questions
>>>> that we need to answer. For example, do we allocate zswap_entry's
>>>> for each order-0 chunk right away, or do we allocate a single
>>>> zswap_entry for the entire folio, and then "split" it during swapin
>>>> if we only need to read part of the folio?
>>>>
>>>> Wondering what others think here.
>>>
>>> More thoughts that came to mind here:
>>>
>>> - Whether we go with choice 2 or 3, we may face a latency issue.
>>> Zswap compression happens synchronously in the context of reclaim,
>>> so if we start handling large folios in zswap, it may be more
>>> efficient to do it asynchronously like swap to disk.
>>
>> We've been discussing this in private as well :)
>>
>> It doesn't have to be these two extremes, right? I'm perfectly happy
>> with starting with compressing each subpage separately, but perhaps
>> we can consider managing larger folios in bigger chunks (say 64KB).
>> That way, on swap-in, we just have to bring a whole chunk in, not
>> the entire folio, and still take advantage of compression
>> efficiencies on bigger-than-one-page chunks. I'd also check with
>> other filesystems that leverage compression, to see what their unit
>> of compression is.
>
> Right.
> But I think it will be a clearer win to start with compressing each
> subpage separately, and it avoids splitting folios during reclaim to
> zswap. It also doesn't depend on the zsmalloc work.
>
> Once we have that, we can experiment with compressing folios in
> larger chunks. The tradeoffs become less clear at that point, and
> the number of variables you can tune goes up :)

Agreed, it's a good approach! And it doesn't have any decompression
amplification problem.

Thanks.
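P.S. To make choice 3 concrete, below is a rough, untested sketch of
what the per-subpage store loop could look like. zswap_store_page()
and zswap_invalidate_page() are hypothetical helpers standing in for
the existing order-0 store and invalidate logic; this only illustrates
the shape of the idea, not the actual mm/zswap.c implementation:

/*
 * Sketch only: compress each subpage of a large folio separately, so
 * that a later swapin of a single page only has to decompress that
 * page. zswap_store_page() and zswap_invalidate_page() are
 * hypothetical helpers, not current mm/zswap.c functions.
 */
static bool zswap_store_large_folio(struct folio *folio)
{
	long nr = folio_nr_pages(folio);
	long i;

	for (i = 0; i < nr; i++) {
		struct page *page = folio_page(folio, i);

		/*
		 * One zswap_entry per order-0 page, keyed by the
		 * page's own swap offset, so nothing has to be
		 * "split" at swapin time.
		 */
		if (!zswap_store_page(page, page_swap_entry(page)))
			goto unwind;
	}
	return true;

unwind:
	/* Drop the entries stored so far; the caller falls back. */
	while (i-- > 0)
		zswap_invalidate_page(folio_page(folio, i));
	return false;
}

The unwind path is hand-waved; on failure the folio would presumably
be split or written to the backing device, as happens today.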