From: Barry Song <21cnbao@gmail.com>
Date: Sun, 2 Jun 2024 13:30:21 +1200
Subject: Re: [PATCH 0/3] mm: zswap: trivial folio conversions
To: Nhat Pham
Cc: Yosry Ahmed, Matthew Wilcox, Andrew Morton, Johannes Weiner, Chengming Zhou, linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Hildenbrand, Chris Li, Ryan Roberts, Kairui Song
References: <20240524033819.1953587-1-yosryahmed@google.com>

On Wed, May 29, 2024 at 7:08 AM Nhat Pham wrote:
>
> On Fri, May 24, 2024 at 4:13 PM Yosry Ahmed wrote:
> >
> > On Fri, May 24, 2024 at 12:53 PM Yosry Ahmed wrote:
> > >
> > > On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox wrote:
> > > >
> > > > On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
> > > > > Some trivial folio conversions in zswap code.
> > > >
> > > > The three patches themselves look good.
> > > >
> > > > > The main reason I included a cover letter is that I wanted to get
> > > > > feedback on what other trivial conversions can/should be done in
> > > > > mm/zswap.c (keeping in mind that only order-0 folios are supported
> > > > > anyway). These are the things I came across while searching for 'page'
> > > > > in mm/zswap.c, and chose not to do anything about for now:
> > > >
> > > > I think there's a deeper question to answer before answering these
> > > > questions, which is what we intend to do with large folios and zswap in
> > > > the future. Do we intend to split them? Compress them as a large
> > > > folio? Compress each page in a large folio separately? I can see an
> > > > argument for choices 2 and 3, but I think choice 1 is going to be
> > > > increasingly untenable.
> > >
> > > Yeah I was kinda getting the small things out of the way so that zswap
> > > is fully folio-ized, before we think about large folios. I haven't
> > > given it a lot of thought, but here's what I have in mind.
> > >
> > > Right now, I think most configs that enable zswap will disable
> > > CONFIG_THP_SWAP (otherwise all THPs will go straight to disk), so
> > > let's assume that today we are splitting large folios before they go
> > > to zswap (i.e. choice 1).
> > >
> > > What we do next depends on how the core swap intends to deal with
> > > large folios. My understanding based on recent developments is that we
> > > intend to swapout large folios as a whole, but I saw some discussions
> > > about splitting all large folios before swapping them out, or leaving
> > > them whole but swapping them out in order-0 chunks.
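To make the trade-off between the three options Matthew lists more concrete, here is a minimal C sketch. It assumes 4 KiB base pages, and the enum and function names are hypothetical labels for this discussion, not zswap code. It counts how many independently compressed objects one large folio produces under each option and how many bytes must be decompressed to read back a single subpage.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages. */
#define SUBPAGE_SIZE 4096UL

/* Hypothetical labels for the three options; not kernel identifiers. */
enum large_folio_policy {
	SPLIT_BEFORE_ZSWAP,   /* choice 1: split into order-0 folios first */
	COMPRESS_WHOLE_FOLIO, /* choice 2: one compressed object per folio */
	COMPRESS_EACH_PAGE,   /* choice 3: one compressed object per subpage */
};

/* How many independently compressed objects a folio of nr_pages yields. */
static size_t nr_compressed_objects(enum large_folio_policy policy,
				    size_t nr_pages)
{
	assert(nr_pages >= 1);
	return policy == COMPRESS_WHOLE_FOLIO ? 1 : nr_pages;
}

/* How many bytes must be decompressed to read back a single subpage. */
static size_t bytes_to_read_one_page(enum large_folio_policy policy,
				     size_t nr_pages)
{
	assert(nr_pages >= 1);
	return policy == COMPRESS_WHOLE_FOLIO ? nr_pages * SUBPAGE_SIZE
					      : SUBPAGE_SIZE;
}
```

By this crude measure choices 1 and 3 look identical; what separates them is the cost of splitting the folio itself, which the sketch does not model.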
> > >
> > > I assume the rationale is that there is little benefit to keeping the
> > > folios whole because they will most likely be freed soon anyway, but I
> > > understand not wanting to spend time on splitting them, so swapping
> > > them out in order-0 chunks makes some sense to me. It also dodges the
> > > whole fragmentation issue.
> > >
> > > If we do either of these things in the core swap code, then I think
> > > zswap doesn't need to do anything to support large folios. If not,
> > > then we need to make a choice between 2 (compress large folios) &
> > > choice 3 (compress each page separately) as you mentioned.
> > >
> > > Compressing large folios as a whole means that we need to decompress
> > > them as a whole to read a single page, which I think could be very
> > > inefficient in some cases or force us to swapin large folios. Unless
> > > of course we end up in a world where we mostly swapin the same large
> > > folios that we swapped out. Although there can be additional
> > > compression savings from compressing large folios as a whole.
> > >
> > > Hence, I think choice 3 is the most reasonable one, at least for the
> > > short-term. I also think this is what zram does, but I haven't
> > > checked. Even if we all agree on this, there are still questions that
> > > we need to answer. For example, do we allocate zswap_entry's for each
> > > order-0 chunk right away, or do we allocate a single zswap_entry for
> > > the entire folio, and then "split" it during swapin if we only need to
> > > read part of the folio?
> > >
> > > Wondering what others think here.
> >
> > More thoughts that came to mind here:
> >
> > - Whether we go with choice 2 or 3, we may face a latency issue. Zswap
> > compression happens synchronously in the context of reclaim, so if we
> > start handling large folios in zswap, it may be more efficient to do
> > it asynchronously like swap to disk.
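Yosry's question about entry granularity (one zswap_entry per order-0 chunk versus a single one per folio that is "split" on swapin) can be illustrated with a small C sketch. It assumes choice 3, i.e. each subpage compressed separately, and every struct and function name below is hypothetical; none of this is the real struct zswap_entry. The point it shows: when a folio-wide entry holds one compressed object per subpage, splitting it only re-homes handles into per-page entries, with no decompression or recompression.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one compressed subpage; not kernel types. */
struct demo_obj {
	unsigned long handle; /* pretend allocator handle */
	size_t length;        /* compressed length in bytes */
};

/* Option A: one entry per order-0 chunk, keyed by swap offset. */
struct demo_page_entry {
	unsigned long offset;
	struct demo_obj obj;
};

/* Option B: a single entry for the whole folio, holding one
 * independently compressed object per subpage (choice 3). */
struct demo_folio_entry {
	unsigned long first_offset; /* swap offset of subpage 0 */
	size_t nr_pages;
	struct demo_obj objs[];     /* one object per subpage */
};

/*
 * "Split" a folio-wide entry into per-page entries, e.g. when only part
 * of the folio is swapped in. Each subpage was compressed on its own, so
 * the compressed objects are simply copied over; nothing is decompressed
 * or recompressed. Returns a malloc'd array of nr_pages entries that the
 * caller must free, or NULL on allocation failure.
 */
static struct demo_page_entry *
demo_split_folio_entry(const struct demo_folio_entry *fe)
{
	assert(fe != NULL && fe->nr_pages >= 1);

	struct demo_page_entry *out = malloc(fe->nr_pages * sizeof(*out));
	if (!out)
		return NULL;
	for (size_t i = 0; i < fe->nr_pages; i++) {
		out[i].offset = fe->first_offset + i;
		out[i].obj = fe->objs[i];
	}
	return out;
}
```

Under choice 2 the same split would not be this cheap, because a single compressed object would cover the whole folio.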
>
> We've been discussing this in private as well :)
>
> It doesn't have to be these two extremes right? I'm perfectly happy
> with starting with compressing each subpage separately, but perhaps we
> can consider managing larger folios in bigger chunks (say 64KB). That
> way, on swap-in, we just have to bring a whole chunk in, not the
> entire folio, and still take advantage of compression efficiencies on
> bigger-than-one-page chunks. I'd also check with other filesystems
> that leverage compression, to see what their unit of compression is.
>
> I believe this is the approach Barry is suggesting for zram:
>
> https://lore.kernel.org/linux-block/20240327214816.31191-2-21cnbao@gmail.com/T/
>
> Once the zsmalloc infrastructure is there, we can play with this :)
>
> Barry - what's the progress regarding this front?

Thanks for reaching out.

Not too much. It depends on large folio swap-in, because we need to
swap in large folios if we compress them as a whole. For example, if we
swap out 64KiB but only swap in 4KiB, we still need to decompress the
entire 64KiB but copy only 4KiB.

Recently, we've only addressed the large folio swap-in refault cases in
the mm-unstable branch[1].

[1] https://lore.kernel.org/linux-mm/20240529082824.150954-1-21cnbao@gmail.com/

Currently, swap-in is not allocating large folios in any mm branch.

The main debate is that my original patch[2] started with the SYNC_IO
case for zRAM and embedded devices, while Ying argues we should start
with non-SYNC_IO and decide swap-in sizes based on the read-ahead window
rather than on the sizes at which folios were originally swapped out.

[2] https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@gmail.com/

So I guess we need more work to get large folio swap-in ready, and it
won't happen soon.

>
> >
> > - Supporting compression of large folios depends on zsmalloc (and
> > maybe other allocators) supporting it. There have been patches from
> > Barry to add this support to some extent, but I didn't take a close
> > look at those.
> >
> > Adding other folks from the mTHP swap discussions here in case they
> > have other thoughts about zswap.

Thanks
Barry
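Barry's 64KiB example and Nhat's chunking idea come down to simple arithmetic. The C sketch below assumes 4 KiB base pages and a folio stored as independently compressed chunks of `chunk_pages` subpages, where `chunk_pages == 1` is per-page compression and `chunk_pages == folio_pages` is whole-folio compression; the names are hypothetical and nothing here is zswap or zsmalloc code. It computes which chunk must be decompressed to serve a fault on one subpage when the rest of the folio is not swapped in, and how much of the decompressed data goes unused.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages. */
#define BASE_PAGE_SIZE 4096UL

/* Result of serving a fault on one subpage; hypothetical model only. */
struct chunk_read {
	size_t first_page;         /* first subpage of the chunk decompressed */
	size_t bytes_decompressed; /* work needed to serve the fault */
	size_t bytes_discarded;    /* decompressed but not copied out */
};

/*
 * A folio of folio_pages subpages is stored as independently compressed
 * chunks of chunk_pages subpages. Serve a fault on subpage fault_idx
 * without swapping in the rest of the folio: only the chunk containing
 * fault_idx is decompressed, and all but one page of it is discarded.
 */
static struct chunk_read read_one_page(size_t folio_pages, size_t chunk_pages,
				       size_t fault_idx)
{
	assert(chunk_pages >= 1 && folio_pages % chunk_pages == 0);
	assert(fault_idx < folio_pages);

	struct chunk_read r;
	r.first_page = fault_idx - fault_idx % chunk_pages;
	r.bytes_decompressed = chunk_pages * BASE_PAGE_SIZE;
	r.bytes_discarded = (chunk_pages - 1) * BASE_PAGE_SIZE;
	return r;
}
```

For a 16-page (64KiB) folio compressed as a whole, a fault on any subpage decompresses 64KiB and discards 60KiB, matching Barry's example. With 64KiB chunks in a 2MiB (512-page) folio, a fault on subpage 37 decompresses only the chunk starting at subpage 32.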