From: Yosry Ahmed
Date: Tue, 28 May 2024 12:32:59 -0700
Subject: Re: [PATCH 0/3] mm: zswap: trivial folio conversions
To: Nhat Pham
Cc: Matthew Wilcox, Andrew Morton, Johannes Weiner, Chengming Zhou, linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Hildenbrand, Barry Song <21cnbao@gmail.com>, Chris Li, Ryan Roberts, Kairui Song
References: <20240524033819.1953587-1-yosryahmed@google.com>

On Tue, May 28, 2024 at 12:08 PM Nhat Pham wrote:
>
> On Fri, May 24, 2024 at 4:13 PM Yosry Ahmed wrote:
> >
> > On Fri, May 24, 2024 at 12:53 PM Yosry Ahmed wrote:
> > >
> > > On Thu, May 23, 2024 at 8:59 PM Matthew Wilcox wrote:
> > > >
> > > > On Fri, May 24, 2024 at 03:38:15AM +0000, Yosry Ahmed wrote:
> > > > > Some trivial folio conversions in zswap code.
> > > >
> > > > The three patches themselves look good.
> > > >
> > > > > The main reason I included a cover letter is that I wanted to get
> > > > > feedback on what other trivial conversions can/should be done in
> > > > > mm/zswap.c (keeping in mind that only order-0 folios are supported
> > > > > anyway). These are the things I came across while searching for 'page'
> > > > > in mm/zswap.c, and chose not to do anything about for now:
> > > >
> > > > I think there's a deeper question to answer before answering these
> > > > questions, which is what we intend to do with large folios and zswap in
> > > > the future. Do we intend to split them? Compress them as a large
> > > > folio? Compress each page in a large folio separately? I can see an
> > > > argument for choices 2 and 3, but I think choice 1 is going to be
> > > > increasingly untenable.
> > >
> > > Yeah I was kinda getting the small things out of the way so that zswap
> > > is fully folio-ized, before we think about large folios. I haven't
> > > given it a lot of thought, but here's what I have in mind.
> > >
> > > Right now, I think most configs that enable zswap will disable
> > > CONFIG_THP_SWAP (otherwise all THPs will go straight to disk), so
> > > let's assume that today we are splitting large folios before they go
> > > to zswap (i.e. choice 1).
> > >
> > > What we do next depends on how the core swap intends to deal with
> > > large folios. My understanding based on recent developments is that we
> > > intend to swapout large folios as a whole, but I saw some discussions
> > > about splitting all large folios before swapping them out, or leaving
> > > them whole but swapping them out in order-0 chunks.
> > >
> > > I assume the rationale is that there is little benefit to keeping the
> > > folios whole because they will most likely be freed soon anyway, but I
> > > understand not wanting to spend time on splitting them, so swapping
> > > them out in order-0 chunks makes some sense to me. It also dodges the
> > > whole fragmentation issue.
> > >
> > > If we do either of these things in the core swap code, then I think
> > > zswap doesn't need to do anything to support large folios. If not,
> > > then we need to make a choice between choice 2 (compress large folios)
> > > and choice 3 (compress each page separately) as you mentioned.
> > >
> > > Compressing large folios as a whole means that we need to decompress
> > > them as a whole to read a single page, which I think could be very
> > > inefficient in some cases or force us to swapin large folios. Unless
> > > of course we end up in a world where we mostly swapin the same large
> > > folios that we swapped out. Although there can be additional
> > > compression savings from compressing large folios as a whole.
> > >
> > > Hence, I think choice 3 is the most reasonable one, at least for the
> > > short-term. I also think this is what zram does, but I haven't
> > > checked. Even if we all agree on this, there are still questions that
> > > we need to answer. For example, do we allocate zswap_entry's for each
> > > order-0 chunk right away, or do we allocate a single zswap_entry for
> > > the entire folio, and then "split" it during swapin if we only need to
> > > read part of the folio?
> > >
> > > Wondering what others think here.
> >
> > More thoughts that came to mind here:
> >
> > - Whether we go with choice 2 or 3, we may face a latency issue. Zswap
> > compression happens synchronously in the context of reclaim, so if we
> > start handling large folios in zswap, it may be more efficient to do
> > it asynchronously like swap to disk.
>
> We've been discussing this in private as well :)
>
> It doesn't have to be these two extremes right? I'm perfectly happy
> with starting with compressing each subpage separately, but perhaps we
> can consider managing larger folios in bigger chunks (say 64KB).
> That way, on swap-in, we just have to bring a whole chunk in, not the
> entire folio, and still take advantage of compression efficiencies on
> bigger-than-one-page chunks. I'd also check with other filesystems
> that leverage compression, to see what their unit of compression is.

Right. But I think it will be a clearer win to start with compressing
each subpage separately, and it avoids splitting folios during reclaim
to zswap. It also doesn't depend on the zsmalloc work.

Once we have that, we can experiment with compressing folios in larger
chunks. The tradeoffs become less clear at that point, and the number
of variables you can tune goes up :)
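
To make the tradeoff a bit more concrete, here is a rough user-space
sketch. It is not kernel code and not how zswap works today. It uses
Python's zlib as a stand-in compressor, a synthetic 2 MiB buffer as a
stand-in for a large folio, and hypothetical helper names
(compress_in_chunks, read_page, make_folio). It compresses the buffer in
4 KiB, 64 KiB and whole-folio units, and reports the total compressed
size next to how many bytes have to be decompressed to read one page.

```python
import zlib

PAGE_SIZE = 4096  # assumed order-0 page size for this illustration


def compress_in_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Compress `data` as independent streams of `chunk_size` bytes each."""
    return [
        zlib.compress(data[off:off + chunk_size])
        for off in range(0, len(data), chunk_size)
    ]


def read_page(chunks: list[bytes], chunk_size: int, page_index: int) -> tuple[bytes, int]:
    """Return one page and the number of bytes decompressed to obtain it."""
    offset = page_index * PAGE_SIZE
    # Only the chunk containing the page needs to be decompressed.
    chunk = zlib.decompress(chunks[offset // chunk_size])
    start = offset % chunk_size
    return chunk[start:start + PAGE_SIZE], len(chunk)


def make_folio(num_pages: int) -> bytes:
    """Build synthetic data whose pages resemble each other (an assumption)."""
    body = b"".join(b"key%04d=value%04d\n" % (j, j * 7) for j in range(200))
    pages = []
    for i in range(num_pages):
        page = b"page %d\n" % i + body
        pages.append(page.ljust(PAGE_SIZE, b"\0")[:PAGE_SIZE])
    return b"".join(pages)


if __name__ == "__main__":
    folio = make_folio(512)  # 512 * 4 KiB = 2 MiB
    for chunk_size in (PAGE_SIZE, 64 * 1024, len(folio)):
        chunks = compress_in_chunks(folio, chunk_size)
        page, decompressed = read_page(chunks, chunk_size, page_index=300)
        assert page == folio[300 * PAGE_SIZE:301 * PAGE_SIZE]
        total = sum(len(c) for c in chunks)
        print(f"unit={chunk_size:>8}  compressed={total:>8}  "
              f"decompressed_for_one_page={decompressed}")
```

The numbers depend entirely on the data and the compressor, so treat
the output as illustrative only. The shape is what matters: larger
units can exploit redundancy across pages, but every single-page read
pays for decompressing the whole unit.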