From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date: Fri, 21 Jul 2023 06:33:36 +0900
Subject: Re: [RFC PATCH v2 00/21] mm/zsmalloc: Split zsdesc from struct page
To: Yosry Ahmed
Cc: Sergey Senozhatsky, Minchan Kim, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Matthew Wilcox, Mike Rapoport
References: <20230713042037.980211-1-42.hyeyoo@gmail.com> <20230720071826.GE955071@google.com>

On Fri, Jul 21, 2023 at 3:31 AM Yosry Ahmed wrote:
>
> On Thu, Jul 20, 2023 at 4:34 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> >
> > On Thu, Jul 20, 2023 at 4:55 PM Yosry Ahmed wrote:
> > >
> > > On Thu, Jul 20, 2023 at 12:18 AM Sergey Senozhatsky wrote:
> > > >
> > > > On (23/07/13 13:20), Hyeonggon Yoo wrote:
> > > > > The purpose of this series is to define its own memory descriptor
> > > > > for zsmalloc, instead of re-using various fields of struct page.
> > > > > This is part of the effort to reduce the size of struct page to
> > > > > unsigned long and enable dynamic allocation of memory descriptors.
> > > > >
> > > > > While [1] outlines this ultimate objective, the current use of
> > > > > struct page is highly dependent on its definition, making it
> > > > > challenging to separately allocate memory descriptors.
> > > >
> > > > I glanced through the series and it all looks pretty straightforward
> > > > to me. I'll have a closer look. And we definitely need Minchan to
> > > > ACK it.
> > > >
> > > > > Therefore, this series introduces a new descriptor for zsmalloc,
> > > > > called zsdesc. It overlays struct page for now, but will
> > > > > eventually be allocated independently in the future.
> > > >
> > > > So I don't expect zsmalloc memory usage to increase. On one hand,
> > > > for each physical page that a zspage consists of we will allocate a
> > > > zsdesc (extra bytes), but at the same time struct page gets slimmer.
> > > > So we should break even, or am I wrong?
> > >
> > > Well, it depends. Here is my understanding (which may be completely
> > > wrong):
> > >
> > > The end goal would be to have an 8-byte memdesc for each order-0
> > > page, and then allocate a specialized struct per folio according to
> > > the use case. In this case, we would have a memdesc and a zsdesc for
> > > each order-0 page. If sizeof(zsdesc) is 64 bytes (on 64-bit), then
> > > it's a net loss. The savings only start kicking in with higher-order
> > > folios. As of now, zsmalloc only uses order-0 pages as far as I can
> > > tell, so the usage would increase if I understand correctly.
> >
> > I partially agree with you that the point of the memdesc work is
> > allocating a use-case-specific descriptor per folio, but I thought the
> > primary gain from memdesc was from anon and file pages (where
> > high-order pages are more usable), rather than from zsmalloc.
> >
> > And I believe enabling a memory descriptor per folio would be
> > impossible (or inefficient) if zsmalloc and other subsystems kept
> > using struct page in the current way (or please tell me I'm wrong?).
> >
> > So I expect the primary gain would be from high-order anon/file
> > folios, while this series is a prerequisite for them to work sanely.
>
> Right, I agree with that, sorry if I wasn't clear. I meant that
> generally speaking, we see gains from memdesc with higher-order
> folios, so for zsmalloc specifically we probably won't see any
> savings, and *might* see some extra usage (which I might be wrong
> about, see below).

Yeah, and even though I said "oh, we don't necessarily need to use
extra memory for zsdesc" below, a slight increase wouldn't hurt too
much from that perspective, because there will be savings from other
users of memdesc.
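To make the accounting concrete, here is a rough back-of-the-envelope
sketch (plain C, nothing to do with kernel code; the 8-byte memdesc and
56-byte zsdesc sizes are the assumptions from this thread, not measured
values):

#include <stdio.h>

/* Assumed sizes from this discussion (illustrative only). */
#define MEMDESC_SIZE      8UL  /* one memdesc per order-0 page */
#define ZSDESC_SIZE      56UL  /* one zsdesc per allocated folio */
#define STRUCT_PAGE_SIZE 64UL  /* current struct page on 64-bit */

int main(void)
{
	for (unsigned int order = 0; order <= 3; order++) {
		unsigned long npages = 1UL << order;
		unsigned long today = npages * STRUCT_PAGE_SIZE;
		unsigned long memdesc = npages * MEMDESC_SIZE + ZSDESC_SIZE;

		printf("order-%u: struct page: %lu B, memdesc + zsdesc: %lu B\n",
		       order, today, memdesc);
	}
	return 0;
}

With those numbers, order-0 is exactly break-even (8 + 56 = 64), while
an order-2 folio would drop descriptor overhead from 256 bytes to 88
bytes - which matches the point that the savings only kick in with
higher-order folios.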
> > > It seems to me though the sizeof(zsdesc) is actually 56 bytes (on
> > > 64-bit), so sizeof(zsdesc) + sizeof(memdesc) would be equal to the
> > > current size of struct page. If that's true, then there is no loss,
> >
> > Yeah, zsdesc would be 56 bytes on 64-bit CPUs, as the memcg_data field
> > is not used in zsmalloc. More fields in the current struct page might
> > not be needed in the future, although it's hard to say at the moment.
> > But it's not a loss.
>
> Is page->memcg_data something that we can drop? Aren't there code
> paths that will check page->memcg_data even for kernel pages (e.g.
> __folio_put() -> __folio_put_small() -> mem_cgroup_uncharge())?

zsmalloc pages are not accounted for via __GFP_ACCOUNT, and IIUC the
current implementation of zswap memcg charging does not use memcg_data
either, so I think it can be dropped.

I think we don't want to increase memdesc to 16 bytes by adding
memcg_data; it should be in the use-case-specific descriptors, for the
cases that can be charged to memcg.

> > > and there's potential gain if we start using higher-order folios in
> > > zsmalloc in the future.
> >
> > AFAICS zsmalloc should work even when system memory is fragmented,
> > so we may implement fallback allocation (as currently discussed in
> > the large anon folios thread).
>
> Of course, any usage of higher-order folios in zsmalloc must have
> fallback logic, although it might be simpler for zsmalloc than for
> anon folios. I agree that's off topic here.

> > It might work, but IMHO the purpose of this series is to enable
> > memdesc for large anon/file folios, rather than to see a large gain
> > in zsmalloc itself. (But even in zsmalloc, it's not a loss.)
> >
> > > (That is of course unless we want to maintain cache line alignment
> > > for the zsdescs, then we might end up using 64 bytes anyway.)
> >
> > We already don't require cache line alignment for struct page; the
> > current alignment requirement is due to SLUB's cmpxchg128 operation,
> > not cache line alignment.
>
> I thought we want struct page to be cache line aligned (to avoid
> having to fetch two cache lines for one struct page), but I can easily
> be wrong.

Right. I admit that even if struct page is not required to be cache
line aligned, it is 64 bytes in commonly used configurations, and
changing that could affect some workloads. But I think it would be
better not to align zsdesc to the cache line size until we observe
degradations due to misalignment; by the time zsmalloc is intensively
used, it shouldn't be a huge issue.

> > I might be wrong in some aspects, so please tell me if I am.
> > And thank you and Sergey for taking a look at this!
>
> Thanks to you for doing the work!

No problem! :)
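P.S. Since the 56-byte figure came up: below is a rough userspace
sketch of the overlay idea, in case it helps the discussion. The field
list is illustrative only (not the actual layout from the series), and
struct list_head is a stand-in; the point is just that the overlay
trick is safe only while the descriptor fits in today's struct page.

#include <assert.h>
#include <stdio.h>

/* Stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical zsmalloc descriptor; field names are illustrative. */
struct zsdesc {
	unsigned long flags;		/* page flags zsmalloc relies on */
	struct list_head lru;		/* linkage within a zspage */
	void *zspage;			/* owning zspage */
	unsigned long handle;		/* handle of the first object */
	unsigned int first_obj_offset;	/* offset of first object in page */
	unsigned int refcount;		/* stand-in for _refcount */
};

#define STRUCT_PAGE_SIZE 64	/* current struct page on 64-bit */

int main(void)
{
	/* The overlay only works while zsdesc fits in struct page. */
	static_assert(sizeof(struct zsdesc) <= STRUCT_PAGE_SIZE,
		      "zsdesc must not outgrow struct page");
	printf("sizeof(struct zsdesc) = %zu bytes\n",
	       sizeof(struct zsdesc));
	return 0;
}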