From: Yosry Ahmed <yosryahmed@google.com>
Date: Tue, 20 Jun 2023 12:48:23 -0700
Subject: Re: [PATCH] mm: zswap: multiple zpool support
To: Johannes Weiner
Cc: Sergey Senozhatsky, Minchan Kim, Konrad Rzeszutek Wilk, Andrew Morton,
 Seth Jennings, Dan Streetman, Vitaly Wool, Nhat Pham, Domenico Cerasuolo,
 Yu Zhao, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230531022911.1168524-1-yosryahmed@google.com>
 <20230601155825.GF102494@cmpxchg.org>
 <20230602164942.GA215355@cmpxchg.org>
 <20230602183410.GB215355@cmpxchg.org>
 <20230602202453.GA218605@cmpxchg.org>

On Wed, Jun 14, 2023 at 1:50 PM Yosry Ahmed wrote:
>
> On Wed, Jun 14, 2023 at 7:59 AM Johannes Weiner wrote:
> >
> > On Tue, Jun 13, 2023 at 01:13:59PM -0700, Yosry Ahmed wrote:
> > > On Mon, Jun 5, 2023 at 6:56 PM Yosry Ahmed wrote:
> > > >
> > > > On Fri, Jun 2, 2023 at 1:24 PM Johannes Weiner wrote:
> > > > > Sorry, I should have been more precise.
> > > > >
> > > > > I'm saying that using NR_CPUS pools, and replacing the hash with
> > > > > smp_processor_id(), would accomplish your goal of pool concurrency.
> > > > > But it would do so with a broadly-used, well-understood scaling
> > > > > factor. We might not need a config option at all.
> > > > >
> > > > > The lock would still be there, but contention would be reduced fairly
> > > > > optimally (barring preemption) for store concurrency at least. Not
> > > > > fully eliminated due to frees and compaction, though, yes.
> > > I thought about this again and had some internal discussions, and I am
> > > more unsure about it. Beyond the memory overhead, having too many
> > > zpools might result in higher fragmentation within the zpools. For
> > > zsmalloc, we do not compact across multiple zpools, for example.
> > >
> > > We have been using a specific number of zpools in our production for
> > > years now; we know it can be tuned to achieve performance gains. OTOH,
> > > percpu zpools (or NR_CPUS pools) seem like too big of a hammer,
> > > probably too many zpools in a lot of cases, and we wouldn't know how
> > > many zpools actually fit our workloads.
> >
> > Is it the same number across your entire fleet and all workloads?
>
> Yes.
>
> >
> > How large *is* the number in relation to CPUs?
>
> It differs based on the number of cpus on the machine. We use 32
> zpools on all machines.
>
> >
> > > I see value in allowing the number of zpools to be directly
> > > configurable (it can always be left as 1), and am worried that with
> > > percpu we would be throwing away years of production testing for an
> > > unknown.
> > >
> > > I am obviously biased, but I don't think this adds significant
> > > complexity to the zswap code as-is (or as v2 is, to be precise).
> >
> > I had typed out this long list of reasons why I hate this change, and
> > then deleted it to suggest the per-cpu scaling factor.
> >
> > But to summarize my POV, I think a user-facing config option for this
> > is completely inappropriate. There are no limits, no guidance, no sane
> > default. And it's very selective about micro-optimizing this one lock
> > when there are several locks and data structures of the same scope in
> > the swap path. This isn't a reasonable question to ask people building
> > kernels. It's writing code through the Kconfig file.
>
> It's not just the swap path, it's any contention that happens within the
> zpool between its different operations (map, alloc, compaction, etc.).
> My thought was that if a user observes high contention in any of the
> zpool operations, they can increase the number of zpools -- basically
> this should be empirically decided. If unsure, the user can just leave
> it as a single zpool.
>
> >
> > Data structure scalability should be solved in code, not with config
> > options.
>
> I agree, but until we have a more fundamental architectural solution,
> having multiple zpools to address scalability is a win. We can remove
> the config option later if needed.
>
> >
> > My vote on the patch as proposed is NAK.
>
> I hear the argument about the config option not being ideal here, but
> NR_CPUS is also not ideal.
>
> How about if we introduce it as a constant in the kernel? We have a
> lot of other constants around the kernel that do not scale with the
> machine size (e.g. SWAP_CLUSTER_MAX). We can start with 32, which is a
> value that we have tested in our data centers for many years now and
> know to work well. We can revisit later if needed.
>
> WDYT?

I sent v3 [1] with the proposed constant instead of a config option;
hopefully this is more acceptable.

[1] https://lore.kernel.org/lkml/20230620194644.3142384-1-yosryahmed@google.com/
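
For illustration only, here is a minimal user-space sketch of the scheme being
discussed: a small, fixed number of pools, each with its own lock, with entries
spread across the pools by a hash so that concurrent operations mostly contend
on different locks. The names (NR_POOLS, pool_for_entry(), store_entry()) and
the hash are made up for this sketch and are not taken from the zswap patch in
[1].

/*
 * Illustrative sketch only -- not the zswap code. A fixed, compile-time
 * number of pools (32 here, mirroring the constant proposed above), each
 * with its own lock. An entry is mapped to a pool by hashing its address,
 * so concurrent stores of different entries mostly take different locks.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NR_POOLS 32	/* assumed constant for the sketch */

struct pool {
	pthread_mutex_t lock;
	size_t stored;		/* toy stand-in for per-pool state */
};

static struct pool pools[NR_POOLS];

/* Pick a pool for an entry by hashing its address (multiplicative hash). */
static struct pool *pool_for_entry(const void *entry)
{
	uint64_t h = (uint64_t)(uintptr_t)entry * 0x9E3779B97F4A7C15ull;

	return &pools[(h >> 56) & (NR_POOLS - 1)];
}

static void store_entry(const void *entry)
{
	struct pool *p = pool_for_entry(entry);

	/* Only this pool's lock is taken; stores to other pools can proceed. */
	pthread_mutex_lock(&p->lock);
	p->stored++;		/* real code would compress and allocate here */
	pthread_mutex_unlock(&p->lock);
}

int main(void)
{
	static int dummy[1024];

	for (int i = 0; i < NR_POOLS; i++)
		pthread_mutex_init(&pools[i].lock, NULL);

	for (int i = 0; i < 1024; i++)
		store_entry(&dummy[i]);

	for (int i = 0; i < NR_POOLS; i++)
		printf("pool %2d: %zu entries\n", i, pools[i].stored);

	return 0;
}

Built with something like cc -pthread, the only point of the sketch is that the
pool index is derived per entry, so a single pool lock is effectively split 32
ways; compression, zsmalloc, and compaction are all elided.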
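
For contrast, the NR_CPUS suggestion from earlier in the thread would size the
pool array by CPU count and drop the hash entirely. Against the same toy
pools[] array, the selection function might look like the following, with
sched_getcpu() standing in for the kernel's smp_processor_id(); again, this is
purely illustrative and not from any posted patch.

/*
 * Per-CPU pool selection, in the spirit of the NR_CPUS suggestion above.
 * Requires _GNU_SOURCE and <sched.h> for sched_getcpu(); the pools[] array
 * would be sized by the CPU count rather than a fixed constant.
 */
static struct pool *pool_for_this_cpu(void)
{
	int cpu = sched_getcpu();	/* returns -1 on error */

	return &pools[cpu >= 0 ? cpu % NR_POOLS : 0];
}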