From: Yosry Ahmed <yosryahmed@google.com>
Date: Wed, 14 Jun 2023 13:50:09 -0700
Subject: Re: [PATCH] mm: zswap: multiple zpool support
To: Johannes Weiner
Cc: Sergey Senozhatsky, Minchan Kim, Konrad Rzeszutek Wilk,
 Andrew Morton, Seth Jennings, Dan Streetman, Vitaly Wool, Nhat Pham,
 Domenico Cerasuolo, Yu Zhao, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20230531022911.1168524-1-yosryahmed@google.com>
 <20230601155825.GF102494@cmpxchg.org>
 <20230602164942.GA215355@cmpxchg.org>
 <20230602183410.GB215355@cmpxchg.org>
 <20230602202453.GA218605@cmpxchg.org>

On Wed, Jun 14, 2023 at 7:59 AM Johannes Weiner wrote:
>
> On Tue, Jun 13, 2023 at 01:13:59PM -0700, Yosry Ahmed wrote:
> > On Mon, Jun 5, 2023 at 6:56 PM Yosry Ahmed wrote:
> > >
> > > On Fri, Jun 2, 2023 at 1:24 PM Johannes Weiner wrote:
> > > > Sorry, I should have been more precise.
> > > >
> > > > I'm saying that using NR_CPUS pools, and replacing the hash with
> > > > smp_processor_id(), would accomplish your goal of pool concurrency.
> > > > But it would do so with a broadly-used, well-understood scaling
> > > > factor. We might not need a config option at all.
> > > >
> > > > The lock would still be there, but contention would be reduced fairly
> > > > optimally (barring preemption) for store concurrency at least. Not
> > > > fully eliminated due to frees and compaction, though, yes.
> >
> > I thought about this again and had some internal discussions, and I am
> > more unsure about it. Beyond the memory overhead, having too many
> > zpools might result in higher fragmentation within the zpools. For
> > zsmalloc, we do not compact across multiple zpools, for example.
> >
> > We have been using a specific number of zpools in our production for
> > years now; we know it can be tuned to achieve performance gains. OTOH,
> > percpu zpools (or NR_CPUS pools) seem like too big of a hammer,
> > probably too many zpools in a lot of cases, and we wouldn't know how
> > many zpools actually fit our workloads.
>
> Is it the same number across your entire fleet and all workloads?

Yes.

> How large *is* the number in relation to CPUs?

Relative to the number of CPUs, it differs from machine to machine; we
use 32 zpools on all machines.

> > I see value in allowing the number of zpools to be directly
> > configurable (it can always be left as 1), and am worried that with
> > percpu we would be throwing away years of production testing for an
> > unknown.
> >
> > I am obviously biased, but I don't think this adds significant
> > complexity to the zswap code as-is (or as v2 is, to be precise).
>
> I had typed out this long list of reasons why I hate this change, and
> then deleted it to suggest the per-cpu scaling factor.
>
> But to summarize my POV, I think a user-facing config option for this
> is completely inappropriate. There are no limits, no guidance, no sane
> default. And it's very selective about micro-optimizing this one lock
> when there are several locks and data structures of the same scope in
> the swap path. This isn't a reasonable question to ask people building
> kernels. It's writing code through the Kconfig file.

It's not just the swap path; it's any contention that happens within
the zpool between its different operations (map, alloc, compaction,
etc). My thought was that if a user observes high contention in any of
the zpool operations, they can increase the number of zpools --
basically this should be decided empirically. If unsure, the user can
just leave it as a single zpool.

> Data structure scalability should be solved in code, not with config
> options.

I agree, but until we have a more fundamental architectural solution,
having multiple zpools to address scalability is a win. We can remove
the config option later if needed.

> My vote on the patch as proposed is NAK.

I hear the argument about the config option not being ideal here, but
NR_CPUS is also not ideal.
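
For reference, a rough sketch of the NR_CPUS variant as I understand it
(purely illustrative; the array and helper names are made up for the
example, not from any posted patch, and it assumes <linux/smp.h> and
<linux/zpool.h>):

/* One pool per possible CPU; stores pick the local pool instead of hashing. */
static struct zpool *zswap_pools[NR_CPUS];

static struct zpool *zswap_current_zpool(void)
{
	/*
	 * raw_smp_processor_id(): preemption can migrate us right after
	 * this returns, which is harmless here -- any pool is valid, the
	 * CPU id is only used to spread contention.
	 */
	return zswap_pools[raw_smp_processor_id()];
}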
How about if we introduce it as a constant in the kernel? We have a lot
of other constants around the kernel that do not scale with the machine
size (e.g. SWAP_CLUSTER_MAX). We can start with 32, which is a value we
have tested in our data centers for years and know to work well. We can
revisit later if needed.

WDYT?
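
For concreteness, here is a rough sketch of the constant variant I have
in mind (illustrative only; the constant and helper names are
placeholders, not necessarily what a respin would use, and it assumes
this lives in mm/zswap.c next to struct zswap_entry, with <linux/hash.h>
and <linux/log2.h> available):

#define ZSWAP_NR_ZPOOLS	32	/* fixed at compile time, like SWAP_CLUSTER_MAX */

static struct zpool *zswap_pools[ZSWAP_NR_ZPOOLS];

static struct zpool *zswap_find_zpool(struct zswap_entry *entry)
{
	/* Spread entries across the pools with a cheap pointer hash. */
	return zswap_pools[hash_ptr(entry, ilog2(ZSWAP_NR_ZPOOLS))];
}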