From: Yosry Ahmed
Date: Tue, 13 Jun 2023 13:13:59 -0700
Subject: Re: [PATCH] mm: zswap: multiple zpool support
To: Johannes Weiner
Cc: Sergey Senozhatsky, Minchan Kim, Konrad Rzeszutek Wilk, Andrew Morton,
 Seth Jennings, Dan Streetman, Vitaly Wool, Nhat Pham, Domenico Cerasuolo,
 Yu Zhao, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, Jun 5, 2023 at 6:56 PM Yosry Ahmed wrote:
>
> On Fri, Jun 2, 2023 at 1:24 PM Johannes Weiner wrote:
> >
> > On Fri, Jun 02, 2023 at 12:14:28PM -0700, Yosry Ahmed wrote:
> > > On Fri, Jun 2, 2023 at 11:34 AM Johannes Weiner wrote:
> > > >
> > > > On Fri, Jun 02, 2023 at 09:59:20AM -0700, Yosry Ahmed wrote:
> > > > > On Fri, Jun 2, 2023 at 9:49 AM Johannes Weiner wrote:
> > > > > > Again, what about the zswap_tree.lock and swap_info_struct.lock?
> > > > > > They're the same scope unless you use multiple swap files. Would it
> > > > > > make sense to tie pools to trees, so that using multiple swapfiles
> > > > > > for concurrency purposes also implies this optimization?
> > > > >
> > > > > Yeah, using multiple swapfiles helps with those locks, but it doesn't
> > > > > help with the zpool lock.
> > > > >
> > > > > I am reluctant to take this path because I am trying to get rid of
> > > > > zswap's dependency on swapfiles to begin with, and have it act as its
> > > > > own standalone swapping backend. If I am successful, then having one
> > > > > zpool per zswap_tree is just a temporary fix.
> > > >
> > > > What about making the pools per-cpu?
> > > >
> > > > This would scale nicely with the machine size. And we commonly deal
> > > > with for_each_cpu() loops and per-cpu data structures, so we have good
> > > > developer intuition about what's reasonable to squeeze into those.
> > > >
> > > > It would eliminate the lock contention, for everybody, right away, and
> > > > without asking questions.
> > > >
> > > > It would open the door to all kinds of locking optimizations on top.
> > >
> > > The page can get swapped out on one cpu and swapped in on another, no?
> > >
> > > We will need to store which zpool the page is stored in in its zswap
> > > entry, and potentially grab percpu locks from other cpus in the swap-in
> > > path. The lock contention would probably be less, but certainly not
> > > eliminated.
> > >
> > > Did I misunderstand?
> >
> > Sorry, I should have been more precise.
> >
> > I'm saying that using NR_CPUS pools, and replacing the hash with
> > smp_processor_id(), would accomplish your goal of pool concurrency.
> > But it would do so with a broadly-used, well-understood scaling
> > factor. We might not need a config option at all.
> >
> > The lock would still be there, but contention would be reduced fairly
> > optimally (barring preemption) for store concurrency at least. Not
> > fully eliminated due to frees and compaction, though, yes.

I thought about this again and had some internal discussions, and I am
more unsure about it. Beyond the memory overhead, having too many
zpools might result in higher fragmentation within the zpools. For
zsmalloc, for example, we do not compact across multiple zpools.

We have been using a specific number of zpools in our production for
years now, and we know it can be tuned to achieve performance gains.

OTOH, per-cpu zpools (or NR_CPUS pools) seem like too big of a hammer,
probably too many zpools in a lot of cases, and we wouldn't know how
many zpools actually fit our workloads.

I see value in allowing the number of zpools to be directly
configurable (it can always be left as 1), and I am worried that with
per-cpu pools we would be throwing away years of production testing for
an unknown.

I am obviously biased, but I don't think this adds significant
complexity to the zswap code as-is (or as v2 is, to be precise).

WDYT?
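For concreteness, the shape I have in mind is roughly the below. This is an
untested sketch, not the actual v2 patch: ZSWAP_NR_ZPOOLS and
zswap_find_zpool() are placeholder names, the value 32 and the hash choice
are illustrative, and it assumes struct zswap_entry keeps its zswap_pool
pointer. The per-cpu flavor you suggested would essentially swap the hash
for smp_processor_id():

#include <linux/hash.h>		/* hash_ptr() */
#include <linux/log2.h>		/* ilog2() */

/* Placeholder: would come from a Kconfig option or module param, default 1. */
#define ZSWAP_NR_ZPOOLS	32

struct zswap_pool {
	struct zpool *zpools[ZSWAP_NR_ZPOOLS];	/* was: struct zpool *zpool */
	/* ... rest of struct zswap_pool unchanged ... */
};

/*
 * Pick the zpool backing an entry. Hashing the entry pointer spreads
 * stores across the zpools; a per-cpu scheme would use smp_processor_id()
 * here instead of the hash.
 */
static struct zpool *zswap_find_zpool(struct zswap_entry *entry)
{
	int i = 0;

	if (ZSWAP_NR_ZPOOLS > 1)
		i = hash_ptr(entry, ilog2(ZSWAP_NR_ZPOOLS));

	return entry->pool->zpools[i];
}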
>
> Yeah I think we can do that. I looked at the size of the zsmalloc pool
> as an example, and it seems to be less than 1K, so having one per cpu
> seems okay.
>
> There are a few things that we will need to do:
>
> - Rework zswap_update_total_size(). We don't want to loop through all
> cpus on each load/store. We can be smarter about it and inc/dec the
> total zswap pool size each time we allocate or free a page in the
> driver. This might need some plumbing from the drivers to zswap (or
> passing a callback from zswap to the drivers).
>
> - Update zsmalloc such that all pools share kmem caches, instead of
> creating two kmem caches per zsmalloc pool. This was a follow-up I
> had in mind for multiple zpools support anyway, but I guess it's more
> significant if we have NR_CPUS pools.
>
> I was nervous about increasing the size of struct zswap_entry to store
> the cpu/zpool where the entry resides, but I realized we can replace
> the pointer to zswap_pool in struct zswap_entry with a pointer to
> zpool, and add a zswap_pool pointer in struct zpool. This will
> actually trim down the common "entry->pool->zpool" to just
> "entry->zpool", and then we can replace any "entry->pool" with
> "entry->zpool->pool".
>
> @Yu Zhao, any thoughts on this? The multiple zpools support was
> initially your idea (and you did the initial implementation) -- so your
> input is very valuable here.
>
> > I'm not proposing more than that at this point. I only wrote the last
> > line because already having per-cpu data structures might help with
> > fast path optimizations down the line, if contention is still an
> > issue. But unlikely. So it's not so important. Let's forget it.
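To make the struct zswap_entry rework mentioned above concrete, it would
look roughly like this. Untested sketch only, with layouts abbreviated; the
back-pointer in struct zpool and its field name are assumptions of the
sketch, not existing code:

struct zswap_pool;		/* zswap's pool object, abbreviated */

struct zpool {
	/* ... existing zpool fields ... */
	struct zswap_pool *pool;	/* proposed back-pointer to the owning zswap_pool */
};

struct zswap_entry {
	/*
	 * was: struct zswap_pool *pool; -- the zpool had to be reached via
	 * entry->pool->zpool (or entry->pool->zpools[i] with multiple zpools).
	 */
	struct zpool *zpool;		/* now points straight at the zpool holding the entry */
	unsigned long handle;
	/* ... other fields (rb node, offset, length, etc.) unchanged ... */
};

/*
 * Accesses then become:
 *   entry->pool->zpool  ->  entry->zpool
 *   entry->pool         ->  entry->zpool->pool
 * without growing struct zswap_entry.
 */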