From: Nhat Pham
Date: Wed, 6 Dec 2023 12:08:17 -0800
Subject: Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree
To: Chengming Zhou
Cc: Vitaly Wool, Johannes Weiner, Michal Hocko, Seth Jennings,
 Dan Streetman, Andrew Morton, Yosry Ahmed, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20231206-zswap-lock-optimize-v1-0-e25b059f9c3a@bytedance.com>
In-Reply-To: <20231206-zswap-lock-optimize-v1-0-e25b059f9c3a@bytedance.com>

On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou wrote:
> When testing the zswap performance by using kernel build -j32 in a tmpfs
> directory, I found the scalability of zswap rb-tree is not good, which
> is protected by the only spinlock. That would cause heavy lock contention
> if multiple tasks zswap_store/load concurrently.
>
> So a simple solution is to split the only one zswap rb-tree into multiple
> rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
> from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
>
> Although this method can't solve the spinlock contention completely, it
> can mitigate much of that contention.

By how much? Do you have any stats to estimate the amount of
contention and the reduction by this patch?

I do think lock contention could be a problem here, and it will be
even worse with the zswap shrinker enabled (which introduces a
theoretically unbounded number of concurrent reclaimers hammering
on the zswap rbtree and its lock). I am generally a bit wary about
architectural changes though, especially if they are just a band-aid.
We have tried to reduce the lock contention somewhere else (multiple
zpools), and as predicted it just shifted the contention point
elsewhere. Maybe we need a deeper architectural re-think.

Not an outright NACK of course - just food for thought.

>
> Another problem when testing the zswap using our default zsmalloc is that
> zswap_load() and zswap_writeback_entry() have to malloc a temporary memory
> to support !zpool_can_sleep_mapped().
>
> Optimize it by reusing the percpu crypto_acomp_ctx->dstmem, which is also
> used by zswap_store() and protected by the same percpu crypto_acomp_ctx->mutex.

It'd be nice to reduce the (temporary) memory allocation on these
paths, but would this introduce contention on the per-cpu dstmem and
the mutex that protects it, if there are too many concurrent
store/load/writeback requests?
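To make the rb-tree question above concrete, here is a minimal userspace
sketch of the index arithmetic as I read the cover letter: one tree (and
one lock) per SWAP_ADDRESS_SPACE_PAGES swap slots. The sketch_* names and
SKETCH_PAGE_SIZE are made up for illustration and are not taken from the
patch; the 4 KiB page size is an assumption, and the shift value mirrors
the kernel's SWAP_ADDRESS_SPACE_SHIFT.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the kernel's swap cache split: 2^14 swap slots per tree. */
#define SWAP_ADDRESS_SPACE_SHIFT 14
#define SWAP_ADDRESS_SPACE_PAGES (1UL << SWAP_ADDRESS_SPACE_SHIFT)

/* Assumption for this sketch only: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* With 4 KiB pages, one tree covers exactly 64 MiB of swap. */
static_assert(SWAP_ADDRESS_SPACE_PAGES * SKETCH_PAGE_SIZE == (64UL << 20),
              "one tree should span 64 MiB");

/* Hypothetical helper: bytes of swap covered by a single tree. */
static unsigned long sketch_tree_span_bytes(void)
{
    return SWAP_ADDRESS_SPACE_PAGES * SKETCH_PAGE_SIZE;
}

/* Hypothetical helper: trees needed for a device with max_pages slots. */
static size_t sketch_nr_trees(unsigned long max_pages)
{
    /* Round up so a partial trailing range still gets its own tree. */
    return (max_pages + SWAP_ADDRESS_SPACE_PAGES - 1) >> SWAP_ADDRESS_SPACE_SHIFT;
}

/* Hypothetical helper: which tree (and thus which lock) an offset uses. */
static size_t sketch_tree_index(unsigned long offset)
{
    return offset >> SWAP_ADDRESS_SPACE_SHIFT;
}
```

Two offsets contend only when sketch_tree_index() returns the same value
for both, so how much this helps depends on how the swap offsets in a
given workload spread across 64 MiB ranges. That is why numbers would be
useful here.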