From: Yosry Ahmed
Date: Thu, 18 Jan 2024 10:37:17 -0800
Subject: Re: [PATCH 0/2] mm/zswap: optimize the scalability of zswap rb-tree
To: Johannes Weiner
Cc: Chengming Zhou, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Chris Li, Nhat Pham
In-Reply-To: <20240118180655.GM939255@cmpxchg.org>
References: <20240117-b4-zswap-lock-optimize-v1-0-23f6effe5775@bytedance.com> <20240118153425.GI939255@cmpxchg.org> <20240118180655.GM939255@cmpxchg.org>

On Thu, Jan 18, 2024 at 10:07 AM Johannes Weiner wrote:
>
> On Thu, Jan 18, 2024 at 09:30:12AM -0800, Yosry Ahmed wrote:
> > On Thu, Jan 18, 2024 at 7:34 AM Johannes Weiner wrote:
> > >
> > > On Wed, Jan 17, 2024 at 10:37:22AM -0800, Yosry Ahmed wrote:
> > > > On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou wrote:
> > > > >
> > > > > When testing the zswap performance by using kernel build -j32 in a tmpfs
> > > > > directory, I found the scalability of zswap rb-tree is not good, which
> > > > > is protected by the only spinlock. That would cause heavy lock contention
> > > > > if multiple tasks zswap_store/load concurrently.
> > > > >
> > > > > So a simple solution is to split the only one zswap rb-tree into multiple
> > > > > rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
> > > > > from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
> > > > >
> > > > > Although this method can't solve the spinlock contention completely, it
> > > > > can mitigate much of that contention. Below is the results of kernel build
> > > > > in tmpfs with zswap shrinker enabled:
> > > > >
> > > > >         linux-next      zswap-lock-optimize
> > > > > real    1m9.181s        1m3.820s
> > > > > user    17m44.036s      17m40.100s
> > > > > sys     7m37.297s       4m54.622s
> > > > >
> > > > > So there are clearly improvements. And it's complementary with the ongoing
> > > > > zswap xarray conversion by Chris. Anyway, I think we can also merge this
> > > > > first, it's complementary IMHO. So I just refresh and resend this for
> > > > > further discussion.
> > > >
> > > > The reason why I think we should wait for the xarray patch(es) is
> > > > there is a chance we may see less improvements from splitting the tree
> > > > if it was an xarray. If we merge this series first, there is no way to
> > > > know.
> > >
> > > I mentioned this before, but I disagree quite strongly with this
> > > general sentiment.
> > >
> > > Chengming's patches are simple, mature, and have convincing
> > > numbers. IMO it's poor form to hold something like that for "let's see
> > > how our other experiment works out". The only exception would be if we
> > > all agree that the earlier change flies in the face of the overall
> > > direction we want to pursue, which I don't think is the case here.
> >
> > My intention was not to delay merging these patches until the xarray
> > patches are merged in. It was only to wait until the xarray patches
> > are *posted*, so that we can redo the testing on top of them and
> > verify that the gains are still there. That should have been around
> > now, but the xarray patches were posted in a form that does not allow
> > this testing (because we still have a lock on the read path), so I am
> > less inclined.
> >
> > My rationale was that if the gains from splitting the tree become
> > minimal after we switch to an xarray, we won't know. It's more
> > difficult to remove optimizations than to add them, because we may
> > cause a regression. I am kind of paranoid about having code sitting
> > around that we don't have full information about how much it's needed.
>
> Yeah I understand that fear.
>
> I expect the splitting to help more than the move to xarray because
> it's the writes that are hot. Luckily in this case it should be fairly
> easy to differential-test after it's been merged by changing that tree
> lookup macro/function locally to always return &trees[type][0], right?

Yeah that's exactly what I had in mind. Once we have a version of the
xarray patch without the locking on the read side we can test with
that.

Chengming, does this sound reasonable to you?
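
For illustration, the differential test discussed above could look roughly
like the userspace sketch below. It is not the code from the series: the
names zswap_tree_sketch, tree_lookup(), force_single_tree, NR_TYPES_SKETCH
and TREES_PER_TYPE are hypothetical, and only SWAP_ADDRESS_SPACE_SHIFT/PAGES
mirror the kernel's swap-cache constants (2^14 pages of 4 KiB, i.e. 64 MiB
per tree). Each tree would carry its own rb-root and spinlock; the toggle
funnels every offset back into tree 0 to emulate the unsplit layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the kernel's swap-cache split: 2^14 pages per range. */
#define SWAP_ADDRESS_SPACE_SHIFT 14
#define SWAP_ADDRESS_SPACE_PAGES (1UL << SWAP_ADDRESS_SPACE_SHIFT)

/* Hypothetical sizes, chosen only to keep the sketch small. */
#define NR_TYPES_SKETCH 4   /* number of swap devices ("types") */
#define TREES_PER_TYPE  8   /* covers 8 * 64 MiB of swap per device */

/* Stand-in for one per-range tree (rb-root plus its own lock in zswap). */
struct zswap_tree_sketch {
	unsigned long nr_entries;
};

static struct zswap_tree_sketch trees[NR_TYPES_SKETCH][TREES_PER_TYPE];

/* Hypothetical local debug toggle: true emulates the single-tree layout. */
static bool force_single_tree;

/* Map (swap type, page offset) to the tree that owns that offset. */
static struct zswap_tree_sketch *tree_lookup(unsigned int type,
					     unsigned long offset)
{
	unsigned long idx = offset >> SWAP_ADDRESS_SPACE_SHIFT;

	assert(type < NR_TYPES_SKETCH && idx < TREES_PER_TYPE);

	if (force_single_tree)
		return &trees[type][0];

	return &trees[type][idx];
}
```

With force_single_tree left false, offsets in different 64 MiB ranges land in
different trees (and so would take different locks); setting it to true
sends all of them to &trees[type][0], which is the comparison baseline
Johannes describes.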