Message-ID: <3a1b124d-4a97-4400-9714-0cceac53bd34@bytedance.com>
Date: Thu, 18 Jan 2024 15:35:43 +0800
Subject: Re: [PATCH 0/2] RFC: zswap tree use xarray instead of RB tree
To: Chris Li, Yosry Ahmed
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Wei Xu, Yu Zhao, Greg Thelen, Chun-Tse Shao, Suren Baghdasaryan,
 Brain Geffon, Minchan Kim, Michal Hocko, Mel Gorman, Huang Ying,
 Nhat Pham, Johannes Weiner, Kairui Song, Zhongkun He, Kemeng Shi,
 Barry Song, "Matthew Wilcox (Oracle)", "Liam R. Howlett", Joel Fernandes
References: <20240117-zswap-xarray-v1-0-6daa86c08fae@kernel.org>
 <7f52ad78-e10b-438a-b380-49451bf6f64f@bytedance.com>
From: Chengming Zhou

On 2024/1/18 15:19, Chris Li wrote:
> On Wed, Jan 17, 2024 at 11:02 PM Yosry Ahmed wrote:
>>
>> On Wed, Jan 17, 2024 at 10:57 PM Chengming Zhou
>> wrote:
>>>
>>> Hi Yosry and Chris,
>>>
>>> On 2024/1/18 14:39, Yosry Ahmed wrote:
>>>> On Wed, Jan 17, 2024 at 10:01 PM Yosry Ahmed wrote:
>>>>>
>>>>> That's a long CC list for sure :)
>>>>>
>>>>> On Wed, Jan 17, 2024 at 7:06 PM Chris Li wrote:
>>>>>>
>>>>>> The RB tree shows some contribution to the swap fault
>>>>>> long tail latency due to two factors:
>>>>>> 1) RB tree requires re-balance from time to time.
>>>>>> 2) The zswap RB tree has a tree level spin lock protecting
>>>>>> the tree access.
>>>>>>
>>>>>> The swap cache is using xarray. The break down the swap
>>>>>> cache access does not have the similar long time as zswap
>>>>>> RB tree.
>>>>>
>>>>> I think the comparison to the swap cache may not be valid as the swap
>>>>> cache has many trees per swapfile, while zswap has a single tree.
>>>>>
>>>>>>
>>>>>> Moving the zswap entry to xarray enable read side
>>>>>> take read RCU lock only.
>>>>>
>>>>> Nice.
>>>>>
>>>>>>
>>>>>> The first patch adds the xarray alongside the RB tree.
>>>>>> There is some debug check asserting the xarray agrees with
>>>>>> the RB tree results.
>>>>>>
>>>>>> The second patch removes the zwap RB tree.
>>>>>
>>>>> The breakdown looks like something that would be a development step,
>>>>> but for patch submission I think it makes more sense to have a single
>>>>> patch replacing the rbtree with an xarray.
>>>>>
>>>>>>
>>>>>> I expect to merge the zswap rb tree spin lock with the xarray
>>>>>> lock in the follow up changes.
>>>>>
>>>>> Shouldn't this simply be changing uses of tree->lock to use
>>>>> xa_{lock/unlock}?
>>>>> We also need to make sure we don't try to lock the
>>>>> tree when operating on the xarray if the caller is already holding the
>>>>> lock, but this seems to be straightforward enough to be done as part
>>>>> of this patch or this series at least.
>>>>>
>>>>> Am I missing something?
>>>>
>>>> Also, I assume we will only see performance improvements after the
>>>> tree lock in its current form is removed so that we get loads
>>>> protected only by RCU. Can we get some performance numbers to see how
>>>> the latency improves with the xarray under contention (unless
>>>> Chengming is already planning on testing this for his multi-tree
>>>> patches).
>>>
>>> I just give it a try, the same test of kernel build in tmpfs with zswap
>>> shrinker enabled, all based on the latest mm/mm-stable branch.
>>>
>>>         mm-stable     zswap-split-tree   zswap-xarray
>>> real    1m10.442s     1m4.157s           1m9.962s
>>> user    17m48.232s    17m41.477s         17m45.887s
>>> sys     8m13.517s     5m2.226s           7m59.305s
>>>
>>> Looks like the contention of concurrency is still there, I haven't
>>> look into the code yet, will review it later.
>
> Thanks for the quick test. Interesting to see the sys usage drop for
> the xarray case even with the spin lock.
> Not sure if the 13 second saving is statistically significant or not.
>
> We might need to have both xarray and split trees for the zswap. It is
> likely removing the spin lock wouldn't be able to make up the 35%
> difference. That is just my guess. There is only one way to find out.

Yes, I totally agree with this! IMHO, concurrent zswap_store paths will
still have to contend for the xarray spinlock even after we have
converted the rb-tree to an xarray. So I think we should have both.

>
> BTW, do you have a script I can run to replicate your results?

```
#!/bin/bash

testname="build-kernel-tmpfs"
cgroup="/sys/fs/cgroup/$testname"

tmpdir="/tmp/vm-scalability-tmp"
workdir="$tmpdir/$testname"

memory_max="$((2 * 1024 * 1024 * 1024))"

linux_src="/root/zcm/linux-6.6.tar.xz"
NR_TASK=32

swapon ~/zcm/swapfile
echo 60 > /proc/sys/vm/swappiness

echo zsmalloc > /sys/module/zswap/parameters/zpool
echo lz4 > /sys/module/zswap/parameters/compressor
echo 1 > /sys/module/zswap/parameters/shrinker_enabled
echo 1 > /sys/module/zswap/parameters/enabled

if ! [ -d $tmpdir ]; then
	mkdir -p $tmpdir
	mount -t tmpfs -o size=100% nodev $tmpdir
fi

mkdir -p $cgroup
echo $memory_max > $cgroup/memory.max
echo $$ > $cgroup/cgroup.procs

rm -rf $workdir
mkdir -p $workdir
cd $workdir

tar xvf $linux_src
cd linux-6.6
make -j$NR_TASK clean
make defconfig
time make -j$NR_TASK
```
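
For comparing runs, the relative sys-time differences in the table quoted above can be worked out with a small helper. This is only a rough sketch, not part of the test itself: the function names `to_seconds` and `pct_drop` are made up for illustration, and it assumes the durations are in the `XmY.ZZZs` form that bash's `time` prints.

```shell
#!/bin/bash

# Convert a `time`-style duration such as "8m13.517s" into seconds.
# (Hypothetical helper, assumes the "<minutes>m<seconds>s" format.)
to_seconds() {
	local t="${1%s}"      # strip the trailing "s"
	local min="${t%%m*}"  # part before the "m"
	local sec="${t#*m}"   # part after the "m"
	LC_ALL=C awk -v m="$min" -v s="$sec" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# Print by how many percent the second duration is lower than the first.
pct_drop() {
	local base new
	base="$(to_seconds "$1")"
	new="$(to_seconds "$2")"
	LC_ALL=C awk -v b="$base" -v n="$new" 'BEGIN { printf "%.1f\n", (b - n) * 100 / b }'
}

# sys times taken from the table above
echo "split-tree vs mm-stable: $(pct_drop 8m13.517s 5m2.226s)%"
echo "xarray vs mm-stable:     $(pct_drop 8m13.517s 7m59.305s)%"
echo "split-tree vs xarray:    $(pct_drop 7m59.305s 5m2.226s)%"
```

A single run per kernel says nothing about variance, so the percentages are only a rough guide; repeating the build several times per kernel would be needed to judge whether the smaller difference is significant.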