Date: Fri, 26 May 2023 16:05:13 -0700
From: Chris Li
To: Domenico Cerasuolo
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, sjenning@redhat.com,
 ddstreet@ieee.org, vitaly.wool@konsulko.com, yosryahmed@google.com,
 hannes@cmpxchg.org, kernel-team@fb.com
Subject: Re: [PATCH] mm: zswap: shrink until can accept
References: <20230524065051.6328-1-cerasuolodomenico@gmail.com>
In-Reply-To: <20230524065051.6328-1-cerasuolodomenico@gmail.com>

On Wed, May 24, 2023 at 08:50:51AM +0200, Domenico Cerasuolo wrote:
> This update addresses an issue with the zswap reclaim mechanism, which
> hinders the efficient offloading of cold pages to disk, thereby
> compromising the preservation of the LRU order and consequently
> diminishing, if not inverting, its performance benefits.
>
> The functioning of the zswap shrink worker was found to be inadequate,
> as shown by basic benchmark test. For the test, a kernel build was
> utilized as a reference, with its memory confined to 1G via a cgroup and
> a 5G swap file provided. The results are presented below, these are
> averages of three runs without the use of zswap:
>
> real 46m26s
> user 35m4s
> sys 7m37s
>
> With zswap (zbud) enabled and max_pool_percent set to 1 (in a 32G
> system), the results changed to:
>
> real 56m4s
> user 35m13s
> sys 8m43s
>
> written_back_pages: 18
> reject_reclaim_fail: 0
> pool_limit_hit:1478
>
> Besides the evident regression, one thing to notice from this data is
> the extremely low number of written_back_pages and pool_limit_hit.
>
> The pool_limit_hit counter, which is increased in zswap_frontswap_store
> when zswap is completely full, doesn't account for a particular
> scenario: once zswap hits his limit, zswap_pool_reached_full is set to
> true; with this flag on, zswap_frontswap_store rejects pages if zswap is
> still above the acceptance threshold. Once we include the rejections due
> to zswap_pool_reached_full && !zswap_can_accept(), the number goes from
> 1478 to a significant 21578266.
>
> Zswap is stuck in an undesirable state where it rejects pages because
> it's above the acceptance threshold, yet fails to attempt memory
> reclaimation. This happens because the shrink work is only queued when
> zswap_frontswap_store detects that it's full and the work itself only
> reclaims one page per run.
>
> This state results in hot pages getting written directly to disk,
> while cold ones remain memory, waiting only to be invalidated. The LRU
> order is completely broken and zswap ends up being just an overhead
> without providing any benefits.
>
> This commit applies 2 changes: a) the shrink worker is set to reclaim
> pages until the acceptance threshold is met and b) the task is also
> enqueued when zswap is not full but still above the threshold.
>
> Testing this suggested update showed much better numbers:
>
> real 36m37s
> user 35m8s
> sys 9m32s
>
> written_back_pages: 10459423
> reject_reclaim_fail: 12896
> pool_limit_hit: 75653
>
> Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
> Signed-off-by: Domenico Cerasuolo
> ---
>  mm/zswap.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 59da2a415fbb..2ee0775d8213 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -587,9 +587,13 @@ static void shrink_worker(struct work_struct *w)
>  {
>  	struct zswap_pool *pool = container_of(w, typeof(*pool),
>  						shrink_work);
> +	int ret;

Very minor nitpick: you can move the declaration inside the do
statement, where it gets used.

>
> -	if (zpool_shrink(pool->zpool, 1, NULL))
> -		zswap_reject_reclaim_fail++;
> +	do {
> +		ret = zpool_shrink(pool->zpool, 1, NULL);
> +		if (ret)
> +			zswap_reject_reclaim_fail++;
> +	} while (!zswap_can_accept() && ret != -EINVAL);

As others have pointed out, this while loop can be problematic.

Have you found out what the common reason for the reclaim failures
was? Inside the shrink function there is a while loop; that would be
the place to apply try-harder conditions. For example, if all the pages
in the LRU have already been tried once, there is no reason to keep
calling the shrink function. The outer loop doesn't have that kind of
visibility.

Chris

>  	zswap_pool_put(pool);
>  }
>
> @@ -1188,7 +1192,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
>  	if (zswap_pool_reached_full) {
>  		if (!zswap_can_accept()) {
>  			ret = -ENOMEM;
> -			goto reject;
> +			goto shrink;
>  		} else
>  			zswap_pool_reached_full = false;
>  	}
> --
> 2.34.1
>
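
To make the "every page in the LRU has already been tried once" stop
condition from the review comment above concrete, here is a minimal
userspace sketch. It is not kernel code and not the patch author's
implementation: struct toy_pool, toy_can_accept(), toy_reclaim_one()
and toy_shrink() are hypothetical names invented for this illustration,
and the failure model (every Nth attempt fails) is an assumption. The
only point is that the loop owning the LRU can bound its retries by the
LRU length seen at entry, which a caller looping from outside cannot do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a compressed pool; not the kernel's zswap_pool. */
struct toy_pool {
	size_t lru_len;    /* entries currently on the LRU */
	size_t accept_max; /* pool accepts stores at or below this many entries */
	size_t fail_every; /* every Nth writeback attempt fails; 0 means never */
	size_t attempts;   /* writeback attempts made so far */
};

/* Toy analogue of an acceptance-threshold check. */
bool toy_can_accept(const struct toy_pool *p)
{
	return p->lru_len <= p->accept_max;
}

/* Try to write back one entry; a failed entry stays on the LRU. */
int toy_reclaim_one(struct toy_pool *p)
{
	if (p->lru_len == 0)
		return -EINVAL; /* nothing left to reclaim */
	p->attempts++;
	if (p->fail_every && p->attempts % p->fail_every == 0)
		return -EAGAIN; /* pretend the entry was busy */
	p->lru_len--;
	return 0;
}

/*
 * Shrink until the pool can accept again, but make at most one attempt
 * per entry that was on the LRU at entry. Returns the number of failed
 * attempts.
 */
size_t toy_shrink(struct toy_pool *p)
{
	size_t budget, failures = 0;

	assert(p != NULL);
	for (budget = p->lru_len; budget > 0 && !toy_can_accept(p); budget--) {
		int ret = toy_reclaim_one(p);

		if (ret == -EINVAL)
			break;
		if (ret)
			failures++;
	}
	return failures;
}
```

With fail_every set to 1 every attempt fails, and toy_shrink() returns
after lru_len attempts instead of spinning. An unbounded
"while (!can_accept)" loop around a one-page shrink call would not
terminate in that case.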