References: <20230524065051.6328-1-cerasuolodomenico@gmail.com>
From: Domenico Cerasuolo
Date: Sun, 28 May 2023 10:53:49 +0200
Subject: Re: [PATCH] mm: zswap: shrink until can accept
To: Chris Li
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, yosryahmed@google.com, hannes@cmpxchg.org, kernel-team@fb.com

On Sat, May 27, 2023 at 1:05 AM Chris Li wrote:
>
> On Wed, May 24, 2023 at 08:50:51AM +0200, Domenico Cerasuolo wrote:
> > This update addresses an issue with the zswap reclaim mechanism, which
> > hinders the efficient offloading of cold pages to disk, thereby
> > compromising the preservation of the LRU order and consequently
> > diminishing, if not inverting, its performance benefits.
> >
> > The functioning of the zswap shrink worker was found to be inadequate,
> > as shown by a basic benchmark test. For the test, a kernel build was
> > utilized as a reference, with its memory confined to 1G via a cgroup and
> > a 5G swap file provided. The results are presented below; these are
> > averages of three runs without the use of zswap:
> >
> > real 46m26s
> > user 35m4s
> > sys 7m37s
> >
> > With zswap (zbud) enabled and max_pool_percent set to 1 (in a 32G
> > system), the results changed to:
> >
> > real 56m4s
> > user 35m13s
> > sys 8m43s
> >
> > written_back_pages: 18
> > reject_reclaim_fail: 0
> > pool_limit_hit: 1478
> >
> > Besides the evident regression, one thing to notice from this data is
> > the extremely low number of written_back_pages and pool_limit_hit.
> >
> > The pool_limit_hit counter, which is increased in zswap_frontswap_store
> > when zswap is completely full, doesn't account for a particular
> > scenario: once zswap hits its limit, zswap_pool_reached_full is set to
> > true; with this flag on, zswap_frontswap_store rejects pages if zswap is
> > still above the acceptance threshold. Once we include the rejections due
> > to zswap_pool_reached_full && !zswap_can_accept(), the number goes from
> > 1478 to a significant 21578266.
> >
> > Zswap is stuck in an undesirable state where it rejects pages because
> > it's above the acceptance threshold, yet fails to attempt memory
> > reclamation. This happens because the shrink work is only queued when
> > zswap_frontswap_store detects that it's full, and the work itself only
> > reclaims one page per run.
> >
> > This state results in hot pages getting written directly to disk,
> > while cold ones remain in memory, waiting only to be invalidated. The
> > LRU order is completely broken and zswap ends up being just an overhead
> > without providing any benefits.
> >
> > This commit applies 2 changes: a) the shrink worker is set to reclaim
> > pages until the acceptance threshold is met and b) the task is also
> > enqueued when zswap is not full but still above the threshold.
> >
> > Testing this suggested update showed much better numbers:
> >
> > real 36m37s
> > user 35m8s
> > sys 9m32s
> >
> > written_back_pages: 10459423
> > reject_reclaim_fail: 12896
> > pool_limit_hit: 75653
> >
> > Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
> > Signed-off-by: Domenico Cerasuolo
> > ---
> >  mm/zswap.c | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 59da2a415fbb..2ee0775d8213 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -587,9 +587,13 @@ static void shrink_worker(struct work_struct *w)
> >  {
> >  	struct zswap_pool *pool = container_of(w, typeof(*pool),
> >  						shrink_work);
> > +	int ret;
> Very minor nitpick: you can move the declaration inside the do
> statement where it gets used.
>
> >
> > -	if (zpool_shrink(pool->zpool, 1, NULL))
> > -		zswap_reject_reclaim_fail++;
> > +	do {
> > +		ret = zpool_shrink(pool->zpool, 1, NULL);
> > +		if (ret)
> > +			zswap_reject_reclaim_fail++;
> > +	} while (!zswap_can_accept() && ret != -EINVAL);
>
> As others point out, this while loop can be problematic.

Do you have a specific concern that hasn't already been addressed
following the other reviewers' suggestions?

>
> Have you found out what the most common cause of the reclaim failures
> was? Inside the shrink function there is a while loop; that would be
> the place to apply try-harder conditions. For example, if all the pages
> in the LRU have already been tried once, there's no reason to keep
> calling the shrink function. The outer loop doesn't have this kind of
> visibility.
>

The most common cause I saw during testing was concurrent operations on
the swap entry: if an entry is being loaded or invalidated at the same
time as the zswap writeback, errors will be returned. This scenario
doesn't seem harmful at all, because the failure doesn't indicate that
memory cannot be allocated, just that that particular page should not be
written back.

As far as I understood the concerns that were voiced, the problem could
arise if the writeback fails because memory cannot be allocated, which
could indicate that the system is under extremely high memory pressure;
this loop could then aggravate the situation by adding more contention
on the already scarce available memory.

Since both these cases are treated equally by the retry limit, we're
adopting a conservative approach, treating non-harmful errors as if they
were harmful. This could certainly be improved, but I don't see it as an
issue, because differentiating the errors would actually make the loop
run longer than it does without the differentiation.

As I was writing to Yosry, the differentiation would be a great
improvement here; I just have a patch set in the queue that moves the
inner reclaim loop from the zpool driver up to zswap. With that,
updating the error handling would be more convenient, as it would be
done in one place instead of three.

>
> Chris
>
> >  	zswap_pool_put(pool);
> >  }
> >
> > @@ -1188,7 +1192,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
> >  	if (zswap_pool_reached_full) {
> >  		if (!zswap_can_accept()) {
> >  			ret = -ENOMEM;
> > -			goto reject;
> > +			goto shrink;
> >  		} else
> >  			zswap_pool_reached_full = false;
> >  	}
> > --
> > 2.34.1
> >