From: Takero Funaki <flintglass@gmail.com>
Date: Mon, 15 Jul 2024 17:20:06 +0900
Subject: Re: [PATCH v2 0/6] mm: zswap: global shrinker fix and proactive shrink
To: Nhat Pham
Cc: Johannes Weiner, Yosry Ahmed, Chengming Zhou, Jonathan Corbet, Andrew Morton, Domenico Cerasuolo, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20240706022523.1104080-1-flintglass@gmail.com>
On Sat, Jul 13, 2024 at 8:02, Nhat Pham wrote:
>
> >
> > I agree this does not follow LRU, but I think the LRU priority
> > inversion is unavoidable once the pool limit is hit.
> > The accept_thr_percent should be lowered to reduce the probability of
> > LRU inversion if it matters. (That is why I implemented the proactive
> > shrinker.)
>
> And yet, in your own benchmark it fails to prevent that, no? I think
> you lower it all the way down to 50%.
>
> >
> > When the writeback throughput is slower than memory usage grows,
> > zswap_store() will have to reject pages sooner or later.
> > If we evict the oldest stored pages synchronously before rejecting a
> > new page (rotating the pool to keep LRU order), it will affect latency
> > depending on how much writeback is required to store the new page. If
> > the oldest pages were compressed well, we would have to evict too many
> > pages to store a warmer page, which blocks reclaim progress.
> > Fragmentation in the zspool may also increase the required writeback
> > amount.
> > We cannot accomplish both maintaining LRU priority and maintaining
> > pageout latency.
>
> Hmm yeah, I guess this is fair. Looks like there is not a lot of
> choice, if you want to maintain decent pageout latency...
>
> I could suggest that you have a budgeted zswap writeback on zswap
> store - i.e. if the pool is full, then try zswap writeback until we
> have enough space or the budget is reached. But that feels like
> even more engineering - the IO priority approach might even be easier
> at that point LOL.
>
> Oh well, global shrinker delay it is :)
>
> >
> > Additionally, zswap_writeback_entry() is slower than direct pageout. I
> > assume this is because the shrinker performs 4KB IO synchronously. I am
> > seeing that shrinking throughput is limited by disk IOPS * 4KB, while
> > much higher throughput can be achieved by disabling zswap. Direct
> > pageout can be faster than zswap writeback, possibly because of bio
> > optimization or sequential allocation of swap.
>
> Hah, this is interesting!
>
> I wonder though, if the solution here is to perform some sort of
> batching for zswap writeback.
>
> BTW, what is the type of the storage device you are using for swap? Is
> it SSD or HDD etc?
>

It was tested on an Azure VM with SSD-backed storage. The total IOPS was
capped at 4K IOPS by the VM host. The max throughput of the global
shrinker was around 16 MB/s.
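To make the budgeted-writeback idea quoted above concrete, here is a rough
user-space model (not kernel code; the pool is just a deque of compressed
sizes, and all numbers are made up): on store, evict from the LRU head
until the incoming entry fits or the writeback budget is exhausted, and
reject otherwise. It also shows the point about well-compressed old pages:
when LRU-head entries are small, many evictions (and thus more writeback)
are needed to make room for one warmer page.

```python
from collections import deque


def store_with_budget(pool, capacity, incoming_size, writeback_budget):
    """Model of budgeted synchronous writeback on the store path.

    pool: deque of compressed entry sizes, oldest (LRU head) first.
    Evict oldest entries until the incoming entry fits, or give up
    once evicting the next entry would exceed the writeback budget.
    Returns True if stored, False if rejected (LRU inversion).
    """
    used = sum(pool)
    written = 0
    while used + incoming_size > capacity and pool:
        oldest = pool[0]
        if written + oldest > writeback_budget:
            return False  # budget exhausted: reject the warmer page
        pool.popleft()  # simulate writeback of the oldest entry
        written += oldest
        used -= oldest
    if used + incoming_size > capacity:
        return False  # pool empty but entry still does not fit
    pool.append(incoming_size)
    return True
```

With a tight budget and many small (well-compressed) entries at the LRU
head, the loop hits the budget before freeing enough space, which is the
latency/LRU trade-off described above.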
Proactive shrinking cannot prevent pool_limit_hit since memory allocation
can be on the order of GB/s. (The benchmark script allocates 2 GB
sequentially, which was compressed to 1.3 GB, while the zswap pool was
limited to 200 MB.)

> >
> > >
> > > Have you experimented with synchronous reclaim in the case the pool is
> > > full? All the way to the acceptance threshold is too aggressive of
> > > course - you might need to find something in between :)
> > >
> >
> > I don't get what the expected situation is.
> > The benchmark of patch 6 is performing synchronous reclaim in the case
> > the pool is full, since bulk memory allocation (write to mmapped
> > space) is much faster than writeback throughput. The zswap pool is
> > filled instantly at the beginning of the benchmark runs. The
> > accept_thr_percent is not significant for the benchmark, I think.
>
> No. I meant synchronous reclaim as in triggering zswap writeback
> within the zswap store path, to make space for the incoming new zswap
> pages. But you already addressed it above :)
>
> > >
> > > I wonder if this contention would show up in PSI metrics
> > > (/proc/pressure/io, or the cgroup variants if you use them). Maybe
> > > correlate reclaim counters (pgscan, zswpout, pswpout, zswpwb etc.)
> > > with IO pressure to show the pattern, i.e. the contention problem was
> > > there before, and is now resolved? :)
> >
> > Unfortunately, I could not find a reliable metric other than elapsed
> > time. It seems PSI does not distinguish stalls for rejected pageout
> > from stalls for shrinker writeback.
> > For counters, this issue affects latency but does not increase the
> > number of pagein/pageout events. Is there any better way to observe
> > the origin of contention?
> >
> > Thanks.
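As an aside on the PSI question above: even though PSI cannot attribute a
stall to rejected pageout versus shrinker writeback, /proc/pressure/io uses
a fixed line format ("some"/"full" with avg10/avg60/avg300/total fields),
so it can at least be sampled alongside /proc/vmstat counters and the
spikes correlated by timestamp. A minimal parser sketch (the sample line
below is illustrative, not captured from a real run):

```python
import re


def parse_psi(line):
    """Parse one /proc/pressure/* line, e.g.
    'some avg10=0.12 avg60=0.34 avg300=0.05 total=123456'
    into (kind, {field: value}). 'total' is stall time in microseconds.
    """
    kind, rest = line.split(None, 1)
    fields = {k: float(v) for k, v in re.findall(r"(\w+)=([\d.]+)", rest)}
    return kind, fields
```

A sampling loop could call this once a second on each line of
/proc/pressure/io, read zswpout/pswpout/zswpwb from /proc/vmstat at the
same instant, and log both to see whether IO pressure tracks shrinker
writeback activity.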