From: Takero Funaki
Date: Thu, 11 Jul 2024 07:26:55 +0900
Subject: Re: [PATCH v2 0/6] mm: zswap: global shrinker fix and proactive shrink
To: Nhat Pham
Cc: Johannes Weiner, Yosry Ahmed, Chengming Zhou, Jonathan Corbet, Andrew Morton, Domenico Cerasuolo, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20240706022523.1104080-1-flintglass@gmail.com>

On Tue, Jul 9, 2024 at 9:53 Nhat Pham wrote:
>
> > post-patch, 6.10-rc4 with patch 1 to 5
>
> You mean 1 to 6? There are 6 patches, no?

Oops. With patches 1 to 6.

>
> Just out of pure curiosity, could you include the stats from patch 1-3 only?
>

I will rerun the bench in v3. I assume this bench does not reflect
patches 4 to 6, as delta pool_limit_hit=0 means no rejection from zswap.

> Ah this is interesting.
> Did you actually see improvement in your real
> deployment (i.e not the benchmark) with patch 4-6 in?
>

As I replied on patch 6: with memory-consuming tasks such as
`apt upgrade`, for instance.

> >
> > Intended scenario for memory reclaim:
> > 1. zswap pool < accept_threshold as the initial state. This is achieved
> >    by patch 3, proactive shrinking.
> > 2. Active processes start allocating pages. Pageout is buffered by zswap
> >    without IO.
> > 3. zswap reaches shrink_start_threshold. zswap continues to buffer
> >    incoming pages and starts writeback immediately in the background.
> > 4. zswap reaches max pool size. zswap interrupts the global shrinker and
> >    starts rejecting pages. Write IO for the rejected page will consume
> >    all IO resources.
>
> This sounds like the proactive shrinker is still not aggressive
> enough, and/or there are some sort of misspecifications of the zswap
> setting... Correct me if I'm wrong, but the new proactive global
> shrinker begins 1% after the acceptance threshold, and shrinks down to
> acceptance threshold, right? How are we still hitting the pool
> limit...
>

Proactive shrinking should not be aggressive. With patches 4 and 6, I
modified the global shrinker to be less aggressive against pagein/out.
Proactive shrinking cannot avoid hitting the pool limit when memory
usage grows faster than writeback can drain the pool.

> My concern is that we are knowingly (and perhaps unnecessarily)
> creating an LRU inversion here - preferring swapping out the rejected
> pages over the colder pages in the zswap pool. Shouldn't it be the
> other way around? For instance, can we spiral into the following
> scenario:
>
> 1. zswap pool becomes full.
> 2. Memory is still tight, so anonymous memory will be reclaimed. zswap
> keeps rejecting incoming pages, and putting a hold on the global
> shrinker.
> 3. The pages that are swapped out are warmer than the ones stored in
> the zswap pool, so they will be more likely to be swapped in (which,
> IIUC, will also further delay the global shrinker).
>
> and the cycle keeps going on and on?

I agree this does not follow LRU, but I think the LRU priority
inversion is unavoidable once the pool limit is hit. If the inversion
matters, accept_thr_percent should be lowered to reduce its
probability. (That is why I implemented the proactive shrinker.)

When writeback throughput is lower than the rate at which memory usage
grows, zswap_store() will have to reject pages sooner or later. If we
evict the oldest stored pages synchronously before rejecting a new page
(rotating the pool to keep LRU order), latency will suffer, depending
on how much writeback is required to store the new page. If the oldest
pages were compressed well, we would have to evict too many pages to
store a warmer page, which blocks reclaim progress. Fragmentation in
the zspool may also increase the required writeback amount. We cannot
both maintain LRU priority and maintain pageout latency.

Additionally, zswap_writeback_entry() is slower than direct pageout. I
assume this is because the shrinker performs 4KB IO synchronously. I am
seeing that shrinking throughput is limited to disk IOPS * 4KB, while
much higher throughput can be achieved by disabling zswap. Direct
pageout can be faster than zswap writeback, possibly because of bio
optimization or sequential allocation of swap.

> Have you experimented with synchronous reclaim in the case the pool is
> full? All the way to the acceptance threshold is too aggressive of
> course - you might need to find something in between :)
>

I don't get what the expected situation is. The benchmark of patch 6
is already performing synchronous reclaim with the pool full, since
bulk memory allocation (writes to mmapped space) is much faster than
writeback throughput. The zswap pool is filled instantly at the
beginning of benchmark runs.
The accept_thr_percent is not significant for the benchmark, I think.

>
> I wonder if this contention would show up in PSI metrics
> (/proc/pressure/io, or the cgroup variants if you use them ). Maybe
> correlate reclaim counters (pgscan, zswpout, pswpout, zswpwb etc.)
> with IO pressure to show the pattern, i.e the contention problem was
> there before, and is now resolved? :)

Unfortunately, I could not find a reliable metric other than elapsed
time. It seems PSI does not distinguish stalls for rejected pageout
from stalls for shrinker writeback. For counters, this issue affects
latency but does not increase the number of pagein/out.
Is there any better way to observe the origin of contention?

Thanks.
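
For reference, the correlation suggested in the quoted text could be
sampled roughly as below. This is only a sketch under assumptions, not
something used for the numbers in this thread: the helper names
(parse_psi, parse_vmstat, counter_delta, sample) are hypothetical, the
counter names in DEFAULT_COUNTERS are assumed to match /proc/vmstat on
the running kernel (zswpwb in particular may be absent on older
kernels), and, as noted above, PSI will not tell the two kinds of stall
apart, so this only shows IO pressure next to the counter deltas.

```python
import time
from pathlib import Path

# Assumed /proc/vmstat counter names; missing ones are simply skipped.
DEFAULT_COUNTERS = ("pgscan_kswapd", "pgscan_direct",
                    "zswpout", "pswpout", "zswpwb")


def parse_psi(text):
    """Parse PSI text (e.g. /proc/pressure/io) into {kind: {field: value}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()
        pairs = (field.split("=", 1) for field in fields)
        result[kind] = {key: float(value) for key, value in pairs}
    return result


def parse_vmstat(text, names=DEFAULT_COUNTERS):
    """Pick the requested counters out of 'name value' lines."""
    wanted = set(names)
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            counters[parts[0]] = int(parts[1])
    return counters


def counter_delta(before, after):
    """Per-counter increase between two snapshots (missing 'before' = 0)."""
    return {name: after[name] - before.get(name, 0) for name in after}


def sample(interval_sec):
    """Return (IO PSI at end of interval, counter deltas over the interval)."""
    before = parse_vmstat(Path("/proc/vmstat").read_text())
    time.sleep(interval_sec)
    after = parse_vmstat(Path("/proc/vmstat").read_text())
    psi = parse_psi(Path("/proc/pressure/io").read_text())
    return psi, counter_delta(before, after)
```

Calling sample() in a loop while the benchmark runs, once with and once
without the patches, would give the two series to compare.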