Date: Tue, 30 May 2023 18:06:38 -0700
From: Chris Li <chrisl@kernel.org>
To: Johannes Weiner
Cc: Domenico Cerasuolo, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
 yosryahmed@google.com, kernel-team@fb.com
Subject: Re: [PATCH] mm: zswap: shrink until can accept
In-Reply-To: <20230530185451.GA101722@cmpxchg.org>
References: <20230524065051.6328-1-cerasuolodomenico@gmail.com>
 <20230530041341.GB84971@cmpxchg.org>
 <20230530155519.GB97194@cmpxchg.org>
 <20230530185451.GA101722@cmpxchg.org>

On Tue, May 30, 2023 at 02:54:51PM -0400, Johannes Weiner wrote:
> > Maybe ENOMEM is a bad example. How about if the swap device
> > just went bad and can't complete new IO writes?
>
> This is actually outside the scope of zswap, and handled by the
> swapcache (end_swap_bio_write).
>
> Once the IO is submitted, zswap will ax its copy and leave the rest to
> the swapcache. It behaves the same way as if zswap had never been
> involved to begin with when the swap out fails on IO errors.
>
> From a zswap perspective, there are no persistent errors in moving a
> zswap entry back into the swapcache. Not just currently, but generally.

Again, you are right that the zswap writeback is async, so the
writeback error is NOT going to propagate to the shrink function.

The three current pool backends that I looked at (zbud, z3fold,
zsmalloc) all retry internally up to 8 times, so adding more retries
did not fundamentally change the existing behavior.

I looked at all the possible error codes generated inside the reclaim
code; the only notable ones are ENOMEM and the errors caused by a
concurrent swap invalidation or a racing swapin fault (a rough sketch
of what I mean by those two classes of errors is below).

BTW, zswap reclaim consumes memory.
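To make that concrete, here is a rough, standalone sketch of the shape
I am describing: a single-entry writeback attempt that can fail either
with -ENOMEM or with a transient race error, wrapped in the kind of
fixed inner retry the backends already do. All the names here
(writeback_one_entry, reclaim_one_entry, BACKEND_RETRIES) are made up
for illustration; this is not the kernel code.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define BACKEND_RETRIES 8  /* the backends retry roughly this many times internally */

/* Hypothetical stub: one writeback attempt on the oldest LRU entry. */
static int writeback_one_entry(void)
{
	int r = rand() % 10;

	if (r == 0)
		return -ENOMEM;  /* allocation needed for the writeback failed */
	if (r < 4)
		return -EEXIST;  /* raced with a swapin fault or an invalidation */
	return 0;                /* entry written back successfully */
}

/* Fixed inner retry in the style of the backends' reclaim. */
static int reclaim_one_entry(void)
{
	int err = 0;

	for (int i = 0; i < BACKEND_RETRIES; i++) {
		err = writeback_one_entry();
		if (err != -EEXIST)
			break;  /* success or -ENOMEM: retrying here won't help */
	}
	return err;
}

int main(void)
{
	for (int i = 0; i < 5; i++)
		printf("reclaim attempt %d -> %d\n", i, reclaim_one_entry());
	return 0;
}

The point is only that by the time the outer shrink loop sees an error,
the backend has typically already spent its internal retries, which is
why stacking more retries on the outside changes little in my reading.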
Keeping on looping on ENOMEM might therefore cause more OOM, but that
can happen with the current code as well.

> > > Aside from -ENOMEM, writeback_entry will fail on concurrent swap
> > > invalidation or a racing swapin fault. In both cases we should
> > > absolutely keep trying other entries until the goal is met.
> >
> > How about a narrower fix recognizing those error cases and making
> > the inner loop continue in those errors?
>
> Right, I just I don't really see the value proposition of this
> complication, and I see some downsides (see below). No single entry
> error should ever cause us to stop the wider reclaim loop.

That is, until the current LRU list has been gone through once. I
expect repeating the same list to yield fewer reclaimed pages.

> > > > > extreme case where it's the only page left on the list, I again don't
> > > > > see how retrying a few times will make the situation worse.
> > > > >
> > > > > In practice, IMO there is little upside in trying to be more
> > > > > discerning about the error codes. Simple seems better here.
> > > >
> > > > Just trying to think about what should be the precise loop termination
> > > > condition here.
> > > >
> > > > I still feel blindly trying a few times is a very imprecise condition.
> > >
> > > The precise termination condition is when can_accept() returns true
> > > again. The safety cap is only added as precaution to avoid infinite
> > > loops if something goes wrong or unexpected, now or in the future.
> >
> > In my mind, that statement already suggests can_accept() is not
> > *precise*, considering the avoid infinite loop.
> > e.g. Do we know what is the optimal cap value and why that value
> > is optical?
>
> Oh but it is precise. That's the goal we want to accomplish.

I understand that is the goal; the precise condition I am talking
about is the loop termination condition, and can_accept() is not the
only part of it. Anyway, let's move on.

> The cap is just so that in case something unexpectedly goes wrong (a
> bug), we fail gracefully and don't lock up the machine. The same
> reason we prefer WARN_ONs over BUG_ONs if we can, to avoid
> crashes. That's really all there is too it, and it strikes me as
> reasonable and robust design choice. It's fine to limp along or be
> suboptimal after such a bug happens; the bar is avoiding an infinite
> loop, nothing else.
>
> Your suggestion of whitelisting certain errors is more complicated,
> but also less robust: in case an entry error does by some accident
> become persistent for the whole LRU, we're locking up the host. We'd
> rather catch a bug like this by seeing spikes in the reclaim failure
> rate than by losing production machines.
>
> > Putting the definition of precise aside, I do see the unconditional
> > retry can have unwanted effects.
>
> I hope I could address this above. But if not, please share your
> concerns.

Thanks for the discussion. I am less concerned about the retry now.
Retrying on EAGAIN might be the simplest way to proceed (a rough sketch
of the loop I have in mind is in the P.S. below).

Outside the scope of this patch, I am still surprised to see such a
high number of retries caused by race conditions. There are already 8
inner-loop retries, so the actual number of pages retried has to be
multiplied by 8. If there is a reproducer script, I would like to
reproduce this locally to understand it better. I wish there were a
way to reduce the retries.

Another idea is that we could start shrinking as soon as the pool max
is reached, and try to get back down to the accept threshold earlier.

Chris
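P.S. To make "stop when can_accept() is true again, retry only on
EAGAIN, with a safety cap" concrete, here is a rough standalone sketch
of the termination conditions being discussed. The names (can_accept,
reclaim_one_entry, MAX_RECLAIM_RETRIES) and all the numbers are made up
for illustration; this is not the actual patch.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_RECLAIM_RETRIES 16   /* arbitrary safety cap for this sketch */

static int pool_pages = 120;             /* pretend pool size, in pages */
static const int accept_threshold = 100;

/* Stand-in for the can_accept() check discussed above. */
static bool can_accept(void)
{
	return pool_pages < accept_threshold;
}

/* Hypothetical stub for one single-entry writeback attempt. */
static int reclaim_one_entry(void)
{
	static int calls;

	if (++calls % 3 == 0)
		return -EAGAIN;  /* e.g. raced with a swapin, entry already gone */
	pool_pages--;            /* one entry written back to the swap device */
	return 0;
}

static void shrink_until_can_accept(void)
{
	int failures = 0;

	do {
		int err = reclaim_one_entry();

		if (err) {
			if (err != -EAGAIN)
				break;  /* unexpected error, don't keep hammering */
			if (++failures == MAX_RECLAIM_RETRIES)
				break;  /* safety cap against an infinite loop */
		}
	} while (!can_accept());

	printf("stopped at %d pages after %d failed attempts\n",
	       pool_pages, failures);
}

int main(void)
{
	shrink_until_can_accept();
	return 0;
}

In the unconditional variant discussed above, the err != -EAGAIN check
would simply be dropped and every failure would count against the cap.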