From: Yosry Ahmed
Date: Wed, 20 Dec 2023 16:24:22 -0800
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
To: Johannes Weiner
Cc: Nhat Pham, akpm@linux-foundation.org, tj@kernel.org, lizefan.x@bytedance.com, cerasuolodomenico@gmail.com, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, hughd@google.com, corbet@lwn.net, konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, david@ixit.cz, chrisl@kernel.org, Wei Xu, Yu Zhao
In-Reply-To: <20231220145025.GC23822@cmpxchg.org>

On Wed, Dec 20, 2023 at 6:50 AM Johannes Weiner wrote:
>
> On Wed, Dec 20, 2023 at 12:59:15AM -0800, Yosry Ahmed wrote:
> > On Tue, Dec 19, 2023 at 9:15 PM Johannes Weiner wrote:
> > >
> > > On Mon, Dec 18, 2023 at 01:52:23PM -0800, Yosry Ahmed wrote:
> > > > > > Taking a step back from all the memory.swap.tiers vs.
> > > > > > memory.zswap.writeback discussions, I think there may be a more
> > > > > > fundamental problem here. If the zswap store failure is recurrent,
> > > > > > pages can keep going back to the LRUs and then get sent back to
> > > > > > zswap eventually, only to be rejected again. For example, this can
> > > > > > happen if zswap is above the acceptance threshold, but it could be
> > > > > > even worse if it's the allocator rejecting the page due to not
> > > > > > compressing well enough. In the latter case, the page can keep
> > > > > > going back and forth between zswap and the LRUs indefinitely.
> > > > > >
> > > > > > You probably did not run into this as you're using zsmalloc, but it
> > > > > > can happen with zbud AFAICT. Even with zsmalloc, a less problematic
> > > > > > version can happen if zswap is above its acceptance threshold.
> > > > > >
> > > > > > This can cause thrashing and ineffective reclaim. We have an internal
> > > > > > implementation where we mark incompressible pages and put them on the
> > > > > > unevictable LRU when we don't have a backing swapfile (i.e. ghost
> > > > > > swapfiles), and something similar may work if writeback is disabled.
> > > > > > We need to scan such incompressible pages periodically, though, to
> > > > > > remove them from the unevictable LRU if they have been redirtied.
> > > > >
> > > > > I'm not sure this is an actual problem.
> > > > >
> > > > > When pages get rejected, they rotate to the furthest point from the
> > > > > reclaimer - the head of the active list. We only get to them again
> > > > > after we scanned everything else.
> > > > >
> > > > > If all that's left on the LRU is unzswappable, then you'd assume that
> > > > > remainder isn't very large, and thus not a significant part of overall
> > > > > scan work. Because if it is, then there is a serious problem with the
> > > > > zswap configuration.
> > > > >
> > > > > There might be possible optimizations to determine how permanent a
> > > > > rejection is, but I'm not sure the effort is called for just
> > > > > yet. Rejections are already failure cases that screw up the LRU
> > > > > ordering, and healthy setups shouldn't have a lot of those. I don't
> > > > > think this patch adds any sort of new complications to this picture.
> > > >
> > > > We have workloads where a significant amount (maybe 20%? 30%? not sure
> > > > tbh) of the memory is incompressible. Zswap is still a very viable
> > > > option for those workloads once those pages are taken out of the
> > > > picture. If those pages remain on the LRUs, they will introduce a
> > > > regression in reclaim efficiency.
> > > >
> > > > With the upstream code today, those pages go directly to the backing
> > > > store, which isn't ideal in terms of LRU ordering, but this patch
> > > > makes them stay on the LRUs, which can be harmful. I don't think we
> > > > can just assume it is okay. Whether we make those pages unevictable or
> > > > store them uncompressed in zswap, I think taking them out of the LRUs
> > > > (until they are redirtied) is the right thing to do.
> > >
> > > This is how it works with zram as well, though, and it has plenty of
> > > happy users.
> >
> > I am not sure I understand. Zram does not reject pages that do not
> > compress well, right? IIUC it acts as a block device so it cannot
> > reject pages. I feel like I am missing something.
>
> zram_write_page() can fail for various reasons - compression failure,
> zsmalloc failure, the memory limit. This results in !!bio->bi_status,
> __end_swap_bio_write redirtying the page, and vmscan rotating it.
>
> The effect is actually more pronounced with zram, because the pages
> don't get activated and thus cycle faster.
>
> What you're raising doesn't seem to be a dealbreaker in practice.

For the workloads using zram, yes, but those exclusively use zsmalloc,
which can store incompressible pages anyway.

>
> > If we already want to support taking pages away from the LRUs when
> > rejected by zswap (e.g. Nhat's proposal earlier), doesn't it make
> > sense to do that first so that this patch can be useful for all
> > workloads?
>
> No.
>
> Why should users who can benefit now wait for a hypothetical future
> optimization that isn't relevant to them? And by the looks of it, is
> only relevant to a small set of specialized cases?
>
> And the optimization - should anybody actually care to write it - can
> be transparently done on top later, so that's no reason to change
> merge order, either.

We can agree to disagree here; I am not trying to block this anyway.
But let's at least document this in the commit message/docs/code
(wherever it makes sense): pages that hit recurrent store failures
(e.g. incompressible memory) may keep going back to zswap only to get
rejected again, so workloads prone to this may observe some reclaim
inefficiency.