Date: Wed, 20 Dec 2023 06:15:23 +0100
From: Johannes Weiner
To: Yosry Ahmed
Cc: Nhat Pham, akpm@linux-foundation.org, tj@kernel.org,
    lizefan.x@bytedance.com, cerasuolodomenico@gmail.com,
    sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
    mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
    muchun.song@linux.dev, hughd@google.com, corbet@lwn.net,
    konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org,
    linux-mm@kvack.org, kernel-team@meta.com,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    david@ixit.cz, chrisl@kernel.org, Wei Xu, Yu Zhao
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
Message-ID: <20231220051523.GB23822@cmpxchg.org>
References: <20231207192406.3809579-1-nphamcs@gmail.com>
    <20231218144431.GB19167@cmpxchg.org>

On Mon, Dec 18, 2023 at 01:52:23PM -0800, Yosry Ahmed wrote:
> > > Taking a step back from all the memory.swap.tiers vs.
> > > memory.zswap.writeback discussions, I think there may be a more
> > > fundamental problem here. If the zswap store failure is recurrent,
> > > pages can keep going back to the LRUs and then sent back to zswap
> > > eventually, only to be rejected again. For example, this can happen if
> > > zswap is above the acceptance threshold, but could be even worse if it's
> > > the allocator rejecting the page due to not compressing well enough. In
> > > the latter case, the page can keep going back and forth between zswap
> > > and LRUs indefinitely.
> > >
> > > You probably did not run into this as you're using zsmalloc, but it
> > > can happen with zbud AFAICT. Even with zsmalloc, a less problematic
> > > version can happen if zswap is above its acceptance threshold.
> > >
> > > This can cause thrashing and ineffective reclaim. We have an internal
> > > implementation where we mark incompressible pages and put them on the
> > > unevictable LRU when we don't have a backing swapfile (i.e. ghost
> > > swapfiles), and something similar may work if writeback is disabled.
> > > We need to scan such incompressible pages periodically though to
> > > remove them from the unevictable LRU if they have been dirtied.
> >
> > I'm not sure this is an actual problem.
> >
> > When pages get rejected, they rotate to the furthest point from the
> > reclaimer - the head of the active list. We only get to them again
> > after we scanned everything else.
> >
> > If all that's left on the LRU is unzswappable, then you'd assume that
> > remainder isn't very large, and thus not a significant part of overall
> > scan work. Because if it is, then there is a serious problem with the
> > zswap configuration.
> >
> > There might be possible optimizations to determine how permanent a
> > rejection is, but I'm not sure the effort is called for just
> > yet. Rejections are already failure cases that screw up the LRU
> > ordering, and healthy setups shouldn't have a lot of those.
> > I don't think this patch adds any sort of new complications to this
> > picture.
>
> We have workloads where a significant amount (maybe 20%? 30% not sure
> tbh) of the memory is incompressible. Zswap is still a very viable
> option for those workloads once those pages are taken out of the
> picture. If those pages remain on the LRUs, they will introduce a
> regression in reclaim efficiency.
>
> With the upstream code today, those pages go directly to the backing
> store, which isn't ideal in terms of LRU ordering, but this patch
> makes them stay on the LRUs, which can be harmful. I don't think we
> can just assume it is okay. Whether we make those pages unevictable or
> store them uncompressed in zswap, I think taking them out of the LRUs
> (until they are redirtied), is the right thing to do.

This is how it works with zram as well, though, and it has plenty of
happy users. The fact that there are antagonistic workloads doesn't
mean the feature isn't useful. This flag is optional and not enabled
by default, so nobody is forced to use it where it hurts.

I'm not saying it's not worth optimizing those cases, but it doesn't
look like a requirement in order to be useful to a variety of loads.

> Adding Wei and Yu for more data about incompressible memory in our
> fleet. Keep in mind that we have internal patches to cap the
> compression ratio (i.e. reject pages where the compressed size +
> metadata is not worth it, or where zsmalloc will store it in a full
> page anyway). But the same thing can happen upstream with zbud.

I hate to bring this up, but there has been a bit of a disturbing
trend in the zswap discussions recently.

Please do not argue with private patches. Their behavior, the usecases
they enable, and their dependencies are entirely irrelevant to patches
submitted in this forum. They do not need to cater to them or consider
the consequences for them. The only thing that matters is the upstream
codebase and the usecases enabled by it.
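
For readers following the rotation argument in the quoted exchange
(rejected pages go to the far end of the list and are only revisited
after everything else has been scanned), here is a minimal toy sketch.
It is not kernel code: the single deque, the function name
simulate_reclaim, and the "every Nth page is incompressible" rule are
all assumptions made purely for illustration, and real reclaim has
active/inactive lists, refaults, and aging that this ignores.

```python
from collections import deque


def simulate_reclaim(num_pages, incompressible_every, scans):
    """Toy model of reclaim with zswap rejections (hypothetical, not kernel code).

    The right end of the deque is where the reclaimer scans; the left end
    is the point furthest from the reclaimer. Every page whose id is a
    multiple of `incompressible_every` is treated as incompressible.
    Returns (stored, rejected, pages_left_on_list).
    """
    lru = deque(
        (page_id, page_id % incompressible_every != 0)
        for page_id in range(num_pages)
    )
    stored = rejected = 0
    for _ in range(scans):
        if not lru:
            break
        page_id, compressible = lru.pop()  # reclaimer takes the nearest page
        if compressible:
            stored += 1  # accepted by the (imaginary) compressed store
        else:
            rejected += 1
            # Rejected page rotates to the far end of the list.
            lru.appendleft((page_id, compressible))
    return stored, rejected, len(lru)


if __name__ == "__main__":
    # One full pass over 10 pages: each rejected page is touched once.
    print(simulate_reclaim(10, 5, 10))  # (8, 2, 2)
    # Four more scans: only rejected pages remain, so every scan is wasted.
    print(simulate_reclaim(10, 5, 14))  # (8, 6, 2)
```

In this model the first pass touches each incompressible page exactly
once, which matches the "we only get to them again after we scanned
everything else" point. Once only incompressible pages remain, every
further scan is a rejection, which is the situation the disagreement
above is about; how large that remainder is in practice is exactly what
the thread leaves open.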