Date: Wed, 20 Dec 2023 15:50:25 +0100
From: Johannes Weiner
To: Yosry Ahmed
Cc: Nhat Pham, akpm@linux-foundation.org, tj@kernel.org,
 lizefan.x@bytedance.com, cerasuolodomenico@gmail.com, sjenning@redhat.com,
 ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
 hughd@google.com, corbet@lwn.net, konrad.wilk@oracle.com,
 senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org,
 kernel-team@meta.com, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, david@ixit.cz, chrisl@kernel.org, Wei Xu,
 Yu Zhao
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
Message-ID: <20231220145025.GC23822@cmpxchg.org>
References: <20231207192406.3809579-1-nphamcs@gmail.com>
 <20231218144431.GB19167@cmpxchg.org>
 <20231220051523.GB23822@cmpxchg.org>

On Wed, Dec 20, 2023 at 12:59:15AM -0800, Yosry Ahmed
wrote:
> On Tue, Dec 19, 2023 at 9:15 PM Johannes Weiner wrote:
> >
> > On Mon, Dec 18, 2023 at 01:52:23PM -0800, Yosry Ahmed wrote:
> > > > > Taking a step back from all the memory.swap.tiers vs.
> > > > > memory.zswap.writeback discussions, I think there may be a more
> > > > > fundamental problem here. If the zswap store failure is recurrent,
> > > > > pages can keep going back to the LRUs and then be sent back to zswap
> > > > > eventually, only to be rejected again. For example, this can happen
> > > > > if zswap is above the acceptance threshold, but could be even worse
> > > > > if it's the allocator rejecting the page due to not compressing well
> > > > > enough. In the latter case, the page can keep going back and forth
> > > > > between zswap and LRUs indefinitely.
> > > > >
> > > > > You probably did not run into this as you're using zsmalloc, but it
> > > > > can happen with zbud AFAICT. Even with zsmalloc, a less problematic
> > > > > version can happen if zswap is above its acceptance threshold.
> > > > >
> > > > > This can cause thrashing and ineffective reclaim. We have an internal
> > > > > implementation where we mark incompressible pages and put them on the
> > > > > unevictable LRU when we don't have a backing swapfile (i.e. ghost
> > > > > swapfiles), and something similar may work if writeback is disabled.
> > > > > We need to scan such incompressible pages periodically though to
> > > > > remove them from the unevictable LRU if they have been dirtied.
> > > >
> > > > I'm not sure this is an actual problem.
> > > >
> > > > When pages get rejected, they rotate to the furthest point from the
> > > > reclaimer - the head of the active list. We only get to them again
> > > > after we scanned everything else.
> > > >
> > > > If all that's left on the LRU is unzswappable, then you'd assume that
> > > > remainder isn't very large, and thus not a significant part of overall
> > > > scan work.
> > > > Because if it is, then there is a serious problem with the
> > > > zswap configuration.
> > > >
> > > > There might be possible optimizations to determine how permanent a
> > > > rejection is, but I'm not sure the effort is called for just
> > > > yet. Rejections are already failure cases that screw up the LRU
> > > > ordering, and healthy setups shouldn't have a lot of those. I don't
> > > > think this patch adds any sort of new complications to this picture.
> > >
> > > We have workloads where a significant amount (maybe 20%? 30% not sure
> > > tbh) of the memory is incompressible. Zswap is still a very viable
> > > option for those workloads once those pages are taken out of the
> > > picture. If those pages remain on the LRUs, they will introduce a
> > > regression in reclaim efficiency.
> > >
> > > With the upstream code today, those pages go directly to the backing
> > > store, which isn't ideal in terms of LRU ordering, but this patch
> > > makes them stay on the LRUs, which can be harmful. I don't think we
> > > can just assume it is okay. Whether we make those pages unevictable or
> > > store them uncompressed in zswap, I think taking them out of the LRUs
> > > (until they are redirtied), is the right thing to do.
> >
> > This is how it works with zram as well, though, and it has plenty of
> > happy users.
>
> I am not sure I understand. Zram does not reject pages that do not
> compress well, right? IIUC it acts as a block device so it cannot
> reject pages. I feel like I am missing something.

zram_write_page() can fail for various reasons - compression failure,
zsmalloc failure, the memory limit. This results in !!bio->bi_status,
__end_swap_bio_write redirtying the page, and vmscan rotating it.

The effect is actually more pronounced with zram, because the pages
don't get activated and thus cycle faster.

What you're raising doesn't seem to be a dealbreaker in practice.
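
To make the rotation argument concrete, here is a rough userspace toy
model. It is not kernel code: reclaim(), can_store and the single-list
LRU are made up for illustration, and the active/inactive split is
ignored. The reclaimer scans from the tail; a page the backend accepts
is reclaimed, a rejected page is rotated to the head, i.e. the furthest
point from the reclaimer.

```python
from collections import deque


def reclaim(lru, nr_to_reclaim, can_store):
    """Toy reclaim pass over a single LRU list (hypothetical model).

    lru:        deque of page ids, head on the left, tail on the right
    can_store:  callable(page) -> bool, stands in for the backend
                accepting or rejecting the page
    Returns (reclaimed_pages, rejected_scans).
    """
    reclaimed = []
    rejected_scans = 0
    scans = 0
    # Bound the scan so a list full of rejected pages cannot spin forever.
    max_scans = 2 * len(lru)
    while lru and len(reclaimed) < nr_to_reclaim and scans < max_scans:
        page = lru.pop()              # take the page at the tail
        scans += 1
        if can_store(page):
            reclaimed.append(page)    # store succeeded, page is gone
        else:
            lru.appendleft(page)      # rejected: rotate to the head
            rejected_scans += 1
    return reclaimed, rejected_scans


if __name__ == "__main__":
    # Pages 0 and 5 stand in for pages the backend always rejects.
    storable = lambda page: page % 5 != 0
    lru = deque(range(10))
    print(reclaim(lru, 8, storable), list(lru))
    # Only rejected pages are left now; every scan is wasted work.
    print(reclaim(lru, 1, storable), list(lru))
```

In this model a rejected page is only seen again after everything that
was behind it has been scanned, and scan work is only wasted wholesale
once the list consists of nothing but rejected pages. How large that
remainder is in practice depends on the workload, which is the point
under discussion.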
> If we already want to support taking pages away from the LRUs when
> rejected by zswap (e.g. Nhat's proposal earlier), doesn't it make
> sense to do that first so that this patch can be useful for all
> workloads?

No.

Why should users who can benefit now wait for a hypothetical future
optimization that isn't relevant to them? And by the looks of it, is
only relevant to a small set of specialized cases?

And the optimization - should anybody actually care to write it - can
be transparently done on top later, so that's no reason to change
merge order, either.