From: Yosry Ahmed
Date: Mon, 18 Dec 2023 13:52:23 -0800
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
To: Johannes Weiner
Cc: Nhat Pham, akpm@linux-foundation.org, tj@kernel.org,
 lizefan.x@bytedance.com, cerasuolodomenico@gmail.com, sjenning@redhat.com,
 ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
 hughd@google.com, corbet@lwn.net, konrad.wilk@oracle.com,
 senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org,
 kernel-team@meta.com, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, david@ixit.cz, chrisl@kernel.org, Wei Xu,
 Yu Zhao
References: <20231207192406.3809579-1-nphamcs@gmail.com>
 <20231218144431.GB19167@cmpxchg.org>
In-Reply-To: <20231218144431.GB19167@cmpxchg.org>

> > Taking a step back from all the memory.swap.tiers vs.
> > memory.zswap.writeback discussions, I think there may be a more
> > fundamental problem here. If the zswap store failure is recurrent,
> > pages can keep going back to the LRUs and then be sent back to zswap
> > eventually, only to be rejected again. For example, this can happen
> > if zswap is above the acceptance threshold, but could be even worse
> > if it's the allocator rejecting the page due to not compressing well
> > enough.
> > In the latter case, the page can keep going back and forth between
> > zswap and LRUs indefinitely.
> >
> > You probably did not run into this as you're using zsmalloc, but it
> > can happen with zbud AFAICT. Even with zsmalloc, a less problematic
> > version can happen if zswap is above its acceptance threshold.
> >
> > This can cause thrashing and ineffective reclaim. We have an internal
> > implementation where we mark incompressible pages and put them on the
> > unevictable LRU when we don't have a backing swapfile (i.e. ghost
> > swapfiles), and something similar may work if writeback is disabled.
> > We need to scan such incompressible pages periodically though to
> > remove them from the unevictable LRU if they have been dirtied.
>
> I'm not sure this is an actual problem.
>
> When pages get rejected, they rotate to the furthest point from the
> reclaimer - the head of the active list. We only get to them again
> after we scanned everything else.
>
> If all that's left on the LRU is unzswappable, then you'd assume that
> remainder isn't very large, and thus not a significant part of overall
> scan work. Because if it is, then there is a serious problem with the
> zswap configuration.
>
> There might be possible optimizations to determine how permanent a
> rejection is, but I'm not sure the effort is called for just
> yet. Rejections are already failure cases that screw up the LRU
> ordering, and healthy setups shouldn't have a lot of those. I don't
> think this patch adds any sort of new complications to this picture.

We have workloads where a significant amount (maybe 20% or 30%, not
sure tbh) of the memory is incompressible. Zswap is still a very viable
option for those workloads once those pages are taken out of the
picture. If those pages remain on the LRUs, they will introduce a
regression in reclaim efficiency.
With the upstream code today, those pages go directly to the backing
store, which isn't ideal in terms of LRU ordering, but this patch makes
them stay on the LRUs, which can be harmful. I don't think we can just
assume it is okay. Whether we make those pages unevictable or store
them uncompressed in zswap, I think taking them out of the LRUs (until
they are redirtied) is the right thing to do.

Adding Wei and Yu for more data about incompressible memory in our
fleet. Keep in mind that we have internal patches to cap the
compression ratio (i.e. reject pages where the compressed size +
metadata is not worth it, or where zsmalloc will store it in a full
page anyway). But the same thing can happen upstream with zbud.
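
To make the reclaim-efficiency concern concrete, here is a rough
user-space toy model. It is not kernel code and does not reflect how
the real LRU or zswap is implemented: the function name, its
parameters, the single FIFO-style list, and the "park" option (standing
in for moving rejected pages off the LRU, ignoring redirtying) are all
assumptions made for illustration. It only counts how many pages a
reclaimer scans per page it actually reclaims when rejected pages are
rotated back to the head versus taken off the list.

```python
from collections import deque


def reclaim_efficiency(total_pages, incompressible_pct, batch, rounds,
                       park_rejected=False):
    """Toy model: return reclaimed/scanned for zswap-only reclaim.

    Hypothetical simplification: one LRU list, head on the left, tail on
    the right. True marks a page the toy "zswap" always rejects.
    """
    lru = deque((i % 100) < incompressible_pct for i in range(total_pages))
    parked = 0  # pages taken off the LRU (stand-in for "unevictable")
    scanned = reclaimed = 0

    for _ in range(rounds):
        # The workload faults in `batch` fresh, compressible pages.
        lru.extendleft([False] * batch)
        target = reclaimed + batch
        while reclaimed < target:
            incompressible = lru.pop()  # scan from the tail
            scanned += 1
            if not incompressible:
                reclaimed += 1          # stored in zswap, leaves the LRU
            elif park_rejected:
                parked += 1             # taken out of the LRU for good
            else:
                lru.appendleft(True)    # rejected: rotate back to the head
    return reclaimed / scanned


if __name__ == "__main__":
    for pct in (0, 10, 30):
        rotate = reclaim_efficiency(1000, pct, 100, 200)
        park = reclaim_efficiency(1000, pct, 100, 200, park_rejected=True)
        print(f"{pct}% incompressible: rotate={rotate:.3f} park={park:.3f}")
```

In this model, rotating rejected pages keeps charging scan work for
them on every pass over the list, so the ratio falls with the
incompressible share, while parking them costs one scan per page. How
closely that tracks real reclaim behavior is exactly the open question
in this thread; the sketch only illustrates the shape of the argument.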