Date: Mon, 31 Mar 2025 13:06:11 -0400
From: Gregory Price
To: Yosry Ahmed
Cc: Nhat Pham, linux-mm@kvack.org, akpm@linux-foundation.org, hannes@cmpxchg.org, chengming.zhou@linux.dev, sj@kernel.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, willy@infradead.org, ying.huang@linux.alibaba.com, jonathan.cameron@huawei.com, dan.j.williams@intel.com, linux-cxl@vger.kernel.org, minchan@kernel.org, senozhatsky@chromium.org
Subject: Re: [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems
References: <20250329110230.2459730-1-nphamcs@gmail.com> <2759fa95d0071f3c5e33a9c6369f0d0bcecd76b7@linux.dev>
In-Reply-To: <2759fa95d0071f3c5e33a9c6369f0d0bcecd76b7@linux.dev>

On Sat, Mar 29, 2025 at 07:53:23PM +0000, Yosry Ahmed wrote:
> March 29, 2025 at 1:02 PM, "Nhat Pham" wrote:
>
> > Currently, systems with CXL-based memory tiering can encounter the
> > following inversion with zswap: the coldest pages demoted to the CXL
> > tier can return to the high tier when they are zswapped out,
> > creating memory pressure on the high tier.
> >
> > This happens because zsmalloc, zswap's backend memory allocator, does
> > not enforce any memory policy. If the task reclaiming memory follows
> > the local-first policy for example, the memory requested for zswap can
> > be served by the upper tier, leading to the aforementioned inversion.
> >
> > This RFC fixes this inversion by adding a new memory allocation mode
> > for zswap (exposed through a zswap sysfs knob), intended for
> > hosts with CXL, where the memory for the compressed object is requested
> > preferentially from the same node that the original page resides on.
>
> I didn't look too closely, but why not just prefer the same node by default? Why is a knob needed?
>

Bit of an open question: does this hurt zswap performance?

And of course the begged question: Does that matter?

Probably the answer is not really and no, but nice to have the knob for
testing.  I imagine we'd drop it with the RFC tag.

> Or maybe if there's a way to tell the "tier" of the node we can prefer to allocate from the same "tier"?

In almost every system, tier=node for any sane situation, though nodes
across sockets can end up lumped into the same tier - which maybe
doesn't matter for zswap but isn't useful for almost anything else.

But maybe there's an argument for adding new tier-policies.

:think:

    enum memtier_policy {
        MEMTIER_SAME_TIER,      // get a different node from same tier
        MEMTIER_DEMOTE_ONE,     // demote one step
        MEMTIER_DEMOTE_FAR,     // demote one step away from swap
        MEMTIER_PROMOTE_ONE,    // promote one step
        MEMTIER_PROMOTE_LOCAL,  // promote to local on topology
    };

    int memtier_get_node(enum memtier_policy policy, int nid);

Might be worth investigating.  Just spitballing here.

The issue is really fallback allocations.  In most cases, we know what
we'd like to do, but when the system is under pressure the question is
what behavior do we want from these components.
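For illustration only, here is a user-space toy of what a memtier_get_node()
helper along those lines could look like.  Everything beyond the enum names
is an assumption made up for this sketch: the four-node topology, the
node_tier[] and node_local[] tables, the NO_NODE return value, and the exact
meaning given to each policy (in particular the reading of
MEMTIER_DEMOTE_FAR as "lowest tier").  It is not kernel code and does not
reflect anything in the patch series.  Note that it only names a preferred
node; what to do when that node is full is left to the caller, which is the
fallback question raised above.

```c
#include <assert.h>

/* Policy names as floated in this thread; the semantics are assumed. */
enum memtier_policy {
	MEMTIER_SAME_TIER,	/* a different node from the same tier */
	MEMTIER_DEMOTE_ONE,	/* one tier down */
	MEMTIER_DEMOTE_FAR,	/* lowest tier, read here as the last stop before swap */
	MEMTIER_PROMOTE_ONE,	/* one tier up */
	MEMTIER_PROMOTE_LOCAL,	/* top-tier node local to nid */
};

#define NR_NODES	4
#define NR_TIERS	2
#define NO_NODE		(-1)

/* Made-up topology: nodes 0,1 are DRAM (tier 0), nodes 2,3 are CXL (tier 1). */
static const int node_tier[NR_NODES] = { 0, 0, 1, 1 };
/* Made-up locality: CXL node 2 hangs off socket 0, node 3 off socket 1. */
static const int node_local[NR_NODES] = { 0, 1, 0, 1 };

/* Return the first node in @tier other than @skip, or NO_NODE. */
static int first_node_in_tier(int tier, int skip)
{
	int n;

	if (tier < 0 || tier >= NR_TIERS)
		return NO_NODE;
	for (n = 0; n < NR_NODES; n++)
		if (node_tier[n] == tier && n != skip)
			return n;
	return NO_NODE;
}

/* Pick a preferred node for @policy relative to @nid; no fallback is applied. */
int memtier_get_node(enum memtier_policy policy, int nid)
{
	int tier;

	assert(nid >= 0 && nid < NR_NODES);
	tier = node_tier[nid];

	switch (policy) {
	case MEMTIER_SAME_TIER:
		return first_node_in_tier(tier, nid);
	case MEMTIER_DEMOTE_ONE:
		return first_node_in_tier(tier + 1, NO_NODE);
	case MEMTIER_DEMOTE_FAR:
		/* Already in the lowest tier: nowhere further to go. */
		if (tier == NR_TIERS - 1)
			return NO_NODE;
		return first_node_in_tier(NR_TIERS - 1, NO_NODE);
	case MEMTIER_PROMOTE_ONE:
		return first_node_in_tier(tier - 1, NO_NODE);
	case MEMTIER_PROMOTE_LOCAL:
		return node_local[nid];
	}
	return NO_NODE;
}
```

With this made-up table, memtier_get_node(MEMTIER_SAME_TIER, 2) picks node 3,
while memtier_get_node(MEMTIER_DEMOTE_ONE, 2) returns NO_NODE because there
is no tier below CXL in the toy topology.  A caller would then have to
decide whether to stay put, promote, or go to swap.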

I'd hesitate to make a strong claim about whether zswap should/should
not fall back to a higher-tier node under system pressure without
strong data.

~Gregory