References: <20250329110230.2459730-1-nphamcs@gmail.com> <2759fa95d0071f3c5e33a9c6369f0d0bcecd76b7@linux.dev> <20250331165306.GC2110528@cmpxchg.org>
In-Reply-To: <20250331165306.GC2110528@cmpxchg.org>
From: Nhat Pham
Date: Mon, 31 Mar 2025 10:32:01 -0700
Subject: Re: [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems
To: Johannes Weiner
Cc: Yosry Ahmed, linux-mm@kvack.org, akpm@linux-foundation.org, chengming.zhou@linux.dev, sj@kernel.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, gourry@gourry.net, willy@infradead.org, ying.huang@linux.alibaba.com, jonathan.cameron@huawei.com, dan.j.williams@intel.com, linux-cxl@vger.kernel.org, minchan@kernel.org, senozhatsky@chromium.org

On Mon, Mar 31, 2025 at 9:53 AM Johannes Weiner wrote:
>
> On Sat, Mar 29, 2025 at 07:53:23PM +0000, Yosry Ahmed wrote:
> > March 29, 2025 at 1:02 PM, "Nhat Pham" wrote:
> >
> > > Currently, systems with CXL-based memory tiering can encounter the
> > > following inversion with zswap: the coldest pages demoted to the CXL
> > > tier can return to the high tier when they are zswapped out,
> > > creating memory pressure on the high tier.
> > > This happens because zsmalloc, zswap's backend memory allocator, does
> > > not enforce any memory policy. If the task reclaiming memory follows
> > > the local-first policy for example, the memory requested for zswap can
> > > be served by the upper tier, leading to the aforementioned inversion.
> > > This RFC fixes this inversion by adding a new memory allocation mode
> > > for zswap (exposed through a zswap sysfs knob), intended for
> > > hosts with CXL, where the memory for the compressed object is requested
> > > preferentially from the same node that the original page resides on.
> >
> > I didn't look too closely, but why not just prefer the same node by
> > default? Why is a knob needed?
>
> +1 It should really be the default.
>
> Even on regular NUMA setups this behavior makes more sense. Consider a
> direct reclaimer scanning nodes in order of allocation preference. If
> it ventures into remote nodes, the memory it compresses there should
> stay there. Trying to shift those contents over to the reclaiming
> thread's preferred node further *increases* its local pressure, and
> provokes more spills. The remote node is also the most likely to
> refault this data again. This is just bad for everybody.

Makes a lot of sense. I'll include this in the v2 of the patch series,
and rephrase this as a generic, NUMA system fix (with CXL as one of the
examples/motivations).

Thanks for the comment, Johannes! I'll remove this knob altogether and
make this the default behavior.

>
> > Or maybe if there's a way to tell the "tier" of the node we can
> > prefer to allocate from the same "tier"?
>
> Presumably, other nodes in the same tier would come first in the
> fallback zonelist of that node, so page_to_nid() should just work.
>
> I wouldn't complicate this until somebody has real systems where it
> does the wrong thing.
>
> My vote is to stick with page_to_nid(), but do it unconditionally.

SGTM.
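
For illustration only, here is a userspace toy model of the placement
being discussed. It is not kernel code and not the patch itself: the node
layout, the fallback table and every toy_* name are made up for this
sketch, and the page_nid argument merely stands in for what page_to_nid()
would return in the kernel. It contrasts a local-first choice with
preferring the original page's node and walking that node's fallback
order.

```c
#define NR_NODES 3	/* hypothetical layout: 0 and 1 = DRAM, 2 = CXL */

/* Toy per-node free-page counters; stand-ins for real node state. */
static int free_pages[NR_NODES];

/*
 * Hypothetical fallback order for each node, nearest first. This mirrors
 * the assumption in the thread that a node's fallback list starts with
 * the node itself and its closest neighbours.
 */
static const int fallback[NR_NODES][NR_NODES] = {
	{ 0, 1, 2 },
	{ 1, 0, 2 },
	{ 2, 1, 0 },
};

/*
 * Take one page from the first node in the preferred node's fallback
 * list that still has a free page. Returns the node used, or -1 if
 * every node is empty.
 */
static int toy_alloc_on(int preferred)
{
	for (int i = 0; i < NR_NODES; i++) {
		int nid = fallback[preferred][i];

		if (free_pages[nid] > 0) {
			free_pages[nid]--;
			return nid;
		}
	}
	return -1;
}

/* Local-first: the compressed copy lands near the reclaiming task. */
static int toy_store_local_first(int reclaimer_nid, int page_nid)
{
	(void)page_nid;		/* the page's own node is ignored */
	return toy_alloc_on(reclaimer_nid);
}

/* Same-node preference: the compressed copy lands near the original page. */
static int toy_store_same_node(int reclaimer_nid, int page_nid)
{
	(void)reclaimer_nid;	/* the reclaimer's node is ignored */
	return toy_alloc_on(page_nid);
}
```

With free pages on every node, a reclaimer on node 0 and a cold page on
node 2, toy_store_local_first(0, 2) picks node 0 (the inversion), while
toy_store_same_node(0, 2) picks node 2 and only falls back to node 1 once
node 2 has nothing left.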