Date: Thu, 3 Apr 2025 12:38:19 -0400
From: Johannes Weiner
To: Nhat Pham
Cc: akpm@linux-foundation.org, yosry.ahmed@linux.dev,
	chengming.zhou@linux.dev, sj@kernel.org, linux-mm@kvack.org,
	kernel-team@meta.com, linux-kernel@vger.kernel.org, gourry@gourry.net,
	ying.huang@linux.alibaba.com, jonathan.cameron@huawei.com,
	dan.j.williams@intel.com, linux-cxl@vger.kernel.org,
	minchan@kernel.org, senozhatsky@chromium.org
Subject: Re: [PATCH v2] zsmalloc: prefer the the original page's node for compressed data
Message-ID: <20250403163819.GA368504@cmpxchg.org>
References: <20250402204416.3435994-1-nphamcs@gmail.com>
In-Reply-To: <20250402204416.3435994-1-nphamcs@gmail.com>

On Wed, Apr 02, 2025 at 01:44:16PM -0700, Nhat Pham wrote:
> Currently, zsmalloc, zswap's and zram's backend memory allocator, does
> not enforce any policy for the allocation of memory for the compressed
> data, instead just adopting the memory policy of the task entering
> reclaim, or the default policy (prefer local node) if no such policy is
> specified. This can lead to several pathological behaviors in
> multi-node NUMA systems:
>
> 1. Systems with CXL-based memory tiering can encounter the following
>    inversion with zswap/zram: the coldest pages demoted to the CXL tier
>    can return to the high tier when they are reclaimed to compressed
>    swap, creating memory pressure on the high tier.
>
> 2. Consider a direct reclaimer scanning nodes in order of allocation
>    preference. If it ventures into remote nodes, the memory it
>    compresses there should stay there. Trying to shift those contents
>    over to the reclaiming thread's preferred node further *increases*
>    its local pressure, and provokes more spills. The remote node is
>    also the most likely to refault this data again. This undesirable
>    behavior was pointed out by Johannes Weiner in [1].
>
> 3. For zswap writeback, the zswap entries are organized in
>    node-specific LRUs, based on the node placement of the original
>    pages, allowing for targeted zswap writeback for specific nodes.
>
>    However, the compressed data of a zswap entry can be placed on a
>    different node from the LRU it is placed on. This means that reclaim
>    targeted at one node might not free up memory used for zswap entries
>    in that node, and instead reclaim memory on a different node.
>
> All of these issues will be resolved if the compressed data go to the
> same node as the original page. This patch encourages this behavior by
> having zswap and zram pass the node of the original page to zsmalloc,
> and have zsmalloc prefer the specified node if we need to allocate new
> (zs)pages for the compressed data.
>
> Note that we are not strictly binding the allocation to the preferred
> node. We still allow the allocation to fall back to other nodes when
> the preferred node is full, or if we have zspages with slots available
> on a different node. This is OK, and still a strict improvement over
> the status quo:
>
> 1. On a system with demotion enabled, we will generally prefer
>    demotions over compressed swapping, and only swap when pages have
>    already gone to the lowest tier. This patch should achieve the
>    desired effect for the most part.
>
> 2. If the preferred node is out of memory, letting the compressed data
>    go to other nodes can be better than the alternative (OOMs,
>    keeping cold memory unreclaimed, disk swapping, etc.).
>
> 3. If the allocation goes to a separate node because we have a zspage
>    with slots available, at least we're not creating extra immediate
>    memory pressure (since the space is already allocated).
>
> 4. While some mixing can occur, we generally reclaim pages in
>    same-node batches, which encourages zspage grouping that is more
>    likely to go to the right node.
>
> 5. A strict binding would require partitioning zsmalloc by node, which
>    is more complicated, and more prone to regression, since it reduces
>    the storage density of zsmalloc. We need to evaluate the tradeoff
>    and benchmark carefully before adopting such an involved solution.
>
> [1]: https://lore.kernel.org/linux-mm/20250331165306.GC2110528@cmpxchg.org/
>
> Suggested-by: Gregory Price
> Signed-off-by: Nhat Pham

Acked-by: Johannes Weiner
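
For anyone following along, the "prefer, but don't bind" placement
described in the changelog can be sketched as a toy userspace model.
This is not the zsmalloc code: ToyPool, Zspage and store() are made-up
names, there is only one size class, and the fallback order (ascending
node id) is an assumption chosen to keep the example deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Zspage:
    node: int        # NUMA node backing this (hypothetical) zspage
    free_slots: int  # object slots still unused


@dataclass
class ToyPool:
    """Toy model only: a single size class with fixed slots per zspage."""
    free_pages: Dict[int, int]  # node id -> pages still available
    slots_per_zspage: int = 4
    zspages: List[Zspage] = field(default_factory=list)

    def store(self, preferred_node: int) -> int:
        """Place one compressed object; return the node it landed on."""
        # 1. Reuse any zspage that still has a free slot, even if it sits
        #    on another node: that memory is already allocated, so this
        #    adds no immediate pressure anywhere.
        for zspage in self.zspages:
            if zspage.free_slots > 0:
                zspage.free_slots -= 1
                return zspage.node
        # 2. Otherwise allocate a new zspage, trying the preferred node
        #    first and then the remaining nodes (assumed order: by id).
        fallbacks = sorted(n for n in self.free_pages if n != preferred_node)
        for node in [preferred_node, *fallbacks]:
            if self.free_pages.get(node, 0) > 0:
                self.free_pages[node] -= 1
                self.zspages.append(Zspage(node, self.slots_per_zspage - 1))
                return node
        raise MemoryError("no node has a free page")
```

With ToyPool(free_pages={0: 1, 1: 1}, slots_per_zspage=2), a store
preferring node 1 lands on node 1, the next store lands there too
regardless of preference (free slot), and only once node 1 has no
slots or pages left does a store preferring node 1 spill to node 0.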