From: Nhat Pham <nphamcs@gmail.com>
Date: Sun, 1 Jun 2025 14:08:22 -0700
Subject: Re: [RFC PATCH v2 00/18] Virtual Swap Space
To: YoungJun Park
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, hannes@cmpxchg.org,
	hughd@google.com, yosry.ahmed@linux.dev, mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev,
	len.brown@intel.com, chengming.zhou@linux.dev, kasong@tencent.com,
	chrisl@kernel.org, huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
	viro@zeniv.linux.org.uk, baohua@kernel.org, osalvador@suse.de,
	lorenzo.stoakes@oracle.com, christophe.leroy@csgroup.eu, pavel@kernel.org,
	kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-pm@vger.kernel.org, peterx@redhat.com, gunho.lee@lge.com,
	taejoon.song@lge.com, iamjoonsoo.kim@lge.com
References: <20250429233848.3093350-1-nphamcs@gmail.com>
On Sun, Jun 1, 2025 at 5:56 AM YoungJun Park wrote:
>
> On Fri, May 30, 2025 at 09:52:42AM -0700, Nhat Pham wrote:
> > On Thu, May 29, 2025 at 11:47 PM YoungJun Park wrote:
> > >
> > > On Tue, Apr 29, 2025 at 04:38:28PM -0700, Nhat Pham wrote:
> > > > Changelog:
> > > > * v2:
> > > >   * Use a single atomic type (swap_refs) for reference counting
> > > >     purposes. This brings the size of the swap descriptor from 64 KB
> > > >     down to 48 KB (25% reduction). Suggested by Yosry Ahmed.
> > > >   * Zeromap bitmap is removed in the virtual swap implementation.
> > > >     This saves one bit per physical swapfile slot.
> > > >   * Rearrange the patches and the code changes to make things more
> > > >     reviewable. Suggested by Johannes Weiner.
> > > >   * Update the cover letter a bit.
> > >
> > > Hi Nhat,
> > >
> > > Thank you for sharing this patch series.
> > > I've read through it with great interest.
> > >
> > > I'm part of a kernel team working on features related to multi-tier
> > > swapping, and this patch set appears quite relevant to our ongoing
> > > discussions and early-stage implementation.
> >
> > May I ask - what's the use case you're thinking of here? Remote swapping?
> >
>
> Yes, that's correct.
> Our usage scenario includes remote swap,
> and we're experimenting with assigning swap tiers per cgroup
> in order to improve performance in specific scenarios on our target device.

Hmm, that can be a start. Right now we essentially have only two swap
tiers, so memory.(z)swap.max and memory.zswap.writeback are usually
sufficient to describe the tiering interface. But if you have an
alternative use case in mind, feel free to send an RFC to explore this!

> We've explored several approaches and PoCs around this,
> and in the process of evaluating whether our direction could
> eventually be aligned with the upstream kernel,
> I came across your patchset and wanted to ask whether
> similar efforts have been discussed or attempted before.

I think it has occasionally been touched upon in discussions, but AFAICS
there has not really been an actual upstream patch adding such an
interface.
>
> > > I had a couple of questions regarding the future direction.
> > >
> > > > * Multi-tier swapping (as mentioned in [5]), with transparent
> > > >   transferring (promotion/demotion) of pages across tiers (see [8]
> > > >   and [9]). Similar to swapoff, with the old design we would need
> > > >   to perform the expensive page table walk.
> > >
> > > Based on the discussion in [5], it seems there was some exploration
> > > around enabling per-cgroup selection of multiple tiers.
> > > Do you envision the current design evolving in a similar direction
> > > to those past discussions, or is there a different direction you're
> > > aiming for?
> >
> > IIRC, that past design focused on the interface aspect of the problem,
> > but never actually touched the mechanism to implement a multi-tier
> > swapping solution.
> >
> > The simple reason is that it's impossible, or at least highly
> > inefficient, to do it in the current design, i.e. without
> > virtualizing swap. Storing
>
> As you pointed out, there are certainly inefficiencies
> in supporting this use case with the current design,
> but if there is a valid use case,
> I believe there's room for it to be supported in the current model
> - possibly in a less optimized form -
> until a virtual swap device becomes available
> and provides a more efficient solution.
> What do you think?

Which less optimized form are you thinking of?

>
> > the physical swap location in PTEs means that changing the swap
> > backend requires a full page table walk to update all the PTEs that
> > refer to the old physical swap location. So you have to pick your
> > poison - either:
> > 1. Pick your backend at swap-out time, and never change it. You might
> > not have sufficient information to decide at that time. It prevents
> > you from adapting to changes in workload dynamics and working set -
> > the access frequency of pages might change, so their physical location
> > should change accordingly.
> >
> > 2. Reserve the space in every tier, and associate the reservations
> > with the same handle. This is kinda what zswap is doing. It is
> > space-inefficient, and creates a lot of operational issues in
> > production.
> >
> > 3. Bite the bullet and perform the page table walk. This is what
> > swapoff is doing, basically. Raise your hands if you're excited about
> > a full page table walk every time you want to evict a page from zswap
> > to disk swap. Booo.
> >
> > This new design will give us an efficient way to perform tier
> > transfers - you need to figure out how to obtain the right to perform
> > the transfer (for now, through the swap cache - but you can perhaps
> > envision some sort of locks), and then you can simply make the change
> > at the virtual layer.
> >
>
> One idea that comes to mind is whether the backend swap tier for
> a page could be lazily adjusted at runtime - either reactively
> or via an explicit interface - before the tier changes.
> Alternatively, if it's preferable to leave pages untouched
> when the tier configuration changes at runtime,
> perhaps we could consider making this behavior configurable as well.
>

I don't quite understand - could you expand on this?