From: Nhat Pham <nphamcs@gmail.com>
Date: Fri, 30 May 2025 09:52:42 -0700
Subject: Re: [RFC PATCH v2 00/18] Virtual Swap Space
To: YoungJun Park
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, hannes@cmpxchg.org,
 hughd@google.com, yosry.ahmed@linux.dev, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev,
 len.brown@intel.com, chengming.zhou@linux.dev, kasong@tencent.com,
 chrisl@kernel.org, huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
 viro@zeniv.linux.org.uk, baohua@kernel.org, osalvador@suse.de,
 lorenzo.stoakes@oracle.com, christophe.leroy@csgroup.eu, pavel@kernel.org,
 kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-pm@vger.kernel.org, peterx@redhat.com, gunho.lee@lge.com,
 taejoon.song@lge.com, iamjoonsoo.kim@lge.com
References: <20250429233848.3093350-1-nphamcs@gmail.com>

On Thu, May 29, 2025 at 11:47 PM YoungJun Park wrote:
>
> On Tue, Apr 29, 2025 at 04:38:28PM -0700, Nhat Pham wrote:
> > Changelog:
> > * v2:
> >   * Use a single atomic type (swap_refs) for reference counting
> >     purposes. This brings the size of the swap descriptor from 64 bytes
> >     down to 48 bytes (a 25% reduction). Suggested by Yosry Ahmed.
> >   * Zeromap bitmap is removed in the virtual swap implementation.
> >     This saves one bit per physical swapfile slot.
> >   * Rearrange the patches and the code changes to make things more
> >     reviewable. Suggested by Johannes Weiner.
> >   * Update the cover letter a bit.
>
> Hi Nhat,
>
> Thank you for sharing this patch series.
> I've read through it with great interest.
>
> I'm part of a kernel team working on features related to multi-tier
> swapping, and this patch set appears quite relevant to our ongoing
> discussions and early-stage implementation.

May I ask - what's the use case you're thinking of here? Remote swapping?

> I had a couple of questions regarding the future direction.
>
> > * Multi-tier swapping (as mentioned in [5]), with transparent
> >   transferring (promotion/demotion) of pages across tiers (see [8] and
> >   [9]). Similar to swapoff, with the old design we would need to
> >   perform the expensive page table walk.
>
> Based on the discussion in [5], it seems there was some exploration
> around enabling per-cgroup selection of multiple tiers.
> Do you envision the current design evolving in a similar direction
> to those past discussions, or is there a different direction you're
> aiming for?

IIRC, that past design focused on the interface aspect of the problem, but
never actually touched the mechanism needed to implement a multi-tier
swapping solution. The simple reason is that it's impossible, or at least
highly inefficient, to do in the current design, i.e. without virtualizing
swap.
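To make the "current design" concrete, here is a toy, self-contained model
of how a swapped-out page is identified today: the swap device ("type") and
the slot offset are packed straight into each non-present PTE. The bit
layout below is made up for illustration; the real helpers (swp_entry(),
swp_type(), swp_offset()) live in include/linux/swapops.h and arch code.

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: pretend the top 5 bits of a swap entry hold the
 * swap device index and the rest hold the slot offset. */
#define SWP_TYPE_BITS 5

typedef struct { uint64_t val; } swp_entry_t;

static swp_entry_t swp_entry(unsigned int type, uint64_t offset)
{
        swp_entry_t e = {
                .val = ((uint64_t)type << (64 - SWP_TYPE_BITS)) | offset,
        };
        return e;
}

static unsigned int swp_type(swp_entry_t e)
{
        return (unsigned int)(e.val >> (64 - SWP_TYPE_BITS));
}

static uint64_t swp_offset(swp_entry_t e)
{
        return e.val & ((1ULL << (64 - SWP_TYPE_BITS)) - 1);
}

int main(void)
{
        /* A page swapped out to swap device 1, slot 42. */
        swp_entry_t e = swp_entry(1, 42);

        /*
         * Every PTE mapping that page stores e.val directly. Moving the
         * page to a different device/slot therefore means finding and
         * rewriting every such PTE - hence the full page table walk
         * discussed below.
         */
        printf("device %u, slot %llu\n", swp_type(e),
               (unsigned long long)swp_offset(e));
        return 0;
}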
Storing the physical swap location in PTEs means that changing the swap
backend requires a full page table walk to update all the PTEs that refer
to the old physical swap location. So you have to pick your poison - either:

1. Pick your backend at swap-out time, and never change it. You might not
   have sufficient information to decide at that time, and it prevents you
   from adapting to changes in workload dynamics and the working set - the
   access frequency of pages might change, so their physical location
   should change accordingly.

2. Reserve space in every tier, and associate the reserved slots with the
   same handle. This is kinda what zswap is doing. It is space-inefficient
   and creates a lot of operational issues in production.

3. Bite the bullet and perform the page table walk. This is basically what
   swapoff does. Raise your hand if you're excited about a full page table
   walk every time you want to evict a page from zswap to disk swap. Booo.

This new design gives us an efficient way to perform tier transfers - you
need to figure out how to obtain the right to perform the transfer (for
now, through the swap cache, but you can perhaps envision some sort of
lock), and then you can simply make the change at the virtual layer.

>
> > This idea is very similar to Kairui's work to optimize the (physical)
> > swap allocator. He is currently also working on a swap redesign (see
> > [11]) - perhaps we can combine the two efforts to take advantage of
> > the swap allocator's efficiency for virtual swap.
>
> I noticed that your patch appears to be aligned with the work from Kairui.
> It seems like the overall architecture may be headed toward introducing
> a virtual swap device layer.
> I'm curious if there's already been any concrete discussion
> around this abstraction, especially regarding how it might be layered over
> multiple physical swap devices?
>
> From a naive perspective, I imagine that while today's swap devices
> are in a 1:1 mapping with physical devices,
> this virtual layer could introduce a 1:N relationship -
> one virtual swap device mapped to multiple physical ones.
> Would this virtual device behave as a new swappable block device
> exposed via `swapon`, or is the plan to abstract it differently?

That was one of the ideas I was thinking of. The problem is that this is a
very special "device", and I'm not entirely sure opting in through swapon
like that won't cause issues. Imagine the following scenario:

1. We swap on a normal swapfile.
2. Users swap things out to that swapfile.
3. The sysadmin then swapons a virtual swap device.

It will be quite nightmarish to manage things - we need to be extra
vigilant in handling a physical swap slot, for example, since it can back
either a PTE or a virtual swap slot. Also, swapoff becomes less efficient
again. And the physical swap allocator, even with the swap table change,
doesn't quite work out of the box for virtual swap yet (see [1]).

I think it's better to just keep it separate for now, and adopt elements
from Kairui's work to make virtual swap allocation more efficient. Not a
hill I will die on, though.

[1]: https://lore.kernel.org/linux-mm/CAKEwX=MmD___ukRrx=hLo7d_m1J_uG_Ke+us7RQgFUV2OSg38w@mail.gmail.com/

> Thanks again for your work,
> and I would greatly appreciate any insights you could share.
>
> Best regards,
> YoungJun Park
>
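To illustrate the tier-transfer point above, here is a rough userspace
sketch of the indirection: PTEs would hold a stable virtual slot, while a
small per-entry descriptor records which physical tier and slot currently
back it, so a transfer becomes a descriptor update instead of a page table
walk. All names here (phys_slot, swap_desc, transfer_tier) are hypothetical
and not taken from the patch series; in the real design the update would
also have to be serialized against concurrent swap-ins (e.g. through the
swap cache, as mentioned above).

#include <stdint.h>
#include <stdio.h>

/* Toy model of the virtual swap indirection; hypothetical structures. */
struct phys_slot {
        int tier;               /* e.g. 0 = zswap, 1 = disk swapfile */
        uint64_t offset;        /* slot within that tier */
};

struct swap_desc {
        uint64_t vswap;         /* stable virtual slot; what PTEs reference */
        struct phys_slot backing;
};

/*
 * Move the backing of a virtual slot to another tier. Because PTEs only
 * hold desc->vswap, no page table walk is needed - the caller just has to
 * hold whatever exclusivity the real design requires before rewriting
 * the backing.
 */
static void transfer_tier(struct swap_desc *desc, int new_tier,
                          uint64_t new_offset)
{
        desc->backing.tier = new_tier;
        desc->backing.offset = new_offset;
}

int main(void)
{
        struct swap_desc d = {
                .vswap = 1234,
                .backing = { .tier = 0, .offset = 42 }, /* starts in zswap */
        };

        transfer_tier(&d, 1, 99);       /* e.g. zswap writeback to disk */

        printf("vswap %llu now backed by tier %d, slot %llu\n",
               (unsigned long long)d.vswap, d.backing.tier,
               (unsigned long long)d.backing.offset);
        return 0;
}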