From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9802EC369A2 for ; Tue, 8 Apr 2025 15:44:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 25C676B0085; Tue, 8 Apr 2025 11:44:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1E5036B0088; Tue, 8 Apr 2025 11:44:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 086986B0089; Tue, 8 Apr 2025 11:44:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id DCF7C6B0085 for ; Tue, 8 Apr 2025 11:44:07 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 6B3705A2E4 for ; Tue, 8 Apr 2025 15:44:08 +0000 (UTC) X-FDA: 83311297776.16.153E806 Received: from mail-qv1-f49.google.com (mail-qv1-f49.google.com [209.85.219.49]) by imf22.hostedemail.com (Postfix) with ESMTP id 815BEC0012 for ; Tue, 8 Apr 2025 15:44:06 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=JaymKttO; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf22.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.219.49 as permitted sender) smtp.mailfrom=nphamcs@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1744127046; a=rsa-sha256; cv=none; b=rXiXz2ANxenTkL0UMZdP2xsVB1p1imw5KGcjj5xV2jdxELsFWY33f9C51m89LODihDyUEw InriLgFI331LWY4riQmHiODcfF5FMHwcnC7v50Pt/yuQaFe38ihS9xQA+B5002eCZdQ6BQ CflzvnEwFQgoUS+r1gFOkODmO2JXbGw= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=JaymKttO; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf22.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.219.49 as permitted sender) smtp.mailfrom=nphamcs@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1744127046; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=reeoYvxYItdd7KUhzGsTnl24tBhKlbJEGhQp4n5yZys=; b=6l7vgP/LBvcA2tpo5anZVu1VqHS1+6J0XYr+syuB3tqmQqqtIBlHbzCiDVdrcV3Zuf42Nm 3XgfmqF3cC8XhhO39PmOydsnxdiryh3SyHM6L+gk0PxrMZm1U45ImlQyAJr6xp1lNMkIGw HOnqh+pkGdA2FxUMPrwTRFhnkNqWpXE= Received: by mail-qv1-f49.google.com with SMTP id 6a1803df08f44-6eeb7589db4so60205116d6.1 for ; Tue, 08 Apr 2025 08:44:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1744127045; x=1744731845; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=reeoYvxYItdd7KUhzGsTnl24tBhKlbJEGhQp4n5yZys=; b=JaymKttOaH7JJL4woZ426mHkfChoLh4wx8oUP2eIQPT8fNrRr0DCJo67q2PPUMNYhQ cXQfX8KUNijBMJRTLDCd3RK2f0w5avSiiNCJRWMQNHQ7xUbEY2xHuZ5H5g008ALeq8Ix ogUeOVe28efD/WnFUtXQ4WNVpH+oikunzefRk3v8kU7wS5eAqL3qJ4P1BXOvrJx15Dfr U63ANBPCJtROTLbklmjxKsl/xCVBC7bV8NGTS9piV62JeaXSv32NdPEe6TW0o+YNLVr8 AoM5Zeunvi3lyE+Uc7kOgLAiVj69V98p0p1mrDHtUoNIsAV7dsrLz5R9b85W4/FXSQAz qdOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744127045; x=1744731845; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=reeoYvxYItdd7KUhzGsTnl24tBhKlbJEGhQp4n5yZys=; b=iozeHaXJDaOKhWFVWAclMLU3ta/sKCbQK1wlDQ9/j5zsMs4NBovJOGUl103vbTj7ld ddvhEwex19URthqJ2KYjC4opKag5lbNXra6iRUjTGJ6xmTQiZaVuIee2kn833ypemYjR Wb+S9vO7sxUNOaLepvxwL0yMLBcuPtRcKBI3wDR0WS9LaAbnAgJgQRyGj+omjb0EY6Dp pDOoqW517rxHuM7ZEkB+q7GfrbHxyvcJNkUPAcjcp/NTdWeD27GzdPddrAVp3JBAh3xW 9PTAgqQtCwqoKdtfYrYNdWCaSUPf4wUImp0fgVik1V2cP6/fdzy03Z1imyLHr+tDHKfu wHSw== X-Gm-Message-State: AOJu0Yy5i3XXos+v1AaKiv/jQfRCGY/zElvmR7kLPyGzWB88lOwRGCzR dMvSMrT9smAKsUdhWclRN9crS1Y8sMMaDXlR4t9p8tyU2FHPqc/YjJ3DTK3fXmTn4Dy9NQUVPX+ 3eQDoFk0CqQHb6WonmBILAQF+8kQ= X-Gm-Gg: ASbGnctdXS/wYjWIHqK5wQfYSGpwRovbMC0Cum46ruBJqhfMi7UVN+ApJjqwuum2G9Z YJgT5KzjYxW7t8/xWAWct+XuD5l3d0xw27BOY1Ksjv39EcgQwguwGHUklUnwHzuaxYf51HKXqdr D092chXjImxr7bzXMC/QMiJ4r6S35c0u9d2vR45n74Eh7Rye4xv0lW7Q12pQ== X-Google-Smtp-Source: AGHT+IF5WXdcFK/qrko60GZWMJ9z/lrRCbPJa3p0lyBkNeVsUG2gDnLUBctnRZfwUhghIHT3QEEGbxR/fcwxhONO0xY= X-Received: by 2002:a05:6214:20ce:b0:6ed:1da2:afb9 with SMTP id 6a1803df08f44-6f05848f418mr344645726d6.19.1744127045531; Tue, 08 Apr 2025 08:44:05 -0700 (PDT) MIME-Version: 1.0 References: <20250407234223.1059191-1-nphamcs@gmail.com> <20250407234223.1059191-5-nphamcs@gmail.com> <20250408150019.GB816@cmpxchg.org> In-Reply-To: From: Nhat Pham Date: Tue, 8 Apr 2025 08:43:54 -0700 X-Gm-Features: ATxdqUGz4vsP2W5hwYJQzhrTKN5NxXh5yEJOebR2aaMkEy7r_wMz7Iwxi5kHDXM Message-ID: Subject: Re: [RFC PATCH 04/14] mm: swap: swap cache support for virtualized swap To: Johannes Weiner Cc: linux-mm@kvack.org, akpm@linux-foundation.org, hughd@google.com, yosry.ahmed@linux.dev, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, len.brown@intel.com, chengming.zhou@linux.dev, kasong@tencent.com, chrisl@kernel.org, huang.ying.caritas@gmail.com, ryan.roberts@arm.com, viro@zeniv.linux.org.uk, baohua@kernel.org, osalvador@suse.de, lorenzo.stoakes@oracle.com, christophe.leroy@csgroup.eu, pavel@kernel.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-pm@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 815BEC0012 X-Stat-Signature: 99dqrwmkmpnc3rpk8zi664k3do5whine X-HE-Tag: 1744127046-200792 X-HE-Meta: U2FsdGVkX19JXTjPNXKagSEo/UYmyag0Y4MxFY+M3jqYkiDbBxPHicVmmH8ccvwwhIic+UpYyCm85VV09SIH4axBt5GWCvDsr+CQpWaW+KxONzSlRTwHhW7sBWbpQR3xX0dvog9Zb9V60FeWGcG6Ujo79v0ntRZ1zEGoE39uWfDPXrmqwDp7lWDsyu7pp90TC/t152908Pg/dJ3ODat7YDInzaYoqL0B4WBagfqAd9l4SIPMke3W6dlLbVeOlsk3ROodZesRjuGL3QpG8LGpLoddnPuLkl0z2rzLa73h90yvqZevwX0kBupppRvr+ipz13POxTyizSvrjJCzNthBO6EUeeXVpsvhoDMZkcOZer78O0jVo9ERnqpes6EMHpPN5JG4bieZmhcklbISCWLGCt8mpMViPm7ODBtBdzZqexNioLXqz9E3v5HMcHBVaWg5UdrAjYOGWilwRAEvPxm2XwiZ4P0Kc2MUN+IDJUFqxl+zD67z7AYcfNypQc7Db8tSViDmCCmSXxrfNgOX1aBq6wiQd0rNCQ700mcSeMWR6+wR/azcRugnF3rG/1diEWRp1caDAKkRdZbTXU19fVWo7SmbSFEaok3Sn6lJRZnA1qUQu219dGzl+wnRqivm101QA60pjIRSIfXEXRCqEo3X5oW+mG8YEZrxH0TEFcT/k04DFLjXEEDjtViq5NUvDJCm4ymj5juyuDECLSO7QyDlmr5yMQ1qHuQ8nf9f4FxqI3qP+0nMW6+j4QQAWfwLMtcq8cFtTGTOZelxPPIlHBfKMS2IMN6P7vywbL5CdUoASORlhs/76ZUOB0gdwmT16RJjftEHtFwcjCgVfKa9VtH35KOZZtcNtTtAi2JqqjJa8qz3OOvSH1/F+Q47oKnsVSFRlr4TIX0L5JLRcnJw8LzFe9gfWtuSx743ZnffRiirJHvQfJuZF3RRVRspsB0CMrhO1CxpuU5Mtp3FUmXsK0x +UTQXMxu opYvUl+OZDrlmZYXi1PrA/hH3F8nYzZS46AvPe8L8kQdGudiGgrK4fHJiMP2U5Eh2O2INJSd2lbeayFkcZiDjw6DXXr1p4SaYG62xBEQZRsJ88duCLThcNMGlcr2rJzyfbtNjJoqKIYT0nk9LYMvqPmNicHL/DYF0DULcOzqVKLoO/s2u35kP8WQ4VKxCOzqeeHBdYXkMdVBivBYxyRb99oai6bVLuEAkoc9RSHVw4XC5MwHeMyK5v4EaDA2snO/2yXHb/5TQ8zhjHmxmYKhDYVP5rAAe8EHeHzOmt0EsH9kn+aBacKASfMCOfUTjOFO4HC+6xKdHU0QJD4kI2QKxpGYvsdn2IUX8qoyEZ0wQFeSjvou4CePfI4so/10FpXLhoybd6f5sZXwhvZK5Qg3Hh4YFoMp/dS2n5ohVZLuLk3UsimFfXdi/jvqTD7AwjgwdOCVpcVNFGupJ3TwOJiEPscz5zH0a7NRP6KlT X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Apr 8, 2025 at 8:34=E2=80=AFAM Nhat Pham wrote: > > On Tue, Apr 8, 2025 at 8:00=E2=80=AFAM Johannes Weiner wrote: > > > > On Mon, Apr 07, 2025 at 04:42:05PM -0700, Nhat Pham wrote: > > > Currently, the swap cache code assumes that the swap space is of a fi= xed > > > size. The virtual swap space is dynamically sized, so the existing > > > partitioning code cannot be easily reused. A dynamic partitioning is > > > planned, but for now keep the design simple and just use a flat > > > swapcache for vswap. > > > > > > Since the vswap's implementation has begun to diverge from the old > > > implementation, we also introduce a new build config > > > (CONFIG_VIRTUAL_SWAP). Users who do not select this config will get t= he > > > old implementation, with no behavioral change. > > > > > > Signed-off-by: Nhat Pham > > > --- > > > mm/Kconfig | 13 ++++++++++ > > > mm/swap.h | 22 ++++++++++------ > > > mm/swap_state.c | 68 +++++++++++++++++++++++++++++++++++++++++------= -- > > > 3 files changed, 85 insertions(+), 18 deletions(-) > > > > > > diff --git a/mm/Kconfig b/mm/Kconfig > > > index 1b501db06417..1a6acdb64333 100644 > > > --- a/mm/Kconfig > > > +++ b/mm/Kconfig > > > @@ -22,6 +22,19 @@ menuconfig SWAP > > > used to provide more virtual memory than the actual RAM prese= nt > > > in your computer. If unsure say Y. > > > > > > +config VIRTUAL_SWAP > > > + bool "Swap space virtualization" > > > + depends on SWAP > > > + default n > > > + help > > > + When this is selected, the kernel is built with the new= swap > > > + design. This will allow us to decouple the swap backend= s > > > + (zswap, on-disk swapfile, etc.), and save disk space wh= en we > > > + use zswap (or the zero-filled swap page optimization). > > > + > > > + There might be more lock contentions with heavy swap us= e, since > > > + the swap cache is no longer range partitioned. > > > + > > > config ZSWAP > > > bool "Compressed cache for swap pages" > > > depends on SWAP > > > diff --git a/mm/swap.h b/mm/swap.h > > > index d5f8effa8015..06e20b1d79c4 100644 > > > --- a/mm/swap.h > > > +++ b/mm/swap.h > > > @@ -22,22 +22,27 @@ void swap_write_unplug(struct swap_iocb *sio); > > > int swap_writepage(struct page *page, struct writeback_control *wbc)= ; > > > void __swap_writepage(struct folio *folio, struct writeback_control = *wbc); > > > > > > -/* linux/mm/swap_state.c */ > > > -/* One swap address space for each 64M swap space */ > > > +/* Return the swap device position of the swap slot. */ > > > +static inline loff_t swap_slot_pos(swp_slot_t slot) > > > +{ > > > + return ((loff_t)swp_slot_offset(slot)) << PAGE_SHIFT; > > > +} > > > > In the same vein as the previous email, please avoid mixing moves, > > renames and new code as much as possible. This makes it quite hard to > > follow what's going on. > > > > I think it would be better if you structure the series as follows: > > > > 1. Prep patches. Separate patches for moves, renames, new code. > > > > 3. mm: vswap > > - config VIRTUAL_SWAP > > - mm/vswap.c with skeleton data structures, init/exit, Makefile hook= up > > > > 4. (temporarily) flatten existing address spaces > > > > IMO you can do the swapcache and zswap in one patch > > > > 5+. conversion patches > > > > Grow mm/vswap.c as you add discrete components like the descriptor > > allocator, swapoff locking, the swap_cgroup tracker etc. > > > > You're mostly doing this part already. But try to order them by > > complexity and on a "core to periphery" gradient. I.e. swapoff > > locking should probably come before cgroup stuff. > > > > Insert move and rename patches at points where they make the most > > sense. I.e. if they can be understood in the current upstream code > > already, put them with step 1 prep patches. If you find a move or a > > rename can only be understood in the context of one of the components, > > put them in a prep patch right before that one. > > Makes sense, yeah! I'll try to avoid mixing moves/renames/new code as > much as I can. > > > > > > @@ -260,6 +269,28 @@ void delete_from_swap_cache(struct folio *folio) > > > folio_ref_sub(folio, folio_nr_pages(folio)); > > > } > > > > > > +#ifdef CONFIG_VIRTUAL_SWAP > > > +void clear_shadow_from_swap_cache(int type, unsigned long begin, > > > + unsigned long end) > > > +{ > > > + swp_slot_t slot =3D swp_slot(type, begin); > > > + swp_entry_t entry =3D swp_slot_to_swp_entry(slot); > > > + unsigned long index =3D swap_cache_index(entry); > > > + struct address_space *address_space =3D swap_address_space(entr= y); > > > + void *old; > > > + XA_STATE(xas, &address_space->i_pages, index); > > > + > > > + xas_set_update(&xas, workingset_update_node); > > > + > > > + xa_lock_irq(&address_space->i_pages); > > > + xas_for_each(&xas, old, entry.val + end - begin) { > > > + if (!xa_is_value(old)) > > > + continue; > > > + xas_store(&xas, NULL); > > > + } > > > + xa_unlock_irq(&address_space->i_pages); > > > > I don't think you need separate functions for this, init, exit etc. if > > you tweak the macros to resolve to one tree. The current code already > > works if swapfiles are smaller than SWAP_ADDRESS_SPACE_PAGES and there > > is only one tree, after all. > > For clear_shadow_from_swap_cache(), I think I understand what you want > - keep clear_shadow_from_swap_cache() the same for two > implementations, but at caller sites, have the callers themselves > determine the range in swap cache (i.e (begin, end)). > > I'm a bit confused with init and exit, but I assume there is a way to > do it for them as well. > > I will note though, that it might increase the number of ifdefs > sections (or alternatively, IS_ENABLED() checks), because these > functions are called in different contexts for the two > implementations: > > 1. init and exit are called in swapon/swapoff in the old > implementation. They are called in swap initialization in the virtual > swap implementation. > > 2. Similarly, we clear swap cache shadows when we free physical swap > slots in the old implementation, and when we free virtual swap slots > in the new implementation, > > I think it is good actually, because it makes the difference explicit > rather than implicit. Also, it helps us know exactly which code block > to target when we unify the two implementations :) Just putting it out > there. Actually, I think I was confused. At this stage, we have no real difference in the implementations yet - it's purely single tree vs multiple trees. So you're right - we shouldn't even need two implementations of the code... I'll fix this. > > > > > This would save a lot of duplication and keep ifdefs more confined.