From: Nhat Pham <nphamcs@gmail.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, hughd@google.com,
yosry.ahmed@linux.dev, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
muchun.song@linux.dev, len.brown@intel.com,
chengming.zhou@linux.dev, kasong@tencent.com, chrisl@kernel.org,
huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
viro@zeniv.linux.org.uk, baohua@kernel.org, osalvador@suse.de,
lorenzo.stoakes@oracle.com, christophe.leroy@csgroup.eu,
pavel@kernel.org, kernel-team@meta.com,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-pm@vger.kernel.org, peterx@redhat.com
Subject: [RFC PATCH v2 09/18] swap: implement the swap_cgroup API using virtual swap
Date: Tue, 29 Apr 2025 16:38:37 -0700
Message-ID: <20250429233848.3093350-10-nphamcs@gmail.com>
In-Reply-To: <20250429233848.3093350-1-nphamcs@gmail.com>

Once we decouple a swap entry from its backing store via the virtual
swap layer, we can no longer statically allocate an array to hold the
swap entries' cgroup information. Store it in the swap descriptor
instead.

Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
mm/Makefile | 2 ++
mm/vswap.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/mm/Makefile b/mm/Makefile
index b7216c714fa1..35f2f282c8da 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -101,8 +101,10 @@ obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o
obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
ifdef CONFIG_SWAP
+ifndef CONFIG_VIRTUAL_SWAP
obj-$(CONFIG_MEMCG) += swap_cgroup.o
endif
+endif
obj-$(CONFIG_CGROUP_HUGETLB) += hugetlb_cgroup.o
obj-$(CONFIG_GUP_TEST) += gup_test.o
obj-$(CONFIG_DMAPOOL_TEST) += dmapool_test.o
diff --git a/mm/vswap.c b/mm/vswap.c
index 23a05c3393d8..3792fa7f766b 100644
--- a/mm/vswap.c
+++ b/mm/vswap.c
@@ -27,10 +27,14 @@
*
* @slot: The handle to the physical swap slot backing this page.
* @rcu: The RCU head to free the descriptor with an RCU grace period.
+ * @memcgid: The memcg id of the owning memcg, if any.
*/
struct swp_desc {
swp_slot_t slot;
struct rcu_head rcu;
+#ifdef CONFIG_MEMCG
+ atomic_t memcgid;
+#endif
};
/* Virtual swap space - swp_entry_t -> struct swp_desc */
@@ -122,8 +126,10 @@ static swp_entry_t vswap_alloc(int nr)
return (swp_entry_t){0};
}
- for (i = 0; i < nr; i++)
+ for (i = 0; i < nr; i++) {
descs[i]->slot.val = 0;
+ atomic_set(&descs[i]->memcgid, 0);
+ }
xa_lock(&vswap_map);
if (nr == 1) {
@@ -352,6 +358,70 @@ swp_entry_t swp_slot_to_swp_entry(swp_slot_t slot)
return entry ? (swp_entry_t){xa_to_value(entry)} : (swp_entry_t){0};
}
+#ifdef CONFIG_MEMCG
+static unsigned short vswap_cgroup_record(swp_entry_t entry,
+ unsigned short memcgid, unsigned int nr_ents)
+{
+ struct swp_desc *desc;
+ unsigned short oldid = 0, iter = 0;
+
+ XA_STATE(xas, &vswap_map, entry.val);
+
+ rcu_read_lock();
+ xas_for_each(&xas, desc, entry.val + nr_ents - 1) {
+ if (xas_retry(&xas, desc))
+ continue;
+
+ oldid = atomic_xchg(&desc->memcgid, memcgid);
+ if (!iter)
+ iter = oldid;
+ VM_WARN_ON(iter != oldid);
+ }
+ rcu_read_unlock();
+
+ return oldid;
+}
+
+void swap_cgroup_record(struct folio *folio, unsigned short memcgid,
+ swp_entry_t entry)
+{
+ unsigned short oldid =
+ vswap_cgroup_record(entry, memcgid, folio_nr_pages(folio));
+
+ VM_WARN_ON(oldid);
+}
+
+unsigned short swap_cgroup_clear(swp_entry_t entry, unsigned int nr_ents)
+{
+ return vswap_cgroup_record(entry, 0, nr_ents);
+}
+
+unsigned short lookup_swap_cgroup_id(swp_entry_t entry)
+{
+ struct swp_desc *desc;
+ unsigned short ret;
+
+ /*
+ * Note that the virtual swap slot can be freed under us, for instance in
+ * the invocation of mem_cgroup_swapin_charge_folio(). We need to wrap the
+ * entire lookup in an RCU read-side critical section, and double-check the
+ * existence of the swap descriptor.
+ */
+ rcu_read_lock();
+ desc = xa_load(&vswap_map, entry.val);
+ ret = desc ? atomic_read(&desc->memcgid) : 0;
+ rcu_read_unlock();
+ return ret;
+}
+
+int swap_cgroup_swapon(int type, unsigned long max_pages)
+{
+ return 0;
+}
+
+void swap_cgroup_swapoff(int type) {}
+#endif /* CONFIG_MEMCG */
+
int vswap_init(void)
{
swp_desc_cache = KMEM_CACHE(swp_desc, 0);
--
2.47.1