From: Nhat Pham <nphamcs@gmail.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, hughd@google.com,
	yosry.ahmed@linux.dev, mhocko@kernel.org, roman.gushchin@linux.dev,
	shakeel.butt@linux.dev, muchun.song@linux.dev, len.brown@intel.com,
	chengming.zhou@linux.dev, kasong@tencent.com, chrisl@kernel.org,
	huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
	viro@zeniv.linux.org.uk, baohua@kernel.org, osalvador@suse.de,
	lorenzo.stoakes@oracle.com, christophe.leroy@csgroup.eu,
	pavel@kernel.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, linux-pm@vger.kernel.org, peterx@redhat.com
Subject: [RFC PATCH v2 15/18] vswap: support THP swapin and batch
	free_swap_and_cache
Date: Tue, 29 Apr 2025 16:38:43 -0700
Message-ID: <20250429233848.3093350-16-nphamcs@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20250429233848.3093350-1-nphamcs@gmail.com>
References: <20250429233848.3093350-1-nphamcs@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
This patch implements the required functionality for THP swapin and batched
free_swap_and_cache() in the virtual swap space design. The central
requirement is that the range of entries we are working with must have no
mixed backing states:

1. For now, zswap-backed entries are not supported for these batched
   operations.
2. All the entries in the range must be backed by the same type.
3. If the swap entries in the batch are backed by an in-memory folio, it
   must be the same folio (i.e. they correspond to the subpages of that
   folio).
4. If the swap entries in the batch are backed by slots on swapfiles, it
   must be the same swapfile, and these physical swap slots must also be
   contiguous.
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 include/linux/swap.h |  6 +++
 mm/internal.h        | 14 +------
 mm/memory.c          | 16 ++++++--
 mm/vswap.c           | 91 +++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 110 insertions(+), 17 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a65b22de4cdd..c5a16f1ca376 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -773,6 +773,7 @@ bool vswap_folio_backed(swp_entry_t entry, int nr);
 void vswap_store_folio(swp_entry_t entry, struct folio *folio);
 void swap_zeromap_folio_set(struct folio *folio);
 void vswap_assoc_zswap(swp_entry_t entry, struct zswap_entry *zswap_entry);
+bool vswap_can_swapin_thp(swp_entry_t entry, int nr);
 #else /* CONFIG_VIRTUAL_SWAP */
 static inline int vswap_init(void)
 {
@@ -839,6 +840,11 @@ static inline void vswap_assoc_zswap(swp_entry_t entry,
 		struct zswap_entry *zswap_entry)
 {
 }
+
+static inline bool vswap_can_swapin_thp(swp_entry_t entry, int nr)
+{
+	return true;
+}
 #endif /* CONFIG_VIRTUAL_SWAP */
 
 static inline bool trylock_swapoff(swp_entry_t entry,
diff --git a/mm/internal.h b/mm/internal.h
index 51061691a731..6694e7a14745 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -268,14 +268,7 @@ static inline swp_entry_t swap_nth(swp_entry_t entry, long n)
 	return (swp_entry_t) { entry.val + n };
 }
 
-/* temporary disallow batched swap operations */
-static inline swp_entry_t swap_move(swp_entry_t entry, long delta)
-{
-	swp_entry_t next_entry;
-
-	next_entry.val = 0;
-	return next_entry;
-}
+swp_entry_t swap_move(swp_entry_t entry, long delta);
 #else
 static inline swp_entry_t swap_nth(swp_entry_t entry, long n)
 {
@@ -344,8 +337,6 @@ static inline pte_t pte_next_swp_offset(pte_t pte)
  * max_nr must be at least one and must be limited by the caller so scanning
  * cannot exceed a single page table.
  *
- * Note that for virtual swap space, we will not batch anything for now.
- *
  * Return: the number of table entries in the batch.
  */
 static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
@@ -360,9 +351,6 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
 	VM_WARN_ON(!is_swap_pte(pte));
 	VM_WARN_ON(non_swap_entry(entry));
 
-	if (IS_ENABLED(CONFIG_VIRTUAL_SWAP))
-		return 1;
-
 	cgroup_id = lookup_swap_cgroup_id(entry);
 	while (ptep < end_ptep) {
 		pte = ptep_get(ptep);
diff --git a/mm/memory.c b/mm/memory.c
index d9c382a5e157..b0b23348d9be 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4230,10 +4230,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 	 * A large swapped out folio could be partially or fully in zswap. We
 	 * lack handling for such cases, so fallback to swapping in order-0
 	 * folio.
-	 *
-	 * We also disable THP swapin on the virtual swap implementation, for now.
 	 */
-	if (!zswap_never_enabled() || IS_ENABLED(CONFIG_VIRTUAL_SWAP))
+	if (!zswap_never_enabled())
 		goto fallback;
 
 	entry = pte_to_swp_entry(vmf->orig_pte);
@@ -4423,6 +4421,18 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		}
 		need_clear_cache = true;
 
+		/*
+		 * Recheck to make sure the entire range is still
+		 * THP-swapin-able. Note that before we call
+		 * swapcache_prepare(), entries in the range can
+		 * still have their backing status changed.
+		 */
+		if (IS_ENABLED(CONFIG_VIRTUAL_SWAP) &&
+		    !vswap_can_swapin_thp(entry, nr_pages)) {
+			schedule_timeout_uninterruptible(1);
+			goto out_page;
+		}
+
 		mem_cgroup_swapin_uncharge_swap(entry, nr_pages);
 
 		shadow = get_shadow_from_swap_cache(entry);
diff --git a/mm/vswap.c b/mm/vswap.c
index c51ff5c54480..4aeb144921b8 100644
--- a/mm/vswap.c
+++ b/mm/vswap.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include "internal.h"
 #include "swap.h"
 
 /*
@@ -984,7 +985,7 @@ void swap_zeromap_folio_set(struct folio *folio)
  *
  * Note that this check is racy unless we can ensure that the entire range
  * has their backing state stable - for instance, if the caller was the one
- * who set the in_swapcache flag of the entire field.
+ * who set the swap cache pin.
  */
 static int vswap_check_backing(swp_entry_t entry, enum swap_type *type, int nr)
 {
@@ -1067,6 +1068,94 @@ bool vswap_folio_backed(swp_entry_t entry, int nr)
 		&& type == VSWAP_FOLIO;
 }
 
+/**
+ * vswap_can_swapin_thp - check if the swap entries can be swapped in as a THP.
+ * @entry: the first virtual swap slot in the range.
+ * @nr: the number of slots in the range.
+ *
+ * For now, we can only swap in a THP if the entire range is zero-filled, or if
+ * the entire range is backed by a contiguous range of physical swap slots on a
+ * swapfile.
+ */
+bool vswap_can_swapin_thp(swp_entry_t entry, int nr)
+{
+	enum swap_type type;
+
+	return vswap_check_backing(entry, &type, nr) == nr &&
+		(type == VSWAP_ZERO || type == VSWAP_SWAPFILE);
+}
+
+/**
+ * swap_move - increment the swap slot by delta, checking the backing state and
+ *             return 0 if the backing state does not match (i.e wrong backing
+ *             state type, or wrong offset on the backing stores).
+ * @entry: the original virtual swap slot.
+ * @delta: the offset to increment the original slot.
+ *
+ * Note that this function is racy unless we can pin the backing state of these
+ * swap slots down with swapcache_prepare().
+ *
+ * Caller should only rely on this function as a best-effort hint otherwise,
+ * and should double-check after ensuring the whole range is pinned down.
+ *
+ * Return: the incremented virtual swap slot if the backing state matches, or
+ *         0 if the backing state does not match.
+ */
+swp_entry_t swap_move(swp_entry_t entry, long delta)
+{
+	struct swp_desc *desc, *next_desc;
+	swp_entry_t next_entry;
+	bool invalid = true;
+	struct folio *folio;
+	enum swap_type type;
+	swp_slot_t slot;
+
+	next_entry.val = entry.val + delta;
+
+	rcu_read_lock();
+	desc = xa_load(&vswap_map, entry.val);
+	next_desc = xa_load(&vswap_map, next_entry.val);
+
+	if (!desc || !next_desc) {
+		rcu_read_unlock();
+		return (swp_entry_t){0};
+	}
+
+	read_lock(&desc->lock);
+	if (desc->type == VSWAP_ZSWAP) {
+		read_unlock(&desc->lock);
+		goto rcu_unlock;
+	}
+
+	type = desc->type;
+	if (type == VSWAP_FOLIO)
+		folio = desc->folio;
+
+	if (type == VSWAP_SWAPFILE)
+		slot = desc->slot;
+	read_unlock(&desc->lock);
+
+	read_lock(&next_desc->lock);
+	if (next_desc->type != type)
+		goto next_unlock;
+
+	if (type == VSWAP_SWAPFILE &&
+	    (swp_slot_type(next_desc->slot) != swp_slot_type(slot) ||
+	     swp_slot_offset(next_desc->slot) !=
+			swp_slot_offset(slot) + delta))
+		goto next_unlock;
+
+	if (type == VSWAP_FOLIO && next_desc->folio != folio)
+		goto next_unlock;
+
+	invalid = false;
+next_unlock:
+	read_unlock(&next_desc->lock);
+rcu_unlock:
+	rcu_read_unlock();
+	return invalid ? (swp_entry_t){0} : next_entry;
+}
+
 /*
  * Return the count of contiguous swap entries that share the same
  * VSWAP_ZERO status as the starting entry. If is_zeromap is not NULL,
-- 
2.47.1