From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Matthew Wilcox, Hugh Dickins, Chris Li, David Hildenbrand, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Baolin Wang, Baoquan He, Barry Song, Kalesh Singh, Kemeng Shi, Tim Chen, Ryan Roberts, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 19/28] mm, swap: clean up and improve swap entries batch freeing
Date: Thu, 15 May 2025 04:17:19 +0800
Message-ID: <20250514201729.48420-20-ryncsn@gmail.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250514201729.48420-1-ryncsn@gmail.com>
References: <20250514201729.48420-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Kairui Song

Introduce a helper that batch-frees all contiguous entries that have
only one swap count left and no cache. The current design scans the
whole region first, then frees it only if the whole region holds the
same count; the new helper avoids that two-pass scan, batch-frees more
entries when the region is fragmented, and is more robust thanks to
added sanity checks.

Also check the swap table directly for the cache status instead of
looking at swap_map, and rename the related functions to better reflect
their usage.

This simplifies the code and prepares for follow-up commits that clean
up the freeing of swap entries even more.
Signed-off-by: Kairui Song
---
 mm/swapfile.c | 165 ++++++++++++++++++++------------------------
 1 file changed, 67 insertions(+), 98 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0a8b36ecbf08..ef233466725e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -54,14 +54,16 @@ static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
 				 unsigned char);
 static void free_swap_count_continuations(struct swap_info_struct *);
-static void swap_entries_free(struct swap_info_struct *si,
+static void swap_free_entries(struct swap_info_struct *si,
 			      struct swap_cluster_info *ci,
-			      swp_entry_t entry, unsigned int nr_pages);
+			      unsigned long start, unsigned int nr_pages);
 static void swap_range_alloc(struct swap_info_struct *si,
 			     unsigned int nr_entries);
 static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr);
-static bool swap_entries_put_map(struct swap_info_struct *si,
-				 swp_entry_t entry, int nr);
+static unsigned char swap_put_entry_locked(struct swap_info_struct *si,
+					   struct swap_cluster_info *ci,
+					   swp_entry_t entry,
+					   unsigned char usage);
 static bool folio_swapcache_freeable(struct folio *folio);
 
 static DEFINE_SPINLOCK(swap_lock);
@@ -193,25 +195,6 @@ static bool swap_only_has_cache(struct swap_info_struct *si,
 	return true;
 }
 
-static bool swap_is_last_map(struct swap_info_struct *si,
-		unsigned long offset, int nr_pages, bool *has_cache)
-{
-	unsigned char *map = si->swap_map + offset;
-	unsigned char *map_end = map + nr_pages;
-	unsigned char count = *map;
-
-	if (swap_count(count) != 1)
-		return false;
-
-	while (++map < map_end) {
-		if (*map != count)
-			return false;
-	}
-
-	*has_cache = !!(count & SWAP_HAS_CACHE);
-	return true;
-}
-
 /*
  * returns number of pages in the folio that backs the swap entry. If positive,
  * the folio was reclaimed. If negative, the folio was not reclaimed. If 0, no
@@ -1237,6 +1220,56 @@ static bool swap_alloc_slow(swp_entry_t *entry,
 	return false;
 }
 
+/*
+ * Put the ref count of entries, caller must ensure the entries'
+ * swap table count are not zero. This won't free up the swap cache.
+ */
+static bool swap_put_entries(struct swap_info_struct *si,
+			     unsigned long start, int nr)
+{
+	unsigned long offset = start, end = start + nr, cluster_end;
+	unsigned long head = SWAP_ENTRY_INVALID;
+	struct swap_cluster_info *ci;
+	bool has_cache = false;
+	unsigned int count;
+	swp_te_t swp_te;
+next_cluster:
+	ci = swap_lock_cluster(si, offset);
+	cluster_end = min(cluster_offset(si, ci) + SWAPFILE_CLUSTER, end);
+	do {
+		swp_te = __swap_table_get(ci, offset);
+		count = si->swap_map[offset];
+		if (WARN_ON_ONCE(!swap_count(count))) {
+			goto skip;
+		} else if (swp_te_is_folio(swp_te)) {
+			VM_WARN_ON_ONCE(!(count & SWAP_HAS_CACHE));
+			/* Let the swap cache (folio) handle the final free */
+			has_cache = true;
+		} else if (count == 1) {
+			/* Free up continues last ref entries in batch */
+			head = head ? head : offset;
+			continue;
+		}
+		swap_put_entry_locked(si, ci, swp_entry(si->type, offset), 1);
+skip:
+		if (head) {
+			swap_free_entries(si, ci, head, offset - head);
+			head = SWAP_ENTRY_INVALID;
+		}
+	} while (++offset < cluster_end);
+
+	if (head) {
+		swap_free_entries(si, ci, head, offset - head);
+		head = SWAP_ENTRY_INVALID;
+	}
+
+	swap_unlock_cluster(ci);
+	if (unlikely(cluster_end < end))
+		goto next_cluster;
+
+	return has_cache;
+}
+
 /**
  * folio_alloc_swap - allocate swap space for a folio
  * @folio: folio we want to move to swap
@@ -1351,7 +1384,7 @@ void folio_put_swap(struct folio *folio, struct page *subpage)
 		nr_pages = 1;
 	}
 
-	swap_entries_put_map(swp_info(entry), entry, nr_pages);
+	swap_put_entries(swp_info(entry), swp_offset(entry), nr_pages);
 }
 
 /*
@@ -1407,7 +1440,7 @@ static struct swap_info_struct *_swap_info_get(swp_entry_t entry)
 	return NULL;
 }
 
-static unsigned char swap_entry_put_locked(struct swap_info_struct *si,
+static unsigned char swap_put_entry_locked(struct swap_info_struct *si,
 					   struct swap_cluster_info *ci,
 					   swp_entry_t entry,
 					   unsigned char usage)
@@ -1438,7 +1471,7 @@ static unsigned char swap_entry_put_locked(struct swap_info_struct *si,
 	if (usage)
 		WRITE_ONCE(si->swap_map[offset], usage);
 	else
-		swap_entries_free(si, ci, entry, 1);
+		swap_free_entries(si, ci, offset, 1);
 
 	return usage;
 }
@@ -1509,70 +1542,6 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 	return NULL;
 }
 
-static bool swap_entries_put_map(struct swap_info_struct *si,
-				 swp_entry_t entry, int nr)
-{
-	unsigned long offset = swp_offset(entry);
-	struct swap_cluster_info *ci;
-	bool has_cache = false;
-	unsigned char count;
-	int i;
-
-	if (nr <= 1)
-		goto fallback;
-	count = swap_count(data_race(si->swap_map[offset]));
-	if (count != 1)
-		goto fallback;
-
-	ci = swap_lock_cluster(si, offset);
-	if (!swap_is_last_map(si, offset, nr, &has_cache)) {
-		goto locked_fallback;
-	}
-	if (!has_cache)
-		swap_entries_free(si, ci, entry, nr);
-	else
-		for (i = 0; i < nr; i++)
-			WRITE_ONCE(si->swap_map[offset + i], SWAP_HAS_CACHE);
-	swap_unlock_cluster(ci);
-
-	return has_cache;
-
-fallback:
-	ci = swap_lock_cluster(si, offset);
-locked_fallback:
-	for (i = 0; i < nr; i++, entry.val++) {
-		count = swap_entry_put_locked(si, ci, entry, 1);
-		if (count == SWAP_HAS_CACHE)
-			has_cache = true;
-	}
-	swap_unlock_cluster(ci);
-	return has_cache;
-}
-
-/*
- * Only functions with "_nr" suffix are able to free entries spanning
- * cross multi clusters, so ensure the range is within a single cluster
- * when freeing entries with functions without "_nr" suffix.
- */
-static bool swap_entries_put_map_nr(struct swap_info_struct *si,
-				    swp_entry_t entry, int nr)
-{
-	int cluster_nr, cluster_rest;
-	unsigned long offset = swp_offset(entry);
-	bool has_cache = false;
-
-	cluster_rest = SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER;
-	while (nr) {
-		cluster_nr = min(nr, cluster_rest);
-		has_cache |= swap_entries_put_map(si, entry, cluster_nr);
-		cluster_rest = SWAPFILE_CLUSTER;
-		nr -= cluster_nr;
-		entry.val += cluster_nr;
-	}
-
-	return has_cache;
-}
-
 /*
  * Check if it's the last ref of swap entry in the freeing path.
  */
@@ -1585,11 +1554,11 @@ static inline bool __maybe_unused swap_is_last_ref(unsigned char count)
  * Drop the last ref of swap entries, caller have to ensure all entries
  * belong to the same cgroup and cluster.
  */
-static void swap_entries_free(struct swap_info_struct *si,
+static void swap_free_entries(struct swap_info_struct *si,
 			      struct swap_cluster_info *ci,
-			      swp_entry_t entry, unsigned int nr_pages)
+			      unsigned long offset, unsigned int nr_pages)
 {
-	unsigned long offset = swp_offset(entry);
+	swp_entry_t entry = swp_entry(si->type, offset);
 	unsigned char *map = si->swap_map + offset;
 	unsigned char *map_end = map + nr_pages;
@@ -1622,10 +1591,10 @@ void __swap_cache_put_entries(struct swap_info_struct *si,
 			      swp_entry_t entry, unsigned int size)
 {
 	if (swap_only_has_cache(si, swp_offset(entry), size))
-		swap_entries_free(si, ci, entry, size);
+		swap_free_entries(si, ci, swp_offset(entry), size);
 	else
 		for (int i = 0; i < size; i++, entry.val++)
-			swap_entry_put_locked(si, ci, entry, SWAP_HAS_CACHE);
+			swap_put_entry_locked(si, ci, entry, SWAP_HAS_CACHE);
 }
 
 /*
@@ -1843,7 +1812,7 @@ void do_put_swap_entries(swp_entry_t entry, int nr)
 	/*
 	 * First free all entries in the range.
 	 */
-	any_only_cache = swap_entries_put_map_nr(si, entry, nr);
+	any_only_cache = swap_put_entries(swp_info(entry), swp_offset(entry), nr);
 
 	/*
 	 * Short-circuit the below loop if none of the entries had their
@@ -1917,7 +1886,7 @@ void free_swap_page_of_entry(swp_entry_t entry)
 	if (!si)
 		return;
 	ci = swap_lock_cluster(si, offset);
-	WARN_ON(swap_count(swap_entry_put_locked(si, ci, entry, 1)));
+	WARN_ON(swap_count(swap_put_entry_locked(si, ci, entry, 1)));
 	/* It might got added to swap cache accidentally by read ahead */
 	__try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
 	swap_unlock_cluster(ci);
@@ -3805,7 +3774,7 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask)
  * into, carry if so, or else fail until a new continuation page is allocated;
  * when the original swap_map count is decremented from 0 with continuation,
  * borrow from the continuation and report whether it still holds more.
- * Called while __swap_duplicate() or caller of swap_entry_put_locked()
+ * Called while __swap_duplicate() or caller of swap_put_entry_locked()
  * holds cluster lock.
  */
 static bool swap_count_continued(struct swap_info_struct *si,
-- 
2.49.0
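[Editor's note, not part of the patch] For readers who want to follow the batching
logic without the rest of the swap table series, below is a minimal, self-contained
userspace sketch of the single-pass idea used by the new swap_put_entries(): walk the
range once, accumulate a run of entries whose last reference is being dropped and that
have no cache attached, and flush the run as one batch free whenever the run is broken.
The types and helpers here (entry_state, batch_free, put_entries) are illustrative
stand-ins, not kernel APIs; locking, clusters, and the swap table are deliberately
omitted.

#include <stdbool.h>
#include <stdio.h>

/* Simplified per-entry state: a reference count plus a "has cache" flag. */
struct entry_state {
	unsigned char count;
	bool has_cache;
};

/* Stand-in for the real batched free; here it only reports the range. */
static void batch_free(unsigned long start, unsigned long nr)
{
	printf("batch free [%lu, %lu)\n", start, start + nr);
}

/*
 * Single pass over [start, start + nr): entries dropping their last
 * reference with no cache attached are accumulated into a run and freed
 * together; anything else flushes the pending run and is handled on its
 * own. Returns true if any entry still had cache attached.
 */
static bool put_entries(struct entry_state *map, unsigned long start,
			unsigned long nr)
{
	unsigned long offset, run_start = 0;
	bool in_run = false, has_cache = false;

	for (offset = start; offset < start + nr; offset++) {
		struct entry_state *e = &map[offset];

		if (e->count == 1 && !e->has_cache) {
			/* Open or extend the batch of last-reference entries. */
			if (!in_run) {
				run_start = offset;
				in_run = true;
			}
			continue;
		}

		/* Run broken: flush whatever was accumulated so far. */
		if (in_run) {
			batch_free(run_start, offset - run_start);
			in_run = false;
		}

		if (e->has_cache)
			has_cache = true;	/* final free is left to the cache side */
		if (e->count)
			e->count--;		/* ordinary reference drop */
	}

	if (in_run)
		batch_free(run_start, offset - run_start);

	return has_cache;
}

int main(void)
{
	struct entry_state map[8] = {
		{1, false}, {1, false}, {2, false}, {1, true},
		{1, false}, {1, false}, {1, false}, {3, false},
	};

	/* Expect two batched frees: [0, 2) and [4, 7). */
	put_entries(map, 0, 8);
	return 0;
}

The pending-run-and-flush structure is what lets the new helper free fragmented
ranges in batches, where the removed swap_is_last_map() pre-scan would have bailed
out to the one-entry-at-a-time fallback.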