From: Kemeng Shi
To: akpm@linux-foundation.org, kasong@tencent.com, nphamcs@gmail.com, bhe@redhat.com, baohua@kernel.org, chrisl@kernel.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kemeng Shi
Subject: [PATCH] mm: swap: correctly use maxpages in swapon syscall to avoid potential deadloop
Date: Fri, 18 Jul 2025 14:51:39 +0800
Message-Id: <20250718065139.61989-1-shikemeng@huaweicloud.com>

We use maxpages from read_swap_header() to initialize swap_info_struct.
However, maxpages may be reduced in setup_swap_extents(), and si->max is
then assigned the reduced maxpages from setup_swap_extents().

This wastes memory, because memory such as swap_map is allocated based
on the larger maxpages. Worse, it can lead to a potential infinite loop
(deadloop) as follows:

1) When setup_clusters() is called with the larger maxpages, the
unavailable pages in the range [si->max, larger maxpages) are not
accounted with inc_cluster_info_page(). As a result, these pages are
assumed to be available but can never be allocated. The cluster
containing these pages can be moved to the frag_clusters list once all
of its really available pages have been allocated.

2) When the cluster mentioned in 1) is the only cluster in the
frag_clusters list, cluster_alloc_swap_entry() assumes that an order-0
allocation never fails and enters an infinite loop, repeatedly trying to
allocate a page from that cluster even though it contains no page that
can actually be allocated.

Fix the issue by calling setup_swap_extents() to get the final maxpages
before initializing swap_info_struct.

After this change, span includes bad blocks and becomes a larger value,
which I think is the correct value. In summary, there are two kinds of
swapfile_activate operations:

1. Filesystem style: treat all blocks as logically contiguous and find
usable physical extents within the logical range. In this way, si->pages
is the number of actually usable physical blocks and span is
"1 + highest_block - lowest_block".
2. Block device style: treat all blocks as physically contiguous and add
only a single extent.
In this way, si->pages is left at si->max - 1 (every page except the
header page) and span is set to si->pages.

Only the block device style keeps the preset si->pages, and it sets span
to si->pages. Because setup_swap_extents() now runs before the bad pages
are subtracted from si->pages, span in the block device style becomes
larger. I think the larger value is correct because:

1. In the filesystem style, span is "1 + highest_block - lowest_block",
which covers the whole physical range, including any bad blocks.
2. In the block device style, si->pages is the number of actually usable
blocks and is already reported by pr_info(). Before this patch, span
also referred to the usable block count, which was redundant in the
pr_info() output.

Link: https://lkml.kernel.org/r/20250522122554.12209-3-shikemeng@huaweicloud.com
Fixes: 661383c6111a ("mm: swap: relaim the cached parts that got scanned")
Signed-off-by: Kemeng Shi
Reviewed-by: Baoquan He
---
v1->v2:
-Fix typo
-Add description of behavior change of "span" in git log
-Ensure si->pages == si->max - 1 after setup_swap_extents()

 mm/swapfile.c | 53 +++++++++++++++++++++++++--------------------------
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 68ce283e84be..57397434929e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si,
 	return maxpages;
 }
 
-static int setup_swap_map_and_extents(struct swap_info_struct *si,
-				      union swap_header *swap_header,
-				      unsigned char *swap_map,
-				      unsigned long maxpages,
-				      sector_t *span)
+static int setup_swap_map(struct swap_info_struct *si,
+			  union swap_header *swap_header,
+			  unsigned char *swap_map,
+			  unsigned long maxpages)
 {
-	unsigned int nr_good_pages;
 	unsigned long i;
-	int nr_extents;
-
-	nr_good_pages = maxpages - 1;	/* omit header page */
 
+	swap_map[0] = SWAP_MAP_BAD; /* omit header page */
 	for (i = 0; i < swap_header->info.nr_badpages; i++) {
 		unsigned int page_nr = swap_header->info.badpages[i];
 		if (page_nr == 0 || page_nr > swap_header->info.last_page)
 			return -EINVAL;
 		if (page_nr < maxpages) {
 			swap_map[page_nr] = SWAP_MAP_BAD;
-			nr_good_pages--;
+			si->pages--;
 		}
 	}
 
-	if (nr_good_pages) {
-		swap_map[0] = SWAP_MAP_BAD;
-		si->max = maxpages;
-		si->pages = nr_good_pages;
-		nr_extents = setup_swap_extents(si, span);
-		if (nr_extents < 0)
-			return nr_extents;
-		nr_good_pages = si->pages;
-	}
-	if (!nr_good_pages) {
+	if (!si->pages) {
 		pr_warn("Empty swap-file\n");
 		return -EINVAL;
 	}
 
-	return nr_extents;
+	return 0;
 }
 
 #define SWAP_CLUSTER_INFO_COLS						\
@@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
 	 * Mark unusable pages as unavailable. The clusters aren't
 	 * marked free yet, so no list operations are involved yet.
 	 *
-	 * See setup_swap_map_and_extents(): header page, bad pages,
+	 * See setup_swap_map(): header page, bad pages,
 	 * and the EOF part of the last cluster.
 	 */
 	inc_cluster_info_page(si, cluster_info, 0);
@@ -3363,6 +3350,21 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		goto bad_swap_unlock_inode;
 	}
 
+	si->max = maxpages;
+	si->pages = maxpages - 1;
+	nr_extents = setup_swap_extents(si, &span);
+	if (nr_extents < 0) {
+		error = nr_extents;
+		goto bad_swap_unlock_inode;
+	}
+	if (si->pages != si->max - 1) {
+		pr_err("swap:%u != (max:%u - 1)\n", si->pages, si->max);
+		error = -EINVAL;
+		goto bad_swap_unlock_inode;
+	}
+
+	maxpages = si->max;
+
 	/* OK, set up the swap map and apply the bad block list */
 	swap_map = vzalloc(maxpages);
 	if (!swap_map) {
@@ -3374,12 +3376,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (error)
 		goto bad_swap_unlock_inode;
 
-	nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map,
-						maxpages, &span);
-	if (unlikely(nr_extents < 0)) {
-		error = nr_extents;
+	error = setup_swap_map(si, swap_header, swap_map, maxpages);
+	if (error)
 		goto bad_swap_unlock_inode;
-	}
 
 	/*
 	 * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might
-- 
2.36.1