From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 65F1CC54E90 for ; Sun, 25 May 2025 17:09:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C7F0C6B007B; Sun, 25 May 2025 13:09:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C30D26B0082; Sun, 25 May 2025 13:09:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B45B16B0083; Sun, 25 May 2025 13:09:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 94CBA6B007B for ; Sun, 25 May 2025 13:09:18 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 0CFD75DD66 for ; Sun, 25 May 2025 17:09:18 +0000 (UTC) X-FDA: 83482065996.05.284BD56 Received: from mail-lj1-f169.google.com (mail-lj1-f169.google.com [209.85.208.169]) by imf30.hostedemail.com (Postfix) with ESMTP id 0E95F80005 for ; Sun, 25 May 2025 17:09:15 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=h2bCQBVk; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf30.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.208.169 as permitted sender) smtp.mailfrom=ryncsn@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1748192956; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=gCupfkLLBRXyvgezOvk1m5uJBgj9564oL6KNWBNiFkc=; b=hUixjXeu1DgVBO/5F8P23N/RehidgtvXZ7OPMm/+ylrt+TeLDO747dNXymxR7WSd28xEKh wbtw8bQJ5hLOj+Ffv9gjQI3rX2oyweFcItCx+gGpHweudrnm8MB2a6A3OdlxU3ejaym5IZ vtxj7tmbZ5IiVfASG1fRhPQ6xS97/aw= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1748192956; a=rsa-sha256; cv=none; b=4dwXzoYolqFxV+thK5tHVLY4Qde8B+udo+znKY5rRr1oySE3nkO2RjL0bXaz26zsnzZ2wR bhjY9XOJWAXdkXTpQYKmgBCsz/Oj31YRjBdRmzAXvJTMFrh7aZHEC0u/N6dfr/VWvObRc1 Vuw28kH4RHmJJGMn3Y65Y3M9rFZR87g= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=h2bCQBVk; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf30.hostedemail.com: domain of ryncsn@gmail.com designates 209.85.208.169 as permitted sender) smtp.mailfrom=ryncsn@gmail.com Received: by mail-lj1-f169.google.com with SMTP id 38308e7fff4ca-32928704a28so18262661fa.1 for ; Sun, 25 May 2025 10:09:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1748192954; x=1748797754; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=gCupfkLLBRXyvgezOvk1m5uJBgj9564oL6KNWBNiFkc=; b=h2bCQBVkbTRL0j5nRKVBKVqHoH2/ISNnfpakWGPzrRBVD+Ry6kmQqUmgXc9erWU70l CEWcoKg1lILWUDaM+9mrzNxAWXGg4eDgiijIF31tnGFpzaM1JKTtCOJq9FDjZPiz997/ H9puXb5cgPzYJ8H2EQj60SUWR3nuSmt+g/F/sOv72VImo3vHs9HuHR0wfFYhyQ7EM7zy lPOS7ZtFEaGpXp+QkmUAdkEYZSV7AJ8zB1XGKNBx+2+zljxHV+TPNQTY5H8L8nG3x8ZJ x5TxuY7AvHMUHjejsLmOpihDgppfwson2B6YKtqC9Z5/jnwVeZSf2dqjKssZ75ZpXHip Nr1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1748192954; x=1748797754; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gCupfkLLBRXyvgezOvk1m5uJBgj9564oL6KNWBNiFkc=; b=drdLCLNpChPEamH6chTHb6T1jjPBla0esgLTSst0l64DwZ0/pOov5Uo8gAiBaKDrdc O5UC9rnNUH9a3c5wBzodWEwWyHUf7QzM4oPwjlJtF/VD3NzM7P/GdGaNIMRGBi6cIojz wOOpfQHROU0o8AdQB5YlxXtEwyY5ZYb2CBxlKMJnVFoWuENMBRzA17S8PD4nmRW/joht ApI/IEM3PfSHgC8xT61t1KrLyPGO/j2liba0KVjSPELftEI2367en5paPJKK2/fG/oda PBUyMSMZ8i5QKP5sFJySPZxCzPAsfFNrJm8X6DFc56f27sYPWGm1MGGsWlkhpCa07Jc4 1FBg== X-Forwarded-Encrypted: i=1; AJvYcCUJbSsU/zMBVpkC3isHKf80K0WrcIWfNRfow5SzKKUTIJ1Hj4vgGEnJoF5+6FEPAPPisVZ+HCge3A==@kvack.org X-Gm-Message-State: AOJu0YwKVp+Lqww/GdtCwlIoGcbKhs64yRLV9B/1W5yX7kPJrAadUUn5 SZnJVIyaWer48TtxJ6vT1r2Rdu9L2LGUsPRzxTR4Gb/OUNNeFEruh4cpLDP+YIpEopSA5gcVp6h 7cWLysJgWlH94If7LO3F79EbudOVPm7n9hhO6 X-Gm-Gg: ASbGnctGM5YzZ6Yt2tmthUbsG0uQMtoEUPL7tSczhCav7iuAfxfyEJ/w+K6Hu2pUMpn XOZ1LtBrqIk27I7wAxsCvIROTHI3vqQK1nKhhwV3qYBNQnfMQYaPMSyos00a4pIi5PzigLWJzd3 TPJCWICcZ+rGUGk/vrliVUn+Tgc/K49MkN X-Google-Smtp-Source: AGHT+IFqpNFA9RLARunHpODR6agkcc/Fqy/z67dPDN2eW1f1qIOvdHUMZ4vppO5dzOIZ1Ov5h7+WWThr6lGEwY6atmQ= X-Received: by 2002:a05:651c:210c:b0:30b:d17b:269a with SMTP id 38308e7fff4ca-3295b9aea2fmr14116971fa.7.1748192954065; Sun, 25 May 2025 10:09:14 -0700 (PDT) MIME-Version: 1.0 References: <20250522122554.12209-1-shikemeng@huaweicloud.com> <20250522122554.12209-3-shikemeng@huaweicloud.com> In-Reply-To: <20250522122554.12209-3-shikemeng@huaweicloud.com> From: Kairui Song Date: Mon, 26 May 2025 01:08:56 +0800 X-Gm-Features: AX0GCFufTR1qMhjI3981pwKPLYMy0g5s9XrhAvw9WjaIlvBoxWEgOqJ3JrPida4 Message-ID: Subject: Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop To: Kemeng Shi Cc: akpm@linux-foundation.org, bhe@redhat.com, hannes@cmpxchg.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 0E95F80005 X-Stat-Signature: k7qhb3qc1rmsjqqyniknsoignjmc3afa X-Rspam-User: X-HE-Tag: 1748192955-560688 X-HE-Meta: U2FsdGVkX19bhYeF4MdEzGmfI/vZN+G2ESChvFQRPsQgebviBzxQ1ETyhTOhK6LIH8X2DR1rlQ8Mbtzsw4Bk9ZVtnmo1lLkdYODYPpUrxSU6EiN3ykXgAapla9zsk56HUG4xl2rg+HgWmbyJkDpJzbEvqDhqltc1RsLBYTlBY7TGqtpa862l9VftWL48xefu+4v2mhYrtqMlxWA8iuXDW0Csu3v0A0yydcxhxlfXXd1Qk+hzkTp+zHk5UfBwJxkPmvQqXGtiGm61qjvh6jGriGDa5MKFp5D7uHB9eK0B35xt4DVRQtBoTZGMd7FhZiJmCOlAPUPPxKJ0EZh7cq2pyKFAMdGSOceZVs2kbv+dBfOghxqOtOM8srZmTACygwF8ps//oVcDJcg3QLmhBplDfnX2jqrBWYXcJ0AA5ADSo7XpyPR949o4obqW6+fz9bxWTSZXaoK4+4Ps1S64gJ0fqvTDz7aHBYhwP041y58m694TVwXOLxfV8k530wteW/NN8/xQakW9DoUqiPS5U5Pb3Wc6jn5NVOQ8P1bHJpZ1XERHbKajDYJlC4D0GvxOV2Xpn6i5Leg2EdFH0WyYjzlOqF6f7lFLLaBGeEHndZkXBBLp9JAjG15LJbwhvXvUEg2brkfwJx9ZhCCqgXoJW17tlR/YlnO/8qxq1RW0dr7EkvMv6TO9Iu9xrAmgkMekfG8d8nwjVV9DiXfMujFIC7ArsiXpMk6HB3iPu0iULs181EKOBE1wo1ZnFGsmAvsneLBmAt6XWimRxtQedarP3zlV2RDJVkoxv09zXZY7xeRkoovgEiQ59zGEcw+AmuCB2clBsCMZj+WsuDX1VeKiPx2III7RHGPmuo0Auj6Vr8wiY/9OJN+PY1tAMUwg00K4etV4bxuH2fC9MG7E8g2Rbhzmy5s5n2SinI9nn4aEKkSJW4d+FkEppTqa8JmZlXVl8j+3Y0fmHfp5z/eVQVmXLl3 udG8k9wA ZBVXNAWr/f/hkPCwvHG2NrOuu3bbeV0ROT9BAeUK+zEnp8oD2n0FC9qLcXHus22/xDGhkF4PKZR184V3Ll42ycFvAqZWau1clmLT7iOSqJc/bKEZ/FjRKTysO3TvsQSYnx8ft7dRO44/hJVKOhmTFIJBlBERJF3WkwD6LLrOi8/rVwBZafh/57byj+6wV08Q1NbV/q11oeI+2Azf0p4SLB27Dqr46VIDNzkxV3Idwea8kag5MbbmlG38GZghAS6/KmftPOapo0R6E1376wMDgvpABYnY5qiT1DrARclxdVDaZaDK+sUl/haXnZnsVgT/CGtRClVr1EbllnU0QK4Rgayh0sM5XZbAV9Vsn8h3+aqlXT2k= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, May 22, 2025 at 11:32=E2=80=AFAM Kemeng Shi wrote: > > We use maxpages from read_swap_header() to initialize swap_info_struct, > however the maxpages might be reduced in setup_swap_extents() and the > si->max is assigned with the reduced maxpages from the > setup_swap_extents(). > > Obviously, this could lead to memory waste as we allocated memory based o= n > larger maxpages, besides, this could lead to a potensial deadloop as > following: > 1) When calling setup_clusters() with larger maxpages, unavailable pages > within range [si->max, larger maxpages) are not accounted with > inc_cluster_info_page(). As a result, these pages are assumed available > but can not be allocated. The cluster contains these pages can be moved > to frag_clusters list after it's all available pages were allocated. > 2) When the cluster mentioned in 1) is the only cluster in frag_clusters > list, cluster_alloc_swap_entry() assume order 0 allocation will never > failed and will enter a deadloop by keep trying to allocate page from the > only cluster in frag_clusters which contains no actually available page. > > Call setup_swap_extents() to get the final maxpages before swap_info_stru= ct > initialization to fix the issue. > > Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned= ") > > Signed-off-by: Kemeng Shi > --- > mm/swapfile.c | 47 ++++++++++++++++++++--------------------------- > 1 file changed, 20 insertions(+), 27 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 75b69213c2e7..a82f4ebefca3 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap= _info_struct *si, > return maxpages; > } > > -static int setup_swap_map_and_extents(struct swap_info_struct *si, > - union swap_header *swap_header, > - unsigned char *swap_map, > - unsigned long maxpages, > - sector_t *span) > +static int setup_swap_map(struct swap_info_struct *si, > + union swap_header *swap_header, > + unsigned char *swap_map, > + unsigned long maxpages) > { > - unsigned int nr_good_pages; > unsigned long i; > - int nr_extents; > - > - nr_good_pages =3D maxpages - 1; /* omit header page */ > > + swap_map[0] =3D SWAP_MAP_BAD; /* omit header page */ > for (i =3D 0; i < swap_header->info.nr_badpages; i++) { > unsigned int page_nr =3D swap_header->info.badpages[i]; > if (page_nr =3D=3D 0 || page_nr > swap_header->info.last_= page) > return -EINVAL; > if (page_nr < maxpages) { > swap_map[page_nr] =3D SWAP_MAP_BAD; > - nr_good_pages--; > + si->pages--; > } > } > > - if (nr_good_pages) { > - swap_map[0] =3D SWAP_MAP_BAD; > - si->max =3D maxpages; > - si->pages =3D nr_good_pages; > - nr_extents =3D setup_swap_extents(si, span); > - if (nr_extents < 0) > - return nr_extents; > - nr_good_pages =3D si->pages; > - } > - if (!nr_good_pages) { > + if (!si->pages) { > pr_warn("Empty swap-file\n"); > return -EINVAL; > } > > > - return nr_extents; > + return 0; > } > > #define SWAP_CLUSTER_INFO_COLS \ > @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(str= uct swap_info_struct *si, > * Mark unusable pages as unavailable. The clusters aren't > * marked free yet, so no list operations are involved yet. > * > - * See setup_swap_map_and_extents(): header page, bad pages, > + * See setup_swap_map(): header page, bad pages, > * and the EOF part of the last cluster. > */ > inc_cluster_info_page(si, cluster_info, 0); > @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, speci= alfile, int, swap_flags) > goto bad_swap_unlock_inode; > } > > + si->max =3D maxpages; > + si->pages =3D maxpages - 1; > + nr_extents =3D setup_swap_extents(si, &span); > + if (nr_extents < 0) { > + error =3D nr_extents; > + goto bad_swap_unlock_inode; > + } > + maxpages =3D si->max; There seems to be a trivial problem here, previously the si->pages will be seen by swap_activate after bad blocks have been counted and si->pages means the actual available slots. But now si->pages will be seen by swap_active as `maxpages - 1`. One current side effect now is the span value will not be updated properly so the pr_info in swap on may print a larger value, if the swap header contains badblocks and swapfile is on nfs/cifs. This should not be a problem but it's better to mention or add comments about it. And I think it's better to add a sanity check here to check if si->pages still equal to si->max - 1, setup_swap_map_and_extents / setup_swap_map assumes the header section was already counted. This also helps indicate the setup_swap_extents may shrink and modify these two values. BTW, I was thinking that we should get rid of the whole extents design after the swap table series is ready, so mTHP allocation will be usable for swap over fs too. > /* OK, set up the swap map and apply the bad block list */ > swap_map =3D vzalloc(maxpages); > if (!swap_map) { > @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, speci= alfile, int, swap_flags) > if (error) > goto bad_swap_unlock_inode; > > - nr_extents =3D setup_swap_map_and_extents(si, swap_header, swap_m= ap, > - maxpages, &span); > - if (unlikely(nr_extents < 0)) { > - error =3D nr_extents; > + error =3D setup_swap_map(si, swap_header, swap_map, maxpages); > + if (error) > goto bad_swap_unlock_inode; > - } > > /* > * Use kvmalloc_array instead of bitmap_zalloc as the allocation = order might > -- > 2.30.0 > Other than that: Reviewed-by: Kairui Song