From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi,
 Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [PATCH v4 8/9] mm/shmem, swap: simplify swap entry and index calculation of large swapin
Date: Sat, 5 Jul 2025 02:17:47 +0800
Message-ID: <20250704181748.63181-9-ryncsn@gmail.com>
X-Mailer: git-send-email 2.50.0
In-Reply-To: <20250704181748.63181-1-ryncsn@gmail.com>
References: <20250704181748.63181-1-ryncsn@gmail.com>
Reply-To: Kairui Song <ryncsn@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Kairui Song <ryncsn@gmail.com>

The large shmem swapin path has already calculated the right swap value
to be used before the swap cache lookup, so simply rounding it down by
the size of the folio brought in by the swapin is enough to obtain the
swap value to be verified: a folio's swap entry is always aligned to its
size. Any kind of parallel split or race is fine, because the final
shmem_add_to_page_cache always ensures that the entries covered by the
folio are all correct, so there will be no data corruption. Nor should
this cause any increase in repeated faults: no matter how the shmem
mapping is split in parallel, as long as the mapping still contains the
right entries, the swapin will succeed.

This reduces both the final object size and the stack usage:

./scripts/bloat-o-meter mm/shmem.o.old mm/shmem.o
add/remove: 0/0 grow/shrink: 1/1 up/down: 5/-214 (-209)
Function                                     old     new   delta
shmem_read_mapping_page_gfp                  143     148      +5
shmem_swapin_folio                          4020    3806    -214
Total: Before=33478, After=33269, chg -0.62%

Stack usage (Before vs After):
shmem.c:2279:12:shmem_swapin_folio      280     static
shmem.c:2279:12:shmem_swapin_folio      264     static

Signed-off-by: Kairui Song <ryncsn@gmail.com>
---
 mm/shmem.c | 43 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 782162c0c4e0..646b1db9501c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2267,7 +2267,7 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
 	if (xas_error(&xas))
 		return xas_error(&xas);
 
-	return entry_order;
+	return 0;
 }
 
 /*
@@ -2288,7 +2288,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
 	bool skip_swapcache = false;
-	int error, nr_pages, order, split_order;
+	int error, nr_pages, order;
 	pgoff_t offset;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
@@ -2336,8 +2336,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			folio = NULL;
 			goto failed;
 		} else {
-			if (folio_test_large(folio))
-				swap = index_entry;
 			skip_swapcache = true;
 		}
 	} else {
@@ -2349,6 +2347,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			}
 		}
 	}
+
 	if (order > folio_order(folio)) {
 		/*
 		 * Swapin may get smaller folios due to various reasons:
@@ -2358,23 +2357,25 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		 * large swap entries. In such cases, we should split the
 		 * large swap entry to prevent possible data corruption.
 		 */
-		split_order = shmem_split_large_entry(inode, index, index_entry, gfp);
-		if (split_order < 0) {
-			error = split_order;
+		error = shmem_split_large_entry(inode, index, index_entry, gfp);
+		if (error)
 			goto failed_nolock;
-		}
+	}
 
-		/*
-		 * If the large swap entry has already been split, it is
-		 * necessary to recalculate the new swap entry based on
-		 * the old order alignment.
-		 */
-		if (split_order > 0) {
-			offset = index - round_down(index, 1 << split_order);
-			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
-		}
-	} else if (order < folio_order(folio)) {
-		swap.val = round_down(swap.val, 1 << folio_order(folio));
+	/*
+	 * If the folio is large, round down swap and index by folio size.
+	 * No matter what race occurs, the swap layer ensures we either get
+	 * a valid folio that has its swap entry aligned by size, or a
+	 * temporarily invalid one which we'll abort very soon and retry.
+	 *
+	 * shmem_add_to_page_cache ensures the whole range contains expected
+	 * entries and prevents any corruption, so any race split is fine
+	 * too, it will succeed as long as the entries are still there.
+	 */
+	nr_pages = folio_nr_pages(folio);
+	if (nr_pages > 1) {
+		swap.val = round_down(swap.val, nr_pages);
+		index = round_down(index, nr_pages);
 	}
 
 	/* We have to do this with folio locked to prevent races */
@@ -2389,7 +2390,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		goto failed;
 	}
 	folio_wait_writeback(folio);
-	nr_pages = folio_nr_pages(folio);
 
 	/*
 	 * Some architectures may have to restore extra metadata to the
@@ -2403,8 +2403,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		goto failed;
 	}
 
-	error = shmem_add_to_page_cache(folio, mapping,
-					round_down(index, nr_pages),
+	error = shmem_add_to_page_cache(folio, mapping, index,
 					swp_to_radix_entry(swap), gfp);
 	if (error)
 		goto failed;
-- 
2.50.0