From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 5/7] mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
Date: Fri, 27 Jun 2025 14:20:18 +0800
Message-ID: <20250627062020.534-6-ryncsn@gmail.com>
X-Mailer: git-send-email 2.50.0
In-Reply-To: <20250627062020.534-1-ryncsn@gmail.com>
References: <20250627062020.534-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Kairui Song <ryncsn@gmail.com>

Currently, if a THP swapin fails for reasons such as a partially
conflicting swap cache or ZSWAP being enabled, it falls back to cached
swapin. The swap cache has a non-trivial overhead, and readahead is not
helpful for SWP_SYNCHRONOUS_IO devices, so we should always skip the
readahead and swap cache, even when the swapin falls back to order 0.
Handle the fallback logic directly instead of going through the cached
read path.

As a side effect, slightly tweak the behavior when the WARN_ON is
triggered (the shmem mapping is corrupted, or the code is buggy): just
return -EINVAL. This should be OK, as things are already wrong beyond
recovery at that point.

Signed-off-by: Kairui Song <ryncsn@gmail.com>
---
 mm/shmem.c | 68 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 30 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 5be9c905396e..5f2641fd1be7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1975,13 +1975,15 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
 	return ERR_PTR(error);
 }
 
-static struct folio *shmem_swap_alloc_folio(struct inode *inode,
+static struct folio *shmem_swapin_direct(struct inode *inode,
 		struct vm_area_struct *vma, pgoff_t index,
 		swp_entry_t entry, int order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	int nr_pages = 1 << order;
 	struct folio *new;
+	pgoff_t offset;
+	gfp_t swap_gfp;
 	void *shadow;
 
 	/*
@@ -1989,6 +1991,7 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 	 * limit chance of success with further cpuset and node constraints.
 	 */
 	gfp &= ~GFP_CONSTRAINT_MASK;
+	swap_gfp = gfp;
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
 		if (WARN_ON_ONCE(order))
 			return ERR_PTR(-EINVAL);
@@ -2003,20 +2006,23 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 		if ((vma && unlikely(userfaultfd_armed(vma))) ||
 		    !zswap_never_enabled() ||
 		    non_swapcache_batch(entry, nr_pages) != nr_pages) {
-			return ERR_PTR(-EINVAL);
+			goto fallback;
 		} else {
-			gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
+			swap_gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
 		}
 	}
-
-	new = shmem_alloc_folio(gfp, order, info, index);
-	if (!new)
-		return ERR_PTR(-ENOMEM);
+retry:
+	new = shmem_alloc_folio(swap_gfp, order, info, index);
+	if (!new) {
+		new = ERR_PTR(-ENOMEM);
+		goto fallback;
+	}
 
 	if (mem_cgroup_swapin_charge_folio(new, vma ? vma->vm_mm : NULL,
-					   gfp, entry)) {
+					   swap_gfp, entry)) {
 		folio_put(new);
-		return ERR_PTR(-ENOMEM);
+		new = ERR_PTR(-ENOMEM);
+		goto fallback;
 	}
 
 	/*
@@ -2045,6 +2051,17 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
 	folio_add_lru(new);
 	swap_read_folio(new, NULL);
 	return new;
+fallback:
+	/* Order 0 swapin failed, nothing to fallback to, abort */
+	if (!order)
+		return new;
+	/* High order swapin failed, fallback to order 0 and retry */
+	order = 0;
+	nr_pages = 1;
+	swap_gfp = gfp;
+	offset = index - round_down(index, nr_pages);
+	entry = swp_entry(swp_type(entry), swp_offset(entry) + offset);
+	goto retry;
 }
 
 /*
@@ -2243,7 +2260,6 @@ static int shmem_split_swap_entry(struct inode *inode, pgoff_t index,
 		cur_order = split_order;
 		split_order = xas_try_split_min_order(split_order);
 	}
-
 unlock:
 	xas_unlock_irq(&xas);
 
@@ -2306,34 +2322,26 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			count_vm_event(PGMAJFAULT);
 			count_memcg_event_mm(fault_mm, PGMAJFAULT);
 		}
-
-		/* Skip swapcache for synchronous device. */
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
-			folio = shmem_swap_alloc_folio(inode, vma, index, swap, order, gfp);
-			if (!IS_ERR(folio)) {
+			/* Direct mTHP swapin without swap cache or readahead */
+			folio = shmem_swapin_direct(inode, vma, index,
+						    swap, order, gfp);
+			if (IS_ERR(folio)) {
+				error = PTR_ERR(folio);
+				folio = NULL;
+			} else {
 				skip_swapcache = true;
-				goto alloced;
 			}
-
+		} else {
 			/*
-			 * Fallback to swapin order-0 folio unless the swap entry
-			 * already exists.
+			 * Order 0 swapin using swap cache and readahead, it
+			 * may return order > 0 folio due to raced swap cache
 			 */
-			error = PTR_ERR(folio);
-			folio = NULL;
-			if (error == -EEXIST)
-				goto failed;
+			folio = shmem_swapin_cluster(swap, gfp, info, index);
 		}
-
-		/* Here we actually start the io */
-		folio = shmem_swapin_cluster(swap, gfp, info, index);
-		if (!folio) {
-			error = -ENOMEM;
+		if (!folio)
 			goto failed;
-		}
 	}
-
-alloced:
 	/*
 	 * We need to split an existing large entry if swapin brought in a
 	 * smaller folio due to various of reasons.
-- 
2.50.0