Date: Thu, 15 May 2025 11:31:10 +0200
From: Klara Modin <klarasmodin@gmail.com>
To: Kairui Song
Cc: linux-mm@kvack.org, Andrew Morton, Matthew Wilcox, Hugh Dickins,
	Chris Li, David Hildenbrand, Yosry Ahmed, "Huang, Ying", Nhat Pham,
	Johannes Weiner, Baolin Wang, Baoquan He, Barry Song, Kalesh Singh,
	Kemeng Shi, Tim Chen, Ryan Roberts, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 11/28] mm, swap: clean up and consolidate helper for mTHP swapin check
References: <20250514201729.48420-1-ryncsn@gmail.com>
	<20250514201729.48420-12-ryncsn@gmail.com>
In-Reply-To: <20250514201729.48420-12-ryncsn@gmail.com>

Hi,

On 2025-05-15 04:17:11 +0800, Kairui Song wrote:
> From: Kairui Song
>
> Move all mTHP swapin check into can_swapin_thp and use it for both pre
> IO check and post IO check. This way the code is more consolidated and
> make later commit easier to maintain.

From what I can see, can_swapin_thp is gated behind
CONFIG_TRANSPARENT_HUGEPAGE and this fails to build when it's not
enabled.
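Maybe a stub in the corresponding #else branch would be enough so that
the do_swap_page() hunk below still builds without THP? A minimal,
untested sketch, assuming the new signature from this patch:

#else /* !CONFIG_TRANSPARENT_HUGEPAGE */
/* Without THP, swap-in never uses large folios; always fall back. */
static inline bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep,
				  unsigned long addr, unsigned int nr_pages)
{
	return false;
}
#endif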
>
> Also clean up the comments while at it. The current comment of
> non_swapcache_batch is not correct: swap in bypassing swap cache won't
> reach the swap device as long as the entry is cached, because it still
> sets the SWAP_HAS_CACHE flag. If the folio is already in swap cache, raced
> swap in will either fail due to -EEXIST with swapcache_prepare, or see the
> cached folio.
>
> The real reason this non_swapcache_batch is needed is that if a smaller
> folio is in the swap cache but not mapped, mTHP swapin will be blocked
> forever as it won't see the folio due to index offset, nor it can set the
> SWAP_HAS_CACHE bit, so it has to fallback to order 0 swap in.
>
> Signed-off-by: Kairui Song
> ---
>  mm/memory.c | 90 ++++++++++++++++++++++++-----------------
>  1 file changed, 41 insertions(+), 49 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index f2897d9059f2..1b6e192de6ec 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4319,12 +4319,6 @@ static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
>  	pgoff_t offset = swp_offset(entry);
>  	int i;
>  
> -	/*
> -	 * While allocating a large folio and doing swap_read_folio, which is
> -	 * the case the being faulted pte doesn't have swapcache. We need to
> -	 * ensure all PTEs have no cache as well, otherwise, we might go to
> -	 * swap devices while the content is in swapcache.
> -	 */
>  	for (i = 0; i < max_nr; i++) {
>  		if ((si->swap_map[offset + i] & SWAP_HAS_CACHE))
>  			return i;
> @@ -4334,34 +4328,30 @@ static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
>  }
>  
>  /*
> - * Check if the PTEs within a range are contiguous swap entries
> - * and have consistent swapcache, zeromap.
> + * Check if the page table is still suitable for large folio swap in.
> + * @vmf: The fault triggering the swap-in.
> + * @ptep: Pointer to the PTE that should be the head of the swap in folio.
> + * @addr: The address corresponding to the PTE.
> + * @nr_pages: Number of pages of the folio that suppose to be swapped in.
>   */
> -static bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages)
> +static bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep,
> +			   unsigned long addr, unsigned int nr_pages)
>  {
> -	unsigned long addr;
> -	swp_entry_t entry;
> -	int idx;
> -	pte_t pte;
> -
> -	addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
> -	idx = (vmf->address - addr) / PAGE_SIZE;
> -	pte = ptep_get(ptep);
> +	pte_t pte = ptep_get(ptep);
> +	unsigned long addr_end = addr + (PAGE_SIZE * nr_pages);
> +	unsigned long pte_offset = (vmf->address - addr) / PAGE_SIZE;
>  
> -	if (!pte_same(pte, pte_move_swp_offset(vmf->orig_pte, -idx)))
> +	VM_WARN_ON_ONCE(!IS_ALIGNED(addr, PAGE_SIZE) ||
> +			addr > vmf->address || addr_end <= vmf->address);
> +	if (unlikely(addr < max(addr & PMD_MASK, vmf->vma->vm_start) ||
> +		     addr_end > pmd_addr_end(addr, vmf->vma->vm_end)))
>  		return false;
> -	entry = pte_to_swp_entry(pte);
> -	if (swap_pte_batch(ptep, nr_pages, pte) != nr_pages)
> -		return false;
> -
>  	/*
> -	 * swap_read_folio() can't handle the case a large folio is hybridly
> -	 * from different backends. And they are likely corner cases. Similar
> -	 * things might be added once zswap support large folios.
> +	 * All swap entries must from the same swap device, in same
> +	 * cgroup, with same exclusiveness, only differs in offset.
>  	 */
> -	if (unlikely(swap_zeromap_batch(entry, nr_pages, NULL) != nr_pages))
> -		return false;
> -	if (unlikely(non_swapcache_batch(entry, nr_pages) != nr_pages))
> +	if (!pte_same(pte, pte_move_swp_offset(vmf->orig_pte, -pte_offset)) ||
> +	    swap_pte_batch(ptep, nr_pages, pte) != nr_pages)
>  		return false;
>  
>  	return true;
> @@ -4441,13 +4431,24 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>  	 * completely swap entries with contiguous swap offsets.
>  	 */
>  	order = highest_order(orders);
> -	while (orders) {
> -		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
> -		if (can_swapin_thp(vmf, pte + pte_index(addr), 1 << order))
> -			break;
> -		order = next_order(&orders, order);
> +	for (; orders; order = next_order(&orders, order)) {
> +		unsigned long nr_pages = 1 << order;
> +		swp_entry_t swap_entry = { .val = ALIGN_DOWN(entry.val, nr_pages) };
> +		addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
> +		if (!can_swapin_thp(vmf, pte + pte_index(addr), addr, nr_pages))
> +			continue;
> +		/*
> +		 * If there is already a smaller folio in cache, it will
> +		 * conflict with the larger folio in the swap cache layer
> +		 * and block the swap in.
> +		 */
> +		if (unlikely(non_swapcache_batch(swap_entry, nr_pages) != nr_pages))
> +			continue;
> +		/* Zero map doesn't work with large folio yet. */
> +		if (unlikely(swap_zeromap_batch(swap_entry, nr_pages, NULL) != nr_pages))
> +			continue;
> +		break;
>  	}
> -
>  	pte_unmap_unlock(pte, ptl);
>  
>  	/* Try allocating the highest of the remaining orders. */
> @@ -4731,27 +4732,18 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	page_idx = 0;
>  	address = vmf->address;
>  	ptep = vmf->pte;
> +
>  	if (folio_test_large(folio) && folio_test_swapcache(folio)) {
> -		int nr = folio_nr_pages(folio);
> +		unsigned long nr = folio_nr_pages(folio);
>  		unsigned long idx = folio_page_idx(folio, page);
> -		unsigned long folio_start = address - idx * PAGE_SIZE;
> -		unsigned long folio_end = folio_start + nr * PAGE_SIZE;
> -		pte_t *folio_ptep;
> -		pte_t folio_pte;
> +		unsigned long folio_address = address - idx * PAGE_SIZE;
> +		pte_t *folio_ptep = vmf->pte - idx;
>  
> -		if (unlikely(folio_start < max(address & PMD_MASK, vma->vm_start)))
> -			goto check_folio;
> -		if (unlikely(folio_end > pmd_addr_end(address, vma->vm_end)))
> -			goto check_folio;
> -
> -		folio_ptep = vmf->pte - idx;
> -		folio_pte = ptep_get(folio_ptep);
> -		if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) ||
> -		    swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
> +		if (!can_swapin_thp(vmf, folio_ptep, folio_address, nr))
>  			goto check_folio;

At this point we're outside CONFIG_TRANSPARENT_HUGEPAGE.

>  
>  		page_idx = idx;
> -		address = folio_start;
> +		address = folio_address;
>  		ptep = folio_ptep;
>  		nr_pages = nr;
>  		entry = folio->swap;
> --
> 2.49.0
>

Regards,
Klara Modin