From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lance Yang
To: akpm@linux-foundation.org, david@redhat.com, 21cnbao@gmail.com
Cc: baolin.wang@linux.alibaba.com, chrisl@kernel.org, ioworker0@gmail.com,
	kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
	ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
	huang.ying.caritas@gmail.com, zhengtangquan@oppo.com, riel@surriel.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, harry.yoo@oracle.com,
	mingzhe.yang@ly.com, Barry Song, Lance Yang
Subject: [PATCH 1/1] mm/rmap: make folio unmap batching safe and support partial batches
Date: Fri, 27 Jun 2025 10:52:14 +0800
Message-ID: <20250627025214.30887-1-lance.yang@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Lance Yang

As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
can read past the end of a PTE table if a large folio is mapped starting
at the last entry of that table.

So let's fix the out-of-bounds read by refactoring the logic into a new
helper, folio_unmap_pte_batch(). The new helper now correctly calculates
the safe number of pages to scan by limiting the operation to the
boundaries of the current VMA and the PTE table.

In addition, the "all-or-nothing" batching restriction is removed to
support partial batches. The reference counting is also cleaned up to
use folio_put_refs().

[1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com

Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
Suggested-by: David Hildenbrand
Suggested-by: Barry Song
Signed-off-by: Lance Yang
---
 mm/rmap.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index fb63d9256f09..1320b88fab74 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1845,23 +1845,32 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
 #endif
 }
 
-/* We support batch unmapping of PTEs for lazyfree large folios */
-static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
-			struct folio *folio, pte_t *ptep)
+static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
+			struct page_vma_mapped_walk *pvmw,
+			enum ttu_flags flags, pte_t pte)
 {
 	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
-	int max_nr = folio_nr_pages(folio);
-	pte_t pte = ptep_get(ptep);
+	unsigned long end_addr, addr = pvmw->address;
+	struct vm_area_struct *vma = pvmw->vma;
+	unsigned int max_nr;
+
+	if (flags & TTU_HWPOISON)
+		return 1;
+	if (!folio_test_large(folio))
+		return 1;
 
+	/* We may only batch within a single VMA and a single page table. */
+	end_addr = pmd_addr_end(addr, vma->vm_end);
+	max_nr = (end_addr - addr) >> PAGE_SHIFT;
+
+	/* We only support lazyfree batching for now ... */
 	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
-		return false;
+		return 1;
 	if (pte_unused(pte))
-		return false;
-	if (pte_pfn(pte) != folio_pfn(folio))
-		return false;
+		return 1;
 
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
-			NULL, NULL) == max_nr;
+	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
+			NULL, NULL, NULL);
 }
 
 /*
@@ -2024,9 +2033,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			if (pte_dirty(pteval))
 				folio_mark_dirty(folio);
 		} else if (likely(pte_present(pteval))) {
-			if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
-			    can_batch_unmap_folio_ptes(address, folio, pvmw.pte))
-				nr_pages = folio_nr_pages(folio);
+			nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
 
 			end_addr = address + nr_pages * PAGE_SIZE;
 			flush_cache_range(vma, address, end_addr);
@@ -2206,13 +2213,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			hugetlb_remove_rmap(folio);
 		} else {
 			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
-			folio_ref_sub(folio, nr_pages - 1);
 		}
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_drain_local();
-		folio_put(folio);
-		/* We have already batched the entire folio */
-		if (nr_pages > 1)
+		folio_put_refs(folio, nr_pages);
+
+		/*
+		 * If we are sure that we batched the entire folio and cleared
+		 * all PTEs, we can just optimize and stop right here.
+		 */
+		if (nr_pages == folio_nr_pages(folio))
 			goto walk_done;
 		continue;
 walk_abort:
-- 
2.49.0
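For illustration, the boundary clamp the new helper performs can be modeled in plain userspace C. This is a sketch, not kernel code: the `ex_`-prefixed names, the 4 KiB page / 2 MiB PMD constants, and the reimplementation of pmd_addr_end() are assumptions chosen to mirror a common x86-64-like layout.

```c
#include <assert.h>

/* Illustrative constants: 4 KiB pages, 2 MiB PMD regions (512 PTEs/table). */
#define EX_PAGE_SHIFT	12
#define EX_PMD_SIZE	(1UL << 21)
#define EX_PMD_MASK	(~(EX_PMD_SIZE - 1))

/* Userspace model of the kernel's pmd_addr_end(): the end of the PMD
 * region containing 'addr', clamped to 'end' (here: the VMA end). */
static unsigned long ex_pmd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + EX_PMD_SIZE) & EX_PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Safe number of PTEs to scan starting at 'addr': never cross the
 * page-table (PMD) boundary or the end of the VMA. This models the
 * clamp folio_unmap_pte_batch() applies before folio_pte_batch(). */
static unsigned int ex_max_batch_ptes(unsigned long addr, unsigned long vma_end)
{
	return (ex_pmd_addr_end(addr, vma_end) - addr) >> EX_PAGE_SHIFT;
}
```

With these constants, a large folio mapped at the last slot of a PTE table (say addr 0x3ff000 in a VMA ending at 0x800000) clamps the batch to 1 entry, so the scan can no longer run off the end of the table; a folio mapped at a PMD-aligned address with room left in the VMA still allows a full 512-entry batch.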
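The refcount cleanup is a pure consolidation: dropping nr_pages - 1 references and then the final one with folio_put() is arithmetically the same as dropping nr_pages references at once. A toy model with a plain integer counter (the `ex_` helpers are stand-ins, not the kernel's folio API):

```c
#include <assert.h>

/* Toy counter standing in for the folio refcount. */
static void ex_ref_sub(int *ref, int n)  { *ref -= n; } /* folio_ref_sub()  */
static void ex_put(int *ref)             { *ref -= 1; } /* folio_put()      */
static void ex_put_refs(int *ref, int n) { *ref -= n; } /* folio_put_refs() */

/* Old sequence: folio_ref_sub(folio, nr_pages - 1); folio_put(folio); */
static int ex_old_path(int ref, int nr_pages)
{
	ex_ref_sub(&ref, nr_pages - 1);
	ex_put(&ref);
	return ref;
}

/* New sequence: folio_put_refs(folio, nr_pages); */
static int ex_new_path(int ref, int nr_pages)
{
	ex_put_refs(&ref, nr_pages);
	return ref;
}
```

Both paths release the same total, including the nr_pages == 1 case (a partial batch of one PTE), which is why the single call also subsumes the old unbatched folio_put().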