Message-ID: <530101b3-34d2-49bb-9a12-c7036b0c0a69@linux.dev>
Date: Fri, 27 Jun 2025 23:29:11 +0800
MIME-Version: 1.0
Subject: Re: [PATCH v2 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap
Content-Language: en-US
From: Lance Yang <lance.yang@linux.dev>
To: David Hildenbrand, Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
 chrisl@kernel.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
 ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
 huang.ying.caritas@gmail.com, zhengtangquan@oppo.com, riel@surriel.com,
 Liam.Howlett@oracle.com, vbabka@suse.cz, harry.yoo@oracle.com,
 mingzhe.yang@ly.com, stable@vger.kernel.org, Lance Yang
References: <20250627062319.84936-1-lance.yang@linux.dev>
 <1d39b66e-4009-4143-a8fa-5d876bc1f7e7@linux.dev>
 <609409c7-91a8-4898-ab29-fa00eb36df02@redhat.com>
In-Reply-To: <609409c7-91a8-4898-ab29-fa00eb36df02@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2025/6/27 18:13, David Hildenbrand wrote:
> On 27.06.25 09:36, Barry Song wrote:
>> On Fri, Jun 27, 2025 at 7:15 PM Lance Yang wrote:
>>>
>>> On 2025/6/27 14:55, Barry Song wrote:
>>>> On Fri, Jun 27, 2025 at 6:52 PM Barry Song <21cnbao@gmail.com> wrote:
>>>>>
>>>>> On Fri, Jun 27, 2025 at 6:23 PM Lance Yang wrote:
>>>>>>
>>>>>> From: Lance Yang
>>>>>>
>>>>>> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
>>>>>> can read past the end of a PTE table if a large folio is mapped starting at
>>>>>> the last entry of that table.
>>>>>> It would be quite rare in practice, as
>>>>>> MADV_FREE typically splits the large folio ;)
>>>>>>
>>>>>> So let's fix the potential out-of-bounds read by refactoring the logic into
>>>>>> a new helper, folio_unmap_pte_batch().
>>>>>>
>>>>>> The new helper now correctly calculates the safe number of pages to scan by
>>>>>> limiting the operation to the boundaries of the current VMA and the PTE
>>>>>> table.
>>>>>>
>>>>>> In addition, the "all-or-nothing" batching restriction is removed to
>>>>>> support partial batches. The reference counting is also cleaned up to use
>>>>>> folio_put_refs().
>>>>>>
>>>>>> [1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
>>>>>>
>>>>>
>>>>> What about ?
>>>>>
>>>>> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
>>>>> may read past the end of a PTE table when a large folio spans across two
>>>>> PMDs, particularly after being remapped with mremap(). This patch fixes the
>>>>> potential out-of-bounds access by capping the batch at vm_end and the PMD
>>>>> boundary.
>>>>>
>>>>> It also refactors the logic into a new helper, folio_unmap_pte_batch(),
>>>>> which supports batching between 1 and folio_nr_pages. This improves code
>>>>> clarity. Note that such cases are rare in practice, as MADV_FREE typically
>>>>> splits large folios.
>>>>
>>>> Sorry, I meant that MADV_FREE typically splits large folios if the specified
>>>> range doesn't cover the entire folio.
>>>
>>> Hmm... I got it wrong as well :( It's the partial coverage that triggers
>>> the split.
>>>
>>> How about this revised version:
>>>
>>> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
>>> may read past the end of a PTE table when a large folio spans across two
>>> PMDs, particularly after being remapped with mremap(). This patch fixes
>>> the potential out-of-bounds access by capping the batch at vm_end and the
>>> PMD boundary.
>>>
>>> It also refactors the logic into a new helper, folio_unmap_pte_batch(),
>>> which supports batching between 1 and folio_nr_pages. This improves code
>>> clarity. Note that such boundary-straddling cases are rare in practice, as
>>> MADV_FREE will typically split a large folio if the advice range does not
>>> cover the entire folio.
>>
>> I assume the out-of-bounds access must be fixed, even though it is very
>> unlikely. It might occur after a large folio is marked with MADV_FREE and
>> then remapped to an unaligned address, potentially crossing two PTE
>> tables.
>
> Right. If it can be triggered from userspace, it doesn't matter how
> likely/common/whatever it is. It must be fixed.

Agreed. It must be fixed regardless of how rare the scenario is ;)

>
>> A batch size between 2 and nr_pages - 1 is practically rare, as we typically
>> split large folios when MADV_FREE does not cover the entire folio range.
>> Cases where a batch of size 2 or nr_pages - 1 occurs may only happen if a
>> large folio is partially unmapped after being marked MADV_FREE, which is
>> quite an unusual pattern in userspace.
>
> I think the point is rather "Simplify the code by not special-casing for
> completely mapped folios, there is no real reason why we cannot batch
> ranges that don't cover the complete folio.".

Yeah. That makes the code cleaner and more generic, as there is no strong
reason to special-case for fully mapped folios ;)

Based on that, I think we're on the same page now.
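Just to make the capping concrete, here is a minimal user-space model of the
boundary calculation (the constants, the safe_batch_pages() helper, and the
example values are illustrative assumptions for this sketch, not the actual
kernel code):

```
#include <stdio.h>

/* Assumed layout for the sketch: 4 KiB pages, 2 MiB PMDs (x86-64-like). */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PMD_SIZE	(1UL << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE - 1))

/*
 * The batch must not run past the end of the VMA, past the current PTE
 * table (i.e. the next PMD boundary), or past the folio itself.
 */
static unsigned long safe_batch_pages(unsigned long addr, unsigned long vm_end,
				      unsigned long pages_left_in_folio)
{
	unsigned long pte_table_end = (addr & PMD_MASK) + PMD_SIZE;
	unsigned long end = vm_end < pte_table_end ? vm_end : pte_table_end;
	unsigned long max_pages = (end - addr) >> PAGE_SHIFT;

	return pages_left_in_folio < max_pages ? pages_left_in_folio : max_pages;
}

int main(void)
{
	/*
	 * A 16-page folio whose first PTE sits in the last slot of a PTE
	 * table: an unbounded walk would touch 16 entries, but the capped
	 * batch is 1.
	 */
	printf("%lu\n", safe_batch_pages(0x3ff000UL, 0x600000UL, 16));
	return 0;
}
```

With a cap like this in place, the PTE walk can never step into the next
page table, no matter how the folio happens to be mapped.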
I'd like to post the following commit message for the next version:

```
As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
may read past the end of a PTE table when a large folio's PTE mappings are
not fully contained within a single page table.

While this scenario might be rare, an issue triggerable from userspace must
be fixed regardless of its likelihood. This patch fixes the out-of-bounds
access by refactoring the logic into a new helper, folio_unmap_pte_batch().

The new helper correctly calculates the safe batch size by capping the scan
at both the VMA and PMD boundaries. To simplify the code, it also supports
partial batching (i.e., any number of pages from 1 up to the calculated safe
maximum), as there is no strong reason to special-case for fully mapped
folios.
```

So, wdyt?

Thanks,
Lance