From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7CC8AC369AB for ; Tue, 15 Apr 2025 10:07:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B7DFA2800E2; Tue, 15 Apr 2025 06:07:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B070C2800BE; Tue, 15 Apr 2025 06:07:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 958932800E2; Tue, 15 Apr 2025 06:07:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 743812800BE for ; Tue, 15 Apr 2025 06:07:19 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 32663811A5 for ; Tue, 15 Apr 2025 10:07:20 +0000 (UTC) X-FDA: 83335850640.05.44D8DF0 Received: from fanzine2.igalia.com (fanzine.igalia.com [178.60.130.6]) by imf14.hostedemail.com (Postfix) with ESMTP id 29A94100002 for ; Tue, 15 Apr 2025 10:07:17 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=igalia.com header.s=20170329 header.b=n0lbMLyC; dmarc=pass (policy=none) header.from=igalia.com; spf=pass (imf14.hostedemail.com: domain of gavinguo@igalia.com designates 178.60.130.6 as permitted sender) smtp.mailfrom=gavinguo@igalia.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1744711638; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=vlK0Dri62mGNQqlwhnFyUcr4Z53lx2/utirBchiSPQk=; b=BHKHcccNBNRWkY2011vw4mioAqLkdBu3mxusHEX9yRnvP3lm1sIHPFFrhjH3geJWvwmV+5 lETH1ReqUNfGXu1//LkoTLvDY/55YVzCaqxZVAOXl75U5ArluYjkjaxMgV0l689TektcHs yKKY8veUqDVze2C539TeD3ENqvYtSPw= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=igalia.com header.s=20170329 header.b=n0lbMLyC; dmarc=pass (policy=none) header.from=igalia.com; spf=pass (imf14.hostedemail.com: domain of gavinguo@igalia.com designates 178.60.130.6 as permitted sender) smtp.mailfrom=gavinguo@igalia.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1744711638; a=rsa-sha256; cv=none; b=sqEOscimoHF9451oieDh1xDrR1USsZbL7OSQBWzhSqEVGrQnEdoa0dfX6P18tf4TSOQT2A oL0T4pTa0coyT10EouuQbhHZXyb9BTnD1s8Na7QNQhwInLcgfXn2GtYHIg95cgqNyc75oe AKWEKDTTvxc0Q+IHPt17scd8OOkKYQw= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:In-Reply-To:From: References:Cc:To:Subject:MIME-Version:Date:Message-ID:Sender:Reply-To: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=vlK0Dri62mGNQqlwhnFyUcr4Z53lx2/utirBchiSPQk=; b=n0lbMLyCyPBW5L12MKsE+9Fdau zeTIKDPA3rcAVYyr1VM4FvZh1LCWGdGBIeyXjl3p7RFi9O9ip7bucIQqTKZXwZqct0JwCm0RkonPU izOQl8Dy2rzDbx4NYaHlSIchvVPabaUy7+n3HkeIkseg5kiAFA/yB0dMJK7jHGVNDE2gvLa8rA1du oRCYRnalyT8b1xjP7PcYf4gXMMX8dGZPC1JrtK8qLTvknUzxXjQqltjaEdH9+Yw31Y6o/YF4cn/FJ fiPowjR21wyHFcWCPDIZ7HHHSJs7v8nzf6/thGntL4P9Mli6lgzfjbFtTeLpBWTtvoHZwyZLfm6Pr yvjzt4YQ==; Received: from 27-53-107-160.adsl.fetnet.net ([27.53.107.160] helo=[192.168.220.43]) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_128_GCM:128) (Exim) id 1u4dCD-00GtGI-OH; Tue, 15 Apr 2025 12:07:14 +0200 Message-ID: <83629774-981b-44cb-a106-d549f1a43db9@igalia.com> Date: Tue, 15 Apr 2025 18:07:05 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] mm/huge_memory: fix dereferencing invalid pmd migration entry To: Zi Yan Cc: linux-mm@kvack.org, akpm@linux-foundation.org, willy@infradead.org, linmiaohe@huawei.com, hughd@google.com, revest@google.com, kernel-dev@igalia.com, linux-kernel@vger.kernel.org, stable@vger.kernel.org, Naoya Horiguchi References: <20250414072737.1698513-1-gavinguo@igalia.com> Content-Language: en-US From: Gavin Guo In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam01 X-Stat-Signature: d51e45dgm47o4tfwkdwfefsb7azd3uxx X-Rspam-User: X-Rspamd-Queue-Id: 29A94100002 X-HE-Tag: 1744711637-621027 X-HE-Meta: U2FsdGVkX1+VvQ1Ttj2j7b7qla6fzvxqjG7W3MqdBFlDi3V0jIbd54LJMLYnqchDQFIku1uDAgwH/tx+k6CTx+MwVxUuHxf5CcRzIbmu88TTBTaV6PuTdW5vkpAoR2p8fN6+GmdwrVUW5Sp4D4KDgc4CRrNtVKXpCJPYbf57WOri16b9rWlBwqJLTIYlCd9N6l4jH4xTMHT7Erkck7euAZA9I3Q5w3wjxYRNT4D/FrZsHgcaw2ncmUWOraTM9T1mlV/bbDokA5W532MwYvWm2/s9qFt3K+3XInW8BQJ0BNBCve0z1fJr7nxHZK8vBqmqHgH5Bbp7DB2aT73hmx7FUCAtr8oo3vqVDyQy/eknyjYpT7V4f0I7ODnZaGJ9kMJVIos9XG7D+GadFUnNxWitoSOKfq3ruC2LcewzGJ0B26IJQJ3WPZkqAlvapBE3x5cqj9RYFCXSQ4DAq8fi5ZR8R9avJDzPbvOboZycvazVUIKWoB25SXmi12eF2lVqITfNKFLO+jiLjx8mw6yiOr+hWwHg7vZuCdSZ2FdzRZeu1JOSPIKrAsrv61/TnTVjolWWmS8SK+YV454+p9MQJNsgQgV37PI6hj34b8ch20H+Jva/V0frFnGIW0gXv4YQU/vmRJ+XjhCKcYizV5Luv+vO5giIV2Z6txGxpQWl94lMYLQ096zpTV9v7fUYlL4SkLd9/d0ZqRgprwmboDtRlLFJWAiWFG47QO+vB0T176wFBZ/fvPuXXXCs/Ane8CkaK1J9ze1b0YqbrLB5P1+6tY7tNa99Hh3h8jhs/u+49U7RfkHn5wrXzFADl5HGz7FR6U+itT33JyY0d5wfjBCboGjsyNpaTN3JE0GPeA8nGgy7lG/Kcy4noGtZR7GaZbx7osZIGbpbk0XzBfIE7Ji8DcFLxZz25QMFpeMWg9qlvKmxKSVsqagPhutez7k4z6egGJw2RTVpTdaKY6BsjpD1kOb gENxq4eu u2HOW3zhc59wBm1c= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi Zi-Yan, Thank you for the comment! On 4/15/25 00:50, Zi Yan wrote: > On 14 Apr 2025, at 3:27, Gavin Guo wrote: > >> When migrating a THP, concurrent access to the PMD migration entry >> during a deferred split scan can lead to a page fault, as illustrated > > It is an access violation, right? Because pmd_folio(*pmd_migration_entry) > does not return a folio address. Page fault made this sounded like not > a big issue. Right, pmd_folio(*pmd_migration_entry) is defined with the following macro: #define pmd_folio(pmd) page_folio(pmd_page(pmd)) page_folio will eventually access compound_head: static __always_inline unsigned long _compound_head(const struct page *page) { unsigned long head = READ_ONCE(page->compound_head); ... } However, given the invalid access address starting with 0xffffea (ffffea60001db008) which is the base address of the struct page. It indicates that the page address calculated from pmd_page is invalid during the THP migration where the pmd migration entry has been set up in set_pmd_migration_entry, for example: do_mbind migrate_pages migrate_pages_sync migration_pages_batch migrate_folio_unmap try_to_migrate try_to_migrate_one rmap_walk_anon set_pmd_migration_entry set_pmd_at > >> below. To prevent this page fault, it is necessary to check the PMD >> migration entry and return early. In this context, there is no need to >> use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the equality >> of the target folio. Since the PMD migration entry is locked, it cannot >> be served as the target. > > You mean split_huge_pmd_address() locks the PMD page table, so that > page migration cannot proceed, or the THP is locked by migration, > so that it cannot be split? The sentence is a little confusing to me. During the THP migration, the THP folio is locked (folio_trylock). When this THP folio is identified as partially-mapped and inserted into the deferred split queue, the system then scans the queue and attempts to split the folio when memory is under pressure or through the sysfs debug interface. Since the migrated folio remained locked, during the deferred_split_scan, the folio_trylock fails with the migrated folio. As a result, the folio passed to split_huge_pmd_locked ends up being unequal to the folio referenced by the pmd migration entry, indicating the pmd migration folio cannot be the target for splitting and needs to return. An alternative approach is similar to the following: + swp_entry_t entry; + struct folio *dst; if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)) { - if (folio && folio != pmd_folio(*pmd)) - return; + if (folio) { + if (is_pmd_migration_entry(*pmd)) { + entry = pmd_to_swp_entry(*pmd); + dst = pfn_swap_entry_folio(entry); + } else + dst = pmd_folio(*pmd); + + if (folio != dst) + return + } __split_huge_pmd_locked(vma, pmd, address, freeze); However, this extra effort to translate the pmd migration folio is unnecessary. Any ideas of exceptions? > >> >> BUG: unable to handle page fault for address: ffffea60001db008 >> CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE >> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 >> RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60 >> Call Trace: >> >> try_to_migrate_one+0x28c/0x3730 >> rmap_walk_anon+0x4f6/0x770 >> unmap_folio+0x196/0x1f0 >> split_huge_page_to_list_to_order+0x9f6/0x1560 >> deferred_split_scan+0xac5/0x12a0 >> shrinker_debugfs_scan_write+0x376/0x470 >> full_proxy_write+0x15c/0x220 >> vfs_write+0x2fc/0xcb0 >> ksys_write+0x146/0x250 >> do_syscall_64+0x6a/0x120 >> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> >> The bug is found by syzkaller on an internal kernel, then confirmed on >> upstream. >> >> Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path") >> Cc: stable@vger.kernel.org >> Signed-off-by: Gavin Guo >> --- >> mm/huge_memory.c | 18 ++++++++++++++---- >> 1 file changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >> index 2a47682d1ab7..0cb9547dcff2 100644 >> --- a/mm/huge_memory.c >> +++ b/mm/huge_memory.c >> @@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, >> void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, >> pmd_t *pmd, bool freeze, struct folio *folio) >> { >> + bool pmd_migration = is_pmd_migration_entry(*pmd); >> + >> VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); >> VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); >> VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); >> @@ -3085,10 +3087,18 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, >> * require a folio to check the PMD against. Otherwise, there >> * is a risk of replacing the wrong folio. >> */ >> - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || >> - is_pmd_migration_entry(*pmd)) { >> - if (folio && folio != pmd_folio(*pmd)) >> - return; >> + if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) { >> + if (folio) { >> + /* >> + * Do not apply pmd_folio() to a migration entry; and >> + * folio lock guarantees that it must be of the wrong >> + * folio anyway. > > Why does the folio lock imply it is a wrong folio? As explained above. > >> + */ >> + if (pmd_migration) >> + return; >> + if (folio != pmd_folio(*pmd)) >> + return; >> + } > > Why not just > > if (folio && pmd_migration) > return; > > if (pmd_trans_huge() …) { > … > } > ? Do you mean to implement as follows? if (folio && pmd_migration) return; if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) { if (folio && folio != pmd_folio(*pmd)) return; } if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) __split_huge_pmd_locked(vma, pmd, address, freeze); To match the current upstream logic, when folio is null, no matter the condition is either pmd_trans_huge, pmd_devmap, or pmd_migration, the __split_huge_pmd_locked must be always executed. Given that, the __split_huge_pmd_locked cannot be put inside if condition for trans_huge and devmap. Likewise, the __split_huge_pmd_locked cannot be called directly without wrapping the if condition checks. What happens if this is not a pmd entry? This will be invoked unconditionally. The original implementation consolidates all the logic into one large if statement with nested if condition check. However, it's more robust and clear. The second one simplifies the structure and can rule out the pmd_migration earlier and doesn't have the big if condition. However, it's trickier and needs extra care and maintenance. > > Thanks. > > Best Regards, > Yan, Zi Thanks!