From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <35210542-1309-4330-9367-e4a597be59fc@arm.com>
Date: Fri, 26 Apr 2024 08:48:07 +0100
Subject: Re: [PATCH v1] mm: Fix race between __split_huge_pmd_locked() and GUP-fast
From: Ryan Roberts <ryan.roberts@arm.com>
To: Zi Yan
Cc: Andrew Morton, "Aneesh Kumar K.V", linux-mm@kvack.org, linux-kernel@vger.kernel.org, Anshuman Khandual, Huacai Chen
In-Reply-To: <922F6794-1C32-4862-8A67-90D4F7DBC474@nvidia.com>
References: <20240425170704.3379492-1-ryan.roberts@arm.com> <922F6794-1C32-4862-8A67-90D4F7DBC474@nvidia.com>
Content-Type: text/plain; charset=UTF-8
On 25/04/2024 19:58, Zi Yan wrote:
> +Anshuman, who changed pmd_present() semantics. See:
> https://lore.kernel.org/all/1599627183-14453-2-git-send-email-anshuman.khandual@arm.com/
> and commit b65399f6111b ("arm64/mm: Change THP helpers to comply with
> generic MM semantics")
>
> On 25 Apr 2024, at 13:07, Ryan Roberts wrote:
>
>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>> (non-present) migration entry.
>> It calls pmdp_invalidate()
>> unconditionally on the pmdp and only determines if it is present or not
>> based on the returned old pmd. This is a problem for the migration entry
>> case because pmd_mkinvalid(), called by pmdp_invalidate(), must only be
>> called for a present pmd.
>>
>> On arm64 at least, pmd_mkinvalid() will mark the pmd such that any
>> future call to pmd_present() will return true. And therefore any
>
> But pmd_mkinvalid() on x86 does not behave so. Maybe we should fix
> pmd_mkinvalid() on arm64 by not setting PMD_PRESENT_INVALID when the
> entry is invalid already. And add a test in mm/debug_vm_pgtable.c.

Yes, we *could* make pmd_mkinvalid() defensive. But we don't do that for
the other getters/setters (e.g. pte_mkwrite()). So I'm not sure why we
would want that here. Ultimately it makes no semantic sense to
invalidate a non-present pmd. See my other mail for excessive detail.

Thanks,
Ryan

>
> I notice that x86, risc-v, mips behave the same. loongarch also
> has the _PAGE_PRESENT_INVALID bit set during pmd_mkinvalid(), but its
> pmd_present() makes sure _PAGE_HUGE is set before checking
> _PAGE_PRESENT_INVALID. So it is not a problem for loongarch. Adding
> Huacai to confirm this.
>
> Maybe pmd_present() on arm64 can do that too?
>
>> lockless pgtable walker could see the migration entry pmd in this state
>> and start interpreting the fields as if it were present, leading to
>> BadThings (TM). GUP-fast appears to be one such lockless pgtable
>> walker. I suspect the same is possible on other architectures.
>>
>> Fix this by only calling pmdp_invalidate() for a present pmd. And for
>> good measure let's add a warning to the generic implementation of
>> pmdp_invalidate(). I've manually reviewed all other
>> pmdp_invalidate[_ad]() call sites and believe all others to be
>> conformant.
>>
>> This is a theoretical bug found during code review. I don't have any
>> test case to trigger it in practice.
>>
>> Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
>> Signed-off-by: Ryan Roberts
>> ---
>>
>> Applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>
>> Thanks,
>> Ryan
>>
>>
>>  mm/huge_memory.c     | 5 +++--
>>  mm/pgtable-generic.c | 2 ++
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 89f58c7603b2..80939ad00718 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2513,12 +2513,12 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>  	 * for this pmd), then we flush the SMP TLB and finally we write the
>>  	 * non-huge version of the pmd entry with pmd_populate.
>>  	 */
>> -	old_pmd = pmdp_invalidate(vma, haddr, pmd);
>>
>> -	pmd_migration = is_pmd_migration_entry(old_pmd);
>> +	pmd_migration = is_pmd_migration_entry(*pmd);
>>  	if (unlikely(pmd_migration)) {
>>  		swp_entry_t entry;
>>
>> +		old_pmd = *pmd;
>>  		entry = pmd_to_swp_entry(old_pmd);
>>  		page = pfn_swap_entry_to_page(entry);
>>  		write = is_writable_migration_entry(entry);
>> @@ -2529,6 +2529,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>  		soft_dirty = pmd_swp_soft_dirty(old_pmd);
>>  		uffd_wp = pmd_swp_uffd_wp(old_pmd);
>>  	} else {
>> +		old_pmd = pmdp_invalidate(vma, haddr, pmd);
>>  		page = pmd_page(old_pmd);
>>  		folio = page_folio(page);
>>  		if (pmd_dirty(old_pmd)) {
>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>> index 4fcd959dcc4d..74e34ea90656 100644
>> --- a/mm/pgtable-generic.c
>> +++ b/mm/pgtable-generic.c
>> @@ -198,6 +198,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
>>  pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>>  		     pmd_t *pmdp)
>>  {
>> +	VM_WARN_ON(!pmd_present(*pmdp));
>>  	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
>>  	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
>>  	return old;
>> @@ -208,6 +209,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>>  pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
>>  		     pmd_t *pmdp)
>>  {
>> +	VM_WARN_ON(!pmd_present(*pmdp));
>>  	return pmdp_invalidate(vma, address, pmdp);
>>  }
>>  #endif
>> --
>> 2.25.1
>
>
> --
> Best Regards,
> Yan, Zi