Date: Mon, 10 Nov 2025 13:17:37 +0200
From: Mike Rapoport <rppt@kernel.org>
To: Lorenzo Stoakes
Cc: Andrew Morton, Christian Borntraeger, Janosch Frank, Claudio Imbrenda, David Hildenbrand, Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle, Peter Xu, Alexander Viro, Christian Brauner, Jan Kara, Arnd Bergmann, Zi Yan, Baolin Wang, "Liam R. Howlett", Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Muchun Song, Oscar Salvador, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple, Axel Rasmussen, Yuanchu Xie, Wei Xu, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He, Chris Li, SeongJae Park, Matthew Wilcox, Jason Gunthorpe, Leon Romanovsky, Xu Xin, Chengming Zhou, Jann Horn, Miaohe Lin, Naoya Horiguchi, Pedro Falcato, Pasha Tatashin, Rik van Riel, Harry Yoo, Hugh Dickins, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, damon@lists.linux.dev
Subject: Re: [PATCH v2 01/16] mm: correctly handle UFFD PTE markers
In-Reply-To: <0b50fd4b1d3241d0965e6b969fb49bcc14704d9b.1762621568.git.lorenzo.stoakes@oracle.com>
On Sat, Nov 08, 2025 at 05:08:15PM +0000, Lorenzo Stoakes wrote:
> PTE markers were previously only concerned with UFFD-specific logic - that
> is, PTE entries with the UFFD WP marker set or those marked via
> UFFDIO_POISON.
>
> However since the introduction of guard markers in commit
> 7c53dfbdb024 ("mm: add PTE_MARKER_GUARD PTE marker"), this has no longer
> been the case.
>
> Issues have been avoided as guard regions are not permitted in conjunction
> with UFFD, but it still leaves very confusing logic in place, most notably
> the misleading and poorly named pte_none_mostly() and
> huge_pte_none_mostly().
>
> This predicate returns true for PTE entries that ought to be treated as
> none, but only in certain circumstances, and on the assumption we are
> dealing with H/W poison markers or UFFD WP markers.
>
> This patch removes these functions and makes each invocation of these
> functions instead explicitly check what it needs to check.
>
> As part of this effort it introduces is_uffd_pte_marker() to explicitly
> determine if a marker in fact is used as part of UFFD or not.
>
> In the HMM logic we note that the only time we would need to check for a
> fault is in the case of a UFFD WP marker, otherwise we simply encounter a
> fault error (VM_FAULT_HWPOISON for H/W poisoned marker, VM_FAULT_SIGSEGV
> for a guard marker), so only check for the UFFD WP case.
>
> While we're here we also refactor code to make it easier to understand.
>
> Reviewed-by: Vlastimil Babka
> Signed-off-by: Lorenzo Stoakes
> ---
>  fs/userfaultfd.c              | 83 +++++++++++++++++++----------
>  include/asm-generic/hugetlb.h |  8 ----
>  include/linux/swapops.h       | 18 --------
>  include/linux/userfaultfd_k.h | 21 +++++++++
>  mm/hmm.c                      |  2 +-
>  mm/hugetlb.c                  | 47 ++++++++++----------
>  mm/mincore.c                  | 17 +++++--
>  mm/userfaultfd.c              | 27 +++++++-----
>  8 files changed, 123 insertions(+), 100 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 54c6cc7fe9c6..04c66b5001d5 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -233,40 +233,46 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	pte_t *ptep, pte;
> -	bool ret = true;
>
>  	assert_fault_locked(vmf);
>
>  	ptep = hugetlb_walk(vma, vmf->address, vma_mmu_pagesize(vma));
>  	if (!ptep)
> -		goto out;
> +		return true;
>
> -	ret = false;
>  	pte = huge_ptep_get(vma->vm_mm, vmf->address, ptep);
>
>  	/*
>  	 * Lockless access: we're in a wait_event so it's ok if it
> -	 * changes under us. PTE markers should be handled the same as none
> -	 * ptes here.
> +	 * changes under us.
>  	 */
> -	if (huge_pte_none_mostly(pte))
> -		ret = true;
> +
> +	/* If missing entry, wait for handler. */

It's actually the #PF handler that waits ;-)

When userfaultfd_(huge_)must_wait() returns true, it means that the
process that caused the fault should wait until userspace resolves the
fault, and returning false means that it's ok to retry the #PF.

So the comment here should probably read

	/* entry is still missing, wait for userspace to resolve the fault */

and the rest of the comments here and in userfaultfd_must_wait() need a
similar update.

> +	if (huge_pte_none(pte))
> +		return true;
> +	/* UFFD PTE markers require handling. */
> +	if (is_uffd_pte_marker(pte))
> +		return true;
> +	/* If VMA has UFFD WP faults enabled and WP fault, wait for handler. */
>  	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
> -		ret = true;
> -out:
> -	return ret;
> +		return true;
> +
> +	/* Otherwise, if entry isn't present, let fault handler deal with it. */

Entry is actually present here, e.g. because there is a thread that
called UFFDIO_COPY in parallel with the fault, so there's no need to
keep the faulting process stuck.

> +	return false;
>  }
>  #else
>  static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
>  					      struct vm_fault *vmf,
>  					      unsigned long reason)
>  {
> -	return false;	/* should never get here */
> +	/* Should never get here. */
> +	VM_WARN_ON_ONCE(1);
> +	return false;
>  }
>  #endif /* CONFIG_HUGETLB_PAGE */
>
>  /*
> - * Verify the pagetables are still not ok after having reigstered into
> + * Verify the pagetables are still not ok after having registered into
>  * the fault_pending_wqh to avoid userland having to UFFDIO_WAKE any
>  * userfault that has already been resolved, if userfaultfd_read_iter and
>  * UFFDIO_COPY|ZEROPAGE are being run simultaneously on two different
> @@ -284,53 +290,55 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
>  	pmd_t *pmd, _pmd;
>  	pte_t *pte;
>  	pte_t ptent;
> -	bool ret = true;
> +	bool ret;
>
>  	assert_fault_locked(vmf);
>
>  	pgd = pgd_offset(mm, address);
>  	if (!pgd_present(*pgd))
> -		goto out;
> +		return true;
>  	p4d = p4d_offset(pgd, address);
>  	if (!p4d_present(*p4d))
> -		goto out;
> +		return true;
>  	pud = pud_offset(p4d, address);
>  	if (!pud_present(*pud))
> -		goto out;
> +		return true;
>  	pmd = pmd_offset(pud, address);
> again:
>  	_pmd = pmdp_get_lockless(pmd);
>  	if (pmd_none(_pmd))
> -		goto out;
> +		return true;
>
> -	ret = false;
>  	if (!pmd_present(_pmd))
> -		goto out;
> +		return false;

This one is actually tricky, maybe it's worth adding a gist of the
commit log from a365ac09d334 ("mm, userfaultfd, THP: avoid waiting when
PMD under THP migration") as a comment.
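Something along these lines, loosely paraphrasing that commit log (exact
wording is of course up to you):

```c
	/*
	 * A non-present PMD is e.g. a THP under migration: the retried
	 * fault will wait for the migration to complete in the page
	 * fault path anyway, so there is no point also parking the
	 * task on the userfault queue.
	 */
	if (!pmd_present(_pmd))
		return false;
```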
>
> -	if (pmd_trans_huge(_pmd)) {
> -		if (!pmd_write(_pmd) && (reason & VM_UFFD_WP))
> -			ret = true;
> -		goto out;
> -	}
> +	if (pmd_trans_huge(_pmd))
> +		return !pmd_write(_pmd) && (reason & VM_UFFD_WP);

...

> diff --git a/mm/hmm.c b/mm/hmm.c
> index a56081d67ad6..43d4a91035ff 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -244,7 +244,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>  	uint64_t pfn_req_flags = *hmm_pfn;
>  	uint64_t new_pfn_flags = 0;
>
> -	if (pte_none_mostly(pte)) {
> +	if (pte_none(pte) || pte_marker_uffd_wp(pte)) {

Would be nice to add the note from the changelog as a comment here.

>  		required_fault =
>  			hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0);
>  		if (required_fault)

-- 
Sincerely yours,
Mike.