From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 852BEC282DE for ; Wed, 5 Mar 2025 18:59:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1F8B4280016; Wed, 5 Mar 2025 13:59:02 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1A95328000B; Wed, 5 Mar 2025 13:59:02 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0707E280016; Wed, 5 Mar 2025 13:59:01 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id D71A628000B for ; Wed, 5 Mar 2025 13:59:01 -0500 (EST) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 52509B6693 for ; Wed, 5 Mar 2025 14:12:15 +0000 (UTC) X-FDA: 83187687030.28.6E904B1 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf15.hostedemail.com (Postfix) with ESMTP id F3C01A0006 for ; Wed, 5 Mar 2025 14:12:12 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=XYZ5hYRg; dmarc=none; spf=none (imf15.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741183933; a=rsa-sha256; cv=none; b=Qb0nYzOCZ1nF7W/AfI5k7JVO1BDDcP/FMlxBSSmItJdaje+c6IJB4wp6LvGSvAWoTe4IZC 2C9xI8FE9pUtvFGm66rdEKphZlyEdsXbtDIITPcUu/FbO1xtsWDlVxeVeOE3MRsIoyA+YU bnoWrXR8QD7rovN6qDOBhlMNWEgRid0= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=XYZ5hYRg; dmarc=none; spf=none (imf15.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1741183933; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=LYtlhxVzhTyajNUxQxNY5jgj6oy4DfbsP7RDQCKcybE=; b=SV7lPwoOd1vpb7IMrxGLQZIHudhLq1tArqpmVF5JNFZ/05odSCfJ2H0RRXnnFNPYH0Oerx Qrg4WXH0gsDOV7rDL13QW33XBrfneUCm+R2m/56GySPDO1gIdKQiPxIK/3IvlgNNASu4Oj FXkbdfuEdxHYzLJPhpvPrU+fCk4fE3A= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=LYtlhxVzhTyajNUxQxNY5jgj6oy4DfbsP7RDQCKcybE=; b=XYZ5hYRglm9fc5zCOcSxAK8jbb tQHenp906XZ1VfnkIDQVZSTspC4l+BFmKj+a1PpTAnPmBKCAYANgSGYC13wu2er4LKS1kXgxOyFmM 732lpNwrykjBw3G7dUQ9ABVblOgn1NpBeMHpKXeYPBugXVCDLCpZqKuMjSCvrJ0irG/dpFaZLgB/X Pyp59hM1v5LvkWXCUsL/YnIRTJIqjGoD/Tv9iwv2/6wLn1vdZPVmflDhrs1VsFJip1tEbkQxxxRwV jMQdoOJwPMn+oBJyI06ap2w/ra5S16Kx7F6EAD9LELTsHit1kStmSAI2Fnl2uDeMpuDmOjoRJC7uf 7p2ncKCA==; Received: from willy by casper.infradead.org with local (Exim 4.98 #2 (Red Hat Linux)) id 1tppTj-00000005dei-1hr5; Wed, 05 Mar 2025 14:12:07 +0000 Date: Wed, 5 Mar 2025 14:12:07 +0000 From: Matthew Wilcox To: David Hildenbrand Cc: Dev Jain , ziy@nvidia.com, hughd@google.com, ryan.roberts@arm.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [QUESTION] Plain dereference and READ_ONCE() in fault handler Message-ID: References: <20250305102159.96420-1-dev.jain@arm.com> <8477d9ec-b9ce-4a3d-b61f-1bd44d3360a5@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8477d9ec-b9ce-4a3d-b61f-1bd44d3360a5@redhat.com> X-Rspamd-Queue-Id: F3C01A0006 X-Stat-Signature: 1bac7hkf5enc53934j5ixy1dp35gunbs X-Rspam-User: X-Rspamd-Server: rspam05 X-HE-Tag: 1741183932-615522 X-HE-Meta: U2FsdGVkX1/K1kprFNyOOy9ZNfUOQ9GqGNqH2q1AI55R1iEso8ajeB0Ep3lCwpJqIFOtZQEgwf1/BgGBBX/o/Nts0HlndM41RQ6YOzXKR4qCMh7Tw0DERyET8wl4Iuf5blX46ro53ScqZIHj6bupal3wlq4yL7kJO/SkS5anbNdx8YN+Qbs8CRZz+QhCTLU/yFP9l/3Mdgg8kRsvZ2Y3zB139e7EMlx6lclQf9JE7sEPcBDiTHvngVTwPHuezsQoOgtbTrwmpfGS0vVGMEVpdNBBJrkztCfYX0dNo7ALBCnIMOebrKhtRzW7v4jLyWgsYyx0yMSbYeyhaQHaHpiDRPTtwGNHLKFHGjL5RO0VxuLmDVc0RHbgzMoudlaF2fLMtqGTiTrxB7+pO6DRpgauZkHFwKqnfz5/tPCGRnulES/4c4nPSyYafSbJ7Fn0debCrJ8zMfcwE4OXmqZT6iTlXHannk0RbaQReMrBS4FUC0J38i9BvlDyZIawIfsjMNXI/6TnO8pelMv3SlWfkE0chbQ+JJzJ8hyUpFKZOuVRpFi4Rj7xqq0HpKl866LKKc543AGgVswmYl6LAHtCNO3FoQhhiARY2T6UE6KsB2YFhbNw0AZDe7u7E3Vicq4gcM79Cpno2OP4A0fVZ8sYiD0GekRlG+RRxINaub1v08IuPP03DuflXOWH2YRyand5PRt1dg1knzffhO3KBrVtmHf1ZMdWSQwY17MR5ccyuszjBuN8FwJUEwUDCMXTeL95ImHZijdm1SaZFy9s2w/2wKhoaYbn+MlOVYQ1QSgiV6MC3rsruB3M/JkARfVSG19lccNcRzI6e5fVU+wI2K6FiitXGQUKL8+xLM9WGuXobYtvFOcV6WDMBR9cewkUQ5Fm5NPN3ni08kRuBBIEXCPRDr5aVVP/u1gu+XTMn5oho1ydPoH3pQoIxsXFWyo+f16m3K5dnwuDWoyCx6wXzUFJgQo JTVoFvUm C6jriFmnHkd/kfiyQ7Lr9ivuQhiIQ4EEZvUdb59NMlySmfn3PsL5YkUZiPgIxvl5qr7XSQePfuThI++d6SMBCfQFgO4dLo7oHE5Yuka9b36tfjaF6wPkHJ8n6UHImv5WOPyNJLGSDtCNCAN16x1twWIbxOKCBXKLUOTD1gWK+keRzOVDDHLkRm+1t2VXzz8ye+JSIkC3erNK+LfK+PhIVhav24FxkmsPSaIaD8HoG0BUXY+fEPo1S75jz7w6V1Q1qlR5+Vn2ndigIadoq07c3evTTUPZJ5VX3XTeZU6hChVDdedaym/ADqrMwdU5dRQ+zRawRO98dOKEQNYQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Mar 05, 2025 at 11:46:41AM +0100, David Hildenbrand wrote: > On 05.03.25 11:21, Dev Jain wrote: > > In __handle_mm_fault(), > > > > 1. Why is there a barrier() for the PUD logic? > > Good question. It was added in > > commit a00cc7d9dd93d66a3fb83fc52aa57a4bec51c517 > Author: Matthew Wilcox > Date: Fri Feb 24 14:57:02 2017 -0800 > > mm, x86: add support for PUD-sized transparent hugepages > > Maybe it was an alternative to performing a READ_ONCE(*vmf.pud). I was monkey-see, monkey-do. Here's the corresponding code as it existed at the time: } else { pmd_t orig_pmd = *vmf.pmd; barrier(); if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) { vmf.flags |= FAULT_FLAG_SIZE_PMD; vs what I added: } else { pud_t orig_pud = *vmf.pud; barrier(); if (pud_trans_huge(orig_pud) || pud_devmap(orig_pud)) { At some point, somebody added pmdp_get_lockless() and did not add a corresponding pudp_get_lockless(). And it was ... Hugh in 26e1a0c3277d If you want to add a pudp_get_lockless(), I doubt anyone will object, but it's probably pointless churn. > Maybe it was done that way, because pudp_get_lockless() does not exist. And > it would likely not be required, because on architectures where > ptep_get_lockless() does some magic (see below, mostly 32bit), PUD THP are > not applicable. > > > > 2. For the PMD logic, in the if block, we use *vmf.pmd, and in the else block > > we use pmdp_get_lockless(); what if someone changes the pmd just when we > > have begun processing the conditions in the if block, fail in the if block > > and then the else block operates on a different pmd value. Shouldn't we cache > > the value of the pmd and operate on a single consistent value until we take the > > lock and then finally check using pxd_same() and friends? > > The pmd_none(*vmf.pmd) is fine. create_huge_pmd() must be able to deal with > races, because we are not holding any locks. > > We only have to use pmdp_get_lockless() when we want to effectively perform > a READ_ONCE(), and make sure that we read something "reasonable" that we can > operate on, even with concurrent changes. (e.g., not read a garbage PFN just > because of some concurrent changes) > > We'll store the value in vmf.orig_pmd, on which we'll operate and try to > detect later changes using pmd_same(). So we really want something > consistent in there. > > See the description above ptep_get_lockless(), why we cannot simply do a > READ_ONCE on architectures where a PTE cannot be read atomically (e.g., 8 > byte PTEs on 32bit architecture). > > -- > Cheers, > > David / dhildenb >