From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C960C19F32 for ; Wed, 5 Mar 2025 18:40:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8D00328001C; Wed, 5 Mar 2025 13:40:57 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 85DEA28000F; Wed, 5 Mar 2025 13:40:57 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7218628001C; Wed, 5 Mar 2025 13:40:57 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 521A028000F for ; Wed, 5 Mar 2025 13:40:57 -0500 (EST) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 7E3EA533C7 for ; Wed, 5 Mar 2025 15:02:58 +0000 (UTC) X-FDA: 83187814836.02.F79FADC Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf22.hostedemail.com (Postfix) with ESMTP id 6CD0AC002D for ; Wed, 5 Mar 2025 15:02:56 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf22.hostedemail.com: domain of dev.jain@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=dev.jain@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741186976; a=rsa-sha256; cv=none; b=I3pW5aaR8dsoSZjxJ1HNdifM+YAC2nzeIYXkWA34opywkSUer3AeefbioLYHCRkKRkGybx gzAGcFw22fv29+czamPyUk0lWwJQWIslD0Nv29GXfX88lwo7kwrFJOXyYEWg8JG8PwO1Wl +CFd/R9l57lTtcTWuaZOYUcZHSU41zI= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf22.hostedemail.com: domain of dev.jain@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=dev.jain@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1741186976; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qMXctNUYFQiRB5mAfBDp9FlFDJBMCBX0/9K2r22eb0s=; b=Gx3wpfiFUagaXh+cDAbKGlbTMJ405Ab0NWlB9gq1R4CTJm5XqfgShYg7TBE/u4BvZXn+kI DkGDDg62Vk1hwNC4flRWMNaTLD8L5aSwpYuz8nGF2TT0WRYel+IwS7ryWj9JsERzbJZu1N fGYeU/J4xu5ZqtiQ1CMTDbDFhzGdWS0= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9535FFEC; Wed, 5 Mar 2025 07:03:08 -0800 (PST) Received: from [10.174.36.147] (unknown [10.174.36.147]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 065973F673; Wed, 5 Mar 2025 07:02:52 -0800 (PST) Message-ID: <3c3f3cfe-9fa7-41d7-9759-cc67306f13f5@arm.com> Date: Wed, 5 Mar 2025 20:32:49 +0530 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [QUESTION] Plain dereference and READ_ONCE() in fault handler To: David Hildenbrand , willy@infradead.org, ziy@nvidia.com, hughd@google.com, ryan.roberts@arm.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20250305102159.96420-1-dev.jain@arm.com> <8477d9ec-b9ce-4a3d-b61f-1bd44d3360a5@redhat.com> Content-Language: en-US From: Dev Jain In-Reply-To: <8477d9ec-b9ce-4a3d-b61f-1bd44d3360a5@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 6CD0AC002D X-Stat-Signature: gwb4ooit469bdteyrtsa3fg8dzqyd7yd X-HE-Tag: 1741186976-827362 X-HE-Meta: U2FsdGVkX19NNK8t1tL9MW/5hejjDxR/NN61BFtLU5l84OYYlf4ZmItU8smLXMsuS19LhORLN4FJgd17CodcT9a4DNrm++bFpJ9YC8N9TMSyXSP0Mbgy9H1Pv9KbEZQs4SVY1zllzF4rJ+9KJMtKz9pz4japPZX+GZpDgz+u74pr5ti7JpEXq/ZtLmtKp3NUhNNJpTN85Fljl45dR2fUSVniWuoVXsD+/r9xIkVouVdKDValN9IzbN6/RjRCJwK9zEFahSA96AKiw6g1eoZlWt5fQt6xhi3ZOA8dR6KRGiI14Eb+ZDqkC7GtDivpqHnToArPz7eMWx/8A1J0KQH+a0yYytqpDy01dEKNicSkDMoCg43Vcgwanup7R7V+Os+6JHO5ii/7xXbxLT7WMYfnsOAgfXWpHvQ3m0GuaONCikIqMhum6CfGMmngzza0kTJrv0uOO5BtdJ8VAsgudXHeKchwLyb8xEVUY2sZm+4OUXIlTGeUObgP0Y54nJj064TTvq98dsKszGcjqe8ShkzeJqsrSpqHAIu5NeWWlCrcPkSiG6lYAHPVLCiis/xR1LJNionE2w7Lx0aC8Jm9/4ZhrD45mYTY86WZWwuM6bJwVE8Bj9VkiWlAXDMH8eeYpXaFOzsaDKXRm7uXDgd/rk3BsLp5RTTdM3PQSHVxRhR0SHtLzhzcgZ+33zaQQNYiZ6ojDPEr9Qmi2gqcJDzuaMSUSUJqUqZfdM59XjEcz1zzZSXrUXnBzV/1bVZf5+h3rZSWmXsQ/5/nF3zWZ1mOMVpwYh3LTnAs+LZqPpDDf55kLhaFcaSCT4DbGsv4uodz20j6eYtaq22XdE4WjbtF8RBKroh41IGOIYxDw9XZu9Y4/oRUUhNMtn7R2u2EKLAIzQppwFoSqwiFJ0SDqRJSmzxfTs2KoDaCWagjAfoMb4nRd9PgnCtCPnIOU9knxzpj892BGX/w/0Oo7VvJlLOpZZf G/g9DUsY Z5z14Ml5iS+WG9MKJiJ4EwCCL8gRVxGlnxoBT+L2yW0xGoru1iSQIO/FK3Lob2/t3Z+4PeWded3QEqECuOlHijCR/luOkzbfxuNlV/dwOd+Ip5LPNfQ/IhgHUtVeLS1ga+mNK9/PDvAYn4JqdB/KtmQ14ZkUYGEeJZPMO6y631DyYaMmZz5dtkGx+zjPZnLYtmYQV3iSanrOaI2BeeatqpkBHRwhsTri/u0/mEO/H5Xnp/iaJ3xFgrNUEk2Wh7bW7KhfOC/dd0lXUfsItww80z9M7Hw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 05/03/25 4:16 pm, David Hildenbrand wrote: > On 05.03.25 11:21, Dev Jain wrote: >> In __handle_mm_fault(), >> >> 1. Why is there a barrier() for the PUD logic? > > Good question. It was added in > > commit a00cc7d9dd93d66a3fb83fc52aa57a4bec51c517 > Author: Matthew Wilcox > Date:   Fri Feb 24 14:57:02 2017 -0800 > >     mm, x86: add support for PUD-sized transparent hugepages > > Maybe it was an alternative to performing a READ_ONCE(*vmf.pud). > > Maybe it was done that way, because pudp_get_lockless() does not exist. > And it would likely not be required, because on architectures where > ptep_get_lockless() does some magic (see below, mostly 32bit), PUD THP > are not applicable. Thanks for your reply David. > > >> 2. For the PMD logic, in the if block, we use *vmf.pmd, and in the >> else block >>     we use pmdp_get_lockless(); what if someone changes the pmd just >> when we >>     have begun processing the conditions in the if block, fail in the >> if block >>     and then the else block operates on a different pmd value. >> Shouldn't we cache >>     the value of the pmd and operate on a single consistent value >> until we take the >>     lock and then finally check using pxd_same() and friends? > > The pmd_none(*vmf.pmd) is fine. create_huge_pmd() must be able to deal > with races, because we are not holding any locks. I had a mental hiccup, yes we don't need the cached value even before the if block, as the relevant path will eventually check after taking the lock. I was thinking of all sorts of weird races. > > We only have to use pmdp_get_lockless() when we want to effectively > perform a READ_ONCE(), and make sure that we read something "reasonable" > that we can operate on, even with concurrent changes. (e.g., not read a > garbage PFN just because of some concurrent changes) Oh I just looked at the arm64 definition and assumed ptep_get_lockless() == READ_ONCE() :) Now this makes sense. So I am guessing that we can still get away with a *vmf.pmd on 64-bit arches? A separate question: When taking the create_huge_pmd() path with a read fault and after taking the pmd lock, why shouldn't we check with pmd_none(pmdp_get_lockless(vmf.pmd)) instead of plain *vmf.pmd...surely now we must load the actual correct value from memory since we are committing into mapping the huge zero folio? And after looking somewhat more, why even is a pmd_none(*pmd) there in set_huge_zero_folio()...it should be the responsibility of the caller to verify this? Any caller will just assume that it got the huge zero folio mapped so this check should be redundant. > > We'll store the value in vmf.orig_pmd, on which we'll operate and try to > detect later changes using pmd_same(). So we really want something > consistent in there. > > See the description above ptep_get_lockless(), why we cannot simply do a > READ_ONCE on architectures where a PTE cannot be read atomically (e.g., > 8 byte PTEs on 32bit architecture). >