linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Wei Yang <richard.weiyang@gmail.com>,
	Samuel Holland <samuel.holland@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <pjw@kernel.org>,
	linux-riscv@lists.infradead.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, devicetree@vger.kernel.org,
	Suren Baghdasaryan <surenb@google.com>,
	linux-kernel@vger.kernel.org, Mike Rapoport <rppt@kernel.org>,
	Michal Hocko <mhocko@suse.com>, Conor Dooley <conor@kernel.org>,
	Krzysztof Kozlowski <krzk+dt@kernel.org>,
	Alexandre Ghiti <alex@ghiti.fr>,
	Emil Renner Berthing <kernel@esmil.dk>,
	Rob Herring <robh+dt@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Julia Lawall <Julia.Lawall@inria.fr>,
	Nicolas Palix <nicolas.palix@imag.fr>,
	Anshuman Khandual <anshuman.khandual@arm.com>
Subject: Re: [PATCH v3 06/22] mm: Always use page table accessor functions
Date: Thu, 27 Nov 2025 08:14:45 +0100	[thread overview]
Message-ID: <bcd2a49d-a42f-41e8-9f64-4fd24fc862c7@kernel.org> (raw)
In-Reply-To: <b2fbe58e-f47b-4a76-879b-fd38a915c2ce@kernel.org>

On 11/26/25 21:31, David Hildenbrand (Red Hat) wrote:
> On 11/26/25 17:34, Ryan Roberts wrote:
>> On 26/11/2025 16:07, Ryan Roberts wrote:
>>> On 26/11/2025 15:12, David Hildenbrand (Red Hat) wrote:
>>>> On 11/26/25 16:08, Lorenzo Stoakes wrote:
>>>>> On Wed, Nov 26, 2025 at 03:56:13PM +0100, David Hildenbrand (Red Hat) wrote:
>>>>>> On 11/26/25 15:52, Lorenzo Stoakes wrote:
>>>>>>>
>>>>>>> Would the pmdp_get() never get invoked then? Or otherwise wouldn't that end up
>>>>>>> requiring a READ_ONCE() further up the stack?
>>>>>>
>>>>>> See my other reply, I think the pmdp_get() is required because all pud_*
>>>>>> functions are just simple stubs.
>>>>>
>>>>> OK, thought you were saying we should push further down the stack? Or up
>>>>> depending on how you view these things :P as in READ_ONCE at leaf?
>>>>
>>>> I think at leaf because I think the previous ones should essentially be only
>>>> used by stubs.
>>>>
>>>> But I haven't fully digested how this is all working. Or supposed to work.
>>>>
>>>> I'm trying to chew through the arch/arm/include/asm/pgtable-2level.h example to
>>>> see if I can make sense of it,
>>>
>>> I wonder if we can think about this slightly differently;
>>>
>>> READ_ONCE() has two important properties:
>>>
>>>    - It guarrantees that a load will be issued, *even if output is unused*
>>>    - It guarrantees that the read will be single-copy-atomic (no tearing)
>>>
>>> I think for the existing places where READ_ONCE() is used for pagetable reads we
>>> only care about:
>>>
>>>    - It guarrantees that a load will be issued, *if output is used*
>>>    - It guarrantees that the read will be single-copy-atomic (no tearing)
>>>
>>> I think if we can weaken to the "if output is used" property, then the compiler
>>> will optimize out all the unneccessary reads.
>>>
>>> AIUI, a C dereference provides neither of the guarrantees so that's no good.
>>>
>>> What about non-volatile asm? I'm told (thought need to verify) that for
>>> non-volatile asm, the compiler will emit it if the output is used and remove it
>>> otherwise. So if the asm contains the required single-copy-atomic, perhaps we
>>> are in business?
>>>
>>> So we would need a new READ_SCA() macro that could default to READ_ONCE() (which
>>> is stronger) and arches could opt in to providing a weaker asm version. Then the
>>> default pXdp_get() could be READ_SCA(). And this should work for all cases.
>>>
>>> I think.
>>
>> I'm not sure this works. It looks like the compiler is free to move non-volatile
>> asm sections which might be problematic for places where we are currently using
>> READ_ONCE() in lockless algorithms, (e.g. GUP?). We wouldn't want to end up with
>> a stale value.
>>
>> Another idea:
>>
>> Given the main pattern where we are aiming to optimize out the read is something
>> like:
>>
>> if (!pud_present(*pud))
>>
>> where for a folded pmd:
>>
>> static inline int pud_present(pud_t pud)	{ return 1; }
>>
>> And we will change it to this:
>>
>> if (!pud_present(pudp_get(pud)))
>>
>> ...
>>
>> perhaps we can just define the folded pXd_present(), pXd_none(), pXd_bad(),
>> pXd_user() and pXd_leaf() as macros:
>>
>> #define pud_present(pud)	1
>>
> 
> Let's take a step back and realize that with __PAGETABLE_PMD_FOLDED
> 
> (a) *pudp does not make any sense
> 
> For a folded PMD, *pudp == *pmdp and consequently we would actually
> get a PMD, not a PUD.
> 
> For this reason all these pud_* helpers ignore the passed value
> completely. It would be wrong.
> 
> (b) pmd_offset() does *not* consume a pud but instead a pudp.
> 
> That makes sense, just imagine what would happen if someone would pass
> *pudp to that helper (we'd dereference twice ...).
> 
> 
> So I wonder if we can just teach get_pudp() and friends to ... return
> true garbage instead of dereferencing something that does not make sense?
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 32e8457ad5352..c95d0d89ab3f1 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -351,7 +351,13 @@ static inline pmd_t pmdp_get(pmd_t *pmdp)
>    #ifndef pudp_get
>    static inline pud_t pudp_get(pud_t *pudp)
>    {
> +#ifdef __PAGETABLE_PMD_FOLDED
> +       pud_t dummy = { 0 };
> +
> +       return dummy;
> +#else
>           return READ_ONCE(*pudp);
> +#endif
>    }
>    #endif
>    
> set_pud/pud_page/pud_pgtable helper are confusing, I would
> assume they are essentially unused (like documented for set_put)
> and only required to keep compilers happy.

Staring at GUP-fast and perf_get_pgtable_size()---which should better be 
converted to pudp_get() etc--I guess we might have to rework 
p4d_offset_lockless() to do something that doesn't rely on
passing variables of local variables.

We might have to enlighten these walkers (and only these) about folded 
page tables such that they don't depend on the result of pudp_get() and 
friends.

-- 
Cheers

David


  reply	other threads:[~2025-11-27  7:14 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-13  1:45 [PATCH v3 00/22] riscv: Memory type control for platforms with physical memory aliases Samuel Holland
2025-11-13  1:45 ` [PATCH v3 01/22] mm/ptdump: replace READ_ONCE() with standard page table accessors Samuel Holland
2025-11-13  1:45 ` [PATCH v3 02/22] mm: " Samuel Holland
2025-11-13  4:05   ` Dev Jain
2025-11-13  1:45 ` [PATCH v3 03/22] mm/dirty: replace READ_ONCE() with pudp_get() Samuel Holland
2025-11-13  1:45 ` [PATCH v3 04/22] perf/events: replace READ_ONCE() with standard page table accessors Samuel Holland
2025-11-13 19:10   ` David Hildenbrand (Red Hat)
2025-11-13  1:45 ` [PATCH v3 05/22] mm: Move the fallback definitions of pXXp_get() Samuel Holland
2025-11-13 19:11   ` David Hildenbrand (Red Hat)
2025-11-13  1:45 ` [PATCH v3 06/22] mm: Always use page table accessor functions Samuel Holland
2025-11-13  4:53   ` kernel test robot
2025-11-13  5:46   ` kernel test robot
2025-11-26 11:08   ` Christophe Leroy (CS GROUP)
2025-11-26 11:09   ` Ryan Roberts
2025-11-26 12:16     ` David Hildenbrand (Red Hat)
2025-11-26 12:19       ` David Hildenbrand (Red Hat)
2025-11-26 12:27         ` Lorenzo Stoakes
2025-11-26 12:35           ` David Hildenbrand (Red Hat)
2025-11-26 13:03             ` Ryan Roberts
2025-11-26 13:47               ` Wei Yang
2025-11-26 14:22                 ` Ryan Roberts
2025-11-26 14:37                   ` Lorenzo Stoakes
2025-11-26 14:53                     ` David Hildenbrand (Red Hat)
2025-11-26 14:46                   ` David Hildenbrand (Red Hat)
2025-11-26 14:52                     ` Lorenzo Stoakes
2025-11-26 14:56                       ` David Hildenbrand (Red Hat)
2025-11-26 15:08                         ` Lorenzo Stoakes
2025-11-26 15:12                           ` David Hildenbrand (Red Hat)
2025-11-26 16:07                             ` Ryan Roberts
2025-11-26 16:34                               ` Ryan Roberts
2025-11-26 20:31                                 ` David Hildenbrand (Red Hat)
2025-11-27  7:14                                   ` David Hildenbrand (Red Hat) [this message]
2025-11-27  7:31                                     ` David Hildenbrand (Red Hat)
2025-11-27 15:32                                       ` Ryan Roberts
2025-11-27 19:39                                 ` Christophe Leroy (CS GROUP)
2025-11-27 19:44                                 ` Christophe Leroy (CS GROUP)
2025-11-27  8:26                   ` Christophe Leroy (CS GROUP)
2025-11-27  8:35                     ` David Hildenbrand (Red Hat)
2025-11-13  1:45 ` [PATCH v3 07/22] checkpatch: Warn on page table access without accessors Samuel Holland
2025-11-13  2:21   ` Joe Perches
2025-11-13  2:36     ` Samuel Holland
2025-11-13 19:17       ` David Hildenbrand (Red Hat)
2025-11-13  1:45 ` [PATCH v3 08/22] mm: Allow page table accessors to be non-idempotent Samuel Holland
2025-11-13  7:19   ` kernel test robot
2025-11-27 16:57   ` Ryan Roberts
2025-11-27 17:47     ` David Hildenbrand (Red Hat)
2025-11-13  1:45 ` [PATCH v3 09/22] riscv: hibernate: Replace open-coded pXXp_get() Samuel Holland
2025-11-13  1:45 ` [PATCH v3 10/22] riscv: mm: Always use page table accessor functions Samuel Holland
2025-11-13  1:45 ` [PATCH v3 11/22] riscv: mm: Simplify set_p4d() and set_pgd() Samuel Holland
2025-11-13  1:45 ` [PATCH v3 12/22] riscv: mm: Deduplicate _PAGE_CHG_MASK definition Samuel Holland
2025-11-13  1:45 ` [PATCH v3 13/22] riscv: ptdump: Only show N and MT bits when enabled in the kernel Samuel Holland
2025-11-13  1:45 ` [PATCH v3 14/22] riscv: mm: Fix up memory types when writing page tables Samuel Holland
2025-11-13  1:45 ` [PATCH v3 15/22] riscv: mm: Expose all page table bits to assembly code Samuel Holland
2025-11-13  1:45 ` [PATCH v3 16/22] riscv: alternative: Add an ALTERNATIVE_3 macro Samuel Holland
2025-11-13  1:45 ` [PATCH v3 17/22] riscv: alternative: Allow calls with alternate link registers Samuel Holland
2025-11-13  1:45 ` [PATCH v3 18/22] riscv: Fix logic for selecting DMA_DIRECT_REMAP Samuel Holland
2025-11-13  1:45 ` [PATCH v3 19/22] dt-bindings: riscv: Describe physical memory regions Samuel Holland
2025-12-04 15:12   ` Rob Herring
2025-11-13  1:45 ` [PATCH v3 20/22] riscv: mm: Use physical memory aliases to apply PMAs Samuel Holland
2025-11-13  1:45 ` [PATCH v3 21/22] riscv: dts: starfive: jh7100: Use physical memory ranges for DMA Samuel Holland
2025-11-13  1:45 ` [PATCH v3 22/22] riscv: dts: eswin: eic7700: " Samuel Holland
2025-11-13 19:13 ` [PATCH v3 00/22] riscv: Memory type control for platforms with physical memory aliases David Hildenbrand (Red Hat)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bcd2a49d-a42f-41e8-9f64-4fd24fc862c7@kernel.org \
    --to=david@kernel.org \
    --cc=Julia.Lawall@inria.fr \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex@ghiti.fr \
    --cc=anshuman.khandual@arm.com \
    --cc=conor@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=kernel@esmil.dk \
    --cc=krzk+dt@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=nicolas.palix@imag.fr \
    --cc=palmer@dabbelt.com \
    --cc=pjw@kernel.org \
    --cc=richard.weiyang@gmail.com \
    --cc=robh+dt@kernel.org \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=samuel.holland@sifive.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox