linux-mm.kvack.org archive mirror
* [PATCH] mm: remove an avoidable load of page refcount in page_ref_add_unless
@ 2024-12-07  8:29 Mateusz Guzik
  2024-12-09  9:28 ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Mateusz Guzik @ 2024-12-07  8:29 UTC (permalink / raw)
  To: yuzhao; +Cc: akpm, willy, linux-kernel, linux-mm, Mateusz Guzik

Explicitly pre-checking the count adds nothing as atomic_add_unless
starts with doing the same thing. iow no functional changes.
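
For reference, without a dedicated arch_atomic_fetch_add_unless() the generic
fallback already does that load and bails out before ever attempting the
cmpxchg, roughly like this (paraphrased, not a verbatim copy of the kernel
source):

        int c = atomic_read(v);                 /* the one load that is needed */

        do {
                if (unlikely(c == u))           /* already equal: no store attempted */
                        break;
        } while (!atomic_try_cmpxchg(v, &c, c + a));

so the open-coded page_ref_count(page) != u check in page_ref_add_unless()
merely repeats that first load.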

disasm of stock filemap_get_read_batch from perf top while running
readseek2_processes -t 24:

  0.04 │ cb:   mov    0x34(%rbx),%eax           # first load
 73.11 │       test   %eax,%eax
       │     ↓ je     1bd
  0.09 │       mov    0x34(%rbx),%eax           # second load
  1.01 │ d9:   test   %eax,%eax
       │     ↓ je     1bd
  0.06 │       lea    0x1(%rax),%edx
  0.00 │       lea    0x34(%rbx),%r14
  0.00 │       lock   cmpxchg %edx,0x34(%rbx)
 14.06 │     ↑ jne    d9

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---

I did not bother benchmarking; I don't think there is anything
warranting it for this one. fwiw it is plausibly worth a few % in a
microbenchmark at higher core counts.

 include/linux/page_ref.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 8c236c651d1d..fa203894876f 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -234,7 +234,7 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
 
 	rcu_read_lock();
 	/* avoid writing to the vmemmap area being remapped */
-	if (!page_is_fake_head(page) && page_ref_count(page) != u)
+	if (!page_is_fake_head(page))
 		ret = atomic_add_unless(&page->_refcount, nr, u);
 	rcu_read_unlock();
 
-- 
2.43.0




* Re: [PATCH] mm: remove an avoidable load of page refcount in page_ref_add_unless
  2024-12-07  8:29 [PATCH] mm: remove an avoidable load of page refcount in page_ref_add_unless Mateusz Guzik
@ 2024-12-09  9:28 ` David Hildenbrand
  2024-12-09 10:25   ` Mateusz Guzik
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2024-12-09  9:28 UTC (permalink / raw)
  To: Mateusz Guzik, yuzhao; +Cc: akpm, willy, linux-kernel, linux-mm

On 07.12.24 09:29, Mateusz Guzik wrote:
> Explicitly pre-checking the count adds nothing as atomic_add_unless
> starts with doing the same thing. iow no functional changes.

I recall that we added that check because with the hugetlb vmemmap 
optimization, some of the tail pages we don't ever expect to be modified 
  (because they are fake-duplicated) might be mapped R/O.

If the arch implementation of atomic_add_unless() were to trigger an
unconditional write fault, we'd be in trouble. That would likely only be
the case if the arch provides a dedicated instruction.

atomic_add_unless()->raw_atomic_add_unless()

Nobody currently defines arch_atomic_add_unless().

raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined 
on some architectures.

I scanned some of the inline-asm, and I think most of them perform a 
check first.


So this currently looks good to me, but we'll rely on the fact that 
atomic_add_unless() will never trigger a write fault if the values 
match. Which makes me wonder if we should document that behavior of 
atomic_add_unless().
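
Something along these lines next to the generic definition would capture it
(strawman wording only, not existing documentation):

        /*
         * Note: atomic_add_unless() / atomic_fetch_add_unless() must not
         * write to @v at all (not even to store the unchanged value back)
         * when the counter already equals @u.  page_ref_add_unless()
         * relies on this to avoid faulting on read-only (fake-duplicated)
         * vmemmap pages.
         */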

-- 
Cheers,

David / dhildenb




* Re: [PATCH] mm: remove an avoidable load of page refcount in page_ref_add_unless
  2024-12-09  9:28 ` David Hildenbrand
@ 2024-12-09 10:25   ` Mateusz Guzik
  2024-12-09 10:56     ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Mateusz Guzik @ 2024-12-09 10:25 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: yuzhao, akpm, willy, linux-kernel, linux-mm

On Mon, Dec 9, 2024 at 10:28 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.12.24 09:29, Mateusz Guzik wrote:
> > Explicitly pre-checking the count adds nothing as atomic_add_unless
> > starts with doing the same thing. iow no functional changes.
>
> I recall that we added that check because with the hugetlb vmemmap
> optimization, some of the tail pages we don't ever expect to be modified
>   (because they are fake-duplicated) might be mapped R/O.
>
> If the arch implementation of atomic_add_unless() would trigger an
> unconditional write fault, we'd be in trouble. That would likely only be
> the case if the arch provides a dedicate instruction.
>
> atomic_add_unless()->raw_atomic_add_unless()
>
> Nobody currently defines arch_atomic_add_unless().
>
> raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined
> on some architectures.
>
> I scanned some of the inline-asm, and I think most of them perform a
> check first.
>

Huh.

Some arch triggering a write fault despite not changing the value is
not something I thought about. Sounds pretty broken to me if any arch
were to do it, but then stranger things have happened.

However, if this is seen as a real concern, then I think the best way
forward is to drop the patch (and maybe instead add a comment on what's
up with the extra load).
-- 
Mateusz Guzik <mjguzik gmail.com>



* Re: [PATCH] mm: remove an avoidable load of page refcount in page_ref_add_unless
  2024-12-09 10:25   ` Mateusz Guzik
@ 2024-12-09 10:56     ` David Hildenbrand
  2024-12-09 12:33       ` Mateusz Guzik
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2024-12-09 10:56 UTC (permalink / raw)
  To: Mateusz Guzik; +Cc: yuzhao, akpm, willy, linux-kernel, linux-mm

On 09.12.24 11:25, Mateusz Guzik wrote:
> On Mon, Dec 9, 2024 at 10:28 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 07.12.24 09:29, Mateusz Guzik wrote:
>>> Explicitly pre-checking the count adds nothing as atomic_add_unless
>>> starts with doing the same thing. iow no functional changes.
>>
>> I recall that we added that check because with the hugetlb vmemmap
>> optimization, some of the tail pages we don't ever expect to be modified
>>    (because they are fake-duplicated) might be mapped R/O.
>>
>> If the arch implementation of atomic_add_unless() would trigger an
>> unconditional write fault, we'd be in trouble. That would likely only be
>> the case if the arch provides a dedicate instruction.
>>
>> atomic_add_unless()->raw_atomic_add_unless()
>>
>> Nobody currently defines arch_atomic_add_unless().
>>
>> raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined
>> on some architectures.
>>
>> I scanned some of the inline-asm, and I think most of them perform a
>> check first.
>>
> 
> Huh.
> 
> Some arch triggering a write fault despite not changing the value is
> not something I thought about. Sounds pretty broken to me if any arch
> was to do it, but then stranger things did happen.

Yeah, it really depends on what the architecture defines. For example, 
on s390x for "COMPARE AND SWAP" the spec states something like

"When the result of the comparison is unequal, the
second operand is loaded at the first-operand loca-
tion, and the second-operand location remains
unchanged. However, on some models, the contents
may be fetched and subsequently stored back
unchanged at the second-operand location. This
update appears to be a block-concurrent interlocked-
update reference as observed by other CPUs."

So there might be an unconditional store on an instruction where one 
would not expect it.
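
Whether a particular atomic RMW still performs the store when it does not
change the value is easy to poke at from userspace with a read-only
mapping. A rough, illustrative sketch (nothing from the spec or from the
tree):

        #include <setjmp.h>
        #include <signal.h>
        #include <stdio.h>
        #include <sys/mman.h>

        static sigjmp_buf env;

        static void segv(int sig)
        {
                (void)sig;
                siglongjmp(env, 1);
        }

        int main(void)
        {
                int *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

                if (p == MAP_FAILED)
                        return 1;
                *p = 42;
                /* now R/O, like the write-protected vmemmap pages above */
                mprotect(p, 4096, PROT_READ);
                signal(SIGSEGV, segv);

                if (sigsetjmp(env, 1) == 0) {
                        /* value-preserving RMW: add 0 */
                        __atomic_fetch_add(p, 0, __ATOMIC_RELAXED);
                        printf("no fault: add-of-zero did not store\n");
                } else {
                        printf("SIGSEGV: the RMW stores even when adding 0\n");
                }
                return 0;
        }

A plain fetch_add faults here (on x86 at least, where a locked add/xadd
always writes), which is why page_ref_add_unless() has to rely on
add_unless bailing out before any store at all, rather than on the store
being value-preserving.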


Something similar-but-different recently popped up on aarch64, which 
does what one would expect:

"The atomic RMW instructions, for example, ldadd, actually does load +
add + store in one instruction, it will trigger two page faults per the
ARM64 architecture spec, the first fault is a read fault, the second
fault is a write fault.

Some applications use atomic RMW instructions to populate memory, for
example, openjdk uses atomic-add-0 to do pretouch (populate heap memory
at launch time) between v18 and v22 in order to permit use of memory
concurrently with pretouch." [1]

And Christoph commented:

"x86 does not do a read fault on atomics so we have an issue htere." [2]

I did not check if that is actually true on x86.



> 
> However, if this is seen as a real concern, then I think the best way
> forward is to drop the patch (and maybe instead add a comment what's
> up with the extra load).

I assume we're currently fine because no architecture actually defines 
such an instruction that would be a problem for add_unless.


[1] https://lore.kernel.org/linux-arm-kernel/Zmw1jltdkMrTrT_l@arm.com/T/
[2] 
https://lore.kernel.org/linux-arm-kernel/c1ba9ba3-b0d6-4c6c-d628-614751d737c2@gentwo.org/

-- 
Cheers,

David / dhildenb




* Re: [PATCH] mm: remove an avoidable load of page refcount in page_ref_add_unless
  2024-12-09 10:56     ` David Hildenbrand
@ 2024-12-09 12:33       ` Mateusz Guzik
  2024-12-09 14:22         ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Mateusz Guzik @ 2024-12-09 12:33 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: yuzhao, akpm, willy, linux-kernel, linux-mm

On Mon, Dec 9, 2024 at 11:56 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 09.12.24 11:25, Mateusz Guzik wrote:
> > On Mon, Dec 9, 2024 at 10:28 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 07.12.24 09:29, Mateusz Guzik wrote:
> >>> Explicitly pre-checking the count adds nothing as atomic_add_unless
> >>> starts with doing the same thing. iow no functional changes.
> >>
> >> I recall that we added that check because with the hugetlb vmemmap
> >> optimization, some of the tail pages we don't ever expect to be modified
> >>    (because they are fake-duplicated) might be mapped R/O.
> >>
> >> If the arch implementation of atomic_add_unless() would trigger an
> >> unconditional write fault, we'd be in trouble. That would likely only be
> >> the case if the arch provides a dedicate instruction.
> >>
> >> atomic_add_unless()->raw_atomic_add_unless()
> >>
> >> Nobody currently defines arch_atomic_add_unless().
> >>
> >> raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined
> >> on some architectures.
> >>
> >> I scanned some of the inline-asm, and I think most of them perform a
> >> check first.
> >>
> >
> > Huh.
> >
> > Some arch triggering a write fault despite not changing the value is
> > not something I thought about. Sounds pretty broken to me if any arch
> > was to do it, but then stranger things did happen.
>
> Yeah, it really depends on what the architecture defines. For example,
> on s390x for "COMPARE AND SWAP" the spec states something like
>
[snip]

Well in this context you need to do the initial load to even know what
to CAS with, unless you want to blindly do it hoping to get lucky,
which I'm assuming no arch is doing.

Granted, if there was an architecture which had an actual "cas unless
the value is x" then this would not hold, but I don't know of any.
[such an extension would be most welcome fwiw]

Assuming you indeed want the patch after all, can you sort out adding
a comment to atomic_add_unless yourself? ;) I presume you know the
right people and whatnot, so this would cut down on back and forth.

That is to say I think this thread just about exhausted the time
warranted by this patch. No hard feelz if it gets dropped, but then I
do strongly suggest adding a justification to the extra load.

Cheers,
-- 
Mateusz Guzik <mjguzik gmail.com>



* Re: [PATCH] mm: remove an avoidable load of page refcount in page_ref_add_unless
  2024-12-09 12:33       ` Mateusz Guzik
@ 2024-12-09 14:22         ` David Hildenbrand
  2024-12-09 14:30           ` Mateusz Guzik
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2024-12-09 14:22 UTC (permalink / raw)
  To: Mateusz Guzik; +Cc: yuzhao, akpm, willy, linux-kernel, linux-mm

On 09.12.24 13:33, Mateusz Guzik wrote:
> On Mon, Dec 9, 2024 at 11:56 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 09.12.24 11:25, Mateusz Guzik wrote:
>>> On Mon, Dec 9, 2024 at 10:28 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 07.12.24 09:29, Mateusz Guzik wrote:
>>>>> Explicitly pre-checking the count adds nothing as atomic_add_unless
>>>>> starts with doing the same thing. iow no functional changes.
>>>>
>>>> I recall that we added that check because with the hugetlb vmemmap
>>>> optimization, some of the tail pages we don't ever expect to be modified
>>>>     (because they are fake-duplicated) might be mapped R/O.
>>>>
>>>> If the arch implementation of atomic_add_unless() would trigger an
>>>> unconditional write fault, we'd be in trouble. That would likely only be
>>>> the case if the arch provides a dedicate instruction.
>>>>
>>>> atomic_add_unless()->raw_atomic_add_unless()
>>>>
>>>> Nobody currently defines arch_atomic_add_unless().
>>>>
>>>> raw_atomic_fetch_add_unless()->arch_atomic_fetch_add_unless() is defined
>>>> on some architectures.
>>>>
>>>> I scanned some of the inline-asm, and I think most of them perform a
>>>> check first.
>>>>
>>>
>>> Huh.
>>>
>>> Some arch triggering a write fault despite not changing the value is
>>> not something I thought about. Sounds pretty broken to me if any arch
>>> was to do it, but then stranger things did happen.
>>
>> Yeah, it really depends on what the architecture defines. For example,
>> on s390x for "COMPARE AND SWAP" the spec states something like
>>
> [snip]
> 
> Well in this context you need to do the initial load to even know what
> to CAS with, unless you want to blindly do it hoping to get lucky,
> which I'm assuming no arch is doing.
> 
> Granted, if there was an architecture which had an actual "cas unless
> the value is x" then this would not hold, but I don't know of any.
> [such an extension would be most welcome fwiw]

Apparently, we prepared for that via arch_atomic_add_unless(), which has no users.

> 
> Assuming you indeed want the patch after all, can you sort out adding
> a comment to atomic_add_unless yourself? ;) I presume you know the
> right people and whatnot, so this would cut down on back and forth.
> 
> That is to say I think this thread just about exhausted the time
> warranted by this patch. No hard feelz if it gets dropped, but then I
> do strongly suggest adding a justification to the extra load.

Maybe it's sufficient for now to simply do your change with a comment:

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 8c236c651d1d6..1efc992ad5687 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -234,7 +234,13 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
  
         rcu_read_lock();
         /* avoid writing to the vmemmap area being remapped */
-       if (!page_is_fake_head(page) && page_ref_count(page) != u)
+       if (!page_is_fake_head(page))
+               /*
+                * atomic_add_unless() will currently never modify the value
+                * if it already is u. If that ever changes, we'd have to have
+                * a separate check here, such that we won't be writing to
+                * write-protected vmemmap areas.
+                */
                 ret = atomic_add_unless(&page->_refcount, nr, u);
         rcu_read_unlock();
  

It would bail out during testing ... hopefully, such that we can detect any such change.

-- 
Cheers,

David / dhildenb




* Re: [PATCH] mm: remove an avoidable load of page refcount in page_ref_add_unless
  2024-12-09 14:22         ` David Hildenbrand
@ 2024-12-09 14:30           ` Mateusz Guzik
  2024-12-09 14:46             ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Mateusz Guzik @ 2024-12-09 14:30 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: yuzhao, akpm, willy, linux-kernel, linux-mm

On Mon, Dec 9, 2024 at 3:22 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 09.12.24 13:33, Mateusz Guzik wrote:
> > That is to say I think this thread just about exhausted the time
> > warranted by this patch. No hard feelz if it gets dropped, but then I
> > do strongly suggest adding a justification to the extra load.
>
> Maybe it's sufficient for now to simply do your change with a comment:
>
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index 8c236c651d1d6..1efc992ad5687 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -234,7 +234,13 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
>
>          rcu_read_lock();
>          /* avoid writing to the vmemmap area being remapped */
> -       if (!page_is_fake_head(page) && page_ref_count(page) != u)
> +       if (!page_is_fake_head(page))
> +               /*
> +                * atomic_add_unless() will currently never modify the value
> +                * if it already is u. If that ever changes, we'd have to have
> +                * a separate check here, such that we won't be writing to
> +                * write-protected vmemmap areas.
> +                */
>                  ret = atomic_add_unless(&page->_refcount, nr, u);
>          rcu_read_unlock();
>
>
> It would bail out during testing ... hopefully, such that we can detect any such change.
>

Not my call to make, but looks good. ;)

fwiw I don't need any credit and I would be more than happy if you
just submitted the thing as your own without me being mentioned. *No*
cc would also be appreciated.

-- 
Mateusz Guzik <mjguzik gmail.com>



* Re: [PATCH] mm: remove an avoidable load of page refcount in page_ref_add_unless
  2024-12-09 14:30           ` Mateusz Guzik
@ 2024-12-09 14:46             ` David Hildenbrand
  2025-01-07  4:41               ` Yu Zhao
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2024-12-09 14:46 UTC (permalink / raw)
  To: Mateusz Guzik; +Cc: yuzhao, akpm, willy, linux-kernel, linux-mm

On 09.12.24 15:30, Mateusz Guzik wrote:
> On Mon, Dec 9, 2024 at 3:22 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 09.12.24 13:33, Mateusz Guzik wrote:
>>> That is to say I think this thread just about exhausted the time
>>> warranted by this patch. No hard feelz if it gets dropped, but then I
>>> do strongly suggest adding a justification to the extra load.
>>
>> Maybe it's sufficient for now to simply do your change with a comment:
>>
>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
>> index 8c236c651d1d6..1efc992ad5687 100644
>> --- a/include/linux/page_ref.h
>> +++ b/include/linux/page_ref.h
>> @@ -234,7 +234,13 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
>>
>>           rcu_read_lock();
>>           /* avoid writing to the vmemmap area being remapped */
>> -       if (!page_is_fake_head(page) && page_ref_count(page) != u)
>> +       if (!page_is_fake_head(page))
>> +               /*
>> +                * atomic_add_unless() will currently never modify the value
>> +                * if it already is u. If that ever changes, we'd have to have
>> +                * a separate check here, such that we won't be writing to
>> +                * write-protected vmemmap areas.
>> +                */
>>                   ret = atomic_add_unless(&page->_refcount, nr, u);
>>           rcu_read_unlock();
>>
>>
>> It would bail out during testing ... hopefully, such that we can detect any such change.
>>
> 
> Not my call to make, but looks good. ;)
> 
> fwiw I don't need any credit and I would be more than happy if you
> just submitted the thing as your own without me being mentioned. *No*
> cc would also be appreciated.

Likely Andrew can add the comment as a fixup.

-- 
Cheers,

David / dhildenb




* Re: [PATCH] mm: remove an avoidable load of page refcount in page_ref_add_unless
  2024-12-09 14:46             ` David Hildenbrand
@ 2025-01-07  4:41               ` Yu Zhao
  0 siblings, 0 replies; 9+ messages in thread
From: Yu Zhao @ 2025-01-07  4:41 UTC (permalink / raw)
  To: David Hildenbrand, Mateusz Guzik; +Cc: akpm, willy, linux-kernel, linux-mm

On Mon, Dec 9, 2024 at 7:46 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 09.12.24 15:30, Mateusz Guzik wrote:
> > On Mon, Dec 9, 2024 at 3:22 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 09.12.24 13:33, Mateusz Guzik wrote:
> >>> That is to say I think this thread just about exhausted the time
> >>> warranted by this patch. No hard feelz if it gets dropped, but then I
> >>> do strongly suggest adding a justification to the extra load.
> >>
> >> Maybe it's sufficient for now to simply do your change with a comment:
> >>
> >> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> >> index 8c236c651d1d6..1efc992ad5687 100644
> >> --- a/include/linux/page_ref.h
> >> +++ b/include/linux/page_ref.h
> >> @@ -234,7 +234,13 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
> >>
> >>           rcu_read_lock();
> >>           /* avoid writing to the vmemmap area being remapped */
> >> -       if (!page_is_fake_head(page) && page_ref_count(page) != u)
> >> +       if (!page_is_fake_head(page))
> >> +               /*
> >> +                * atomic_add_unless() will currently never modify the value
> >> +                * if it already is u. If that ever changes, we'd have to have
> >> +                * a separate check here, such that we won't be writing to
> >> +                * write-protected vmemmap areas.
> >> +                */
> >>                   ret = atomic_add_unless(&page->_refcount, nr, u);
> >>           rcu_read_unlock();
> >>
> >>
> >> It would bail out during testing ... hopefully, such that we can detect any such change.
> >>
> >
> > Not my call to make, but looks good. ;)
> >
> > fwiw I don't need any credit and I would be more than happy if you
> > just submitted the thing as your own without me being mentioned. *No*
> > cc would also be appreciated.
>
> Likely Andrew can add the comment as a fixup.

Sorry for not getting back to you sooner: the page_ref_count() test
actually is a must; however, I didn't do it in the right order.

I only realized my carelessness after Will asked me about it here [1]
-- I just posted a fix here [2], CC'ing both of you. If any of you
have concerns about it, please let me know in that thread. Otherwise
I'll ask Andrew to drop this patch. Thank you.

[1] https://lore.kernel.org/linux-mm/20241128142028.GA3506@willie-the-truck/
[2] https://lore.kernel.org/linux-mm/20250107043505.351925-1-yuzhao@google.com/


