From: David Hildenbrand <david@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Petr Vaněk" <arkamar@atlas.cz>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
"Ryan Roberts" <ryan.roberts@arm.com>,
xen-devel@lists.xenproject.org, x86@kernel.org,
stable@vger.kernel.org
Subject: Re: [PATCH v2 1/1] mm: fix folio_pte_batch() on XEN PV
Date: Sun, 4 May 2025 10:58:44 +0200 [thread overview]
Message-ID: <62990a3f-524f-4362-8f64-2fc582986eba@redhat.com> (raw)
In-Reply-To: <20250504001547.177b2aba8c2ffbfe63e0552e@linux-foundation.org>
On 04.05.25 09:15, Andrew Morton wrote:
> On Sun, 4 May 2025 08:47:45 +0200 David Hildenbrand <david@redhat.com> wrote:
>
>>>
>>> Methinks max_nr really wants to be unsigned long.
>>
>> We only batch within a single PTE table, so an integer was sufficient.
>>
>> The unsigned value is the result of a discussion with Ryan regarding similar/related
>> (rmap) functions:
>>
>> "
>> Personally I'd go with signed int (since
>> that's what all the counters in struct folio that we are manipulating are,
>> underneath the atomic_t) then check that nr_pages > 0 in
>> __folio_rmap_sanity_checks().
>> "
>>
>> https://lore.kernel.org/linux-mm/20231204142146.91437-14-david@redhat.com/T/#ma0bfff0102f0f2391dfa94aa22a8b7219b92c957
>>
>> As soon as we let "max_nr" be an "unsigned long", also the return value
>> should be an "unsigned long", and everybody calling that function.
>>
>> In this case here, we should likely just use whatever type "max_nr" is.
>>
>> Not sure myself if we should change that here to unsigned long or long. Some
>> callers also operate with the negative values IIRC (e.g., adjust the RSS by doing -= nr).
>
> "rss -= nr" doesn't require, expect or anticipate that `nr' can be negative!
The one thing I ran into with "unsigned int" around folio_nr_pages()
was that if you pass
-folio-nr_pages()
into a function that expects an "long" (add vs. remove a value to a counter), then
the result might not be what one would expect when briefly glimpsing at the code:
#include <stdio.h>
static __attribute__((noinline)) void print(long diff)
{
printf("%ld\n", diff);
}
static int value_int()
{
return 12345;
}
static unsigned int value_unsigned_int()
{
return 12345;
}
static int value_long()
{
return 12345;
}
static unsigned long value_unsigned_long()
{
return 12345;
}
int main(void)
{
print(-value_int());
print(-value_unsigned_int());
print(-value_long());
print(-value_unsigned_long());
return 0;
}
$ ./tmp
-12345
4294954951
-12345
-12345
So, I am fine with using "unsigned long" (as stated in that commit description below).
>
>>
>>> That will permit the
>>> cleanup of quite a bit of truncation, extension, signedness conversion
>>> and general type chaos in folio_pte_batch()'s various callers.
>>>> And...
>>>
>>> Why does folio_nr_pages() return a signed quantity? It's a count.
>>
>> A partial answer is in 1ea5212aed068 ("mm: factor out large folio handling
>> from folio_nr_pages() into folio_large_nr_pages()"), where I stumbled over the
>> reason for a signed value myself and at least made the other
>> functions be consistent with folio_nr_pages():
>>
>> "
>> While at it, let's consistently return a "long" value from all these
>> similar functions. Note that we cannot use "unsigned int" (even though
>> _folio_nr_pages is of that type), because it would break some callers that
>> do stuff like "-folio_nr_pages()". Both "int" or "unsigned long" would
>> work as well.
>>
>> "
>>
>> Note that folio_nr_pages() returned a "long" since the very beginning. Probably using
>> a signed value for consistency because also mapcounts / refcounts are all signed.
>
> Geeze.
>
> Can we step back and look at what we're doing? Anything which counts
> something (eg, has "nr" in the identifier) cannot be negative.
Yes. Unless we want to catch underflows (e.g., mapcount / refcount). For "nr_pages" I agree.
>
> It's that damn "int" thing. I think it was always a mistake that the C
> language's go-to type is a signed one.
Yeah. But see above that "unsigned int" in combination with long can also cause pain.
> It's a system programming
> language and system software rarely deals with negative scalars.
> Signed scalars are the rare case.
>
> I do expect that the code in and around here would be cleaner and more
> reliable if we were to do a careful expunging of inappropriately signed
> variables.
Maybe, but it would mostly be a "int -> unsigned long" conversion, probably not
much more. I'm not against cleaning that up at all.
>
>>
>>>
>>> And why the heck is folio_pte_batch() inlined? It's larger then my
>>> first hard disk and it has five callsites!
>>
>> :)
>>
>> In case of fork/zap we really want it inlined because
>>
>> (1) We want to optimize out all of the unnecessary checks we added for other users
>>
>> (2) Zap/fork code is very sensitive to function call overhead
>>
>> Probably, as that function sees more widespread use, we might want a
>> non-inlined variant that can be used in places where performance doesn't
>> matter all that much (although I am not sure there will be that many).
>
> a quick test.
>
> before:
> text data bss dec hex filename
> 12380 470 0 12850 3232 mm/madvise.o
> 52975 2689 24 55688 d988 mm/memory.o
> 25305 1448 2096 28849 70b1 mm/mempolicy.o
> 8573 924 4 9501 251d mm/mlock.o
> 20950 5864 16 26830 68ce mm/rmap.o
>
> (120183)
>
> after:
>
> text data bss dec hex filename
> 11916 470 0 12386 3062 mm/madvise.o
> 52990 2697 24 55711 d99f mm/memory.o
> 25161 1448 2096 28705 7021 mm/mempolicy.o
> 8381 924 4 9309 245d mm/mlock.o
> 20806 5864 16 26686 683e mm/rmap.o
>
> (119254)
>
> so uninlining saves a kilobyte of text - less than I expected but
> almost 1%.
As I said, for fork+zap/unmap we really want to inline -- the first two users
of that function when that function was still simpler and resided in mm/memory.o. For
the other users, probably okay to have a non-inlined one in mm/util.c .
--
Cheers,
David / dhildenb
prev parent reply other threads:[~2025-05-04 8:58 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-02 21:50 [PATCH v2 0/1] mm: Xen PV regression after THP PTE optimization Petr Vaněk
2025-05-02 21:50 ` [PATCH v2 1/1] mm: fix folio_pte_batch() on XEN PV Petr Vaněk
2025-05-04 1:28 ` Andrew Morton
2025-05-04 6:47 ` David Hildenbrand
2025-05-04 7:15 ` Andrew Morton
2025-05-04 8:58 ` David Hildenbrand [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=62990a3f-524f-4362-8f64-2fc582986eba@redhat.com \
--to=david@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=arkamar@atlas.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ryan.roberts@arm.com \
--cc=stable@vger.kernel.org \
--cc=x86@kernel.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox