linux-mm.kvack.org archive mirror
From: Linus Walleij <linus.walleij@linaro.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, pasha.tatashin@soleen.com,
	 Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com
Subject: Re: [PATCH] fork: stop ignoring NUMA while handling cached thread stacks
Date: Tue, 18 Nov 2025 22:15:04 +0100
Message-ID: <CACRpkdbNxKh7ySjffhzCncgBroOOeOQP689k7dgBKgV9annLpg@mail.gmail.com>
In-Reply-To: <20251117140747.2566239-1-mjguzik@gmail.com>

Hi Mateusz,

excellent initiative!

I had this on some TODO list, so it is really nice to see that you
picked it up.

The patch looks solid, just some questions:

On Mon, Nov 17, 2025 at 3:08 PM Mateusz Guzik <mjguzik@gmail.com> wrote:

> Note the current caching is already bad as the cache keeps overflowing
> and a different solution is needed for the long run, to be worked
> out(tm).

That isn't very strange since the cache only holds 2 stacks per CPU.

The best I can think of, off the top of my head, is to scale the
number of cached stacks as a function of free physical memory and
process fork rate. If we have plenty of memory (for some definition
of plenty) and we are forking a lot, we should keep more stacks
around. If the fork rate goes down, or we are low on memory relative
to the stack size, we should dynamically scale the stack cache down.
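
To make the idea a bit more concrete, here is a rough userspace sketch
of such a heuristic. Everything in it is an assumption for the sake of
illustration: the function name, the 1/1024 memory budget, the
"one stack per 100 forks/sec" ratio and the upper bound are all made
up, and nothing like this exists in the kernel today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunables, not taken from the kernel. */
#define MIN_CACHED_STACKS 2u   /* the fixed cache size discussed above */
#define MAX_CACHED_STACKS 64u  /* arbitrary upper bound for this sketch */

/*
 * Pick a cache size from the observed fork rate, then cap it so the
 * cache never pins more than 1/1024 of the free memory.
 */
static unsigned int target_cached_stacks(size_t free_bytes,
                                         size_t stack_bytes,
                                         unsigned int forks_per_sec)
{
        size_t by_memory;
        unsigned int want;

        assert(stack_bytes > 0);

        /* Demand: one cached stack per 100 forks/sec (arbitrary ratio). */
        want = forks_per_sec / 100;
        if (want < MIN_CACHED_STACKS)
                want = MIN_CACHED_STACKS;
        if (want > MAX_CACHED_STACKS)
                want = MAX_CACHED_STACKS;

        /* Memory cap: how many stacks fit into the budget, possibly 0. */
        by_memory = (free_bytes / 1024) / stack_bytes;
        if (by_memory < want)
                want = (unsigned int)by_memory;

        return want;
}
```

With 16 KiB stacks, 16 GiB free and 1000 forks/sec this asks for 10
cached stacks, while 16 MiB free shrinks the same workload to 1.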

> +static struct vm_struct *alloc_thread_stack_node_from_cache(struct task_struct *tsk, int node)
> +{
> +       struct vm_struct *vm_area;
> +       unsigned int i;
> +
> +       /*
> +        * If the node has memory, we are guaranteed the stacks are backed by local pages.
> +        * Otherwise the pages are arbitrary.
> +        *
> +        * Note that depending on cpuset it is possible we will get migrated to a different
> +        * node immediately after allocating here, so this does *not* guarantee locality for
> +        * arbitrary callers.
> +        */
> +       scoped_guard(preempt) {
> +               if (node != NUMA_NO_NODE && numa_node_id() != node)
> +                       return NULL;
> +
> +               for (i = 0; i < NR_CACHED_STACKS; i++) {
> +                       vm_area = this_cpu_xchg(cached_stacks[i], NULL);
> +                       if (vm_area)
> +                               return vm_area;

So we check each stack slot in order to find one which isn't NULL,
and we can use this_cpu_xchg() because nothing can contest this
here as we are under the preempt guard, so if we get a !NULL
vm_area we know we are good, right?
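
For reference, this is how I read the slot scan, written as a small
userspace model. It is only a sketch: C11 atomic_exchange() stands in
for this_cpu_xchg(), a plain global array stands in for one CPU's
cached_stacks[], and struct vm_struct and take_cached_stack() are
placeholders of my own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CACHED_STACKS 2

struct vm_struct { int id; };   /* stand-in for the kernel structure */

/* Stand-in for one CPU's cached_stacks[] array. */
static _Atomic(struct vm_struct *) cached_stacks[NR_CACHED_STACKS];

/* Take the first non-NULL slot, leaving NULL behind; NULL if all empty. */
static struct vm_struct *take_cached_stack(void)
{
        for (unsigned int i = 0; i < NR_CACHED_STACKS; i++) {
                struct vm_struct *vm_area =
                        atomic_exchange(&cached_stacks[i],
                                        (struct vm_struct *)NULL);
                if (vm_area)
                        return vm_area;
        }
        return NULL;
}
```

Each slot is emptied by the same exchange that reads it, so a stack
can be handed out at most once.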

>  static bool try_release_thread_stack_to_cache(struct vm_struct *vm_area)
>  {
>         unsigned int i;
> +       int nid;
> +
> +       scoped_guard(preempt) {
> +               nid = numa_node_id();
> +               if (node_state(nid, N_MEMORY)) {
> +                       for (i = 0; i < vm_area->nr_pages; i++) {
> +                               struct page *page = vm_area->pages[i];
> +                               if (page_to_nid(page) != nid)
> +                                       return false;
> +                       }
> +               }

I would maybe add a comment saying:

"if we have node-local memory, don't even bother to cache a stack
if any page of it isn't on the same node; we only want clean
node-local stacks"

(I guess that is the semantic you wanted.)
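
Spelled out as a standalone userspace model, the rule I think the
hunk implements looks like this. It is a sketch under assumptions:
stack_is_cacheable() and its parameters are hypothetical, an array of
node ids stands in for page_to_nid() over vm_area->pages[], and a
bool stands in for node_state(nid, N_MEMORY).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a stack may go into the cache of a CPU on node nid.
 * page_nids[i] is the node that backs page i of the stack.
 */
static bool stack_is_cacheable(const int *page_nids, size_t nr_pages,
                               int nid, bool node_has_memory)
{
        assert(page_nids != NULL);

        /* A memoryless node cannot have local pages: accept anything. */
        if (!node_has_memory)
                return true;

        /* Otherwise every single page must sit on the local node. */
        for (size_t i = 0; i < nr_pages; i++) {
                if (page_nids[i] != nid)
                        return false;
        }
        return true;
}
```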

>
> -       for (i = 0; i < NR_CACHED_STACKS; i++) {
> -               struct vm_struct *tmp = NULL;
> +               for (i = 0; i < NR_CACHED_STACKS; i++) {
> +                       struct vm_struct *tmp = NULL;
>
> -               if (this_cpu_try_cmpxchg(cached_stacks[i], &tmp, vm_area))
> -                       return true;
> +                       if (this_cpu_try_cmpxchg(cached_stacks[i], &tmp, vm_area))
> +                               return true;

So since this now is under the preemption guard, this will always
succeed, right? I understand that using this_cpu_try_cmpxchg() is
the idiom, but just asking so I don't miss something else
possibly contesting the stacks here.

If the code should have the same style as alloc_thread_stack_node_from_cache()
I suppose it should be:

for (i = 0; i < NR_CACHED_STACKS; i++) {
        if (!this_cpu_cmpxchg(cached_stacks[i], NULL, vm_area))
                return true;
}

Since this_cpu_cmpxchg() takes the expected old value itself (not a
pointer to it) and returns the value it found in the slot, it
returns NULL when it managed to exchange the old value NULL for
vm_area.
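
To keep the two calling conventions apart, here is a userspace model
of both. It is a sketch built on C11 atomic_compare_exchange_strong();
model_cmpxchg() and model_try_cmpxchg() are names I made up, and they
only mimic the argument and return conventions of the kernel helpers,
not their per-CPU behaviour.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct vm_struct { int id; };   /* stand-in for the kernel structure */

/*
 * cmpxchg() convention: pass the expected value, get back the value
 * that was found in the slot. Success means "returned == expected".
 */
static struct vm_struct *model_cmpxchg(_Atomic(struct vm_struct *) *slot,
                                       struct vm_struct *oldval,
                                       struct vm_struct *newval)
{
        /* On failure C11 stores the value it found into oldval. */
        atomic_compare_exchange_strong(slot, &oldval, newval);
        return oldval;
}

/*
 * try_cmpxchg() convention: pass a pointer to the expected value, get
 * back a bool. On failure *oldval is updated to the value found.
 */
static bool model_try_cmpxchg(_Atomic(struct vm_struct *) *slot,
                              struct vm_struct **oldval,
                              struct vm_struct *newval)
{
        return atomic_compare_exchange_strong(slot, oldval, newval);
}
```

So a store into an empty slot is "!model_cmpxchg(slot, NULL, v)" in
one style and "model_try_cmpxchg(slot, &tmp, v)" with tmp == NULL in
the other.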

If I understood correctly, with or without the above code style change:
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij


Thread overview: 3+ messages
2025-11-17 14:07 Mateusz Guzik
2025-11-18 21:15 ` Linus Walleij [this message]
2025-11-19 14:06   ` Mateusz Guzik
