From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: David Laight <David.Laight@aculab.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	Matthew Wilcox <willy@infradead.org>,
	 Kent Overstreet <kent.overstreet@linux.dev>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	 "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"x86@kernel.org" <x86@kernel.org>,  "bp@alien8.de" <bp@alien8.de>,
	"brauner@kernel.org" <brauner@kernel.org>,
	 "bristot@redhat.com" <bristot@redhat.com>,
	"bsegall@google.com" <bsegall@google.com>,
	 "dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	 "dianders@chromium.org" <dianders@chromium.org>,
	 "dietmar.eggemann@arm.com" <dietmar.eggemann@arm.com>,
	 "eric.devolder@oracle.com" <eric.devolder@oracle.com>,
	"hca@linux.ibm.com" <hca@linux.ibm.com>,
	 "hch@infradead.org" <hch@infradead.org>,
	 "jacob.jun.pan@linux.intel.com" <jacob.jun.pan@linux.intel.com>,
	"jgg@ziepe.ca" <jgg@ziepe.ca>,
	 "jpoimboe@kernel.org" <jpoimboe@kernel.org>,
	"jroedel@suse.de" <jroedel@suse.de>,
	 "juri.lelli@redhat.com" <juri.lelli@redhat.com>,
	"kinseyho@google.com" <kinseyho@google.com>,
	 "kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"lstoakes@gmail.com" <lstoakes@gmail.com>,
	 "luto@kernel.org" <luto@kernel.org>,
	"mgorman@suse.de" <mgorman@suse.de>,
	"mic@digikod.net" <mic@digikod.net>,
	 "michael.christie@oracle.com" <michael.christie@oracle.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	 "mjguzik@gmail.com" <mjguzik@gmail.com>,
	"mst@redhat.com" <mst@redhat.com>,
	 "npiggin@gmail.com" <npiggin@gmail.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	 "pmladek@suse.com" <pmladek@suse.com>,
	 "rick.p.edgecombe@intel.com" <rick.p.edgecombe@intel.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	 "surenb@google.com" <surenb@google.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	 "urezki@gmail.com" <urezki@gmail.com>,
	 "vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"vschneid@redhat.com" <vschneid@redhat.com>
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks
Date: Mon, 18 Mar 2024 11:09:47 -0400
Message-ID: <CA+CK2bAuNLXq4p8pjwAatuw2KuadhKjD6JRwJN8ZvSEd1d7ntA@mail.gmail.com>
In-Reply-To: <bb203717ab644362a8eafe78aff23947@AcuMS.aculab.com>

On Sun, Mar 17, 2024 at 2:58 PM David Laight <David.Laight@aculab.com> wrote:
>
> From: Pasha Tatashin
> > Sent: 16 March 2024 19:18
> ...
> > Expanding on Matthew's idea of an interface for dynamic kernel stack
> > sizes, here's what I'm thinking:
> >
> > - Kernel Threads: Create all kernel threads with a fully populated
> > THREAD_SIZE stack.  (i.e. 16K)
> > - User Threads: Create all user threads with THREAD_SIZE kernel stack
> > but only the top page mapped. (i.e. 4K)
> > - In enter_from_user_mode(): Expand the thread stack to 16K by mapping
> > three additional pages from the per-CPU stack cache. This function is
> > called early in kernel entry points.
> > - exit_to_user_mode(): Unmap the extra three pages and return them to
> > the per-CPU cache. This function is called late in the kernel exit
> > path.
>
> Isn't that entirely horrid for TLB use and so will require a lot of IPI?

The TLB load is going to be exactly the same as today: we already use
small pages for vmap-mapped stacks. We won't need extra flushing
either. The mappings are in kernel space, and once the pages are
removed from the page table, no one is going to access that VA range
until the thread enters the kernel again. We only need to invalidate
the VA range when the pages are mapped, and only on the local CPU.
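
To make the map/unmap flow concrete, here is a minimal userspace
sketch. It is a simulation only, not kernel code: struct cpu_cache,
struct thread_stack, stack_expand(), stack_shrink() and CACHE_PAGES
are all hypothetical names invented for illustration, and the
"invalidation" is just a counter. It assumes 4K pages and a 16K
THREAD_SIZE as in the proposal quoted above.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_PAGES 4                  /* THREAD_SIZE = 16K with 4K pages */
#define EXTRA_PAGES (STACK_PAGES - 1)  /* pages mapped only while in the kernel */
#define CACHE_PAGES 8                  /* hypothetical per-CPU cache capacity */

/* Hypothetical model of one CPU's stack page cache. */
struct cpu_cache {
	int free_pages;           /* pages currently available in the cache */
	int local_invalidations;  /* local VA-range invalidations performed */
};

/* Hypothetical model of one user thread's kernel stack. */
struct thread_stack {
	bool mapped[STACK_PAGES]; /* index STACK_PAGES - 1 is the top page */
};

/* A new user thread starts with only the top page mapped. */
static void thread_stack_init(struct thread_stack *ts)
{
	for (int i = 0; i < STACK_PAGES; i++)
		ts->mapped[i] = (i == STACK_PAGES - 1);
}

static int mapped_pages(const struct thread_stack *ts)
{
	int n = 0;

	for (int i = 0; i < STACK_PAGES; i++)
		n += ts->mapped[i];
	return n;
}

/* Kernel entry: map the three extra pages from the local cache. */
static bool stack_expand(struct thread_stack *ts, struct cpu_cache *cc)
{
	if (cc->free_pages < EXTRA_PAGES)
		return false;
	for (int i = 0; i < EXTRA_PAGES; i++)
		ts->mapped[i] = true;
	cc->free_pages -= EXTRA_PAGES;
	cc->local_invalidations++; /* invalidate on map, local CPU only */
	return true;
}

/* Kernel exit: unmap the extra pages and return them to the cache. */
static void stack_shrink(struct thread_stack *ts, struct cpu_cache *cc)
{
	for (int i = 0; i < EXTRA_PAGES; i++)
		ts->mapped[i] = false;
	cc->free_pages += EXTRA_PAGES;
	/* No invalidation here: nothing touches this VA range until re-entry. */
}
```

In this model one entry/exit round trip costs a single local
invalidation and leaves the cache at its original size.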

> Remember, if a thread sleeps in 'extra stack' and is then rescheduled
> on a different cpu the extra pages get 'pumped' from one cpu to
> another.

Yes, the per-CPU cache can become unbalanced this way. We can remember
the original CPU from which we acquired the pages and return them to
the same place.
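
A rough sketch of that bookkeeping, again as a userspace simulation
with hypothetical names (origin_cpu, stack_expand_on(),
stack_shrink_on()). It ignores locking entirely; returning pages to
another CPU's cache in a real kernel would presumably need some form
of synchronization, which is not modeled here.

```c
#include <assert.h>

#define NR_CPUS     2  /* hypothetical CPU count for the simulation */
#define EXTRA_PAGES 3  /* pages borrowed on kernel entry (16K stack, 4K pages) */

struct cpu_cache {
	int free_pages;
};

struct thread_stack {
	int origin_cpu; /* CPU whose cache supplied the pages, -1 if none */
};

/* Borrow the extra pages from the cache of the CPU we entered on. */
static int stack_expand_on(struct thread_stack *ts,
			   struct cpu_cache caches[], int cpu)
{
	if (ts->origin_cpu != -1 || caches[cpu].free_pages < EXTRA_PAGES)
		return -1;
	caches[cpu].free_pages -= EXTRA_PAGES;
	ts->origin_cpu = cpu; /* remember where the pages came from */
	return 0;
}

/* Return the pages to the cache they came from, not the current CPU's. */
static void stack_shrink_on(struct thread_stack *ts,
			    struct cpu_cache caches[], int cpu)
{
	(void)cpu; /* the CPU we exit on is deliberately ignored */
	if (ts->origin_cpu < 0)
		return;
	caches[ts->origin_cpu].free_pages += EXTRA_PAGES;
	ts->origin_cpu = -1;
}
```

With this, a thread that enters on CPU 0, sleeps, and exits on CPU 1
still leaves both caches at their starting size.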

> I also suspect a stack_probe() is likely to end up being a cache miss
> and also slow???

Can you please elaborate on this point? I am not aware of
stack_probe() or how it is used.

> So you wouldn't want one on all calls.
> I'm not sure you'd want a conditional branch either.
>
> The explicit request for 'more stack' can be required to be allowed
> to sleep - removing a lot of issues.
> It would also be portable to all architectures.
> I'd also suspect that any thread that needs extra stack is likely
> to need to again.
> So while the memory could be recovered, I'd bet it isn't worth
> doing except under memory pressure.
> The call could also return 'no' - perhaps useful for (broken) code
> that insists on being recursive.

The approach currently under discussion is somewhat different from an
explicit "more stack" request API. I am investigating how feasible it
is to use kernel stack multiplexing, so that the same pages can be
reused by many threads, each holding them only while it actually uses
them. If the multiplexing approach does not work, I will come back to
the explicit "more stack" API.
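
As a back-of-the-envelope illustration of what multiplexing could
save, here is a small calculation. The model is an assumption, not a
measurement: every thread keeps one 4K page mapped, only threads
currently inside the kernel (including ones sleeping there) hold the
three extra pages, and each CPU keeps a small cache. The function
names and the example numbers are hypothetical.

```c
#include <assert.h>

#define PAGE_BYTES  4096UL
#define STACK_PAGES 4UL                 /* 16K THREAD_SIZE */
#define EXTRA_PAGES (STACK_PAGES - 1UL)

/* Today: every thread has a fully populated stack. */
static unsigned long static_stack_bytes(unsigned long nr_threads)
{
	return nr_threads * STACK_PAGES * PAGE_BYTES;
}

/*
 * Assumed multiplexed model: one resident page per thread, extra pages
 * only for threads currently in the kernel, plus the per-CPU caches.
 */
static unsigned long multiplexed_stack_bytes(unsigned long nr_threads,
					     unsigned long nr_in_kernel,
					     unsigned long nr_cpus,
					     unsigned long cache_pages_per_cpu)
{
	unsigned long top   = nr_threads * PAGE_BYTES;
	unsigned long extra = nr_in_kernel * EXTRA_PAGES * PAGE_BYTES;
	unsigned long cache = nr_cpus * cache_pages_per_cpu * PAGE_BYTES;

	return top + extra + cache;
}
```

For example, with 10000 user threads, 100 of them in the kernel, 8
CPUs and 6 cached pages per CPU, the static layout needs 163840000
bytes while this model needs 42385408 bytes. The saving shrinks as
more threads sit in the kernel at once.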


