From: David Laight <David.Laight@ACULAB.COM>
To: 'Pasha Tatashin' <pasha.tatashin@soleen.com>,
"H. Peter Anvin" <hpa@zytor.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"x86@kernel.org" <x86@kernel.org>, "bp@alien8.de" <bp@alien8.de>,
"brauner@kernel.org" <brauner@kernel.org>,
"bristot@redhat.com" <bristot@redhat.com>,
"bsegall@google.com" <bsegall@google.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"dianders@chromium.org" <dianders@chromium.org>,
"dietmar.eggemann@arm.com" <dietmar.eggemann@arm.com>,
"eric.devolder@oracle.com" <eric.devolder@oracle.com>,
"hca@linux.ibm.com" <hca@linux.ibm.com>,
"hch@infradead.org" <hch@infradead.org>,
"jacob.jun.pan@linux.intel.com" <jacob.jun.pan@linux.intel.com>,
"jgg@ziepe.ca" <jgg@ziepe.ca>,
"jpoimboe@kernel.org" <jpoimboe@kernel.org>,
"jroedel@suse.de" <jroedel@suse.de>,
"juri.lelli@redhat.com" <juri.lelli@redhat.com>,
"kent.overstreet@linux.dev" <kent.overstreet@linux.dev>,
"kinseyho@google.com" <kinseyho@google.com>,
"kirill.shutemov@linux.intel.com"
<kirill.shutemov@linux.intel.com>,
"lstoakes@gmail.com" <lstoakes@gmail.com>,
"luto@kernel.org" <luto@kernel.org>,
"mgorman@suse.de" <mgorman@suse.de>,
"mic@digikod.net" <mic@digikod.net>,
"michael.christie@oracle.com" <michael.christie@oracle.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"mjguzik@gmail.com" <mjguzik@gmail.com>,
"mst@redhat.com" <mst@redhat.com>,
"npiggin@gmail.com" <npiggin@gmail.com>,
"peterz@infradead.org" <peterz@infradead.org>,
"pmladek@suse.com" <pmladek@suse.com>,
"rick.p.edgecombe@intel.com" <rick.p.edgecombe@intel.com>,
"rostedt@goodmis.org" <rostedt@goodmis.org>,
"surenb@google.com" <surenb@google.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"urezki@gmail.com" <urezki@gmail.com>,
"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
"vschneid@redhat.com" <vschneid@redhat.com>
Subject: RE: [RFC 00/14] Dynamic Kernel Stacks
Date: Tue, 12 Mar 2024 22:18:39 +0000
Message-ID: <e0e7e253412240b3b427624a984642e6@AcuMS.aculab.com>
In-Reply-To: <CA+CK2bC+bgOfohCEEW7nwAdakVmzg=RhUjjw=+Rw3wFALnOq-Q@mail.gmail.com>
...
> I re-read my cover letter, and I do not see where "kernel memory" is
> mentioned. We are talking about kernel stacks overhead that is
> proportional to the user workload, as every active thread has an
> associated kernel stack. The idea is to save memory by not
> pre-allocating all pages of kernel-stacks, but instead use it as a
> safeguard when a stack actually becomes deep. Come-up with a solution
> that can handle rare deeper stacks only when needed. This could be
> done through faulting on the supported hardware (as proposed in this
> series), or via pre-map on every schedule event, and checking the
> access when thread goes off cpu (as proposed by Andy Lutomirski to
> avoid double faults on x86) .
>
> In other words, this feature is only about one very specific type of
> kernel memory that is not even directly mapped (the feature required
> vmapped stacks).
Just out of interest, how big does the register save area get?
In the 'good old days' it could be allocated from the low end of the
stack memory. But AVX512 starts making it large - never mind some
other things that (IIRC) might get to 8k.

Even the task area is probably non-trivial since far fewer things
can be shared than one might hope.

I'm sure I remember someone contemplating not allocating stacks to
each thread. I think that requires waking up with a system call
restart for some system calls - plausibly possible for futex() and poll().

Another option is to do a proper static analysis of stack usage
and fix the paths that have deep stacks and remove all recursion.
I'm pretty sure objtool knows the stack offsets of every call instruction.
The indirect call hashes (FineIBT?) should allow indirect calls
to be handled as well as direct calls.
Processing the 'A calls B at offset n' to generate a max depth
is just a SMOP.

At the moment I think all 'void (*)(void *)' functions have the same hash?
So the compiler would need a function attribute to seed the hash.

With that you might be able to remove all the code paths that actually
use a lot of stack - instead of just guessing and limiting individual
stack frames.
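
For what it's worth, the 'max depth' pass mentioned above can be sketched
in a few lines. This is only an illustration under assumptions: the
function names, frame sizes and call-site offsets are invented, nothing
here is real objtool output, and indirect calls are assumed to have
already been expanded into one edge per possible target (e.g. every
function with a matching hash).

```python
from typing import Dict, List, Tuple

# For each function: list of (callee, bytes of stack in use at the call site).
# Hypothetical data; a real tool would extract this from the object files.
CallGraph = Dict[str, List[Tuple[str, int]]]


def max_stack_depth(graph: CallGraph, frame_size: Dict[str, int], root: str) -> int:
    """Worst-case stack use in bytes starting at root; raises on recursion."""
    memo: Dict[str, int] = {}
    on_path = set()  # functions on the current call chain, for cycle detection

    def visit(fn: str) -> int:
        if fn in memo:
            return memo[fn]
        if fn in on_path:
            raise ValueError(f"recursion through {fn}")
        on_path.add(fn)
        # A function that calls nothing still uses its own frame.
        worst = frame_size.get(fn, 0)
        for callee, offset in graph.get(fn, []):
            # 'fn calls callee at offset n': n bytes plus whatever callee needs.
            worst = max(worst, offset + visit(callee))
        on_path.discard(fn)
        memo[fn] = worst
        return worst

    return visit(root)


# Made-up example: the deep path is printk() in an error path.
graph: CallGraph = {
    "sys_foo": [("do_foo", 64)],
    "do_foo": [("helper", 128), ("printk", 96)],
    "helper": [],
    "printk": [("vsnprintf", 256)],
    "vsnprintf": [],
}
frame = {"sys_foo": 64, "do_foo": 160, "helper": 32, "printk": 256, "vsnprintf": 192}

print(max_stack_depth(graph, frame, "sys_foo"))
```

On the made-up data this reports 608 bytes, with the deepest chain going
through the printk() call. A cycle raises an error instead of returning
a number, which is the point where the recursion would have to be
removed by hand.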
My 'gut feel' from calculating the stack use that way for an embedded
system back in the early 1980s is that the max use will be inside
printk() inside an obscure error path and if you actually hit it
things will explode.
(We didn't have enough memory to allocate big enough stacks!)

David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)