* [LSF/MM/BPF TOPIC] Dynamic Growth of Kernel Stacks
From: Pasha Tatashin @ 2024-02-23 1:03 UTC
To: lsf-pc; +Cc: linux-mm
For a long time, an 8K kernel stack was large enough. However, since
2014, the default stack size has increased to 16K [1]. To conserve
memory at Google, we maintained 8K stacks via a custom patch while
verifying that our workload could fit within this limit.
As we qualify new workloads and kernels, we find it more difficult to
keep the stacks at 8K. Therefore, we will increase the stack size to
the mainline value of 16K. However, this translates to a significant
increase in memory usage, potentially measured in petabytes.
With virtually mapped stacks [2], it's possible to implement
auto-growth on faults. Ideally, the vast majority of kernel threads
could fit into 4K or 8K stacks, with only a small number requiring
deeper stacks that would expand as needed.
The complication is that new pages must always be available from
within an interrupt context. To ensure this, pages must be accessible
to kernel threads in an atomic and lockless manner. This could be
achieved by using a per-CPU supply of pages dedicated to handling
kernel-stack faults.
[1] https://lwn.net/Articles/600644
[2] https://lwn.net/Articles/692608
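As a concrete illustration of the proposal, the fault handler could look
roughly like the sketch below. This is purely hypothetical and not from
any posted patch; stack_fault() and stack_map_reserved_page() are
made-up names for illustration.

#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/sched/task_stack.h>

/* Hypothetical helper: installs a pre-reserved page at @addr. */
static bool stack_map_reserved_page(unsigned long addr);

/*
 * Handle a fault on a not-yet-populated stack page. This may run in
 * interrupt context, so it must not allocate memory or take locks.
 */
static bool stack_fault(unsigned long addr)
{
	unsigned long stack = (unsigned long)task_stack_page(current);

	/* Only faults inside the current task's stack area qualify. */
	if (addr < stack || addr >= stack + THREAD_SIZE)
		return false;

	/* Map one of the pre-allocated per-CPU reserve pages. */
	return stack_map_reserved_page(ALIGN_DOWN(addr, PAGE_SIZE));
}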
* Re: [LSF/MM/BPF TOPIC] Dynamic Growth of Kernel Stacks
From: Peter Collingbourne @ 2024-02-23 21:49 UTC
To: Pasha Tatashin, Alexandru Elisei, David Hildenbrand; +Cc: lsf-pc, linux-mm
On Thu, Feb 22, 2024, 17:04 Pasha Tatashin <pasha.tatashin@soleen.com> wrote:
>
> For a long time, an 8K kernel stack was large enough. However, since
> 2014, the default stack size has increased to 16K [1]. To conserve
> memory at Google, we maintained 8K stacks via a custom patch while
> verifying that our workload could fit within this limit.
>
> As we qualify new workloads and kernels, we find it more difficult to
> keep the stacks at 8K. Therefore, we will increase the stack size to
> the mainline value of 16K. However, this translates to a significant
> increase in memory usage, potentially measured in petabytes.
>
> With virtually mapped stacks [2], it's possible to implement
> auto-growth on faults. Ideally, the vast majority of kernel threads
> could fit into 4K or 8K stacks, with only a small number requiring
> deeper stacks that would expand as needed.
>
> The complication is that new pages must always be available from
> within an interrupt context. To ensure this, pages must be accessible
> to kernel threads in an atomic and lockless manner. This could be
> achieved by using a per-CPU supply of pages dedicated to handling
> kernel-stack faults.
>
> [1] https://lwn.net/Articles/600644
> [2] https://lwn.net/Articles/692608
Hi Pasha,
I wonder if this is another potential use case for bringing back
cleancache, as proposed in [1]? The idea would be that all kernel
stacks have 16KB allocations but only one page accessible and the rest
available as cleancache. We can handle a fault on one of those pages
by discarding the cleancache page and remapping it as r/w.
Peter
[1] https://lore.kernel.org/all/ZdSMbjGf2Fj98diT@raptor/
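To make the idea concrete, the fault path might look like the rough
sketch below. Cleancache is not in mainline today and the revived API
proposed in [1] is not final, so cleancache_discard_page() and
stack_remap_page_rw() are made-up names used only for illustration.

/*
 * Fault on a stack page that was donated to cleancache: drop the
 * clean data it holds (the stack below the fault point contains
 * nothing live) and hand the page back to the stack as r/w.
 */
static bool stack_fault_cleancache(unsigned long addr, struct page *page)
{
	cleancache_discard_page(page);		/* hypothetical API */
	return stack_remap_page_rw(addr, page);	/* hypothetical helper */
}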
* Re: [LSF/MM/BPF TOPIC] Dynamic Growth of Kernel Stacks
From: Matthew Wilcox @ 2024-02-23 21:56 UTC
To: Peter Collingbourne
Cc: Pasha Tatashin, Alexandru Elisei, David Hildenbrand, lsf-pc, linux-mm
On Fri, Feb 23, 2024 at 01:49:08PM -0800, Peter Collingbourne wrote:
> I wonder if this is another potential use case for bringing back
> cleancache, as proposed in [1]? The idea would be that all kernel
> stacks have 16KB allocations but only one page accessible and the rest
> available as cleancache. We can handle a fault on one of those pages
> by discarding the cleancache page and remapping it as r/w.
That seems like the most complicated way to solve the problem.
Stack pages which have not been accessed contain no data, so do not
need swap or cleancache; they just need to be allocatable. And the
fault handler needs to be able to handle faults from interrupt context,
which I'm not sure all architectures can do.
The "We need to be able to allocate memory from interrupt context" is
really not that hard to handle; just keep three pages in a per-cpu array.
Refill the array at return-to-process-context.
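A minimal sketch of that scheme, with illustrative names (the per-CPU
array pattern follows the existing cached_stacks array in kernel/fork.c):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/percpu.h>

#define STACK_RESERVE_PAGES 3

static DEFINE_PER_CPU(struct page *, stack_reserve[STACK_RESERVE_PAGES]);

/*
 * Fault path: runs with interrupts disabled, so plain per-CPU
 * accesses are safe; no locks, no allocation.
 */
static struct page *stack_reserve_take(void)
{
	int i;

	for (i = 0; i < STACK_RESERVE_PAGES; i++) {
		struct page *page = this_cpu_read(stack_reserve[i]);

		if (page) {
			this_cpu_write(stack_reserve[i], NULL);
			return page;
		}
	}
	return NULL;	/* reserve exhausted */
}

/* Called on return to process context, where sleeping allocations are fine. */
static void stack_reserve_refill(void)
{
	int i;

	for (i = 0; i < STACK_RESERVE_PAGES; i++) {
		struct page *page;

		if (this_cpu_read(stack_reserve[i]))
			continue;
		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
		if (!page)
			break;
		/* We may have migrated CPUs; only fill a still-empty slot. */
		if (this_cpu_cmpxchg(stack_reserve[i], NULL, page))
			__free_page(page);
	}
}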
* Re: [LSF/MM/BPF TOPIC] Dynamic Growth of Kernel Stacks
From: Peter Collingbourne @ 2024-02-23 22:01 UTC
To: Matthew Wilcox
Cc: Pasha Tatashin, Alexandru Elisei, David Hildenbrand, lsf-pc, linux-mm
On Fri, Feb 23, 2024 at 1:56 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Feb 23, 2024 at 01:49:08PM -0800, Peter Collingbourne wrote:
> > I wonder if this is another potential use case for bringing back
> > cleancache, as proposed in [1]? The idea would be that all kernel
> > stacks have 16KB allocations but only one page accessible and the rest
> > available as cleancache. We can handle a fault on one of those pages
> > by discarding the cleancache page and remapping it as r/w.
>
> That seems like the most complicated way to solve the problem.
> Stack pages which have not been accessed contain no data, so do not
> need swap or cleancache; they just need to be allocatable. And the
> fault handler needs to be able to handle faults from interrupt context,
> which I'm not sure all architectures can do.
>
> The "We need to be able to allocate memory from interrupt context" is
> really not that hard to handle; just keep three pages in a per-cpu array.
> Refill the array at return-to-process-context.
Okay, if it doesn't really need that many pages allocated upfront,
that seems like the best approach to me as well.
Peter
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Dynamic Growth of Kernel Stacks
From: Michal Hocko @ 2024-04-30 13:07 UTC
To: Pasha Tatashin; +Cc: lsf-pc, linux-mm
Hi Pasha,
is this something you still consider interesting (and also productive
in the absence of x86 maintainers) to discuss at LSFMM?
On Thu 22-02-24 20:03:37, Pavel Tatashin wrote:
> For a long time, an 8K kernel stack was large enough. However, since
> 2014, the default stack size has increased to 16K [1]. To conserve
> memory at Google, we maintained 8K stacks via a custom patch while
> verifying that our workload could fit within this limit.
>
> As we qualify new workloads and kernels, we find it more difficult to
> keep the stacks at 8K. Therefore, we will increase the stack size to
> the mainline value of 16K. However, this translates to a significant
> increase in memory usage, potentially measured in petabytes.
>
> With virtually mapped stacks [2], it's possible to implement
> auto-growth on faults. Ideally, the vast majority of kernel threads
> could fit into 4K or 8K stacks, with only a small number requiring
> deeper stacks that would expand as needed.
>
> The complication is that new pages must always be available from
> within an interrupt context. To ensure this, pages must be accessible
> to kernel threads in an atomic and lockless manner. This could be
> achieved by using a per-CPU supply of pages dedicated to handling
> kernel-stack faults.
>
> [1] https://lwn.net/Articles/600644
> [2] https://lwn.net/Articles/692608
--
Michal Hocko
SUSE Labs
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Dynamic Growth of Kernel Stacks
From: Pasha Tatashin @ 2024-05-01 0:20 UTC
To: Michal Hocko; +Cc: lsf-pc, linux-mm
Hi Michal,
On Tue, Apr 30, 2024 at 9:07 AM Michal Hocko <mhocko@suse.com> wrote:
>
> Hi Pasha,
> is this something you still consider interesting (and also productive
> in the absence of x86 maintainers) to discuss at LSFMM?
Yes, I am going to go over a few alternative solutions that were
discussed in the RFCv1 thread, as well as a new proposal that I've
been working on.
X86 maintainers are not required, as the framework proposals are going
to be arch-independent.
Pasha
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Dynamic Growth of Kernel Stacks
From: Michal Hocko @ 2024-05-02 7:19 UTC
To: Pasha Tatashin; +Cc: lsf-pc, linux-mm
On Tue 30-04-24 20:20:20, Pavel Tatashin wrote:
> Hi Michal,
>
> On Tue, Apr 30, 2024 at 9:07 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > Hi Pasha,
> > is this something you still consider interesting (and also productive
> > in the absence of x86 maintainers) to discuss at LSFMM?
>
> Yes, I am going to go over a few alternative solutions that were
> discussed in the RFCv1 thread, as well as a new proposal that I've
> been working on.
>
> X86 maintainers are not required, as the framework proposals are going
> to be arch-independent.
OK, thanks!
--
Michal Hocko
SUSE Labs