linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org,  Andrew Morton <akpm@linux-foundation.org>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	 Borislav Petkov <bp@alien8.de>,
	Christian Brauner <brauner@kernel.org>,
	bristot@redhat.com,  Ben Segall <bsegall@google.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	dianders@chromium.org,  dietmar.eggemann@arm.com,
	eric.devolder@oracle.com, hca@linux.ibm.com,
	 "hch@infradead.org" <hch@infradead.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	 Jacob Pan <jacob.jun.pan@linux.intel.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	jpoimboe@kernel.org,  Joerg Roedel <jroedel@suse.de>,
	juri.lelli@redhat.com,
	 Kent Overstreet <kent.overstreet@linux.dev>,
	kinseyho@google.com,
	 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	lstoakes@gmail.com, mgorman@suse.de,  mic@digikod.net,
	michael.christie@oracle.com, Ingo Molnar <mingo@redhat.com>,
	 mjguzik@gmail.com, "Michael S. Tsirkin" <mst@redhat.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	 "Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Petr Mladek <pmladek@suse.com>,
	 Rick P Edgecombe <rick.p.edgecombe@intel.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Uladzislau Rezki <urezki@gmail.com>,
	vincent.guittot@linaro.org, vschneid@redhat.com
Subject: Re: [RFC 11/14] x86: add support for Dynamic Kernel Stacks
Date: Mon, 11 Mar 2024 20:08:16 -0400	[thread overview]
Message-ID: <CA+CK2bBU2zwu_V6hpOonswyuft5gWQh1H9tBbYP8efLRRAAdQQ@mail.gmail.com> (raw)
In-Reply-To: <1ac305b1-d28f-44f6-88e5-c85d9062f9e8@app.fastmail.com>

> >> There are some other options: you could pre-map
> >
> > Pre-mapping would be expensive. It would mean pre-mapping the dynamic
> > pages for every scheduled thread, and we'd still need to check the
> > access bit every time a thread leaves the CPU.
>
> That's a write to four consecutive words in memory, with no locking required.

You convinced me, this might not be that bad. At the thread creation
time we will save the locations of the unmapped thread PTE's, and set
them on every schedule. There is a slight increase in scheduling cost,
but perhaps it is not as bad as I initially thought. This approach,
however, makes this dynamic stac feature much safer, and can be easily
extended to all arches that support access/dirty bit tracking.

>
> > Dynamic thread faults
> > should be considered rare events and thus shouldn't significantly
> > affect the performance of normal context switch operations. With 8K
> > stacks, we might encounter only 0.00001% of stacks requiring an extra
> > page, and even fewer needing 16K.
>
> Well yes, but if you crash 0.0001% of the time due to the microcode not liking you, you lose. :)
>
> >
> >> Also, I think the whole memory allocation concept in this whole series is a bit odd.  Fundamentally, we *can't* block on these stack faults -- we may be in a context where blocking will deadlock.  We may be in the page allocator.  Panicing due to kernel stack allocation  would be very unpleasant.
> >
> > We never block during handling stack faults. There's a per-CPU page
> > pool, guaranteeing availability for the faulting thread. The thread
> > simply takes pages from this per-CPU data structure and refills the
> > pool when leaving the CPU. The faulting routine is efficient,
> > requiring a fixed number of loads without any locks, stalling, or even
> > cmpxchg operations.
>
> You can't block when scheduling, either.  What if you can't refill the pool?

Why can't we (I am not a scheduler guy)? IRQ's are not yet disabled,
what prevents us from blocking while the old process has not yet been
removed from the CPU?


  reply	other threads:[~2024-03-12  0:08 UTC|newest]

Thread overview: 98+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-11 16:46 [RFC 00/14] " Pasha Tatashin
2024-03-11 16:46 ` [RFC 01/14] task_stack.h: remove obsolete __HAVE_ARCH_KSTACK_END check Pasha Tatashin
2024-03-17 14:36   ` Christophe JAILLET
2024-03-17 15:13     ` Pasha Tatashin
2024-03-11 16:46 ` [RFC 02/14] fork: Clean-up ifdef logic around stack allocation Pasha Tatashin
2024-03-11 16:46 ` [RFC 03/14] fork: Clean-up naming of vm_strack/vm_struct variables in vmap stacks code Pasha Tatashin
2024-03-17 14:42   ` Christophe JAILLET
2024-03-19 16:32     ` Pasha Tatashin
2024-03-11 16:46 ` [RFC 04/14] fork: Remove assumption that vm_area->nr_pages equals to THREAD_SIZE Pasha Tatashin
2024-03-17 14:45   ` Christophe JAILLET
2024-03-17 15:14     ` Pasha Tatashin
2024-03-11 16:46 ` [RFC 05/14] fork: check charging success before zeroing stack Pasha Tatashin
2024-03-12 15:57   ` Kirill A. Shutemov
2024-03-12 16:52     ` Pasha Tatashin
2024-03-11 16:46 ` [RFC 06/14] fork: zero vmap stack using clear_page() instead of memset() Pasha Tatashin
2024-03-12  7:15   ` Nikolay Borisov
2024-03-12 16:53     ` Pasha Tatashin
2024-03-14  7:55       ` Christophe Leroy
2024-03-14 13:52         ` Pasha Tatashin
2024-03-17 14:48   ` Christophe JAILLET
2024-03-17 15:15     ` Pasha Tatashin
2024-03-11 16:46 ` [RFC 07/14] fork: use the first page in stack to store vm_stack in cached_stacks Pasha Tatashin
2024-03-11 16:46 ` [RFC 08/14] fork: separate vmap stack alloction and free calls Pasha Tatashin
2024-03-14 15:18   ` Jeff Xie
2024-03-14 17:14     ` Pasha Tatashin
2024-03-17 14:51   ` Christophe JAILLET
2024-03-17 15:15     ` Pasha Tatashin
2024-03-11 16:46 ` [RFC 09/14] mm/vmalloc: Add a get_vm_area_node() and vmap_pages_range_noflush() public functions Pasha Tatashin
2024-03-11 16:46 ` [RFC 10/14] fork: Dynamic Kernel Stacks Pasha Tatashin
2024-03-11 19:32   ` Randy Dunlap
2024-03-11 19:55     ` Pasha Tatashin
2024-03-11 16:46 ` [RFC 11/14] x86: add support for " Pasha Tatashin
2024-03-11 22:17   ` Andy Lutomirski
2024-03-11 23:10     ` Pasha Tatashin
2024-03-11 23:33       ` Thomas Gleixner
2024-03-11 23:34       ` Andy Lutomirski
2024-03-12  0:08         ` Pasha Tatashin [this message]
2024-03-12  0:23           ` Pasha Tatashin
2024-03-11 23:34     ` Dave Hansen
2024-03-11 23:41       ` Andy Lutomirski
2024-03-11 23:56         ` Nadav Amit
2024-03-12  0:02           ` Andy Lutomirski
2024-03-12  7:20             ` Nadav Amit
2024-03-12  0:53           ` Dave Hansen
2024-03-12  1:25             ` H. Peter Anvin
2024-03-12  2:16               ` Andy Lutomirski
2024-03-12  2:20                 ` H. Peter Anvin
2024-03-12 21:58   ` Andi Kleen
2024-03-13 10:23   ` Thomas Gleixner
2024-03-13 13:43     ` Pasha Tatashin
2024-03-13 15:28       ` Pasha Tatashin
2024-03-13 16:12         ` Thomas Gleixner
2024-03-14 14:03           ` Pasha Tatashin
2024-03-14 18:26             ` Thomas Gleixner
2024-03-11 16:46 ` [RFC 12/14] task_stack.h: Clean-up stack_not_used() implementation Pasha Tatashin
2024-03-11 16:46 ` [RFC 13/14] task_stack.h: Add stack_not_used() support for dynamic stack Pasha Tatashin
2024-03-11 16:46 ` [RFC 14/14] fork: Dynamic Kernel Stack accounting Pasha Tatashin
2024-03-11 17:09 ` [RFC 00/14] Dynamic Kernel Stacks Mateusz Guzik
2024-03-11 18:58   ` Pasha Tatashin
2024-03-11 19:21     ` Mateusz Guzik
2024-03-11 19:55       ` Pasha Tatashin
2024-03-12 17:18 ` H. Peter Anvin
2024-03-12 19:45   ` Pasha Tatashin
2024-03-12 21:36     ` H. Peter Anvin
2024-03-14 19:05       ` Kent Overstreet
2024-03-14 19:23         ` Pasha Tatashin
2024-03-14 19:28           ` Kent Overstreet
2024-03-14 19:34             ` Pasha Tatashin
2024-03-14 19:49               ` Kent Overstreet
2024-03-12 22:18     ` David Laight
2024-03-14 19:43   ` Matthew Wilcox
2024-03-14 19:53     ` Kent Overstreet
2024-03-14 19:57       ` Matthew Wilcox
2024-03-14 19:58         ` Kent Overstreet
2024-03-15  3:13         ` Pasha Tatashin
2024-03-15  3:39           ` H. Peter Anvin
2024-03-16 19:17             ` Pasha Tatashin
2024-03-17  0:41               ` Matthew Wilcox
2024-03-17  1:32                 ` Kent Overstreet
2024-03-17 14:19                 ` Pasha Tatashin
2024-03-17 14:43               ` Brian Gerst
2024-03-17 16:15                 ` Pasha Tatashin
2024-03-17 21:30                   ` Brian Gerst
2024-03-18 14:59                     ` Pasha Tatashin
2024-03-18 21:02                       ` Brian Gerst
2024-03-19 14:56                         ` Pasha Tatashin
2024-03-17 18:57               ` David Laight
2024-03-18 15:09                 ` Pasha Tatashin
2024-03-18 15:13                   ` Pasha Tatashin
2024-03-18 15:19                   ` Matthew Wilcox
2024-03-18 15:30                     ` Pasha Tatashin
2024-03-18 15:53                       ` David Laight
2024-03-18 16:57                         ` Pasha Tatashin
2024-03-18 15:38               ` David Laight
2024-03-18 17:00                 ` Pasha Tatashin
2024-03-18 17:37                   ` Pasha Tatashin
2024-03-15  4:17           ` H. Peter Anvin
2024-03-17  0:47     ` H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+CK2bBU2zwu_V6hpOonswyuft5gWQh1H9tBbYP8efLRRAAdQQ@mail.gmail.com \
    --to=pasha.tatashin@soleen.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dianders@chromium.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=eric.devolder@oracle.com \
    --cc=hca@linux.ibm.com \
    --cc=hch@infradead.org \
    --cc=hpa@zytor.com \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=jgg@ziepe.ca \
    --cc=jpoimboe@kernel.org \
    --cc=jroedel@suse.de \
    --cc=juri.lelli@redhat.com \
    --cc=kent.overstreet@linux.dev \
    --cc=kinseyho@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=luto@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mic@digikod.net \
    --cc=michael.christie@oracle.com \
    --cc=mingo@redhat.com \
    --cc=mjguzik@gmail.com \
    --cc=mst@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=surenb@google.com \
    --cc=tglx@linutronix.de \
    --cc=urezki@gmail.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox