Re: [RFC UKL 04/10] x86/entry: Create alternate entry path for system calls

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Andy Lutomirski" <luto@kernel.org>
To: "Ali Raza" <aliraza@bu.edu>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>
Cc: "Jonathan Corbet" <corbet@lwn.net>,
	masahiroy@kernel.org, michal.lkml@markovi.net,
	"Nick Desaulniers" <ndesaulniers@google.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"Arnd Bergmann" <arnd@arndb.de>,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, "Steven Rostedt" <rostedt@goodmis.org>,
	"Ben Segall" <bsegall@google.com>,
	mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	jpoimboe@kernel.org, linux-doc@vger.kernel.org,
	linux-kbuild@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org,
	"the arch/x86 maintainers" <x86@kernel.org>,
	rjones@redhat.com, munsoner@bu.edu, tommyu@bu.edu,
	drepper@redhat.com, lwoodman@redhat.com, mboydmcse@gmail.com,
	okrieg@bu.edu, rmancuso@bu.edu,
	"Daniel Bristot de Oliveira" <bristot@kernel.org>
Subject: Re: [RFC UKL 04/10] x86/entry: Create alternate entry path for system calls
Date: Tue, 04 Oct 2022 10:43:47 -0700	[thread overview]
Message-ID: <b8e86bd9-f8a1-4f37-8f1a-ae0b6209d922@app.fastmail.com> (raw)
In-Reply-To: <20221003222133.20948-5-aliraza@bu.edu>



On Mon, Oct 3, 2022, at 3:21 PM, Ali Raza wrote:
> If a UKL application makes a system call, it won't go through with the
> syscall assembly instruction. Instead, the application will use the call
> instruction to go to the kernel entry point. Instead of adding checks to
> the normal entry_SYSCALL_64 to see if we came here from a UKL task or a
> normal application task, we create a totally new entry point called
> ukl_entry_SYSCALL_64. This allows the normal entry point to be unchanged
> and simplifies the UKL specific code as well.
>
> ukl_entry_SYSCALL_64 is similar to entry_SYSCALL_64 except that it has to
> populate %rcx with return address manually (syscall instruction does that
> automatically for normal application tasks). This allows the pt_regs to be
> correct. Also, we have to push the flags onto the user stack, because on
> the return path, we first switch to user stack, then pop the flags and then
> return. Popping the flags would restart interrupts, so we dont want to be
> stuck on kernel stack when an interrupt hits. All this can be done with an
> iret instruction, but call/iret pair performans way slower than a call/ret
> pair.
>
> Also, on the entry path, we make sure the context flag i.e., in_user is set
> to 1 to indicate we are now in kernel context so any new interrupts dont
> have to go through kernel entry code again. This is normally done with the
> CS value on stack, but in UKL case that will always be a kernel value. On
> the way back, the in_user is switched back to 2 to indicate that now
> application context is being entered. All non-UKL tasks have the in_user
> value set to 0.


>
> The UKL application uses a slightly different value for CS, instead of
> 0x33, we use 0xC3. As most of the tests compare only the least significant
> nibble, they behave as expected. The C value in the second nibble allows us
> to distinguish between user space and UKL application code.

My intuition would be to try this the other way around.  Use an actual honest CS (specifically _KERNEL_CS) for pt_regs->cs.  Translate at the user ABI boundary instead.  After all, a UKL task is essentially just a kernel thread that happens to have a pt_regs area.


>
> Rest of the code makes sure the above mentioned in_user context tracking is
> done for all entry and exit cases i.e., for interrupts, exceptions etc.  If
> its a UKL task, if in_user value is 2, we treat it as an application task,
> and if it is 1, we treat it as coming from kernel context. We skip these
> checks if in_user is 0.

By "context tracking" are you referring to RCU?  Since a UKL task is essentially a kernel thread, what "entry" is there other than setting up pt_regs?

>
> swapgs_restore_regs_and_return_to_usermode changes also make sure that
> in_user is correct and then we iret back.
>
> Double fault handling is special case. Normally, if a user stack suffers a
> page fault, hardware switches to a kernel stack and pushes a frame onto the
> kernel stack. This switch only happens if the execution was in user
> privilege level when the page fault occurred. For UKL, execution is always
> in kernel level, so when the user stack suffers a page fault, no switch to
> a pinned kernel stack happens, and hardware tries to push state on the
> already faulting user stack. This generates a double fault. So we handle
> this case in the double fault handler by assuming any double fault is
> actually a user stack page fault. This can also be fixed by making all page
> faults go through a pinned stack using the IST mechanism. We have tried and
> tested that, but in the interest of touching as little code as possible, we
> chose this option instead.

Eww.  I guess this is a real problem, but eww.

next prev parent reply	other threads:[~2022-10-04 17:44 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-03 22:21 [RFC UKL 00/10] Unikernel Linux (UKL) Ali Raza
2022-10-03 22:21 ` [RFC UKL 01/10] kbuild: Add sections and symbols to linker script for UKL support Ali Raza
2022-10-03 22:21 ` [RFC UKL 02/10] x86/boot: Load the PT_TLS segment for Unikernel configs Ali Raza
2022-10-04 17:30   ` Andy Lutomirski
2022-10-06 21:00     ` Ali Raza
2022-10-03 22:21 ` [RFC UKL 03/10] sched: Add task_struct tracking of kernel or application execution Ali Raza
2022-10-03 22:21 ` [RFC UKL 04/10] x86/entry: Create alternate entry path for system calls Ali Raza
2022-10-04 17:43   ` Andy Lutomirski [this message]
2022-10-06 21:12     ` Ali Raza
2022-10-03 22:21 ` [RFC UKL 05/10] x86/uaccess: Make access_ok UKL aware Ali Raza
2022-10-04 17:36   ` Andy Lutomirski
2022-10-06 21:16     ` Ali Raza
2022-10-03 22:21 ` [RFC UKL 06/10] x86/fault: Skip checking kernel mode access to user address space for UKL Ali Raza
2022-10-03 22:21 ` [RFC UKL 07/10] x86/signal: Adjust signal handler register values and return frame Ali Raza
2022-10-04 17:34   ` Andy Lutomirski
2022-10-06 21:20     ` Ali Raza
2022-10-03 22:21 ` [RFC UKL 08/10] exec: Make exec path for starting UKL application Ali Raza
2022-10-03 22:21 ` [RFC UKL 09/10] exec: Give userspace a method for starting UKL process Ali Raza
2022-10-04 17:35   ` Andy Lutomirski
2022-10-06 21:25     ` Ali Raza
2022-10-03 22:21 ` [RFC UKL 10/10] Kconfig: Add config option for enabling and sample for testing UKL Ali Raza
2022-10-04  2:11   ` Bagas Sanjaya
2022-10-06 21:28     ` Ali Raza
2022-10-07 10:21       ` Masahiro Yamada
2022-10-13 17:08         ` Ali Raza
2022-10-06 21:27 ` [RFC UKL 00/10] Unikernel Linux (UKL) H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b8e86bd9-f8a1-4f37-8f1a-ae0b6209d922@app.fastmail.com \
    --to=luto@kernel.org \
    --cc=aliraza@bu.edu \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=bristot@kernel.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=drepper@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=hpa@zytor.com \
    --cc=jpoimboe@kernel.org \
    --cc=juri.lelli@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kbuild@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lwoodman@redhat.com \
    --cc=masahiroy@kernel.org \
    --cc=mboydmcse@gmail.com \
    --cc=mgorman@suse.de \
    --cc=michal.lkml@markovi.net \
    --cc=mingo@redhat.com \
    --cc=munsoner@bu.edu \
    --cc=ndesaulniers@google.com \
    --cc=okrieg@bu.edu \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rjones@redhat.com \
    --cc=rmancuso@bu.edu \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tommyu@bu.edu \
    --cc=vincent.guittot@linaro.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vschneid@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox