linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: Peter Xu <peterx@redhat.com>
Cc: Carsten Stollmaier <stollmc@amazon.com>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	nh-open-source@amazon.com,
	Sebastian Biemueller <sbiemue@amazon.de>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH] KVM: x86: Use gfn_to_pfn_cache for steal_time
Date: Sat, 03 Aug 2024 09:35:56 +0100	[thread overview]
Message-ID: <ae8412474f793ff59d5567bc721dc012d6ee0199.camel@infradead.org> (raw)
In-Reply-To: <Zq1gavwLBHeSr2ju@x1n>

[-- Attachment #1: Type: text/plain, Size: 3190 bytes --]

On Fri, 2024-08-02 at 18:40 -0400, Peter Xu wrote:
> On Fri, Aug 02, 2024 at 01:03:16PM +0100, David Woodhouse wrote:
> > An alternative workaround (which perhaps we should *also* consider)
> > looked like this (plus some suitable code comment, of course):
> > 
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -1304,6 +1304,8 @@ void do_user_addr_fault(struct pt_regs *regs,
> >          */
> >         if (user_mode(regs))
> >                 flags |= FAULT_FLAG_USER;
> > +       else
> > +               flags &= ~FAULT_FLAG_INTERRUPTIBLE;
> >  
...
> Instead of "interruptible exception" or the original patch (which might
> still be worthwhile, though?  I didn't follow much on kvm and the new gpc
> cache, but looks still nicer than get/put user from initial glance), above

Yes, I definitely think we want the GPC conversion anyway. That's why I
suggested it to Carsten, to resolve our *immediate* problem while we
continue to ponder the general case.

> looks like the easier and complete solution to me.  For "completeness", I
> mean I am not sure how many other copy_to/from_user() code in kvm can hit
> this, so looks like still possible to hit outside steal time page?

Right. It theoretically applies to *any* user access. It's just that
anything other than *guest* pages is slightly less likely to be backed
by userfaultfd.

> I thought only the slow fault path was involved in INTERRUPTIBLE thing and
> that was the plan, but I guess I overlooked how the default value could
> affect copy to/from user invoked from KVM as well..
> 
> With above patch to drop FAULT_FLAG_INTERRUPTIBLE for !user, KVM can still
> opt-in INTERRUPTIBLE anywhere by leveraging hva_to_pfn[_slow]() API, which
> is "INTERRUPTIBLE"-ready with a boolean the caller can set. But the caller
> will need to be able to process KVM_PFN_ERR_SIGPENDING.

Right. I think converting kvm_{read,write}_guest() and friends to do
that and be interruptible might make sense?

The patch snippet above obviously only fixes it for x86 and would need
to be done across the board. Unless we do this one instead, abusing the
knowledge that uffd is the only thing which honours
FAULT_FLAG_INTERRUPTIBLE?

--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -351,7 +351,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 
 static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
 {
-       if (flags & FAULT_FLAG_INTERRUPTIBLE)
+       if ((flags & FAULT_FLAG_INTERRUPTIBLE) && (flags & FAULT_FLAG_USER))
                return TASK_INTERRUPTIBLE;
 
        if (flags & FAULT_FLAG_KILLABLE)

I still quite like the idea of *optional* interruptible exceptions, as
seen in my proof of concept. Perhaps we wouldn't want the read(2) and
write(2) system calls to use them, but there are plenty of other system
calls which could be interruptible instead of blocking.

Right now, even the simple case of a trivial SIGINT handler which does
some minor cleanup before exiting, makes it a non-fatal signal so the
kernel blocks and waits for ever.


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

  reply	other threads:[~2024-08-03  8:36 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20240802114402.96669-1-stollmc@amazon.com>
2024-08-02 12:03 ` David Woodhouse
2024-08-02 12:38   ` Matthew Wilcox
2024-08-02 12:53     ` David Woodhouse
2024-08-02 12:56       ` David Woodhouse
2024-08-02 16:06   ` David Woodhouse
2024-08-02 22:40   ` Peter Xu
2024-08-03  8:35     ` David Woodhouse [this message]
2024-08-04 13:31       ` Peter Xu
2024-08-17  0:22   ` Sean Christopherson
2024-08-20 10:11     ` David Woodhouse
2025-07-29 10:28       ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ae8412474f793ff59d5567bc721dc012d6ee0199.camel@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=nh-open-source@amazon.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=sbiemue@amazon.de \
    --cc=seanjc@google.com \
    --cc=stollmc@amazon.com \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox