From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90815C3DA64 for ; Sun, 4 Aug 2024 13:31:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A918E6B007B; Sun, 4 Aug 2024 09:31:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A42506B0082; Sun, 4 Aug 2024 09:31:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 909146B0085; Sun, 4 Aug 2024 09:31:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 6E1A56B007B for ; Sun, 4 Aug 2024 09:31:28 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 09864A1908 for ; Sun, 4 Aug 2024 13:31:28 +0000 (UTC) X-FDA: 82414649856.12.E3D0C8C Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf10.hostedemail.com (Postfix) with ESMTP id AD553C002B for ; Sun, 4 Aug 2024 13:31:25 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="Nw/f+Ni9"; spf=pass (imf10.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1722778226; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Yzljz7zFdusmtedzaMMn1iGtnMQpOk40I0Wp+DiG0Yg=; b=lzfEOeIMWMUCGYFvDcfrAv92XfCWd3VtK93bWG64mvSI+7lbrexkEtVNgznfY3MDWMT9N5 FdZRcY2Fc0rQgjRkMGPlKejr/v5yuTa9eBZiELIRJiUIJoR/iPYrTOK85RCFpkEekwitBU hGlCS3SbBCEItkiWqoFxHxmv75IjBHQ= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1722778226; a=rsa-sha256; cv=none; b=JIpWNlD1eyfMERWsKZf/GbXsWrR07ayz4kWPNPtqGVKh4JHh9g3ePXwiiS4wSj/XUoqXt0 2BSEYu6qQiDJ0qsYpO0nSFhoJHiQVDx1zwvAI2sAN/b158zSMlnp7Bxgv6YA4mptzK6wXp 7h7Y0MpO+Lv6/TC34H2dp8/EZ0BzeSU= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="Nw/f+Ni9"; spf=pass (imf10.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722778284; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Yzljz7zFdusmtedzaMMn1iGtnMQpOk40I0Wp+DiG0Yg=; b=Nw/f+Ni9hW6qE1csswuzDOQAI6muGto88u/AnDWzbT24UKWCPoPY95ek7F3yj/Hknzfe1K NJB8byHFnep+j/xN0EawRiXMS2qrkvqnhFUV39dkkt6iCFlHOlyxS0+eIAQrYoQQc4O3Z5 E3tQv8iycFNtStgdBib/4WL2VhzRTN8= Received: from mail-ot1-f71.google.com (mail-ot1-f71.google.com [209.85.210.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-654-mkycbB7aMYeJLK37Z8-yDw-1; Sun, 04 Aug 2024 09:31:23 -0400 X-MC-Unique: mkycbB7aMYeJLK37Z8-yDw-1 Received: by mail-ot1-f71.google.com with SMTP id 46e09a7af769-7093752a9f5so2151038a34.1 for ; Sun, 04 Aug 2024 06:31:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722778283; x=1723383083; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Yzljz7zFdusmtedzaMMn1iGtnMQpOk40I0Wp+DiG0Yg=; b=GiFo91A6M77XTtd3Vcux1sw9HcEOy0e7z73HX3JeOJowk3jAH3gMxoONTvXoM5GjpU ruxtrxCoIXT5id7trwHnRWsFcxpZUbQDITTr6nnKIogYQnwdgbbZ2eyLmGnp4ENspLzA wP8vkM2KtTSjnVvzBJXsdrF+v6+1hf/BORN7RHgQ0cipq8dMT0XkF+9lvIzkK9pL9JJb b6A2ZmZw49kUh/cY2/hqceOoHgjjnXGHIFkCo8L/LpCNTXFiGF6gXpIQyQ2EDDyNQaaG 02TZdQ5uagPeN14Izuc/5fGj1Vh7pWF1/tgpjwE0vFNazLCEx64U/8FYHytNp8dVCgyC Ye1w== X-Forwarded-Encrypted: i=1; AJvYcCVMzZtCimz95trB38hw4DKdTn/ropbo8N/DDTOGd1Hw1YXPDUCXCdBhTsmYsmANak45iTvALwEIWg==@kvack.org X-Gm-Message-State: AOJu0YzPVpLfC88WKettLYpcQt2cswky4yGlPYQk5Tf8nLETl8696UjE ZxtTcSYWYkkGgMrtP+zvNzD6v1H6770HNR8HUVCwllw/8L0rkgkgkbtIsIj2IVVCkAR3Fj7t219 i6FE10G7/fQ7TuITS/6TR5BHjPXCxmmgE110iiJG4YWaz2ciW X-Received: by 2002:a05:6830:6582:b0:703:5c2d:6d30 with SMTP id 46e09a7af769-7099ea3ad03mr5182825a34.0.1722778282822; Sun, 04 Aug 2024 06:31:22 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEVAwWmeGFfzpNaruJ0rX+cQMbAUjtorFb8Y7NaDBarrl/5zMYsYzeHsuxBnCiAgdVn3xEpXw== X-Received: by 2002:a05:6830:6582:b0:703:5c2d:6d30 with SMTP id 46e09a7af769-7099ea3ad03mr5182813a34.0.1722778282444; Sun, 04 Aug 2024 06:31:22 -0700 (PDT) Received: from x1n (pool-99-254-121-117.cpe.net.cable.rogers.com. [99.254.121.117]) by smtp.gmail.com with ESMTPSA id af79cd13be357-7a34f6fbc40sm255563085a.58.2024.08.04.06.31.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 04 Aug 2024 06:31:21 -0700 (PDT) Date: Sun, 4 Aug 2024 09:31:19 -0400 From: Peter Xu To: David Woodhouse Cc: Carsten Stollmaier , Sean Christopherson , Paolo Bonzini , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , nh-open-source@amazon.com, Sebastian Biemueller , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Matthew Wilcox , Andrew Morton , "linux-mm@kvack.org" Subject: Re: [PATCH] KVM: x86: Use gfn_to_pfn_cache for steal_time Message-ID: References: <20240802114402.96669-1-stollmc@amazon.com> MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: AD553C002B X-Stat-Signature: 7z4wqg7ta93mw6jxqst168qk73ifoqc3 X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1722778285-778671 X-HE-Meta: U2FsdGVkX1/jYgBx6YGcmdTK3IXA8ir9b3wTpaGncjoB3fF/c4OSY6VB1rEvtkg0n9BP7z7dRv599YDy3xkl+LqEVXzWEpdmByMlj1x3Aqw+Hiea2PHjXSjWMcukaTLl94I/AEiq4yGAdqdg0/95HGDb3Uo6RNyuFGdLWo5Zr1y3dObeZBk+JcvmmX/VASJpzQbPzMS/HOKWj7xtf4mKGQXbMeC8m1mdqIKtlkbz2ZfyZwTKKHDyFzvtkmNPLmuYhXpjozF7/f/m63hCIdDUD86d7tHw7nh00vOifk3pS/McvO/KcB9Df4kSnJoWHowu0zlL+g8EK3vXoW9zU6zNtdeqqNbBbR7KOE/KlbKciaua90GL0DBsAAjPRfsgX6BWtDxBXFURGxQxr7adFK/o1noz6rANSNcpU0oRss01rU7CFYmPstzZYES86UCVHIohjGuwzNAqXcgdDFiDRBQfCtiexN/t5p7r8/uYw4BBgr7NyaPa/AfjSlsB48cozsPgASJpROiYZBt6W8Ur1yIT4iJj3v1uJ+kBzTzernjwom9MyWD9BUCc/6cRdRDvSwds9Z/BCTaFbiPCfFpQ5qVc+rcG+jkPbEU2Pzsc/U6LGR1dL1v1m1Pk/r5R9pjHpjNtgkU+GHY5z6D5NiMVkEf6WWOMEjHvlpWnGLtgFQzW1G/nWHLsCIXWRhu+tuyXhcL5F5vjJAy+vJK+JNdPmnf/iAJoAPxAdt/WYUC+hdLCGZf+gBpYBphMAkRCz9jTyTvCW590+aMRVENPWph8Educznek5fiEShpLwr/vFkyZxtcavSxwJ4h2sY9vK7i5hzoi8clCcupEKF8mgYliWRktzd/xam/cbsXb2PfGi4Mf+Zgcw8J1piExSJLlVu0tJRsR1IPPTDBP63+lqbsNGSph8xg6QTqe00A2i/tia9OyDC3AdNaa7c54BXfkI2Y0hVdMCHWKcL/X+GMIXg2h/r/ Lwf7KnYM lj1BfAjbqf1icBfv+WINKLJTj5YvzpbUjav6Kkuyo4bzgwCfxbxeDdOKHcW81XArTf4m94mD7qL3Mox50GWg3xtz3UapavPWjFKOwZ1OeQAQC6X81B0R/VckYVJKWi1UYQCd8Wz9F/q/TXz+JAq24CtYBRRBvug2poI74f1A5ID/VUxhAgEPaWmgqace9hxNxLEmNofwQS+lq6xWLIZuyvn6yfvRwlN95UpbmtO7O2c2RStJgZ+PHDfD6Anm5smpJyhQTdrEmjOejC/Ccts8rqZL+VvaEVLsloiSPqcU1hvWvNN8OVEOnOSDARniau3f0m16Ta1yRSXwdBiMzruLy36ReWRHgyGJhfdrtpicaw/Pgfh70Ox4bxfri4kcdTiYXvEsj7WGvxlLf7IAMhsst8rWyJM6FH8RGwpC3Fxzo5JHPvI1PUcnJAsFEJIpz/99wyFfGoVS/3e07lcKE35ahuG7Kp4xSw36RXREX+0/tFpuA8d9DRMLcEozDgQVx6oM4aC1sE4xoCjtNjsEIfPwbCuUAHMv92txdSOr8 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Sat, Aug 03, 2024 at 09:35:56AM +0100, David Woodhouse wrote: > On Fri, 2024-08-02 at 18:40 -0400, Peter Xu wrote: > > On Fri, Aug 02, 2024 at 01:03:16PM +0100, David Woodhouse wrote: > > > An alternative workaround (which perhaps we should *also* consider) > > > looked like this (plus some suitable code comment, of course): > > > > > > --- a/arch/x86/mm/fault.c > > > +++ b/arch/x86/mm/fault.c > > > @@ -1304,6 +1304,8 @@ void do_user_addr_fault(struct pt_regs *regs, > > >          */ > > >         if (user_mode(regs)) > > >                 flags |= FAULT_FLAG_USER; > > > +       else > > > +               flags &= ~FAULT_FLAG_INTERRUPTIBLE; > > >   > ... > > Instead of "interruptible exception" or the original patch (which might > > still be worthwhile, though?  I didn't follow much on kvm and the new gpc > > cache, but looks still nicer than get/put user from initial glance), above > > Yes, I definitely think we want the GPC conversion anyway. That's why I > suggested it to Carsten, to resolve our *immediate* problem while we > continue to ponder the general case. > > > looks like the easier and complete solution to me.  For "completeness", I > > mean I am not sure how many other copy_to/from_user() code in kvm can hit > > this, so looks like still possible to hit outside steal time page? > > Right. It theoretically applies to *any* user access. It's just that > anything other than *guest* pages is slightly less likely to be backed > by userfaultfd. > > > I thought only the slow fault path was involved in INTERRUPTIBLE thing and > > that was the plan, but I guess I overlooked how the default value could > > affect copy to/from user invoked from KVM as well.. > > > > With above patch to drop FAULT_FLAG_INTERRUPTIBLE for !user, KVM can still > > opt-in INTERRUPTIBLE anywhere by leveraging hva_to_pfn[_slow]() API, which > > is "INTERRUPTIBLE"-ready with a boolean the caller can set. But the caller > > will need to be able to process KVM_PFN_ERR_SIGPENDING. > > Right. I think converting kvm_{read,write}_guest() and friends to do > that and be interruptible might make sense? Makes sense to me. It's just that there seem to be a lot of the contexts that using this, so I'm not sure how much work needed to integrate the new KVM_PFN_ERR_SIGPENDING, and whether it'll be worthwhile. Also, not sure whether some context that can be too involved to only handle sigkill/quit. And this can also, logically, trigger with kvm_{read,write}_guest() or similar path already, right? I wonder why it can so far only trigger with steal time; I probably missed something. > > The patch snippet above obviously only fixes it for x86 and would need > to be done across the board. Unless we do this one instead, abusing the > knowledge that uffd is the only thing which honours > FAULT_FLAG_INTERRUPTIBLE? > > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -351,7 +351,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx, > > static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags) > { > - if (flags & FAULT_FLAG_INTERRUPTIBLE) > + if ((flags & FAULT_FLAG_INTERRUPTIBLE) && (flags & FAULT_FLAG_USER)) > return TASK_INTERRUPTIBLE; > > if (flags & FAULT_FLAG_KILLABLE) This smells hacky to me, as FAULT_FLAG_INTERRUPTIBLE is a fault API that the fault handler says "let's respond to non-fatal signals". It means here userfault is violating the ABI.. And, IIUC this concept of "handling non-fatal signal" can apply outside userfaultfd too. The one in my mind is __folio_lock_or_retry(). The previous change looks more reasonable, as I think it's a bug that in do_user_addr_fault() (just take x86 as example) it specifies the INTERRUPTIBLE but later after handle_mm_fault() it ignored it in fault_signal_pending() for !user. So it makes sense to me to have FAULT_FLAG_DEFAULT matching what fault_signal_pending() does. From that POV perhaps if FAULT_FLAG_DEFAULT can take "user" as input would be even cleaner (instead of clearing it later). > > I still quite like the idea of *optional* interruptible exceptions, as > seen in my proof of concept. Perhaps we wouldn't want the read(2) and > write(2) system calls to use them, but there are plenty of other system > calls which could be interruptible instead of blocking. I don't have enough much direct experience there, but it sounds reasonable to me. > > Right now, even the simple case of a trivial SIGINT handler which does > some minor cleanup before exiting, makes it a non-fatal signal so the > kernel blocks and waits for ever. > Thanks, -- Peter Xu