Date: Thu, 23 Jun 2022 16:18:09 -0400
From: Peter Xu
To: Sean Christopherson
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Paolo Bonzini,
    Andrew Morton, David Hildenbrand, "Dr . David Alan Gilbert",
    Andrea Arcangeli, Linux MM Mailing List
Subject: Re: [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults
References: <20220622213656.81546-1-peterx@redhat.com> <20220622213656.81546-5-peterx@redhat.com>

On Thu, Jun 23, 2022 at 08:07:00PM +0000, Sean Christopherson wrote:
> On Thu, Jun 23, 2022, Peter Xu wrote:
> > Hi, Sean,
> >
> > On Thu, Jun 23, 2022 at 02:46:08PM +0000, Sean Christopherson wrote:
> > > On Wed, Jun 22, 2022, Peter Xu wrote:
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index e92f1ab63d6a..b39acb7cb16d 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> > > >  static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> > > >  			       unsigned int access)
> > > >  {
> > > > +	/* NOTE: not all error pfn is fatal; handle intr before the other ones */
> > > > +	if (unlikely(is_intr_pfn(fault->pfn))) {
> > > > +		vcpu->run->exit_reason = KVM_EXIT_INTR;
> > > > +		++vcpu->stat.signal_exits;
> > > > +		return -EINTR;
> > > > +	}
> > > > +
> > > >  	/* The pfn is invalid, report the error! */
> > > >  	if (unlikely(is_error_pfn(fault->pfn)))
> > > >  		return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn);
> > > > @@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> > > >  		}
> > > >  	}
> > > >
> > > > +	/* Allow to respond to generic signals in slow page faults */
> > >
> > > "slow" is being overloaded here.
> > > The previous call __gfn_to_pfn_memslot() will
> > > end up in hva_to_pfn_slow(), but because of passing a non-null async it won't wait.
> > > This code really should have a more extensive comment irrespective of the interruptible
> > > stuff, now would be a good time to add that.
> >
> > Yes I agree, especially the "async" parameter along with "atomic" makes it
> > even more confusing as you said.  But doesn't that also mean the "slow" here
> > is spot-on?  I mean imho it's the "elsewhere" that needs cleanup, not this
> > comment itself, since it's really stating the fact that this is the real
> > slow path?
>
> No, because atomic=false here, i.e. KVM will try hva_to_pfn_slow() if hva_to_pfn_fast()
> fails.  So even if we agree that the "wait for IO" path is the true slow path,
> when reading KVM code the vast, vast majority of developers will associate "slow"
> with hva_to_pfn_slow().

Okay.  I think how we define slow matters; here my take is "when a major
fault happens" (as defined in mm terms), but probably that definition is
a bit far away from kvm at the hypervisor level indeed.

>
> > Or any other suggestions are greatly welcome on how I should improve this
> > comment.
>
> Something along these lines?
>
> 	/*
> 	 * Allow gup to bail on pending non-fatal signals when it's also allowed
> 	 * to wait for IO.  Note, gup always bails if it is unable to quickly
> 	 * get a page and a fatal signal, i.e. SIGKILL, is pending.
> 	 */

Taken.

> >
> > >
> > > Comments aside, isn't this series incomplete from the perspective that there are
> > > still many flows where KVM will hang if gfn_to_pfn() gets stuck in gup?  E.g. if
> > > KVM is retrieving a page pointed at by vmcs12.
> >
> > Right.  The thing is I'm not confident I can make it complete easily in one
> > shot..
> >
> > I mentioned some of that in cover letter or commit message of patch 1, in
> > that I don't think all the gup call sites are okay with being interrupted
> > by a non-fatal signal.
> >
> > So what I want to do is to do it step by step, at least by introducing
> > FOLL_INTERRUPTIBLE and having one valid user of it that covers a very valid
> > use case.  I'm also pretty sure the page fault path is really where most
> > cases with GUP will happen, so it already helps in many ways for me
> > when running with a patched kernel.
> >
> > So when the complete picture is non-trivial to achieve in one shot, I think
> > this could be one option we go for.  With the facility (and example code on
> > x86 slow page fault) ready, hopefully we could start to convert many other
> > call sites to be signal-aware, outside page faults, or even outside x86,
> > because it's really a more generic problem, which I fully agree with.
> >
> > Does it sound reasonable to you?
>
> Yep.  In fact, I'd be totally ok keeping this to just the page fault path.  I
> missed one crucial detail on my first read through: gup already bails on SIGKILL,
> it's only these technically-not-fatal signals that gup ignores by default.  In
> other words, using FOLL_INTERRUPTIBLE is purely opportunistic as userspace
> can always resort to SIGKILL if the VM really needs to die.
>
> It would be very helpful to explicitly call that out in the changelog, that way
> it's (hopefully) clear that KVM uses FOLL_INTERRUPTIBLE to be user friendly when
> it's easy to do so, and that it's not required for correctness/robustness.

Yes that's the case, SIGKILL is special.  I can mention that somewhere in
the cover letter too besides the comment you suggested above.

Thanks,

-- 
Peter Xu
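
For anyone following the thread, the signal policy discussed above (gup
always bails on a pending fatal signal when it would have to wait, and bails
on a non-fatal one only when the caller opted in via FOLL_INTERRUPTIBLE) can
be modelled in plain userspace C roughly as below.  This is only a sketch of
the decision, not kernel code: MODEL_FOLL_INTERRUPTIBLE, struct model_task
and model_gup_should_bail() are hypothetical names made up for illustration,
and the flag value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag for this model only; not the kernel's flag value. */
#define MODEL_FOLL_INTERRUPTIBLE 0x1u

/* Hypothetical snapshot of what is pending for the faulting task. */
struct model_task {
	bool fatal_pending;	/* e.g. SIGKILL */
	bool nonfatal_pending;	/* e.g. SIGUSR1 */
};

/*
 * Decide whether a gup-style wait should give up.  A fatal signal always
 * wins; a non-fatal signal only matters if the caller opted in.
 */
static bool model_gup_should_bail(const struct model_task *t, unsigned int flags)
{
	if (t->fatal_pending)
		return true;
	if ((flags & MODEL_FOLL_INTERRUPTIBLE) && t->nonfatal_pending)
		return true;
	return false;
}

/* Walk the four interesting combinations of pending signal and flag. */
static void model_self_check(void)
{
	struct model_task killed = { .fatal_pending = true };
	struct model_task poked = { .nonfatal_pending = true };

	assert(model_gup_should_bail(&killed, 0));
	assert(model_gup_should_bail(&killed, MODEL_FOLL_INTERRUPTIBLE));
	assert(!model_gup_should_bail(&poked, 0));
	assert(model_gup_should_bail(&poked, MODEL_FOLL_INTERRUPTIBLE));
}
```

The third assertion is the case the series changes for the x86 page fault
path: without the opt-in a non-fatal signal is ignored, which is why the
flag is a convenience rather than a correctness requirement.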