Date: Tue, 15 Aug 2023 07:50:58 -0700
In-Reply-To:
Mime-Version: 1.0
References: <20230810085636.25914-1-yan.y.zhao@intel.com>
 <20230810090218.26244-1-yan.y.zhao@intel.com>
 <277ee023-dc94-6c23-20b2-7deba641f1b1@loongson.cn>
Message-ID:
Subject: Re: [RFC PATCH v2 5/5] KVM: Unmap pages only when it's indeed protected for NUMA migration
From: Sean Christopherson
To: Yan Zhao
Cc: bibo mao, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 pbonzini@redhat.com, mike.kravetz@oracle.com, apopple@nvidia.com, jgg@nvidia.com,
 rppt@kernel.org, akpm@linux-foundation.org, kevin.tian@intel.com, david@redhat.com
Content-Type: text/plain; charset="us-ascii"

On Tue, Aug 15, 2023, Yan Zhao wrote:
> On Mon, Aug 14, 2023 at 09:40:44AM -0700, Sean Christopherson wrote:
> > > > Note, I'm assuming secondary MMUs aren't allowed to map swap entries...
> > > >
> > > > Compile tested only.
> > >
> > > I don't find a matching end to each
> > > mmu_notifier_invalidate_range_start_nonblock().
> >
> > It pairs with the existing call to mmu_notifier_invalidate_range_end() in
> > change_pmd_range():
> >
> > 	if (range.start)
> > 		mmu_notifier_invalidate_range_end(&range);
>
> No, it doesn't work for mmu_notifier_invalidate_range_start() sent in change_pte_range(),
> if we only want the range to include pages successfully set to PROT_NONE.

Precise invalidation was a non-goal for my hack-a-patch.  The intent was purely
to defer invalidation until it was actually needed, but still perform only a
single notification so as to batch the TLB flushes, e.g. the start() call still
used the original @end.  The idea was to play nice with the scenario where
nothing in a VMA could be migrated.  It was completely untested though, so it
may not have actually done anything to reduce the number of pointless
invalidations.

> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 9e4cd8b4a202..f29718a16211 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4345,6 +4345,9 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> >  	if (unlikely(!fault->slot))
> >  		return kvm_handle_noslot_fault(vcpu, fault, access);
> >  
> > +	if (mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva))
> > +		return RET_PF_RETRY;
> > +
>
> This can effectively reduce the remote flush IPIs a lot!
>
> One nit is that maybe rmb() or READ_ONCE() is required for
> kvm->mmu_invalidate_range_start and kvm->mmu_invalidate_range_end.
> Otherwise, I'm somewhat worried about constant false positives and retries.

If anything, this needs a READ_ONCE() on mmu_invalidate_in_progress.  The
ranges aren't touched when mmu_invalidate_in_progress goes to zero, so ensuring
they are reloaded wouldn't do anything.  The key to making forward progress is
seeing that there is no in-progress invalidation.

I did consider adding said READ_ONCE(), but practically speaking, constant
false positives are impossible.  KVM will re-enter the guest when retrying, and
there is zero chance of the compiler avoiding reloads across VM-Enter+VM-Exit.

I suppose in theory we might someday differentiate between "retry because a
different vCPU may have fixed the fault" and "retry because there's an
in-progress invalidation", and not bother re-entering the guest for the latter,
e.g. have it try to yield instead.
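
For reference, the check in question looks roughly like this (paraphrasing
include/linux/kvm_host.h from memory, so details may be off for any given
kernel version); the only loads that matter for forward progress are
mmu_invalidate_in_progress and mmu_invalidate_seq:

  static inline int mmu_invalidate_retry_hva(struct kvm *kvm,
					     unsigned long mmu_seq,
					     unsigned long hva)
  {
	lockdep_assert_held(&kvm->mmu_lock);

	/*
	 * The range fields are only meaningful while an invalidation is
	 * in-progress; they aren't touched when in_progress drops back to
	 * zero.
	 */
	if (unlikely(kvm->mmu_invalidate_in_progress) &&
	    hva >= kvm->mmu_invalidate_range_start &&
	    hva < kvm->mmu_invalidate_range_end)
		return 1;

	if (kvm->mmu_invalidate_seq != mmu_seq)
		return 1;

	return 0;
  }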
All that said, READ_ONCE() on mmu_invalidate_in_progress should effectively be a nop, so it wouldn't hurt to be paranoid in this case. Hmm, at that point, it probably makes sense to add a READ_ONCE() for mmu_invalidate_seq too, e.g. so that a sufficiently clever compiler doesn't completely optimize away the check. Losing the check wouldn't be problematic (false negatives are fine, especially on that particular check), but the generated code would *look* buggy.
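
Something like this, completely untested and purely to show where the
READ_ONCE()s would land:

	/*
	 * Reload in_progress on every attempt; forward progress hinges on
	 * observing that no invalidation is in-progress.
	 */
	if (unlikely(READ_ONCE(kvm->mmu_invalidate_in_progress)) &&
	    hva >= kvm->mmu_invalidate_range_start &&
	    hva < kvm->mmu_invalidate_range_end)
		return 1;

	/*
	 * Not strictly necessary (false negatives are fine), but it keeps
	 * the generated code from looking buggy if the compiler decides to
	 * get clever and elide the reload.
	 */
	if (READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq)
		return 1;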