From: Yan Zhao
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com, apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org, akpm@linux-foundation.org, kevin.tian@intel.com, Yan Zhao
Subject: [RFC PATCH 0/3] Reduce NUMA balance caused TLB-shootdowns in a VM
Date: Tue, 8 Aug 2023 15:13:29 +0800
Message-Id: <20230808071329.19995-1-yan.y.zhao@intel.com>

This is an RFC series trying to fix the issue of unnecessary NUMA protection and TLB-shootdowns found in VMs with assigned devices or VFIO mediated devices during NUMA balancing.

For VMs with assigned devices or VFIO mediated devices, all or part of guest memory is long-term pinned. Auto NUMA balancing periodically selects VMAs of a process and changes their protections to PROT_NONE, even though some or all pages in the selected ranges are long-term pinned for DMA, as is the case for VMs with assigned devices or VFIO mediated devices.

This does not cause a real problem, because NUMA migration ultimately refuses to migrate such pages and restores the PROT_NONE PTEs. It does, however, cause KVM's secondary MMU to be zapped periodically, only for identical SPTEs to be faulted back in afterwards, which wastes CPU cycles and generates unnecessary TLB-shootdowns.

Patch 1 introduces a new flag, MMU_NOTIFIER_RANGE_NUMA, to be used together with the mmu notifier event type MMU_NOTIFY_PROTECTION_VMA, so that a subscriber of the mmu notifier (e.g. KVM) can tell that an invalidation event was sent specifically for NUMA migration.

With patch 3, while zapping its secondary MMU, KVM can check for long-term pinned pages and keep accessing them even though they are PROT_NONE-mapped in the primary MMU.

Patch 2 skips setting PROT_NONE on long-term pinned pages in the primary MMU, to avoid page faults introduced by NUMA protection and the restoration of old huge PMDs/PTEs in the primary MMU. Because change_pmd_range() sends .invalidate_range_start() before it descends and checks which pages to skip, patches 1 and 3 are still required for KVM.

In my test environment, with this series applied, during boot-up of a VM with assigned devices, the TLB shootdown count in KVM caused by .invalidate_range_start() sent for NUMA balancing in change_pmd_range() is reduced from 9000+ on average to 0.
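
To make the intent of patches 1 and 3 concrete, here is a minimal userspace sketch of the decision the secondary MMU makes when it receives an invalidation. It is a toy model under stated assumptions, not the code in the patches: every model_* and MODEL_* name is hypothetical, the maybe_dma_pinned field merely stands in for a check such as page_maybe_dma_pinned(), and the numeric values of the event and the flag are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the notifier event types; values are placeholders. */
enum model_event { MODEL_NOTIFY_UNMAP, MODEL_NOTIFY_PROTECTION_VMA };

/* Hypothetical stand-in for the MMU_NOTIFIER_RANGE_NUMA flag from patch 1. */
#define MODEL_RANGE_NUMA (1u << 0)

struct model_range {
	enum model_event event;
	unsigned int flags;
};

struct model_page {
	bool maybe_dma_pinned;	/* models a "page may be long-term pinned" check */
	bool spte_present;	/* a secondary-MMU mapping is currently installed */
};

/* True only for a protection change that is flagged as NUMA balancing. */
static bool model_range_is_numa(const struct model_range *range)
{
	assert(range != NULL);
	return range->event == MODEL_NOTIFY_PROTECTION_VMA &&
	       (range->flags & MODEL_RANGE_NUMA) != 0;
}

/*
 * Zap the secondary-MMU mappings covered by an invalidation and return how
 * many were zapped. For a NUMA-balancing invalidation, mappings of
 * maybe-pinned pages are kept, because NUMA migration would reject those
 * pages anyway and the same mapping would only be faulted back in.
 */
static size_t model_zap_range(const struct model_range *range,
			      struct model_page *pages, size_t npages)
{
	bool numa = model_range_is_numa(range);
	size_t zapped = 0;

	for (size_t i = 0; i < npages; i++) {
		if (!pages[i].spte_present)
			continue;
		if (numa && pages[i].maybe_dma_pinned)
			continue;	/* keep the mapping, no shootdown needed */
		pages[i].spte_present = false;
		zapped++;
	}
	return zapped;
}
```

In this model a NUMA-flagged invalidation leaves the mappings of pinned pages in place, while any other invalidation (an unmap, for instance) still zaps them, which is why the flag has to travel with the event rather than being inferred by the subscriber.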

Yan Zhao (3):
  mm/mmu_notifier: introduce a new mmu notifier flag
    MMU_NOTIFIER_RANGE_NUMA
  mm: don't set PROT_NONE to maybe-dma-pinned pages for NUMA-migrate
    purpose
  KVM: x86/mmu: skip zap maybe-dma-pinned pages for NUMA migration

 arch/x86/kvm/mmu/mmu.c       |  4 ++--
 arch/x86/kvm/mmu/tdp_mmu.c   | 26 ++++++++++++++++++++++----
 arch/x86/kvm/mmu/tdp_mmu.h   |  4 ++--
 include/linux/kvm_host.h     |  1 +
 include/linux/mmu_notifier.h |  1 +
 mm/huge_memory.c             |  5 +++++
 mm/mprotect.c                |  9 ++++++++-
 virt/kvm/kvm_main.c          |  5 +++++
 8 files changed, 46 insertions(+), 9 deletions(-)

base-commit: fdf0eaf11452d72945af31804e2a1048ee1b574c
-- 
2.17.1