From: Yan Zhao
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com,
    apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org,
    akpm@linux-foundation.org, kevin.tian@intel.com, david@redhat.com,
    Yan Zhao
Subject: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM
Date: Thu, 10 Aug 2023 16:56:36 +0800
Message-Id: <20230810085636.25914-1-yan.y.zhao@intel.com>

This is an RFC series trying to fix the issue of unnecessary NUMA
protection and TLB-shootdowns found in VMs with assigned devices or VFIO
mediated devices during NUMA balancing.

For VMs with assigned devices or VFIO mediated devices, all or part of
guest memory is pinned long-term.

Auto NUMA balancing periodically selects VMAs of a process and changes
their protection to PROT_NONE, even when some or all pages in the
selected ranges are long-term pinned for DMA, which is the case for VMs
with assigned devices or VFIO mediated devices.

Though this does not cause a real problem, because NUMA migration will
ultimately refuse to migrate those kinds of pages and restore the
PROT_NONE PTEs, it causes KVM's secondary MMU to be zapped periodically,
with identical SPTEs eventually faulted back in, wasting CPU cycles and
generating unnecessary TLB-shootdowns.

This series first introduces a new flag MMU_NOTIFIER_RANGE_NUMA in patch 1
to work with mmu notifier event type MMU_NOTIFY_PROTECTION_VMA, so that a
subscriber of the mmu notifier (e.g. KVM) can tell that an invalidation
event is sent specifically for NUMA migration.

Patch 2 skips setting PROT_NONE on long-term pinned pages in the primary
MMU, to avoid the page faults introduced by NUMA protection and the
restoration of old huge PMDs/PTEs in the primary MMU.

Patch 3 introduces a new mmu notifier callback .numa_protect(), which
patch 4 calls once a page is known to be PROT_NONE protected.

Then in patch 5, KVM recognizes that an .invalidate_range_start()
notification is specific to NUMA balancing and does not unmap pages in
the secondary MMU until .numa_protect() arrives.

Changelog:
RFC v1 --> v2:
1. Added patches 3-4 to introduce a new callback .numa_protect().
2. Rather than have KVM duplicate the logic to check whether a page is
   pinned long-term, let KVM depend on the new callback .numa_protect()
   to unmap pages in the secondary MMU for NUMA migration.
RFC v1:
https://lore.kernel.org/all/20230808071329.19995-1-yan.y.zhao@intel.com/

Yan Zhao (5):
  mm/mmu_notifier: introduce a new mmu notifier flag
    MMU_NOTIFIER_RANGE_NUMA
  mm: don't set PROT_NONE to maybe-dma-pinned pages for NUMA-migrate
    purpose
  mm/mmu_notifier: introduce a new callback .numa_protect
  mm/autonuma: call .numa_protect() when page is protected for NUMA
    migrate
  KVM: Unmap pages only when it's indeed protected for NUMA migration

 include/linux/mmu_notifier.h | 16 ++++++++++++++++
 mm/huge_memory.c             |  6 ++++++
 mm/mmu_notifier.c            | 18 ++++++++++++++++++
 mm/mprotect.c                | 10 +++++++++-
 virt/kvm/kvm_main.c          | 25 ++++++++++++++++++++++---
 5 files changed, 71 insertions(+), 4 deletions(-)

-- 
2.17.1
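
For readers who want the intended control flow in one place, here is a
rough user-space toy model of the sequence described in the cover letter
above. It is not kernel code and is not taken from the patches. Only the
names MMU_NOTIFIER_RANGE_NUMA, .invalidate_range_start() and
.numa_protect() come from the cover letter; struct toy_vm, every toy_*
function, the flag's numeric value and the per-page call granularity are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag name from the cover letter; the value is arbitrary (assumption). */
#define MMU_NOTIFIER_RANGE_NUMA (1u << 0)

#define TOY_NPAGES 8

/* Hypothetical toy state: one slot per guest page. */
struct toy_vm {
	bool pinned[TOY_NPAGES];       /* long-term pinned for DMA */
	bool prot_none[TOY_NPAGES];    /* primary MMU PTE set to PROT_NONE */
	bool spte_present[TOY_NPAGES]; /* mapped in the secondary MMU */
	unsigned int zaps;             /* number of SPTEs zapped so far */
};

/* Drop secondary-MMU mappings for pages in [start, end). */
void toy_zap(struct toy_vm *vm, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NPAGES);
	for (size_t i = start; i < end; i++) {
		if (vm->spte_present[i]) {
			vm->spte_present[i] = false;
			vm->zaps++;
		}
	}
}

/*
 * Models .invalidate_range_start() as described for patch 5: when the
 * range is flagged as NUMA balancing, do not unmap yet.
 */
void toy_invalidate_range_start(struct toy_vm *vm, size_t start,
				size_t end, unsigned int flags)
{
	if (flags & MMU_NOTIFIER_RANGE_NUMA)
		return; /* wait for toy_numa_protect() */
	toy_zap(vm, start, end);
}

/* Models .numa_protect(): called only for pages that really became PROT_NONE. */
void toy_numa_protect(struct toy_vm *vm, size_t start, size_t end)
{
	toy_zap(vm, start, end);
}

/* Models one NUMA-balancing pass over [start, end). */
void toy_numa_scan(struct toy_vm *vm, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NPAGES);
	toy_invalidate_range_start(vm, start, end, MMU_NOTIFIER_RANGE_NUMA);
	for (size_t i = start; i < end; i++) {
		if (vm->pinned[i])
			continue; /* patch 2: leave pinned pages alone */
		vm->prot_none[i] = true;
		toy_numa_protect(vm, i, i + 1); /* patches 3-4 */
	}
}
```

In this model, a scan over a range that mixes pinned and unpinned pages
zaps only the unpinned ones, whereas a notification without the NUMA flag
still zaps the whole range immediately. The real series may notify at a
different granularity than one page per call.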