From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <846e9117-1f79-a5e0-1b14-3dba91ab8033@redhat.com>
Date: Fri, 11 Aug 2023 20:39:46 +0200
Subject: Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM
To: John Hubbard, Yan Zhao
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com,
 apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org,
 akpm@linux-foundation.org, kevin.tian@intel.com, Mel Gorman
References: <20230810085636.25914-1-yan.y.zhao@intel.com>
 <41a893e1-f2e7-23f4-cad2-d5c353a336a3@redhat.com>
 <6b48a161-257b-a02b-c483-87c04b655635@redhat.com>
 <1ad2c33d-95e1-49ec-acd2-ac02b506974e@nvidia.com>
From: David Hildenbrand
Organization: Red Hat
In-Reply-To: <1ad2c33d-95e1-49ec-acd2-ac02b506974e@nvidia.com>

>> Ah, okay I see, thanks. That's indeed unfortunate.
>
> Sigh. All this difficulty reminds me that this mechanism was created in
> the early days of NUMA. I wonder sometimes lately whether the cost, in
> complexity and CPU time, is still worth it on today's hardware.
>
> But of course I am deeply biased, so don't take that too seriously.
> See below. :)

:)

>>
>>>
>>> Then current KVM will unmap all notified pages from secondary MMU
>>> in .invalidate_range_start(), which could include pages that finally not
>>> set to PROT_NONE in primary MMU.
>>>
>>> For VMs with pass-through devices, though all guest pages are pinned,
>>> KVM still periodically unmap pages in response to the
>>> .invalidate_range_start() notification from auto NUMA balancing, which
>>> is a waste.
>>
>> Should we want to disable NUMA hinting for such VMAs instead (for
>> example, by QEMU/hypervisor) that knows that any NUMA hinting activity
>> on these ranges would be a complete waste of time? I recall that John H.
>> once mentioned that there are similar issues with GPU memory: NUMA
>> hinting is actually counter-productive and they end up disabling it.
>>
>
> Yes, NUMA balancing is incredibly harmful to performance, for GPU and
> accelerators that map memory...and VMs as well, it seems. Basically,
> anything that has its own processors and page tables needs to be left
> strictly alone by NUMA balancing. Because the kernel is (still, even
> today) unaware of what those processors are doing, and so it has no way
> to do productive NUMA balancing.

Is there any existing way we could handle that better on a per-VMA
level, or on the process level? Any magic toggles?

MMF_HAS_PINNED might be too restrictive. MMF_HAS_PINNED_LONGTERM might
be better, but with things like io_uring still too restrictive eventually.

I recall that setting a mempolicy could prevent auto-numa from getting
active, but that might be undesired.

CCing Mel.

-- 
Cheers,

David / dhildenb
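
For reference, the mempolicy idea above could be tried from userspace
roughly as follows. This is only a minimal sketch and not part of the
patch series: set_range_mempolicy() and single_node_mask() are
hypothetical helper names, and the SKETCH_MPOL_* constants just repeat
the values from <linux/mempolicy.h>. That the NUMA hinting scanner
leaves a range alone once it carries an explicit policy (set without
MPOL_F_NUMA_BALANCING) is an assumption based on the recollection above
and may depend on the kernel version.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Policy mode values as defined in the Linux UAPI header
 * <linux/mempolicy.h>, repeated here so the sketch does not
 * depend on libnuma's <numaif.h>. */
#define SKETCH_MPOL_PREFERRED 1
#define SKETCH_MPOL_BIND      2

/* Build a mask with exactly one node set. Only nodes that fit
 * into a single unsigned long are supported by this sketch. */
static unsigned long single_node_mask(int node)
{
    assert(node >= 0 && node < (int)(8 * sizeof(unsigned long)));
    return 1UL << node;
}

/*
 * Hypothetical helper (not a kernel or QEMU function): attach an
 * explicit memory policy to [addr, addr + len) using the raw
 * mbind(2) system call. Returns 0 on success or a negative errno.
 */
static int set_range_mempolicy(void *addr, size_t len, int mode, int node)
{
    unsigned long mask = single_node_mask(node);
    /* Pass "bits in the mask + 1", as libnuma does, because the
     * kernel decrements maxnode before reading the mask. */
    unsigned long maxnode = 8 * sizeof(mask) + 1;

    if (syscall(SYS_mbind, addr, (unsigned long)len, (unsigned long)mode,
                &mask, maxnode, 0UL) == -1)
        return -errno;
    return 0;
}
```

A hypervisor could call something like
set_range_mempolicy(guest_ram, size, SKETCH_MPOL_BIND, 0) on the pinned
guest memory (guest_ram and size being placeholders). The call can fail
with a negative errno, for example on kernels built without NUMA
support or in sandboxes that filter mbind(2). The catch is the one
already mentioned: binding also changes where pages get allocated,
which might be undesired.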