Message-ID: <6b48a161-257b-a02b-c483-87c04b655635@redhat.com>
Date: Fri, 11 Aug 2023 19:25:30 +0200
Subject: Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM
To: Yan Zhao
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com, apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org, akpm@linux-foundation.org, kevin.tian@intel.com, John Hubbard
References: <20230810085636.25914-1-yan.y.zhao@intel.com> <41a893e1-f2e7-23f4-cad2-d5c353a336a3@redhat.com>
From: David Hildenbrand
Organization: Red Hat

On 10.08.23 11:50, Yan Zhao wrote:
> On Thu, Aug 10, 2023 at 11:34:07AM +0200, David Hildenbrand wrote:
>>> This series first introduces a new flag MMU_NOTIFIER_RANGE_NUMA in patch 1
>>> to work with mmu notifier event type MMU_NOTIFY_PROTECTION_VMA, so that
>>> the subscriber (e.g. KVM) of the mmu notifier can know that an invalidation
>>> event is sent specifically for NUMA migration purposes.
>>>
>>> Patch 2 skips setting PROT_NONE on long-term pinned pages in the primary
>>> MMU to avoid page faults introduced by NUMA protection and the restoration
>>> of old huge PMDs/PTEs in the primary MMU.
>>>
>>> Patch 3 introduces a new mmu notifier callback .numa_protect(), which
>>> will be called in patch 4 when a page is ensured to be PROT_NONE protected.
>>>
>>> Then in patch 5, KVM can recognize that an .invalidate_range_start()
>>> notification is specifically for NUMA balancing and not unmap the page in
>>> the secondary MMU until .numa_protect() comes.
>>>
>>
>> Why do we need all that, when we should simply not be applying PROT_NONE to
>> pinned pages?
>>
>> In change_pte_range() we already have:
>>
>> if (is_cow_mapping(vma->vm_flags) &&
>>     page_count(page) != 1)
>>
>> Which covers both shared and pinned pages.
> Ah, right. Currently, on my side, I don't see any pinned pages
> outside of this condition.
> But I have a question regarding is_cow_mapping(vma->vm_flags): do we
> need to allow pinned pages in !is_cow_mapping(vma->vm_flags)?

One issue is that folio_maybe_pinned...() ... is unreliable as soon as
your page is mapped more than 1024 times: for small folios, a pin is
accounted by adding GUP_PIN_COUNTING_BIAS (1024) to the refcount, so a
page with that many references simply looks pinned.

One might argue that we want to exclude pages that are mapped that
often anyway. That might possibly work.

>
>> Staring at patch #2, are we still missing something similar for THPs?
> Yes.
>
>> Why is that MMU notifier thingy and touching KVM code required?
> Because NUMA balancing code will first send .invalidate_range_start() with
> event type MMU_NOTIFY_PROTECTION_VMA to KVM in change_pmd_range()
> unconditionally, before it goes down into change_pte_range() and
> change_huge_pmd() to check each page count and apply PROT_NONE.

Ah, okay I see, thanks. That's indeed unfortunate.

>
> Then current KVM will unmap all notified pages from the secondary MMU
> in .invalidate_range_start(), which could include pages that are finally
> not set to PROT_NONE in the primary MMU.
>
> For VMs with pass-through devices, even though all guest pages are pinned,
> KVM still periodically unmaps pages in response to the
> .invalidate_range_start() notification from auto NUMA balancing, which
> is a waste.

Wouldn't we rather want to disable NUMA hinting for such VMAs instead,
for example triggered by QEMU/the hypervisor, which knows that any NUMA
hinting activity on these ranges would be a complete waste of time?

I recall that John H. once mentioned that there are similar issues with
GPU memory: NUMA hinting is actually counter-productive and they end up
disabling it.

-- 
Cheers,

David / dhildenb