Date: Thu, 8 Aug 2024 08:33:59 -0700
In-Reply-To: <20240807194812.819412-3-peterx@redhat.com>
References: <20240807194812.819412-1-peterx@redhat.com> <20240807194812.819412-3-peterx@redhat.com>
Subject: Re: [PATCH v4 2/7] mm/mprotect: Push mmu notifier to PUDs
From: Sean Christopherson
To: Peter Xu
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Aneesh Kumar K . V",
 Michael Ellerman, Oscar Salvador, Dan Williams, James Houghton,
 Matthew Wilcox, Nicholas Piggin, Rik van Riel, Dave Jiang, Andrew Morton,
 x86@kernel.org, Ingo Molnar, Rick P Edgecombe, "Kirill A . Shutemov",
 linuxppc-dev@lists.ozlabs.org, Mel Gorman, Hugh Dickins, Borislav Petkov,
 David Hildenbrand, Thomas Gleixner, Vlastimil Babka, Dave Hansen,
 Christophe Leroy, Huang Ying, kvm@vger.kernel.org, Paolo Bonzini,
 David Rientjes

On Wed, Aug 07, 2024, Peter Xu wrote:
> mprotect() does mmu notifiers in PMD levels. It's there since 2014 of
> commit a5338093bfb4 ("mm: move mmu notifier call from change_protection to
> change_pmd_range").
>
> At that time, the issue was that NUMA balancing can be applied on a huge
> range of VM memory, even if nothing was populated. The notification can be
> avoided in this case if no valid pmd detected, which includes either THP or
> a PTE pgtable page.
>
> Now to pave way for PUD handling, this isn't enough. We need to generate
> mmu notifications even on PUD entries properly. mprotect() is currently
> broken on PUD (e.g., one can easily trigger kernel error with dax 1G
> mappings already), this is the start to fix it.
>
> To fix that, this patch proposes to push such notifications to the PUD
> layers.
>
> There is risk on regressing the problem Rik wanted to resolve before, but I
> think it shouldn't really happen, and I still chose this solution because
> of a few reasons:
>
> 1) Consider a large VM that should definitely contain more than GBs of
> memory, it's highly likely that PUDs are also none. In this case there

I don't follow this. Did you mean to say it's highly likely that PUDs are *NOT*
none?

> will have no regression.
>
> 2) KVM has evolved a lot over the years to get rid of rmap walks, which
> might be the major cause of the previous soft-lockup. At least TDP MMU
> already got rid of rmap as long as not nested (which should be the major
> use case, IIUC), then the TDP MMU pgtable walker will simply see empty VM
> pgtable (e.g. EPT on x86), the invalidation of a full empty region in
> most cases could be pretty fast now, comparing to 2014.

The TDP MMU will indeed be a-ok. It only zaps leaf SPTEs in response to
mmu_notifier invalidations, and checks NEED_RESCHED after processing each SPTE,
i.e. KVM won't zap an entire PUD and get stuck processing all its children.

I doubt the shadow MMU will fare much better than it did years ago though,
AFAICT the relevant code hasn't changed. E.g. when zapping a large range in
response to an mmu_notifier invalidation, KVM never yields even if blocking is
allowed.

That said, it is stupidly easy to fix the soft lockup problem in the shadow MMU.
KVM already has an rmap walk path that plays nice with NEED_RESCHED *and* zaps
rmaps, but because of how things grew organically over the years, KVM never
adopted the cond_resched() logic for the mmu_notifier path.

As a bonus, now that .change_pte() is gone, the only other usage of x86's
kvm_handle_gfn_range() is for the aging mmu_notifiers, and I want to move those
to their own flow too[*], i.e. kvm_handle_gfn_range() in its current form can
be removed entirely.

I'll post a separate series, I don't think it needs to block this work, and I'm
fairly certain I can get this done for 6.12 (shouldn't be a large or scary
series, though I may tack on my lockless aging idea as an RFC).

[*] https://lore.kernel.org/all/Zo137P7BFSxAutL2@google.com
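
For readers unfamiliar with the pattern discussed above, here is a rough
userspace sketch of a range walk that checks for a pending reschedule after
each entry and yields only when blocking is allowed. It is an illustration
under stated assumptions, not KVM code: zap_range(), resched_pending(),
struct zap_stats and the "budget" parameter are all hypothetical names, and
the yield is merely counted where a kernel walker would presumably drop its
lock and call cond_resched().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for the sketch; not a kernel structure. */
struct zap_stats {
	size_t zapped;	/* non-zero entries that were cleared */
	size_t yields;	/* points where the walker would have yielded */
};

/*
 * Stand-in for a NEED_RESCHED check: pretend a reschedule is requested
 * after every "budget" processed entries. A budget of 0 means never.
 */
static bool resched_pending(size_t processed, size_t budget)
{
	return budget != 0 && processed % budget == 0;
}

/*
 * Clear every entry in [entries, entries + n). After each entry, check
 * for a pending reschedule and, only if the caller may block, record a
 * yield. Without that check a large range is processed in one go, which
 * is the behaviour that risks a soft lockup.
 */
static struct zap_stats zap_range(unsigned long *entries, size_t n,
				  size_t budget, bool may_block)
{
	struct zap_stats st = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (entries[i]) {
			entries[i] = 0;
			st.zapped++;
		}
		/* Kernel code would drop locks and cond_resched() here. */
		if (may_block && resched_pending(i + 1, budget))
			st.yields++;
	}
	return st;
}
```

Calling zap_range() with may_block set to false models the non-blocking
notifier case: the whole range is still cleared, but no yield is recorded.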