From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 8 Aug 2024 15:45:49 -0700
Mime-Version: 1.0
References: <20240807194812.819412-1-peterx@redhat.com>
 <20240807194812.819412-3-peterx@redhat.com>
Subject: Re: [PATCH v4 2/7] mm/mprotect: Push mmu notifier to PUDs
From: Sean Christopherson
To: Peter Xu
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 "Aneesh Kumar K . V", Michael Ellerman, Oscar Salvador, Dan Williams,
 James Houghton, Matthew Wilcox, Nicholas Piggin, Rik van Riel,
 Dave Jiang, Andrew Morton, x86@kernel.org, Ingo Molnar,
 Rick P Edgecombe, "Kirill A . Shutemov",
 linuxppc-dev@lists.ozlabs.org, Mel Gorman, Hugh Dickins,
 Borislav Petkov, David Hildenbrand, Thomas Gleixner, Vlastimil Babka,
 Dave Hansen, Christophe Leroy, Huang Ying, kvm@vger.kernel.org,
 Paolo Bonzini, David Rientjes
Content-Type: text/plain; charset="us-ascii"

On Thu, Aug 08, 2024, Peter Xu wrote:
> On Thu, Aug 08, 2024 at 02:31:19PM -0700, Sean Christopherson wrote:
> > On Thu, Aug 08, 2024, Peter Xu wrote:
> > > Hi, Sean,
> > >
> > > On Thu, Aug 08, 2024 at 08:33:59AM -0700, Sean Christopherson wrote:
> > > > On Wed, Aug 07, 2024, Peter Xu wrote:
> > > > > mprotect() does mmu notifiers in PMD levels.  It's there since 2014 of
> > > > > commit a5338093bfb4 ("mm: move mmu notifier call from change_protection to
> > > > > change_pmd_range").
> > > > >
> > > > > At that time, the issue was that NUMA balancing can be applied on a huge
> > > > > range of VM memory, even if nothing was populated.  The notification can be
> > > > > avoided in this case if no valid pmd detected, which includes either THP or
> > > > > a PTE pgtable page.
> > > > >
> > > > > Now to pave way for PUD handling, this isn't enough.  We need to generate
> > > > > mmu notifications even on PUD entries properly.  mprotect() is currently
> > > > > broken on PUD (e.g., one can easily trigger kernel error with dax 1G
> > > > > mappings already), this is the start to fix it.
> > > > >
> > > > > To fix that, this patch proposes to push such notifications to the PUD
> > > > > layers.
> > > > >
> > > > > There is risk on regressing the problem Rik wanted to resolve before, but I
> > > > > think it shouldn't really happen, and I still chose this solution because
> > > > > of a few reasons:
> > > > >
> > > > >   1) Consider a large VM that should definitely contain more than GBs of
> > > > >      memory, it's highly likely that PUDs are also none.  In this case there
> > > >
> > > > I don't follow this.  Did you mean to say it's highly likely that PUDs are *NOT*
> > > > none?
> > > >
> > > I did mean the original wordings.
> > >
> > > Note that in the previous case Rik worked on, it's about a mostly empty VM
> > > got NUMA hint applied.  So I did mean "PUDs are also none" here, with the
> > > hope that when the numa hint applies on any part of the unpopulated guest
> > > memory, it'll find nothing in PUDs.  Here it's mostly not about a huge PUD
> > > mapping as long as the guest memory is not backed by DAX (since only DAX
> > > supports 1G huge pud so far, while hugetlb has its own path here in
> > > mprotect, so it must be things like anon or shmem), but a PUD entry that
> > > contains pmd pgtables.  For that part, I was trying to justify "no pmd
> > > pgtable installed" with the fact that "a large VM that should definitely
> > > contain more than GBs of memory", it means the PUD range should hopefully
> > > never been accessed, so even the pmd pgtable entry should be missing.
> >
> > Ah, now I get what you were saying.
> >
> > Problem is, walking the rmaps for the shadow MMU doesn't benefit (much) from
> > empty PUDs, because KVM needs to blindly walk the rmaps for every gfn covered by
> > the PUD to see if there are any SPTEs in any shadow MMUs mapping that gfn.  And
> > that walk is done without ever yielding, which I suspect is the source of the
> > soft lockups of yore.
> >
> > And there's no way around that conundrum (walking rmaps), at least not without a
> > major rewrite in KVM.
> > In a nested TDP scenario, KVM's stage-2 page tables (for
> > L2) key off of L2 gfns, not L1 gfns, and so the only way to find mappings is
> > through the rmaps.
>
> I think the hope here is when the whole PUDs being hinted are empty without
> pgtable installed, there'll be no mmu notifier to be kicked off at all.
>
> To be explicit, I meant after this patch applied, the pud loop for numa
> hints look like this:
>
>         FOR_EACH_PUD() {
>                 ...
>                 if (pud_none(pud))
>                         continue;
>
>                 if (!range.start) {
>                         mmu_notifier_range_init(&range,
>                                                 MMU_NOTIFY_PROTECTION_VMA, 0,
>                                                 vma->vm_mm, addr, end);
>                         mmu_notifier_invalidate_range_start(&range);
>                 }
>                 ...
>         }
>
> So the hope is that pud_none() is always true for the hinted area (just
> like it used to be when pmd_none() can be hopefully true always), then we
> skip the mmu notifier as a whole (including KVM's)!

Gotcha, that makes sense.  Too many page tables flying around :-)
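
For anyone following along, the lazy-start behaviour of the PUD loop quoted
above can be modelled in plain userspace C.  This is only a rough sketch, not
kernel code: struct pud_model, struct notifier_stats and walk_puds() are
hypothetical names made up for illustration, and the mmu notifier calls are
replaced by counters.  It also assumes that the matching "end" notification
is issued once after the loop, and only if a "start" was issued; the snippet
in the thread doesn't show that part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PUD entry: we only track whether it is none. */
struct pud_model {
	bool none;
};

/* Counters standing in for the real notifier calls (illustration only). */
struct notifier_stats {
	int starts;	/* models mmu_notifier_invalidate_range_start() */
	int ends;	/* models the matching range_end (assumed, see above) */
	int visited;	/* populated PUD entries that were actually processed */
};

/*
 * Walk n PUD entries.  Empty (none) entries are skipped before any
 * notification is raised, so a fully unpopulated range never notifies.
 * The first populated entry triggers a single "start" for the whole walk.
 */
static void walk_puds(const struct pud_model *puds, size_t n,
		      struct notifier_stats *st)
{
	bool started = false;	/* plays the role of "range.start != 0" */

	assert(st != NULL);
	assert(puds != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (puds[i].none)
			continue;

		if (!started) {
			st->starts++;
			started = true;
		}

		st->visited++;	/* the real code would change protections here */
	}

	if (started)
		st->ends++;
}
```

With an all-none array the counters stay at zero, which is the case Peter
hopes holds for NUMA hints over unpopulated guest memory.  A single populated
entry anywhere in the range is enough to produce one start/end pair, and in
the real kernel that is the point where KVM's notifier (and its rmap walk)
would get involved.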