Date: Thu, 8 Aug 2024 14:31:19 -0700
References: <20240807194812.819412-1-peterx@redhat.com>
 <20240807194812.819412-3-peterx@redhat.com>
Subject: Re: [PATCH v4 2/7] mm/mprotect: Push mmu notifier to PUDs
From: Sean Christopherson
To: Peter Xu
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 "Aneesh Kumar K . V", Michael Ellerman, Oscar Salvador, Dan Williams,
 James Houghton, Matthew Wilcox, Nicholas Piggin, Rik van Riel,
 Dave Jiang, Andrew Morton, x86@kernel.org, Ingo Molnar,
 Rick P Edgecombe, "Kirill A . Shutemov", linuxppc-dev@lists.ozlabs.org,
 Mel Gorman, Hugh Dickins, Borislav Petkov, David Hildenbrand,
 Thomas Gleixner, Vlastimil Babka, Dave Hansen, Christophe Leroy,
 Huang Ying, kvm@vger.kernel.org, Paolo Bonzini, David Rientjes

On Thu, Aug 08, 2024, Peter Xu wrote:
> Hi, Sean,
>
> On Thu, Aug 08, 2024 at 08:33:59AM -0700, Sean Christopherson wrote:
> > On Wed, Aug 07, 2024, Peter Xu wrote:
> > > mprotect() does mmu notifiers in PMD levels. It's there since 2014 of
> > > commit a5338093bfb4 ("mm: move mmu notifier call from change_protection to
> > > change_pmd_range").
> > >
> > > At that time, the issue was that NUMA balancing can be applied on a huge
> > > range of VM memory, even if nothing was populated. The notification can be
> > > avoided in this case if no valid pmd detected, which includes either THP or
> > > a PTE pgtable page.
> > >
> > > Now to pave way for PUD handling, this isn't enough. We need to generate
> > > mmu notifications even on PUD entries properly. mprotect() is currently
> > > broken on PUD (e.g., one can easily trigger kernel error with dax 1G
> > > mappings already), this is the start to fix it.
> > >
> > > To fix that, this patch proposes to push such notifications to the PUD
> > > layers.
> > >
> > > There is risk on regressing the problem Rik wanted to resolve before, but I
> > > think it shouldn't really happen, and I still chose this solution because
> > > of a few reasons:
> > >
> > > 1) Consider a large VM that should definitely contain more than GBs of
> > > memory, it's highly likely that PUDs are also none. In this case there
> > I don't follow this. Did you mean to say it's highly likely that PUDs are *NOT*
> > none?
>
> I did mean the original wordings.
>
> Note that in the previous case Rik worked on, it's about a mostly empty VM
> got NUMA hint applied. So I did mean "PUDs are also none" here, with the
> hope that when the numa hint applies on any part of the unpopulated guest
> memory, it'll find nothing in PUDs. Here it's mostly not about a huge PUD
> mapping as long as the guest memory is not backed by DAX (since only DAX
> supports 1G huge pud so far, while hugetlb has its own path here in
> mprotect, so it must be things like anon or shmem), but a PUD entry that
> contains pmd pgtables. For that part, I was trying to justify "no pmd
> pgtable installed" with the fact that "a large VM that should definitely
> contain more than GBs of memory", it means the PUD range should hopefully
> never been accessed, so even the pmd pgtable entry should be missing.

Ah, now I get what you were saying.

Problem is, walking the rmaps for the shadow MMU doesn't benefit (much) from
empty PUDs, because KVM needs to blindly walk the rmaps for every gfn covered by
the PUD to see if there are any SPTEs in any shadow MMUs mapping that gfn. And
that walk is done without ever yielding, which I suspect is the source of the
soft lockups of yore.

And there's no way around that conundrum (walking rmaps), at least not without a
major rewrite in KVM. In a nested TDP scenario, KVM's stage-2 page tables (for
L2) key off of L2 gfns, not L1 gfns, and so the only way to find mappings is
through the rmaps.
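
To put a rough number on the per-gfn walk, here is a toy userspace model. It is
not KVM code: every name in it (toy_rmap_head, toy_walk_rmaps, and so on) is
made up for illustration, and it assumes x86-64 with 4KiB base pages, where one
PUD entry covers 1GiB. The only point is that a flat per-gfn walk visits every
slot in the range even when nothing is mapped.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT   12   /* assumption: 4KiB base pages */
#define TOY_PUD_SHIFT    30   /* assumption: one PUD entry covers 1GiB */
#define TOY_GFNS_PER_PUD (1UL << (TOY_PUD_SHIFT - TOY_PAGE_SHIFT))

/* Hypothetical per-gfn slot: non-zero means some SPTE maps this gfn. */
struct toy_rmap_head {
	uintptr_t val;
};

/*
 * Visit the slot of every gfn in the range.  There is no early exit and no
 * "this whole range is empty" shortcut, so the cost is always nr_gfns.
 * Returns the number of slots visited; *nr_mapped counts non-empty slots.
 */
static size_t toy_walk_rmaps(const struct toy_rmap_head *rmaps,
			     size_t nr_gfns, size_t *nr_mapped)
{
	size_t visited = 0;

	*nr_mapped = 0;
	for (size_t i = 0; i < nr_gfns; i++) {
		visited++;
		if (rmaps[i].val)
			(*nr_mapped)++;
	}
	return visited;
}

/* Cost of one notification spanning a PUD-sized range with nothing mapped. */
static size_t toy_empty_pud_cost(void)
{
	/* Static storage is zero-initialized, i.e. no gfn is mapped. */
	static struct toy_rmap_head rmaps[TOY_GFNS_PER_PUD];
	size_t nr_mapped;

	return toy_walk_rmaps(rmaps, TOY_GFNS_PER_PUD, &nr_mapped);
}
```

In this model a single PUD-sized invalidation costs 1GiB / 4KiB = 262144 slot
checks regardless of how sparsely the range is populated, whereas a page table
walk can skip an empty PUD with a single check.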