From: James Houghton
Date: Mon, 8 Jul 2024 09:50:51 -0700
Subject: Re: [PATCH v5 4/9] mm: Add test_clear_young_fast_only MMU notifier
To: Sean Christopherson
Cc: Yu Zhao, Andrew Morton, Paolo Bonzini, Ankit Agrawal, Axel Rasmussen,
    Catalin Marinas, David Matlack, David Rientjes, James Morse,
    Jonathan Corbet, Marc Zyngier, Oliver Upton, Raghavendra Rao Ananta,
    Ryan Roberts, Shaoqin Huang, Suzuki K Poulose, Wei Xu, Will Deacon,
    Zenghui Yu, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, Jun 28, 2024 at 7:38 PM James Houghton wrote:
>
> On Mon, Jun 17, 2024 at 11:37 AM Sean Christopherson wrote:
> >
> > On Mon, Jun 17, 2024, James Houghton wrote:
> > > On Fri, Jun 14, 2024 at 4:17 PM Sean Christopherson wrote:
> > > > Ooh! Actually, after fiddling a bit to see how feasible fast-aging in the shadow
> > > > MMU would be, I'm pretty sure we can do straight there for nested TDP.
> > > > Or rather, I suspect/hope we can get close enough for an initial merge, which would allow
> > > > aging_is_fast to be a property of the mmu_notifier, i.e. would simplify things
> > > > because KVM wouldn't need to communicate MMU_NOTIFY_WAS_FAST for each notification.
> > > >
> > > > Walking KVM's rmaps requires mmu_lock because adding/removing rmap entries is done
> > > > in such a way that a lockless walk would be painfully complex. But if there is
> > > > exactly _one_ rmap entry for a gfn, then slot->arch.rmap[...] points directly at
> > > > that one SPTE. And with nested TDP, unless L1 is doing something uncommon, e.g.
> > > > mapping the same page into multiple L2s, that overwhelming vast majority of rmaps
> > > > have only one entry. That's not the case for legacy shadow paging because kernels
> > > > almost always map a pfn using multiple virtual addresses, e.g. Linux's direct map
> > > > along with any userspace mappings.
>
> Hi Sean, sorry for taking so long to get back to you.
>
> So just to make sure I have this right: if L1 is using TDP, the gfns
> in L0 will usually only be mapped by a single spte. If L1 is not using
> TDP, then all bets are off. Is that true?
>
> If that is true, given that we don't really have control over whether
> or not L1 decides to use TDP, the lockless shadow MMU walk will work,
> but, if L1 is not using TDP, it will often return false negatives
> (says "old" for an actually-young gfn). So then I don't really
> understand conditioning the lockless shadow MMU walk on us (L0) using
> the TDP MMU[1]. We care about L1, right?

Ok I think I understand now. If L1 is using shadow paging, L2 is
accessing memory the same way L1 would, so we use the TDP MMU at L0
for this case (if tdp_mmu_enabled). If L1 is using TDP, then we must
use the shadow MMU, so that's the interesting case.
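To make the single-entry rmap case above concrete, here is a minimal
userspace sketch in C. It is a simplified model, not KVM code:
struct rmap_head, RMAP_MANY, SPTE_ACCESSED and both helper names are
assumptions made up for illustration, the __atomic builtins assume
GCC or Clang, and a real lockless walk would also need something like
RCU to keep the SPTE's page table page alive while it is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; these are not KVM's actual definitions. */
#define RMAP_MANY     1UL          /* low bit set: value refers to a list of SPTEs */
#define SPTE_ACCESSED (1ULL << 5)  /* assumed accessed-bit position, illustration only */

struct rmap_head {
	uintptr_t val;  /* 0 = empty, aligned pointer = one SPTE, low bit set = many */
};

/* Record a pointer to the only SPTE that maps this gfn. */
static void rmap_set_single(struct rmap_head *head, uint64_t *sptep)
{
	/* SPTE pointers are 8-byte aligned, so the low bit is free to use as a tag. */
	assert(((uintptr_t)sptep & RMAP_MANY) == 0);
	__atomic_store_n(&head->val, (uintptr_t)sptep, __ATOMIC_RELEASE);
}

/*
 * Lockless "is this gfn young?" check that handles only the empty and
 * single-entry cases. Returns true and fills *young when it could answer;
 * returns false when the rmap has multiple entries and the caller has to
 * fall back to a walk under the lock.
 */
static bool rmap_test_young_fast(struct rmap_head *head, bool *young)
{
	uintptr_t val = __atomic_load_n(&head->val, __ATOMIC_ACQUIRE);

	if (val & RMAP_MANY)
		return false;      /* multiple mappings: not handled locklessly */

	if (!val) {
		*young = false;    /* no mappings at all, so nothing was accessed */
		return true;
	}

	/* Exactly one mapping: val points directly at that SPTE. */
	uint64_t spte = __atomic_load_n((uint64_t *)val, __ATOMIC_RELAXED);
	*young = (spte & SPTE_ACCESSED) != 0;
	return true;
}
```

A caller would try rmap_test_young_fast() first and take the lock only
when it returns false, which is the "multiple mappings" case discussed
above.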
> (Maybe you're saying that, when the TDP MMU is enabled, the only cases
> where the shadow MMU is used are cases where gfns are practically
> always mapped by a single shadow PTE. This isn't how I understood your
> mail, but this is what your hack-a-patch[1] makes me think.)

So it appears that this interpretation is actually what you meant.

>
> [1] https://lore.kernel.org/linux-mm/ZmzPoW7K5GIitQ8B@google.com/
>
> >
> > ...
> >
> > > Hmm, interesting. I need to spend a little bit more time digesting this.
> > >
> > > Would you like to see this included in v6? (It'd be nice to avoid the
> > > WAS_FAST stuff....) Should we leave it for a later series? I haven't
> > > formed my own opinion yet.
> >
> > I would say it depends on the viability and complexity of my idea. E.g. if it
> > pans out more or less like my rough sketch, then it's probably worth taking on
> > the extra code+complexity in KVM to avoid the whole WAS_FAST goo.
> >
> > Note, if we do go this route, the implementation would need to be tweaked to
> > handle the difference in behavior between aging and last-minute checks for eviction,
> > which I obviously didn't understand when I threw together that hack-a-patch.
> >
> > I need to think more about how best to handle that though, e.g. skipping GFNs with
> > multiple mappings is probably the worst possible behavior, as we'd risk evicting
> > hot pages. But falling back to taking mmu_lock for write isn't all that desirable
> > either.
>
> I think falling back to the write lock is more desirable than evicting
> a young page.
>
> I've attached what I think could work, a diff on top of this series.
> It builds at least. It uses rcu_read_lock/unlock() for
> walk_shadow_page_lockless_begin/end(NULL), and it puts a
> synchronize_rcu() in kvm_mmu_commit_zap_page().
>
> It doesn't get rid of the WAS_FAST things because it doesn't do
> exactly what [1] does.
> It basically makes three calls now: lockless
> TDP MMU, lockless shadow MMU, locked shadow MMU. It only calls the
> locked shadow MMU bits if the lockless bits say !young (instead of
> being conditioned on tdp_mmu_enabled). My choice is definitely
> questionable for the clear path.

I still don't think we should get rid of the WAS_FAST stuff.

The assumption that the L1 VM will almost never share pages between L2
VMs is questionable. The real question becomes: do we care to have
accurate age information for this case? I think so.

It's not completely trivial to get the lockless walking of the shadow
MMU rmaps correct either (please see the patch I attached here[1]).
And the WAS_FAST functionality isn't even that complex to begin with.

Thanks for your patience.

[1]: https://lore.kernel.org/linux-mm/CADrL8HW=kCLoWBwoiSOCd8WHFvBdWaguZ2ureo4eFy9D67+owg@mail.gmail.com/
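
For readers following along, the three-call ordering described above
(lockless TDP MMU, lockless shadow MMU, then locked shadow MMU only if
both lockless passes say !young) can be sketched roughly as below. This
is a standalone illustration, not the attached diff: struct gfn_state,
struct age_result and every function name are hypothetical stand-ins,
and the way was_fast is reported is my assumption about how a caller
might use it.

```c
#include <stdbool.h>

/* Hypothetical per-gfn state standing in for real page table contents. */
struct gfn_state {
	bool tdp_young;      /* accessed bit set in a TDP MMU mapping */
	bool shadow_young;   /* accessed bit set in some shadow MMU mapping */
	bool shadow_single;  /* shadow rmap has exactly one entry */
};

/* Hypothetical result type; was_fast models the WAS_FAST idea. */
struct age_result {
	bool young;     /* some mapping was recently accessed */
	bool was_fast;  /* the answer came from lockless paths only */
};

/* Stand-ins for the three walkers; real ones would walk page tables. */
static bool tdp_mmu_test_young_lockless(const struct gfn_state *s)
{
	return s->tdp_young;
}

static bool shadow_test_young_lockless(const struct gfn_state *s)
{
	/* The lockless shadow walk can only see single-entry rmaps. */
	return s->shadow_single && s->shadow_young;
}

static bool shadow_test_young_locked(const struct gfn_state *s)
{
	/* Under the lock, every rmap entry can be inspected. */
	return s->shadow_young;
}

static struct age_result test_young(const struct gfn_state *s)
{
	struct age_result r = { .young = false, .was_fast = true };

	/* Calls 1 and 2: the lockless paths. A "young" answer is final. */
	if (tdp_mmu_test_young_lockless(s) || shadow_test_young_lockless(s)) {
		r.young = true;
		return r;
	}

	/*
	 * Call 3: the lockless paths said !young, which may be a false
	 * negative for gfns with multiple shadow mappings, so confirm
	 * under the lock before reporting the page as old.
	 */
	r.was_fast = false;
	r.young = shadow_test_young_locked(s);
	return r;
}
```

The point of the ordering is that a false "old" is the expensive
mistake (it risks evicting a hot page), so only a !young answer pays
for the locked walk.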