Date: Tue, 11 Apr 2023 18:44:10 -0700 (PDT)
From: David Rientjes
To: James Houghton, Tom Lendacky, "Roth, Michael", "Kalra, Ashish"
Cc: Mike Kravetz, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, Peter Xu
Subject: Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs
On Tue, 14 Mar 2023, James Houghton wrote:

> On Mon, Mar 6, 2023 at 11:19 AM Mike Kravetz wrote:
> >
> > This is past the deadline, so feel free to ignore. However, ...
> >
> > James Houghton has been working on the concept of HugeTLB High Granularity
> > Mapping (HGM) as discussed here:
> > https://lore.kernel.org/linux-mm/20230218002819.1486479-1-jthoughton@google.com/
> >
> > The primary motivation for this work is post-copy live migration of VMs backed
> > by hugetlb pages via userfaultfd. A followup use case is more gracefully
> > handling memory errors/poison on hugetlb pages.
> >
> > As can be seen by the size of James's patch set, the required changes for
> > HGM are a bit complex and involved. This is also complicated by the need
> > to choose a 'mapcount strategy', as the previous scheme used by hugetlb
> > will no longer work.
> >
> > A HGM for hugetlbfs session would present the current approach and challenges.
> > While much of the work is confined to hugetlb, there is a bit of spill-over to
> > other mm areas: specifically page table walking. A discussion on ways to
> > move forward with this effort would be appreciated.
>
> Thanks for proposing this, Mike.
>
> To hopefully get more interest in this topic, I want to lay out the
> reasons that Google uses HugeTLB for VMs today. They are:
> - Guaranteed availability of hugepages
> - Guaranteed NUMA alignment
> - Availability of 1G pages
> - HugeTLB vmemmap optimization to save page struct overhead
>
> Until generic mm supports all this, HugeTLB will remain a very
> important piece of Linux for us. :)
>
> The main limitation of HugeTLB that I care about is that it can only
> map an entire hugepage at once; it can never partially map a hugepage
> (like, there is no such thing as a PTE-mapped HugeTLB page). As Mike
> said, this makes the following applications impossible:
> 1. With userfaultfd-based live migration, being able to fetch and
>    install memory at PAGE_SIZE.
> 2. Memory poison at PAGE_SIZE.
>
> HugeTLB high-granularity mapping (HGM) is an effort to make #1 and #2
> possible with HugeTLB.
>
> #1 and #2 are already possible with generic mm, so this also raises the
> question: Can we merge HugeTLB with generic mm? This would certainly
> be much more work than HGM, but it removes all those pesky HugeTLB
> special cases (though, we still want all those features that HugeTLB
> has).
>
> Coming up with a plan to merge HugeTLB with generic mm would be
> challenging, and LSFMM might be a good place to have such a
> discussion. Not all of HugeTLB would need to be merged. I think some
> of the main special cases that should be removed are:
> 1. hugetlb_fault (fault/GUP special case)
> 2. page_vma_mapped_walk's special case
> 3. hugetlb_entry in pagewalk
> 4. HugeTLB's rmap/mapcount special cases (already working on this!)
>
> As part of this merge/unification, architectures would need to merge
> their hugetlb implementations with their generic mm implementations
> (for example, moving any special logic from set_huge_pte_at to
> set_pte_at).
>
> These are just some initial thoughts; I'm sure many of you have your
> own ideas for this.
>
> A discussion about HGM might serve as a jumping-off point for ideas
> for how to enhance the generic mm implementation to make the
> unification possible.
>

I'd definitely be interested in joining this discussion, specifically for
the live migration and memory poisoning use cases.

Adding some folks at AMD as well, as this may be useful for SEV-SNP host
support.