Date: Mon, 03 Feb 2025 08:35:12 +0000
In-Reply-To: <83d44307f30ad8ce19de3edcdc00c179750e0e23.camel@infradead.org>
	(message from Amit Shah on Tue, 28 Jan 2025 10:42:57 +0100)
Subject: Re: [RFC PATCH 00/39] 1G page support for guest_memfd
From: Ackerley Tng
To: Amit Shah
Cc: tabba@google.com, quic_eberman@quicinc.com, roypat@amazon.co.uk,
	jgg@nvidia.com, peterx@redhat.com, david@redhat.com,
	rientjes@google.com, fvdl@google.com, jthoughton@google.com,
	seanjc@google.com, pbonzini@redhat.com, zhiquan1.li@intel.com,
	fan.du@intel.com, jun.miao@intel.com, isaku.yamahata@intel.com,
	muchun.song@linux.dev, mike.kravetz@oracle.com, erdemaktas@google.com,
	vannapurve@google.com, qperret@google.com, jhubbard@nvidia.com,
	willy@infradead.org, shuah@kernel.org, brauner@kernel.org,
	bfoster@redhat.com, kent.overstreet@linux.dev, pvorel@suse.cz,
	rppt@kernel.org, richard.weiyang@gmail.com, anup@brainfault.org,
	haibo1.xu@intel.com, ajones@ventanamicro.com, vkuznets@redhat.com,
	maciej.wieczor-retman@intel.com, pgonda@google.com,
	oliver.upton@linux.dev, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, kvm@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-fsdevel@kvack.org

Amit Shah writes:

> Hey Ackerley,

Hi Amit,

> On Tue, 2024-09-10 at 23:43 +0000, Ackerley Tng wrote:
>> Hello,
>>
>> This patchset is our exploration of how to support 1G pages in
>> guest_memfd, and
>> how the pages will be used in Confidential VMs.
>
> We've discussed this patchset at LPC and in the guest-memfd calls. Can
> you please summarise the discussions here as a follow-up, so we can
> also continue discussing on-list, and not repeat things that are
> already discussed?

Thanks for this question! Since LPC, Vishal and I have been tied up with
some Google internal work, which slowed down progress on 1G page support
for guest_memfd. We expect to make progress on it this quarter and over
the next few quarters.

The related updates are:

1. No objections to using hugetlb as the source of 1G pages.

2. Prerequisite hugetlb changes.

   + I've separated some of the prerequisite hugetlb changes into another
     patch series, hoping to have them merged ahead of and separately from
     this patchset [1].
   + Peter Xu contributed a better patchset, including a bugfix [2].
   + I have an alternative [3].
   + The next revision of this series (1G page support for guest_memfd)
     will be based on alternative [3]. I think there should be no issues
     there.
   + I believe Peter is also waiting on the next revision before we make
     further progress/decide on [2] or [3].

3. No objections to allowing mmap()-ing of guest_memfd physical memory
   when memory is marked shared, to avoid double-allocation.

4. No objections to splitting pages when marked shared.

5. folio_put() callback for guest_memfd folio cleanup/merging.

   + In Fuad's series [4], Fuad used the callback to reset the folio's
     mappability status.
   + The catch is that the callback is only invoked when
     folio->page_type == PGTY_guest_memfd, and folio->page_type is in a
     union with the folio's mapcount, so any folio with a non-zero
     mapcount cannot have a valid page_type.
   + I was concerned that we might not get a callback, and hence
     unintentionally skip merging pages and fail to correctly restore
     hugetlb pages.
   + This was discussed at the last guest_memfd upstream call (2025-01-23
     07:58 PST), and the conclusion is that using folio->page_type works,
     because

     + We only merge folios in two cases: (1) when converting to private
       and (2) when truncating folios (removing them from the filemap).
     + When converting to private, in (1), we can forcibly unmap all the
       converted pages or check if the mapcount is 0, and once the
       mapcount is 0 we can install the callback by setting
       folio->page_type = PGTY_guest_memfd.
     + When truncating, in (2), we will be unmapping the folios anyway, so
       the mapcount is also 0 and we can install the callback.

Hope that covers the points that you're referring to. If there are other
parts that you'd like to know the status of, please let me know which
aspects those are!
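
To make the constraint in point 5 concrete, here is a minimal toy sketch in
Python. It is not kernel code and is not taken from any of the series
referenced here: ToyFolio, install_callback(), put() and
convert_to_private() are hypothetical names, and the string tag only stands
in for the kernel's page_type encoding. The one thing it tries to show is
that the tag can only be installed once the mapcount has dropped to 0, and
that the final put only triggers the cleanup/merge when the tag is present.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in tag; the real kernel encodes page_type differently (assumption).
PGTY_GUEST_MEMFD = "PGTY_guest_memfd"


@dataclass
class ToyFolio:
    """Toy folio: mapcount and page_type share one slot, so only one is valid."""
    mapcount: int = 0
    page_type: Optional[str] = None
    merged: bool = False

    def map(self) -> None:
        # A typed folio cannot be mapped: the slot already holds the tag.
        if self.page_type is not None:
            raise RuntimeError("slot holds a page_type, cannot hold a mapcount")
        self.mapcount += 1

    def unmap(self) -> None:
        if self.mapcount == 0:
            raise RuntimeError("folio is not mapped")
        self.mapcount -= 1

    def install_callback(self) -> bool:
        # The tag can only be written once nothing maps the folio.
        if self.mapcount != 0:
            return False
        self.page_type = PGTY_GUEST_MEMFD
        return True

    def put(self) -> bool:
        # Final put: the cleanup/merge callback runs only if the tag is set.
        if self.page_type == PGTY_GUEST_MEMFD:
            self.merged = True
        return self.merged


def convert_to_private(folio: ToyFolio) -> bool:
    """Case (1): forcibly unmap, then install the callback."""
    while folio.mapcount > 0:
        folio.unmap()
    return folio.install_callback()


if __name__ == "__main__":
    folio = ToyFolio()
    folio.map()
    print("install while mapped:", folio.install_callback())
    print("install after conversion:", convert_to_private(folio))
    print("callback ran on put:", folio.put())
```

In this toy, a folio that is still mapped at the final put never gets
merged, which is the failure mode the concern in point 5 was about; both
merge cases avoid it because they reach mapcount 0 first.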

> Also - as mentioned in those meetings, we at AMD are interested in this
> series along with SEV-SNP support - and I'm also interested in figuring
> out how we collaborate on the evolution of this series.

Thanks for all your help and comments during the guest_memfd upstream
calls, and thanks for the help from AMD.

Extending Fuad's mmap() support with 1G page support introduces more
states, which makes it more complicated (at least for me).

I'm modeling the states in Python so I can iterate more quickly. I also
have usage flows (e.g. allocate, guest_use, host_use, transient_folio_get,
close, transient_folio_put) as test cases.

I'm almost done with the model, and my next steps are to write up a state
machine (like Fuad's [5]) and share that.

I'd be happy to share the Python model too, but I have to work through
some internal open-sourcing processes first, so if you think this will be
useful, let me know!

Then, I'll code it all up in a new revision of this series (target: March
2025), which will be accompanied by source code on GitHub.

I'm happy to collaborate more closely; let me know if you have ideas for
collaboration!

> Thanks,
>
> Amit

[1] https://lore.kernel.org/all/cover.1728684491.git.ackerleytng@google.com/T/
[2] https://lore.kernel.org/all/20250107204002.2683356-1-peterx@redhat.com/T/
[3] https://lore.kernel.org/all/diqzjzayz5ho.fsf@ackerleytng-ctop.c.googlers.com/
[4] https://lore.kernel.org/all/20250117163001.2326672-1-tabba@google.com/T/
[5] https://lpc.events/event/18/contributions/1758/attachments/1457/3699/Guestmemfd%20folio%20state%20page_type.pdf