From: Vishal Annapurve
Date: Sun, 9 Mar 2025 11:52:05 -0700
Subject: Re: [PATCH v6 0/5] Add NUMA mempolicy support for KVM guest-memfd
To: Shivank Garg
Cc: akpm@linux-foundation.org, willy@infradead.org, pbonzini@redhat.com,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 linux-coco@lists.linux.dev, chao.gao@intel.com, seanjc@google.com,
 ackerleytng@google.com, david@redhat.com, vbabka@suse.cz, bharata@amd.com,
 nikunj@amd.com, michael.day@amd.com, Neeraj.Upadhyay@amd.com,
 thomas.lendacky@amd.com, michael.roth@amd.com, tabba@google.com
References: <20250226082549.6034-1-shivankg@amd.com>

On Sat, Mar 8, 2025 at 5:09 PM Vishal Annapurve wrote:
>
> On Wed, Feb 26, 2025 at 12:28 AM Shivank Garg wrote:
> >
> > In this patch-series:
> > Based on the discussion in the bi-weekly guest_memfd upstream call on
> > 2025-02-20[4], I have dropped the RFC tag, documented the memory allocation
> > behavior after policy changes and added selftests.
> >
> >
> > KVM's guest-memfd memory backend currently lacks support for NUMA policy
> > enforcement, causing guest memory allocations to be distributed arbitrarily
> > across host NUMA nodes regardless of the policy specified by the VMM. This
> > occurs because conventional userspace NUMA control mechanisms like mbind()
> > are ineffective with guest-memfd, as the memory isn't directly mapped to
> > userspace when allocations occur.
> >
> > This patch-series adds NUMA binding capabilities to guest_memfd backend
> > KVM guests. It has evolved through several approaches based on community
> > feedback:
> >
> > - v1,v2: Extended the KVM_CREATE_GUEST_MEMFD IOCTL to pass mempolicy.
> > - v3: Introduced fbind() syscall for VMM memory-placement configuration.
> > - v4-v6: Current approach using shared_policy support and vm_ops (based on
> >   suggestions from David[1] and guest_memfd biweekly upstream call[2]).
> >
> > For SEV-SNP guests, which use the guest-memfd memory backend, NUMA-aware
> > memory placement is essential for optimal performance, particularly for
> > memory-intensive workloads.
> >
> > This series implements proper NUMA policy support for guest-memfd by:
> >
> > 1. Adding mempolicy-aware allocation APIs to the filemap layer.
>
> I have been thinking more about this after the last guest_memfd
> upstream call on March 6th.
>
> To allow 1G page support with guest_memfd [1] without encountering
> significant memory overheads, it's important to support in-place memory
> conversion with private hugepages getting split/merged upon
> conversion. Private pages can be seamlessly split/merged only if the
> refcounts of complete subpages are frozen; the most effective way to
> achieve and enforce this is to just not have struct pages for private
> memory.
> All the guest_memfd private range users (including IOMMU [2]
> in future) can request pfns for offsets and get notified about
> invalidation when pfns go away.
>
> Not having struct pages for private memory also provides additional benefits:
> * Significantly lower memory overhead for handling split/merge operations
>     - With struct pages around, every split of a 1G page needs struct
>       page allocation for 512 * 512 4K pages in the worst case.
> * Enables a roadmap for PFN range allocators in the backend and usecases
>   like KHO [3] that target use of memory without struct page.
>
> IIRC, filemap was initially used as a matter of convenience for the
> initial guest_memfd implementation.
>
> As pointed out by David in the call, to get rid of struct page for private
> memory ranges, filemap/pagecache needs to be replaced by a lightweight
> mechanism that tracks offsets -> pfns mapping for private memory
> ranges while still keeping filemap/pagecache for shared memory ranges
> (it's still needed to allow GUP usecases). I am starting to think that

Going one step further: if we support folio->mapping, and possibly any
other needed bits, while still tracking folios corresponding to shared
memory ranges along with private memory pfns in a separate
"gmem_cache" to keep core-mm interaction compatible, can that allow
pursuing the direction of not needing filemap at all?

> the filemap replacement for private memory ranges should be done
> sooner rather than later, otherwise it will become more and more
> difficult with features landing in guest_memfd relying on presence of
> filemap.
>
> This discussion matters more for hugepages and PFN range allocations.
> I would like to ensure that we have consensus on this direction.
>
> [1] https://lpc.events/event/18/contributions/1764/
> [2] https://lore.kernel.org/kvm/CAGtprH8C4MQwVTFPBMbFWyW4BrK8-mDqjJn-UUFbFhw4w23f3A@mail.gmail.com/
> [3] https://lore.kernel.org/linux-mm/20240805093245.889357-1-jgowans@amazon.com/
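
To make the quoted idea a bit more concrete (private-range users request
pfns for offsets and get notified when those pfns go away), here is a toy
userspace model. It is only a sketch: PrivateRangeMap and its methods are
hypothetical names, not kernel APIs, the per-page dict stands in for
whatever lightweight structure would really be used, and the 64-byte
struct page size is an assumption (typical for x86-64).

```python
# Toy userspace model; every name here is hypothetical, not a kernel API.
PAGES_PER_1G = 512 * 512      # 4K pages in one 1G page (worst-case split)
STRUCT_PAGE_BYTES = 64        # assumption: typical sizeof(struct page) on x86-64


class PrivateRangeMap:
    """Tracks offset -> pfn for private ranges, keyed by 4K page index."""

    def __init__(self):
        self._pfns = {}        # page offset (index) -> pfn
        self._listeners = []   # callbacks invoked on invalidation

    def register(self, callback):
        """Register a user (e.g. KVM or an IOMMU driver) for invalidations."""
        self._listeners.append(callback)

    def populate(self, offset, pfn, npages):
        """Record that npages starting at offset are backed by pfn onwards."""
        for i in range(npages):
            self._pfns[offset + i] = pfn + i

    def get_pfn(self, offset):
        """Return the pfn backing offset, or None if nothing is mapped."""
        return self._pfns.get(offset)

    def invalidate(self, offset, npages):
        """Notify every registered user first, then drop the mapping."""
        for callback in self._listeners:
            callback(offset, npages)
        for i in range(npages):
            self._pfns.pop(offset + i, None)


def worst_case_struct_page_bytes():
    """Struct page memory needed to fully split one 1G page into 4K pages."""
    return PAGES_PER_1G * STRUCT_PAGE_BYTES
```

Under the 64-byte assumption, worst_case_struct_page_bytes() comes to
16 MiB per fully split 1G page, which is the overhead the struct-page-less
direction would avoid.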