From: Vishal Annapurve
Date: Sat, 8 Mar 2025 17:09:46 -0800
Subject: Re: [PATCH v6 0/5] Add NUMA mempolicy support for KVM guest-memfd
To: Shivank Garg
Cc: akpm@linux-foundation.org, willy@infradead.org, pbonzini@redhat.com,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    linux-coco@lists.linux.dev, chao.gao@intel.com, seanjc@google.com,
    ackerleytng@google.com, david@redhat.com, vbabka@suse.cz,
    bharata@amd.com, nikunj@amd.com, michael.day@amd.com,
    Neeraj.Upadhyay@amd.com, thomas.lendacky@amd.com, michael.roth@amd.com,
    tabba@google.com
In-Reply-To: <20250226082549.6034-1-shivankg@amd.com>

On Wed, Feb 26, 2025 at 12:28 AM Shivank Garg wrote:
>
> In this patch-series:
> Based on the discussion in the bi-weekly guest_memfd upstream call on
> 2025-02-20[4], I have dropped the RFC tag, documented the memory allocation
> behavior after policy changes and added selftests.
>
>
> KVM's guest-memfd memory backend currently lacks support for NUMA policy
> enforcement, causing guest memory allocations to be distributed arbitrarily
> across host NUMA nodes regardless of the policy specified by the VMM. This
> occurs because conventional userspace NUMA control mechanisms like mbind()
> are ineffective with guest-memfd, as the memory isn't directly mapped to
> userspace when allocations occur.
>
> This patch-series adds NUMA binding capabilities to guest_memfd backend
> KVM guests. It has evolved through several approaches based on community
> feedback:
>
> - v1,v2: Extended the KVM_CREATE_GUEST_MEMFD IOCTL to pass mempolicy.
> - v3: Introduced fbind() syscall for VMM memory-placement configuration.
> - v4-v6: Current approach using shared_policy support and vm_ops (based on
>   suggestions from David[1] and guest_memfd biweekly upstream call[2]).
>
> For SEV-SNP guests, which use the guest-memfd memory backend, NUMA-aware
> memory placement is essential for optimal performance, particularly for
> memory-intensive workloads.
>
> This series implements proper NUMA policy support for guest-memfd by:
>
> 1. Adding mempolicy-aware allocation APIs to the filemap layer.

I have been thinking more about this after the last guest_memfd
upstream call on March 6th.

To allow 1G page support with guest_memfd [1] without incurring
significant memory overhead, it's important to support in-place
memory conversion, with private hugepages getting split/merged upon
conversion. Private pages can be seamlessly split/merged only if the
refcounts of all the subpages are frozen. The most effective way to
achieve and enforce this is to not have struct pages for private
memory at all. All the guest_memfd private range users (including
IOMMU [2] in the future) can request pfns for offsets and get notified
about invalidation when pfns go away.
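
To make the "request pfns for offsets and get notified about
invalidation" idea concrete, here is a minimal userspace sketch. It is
only an illustration under my own assumptions: every name in it
(gmem_sketch_*) is hypothetical and not an existing guest_memfd or
kernel API, the table is a toy fixed-size array rather than whatever
structure a real implementation would pick, and there is no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is guest_memfd or kernel API. */

#define GMEM_SKETCH_SLOTS  16          /* toy capacity, indexed by page offset */
#define GMEM_SKETCH_NO_PFN UINT64_MAX  /* sentinel: no pfn backs this offset */

/* Callback that a private-range user (e.g. KVM or an IOMMU driver) registers. */
typedef void (*gmem_sketch_invalidate_fn)(uint64_t start, uint64_t end,
                                          void *data);

struct gmem_sketch_map {
    uint64_t pfn[GMEM_SKETCH_SLOTS];   /* offset -> pfn, no struct page involved */
    gmem_sketch_invalidate_fn notify;  /* called before pfns go away */
    void *notify_data;
};

void gmem_sketch_init(struct gmem_sketch_map *m,
                      gmem_sketch_invalidate_fn fn, void *data)
{
    assert(m != NULL);
    for (size_t i = 0; i < GMEM_SKETCH_SLOTS; i++)
        m->pfn[i] = GMEM_SKETCH_NO_PFN;
    m->notify = fn;
    m->notify_data = data;
}

/* Back offsets [start, start + nr) with a contiguous pfn range. */
int gmem_sketch_populate(struct gmem_sketch_map *m, uint64_t start,
                         uint64_t nr, uint64_t first_pfn)
{
    if (start > GMEM_SKETCH_SLOTS || nr > GMEM_SKETCH_SLOTS - start)
        return -1;                     /* range does not fit in the toy table */
    for (uint64_t i = 0; i < nr; i++)
        m->pfn[start + i] = first_pfn + i;
    return 0;
}

/* Users ask for the pfn behind an offset instead of taking a page reference. */
uint64_t gmem_sketch_get_pfn(const struct gmem_sketch_map *m, uint64_t offset)
{
    return offset < GMEM_SKETCH_SLOTS ? m->pfn[offset] : GMEM_SKETCH_NO_PFN;
}

/* Notify the registered user first, then drop the pfns for [start, end). */
void gmem_sketch_invalidate(struct gmem_sketch_map *m, uint64_t start,
                            uint64_t end)
{
    if (end > GMEM_SKETCH_SLOTS)
        end = GMEM_SKETCH_SLOTS;
    if (start >= end)
        return;
    if (m->notify)
        m->notify(start, end, m->notify_data);
    for (uint64_t i = start; i < end; i++)
        m->pfn[i] = GMEM_SKETCH_NO_PFN;
}

/* Example callback: counts invalidations in the unsigned int that data points to. */
void gmem_sketch_count_invalidations(uint64_t start, uint64_t end, void *data)
{
    (void)start;
    (void)end;
    *(unsigned int *)data += 1;
}
```

The point of the sketch is that users only ever hold pfns plus an
invalidation callback, so nothing in the lookup path depends on a
per-page refcount.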

Not having struct pages for private memory also provides additional benefits:
* Significantly lower memory overhead for handling split/merge operations
    - With struct pages around, every split of a 1G page needs struct page
      allocation for 512 * 512 4K pages in the worst case.
* Enables a roadmap for PFN range allocators in the backend and use cases
  like KHO [3] that target use of memory without struct page.

IIRC, filemap was initially used as a matter of convenience for the
initial guest_memfd implementation.

As pointed out by David in the call, to get rid of struct page for private
memory ranges, filemap/pagecache needs to be replaced by a lightweight
mechanism that tracks the offset -> pfn mapping for private memory ranges,
while still keeping filemap/pagecache for shared memory ranges (it's
still needed to allow GUP use cases). I am starting to think that the
filemap replacement for private memory ranges should be done sooner
rather than later; otherwise it will become more and more difficult
as features that rely on the presence of filemap land in guest_memfd.

This discussion matters more for hugepages and PFN range allocations.
I would like to ensure that we have consensus on this direction.

[1] https://lpc.events/event/18/contributions/1764/
[2] https://lore.kernel.org/kvm/CAGtprH8C4MQwVTFPBMbFWyW4BrK8-mDqjJn-UUFbFhw4w23f3A@mail.gmail.com/
[3] https://lore.kernel.org/linux-mm/20240805093245.889357-1-jgowans@amazon.com/