Date: Tue, 04 Mar 2025 00:19:01 +0000
In-Reply-To: (message from Vlastimil Babka on Mon, 3 Mar 2025 09:58:51 +0100)
Subject: Re: [PATCH v6 4/5] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
From: Ackerley Tng
To: Vlastimil Babka
Cc: shivankg@amd.com, akpm@linux-foundation.org, willy@infradead.org,
 pbonzini@redhat.com, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 linux-coco@lists.linux.dev, chao.gao@intel.com, seanjc@google.com,
 david@redhat.com, bharata@amd.com, nikunj@amd.com, michael.day@amd.com,
 Neeraj.Upadhyay@amd.com, thomas.lendacky@amd.com, michael.roth@amd.com,
 tabba@google.com

Vlastimil Babka writes:

> On 2/28/25 18:25, Ackerley Tng wrote:
>> Shivank Garg writes:
>>
>>> Previously, guest-memfd allocations followed local NUMA node id in absence
>>> of process mempolicy, resulting in arbitrary memory allocation.
>>> Moreover, mbind() couldn't be used since memory wasn't mapped to userspace
>>> in the VMM.
>>>
>>> Enable NUMA policy support by implementing vm_ops for guest-memfd mmap
>>> operation. This allows the VMM to map the memory and use mbind() to set
>>> the desired NUMA policy. The policy is then retrieved via
>>> mpol_shared_policy_lookup() and passed to filemap_grab_folio_mpol() to
>>> ensure that allocations follow the specified memory policy.
>>>
>>> This enables the VMM to control guest memory NUMA placement by calling
>>> mbind() on the mapped memory regions, providing fine-grained control over
>>> guest memory allocation across NUMA nodes.
>>>
>>> The policy change only affect future allocations and does not migrate
>>> existing memory. This matches mbind(2)'s default behavior which affects
>>> only new allocations unless overridden with MPOL_MF_MOVE/MPOL_MF_MOVE_ALL
>>> flags, which are not supported for guest_memfd as it is unmovable.
>>>
>>> Suggested-by: David Hildenbrand
>>> Signed-off-by: Shivank Garg
>>> ---
>>>  virt/kvm/guest_memfd.c | 76 +++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 75 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>> index f18176976ae3..b3a8819117a0 100644
>>> --- a/virt/kvm/guest_memfd.c
>>> +++ b/virt/kvm/guest_memfd.c
>>> @@ -2,6 +2,7 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include
>>>  #include
>>>  #include
>>>
>>> @@ -11,8 +12,12 @@ struct kvm_gmem {
>>>  	struct kvm *kvm;
>>>  	struct xarray bindings;
>>>  	struct list_head entry;
>>> +	struct shared_policy policy;
>>>  };
>>>
>>
>> struct shared_policy should be stored on the inode rather than the file,
>> since the memory policy is a property of the memory (struct inode),
>> rather than a property of how the memory is used for a given VM (struct
>> file).
>
> That makes sense. AFAICS shmem also uses inodes to store policy.
>
>> When the shared_policy is stored on the inode, intra-host migration [1]
>> will work correctly, since the while the inode will be transferred from
>> one VM (struct kvm) to another, the file (a VM's view/bindings of the
>> memory) will be recreated for the new VM.
>>
>> I'm thinking of having a patch like this [2] to introduce inodes.
>
> shmem has it easier by already having inodes
>
>> With this, we shouldn't need to pass file pointers instead of inode
>> pointers.
>
> Any downsides, besides more work needed? Or is it feasible to do it using
> files now and convert to inodes later?
>
> Feels like something that must have been discussed already, but I don't
> recall specifics.

Here's where Sean described file vs inode: "The inode is effectively the
raw underlying physical storage, while the file is the VM's view of that
storage." [1].

I guess you're right that for now there is little distinction between
file and inode and using file should be feasible, but I feel that this
dilutes the original intent.
Something like [2] doesn't seem like too big of a change and could
perhaps be included earlier rather than later, since it will also
contribute to support for restricted mapping [3].

[1] https://lore.kernel.org/all/ZLGiEfJZTyl7M8mS@google.com/
[2] https://lore.kernel.org/all/d1940d466fc69472c8b6dda95df2e0522b2d8744.1726009989.git.ackerleytng@google.com/
[3] https://lore.kernel.org/all/20250117163001.2326672-1-tabba@google.com/T/