Date: Fri, 28 Feb 2025 17:25:36 +0000
In-Reply-To: <20250226082549.6034-5-shivankg@amd.com> (message from Shivank Garg on Wed, 26 Feb 2025 08:25:48 +0000)
Subject: Re: [PATCH v6 4/5] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
From: Ackerley Tng
To: Shivank Garg
Cc: akpm@linux-foundation.org, willy@infradead.org, pbonzini@redhat.com, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, chao.gao@intel.com, seanjc@google.com, david@redhat.com, vbabka@suse.cz, bharata@amd.com, nikunj@amd.com, michael.day@amd.com, Neeraj.Upadhyay@amd.com, thomas.lendacky@amd.com, michael.roth@amd.com, shivankg@amd.com, tabba@google.com
Shivank Garg writes:

> Previously, guest-memfd allocations followed local NUMA node id in absence
> of process mempolicy, resulting in arbitrary memory allocation.
> Moreover, mbind() couldn't be used since memory wasn't mapped to userspace
> in the VMM.
>
> Enable NUMA policy support by implementing vm_ops for guest-memfd mmap
> operation. This allows the VMM to map the memory and use mbind() to set
> the desired NUMA policy. The policy is then retrieved via
> mpol_shared_policy_lookup() and passed to filemap_grab_folio_mpol() to
> ensure that allocations follow the specified memory policy.
>
> This enables the VMM to control guest memory NUMA placement by calling
> mbind() on the mapped memory regions, providing fine-grained control over
> guest memory allocation across NUMA nodes.
>
> The policy change only affects future allocations and does not migrate
> existing memory. This matches mbind(2)'s default behavior, which affects
> only new allocations unless overridden with MPOL_MF_MOVE/MPOL_MF_MOVE_ALL
> flags, which are not supported for guest_memfd as it is unmovable.
>
> Suggested-by: David Hildenbrand
> Signed-off-by: Shivank Garg
> ---
>  virt/kvm/guest_memfd.c | 76 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 75 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index f18176976ae3..b3a8819117a0 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -2,6 +2,7 @@
>  #include <linux/backing-dev.h>
>  #include <linux/falloc.h>
>  #include <linux/kvm_host.h>
> +#include <linux/mempolicy.h>
>  #include <linux/pagemap.h>
>  #include <linux/anon_inodes.h>
>
> @@ -11,8 +12,12 @@ struct kvm_gmem {
>  	struct kvm *kvm;
>  	struct xarray bindings;
>  	struct list_head entry;
> +	struct shared_policy policy;
>  };
>

struct shared_policy should be stored on the inode rather than the file,
since the memory policy is a property of the memory (struct inode), rather
than a property of how the memory is used for a given VM (struct file).
When the shared_policy is stored on the inode, intra-host migration [1]
will work correctly, since while the inode will be transferred from one VM
(struct kvm) to another, the file (a VM's view/bindings of the memory) will
be recreated for the new VM.

I'm thinking of having a patch like this [2] to introduce inodes. With
this, we can pass inode pointers instead of file pointers.

> +static struct mempolicy *kvm_gmem_get_pgoff_policy(struct kvm_gmem *gmem,
> +						   pgoff_t index);
> +
>  /**
>   * folio_file_pfn - like folio_file_page, but return a pfn.
>   * @folio: The folio which contains this index.
> @@ -99,7 +104,25 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
>  static struct folio *kvm_gmem_get_folio(struct file *file, pgoff_t index)
>  {
>  	/* TODO: Support huge pages. */
> -	return filemap_grab_folio(file_inode(file)->i_mapping, index);
> +	struct kvm_gmem *gmem = file->private_data;
> +	struct inode *inode = file_inode(file);
> +	struct mempolicy *policy;
> +	struct folio *folio;
> +
> +	/*
> +	 * Fast-path: See if folio is already present in mapping to avoid
> +	 * policy_lookup.
> +	 */
> +	folio = __filemap_get_folio(inode->i_mapping, index,
> +				    FGP_LOCK | FGP_ACCESSED, 0);
> +	if (!IS_ERR(folio))
> +		return folio;
> +
> +	policy = kvm_gmem_get_pgoff_policy(gmem, index);
> +	folio = filemap_grab_folio_mpol(inode->i_mapping, index, policy);
> +	mpol_cond_put(policy);
> +
> +	return folio;
>  }
>
>  static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
> @@ -291,6 +314,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
>  	mutex_unlock(&kvm->slots_lock);
>
>  	xa_destroy(&gmem->bindings);
> +	mpol_free_shared_policy(&gmem->policy);
>  	kfree(gmem);
>
>  	kvm_put_kvm(kvm);
> @@ -312,8 +336,57 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>  {
>  	return gfn - slot->base_gfn + slot->gmem.pgoff;
>  }
> +#ifdef CONFIG_NUMA
> +static int kvm_gmem_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
> +{
> +	struct file *file = vma->vm_file;
> +	struct kvm_gmem *gmem = file->private_data;
> +
> +	return mpol_set_shared_policy(&gmem->policy, vma, new);
> +}
> +
> +static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
> +					     unsigned long addr, pgoff_t *pgoff)
> +{
> +	struct file *file = vma->vm_file;
> +	struct kvm_gmem *gmem = file->private_data;
> +
> +	*pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
> +	return mpol_shared_policy_lookup(&gmem->policy, *pgoff);
> +}
> +
> +static struct mempolicy *kvm_gmem_get_pgoff_policy(struct kvm_gmem *gmem,
> +						   pgoff_t index)
> +{
> +	struct mempolicy *mpol;
> +
> +	mpol = mpol_shared_policy_lookup(&gmem->policy, index);
> +	return mpol ? mpol : get_task_policy(current);
> +}
> +#else
> +static struct mempolicy *kvm_gmem_get_pgoff_policy(struct kvm_gmem *gmem,
> +						   pgoff_t index)
> +{
> +	return NULL;
> +}
> +#endif /* CONFIG_NUMA */
> +
> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> +#ifdef CONFIG_NUMA
> +	.get_policy = kvm_gmem_get_policy,
> +	.set_policy = kvm_gmem_set_policy,
> +#endif
> +};
> +
> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	file_accessed(file);
> +	vma->vm_ops = &kvm_gmem_vm_ops;
> +	return 0;
> +}
>
>  static struct file_operations kvm_gmem_fops = {
> +	.mmap = kvm_gmem_mmap,
>  	.open = generic_file_open,
>  	.release = kvm_gmem_release,
>  	.fallocate = kvm_gmem_fallocate,
> @@ -446,6 +519,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>  	kvm_get_kvm(kvm);
>  	gmem->kvm = kvm;
>  	xa_init(&gmem->bindings);
> +	mpol_shared_policy_init(&gmem->policy, NULL);
>  	list_add(&gmem->entry, &inode->i_mapping->i_private_list);
>
>  	fd_install(fd, file);

[1] https://lore.kernel.org/lkml/cover.1691446946.git.ackerleytng@google.com/T/
[2] https://lore.kernel.org/all/d1940d466fc69472c8b6dda95df2e0522b2d8744.1726009989.git.ackerleytng@google.com/