* [RFC PATCH v5 1/4] mm/filemap: add mempolicy support to the filemap layer
From: Shivank Garg @ 2025-02-19 10:15 UTC
To: akpm, willy, pbonzini
Cc: linux-fsdevel, linux-mm, linux-kernel, kvm, linux-coco, chao.gao,
seanjc, ackerleytng, david, vbabka, bharata, nikunj, michael.day,
Neeraj.Upadhyay, thomas.lendacky, michael.roth, shivankg
From: Shivansh Dhiman <shivansh.dhiman@amd.com>
Add NUMA mempolicy support to the filemap allocation path by introducing
new APIs that take a mempolicy argument:
- filemap_grab_folio_mpol()
- filemap_alloc_folio_mpol()
- __filemap_get_folio_mpol()
These APIs allow callers to specify a NUMA policy during page cache
allocations, enabling fine-grained control over memory placement. This is
particularly needed by KVM when using guest-memfd memory backends, where
the guest memory needs to be allocated according to the NUMA policy
specified by the VMM.
The existing non-mempolicy APIs remain unchanged and continue to use the
default allocation behavior.
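As a rough illustration (not part of this patch), a caller holding a
mempolicy reference could use the new API as follows; inode, index and
policy are placeholders for the example:

    struct folio *folio;

    /*
     * Find or create the folio at @index using @policy. With a NULL
     * policy this behaves exactly like filemap_grab_folio().
     */
    folio = filemap_grab_folio_mpol(inode->i_mapping, index, policy);
    if (IS_ERR(folio))
            return PTR_ERR(folio);
    /* ... use the locked, referenced folio ... */
    folio_unlock(folio);
    folio_put(folio);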
Signed-off-by: Shivansh Dhiman <shivansh.dhiman@amd.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
include/linux/pagemap.h | 39 +++++++++++++++++++++++++++++++++++++++
mm/filemap.c | 30 +++++++++++++++++++++++++-----
2 files changed, 64 insertions(+), 5 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 47bfc6b1b632..f480b3b29113 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -662,15 +662,24 @@ static inline void *detach_page_private(struct page *page)
#ifdef CONFIG_NUMA
struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
+struct folio *filemap_alloc_folio_mpol_noprof(gfp_t gfp, unsigned int order,
+ struct mempolicy *mpol);
#else
static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
{
return folio_alloc_noprof(gfp, order);
}
+static inline struct folio *filemap_alloc_folio_mpol_noprof(gfp_t gfp,
+ unsigned int order, struct mempolicy *mpol)
+{
+ return filemap_alloc_folio_noprof(gfp, order);
+}
#endif
#define filemap_alloc_folio(...) \
alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
+#define filemap_alloc_folio_mpol(...) \
+ alloc_hooks(filemap_alloc_folio_mpol_noprof(__VA_ARGS__))
static inline struct page *__page_cache_alloc(gfp_t gfp)
{
@@ -762,6 +771,8 @@ static inline fgf_t fgf_set_order(size_t size)
void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
fgf_t fgp_flags, gfp_t gfp);
+struct folio *__filemap_get_folio_mpol(struct address_space *mapping,
+ pgoff_t index, fgf_t fgp_flags, gfp_t gfp, struct mempolicy *mpol);
struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
fgf_t fgp_flags, gfp_t gfp);
@@ -820,6 +831,34 @@ static inline struct folio *filemap_grab_folio(struct address_space *mapping,
mapping_gfp_mask(mapping));
}
+/**
+ * filemap_grab_folio_mpol - grab a folio from the page cache.
+ * @mapping: The address space to search.
+ * @index: The page index.
+ * @mpol: The mempolicy to apply when allocating a new folio.
+ *
+ * Same as filemap_grab_folio(), except that it allocates the folio using the
+ * given memory policy.
+ *
+ * Return: A found or created folio. ERR_PTR(-ENOMEM) if no folio was found
+ * and one could not be created.
+ */
+#ifdef CONFIG_NUMA
+static inline struct folio *filemap_grab_folio_mpol(struct address_space *mapping,
+ pgoff_t index, struct mempolicy *mpol)
+{
+ return __filemap_get_folio_mpol(mapping, index,
+ FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
+ mapping_gfp_mask(mapping), mpol);
+}
+#else
+static inline struct folio *filemap_grab_folio_mpol(struct address_space *mapping,
+ pgoff_t index, struct mempolicy *mpol)
+{
+ return filemap_grab_folio(mapping, index);
+}
+#endif /* CONFIG_NUMA */
+
/**
* find_get_page - find and get a page reference
* @mapping: the address_space to search
diff --git a/mm/filemap.c b/mm/filemap.c
index 804d7365680c..9abb20c4d705 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1001,11 +1001,17 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
EXPORT_SYMBOL_GPL(filemap_add_folio);
#ifdef CONFIG_NUMA
-struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
+struct folio *filemap_alloc_folio_mpol_noprof(gfp_t gfp, unsigned int order,
+ struct mempolicy *mpol)
{
int n;
struct folio *folio;
+ if (mpol)
+ return folio_alloc_mpol_noprof(gfp, order, mpol,
+ NO_INTERLEAVE_INDEX,
+ numa_node_id());
+
if (cpuset_do_page_mem_spread()) {
unsigned int cpuset_mems_cookie;
do {
@@ -1018,6 +1024,12 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
}
return folio_alloc_noprof(gfp, order);
}
+EXPORT_SYMBOL(filemap_alloc_folio_mpol_noprof);
+
+struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
+{
+ return filemap_alloc_folio_mpol_noprof(gfp, order, NULL);
+}
EXPORT_SYMBOL(filemap_alloc_folio_noprof);
#endif
@@ -1881,11 +1893,12 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
}
/**
- * __filemap_get_folio - Find and get a reference to a folio.
+ * __filemap_get_folio_mpol - Find and get a reference to a folio.
* @mapping: The address_space to search.
* @index: The page index.
* @fgp_flags: %FGP flags modify how the folio is returned.
* @gfp: Memory allocation flags to use if %FGP_CREAT is specified.
+ * @mpol: The mempolicy to apply when allocating a new folio.
*
* Looks up the page cache entry at @mapping & @index.
*
@@ -1896,8 +1909,8 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
*
* Return: The found folio or an ERR_PTR() otherwise.
*/
-struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
- fgf_t fgp_flags, gfp_t gfp)
+struct folio *__filemap_get_folio_mpol(struct address_space *mapping, pgoff_t index,
+ fgf_t fgp_flags, gfp_t gfp, struct mempolicy *mpol)
{
struct folio *folio;
@@ -1967,7 +1980,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
err = -ENOMEM;
if (order > min_order)
alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
- folio = filemap_alloc_folio(alloc_gfp, order);
+ folio = filemap_alloc_folio_mpol(alloc_gfp, order, mpol);
if (!folio)
continue;
@@ -2003,6 +2016,13 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
folio_clear_dropbehind(folio);
return folio;
}
+EXPORT_SYMBOL(__filemap_get_folio_mpol);
+
+struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
+ fgf_t fgp_flags, gfp_t gfp)
+{
+ return __filemap_get_folio_mpol(mapping, index, fgp_flags, gfp, NULL);
+}
EXPORT_SYMBOL(__filemap_get_folio);
static inline struct folio *find_get_entry(struct xa_state *xas, pgoff_t max,
--
2.34.1
* [RFC PATCH v5 2/4] mm/mempolicy: export memory policy symbols
From: Shivank Garg @ 2025-02-19 10:15 UTC
To: akpm, willy, pbonzini
Cc: linux-fsdevel, linux-mm, linux-kernel, kvm, linux-coco, chao.gao,
seanjc, ackerleytng, david, vbabka, bharata, nikunj, michael.day,
Neeraj.Upadhyay, thomas.lendacky, michael.roth, shivankg
KVM guest_memfd wants to implement support for NUMA policies just like
shmem already does, using the shared policy infrastructure. As
guest_memfd currently resides in KVM module code, we have to export the
relevant symbols.
In the future, guest_memfd might be moved to core-mm, at which point the
symbols no longer would have to be exported. When/if that happens is
still unclear.
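For context, a sketch of the shared-policy lifecycle that these exports
enable, mirroring what shmem does (the owning object and index are
hypothetical):

    struct shared_policy policy;
    struct mempolicy *mpol;

    mpol_shared_policy_init(&policy, NULL);            /* object creation */

    mpol = mpol_shared_policy_lookup(&policy, index);  /* per allocation */
    /* ... allocate a folio under mpol ... */
    mpol_cond_put(mpol);                    /* drops the ref via __mpol_put() */

    mpol_free_shared_policy(&policy);                  /* object teardown */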
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
mm/mempolicy.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bbaadbeeb291..d9c5dcdadcd0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -214,6 +214,7 @@ struct mempolicy *get_task_policy(struct task_struct *p)
return &default_policy;
}
+EXPORT_SYMBOL_GPL(get_task_policy);
static const struct mempolicy_operations {
int (*create)(struct mempolicy *pol, const nodemask_t *nodes);
@@ -347,6 +348,7 @@ void __mpol_put(struct mempolicy *pol)
return;
kmem_cache_free(policy_cache, pol);
}
+EXPORT_SYMBOL_GPL(__mpol_put);
static void mpol_rebind_default(struct mempolicy *pol, const nodemask_t *nodes)
{
@@ -2736,6 +2738,7 @@ struct mempolicy *mpol_shared_policy_lookup(struct shared_policy *sp,
read_unlock(&sp->lock);
return pol;
}
+EXPORT_SYMBOL_GPL(mpol_shared_policy_lookup);
static void sp_free(struct sp_node *n)
{
@@ -3021,6 +3024,7 @@ void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol)
mpol_put(mpol); /* drop our incoming ref on sb mpol */
}
}
+EXPORT_SYMBOL_GPL(mpol_shared_policy_init);
int mpol_set_shared_policy(struct shared_policy *sp,
struct vm_area_struct *vma, struct mempolicy *pol)
@@ -3039,6 +3043,7 @@ int mpol_set_shared_policy(struct shared_policy *sp,
sp_free(new);
return err;
}
+EXPORT_SYMBOL_GPL(mpol_set_shared_policy);
/* Free a backing policy store on inode delete. */
void mpol_free_shared_policy(struct shared_policy *sp)
@@ -3057,6 +3062,7 @@ void mpol_free_shared_policy(struct shared_policy *sp)
}
write_unlock(&sp->lock);
}
+EXPORT_SYMBOL_GPL(mpol_free_shared_policy);
#ifdef CONFIG_NUMA_BALANCING
static int __initdata numabalancing_override;
--
2.34.1
* [RFC PATCH v5 3/4] KVM: guest_memfd: Pass file pointer instead of inode pointer
From: Shivank Garg @ 2025-02-19 10:15 UTC
To: akpm, willy, pbonzini
Cc: linux-fsdevel, linux-mm, linux-kernel, kvm, linux-coco, chao.gao,
seanjc, ackerleytng, david, vbabka, bharata, nikunj, michael.day,
Neeraj.Upadhyay, thomas.lendacky, michael.roth, shivankg
Pass a file pointer instead of an inode pointer to access the struct kvm_gmem
stored in file->private_data. This change is needed to access the NUMA policy
when allocating memory for guest_memfd, which will be added in a following
patch.
The following functions are modified to use file pointers:
- kvm_gmem_get_folio()
- kvm_gmem_allocate()
This is a preparatory patch; there are no functional changes.
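In sketch form, the pattern this enables (and that the next patch relies
on), with both the private data and the inode reachable from the file:

    struct kvm_gmem *gmem = file->private_data;
    struct inode *inode = file_inode(file);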
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
virt/kvm/guest_memfd.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b2aa6bf24d3a..f18176976ae3 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -96,10 +96,10 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
* Ignore accessed, referenced, and dirty flags. The memory is
* unevictable and there is no storage to write back to.
*/
-static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+static struct folio *kvm_gmem_get_folio(struct file *file, pgoff_t index)
{
/* TODO: Support huge pages. */
- return filemap_grab_folio(inode->i_mapping, index);
+ return filemap_grab_folio(file_inode(file)->i_mapping, index);
}
static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
@@ -177,8 +177,9 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
return 0;
}
-static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
+static long kvm_gmem_allocate(struct file *file, loff_t offset, loff_t len)
{
+ struct inode *inode = file_inode(file);
struct address_space *mapping = inode->i_mapping;
pgoff_t start, index, end;
int r;
@@ -201,7 +202,7 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
break;
}
- folio = kvm_gmem_get_folio(inode, index);
+ folio = kvm_gmem_get_folio(file, index);
if (IS_ERR(folio)) {
r = PTR_ERR(folio);
break;
@@ -241,7 +242,7 @@ static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
if (mode & FALLOC_FL_PUNCH_HOLE)
ret = kvm_gmem_punch_hole(file_inode(file), offset, len);
else
- ret = kvm_gmem_allocate(file_inode(file), offset, len);
+ ret = kvm_gmem_allocate(file, offset, len);
if (!ret)
file_modified(file);
@@ -585,7 +586,7 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
return ERR_PTR(-EIO);
}
- folio = kvm_gmem_get_folio(file_inode(file), index);
+ folio = kvm_gmem_get_folio(file, index);
if (IS_ERR(folio))
return folio;
--
2.34.1
* [RFC PATCH v5 4/4] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
From: Shivank Garg @ 2025-02-19 10:15 UTC
To: akpm, willy, pbonzini
Cc: linux-fsdevel, linux-mm, linux-kernel, kvm, linux-coco, chao.gao,
seanjc, ackerleytng, david, vbabka, bharata, nikunj, michael.day,
Neeraj.Upadhyay, thomas.lendacky, michael.roth, shivankg
Previously, guest-memfd allocations followed the local NUMA node id in the
absence of a process mempolicy, resulting in arbitrary memory placement.
Moreover, mbind() could not be used, since the memory was not mapped to
userspace in the VMM.
Enable NUMA policy support by implementing vm_ops for the guest-memfd mmap
operation. This allows the VMM to map the memory and use mbind() to set the
desired NUMA policy. The policy is then retrieved via
mpol_shared_policy_lookup() and passed to filemap_grab_folio_mpol() to
ensure that allocations follow the specified memory policy.
This gives the VMM fine-grained control over guest memory placement across
NUMA nodes.
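A hedged userspace sketch of the intended VMM flow (gmem_fd and size are
assumptions for the example; link with -lnuma):

    #include <err.h>
    #include <numaif.h>
    #include <sys/mman.h>

    unsigned long nodemask = 1UL << 1;  /* bind new allocations to node 1 */
    void *mem;

    /* Map the guest_memfd to carry the policy; the pages need not be
     * faulted in through this mapping. */
    mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, gmem_fd, 0);
    if (mem == MAP_FAILED)
            err(1, "mmap");
    if (mbind(mem, size, MPOL_BIND, &nodemask, 8 * sizeof(nodemask), 0))
            err(1, "mbind");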
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
virt/kvm/guest_memfd.c | 76 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 75 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index f18176976ae3..8d1dfce5d3dc 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -2,6 +2,7 @@
#include <linux/backing-dev.h>
#include <linux/falloc.h>
#include <linux/kvm_host.h>
+#include <linux/mempolicy.h>
#include <linux/pagemap.h>
#include <linux/anon_inodes.h>
@@ -11,8 +12,12 @@ struct kvm_gmem {
struct kvm *kvm;
struct xarray bindings;
struct list_head entry;
+ struct shared_policy policy;
};
+static struct mempolicy *kvm_gmem_get_pgoff_policy(struct kvm_gmem *gmem,
+ pgoff_t index);
+
/**
* folio_file_pfn - like folio_file_page, but return a pfn.
* @folio: The folio which contains this index.
@@ -99,7 +104,25 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
static struct folio *kvm_gmem_get_folio(struct file *file, pgoff_t index)
{
/* TODO: Support huge pages. */
- return filemap_grab_folio(file_inode(file)->i_mapping, index);
+ struct kvm_gmem *gmem = file->private_data;
+ struct inode *inode = file_inode(file);
+ struct mempolicy *policy;
+ struct folio *folio;
+
+ /*
+ * Fast-path: See if folio is already present in mapping to avoid
+ * policy_lookup.
+ */
+ folio = __filemap_get_folio(inode->i_mapping, index,
+ FGP_LOCK | FGP_ACCESSED, 0);
+ if (!IS_ERR(folio))
+ return folio;
+
+ policy = kvm_gmem_get_pgoff_policy(gmem, index);
+ folio = filemap_grab_folio_mpol(inode->i_mapping, index, policy);
+ mpol_cond_put(policy);
+
+ return folio;
}
static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
@@ -291,6 +314,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
mutex_unlock(&kvm->slots_lock);
xa_destroy(&gmem->bindings);
+ mpol_free_shared_policy(&gmem->policy);
kfree(gmem);
kvm_put_kvm(kvm);
@@ -312,8 +336,57 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
{
return gfn - slot->base_gfn + slot->gmem.pgoff;
}
+#ifdef CONFIG_NUMA
+static int kvm_gmem_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
+{
+ struct file *file = vma->vm_file;
+ struct kvm_gmem *gmem = file->private_data;
+
+ return mpol_set_shared_policy(&gmem->policy, vma, new);
+}
+
+static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
+ unsigned long addr, pgoff_t *pgoff)
+{
+ struct file *file = vma->vm_file;
+ struct kvm_gmem *gmem = file->private_data;
+
+ *pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+ return mpol_shared_policy_lookup(&gmem->policy, *pgoff);
+}
+
+static struct mempolicy *kvm_gmem_get_pgoff_policy(struct kvm_gmem *gmem,
+ pgoff_t index)
+{
+ struct mempolicy *mpol;
+
+ mpol = mpol_shared_policy_lookup(&gmem->policy, index);
+ return mpol ? mpol : get_task_policy(current);
+}
+#else
+static struct mempolicy *kvm_gmem_get_pgoff_policy(struct kvm_gmem *gmem,
+ pgoff_t index)
+{
+ return NULL;
+}
+#endif /* CONFIG_NUMA */
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+#ifdef CONFIG_NUMA
+ .get_policy = kvm_gmem_get_policy,
+ .set_policy = kvm_gmem_set_policy,
+#endif
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ file_accessed(file);
+ vma->vm_ops = &kvm_gmem_vm_ops;
+ return 0;
+}
static struct file_operations kvm_gmem_fops = {
+ .mmap = kvm_gmem_mmap,
.open = generic_file_open,
.release = kvm_gmem_release,
.fallocate = kvm_gmem_fallocate,
@@ -446,6 +519,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
kvm_get_kvm(kvm);
gmem->kvm = kvm;
xa_init(&gmem->bindings);
+ mpol_shared_policy_init(&gmem->policy, NULL);
list_add(&gmem->entry, &inode->i_mapping->i_private_list);
fd_install(fd, file);
--
2.34.1
* Re: [RFC PATCH v5 0/4] Add NUMA mempolicy support for KVM guest-memfd
From: Shivank Garg @ 2025-02-21 11:30 UTC
To: akpm, willy, pbonzini
Cc: linux-fsdevel, linux-mm, linux-kernel, kvm, linux-coco, chao.gao,
seanjc, ackerleytng, david, vbabka, bharata, nikunj, michael.day,
Neeraj.Upadhyay, thomas.lendacky, michael.roth, Fuad Tabba
On 2/19/2025 3:45 PM, Shivank Garg wrote:
> KVM's guest-memfd memory backend currently lacks support for NUMA policy
> enforcement, causing guest memory allocations to be distributed arbitrarily
> across host NUMA nodes regardless of the policy specified by the VMM. This
> occurs because conventional userspace NUMA control mechanisms like mbind()
> are ineffective with guest-memfd, as the memory isn't directly mapped to
> userspace when allocations occur.
>
> This patch-series adds NUMA binding capabilities to guest_memfd-backed
> KVM guests. It has evolved through several approaches based on community
> feedback:
> - v1,v2: Extended the KVM_CREATE_GUEST_MEMFD IOCTL to pass mempolicy.
> - v3: Introduced fbind() syscall for VMM memory-placement configuration.
> - v4,v5: Current approach using shared_policy support and vm_ops (based on
> suggestions from David[1] and guest_memfd biweekly upstream call[2]).
>
<--snip>
Hi All,
This patch-series was discussed during the bi-weekly guest_memfd upstream
call on 2025-02-20 [1].
Here are my notes from the discussion:
The current design using mmap and shared_policy support with vm_ops
appears good and aligns well with how shared memory handles NUMA policy.
This fits well with the upcoming changes from Fuad [2].
Integration with Fuad's work should be straightforward, as my work
primarily involves the set_policy and get_policy callbacks in vm_ops.
Additionally, this approach helps us avoid any fpolicy/fbind()[3]
complexity.
David mentioned documenting the behavior of setting memory policy after
memory has already been allocated. Specifically, the policy change will
only affect future allocations and will not migrate existing memory.
This matches mbind(2)'s default behavior, which affects only new allocations
(unless overridden with the MPOL_MF_MOVE/MPOL_MF_MOVE_ALL flags).
In the future, we may explore supporting MPOL_MF_MOVE for guest_memfd,
but for now, this behavior is sufficient and should be clearly documented.
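For the documentation and kselftests, the distinction comes down to the
mbind() flags argument (sketch; mem, size, nodemask and maxnode as in
mbind(2)):

    /* Default: only future allocations follow the new policy. */
    mbind(mem, size, MPOL_BIND, &nodemask, maxnode, 0);

    /* MPOL_MF_MOVE additionally asks the kernel to migrate pages that
     * are already allocated; not yet supported for guest_memfd. */
    mbind(mem, size, MPOL_BIND, &nodemask, maxnode, MPOL_MF_MOVE);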
Before sending the non-RFC version of the patch-series, I will:
- Document and clarify the memory allocation behavior after policy changes
- Write kselftests to validate NUMA policy enforcement, including edge
cases like changing policies after memory allocation
I aim to send the updated patch-series soon. If there are any further
suggestions or concerns, please let me know.
[1] https://lore.kernel.org/linux-mm/40290a46-bcf4-4ef6-ae13-109e18ad0dfd@redhat.com
[2] https://lore.kernel.org/linux-mm/20250218172500.807733-1-tabba@google.com
[3] https://lore.kernel.org/linux-mm/20241105164549.154700-1-shivankg@amd.com
Thanks,
Shivank