* [PATCH v2 1/5] mm/memory: Change vmf_anon_prepare() to be non-static
From: Vishal Moola (Oracle) @ 2024-02-21 23:47 UTC
To: linux-mm; +Cc: linux-kernel, akpm, muchun.song, willy, Vishal Moola (Oracle)
In order to handle hugetlb faults under the VMA lock, hugetlb can use
vmf_anon_prepare() to ensure we can safely prepare an anon_vma. Change
it to be a non-static function so it can be used within hugetlb as well.
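For reference, the function being exported reads roughly as follows
(paraphrased from mm/memory.c at this series' baseline; minor details
may differ):

	vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
	{
		struct vm_area_struct *vma = vmf->vma;

		/* Nothing to do if the anon_vma is already set up. */
		if (likely(vma->anon_vma))
			return 0;
		/*
		 * Preparing an anon_vma needs mmap_lock, which we do not
		 * hold under the per-VMA lock, so drop the VMA lock and
		 * tell the caller to retry the fault with mmap_lock.
		 */
		if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
			vma_end_read(vma);
			return VM_FAULT_RETRY;
		}
		if (__anon_vma_prepare(vma))
			return VM_FAULT_OOM;
		return 0;
	}

This gives callers VM_FAULT_RETRY when only the VMA lock is held and no
anon_vma exists yet, and VM_FAULT_OOM only on real allocation failure.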
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
 mm/internal.h | 1 +
 mm/memory.c   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50f..b9b6b2bc1663 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -103,6 +103,7 @@ static inline void wake_throttle_isolated(pg_data_t *pgdat)
 	wake_up(wqh);
 }
 
+vm_fault_t vmf_anon_prepare(struct vm_fault *vmf);
 vm_fault_t do_swap_page(struct vm_fault *vmf);
 void folio_rotate_reclaimable(struct folio *folio);
 bool __folio_end_writeback(struct folio *folio);
diff --git a/mm/memory.c b/mm/memory.c
index 89bcae0b224d..c93b058adfb2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3081,7 +3081,7 @@ static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
 	return VM_FAULT_RETRY;
 }
 
-static vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
+vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
--
2.43.0
* [PATCH v2 2/5] hugetlb: Move vm_struct declaration to the top of hugetlb_fault()
From: Vishal Moola (Oracle) @ 2024-02-21 23:47 UTC
To: linux-mm; +Cc: linux-kernel, akpm, muchun.song, willy, Vishal Moola (Oracle)
hugetlb_fault() currently defines a vm_struct to pass to the generic
handle_userfault() function. We can move this definition to the top of
hugetlb_fault() so that it can be used throughout the rest of the
hugetlb fault path.
This will help clean up a number of excess variables and function
arguments throughout the stack. Also, since vm_fault already has space
to store the page offset, use that instead and get rid of idx.
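For reference, the vm_fault fields used by this patch (abridged from
struct vm_fault in include/linux/mm.h; the other fields are omitted):

	struct vm_fault {
		struct vm_area_struct *vma;	/* Target VMA */
		pgoff_t pgoff;			/* Logical page offset based on vma */
		unsigned long address;		/* Faulting virtual address - masked */
		unsigned long real_address;	/* Faulting virtual address - unmasked */
		enum fault_flag flags;		/* FAULT_FLAG_xxx flags */
		/* ... */
	};

Since pgoff already lives here, hugetlb_fault() no longer needs its own
idx local.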
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
 mm/hugetlb.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ed1581b670d4..d792d60ea16c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6353,13 +6353,25 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	spinlock_t *ptl;
 	vm_fault_t ret;
 	u32 hash;
-	pgoff_t idx;
 	struct folio *folio = NULL;
 	struct folio *pagecache_folio = NULL;
 	struct hstate *h = hstate_vma(vma);
 	struct address_space *mapping;
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
+	struct vm_fault vmf = {
+		.vma = vma,
+		.address = haddr,
+		.real_address = address,
+		.flags = flags,
+		.pgoff = vma_hugecache_offset(h, vma, haddr),
+		/* TODO: Track hugetlb faults using vm_fault */
+
+		/*
+		 * Some fields may not be initialized, be careful as it may
+		 * be hard to debug if called functions make assumptions
+		 */
+	};
 
 	/* TODO: Handle faults under the VMA lock */
 	if (flags & FAULT_FLAG_VMA_LOCK) {
@@ -6373,8 +6385,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * the same page in the page cache.
 	 */
 	mapping = vma->vm_file->f_mapping;
-	idx = vma_hugecache_offset(h, vma, haddr);
-	hash = hugetlb_fault_mutex_hash(mapping, idx);
+	hash = hugetlb_fault_mutex_hash(mapping, vmf.pgoff);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 	/*
@@ -6408,8 +6419,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * hugetlb_no_page will drop vma lock and hugetlb fault
 		 * mutex internally, which make us return immediately.
 		 */
-		return hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
-					entry, flags);
+
+		return hugetlb_no_page(mm, vma, mapping, vmf.pgoff, address,
+				ptep, entry, flags);
 	}
 
 	ret = 0;
@@ -6455,7 +6467,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		/* Just decrements count, does not deallocate */
 		vma_end_reservation(h, vma, haddr);
 
-		pagecache_folio = filemap_lock_hugetlb_folio(h, mapping, idx);
+		pagecache_folio = filemap_lock_hugetlb_folio(h, mapping,
+							vmf.pgoff);
 		if (IS_ERR(pagecache_folio))
 			pagecache_folio = NULL;
 	}
@@ -6470,13 +6483,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
 	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
 		if (!userfaultfd_wp_async(vma)) {
-			struct vm_fault vmf = {
-				.vma = vma,
-				.address = haddr,
-				.real_address = address,
-				.flags = flags,
-			};
-
 			spin_unlock(ptl);
 			if (pagecache_folio) {
 				folio_unlock(pagecache_folio);
--
2.43.0
* Re: [PATCH v2 2/5] hugetlb: Move vm_struct declaration to the top of hugetlb_fault()
From: Matthew Wilcox @ 2024-02-22 3:35 UTC
To: Vishal Moola (Oracle); +Cc: linux-mm, linux-kernel, akpm, muchun.song
On Wed, Feb 21, 2024 at 03:47:29PM -0800, Vishal Moola (Oracle) wrote:
> hugetlb_fault() currently defines a vm_struct to pass to the generic
s/vm_struct/vm_fault/ (both this line and Subject:)
> handle_userfault() function. We can move this definition to the top of
> hugetlb_fault() so that it can be used throughout the rest of the
> hugetlb fault path.
>
> This will help cleanup a number of excess variables and function
> arguments throughout the stack. Also, since vm_fault already has space
> to store the page offset, use that instead and get rid of idx.
>
> Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
* [PATCH v2 3/5] hugetlb: Pass struct vm_fault through to hugetlb_handle_userfault()
From: Vishal Moola (Oracle) @ 2024-02-21 23:47 UTC
To: linux-mm; +Cc: linux-kernel, akpm, muchun.song, willy, Vishal Moola (Oracle)
Now that hugetlb_fault() has a struct vm_fault, have
hugetlb_handle_userfault() use it instead of creating one of its own.
This lets us reduce the number of arguments passed to
hugetlb_handle_userfault() from 7 to 3, cleaning up the code and stack.
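The prototype change at a glance (taken from the diff below):

	/* Before: seven parameters */
	static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
							  struct address_space *mapping,
							  pgoff_t idx,
							  unsigned int flags,
							  unsigned long haddr,
							  unsigned long addr,
							  unsigned long reason);

	/* After: three, with vma, addresses, flags and pgoff read from vmf */
	static inline vm_fault_t hugetlb_handle_userfault(struct vm_fault *vmf,
							  struct address_space *mapping,
							  unsigned long reason);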
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
 mm/hugetlb.c | 38 +++++++++-----------------------------
 1 file changed, 9 insertions(+), 29 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d792d60ea16c..70c5870e859e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6060,39 +6060,21 @@ int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping
 	return 0;
 }
 
-static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
+static inline vm_fault_t hugetlb_handle_userfault(struct vm_fault *vmf,
 						  struct address_space *mapping,
-						  pgoff_t idx,
-						  unsigned int flags,
-						  unsigned long haddr,
-						  unsigned long addr,
 						  unsigned long reason)
 {
 	u32 hash;
-	struct vm_fault vmf = {
-		.vma = vma,
-		.address = haddr,
-		.real_address = addr,
-		.flags = flags,
-
-		/*
-		 * Hard to debug if it ends up being
-		 * used by a callee that assumes
-		 * something about the other
-		 * uninitialized fields... same as in
-		 * memory.c
-		 */
-	};
 
 	/*
 	 * vma_lock and hugetlb_fault_mutex must be dropped before handling
 	 * userfault. Also mmap_lock could be dropped due to handling
 	 * userfault, any vma operation should be careful from here.
 	 */
-	hugetlb_vma_unlock_read(vma);
-	hash = hugetlb_fault_mutex_hash(mapping, idx);
+	hugetlb_vma_unlock_read(vmf->vma);
+	hash = hugetlb_fault_mutex_hash(mapping, vmf->pgoff);
 	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-	return handle_userfault(&vmf, reason);
+	return handle_userfault(vmf, reason);
 }
 
 /*
@@ -6116,7 +6098,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
 			unsigned long address, pte_t *ptep,
-			pte_t old_pte, unsigned int flags)
+			pte_t old_pte, unsigned int flags,
+			struct vm_fault *vmf)
 {
 	struct hstate *h = hstate_vma(vma);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
@@ -6175,8 +6158,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			goto out;
 		}
 
-		return hugetlb_handle_userfault(vma, mapping, idx, flags,
-						haddr, address,
+		return hugetlb_handle_userfault(vmf, mapping,
 						VM_UFFD_MISSING);
 	}
 
@@ -6248,8 +6230,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 				ret = 0;
 				goto out;
 			}
-			return hugetlb_handle_userfault(vma, mapping, idx, flags,
-							haddr, address,
+			return hugetlb_handle_userfault(vmf, mapping,
 							VM_UFFD_MINOR);
 		}
 	}
@@ -6419,9 +6400,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * hugetlb_no_page will drop vma lock and hugetlb fault
 		 * mutex internally, which make us return immediately.
 		 */
-
 		return hugetlb_no_page(mm, vma, mapping, vmf.pgoff, address,
-				ptep, entry, flags);
+				ptep, entry, flags, &vmf);
 	}
 
 	ret = 0;
--
2.43.0
* Re: [PATCH v2 3/5] hugetlb: Pass struct vm_fault through to hugetlb_handle_userfault()
From: Matthew Wilcox @ 2024-02-22 3:41 UTC
To: Vishal Moola (Oracle); +Cc: linux-mm, linux-kernel, akpm, muchun.song
On Wed, Feb 21, 2024 at 03:47:30PM -0800, Vishal Moola (Oracle) wrote:
> Now that hugetlb_fault() has a struct vm_fault, have
> hugetlb_handle_userfault() use it instead of creating one of its own.
>
> This lets us reduce the number of arguments passed to
> hugetlb_handle_userfault() from 7 to 3, cleaning up the code and stack.
>
> Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> ---
> mm/hugetlb.c | 38 +++++++++-----------------------------
> 1 file changed, 9 insertions(+), 29 deletions(-)
I love the look of this ...
> @@ -6116,7 +6098,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>  			struct vm_area_struct *vma,
>  			struct address_space *mapping, pgoff_t idx,
>  			unsigned long address, pte_t *ptep,
> -			pte_t old_pte, unsigned int flags)
> +			pte_t old_pte, unsigned int flags,
> +			struct vm_fault *vmf)
Should we remove vma, address, idx and flags?
* Re: [PATCH v2 3/5] hugetlb: Pass struct vm_fault through to hugetlb_handle_userfault()
From: Vishal Moola @ 2024-02-22 16:13 UTC
To: Matthew Wilcox; +Cc: linux-mm, linux-kernel, akpm, muchun.song
On Wed, Feb 21, 2024 at 7:41 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 21, 2024 at 03:47:30PM -0800, Vishal Moola (Oracle) wrote:
> > Now that hugetlb_fault() has a struct vm_fault, have
> > hugetlb_handle_userfault() use it instead of creating one of its own.
> >
> > This lets us reduce the number of arguments passed to
> > hugetlb_handle_userfault() from 7 to 3, cleaning up the code and stack.
> >
> > Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> > ---
> > mm/hugetlb.c | 38 +++++++++-----------------------------
> > 1 file changed, 9 insertions(+), 29 deletions(-)
>
> I love the look of this ...
>
> > @@ -6116,7 +6098,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >  			struct vm_area_struct *vma,
> >  			struct address_space *mapping, pgoff_t idx,
> >  			unsigned long address, pte_t *ptep,
> > -			pte_t old_pte, unsigned int flags)
> > +			pte_t old_pte, unsigned int flags,
> > +			struct vm_fault *vmf)
>
> Should we remove vma, address, idx and flags?
Yes, I'm going to do that in another patchset; this one is mainly about
enabling hugetlb_fault() to work safely under the VMA lock. It will also
make it easier to debug if any substitution goes wrong somewhere.
We may also be able to remove one (or more) of the pte_t arguments,
but I have to look into that more.
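As a rough sketch (hypothetical, not part of this series), the trimmed
prototype could end up looking something like:

	/* vma, address, idx and flags would instead come from vmf->vma,
	 * vmf->real_address, vmf->pgoff and vmf->flags. */
	static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
				struct address_space *mapping,
				pte_t *ptep, pte_t old_pte,
				struct vm_fault *vmf);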
* [PATCH v2 4/5] hugetlb: Use vmf_anon_prepare() instead of anon_vma_prepare()
From: Vishal Moola (Oracle) @ 2024-02-21 23:47 UTC
To: linux-mm; +Cc: linux-kernel, akpm, muchun.song, willy, Vishal Moola (Oracle)
hugetlb_no_page() and hugetlb_wp() call anon_vma_prepare(). In
preparation for hugetlb to safely handle faults under the VMA lock,
use vmf_anon_prepare() here instead.
Additionally, passing hugetlb_wp() the vm_fault struct from hugetlb_fault()
works toward cleaning up the hugetlb code and function stack.
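The conversion follows the same pattern in both callers (sketched here;
see the diff below for the exact context):

	/* Before: anon_vma_prepare() can only report OOM */
	if (unlikely(anon_vma_prepare(vma))) {
		ret = VM_FAULT_OOM;
		goto out_release_all;
	}

	/* After: vmf_anon_prepare() may also return VM_FAULT_RETRY when
	 * no anon_vma exists and only the VMA lock is held. */
	ret = vmf_anon_prepare(vmf);
	if (unlikely(ret))
		goto out_release_all;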
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
 mm/hugetlb.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 70c5870e859e..ae8c8b3da981 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5826,7 +5826,8 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
  */
 static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 		       unsigned long address, pte_t *ptep, unsigned int flags,
-		       struct folio *pagecache_folio, spinlock_t *ptl)
+		       struct folio *pagecache_folio, spinlock_t *ptl,
+		       struct vm_fault *vmf)
 {
 	const bool unshare = flags & FAULT_FLAG_UNSHARE;
 	pte_t pte = huge_ptep_get(ptep);
@@ -5960,10 +5961,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * When the original hugepage is shared one, it does not have
 		 * anon_vma prepared.
 		 */
-		if (unlikely(anon_vma_prepare(vma))) {
-			ret = VM_FAULT_OOM;
+		ret = vmf_anon_prepare(vmf);
+		if (unlikely(ret))
 			goto out_release_all;
-		}
 
 		if (copy_user_large_folio(new_folio, old_folio, address, vma)) {
 			ret = VM_FAULT_HWPOISON_LARGE;
@@ -6203,10 +6203,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			new_pagecache_folio = true;
 		} else {
 			folio_lock(folio);
-			if (unlikely(anon_vma_prepare(vma))) {
-				ret = VM_FAULT_OOM;
+
+			ret = vmf_anon_prepare(vmf);
+			if (unlikely(ret))
 				goto backout_unlocked;
-			}
 			anon_rmap = 1;
 		}
 	} else {
@@ -6273,7 +6273,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	hugetlb_count_add(pages_per_huge_page(h), mm);
 	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
 		/* Optimization, do the COW without a second fault */
-		ret = hugetlb_wp(mm, vma, address, ptep, flags, folio, ptl);
+		ret = hugetlb_wp(mm, vma, address, ptep, flags, folio, ptl, vmf);
 	}
 
 	spin_unlock(ptl);
@@ -6496,7 +6496,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
 		if (!huge_pte_write(entry)) {
 			ret = hugetlb_wp(mm, vma, address, ptep, flags,
-					 pagecache_folio, ptl);
+					 pagecache_folio, ptl, &vmf);
 			goto out_put_page;
 		} else if (likely(flags & FAULT_FLAG_WRITE)) {
 			entry = huge_pte_mkdirty(entry);
--
2.43.0
* Re: [PATCH v2 4/5] hugetlb: Use vmf_anon_prepare() instead of anon_vma_prepare()
From: Matthew Wilcox @ 2024-02-22 3:51 UTC
To: Vishal Moola (Oracle); +Cc: linux-mm, linux-kernel, akpm, muchun.song
On Wed, Feb 21, 2024 at 03:47:31PM -0800, Vishal Moola (Oracle) wrote:
> hugetlb_no_page() and hugetlb_wp() call anon_vma_prepare(). In
> preparation for hugetlb to safely handle faults under the VMA lock,
> use vmf_anon_prepare() here instead.
>
> Additionally, passing hugetlb_wp() the vm_fault struct from hugetlb_fault()
> works toward cleaning up the hugetlb code and function stack.
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>  static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>  		       unsigned long address, pte_t *ptep, unsigned int flags,
> -		       struct folio *pagecache_folio, spinlock_t *ptl)
> +		       struct folio *pagecache_folio, spinlock_t *ptl,
> +		       struct vm_fault *vmf)
Is it worth removing vma, address and flags?
* [PATCH v2 5/5] hugetlb: Allow faults to be handled under the VMA lock
From: Vishal Moola (Oracle) @ 2024-02-21 23:47 UTC
To: linux-mm; +Cc: linux-kernel, akpm, muchun.song, willy, Vishal Moola (Oracle)
Hugetlb can now safely handle faults under the VMA lock, so allow it to
do so.
This patch may cause ltp hugemmap10 to "fail". Hugemmap10 tests hugetlb
counters, and expects the counters to remain unchanged on failure to
handle a fault.
In hugetlb_no_page(), vmf_anon_prepare() may bail out with no anon_vma
under the VMA lock after allocating a folio for the hugepage. In
free_huge_folio(), this folio is completely freed on bailout iff there
is a surplus of hugetlb pages. This will take a folio off the freelist
and decrement the number of hugepages, while ltp expects these counters
to remain unchanged on failure.
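For reference, the relevant tail of free_huge_folio() is roughly
(abridged; exact details may differ):

	if (h->surplus_huge_pages_node[nid]) {
		/* Remove from the active list and free the folio for
		 * good: this is the path that changes the counters. */
		remove_hugetlb_folio(h, folio, true);
		spin_unlock_irqrestore(&hugetlb_lock, flags);
		update_and_free_hugetlb_folio(h, folio, true);
	} else {
		/* Back onto the hugetlb freelist; counters unchanged. */
		enqueue_hugetlb_folio(h, folio);
		spin_unlock_irqrestore(&hugetlb_lock, flags);
	}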
Originally this could only happen due to OOM failures, but now it may
also occur after we allocate a hugetlb folio without a suitable anon_vma
under the VMA lock. This should only happen for the first freshly
allocated hugepage in this vma.
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
 mm/hugetlb.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ae8c8b3da981..688017ca0cc2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6354,12 +6354,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 	};
 
-	/* TODO: Handle faults under the VMA lock */
-	if (flags & FAULT_FLAG_VMA_LOCK) {
-		vma_end_read(vma);
-		return VM_FAULT_RETRY;
-	}
-
 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
 	 * get spurious allocation failures if two CPUs race to instantiate
--
2.43.0
* Re: [PATCH v2 5/5] hugetlb: Allow faults to be handled under the VMA lock
From: Matthew Wilcox @ 2024-02-22 3:55 UTC
To: Vishal Moola (Oracle)
Cc: linux-mm, linux-kernel, akpm, muchun.song, Cyril Hrubis,
Jan Stancek, Petr Vorel, ltp
On Wed, Feb 21, 2024 at 03:47:32PM -0800, Vishal Moola (Oracle) wrote:
> Hugetlb can now safely handle faults under the VMA lock, so allow it to
> do so.
>
> This patch may cause ltp hugemmap10 to "fail". Hugemmap10 tests hugetlb
> counters, and expects the counters to remain unchanged on failure to
> handle a fault.
>
> In hugetlb_no_page(), vmf_anon_prepare() may bailout with no anon_vma
> under the VMA lock after allocating a folio for the hugepage. In
> free_huge_folio(), this folio is completely freed on bailout iff there
> is a surplus of hugetlb pages. This will remove a folio off the freelist
> and decrement the number of hugepages while ltp expects these counters
> to remain unchanged on failure.
>
> Originally this could only happen due to OOM failures, but now it may
> also occur after we allocate a hugetlb folio without a suitable anon_vma
> under the VMA lock. This should only happen for the first freshly
> allocated hugepage in this vma.
Hmm, so it's a bug in LTP. Have you talked to the LTP people about it
(cc's added)?
Also, did you try moving the anon_vma check before the folio allocation
so that you can bail out without disturbing the counters?
> Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
* Re: [PATCH v2 5/5] hugetlb: Allow faults to be handled under the VMA lock
From: Vishal Moola @ 2024-02-22 17:27 UTC
To: Matthew Wilcox
Cc: linux-mm, linux-kernel, akpm, muchun.song, Cyril Hrubis,
Jan Stancek, Petr Vorel, ltp
On Wed, Feb 21, 2024 at 7:55 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 21, 2024 at 03:47:32PM -0800, Vishal Moola (Oracle) wrote:
> > Hugetlb can now safely handle faults under the VMA lock, so allow it to
> > do so.
> >
> > This patch may cause ltp hugemmap10 to "fail". Hugemmap10 tests hugetlb
> > counters, and expects the counters to remain unchanged on failure to
> > handle a fault.
> >
> > In hugetlb_no_page(), vmf_anon_prepare() may bailout with no anon_vma
> > under the VMA lock after allocating a folio for the hugepage. In
> > free_huge_folio(), this folio is completely freed on bailout iff there
> > is a surplus of hugetlb pages. This will remove a folio off the freelist
> > and decrement the number of hugepages while ltp expects these counters
> > to remain unchanged on failure.
> >
> > Originally this could only happen due to OOM failures, but now it may
> > also occur after we allocate a hugetlb folio without a suitable anon_vma
> > under the VMA lock. This should only happen for the first freshly
> > allocated hugepage in this vma.
>
> Hmm, so it's a bug in LTP. Have you talked to the LTP people about it
> (cc's added)?
>
> Also, did you try moving the anon_vma check before the folio allocation
> so that you can bail out without disturbing the counters?
Moving the check before folio allocation occurs keeps the folios on the
freelist, so the counters remain unchanged as expected.
If we are looking at a shareable vma, hugetlb_no_page() does not need
this check at all, so I left the check where it is. We could definitely
place the anon_vma check above the allocation, but it would make the
code slightly more complicated: we would need another variable (or to
reassign ret in various places), as well as another potentially
unnecessary vma flag check.
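Something like this, for illustration (hypothetical sketch only, not
what this series does):

	/* Bail out on a missing anon_vma before any folio is allocated,
	 * so the hugetlb counters are never disturbed.  Shared mappings
	 * would pay for a vm_flags check they do not need. */
	if (!(vma->vm_flags & VM_SHARED)) {
		ret = vmf_anon_prepare(vmf);
		if (unlikely(ret))
			goto out;
	}
	folio = alloc_hugetlb_folio(vma, haddr, 0);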
> > Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>