* [PATCH 1/1] userfaultfd: fix move_pages_pte() splitting folio under RCU read lock
@ 2023-12-30 2:56 Suren Baghdasaryan
2024-01-02 8:59 ` Peter Xu
0 siblings, 1 reply; 5+ messages in thread
From: Suren Baghdasaryan @ 2023-12-30 2:56 UTC (permalink / raw)
To: akpm
Cc: viro, brauner, shuah, aarcange, lokeshgidra, peterx, david,
ryan.roberts, hughd, mhocko, axelrasmussen, rppt, willy,
Liam.Howlett, jannh, zhangpeng362, bgeffon, kaleshsingh,
ngeoffray, jdduke, surenb, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest, kernel-team
While testing the split PMD path with lockdep enabled I've got an
"Invalid wait context" error caused by split_huge_page_to_list() trying
to lock anon_vma->rwsem while inside RCU read section. The issues is due
to move_pages_pte() calling split_folio() under RCU read lock. Fix this
by unmapping the PTEs and exiting RCU read section before splitting the
folio and then retrying. The same retry pattern is used when locking the
folio or anon_vma in this function.
Fixes: 94b01c885131 ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
Patch applies over mm-unstable.
Please note that the SHA in Fixes tag is unstable.
mm/userfaultfd.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 5e718014e671..71393410e028 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1078,9 +1078,14 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
/* at this point we have src_folio locked */
if (folio_test_large(src_folio)) {
+ /* split_folio() can block */
+ pte_unmap(&orig_src_pte);
+ pte_unmap(&orig_dst_pte);
+ src_pte = dst_pte = NULL;
err = split_folio(src_folio);
if (err)
goto out;
+ goto retry;
}
if (!src_anon_vma) {
--
2.43.0.472.g3155946c3a-goog
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] userfaultfd: fix move_pages_pte() splitting folio under RCU read lock
2023-12-30 2:56 [PATCH 1/1] userfaultfd: fix move_pages_pte() splitting folio under RCU read lock Suren Baghdasaryan
@ 2024-01-02 8:59 ` Peter Xu
2024-01-02 16:58 ` Suren Baghdasaryan
0 siblings, 1 reply; 5+ messages in thread
From: Peter Xu @ 2024-01-02 8:59 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, viro, brauner, shuah, aarcange, lokeshgidra, david,
ryan.roberts, hughd, mhocko, axelrasmussen, rppt, willy,
Liam.Howlett, jannh, zhangpeng362, bgeffon, kaleshsingh,
ngeoffray, jdduke, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest, kernel-team
On Fri, Dec 29, 2023 at 06:56:07PM -0800, Suren Baghdasaryan wrote:
> @@ -1078,9 +1078,14 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
>
> /* at this point we have src_folio locked */
> if (folio_test_large(src_folio)) {
> + /* split_folio() can block */
> + pte_unmap(&orig_src_pte);
> + pte_unmap(&orig_dst_pte);
> + src_pte = dst_pte = NULL;
> err = split_folio(src_folio);
> if (err)
> goto out;
> + goto retry;
> }
Do we also need to clear src_folio and src_folio_pte? If the folio is a
thp, I think it means it's pte mapped here. Then after the split we may
want to fetch the small folio after the split, not the head one?
--
Peter Xu
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] userfaultfd: fix move_pages_pte() splitting folio under RCU read lock
2024-01-02 8:59 ` Peter Xu
@ 2024-01-02 16:58 ` Suren Baghdasaryan
2024-01-02 23:16 ` Suren Baghdasaryan
0 siblings, 1 reply; 5+ messages in thread
From: Suren Baghdasaryan @ 2024-01-02 16:58 UTC (permalink / raw)
To: Peter Xu
Cc: akpm, viro, brauner, shuah, aarcange, lokeshgidra, david,
ryan.roberts, hughd, mhocko, axelrasmussen, rppt, willy,
Liam.Howlett, jannh, zhangpeng362, bgeffon, kaleshsingh,
ngeoffray, jdduke, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest, kernel-team
On Tue, Jan 2, 2024 at 1:00 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Dec 29, 2023 at 06:56:07PM -0800, Suren Baghdasaryan wrote:
> > @@ -1078,9 +1078,14 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> >
> > /* at this point we have src_folio locked */
> > if (folio_test_large(src_folio)) {
> > + /* split_folio() can block */
> > + pte_unmap(&orig_src_pte);
> > + pte_unmap(&orig_dst_pte);
> > + src_pte = dst_pte = NULL;
> > err = split_folio(src_folio);
> > if (err)
> > goto out;
> > + goto retry;
> > }
>
> Do we also need to clear src_folio and src_folio_pte? If the folio is a
> thp, I think it means it's pte mapped here. Then after the split we may
> want to fetch the small folio after the split, not the head one?
I think we need to re-fetch the src_folio only if the src_addr falls
into a non-head page. Looking at the __split_huge_page(), the head
page is skipped in the last loop, so I think it should stay valid.
That said, maybe it's just an implementation detail of the
__split_huge_page() and I should not rely on that and refetch anyway?
>
> --
> Peter Xu
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] userfaultfd: fix move_pages_pte() splitting folio under RCU read lock
2024-01-02 16:58 ` Suren Baghdasaryan
@ 2024-01-02 23:16 ` Suren Baghdasaryan
2024-01-02 23:34 ` Suren Baghdasaryan
0 siblings, 1 reply; 5+ messages in thread
From: Suren Baghdasaryan @ 2024-01-02 23:16 UTC (permalink / raw)
To: Peter Xu
Cc: akpm, viro, brauner, shuah, aarcange, lokeshgidra, david,
ryan.roberts, hughd, mhocko, axelrasmussen, rppt, willy,
Liam.Howlett, jannh, zhangpeng362, bgeffon, kaleshsingh,
ngeoffray, jdduke, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest, kernel-team
On Tue, Jan 2, 2024 at 8:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Jan 2, 2024 at 1:00 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Dec 29, 2023 at 06:56:07PM -0800, Suren Baghdasaryan wrote:
> > > @@ -1078,9 +1078,14 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> > >
> > > /* at this point we have src_folio locked */
> > > if (folio_test_large(src_folio)) {
> > > + /* split_folio() can block */
> > > + pte_unmap(&orig_src_pte);
> > > + pte_unmap(&orig_dst_pte);
> > > + src_pte = dst_pte = NULL;
> > > err = split_folio(src_folio);
> > > if (err)
> > > goto out;
> > > + goto retry;
> > > }
> >
> > Do we also need to clear src_folio and src_folio_pte? If the folio is a
> > thp, I think it means it's pte mapped here. Then after the split we may
> > want to fetch the small folio after the split, not the head one?
>
> I think we need to re-fetch the src_folio only if the src_addr falls
> into a non-head page. Looking at the __split_huge_page(), the head
> page is skipped in the last loop, so I think it should stay valid.
> That said, maybe it's just an implementation detail of the
> __split_huge_page() and I should not rely on that and refetch anyway?
I'll post a v2 with this fix and re-fetching the folio
unconditionally. We also don't need to reset src_folio_pte value
because it's used only if src_folio is not NULL.
Thanks for catching this, Peter!
>
> >
> > --
> > Peter Xu
> >
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] userfaultfd: fix move_pages_pte() splitting folio under RCU read lock
2024-01-02 23:16 ` Suren Baghdasaryan
@ 2024-01-02 23:34 ` Suren Baghdasaryan
0 siblings, 0 replies; 5+ messages in thread
From: Suren Baghdasaryan @ 2024-01-02 23:34 UTC (permalink / raw)
To: Peter Xu
Cc: akpm, viro, brauner, shuah, aarcange, lokeshgidra, david,
ryan.roberts, hughd, mhocko, axelrasmussen, rppt, willy,
Liam.Howlett, jannh, zhangpeng362, bgeffon, kaleshsingh,
ngeoffray, jdduke, linux-mm, linux-fsdevel, linux-kernel,
linux-kselftest, kernel-team
On Tue, Jan 2, 2024 at 3:16 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Jan 2, 2024 at 8:58 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Tue, Jan 2, 2024 at 1:00 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Fri, Dec 29, 2023 at 06:56:07PM -0800, Suren Baghdasaryan wrote:
> > > > @@ -1078,9 +1078,14 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> > > >
> > > > /* at this point we have src_folio locked */
> > > > if (folio_test_large(src_folio)) {
> > > > + /* split_folio() can block */
> > > > + pte_unmap(&orig_src_pte);
> > > > + pte_unmap(&orig_dst_pte);
> > > > + src_pte = dst_pte = NULL;
> > > > err = split_folio(src_folio);
> > > > if (err)
> > > > goto out;
> > > > + goto retry;
> > > > }
> > >
> > > Do we also need to clear src_folio and src_folio_pte? If the folio is a
> > > thp, I think it means it's pte mapped here. Then after the split we may
> > > want to fetch the small folio after the split, not the head one?
> >
> > I think we need to re-fetch the src_folio only if the src_addr falls
> > into a non-head page. Looking at the __split_huge_page(), the head
> > page is skipped in the last loop, so I think it should stay valid.
> > That said, maybe it's just an implementation detail of the
> > __split_huge_page() and I should not rely on that and refetch anyway?
>
> I'll post a v2 with this fix and re-fetching the folio
> unconditionally. We also don't need to reset src_folio_pte value
> because it's used only if src_folio is not NULL.
Posted at https://lore.kernel.org/all/20240102233256.1077959-1-surenb@google.com/
> Thanks for catching this, Peter!
>
> >
> > >
> > > --
> > > Peter Xu
> > >
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-01-02 23:35 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-30 2:56 [PATCH 1/1] userfaultfd: fix move_pages_pte() splitting folio under RCU read lock Suren Baghdasaryan
2024-01-02 8:59 ` Peter Xu
2024-01-02 16:58 ` Suren Baghdasaryan
2024-01-02 23:16 ` Suren Baghdasaryan
2024-01-02 23:34 ` Suren Baghdasaryan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox