linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH next] mm/khugepaged: fix conflicting mods to collapse_file()
@ 2023-04-23  4:47 Hugh Dickins
  2023-04-24  1:53 ` David Stevens
  2023-04-24 22:07 ` Andrew Morton
  0 siblings, 2 replies; 5+ messages in thread
From: Hugh Dickins @ 2023-04-23  4:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ivan Orlov, Jiaqi Yan, David Stevens, linux-kernel, linux-mm

Inserting Ivan Orlov's syzbot fix commit 2ce0bdfebc74
("mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()")
ahead of Jiaqi Yan's and David Stevens's commits
12904d953364 ("mm/khugepaged: recover from poisoned file-backed memory")
cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
ac492b9c70ca ("mm/khugepaged: skip shmem with userfaultfd")
(all of which restructure collapse_file()) did not work out well.

xfstests generic/086 on huge tmpfs (with accelerated khugepaged) freezes
(if not on the first attempt, then the 2nd or 3rd) in find_lock_entries()
while doing drop_caches: the file's xarray seems to have been corrupted,
with find_get_entry() returning nonsense which makes no progress.

Bisection led to ac492b9c70ca; and diff against earlier working linux-next
suggested that it's probably down to an errant xas_store(), which does not
belong with the later changes (and nor does the positioning of warnings).
The later changes look as if they fix the syzbot issue independently.

Remove most of what's left of 2ce0bdfebc74: just leave one WARN_ON_ONCE
(xas_error) after the final xas_store() of the multi-index entry.

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/khugepaged.c | 23 +----------------------
 1 file changed, 1 insertion(+), 22 deletions(-)

--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1941,16 +1941,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 					result = SCAN_FAIL;
 					goto xa_locked;
 				}
-				xas_store(&xas, hpage);
-				if (xas_error(&xas)) {
-					/* revert shmem_charge performed
-					 * in the previous condition
-					 */
-					mapping->nrpages--;
-					shmem_uncharge(mapping->host, 1);
-					result = SCAN_STORE_FAILED;
-					goto xa_locked;
-				}
 				nr_none++;
 				continue;
 			}
@@ -2105,13 +2095,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		 * Accumulate the pages that are being collapsed.
 		 */
 		list_add_tail(&page->lru, &pagelist);
-
-		/*
-		 * We can't get an ENOMEM here (because the allocation happened
-		 * before) but let's check for errors (XArray implementation
-		 * can be changed in the future)
-		 */
-		WARN_ON_ONCE(xas_error(&xas));
 		continue;
 out_unlock:
 		unlock_page(page);
@@ -2134,11 +2117,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		}
 	}
 
-	/* Here we can't get an ENOMEM (because entries were
-	 * previously allocated) But let's check for errors
-	 * (XArray implementation can be changed in the future)
-	 */
-	WARN_ON_ONCE(xas_error(&xas));
 xa_locked:
 	xas_unlock_irq(&xas);
 xa_unlocked:
@@ -2259,6 +2237,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	/* Join all the small entries into a single multi-index entry. */
 	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
 	xas_store(&xas, hpage);
+	WARN_ON_ONCE(xas_error(&xas));
 	xas_unlock_irq(&xas);
 
 	/*


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH next] mm/khugepaged: fix conflicting mods to collapse_file()
  2023-04-23  4:47 [PATCH next] mm/khugepaged: fix conflicting mods to collapse_file() Hugh Dickins
@ 2023-04-24  1:53 ` David Stevens
  2023-04-24  2:17   ` Hugh Dickins
  2023-04-24 22:07 ` Andrew Morton
  1 sibling, 1 reply; 5+ messages in thread
From: David Stevens @ 2023-04-24  1:53 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Ivan Orlov, Jiaqi Yan, linux-kernel, linux-mm

On Sun, Apr 23, 2023 at 1:47 PM Hugh Dickins <hughd@google.com> wrote:
>
> Inserting Ivan Orlov's syzbot fix commit 2ce0bdfebc74
> ("mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()")
> ahead of Jiaqi Yan's and David Stevens's commits
> 12904d953364 ("mm/khugepaged: recover from poisoned file-backed memory")
> cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> ac492b9c70ca ("mm/khugepaged: skip shmem with userfaultfd")
> (all of which restructure collapse_file()) did not work out well.
>
> xfstests generic/086 on huge tmpfs (with accelerated khugepaged) freezes
> (if not on the first attempt, then the 2nd or 3rd) in find_lock_entries()
> while doing drop_caches: the file's xarray seems to have been corrupted,
> with find_get_entry() returning nonsense which makes no progress.
>
> Bisection led to ac492b9c70ca; and diff against earlier working linux-next
> suggested that it's probably down to an errant xas_store(), which does not
> belong with the later changes (and nor does the positioning of warnings).
> The later changes look as if they fix the syzbot issue independently.
>
> Remove most of what's left of 2ce0bdfebc74: just leave one WARN_ON_ONCE
> (xas_error) after the final xas_store() of the multi-index entry.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>
>  mm/khugepaged.c | 23 +----------------------
>  1 file changed, 1 insertion(+), 22 deletions(-)
>
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1941,16 +1941,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                                         result = SCAN_FAIL;
>                                         goto xa_locked;
>                                 }
> -                               xas_store(&xas, hpage);
> -                               if (xas_error(&xas)) {
> -                                       /* revert shmem_charge performed
> -                                        * in the previous condition
> -                                        */
> -                                       mapping->nrpages--;
> -                                       shmem_uncharge(mapping->host, 1);
> -                                       result = SCAN_STORE_FAILED;

With this being removed, SCAN_STORE_FAILED should also be removed from
the scan_result enum and trace event definitions.

-David

> -                                       goto xa_locked;
> -                               }
>                                 nr_none++;
>                                 continue;
>                         }
> @@ -2105,13 +2095,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                  * Accumulate the pages that are being collapsed.
>                  */
>                 list_add_tail(&page->lru, &pagelist);
> -
> -               /*
> -                * We can't get an ENOMEM here (because the allocation happened
> -                * before) but let's check for errors (XArray implementation
> -                * can be changed in the future)
> -                */
> -               WARN_ON_ONCE(xas_error(&xas));
>                 continue;
>  out_unlock:
>                 unlock_page(page);
> @@ -2134,11 +2117,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>                 }
>         }
>
> -       /* Here we can't get an ENOMEM (because entries were
> -        * previously allocated) But let's check for errors
> -        * (XArray implementation can be changed in the future)
> -        */
> -       WARN_ON_ONCE(xas_error(&xas));
>  xa_locked:
>         xas_unlock_irq(&xas);
>  xa_unlocked:
> @@ -2259,6 +2237,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>         /* Join all the small entries into a single multi-index entry. */
>         xas_set_order(&xas, start, HPAGE_PMD_ORDER);
>         xas_store(&xas, hpage);
> +       WARN_ON_ONCE(xas_error(&xas));
>         xas_unlock_irq(&xas);
>
>         /*


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH next] mm/khugepaged: fix conflicting mods to collapse_file()
  2023-04-24  1:53 ` David Stevens
@ 2023-04-24  2:17   ` Hugh Dickins
  2023-04-24  2:36     ` David Stevens
  0 siblings, 1 reply; 5+ messages in thread
From: Hugh Dickins @ 2023-04-24  2:17 UTC (permalink / raw)
  To: David Stevens
  Cc: Hugh Dickins, Andrew Morton, Ivan Orlov, Jiaqi Yan, linux-kernel,
	linux-mm

[-- Attachment #1: Type: text/plain, Size: 2929 bytes --]

On Mon, 24 Apr 2023, David Stevens wrote:
> On Sun, Apr 23, 2023 at 1:47 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > Inserting Ivan Orlov's syzbot fix commit 2ce0bdfebc74
> > ("mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()")
> > ahead of Jiaqi Yan's and David Stevens's commits
> > 12904d953364 ("mm/khugepaged: recover from poisoned file-backed memory")
> > cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> > ac492b9c70ca ("mm/khugepaged: skip shmem with userfaultfd")
> > (all of which restructure collapse_file()) did not work out well.
> >
> > xfstests generic/086 on huge tmpfs (with accelerated khugepaged) freezes
> > (if not on the first attempt, then the 2nd or 3rd) in find_lock_entries()
> > while doing drop_caches: the file's xarray seems to have been corrupted,
> > with find_get_entry() returning nonsense which makes no progress.
> >
> > Bisection led to ac492b9c70ca; and diff against earlier working linux-next
> > suggested that it's probably down to an errant xas_store(), which does not
> > belong with the later changes (and nor does the positioning of warnings).
> > The later changes look as if they fix the syzbot issue independently.
> >
> > Remove most of what's left of 2ce0bdfebc74: just leave one WARN_ON_ONCE
> > (xas_error) after the final xas_store() of the multi-index entry.
> >
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> >
> >  mm/khugepaged.c | 23 +----------------------
> >  1 file changed, 1 insertion(+), 22 deletions(-)
> >
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1941,16 +1941,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> >                                         result = SCAN_FAIL;
> >                                         goto xa_locked;
> >                                 }
> > -                               xas_store(&xas, hpage);
> > -                               if (xas_error(&xas)) {
> > -                                       /* revert shmem_charge performed
> > -                                        * in the previous condition
> > -                                        */
> > -                                       mapping->nrpages--;
> > -                                       shmem_uncharge(mapping->host, 1);
> > -                                       result = SCAN_STORE_FAILED;
> 
> With this being removed, SCAN_STORE_FAILED should also be removed from
> the scan_result enum and trace event definitions.

Only if we also remove your use of SCAN_STORE_FAILED in ac492b9c70ca:
what would you want that to say instead?

I don't care myself for any of those "SCAN" result codes, nor whether they
are few or many: I'd rather have __LINE__ numbers for my own debugging.

But if people want to remove SCAN_STORE_FAILED now, sure, send a patch;
my intent was to unbreak the breakage.

Hugh

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH next] mm/khugepaged: fix conflicting mods to collapse_file()
  2023-04-24  2:17   ` Hugh Dickins
@ 2023-04-24  2:36     ` David Stevens
  0 siblings, 0 replies; 5+ messages in thread
From: David Stevens @ 2023-04-24  2:36 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Ivan Orlov, Jiaqi Yan, linux-kernel, linux-mm

On Mon, Apr 24, 2023 at 11:17 AM Hugh Dickins <hughd@google.com> wrote:
>
> On Mon, 24 Apr 2023, David Stevens wrote:
> > On Sun, Apr 23, 2023 at 1:47 PM Hugh Dickins <hughd@google.com> wrote:
> > >
> > > Inserting Ivan Orlov's syzbot fix commit 2ce0bdfebc74
> > > ("mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()")
> > > ahead of Jiaqi Yan's and David Stevens's commits
> > > 12904d953364 ("mm/khugepaged: recover from poisoned file-backed memory")
> > > cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> > > ac492b9c70ca ("mm/khugepaged: skip shmem with userfaultfd")
> > > (all of which restructure collapse_file()) did not work out well.
> > >
> > > xfstests generic/086 on huge tmpfs (with accelerated khugepaged) freezes
> > > (if not on the first attempt, then the 2nd or 3rd) in find_lock_entries()
> > > while doing drop_caches: the file's xarray seems to have been corrupted,
> > > with find_get_entry() returning nonsense which makes no progress.
> > >
> > > Bisection led to ac492b9c70ca; and diff against earlier working linux-next
> > > suggested that it's probably down to an errant xas_store(), which does not
> > > belong with the later changes (and nor does the positioning of warnings).
> > > The later changes look as if they fix the syzbot issue independently.
> > >
> > > Remove most of what's left of 2ce0bdfebc74: just leave one WARN_ON_ONCE
> > > (xas_error) after the final xas_store() of the multi-index entry.
> > >
> > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > ---
> > >
> > >  mm/khugepaged.c | 23 +----------------------
> > >  1 file changed, 1 insertion(+), 22 deletions(-)
> > >
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -1941,16 +1941,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > >                                         result = SCAN_FAIL;
> > >                                         goto xa_locked;
> > >                                 }
> > > -                               xas_store(&xas, hpage);
> > > -                               if (xas_error(&xas)) {
> > > -                                       /* revert shmem_charge performed
> > > -                                        * in the previous condition
> > > -                                        */
> > > -                                       mapping->nrpages--;
> > > -                                       shmem_uncharge(mapping->host, 1);
> > > -                                       result = SCAN_STORE_FAILED;
> >
> > With this being removed, SCAN_STORE_FAILED should also be removed from
> > the scan_result enum and trace event definitions.
>
> Only if we also remove your use of SCAN_STORE_FAILED in ac492b9c70ca:
> what would you want that to say instead?
>
> I don't care myself for any of those "SCAN" result codes, nor whether they
> are few or many: I'd rather have __LINE__ numbers for my own debugging.
>
> But if people want to remove SCAN_STORE_FAILED now, sure, send a patch;
> my intent was to unbreak the breakage.

Oh, I had forgotten that I used the code when I rebased v6 of my
patches. I have no strong opinions one way or the other. Sorry for the
noise.

-David


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH next] mm/khugepaged: fix conflicting mods to collapse_file()
  2023-04-23  4:47 [PATCH next] mm/khugepaged: fix conflicting mods to collapse_file() Hugh Dickins
  2023-04-24  1:53 ` David Stevens
@ 2023-04-24 22:07 ` Andrew Morton
  1 sibling, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2023-04-24 22:07 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Ivan Orlov, Jiaqi Yan, David Stevens, linux-kernel, linux-mm

On Sat, 22 Apr 2023 21:47:20 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:

> Inserting Ivan Orlov's syzbot fix commit 2ce0bdfebc74
> ("mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()")
> ahead of Jiaqi Yan's and David Stevens's commits
> 12904d953364 ("mm/khugepaged: recover from poisoned file-backed memory")
> cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> ac492b9c70ca ("mm/khugepaged: skip shmem with userfaultfd")
> (all of which restructure collapse_file()) did not work out well.
> 
> xfstests generic/086 on huge tmpfs (with accelerated khugepaged) freezes
> (if not on the first attempt, then the 2nd or 3rd) in find_lock_entries()
> while doing drop_caches: the file's xarray seems to have been corrupted,
> with find_get_entry() returning nonsense which makes no progress.
> 
> Bisection led to ac492b9c70ca; and diff against earlier working linux-next
> suggested that it's probably down to an errant xas_store(), which does not
> belong with the later changes (and nor does the positioning of warnings).
> The later changes look as if they fix the syzbot issue independently.
> 
> Remove most of what's left of 2ce0bdfebc74: just leave one WARN_ON_ONCE
> (xas_error) after the final xas_store() of the multi-index entry.
> 

Sigh.  Thanks.  I thought I'd successfully sorted that mess out.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-04-24 22:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-23  4:47 [PATCH next] mm/khugepaged: fix conflicting mods to collapse_file() Hugh Dickins
2023-04-24  1:53 ` David Stevens
2023-04-24  2:17   ` Hugh Dickins
2023-04-24  2:36     ` David Stevens
2023-04-24 22:07 ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox