[bug report] bad error return in walk_hugetlb_range()

* [bug report] bad error return in walk_hugetlb_range()
@ 2025-10-04  6:22 Dan Carpenter
  2025-10-07 10:13 ` David Hildenbrand
From: Dan Carpenter @ 2025-10-04  6:22 UTC
  To: intel-xe, linux-mm

This is really old code.  I think it's a bug in hugetlb.

	drivers/gpu/drm/xe/xe_gt_pagefault.c:353 pf_queue_work_func()
	warn: passing positive error code 's32min-(-12),(-10)-(-1),1' to 'ERR_PTR'

mm/pagewalk.c
	static int walk_hugetlb_range(unsigned long addr, unsigned long end,
				      struct mm_walk *walk)
	{
		struct vm_area_struct *vma = walk->vma;
		struct hstate *h = hstate_vma(vma);
		unsigned long next;
		unsigned long hmask = huge_page_mask(h);
		unsigned long sz = huge_page_size(h);
		pte_t *pte;
		const struct mm_walk_ops *ops = walk->ops;
		int err = 0;

		hugetlb_vma_lock_read(vma);
		do {
			next = hugetlb_entry_end(h, addr, end);
			pte = hugetlb_walk(vma, addr & hmask, sz);
			if (pte)
				err = ops->hugetlb_entry(pte, hmask, addr, next, walk);

The ->hugetlb_entry() callback is implemented by two functions which
return true/false instead of error codes.  Smatch thinks this 1 value
gets propagated back to pf_queue_work_func() and results in an Oops.

The two problem functions are hwpoison_hugetlb_range() and
pagemap_hugetlb_range(), the latter of which returns PM_END_OF_BUFFER
from add_to_pagemap().

			else if (ops->pte_hole)
				err = ops->pte_hole(addr, next, -1, walk);
			if (err)
				break;
		} while (addr = next, addr != end);
		hugetlb_vma_unlock_read(vma);

		return err;
	}
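
For illustration, here is a minimal userspace sketch of why a positive
value reaching ERR_PTR() is a concern.  ERR_PTR(), IS_ERR() and MAX_ERRNO
below are stand-ins modelled on the kernel's include/linux/err.h, and
lookup() and caller_would_dereference() are hypothetical names that do not
come from the xe driver.  The point is only that IS_ERR() recognises
negative errnos, so a pointer built from 1 looks valid to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel helpers; MAX_ERRNO of 4095 is assumed
 * to match include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Only the top MAX_ERRNO addresses count as encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup that turns a walker's int result into a pointer. */
void *lookup(int walk_ret)
{
	static int object;

	if (walk_ret)
		return ERR_PTR(walk_ret);
	return &object;
}

/* True when a caller checking IS_ERR() would go on to use the pointer. */
bool caller_would_dereference(int walk_ret)
{
	return !IS_ERR(lookup(walk_ret));
}

/* Usage: a negative errno is caught, a positive 1 slips through. */
void err_ptr_demo(void)
{
	assert(!caller_would_dereference(-12));	/* -ENOMEM is caught */
	assert(caller_would_dereference(1));	/* (void *)1 looks valid */
}
```

Whether a positive value can actually reach that point depends on what
the callers in between do with it.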

regards,
dan carpenter


* Re: [bug report] bad error return in walk_hugetlb_range()
  2025-10-04  6:22 [bug report] bad error return in walk_hugetlb_range() Dan Carpenter
@ 2025-10-07 10:13 ` David Hildenbrand
  2025-10-07 11:41   ` Dan Carpenter
From: David Hildenbrand @ 2025-10-07 10:13 UTC
  To: Dan Carpenter, intel-xe, linux-mm

On 04.10.25 08:22, Dan Carpenter wrote:
> This is really old code.  I think it's a bug in hugetlb.
> 
> 	drivers/gpu/drm/xe/xe_gt_pagefault.c:353 pf_queue_work_func()
> 	warn: passing positive error code 's32min-(-12),(-10)-(-1),1' to 'ERR_PTR'
> 
> mm/pagewalk.c
> 	static int walk_hugetlb_range(unsigned long addr, unsigned long end,
> 				      struct mm_walk *walk)
> 	{
> 		struct vm_area_struct *vma = walk->vma;
> 		struct hstate *h = hstate_vma(vma);
> 		unsigned long next;
> 		unsigned long hmask = huge_page_mask(h);
> 		unsigned long sz = huge_page_size(h);
> 		pte_t *pte;
> 		const struct mm_walk_ops *ops = walk->ops;
> 		int err = 0;
>
> 		hugetlb_vma_lock_read(vma);
> 		do {
> 			next = hugetlb_entry_end(h, addr, end);
> 			pte = hugetlb_walk(vma, addr & hmask, sz);
> 			if (pte)
> 				err = ops->hugetlb_entry(pte, hmask, addr, next, walk);
> 
> The ->hugetlb_entry() callback is implemented by two functions which
> return true/false instead of error codes.  Smatch thinks this 1 value
> gets propagated back to pf_queue_work_func() and results in an Oops.
> 
> The two problem functions are hwpoison_hugetlb_range() and
> pagemap_hugetlb_range(), the latter of which returns PM_END_OF_BUFFER
> from add_to_pagemap().

hwpoison_hugetlb_range() seems to behave just like hwpoison_pte_range(), 
returning "1" if check_hwpoisoned_entry() returned "1" -- if we found 
the entry with the problematic PFN and can just abort.

Staring at kill_accessing_process() that ends up calling these 
walk-functions, that seems to be correct. The value is converted to 
0/-EHWPOISON, all good.
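
A rough userspace sketch of that pattern, assuming the behaviour described
above: a per-entry callback returns 1 to stop the walk early, and the
top-level caller folds that positive value into 0/-EHWPOISON so it never
escapes.  check_entry(), walk_entries() and kill_accessing_sketch() are
hypothetical names for illustration, not the functions in
mm/memory-failure.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* assumption: asm-generic value, for non-Linux builds */
#endif

/* Hypothetical per-entry callback: 1 means "found the poisoned PFN, stop". */
static int check_entry(unsigned long pfn, unsigned long poisoned_pfn)
{
	return pfn == poisoned_pfn;
}

/* Hypothetical walker: stops at the first nonzero callback return and
 * hands that value back unchanged. */
static int walk_entries(const unsigned long *pfns, size_t n,
			unsigned long poisoned_pfn)
{
	int ret = 0;

	assert(pfns != NULL || n == 0);
	for (size_t i = 0; i < n && !ret; i++)
		ret = check_entry(pfns[i], poisoned_pfn);
	return ret;
}

/* Top-level caller: the positive "found" value is converted here. */
int kill_accessing_sketch(const unsigned long *pfns, size_t n,
			  unsigned long poisoned_pfn)
{
	int ret = walk_entries(pfns, n, poisoned_pfn);

	return ret > 0 ? -EHWPOISON : 0;
}
```

With this shape, callers above kill_accessing_sketch() only ever see 0 or
a negative errno.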


pagemap_hugetlb_range() can indeed return either 0 or PM_END_OF_BUFFER 
obtained from add_to_pagemap().

But that's the same behavior as pagemap_pte_hole()/pagemap_pmd_range(), 
so that's nothing hugetlb-specific.

pagemap_read() does

	ret = walk_page_range(mm, start_vaddr, end, &pagemap_ops, &pm);

After the loop, it does

	if (!ret || ret == PM_END_OF_BUFFER)
		ret = copied;

So I don't immediately see anything wrong with that?
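
As a sketch of why that is harmless, assuming PM_END_OF_BUFFER is 1 as in
the Smatch warning: the positive value only signals "output buffer full"
to stop the walk, and the reader treats it the same as success.
struct out_buf, add_entry() and pagemap_read_sketch() are hypothetical
userspace names, not the code in fs/proc/task_mmu.c.

```c
#include <assert.h>
#include <stddef.h>

#define PM_END_OF_BUFFER 1	/* assumed value: positive "stop, buffer full" */

/* Hypothetical output buffer, loosely modelled on the pagemap reader state. */
struct out_buf {
	unsigned long *data;
	size_t pos;
	size_t len;
};

/* Store one entry; report PM_END_OF_BUFFER once the buffer is full. */
static int add_entry(struct out_buf *b, unsigned long value)
{
	b->data[b->pos++] = value;
	if (b->pos >= b->len)
		return PM_END_OF_BUFFER;
	return 0;
}

/* Walk nr_entries items, stop on any nonzero return, and treat both 0 and
 * PM_END_OF_BUFFER as success by returning the number of entries copied. */
long pagemap_read_sketch(unsigned long *dst, size_t len, size_t nr_entries)
{
	struct out_buf b = { dst, 0, len };
	int ret = 0;

	assert(dst != NULL && len > 0);
	for (size_t i = 0; i < nr_entries && !ret; i++)
		ret = add_entry(&b, i);

	if (!ret || ret == PM_END_OF_BUFFER)
		return (long)b.pos;
	return ret;
}
```

Here a 4-entry buffer walked over 10 items returns 4, and the same buffer
walked over 2 items returns 2; the 1 never reaches the caller.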

-- 
Cheers

David / dhildenb


* Re: [bug report] bad error return in walk_hugetlb_range()
  2025-10-07 10:13 ` David Hildenbrand
@ 2025-10-07 11:41   ` Dan Carpenter
From: Dan Carpenter @ 2025-10-07 11:41 UTC
  To: David Hildenbrand; +Cc: intel-xe, linux-mm

On Tue, Oct 07, 2025 at 12:13:40PM +0200, David Hildenbrand wrote:
> On 04.10.25 08:22, Dan Carpenter wrote:
> > This is really old code.  I think it's a bug in hugetlb.
> > 
> > 	drivers/gpu/drm/xe/xe_gt_pagefault.c:353 pf_queue_work_func()
> > 	warn: passing positive error code 's32min-(-12),(-10)-(-1),1' to 'ERR_PTR'
> > 

Thanks, David.  Yeah.  You're right.  My apologies.  I tracked down the
confusion and this warning is actually because Smatch thinks that
hmm_range_fault() propagates the positive returns from walk_page_range().
But actually walk_page_range() only returns a positive value with certain
flags.

Someone explained this to me in June and I said I would silence the
warning but I forgot...  Ugh...  Sorry.  :(

https://lore.kernel.org/all/aECCaCP3BGGGUUa0@stanley.mountain/

I have done it now, below.

regards,
dan carpenter

From fb706e39230f6f2bc6d68a18837171ea4c1fecc6 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter@linaro.org>
Date: Tue, 7 Oct 2025 14:37:51 +0300
Subject: [PATCH] db/kernel.delete_returns: hmm_range_fault() can't return 1

This is pretty tricky code to read.  It doesn't return 1.  This leads to
error pointer warnings.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
---
 smatch_data/db/kernel.delete.return_states | 1 +
 1 file changed, 1 insertion(+)

diff --git a/smatch_data/db/kernel.delete.return_states b/smatch_data/db/kernel.delete.return_states
index a1b3553a9f03..cfdf252e472c 100644
--- a/smatch_data/db/kernel.delete.return_states
+++ b/smatch_data/db/kernel.delete.return_states
@@ -30,3 +30,4 @@ ubi_find_or_add_av 0
 xe_migrate_copy 0
 scmi_get_or_create_handler 0
 alloc_frame_masks 0
+hmm_range_fault 1
-- 
2.51.0