[PATCH v3] mm: Fix possible NULL pointer dereference in __swap

linux-mm.kvack.org archive mirror

* [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
@ 2025-02-15  9:05 gaoxu
  2025-02-16  1:42 ` Barry Song
  0 siblings, 1 reply; 7+ messages in thread
From: gaoxu @ 2025-02-15  9:05 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Suren Baghdasaryan, Barry Song, Yosry Ahmed, yipengxiang

Add a NULL check on the return value of swp_swap_info in __swap_duplicate
to prevent crashes caused by NULL pointer dereference.

The reason why swp_swap_info() returns NULL is unclear; it may be due to
CPU cache issues or DDR bit flips. The probability of this issue is very
small, and the stack info we encountered is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000058
[RB/E]rb_sreason_str_set: sreason_str set null_pointer
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
[0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Skip md ftrace buffer dump for: 0x1609e0
...
pc : swap_duplicate+0x44/0x164
lr : copy_page_range+0x508/0x1e78
sp : ffffffc0f2a699e0
x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
Call trace:
 swap_duplicate+0x44/0x164
 copy_page_range+0x508/0x1e78
 copy_process+0x1278/0x21cc
 kernel_clone+0x90/0x438
 __arm64_sys_clone+0x5c/0x8c
 invoke_syscall+0x58/0x110
 do_el0_svc+0x8c/0xe0
 el0_svc+0x38/0x9c
 el0t_64_sync_handler+0x44/0xec
 el0t_64_sync+0x1a8/0x1ac
Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs

This patch seems to only provide a workaround, but we know of no more
effective software solution for the bit-flip problem. The patch turns the
failure from a system crash into a process exception, thereby reducing
the impact on the entire machine.

Signed-off-by: gao xu <gaoxu2@honor.com>
---
v1 -> v2: 
- Add WARN_ON_ONCE.
- update the commit info.
v2 -> v3: Delete the review tags (This is my issue, and I apologize).
---

mm/swapfile.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7448a3876..a0bfdba94 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
 	int err, i;
 
 	si = swp_swap_info(entry);
+	if (WARN_ON_ONCE(!si))
+		return -EINVAL;
 
 	offset = swp_offset(entry);
 	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
-- 
2.17.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
  2025-02-15  9:05 [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate gaoxu
@ 2025-02-16  1:42 ` Barry Song
  2025-02-18  2:51   ` Re: " gaoxu
  0 siblings, 1 reply; 7+ messages in thread
From: Barry Song @ 2025-02-16  1:42 UTC (permalink / raw)
  To: gaoxu
  Cc: Andrew Morton, linux-mm, linux-kernel, Suren Baghdasaryan,
	Yosry Ahmed, yipengxiang

On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote:
>
> Add a NULL check on the return value of swp_swap_info in __swap_duplicate
> to prevent crashes caused by NULL pointer dereference.
>
> The reason why swp_swap_info() returns NULL is unclear; it may be due to
> CPU cache issues or DDR bit flips. The probability of this issue is very
> small, and the stack info we encountered is as follows:
> Unable to handle kernel NULL pointer dereference at virtual address
> 0000000000000058
> [RB/E]rb_sreason_str_set: sreason_str set null_pointer
> Mem abort info:
>   ESR = 0x0000000096000005
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x05: level 1 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
> [0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
> pud=0000000000000000
> Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
> Skip md ftrace buffer dump for: 0x1609e0
> ...
> pc : swap_duplicate+0x44/0x164
> lr : copy_page_range+0x508/0x1e78
> sp : ffffffc0f2a699e0
> x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
> Call trace:
>  swap_duplicate+0x44/0x164
>  copy_page_range+0x508/0x1e78

This is really strange since we already have a swap entry check before
calling swap_duplicate().

copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
                pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *dst_vma,
                struct vm_area_struct *src_vma, unsigned long addr, int *rss)
{
        unsigned long vm_flags = dst_vma->vm_flags;
        pte_t orig_pte = ptep_get(src_pte);
        pte_t pte = orig_pte;
        struct folio *folio;
        struct page *page;
        swp_entry_t entry = pte_to_swp_entry(orig_pte);

        if (likely(!non_swap_entry(entry))) {
                if (swap_duplicate(entry) < 0)
                        return -EIO;
...
}

Is swp_type likely >= MAX_SWAPFILES, so that we get NULL?

static struct swap_info_struct *swap_type_to_swap_info(int type)
{
        if (type >= MAX_SWAPFILES)
                return NULL;

        return READ_ONCE(swap_info[type]); /* rcu_dereference() */
}

But non_swap_entry() guarantees that swp_type is smaller than MAX_SWAPFILES.

static inline int non_swap_entry(swp_entry_t entry)
{
        return swp_type(entry) >= MAX_SWAPFILES;
}

So another possibility is that we index swap_info[] with a type that is
< MAX_SWAPFILES but does not correspond to an existing swapfile, so the
slot is still NULL?
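
To illustrate that last possibility, here is a minimal user-space sketch
(not the kernel code). The names model_swap_info, model_non_swap_type(),
model_lookup(), and model_register_first() are hypothetical, and the limit
of 28 is just an example value. It shows a type that passes the
upper-bound check yet still maps to a NULL slot:

```c
#include <stddef.h>

/* Hypothetical stand-in for MAX_SWAPFILES; 28 is only an example value. */
#define MODEL_MAX_SWAPFILES 28

struct model_swap_info_struct {
	int type;
};

/* Static storage: every slot is NULL until a swapfile is registered. */
static struct model_swap_info_struct *model_swap_info[MODEL_MAX_SWAPFILES];

/* Same shape as non_swap_entry(): only the upper bound is checked. */
static int model_non_swap_type(unsigned int type)
{
	return type >= MODEL_MAX_SWAPFILES;
}

/* Same shape as swap_type_to_swap_info(): bounds check, then array load. */
static struct model_swap_info_struct *model_lookup(unsigned int type)
{
	if (type >= MODEL_MAX_SWAPFILES)
		return NULL;
	return model_swap_info[type];
}

/* Register a single swapfile in slot 0, as on a one-swap-device system. */
static void model_register_first(struct model_swap_info_struct *si)
{
	si->type = 0;
	model_swap_info[0] = si;
}
```

After model_register_first(), model_lookup(0) returns the registered
structure, while model_lookup(6) returns NULL even though
model_non_swap_type(6) is false.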

I don't see how the current patch contributes to debugging or fixing
anything related to this dumped stack. Can we dump swp_type() as well?

>  copy_process+0x1278/0x21cc
>  kernel_clone+0x90/0x438
>  __arm64_sys_clone+0x5c/0x8c
>  invoke_syscall+0x58/0x110
>  do_el0_svc+0x8c/0xe0
>  el0_svc+0x38/0x9c
>  el0t_64_sync_handler+0x44/0xec
>  el0t_64_sync+0x1a8/0x1ac
> Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Oops: Fatal exception
> SMP: stopping secondary CPUs
>
> The patch seems to only provide a workaround, but there are no more
> effective software solutions to handle the bit flips problem. This patch
> will change the issue from a system crash to a process exception, thereby
> reducing the impact on the entire machine.
>
> Signed-off-by: gao xu <gaoxu2@honor.com>
> ---
> v1 -> v2:
> - Add WARN_ON_ONCE.
> - update the commit info.
> v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> ---
>
> mm/swapfile.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 7448a3876..a0bfdba94 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
>         int err, i;
>
>         si = swp_swap_info(entry);
> +       if (WARN_ON_ONCE(!si))

I mean, printk something related to swp_type(). This is really strange,
but the current stack won't help with debugging.

> +               return -EINVAL;
>
>         offset = swp_offset(entry);
>         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> --
> 2.17.1

Thanks
Barry



* Re: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
  2025-02-16  1:42 ` Barry Song
@ 2025-02-18  2:51   ` gaoxu
  2025-02-18  5:40     ` Barry Song
  0 siblings, 1 reply; 7+ messages in thread
From: gaoxu @ 2025-02-18  2:51 UTC (permalink / raw)
  To: Barry Song
  Cc: Andrew Morton, linux-mm, linux-kernel, Suren Baghdasaryan,
	Yosry Ahmed, yipengxiang

> 
> On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote:
> >
> > Add a NULL check on the return value of swp_swap_info in
> > __swap_duplicate to prevent crashes caused by NULL pointer dereference.
> >
> > The reason why swp_swap_info() returns NULL is unclear; it may be due
> > to CPU cache issues or DDR bit flips. The probability of this issue is
> > very small, and the stack info we encountered is as follows:
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 0000000000000058
> > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info:
> >   ESR = 0x0000000096000005
> >   EC = 0x25: DABT (current EL), IL = 32 bits
> >   SET = 0, FnV = 0
> >   EA = 0, S1PTW = 0
> >   FSC = 0x05: level 1 translation fault Data abort info:
> >   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages,
> > 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058]
> > pgd=0000000000000000, p4d=0000000000000000,
> > pud=0000000000000000
> > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace
> > buffer dump for: 0x1609e0 ...
> > pc : swap_duplicate+0x44/0x164
> > lr : copy_page_range+0x508/0x1e78
> > sp : ffffffc0f2a699e0
> > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call
> > trace:
> >  swap_duplicate+0x44/0x164
> >  copy_page_range+0x508/0x1e78
> 
> This is really strange since we already have a swap entry check before calling
> swap_duplicate().
> 
> copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>                 pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct
> *dst_vma,
>                 struct vm_area_struct *src_vma, unsigned long addr, int
> *rss) {
>         unsigned long vm_flags = dst_vma->vm_flags;
>         pte_t orig_pte = ptep_get(src_pte);
>         pte_t pte = orig_pte;
>         struct folio *folio;
>         struct page *page;
>         swp_entry_t entry = pte_to_swp_entry(orig_pte);
> 
>         if (likely(!non_swap_entry(entry))) {
>                 if (swap_duplicate(entry) < 0)
>                         return -EIO;
> ...
> }
> 
> likely the swap_type is larger than MAX_SWAPFILES so we get a NULL?
> 
> static struct swap_info_struct *swap_type_to_swap_info(int type) {
>         if (type >= MAX_SWAPFILES)
>                 return NULL;
> 
>         return READ_ONCE(swap_info[type]); /* rcu_dereference() */ }
> 
> But non_swap_entry() guarantees that swp_type is smaller than
> MAX_SWAPFILES.
> 
> static inline int non_swap_entry(swp_entry_t entry) {
>         return swp_type(entry) >= MAX_SWAPFILES; }
> 
> So another possibility is that we have an overflow of swap_info[] where type is <
> MAX_SWAPFILES but is not a valid existing swapfile?
The log for this issue contains the following line:
get_swap_device: Bad swap file entry 18000000002d2d2f
From this value, swp_type(0x18000000002d2d2f) = 6.
On this Android 15 system (Linux 6.6), MAX_SWAPFILES = 28 and
nr_swapfiles = 1.
Since the type, 6, is smaller than MAX_SWAPFILES but not smaller than
nr_swapfiles, the value of this entry is abnormal.
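
The calculation can be reproduced with a small user-space sketch. It
assumes the generic 64-bit swp_entry_t layout, in which the type sits
above a shift of 63 - 5 = 58 bits; the MODEL_* macros and model_* helpers
are made-up names for this example, not kernel identifiers:

```c
#include <stdint.h>

/* Assumed to mirror the generic swp_entry_t layout on a 64-bit kernel. */
#define MODEL_MAX_SWAPFILES_SHIFT 5
#define MODEL_BITS_PER_XA_VALUE   63
#define MODEL_SWP_TYPE_SHIFT \
	(MODEL_BITS_PER_XA_VALUE - MODEL_MAX_SWAPFILES_SHIFT)
#define MODEL_SWP_OFFSET_MASK \
	((UINT64_C(1) << MODEL_SWP_TYPE_SHIFT) - 1)

/* The swap type lives in the bits above the offset field. */
static unsigned int model_swp_type(uint64_t val)
{
	return (unsigned int)(val >> MODEL_SWP_TYPE_SHIFT);
}

/* The swap offset is everything below the type field. */
static uint64_t model_swp_offset(uint64_t val)
{
	return val & MODEL_SWP_OFFSET_MASK;
}
```

With val = 0x18000000002d2d2f, model_swp_type() gives 6 and
model_swp_offset() gives 0x2d2d2f. These appear to match x25/x9 (6) and
x20/x23 (0x2d2d2f) in the register dump.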

static unsigned int nr_swapfiles;
static struct swap_info_struct *swap_info[MAX_SWAPFILES];

swap_info is a static array, so its elements are initialized to NULL.
The array has MAX_SWAPFILES slots, but the number of valid entries in it
is nr_swapfiles. Therefore, when we validate swp_type(entry), we should
compare it with nr_swapfiles, not MAX_SWAPFILES.
The code for validating swp_type may need to be modified as follows:
static inline int non_swap_entry(swp_entry_t entry)
{
-	return swp_type(entry) >= MAX_SWAPFILES;
+	return swp_type(entry) >= nr_swapfiles;
}

static struct swap_info_struct *swap_type_to_swap_info(int type)
{
-	if (type >= MAX_SWAPFILES)
+	if (type >= nr_swapfiles)
		return NULL;

	return READ_ONCE(swap_info[type]); /* rcu_dereference() */
}
> 
> I don't see how the current patch contributes to debugging or fixing anything
> related to this dumped stack. Can we dump swp_type() as well?
> 
> >  copy_process+0x1278/0x21cc
> >  kernel_clone+0x90/0x438
> >  __arm64_sys_clone+0x5c/0x8c
> >  invoke_syscall+0x58/0x110
> >  do_el0_svc+0x8c/0xe0
> >  el0_svc+0x38/0x9c
> >  el0t_64_sync_handler+0x44/0xec
> >  el0t_64_sync+0x1a8/0x1ac
> > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end trace
> > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal
> > exception
> > SMP: stopping secondary CPUs
> >
> > The patch seems to only provide a workaround, but there are no more
> > effective software solutions to handle the bit flips problem. This
> > patch will change the issue from a system crash to a process exception,
> > thereby reducing the impact on the entire machine.
> >
> > Signed-off-by: gao xu <gaoxu2@honor.com>
> > ---
> > v1 -> v2:
> > - Add WARN_ON_ONCE.
> > - update the commit info.
> > v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> > ---
> >
> > mm/swapfile.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c index 7448a3876..a0bfdba94
> > 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry,
> unsigned char usage, int nr)
> >         int err, i;
> >
> >         si = swp_swap_info(entry);
> > +       if (WARN_ON_ONCE(!si))
> 
> I mean, printk something related to swp_type(). This is really strange, but the
> current stack won't help with debugging.
When an entry is bad, the log already contains a line like
"get_swap_device: Bad swap file entry xxx".
We could add a log message like the following:
pr_err("Bad swap type %u\n", swp_type(entry));
> 
> > +               return -EINVAL;
> >
> >         offset = swp_offset(entry);
> >         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset %
> SWAPFILE_CLUSTER);
> > --
> > 2.17.1
> 
> Thanks
> Barry


* Re: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
  2025-02-18  2:51   ` Re: " gaoxu
@ 2025-02-18  5:40     ` Barry Song
  2025-02-18  7:13       ` Re: " gaoxu
  0 siblings, 1 reply; 7+ messages in thread
From: Barry Song @ 2025-02-18  5:40 UTC (permalink / raw)
  To: gaoxu
  Cc: Andrew Morton, linux-mm, linux-kernel, Suren Baghdasaryan,
	Yosry Ahmed, yipengxiang

Thank you!

On Tue, Feb 18, 2025 at 3:51 PM gaoxu <gaoxu2@honor.com> wrote:
>
> >
> > On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote:
> > >
> > > Add a NULL check on the return value of swp_swap_info in
> > > __swap_duplicate to prevent crashes caused by NULL pointer dereference.
> > >
> > > The reason why swp_swap_info() returns NULL is unclear; it may be due
> > > to CPU cache issues or DDR bit flips. The probability of this issue is
> > > very small, and the stack info we encountered is as follows:
> > > Unable to handle kernel NULL pointer dereference at virtual address
> > > 0000000000000058
> > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info:
> > >   ESR = 0x0000000096000005
> > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > >   SET = 0, FnV = 0
> > >   EA = 0, S1PTW = 0
> > >   FSC = 0x05: level 1 translation fault Data abort info:
> > >   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages,
> > > 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058]
> > > pgd=0000000000000000, p4d=0000000000000000,
> > > pud=0000000000000000
> > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace
> > > buffer dump for: 0x1609e0 ...
> > > pc : swap_duplicate+0x44/0x164
> > > lr : copy_page_range+0x508/0x1e78
> > > sp : ffffffc0f2a699e0
> > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call
> > > trace:
> > >  swap_duplicate+0x44/0x164
> > >  copy_page_range+0x508/0x1e78
> >
> > This is really strange since we already have a swap entry check before calling
> > swap_duplicate().
> >
> > copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >                 pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct
> > *dst_vma,
> >                 struct vm_area_struct *src_vma, unsigned long addr, int
> > *rss) {
> >         unsigned long vm_flags = dst_vma->vm_flags;
> >         pte_t orig_pte = ptep_get(src_pte);
> >         pte_t pte = orig_pte;
> >         struct folio *folio;
> >         struct page *page;
> >         swp_entry_t entry = pte_to_swp_entry(orig_pte);
> >
> >         if (likely(!non_swap_entry(entry))) {
> >                 if (swap_duplicate(entry) < 0)
> >                         return -EIO;
> > ...
> > }
> >
> > likely the swap_type is larger than MAX_SWAPFILES so we get a NULL?
> >
> > static struct swap_info_struct *swap_type_to_swap_info(int type) {
> >         if (type >= MAX_SWAPFILES)
> >                 return NULL;
> >
> >         return READ_ONCE(swap_info[type]); /* rcu_dereference() */ }
> >
> > But non_swap_entry() guarantees that swp_type is smaller than
> > MAX_SWAPFILES.
> >
> > static inline int non_swap_entry(swp_entry_t entry) {
> >         return swp_type(entry) >= MAX_SWAPFILES; }
> >
> > So another possibility is that we have an overflow of swap_info[] where type is <
> > MAX_SWAPFILES but is not a valid existing swapfile?
> In the log of this issue, there is a printed entry: get_swap_device:
> Bad swap file entry 18000000002d2d2f.
> It can be calculated that swp_type(18000000002d2d2f) = 6.
> In the Android 15-linux6.6:
> system: MAX_SWAPFILES = 28, nr_swapfiles = 1.
> Since swp_type(18000000002d2d2f)=6 is less than MAX_SWAPFILES but greater
> than nr_swapfiles, the value of this entry is abnormal.
>
> static unsigned int nr_swapfiles;
> static struct swap_info_struct *swap_info[MAX_SWAPFILES];
> swap_info is a static array, with its values initialized to 0.
> The size of the array is MAX_SWAPFILES, and the size of valid values in the array is
> nr_swapfiles. Therefore, when we validate the validity of swp_type(entry),
> we should compare it with nr_swapfiles, not MAX_SWAPFILES.
> The code for validating swp_type may need to be modified as follows:

That might be true, but on a normal system, we only need to distinguish
between a swap entry and a migrate entry. Therefore, comparing with
MAX_SWAPFILES is sufficient.

> static inline int non_swap_entry(swp_entry_t entry)
> {
> -       return swp_type(entry) >= MAX_SWAPFILES;
> +       return swp_type(entry) >= nr_swapfiles;
> }
>
> static struct swap_info_struct *swap_type_to_swap_info(int type)
> {
> -       if (type >= MAX_SWAPFILES)
> +       if (type >= nr_swapfiles)
>                 return NULL;
>
>         return READ_ONCE(swap_info[type]); /* rcu_dereference() */
> }
> >
> > I don't see how the current patch contributes to debugging or fixing anything
> > related to this dumped stack. Can we dump swp_type() as well?
> >
> > >  copy_process+0x1278/0x21cc
> > >  kernel_clone+0x90/0x438
> > >  __arm64_sys_clone+0x5c/0x8c
> > >  invoke_syscall+0x58/0x110
> > >  do_el0_svc+0x8c/0xe0
> > >  el0_svc+0x38/0x9c
> > >  el0t_64_sync_handler+0x44/0xec
> > >  el0t_64_sync+0x1a8/0x1ac
> > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end trace
> > > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal
> > > exception
> > > SMP: stopping secondary CPUs
> > >
> > > The patch seems to only provide a workaround, but there are no more
> > > effective software solutions to handle the bit flips problem. This
> > > patch will change the issue from a system crash to a process exception,
> > > thereby reducing the impact on the entire machine.
> > >
> > > Signed-off-by: gao xu <gaoxu2@honor.com>
> > > ---
> > > v1 -> v2:
> > > - Add WARN_ON_ONCE.
> > > - update the commit info.
> > > v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> > > ---
> > >
> > > mm/swapfile.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c index 7448a3876..a0bfdba94
> > > 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry,
> > unsigned char usage, int nr)
> > >         int err, i;
> > >
> > >         si = swp_swap_info(entry);
> > > +       if (WARN_ON_ONCE(!si))
> >
> > I mean, printk something related to swp_type(). This is really strange, but the
> > current stack won't help with debugging.
> The log can find info related to "get_swap_device: Bad swap file entry xxx"
> when an entry encounters an exception.
> Add a print info log like the following:
> pr_err("Bad swap type %u\n", swp_type(entry));

This is really strange. It would be better to have the entire PTE value
dumped so we can determine if a bit-flip occurred on critical bits like
PTE_PRESENT.

In that case, a present PTE could be misinterpreted as a swap entry.

On arm64,
/*
 * Encode and decode a swap entry:
 *      bits 0-1:       present (must be zero)
 *      bits 2:         remember PG_anon_exclusive
 *      bits 3-7:       swap type
 *      bits 8-57:      swap offset
 *      bit  58:        PTE_PROT_NONE (must be zero)
 */

#define __SWP_TYPE_SHIFT        3
#define __SWP_TYPE_BITS         5
#define __SWP_OFFSET_BITS       50
#define __SWP_TYPE_MASK         ((1 << __SWP_TYPE_BITS) - 1)
#define __SWP_OFFSET_SHIFT      (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
#define __SWP_OFFSET_MASK       ((1UL << __SWP_OFFSET_BITS) - 1)

__swp_type is bits 3-7.

For a present PTE, bits 3-7 are:
AP[7-6], NS[5], AttributeIndex[4-2].
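
As a rough user-space illustration of the layout above (a sketch, not
the arm64 implementation), the macros can be mirrored under A64_-prefixed
names, which are made up for this example. The helpers a64_swp_pte(),
a64_swp_type(), a64_swp_offset(), and a64_looks_like_swap() are likewise
hypothetical, and the "present" test only looks at bits 0-1 as the
comment describes:

```c
#include <stdint.h>

/* User-space copies of the layout quoted above, under hypothetical names. */
#define A64_SWP_TYPE_SHIFT   3
#define A64_SWP_TYPE_BITS    5
#define A64_SWP_OFFSET_BITS  50
#define A64_SWP_TYPE_MASK    ((UINT64_C(1) << A64_SWP_TYPE_BITS) - 1)
#define A64_SWP_OFFSET_SHIFT (A64_SWP_TYPE_BITS + A64_SWP_TYPE_SHIFT)
#define A64_SWP_OFFSET_MASK  ((UINT64_C(1) << A64_SWP_OFFSET_BITS) - 1)
#define A64_PRESENT_MASK     UINT64_C(0x3)	/* bits 0-1 must be zero */

/* Pack a swap type and offset into the PTE bit positions. */
static uint64_t a64_swp_pte(unsigned int type, uint64_t offset)
{
	return ((type & A64_SWP_TYPE_MASK) << A64_SWP_TYPE_SHIFT) |
	       ((offset & A64_SWP_OFFSET_MASK) << A64_SWP_OFFSET_SHIFT);
}

/* Extract bits 3-7. */
static unsigned int a64_swp_type(uint64_t pte)
{
	return (unsigned int)((pte >> A64_SWP_TYPE_SHIFT) & A64_SWP_TYPE_MASK);
}

/* Extract bits 8-57. */
static uint64_t a64_swp_offset(uint64_t pte)
{
	return (pte >> A64_SWP_OFFSET_SHIFT) & A64_SWP_OFFSET_MASK;
}

/* Non-empty and both low bits clear: decoded as a swap-style entry. */
static int a64_looks_like_swap(uint64_t pte)
{
	return pte != 0 && (pte & A64_PRESENT_MASK) == 0;
}
```

Encoding type 6 with offset 0x2d2d2f yields 0x2d2d2f30. Any non-zero
value whose two low bits are clear passes a64_looks_like_swap(), so a
present PTE that lost those bits would be decoded the same way, with
whatever happens to be in bits 3-7 taken as the swap type.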

> >
> > > +               return -EINVAL;
> > >
> > >         offset = swp_offset(entry);
> > >         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset %
> > SWAPFILE_CLUSTER);
> > > --
> > > 2.17.1

Thanks
Barry



* Re: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
  2025-02-18  5:40     ` Barry Song
@ 2025-02-18  7:13       ` gaoxu
  2025-02-18  9:06         ` Barry Song
  0 siblings, 1 reply; 7+ messages in thread
From: gaoxu @ 2025-02-18  7:13 UTC (permalink / raw)
  To: Barry Song
  Cc: Andrew Morton, linux-mm, linux-kernel, Suren Baghdasaryan,
	Yosry Ahmed, yipengxiang

> 
> Thank you!
> 
> On Tue, Feb 18, 2025 at 3:51 PM gaoxu <gaoxu2@honor.com> wrote:
> >
> > >
> > > On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote:
> > > >
> > > > Add a NULL check on the return value of swp_swap_info in
> > > > __swap_duplicate to prevent crashes caused by NULL pointer
> dereference.
> > > >
> > > > The reason why swp_swap_info() returns NULL is unclear; it may be
> > > > due to CPU cache issues or DDR bit flips. The probability of this
> > > > issue is very small, and the stack info we encountered is as
> > > > follows:
> > > > Unable to handle kernel NULL pointer dereference at virtual
> > > > address
> > > > 0000000000000058
> > > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info:
> > > >   ESR = 0x0000000096000005
> > > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > > >   SET = 0, FnV = 0
> > > >   EA = 0, S1PTW = 0
> > > >   FSC = 0x05: level 1 translation fault Data abort info:
> > > >   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > > >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k
> > > > pages, 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058]
> > > > pgd=0000000000000000, p4d=0000000000000000,
> > > > pud=0000000000000000
> > > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md
> > > > ftrace buffer dump for: 0x1609e0 ...
> > > > pc : swap_duplicate+0x44/0x164
> > > > lr : copy_page_range+0x508/0x1e78
> > > > sp : ffffffc0f2a699e0
> > > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > > > x23: 00000000002d2d2f x22: 0000000000000008 x21:
> 0000000000000000
> > > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > > > x17: 0000000000000000 x16: 0010000000000001 x15:
> 0040000000000001
> > > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
> > > > Call
> > > > trace:
> > > >  swap_duplicate+0x44/0x164
> > > >  copy_page_range+0x508/0x1e78
> > >
> > > This is really strange since we already have a swap entry check
> > > before calling swap_duplicate().
> > >
> > > copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct
> *src_mm,
> > >                 pte_t *dst_pte, pte_t *src_pte, struct
> > > vm_area_struct *dst_vma,
> > >                 struct vm_area_struct *src_vma, unsigned long addr,
> > > int
> > > *rss) {
> > >         unsigned long vm_flags = dst_vma->vm_flags;
> > >         pte_t orig_pte = ptep_get(src_pte);
> > >         pte_t pte = orig_pte;
> > >         struct folio *folio;
> > >         struct page *page;
> > >         swp_entry_t entry = pte_to_swp_entry(orig_pte);
> > >
> > >         if (likely(!non_swap_entry(entry))) {
> > >                 if (swap_duplicate(entry) < 0)
> > >                         return -EIO; ...
> > > }
> > >
> > > likely the swap_type is larger than MAX_SWAPFILES so we get a NULL?
> > >
> > > static struct swap_info_struct *swap_type_to_swap_info(int type) {
> > >         if (type >= MAX_SWAPFILES)
> > >                 return NULL;
> > >
> > >         return READ_ONCE(swap_info[type]); /* rcu_dereference() */ }
> > >
> > > But non_swap_entry() guarantees that swp_type is smaller than
> > > MAX_SWAPFILES.
> > >
> > > static inline int non_swap_entry(swp_entry_t entry) {
> > >         return swp_type(entry) >= MAX_SWAPFILES; }
> > >
> > > So another possibility is that we have an overflow of swap_info[]
> > > where type is < MAX_SWAPFILES but is not a valid existing swapfile?
> > In the log of this issue, there is a printed entry: get_swap_device:
> > Bad swap file entry 18000000002d2d2f.
> > It can be calculated that swp_type(18000000002d2d2f) = 6.
> > In the Android 15-linux6.6:
> > system: MAX_SWAPFILES = 28, nr_swapfiles = 1.
> > Since swp_type(18000000002d2d2f)=6 is less than MAX_SWAPFILES but
> > greater than nr_swapfiles, the value of this entry is abnormal.
> >
> > static unsigned int nr_swapfiles;
> > static struct swap_info_struct *swap_info[MAX_SWAPFILES]; swap_info is
> > a static array, with its values initialized to 0.
> > The size of the array is MAX_SWAPFILES, and the size of valid values
> > in the array is nr_swapfiles. Therefore, when we validate the validity
> > of swp_type(entry), we should compare it with nr_swapfiles, not
> MAX_SWAPFILES.
> > The code for validating swp_type may need to be modified as follows:
> 
> That might be true, but on a normal system, we only need to distinguish
> between a swap entry and a migrate entry. Therefore, comparing with
> MAX_SWAPFILES is sufficient.
> 
> > static inline int non_swap_entry(swp_entry_t entry) {
> > -       return swp_type(entry) >= MAX_SWAPFILES;
> > +       return swp_type(entry) >= nr_swapfiles;
> > }
> >
> > static struct swap_info_struct *swap_type_to_swap_info(int type) {
> > -       if (type >= MAX_SWAPFILES)
> > +       if (type >= nr_swapfiles)
> >                 return NULL;
> >
> >         return READ_ONCE(swap_info[type]); /* rcu_dereference() */ }
> > >
> > > I don't see how the current patch contributes to debugging or fixing
> > > anything related to this dumped stack. Can we dump swp_type() as well?
> > >
> > > >  copy_process+0x1278/0x21cc
> > > >  kernel_clone+0x90/0x438
> > > >  __arm64_sys_clone+0x5c/0x8c
> > > >  invoke_syscall+0x58/0x110
> > > >  do_el0_svc+0x8c/0xe0
> > > >  el0_svc+0x38/0x9c
> > > >  el0t_64_sync_handler+0x44/0xec
> > > >  el0t_64_sync+0x1a8/0x1ac
> > > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end
> > > > trace
> > > > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal
> > > > exception
> > > > SMP: stopping secondary CPUs
> > > >
> > > > The patch seems to only provide a workaround, but there are no
> > > > more effective software solutions to handle the bit flips problem.
> > > > This patch will change the issue from a system crash to a process
> > > > exception, thereby reducing the impact on the entire machine.
> > > >
> > > > Signed-off-by: gao xu <gaoxu2@honor.com>
> > > > ---
> > > > v1 -> v2:
> > > > - Add WARN_ON_ONCE.
> > > > - update the commit info.
> > > > v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> > > > ---
> > > >
> > > > mm/swapfile.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > index 7448a3876..a0bfdba94 100644
> > > > > --- a/mm/swapfile.c
> > > > > +++ b/mm/swapfile.c
> > > > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> > > >         int err, i;
> > > >
> > > >         si = swp_swap_info(entry);
> > > > +       if (WARN_ON_ONCE(!si))
> > >
> > > I mean, printk something related to swp_type(). This is really
> > > strange, but the current stack won't help with debugging.
> > The log can find info related to "get_swap_device: Bad swap file entry xxx"
> > when an entry encounters an exception.
> > Add a print info log like the following:
> > > pr_err("%s%08u\n", "Bad swap type ", swp_type(entry));
> 
> This is really strange. It would be better to have the entire PTE value dumped so
> we can determine if a bit-flip occurred on critical bits like PTE_PRESENT.
Do you mean converting the swap entry to a PTE and then printing it?
pr_err("%s%016llx\n", "Bad pte ",
       (unsigned long long)pte_val(swp_entry_to_pte(entry)));

Or is it sufficient to print the swap entry directly?
pr_err("%s%08lx\n", "Bad swap entry ", entry.val);
> 
> In that case, a present PTE could be misinterpreted as a swap entry.
> 
> On arm64,
> /*
>  * Encode and decode a swap entry:
>  *      bits 0-1:       present (must be zero)
>  *      bits 2:         remember PG_anon_exclusive
>  *      bits 3-7:       swap type
>  *      bits 8-57:      swap offset
>  *      bit  58:        PTE_PROT_NONE (must be zero)
>  */
> 
> #define __SWP_TYPE_SHIFT        3
> #define __SWP_TYPE_BITS         5
> #define __SWP_OFFSET_BITS       50
> #define __SWP_TYPE_MASK         ((1 << __SWP_TYPE_BITS) - 1)
> #define __SWP_OFFSET_SHIFT      (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
> #define __SWP_OFFSET_MASK       ((1UL << __SWP_OFFSET_BITS) - 1)
> 
> _swp_type is bits3-7.
> 
> For a present pte,  bits 3-7 are:
> AP[7-6], NS[5], AttributeIndex[4-2].
> 
> > >
> > > > +               return -EINVAL;
> > > >
> > > >         offset = swp_offset(entry);
> > > > >         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> > > > --
> > > > 2.17.1
> 
> Thanks
> Barry


* Re: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
  2025-02-18  7:13       ` Re: " gaoxu
@ 2025-02-18  9:06         ` Barry Song
  0 siblings, 0 replies; 7+ messages in thread
From: Barry Song @ 2025-02-18  9:06 UTC (permalink / raw)
  To: gaoxu
  Cc: Andrew Morton, linux-mm, linux-kernel, Suren Baghdasaryan,
	Yosry Ahmed, yipengxiang

On Tue, Feb 18, 2025 at 8:13 PM gaoxu <gaoxu2@honor.com> wrote:
>
> >
> > Thank you!
> >
> > On Tue, Feb 18, 2025 at 3:51 PM gaoxu <gaoxu2@honor.com> wrote:
> > >
> > > >
> > > > On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote:
> > > > >
> > > > > > Add a NULL check on the return value of swp_swap_info in
> > > > > > __swap_duplicate to prevent crashes caused by NULL pointer
> > > > > > dereference.
> > > > >
> > > > > The reason why swp_swap_info() returns NULL is unclear; it may be
> > > > > due to CPU cache issues or DDR bit flips. The probability of this
> > > > > issue is very small, and the stack info we encountered is as
> > > > > > follows:
> > > > > > Unable to handle kernel NULL pointer dereference at virtual address
> > > > > > 0000000000000058
> > > > > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer
> > > > > > Mem abort info:
> > > > > >   ESR = 0x0000000096000005
> > > > > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > > >   SET = 0, FnV = 0
> > > > > >   EA = 0, S1PTW = 0
> > > > > >   FSC = 0x05: level 1 translation fault
> > > > > > Data abort info:
> > > > > >   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > > > > >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > > > >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > > > > > user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
> > > > > > [0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
> > > > > > pud=0000000000000000
> > > > > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
> > > > > > Skip md ftrace buffer dump for: 0x1609e0
> > > > > > ...
> > > > > pc : swap_duplicate+0x44/0x164
> > > > > lr : copy_page_range+0x508/0x1e78
> > > > > sp : ffffffc0f2a699e0
> > > > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > > > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > > > > > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> > > > > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > > > > > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> > > > > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > > > > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > > > > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > > > > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > > > > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
> > > > > > Call trace:
> > > > >  swap_duplicate+0x44/0x164
> > > > >  copy_page_range+0x508/0x1e78
> > > >
> > > > This is really strange since we already have a swap entry check
> > > > before calling swap_duplicate().
> > > >
> > > > > copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > > > >                 pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *dst_vma,
> > > > >                 struct vm_area_struct *src_vma, unsigned long addr, int *rss)
> > > > > {
> > > > >         unsigned long vm_flags = dst_vma->vm_flags;
> > > > >         pte_t orig_pte = ptep_get(src_pte);
> > > > >         pte_t pte = orig_pte;
> > > > >         struct folio *folio;
> > > > >         struct page *page;
> > > > >         swp_entry_t entry = pte_to_swp_entry(orig_pte);
> > > > >
> > > > >         if (likely(!non_swap_entry(entry))) {
> > > > >                 if (swap_duplicate(entry) < 0)
> > > > >                         return -EIO;
> > > > >         ...
> > > > > }
> > > >
> > > > likely the swap_type is larger than MAX_SWAPFILES so we get a NULL?
> > > >
> > > > > static struct swap_info_struct *swap_type_to_swap_info(int type)
> > > > > {
> > > > >         if (type >= MAX_SWAPFILES)
> > > > >                 return NULL;
> > > > >
> > > > >         return READ_ONCE(swap_info[type]); /* rcu_dereference() */
> > > > > }
> > > >
> > > > But non_swap_entry() guarantees that swp_type is smaller than
> > > > MAX_SWAPFILES.
> > > >
> > > > > static inline int non_swap_entry(swp_entry_t entry)
> > > > > {
> > > > >         return swp_type(entry) >= MAX_SWAPFILES;
> > > > > }
> > > >
> > > > So another possibility is that we have an overflow of swap_info[]
> > > > where type is < MAX_SWAPFILES but is not a valid existing swapfile?
> > > In the log of this issue, there is a printed entry: get_swap_device:
> > > Bad swap file entry 18000000002d2d2f.
> > > It can be calculated that swp_type(18000000002d2d2f) = 6.
> > > In the Android 15-linux6.6:
> > > system: MAX_SWAPFILES = 28, nr_swapfiles = 1.
> > > Since swp_type(18000000002d2d2f)=6 is less than MAX_SWAPFILES but
> > > greater than nr_swapfiles, the value of this entry is abnormal.
> > >
> > > > static unsigned int nr_swapfiles;
> > > > static struct swap_info_struct *swap_info[MAX_SWAPFILES];
> > > > swap_info is a static array, with its values initialized to 0.
> > > > The size of the array is MAX_SWAPFILES, and the number of valid entries
> > > > in the array is nr_swapfiles. Therefore, when we check the validity
> > > > of swp_type(entry), we should compare it with nr_swapfiles, not
> > > > MAX_SWAPFILES.
> > > The code for validating swp_type may need to be modified as follows:
> >
> > That might be true, but on a normal system, we only need to distinguish
> > between a swap entry and a migrate entry. Therefore, comparing with
> > MAX_SWAPFILES is sufficient.
> >
> > > > static inline int non_swap_entry(swp_entry_t entry)
> > > > {
> > > > -       return swp_type(entry) >= MAX_SWAPFILES;
> > > > +       return swp_type(entry) >= nr_swapfiles;
> > > > }
> > > >
> > > > static struct swap_info_struct *swap_type_to_swap_info(int type)
> > > > {
> > > > -       if (type >= MAX_SWAPFILES)
> > > > +       if (type >= nr_swapfiles)
> > > >                 return NULL;
> > > >
> > > >         return READ_ONCE(swap_info[type]); /* rcu_dereference() */
> > > > }
> > > >
> > > > I don't see how the current patch contributes to debugging or fixing
> > > > anything related to this dumped stack. Can we dump swp_type() as well?
> > > >
> > > > >  copy_process+0x1278/0x21cc
> > > > >  kernel_clone+0x90/0x438
> > > > >  __arm64_sys_clone+0x5c/0x8c
> > > > >  invoke_syscall+0x58/0x110
> > > > >  do_el0_svc+0x8c/0xe0
> > > > >  el0_svc+0x38/0x9c
> > > > >  el0t_64_sync_handler+0x44/0xec
> > > > >  el0t_64_sync+0x1a8/0x1ac
> > > > > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
> > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > Kernel panic - not syncing: Oops: Fatal exception
> > > > > > SMP: stopping secondary CPUs
> > > > >
> > > > > The patch seems to only provide a workaround, but there are no
> > > > > more effective software solutions to handle the bit flips problem.
> > > > > > This patch will change the issue from a system crash to a process
> > > > > exception, thereby reducing the impact on the entire machine.
> > > > >
> > > > > Signed-off-by: gao xu <gaoxu2@honor.com>
> > > > > ---
> > > > > v1 -> v2:
> > > > > - Add WARN_ON_ONCE.
> > > > > - update the commit info.
> > > > > v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> > > > > ---
> > > > >
> > > > > mm/swapfile.c | 2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > > >
> > > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > > index 7448a3876..a0bfdba94 100644
> > > > > > --- a/mm/swapfile.c
> > > > > > +++ b/mm/swapfile.c
> > > > > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> > > > >         int err, i;
> > > > >
> > > > >         si = swp_swap_info(entry);
> > > > > +       if (WARN_ON_ONCE(!si))
> > > >
> > > > I mean, printk something related to swp_type(). This is really
> > > > strange, but the current stack won't help with debugging.
> > > The log can find info related to "get_swap_device: Bad swap file entry xxx"
> > > when an entry encounters an exception.
> > > Add a print info log like the following:
> > > > pr_err("%s%08u\n", "Bad swap type ", swp_type(entry));
> >
> > This is really strange. It would be better to have the entire PTE value dumped so
> > we can determine if a bit-flip occurred on critical bits like PTE_PRESENT.
> Do you mean converting the swap entry to a PTE and then printing it?
> pr_err("%s%016llx\n", "Bad pte ",
>        (unsigned long long)pte_val(swp_entry_to_pte(entry)));
>
> Or is it sufficient to print the swap entry directly?
> pr_err("%s%08lx\n", "Bad swap entry ", entry.val);

Yes, I think so. With that, we can convert it to PTE offline and debug using
that value.
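
The offline conversion could look roughly like the user-space sketch below.
It is only an illustration: the generic swp_entry_t shift of 58 is an
assumption for a 64-bit kernel (check include/linux/swapops.h in the tree
being debugged), the arm64 field positions are the ones quoted in this
thread, and the ARM64_* names and helper functions are made up for the
example.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Generic swp_entry_t layout on a 64-bit kernel. ASSUMPTION: the swap type
 * starts at bit 58 (63 value bits minus a 5-bit type field).
 */
#define SWP_TYPE_SHIFT          58
#define SWP_OFFSET_MASK         ((UINT64_C(1) << SWP_TYPE_SHIFT) - 1)

/* arm64 swap PTE layout, values taken from the macros quoted in the thread. */
#define ARM64_SWP_TYPE_SHIFT    3
#define ARM64_SWP_TYPE_BITS     5
#define ARM64_SWP_OFFSET_BITS   50
#define ARM64_SWP_TYPE_MASK     ((UINT64_C(1) << ARM64_SWP_TYPE_BITS) - 1)
#define ARM64_SWP_OFFSET_SHIFT  (ARM64_SWP_TYPE_BITS + ARM64_SWP_TYPE_SHIFT)
#define ARM64_SWP_OFFSET_MASK   ((UINT64_C(1) << ARM64_SWP_OFFSET_BITS) - 1)

/* Swap type stored in the top bits of entry.val. */
static unsigned int entry_type(uint64_t val)
{
	return (unsigned int)(val >> SWP_TYPE_SHIFT);
}

/* Swap offset stored in the low bits of entry.val. */
static uint64_t entry_offset(uint64_t val)
{
	return val & SWP_OFFSET_MASK;
}

/* Re-encode the entry the way an arm64 swap PTE would hold it. */
static uint64_t entry_to_arm64_pte(uint64_t val)
{
	uint64_t type = entry_type(val) & ARM64_SWP_TYPE_MASK;
	uint64_t offset = entry_offset(val) & ARM64_SWP_OFFSET_MASK;

	return (type << ARM64_SWP_TYPE_SHIFT) |
	       (offset << ARM64_SWP_OFFSET_SHIFT);
}

/* Print the decoded fields for one logged entry value. */
static void print_entry(uint64_t val)
{
	uint64_t pte = entry_to_arm64_pte(val);

	printf("entry %016llx: type=%u offset=0x%llx\n",
	       (unsigned long long)val, entry_type(val),
	       (unsigned long long)entry_offset(val));
	printf("arm64 pte %016llx: present bits[1:0]=%llu\n",
	       (unsigned long long)pte, (unsigned long long)(pte & 3));
}
```

Under those assumptions, calling print_entry(0x18000000002d2d2fULL) from a
small main() decodes the logged entry to type 6 and offset 0x2d2d2f, which
appear to match x25 and x20/x23 in the register dump.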

By the way, I don’t have a strong opinion on whether this patch gets merged
or not, but it’s still nice to have. :)

I’m more interested in the bug itself and curious whether other Android
products using the same kernel will encounter the same issue.

> >
> > In that case, a present PTE could be misinterpreted as a swap entry.
> >
> > On arm64,
> > /*
> >  * Encode and decode a swap entry:
> >  *      bits 0-1:       present (must be zero)
> >  *      bits 2:         remember PG_anon_exclusive
> >  *      bits 3-7:       swap type
> >  *      bits 8-57:      swap offset
> >  *      bit  58:        PTE_PROT_NONE (must be zero)
> >  */
> >
> > #define __SWP_TYPE_SHIFT        3
> > #define __SWP_TYPE_BITS         5
> > #define __SWP_OFFSET_BITS       50
> > #define __SWP_TYPE_MASK         ((1 << __SWP_TYPE_BITS) - 1)
> > #define __SWP_OFFSET_SHIFT      (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
> > #define __SWP_OFFSET_MASK       ((1UL << __SWP_OFFSET_BITS) - 1)
> >
> > _swp_type is bits3-7.
> >
> > For a present pte,  bits 3-7 are:
> > AP[7-6], NS[5], AttributeIndex[4-2].
> >
> > > >
> > > > > +               return -EINVAL;
> > > > >
> > > > >         offset = swp_offset(entry);
> > > > > >         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> > > > > --
> > > > > 2.17.1
> >

Thanks
Barry



* [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
@ 2025-02-15  8:46 gaoxu
  0 siblings, 0 replies; 7+ messages in thread
From: gaoxu @ 2025-02-15  8:46 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Suren Baghdasaryan, Barry Song, Yosry Ahmed, yipengxiang

[-- Attachment #1: Type: text/plain, Size: 3457 bytes --]

Add a NULL check on the return value of swp_swap_info in __swap_duplicate
to prevent crashes caused by NULL pointer dereference.

The reason why swp_swap_info() returns NULL is unclear; it may be due to
CPU cache issues or DDR bit flips. The probability of this issue is very
small, and the stack info we encountered is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000058
[RB/E]rb_sreason_str_set: sreason_str set null_pointer
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
[0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Skip md ftrace buffer dump for: 0x1609e0
...
pc : swap_duplicate+0x44/0x164
lr : copy_page_range+0x508/0x1e78
sp : ffffffc0f2a699e0
x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
Call trace:
 swap_duplicate+0x44/0x164
 copy_page_range+0x508/0x1e78
 copy_process+0x1278/0x21cc
 kernel_clone+0x90/0x438
 __arm64_sys_clone+0x5c/0x8c
 invoke_syscall+0x58/0x110
 do_el0_svc+0x8c/0xe0
 el0_svc+0x38/0x9c
 el0t_64_sync_handler+0x44/0xec
 el0t_64_sync+0x1a8/0x1ac
Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs

The patch seems to only provide a workaround, but there are no more
effective software solutions to handle the bit flips problem. This patch
will change the issue from a system crash to a process exception, thereby
reducing the impact on the entire machine.

Signed-off-by: gaoxu <gaoxu2@honor.com>
---
v1 -> v2:
- Add WARN_ON_ONCE as suggested by Yosry Ahmed.
- update the commit info.
v2 -> v3: Delete the review tags (This is my issue, and I apologize).
---

 mm/swapfile.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7448a3876..a0bfdba94 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
        int err, i;

        si = swp_swap_info(entry);
+       if (WARN_ON_ONCE(!si))
+               return -EINVAL;

        offset = swp_offset(entry);
        VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
--
2.17.1
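
For illustration only, the effect of the added check can be modelled in
user space. The sketch below is not kernel code: struct swap_info_struct is
an empty stand-in, model_swap_duplicate() and type_is_active() are
hypothetical helpers, and the values MAX_SWAPFILES = 28, nr_swapfiles = 1
and swap type 6 are the ones reported for the affected system in the
replies to this patch.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_SWAPFILES 28                /* value reported for this system */

struct swap_info_struct { int unused; };  /* stand-in, not the kernel struct */

/* One active swap device in slot 0; every other slot stays NULL. */
static struct swap_info_struct swapfile0;
static struct swap_info_struct *swap_info[MAX_SWAPFILES] = { &swapfile0 };
static unsigned int nr_swapfiles = 1;

/* Same bounds check as the kernel helper: only rejects type >= MAX_SWAPFILES. */
static struct swap_info_struct *swap_type_to_swap_info(unsigned int type)
{
	if (type >= MAX_SWAPFILES)
		return NULL;
	return swap_info[type];
}

/* Hypothetical helper: is this type one of the swap devices in use? */
static int type_is_active(unsigned int type)
{
	return type < nr_swapfiles;
}

/* Model of the patched path: bail out instead of dereferencing NULL. */
static int model_swap_duplicate(unsigned int type)
{
	struct swap_info_struct *si = swap_type_to_swap_info(type);

	if (!si)
		return -EINVAL;
	return 0;
}
```

In this model, type 6 passes the MAX_SWAPFILES bounds check but maps to an
empty slot, so model_swap_duplicate(6) returns -EINVAL while type 0
succeeds. Without the NULL check, the equivalent kernel path would go on to
dereference the NULL pointer.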


[-- Attachment #2: Type: text/html, Size: 11433 bytes --]


end of thread, other threads:[~2025-02-18  9:07 UTC | newest]

Thread overview: 7+ messages
2025-02-15  9:05 [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate gaoxu
2025-02-16  1:42 ` Barry Song
2025-02-18  2:51   ` Re: " gaoxu
2025-02-18  5:40     ` Barry Song
2025-02-18  7:13       ` Re: " gaoxu
2025-02-18  9:06         ` Barry Song
  -- strict thread matches above, loose matches on Subject: below --
2025-02-15  8:46 gaoxu
