* [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
@ 2025-02-15 9:05 gaoxu
2025-02-16 1:42 ` Barry Song
0 siblings, 1 reply; 7+ messages in thread
From: gaoxu @ 2025-02-15 9:05 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: linux-kernel, Suren Baghdasaryan, Barry Song, Yosry Ahmed, yipengxiang
Add a NULL check on the return value of swp_swap_info in __swap_duplicate
to prevent crashes caused by NULL pointer dereference.
The reason why swp_swap_info() returns NULL is unclear; it may be due to
CPU cache issues or DDR bit flips. The probability of this issue is very
small, and the stack info we encountered is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000058
[RB/E]rb_sreason_str_set: sreason_str set null_pointer
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
Data abort info:
ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
[0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Skip md ftrace buffer dump for: 0x1609e0
...
pc : swap_duplicate+0x44/0x164
lr : copy_page_range+0x508/0x1e78
sp : ffffffc0f2a699e0
x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
Call trace:
swap_duplicate+0x44/0x164
copy_page_range+0x508/0x1e78
copy_process+0x1278/0x21cc
kernel_clone+0x90/0x438
__arm64_sys_clone+0x5c/0x8c
invoke_syscall+0x58/0x110
do_el0_svc+0x8c/0xe0
el0_svc+0x38/0x9c
el0t_64_sync_handler+0x44/0xec
el0t_64_sync+0x1a8/0x1ac
Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
The patch seems to provide only a workaround, but there is no more
effective software solution for handling the bit-flip problem. This patch
changes the failure from a system crash to a process exception, thereby
reducing the impact on the entire machine.
Signed-off-by: gao xu <gaoxu2@honor.com>
---
v1 -> v2:
- Add WARN_ON_ONCE.
- update the commit info.
v2 -> v3: Delete the review tags (This is my issue, and I apologize).
---
mm/swapfile.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7448a3876..a0bfdba94 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
int err, i;
si = swp_swap_info(entry);
+ if (WARN_ON_ONCE(!si))
+ return -EINVAL;
offset = swp_offset(entry);
VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
--
2.17.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
2025-02-15 9:05 [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate gaoxu
@ 2025-02-16 1:42 ` Barry Song
2025-02-18 2:51 ` Re: " gaoxu
0 siblings, 1 reply; 7+ messages in thread
From: Barry Song @ 2025-02-16 1:42 UTC (permalink / raw)
To: gaoxu
Cc: Andrew Morton, linux-mm, linux-kernel, Suren Baghdasaryan,
Yosry Ahmed, yipengxiang
On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote:
>
> Add a NULL check on the return value of swp_swap_info in __swap_duplicate
> to prevent crashes caused by NULL pointer dereference.
>
> The reason why swp_swap_info() returns NULL is unclear; it may be due to
> CPU cache issues or DDR bit flips. The probability of this issue is very
> small, and the stack info we encountered is as follows:
> Unable to handle kernel NULL pointer dereference at virtual address
> 0000000000000058
> [RB/E]rb_sreason_str_set: sreason_str set null_pointer
> Mem abort info:
> ESR = 0x0000000096000005
> EC = 0x25: DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> FSC = 0x05: level 1 translation fault
> Data abort info:
> ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
> [0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
> pud=0000000000000000
> Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
> Skip md ftrace buffer dump for: 0x1609e0
> ...
> pc : swap_duplicate+0x44/0x164
> lr : copy_page_range+0x508/0x1e78
> sp : ffffffc0f2a699e0
> x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
> Call trace:
> swap_duplicate+0x44/0x164
> copy_page_range+0x508/0x1e78
This is really strange since we already have a swap entry check before
calling swap_duplicate().
copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
                pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *dst_vma,
                struct vm_area_struct *src_vma, unsigned long addr, int *rss)
{
        unsigned long vm_flags = dst_vma->vm_flags;
        pte_t orig_pte = ptep_get(src_pte);
        pte_t pte = orig_pte;
        struct folio *folio;
        struct page *page;
        swp_entry_t entry = pte_to_swp_entry(orig_pte);

        if (likely(!non_swap_entry(entry))) {
                if (swap_duplicate(entry) < 0)
                        return -EIO;
                ...
        }
Perhaps the swap type is >= MAX_SWAPFILES, so we get NULL?

static struct swap_info_struct *swap_type_to_swap_info(int type)
{
        if (type >= MAX_SWAPFILES)
                return NULL;
        return READ_ONCE(swap_info[type]); /* rcu_dereference() */
}

But the !non_swap_entry() check above guarantees that swp_type is smaller
than MAX_SWAPFILES.

static inline int non_swap_entry(swp_entry_t entry)
{
        return swp_type(entry) >= MAX_SWAPFILES;
}

So another possibility is that type is < MAX_SWAPFILES but indexes a
swap_info[] slot that does not hold a valid, existing swapfile?
I don't see how the current patch contributes to debugging or fixing
anything related to this dumped stack. Can we dump swp_type() as well?
> copy_process+0x1278/0x21cc
> kernel_clone+0x90/0x438
> __arm64_sys_clone+0x5c/0x8c
> invoke_syscall+0x58/0x110
> do_el0_svc+0x8c/0xe0
> el0_svc+0x38/0x9c
> el0t_64_sync_handler+0x44/0xec
> el0t_64_sync+0x1a8/0x1ac
> Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Oops: Fatal exception
> SMP: stopping secondary CPUs
>
> The patch seems to only provide a workaround, but there are no more
> effective software solutions to handle the bit flips problem. This path
> will change the issue from a system crash to a process exception, thereby
> reducing the impact on the entire machine.
>
> Signed-off-by: gao xu <gaoxu2@honor.com>
> ---
> v1 -> v2:
> - Add WARN_ON_ONCE.
> - update the commit info.
> v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> ---
>
> mm/swapfile.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 7448a3876..a0bfdba94 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> int err, i;
>
> si = swp_swap_info(entry);
> + if (WARN_ON_ONCE(!si))
I mean, printk something related to swp_type(). This is really strange,
but the current stack won't help with debugging.
> + return -EINVAL;
>
> offset = swp_offset(entry);
> VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> --
> 2.17.1
Thanks
Barry
* Re: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
2025-02-16 1:42 ` Barry Song
@ 2025-02-18 2:51 ` gaoxu
2025-02-18 5:40 ` Barry Song
0 siblings, 1 reply; 7+ messages in thread
From: gaoxu @ 2025-02-18 2:51 UTC (permalink / raw)
To: Barry Song
Cc: Andrew Morton, linux-mm, linux-kernel, Suren Baghdasaryan,
Yosry Ahmed, yipengxiang
>
> On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote:
> >
> > Add a NULL check on the return value of swp_swap_info in
> > __swap_duplicate to prevent crashes caused by NULL pointer dereference.
> >
> > The reason why swp_swap_info() returns NULL is unclear; it may be due
> > to CPU cache issues or DDR bit flips. The probability of this issue is
> > very small, and the stack info we encountered is as follows:
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 0000000000000058
> > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info:
> > ESR = 0x0000000096000005
> > EC = 0x25: DABT (current EL), IL = 32 bits
> > SET = 0, FnV = 0
> > EA = 0, S1PTW = 0
> > FSC = 0x05: level 1 translation fault Data abort info:
> > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages,
> > 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058]
> > pgd=0000000000000000, p4d=0000000000000000,
> > pud=0000000000000000
> > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace
> > buffer dump for: 0x1609e0 ...
> > pc : swap_duplicate+0x44/0x164
> > lr : copy_page_range+0x508/0x1e78
> > sp : ffffffc0f2a699e0
> > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call
> > trace:
> > swap_duplicate+0x44/0x164
> > copy_page_range+0x508/0x1e78
>
> This is really strange since we already have a swap entry check before calling
> swap_duplicate().
>
> copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct
> *dst_vma,
> struct vm_area_struct *src_vma, unsigned long addr, int
> *rss) {
> unsigned long vm_flags = dst_vma->vm_flags;
> pte_t orig_pte = ptep_get(src_pte);
> pte_t pte = orig_pte;
> struct folio *folio;
> struct page *page;
> swp_entry_t entry = pte_to_swp_entry(orig_pte);
>
> if (likely(!non_swap_entry(entry))) {
> if (swap_duplicate(entry) < 0)
> return -EIO;
> ...
> }
>
> likely the swap_type is larger than MAX_SWAPFILES so we get a NULL?
>
> static struct swap_info_struct *swap_type_to_swap_info(int type) {
> if (type >= MAX_SWAPFILES)
> return NULL;
>
> return READ_ONCE(swap_info[type]); /* rcu_dereference() */ }
>
> But non_swap_entry() guarantees that swp_type is smaller than
> MAX_SWAPFILES.
>
> static inline int non_swap_entry(swp_entry_t entry) {
> return swp_type(entry) >= MAX_SWAPFILES; }
>
> So another possibility is that we have an overflow of swap_info[] where type is <
> MAX_SWAPFILES but is not a valid existing swapfile?
The log for this issue contains the line "get_swap_device: Bad swap file
entry 18000000002d2d2f", from which swp_type(18000000002d2d2f) = 6.
On this system (Android 15, Linux 6.6), MAX_SWAPFILES = 28 and
nr_swapfiles = 1. Since the type, 6, is less than MAX_SWAPFILES but not
less than nr_swapfiles, the value of this entry is abnormal.
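For reference, the type and offset can be recomputed in user space. The
sketch below assumes the generic 64-bit swp_entry_t layout, with the type
stored above bit 58 (63 - 5) and the offset below it. The gen_* helper names
are made up for illustration; this is not the kernel code itself.

```c
#include <stdint.h>

/* Assumed generic 64-bit layout: type in bits 58 and up, offset below. */
#define GEN_SWP_TYPE_SHIFT	58
#define GEN_SWP_OFFSET_MASK	((UINT64_C(1) << GEN_SWP_TYPE_SHIFT) - 1)

/* Hypothetical stand-in for swp_type(): the bits above the shift. */
static inline unsigned int gen_swp_type(uint64_t val)
{
	return (unsigned int)(val >> GEN_SWP_TYPE_SHIFT);
}

/* Hypothetical stand-in for swp_offset(): the bits below the shift. */
static inline uint64_t gen_swp_offset(uint64_t val)
{
	return val & GEN_SWP_OFFSET_MASK;
}
```

Under these assumptions, 0x18000000002d2d2f decodes to type 6 and offset
0x2d2d2f, which matches x25/x9 and x20/x23 in the register dump.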
static unsigned int nr_swapfiles;
static struct swap_info_struct *swap_info[MAX_SWAPFILES];

swap_info is a static array, so its elements are initialized to NULL.
The array has MAX_SWAPFILES slots, but only nr_swapfiles of them hold
valid values. Therefore, when validating swp_type(entry), we should
compare it with nr_swapfiles, not MAX_SWAPFILES.
The code that validates swp_type may need to be modified as follows:
 static inline int non_swap_entry(swp_entry_t entry)
 {
-        return swp_type(entry) >= MAX_SWAPFILES;
+        return swp_type(entry) >= nr_swapfiles;
 }

 static struct swap_info_struct *swap_type_to_swap_info(int type)
 {
-        if (type >= MAX_SWAPFILES)
+        if (type >= nr_swapfiles)
                 return NULL;
         return READ_ONCE(swap_info[type]); /* rcu_dereference() */
 }
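To make the array semantics above concrete, here is a small user-space model.
It is only a sketch: struct model_swap_info, model_add_swapfile() and the
other model_* names are hypothetical, and the value 28 is the MAX_SWAPFILES
reported for this system. It shows that a type below the array bound can
still select a slot that was never populated, so the caller has to check for
NULL before dereferencing.

```c
#include <stddef.h>

#define MODEL_MAX_SWAPFILES	28	/* value reported for this system */
#define MODEL_EINVAL		22	/* stands in for the kernel's EINVAL */

/* Placeholder for struct swap_info_struct. */
struct model_swap_info {
	int users;
};

/* Static storage, so every slot starts out as NULL. */
static struct model_swap_info *model_infos[MODEL_MAX_SWAPFILES];
static unsigned int model_nr_swapfiles;

/* Hypothetical helper: register a swap device in the next free slot. */
static inline int model_add_swapfile(struct model_swap_info *si)
{
	if (model_nr_swapfiles >= MODEL_MAX_SWAPFILES)
		return -1;
	model_infos[model_nr_swapfiles++] = si;
	return 0;
}

/* Same bound check as the lookup quoted above: only type >= MAX is rejected. */
static inline struct model_swap_info *model_type_to_info(unsigned int type)
{
	if (type >= MODEL_MAX_SWAPFILES)
		return NULL;
	return model_infos[type];	/* NULL for a never-populated slot */
}

/* Caller-side pattern from the patch: bail out instead of dereferencing NULL. */
static inline int model_duplicate(unsigned int type)
{
	struct model_swap_info *si = model_type_to_info(type);

	if (!si)
		return -MODEL_EINVAL;
	si->users++;	/* safe: si is known to be non-NULL here */
	return 0;
}
```

With a single registered device, type 0 resolves normally, while type 6
passes the bound check, yields NULL and is rejected by the caller.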
>
> I don't see how the current patch contributes to debugging or fixing anything
> related to this dumped stack. Can we dump swp_type() as well?
>
> > copy_process+0x1278/0x21cc
> > kernel_clone+0x90/0x438
> > __arm64_sys_clone+0x5c/0x8c
> > invoke_syscall+0x58/0x110
> > do_el0_svc+0x8c/0xe0
> > el0_svc+0x38/0x9c
> > el0t_64_sync_handler+0x44/0xec
> > el0t_64_sync+0x1a8/0x1ac
> > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end trace
> > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal
> > exception
> > SMP: stopping secondary CPUs
> >
> > The patch seems to only provide a workaround, but there are no more
> > effective software solutions to handle the bit flips problem. This
> > path will change the issue from a system crash to a process exception,
> > thereby reducing the impact on the entire machine.
> >
> > Signed-off-by: gao xu <gaoxu2@honor.com>
> > ---
> > v1 -> v2:
> > - Add WARN_ON_ONCE.
> > - update the commit info.
> > v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> > ---
> >
> > mm/swapfile.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c index 7448a3876..a0bfdba94
> > 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry,
> unsigned char usage, int nr)
> > int err, i;
> >
> > si = swp_swap_info(entry);
> > + if (WARN_ON_ONCE(!si))
>
> I mean, printk something related to swp_type(). This is really strange, but the
> current stack won't help with debugging.
The log already contains "get_swap_device: Bad swap file entry xxx"
when an invalid entry is encountered.
We could add a log line like the following:
pr_err("Bad swap type: %u\n", swp_type(entry));
>
> > + return -EINVAL;
> >
> > offset = swp_offset(entry);
> > VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset %
> SWAPFILE_CLUSTER);
> > --
> > 2.17.1
>
> Thanks
> Barry
* Re: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
2025-02-18 2:51 ` Re: " gaoxu
@ 2025-02-18 5:40 ` Barry Song
2025-02-18 7:13 ` Re: " gaoxu
0 siblings, 1 reply; 7+ messages in thread
From: Barry Song @ 2025-02-18 5:40 UTC (permalink / raw)
To: gaoxu
Cc: Andrew Morton, linux-mm, linux-kernel, Suren Baghdasaryan,
Yosry Ahmed, yipengxiang
Thank you!
On Tue, Feb 18, 2025 at 3:51 PM gaoxu <gaoxu2@honor.com> wrote:
>
> >
> > On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote:
> > >
> > > Add a NULL check on the return value of swp_swap_info in
> > > __swap_duplicate to prevent crashes caused by NULL pointer dereference.
> > >
> > > The reason why swp_swap_info() returns NULL is unclear; it may be due
> > > to CPU cache issues or DDR bit flips. The probability of this issue is
> > > very small, and the stack info we encountered is as follows:
> > > Unable to handle kernel NULL pointer dereference at virtual address
> > > 0000000000000058
> > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info:
> > > ESR = 0x0000000096000005
> > > EC = 0x25: DABT (current EL), IL = 32 bits
> > > SET = 0, FnV = 0
> > > EA = 0, S1PTW = 0
> > > FSC = 0x05: level 1 translation fault Data abort info:
> > > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > > CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages,
> > > 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058]
> > > pgd=0000000000000000, p4d=0000000000000000,
> > > pud=0000000000000000
> > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace
> > > buffer dump for: 0x1609e0 ...
> > > pc : swap_duplicate+0x44/0x164
> > > lr : copy_page_range+0x508/0x1e78
> > > sp : ffffffc0f2a699e0
> > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call
> > > trace:
> > > swap_duplicate+0x44/0x164
> > > copy_page_range+0x508/0x1e78
> >
> > This is really strange since we already have a swap entry check before calling
> > swap_duplicate().
> >
> > copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct
> > *dst_vma,
> > struct vm_area_struct *src_vma, unsigned long addr, int
> > *rss) {
> > unsigned long vm_flags = dst_vma->vm_flags;
> > pte_t orig_pte = ptep_get(src_pte);
> > pte_t pte = orig_pte;
> > struct folio *folio;
> > struct page *page;
> > swp_entry_t entry = pte_to_swp_entry(orig_pte);
> >
> > if (likely(!non_swap_entry(entry))) {
> > if (swap_duplicate(entry) < 0)
> > return -EIO;
> > ...
> > }
> >
> > likely the swap_type is larger than MAX_SWAPFILES so we get a NULL?
> >
> > static struct swap_info_struct *swap_type_to_swap_info(int type) {
> > if (type >= MAX_SWAPFILES)
> > return NULL;
> >
> > return READ_ONCE(swap_info[type]); /* rcu_dereference() */ }
> >
> > But non_swap_entry() guarantees that swp_type is smaller than
> > MAX_SWAPFILES.
> >
> > static inline int non_swap_entry(swp_entry_t entry) {
> > return swp_type(entry) >= MAX_SWAPFILES; }
> >
> > So another possibility is that we have an overflow of swap_info[] where type is <
> > MAX_SWAPFILES but is not a valid existing swapfile?
> In the log of this issue, there is a printed entry: get_swap_device:
> Bad swap file entry 18000000002d2d2f.
> It can be calculated that swp_type(18000000002d2d2f) = 6.
> In the Android 15-linux6.6:
> system: MAX_SWAPFILES = 28, nr_swapfiles = 1.
> Since swp_type(18000000002d2d2f)=6 is less than MAX_SWAPFILES but greater
> than nr_swapfiles, the value of this entry is abnormal.
>
> static unsigned int nr_swapfiles;
> static struct swap_info_struct *swap_info[MAX_SWAPFILES];
> swap_info is a static array, with its values initialized to 0.
> The size of the array is MAX_SWAPFILES, and the size of valid values in the array is
> nr_swapfiles. Therefore, when we validate the validity of swp_type(entry),
> we should compare it with nr_swapfiles, not MAX_SWAPFILES.
> The code for validating swp_type may need to be modified as follows:
That might be true, but on a normal system, we only need to distinguish
between a swap entry and a migrate entry. Therefore, comparing with
MAX_SWAPFILES is sufficient.
> static inline int non_swap_entry(swp_entry_t entry)
> {
> - return swp_type(entry) >= MAX_SWAPFILES;
> + return swp_type(entry) >= nr_swapfiles;
> }
>
> static struct swap_info_struct *swap_type_to_swap_info(int type)
> {
> - if (type >= MAX_SWAPFILES)
> + if (type >= nr_swapfiles)
> return NULL;
>
> return READ_ONCE(swap_info[type]); /* rcu_dereference() */
> }
> >
> > I don't see how the current patch contributes to debugging or fixing anything
> > related to this dumped stack. Can we dump swp_type() as well?
> >
> > > copy_process+0x1278/0x21cc
> > > kernel_clone+0x90/0x438
> > > __arm64_sys_clone+0x5c/0x8c
> > > invoke_syscall+0x58/0x110
> > > do_el0_svc+0x8c/0xe0
> > > el0_svc+0x38/0x9c
> > > el0t_64_sync_handler+0x44/0xec
> > > el0t_64_sync+0x1a8/0x1ac
> > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end trace
> > > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal
> > > exception
> > > SMP: stopping secondary CPUs
> > >
> > > The patch seems to only provide a workaround, but there are no more
> > > effective software solutions to handle the bit flips problem. This
> > > path will change the issue from a system crash to a process exception,
> > > thereby reducing the impact on the entire machine.
> > >
> > > Signed-off-by: gao xu <gaoxu2@honor.com>
> > > ---
> > > v1 -> v2:
> > > - Add WARN_ON_ONCE.
> > > - update the commit info.
> > > v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> > > ---
> > >
> > > mm/swapfile.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c index 7448a3876..a0bfdba94
> > > 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry,
> > unsigned char usage, int nr)
> > > int err, i;
> > >
> > > si = swp_swap_info(entry);
> > > + if (WARN_ON_ONCE(!si))
> >
> > I mean, printk something related to swp_type(). This is really strange, but the
> > current stack won't help with debugging.
> The log can find info related to "get_swap_device: Bad swap file entry xxx"
> when an entry encounters an exception.
> Add a print info log like the following:
> pr_err("%s%08d\n", Bad swap type, swp_type(entry));
This is really strange. It would be better to have the entire PTE value
dumped so we can determine if a bit-flip occurred on critical bits like
PTE_PRESENT.
In that case, a present PTE could be misinterpreted as a swap entry.
On arm64,
/*
 * Encode and decode a swap entry:
 *      bits 0-1:       present (must be zero)
 *      bits 2:         remember PG_anon_exclusive
 *      bits 3-7:       swap type
 *      bits 8-57:      swap offset
 *      bit  58:        PTE_PROT_NONE (must be zero)
 */
#define __SWP_TYPE_SHIFT        3
#define __SWP_TYPE_BITS         5
#define __SWP_OFFSET_BITS       50
#define __SWP_TYPE_MASK         ((1 << __SWP_TYPE_BITS) - 1)
#define __SWP_OFFSET_SHIFT      (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
#define __SWP_OFFSET_MASK       ((1UL << __SWP_OFFSET_BITS) - 1)
__swp_type is bits 3-7.
For a present PTE, bits 3-7 are:
AP[7-6], NS[5], AttributeIndex[4-2].
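As a rough illustration of the layout above, the following user-space sketch
splits a raw 64-bit PTE value into those fields and reports whether the
"must be zero" bits (0-1 and 58) are in fact zero. The A64_* constants are
copied from the defines quoted above; struct a64_swp_fields and
a64_decode_pte() are hypothetical names, not kernel interfaces.

```c
#include <stdint.h>

/* Constants taken from the arm64 swap-entry layout quoted above. */
#define A64_SWP_TYPE_SHIFT	3
#define A64_SWP_TYPE_BITS	5
#define A64_SWP_OFFSET_BITS	50
#define A64_SWP_TYPE_MASK	((UINT64_C(1) << A64_SWP_TYPE_BITS) - 1)
#define A64_SWP_OFFSET_SHIFT	(A64_SWP_TYPE_BITS + A64_SWP_TYPE_SHIFT)
#define A64_SWP_OFFSET_MASK	((UINT64_C(1) << A64_SWP_OFFSET_BITS) - 1)
#define A64_PRESENT_MASK	UINT64_C(0x3)		/* bits 0-1 */
#define A64_PROT_NONE		(UINT64_C(1) << 58)	/* bit 58 */

/* Hypothetical container for the decoded fields. */
struct a64_swp_fields {
	unsigned int type;	/* bits 3-7 */
	uint64_t offset;	/* bits 8-57 */
	int must_be_zero_ok;	/* 1 if bits 0-1 and 58 are all zero */
};

/* Split a raw PTE value according to the swap-entry layout. */
static inline struct a64_swp_fields a64_decode_pte(uint64_t pte)
{
	struct a64_swp_fields f;

	f.type = (unsigned int)((pte >> A64_SWP_TYPE_SHIFT) & A64_SWP_TYPE_MASK);
	f.offset = (pte >> A64_SWP_OFFSET_SHIFT) & A64_SWP_OFFSET_MASK;
	f.must_be_zero_ok = (pte & (A64_PRESENT_MASK | A64_PROT_NONE)) == 0;
	return f;
}
```

If the raw PTE were dumped, a value with bit 0 or bit 1 set would fail the
must_be_zero_ok test, which is the kind of evidence that would point to a
present PTE having been read as a swap entry.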
> >
> > > + return -EINVAL;
> > >
> > > offset = swp_offset(entry);
> > > VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset %
> > SWAPFILE_CLUSTER);
> > > --
> > > 2.17.1
Thanks
Barry
* Re: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
2025-02-18 5:40 ` Barry Song
@ 2025-02-18 7:13 ` gaoxu
2025-02-18 9:06 ` Barry Song
0 siblings, 1 reply; 7+ messages in thread
From: gaoxu @ 2025-02-18 7:13 UTC (permalink / raw)
To: Barry Song
Cc: Andrew Morton, linux-mm, linux-kernel, Suren Baghdasaryan,
Yosry Ahmed, yipengxiang
>
> Thank you!
>
> On Tue, Feb 18, 2025 at 3:51 PM gaoxu <gaoxu2@honor.com> wrote:
> >
> > >
> > > On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote:
> > > >
> > > > Add a NULL check on the return value of swp_swap_info in
> > > > __swap_duplicate to prevent crashes caused by NULL pointer
> dereference.
> > > >
> > > > The reason why swp_swap_info() returns NULL is unclear; it may be
> > > > due to CPU cache issues or DDR bit flips. The probability of this
> > > > issue is very small, and the stack info we encountered is as
> > > > follows:
> > > > Unable to handle kernel NULL pointer dereference at virtual
> > > > address
> > > > 0000000000000058
> > > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info:
> > > > ESR = 0x0000000096000005
> > > > EC = 0x25: DABT (current EL), IL = 32 bits
> > > > SET = 0, FnV = 0
> > > > EA = 0, S1PTW = 0
> > > > FSC = 0x05: level 1 translation fault Data abort info:
> > > > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > > > CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k
> > > > pages, 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058]
> > > > pgd=0000000000000000, p4d=0000000000000000,
> > > > pud=0000000000000000
> > > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md
> > > > ftrace buffer dump for: 0x1609e0 ...
> > > > pc : swap_duplicate+0x44/0x164
> > > > lr : copy_page_range+0x508/0x1e78
> > > > sp : ffffffc0f2a699e0
> > > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > > > x23: 00000000002d2d2f x22: 0000000000000008 x21:
> 0000000000000000
> > > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > > > x17: 0000000000000000 x16: 0010000000000001 x15:
> 0040000000000001
> > > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
> > > > Call
> > > > trace:
> > > > swap_duplicate+0x44/0x164
> > > > copy_page_range+0x508/0x1e78
> > >
> > > This is really strange since we already have a swap entry check
> > > before calling swap_duplicate().
> > >
> > > copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct
> *src_mm,
> > > pte_t *dst_pte, pte_t *src_pte, struct
> > > vm_area_struct *dst_vma,
> > > struct vm_area_struct *src_vma, unsigned long addr,
> > > int
> > > *rss) {
> > > unsigned long vm_flags = dst_vma->vm_flags;
> > > pte_t orig_pte = ptep_get(src_pte);
> > > pte_t pte = orig_pte;
> > > struct folio *folio;
> > > struct page *page;
> > > swp_entry_t entry = pte_to_swp_entry(orig_pte);
> > >
> > > if (likely(!non_swap_entry(entry))) {
> > > if (swap_duplicate(entry) < 0)
> > > return -EIO; ...
> > > }
> > >
> > > likely the swap_type is larger than MAX_SWAPFILES so we get a NULL?
> > >
> > > static struct swap_info_struct *swap_type_to_swap_info(int type) {
> > > if (type >= MAX_SWAPFILES)
> > > return NULL;
> > >
> > > return READ_ONCE(swap_info[type]); /* rcu_dereference() */ }
> > >
> > > But non_swap_entry() guarantees that swp_type is smaller than
> > > MAX_SWAPFILES.
> > >
> > > static inline int non_swap_entry(swp_entry_t entry) {
> > > return swp_type(entry) >= MAX_SWAPFILES; }
> > >
> > > So another possibility is that we have an overflow of swap_info[]
> > > where type is < MAX_SWAPFILES but is not a valid existing swapfile?
> > In the log of this issue, there is a printed entry: get_swap_device:
> > Bad swap file entry 18000000002d2d2f.
> > It can be calculated that swp_type(18000000002d2d2f) = 6.
> > In the Android 15-linux6.6:
> > system: MAX_SWAPFILES = 28, nr_swapfiles = 1.
> > Since swp_type(18000000002d2d2f)=6 is less than MAX_SWAPFILES but
> > greater than nr_swapfiles, the value of this entry is abnormal.
> >
> > static unsigned int nr_swapfiles;
> > static struct swap_info_struct *swap_info[MAX_SWAPFILES]; swap_info is
> > a static array, with its values initialized to 0.
> > The size of the array is MAX_SWAPFILES, and the size of valid values
> > in the array is nr_swapfiles. Therefore, when we validate the validity
> > of swp_type(entry), we should compare it with nr_swapfiles, not
> MAX_SWAPFILES.
> > The code for validating swp_type may need to be modified as follows:
>
> That might be true, but on a normal system, we only need to distinguish
> between a swap entry and a migrate entry. Therefore, comparing with
> MAX_SWAPFILES is sufficient.
>
> > static inline int non_swap_entry(swp_entry_t entry) {
> > - return swp_type(entry) >= MAX_SWAPFILES;
> > + return swp_type(entry) >= nr_swapfiles;
> > }
> >
> > static struct swap_info_struct *swap_type_to_swap_info(int type) {
> > - if (type >= MAX_SWAPFILES)
> > + if (type >= nr_swapfiles)
> > return NULL;
> >
> > return READ_ONCE(swap_info[type]); /* rcu_dereference() */ }
> > >
> > > I don't see how the current patch contributes to debugging or fixing
> > > anything related to this dumped stack. Can we dump swp_type() as well?
> > >
> > > > copy_process+0x1278/0x21cc
> > > > kernel_clone+0x90/0x438
> > > > __arm64_sys_clone+0x5c/0x8c
> > > > invoke_syscall+0x58/0x110
> > > > do_el0_svc+0x8c/0xe0
> > > > el0_svc+0x38/0x9c
> > > > el0t_64_sync_handler+0x44/0xec
> > > > el0t_64_sync+0x1a8/0x1ac
> > > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end
> > > > trace
> > > > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal
> > > > exception
> > > > SMP: stopping secondary CPUs
> > > >
> > > > The patch seems to only provide a workaround, but there are no
> > > > more effective software solutions to handle the bit flips problem.
> > > > This path will change the issue from a system crash to a process
> > > > exception, thereby reducing the impact on the entire machine.
> > > >
> > > > Signed-off-by: gao xu <gaoxu2@honor.com>
> > > > ---
> > > > v1 -> v2:
> > > > - Add WARN_ON_ONCE.
> > > > - update the commit info.
> > > > v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> > > > ---
> > > >
> > > > mm/swapfile.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/mm/swapfile.c b/mm/swapfile.c index
> > > > 7448a3876..a0bfdba94
> > > > 100644
> > > > --- a/mm/swapfile.c
> > > > +++ b/mm/swapfile.c
> > > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t
> > > > entry,
> > > unsigned char usage, int nr)
> > > > int err, i;
> > > >
> > > > si = swp_swap_info(entry);
> > > > + if (WARN_ON_ONCE(!si))
> > >
> > > I mean, printk something related to swp_type(). This is really
> > > strange, but the current stack won't help with debugging.
> > The log can find info related to "get_swap_device: Bad swap file entry xxx"
> > when an entry encounters an exception.
> > Add a print info log like the following:
> > pr_err("%s%08d\n", Bad swap type, swp_type(entry));
>
> This is really strange. It would be better to have the entire PTE value dumped so
> we can determine if a bit-flip occurred on critical bits like PTE_PRESENT.
Do you mean converting the swap entry to a PTE and then printing it?
pr_err("Bad pte: %016llx\n", (unsigned long long)pte_val(swp_entry_to_pte(entry)));
Or is it sufficient to print the swap entry directly?
pr_err("Bad swap entry: %016lx\n", entry.val);
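To compare the two options, here is a user-space sketch of what each would
print for the entry from the log. It assumes the generic swp_entry_t keeps
the type above bit 58 and that the arm64 PTE form uses the type shift of 3
and offset shift of 8 quoted in this thread. demo_entry_to_pte_val() and
demo_format() are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Generic swp_entry_t: type in the top bits (assumed shift of 58). */
#define DEMO_GEN_TYPE_SHIFT	58
#define DEMO_GEN_OFFSET_MASK	((UINT64_C(1) << DEMO_GEN_TYPE_SHIFT) - 1)
/* arm64 PTE form, per the layout quoted in this thread. */
#define DEMO_PTE_TYPE_SHIFT	3
#define DEMO_PTE_OFFSET_SHIFT	8

/* Hypothetical analogue of pte_val(swp_entry_to_pte(entry)). */
static inline uint64_t demo_entry_to_pte_val(uint64_t entry_val)
{
	uint64_t type = entry_val >> DEMO_GEN_TYPE_SHIFT;
	uint64_t offset = entry_val & DEMO_GEN_OFFSET_MASK;

	return (type << DEMO_PTE_TYPE_SHIFT) | (offset << DEMO_PTE_OFFSET_SHIFT);
}

/* Format both representations into buf for side-by-side comparison. */
static inline int demo_format(char *buf, size_t len, uint64_t entry_val)
{
	return snprintf(buf, len, "entry=%016llx pte=%016llx",
			(unsigned long long)entry_val,
			(unsigned long long)demo_entry_to_pte_val(entry_val));
}
```

In this sketch the re-encoded value is rebuilt from the type and offset
alone, so it carries no more information than the swap entry itself. Bits
0-2 and 58 of the PTE that was originally read cannot be recovered from it.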
>
> In that case, a present PTE could be misinterpreted as a swap entry.
>
> On arm64,
> /*
> * Encode and decode a swap entry:
> * bits 0-1: present (must be zero)
> * bits 2: remember PG_anon_exclusive
> * bits 3-7: swap type
> * bits 8-57: swap offset
> * bit 58: PTE_PROT_NONE (must be zero)
> */
>
> #define __SWP_TYPE_SHIFT 3
> #define __SWP_TYPE_BITS 5
> #define __SWP_OFFSET_BITS 50
> #define __SWP_TYPE_MASK ((1 << __SWP_TYPE_BITS) - 1)
> #define __SWP_OFFSET_SHIFT (__SWP_TYPE_BITS +
> __SWP_TYPE_SHIFT)
> #define __SWP_OFFSET_MASK ((1UL << __SWP_OFFSET_BITS) - 1)
>
> _swp_type is bits3-7.
>
> For a present pte, bits 3-7 are:
> AP[7-6], NS[5], AttributeIndex[4-2].
>
> > >
> > > > + return -EINVAL;
> > > >
> > > > offset = swp_offset(entry);
> > > > VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> > > > --
> > > > 2.17.1
>
> Thanks
> Barry
* Re: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
2025-02-18 7:13 ` Reply: " gaoxu
@ 2025-02-18 9:06 ` Barry Song
0 siblings, 0 replies; 7+ messages in thread
From: Barry Song @ 2025-02-18 9:06 UTC (permalink / raw)
To: gaoxu
Cc: Andrew Morton, linux-mm, linux-kernel, Suren Baghdasaryan,
Yosry Ahmed, yipengxiang
On Tue, Feb 18, 2025 at 8:13 PM gaoxu <gaoxu2@honor.com> wrote:
>
> >
> > Thank you!
> >
> > On Tue, Feb 18, 2025 at 3:51 PM gaoxu <gaoxu2@honor.com> wrote:
> > >
> > > >
> > > > On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote:
> > > > >
> > > > > Add a NULL check on the return value of swp_swap_info in
> > > > > __swap_duplicate to prevent crashes caused by NULL pointer
> > > > > dereference.
> > > > >
> > > > > The reason why swp_swap_info() returns NULL is unclear; it may be
> > > > > due to CPU cache issues or DDR bit flips. The probability of this
> > > > > issue is very small, and the stack info we encountered is as
> > > > > follows:
> > > > > Unable to handle kernel NULL pointer dereference at virtual address
> > > > > 0000000000000058
> > > > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer
> > > > > Mem abort info:
> > > > > ESR = 0x0000000096000005
> > > > > EC = 0x25: DABT (current EL), IL = 32 bits
> > > > > SET = 0, FnV = 0
> > > > > EA = 0, S1PTW = 0
> > > > > FSC = 0x05: level 1 translation fault
> > > > > Data abort info:
> > > > > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > > > > CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > > > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > > > > user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
> > > > > [0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
> > > > > pud=0000000000000000
> > > > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
> > > > > Skip md ftrace buffer dump for: 0x1609e0
> > > > > ...
> > > > > pc : swap_duplicate+0x44/0x164
> > > > > lr : copy_page_range+0x508/0x1e78
> > > > > sp : ffffffc0f2a699e0
> > > > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > > > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > > > > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> > > > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > > > > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> > > > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > > > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > > > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > > > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > > > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
> > > > > Call trace:
> > > > > swap_duplicate+0x44/0x164
> > > > > copy_page_range+0x508/0x1e78
> > > >
> > > > This is really strange since we already have a swap entry check
> > > > before calling swap_duplicate().
> > > >
> > > > copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > > >                 pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *dst_vma,
> > > >                 struct vm_area_struct *src_vma, unsigned long addr, int *rss)
> > > > {
> > > >         unsigned long vm_flags = dst_vma->vm_flags;
> > > >         pte_t orig_pte = ptep_get(src_pte);
> > > >         pte_t pte = orig_pte;
> > > >         struct folio *folio;
> > > >         struct page *page;
> > > >         swp_entry_t entry = pte_to_swp_entry(orig_pte);
> > > >
> > > >         if (likely(!non_swap_entry(entry))) {
> > > >                 if (swap_duplicate(entry) < 0)
> > > >                         return -EIO;
> > > >                 ...
> > > >         }
> > > >
> > > > Is the swap type perhaps greater than or equal to MAX_SWAPFILES, so
> > > > that we get NULL?
> > > >
> > > > static struct swap_info_struct *swap_type_to_swap_info(int type)
> > > > {
> > > >         if (type >= MAX_SWAPFILES)
> > > >                 return NULL;
> > > >
> > > >         return READ_ONCE(swap_info[type]); /* rcu_dereference() */
> > > > }
> > > >
> > > > But non_swap_entry() guarantees that swp_type is smaller than
> > > > MAX_SWAPFILES.
> > > >
> > > > static inline int non_swap_entry(swp_entry_t entry)
> > > > {
> > > >         return swp_type(entry) >= MAX_SWAPFILES;
> > > > }
> > > >
> > > > So another possibility is that we have an overflow of swap_info[]
> > > > where type is < MAX_SWAPFILES but is not a valid existing swapfile?
> > > The log for this issue contains the line "get_swap_device: Bad swap
> > > file entry 18000000002d2d2f".
> > > From this, swp_type(18000000002d2d2f) = 6.
> > > On the affected Android 15 / Linux 6.6 system, MAX_SWAPFILES = 28 and
> > > nr_swapfiles = 1.
> > > Since swp_type(18000000002d2d2f) = 6 is less than MAX_SWAPFILES but
> > > not less than nr_swapfiles, this entry is invalid.
> > >
> > > static unsigned int nr_swapfiles;
> > > static struct swap_info_struct *swap_info[MAX_SWAPFILES];
> > >
> > > swap_info is a static array, so its elements are initialized to NULL.
> > > Its size is MAX_SWAPFILES, but only the first nr_swapfiles elements
> > > are valid. Therefore, when validating swp_type(entry), we should
> > > compare it with nr_swapfiles, not MAX_SWAPFILES.
> > > The code for validating swp_type may need to be modified as follows:
> >
> > That might be true, but on a normal system, we only need to distinguish
> > between a swap entry and a migrate entry. Therefore, comparing with
> > MAX_SWAPFILES is sufficient.
> >
> > > static inline int non_swap_entry(swp_entry_t entry)
> > > {
> > > -       return swp_type(entry) >= MAX_SWAPFILES;
> > > +       return swp_type(entry) >= nr_swapfiles;
> > > }
> > >
> > > static struct swap_info_struct *swap_type_to_swap_info(int type)
> > > {
> > > -       if (type >= MAX_SWAPFILES)
> > > +       if (type >= nr_swapfiles)
> > >                 return NULL;
> > >
> > >         return READ_ONCE(swap_info[type]); /* rcu_dereference() */
> > > }
> > > >
> > > > I don't see how the current patch contributes to debugging or fixing
> > > > anything related to this dumped stack. Can we dump swp_type() as well?
> > > >
> > > > > copy_process+0x1278/0x21cc
> > > > > kernel_clone+0x90/0x438
> > > > > __arm64_sys_clone+0x5c/0x8c
> > > > > invoke_syscall+0x58/0x110
> > > > > do_el0_svc+0x8c/0xe0
> > > > > el0_svc+0x38/0x9c
> > > > > el0t_64_sync_handler+0x44/0xec
> > > > > el0t_64_sync+0x1a8/0x1ac
> > > > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
> > > > > ---[ end trace 0000000000000000 ]---
> > > > > Kernel panic - not syncing: Oops: Fatal exception
> > > > > SMP: stopping secondary CPUs
> > > > >
> > > > > The patch is only a workaround, but there is no more effective
> > > > > software solution for handling bit flips.
> > > > > This patch changes the failure from a system crash to a process
> > > > > exception, thereby reducing the impact on the entire machine.
> > > > >
> > > > > Signed-off-by: gao xu <gaoxu2@honor.com>
> > > > > ---
> > > > > v1 -> v2:
> > > > > - Add WARN_ON_ONCE.
> > > > > - update the commit info.
> > > > > v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> > > > > ---
> > > > >
> > > > > mm/swapfile.c | 2 ++
> > > > > 1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > index 7448a3876..a0bfdba94 100644
> > > > > --- a/mm/swapfile.c
> > > > > +++ b/mm/swapfile.c
> > > > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> > > > > int err, i;
> > > > >
> > > > > si = swp_swap_info(entry);
> > > > > + if (WARN_ON_ONCE(!si))
> > > >
> > > > I mean, printk something related to swp_type(). This is really
> > > > strange, but the current stack won't help with debugging.
> > > The log already contains a "get_swap_device: Bad swap file entry xxx"
> > > message whenever an entry is invalid.
> > > We could add a log line like the following:
> > > pr_err("Bad swap type %u\n", swp_type(entry));
> >
> > This is really strange. It would be better to have the entire PTE value dumped so
> > we can determine if a bit-flip occurred on critical bits like PTE_PRESENT.
> Do you mean converting the swap entry to a PTE and then printing it?
> pr_err("Bad pte %08llx\n", (unsigned long long)pte_val(swp_entry_to_pte(entry)));
>
> Or is it sufficient to print the swap entry directly?
> pr_err("Bad swap entry %08lx\n", entry.val);
Yes, I think printing the swap entry directly is sufficient. With that, we
can convert it to a PTE offline and debug using that value.
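As a rough illustration of that offline conversion, here is a minimal
userspace sketch in C. It assumes the generic swp_entry_t layout keeps the
swap type in the bits above bit 57 (a type shift of 58 on a 64-bit kernel),
which is consistent with swp_type(18000000002d2d2f) = 6 reported earlier in
this thread, and it reuses the arm64 __SWP_* shifts quoted further down. The
MODEL_*/ARM64_* names and the helper functions are made up for this sketch and
are not kernel APIs.

```c
#include <stdint.h>

/* Assumption: generic swp_entry_t keeps the type above bit 57. */
#define MODEL_SWP_TYPE_SHIFT    58
#define MODEL_SWP_OFFSET_MASK   ((UINT64_C(1) << MODEL_SWP_TYPE_SHIFT) - 1)

/* arm64 swap PTE layout, copied from the definitions quoted in this thread. */
#define ARM64_SWP_TYPE_SHIFT    3
#define ARM64_SWP_TYPE_BITS     5
#define ARM64_SWP_OFFSET_BITS   50
#define ARM64_SWP_TYPE_MASK     ((UINT64_C(1) << ARM64_SWP_TYPE_BITS) - 1)
#define ARM64_SWP_OFFSET_SHIFT  (ARM64_SWP_TYPE_BITS + ARM64_SWP_TYPE_SHIFT)
#define ARM64_SWP_OFFSET_MASK   ((UINT64_C(1) << ARM64_SWP_OFFSET_BITS) - 1)

/* Swap type of a logged swp_entry_t value (hypothetical helper). */
static uint64_t decode_swp_type(uint64_t entry_val)
{
	return entry_val >> MODEL_SWP_TYPE_SHIFT;
}

/* Swap offset of a logged swp_entry_t value (hypothetical helper). */
static uint64_t decode_swp_offset(uint64_t entry_val)
{
	return entry_val & MODEL_SWP_OFFSET_MASK;
}

/* Re-encode the entry the way an arm64 swap PTE would store it. */
static uint64_t encode_arm64_swap_pte(uint64_t entry_val)
{
	uint64_t type = decode_swp_type(entry_val);
	uint64_t offset = decode_swp_offset(entry_val);

	return ((type & ARM64_SWP_TYPE_MASK) << ARM64_SWP_TYPE_SHIFT) |
	       ((offset & ARM64_SWP_OFFSET_MASK) << ARM64_SWP_OFFSET_SHIFT);
}
```

Under these assumptions, the logged entry 18000000002d2d2f decodes to type 6
and offset 0x2d2d2f (the same values that show up in the register dump), and
the corresponding arm64 PTE value would be 0x2d2d2f30.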
By the way, I don’t have a strong opinion on whether this patch gets merged
or not, but it’s still nice to have. :)
I’m more interested in the bug itself and curious whether other Android
products using the same kernel will encounter the same issue.
> >
> > In that case, a present PTE could be misinterpreted as a swap entry.
> >
> > On arm64,
> > /*
> > * Encode and decode a swap entry:
> > * bits 0-1: present (must be zero)
> > * bits 2: remember PG_anon_exclusive
> > * bits 3-7: swap type
> > * bits 8-57: swap offset
> > * bit 58: PTE_PROT_NONE (must be zero)
> > */
> >
> > #define __SWP_TYPE_SHIFT 3
> > #define __SWP_TYPE_BITS 5
> > #define __SWP_OFFSET_BITS 50
> > #define __SWP_TYPE_MASK ((1 << __SWP_TYPE_BITS) - 1)
> > #define __SWP_OFFSET_SHIFT (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
> > #define __SWP_OFFSET_MASK ((1UL << __SWP_OFFSET_BITS) - 1)
> >
> > __swp_type is bits 3-7.
> >
> > For a present pte, bits 3-7 are:
> > AP[7-6], NS[5], AttributeIndex[4-2].
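To make the overlap concrete, here is a small userspace sketch that reads a
raw 64-bit PTE value through the swap layout quoted above. It only models the
bit positions from that comment; the helper names are hypothetical, and it does
not reproduce the kernel's actual pte_present() logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 0-1 must be zero for a swap PTE, per the quoted layout comment. */
static bool pte_low_bits_clear(uint64_t pte)
{
	return (pte & 0x3) == 0;
}

/* Bits 3-7 of the raw value, interpreted as a swap type. */
static unsigned int pte_bits_as_swp_type(uint64_t pte)
{
	return (unsigned int)((pte >> 3) & 0x1f);
}

/* Bits 8-57 of the raw value, interpreted as a swap offset. */
static uint64_t pte_bits_as_swp_offset(uint64_t pte)
{
	return (pte >> 8) & ((UINT64_C(1) << 50) - 1);
}
```

For instance, any value whose bits 3-7 are 0b00110 reads back as swap type 6
once bits 0-1 are clear, whatever those bits meant in the original PTE.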
> >
> > > >
> > > > > + return -EINVAL;
> > > > >
> > > > > offset = swp_offset(entry);
> > > > > VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> > > > > --
> > > > > 2.17.1
> >
Thanks
Barry
* [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate
@ 2025-02-15 8:46 gaoxu
0 siblings, 0 replies; 7+ messages in thread
From: gaoxu @ 2025-02-15 8:46 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: linux-kernel, Suren Baghdasaryan, Barry Song, Yosry Ahmed, yipengxiang
Add a NULL check on the return value of swp_swap_info in __swap_duplicate
to prevent crashes caused by NULL pointer dereference.
The reason why swp_swap_info() returns NULL is unclear; it may be due to
CPU cache issues or DDR bit flips. The probability of this issue is very
small, and the stack info we encountered is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000058
[RB/E]rb_sreason_str_set: sreason_str set null_pointer
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
Data abort info:
ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
[0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Skip md ftrace buffer dump for: 0x1609e0
...
pc : swap_duplicate+0x44/0x164
lr : copy_page_range+0x508/0x1e78
sp : ffffffc0f2a699e0
x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
Call trace:
swap_duplicate+0x44/0x164
copy_page_range+0x508/0x1e78
copy_process+0x1278/0x21cc
kernel_clone+0x90/0x438
__arm64_sys_clone+0x5c/0x8c
invoke_syscall+0x58/0x110
do_el0_svc+0x8c/0xe0
el0_svc+0x38/0x9c
el0t_64_sync_handler+0x44/0xec
el0t_64_sync+0x1a8/0x1ac
Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
The patch is only a workaround, but there is no more effective software
solution for handling bit flips. This patch changes the failure from a
system crash to a process exception, thereby reducing the impact on the
entire machine.
Signed-off-by: gaoxu <gaoxu2@honor.com>
---
v1 -> v2:
- Add WARN_ON_ONCE as suggested by Yosry Ahmed.
- update the commit info.
v2 -> v3: Delete the review tags (This is my issue, and I apologize).
---
mm/swapfile.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7448a3876..a0bfdba94 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
int err, i;
si = swp_swap_info(entry);
+ if (WARN_ON_ONCE(!si))
+ return -EINVAL;
offset = swp_offset(entry);
VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
--
2.17.1
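For readers following the control flow, the effect of the added check can be
modelled in userspace roughly as below. This is only a sketch under stated
assumptions: the table size 28 is the MAX_SWAPFILES value reported in this
thread for the affected system, the type shift of 58 is assumed from
swp_type(18000000002d2d2f) = 6, a single swap device is registered in slot 0,
and every model_* name is hypothetical rather than a kernel symbol.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_SWAPFILES  28	/* value reported in the thread */
#define MODEL_SWP_TYPE_SHIFT 58	/* assumed generic swp_entry_t layout */

struct model_swap_info {
	unsigned int dup_count;
};

/* Only slot 0 is populated, mirroring nr_swapfiles = 1; the rest are NULL. */
static struct model_swap_info model_swap0;
static struct model_swap_info *model_swap_table[MODEL_MAX_SWAPFILES] = {
	&model_swap0,
};

/* Stand-in for swp_swap_info(): bounds check against the table size only. */
static struct model_swap_info *model_swp_swap_info(uint64_t entry_val)
{
	uint64_t type = entry_val >> MODEL_SWP_TYPE_SHIFT;

	if (type >= MODEL_MAX_SWAPFILES)
		return NULL;
	return model_swap_table[type];
}

/* Stand-in for the patched __swap_duplicate(): fail instead of crashing. */
static int model_swap_duplicate(uint64_t entry_val)
{
	struct model_swap_info *si = model_swp_swap_info(entry_val);

	if (!si)
		return -EINVAL;	/* unpatched code would dereference NULL here */

	si->dup_count++;
	return 0;
}
```

With the entry from the crash log, model_swap_duplicate(0x18000000002d2d2f)
returns -EINVAL because type 6 passes the bounds check but maps to an empty
slot, while an entry of type 0 succeeds.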
end of thread, other threads:[~2025-02-18 9:07 UTC | newest]
Thread overview: 7+ messages
2025-02-15 9:05 [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate gaoxu
2025-02-16 1:42 ` Barry Song
2025-02-18 2:51 ` Reply: " gaoxu
2025-02-18 5:40 ` Barry Song
2025-02-18 7:13 ` Reply: " gaoxu
2025-02-18 9:06 ` Barry Song
-- strict thread matches above, loose matches on Subject: below --
2025-02-15 8:46 gaoxu