* [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
@ 2025-02-22 2:46 Wupeng Ma
2025-02-22 3:45 ` Matthew Wilcox
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Wupeng Ma @ 2025-02-22 2:46 UTC (permalink / raw)
To: akpm
Cc: david, kasong, ryan.roberts, chrisl, huang.ying.caritas,
schatzberg.dan, baohua, hanchuanhua, willy, mawupeng1, linux-mm,
linux-kernel
From: Ma Wupeng <mawupeng1@huawei.com>
During our testing, an infinite page-fault loop produced an endless stream of
error logs like the following:
get_swap_device: Bad swap file entry 114000000
Digging into the source, we found that the swap entry is invalid for an
unknown reason, which leads to an invalid swap_info_struct. The excessive log
printing can fill up the prioritized log space, purging originally valid logs
and hindering troubleshooting. To make this path more robust, kill the task
instead.
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
---
include/linux/swap.h | 1 +
mm/memory.c | 9 ++++++++-
mm/swapfile.c | 2 +-
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index b13b72645db3..0fa39cf66bc4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -508,6 +508,7 @@ struct backing_dev_info;
extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
extern void exit_swap_address_space(unsigned int type);
extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
+struct swap_info_struct *_swap_info_get(swp_entry_t entry);
sector_t swap_folio_sector(struct folio *folio);
static inline void put_swap_device(struct swap_info_struct *si)
diff --git a/mm/memory.c b/mm/memory.c
index b4d3d4893267..2d36e5a644d1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
/* Prevent swapoff from happening to us. */
si = get_swap_device(entry);
- if (unlikely(!si))
+ if (unlikely(!si)) {
+ if (unlikely(!_swap_info_get(entry)))
+ /*
+ * return VM_FAULT_SIGBUS for invalid swap entry to
+ * avoid infinite #PF.
+ */
+ ret = VM_FAULT_SIGBUS;
goto out;
+ }
folio = swap_cache_get_folio(entry, vma, vmf->address);
if (folio)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ba19430dd4ea..8f580eff0ecb 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1287,7 +1287,7 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
return n_ret;
}
-static struct swap_info_struct *_swap_info_get(swp_entry_t entry)
+struct swap_info_struct *_swap_info_get(swp_entry_t entry)
{
struct swap_info_struct *si;
unsigned long offset;
--
2.43.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-22 2:46 [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page Wupeng Ma
@ 2025-02-22 3:45 ` Matthew Wilcox
2025-02-22 3:59 ` mawupeng
2025-02-22 7:33 ` Kairui Song
` (2 subsequent siblings)
3 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2025-02-22 3:45 UTC (permalink / raw)
To: Wupeng Ma
Cc: akpm, david, kasong, ryan.roberts, chrisl, huang.ying.caritas,
schatzberg.dan, baohua, hanchuanhua, linux-mm, linux-kernel
On Sat, Feb 22, 2025 at 10:46:17AM +0800, Wupeng Ma wrote:
> Digging into the source, we found that the swap entry is invalid due to
> unknown reason, and this lead to invalid swap_info_struct. Excessive log
> printing can fill up the prioritized log space, leading to the purging of
> originally valid logs and hindering problem troubleshooting. To make this
> more robust, kill this task.
this seems like a very bad way to fix this problem
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-22 3:45 ` Matthew Wilcox
@ 2025-02-22 3:59 ` mawupeng
2025-02-23 2:42 ` Matthew Wilcox
0 siblings, 1 reply; 16+ messages in thread
From: mawupeng @ 2025-02-22 3:59 UTC (permalink / raw)
To: willy
Cc: mawupeng1, akpm, david, kasong, ryan.roberts, chrisl,
huang.ying.caritas, schatzberg.dan, baohua, hanchuanhua,
linux-mm, linux-kernel
On 2025/2/22 11:45, Matthew Wilcox wrote:
> On Sat, Feb 22, 2025 at 10:46:17AM +0800, Wupeng Ma wrote:
>> Digging into the source, we found that the swap entry is invalid due to
>> unknown reason, and this lead to invalid swap_info_struct. Excessive log
>> printing can fill up the prioritized log space, leading to the purging of
>> originally valid logs and hindering problem troubleshooting. To make this
>> more robust, kill this task.
>
> this seems like a very bad way to fix this problem
Sure, it's a bad way to fix this. But is there a proper way to make this
path more robust, given that it produces a flood of identical, useless log lines?
>
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-22 2:46 [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page Wupeng Ma
2025-02-22 3:45 ` Matthew Wilcox
@ 2025-02-22 7:33 ` Kairui Song
2025-02-22 7:41 ` mawupeng
2025-02-23 2:38 ` kernel test robot
2025-02-23 2:50 ` kernel test robot
3 siblings, 1 reply; 16+ messages in thread
From: Kairui Song @ 2025-02-22 7:33 UTC (permalink / raw)
To: Wupeng Ma
Cc: akpm, david, ryan.roberts, chrisl, huang.ying.caritas,
schatzberg.dan, baohua, hanchuanhua, willy, gaoxu2, linux-mm,
linux-kernel, Nhat Pham, Yosry Ahmed
On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote:
>
> From: Ma Wupeng <mawupeng1@huawei.com>
>
> During our test, infinite loop is produced during #PF will lead to infinite
> error log as follow:
>
> get_swap_device: Bad swap file entry 114000000
>
> Digging into the source, we found that the swap entry is invalid due to
> unknown reason, and this lead to invalid swap_info_struct. Excessive log
Hi Wupeng,
What is the kernel version you are using? If another bug is causing this
invalid swap entry, we should fix that bug instead, not work around it.
This looks kind of similar to another PATCH & Bug report, corrupted
page table or swap entry:
https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/
Might it be the same kernel bug? Gaoxu mentioned the bug was observed on
kernel 6.6.30 (Android version), and neither of these two workarounds
will fix it completely; the invalid value could cause many other
issues too. We definitely need to find the root cause.
> printing can fill up the prioritized log space, leading to the purging of
> originally valid logs and hindering problem troubleshooting. To make this
> more robust, kill this task.
>
> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> ---
> include/linux/swap.h | 1 +
> mm/memory.c | 9 ++++++++-
> mm/swapfile.c | 2 +-
> 3 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index b13b72645db3..0fa39cf66bc4 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -508,6 +508,7 @@ struct backing_dev_info;
> extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
> extern void exit_swap_address_space(unsigned int type);
> extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
> +struct swap_info_struct *_swap_info_get(swp_entry_t entry);
> sector_t swap_folio_sector(struct folio *folio);
>
> static inline void put_swap_device(struct swap_info_struct *si)
> diff --git a/mm/memory.c b/mm/memory.c
> index b4d3d4893267..2d36e5a644d1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>
> /* Prevent swapoff from happening to us. */
> si = get_swap_device(entry);
> - if (unlikely(!si))
> + if (unlikely(!si)) {
> + if (unlikely(!_swap_info_get(entry)))
> + /*
> + * return VM_FAULT_SIGBUS for invalid swap entry to
> + * avoid infinite #PF.
> + */
> + ret = VM_FAULT_SIGBUS;
This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff
get_swap_device will return NULL.
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-22 7:33 ` Kairui Song
@ 2025-02-22 7:41 ` mawupeng
2025-02-22 8:02 ` Kairui Song
0 siblings, 1 reply; 16+ messages in thread
From: mawupeng @ 2025-02-22 7:41 UTC (permalink / raw)
To: ryncsn
Cc: mawupeng1, akpm, david, ryan.roberts, chrisl, huang.ying.caritas,
schatzberg.dan, baohua, hanchuanhua, willy, gaoxu2, linux-mm,
linux-kernel, nphamcs, yosryahmed
On 2025/2/22 15:33, Kairui Song wrote:
> On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote:
>>
>> From: Ma Wupeng <mawupeng1@huawei.com>
>>
>> During our test, infinite loop is produced during #PF will lead to infinite
>> error log as follow:
>>
>> get_swap_device: Bad swap file entry 114000000
>>
>> Digging into the source, we found that the swap entry is invalid due to
>> unknown reason, and this lead to invalid swap_info_struct. Excessive log
>
> Hi Wupeng,
>
> What is the kernel version you are using? If it's another bug causing
> this invalid swap entry, we should fix that bug instead, not
> workaround it.
>
> This looks kind of similar to another PATCH & Bug report, corrupted
> page table or swap entry:
> https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/
>
> Might be the same kernel bug? Gaoxu mentioned the bug was observed on
> Kernel 6.6.30 (android version), and neither of these two workarounds
> will fix it completely, the invalid value could cause many other
> issues too. We definitely need to find out the root cause.
We are hitting this problem on linux-v5.10. Since the logs were lost and swap
is not enabled on these machines, the page table was probably corrupted by
something else.
>
>> printing can fill up the prioritized log space, leading to the purging of
>> originally valid logs and hindering problem troubleshooting. To make this
>> more robust, kill this task.
>>
>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>> ---
>> include/linux/swap.h | 1 +
>> mm/memory.c | 9 ++++++++-
>> mm/swapfile.c | 2 +-
>> 3 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index b13b72645db3..0fa39cf66bc4 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -508,6 +508,7 @@ struct backing_dev_info;
>> extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
>> extern void exit_swap_address_space(unsigned int type);
>> extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
>> +struct swap_info_struct *_swap_info_get(swp_entry_t entry);
>> sector_t swap_folio_sector(struct folio *folio);
>>
>> static inline void put_swap_device(struct swap_info_struct *si)
>> diff --git a/mm/memory.c b/mm/memory.c
>> index b4d3d4893267..2d36e5a644d1 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>
>> /* Prevent swapoff from happening to us. */
>> si = get_swap_device(entry);
>> - if (unlikely(!si))
>> + if (unlikely(!si)) {
>> + if (unlikely(!_swap_info_get(entry)))
>> + /*
>> + * return VM_FAULT_SIGBUS for invalid swap entry to
>> + * avoid infinite #PF.
>> + */
>> + ret = VM_FAULT_SIGBUS;
>
> This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff
> get_swap_device will return NULL.
If swap is off, all swap pages should already have been swapped back in, so
such an entry cannot trigger do_swap_page?
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-22 7:41 ` mawupeng
@ 2025-02-22 8:02 ` Kairui Song
2025-02-22 9:58 ` Barry Song
0 siblings, 1 reply; 16+ messages in thread
From: Kairui Song @ 2025-02-22 8:02 UTC (permalink / raw)
To: mawupeng
Cc: akpm, david, ryan.roberts, chrisl, huang.ying.caritas,
schatzberg.dan, baohua, hanchuanhua, willy, gaoxu2, linux-mm,
linux-kernel, nphamcs, yosryahmed
On Sat, Feb 22, 2025 at 3:41 PM mawupeng <mawupeng1@huawei.com> wrote:
> On 2025/2/22 15:33, Kairui Song wrote:
> > On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote:
> >>
> >> From: Ma Wupeng <mawupeng1@huawei.com>
> >>
> >> During our test, infinite loop is produced during #PF will lead to infinite
> >> error log as follow:
> >>
> >> get_swap_device: Bad swap file entry 114000000
> >>
> >> Digging into the source, we found that the swap entry is invalid due to
> >> unknown reason, and this lead to invalid swap_info_struct. Excessive log
> >
> > Hi Wupeng,
> >
> > What is the kernel version you are using? If it's another bug causing
> > this invalid swap entry, we should fix that bug instead, not
> > workaround it.
> >
> > This looks kind of similar to another PATCH & Bug report, corrupted
> > page table or swap entry:
> > https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/
> >
> > Might be the same kernel bug? Gaoxu mentioned the bug was observed on
> > Kernel 6.6.30 (android version), and neither of these two workarounds
> > will fix it completely, the invalid value could cause many other
> > issues too. We definitely need to find out the root cause.
>
> We are having this problem in linux-v5.10, since the log is lost and swap
> is not enabled in this machines, maybe memory corrupted in the pt.
Thanks for the info, that's very strange. Since you didn't even have swap
enabled, I think something else must have corrupted the page table.
> >
> >> printing can fill up the prioritized log space, leading to the purging of
> >> originally valid logs and hindering problem troubleshooting. To make this
> >> more robust, kill this task.
> >>
> >> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> >> ---
> >> include/linux/swap.h | 1 +
> >> mm/memory.c | 9 ++++++++-
> >> mm/swapfile.c | 2 +-
> >> 3 files changed, 10 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/swap.h b/include/linux/swap.h
> >> index b13b72645db3..0fa39cf66bc4 100644
> >> --- a/include/linux/swap.h
> >> +++ b/include/linux/swap.h
> >> @@ -508,6 +508,7 @@ struct backing_dev_info;
> >> extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
> >> extern void exit_swap_address_space(unsigned int type);
> >> extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
> >> +struct swap_info_struct *_swap_info_get(swp_entry_t entry);
> >> sector_t swap_folio_sector(struct folio *folio);
> >>
> >> static inline void put_swap_device(struct swap_info_struct *si)
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index b4d3d4893267..2d36e5a644d1 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >>
> >> /* Prevent swapoff from happening to us. */
> >> si = get_swap_device(entry);
> >> - if (unlikely(!si))
> >> + if (unlikely(!si)) {
> >> + if (unlikely(!_swap_info_get(entry)))
> >> + /*
> >> + * return VM_FAULT_SIGBUS for invalid swap entry to
> >> + * avoid infinite #PF.
> >> + */
> >> + ret = VM_FAULT_SIGBUS;
> >
> > This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff
> > get_swap_device will return NULL.
>
> If swap is off, All swap pages should be swap in as expected, so
> such entry can not trigger do_swap_page?
do_swap_page() may get blocked for some random reason, and then a
concurrent swapoff could swap in the entry and disable the device.
Very unlikely to trigger, but in theory possible.
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-22 8:02 ` Kairui Song
@ 2025-02-22 9:58 ` Barry Song
0 siblings, 0 replies; 16+ messages in thread
From: Barry Song @ 2025-02-22 9:58 UTC (permalink / raw)
To: Kairui Song
Cc: mawupeng, akpm, david, ryan.roberts, chrisl, huang.ying.caritas,
schatzberg.dan, hanchuanhua, willy, gaoxu2, linux-mm,
linux-kernel, nphamcs, yosryahmed
On Sat, Feb 22, 2025 at 9:03 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Sat, Feb 22, 2025 at 3:41 PM mawupeng <mawupeng1@huawei.com> wrote:
> > On 2025/2/22 15:33, Kairui Song wrote:
> > > On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote:
> > >>
> > >> From: Ma Wupeng <mawupeng1@huawei.com>
> > >>
> > >> During our test, infinite loop is produced during #PF will lead to infinite
> > >> error log as follow:
> > >>
> > >> get_swap_device: Bad swap file entry 114000000
> > >>
> > >> Digging into the source, we found that the swap entry is invalid due to
> > >> unknown reason, and this lead to invalid swap_info_struct. Excessive log
> > >
> > > Hi Wupeng,
> > >
> > > What is the kernel version you are using? If it's another bug causing
> > > this invalid swap entry, we should fix that bug instead, not
> > > workaround it.
> > >
> > > This looks kind of similar to another PATCH & Bug report, corrupted
> > > page table or swap entry:
> > > https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/
> > >
> > > Might be the same kernel bug? Gaoxu mentioned the bug was observed on
> > > Kernel 6.6.30 (android version), and neither of these two workarounds
> > > will fix it completely, the invalid value could cause many other
> > > issues too. We definitely need to find out the root cause.
> >
> > We are having this problem in linux-v5.10, since the log is lost and swap
> > is not enabled in this machines, maybe memory corrupted in the pt.
>
> Thanks for the info, that's very strange. Since you didn't even enable
> SWAP, it must be something else corrupted the page table I think
>
> > >
> > >> printing can fill up the prioritized log space, leading to the purging of
> > >> originally valid logs and hindering problem troubleshooting. To make this
> > >> more robust, kill this task.
> > >>
> > >> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> > >> ---
> > >> include/linux/swap.h | 1 +
> > >> mm/memory.c | 9 ++++++++-
> > >> mm/swapfile.c | 2 +-
> > >> 3 files changed, 10 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/include/linux/swap.h b/include/linux/swap.h
> > >> index b13b72645db3..0fa39cf66bc4 100644
> > >> --- a/include/linux/swap.h
> > >> +++ b/include/linux/swap.h
> > >> @@ -508,6 +508,7 @@ struct backing_dev_info;
> > >> extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
> > >> extern void exit_swap_address_space(unsigned int type);
> > >> extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
> > >> +struct swap_info_struct *_swap_info_get(swp_entry_t entry);
> > >> sector_t swap_folio_sector(struct folio *folio);
> > >>
> > >> static inline void put_swap_device(struct swap_info_struct *si)
> > >> diff --git a/mm/memory.c b/mm/memory.c
> > >> index b4d3d4893267..2d36e5a644d1 100644
> > >> --- a/mm/memory.c
> > >> +++ b/mm/memory.c
> > >> @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > >>
> > >> /* Prevent swapoff from happening to us. */
> > >> si = get_swap_device(entry);
> > >> - if (unlikely(!si))
> > >> + if (unlikely(!si)) {
> > >> + if (unlikely(!_swap_info_get(entry)))
> > >> + /*
> > >> + * return VM_FAULT_SIGBUS for invalid swap entry to
> > >> + * avoid infinite #PF.
> > >> + */
> > >> + ret = VM_FAULT_SIGBUS;
> > >
> > > This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff
> > > get_swap_device will return NULL.
> >
> > If swap is off, All swap pages should be swap in as expected, so
> > such entry can not trigger do_swap_page?
>
> do_swap_page may get blocked due to some random reason, and then a
> concurrent swapoff could swap in the entry and disable the device.
> Very unlikely to trigger but in theory possible.
The "goto out" in do_swap_page() should have handled this case. If swapoff
occurred before the actual swap-in began, we should have aborted the
swap-in, and userspace would retry.
/* Prevent swapoff from happening to us. */
si = get_swap_device(entry);
if (unlikely(!si))
goto out;
Thanks
Barry
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-22 2:46 [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page Wupeng Ma
2025-02-22 3:45 ` Matthew Wilcox
2025-02-22 7:33 ` Kairui Song
@ 2025-02-23 2:38 ` kernel test robot
2025-02-23 2:50 ` kernel test robot
3 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2025-02-23 2:38 UTC (permalink / raw)
To: Wupeng Ma, akpm
Cc: oe-kbuild-all, david, kasong, ryan.roberts, chrisl,
huang.ying.caritas, schatzberg.dan, baohua, hanchuanhua, willy,
mawupeng1, linux-mm, linux-kernel
Hi Wupeng,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Wupeng-Ma/mm-swap-Avoid-infinite-loop-if-no-valid-swap-entry-found-during-do_swap_page/20250222-105637
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20250222024617.2790609-1-mawupeng1%40huawei.com
patch subject: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
config: x86_64-buildonly-randconfig-003-20250223 (https://download.01.org/0day-ci/archive/20250223/202502231018.gCTScklR-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250223/202502231018.gCTScklR-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202502231018.gCTScklR-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/linux/build_bug.h:5,
from include/linux/container_of.h:5,
from include/linux/list.h:5,
from include/linux/smp.h:12,
from include/linux/kernel_stat.h:5,
from mm/memory.c:42:
mm/memory.c: In function 'do_swap_page':
>> mm/memory.c:4404:31: error: implicit declaration of function '_swap_info_get' [-Werror=implicit-function-declaration]
4404 | if (unlikely(!_swap_info_get(entry)))
| ^~~~~~~~~~~~~~
include/linux/compiler.h:32:55: note: in definition of macro '__branch_check__'
32 | ______r = __builtin_expect(!!(x), expect); \
| ^
mm/memory.c:4404:21: note: in expansion of macro 'unlikely'
4404 | if (unlikely(!_swap_info_get(entry)))
| ^~~~~~~~
cc1: some warnings being treated as errors
vim +/_swap_info_get +4404 mm/memory.c
4322
4323 /*
4324 * We enter with non-exclusive mmap_lock (to exclude vma changes,
4325 * but allow concurrent faults), and pte mapped but not yet locked.
4326 * We return with pte unmapped and unlocked.
4327 *
4328 * We return with the mmap_lock locked or unlocked in the same cases
4329 * as does filemap_fault().
4330 */
4331 vm_fault_t do_swap_page(struct vm_fault *vmf)
4332 {
4333 struct vm_area_struct *vma = vmf->vma;
4334 struct folio *swapcache, *folio = NULL;
4335 DECLARE_WAITQUEUE(wait, current);
4336 struct page *page;
4337 struct swap_info_struct *si = NULL;
4338 rmap_t rmap_flags = RMAP_NONE;
4339 bool need_clear_cache = false;
4340 bool exclusive = false;
4341 swp_entry_t entry;
4342 pte_t pte;
4343 vm_fault_t ret = 0;
4344 void *shadow = NULL;
4345 int nr_pages;
4346 unsigned long page_idx;
4347 unsigned long address;
4348 pte_t *ptep;
4349
4350 if (!pte_unmap_same(vmf))
4351 goto out;
4352
4353 entry = pte_to_swp_entry(vmf->orig_pte);
4354 if (unlikely(non_swap_entry(entry))) {
4355 if (is_migration_entry(entry)) {
4356 migration_entry_wait(vma->vm_mm, vmf->pmd,
4357 vmf->address);
4358 } else if (is_device_exclusive_entry(entry)) {
4359 vmf->page = pfn_swap_entry_to_page(entry);
4360 ret = remove_device_exclusive_entry(vmf);
4361 } else if (is_device_private_entry(entry)) {
4362 struct dev_pagemap *pgmap;
4363 if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
4364 /*
4365 * migrate_to_ram is not yet ready to operate
4366 * under VMA lock.
4367 */
4368 vma_end_read(vma);
4369 ret = VM_FAULT_RETRY;
4370 goto out;
4371 }
4372
4373 vmf->page = pfn_swap_entry_to_page(entry);
4374 vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
4375 vmf->address, &vmf->ptl);
4376 if (unlikely(!vmf->pte ||
4377 !pte_same(ptep_get(vmf->pte),
4378 vmf->orig_pte)))
4379 goto unlock;
4380
4381 /*
4382 * Get a page reference while we know the page can't be
4383 * freed.
4384 */
4385 get_page(vmf->page);
4386 pte_unmap_unlock(vmf->pte, vmf->ptl);
4387 pgmap = page_pgmap(vmf->page);
4388 ret = pgmap->ops->migrate_to_ram(vmf);
4389 put_page(vmf->page);
4390 } else if (is_hwpoison_entry(entry)) {
4391 ret = VM_FAULT_HWPOISON;
4392 } else if (is_pte_marker_entry(entry)) {
4393 ret = handle_pte_marker(vmf);
4394 } else {
4395 print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
4396 ret = VM_FAULT_SIGBUS;
4397 }
4398 goto out;
4399 }
4400
4401 /* Prevent swapoff from happening to us. */
4402 si = get_swap_device(entry);
4403 if (unlikely(!si)) {
> 4404 if (unlikely(!_swap_info_get(entry)))
4405 /*
4406 * return VM_FAULT_SIGBUS for invalid swap entry to
4407 * avoid infinite #PF.
4408 */
4409 ret = VM_FAULT_SIGBUS;
4410 goto out;
4411 }
4412
4413 folio = swap_cache_get_folio(entry, vma, vmf->address);
4414 if (folio)
4415 page = folio_file_page(folio, swp_offset(entry));
4416 swapcache = folio;
4417
4418 if (!folio) {
4419 if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
4420 __swap_count(entry) == 1) {
4421 /* skip swapcache */
4422 folio = alloc_swap_folio(vmf);
4423 if (folio) {
4424 __folio_set_locked(folio);
4425 __folio_set_swapbacked(folio);
4426
4427 nr_pages = folio_nr_pages(folio);
4428 if (folio_test_large(folio))
4429 entry.val = ALIGN_DOWN(entry.val, nr_pages);
4430 /*
4431 * Prevent parallel swapin from proceeding with
4432 * the cache flag. Otherwise, another thread
4433 * may finish swapin first, free the entry, and
4434 * swapout reusing the same entry. It's
4435 * undetectable as pte_same() returns true due
4436 * to entry reuse.
4437 */
4438 if (swapcache_prepare(entry, nr_pages)) {
4439 /*
4440 * Relax a bit to prevent rapid
4441 * repeated page faults.
4442 */
4443 add_wait_queue(&swapcache_wq, &wait);
4444 schedule_timeout_uninterruptible(1);
4445 remove_wait_queue(&swapcache_wq, &wait);
4446 goto out_page;
4447 }
4448 need_clear_cache = true;
4449
4450 memcg1_swapin(entry, nr_pages);
4451
4452 shadow = get_shadow_from_swap_cache(entry);
4453 if (shadow)
4454 workingset_refault(folio, shadow);
4455
4456 folio_add_lru(folio);
4457
4458 /* To provide entry to swap_read_folio() */
4459 folio->swap = entry;
4460 swap_read_folio(folio, NULL);
4461 folio->private = NULL;
4462 }
4463 } else {
4464 folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
4465 vmf);
4466 swapcache = folio;
4467 }
4468
4469 if (!folio) {
4470 /*
4471 * Back out if somebody else faulted in this pte
4472 * while we released the pte lock.
4473 */
4474 vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
4475 vmf->address, &vmf->ptl);
4476 if (likely(vmf->pte &&
4477 pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
4478 ret = VM_FAULT_OOM;
4479 goto unlock;
4480 }
4481
4482 /* Had to read the page from swap area: Major fault */
4483 ret = VM_FAULT_MAJOR;
4484 count_vm_event(PGMAJFAULT);
4485 count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
4486 page = folio_file_page(folio, swp_offset(entry));
4487 } else if (PageHWPoison(page)) {
4488 /*
4489 * hwpoisoned dirty swapcache pages are kept for killing
4490 * owner processes (which may be unknown at hwpoison time)
4491 */
4492 ret = VM_FAULT_HWPOISON;
4493 goto out_release;
4494 }
4495
4496 ret |= folio_lock_or_retry(folio, vmf);
4497 if (ret & VM_FAULT_RETRY)
4498 goto out_release;
4499
4500 if (swapcache) {
4501 /*
4502 * Make sure folio_free_swap() or swapoff did not release the
4503 * swapcache from under us. The page pin, and pte_same test
4504 * below, are not enough to exclude that. Even if it is still
4505 * swapcache, we need to check that the page's swap has not
4506 * changed.
4507 */
4508 if (unlikely(!folio_test_swapcache(folio) ||
4509 page_swap_entry(page).val != entry.val))
4510 goto out_page;
4511
4512 /*
4513 * KSM sometimes has to copy on read faults, for example, if
4514 * page->index of !PageKSM() pages would be nonlinear inside the
4515 * anon VMA -- PageKSM() is lost on actual swapout.
4516 */
4517 folio = ksm_might_need_to_copy(folio, vma, vmf->address);
4518 if (unlikely(!folio)) {
4519 ret = VM_FAULT_OOM;
4520 folio = swapcache;
4521 goto out_page;
4522 } else if (unlikely(folio == ERR_PTR(-EHWPOISON))) {
4523 ret = VM_FAULT_HWPOISON;
4524 folio = swapcache;
4525 goto out_page;
4526 }
4527 if (folio != swapcache)
4528 page = folio_page(folio, 0);
4529
4530 /*
4531 * If we want to map a page that's in the swapcache writable, we
4532 * have to detect via the refcount if we're really the exclusive
4533 * owner. Try removing the extra reference from the local LRU
4534 * caches if required.
4535 */
4536 if ((vmf->flags & FAULT_FLAG_WRITE) && folio == swapcache &&
4537 !folio_test_ksm(folio) && !folio_test_lru(folio))
4538 lru_add_drain();
4539 }
4540
4541 folio_throttle_swaprate(folio, GFP_KERNEL);
4542
4543 /*
4544 * Back out if somebody else already faulted in this pte.
4545 */
4546 vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
4547 &vmf->ptl);
4548 if (unlikely(!vmf->pte || !pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
4549 goto out_nomap;
4550
4551 if (unlikely(!folio_test_uptodate(folio))) {
4552 ret = VM_FAULT_SIGBUS;
4553 goto out_nomap;
4554 }
4555
4556 /* allocated large folios for SWP_SYNCHRONOUS_IO */
4557 if (folio_test_large(folio) && !folio_test_swapcache(folio)) {
4558 unsigned long nr = folio_nr_pages(folio);
4559 unsigned long folio_start = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
4560 unsigned long idx = (vmf->address - folio_start) / PAGE_SIZE;
4561 pte_t *folio_ptep = vmf->pte - idx;
4562 pte_t folio_pte = ptep_get(folio_ptep);
4563
4564 if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) ||
4565 swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
4566 goto out_nomap;
4567
4568 page_idx = idx;
4569 address = folio_start;
4570 ptep = folio_ptep;
4571 goto check_folio;
4572 }
4573
4574 nr_pages = 1;
4575 page_idx = 0;
4576 address = vmf->address;
4577 ptep = vmf->pte;
4578 if (folio_test_large(folio) && folio_test_swapcache(folio)) {
4579 int nr = folio_nr_pages(folio);
4580 unsigned long idx = folio_page_idx(folio, page);
4581 unsigned long folio_start = address - idx * PAGE_SIZE;
4582 unsigned long folio_end = folio_start + nr * PAGE_SIZE;
4583 pte_t *folio_ptep;
4584 pte_t folio_pte;
4585
4586 if (unlikely(folio_start < max(address & PMD_MASK, vma->vm_start)))
4587 goto check_folio;
4588 if (unlikely(folio_end > pmd_addr_end(address, vma->vm_end)))
4589 goto check_folio;
4590
4591 folio_ptep = vmf->pte - idx;
4592 folio_pte = ptep_get(folio_ptep);
4593 if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) ||
4594 swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
4595 goto check_folio;
4596
4597 page_idx = idx;
4598 address = folio_start;
4599 ptep = folio_ptep;
4600 nr_pages = nr;
4601 entry = folio->swap;
4602 page = &folio->page;
4603 }
4604
4605 check_folio:
4606 /*
4607 * PG_anon_exclusive reuses PG_mappedtodisk for anon pages. A swap pte
4608 * must never point at an anonymous page in the swapcache that is
4609 * PG_anon_exclusive. Sanity check that this holds and especially, that
4610 * no filesystem set PG_mappedtodisk on a page in the swapcache. Sanity
4611 * check after taking the PT lock and making sure that nobody
4612 * concurrently faulted in this page and set PG_anon_exclusive.
4613 */
4614 BUG_ON(!folio_test_anon(folio) && folio_test_mappedtodisk(folio));
4615 BUG_ON(folio_test_anon(folio) && PageAnonExclusive(page));
4616
4617 /*
4618 * Check under PT lock (to protect against concurrent fork() sharing
4619 * the swap entry concurrently) for certainly exclusive pages.
4620 */
4621 if (!folio_test_ksm(folio)) {
4622 exclusive = pte_swp_exclusive(vmf->orig_pte);
4623 if (folio != swapcache) {
4624 /*
4625 * We have a fresh page that is not exposed to the
4626 * swapcache -> certainly exclusive.
4627 */
4628 exclusive = true;
4629 } else if (exclusive && folio_test_writeback(folio) &&
4630 data_race(si->flags & SWP_STABLE_WRITES)) {
4631 /*
4632 * This is tricky: not all swap backends support
4633 * concurrent page modifications while under writeback.
4634 *
4635 * So if we stumble over such a page in the swapcache
4636 * we must not set the page exclusive, otherwise we can
4637 * map it writable without further checks and modify it
4638 * while still under writeback.
4639 *
4640 * For these problematic swap backends, simply drop the
4641 * exclusive marker: this is perfectly fine as we start
4642 * writeback only if we fully unmapped the page and
4643 * there are no unexpected references on the page after
4644 * unmapping succeeded. After fully unmapped, no
4645 * further GUP references (FOLL_GET and FOLL_PIN) can
4646 * appear, so dropping the exclusive marker and mapping
4647 * it only R/O is fine.
4648 */
4649 exclusive = false;
4650 }
4651 }
4652
4653 /*
4654 * Some architectures may have to restore extra metadata to the page
4655 * when reading from swap. This metadata may be indexed by swap entry
4656 * so this must be called before swap_free().
4657 */
4658 arch_swap_restore(folio_swap(entry, folio), folio);
4659
4660 /*
4661 * Remove the swap entry and conditionally try to free up the swapcache.
4662 * We're already holding a reference on the page but haven't mapped it
4663 * yet.
4664 */
4665 swap_free_nr(entry, nr_pages);
4666 if (should_try_to_free_swap(folio, vma, vmf->flags))
4667 folio_free_swap(folio);
4668
4669 add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
4670 add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
4671 pte = mk_pte(page, vma->vm_page_prot);
4672 if (pte_swp_soft_dirty(vmf->orig_pte))
4673 pte = pte_mksoft_dirty(pte);
4674 if (pte_swp_uffd_wp(vmf->orig_pte))
4675 pte = pte_mkuffd_wp(pte);
4676
4677 /*
4678 * Same logic as in do_wp_page(); however, optimize for pages that are
4679 * certainly not shared either because we just allocated them without
4680 * exposing them to the swapcache or because the swap entry indicates
4681 * exclusivity.
4682 */
4683 if (!folio_test_ksm(folio) &&
4684 (exclusive || folio_ref_count(folio) == 1)) {
4685 if ((vma->vm_flags & VM_WRITE) && !userfaultfd_pte_wp(vma, pte) &&
4686 !pte_needs_soft_dirty_wp(vma, pte)) {
4687 pte = pte_mkwrite(pte, vma);
4688 if (vmf->flags & FAULT_FLAG_WRITE) {
4689 pte = pte_mkdirty(pte);
4690 vmf->flags &= ~FAULT_FLAG_WRITE;
4691 }
4692 }
4693 rmap_flags |= RMAP_EXCLUSIVE;
4694 }
4695 folio_ref_add(folio, nr_pages - 1);
4696 flush_icache_pages(vma, page, nr_pages);
4697 vmf->orig_pte = pte_advance_pfn(pte, page_idx);
4698
4699 /* ksm created a completely new copy */
4700 if (unlikely(folio != swapcache && swapcache)) {
4701 folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
4702 folio_add_lru_vma(folio, vma);
4703 } else if (!folio_test_anon(folio)) {
4704 /*
4705 * We currently only expect small !anon folios which are either
4706 * fully exclusive or fully shared, or new allocated large
4707 * folios which are fully exclusive. If we ever get large
4708 * folios within swapcache here, we have to be careful.
4709 */
4710 VM_WARN_ON_ONCE(folio_test_large(folio) && folio_test_swapcache(folio));
4711 VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
4712 folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
4713 } else {
4714 folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
4715 rmap_flags);
4716 }
4717
4718 VM_BUG_ON(!folio_test_anon(folio) ||
4719 (pte_write(pte) && !PageAnonExclusive(page)));
4720 set_ptes(vma->vm_mm, address, ptep, pte, nr_pages);
4721 arch_do_swap_page_nr(vma->vm_mm, vma, address,
4722 pte, pte, nr_pages);
4723
4724 folio_unlock(folio);
4725 if (folio != swapcache && swapcache) {
4726 /*
4727 * Hold the lock to avoid the swap entry to be reused
4728 * until we take the PT lock for the pte_same() check
4729 * (to avoid false positives from pte_same). For
4730 * further safety release the lock after the swap_free
4731 * so that the swap count won't change under a
4732 * parallel locked swapcache.
4733 */
4734 folio_unlock(swapcache);
4735 folio_put(swapcache);
4736 }
4737
4738 if (vmf->flags & FAULT_FLAG_WRITE) {
4739 ret |= do_wp_page(vmf);
4740 if (ret & VM_FAULT_ERROR)
4741 ret &= VM_FAULT_ERROR;
4742 goto out;
4743 }
4744
4745 /* No need to invalidate - it was non-present before */
4746 update_mmu_cache_range(vmf, vma, address, ptep, nr_pages);
4747 unlock:
4748 if (vmf->pte)
4749 pte_unmap_unlock(vmf->pte, vmf->ptl);
4750 out:
4751 /* Clear the swap cache pin for direct swapin after PTL unlock */
4752 if (need_clear_cache) {
4753 swapcache_clear(si, entry, nr_pages);
4754 if (waitqueue_active(&swapcache_wq))
4755 wake_up(&swapcache_wq);
4756 }
4757 if (si)
4758 put_swap_device(si);
4759 return ret;
4760 out_nomap:
4761 if (vmf->pte)
4762 pte_unmap_unlock(vmf->pte, vmf->ptl);
4763 out_page:
4764 folio_unlock(folio);
4765 out_release:
4766 folio_put(folio);
4767 if (folio != swapcache && swapcache) {
4768 folio_unlock(swapcache);
4769 folio_put(swapcache);
4770 }
4771 if (need_clear_cache) {
4772 swapcache_clear(si, entry, nr_pages);
4773 if (waitqueue_active(&swapcache_wq))
4774 wake_up(&swapcache_wq);
4775 }
4776 if (si)
4777 put_swap_device(si);
4778 return ret;
4779 }
4780
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-22 3:59 ` mawupeng
@ 2025-02-23 2:42 ` Matthew Wilcox
2025-02-23 6:09 ` Barry Song
2025-02-23 6:18 ` Barry Song
0 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox @ 2025-02-23 2:42 UTC (permalink / raw)
To: mawupeng
Cc: akpm, david, kasong, ryan.roberts, chrisl, huang.ying.caritas,
schatzberg.dan, baohua, hanchuanhua, linux-mm, linux-kernel
On Sat, Feb 22, 2025 at 11:59:53AM +0800, mawupeng wrote:
>
>
> On 2025/2/22 11:45, Matthew Wilcox wrote:
> > On Sat, Feb 22, 2025 at 10:46:17AM +0800, Wupeng Ma wrote:
> >> Digging into the source, we found that the swap entry is invalid due to
> >> unknown reason, and this lead to invalid swap_info_struct. Excessive log
> >> printing can fill up the prioritized log space, leading to the purging of
> >> originally valid logs and hindering problem troubleshooting. To make this
> >> more robust, kill this task.
> >
> > this seems like a very bad way to fix this problem
>
> Sure, it's a bad way to fix this. What would be a proper way to make it more
> robust, since it will produce lots of identical, invalid log messages?
We have a mechanism to prevent flooding the log: <linux/ratelimit.h>.
If you grep for 'ratelimit' in include, you'll see a number of
convenience functions exist; I'm not sure whether you'll need to use the raw
ratelimit machinery, or if you can just use one of the prepared ones.
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-22 2:46 [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page Wupeng Ma
` (2 preceding siblings ...)
2025-02-23 2:38 ` kernel test robot
@ 2025-02-23 2:50 ` kernel test robot
3 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2025-02-23 2:50 UTC (permalink / raw)
To: Wupeng Ma, akpm
Cc: llvm, oe-kbuild-all, david, kasong, ryan.roberts, chrisl,
huang.ying.caritas, schatzberg.dan, baohua, hanchuanhua, willy,
mawupeng1, linux-mm, linux-kernel
Hi Wupeng,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Wupeng-Ma/mm-swap-Avoid-infinite-loop-if-no-valid-swap-entry-found-during-do_swap_page/20250222-105637
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20250222024617.2790609-1-mawupeng1%40huawei.com
patch subject: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
config: s390-randconfig-002-20250223 (https://download.01.org/0day-ci/archive/20250223/202502231048.C7L22P7h-lkp@intel.com/config)
compiler: clang version 16.0.6 (https://github.com/llvm/llvm-project 7cbf1a2591520c2491aa35339f227775f4d3adf6)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250223/202502231048.C7L22P7h-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202502231048.C7L22P7h-lkp@intel.com/
All errors (new ones prefixed by >>):
>> mm/memory.c:4404:17: error: call to undeclared function '_swap_info_get'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
if (unlikely(!_swap_info_get(entry)))
^
1 error generated.
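One plausible cause, stated as an assumption since this is a randconfig build: in include/linux/swap.h the declarations around get_swap_device() sit under #ifdef CONFIG_SWAP, with static-inline stubs provided for CONFIG_SWAP=n builds. The patch adds the _swap_info_get() prototype without a matching stub, so a CONFIG_SWAP=n config sees no declaration at all. A hedged sketch of the missing piece:

```c
/* include/linux/swap.h -- sketch only, assuming the failing randconfig
 * has CONFIG_SWAP=n; the real header layout must be checked. */
#ifdef CONFIG_SWAP
struct swap_info_struct *_swap_info_get(swp_entry_t entry);
#else
static inline struct swap_info_struct *_swap_info_get(swp_entry_t entry)
{
	return NULL;
}
#endif
```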
vim +/_swap_info_get +4404 mm/memory.c
4322
4323 /*
4324 * We enter with non-exclusive mmap_lock (to exclude vma changes,
4325 * but allow concurrent faults), and pte mapped but not yet locked.
4326 * We return with pte unmapped and unlocked.
4327 *
4328 * We return with the mmap_lock locked or unlocked in the same cases
4329 * as does filemap_fault().
4330 */
4331 vm_fault_t do_swap_page(struct vm_fault *vmf)
4332 {
4333 struct vm_area_struct *vma = vmf->vma;
4334 struct folio *swapcache, *folio = NULL;
4335 DECLARE_WAITQUEUE(wait, current);
4336 struct page *page;
4337 struct swap_info_struct *si = NULL;
4338 rmap_t rmap_flags = RMAP_NONE;
4339 bool need_clear_cache = false;
4340 bool exclusive = false;
4341 swp_entry_t entry;
4342 pte_t pte;
4343 vm_fault_t ret = 0;
4344 void *shadow = NULL;
4345 int nr_pages;
4346 unsigned long page_idx;
4347 unsigned long address;
4348 pte_t *ptep;
4349
4350 if (!pte_unmap_same(vmf))
4351 goto out;
4352
4353 entry = pte_to_swp_entry(vmf->orig_pte);
4354 if (unlikely(non_swap_entry(entry))) {
4355 if (is_migration_entry(entry)) {
4356 migration_entry_wait(vma->vm_mm, vmf->pmd,
4357 vmf->address);
4358 } else if (is_device_exclusive_entry(entry)) {
4359 vmf->page = pfn_swap_entry_to_page(entry);
4360 ret = remove_device_exclusive_entry(vmf);
4361 } else if (is_device_private_entry(entry)) {
4362 struct dev_pagemap *pgmap;
4363 if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
4364 /*
4365 * migrate_to_ram is not yet ready to operate
4366 * under VMA lock.
4367 */
4368 vma_end_read(vma);
4369 ret = VM_FAULT_RETRY;
4370 goto out;
4371 }
4372
4373 vmf->page = pfn_swap_entry_to_page(entry);
4374 vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
4375 vmf->address, &vmf->ptl);
4376 if (unlikely(!vmf->pte ||
4377 !pte_same(ptep_get(vmf->pte),
4378 vmf->orig_pte)))
4379 goto unlock;
4380
4381 /*
4382 * Get a page reference while we know the page can't be
4383 * freed.
4384 */
4385 get_page(vmf->page);
4386 pte_unmap_unlock(vmf->pte, vmf->ptl);
4387 pgmap = page_pgmap(vmf->page);
4388 ret = pgmap->ops->migrate_to_ram(vmf);
4389 put_page(vmf->page);
4390 } else if (is_hwpoison_entry(entry)) {
4391 ret = VM_FAULT_HWPOISON;
4392 } else if (is_pte_marker_entry(entry)) {
4393 ret = handle_pte_marker(vmf);
4394 } else {
4395 print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
4396 ret = VM_FAULT_SIGBUS;
4397 }
4398 goto out;
4399 }
4400
4401 /* Prevent swapoff from happening to us. */
4402 si = get_swap_device(entry);
4403 if (unlikely(!si)) {
> 4404 if (unlikely(!_swap_info_get(entry)))
4405 /*
4406 * return VM_FAULT_SIGBUS for invalid swap entry to
4407 * avoid infinite #PF.
4408 */
4409 ret = VM_FAULT_SIGBUS;
4410 goto out;
4411 }
4412
4413 folio = swap_cache_get_folio(entry, vma, vmf->address);
4414 if (folio)
4415 page = folio_file_page(folio, swp_offset(entry));
4416 swapcache = folio;
4417
4418 if (!folio) {
4419 if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
4420 __swap_count(entry) == 1) {
4421 /* skip swapcache */
4422 folio = alloc_swap_folio(vmf);
4423 if (folio) {
4424 __folio_set_locked(folio);
4425 __folio_set_swapbacked(folio);
4426
4427 nr_pages = folio_nr_pages(folio);
4428 if (folio_test_large(folio))
4429 entry.val = ALIGN_DOWN(entry.val, nr_pages);
4430 /*
4431 * Prevent parallel swapin from proceeding with
4432 * the cache flag. Otherwise, another thread
4433 * may finish swapin first, free the entry, and
4434 * swapout reusing the same entry. It's
4435 * undetectable as pte_same() returns true due
4436 * to entry reuse.
4437 */
4438 if (swapcache_prepare(entry, nr_pages)) {
4439 /*
4440 * Relax a bit to prevent rapid
4441 * repeated page faults.
4442 */
4443 add_wait_queue(&swapcache_wq, &wait);
4444 schedule_timeout_uninterruptible(1);
4445 remove_wait_queue(&swapcache_wq, &wait);
4446 goto out_page;
4447 }
4448 need_clear_cache = true;
4449
4450 memcg1_swapin(entry, nr_pages);
4451
4452 shadow = get_shadow_from_swap_cache(entry);
4453 if (shadow)
4454 workingset_refault(folio, shadow);
4455
4456 folio_add_lru(folio);
4457
4458 /* To provide entry to swap_read_folio() */
4459 folio->swap = entry;
4460 swap_read_folio(folio, NULL);
4461 folio->private = NULL;
4462 }
4463 } else {
4464 folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
4465 vmf);
4466 swapcache = folio;
4467 }
4468
4469 if (!folio) {
4470 /*
4471 * Back out if somebody else faulted in this pte
4472 * while we released the pte lock.
4473 */
4474 vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
4475 vmf->address, &vmf->ptl);
4476 if (likely(vmf->pte &&
4477 pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
4478 ret = VM_FAULT_OOM;
4479 goto unlock;
4480 }
4481
4482 /* Had to read the page from swap area: Major fault */
4483 ret = VM_FAULT_MAJOR;
4484 count_vm_event(PGMAJFAULT);
4485 count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
4486 page = folio_file_page(folio, swp_offset(entry));
4487 } else if (PageHWPoison(page)) {
4488 /*
4489 * hwpoisoned dirty swapcache pages are kept for killing
4490 * owner processes (which may be unknown at hwpoison time)
4491 */
4492 ret = VM_FAULT_HWPOISON;
4493 goto out_release;
4494 }
4495
4496 ret |= folio_lock_or_retry(folio, vmf);
4497 if (ret & VM_FAULT_RETRY)
4498 goto out_release;
4499
4500 if (swapcache) {
4501 /*
4502 * Make sure folio_free_swap() or swapoff did not release the
4503 * swapcache from under us. The page pin, and pte_same test
4504 * below, are not enough to exclude that. Even if it is still
4505 * swapcache, we need to check that the page's swap has not
4506 * changed.
4507 */
4508 if (unlikely(!folio_test_swapcache(folio) ||
4509 page_swap_entry(page).val != entry.val))
4510 goto out_page;
4511
4512 /*
4513 * KSM sometimes has to copy on read faults, for example, if
4514 * page->index of !PageKSM() pages would be nonlinear inside the
4515 * anon VMA -- PageKSM() is lost on actual swapout.
4516 */
4517 folio = ksm_might_need_to_copy(folio, vma, vmf->address);
4518 if (unlikely(!folio)) {
4519 ret = VM_FAULT_OOM;
4520 folio = swapcache;
4521 goto out_page;
4522 } else if (unlikely(folio == ERR_PTR(-EHWPOISON))) {
4523 ret = VM_FAULT_HWPOISON;
4524 folio = swapcache;
4525 goto out_page;
4526 }
4527 if (folio != swapcache)
4528 page = folio_page(folio, 0);
4529
4530 /*
4531 * If we want to map a page that's in the swapcache writable, we
4532 * have to detect via the refcount if we're really the exclusive
4533 * owner. Try removing the extra reference from the local LRU
4534 * caches if required.
4535 */
4536 if ((vmf->flags & FAULT_FLAG_WRITE) && folio == swapcache &&
4537 !folio_test_ksm(folio) && !folio_test_lru(folio))
4538 lru_add_drain();
4539 }
4540
4541 folio_throttle_swaprate(folio, GFP_KERNEL);
4542
4543 /*
4544 * Back out if somebody else already faulted in this pte.
4545 */
4546 vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
4547 &vmf->ptl);
4548 if (unlikely(!vmf->pte || !pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
4549 goto out_nomap;
4550
4551 if (unlikely(!folio_test_uptodate(folio))) {
4552 ret = VM_FAULT_SIGBUS;
4553 goto out_nomap;
4554 }
4555
4556 /* allocated large folios for SWP_SYNCHRONOUS_IO */
4557 if (folio_test_large(folio) && !folio_test_swapcache(folio)) {
4558 unsigned long nr = folio_nr_pages(folio);
4559 unsigned long folio_start = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
4560 unsigned long idx = (vmf->address - folio_start) / PAGE_SIZE;
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-23 2:42 ` Matthew Wilcox
2025-02-23 6:09 ` Barry Song
@ 2025-02-23 6:18 ` Barry Song
2025-02-24 1:27 ` mawupeng
1 sibling, 1 reply; 16+ messages in thread
From: Barry Song @ 2025-02-23 6:18 UTC (permalink / raw)
To: Matthew Wilcox
Cc: mawupeng, akpm, david, kasong, ryan.roberts, chrisl,
huang.ying.caritas, schatzberg.dan, hanchuanhua, linux-mm,
linux-kernel
On Sun, Feb 23, 2025 at 3:42 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sat, Feb 22, 2025 at 11:59:53AM +0800, mawupeng wrote:
> >
> >
> > On 2025/2/22 11:45, Matthew Wilcox wrote:
> > > On Sat, Feb 22, 2025 at 10:46:17AM +0800, Wupeng Ma wrote:
> > >> Digging into the source, we found that the swap entry is invalid due to
> > >> unknown reason, and this lead to invalid swap_info_struct. Excessive log
> > >> printing can fill up the prioritized log space, leading to the purging of
> > >> originally valid logs and hindering problem troubleshooting. To make this
> > >> more robust, kill this task.
> > >
> > > this seems like a very bad way to fix this problem
> >
> > Sure, It's a bad way to fix this. Just a proper way to make it more robust?
> > Since it will produce lots of invalid and same log?
>
> We have a mechanism to prevent flooding the log: <linux/ratelimit.h>.
> If you grep for 'ratelimit' in include, you'll see a number of
> convenience functions exist; not sure whether you'll need to use the raw
> ratelimit stuff, or if you can just use one of the prepared ones.
>
IMHO, I really don’t think log flooding is the issue here; rather, we’re dealing
with an endless page fault. For servers, that might mean the server becomes
unresponsive; for phones, it could mean quickly draining the battery.
It’s certainly better to identify the root cause, but it could be due to a
bit-flip in DDR or memory corruption in the page table. Until we can properly
fix it, the patch seems somewhat reasonable: the wrong application gets killed,
and it at least has a chance to be restarted by systemd, Android init, etc. A
PTE pointing to a non-existent swap file that was never enabled clearly
indicates something has gone seriously wrong - either a hardware issue or a
kernel bug.
At the very least, it warrants a WARN_ON_ONCE(), even after we identify and fix
the root cause, as it still enhances the system's robustness.
Gaoxu will certainly encounter the same problem if do_swap_page() executes
before swap_duplicate() while the PTE points to a non-existent swap
file [1]. That means the phone will heat up quickly.
[1] https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/
Thanks
Barry
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-23 6:18 ` Barry Song
@ 2025-02-24 1:27 ` mawupeng
2025-02-24 4:22 ` Matthew Wilcox
2025-02-24 7:11 ` Barry Song
0 siblings, 2 replies; 16+ messages in thread
From: mawupeng @ 2025-02-24 1:27 UTC (permalink / raw)
To: 21cnbao, willy
Cc: mawupeng1, akpm, david, kasong, ryan.roberts, chrisl,
huang.ying.caritas, schatzberg.dan, hanchuanhua, linux-mm,
linux-kernel
On 2025/2/23 14:18, Barry Song wrote:
> On Sun, Feb 23, 2025 at 3:42 PM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> On Sat, Feb 22, 2025 at 11:59:53AM +0800, mawupeng wrote:
>>>
>>>
>>> On 2025/2/22 11:45, Matthew Wilcox wrote:
>>>> On Sat, Feb 22, 2025 at 10:46:17AM +0800, Wupeng Ma wrote:
>>>>> Digging into the source, we found that the swap entry is invalid due to
>>>>> unknown reason, and this lead to invalid swap_info_struct. Excessive log
>>>>> printing can fill up the prioritized log space, leading to the purging of
>>>>> originally valid logs and hindering problem troubleshooting. To make this
>>>>> more robust, kill this task.
>>>>
>>>> this seems like a very bad way to fix this problem
>>>
>>> Sure, It's a bad way to fix this. Just a proper way to make it more robust?
>>> Since it will produce lots of invalid and same log?
>>
>> We have a mechanism to prevent flooding the log: <linux/ratelimit.h>.
>> If you grep for 'ratelimit' in include, you'll see a number of
>> convenience functions exist; not sure whether you'll need to use the raw
>> ratelimit stuff, or if you can just use one of the prepared ones.
>>
>
> IMHO, I really don’t think log flooding is the issue here; rather, we’re dealing
> with an endless page fault. For servers, that might mean server is unresponsive
> , for phones, they could be quickly running out of battery.
Yes, log flooding is not the main issue here; the endless #PF is the more
serious problem.
>
> It’s certainly better to identify the root cause, but it could be due to a
> bit-flip in DDR or memory corruption in the page table. Until we can properly
> fix it, the patch seems somewhat reasonable: the wrong application gets killed,
> but it at least has a chance to be restarted by systemd, Android init, etc. A
> PTE pointing to a never-enabled, non-existent swap file clearly indicates that
> something has gone seriously wrong - either a hardware issue or a kernel bug.
> At the very least, it warrants a WARN_ON_ONCE(), even after we identify and fix
> the root cause, as it still enhances the system's robustness.
>
> Gaoxu will certainly encounter the same problem if do_swap_page() executes
> earlier than swap_duplicate() where the PTE points to a non-existent swap
> file [1]. That means the phone will heat up quickly.
>
> [1] https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/
>
> Thanks
> Barry
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-24 1:27 ` mawupeng
@ 2025-02-24 4:22 ` Matthew Wilcox
2025-02-24 7:11 ` Barry Song
1 sibling, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2025-02-24 4:22 UTC (permalink / raw)
To: mawupeng
Cc: 21cnbao, akpm, david, kasong, ryan.roberts, chrisl,
huang.ying.caritas, schatzberg.dan, hanchuanhua, linux-mm,
linux-kernel
On Mon, Feb 24, 2025 at 09:27:38AM +0800, mawupeng wrote:
> On 2025/2/23 14:18, Barry Song wrote:
> > On Sun, Feb 23, 2025 at 3:42 PM Matthew Wilcox <willy@infradead.org> wrote:
> >> On Sat, Feb 22, 2025 at 11:59:53AM +0800, mawupeng wrote:
> >>> On 2025/2/22 11:45, Matthew Wilcox wrote:
> >>>> On Sat, Feb 22, 2025 at 10:46:17AM +0800, Wupeng Ma wrote:
> >>>>> Digging into the source, we found that the swap entry is invalid due to an
> >>>>> unknown reason, and this leads to an invalid swap_info_struct. Excessive log
> >>>>> printing can fill up the prioritized log space, leading to the purging of
> >>>>> originally valid logs and hindering problem troubleshooting. To make this
> >>>>> more robust, kill this task.
>
> Yes, log flooding is not the main issue here; the endless #PF is the more serious
> problem.
Then don't write the report as if the log flooding is the real problem.
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-24 1:27 ` mawupeng
2025-02-24 4:22 ` Matthew Wilcox
@ 2025-02-24 7:11 ` Barry Song
2025-02-24 15:37 ` Matthew Wilcox
1 sibling, 1 reply; 16+ messages in thread
From: Barry Song @ 2025-02-24 7:11 UTC (permalink / raw)
To: mawupeng
Cc: willy, akpm, david, kasong, ryan.roberts, chrisl,
huang.ying.caritas, schatzberg.dan, hanchuanhua, linux-mm,
linux-kernel
On Mon, Feb 24, 2025 at 2:27 PM mawupeng <mawupeng1@huawei.com> wrote:
>
>
>
> On 2025/2/23 14:18, Barry Song wrote:
> > On Sun, Feb 23, 2025 at 3:42 PM Matthew Wilcox <willy@infradead.org> wrote:
> >>
> >> On Sat, Feb 22, 2025 at 11:59:53AM +0800, mawupeng wrote:
> >>>
> >>>
> >>> On 2025/2/22 11:45, Matthew Wilcox wrote:
> >>>> On Sat, Feb 22, 2025 at 10:46:17AM +0800, Wupeng Ma wrote:
> >>>>> Digging into the source, we found that the swap entry is invalid due to an
> >>>>> unknown reason, and this leads to an invalid swap_info_struct. Excessive log
> >>>>> printing can fill up the prioritized log space, leading to the purging of
> >>>>> originally valid logs and hindering problem troubleshooting. To make this
> >>>>> more robust, kill this task.
> >>>>
> >>>> this seems like a very bad way to fix this problem
> >>>
> >>> Sure, it's a bad way to fix this. Is there a proper way to make it more
> >>> robust, given that it produces lots of identical invalid log lines?
> >>
> >> We have a mechanism to prevent flooding the log: <linux/ratelimit.h>.
> >> If you grep for 'ratelimit' in include, you'll see a number of
> >> convenience functions exist; not sure whether you'll need to use the raw
> >> ratelimit stuff, or if you can just use one of the prepared ones.
> >>
> >
> > IMHO, I really don’t think log flooding is the issue here; rather, we’re dealing
> > with an endless page fault. For servers, that might mean the server is
> > unresponsive; for phones, it could mean quickly draining the battery.
>
> Yes, log flooding is not the main issue here; the endless #PF is the more serious
> problem.
>
Please send a V2 and update your changelog to accurately describe the real
issue. Additionally, clarify how frequently this occurs and why resolving
the root cause is challenging. Gaoxu reported a similar case on the Android
kernel 6.6, while you're reporting it on 5.10. He observed an occurrence
rate of 1 in 500,000 over a week on customer devices but was unable to
reproduce it in the lab.
BTW, your patch is incorrect, as there is a normal case in which _swap_info_get()
returns NULL:
thread 1                                thread 2

1. page fault happens
   with entry pointing to
   the swapfile;
                                        swapoff()
2. do_swap_page()
In this scenario, _swap_info_get() may return NULL, which is expected, and we
should not return an -ERRNO: the subsequent page fault will detect that the
PTE has changed. Since in your case the entry points to swap that was never
enabled, the appropriate action is the following:
 	/* Prevent swapoff from happening to us. */
 	si = get_swap_device(entry);
-	if (unlikely(!si))
+	if (unlikely(!si)) {
+		/*
+		 * Return VM_FAULT_SIGBUS if the swap entry points to
+		 * a never-enabled swap file, caused by either hardware
+		 * issues or a kernel bug. Return an error code to prevent
+		 * an infinite page fault (#PF) loop.
+		 */
+		if (WARN_ON_ONCE(!swp_swap_info(entry)))
+			ret = VM_FAULT_SIGBUS;
 		goto out;
+	}
> >
> > It’s certainly better to identify the root cause, but it could be due to a
> > bit-flip in DDR or memory corruption in the page table. Until we can properly
> > fix it, the patch seems somewhat reasonable: the wrong application gets killed,
> > but it at least has a chance to be restarted by systemd, Android init, etc. A
> > PTE pointing to a never-enabled, non-existent swap file clearly indicates that
> > something has gone seriously wrong - either a hardware issue or a kernel bug.
> > At the very least, it warrants a WARN_ON_ONCE(), even after we identify and fix
> > the root cause, as it still enhances the system's robustness.
> >
> > Gaoxu will certainly encounter the same problem if do_swap_page() executes
> > earlier than swap_duplicate() where the PTE points to a non-existent swap
> > file [1]. That means the phone will heat up quickly.
> >
> > [1] https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/
> >
Thanks
Barry
* Re: [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page
2025-02-24 7:11 ` Barry Song
@ 2025-02-24 15:37 ` Matthew Wilcox
0 siblings, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2025-02-24 15:37 UTC (permalink / raw)
To: Barry Song
Cc: mawupeng, akpm, david, kasong, ryan.roberts, chrisl,
huang.ying.caritas, schatzberg.dan, hanchuanhua, linux-mm,
linux-kernel
On Mon, Feb 24, 2025 at 08:11:47PM +1300, Barry Song wrote:
> Please send a V2 and update your changelog to accurately describe the real
> issue. Additionally, clarify how frequently this occurs and why resolving
> the root cause is challenging. Gaoxu reported a similar case on the Android
> kernel 6.6, while you're reporting it on 5.10. He observed an occurrence
> rate of 1 in 500,000 over a week on customer devices but was unable to
> reproduce it in the lab.
>
> BTW, your patch is incorrect, as there is a normal case in which _swap_info_get()
> returns NULL:
> thread 1                                thread 2
>
> 1. page fault happens
>    with entry pointing to
>    the swapfile;
>                                         swapoff()
> 2. do_swap_page()
>
> In this scenario, _swap_info_get() may return NULL, which is expected, and we
> should not return an -ERRNO: the subsequent page fault will detect that the
> PTE has changed. Since in your case the entry points to swap that was never
> enabled, the appropriate action is the following:
>
>  	/* Prevent swapoff from happening to us. */
>  	si = get_swap_device(entry);
> -	if (unlikely(!si))
> +	if (unlikely(!si)) {
> +		/*
> +		 * Return VM_FAULT_SIGBUS if the swap entry points to
> +		 * a never-enabled swap file, caused by either hardware
> +		 * issues or a kernel bug. Return an error code to prevent
> +		 * an infinite page fault (#PF) loop.
> +		 */
> +		if (WARN_ON_ONCE(!swp_swap_info(entry)))
> +			ret = VM_FAULT_SIGBUS;
>  		goto out;
> +	}
This is overly specific to the case that you're tracking down.
So it's entirely appropriate to apply to _your_ kernel while you work on
tracking it down, but completely inappropriate to upstream.
end of thread, other threads:[~2025-02-24 15:38 UTC | newest]
Thread overview: 16+ messages
-- links below jump to the message on this page --
2025-02-22 2:46 [PATCH] mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page Wupeng Ma
2025-02-22 3:45 ` Matthew Wilcox
2025-02-22 3:59 ` mawupeng
2025-02-23 2:42 ` Matthew Wilcox
2025-02-23 6:09 ` Barry Song
2025-02-23 6:18 ` Barry Song
2025-02-24 1:27 ` mawupeng
2025-02-24 4:22 ` Matthew Wilcox
2025-02-24 7:11 ` Barry Song
2025-02-24 15:37 ` Matthew Wilcox
2025-02-22 7:33 ` Kairui Song
2025-02-22 7:41 ` mawupeng
2025-02-22 8:02 ` Kairui Song
2025-02-22 9:58 ` Barry Song
2025-02-23 2:38 ` kernel test robot
2025-02-23 2:50 ` kernel test robot