linux-mm.kvack.org archive mirror
* [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte
@ 2023-09-06 12:42 Kefeng Wang
  2023-09-06 12:42 ` [PATCH -rfc 1/3] mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte() Kefeng Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Kefeng Wang @ 2023-09-06 12:42 UTC (permalink / raw)
  To: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, kasan-dev,
	linux-mm
  Cc: Kefeng Wang

This is an RFC; patch 3 in particular is only a hack. The series tries to fix
the soft lockup seen when populating or depopulating shadow PTEs for a large
region. Looking forward to your replies and advice, thanks.
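
For reference, here is a minimal sketch of the kind of test module that can
trigger the problem on a CONFIG_KASAN_VMALLOC kernel; the module name, the
parameter and the default size are illustrative only, not the exact test case
used:

  #include <linux/module.h>
  #include <linux/vmalloc.h>

  static void *buf;
  /* illustrative default; anything large enough to walk many shadow PTEs */
  static unsigned long size_gb = 8;
  module_param(size_gb, ulong, 0444);

  static int __init big_vmalloc_init(void)
  {
  	/* alloc_vmap_area() -> kasan_populate_vmalloc() covers the whole range */
  	buf = vmalloc(size_gb << 30);
  	return buf ? 0 : -ENOMEM;
  }

  static void __exit big_vmalloc_exit(void)
  {
  	/* freeing is deferred: drain_vmap_area_work() -> kasan_release_vmalloc() */
  	vfree(buf);
  }

  module_init(big_vmalloc_init);
  module_exit(big_vmalloc_exit);
  MODULE_LICENSE("GPL");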

Kefeng Wang (3):
  mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
  mm: kasan: shadow: move free_page() out of page table lock
  mm: kasan: shadow: HACK add cond_resched_lock() in
    kasan_depopulate_vmalloc_pte()

 include/linux/kasan.h |  9 ++++++---
 mm/kasan/shadow.c     | 20 +++++++++++++-------
 mm/vmalloc.c          |  7 ++++---
 3 files changed, 23 insertions(+), 13 deletions(-)

-- 
2.41.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH -rfc 1/3] mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
  2023-09-06 12:42 [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte Kefeng Wang
@ 2023-09-06 12:42 ` Kefeng Wang
  2023-09-06 12:42 ` [PATCH -rfc 2/3] mm: kasan: shadow: move free_page() out of page table lock Kefeng Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Kefeng Wang @ 2023-09-06 12:42 UTC (permalink / raw)
  To: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, kasan-dev,
	linux-mm
  Cc: Kefeng Wang

kasan_populate_vmalloc() can take a long time when populating the shadow for
a large area, which results in a soft lockup:

  watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
  _raw_spin_unlock_irqrestore+0x50/0xb8
  rmqueue_bulk+0x434/0x6b8
  get_page_from_freelist+0xdd4/0x1680
  __alloc_pages+0x244/0x508
  alloc_pages+0xf0/0x218
  __get_free_pages+0x1c/0x50
  kasan_populate_vmalloc_pte+0x30/0x188
  __apply_to_page_range+0x3ec/0x650
  apply_to_page_range+0x1c/0x30
  kasan_populate_vmalloc+0x60/0x70
  alloc_vmap_area.part.67+0x328/0xe50
  alloc_vmap_area+0x4c/0x78
  __get_vm_area_node.constprop.76+0x130/0x240
  __vmalloc_node_range+0x12c/0x340
  __vmalloc_node+0x8c/0xb0
  vmalloc+0x2c/0x40

Fix it by adding a cond_resched().

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/kasan/shadow.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index dd772f9d0f08..fd15e38ff80e 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -317,6 +317,8 @@ static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
 	unsigned long page;
 	pte_t pte;
 
+	cond_resched();
+
 	if (likely(!pte_none(ptep_get(ptep))))
 		return 0;
 
-- 
2.41.0
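
For context, with this change applied the per-PTE callback looks roughly like
the sketch below (simplified from mm/kasan/shadow.c; the body that installs
the shadow page is elided, so this is not verbatim kernel source). The
rescheduling point is reached once for every PTE that
__apply_to_page_range() visits:

  static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
  					void *unused)
  {
  	unsigned long page;
  	pte_t pte;

  	cond_resched();

  	if (likely(!pte_none(ptep_get(ptep))))
  		return 0;

  	page = __get_free_page(GFP_KERNEL);
  	if (!page)
  		return -ENOMEM;

  	/*
  	 * ... initialize the new shadow page, install the pte under
  	 * init_mm.page_table_lock, and free the page if another CPU
  	 * won the race ...
  	 */

  	return 0;
  }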



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH -rfc 2/3] mm: kasan: shadow: move free_page() out of page table lock
  2023-09-06 12:42 [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte Kefeng Wang
  2023-09-06 12:42 ` [PATCH -rfc 1/3] mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte() Kefeng Wang
@ 2023-09-06 12:42 ` Kefeng Wang
  2023-09-06 12:42 ` [PATCH -rfc 3/3] mm: kasan: shadow: HACK: add cond_resched_lock() in kasan_depopulate_vmalloc_pte() Kefeng Wang
  2023-09-15  0:58 ` [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte Kefeng Wang
  3 siblings, 0 replies; 14+ messages in thread
From: Kefeng Wang @ 2023-09-06 12:42 UTC (permalink / raw)
  To: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, kasan-dev,
	linux-mm
  Cc: Kefeng Wang

free_page() does not need to be protected by the page table lock, and it can
take a little while, so move it out of the spinlock.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/kasan/shadow.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index fd15e38ff80e..d7d6724da2e0 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -423,12 +423,13 @@ static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
 	page = (unsigned long)__va(pte_pfn(ptep_get(ptep)) << PAGE_SHIFT);
 
 	spin_lock(&init_mm.page_table_lock);
-
-	if (likely(!pte_none(ptep_get(ptep)))) {
+	if (likely(!pte_none(ptep_get(ptep))))
 		pte_clear(&init_mm, addr, ptep);
-		free_page(page);
-	}
+	else
+		page = 0;
 	spin_unlock(&init_mm.page_table_lock);
+	if (page)
+		free_page(page);
 
 	return 0;
 }
-- 
2.41.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH -rfc 3/3] mm: kasan: shadow: HACK: add cond_resched_lock() in kasan_depopulate_vmalloc_pte()
  2023-09-06 12:42 [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte Kefeng Wang
  2023-09-06 12:42 ` [PATCH -rfc 1/3] mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte() Kefeng Wang
  2023-09-06 12:42 ` [PATCH -rfc 2/3] mm: kasan: shadow: move free_page() out of page table lock Kefeng Wang
@ 2023-09-06 12:42 ` Kefeng Wang
  2023-09-13  8:48   ` kernel test robot
  2023-09-15  0:58 ` [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte Kefeng Wang
  3 siblings, 1 reply; 14+ messages in thread
From: Kefeng Wang @ 2023-09-06 12:42 UTC (permalink / raw)
  To: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, kasan-dev,
	linux-mm
  Cc: Kefeng Wang

There is a similar soft lockup issue with large sizes in kasan_release_vmalloc():

  watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
  _raw_spin_unlock_irqrestore+0x50/0xb8
  free_pcppages_bulk+0x2bc/0x3e0
  free_unref_page_commit+0x1fc/0x290
  free_unref_page+0x184/0x250
  __free_pages+0x154/0x1a0
  free_pages+0x88/0xb0
  kasan_depopulate_vmalloc_pte+0x58/0x80
  __apply_to_page_range+0x3ec/0x650
  apply_to_existing_page_range+0x1c/0x30
  kasan_release_vmalloc+0xa4/0x118
  __purge_vmap_area_lazy+0x4f4/0xe30
  drain_vmap_area_work+0x60/0xc0
  process_one_work+0x4cc/0xa38
  worker_thread+0x240/0x638
  kthread+0x1c8/0x1e0
  ret_from_fork+0x10/0x20

It could be worked around by adding a cond_resched_lock(), but see the comment
above kasan_release_vmalloc(): free_vmap_area_lock is there to protect against
concurrent users, so dropping it here looks risky. Any advice on how to fix
this issue?

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/kasan.h | 9 ++++++---
 mm/kasan/shadow.c     | 9 ++++++---
 mm/vmalloc.c          | 7 ++++---
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 3df5499f7936..6d85715c47ad 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -385,7 +385,8 @@ void kasan_populate_early_vm_area_shadow(void *start, unsigned long size);
 int kasan_populate_vmalloc(unsigned long addr, unsigned long size);
 void kasan_release_vmalloc(unsigned long start, unsigned long end,
 			   unsigned long free_region_start,
-			   unsigned long free_region_end);
+			   unsigned long free_region_end,
+			   void *lock);
 
 #else /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */
 
@@ -400,7 +401,8 @@ static inline int kasan_populate_vmalloc(unsigned long start,
 static inline void kasan_release_vmalloc(unsigned long start,
 					 unsigned long end,
 					 unsigned long free_region_start,
-					 unsigned long free_region_end) { }
+					 unsigned long free_region_end,
+					 void *lock) { }
 
 #endif /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */
 
@@ -435,7 +437,8 @@ static inline int kasan_populate_vmalloc(unsigned long start,
 static inline void kasan_release_vmalloc(unsigned long start,
 					 unsigned long end,
 					 unsigned long free_region_start,
-					 unsigned long free_region_end) { }
+					 unsigned long free_region_end,
+					 void *lock) { }
 
 static inline void *kasan_unpoison_vmalloc(const void *start,
 					   unsigned long size,
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index d7d6724da2e0..4bce98e2b30d 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -416,12 +416,14 @@ int kasan_populate_vmalloc(unsigned long addr, unsigned long size)
 }
 
 static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
-					void *unused)
+					void *lock)
 {
 	unsigned long page;
 
 	page = (unsigned long)__va(pte_pfn(ptep_get(ptep)) << PAGE_SHIFT);
 
+	cond_resched_lock(lock);
+
 	spin_lock(&init_mm.page_table_lock);
 	if (likely(!pte_none(ptep_get(ptep))))
 		pte_clear(&init_mm, addr, ptep);
@@ -511,7 +513,8 @@ static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
  */
 void kasan_release_vmalloc(unsigned long start, unsigned long end,
 			   unsigned long free_region_start,
-			   unsigned long free_region_end)
+			   unsigned long free_region_end,
+			   void *lock)
 {
 	void *shadow_start, *shadow_end;
 	unsigned long region_start, region_end;
@@ -547,7 +550,7 @@ void kasan_release_vmalloc(unsigned long start, unsigned long end,
 		apply_to_existing_page_range(&init_mm,
 					     (unsigned long)shadow_start,
 					     size, kasan_depopulate_vmalloc_pte,
-					     NULL);
+					     lock);
 		flush_tlb_kernel_range((unsigned long)shadow_start,
 				       (unsigned long)shadow_end);
 	}
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 228a4a5312f2..c40ea7d1b65e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1768,7 +1768,8 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 
 		if (is_vmalloc_or_module_addr((void *)orig_start))
 			kasan_release_vmalloc(orig_start, orig_end,
-					      va->va_start, va->va_end);
+					      va->va_start, va->va_end,
+					      &free_vmap_area_lock);
 
 		atomic_long_sub(nr, &vmap_lazy_nr);
 		num_purged_areas++;
@@ -4198,7 +4199,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 				&free_vmap_area_list);
 		if (va)
 			kasan_release_vmalloc(orig_start, orig_end,
-				va->va_start, va->va_end);
+				va->va_start, va->va_end, NULL);
 		vas[area] = NULL;
 	}
 
@@ -4248,7 +4249,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 				&free_vmap_area_list);
 		if (va)
 			kasan_release_vmalloc(orig_start, orig_end,
-				va->va_start, va->va_end);
+				va->va_start, va->va_end, &free_vmap_area_lock);
 		vas[area] = NULL;
 		kfree(vms[area]);
 	}
-- 
2.41.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH -rfc 3/3] mm: kasan: shadow: HACK: add cond_resched_lock() in kasan_depopulate_vmalloc_pte()
  2023-09-06 12:42 ` [PATCH -rfc 3/3] mm: kasan: shadow: HACK: add cond_resched_lock() in kasan_depopulate_vmalloc_pte() Kefeng Wang
@ 2023-09-13  8:48   ` kernel test robot
  2023-09-13 11:21     ` Kefeng Wang
  0 siblings, 1 reply; 14+ messages in thread
From: kernel test robot @ 2023-09-13  8:48 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: oe-lkp, lkp, kasan-dev, linux-mm, Andrey Ryabinin,
	Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vincenzo Frascino, Andrew Morton, Uladzislau Rezki,
	Christoph Hellwig, Lorenzo Stoakes, Kefeng Wang, oliver.sang


hi, Kefeng Wang,

we don't have enough knowledge to connect the random issues below with your change;
however, when running the test up to 300 times, we observed that the parent commit stays clean,
so we are sending out this report FYI.
if you need more tests, please let us know. Thanks.

cb588b24f0fcf515 eaf065b089545219e27e529e3d6
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :300          6%          17:300   dmesg.BUG:#DF_stack_guard_page_was_hit_at#(stack_is#..#)
           :300          0%           1:300   dmesg.BUG:#DF_stack_guard_page_was_hit_at(____ptrval____)(stack_is(____ptrval____)..(____ptrval____))
           :300          6%          18:300   dmesg.BUG:KASAN:stack-out-of-bounds_in_vsnprintf
           :300          6%          17:300   dmesg.BUG:TASK_stack_guard_page_was_hit_at#(stack_is#..#)
           :300          0%           1:300   dmesg.BUG:TASK_stack_guard_page_was_hit_at(____ptrval____)(stack_is(____ptrval____)..(____ptrval____))
           :300          9%          28:300   dmesg.BUG:unable_to_handle_page_fault_for_address
           :300          3%           8:300   dmesg.Kernel_panic-not_syncing:Fatal_exception
           :300          7%          20:300   dmesg.Kernel_panic-not_syncing:Fatal_exception_in_interrupt
           :300          3%          10:300   dmesg.Oops:#[##]
           :300          6%          19:300   dmesg.RIP:__sanitizer_cov_trace_pc
           :300          5%          14:300   dmesg.RIP:exc_page_fault
           :300          6%          18:300   dmesg.WARNING:kernel_stack
           :300          6%          18:300   dmesg.WARNING:stack_recursion
           :300          6%          18:300   dmesg.stack_guard_page:#[##]


Hello,

kernel test robot noticed "BUG:TASK_stack_guard_page_was_hit_at#(stack_is#..#)" on:

commit: eaf065b089545219e27e529e3d6deac4c0bad525 ("[PATCH -rfc 3/3] mm: kasan: shadow: HACK: add cond_resched_lock() in kasan_depopulate_vmalloc_pte()")
url: https://github.com/intel-lab-lkp/linux/commits/Kefeng-Wang/mm-kasan-shadow-add-cond_resched-in-kasan_populate_vmalloc_pte/20230906-205407
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20230906124234.134200-4-wangkefeng.wang@huawei.com/
patch subject: [PATCH -rfc 3/3] mm: kasan: shadow: HACK: add cond_resched_lock() in kasan_depopulate_vmalloc_pte()

in testcase: rcuscale
version: 
with following parameters:

	runtime: 300s
	scale_type: srcud



compiler: gcc-9
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202309131652.3e9c0f06-oliver.sang@intel.com


[  114.366291][    C1] BUG: TASK stack guard page was hit at 00000000d230e938 (stack is 000000004315c7ed..00000000e1c06e40)
[  114.366312][    C1] stack guard page: 0000 [#1] SMP KASAN
[  114.366324][    C1] CPU: 1 PID: 400 Comm: systemd-journal Tainted: G        W        N 6.5.0-11778-geaf065b08954 #1
[  114.366338][    C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 114.366345][ C1] RIP: 0010:exc_page_fault (arch/x86/mm/fault.c:1518) 
[ 114.366365][ C1] Code: 89 ee e8 74 ca 7c fe 0f 1f 44 00 00 90 44 89 f6 4c 89 e7 e8 7d 0b 00 00 41 5c 41 5d 41 5e 5d c3 66 0f 1f 00 55 48 89 e5 41 57 <41> 56 41 55 49 89 f5 41 54 49 89 fc 0f 1f 44 00 00 41 0f 20 d6 65
All code
========
   0:	89 ee                	mov    %ebp,%esi
   2:	e8 74 ca 7c fe       	callq  0xfffffffffe7cca7b
   7:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
   c:	90                   	nop
   d:	44 89 f6             	mov    %r14d,%esi
  10:	4c 89 e7             	mov    %r12,%rdi
  13:	e8 7d 0b 00 00       	callq  0xb95
  18:	41 5c                	pop    %r12
  1a:	41 5d                	pop    %r13
  1c:	41 5e                	pop    %r14
  1e:	5d                   	pop    %rbp
  1f:	c3                   	retq   
  20:	66 0f 1f 00          	nopw   (%rax)
  24:	55                   	push   %rbp
  25:	48 89 e5             	mov    %rsp,%rbp
  28:	41 57                	push   %r15
  2a:*	41 56                	push   %r14		<-- trapping instruction
  2c:	41 55                	push   %r13
  2e:	49 89 f5             	mov    %rsi,%r13
  31:	41 54                	push   %r12
  33:	49 89 fc             	mov    %rdi,%r12
  36:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  3b:	41 0f 20 d6          	mov    %cr2,%r14
  3f:	65                   	gs

Code starting with the faulting instruction
===========================================
   0:	41 56                	push   %r14
   2:	41 55                	push   %r13
   4:	49 89 f5             	mov    %rsi,%r13
   7:	41 54                	push   %r12
   9:	49 89 fc             	mov    %rdi,%r12
   c:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  11:	41 0f 20 d6          	mov    %cr2,%r14
  15:	65                   	gs
[  114.366375][    C1] RSP: 0000:ffffc90001388000 EFLAGS: 00210087
[  114.366386][    C1] RAX: ffffc90001388018 RBX: 0000000000000000 RCX: ffffffff84801717
[  114.366394][    C1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc90001388018
[  114.366401][    C1] RBP: ffffc90001388008 R08: 0000000000000000 R09: 0000000000000000
[  114.366409][    C1] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  114.366416][    C1] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  114.366423][    C1] FS:  0000000000000000(0000) GS:ffff8883af500000(0063) knlGS:00000000f516bb40
[  114.366433][    C1] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[  114.366441][    C1] CR2: ffffc90001387ff8 CR3: 00000001bcfc9000 CR4: 00000000000406a0
[  114.366451][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  114.366459][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  114.366466][    C1] Call Trace:
[  114.366473][    C1] BUG: unable to handle page fault for address: fffff52000271002
[  114.366479][    C1] #PF: supervisor read access in kernel mode
[  114.366485][    C1] #PF: error_code(0x0000) - not-present page
[  114.366491][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366513][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366518][    C1] #PF: supervisor read access in kernel mode
[  114.366524][    C1] #PF: error_code(0x0000) - not-present page
[  114.366529][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366549][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366554][    C1] #PF: supervisor read access in kernel mode
[  114.366559][    C1] #PF: error_code(0x0000) - not-present page
[  114.366565][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366584][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366589][    C1] #PF: supervisor read access in kernel mode
[  114.366595][    C1] #PF: error_code(0x0000) - not-present page
[  114.366600][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366620][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366625][    C1] #PF: supervisor read access in kernel mode
[  114.366630][    C1] #PF: error_code(0x0000) - not-present page
[  114.366635][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366655][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366660][    C1] #PF: supervisor read access in kernel mode
[  114.366666][    C1] #PF: error_code(0x0000) - not-present page
[  114.366671][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366691][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366695][    C1] #PF: supervisor read access in kernel mode
[  114.366701][    C1] #PF: error_code(0x0000) - not-present page
[  114.366706][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366726][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366731][    C1] #PF: supervisor read access in kernel mode
[  114.366736][    C1] #PF: error_code(0x0000) - not-present page
[  114.366741][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366761][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366766][    C1] #PF: supervisor read access in kernel mode
[  114.366771][    C1] #PF: error_code(0x0000) - not-present page
[  114.366776][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366796][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366801][    C1] #PF: supervisor read access in kernel mode
[  114.366807][    C1] #PF: error_code(0x0000) - not-present page
[  114.366811][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366831][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366836][    C1] #PF: supervisor read access in kernel mode
[  114.366842][    C1] #PF: error_code(0x0000) - not-present page
[  114.366847][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366866][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366871][    C1] #PF: supervisor read access in kernel mode
[  114.366877][    C1] #PF: error_code(0x0000) - not-present page
[  114.366882][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366902][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366907][    C1] #PF: supervisor read access in kernel mode
[  114.366912][    C1] #PF: error_code(0x0000) - not-present page
[  114.366917][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366932][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366937][    C1] #PF: supervisor read access in kernel mode
[  114.366942][    C1] #PF: error_code(0x0000) - not-present page
[  114.366947][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.366966][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.366971][    C1] #PF: supervisor read access in kernel mode
[  114.366976][    C1] #PF: error_code(0x0000) - not-present page
[  114.366981][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.367001][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.367006][    C1] #PF: supervisor read access in kernel mode
[  114.367012][    C1] #PF: error_code(0x0000) - not-present page
[  114.367016][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.367036][    C1] BUG: unable to handle page fault for address: fffff52000271000
[  114.367042][    C1] #PF: supervisor read access in kernel mode
[  114.367047][    C1] #PF: error_code(0x0000) - not-present page
[  114.367052][    C1] PGD 417fdf067 P4D 417fdf067 PUD 1009ad067 PMD 14692d067 PTE 0
[  114.367075][    C1] BUG: #DF stack guard page was hit at 0000000071957a17 (stack is 00000000d15a2314..00000000d7ec09e2)
[  114.367086][    C1] stack guard page: 0000 [#2] SMP KASAN
[  114.367095][    C1] CPU: 1 PID: 400 Comm: systemd-journal Tainted: G        W        N 6.5.0-11778-geaf065b08954 #1
[  114.367107][    C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  114.367121][    C1] ==================================================================
[ 114.367125][ C1] BUG: KASAN: stack-out-of-bounds in vsnprintf (lib/vsprintf.c:2851) 
[  114.367141][    C1] Read of size 8 at addr fffffe39ea66b3c0 by task systemd-journal/400
[  114.367150][    C1]
[  114.367153][    C1] CPU: 1 PID: 400 Comm: systemd-journal Tainted: G        W        N 6.5.0-11778-geaf065b08954 #1
[  114.367165][    C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  114.367172][    C1] Call Trace:
[  114.367176][    C1]  <#DF>
[ 114.367181][ C1] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[ 114.367197][ C1] print_address_description+0x7d/0x2ee 
[ 114.367219][ C1] print_report (mm/kasan/report.c:476) 
[ 114.367234][ C1] ? vsnprintf (lib/vsprintf.c:2851) 
[ 114.367248][ C1] ? kasan_addr_to_slab (mm/kasan/common.c:35) 
[ 114.367265][ C1] ? vsnprintf (lib/vsprintf.c:2851) 
[ 114.367278][ C1] kasan_report (mm/kasan/report.c:590) 
[ 114.367293][ C1] ? format_decode (lib/vsprintf.c:2526) 
[ 114.367308][ C1] ? vsnprintf (lib/vsprintf.c:2851) 
[ 114.367327][ C1] __asan_report_load8_noabort (mm/kasan/report_generic.c:381) 
[ 114.367346][ C1] vsnprintf (lib/vsprintf.c:2851) 
[ 114.367365][ C1] ? pointer (lib/vsprintf.c:2749) 
[ 114.367384][ C1] sprintf (lib/vsprintf.c:3017) 
[ 114.367399][ C1] ? snprintf (lib/vsprintf.c:3017) 
[ 114.367411][ C1] ? kallsyms_sym_address (kernel/kallsyms.c:164) 
[ 114.367426][ C1] ? kallsyms_expand_symbol+0x1f1/0x231 
[ 114.367443][ C1] ? __sanitizer_cov_trace_pc (kernel/kcov.c:200) 
[ 114.367460][ C1] ? kallsyms_lookup_buildid (kernel/kallsyms.c:437) 
[ 114.367476][ C1] __sprint_symbol+0x15b/0x1ec 
[ 114.367491][ C1] ? kallsyms_lookup_buildid (kernel/kallsyms.c:482) 
[ 114.367504][ C1] ? page_fault_oops (arch/x86/mm/fault.c:699) 
[ 114.367516][ C1] ? fixup_exception (arch/x86/mm/extable.c:305) 
[ 114.367550][ C1] ? kernelmode_fixup_or_oops (arch/x86/mm/fault.c:761) 
[ 114.367566][ C1] ? __bad_area_nosemaphore (arch/x86/mm/fault.c:819) 
[ 114.367579][ C1] ? __sanitizer_cov_trace_pc (kernel/kcov.c:200) 
[ 114.367597][ C1] sprint_symbol (kernel/kallsyms.c:536) 
[ 114.367609][ C1] ? __sanitizer_cov_trace_pc (kernel/kcov.c:200) 
[ 114.367625][ C1] symbol_string (lib/vsprintf.c:1001) 
[ 114.367639][ C1] ? ip4_addr_string (lib/vsprintf.c:983) 
[ 114.367656][ C1] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570) 
[ 114.367677][ C1] ? page_fault_oops (include/linux/sched/task_stack.h:31 arch/x86/mm/fault.c:699) 
[ 114.367689][ C1] ? page_fault_oops (arch/x86/mm/fault.c:699) 
[ 114.367706][ C1] ? dump_pagetable (arch/x86/mm/fault.c:635) 
[ 114.367718][ C1] ? search_extable (lib/extable.c:115) 
[ 114.367731][ C1] ? is_prefetch+0x36f/0x3b4 
[ 114.367745][ C1] ? spurious_kernel_fault_check (arch/x86/mm/fault.c:122) 
[ 114.367758][ C1] ? search_module_extables (arch/x86/include/asm/preempt.h:85 kernel/module/main.c:3236) 
[ 114.367775][ C1] ? widen_string (lib/vsprintf.c:618) 
[ 114.367792][ C1] ? widen_string (lib/vsprintf.c:618) 
[ 114.367805][ C1] ? set_precision (lib/vsprintf.c:618) 
[ 114.367824][ C1] ? string_nocheck (lib/vsprintf.c:640) 
[ 114.367838][ C1] ? number (lib/vsprintf.c:573) 
[ 114.367854][ C1] ? __sanitizer_cov_trace_pc (kernel/kcov.c:200) 
[ 114.367872][ C1] pointer (lib/vsprintf.c:2416) 
[ 114.367887][ C1] ? va_format+0x1a1/0x1a1 
[ 114.367900][ C1] ? hex_string (lib/vsprintf.c:723) 
[ 114.367919][ C1] vsnprintf (lib/vsprintf.c:2822) 
[ 114.367937][ C1] ? pointer (lib/vsprintf.c:2749) 
[ 114.367952][ C1] ? kvm_sched_clock_read (arch/x86/kernel/kvmclock.c:91) 
[ 114.367966][ C1] ? sched_clock_noinstr (arch/x86/kernel/tsc.c:267) 
[ 114.367982][ C1] vprintk_store (kernel/printk/printk.c:2193) 
[ 114.367996][ C1] ? __kasan_check_write (mm/kasan/shadow.c:38) 
[ 114.368011][ C1] ? printk_sprint (kernel/printk/printk.c:2158) 
[ 114.368028][ C1] ? printk_sprint (kernel/printk/printk.c:2158) 
[ 114.368057][ C1] vprintk_emit (kernel/printk/printk.c:2290) 
[ 114.368074][ C1] vprintk_deferred (kernel/printk/printk.c:3911) 
[ 114.368089][ C1] vprintk (kernel/printk/printk_safe.c:42) 
[ 114.368104][ C1] _printk (kernel/printk/printk.c:2329) 
[ 114.368116][ C1] ? syslog_print (kernel/printk/printk.c:2329) 
[ 114.368127][ C1] ? vprintk (kernel/printk/printk_safe.c:46) 
[ 114.368143][ C1] ? syslog_print (kernel/printk/printk.c:2329) 
[ 114.368157][ C1] ? __sanitizer_cov_trace_pc (kernel/kcov.c:200) 
[ 114.368175][ C1] show_ip (arch/x86/kernel/dumpstack.c:144) 
[ 114.368188][ C1] show_iret_regs (arch/x86/kernel/dumpstack.c:150) 
[ 114.368200][ C1] __show_regs (arch/x86/kernel/process_64.c:77) 
[ 114.368214][ C1] ? dump_stack_print_info (lib/dump_stack.c:71) 
[ 114.368231][ C1] show_regs (arch/x86/kernel/dumpstack.c:477) 
[ 114.368243][ C1] __die_body (arch/x86/kernel/dumpstack.c:421) 
[ 114.368256][ C1] __die (arch/x86/kernel/dumpstack.c:435) 
[ 114.368268][ C1] die (arch/x86/kernel/dumpstack.c:448) 
[ 114.368280][ C1] handle_stack_overflow (arch/x86/kernel/traps.c:327) 
[ 114.368298][ C1] exc_double_fault (arch/x86/kernel/traps.c:464) 
[ 114.368315][ C1] asm_exc_double_fault (arch/x86/include/asm/idtentry.h:611) 
[ 114.368329][ C1] RIP: 0010:__sanitizer_cov_trace_pc (kernel/kcov.c:200) 
[ 114.368347][ C1] Code: 00 00 48 c1 e6 38 48 21 fe 74 12 b8 01 00 00 00 48 c1 e0 38 48 39 c6 b0 00 0f 44 c2 c3 85 ff 0f 44 c1 c3 31 c0 c3 f3 0f 1e fa <55> 65 8b 05 6e 52 f0 7c 89 c1 48 89 e5 81 e1 00 01 00 00 48 8b 75
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	48 c1 e6 38          	shl    $0x38,%rsi
   6:	48 21 fe             	and    %rdi,%rsi
   9:	74 12                	je     0x1d
   b:	b8 01 00 00 00       	mov    $0x1,%eax
  10:	48 c1 e0 38          	shl    $0x38,%rax
  14:	48 39 c6             	cmp    %rax,%rsi
  17:	b0 00                	mov    $0x0,%al
  19:	0f 44 c2             	cmove  %edx,%eax
  1c:	c3                   	retq   
  1d:	85 ff                	test   %edi,%edi
  1f:	0f 44 c1             	cmove  %ecx,%eax
  22:	c3                   	retq   
  23:	31 c0                	xor    %eax,%eax
  25:	c3                   	retq   
  26:	f3 0f 1e fa          	endbr64 
  2a:*	55                   	push   %rbp		<-- trapping instruction
  2b:	65 8b 05 6e 52 f0 7c 	mov    %gs:0x7cf0526e(%rip),%eax        # 0x7cf052a0
  32:	89 c1                	mov    %eax,%ecx
  34:	48 89 e5             	mov    %rsp,%rbp
  37:	81 e1 00 01 00 00    	and    $0x100,%ecx
  3d:	48                   	rex.W
  3e:	8b                   	.byte 0x8b
  3f:	75                   	.byte 0x75

Code starting with the faulting instruction
===========================================
   0:	55                   	push   %rbp
   1:	65 8b 05 6e 52 f0 7c 	mov    %gs:0x7cf0526e(%rip),%eax        # 0x7cf05276
   8:	89 c1                	mov    %eax,%ecx
   a:	48 89 e5             	mov    %rsp,%rbp
   d:	81 e1 00 01 00 00    	and    $0x100,%ecx
  13:	48                   	rex.W
  14:	8b                   	.byte 0x8b
  15:	75                   	.byte 0x75


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230913/202309131652.3e9c0f06-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH -rfc 3/3] mm: kasan: shadow: HACK: add cond_resched_lock() in kasan_depopulate_vmalloc_pte()
  2023-09-13  8:48   ` kernel test robot
@ 2023-09-13 11:21     ` Kefeng Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Kefeng Wang @ 2023-09-13 11:21 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp, lkp, kasan-dev, linux-mm, Andrey Ryabinin,
	Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vincenzo Frascino, Andrew Morton, Uladzislau Rezki,
	Christoph Hellwig, Lorenzo Stoakes

Hi, thanks for your test. As the commit log of this patch says, it is a hack
and I don't think it is correct; I hope the kasan maintainers can give some
advice on the soft lockup issue when populating/depopulating ptes.

On 2023/9/13 16:48, kernel test robot wrote:
> 
> hi, Kefeng Wang,
> 
> we don't have enough knowledge to connect the random issues below with your change;
> however, when running the test up to 300 times, we observed that the parent commit stays clean,
> so we are sending out this report FYI.
> if you need more tests, please let us know. Thanks.
> 
> cb588b24f0fcf515 eaf065b089545219e27e529e3d6
> ---------------- ---------------------------
>         fail:runs  %reproduction    fail:runs
>             |             |             |
>             :300          6%          17:300   dmesg.BUG:#DF_stack_guard_page_was_hit_at#(stack_is#..#)
>             :300          0%           1:300   dmesg.BUG:#DF_stack_guard_page_was_hit_at(____ptrval____)(stack_is(____ptrval____)..(____ptrval____))
>             :300          6%          18:300   dmesg.BUG:KASAN:stack-out-of-bounds_in_vsnprintf
>             :300          6%          17:300   dmesg.BUG:TASK_stack_guard_page_was_hit_at#(stack_is#..#)
>             :300          0%           1:300   dmesg.BUG:TASK_stack_guard_page_was_hit_at(____ptrval____)(stack_is(____ptrval____)..(____ptrval____))
>             :300          9%          28:300   dmesg.BUG:unable_to_handle_page_fault_for_address
>             :300          3%           8:300   dmesg.Kernel_panic-not_syncing:Fatal_exception
>             :300          7%          20:300   dmesg.Kernel_panic-not_syncing:Fatal_exception_in_interrupt
>             :300          3%          10:300   dmesg.Oops:#[##]
>             :300          6%          19:300   dmesg.RIP:__sanitizer_cov_trace_pc
>             :300          5%          14:300   dmesg.RIP:exc_page_fault
>             :300          6%          18:300   dmesg.WARNING:kernel_stack
>             :300          6%          18:300   dmesg.WARNING:stack_recursion
>             :300          6%          18:300   dmesg.stack_guard_page:#[##]
>
> [...]


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte
  2023-09-06 12:42 [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte Kefeng Wang
                   ` (2 preceding siblings ...)
  2023-09-06 12:42 ` [PATCH -rfc 3/3] mm: kasan: shadow: HACK: add cond_resched_lock() in kasan_depopulate_vmalloc_pte() Kefeng Wang
@ 2023-09-15  0:58 ` Kefeng Wang
  2023-10-18 14:16   ` Kefeng Wang
  3 siblings, 1 reply; 14+ messages in thread
From: Kefeng Wang @ 2023-09-15  0:58 UTC (permalink / raw)
  To: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, kasan-dev,
	linux-mm

Hi all, any suggestions or comments? Many thanks.

On 2023/9/6 20:42, Kefeng Wang wrote:
> This is an RFC; patch 3 in particular is only a hack. The series tries to fix
> the soft lockup seen when populating or depopulating shadow PTEs for a large
> region. Looking forward to your replies and advice, thanks.

Here is the full stack. For populating the pte:

[    C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
[    C3] Modules linked in: test(OE+)
[    C3] irq event stamp: 320776
[    C3] hardirqs last  enabled at (320775): [<ffff8000815a0c98>] _raw_spin_unlock_irqrestore+0x98/0xb8
[    C3] hardirqs last disabled at (320776): [<ffff8000815816e0>] el1_interrupt+0x38/0xa8
[    C3] softirqs last  enabled at (318174): [<ffff800080040ba8>] __do_softirq+0x658/0x7ac
[    C3] softirqs last disabled at (318169): [<ffff800080047fd8>] ____do_softirq+0x18/0x30
[    C3] CPU: 3 PID: 458 Comm: insmod Tainted: G           OE 6.5.0+ #595
[    C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[    C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
[    C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
[    C3] sp : ffff800093386d70
[    C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
[    C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
[    C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
[    C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
[    C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
[    C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
[    C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
[    C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
[    C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
[    C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
[    C3] Call trace:
[    C3]  _raw_spin_unlock_irqrestore+0x50/0xb8
[    C3]  rmqueue_bulk+0x434/0x6b8
[    C3]  get_page_from_freelist+0xdd4/0x1680
[    C3]  __alloc_pages+0x244/0x508
[    C3]  alloc_pages+0xf0/0x218
[    C3]  __get_free_pages+0x1c/0x50
[    C3]  kasan_populate_vmalloc_pte+0x30/0x188
[    C3]  __apply_to_page_range+0x3ec/0x650
[    C3]  apply_to_page_range+0x1c/0x30
[    C3]  kasan_populate_vmalloc+0x60/0x70
[    C3]  alloc_vmap_area.part.67+0x328/0xe50
[    C3]  alloc_vmap_area+0x4c/0x78
[    C3]  __get_vm_area_node.constprop.76+0x130/0x240
[    C3]  __vmalloc_node_range+0x12c/0x340
[    C3]  __vmalloc_node+0x8c/0xb0
[    C3]  vmalloc+0x2c/0x40
[    C3]  show_mem_init+0x1c/0xff8 [test]
[    C3]  do_one_initcall+0xe4/0x500
[    C3]  do_init_module+0x100/0x358
[    C3]  load_module+0x2e64/0x2fc8
[    C3]  init_module_from_file+0xec/0x148
[    C3]  idempotent_init_module+0x278/0x380
[    C3]  __arm64_sys_finit_module+0x88/0xf8
[    C3]  invoke_syscall+0x64/0x188
[    C3]  el0_svc_common.constprop.1+0xec/0x198
[    C3]  do_el0_svc+0x48/0xc8
[    C3]  el0_svc+0x3c/0xe8
[    C3]  el0t_64_sync_handler+0xa0/0xc8
[    C3]  el0t_64_sync+0x188/0x190

and for depopulating the pte:

[    C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
[    C6] Modules linked in: test(OE+)
[    C6] irq event stamp: 39458
[    C6] hardirqs last  enabled at (39457): [<ffff8000815a0c98>] _raw_spin_unlock_irqrestore+0x98/0xb8
[    C6] hardirqs last disabled at (39458): [<ffff8000815816e0>] el1_interrupt+0x38/0xa8
[    C6] softirqs last  enabled at (39420): [<ffff800080040ba8>] __do_softirq+0x658/0x7ac
[    C6] softirqs last disabled at (39415): [<ffff800080047fd8>] ____do_softirq+0x18/0x30
[    C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G           OEL 6.5.0+ #595
[    C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[    C6] Workqueue: events drain_vmap_area_work
[    C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
[    C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
[    C6] sp : ffff80008fe676b0
[    C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
[    C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
[    C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
[    C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
[    C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
[    C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
[    C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
[    C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
[    C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
[    C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
[    C6] Call trace:
[    C6]  _raw_spin_unlock_irqrestore+0x50/0xb8
[    C6]  free_pcppages_bulk+0x2bc/0x3e0
[    C6]  free_unref_page_commit+0x1fc/0x290
[    C6]  free_unref_page+0x184/0x250
[    C6]  __free_pages+0x154/0x1a0
[    C6]  free_pages+0x88/0xb0
[    C6]  kasan_depopulate_vmalloc_pte+0x58/0x80
[    C6]  __apply_to_page_range+0x3ec/0x650
[    C6]  apply_to_existing_page_range+0x1c/0x30
[    C6]  kasan_release_vmalloc+0xa4/0x118
[    C6]  __purge_vmap_area_lazy+0x4f4/0xe30
[    C6]  drain_vmap_area_work+0x60/0xc0
[    C6]  process_one_work+0x4cc/0xa38
[    C6]  worker_thread+0x240/0x638
[    C6]  kthread+0x1c8/0x1e0
[    C6]  ret_from_fork+0x10/0x20



> 
> Kefeng Wang (3):
>    mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
>    mm: kasan: shadow: move free_page() out of page table lock
>    mm: kasan: shadow: HACK add cond_resched_lock() in
>      kasan_depopulate_vmalloc_pte()
> 
>   include/linux/kasan.h |  9 ++++++---
>   mm/kasan/shadow.c     | 20 +++++++++++++-------
>   mm/vmalloc.c          |  7 ++++---
>   3 files changed, 23 insertions(+), 13 deletions(-)
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte
  2023-09-15  0:58 ` [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte Kefeng Wang
@ 2023-10-18 14:16   ` Kefeng Wang
  2023-10-18 16:37     ` Marco Elver
  0 siblings, 1 reply; 14+ messages in thread
From: Kefeng Wang @ 2023-10-18 14:16 UTC (permalink / raw)
  To: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, kasan-dev,
	linux-mm

The issue is easy to reproduce with a large vmalloc, kindly ping...

On 2023/9/15 8:58, Kefeng Wang wrote:
> Hi All, any suggest or comments,many thanks.
> 
> On 2023/9/6 20:42, Kefeng Wang wrote:
>> This is a RFC, even patch3 is a hack to fix the softlock issue when
>> populate or depopulate pte with large region, looking forward to your
>> reply and advise, thanks.
> 
> Here is full stack,for populate pte,
> 
> [    C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
> [    C3] Modules linked in: test(OE+)
> [    C3] irq event stamp: 320776
> [    C3] hardirqs last  enabled at (320775): [<ffff8000815a0c98>] 
> _raw_spin_unlock_irqrestore+0x98/0xb8
> [    C3] hardirqs last disabled at (320776): [<ffff8000815816e0>] 
> el1_interrupt+0x38/0xa8
> [    C3] softirqs last  enabled at (318174): [<ffff800080040ba8>] 
> __do_softirq+0x658/0x7ac
> [    C3] softirqs last disabled at (318169): [<ffff800080047fd8>] 
> ____do_softirq+0x18/0x30
> [    C3] CPU: 3 PID: 458 Comm: insmod Tainted: G           OE 6.5.0+ #595
> [    C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> [    C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
> [    C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
> [    C3] sp : ffff800093386d70
> [    C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
> [    C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
> [    C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
> [    C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
> [    C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
> [    C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
> [    C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
> [    C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
> [    C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
> [    C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
> [    C3] Call trace:
> [    C3]  _raw_spin_unlock_irqrestore+0x50/0xb8
> [    C3]  rmqueue_bulk+0x434/0x6b8
> [    C3]  get_page_from_freelist+0xdd4/0x1680
> [    C3]  __alloc_pages+0x244/0x508
> [    C3]  alloc_pages+0xf0/0x218
> [    C3]  __get_free_pages+0x1c/0x50
> [    C3]  kasan_populate_vmalloc_pte+0x30/0x188
> [    C3]  __apply_to_page_range+0x3ec/0x650
> [    C3]  apply_to_page_range+0x1c/0x30
> [    C3]  kasan_populate_vmalloc+0x60/0x70
> [    C3]  alloc_vmap_area.part.67+0x328/0xe50
> [    C3]  alloc_vmap_area+0x4c/0x78
> [    C3]  __get_vm_area_node.constprop.76+0x130/0x240
> [    C3]  __vmalloc_node_range+0x12c/0x340
> [    C3]  __vmalloc_node+0x8c/0xb0
> [    C3]  vmalloc+0x2c/0x40
> [    C3]  show_mem_init+0x1c/0xff8 [test]
> [    C3]  do_one_initcall+0xe4/0x500
> [    C3]  do_init_module+0x100/0x358
> [    C3]  load_module+0x2e64/0x2fc8
> [    C3]  init_module_from_file+0xec/0x148
> [    C3]  idempotent_init_module+0x278/0x380
> [    C3]  __arm64_sys_finit_module+0x88/0xf8
> [    C3]  invoke_syscall+0x64/0x188
> [    C3]  el0_svc_common.constprop.1+0xec/0x198
> [    C3]  do_el0_svc+0x48/0xc8
> [    C3]  el0_svc+0x3c/0xe8
> [    C3]  el0t_64_sync_handler+0xa0/0xc8
> [    C3]  el0t_64_sync+0x188/0x190
> 
> and for depopuldate pte,
> 
> [    C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
> [    C6] Modules linked in: test(OE+)
> [    C6] irq event stamp: 39458
> [    C6] hardirqs last  enabled at (39457): [<ffff8000815a0c98>] 
> _raw_spin_unlock_irqrestore+0x98/0xb8
> [    C6] hardirqs last disabled at (39458): [<ffff8000815816e0>] 
> el1_interrupt+0x38/0xa8
> [    C6] softirqs last  enabled at (39420): [<ffff800080040ba8>] 
> __do_softirq+0x658/0x7ac
> [    C6] softirqs last disabled at (39415): [<ffff800080047fd8>] 
> ____do_softirq+0x18/0x30
> [    C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G           OEL 
> 6.5.0+ #595
> [    C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> [    C6] Workqueue: events drain_vmap_area_work
> [    C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
> [    C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
> [    C6] sp : ffff80008fe676b0
> [    C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
> [    C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
> [    C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
> [    C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
> [    C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
> [    C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
> [    C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
> [    C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
> [    C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
> [    C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
> [    C6] Call trace:
> [    C6]  _raw_spin_unlock_irqrestore+0x50/0xb8
> [    C6]  free_pcppages_bulk+0x2bc/0x3e0
> [    C6]  free_unref_page_commit+0x1fc/0x290
> [    C6]  free_unref_page+0x184/0x250
> [    C6]  __free_pages+0x154/0x1a0
> [    C6]  free_pages+0x88/0xb0
> [    C6]  kasan_depopulate_vmalloc_pte+0x58/0x80
> [    C6]  __apply_to_page_range+0x3ec/0x650
> [    C6]  apply_to_existing_page_range+0x1c/0x30
> [    C6]  kasan_release_vmalloc+0xa4/0x118
> [    C6]  __purge_vmap_area_lazy+0x4f4/0xe30
> [    C6]  drain_vmap_area_work+0x60/0xc0
> [    C6]  process_one_work+0x4cc/0xa38
> [    C6]  worker_thread+0x240/0x638
> [    C6]  kthread+0x1c8/0x1e0
> [    C6]  ret_from_fork+0x10/0x20
> 
> 
> 
>>
>> Kefeng Wang (3):
>>    mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
>>    mm: kasan: shadow: move free_page() out of page table lock
>>    mm: kasan: shadow: HACK add cond_resched_lock() in
>>      kasan_depopulate_vmalloc_pte()
>>
>>   include/linux/kasan.h |  9 ++++++---
>>   mm/kasan/shadow.c     | 20 +++++++++++++-------
>>   mm/vmalloc.c          |  7 ++++---
>>   3 files changed, 23 insertions(+), 13 deletions(-)
>>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte
  2023-10-18 14:16   ` Kefeng Wang
@ 2023-10-18 16:37     ` Marco Elver
  2023-10-19  1:40       ` Kefeng Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Marco Elver @ 2023-10-18 16:37 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, kasan-dev,
	linux-mm

On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev
<kasan-dev@googlegroups.com> wrote:
>
> The issue is easy to reproduced with large vmalloc, kindly ping...
>
> On 2023/9/15 8:58, Kefeng Wang wrote:
> > Hi All, any suggest or comments,many thanks.
> >
> > On 2023/9/6 20:42, Kefeng Wang wrote:
> >> This is a RFC, even patch3 is a hack to fix the softlock issue when
> >> populate or depopulate pte with large region, looking forward to your
> >> reply and advise, thanks.
> >
> > Here is full stack,for populate pte,
> >
> > [    C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
> > [    C3] Modules linked in: test(OE+)
> > [    C3] irq event stamp: 320776
> > [    C3] hardirqs last  enabled at (320775): [<ffff8000815a0c98>]
> > _raw_spin_unlock_irqrestore+0x98/0xb8
> > [    C3] hardirqs last disabled at (320776): [<ffff8000815816e0>]
> > el1_interrupt+0x38/0xa8
> > [    C3] softirqs last  enabled at (318174): [<ffff800080040ba8>]
> > __do_softirq+0x658/0x7ac
> > [    C3] softirqs last disabled at (318169): [<ffff800080047fd8>]
> > ____do_softirq+0x18/0x30
> > [    C3] CPU: 3 PID: 458 Comm: insmod Tainted: G           OE 6.5.0+ #595
> > [    C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > [    C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [    C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
> > [    C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
> > [    C3] sp : ffff800093386d70
> > [    C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
> > [    C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
> > [    C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
> > [    C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
> > [    C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
> > [    C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
> > [    C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
> > [    C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
> > [    C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
> > [    C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
> > [    C3] Call trace:
> > [    C3]  _raw_spin_unlock_irqrestore+0x50/0xb8
> > [    C3]  rmqueue_bulk+0x434/0x6b8
> > [    C3]  get_page_from_freelist+0xdd4/0x1680
> > [    C3]  __alloc_pages+0x244/0x508
> > [    C3]  alloc_pages+0xf0/0x218
> > [    C3]  __get_free_pages+0x1c/0x50
> > [    C3]  kasan_populate_vmalloc_pte+0x30/0x188
> > [    C3]  __apply_to_page_range+0x3ec/0x650
> > [    C3]  apply_to_page_range+0x1c/0x30
> > [    C3]  kasan_populate_vmalloc+0x60/0x70
> > [    C3]  alloc_vmap_area.part.67+0x328/0xe50
> > [    C3]  alloc_vmap_area+0x4c/0x78
> > [    C3]  __get_vm_area_node.constprop.76+0x130/0x240
> > [    C3]  __vmalloc_node_range+0x12c/0x340
> > [    C3]  __vmalloc_node+0x8c/0xb0
> > [    C3]  vmalloc+0x2c/0x40
> > [    C3]  show_mem_init+0x1c/0xff8 [test]
> > [    C3]  do_one_initcall+0xe4/0x500
> > [    C3]  do_init_module+0x100/0x358
> > [    C3]  load_module+0x2e64/0x2fc8
> > [    C3]  init_module_from_file+0xec/0x148
> > [    C3]  idempotent_init_module+0x278/0x380
> > [    C3]  __arm64_sys_finit_module+0x88/0xf8
> > [    C3]  invoke_syscall+0x64/0x188
> > [    C3]  el0_svc_common.constprop.1+0xec/0x198
> > [    C3]  do_el0_svc+0x48/0xc8
> > [    C3]  el0_svc+0x3c/0xe8
> > [    C3]  el0t_64_sync_handler+0xa0/0xc8
> > [    C3]  el0t_64_sync+0x188/0x190
> >
> > and for depopuldate pte,
> >
> > [    C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
> > [    C6] Modules linked in: test(OE+)
> > [    C6] irq event stamp: 39458
> > [    C6] hardirqs last  enabled at (39457): [<ffff8000815a0c98>]
> > _raw_spin_unlock_irqrestore+0x98/0xb8
> > [    C6] hardirqs last disabled at (39458): [<ffff8000815816e0>]
> > el1_interrupt+0x38/0xa8
> > [    C6] softirqs last  enabled at (39420): [<ffff800080040ba8>]
> > __do_softirq+0x658/0x7ac
> > [    C6] softirqs last disabled at (39415): [<ffff800080047fd8>]
> > ____do_softirq+0x18/0x30
> > [    C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G           OEL
> > 6.5.0+ #595
> > [    C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > [    C6] Workqueue: events drain_vmap_area_work
> > [    C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [    C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
> > [    C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
> > [    C6] sp : ffff80008fe676b0
> > [    C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
> > [    C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
> > [    C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
> > [    C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
> > [    C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
> > [    C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
> > [    C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
> > [    C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
> > [    C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
> > [    C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
> > [    C6] Call trace:
> > [    C6]  _raw_spin_unlock_irqrestore+0x50/0xb8
> > [    C6]  free_pcppages_bulk+0x2bc/0x3e0
> > [    C6]  free_unref_page_commit+0x1fc/0x290
> > [    C6]  free_unref_page+0x184/0x250
> > [    C6]  __free_pages+0x154/0x1a0
> > [    C6]  free_pages+0x88/0xb0
> > [    C6]  kasan_depopulate_vmalloc_pte+0x58/0x80
> > [    C6]  __apply_to_page_range+0x3ec/0x650
> > [    C6]  apply_to_existing_page_range+0x1c/0x30
> > [    C6]  kasan_release_vmalloc+0xa4/0x118
> > [    C6]  __purge_vmap_area_lazy+0x4f4/0xe30
> > [    C6]  drain_vmap_area_work+0x60/0xc0
> > [    C6]  process_one_work+0x4cc/0xa38
> > [    C6]  worker_thread+0x240/0x638
> > [    C6]  kthread+0x1c8/0x1e0
> > [    C6]  ret_from_fork+0x10/0x20
> >
> >
> >
> >>
> >> Kefeng Wang (3):
> >>    mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
> >>    mm: kasan: shadow: move free_page() out of page table lock
> >>    mm: kasan: shadow: HACK add cond_resched_lock() in
> >>      kasan_depopulate_vmalloc_pte()

The first 2 patches look ok, but yeah, the last is a hack. I also
don't have any better suggestions, only more questions.

Does this only happen on arm64?
Do you have a minimal reproducer you can share?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte
  2023-10-18 16:37     ` Marco Elver
@ 2023-10-19  1:40       ` Kefeng Wang
  2023-10-19  6:17         ` Uladzislau Rezki
  0 siblings, 1 reply; 14+ messages in thread
From: Kefeng Wang @ 2023-10-19  1:40 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, kasan-dev,
	linux-mm



On 2023/10/19 0:37, Marco Elver wrote:
> On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev
> <kasan-dev@googlegroups.com> wrote:
>>
>> The issue is easy to reproduced with large vmalloc, kindly ping...
>>
>> On 2023/9/15 8:58, Kefeng Wang wrote:
>>> Hi All, any suggest or comments,many thanks.
>>>
>>> On 2023/9/6 20:42, Kefeng Wang wrote:
>>>> This is a RFC, even patch3 is a hack to fix the softlock issue when
>>>> populate or depopulate pte with large region, looking forward to your
>>>> reply and advise, thanks.
>>>
>>> Here is full stack,for populate pte,
>>>
>>> [    C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
>>> [    C3] Modules linked in: test(OE+)
>>> [    C3] irq event stamp: 320776
>>> [    C3] hardirqs last  enabled at (320775): [<ffff8000815a0c98>]
>>> _raw_spin_unlock_irqrestore+0x98/0xb8
>>> [    C3] hardirqs last disabled at (320776): [<ffff8000815816e0>]
>>> el1_interrupt+0x38/0xa8
>>> [    C3] softirqs last  enabled at (318174): [<ffff800080040ba8>]
>>> __do_softirq+0x658/0x7ac
>>> [    C3] softirqs last disabled at (318169): [<ffff800080047fd8>]
>>> ____do_softirq+0x18/0x30
>>> [    C3] CPU: 3 PID: 458 Comm: insmod Tainted: G           OE 6.5.0+ #595
>>> [    C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>> [    C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [    C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
>>> [    C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
>>> [    C3] sp : ffff800093386d70
>>> [    C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
>>> [    C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
>>> [    C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
>>> [    C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
>>> [    C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
>>> [    C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
>>> [    C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
>>> [    C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
>>> [    C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
>>> [    C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
>>> [    C3] Call trace:
>>> [    C3]  _raw_spin_unlock_irqrestore+0x50/0xb8
>>> [    C3]  rmqueue_bulk+0x434/0x6b8
>>> [    C3]  get_page_from_freelist+0xdd4/0x1680
>>> [    C3]  __alloc_pages+0x244/0x508
>>> [    C3]  alloc_pages+0xf0/0x218
>>> [    C3]  __get_free_pages+0x1c/0x50
>>> [    C3]  kasan_populate_vmalloc_pte+0x30/0x188
>>> [    C3]  __apply_to_page_range+0x3ec/0x650
>>> [    C3]  apply_to_page_range+0x1c/0x30
>>> [    C3]  kasan_populate_vmalloc+0x60/0x70
>>> [    C3]  alloc_vmap_area.part.67+0x328/0xe50
>>> [    C3]  alloc_vmap_area+0x4c/0x78
>>> [    C3]  __get_vm_area_node.constprop.76+0x130/0x240
>>> [    C3]  __vmalloc_node_range+0x12c/0x340
>>> [    C3]  __vmalloc_node+0x8c/0xb0
>>> [    C3]  vmalloc+0x2c/0x40
>>> [    C3]  show_mem_init+0x1c/0xff8 [test]
>>> [    C3]  do_one_initcall+0xe4/0x500
>>> [    C3]  do_init_module+0x100/0x358
>>> [    C3]  load_module+0x2e64/0x2fc8
>>> [    C3]  init_module_from_file+0xec/0x148
>>> [    C3]  idempotent_init_module+0x278/0x380
>>> [    C3]  __arm64_sys_finit_module+0x88/0xf8
>>> [    C3]  invoke_syscall+0x64/0x188
>>> [    C3]  el0_svc_common.constprop.1+0xec/0x198
>>> [    C3]  do_el0_svc+0x48/0xc8
>>> [    C3]  el0_svc+0x3c/0xe8
>>> [    C3]  el0t_64_sync_handler+0xa0/0xc8
>>> [    C3]  el0t_64_sync+0x188/0x190
>>>
>>> and for depopuldate pte,
>>>
>>> [    C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
>>> [    C6] Modules linked in: test(OE+)
>>> [    C6] irq event stamp: 39458
>>> [    C6] hardirqs last  enabled at (39457): [<ffff8000815a0c98>]
>>> _raw_spin_unlock_irqrestore+0x98/0xb8
>>> [    C6] hardirqs last disabled at (39458): [<ffff8000815816e0>]
>>> el1_interrupt+0x38/0xa8
>>> [    C6] softirqs last  enabled at (39420): [<ffff800080040ba8>]
>>> __do_softirq+0x658/0x7ac
>>> [    C6] softirqs last disabled at (39415): [<ffff800080047fd8>]
>>> ____do_softirq+0x18/0x30
>>> [    C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G           OEL
>>> 6.5.0+ #595
>>> [    C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>> [    C6] Workqueue: events drain_vmap_area_work
>>> [    C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [    C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
>>> [    C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
>>> [    C6] sp : ffff80008fe676b0
>>> [    C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
>>> [    C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
>>> [    C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
>>> [    C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
>>> [    C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
>>> [    C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
>>> [    C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
>>> [    C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
>>> [    C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
>>> [    C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
>>> [    C6] Call trace:
>>> [    C6]  _raw_spin_unlock_irqrestore+0x50/0xb8
>>> [    C6]  free_pcppages_bulk+0x2bc/0x3e0
>>> [    C6]  free_unref_page_commit+0x1fc/0x290
>>> [    C6]  free_unref_page+0x184/0x250
>>> [    C6]  __free_pages+0x154/0x1a0
>>> [    C6]  free_pages+0x88/0xb0
>>> [    C6]  kasan_depopulate_vmalloc_pte+0x58/0x80
>>> [    C6]  __apply_to_page_range+0x3ec/0x650
>>> [    C6]  apply_to_existing_page_range+0x1c/0x30
>>> [    C6]  kasan_release_vmalloc+0xa4/0x118
>>> [    C6]  __purge_vmap_area_lazy+0x4f4/0xe30
>>> [    C6]  drain_vmap_area_work+0x60/0xc0
>>> [    C6]  process_one_work+0x4cc/0xa38
>>> [    C6]  worker_thread+0x240/0x638
>>> [    C6]  kthread+0x1c8/0x1e0
>>> [    C6]  ret_from_fork+0x10/0x20
>>>
>>>
>>>
>>>>
>>>> Kefeng Wang (3):
>>>>     mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
>>>>     mm: kasan: shadow: move free_page() out of page table lock
>>>>     mm: kasan: shadow: HACK add cond_resched_lock() in
>>>>       kasan_depopulate_vmalloc_pte()
> 
> The first 2 patches look ok, but yeah, the last is a hack. I also
> don't have any better suggestions, only more questions.

Thanks Marco, maybe we could convert free_vmap_area_lock from a spinlock
to a mutex only when KASAN is enabled?

> 
> Does this only happen on arm64?

Our test case runs on an arm64 qemu guest (the host is x86), so it runs
much slower than a real board.
> Do you have a minimal reproducer you can share?
Here is the code in the test driver:

void *buf = vmalloc(40UL << 30);
vfree(buf);
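
For reference, a minimal module around that snippet could look like the
sketch below (the file name, function names and license tag are made up
for illustration; this is not the actual internal test driver):

<snip>
/* kasan_vmalloc_repro.c: hypothetical minimal reproducer module. */
#include <linux/module.h>
#include <linux/vmalloc.h>

static int __init kasan_vmalloc_repro_init(void)
{
        /* Large enough to keep kasan_populate_vmalloc() busy. */
        void *buf = vmalloc(40UL << 30);

        if (!buf)
                return -ENOMEM;

        /* vfree() later leads to kasan_release_vmalloc() via the lazy purge. */
        vfree(buf);

        return 0;
}

static void __exit kasan_vmalloc_repro_exit(void)
{
}

module_init(kasan_vmalloc_repro_init);
module_exit(kasan_vmalloc_repro_exit);
MODULE_LICENSE("GPL");
<snip>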



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte
  2023-10-19  1:40       ` Kefeng Wang
@ 2023-10-19  6:17         ` Uladzislau Rezki
  2023-10-19  7:26           ` Kefeng Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Uladzislau Rezki @ 2023-10-19  6:17 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Marco Elver, Andrey Ryabinin, Alexander Potapenko,
	Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino,
	Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
	Lorenzo Stoakes, kasan-dev, linux-mm

On Thu, Oct 19, 2023 at 09:40:10AM +0800, Kefeng Wang wrote:
> 
> 
> On 2023/10/19 0:37, Marco Elver wrote:
> > On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev
> > <kasan-dev@googlegroups.com> wrote:
> > > 
> > > The issue is easy to reproduced with large vmalloc, kindly ping...
> > > 
> > > On 2023/9/15 8:58, Kefeng Wang wrote:
> > > > Hi All, any suggest or comments,many thanks.
> > > > 
> > > > On 2023/9/6 20:42, Kefeng Wang wrote:
> > > > > This is a RFC, even patch3 is a hack to fix the softlock issue when
> > > > > populate or depopulate pte with large region, looking forward to your
> > > > > reply and advise, thanks.
> > > > 
> > > > Here is full stack,for populate pte,
> > > > 
> > > > [    C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
> > > > [    C3] Modules linked in: test(OE+)
> > > > [    C3] irq event stamp: 320776
> > > > [    C3] hardirqs last  enabled at (320775): [<ffff8000815a0c98>]
> > > > _raw_spin_unlock_irqrestore+0x98/0xb8
> > > > [    C3] hardirqs last disabled at (320776): [<ffff8000815816e0>]
> > > > el1_interrupt+0x38/0xa8
> > > > [    C3] softirqs last  enabled at (318174): [<ffff800080040ba8>]
> > > > __do_softirq+0x658/0x7ac
> > > > [    C3] softirqs last disabled at (318169): [<ffff800080047fd8>]
> > > > ____do_softirq+0x18/0x30
> > > > [    C3] CPU: 3 PID: 458 Comm: insmod Tainted: G           OE 6.5.0+ #595
> > > > [    C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > > > [    C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > [    C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
> > > > [    C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
> > > > [    C3] sp : ffff800093386d70
> > > > [    C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
> > > > [    C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
> > > > [    C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
> > > > [    C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
> > > > [    C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
> > > > [    C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
> > > > [    C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
> > > > [    C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
> > > > [    C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
> > > > [    C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
> > > > [    C3] Call trace:
> > > > [    C3]  _raw_spin_unlock_irqrestore+0x50/0xb8
> > > > [    C3]  rmqueue_bulk+0x434/0x6b8
> > > > [    C3]  get_page_from_freelist+0xdd4/0x1680
> > > > [    C3]  __alloc_pages+0x244/0x508
> > > > [    C3]  alloc_pages+0xf0/0x218
> > > > [    C3]  __get_free_pages+0x1c/0x50
> > > > [    C3]  kasan_populate_vmalloc_pte+0x30/0x188
> > > > [    C3]  __apply_to_page_range+0x3ec/0x650
> > > > [    C3]  apply_to_page_range+0x1c/0x30
> > > > [    C3]  kasan_populate_vmalloc+0x60/0x70
> > > > [    C3]  alloc_vmap_area.part.67+0x328/0xe50
> > > > [    C3]  alloc_vmap_area+0x4c/0x78
> > > > [    C3]  __get_vm_area_node.constprop.76+0x130/0x240
> > > > [    C3]  __vmalloc_node_range+0x12c/0x340
> > > > [    C3]  __vmalloc_node+0x8c/0xb0
> > > > [    C3]  vmalloc+0x2c/0x40
> > > > [    C3]  show_mem_init+0x1c/0xff8 [test]
> > > > [    C3]  do_one_initcall+0xe4/0x500
> > > > [    C3]  do_init_module+0x100/0x358
> > > > [    C3]  load_module+0x2e64/0x2fc8
> > > > [    C3]  init_module_from_file+0xec/0x148
> > > > [    C3]  idempotent_init_module+0x278/0x380
> > > > [    C3]  __arm64_sys_finit_module+0x88/0xf8
> > > > [    C3]  invoke_syscall+0x64/0x188
> > > > [    C3]  el0_svc_common.constprop.1+0xec/0x198
> > > > [    C3]  do_el0_svc+0x48/0xc8
> > > > [    C3]  el0_svc+0x3c/0xe8
> > > > [    C3]  el0t_64_sync_handler+0xa0/0xc8
> > > > [    C3]  el0t_64_sync+0x188/0x190
> > > > 
> > > > and for depopuldate pte,
> > > > 
> > > > [    C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
> > > > [    C6] Modules linked in: test(OE+)
> > > > [    C6] irq event stamp: 39458
> > > > [    C6] hardirqs last  enabled at (39457): [<ffff8000815a0c98>]
> > > > _raw_spin_unlock_irqrestore+0x98/0xb8
> > > > [    C6] hardirqs last disabled at (39458): [<ffff8000815816e0>]
> > > > el1_interrupt+0x38/0xa8
> > > > [    C6] softirqs last  enabled at (39420): [<ffff800080040ba8>]
> > > > __do_softirq+0x658/0x7ac
> > > > [    C6] softirqs last disabled at (39415): [<ffff800080047fd8>]
> > > > ____do_softirq+0x18/0x30
> > > > [    C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G           OEL
> > > > 6.5.0+ #595
> > > > [    C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > > > [    C6] Workqueue: events drain_vmap_area_work
> > > > [    C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > [    C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
> > > > [    C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
> > > > [    C6] sp : ffff80008fe676b0
> > > > [    C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
> > > > [    C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
> > > > [    C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
> > > > [    C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
> > > > [    C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
> > > > [    C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
> > > > [    C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
> > > > [    C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
> > > > [    C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
> > > > [    C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
> > > > [    C6] Call trace:
> > > > [    C6]  _raw_spin_unlock_irqrestore+0x50/0xb8
> > > > [    C6]  free_pcppages_bulk+0x2bc/0x3e0
> > > > [    C6]  free_unref_page_commit+0x1fc/0x290
> > > > [    C6]  free_unref_page+0x184/0x250
> > > > [    C6]  __free_pages+0x154/0x1a0
> > > > [    C6]  free_pages+0x88/0xb0
> > > > [    C6]  kasan_depopulate_vmalloc_pte+0x58/0x80
> > > > [    C6]  __apply_to_page_range+0x3ec/0x650
> > > > [    C6]  apply_to_existing_page_range+0x1c/0x30
> > > > [    C6]  kasan_release_vmalloc+0xa4/0x118
> > > > [    C6]  __purge_vmap_area_lazy+0x4f4/0xe30
> > > > [    C6]  drain_vmap_area_work+0x60/0xc0
> > > > [    C6]  process_one_work+0x4cc/0xa38
> > > > [    C6]  worker_thread+0x240/0x638
> > > > [    C6]  kthread+0x1c8/0x1e0
> > > > [    C6]  ret_from_fork+0x10/0x20
> > > > 
> > > > 
> > > > 
> > > > > 
> > > > > Kefeng Wang (3):
> > > > >     mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
> > > > >     mm: kasan: shadow: move free_page() out of page table lock
> > > > >     mm: kasan: shadow: HACK add cond_resched_lock() in
> > > > >       kasan_depopulate_vmalloc_pte()
> > 
> > The first 2 patches look ok, but yeah, the last is a hack. I also
> > don't have any better suggestions, only more questions.
> 
> Thanks Marco, maybe we could convert free_vmap_area_lock from spinlock to
> mutex lock only if KASAN enabled?
> 
I do not think it is a good suggestion. Could you please clarify the
reason for such a conversion?

> > 
> > Does this only happen on arm64?
> 
> Our test case run on arm64 qemu(host is x86), so it run much more slower
> than real board.
> > Do you have a minimal reproducer you can share?
> Here is the code in test driver,
> 
> void *buf = vmalloc(40UL << 30);
> vfree(buf);
> 
What is this test driver? Why do you need 40G of memory, and for what purpose?

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte
  2023-10-19  6:17         ` Uladzislau Rezki
@ 2023-10-19  7:26           ` Kefeng Wang
  2023-10-19  8:53             ` Uladzislau Rezki
  0 siblings, 1 reply; 14+ messages in thread
From: Kefeng Wang @ 2023-10-19  7:26 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Marco Elver, Andrey Ryabinin, Alexander Potapenko,
	Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino,
	Andrew Morton, Christoph Hellwig, Lorenzo Stoakes, kasan-dev,
	linux-mm



On 2023/10/19 14:17, Uladzislau Rezki wrote:
> On Thu, Oct 19, 2023 at 09:40:10AM +0800, Kefeng Wang wrote:
>>
>>
>> On 2023/10/19 0:37, Marco Elver wrote:
>>> On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev
>>> <kasan-dev@googlegroups.com> wrote:
>>>>
>>>> The issue is easy to reproduced with large vmalloc, kindly ping...
>>>>
>>>> On 2023/9/15 8:58, Kefeng Wang wrote:
>>>>> Hi All, any suggest or comments,many thanks.
>>>>>
>>>>> On 2023/9/6 20:42, Kefeng Wang wrote:
>>>>>> This is a RFC, even patch3 is a hack to fix the softlock issue when
>>>>>> populate or depopulate pte with large region, looking forward to your
>>>>>> reply and advise, thanks.
>>>>>
>>>>> Here is full stack,for populate pte,
>>>>>
>>>>> [    C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
>>>>> [    C3] Modules linked in: test(OE+)
>>>>> [    C3] irq event stamp: 320776
>>>>> [    C3] hardirqs last  enabled at (320775): [<ffff8000815a0c98>]
>>>>> _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>> [    C3] hardirqs last disabled at (320776): [<ffff8000815816e0>]
>>>>> el1_interrupt+0x38/0xa8
>>>>> [    C3] softirqs last  enabled at (318174): [<ffff800080040ba8>]
>>>>> __do_softirq+0x658/0x7ac
>>>>> [    C3] softirqs last disabled at (318169): [<ffff800080047fd8>]
>>>>> ____do_softirq+0x18/0x30
>>>>> [    C3] CPU: 3 PID: 458 Comm: insmod Tainted: G           OE 6.5.0+ #595
>>>>> [    C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>>> [    C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>> [    C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>> [    C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>> [    C3] sp : ffff800093386d70
>>>>> [    C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
>>>>> [    C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
>>>>> [    C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
>>>>> [    C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
>>>>> [    C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
>>>>> [    C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
>>>>> [    C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
>>>>> [    C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
>>>>> [    C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
>>>>> [    C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
>>>>> [    C3] Call trace:
>>>>> [    C3]  _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>> [    C3]  rmqueue_bulk+0x434/0x6b8
>>>>> [    C3]  get_page_from_freelist+0xdd4/0x1680
>>>>> [    C3]  __alloc_pages+0x244/0x508
>>>>> [    C3]  alloc_pages+0xf0/0x218
>>>>> [    C3]  __get_free_pages+0x1c/0x50
>>>>> [    C3]  kasan_populate_vmalloc_pte+0x30/0x188
>>>>> [    C3]  __apply_to_page_range+0x3ec/0x650
>>>>> [    C3]  apply_to_page_range+0x1c/0x30
>>>>> [    C3]  kasan_populate_vmalloc+0x60/0x70
>>>>> [    C3]  alloc_vmap_area.part.67+0x328/0xe50
>>>>> [    C3]  alloc_vmap_area+0x4c/0x78
>>>>> [    C3]  __get_vm_area_node.constprop.76+0x130/0x240
>>>>> [    C3]  __vmalloc_node_range+0x12c/0x340
>>>>> [    C3]  __vmalloc_node+0x8c/0xb0
>>>>> [    C3]  vmalloc+0x2c/0x40
>>>>> [    C3]  show_mem_init+0x1c/0xff8 [test]
>>>>> [    C3]  do_one_initcall+0xe4/0x500
>>>>> [    C3]  do_init_module+0x100/0x358
>>>>> [    C3]  load_module+0x2e64/0x2fc8
>>>>> [    C3]  init_module_from_file+0xec/0x148
>>>>> [    C3]  idempotent_init_module+0x278/0x380
>>>>> [    C3]  __arm64_sys_finit_module+0x88/0xf8
>>>>> [    C3]  invoke_syscall+0x64/0x188
>>>>> [    C3]  el0_svc_common.constprop.1+0xec/0x198
>>>>> [    C3]  do_el0_svc+0x48/0xc8
>>>>> [    C3]  el0_svc+0x3c/0xe8
>>>>> [    C3]  el0t_64_sync_handler+0xa0/0xc8
>>>>> [    C3]  el0t_64_sync+0x188/0x190
>>>>>
>>>>> and for depopuldate pte,
>>>>>
>>>>> [    C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
>>>>> [    C6] Modules linked in: test(OE+)
>>>>> [    C6] irq event stamp: 39458
>>>>> [    C6] hardirqs last  enabled at (39457): [<ffff8000815a0c98>]
>>>>> _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>> [    C6] hardirqs last disabled at (39458): [<ffff8000815816e0>]
>>>>> el1_interrupt+0x38/0xa8
>>>>> [    C6] softirqs last  enabled at (39420): [<ffff800080040ba8>]
>>>>> __do_softirq+0x658/0x7ac
>>>>> [    C6] softirqs last disabled at (39415): [<ffff800080047fd8>]
>>>>> ____do_softirq+0x18/0x30
>>>>> [    C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G           OEL
>>>>> 6.5.0+ #595
>>>>> [    C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>>> [    C6] Workqueue: events drain_vmap_area_work
>>>>> [    C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>> [    C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>> [    C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>> [    C6] sp : ffff80008fe676b0
>>>>> [    C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
>>>>> [    C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
>>>>> [    C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
>>>>> [    C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
>>>>> [    C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
>>>>> [    C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
>>>>> [    C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
>>>>> [    C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
>>>>> [    C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
>>>>> [    C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
>>>>> [    C6] Call trace:
>>>>> [    C6]  _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>> [    C6]  free_pcppages_bulk+0x2bc/0x3e0
>>>>> [    C6]  free_unref_page_commit+0x1fc/0x290
>>>>> [    C6]  free_unref_page+0x184/0x250
>>>>> [    C6]  __free_pages+0x154/0x1a0
>>>>> [    C6]  free_pages+0x88/0xb0
>>>>> [    C6]  kasan_depopulate_vmalloc_pte+0x58/0x80
>>>>> [    C6]  __apply_to_page_range+0x3ec/0x650
>>>>> [    C6]  apply_to_existing_page_range+0x1c/0x30
>>>>> [    C6]  kasan_release_vmalloc+0xa4/0x118
>>>>> [    C6]  __purge_vmap_area_lazy+0x4f4/0xe30
>>>>> [    C6]  drain_vmap_area_work+0x60/0xc0
>>>>> [    C6]  process_one_work+0x4cc/0xa38
>>>>> [    C6]  worker_thread+0x240/0x638
>>>>> [    C6]  kthread+0x1c8/0x1e0
>>>>> [    C6]  ret_from_fork+0x10/0x20
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Kefeng Wang (3):
>>>>>>      mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
>>>>>>      mm: kasan: shadow: move free_page() out of page table lock
>>>>>>      mm: kasan: shadow: HACK add cond_resched_lock() in
>>>>>>        kasan_depopulate_vmalloc_pte()
>>>
>>> The first 2 patches look ok, but yeah, the last is a hack. I also
>>> don't have any better suggestions, only more questions.
>>
>> Thanks Marco, maybe we could convert free_vmap_area_lock from spinlock to
>> mutex lock only if KASAN enabled?
>>
> I do not think it is a good suggestion. Could you please clarify the
> reason of such conversion?

See the call trace of the soft lockup: when the vmalloc buffer is mapped
or unmapped, KASAN has to populate and depopulate the vmalloc shadow
PTEs, which takes much more time than on a kernel without KASAN. For the
unmap side there is already a cond_resched_lock() in
__purge_vmap_area_lazy(), but so much extra time is spent under the
free_vmap_area_lock spinlock, and we cannot add a cond_resched_lock()
inside kasan_depopulate_vmalloc_pte(). If the spinlock were converted to
a mutex, we could add a cond_resched() into the KASAN depopulate path;
that is why I suggested such a conversion when KASAN is enabled. The
conversion may not be correct, though, so is there any better solution?
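
To make the constraint explicit, here is a minimal sketch (hypothetical
lock and function names, not the real vmalloc code): a spinlock holder
runs in atomic context and must not reschedule, while a mutex holder may
sleep, so a cond_resched() in the purge loop only becomes legal after
such a conversion.

<snip>
/* Minimal sketch with hypothetical names, only to illustrate the point. */
#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/sched.h>

static DEFINE_SPINLOCK(demo_spinlock);
static DEFINE_MUTEX(demo_mutex);

static void purge_under_spinlock(void)
{
        spin_lock(&demo_spinlock);
        /* ... long-running per-page work ... */
        /* cond_resched() is not allowed here: spinlock holders must not sleep. */
        spin_unlock(&demo_spinlock);
}

static void purge_under_mutex(void)
{
        mutex_lock(&demo_mutex);
        /* ... long-running per-page work ... */
        cond_resched();         /* allowed: mutex holders run in sleepable context */
        mutex_unlock(&demo_mutex);
}
<snip>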

> 
>>>
>>> Does this only happen on arm64?
>>
>> Our test case run on arm64 qemu(host is x86), so it run much more slower
>> than real board.
>>> Do you have a minimal reproducer you can share?
>> Here is the code in test driver,
>>
>> void *buf = vmalloc(40UL << 30);
>> vfree(buf);
>>
> What is a test driver? Why do you need 42G of memmory, for which purpose?

This is just to accelerate reproduction of the above issue; the root
cause is that too much time is spent in kasan_populate_vmalloc() and
kasan_release_vmalloc().

Thanks.

> 
> --
> Uladzislau Rezki
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte
  2023-10-19  7:26           ` Kefeng Wang
@ 2023-10-19  8:53             ` Uladzislau Rezki
  2023-10-19  9:47               ` Kefeng Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Uladzislau Rezki @ 2023-10-19  8:53 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Uladzislau Rezki, Marco Elver, Andrey Ryabinin,
	Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vincenzo Frascino, Andrew Morton, Christoph Hellwig,
	Lorenzo Stoakes, kasan-dev, linux-mm

On Thu, Oct 19, 2023 at 03:26:48PM +0800, Kefeng Wang wrote:
> 
> 
> On 2023/10/19 14:17, Uladzislau Rezki wrote:
> > On Thu, Oct 19, 2023 at 09:40:10AM +0800, Kefeng Wang wrote:
> > > 
> > > 
> > > On 2023/10/19 0:37, Marco Elver wrote:
> > > > On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev
> > > > <kasan-dev@googlegroups.com> wrote:
> > > > > 
> > > > > The issue is easy to reproduced with large vmalloc, kindly ping...
> > > > > 
> > > > > On 2023/9/15 8:58, Kefeng Wang wrote:
> > > > > > Hi All, any suggest or comments,many thanks.
> > > > > > 
> > > > > > On 2023/9/6 20:42, Kefeng Wang wrote:
> > > > > > > This is a RFC, even patch3 is a hack to fix the softlock issue when
> > > > > > > populate or depopulate pte with large region, looking forward to your
> > > > > > > reply and advise, thanks.
> > > > > > 
> > > > > > Here is full stack,for populate pte,
> > > > > > 
> > > > > > [    C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
> > > > > > [    C3] Modules linked in: test(OE+)
> > > > > > [    C3] irq event stamp: 320776
> > > > > > [    C3] hardirqs last  enabled at (320775): [<ffff8000815a0c98>]
> > > > > > _raw_spin_unlock_irqrestore+0x98/0xb8
> > > > > > [    C3] hardirqs last disabled at (320776): [<ffff8000815816e0>]
> > > > > > el1_interrupt+0x38/0xa8
> > > > > > [    C3] softirqs last  enabled at (318174): [<ffff800080040ba8>]
> > > > > > __do_softirq+0x658/0x7ac
> > > > > > [    C3] softirqs last disabled at (318169): [<ffff800080047fd8>]
> > > > > > ____do_softirq+0x18/0x30
> > > > > > [    C3] CPU: 3 PID: 458 Comm: insmod Tainted: G           OE 6.5.0+ #595
> > > > > > [    C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > > > > > [    C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > > > [    C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
> > > > > > [    C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
> > > > > > [    C3] sp : ffff800093386d70
> > > > > > [    C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
> > > > > > [    C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
> > > > > > [    C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
> > > > > > [    C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
> > > > > > [    C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
> > > > > > [    C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
> > > > > > [    C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
> > > > > > [    C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
> > > > > > [    C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
> > > > > > [    C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
> > > > > > [    C3] Call trace:
> > > > > > [    C3]  _raw_spin_unlock_irqrestore+0x50/0xb8
> > > > > > [    C3]  rmqueue_bulk+0x434/0x6b8
> > > > > > [    C3]  get_page_from_freelist+0xdd4/0x1680
> > > > > > [    C3]  __alloc_pages+0x244/0x508
> > > > > > [    C3]  alloc_pages+0xf0/0x218
> > > > > > [    C3]  __get_free_pages+0x1c/0x50
> > > > > > [    C3]  kasan_populate_vmalloc_pte+0x30/0x188
> > > > > > [    C3]  __apply_to_page_range+0x3ec/0x650
> > > > > > [    C3]  apply_to_page_range+0x1c/0x30
> > > > > > [    C3]  kasan_populate_vmalloc+0x60/0x70
> > > > > > [    C3]  alloc_vmap_area.part.67+0x328/0xe50
> > > > > > [    C3]  alloc_vmap_area+0x4c/0x78
> > > > > > [    C3]  __get_vm_area_node.constprop.76+0x130/0x240
> > > > > > [    C3]  __vmalloc_node_range+0x12c/0x340
> > > > > > [    C3]  __vmalloc_node+0x8c/0xb0
> > > > > > [    C3]  vmalloc+0x2c/0x40
> > > > > > [    C3]  show_mem_init+0x1c/0xff8 [test]
> > > > > > [    C3]  do_one_initcall+0xe4/0x500
> > > > > > [    C3]  do_init_module+0x100/0x358
> > > > > > [    C3]  load_module+0x2e64/0x2fc8
> > > > > > [    C3]  init_module_from_file+0xec/0x148
> > > > > > [    C3]  idempotent_init_module+0x278/0x380
> > > > > > [    C3]  __arm64_sys_finit_module+0x88/0xf8
> > > > > > [    C3]  invoke_syscall+0x64/0x188
> > > > > > [    C3]  el0_svc_common.constprop.1+0xec/0x198
> > > > > > [    C3]  do_el0_svc+0x48/0xc8
> > > > > > [    C3]  el0_svc+0x3c/0xe8
> > > > > > [    C3]  el0t_64_sync_handler+0xa0/0xc8
> > > > > > [    C3]  el0t_64_sync+0x188/0x190
> > > > > > 
This trace is stuck in rmqueue_bulk() because you request a huge
allocation size. It has nothing to do with free_vmap_area_lock; it is
about the bulk allocator, which gets stuck trying to satisfy such a
demand.


> > > > > > and for depopuldate pte,
> > > > > > 
> > > > > > [    C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
> > > > > > [    C6] Modules linked in: test(OE+)
> > > > > > [    C6] irq event stamp: 39458
> > > > > > [    C6] hardirqs last  enabled at (39457): [<ffff8000815a0c98>]
> > > > > > _raw_spin_unlock_irqrestore+0x98/0xb8
> > > > > > [    C6] hardirqs last disabled at (39458): [<ffff8000815816e0>]
> > > > > > el1_interrupt+0x38/0xa8
> > > > > > [    C6] softirqs last  enabled at (39420): [<ffff800080040ba8>]
> > > > > > __do_softirq+0x658/0x7ac
> > > > > > [    C6] softirqs last disabled at (39415): [<ffff800080047fd8>]
> > > > > > ____do_softirq+0x18/0x30
> > > > > > [    C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G           OEL
> > > > > > 6.5.0+ #595
> > > > > > [    C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> > > > > > [    C6] Workqueue: events drain_vmap_area_work
> > > > > > [    C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > > > [    C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
> > > > > > [    C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
> > > > > > [    C6] sp : ffff80008fe676b0
> > > > > > [    C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
> > > > > > [    C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
> > > > > > [    C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
> > > > > > [    C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
> > > > > > [    C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
> > > > > > [    C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
> > > > > > [    C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
> > > > > > [    C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
> > > > > > [    C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
> > > > > > [    C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
> > > > > > [    C6] Call trace:
> > > > > > [    C6]  _raw_spin_unlock_irqrestore+0x50/0xb8
> > > > > > [    C6]  free_pcppages_bulk+0x2bc/0x3e0
> > > > > > [    C6]  free_unref_page_commit+0x1fc/0x290
> > > > > > [    C6]  free_unref_page+0x184/0x250
> > > > > > [    C6]  __free_pages+0x154/0x1a0
> > > > > > [    C6]  free_pages+0x88/0xb0
> > > > > > [    C6]  kasan_depopulate_vmalloc_pte+0x58/0x80
> > > > > > [    C6]  __apply_to_page_range+0x3ec/0x650
> > > > > > [    C6]  apply_to_existing_page_range+0x1c/0x30
> > > > > > [    C6]  kasan_release_vmalloc+0xa4/0x118
> > > > > > [    C6]  __purge_vmap_area_lazy+0x4f4/0xe30
> > > > > > [    C6]  drain_vmap_area_work+0x60/0xc0
> > > > > > [    C6]  process_one_work+0x4cc/0xa38
> > > > > > [    C6]  worker_thread+0x240/0x638
> > > > > > [    C6]  kthread+0x1c8/0x1e0
> > > > > > [    C6]  ret_from_fork+0x10/0x20
> > > > > > 
> 
> See Call Trace of softlock, when map/unmap the vmalloc buf, the kasan will
> populate and depopulate vmalloc pte, those will spend more time than
> no-kasan kernel, for unmap, and there is already a cond_resched_lock() in
> __purge_vmap_area_lazy(), but with more time consumed under
> spinlock(free_vmap_area_lock), and we couldn't add cond_resched_lock in
> kasan_depopulate_vmalloc_pte(), so if spin lock converted to mutex lock, we
> could add a cond_resched into kasan depopulate, this is why make such
> conversion if kasan enabled, but this
> conversion maybe not correct, any better solution?
> 
I have at least the following thoughts:

a) Add a maximum allowed size that a user can request via a vmalloc() call.
   I do not think ~40G is a reasonable request.

b) This can fix the unmap path:

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ef8599d394fd..988735da5c5c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1723,7 +1723,6 @@ static void purge_fragmented_blocks_allcpus(void);
  */
 static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 {
-	unsigned long resched_threshold;
 	unsigned int num_purged_areas = 0;
 	struct list_head local_purge_list;
 	struct vmap_area *va, *n_va;
@@ -1747,36 +1746,32 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 			struct vmap_area, list)->va_end);
 
 	flush_tlb_kernel_range(start, end);
-	resched_threshold = lazy_max_pages() << 1;
 
-	spin_lock(&free_vmap_area_lock);
 	list_for_each_entry_safe(va, n_va, &local_purge_list, list) {
 		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
 		unsigned long orig_start = va->va_start;
 		unsigned long orig_end = va->va_end;
 
+		if (is_vmalloc_or_module_addr((void *)orig_start))
+			kasan_release_vmalloc(orig_start, orig_end,
+					      va->va_start, va->va_end);
+
 		/*
 		 * Finally insert or merge lazily-freed area. It is
 		 * detached and there is no need to "unlink" it from
 		 * anything.
 		 */
+		spin_lock(&free_vmap_area_lock);
 		va = merge_or_add_vmap_area_augment(va, &free_vmap_area_root,
 				&free_vmap_area_list);
+		spin_unlock(&free_vmap_area_lock);
 
 		if (!va)
 			continue;
 
-		if (is_vmalloc_or_module_addr((void *)orig_start))
-			kasan_release_vmalloc(orig_start, orig_end,
-					      va->va_start, va->va_end);
-
 		atomic_long_sub(nr, &vmap_lazy_nr);
 		num_purged_areas++;
-
-		if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
-			cond_resched_lock(&free_vmap_area_lock);
 	}
-	spin_unlock(&free_vmap_area_lock);
 
 out:
 	trace_purge_vmap_area_lazy(start, end, num_purged_areas);
<snip>

c) I have not checked the populate path, but at a high level
kasan_populate_vmalloc() should take a breath between requests.
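
As a rough illustration of (c), the populate side could be driven in
chunks with a reschedule point between them. The helper name and chunk
size below are made up; this is only a sketch of the idea, not a
proposed patch:

<snip>
/* Sketch only: populate the shadow for a large area in chunks. */
static int kasan_populate_vmalloc_chunked(unsigned long addr, unsigned long size)
{
        /* Hypothetical step: one PMD worth of vmalloc address space at a time. */
        const unsigned long step = PMD_SIZE;
        unsigned long end = addr + size;
        int ret;

        while (addr < end) {
                unsigned long next = min(addr + step, end);

                ret = kasan_populate_vmalloc(addr, next - addr);
                if (ret)
                        return ret;

                cond_resched();
                addr = next;
        }

        return 0;
}
<snip>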

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte
  2023-10-19  8:53             ` Uladzislau Rezki
@ 2023-10-19  9:47               ` Kefeng Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Kefeng Wang @ 2023-10-19  9:47 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Marco Elver, Andrey Ryabinin, Alexander Potapenko,
	Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino,
	Andrew Morton, Christoph Hellwig, Lorenzo Stoakes, kasan-dev,
	linux-mm



On 2023/10/19 16:53, Uladzislau Rezki wrote:
> On Thu, Oct 19, 2023 at 03:26:48PM +0800, Kefeng Wang wrote:
>>
>>
>> On 2023/10/19 14:17, Uladzislau Rezki wrote:
>>> On Thu, Oct 19, 2023 at 09:40:10AM +0800, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2023/10/19 0:37, Marco Elver wrote:
>>>>> On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev
>>>>> <kasan-dev@googlegroups.com> wrote:
>>>>>>
>>>>>> The issue is easy to reproduced with large vmalloc, kindly ping...
>>>>>>
>>>>>> On 2023/9/15 8:58, Kefeng Wang wrote:
>>>>>>> Hi All, any suggest or comments,many thanks.
>>>>>>>
>>>>>>> On 2023/9/6 20:42, Kefeng Wang wrote:
>>>>>>>> This is a RFC, even patch3 is a hack to fix the softlock issue when
>>>>>>>> populate or depopulate pte with large region, looking forward to your
>>>>>>>> reply and advise, thanks.
>>>>>>>
>>>>>>> Here is full stack,for populate pte,
>>>>>>>
>>>>>>> [    C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
>>>>>>> [    C3] Modules linked in: test(OE+)
>>>>>>> [    C3] irq event stamp: 320776
>>>>>>> [    C3] hardirqs last  enabled at (320775): [<ffff8000815a0c98>]
>>>>>>> _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>>>> [    C3] hardirqs last disabled at (320776): [<ffff8000815816e0>]
>>>>>>> el1_interrupt+0x38/0xa8
>>>>>>> [    C3] softirqs last  enabled at (318174): [<ffff800080040ba8>]
>>>>>>> __do_softirq+0x658/0x7ac
>>>>>>> [    C3] softirqs last disabled at (318169): [<ffff800080047fd8>]
>>>>>>> ____do_softirq+0x18/0x30
>>>>>>> [    C3] CPU: 3 PID: 458 Comm: insmod Tainted: G           OE 6.5.0+ #595
>>>>>>> [    C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>>>>> [    C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>> [    C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>>>> [    C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>>>> [    C3] sp : ffff800093386d70
>>>>>>> [    C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
>>>>>>> [    C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
>>>>>>> [    C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
>>>>>>> [    C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
>>>>>>> [    C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
>>>>>>> [    C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
>>>>>>> [    C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
>>>>>>> [    C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
>>>>>>> [    C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
>>>>>>> [    C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
>>>>>>> [    C3] Call trace:
>>>>>>> [    C3]  _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>>>> [    C3]  rmqueue_bulk+0x434/0x6b8
>>>>>>> [    C3]  get_page_from_freelist+0xdd4/0x1680
>>>>>>> [    C3]  __alloc_pages+0x244/0x508
>>>>>>> [    C3]  alloc_pages+0xf0/0x218
>>>>>>> [    C3]  __get_free_pages+0x1c/0x50
>>>>>>> [    C3]  kasan_populate_vmalloc_pte+0x30/0x188
>>>>>>> [    C3]  __apply_to_page_range+0x3ec/0x650
>>>>>>> [    C3]  apply_to_page_range+0x1c/0x30
>>>>>>> [    C3]  kasan_populate_vmalloc+0x60/0x70
>>>>>>> [    C3]  alloc_vmap_area.part.67+0x328/0xe50
>>>>>>> [    C3]  alloc_vmap_area+0x4c/0x78
>>>>>>> [    C3]  __get_vm_area_node.constprop.76+0x130/0x240
>>>>>>> [    C3]  __vmalloc_node_range+0x12c/0x340
>>>>>>> [    C3]  __vmalloc_node+0x8c/0xb0
>>>>>>> [    C3]  vmalloc+0x2c/0x40
>>>>>>> [    C3]  show_mem_init+0x1c/0xff8 [test]
>>>>>>> [    C3]  do_one_initcall+0xe4/0x500
>>>>>>> [    C3]  do_init_module+0x100/0x358
>>>>>>> [    C3]  load_module+0x2e64/0x2fc8
>>>>>>> [    C3]  init_module_from_file+0xec/0x148
>>>>>>> [    C3]  idempotent_init_module+0x278/0x380
>>>>>>> [    C3]  __arm64_sys_finit_module+0x88/0xf8
>>>>>>> [    C3]  invoke_syscall+0x64/0x188
>>>>>>> [    C3]  el0_svc_common.constprop.1+0xec/0x198
>>>>>>> [    C3]  do_el0_svc+0x48/0xc8
>>>>>>> [    C3]  el0_svc+0x3c/0xe8
>>>>>>> [    C3]  el0t_64_sync_handler+0xa0/0xc8
>>>>>>> [    C3]  el0t_64_sync+0x188/0x190
>>>>>>>
> This trace is stuck in the rmqueue_bulk() because you request a
> huge alloc size. It has nothing to do with free_vmap_area_lock,
> it is about bulk allocator. It gets stuck to accomplish such
> demand.

Yes, this one is not a spinlock issue; it spends too much time in
kasan_populate_vmalloc() because __apply_to_page_range() walks a very
large range. It can be fixed by adding a cond_resched() in
kasan_populate_vmalloc_pte(), see patch 1.



> 
> 
>>>>>>> and for depopuldate pte,
>>>>>>>
>>>>>>> [    C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
>>>>>>> [    C6] Modules linked in: test(OE+)
>>>>>>> [    C6] irq event stamp: 39458
>>>>>>> [    C6] hardirqs last  enabled at (39457): [<ffff8000815a0c98>]
>>>>>>> _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>>>> [    C6] hardirqs last disabled at (39458): [<ffff8000815816e0>]
>>>>>>> el1_interrupt+0x38/0xa8
>>>>>>> [    C6] softirqs last  enabled at (39420): [<ffff800080040ba8>]
>>>>>>> __do_softirq+0x658/0x7ac
>>>>>>> [    C6] softirqs last disabled at (39415): [<ffff800080047fd8>]
>>>>>>> ____do_softirq+0x18/0x30
>>>>>>> [    C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G           OEL
>>>>>>> 6.5.0+ #595
>>>>>>> [    C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>>>>> [    C6] Workqueue: events drain_vmap_area_work
>>>>>>> [    C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>> [    C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>>>> [    C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
>>>>>>> [    C6] sp : ffff80008fe676b0
>>>>>>> [    C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
>>>>>>> [    C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
>>>>>>> [    C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
>>>>>>> [    C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
>>>>>>> [    C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
>>>>>>> [    C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
>>>>>>> [    C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
>>>>>>> [    C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
>>>>>>> [    C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
>>>>>>> [    C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
>>>>>>> [    C6] Call trace:
>>>>>>> [    C6]  _raw_spin_unlock_irqrestore+0x50/0xb8
>>>>>>> [    C6]  free_pcppages_bulk+0x2bc/0x3e0
>>>>>>> [    C6]  free_unref_page_commit+0x1fc/0x290
>>>>>>> [    C6]  free_unref_page+0x184/0x250
>>>>>>> [    C6]  __free_pages+0x154/0x1a0
>>>>>>> [    C6]  free_pages+0x88/0xb0
>>>>>>> [    C6]  kasan_depopulate_vmalloc_pte+0x58/0x80
>>>>>>> [    C6]  __apply_to_page_range+0x3ec/0x650
>>>>>>> [    C6]  apply_to_existing_page_range+0x1c/0x30
>>>>>>> [    C6]  kasan_release_vmalloc+0xa4/0x118
>>>>>>> [    C6]  __purge_vmap_area_lazy+0x4f4/0xe30
>>>>>>> [    C6]  drain_vmap_area_work+0x60/0xc0
>>>>>>> [    C6]  process_one_work+0x4cc/0xa38
>>>>>>> [    C6]  worker_thread+0x240/0x638
>>>>>>> [    C6]  kthread+0x1c8/0x1e0
>>>>>>> [    C6]  ret_from_fork+0x10/0x20
>>>>>>>
>>
>> See the call traces of the soft lockups: when a vmalloc buffer is mapped
>> or unmapped, KASAN has to populate or depopulate the shadow PTEs, which
>> takes much longer than on a non-KASAN kernel. For the unmap path there is
>> already a cond_resched_lock() in __purge_vmap_area_lazy(), but much more
>> time is now spent under the spinlock (free_vmap_area_lock), and we cannot
>> simply add a cond_resched_lock() inside kasan_depopulate_vmalloc_pte().
>> If the spinlock were converted to a mutex, we could add a cond_resched()
>> into the KASAN depopulate path; that is why the series makes such a
>> conversion when KASAN is enabled. But the conversion may not be correct,
>> so is there a better solution?
>>
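
For reference, the depopulate callback looks roughly like the below
(simplified from mm/kasan/shadow.c around v6.5/v6.6, so details may
differ); the cond_resched_lock() at the end is only what we would like to
be able to do, and it does not even build today because
free_vmap_area_lock is static to mm/vmalloc.c:

<snip>
static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
                                        void *unused)
{
        unsigned long page;

        page = (unsigned long)__va(pte_pfn(ptep_get(ptep)) << PAGE_SHIFT);

        spin_lock(&init_mm.page_table_lock);
        if (likely(!pte_none(ptep_get(ptep)))) {
                pte_clear(&init_mm, addr, ptep);
                free_page(page);
        }
        spin_unlock(&init_mm.page_table_lock);

        /*
         * Hypothetical: reschedule here, dropping free_vmap_area_lock if
         * needed. The lock is not visible from this file, which is
         * exactly the problem described above.
         */
        cond_resched_lock(&free_vmap_area_lock);

        return 0;
}
<snip>
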
> I have at least the following thoughts:
> 
> a) Add a maximum allowed threshold that a user can request via vmalloc().
>    I do not think a ~40G request is reasonable.

I don't know; maybe some driver could map such a large range. We did hit
this issue in QEMU, although with a very low probability.

> 
> b) This can fix unmap path:
> 
> <snip>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ef8599d394fd..988735da5c5c 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1723,7 +1723,6 @@ static void purge_fragmented_blocks_allcpus(void);
>    */
>   static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>   {
> -	unsigned long resched_threshold;
>   	unsigned int num_purged_areas = 0;
>   	struct list_head local_purge_list;
>   	struct vmap_area *va, *n_va;
> @@ -1747,36 +1746,32 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>   			struct vmap_area, list)->va_end);
>   
>   	flush_tlb_kernel_range(start, end);
> -	resched_threshold = lazy_max_pages() << 1;
>   
> -	spin_lock(&free_vmap_area_lock);
>   	list_for_each_entry_safe(va, n_va, &local_purge_list, list) {
>   		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
>   		unsigned long orig_start = va->va_start;
>   		unsigned long orig_end = va->va_end;
>   
> +		if (is_vmalloc_or_module_addr((void *)orig_start))
> +			kasan_release_vmalloc(orig_start, orig_end,
> +					      va->va_start, va->va_end);
> +
>   		/*
>   		 * Finally insert or merge lazily-freed area. It is
>   		 * detached and there is no need to "unlink" it from
>   		 * anything.
>   		 */
> +		spin_lock(&free_vmap_area_lock);
>   		va = merge_or_add_vmap_area_augment(va, &free_vmap_area_root,
>   				&free_vmap_area_list);
> +		spin_unlock(&free_vmap_area_lock);
>   
>   		if (!va)
>   			continue;
>   
> -		if (is_vmalloc_or_module_addr((void *)orig_start))
> -			kasan_release_vmalloc(orig_start, orig_end,
> -					      va->va_start, va->va_end);
> -
>   		atomic_long_sub(nr, &vmap_lazy_nr);
>   		num_purged_areas++;
> -
> -		if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
> -			cond_resched_lock(&free_vmap_area_lock);
>   	}
> -	spin_unlock(&free_vmap_area_lock);
>   
>   out:
>   	trace_purge_vmap_area_lazy(start, end, num_purged_areas);
> <snip>

Thanks for your suggestion, but looking at kasan_release_vmalloc(), it
seems from the comment [1] that it needs to be called with
free_vmap_area_lock held.

Marco and all KASAN maintainers, please help check the approach above.

[1] https://elixir.bootlin.com/linux/v6.6-rc6/source/mm/kasan/shadow.c#L491

> 
> c) The bulk path I have not checked, but at a high level kasan_populate_vmalloc()
> should take a breath between requests.
> 
> --
> Uladzislau Rezki
> 
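
As an illustration of (c), the populate side could take a breath between
chunks along these lines. This is only a rough sketch on top of what
kasan_populate_vmalloc() looks like in current mainline, not a tested
patch, and the 1G shadow chunk size is arbitrary:

<snip>
int kasan_populate_vmalloc(unsigned long addr, unsigned long size)
{
        unsigned long shadow_start, shadow_end, cur, next;
        int ret;

        if (!kasan_arch_is_ready())
                return 0;

        if (!is_vmalloc_or_module_addr((void *)addr))
                return 0;

        shadow_start = (unsigned long)kasan_mem_to_shadow((void *)addr);
        shadow_end = (unsigned long)kasan_mem_to_shadow((void *)addr + size);
        shadow_start = PAGE_ALIGN_DOWN(shadow_start);
        shadow_end = PAGE_ALIGN(shadow_end);

        /*
         * Populate the shadow in chunks instead of one huge
         * apply_to_page_range() call, and reschedule between chunks so
         * a multi-GB request no longer trips the soft lockup watchdog.
         */
        for (cur = shadow_start; cur < shadow_end; cur = next) {
                next = min(cur + SZ_1G, shadow_end);

                ret = apply_to_page_range(&init_mm, cur, next - cur,
                                          kasan_populate_vmalloc_pte, NULL);
                if (ret)
                        return ret;

                cond_resched();
        }

        flush_cache_vmap(shadow_start, shadow_end);

        return 0;
}
<snip>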


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-10-19  9:47 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-06 12:42 [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte Kefeng Wang
2023-09-06 12:42 ` [PATCH -rfc 1/3] mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte() Kefeng Wang
2023-09-06 12:42 ` [PATCH -rfc 2/3] mm: kasan: shadow: move free_page() out of page table lock Kefeng Wang
2023-09-06 12:42 ` [PATCH -rfc 3/3] mm: kasan: shadow: HACK: add cond_resched_lock() in kasan_depopulate_vmalloc_pte() Kefeng Wang
2023-09-13  8:48   ` kernel test robot
2023-09-13 11:21     ` Kefeng Wang
2023-09-15  0:58 ` [PATCH -rfc 0/3] mm: kasan: fix softlock when populate or depopulate pte Kefeng Wang
2023-10-18 14:16   ` Kefeng Wang
2023-10-18 16:37     ` Marco Elver
2023-10-19  1:40       ` Kefeng Wang
2023-10-19  6:17         ` Uladzislau Rezki
2023-10-19  7:26           ` Kefeng Wang
2023-10-19  8:53             ` Uladzislau Rezki
2023-10-19  9:47               ` Kefeng Wang
