* [linus:master] [alloc_tag] 93d5440ece: WARNING:at_include/linux/alloc_tag.h:#__pgalloc_tag_sub
@ 2025-04-15 8:35 kernel test robot
2025-05-03 11:51 ` David Wang
From: kernel test robot @ 2025-04-15 8:35 UTC
To: Suren Baghdasaryan
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Shakeel Butt,
David Wang, Kent Overstreet, Minchan Kim, Pasha Tatashin,
Peter Zijlstra, Sourav Panda, Steven Rostedt, Vlastimil Babka,
Yu Zhao, Zhenhua Huang, linux-mm, oliver.sang
Hello,
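It seems this commit just changes the stat of one intermittent WARN: before it, the warning was reported in pgalloc_tag_sub; with it, the warning is reported in __pgalloc_tag_sub.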
a642b27b991fd663 93d5440ece3c0aa341fb02e3a44
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
            :24          12%          3:24  dmesg.WARNING:at_include/linux/alloc_tag.h:#__pgalloc_tag_sub
           5:24         -21%           :24  dmesg.WARNING:at_include/linux/alloc_tag.h:#pgalloc_tag_sub
The report below is just FYI, showing what we observed in our tests.
kernel test robot noticed "WARNING:at_include/linux/alloc_tag.h:#__pgalloc_tag_sub" on:
commit: 93d5440ece3c0aa341fb02e3a44a1b7ab44304c8 ("alloc_tag: uninline code gated by mem_alloc_profiling_key in page allocator")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master 900241a5cc15e6e0709a012051cc72d224cd6a6e]
[test failed on linux-next/master 01c6df60d5d4ae00cd5c1648818744838bba7763]
in testcase: trinity
version: trinity-x86_64-ba2360ed-1_20241228
with the following parameters:
runtime: 300s
group: group-00
nr_groups: 5
config: x86_64-randconfig-161-20250410
compiler: gcc-11
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to the attached dmesg/kmsg for the entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202504151659.9b09c785-lkp@intel.com
[ 147.337988][ T2016] ------------[ cut here ]------------
[ 147.338915][ T2016] alloc_tag was not set
[ 147.339502][ T2016] WARNING: CPU: 0 PID: 2016 at include/linux/alloc_tag.h:152 __pgalloc_tag_sub (include/linux/alloc_tag.h:152 include/linux/alloc_tag.h:195 mm/page_alloc.c:1089)
[ 147.341127][ T2016] Modules linked in:
[ 147.341672][ T2016] CPU: 0 UID: 0 PID: 2016 Comm: grep Tainted: G T 6.14.0-rc6-00062-g93d5440ece3c #1 c08622b3723459177a60d595773689e527750d0d
[ 147.343295][ T2016] Tainted: [T]=RANDSTRUCT
[ 147.343867][ T2016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 147.345236][ T2016] RIP: 0010:__pgalloc_tag_sub (include/linux/alloc_tag.h:152 include/linux/alloc_tag.h:195 mm/page_alloc.c:1089)
[ 147.345982][ T2016] Code: 41 5e 41 5f 5d c3 49 c7 47 e0 00 00 00 00 80 3d 25 ef 66 0a 00 75 a9 48 c7 c7 e0 91 39 8e c6 05 15 ef 66 0a 01 e8 3f bb ad ff <0f> 0b eb 92 48 c7 c0 20 2b e6 90 48 ba 00 00 00 00 00 fc ff df 48
All code
========
0: 41 5e pop %r14
2: 41 5f pop %r15
4: 5d pop %rbp
5: c3 ret
6: 49 c7 47 e0 00 00 00 movq $0x0,-0x20(%r15)
d: 00
e: 80 3d 25 ef 66 0a 00 cmpb $0x0,0xa66ef25(%rip) # 0xa66ef3a
15: 75 a9 jne 0xffffffffffffffc0
17: 48 c7 c7 e0 91 39 8e mov $0xffffffff8e3991e0,%rdi
1e: c6 05 15 ef 66 0a 01 movb $0x1,0xa66ef15(%rip) # 0xa66ef3a
25: e8 3f bb ad ff call 0xffffffffffadbb69
2a:* 0f 0b ud2 <-- trapping instruction
2c: eb 92 jmp 0xffffffffffffffc0
2e: 48 c7 c0 20 2b e6 90 mov $0xffffffff90e62b20,%rax
35: 48 ba 00 00 00 00 00 movabs $0xdffffc0000000000,%rdx
3c: fc ff df
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: eb 92 jmp 0xffffffffffffff96
4: 48 c7 c0 20 2b e6 90 mov $0xffffffff90e62b20,%rax
b: 48 ba 00 00 00 00 00 movabs $0xdffffc0000000000,%rdx
12: fc ff df
15: 48 rex.W
[ 147.348298][ T2016] RSP: 0018:ffffc90001def730 EFLAGS: 00010282
[ 147.349063][ T2016] RAX: dffffc0000000000 RBX: 1ffff920003bdee7 RCX: 0000000000000001
[ 147.350021][ T2016] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffffff9059d5a8
[ 147.354294][ T2016] RBP: ffffc90001def7a0 R08: ffffffff87fd99d0 R09: fffffbfff20b3ab5
[ 147.355329][ T2016] R10: ffffffff9059d5ab R11: 0000000000000001 R12: ffff888106402c58
[ 147.356345][ T2016] R13: 0000000000000000 R14: 0000000000000001 R15: ffffc90001def778
[ 147.357317][ T2016] FS: 0000000000000000(0000) GS:ffffffff904be000(0000) knlGS:0000000000000000
[ 147.358402][ T2016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 147.359199][ T2016] CR2: 00007fedda922200 CR3: 0000000155129000 CR4: 00000000000406f0
[ 147.360209][ T2016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 147.361208][ T2016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 147.362220][ T2016] Call Trace:
[ 147.362713][ T2016] <TASK>
[ 147.363178][ T2016] ? show_regs (arch/x86/kernel/dumpstack.c:479)
[ 147.363763][ T2016] ? __warn (kernel/panic.c:748)
[ 147.364366][ T2016] ? __wake_up_klogd (arch/x86/include/asm/preempt.h:94 kernel/printk/printk.c:4525)
[ 147.365015][ T2016] ? __pgalloc_tag_sub (include/linux/alloc_tag.h:152 include/linux/alloc_tag.h:195 mm/page_alloc.c:1089)
[ 147.365689][ T2016] ? report_bug (lib/bug.c:201 lib/bug.c:219)
[ 147.366349][ T2016] ? page_ext_get (include/linux/rcupdate.h:337 include/linux/rcupdate.h:849 mm/page_ext.c:525)
[ 147.366988][ T2016] ? handle_bug (arch/x86/kernel/traps.c:285)
[ 147.367583][ T2016] ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
[ 147.368249][ T2016] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:574)
[ 147.368921][ T2016] ? irq_work_queue (arch/x86/include/asm/atomic.h:23 arch/x86/include/asm/atomic.h:145 include/linux/atomic/atomic-arch-fallback.h:1690 include/linux/atomic/atomic-instrumented.h:954 kernel/irq_work.c:61 kernel/irq_work.c:119)
[ 147.369562][ T2016] ? __pgalloc_tag_sub (include/linux/alloc_tag.h:152 include/linux/alloc_tag.h:195 mm/page_alloc.c:1089)
[ 147.370244][ T2016] ? __alloc_contig_migrate_range (mm/page_alloc.c:1084)
[ 147.371016][ T2016] free_frozen_pages (mm/page_alloc.c:1211 mm/page_alloc.c:2738)
[ 147.371700][ T2016] __free_slab (mm/slub.c:2669)
[ 147.372331][ T2016] free_slab (mm/slub.c:2692)
[ 147.372900][ T2016] free_to_partial_list (mm/slub.c:4414)
[ 147.373569][ T2016] ? qlist_free_all (mm/kasan/quarantine.c:163 mm/kasan/quarantine.c:179)
[ 147.374213][ T2016] __slab_free (mm/slub.c:4534)
[ 147.374819][ T2016] ? __kasan_check_read (mm/kasan/shadow.c:32)
[ 147.375470][ T2016] ? mark_lock (arch/x86/include/asm/bitops.h:227 (discriminator 3) arch/x86/include/asm/bitops.h:239 (discriminator 3) include/asm-generic/bitops/instrumented-non-atomic.h:142 (discriminator 3) kernel/locking/lockdep.c:230 (discriminator 3) kernel/locking/lockdep.c:4729 (discriminator 3))
[ 147.376062][ T2016] ? mark_held_locks (kernel/locking/lockdep.c:4323)
[ 147.376701][ T2016] ___cache_free (mm/slub.c:4681)
[ 147.377248][ T2016] qlist_free_all (mm/kasan/quarantine.c:174)
[ 147.377795][ T2016] kasan_quarantine_reduce (include/linux/srcu.h:357 mm/kasan/quarantine.c:287)
[ 147.378470][ T2016] __kasan_slab_alloc (mm/kasan/common.c:329)
[ 147.379088][ T2016] kmem_cache_alloc_noprof (include/linux/kasan.h:250 mm/slub.c:4128 mm/slub.c:4177 mm/slub.c:4184)
[ 147.379784][ T2016] getname_flags (include/linux/sched.h:2248 fs/namei.c:139)
[ 147.380436][ T2016] getname (fs/namei.c:224)
[ 147.380969][ T2016] do_sys_openat2 (fs/open.c:1422)
[ 147.381587][ T2016] ? build_open_flags (fs/open.c:1414)
[ 147.382258][ T2016] __x64_sys_openat (fs/open.c:1454)
[ 147.382899][ T2016] ? __ia32_compat_sys_open (fs/open.c:1454)
[ 147.383611][ T2016] x64_sys_call (arch/x86/entry/syscall_64.c:36)
[ 147.384281][ T2016] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
[ 147.384977][ T2016] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 147.385724][ T2016] RIP: 0033:0x7fedda9e895d
[ 147.386333][ T2016] Code: 48 89 54 24 e0 41 83 e2 40 75 32 89 f0 25 00 00 41 00 3d 00 00 41 00 74 24 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 33 c3 66 2e 0f 1f 84 00 00 00 00 00 48 8d 44
All code
========
0: 48 89 54 24 e0 mov %rdx,-0x20(%rsp)
5: 41 83 e2 40 and $0x40,%r10d
9: 75 32 jne 0x3d
b: 89 f0 mov %esi,%eax
d: 25 00 00 41 00 and $0x410000,%eax
12: 3d 00 00 41 00 cmp $0x410000,%eax
17: 74 24 je 0x3d
19: 89 f2 mov %esi,%edx
1b: b8 01 01 00 00 mov $0x101,%eax
20: 48 89 fe mov %rdi,%rsi
23: bf 9c ff ff ff mov $0xffffff9c,%edi
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 33 ja 0x65
32: c3 ret
33: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
3a: 00 00 00
3d: 48 rex.W
3e: 8d .byte 0x8d
3f: 44 rex.R
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 33 ja 0x3b
8: c3 ret
9: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
10: 00 00 00
13: 48 rex.W
14: 8d .byte 0x8d
15: 44 rex.R
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250415/202504151659.9b09c785-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re:[linus:master] [alloc_tag] 93d5440ece: WARNING:at_include/linux/alloc_tag.h:#__pgalloc_tag_sub
  2025-04-15 8:35 [linus:master] [alloc_tag] 93d5440ece: WARNING:at_include/linux/alloc_tag.h:#__pgalloc_tag_sub kernel test robot
@ 2025-05-03 11:51 ` David Wang
  2025-05-05 16:57   ` [linus:master] " Suren Baghdasaryan
From: David Wang @ 2025-05-03 11:51 UTC
To: kernel test robot, Suren Baghdasaryan
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Shakeel Butt,
Kent Overstreet, Minchan Kim, Pasha Tatashin, Peter Zijlstra,
Sourav Panda, Steven Rostedt, Vlastimil Babka, Yu Zhao,
Zhenhua Huang, linux-mm

Hi,

I have been running my system with CONFIG_MEM_ALLOC_PROFILING_DEBUG=y for a long while, trying to
reproduce this. Although I have not yet hit the exact call trace, I got the same warning via a "simpler"
trace:

[Fri May 2 15:07:00 2025] alloc_tag was not set
[Fri May 2 15:07:00 2025] WARNING: CPU: 0 PID: 677 at
./include/linux/alloc_tag.h:156
./include/linux/alloc_tag.h:154
./include/linux/pgalloc_tag.h:182
mm/page_alloc.c:1163
___free_pages mm/page_alloc.c:5072 <---- code[1] below
drivers/net/wireless/intel/iwlwifi/pcie/rx.c:1417 <---- code[2] below
iwl_pcie_rx_handle drivers/net/wireless/intel/iwlwifi/pcie/rx.c:1568 iwlwifi

<other traces are irrelevant...I think>

code[1]:
5063 static void ___free_pages(struct page *page, unsigned int order,
5064                            fpi_t fpi_flags)
5065 {
5066         /* get PageHead before we drop reference */
5067         int head = PageHead(page);
5068
5069         if (put_page_testzero(page))
5070                 __free_frozen_pages(page, order, fpi_flags);
5071         else if (!head) {
5072                 pgalloc_tag_sub_pages(page, (1 << order) - 1);   <---- at this point, page[0] may already have been returned.
5073                 while (order-- > 0)
5074                         __free_frozen_pages(page + (1 << order), order,
5075                                             fpi_flags);
5076         }
5077 }

code[2]:
1415         /* page was stolen from us -- free our reference */
1416         if (page_stolen) {
1417                 __free_pages(rxb->page, trans_pcie->rx_page_order);
1418                 rxb->page = NULL;
1419         }

From the code above, my guess is:

Thread1                                   Thread2
iwlwifi allocates a page of order "0"
                                          some callback uses get_page to "steal" the memory.
iwlwifi notices the page is "stolen",
and calls __free_pages to release its
reference to the page, and
if (put_page_testzero(page)) fails
                                          calls put_page to release its reference; the page's
                                          reference count drops to 0 now, and the page is released
pgalloc_tag_sub_pages is called to
sub "0" pages, but the page is already
gone, hence the warning.

It is potentially dangerous to call pgalloc_tag_sub_pages on a released page.
I kind of feel the pgalloc_tag_sub_pages call here should be removed:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5669baf2a6fe..63c160537045 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5069,7 +5069,6 @@ static void ___free_pages(struct page *page, unsigned int order,
 	if (put_page_testzero(page))
 		__free_frozen_pages(page, order, fpi_flags);
 	else if (!head) {
-		pgalloc_tag_sub_pages(page, (1 << order) - 1);
 		while (order-- > 0)
 			__free_frozen_pages(page + (1 << order), order,
 					    fpi_flags);

And about the warning in the original report by "kernel test robot": it is not the same.
I think there are places where high-order pages are released as several low-order pages, and my understanding
is that only the first page has the tag, but I am not quite sure; correct me if I am wrong...
(The whole MM is quite complicated to me.)
And I believe that when a lower-order page is released without a tag, a debug warning should follow.
The warning is kind of "benign" under those conditions, though.

Thanks
David
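The interleaving David describes can be replayed deterministically in userspace. The sketch below is a hypothetical model, not kernel code: model_page, model_tag, replay() and the other helpers are invented for illustration, and the model assumes, as David's analysis implies, that the final free clears the page's tag reference. It compares reading the tag from the page after put_page_testzero() (the order used in code[1]) with saving the tag before that call, which is the approach Suren suggests in his reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not kernel code: model_page and model_tag
 * stand in for struct page and struct alloc_tag, and the steps of the two
 * threads are replayed one after another in the order of the race diagram.
 */
struct model_tag { long pages; };                 /* pages charged to the tag */
struct model_page { int refcount; struct model_tag *tag; };

static int warnings;                              /* "alloc_tag was not set" count */

/* Like put_page_testzero(): drop one reference, true if it was the last one. */
bool model_put_testzero(struct model_page *p)
{
	return --p->refcount == 0;
}

/* Whoever drops the last reference frees the page: uncharge, clear the tag. */
void model_free_page(struct model_page *p)
{
	if (p->tag)
		p->tag->pages -= 1;
	p->tag = NULL;
}

/* Subtract 'nr' pages from a tag; a missing tag counts as a warning. */
void model_tag_sub(struct model_tag *tag, long nr)
{
	if (!tag) {
		warnings++;
		return;
	}
	tag->pages -= nr;
}

struct replay_result { int warnings; long tag_pages; };

/*
 * Replay the race for an order-0, non-compound page.
 * snapshot_first == false: Thread1 reads the tag from the page after its put.
 * snapshot_first == true:  Thread1 saves the tag before put_page_testzero().
 */
struct replay_result replay(bool snapshot_first)
{
	struct model_tag tag = { .pages = 1 };            /* charged at allocation */
	struct model_page page = { .refcount = 1, .tag = &tag };
	struct model_tag *saved = NULL;
	unsigned int order = 0;

	warnings = 0;
	page.refcount++;                                  /* Thread2: get_page() */
	if (snapshot_first)
		saved = page.tag;                         /* Thread1: save tag up front */

	if (!model_put_testzero(&page)) {                 /* Thread1: 2 -> 1 */
		if (model_put_testzero(&page))            /* Thread2: put_page(), 1 -> 0 */
			model_free_page(&page);
		/* Thread1 continues in the "else if (!head)" branch. */
		model_tag_sub(snapshot_first ? saved : page.tag,
			      (1L << order) - 1);
	}

	struct replay_result r = { warnings, tag.pages };
	return r;
}
```

Called from a small main(), replay(false) reports one warning because the tag is looked up on a page that has already been freed, while replay(true) reports none and the tag's page count still ends at zero, since the last put_page() uncharged it. The model only covers the case where the freed page's tag is cleared; it does not model the page being reallocated to someone else.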
* Re: [linus:master] [alloc_tag] 93d5440ece: WARNING:at_include/linux/alloc_tag.h:#__pgalloc_tag_sub
  2025-05-03 11:51 ` David Wang
@ 2025-05-05 16:57   ` Suren Baghdasaryan
From: Suren Baghdasaryan @ 2025-05-05 16:57 UTC
To: David Wang
Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Andrew Morton,
Shakeel Butt, Kent Overstreet, Minchan Kim, Pasha Tatashin,
Peter Zijlstra, Sourav Panda, Steven Rostedt, Vlastimil Babka,
Yu Zhao, Zhenhua Huang, linux-mm

On Sat, May 3, 2025 at 4:51 AM David Wang <00107082@163.com> wrote:
>
> Hi,
>
> I have been running my system with CONFIG_MEM_ALLOC_PROFILING_DEBUG=y for a long while, trying to
> reproduce this. Although I have not yet hit the exact call trace, I got the same warning via a "simpler"
> trace:
>
> [Fri May 2 15:07:00 2025] alloc_tag was not set
> [Fri May 2 15:07:00 2025] WARNING: CPU: 0 PID: 677 at
> ./include/linux/alloc_tag.h:156
> ./include/linux/alloc_tag.h:154
> ./include/linux/pgalloc_tag.h:182
> mm/page_alloc.c:1163
> ___free_pages mm/page_alloc.c:5072 <---- code[1] below
> drivers/net/wireless/intel/iwlwifi/pcie/rx.c:1417 <---- code[2] below
> iwl_pcie_rx_handle drivers/net/wireless/intel/iwlwifi/pcie/rx.c:1568 iwlwifi
>
> <other traces are irrelevant...I think>
>
> code[1]:
> 5063 static void ___free_pages(struct page *page, unsigned int order,
> 5064                            fpi_t fpi_flags)
> 5065 {
> 5066         /* get PageHead before we drop reference */
> 5067         int head = PageHead(page);
> 5068
> 5069         if (put_page_testzero(page))
> 5070                 __free_frozen_pages(page, order, fpi_flags);
> 5071         else if (!head) {
> 5072                 pgalloc_tag_sub_pages(page, (1 << order) - 1);   <---- at this point, page[0] may already have been returned.
> 5073                 while (order-- > 0)
> 5074                         __free_frozen_pages(page + (1 << order), order,
> 5075                                             fpi_flags);
> 5076         }
> 5077 }
>
> code[2]:
> 1415         /* page was stolen from us -- free our reference */
> 1416         if (page_stolen) {
> 1417                 __free_pages(rxb->page, trans_pcie->rx_page_order);
> 1418                 rxb->page = NULL;
> 1419         }
>
> From the code above, my guess is:
>
> Thread1                                   Thread2
> iwlwifi allocates a page of order "0"
>                                           some callback uses get_page to "steal" the memory.
> iwlwifi notices the page is "stolen",
> and calls __free_pages to release its
> reference to the page, and
> if (put_page_testzero(page)) fails
>                                           calls put_page to release its reference; the page's
>                                           reference count drops to 0 now, and the page is released
> pgalloc_tag_sub_pages is called to
> sub "0" pages, but the page is already
> gone, hence the warning.
>
> It is potentially dangerous to call pgalloc_tag_sub_pages on a released page.
> I kind of feel the pgalloc_tag_sub_pages call here should be removed:
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5669baf2a6fe..63c160537045 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5069,7 +5069,6 @@ static void ___free_pages(struct page *page, unsigned int order,
>  	if (put_page_testzero(page))
>  		__free_frozen_pages(page, order, fpi_flags);
>  	else if (!head) {
> -		pgalloc_tag_sub_pages(page, (1 << order) - 1);
>  		while (order-- > 0)
>  			__free_frozen_pages(page + (1 << order), order,
>  					    fpi_flags);
>

Hi David,
As discussed on the patch you posted, let's store the tag before
put_page_testzero() and operate on that tag without using the page
that might be freed from under us.

>
> And about the warning in the original report by "kernel test robot": it is not the same.
> I think there are places where high-order pages are released as several low-order pages, and my understanding
> is that only the first page has the tag, but I am not quite sure; correct me if I am wrong...

That might happen, but the high-order page would be split before its
parts are freed. During the split, pgalloc_tag_split() will make each
split page point to the original tag, so each resulting page should
have a valid reference to a tag.

> (The whole MM is quite complicated to me.)
> And I believe that when a lower-order page is released without a tag, a debug warning should follow.
> The warning is kind of "benign" under those conditions, though.
>
> Thanks
> David
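Suren's point about splitting can be illustrated with a second hypothetical model. Nothing here is kernel API: model_alloc(), model_split() and model_free_pieces() are invented names, and only the per-page tag reference is modelled, not the tag's counters. Following the thread, allocation sets the tag on the first page only, and the split copies that tag to the first page of every resulting piece, so freeing the pieces afterwards finds a tag on each one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not kernel code. Each element of 'tags'
 * plays the role of one page's tag reference; the tag's counters are
 * deliberately not modelled.
 */
#define MODEL_MAX_ORDER 6
#define MODEL_MAX_PAGES (1u << MODEL_MAX_ORDER)

struct model_tag { const char *site; };           /* stands in for an alloc tag */

struct model_block {
	unsigned int order;                       /* block covers (1 << order) pages */
	struct model_tag *tags[MODEL_MAX_PAGES];
};

/* Allocation: as described in the thread, only the first page gets the tag. */
void model_alloc(struct model_block *b, unsigned int order, struct model_tag *tag)
{
	assert(order <= MODEL_MAX_ORDER);
	b->order = order;
	for (size_t i = 0; i < ((size_t)1 << order); i++)
		b->tags[i] = NULL;
	b->tags[0] = tag;
}

/* Split into pieces of 'new_order': each piece's first page gets the original tag. */
void model_split(struct model_block *b, unsigned int new_order)
{
	struct model_tag *tag = b->tags[0];
	size_t stride = (size_t)1 << new_order;

	assert(new_order <= b->order);
	for (size_t i = 0; i < ((size_t)1 << b->order); i += stride)
		b->tags[i] = tag;
}

/*
 * Free the block as independent pieces of 'piece_order' and return how many
 * pieces had no tag, i.e. how many "alloc_tag was not set" warnings a debug
 * check would report in this model.
 */
int model_free_pieces(struct model_block *b, unsigned int piece_order)
{
	int missing = 0;
	size_t stride = (size_t)1 << piece_order;

	assert(piece_order <= b->order);
	for (size_t i = 0; i < ((size_t)1 << b->order); i += stride) {
		if (!b->tags[i])
			missing++;
		b->tags[i] = NULL;                /* freeing clears the reference */
	}
	return missing;
}
```

For an order-2 block freed as four order-0 pages without a split, model_free_pieces() reports three untagged pages, which is the situation David worried about; after model_split(&b, 0) it reports none, which matches Suren's description of the split path.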