hi, Thomas Gleixner, On Fri, Jul 25, 2025 at 05:45:04PM +0200, Thomas Gleixner wrote: > Oliver! > > On Fri, Jul 25 2025 at 17:10, Oliver Sang wrote: > > On Wed, Jul 23, 2025 at 07:22:43PM +0200, Thomas Gleixner wrote: > >> > [ 286.673775][ C97] BUG: sleeping function called from invalid context at kernel/nsproxy.c:233 <---- (1) > >> > [ 286.673784][ C97] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6748, name: oom03 > >> > [ 286.673787][ C97] preempt_count: 7ffffffe, expected: 0 > >> > >> Ooops. That's a corrupted preempt counter, which has underflown twice. > >> > >> Can you please enable CONFIG_DEBUG_PREEMPT, so we can see where this > >> happens? > > > > after enable CONFIG_DEBUG_PREEMPT, the config is as attached > > config-6.16.0-rc5-00002-g56180dd20c19 > > Thank you for trying, but I just realized that it tells us only when it > underflows, but we don't see where the actual extra decrement happens > before that. > > Can you please enable CONFIG_PREEMPT_TRACER and add > > 'trace_event=preemptirq:preempt_disable,preemptirq:preempt_enable ftrace_dump_on_oops' > > to the kernel command line. The latter will dump the recorded > preempt_enable/disable events and we can pinpoint the function which is > responsible for that. the config with CONFIG_PREEMPT_TRACER enabled is attached as config-6.16.0-rc5-00002-g56180dd20c19 by this config, we still see random WARNING:at_kernel/sched/core.c:#preempt_count_sub ========================================================================================= compiler/kconfig/rootfs/tbox_group/test/testcase: gcc-12/x86_64-rhel-9.4-ltp_+CONFIG_PREEMPT_TRACER/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/mm-oom/ltp a255b78d14324f8a 56180dd20c19e5b0fa34822997a ---------------- --------------------------- fail:runs %reproduction fail:runs | | | :20 60% 12:20 dmesg.RIP:preempt_count_sub :20 60% 12:20 dmesg.WARNING:at_kernel/sched/core.c:#preempt_count_sub but bot seems not be able to capture other new dmesg stats. I attached one dmesg as example dmesg-CONFIG_PREEMPT_TRACER.xz since there are other 11 instances, if there should be some pattern for the dump you mentioned, could you tell me? I will search in other instances if nothing in the dmesg I attached. thanks Decompressing Linux... No EFI environment detected. Parsing ELF... No relocation needed... done. Booting the kernel (entry_offset: 0x0000000003103444). [ 0.000000][ T0] Linux version 6.16.0-rc5-00002-g56180dd20c19 (kbuild@788d51ef74a4) (gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Mon Jul 28 14:54:17 CST 2025 [ 0.000000][ T0] Command line: ip=::::lkp-skl-fpga01::dhcp root=/dev/ram0 RESULT_ROOT=/result/ltp/mm-oom/lkp-skl-fpga01/debian-12-x86_64-20240206.cgz/x86_64-rhel-9.4-ltp_+CONFIG_PREEMPT_TRACER/gcc-12/56180dd20c19e5b0fa34822997a9ac66b517e7b3/16 BOOT_IMAGE=/pkg/linux/x86_64-rhel-9.4-ltp_+CONFIG_PREEMPT_TRACER/gcc-12/56180dd20c19e5b0fa34822997a9ac66b517e7b3/vmlinuz-6.16.0-rc5-00002-g56180dd20c19 branch=tip/locking/futex job=/lkp/jobs/scheduled/lkp-skl-fpga01/ltp-mm-oom-debian-12-x86_64-20240206.cgz-56180dd20c19-20250728-4057047-16v2vnz-7.yaml user=lkp ARCH=x86_64 kconfig=x86_64-rhel-9.4-ltp_+CONFIG_PREEMPT_TRACER commit=56180dd20c19e5b0fa34822997a9ac66b517e7b3 trace_event=preemptirq:preempt_disable,preemptirq:preempt_enable ftrace_dump_on_oops .... .... [ 409.832569][ T7668] ------------[ cut here ]------------ [ 409.855676][ T7668] DEBUG_LOCKS_WARN_ON(val > preempt_count()) [ 409.855687][ T7668] WARNING: CPU: 54 PID: 7668 at kernel/sched/core.c:5903 preempt_count_sub+0xdf/0x1b0 [ 409.871916][ T7668] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac skx_edac_common nfit libnvdimm btrfs blake2b_generic xor zstd_compress raid6_pq x86_pkg_temp_thermal intel_powerclamp sd_mod sg coretemp kvm_intel irdma kvm ice irqbypass snd_pcm ghash_clmulni_intel gnss snd_timer ast rapl ib_uverbs snd ahci acpi_power_meter drm_client_lib intel_cstate ipmi_ssif libahci nvme mei_me soundcore drm_shmem_helper i2c_i801 intel_uncore ioatdma ib_core nvme_core pcspkr libata mei drm_kms_helper i2c_smbus lpc_ich intel_pch_thermal wmi dca ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler acpi_pad joydev binfmt_misc drm fuse loop dm_mod ip_tables [ 409.936794][ T7668] CPU: 54 UID: 0 PID: 7668 Comm: oom05 Not tainted 6.16.0-rc5-00002-g56180dd20c19 #1 PREEMPT(voluntary) [ 409.948414][ T7668] RIP: 0010:preempt_count_sub+0xdf/0x1b0 [ 409.954478][ T7668] Code: 11 38 d0 7c 08 84 d2 0f 85 c4 00 00 00 8b 15 08 b6 e2 04 85 d2 75 b6 48 c7 c6 60 1e 2d 84 48 c7 c7 a0 1e 2d 84 e8 a1 f1 f2 ff <0f> 0b eb 9f 4c 8b 65 08 4c 89 e7 e8 11 4f 0e 00 85 c0 74 0a 31 ff [ 409.975094][ T7668] RSP: 0018:ffffc9003733f938 EFLAGS: 00010286 [ 409.981634][ T7668] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 [ 409.990088][ T7668] RDX: 0000000000000002 RSI: 0000000000000004 RDI: 0000000000000001 [ 409.998542][ T7668] RBP: ffffc9003733f948 R08: 0000000000000001 R09: ffffed12f53c5839 [ 410.006983][ T7668] R10: ffff8897a9e2c1cb R11: 0000000000000001 R12: ffff888187f5a1c0 [ 410.015428][ T7668] R13: ffffc9003733fbd0 R14: 00000000003d0f00 R15: 0000000000001df4 [ 410.023855][ T7668] FS: 00007f33a7dbd740(0000) GS:ffff889822a6e000(0000) knlGS:0000000000000000 [ 410.033269][ T7668] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 410.040336][ T7668] CR2: 00007f279d5ffd58 CR3: 000000016ccaa001 CR4: 00000000007726f0 [ 410.048798][ T7668] PKRU: 55555554 [ 410.052841][ T7668] Call Trace: [ 410.056636][ T7668] [ 410.060076][ T7668] _raw_spin_unlock+0x19/0x70 [ 410.065274][ T7668] copy_process+0x4244/0x4ab0 [ 410.070464][ T7668] ? 0xffffffff81000000 [ 410.075139][ T7668] ? __pfx_copy_process+0x10/0x10 [ 410.080681][ T7668] ? _inline_copy_from_user+0x4f/0xb0 [ 410.086572][ T7668] ? copy_clone_args_from_user+0xff/0x670 [ 410.092816][ T7668] ? preempt_count_sub+0x109/0x1b0 [ 410.098458][ T7668] kernel_clone+0xb6/0x7b0 [ 410.103406][ T7668] ? __pfx_kernel_clone+0x10/0x10 [ 410.108959][ T7668] ? _copy_to_user+0x5c/0x70 [ 410.114081][ T7668] ? __x64_sys_rt_sigprocmask+0x183/0x230 [ 410.120346][ T7668] __do_sys_clone3+0x150/0x1b0 [ 410.125648][ T7668] ? __pfx___do_sys_clone3+0x10/0x10 [ 410.131479][ T7668] ? trace_event_buffer_commit+0x191/0x470 [ 410.137824][ T7668] ? trace_event_buffer_reserve+0x200/0x330 [ 410.144248][ T7668] ? trace_event_buffer_commit+0x191/0x470 [ 410.150566][ T7668] ? trace_event_raw_event_preemptirq_template+0x12c/0x1b0 [ 410.158279][ T7668] ? __pfx_trace_event_raw_event_preemptirq_template+0x10/0x10 [ 410.166337][ T7668] ? trace_event_raw_event_preemptirq_template+0x12c/0x1b0 [ 410.174029][ T7668] do_syscall_64+0x7f/0x2f0 [ 410.179022][ T7668] ? count_memcg_events+0x2a7/0x470 [ 410.184693][ T7668] ? count_memcg_events+0x2a7/0x470 [ 410.190351][ T7668] ? trace_preempt_on+0x8c/0xb0 [ 410.195655][ T7668] ? count_memcg_events+0x2a7/0x470 [ 410.201305][ T7668] ? preempt_count_sub+0x109/0x1b0 [ 410.206857][ T7668] ? count_memcg_events+0x2a7/0x470 [ 410.212493][ T7668] ? handle_mm_fault+0x3ff/0x6f0 [ 410.217843][ T7668] ? do_user_addr_fault+0x91f/0xef0 [ 410.223455][ T7668] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 410.229747][ T7668] RIP: 0033:0x7f33a7ec9889 [ 410.234550][ T7668] Code: 31 ed e9 44 ff ff ff e8 25 e9 00 00 0f 1f 44 00 00 b8 ea ff ff ff 48 85 ff 74 2c 48 85 d2 74 27 49 89 c8 b8 b3 01 00 00 0f 05 <48> 85 c0 7c 18 74 01 c3 31 ed 48 83 e4 f0 4c 89 c7 ff d2 48 89 c7 [ 410.255034][ T7668] RSP: 002b:00007ffc670703f8 EFLAGS: 00000206 ORIG_RAX: 00000000000001b3 [ 410.263824][ T7668] RAX: ffffffffffffffda RBX: 00007f33a7e48ef0 RCX: 00007f33a7ec9889 [ 410.272185][ T7668] RDX: 00007f33a7e48ef0 RSI: 0000000000000058 RDI: 00007ffc67070440 [ 410.280503][ T7668] RBP: 00007f279d5ff6c0 R08: 00007f279d5ff6c0 R09: 00007ffc67070537 [ 410.288823][ T7668] R10: 0000000000000008 R11: 0000000000000206 R12: ffffffffffffff78 [ 410.297149][ T7668] R13: 0000000000000000 R14: 00007ffc67070440 R15: 00007f279cdff000 [ 410.305468][ T7668] [ 410.308836][ T7668] ---[ end trace 0000000000000000 ]--- [ 410.337739][ T1354] oom.h:181: TPASS: victim signalled: (9) SIGKILL [ 410.337753][ T1354] > > Thanks, > > tglx > > >