* Re: [syzbot] [kernel?] WARNING in flush_cpu_slab
From: Thomas Gleixner @ 2024-05-23 22:32 UTC
To: Vlastimil Babka, syzbot, bp, dave.hansen, hpa, linux-kernel,
mingo, syzkaller-bugs, x86, linux-mm, Sebastian Andrzej Siewior,
Tejun Heo, Lai Jiangshan
On Thu, May 23 2024 at 23:03, Vlastimil Babka wrote:
> On 5/23/24 12:36 PM, Thomas Gleixner wrote:
>>> ------------[ cut here ]------------
>>> DEBUG_LOCKS_WARN_ON(l->owner)
>>> WARNING: CPU: 3 PID: 5221 at include/linux/local_lock_internal.h:30 local_lock_acquire include/linux/local_lock_internal.h:30 [inline]
>>> WARNING: CPU: 3 PID: 5221 at include/linux/local_lock_internal.h:30 flush_slab mm/slub.c:3088 [inline]
>>> WARNING: CPU: 3 PID: 5221 at include/linux/local_lock_internal.h:30 flush_cpu_slab+0x37f/0x410 mm/slub.c:3146
>
> I'm puzzled by this. We use local_lock_irqsave() on !PREEMPT_RT everywhere.
> IIUC this warning says we did the irqsave() and then found out somebody else
> already set the owner? But that means they also did that irqsave() and set
> themselves as l->owner. Does that mean there would be a spurious irq enable
> that didn't go through local_unlock_irqrestore()?
>
> Also this particular stack is from the work, which is scheduled by
> queue_work_on() in flush_all_cpus_locked(), which also has a
> lockdep_assert_cpus_held() so it should fulfill the "the caller must ensure
> the cpu doesn't go away" property. But I think even if this ended up on the
> wrong cpu (for the full duration or migrated while processing the work item)
> somehow, it wouldn't be able to cause such warning, but rather corrupt
> something else
Indeed. There is another report which makes no sense either:
https://lore.kernel.org/lkml/000000000000fa09d906191c3ee5@google.com
Both look like data corruption issues caused by whatever...
Thanks,
tglx
* Re: [syzbot] [kernel?] WARNING in flush_cpu_slab
From: Sebastian Andrzej Siewior @ 2024-05-24 6:43 UTC
To: Vlastimil Babka
Cc: Thomas Gleixner, syzbot, bp, dave.hansen, hpa, linux-kernel,
mingo, syzkaller-bugs, x86, linux-mm, Tejun Heo, Lai Jiangshan
On 2024-05-23 23:03:52 [+0200], Vlastimil Babka wrote:
> I'm puzzled by this. We use local_lock_irqsave() on !PREEMPT_RT everywhere.
> IIUC this warning says we did the irqsave() and then found out somebody else
> already set the owner? But that means they also did that irqsave() and set
> themselves as l->owner. Does that mean there would be a spurious irq enable
> that didn't go through local_unlock_irqrestore()?
Correct.
>
> Also this particular stack is from the work, which is scheduled by
> queue_work_on() in flush_all_cpus_locked(), which also has a
> lockdep_assert_cpus_held() so it should fulfill the "the caller must ensure
> the cpu doesn't go away" property. But I think even if this ended up on the
> wrong cpu (for the full duration or migrated while processing the work item)
> somehow, it wouldn't be able to cause such warning, but rather corrupt
> something else
Based on
> >> CPU: 3 PID: 5221 Comm: kworker/3:3 Not tainted 6.9.0-syzkaller-10713-g2a8120d7b482 #0
the code was invoked on CPU3 and the kworker was made for CPU3. This is
all fine. All accesses to the lock in question are within a few lines, so
there is no unbalanced lock/unlock or IRQ-unlock which could explain it.
Sebastian
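
For readers following the argument: the check that fired can be modelled in
userspace. This is a minimal sketch, assuming the debug owner tracking works
as described in the thread (acquire warns if an owner is already recorded,
release clears it). All names prefixed with model_ are hypothetical and are
not kernel APIs. It shows that balanced pairs never warn, while a stray write
to the owner field makes the next acquire warn even though locking is balanced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct task { int pid; };

/* Simplified model of a local lock with debug owner tracking. */
struct model_local_lock {
    struct task *owner;   /* NULL while unlocked */
};

/* Counts how often the modelled DEBUG_LOCKS_WARN_ON() would have fired. */
static int model_warnings;

static void model_warn_on(bool cond)
{
    if (cond)
        model_warnings++;
}

/* Models the acquire side: the lock must have no owner on entry. */
static void model_lock_acquire(struct model_local_lock *l, struct task *cur)
{
    assert(l != NULL && cur != NULL);
    model_warn_on(l->owner != NULL);
    l->owner = cur;
}

/* Models the release side: only the recorded owner may release. */
static void model_lock_release(struct model_local_lock *l, struct task *cur)
{
    assert(l != NULL && cur != NULL);
    model_warn_on(l->owner != cur);
    l->owner = NULL;
}

/* Balanced acquire/release pairs; returns the number of warnings. */
static int model_balanced_pairs(int rounds)
{
    struct task worker = { .pid = 5221 };
    struct model_local_lock lock = { .owner = NULL };

    model_warnings = 0;
    for (int i = 0; i < rounds; i++) {
        model_lock_acquire(&lock, &worker);
        model_lock_release(&lock, &worker);
    }
    return model_warnings;
}

/* A stray write to owner after a balanced pair; returns the warning count. */
static int model_stray_owner_write(void)
{
    struct task worker = { .pid = 5221 };
    struct task garbage = { .pid = -1 };   /* stands in for corrupt data */
    struct model_local_lock lock = { .owner = NULL };

    model_warnings = 0;
    model_lock_acquire(&lock, &worker);
    model_lock_release(&lock, &worker);
    lock.owner = &garbage;                 /* simulated corruption */
    model_lock_acquire(&lock, &worker);    /* owner is non-NULL: warns */
    return model_warnings;
}
```

Calling model_balanced_pairs() with any round count reports no warnings, and
model_stray_owner_write() reports exactly one, which is consistent with the
corruption reading of the report rather than an unbalanced lock/unlock.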
* Re: [syzbot] [kernel?] WARNING in flush_cpu_slab
From: Vlastimil Babka @ 2024-05-24 8:02 UTC
To: Thomas Gleixner, syzbot, bp, dave.hansen, hpa, linux-kernel,
mingo, syzkaller-bugs, x86, linux-mm, Sebastian Andrzej Siewior,
Tejun Heo, Lai Jiangshan, Dennis Zhou
On 5/24/24 12:32 AM, Thomas Gleixner wrote:
> On Thu, May 23 2024 at 23:03, Vlastimil Babka wrote:
>> On 5/23/24 12:36 PM, Thomas Gleixner wrote:
>>>> ------------[ cut here ]------------
>>>> DEBUG_LOCKS_WARN_ON(l->owner)
>>>> WARNING: CPU: 3 PID: 5221 at include/linux/local_lock_internal.h:30 local_lock_acquire include/linux/local_lock_internal.h:30 [inline]
>>>> WARNING: CPU: 3 PID: 5221 at include/linux/local_lock_internal.h:30 flush_slab mm/slub.c:3088 [inline]
>>>> WARNING: CPU: 3 PID: 5221 at include/linux/local_lock_internal.h:30 flush_cpu_slab+0x37f/0x410 mm/slub.c:3146
>>
>> I'm puzzled by this. We use local_lock_irqsave() on !PREEMPT_RT everywhere.
>> IIUC this warning says we did the irqsave() and then found out somebody else
>> already set the owner? But that means they also did that irqsave() and set
>> themselves as l->owner. Does that mean there would be a spurious irq enable
>> that didn't go through local_unlock_irqrestore()?
>>
>> Also this particular stack is from the work, which is scheduled by
>> queue_work_on() in flush_all_cpus_locked(), which also has a
>> lockdep_assert_cpus_held() so it should fulfill the "the caller must ensure
>> the cpu doesn't go away" property. But I think even if this ended up on the
>> wrong cpu (for the full duration or migrated while processing the work item)
>> somehow, it wouldn't be able to cause such warning, but rather corrupt
>> something else
>
> Indeed. There is another report which makes no sense either:
>
> https://lore.kernel.org/lkml/000000000000fa09d906191c3ee5@google.com
That looks like slab->next, which should contain a valid pointer or NULL,
containing 0x13.
slab->next is initialized in put_cpu_partial() from s->cpu_slab->partial.
Here we have corruption inside s->cpu_slab->lock.
> Both look like data corruption issues caused by whatever...
s->cpu_slab is a percpu allocation, so possibly another percpu alloc user
has a buffer overflow?
> Thanks,
>
> tglx
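
The overflow hypothesis above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: the layout (a 16-byte
neighbour object directly followed by a pointer-sized owner field), the sizes,
and every model_ name are hypothetical, and nothing in the thread confirms
that this is what happened. It shows how a write one byte past a neighbouring
object would leave a non-NULL value in an adjacent owner field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; sizes are illustrative only. */
enum {
    NEIGHBOUR_SIZE = 16,              /* object owned by another user */
    OWNER_OFFSET   = NEIGHBOUR_SIZE,  /* lock owner field follows it */
    CHUNK_SIZE     = 32,
};

_Static_assert(OWNER_OFFSET + sizeof(uintptr_t) <= CHUNK_SIZE,
               "owner field must fit inside the modelled chunk");

/* One shared chunk holding both objects back to back. */
static unsigned char model_chunk[CHUNK_SIZE];

/* Reads the modelled owner field out of the chunk. */
static uintptr_t model_read_owner(void)
{
    uintptr_t owner;

    memcpy(&owner, model_chunk + OWNER_OFFSET, sizeof(owner));
    return owner;
}

/* Buggy neighbour: trusts len instead of checking NEIGHBOUR_SIZE. */
static void model_neighbour_fill(size_t len, unsigned char byte)
{
    /* Keeps the model itself inside the chunk. */
    assert(len <= CHUNK_SIZE);
    memset(model_chunk, byte, len);
}

/* Zeroes the chunk, lets the neighbour write len bytes, returns owner. */
static uintptr_t model_owner_after_fill(size_t len)
{
    memset(model_chunk, 0, sizeof(model_chunk));
    model_neighbour_fill(len, 0x13);
    return model_read_owner();
}
```

A fill of exactly NEIGHBOUR_SIZE bytes leaves the owner field zero, while a
fill of NEIGHBOUR_SIZE + 1 bytes makes it non-zero, which is the state an
owner check on acquire would flag.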