On 2/29/24 10:25, Catalin Marinas wrote:
> On Wed, Feb 28, 2024 at 02:14:44PM -0500, Waiman Long wrote:
>> When some error conditions happen (like OOM), some kmemleak functions
>> call printk() to dump out some useful debugging information while holding
>> the kmemleak_lock. This may cause deadlock as the printk() function
>> may need to allocate additional memory leading to a create_object()
>> call acquiring kmemleak_lock again.
>>
>> Fix this deadlock issue by making sure that printk() is only called
>> after releasing the kmemleak_lock.
> I can't say I'm familiar with the printk() code but I always thought it
> uses some ring buffers as it can be called from all kind of contexts and
> allocation is not guaranteed.
>
> If printk() ends up taking kmemleak_lock through the slab allocator, I
> wonder whether we have bigger problems. The lock order is always
> kmemleak_lock -> object->lock but if printk() triggers a callback into
> kmemleak, we can also get object->lock -> kmemleak_lock ordering, so
> another potential deadlock.

object->lock is per object whereas kmemleak_lock is global. When taking
object->lock and doing a data dump leading to a call that takes
kmemleak_lock, it is highly unlikely that it will need to take that
particular object->lock again. I do agree that lockdep may still warn
about it if that happens, as all the object->lock's are likely to be
treated as being in the same class.

I should probably clarify in the change log that the lockdep splat is
actually:

[ 3991.452558] Chain exists of:
[ 3991.452559]   console_owner -> &port->lock --> kmemleak_lock

So if kmemleak calls printk() while holding kmemleak_lock, and printk()
then acquires either console_owner or port->lock, it may cause a
deadlock.
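
To make the idea concrete, here is a minimal user-space sketch of the
"record under the lock, print after unlocking" pattern. It is not the
kmemleak code: demo_lock, demo_report, demo_print() and demo_lookup()
are hypothetical names, a pthread mutex stands in for kmemleak_lock, and
printf() stands in for printk().

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for the global kmemleak_lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
/* Tracks whether demo_lock is held (only valid in a single-threaded demo). */
static int demo_lock_held;

/* Hypothetical record of what would otherwise be printed under the lock. */
struct demo_report {
	int pending;		/* non-zero if there is something to print */
	unsigned long ptr;	/* address the message refers to */
	char msg[64];		/* preformatted message text */
};

/* Stand-in for printk(): must never run with demo_lock held. */
static void demo_print(const struct demo_report *r)
{
	assert(!demo_lock_held);
	printf("demo: %s (ptr=0x%lx)\n", r->msg, r->ptr);
}

/*
 * Simulated lookup. On error, the details are only recorded while the
 * lock is held; the actual printing happens after the unlock.
 * Returns 1 if an error was reported, 0 otherwise.
 */
int demo_lookup(unsigned long ptr, int simulate_error)
{
	struct demo_report report = { 0 };

	pthread_mutex_lock(&demo_lock);
	demo_lock_held = 1;
	if (simulate_error) {
		report.pending = 1;
		report.ptr = ptr;
		snprintf(report.msg, sizeof(report.msg), "object not found");
	}
	demo_lock_held = 0;
	pthread_mutex_unlock(&demo_lock);

	/* The lock is released here, so the print path cannot nest under it. */
	if (report.pending)
		demo_print(&report);

	return report.pending;
}
```

Calling demo_lookup(0x1000, 1) takes the error path and prints only
after the mutex is dropped, so no console-side lock is ever taken with
the stand-in lock held.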

Cheers,
Longman