linux-mm.kvack.org archive mirror
From: Hillf Danton <hdanton@sina.com>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: Ben Greear <greearb@candelatech.com>,
	linux-wireless <linux-wireless@vger.kernel.org>,
	"Korenblit, Miriam Rachel" <miriam.rachel.korenblit@intel.com>,
	linux-mm@kvack.org, Tejun Heo <tj@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
Date: Wed,  4 Mar 2026 11:08:34 +0800
Message-ID: <20260304030835.610-1-hdanton@sina.com>
In-Reply-To: <35779061f94c2a55bb58dcd619ae91c618509cf4.camel@sipsolutions.net>

On Tue, 03 Mar 2026 12:49:24 +0100 Johannes Berg wrote:
>On Mon, 2026-03-02 at 07:50 -0800, Ben Greear wrote:
>> On 3/2/26 07:38, Johannes Berg wrote:
>> > On Mon, 2026-03-02 at 07:26 -0800, Ben Greear wrote:
>> > > > 
>> > > > Was this with lockdep? If so, did it complain about anything?
>> > > > 
>> > > > I'm having a hard time seeing why it would deadlock at all when wifi
>> > > > uses schedule_work() and therefore the system_percpu_wq, and
>> > > > __lru_add_drain_all() flushes lru_add_drain_work on mm_percpu_wq, and
>> > > > lru_add_and_bh_lrus_drain() doesn't really _seem_ to do anything related
>> > > > to RTNL etc.?
>> > > > 
>> > > > I think we need a real explanation here rather than "if I randomly
>> > > > change this, it no longer appears".
>> > > 
>> > > The path where iwlwifi acquires CMA holds rtnl and/or wiphy locks before
>> > > allocating CMA memory, as expected.
>> > > 
>> > > And the CMA allocation path attempts to flush the work queues in
>> > > at least some cases.
>> > > 
>> > > If there is a work item queued that is trying to grab rtnl and/or wiphy lock
>> > > when CMA attempts to flush, then the flush work cannot complete, so it deadlocks.
>> > > 
>> > > Lockdep doesn't warn about this.
>> > 
>> > It really should, in cases where it can actually happen, I wrote the
>> > code myself for that... Though things have changed since, and the checks
>> > were lost at least once (and re-added), so I suppose it's possible that
>> > they were lost _again_, but the flushing system is far more flexible now
>> > and it's not flushing the same workqueue anyway, so it shouldn't happen.
>> > 
>> > I stand by what I said before, need to show more precisely what depends
>> > on what, and I'm not going to accept a random kthread into this.
>> 
>> My first email on the topic has process stack traces as well as lockdep
>> locks-held printout that points to the deadlock.  I'm not sure what else
>> to offer...please let me know what you'd like to see.
>
> Fair. I don't know, I don't think there's anything that even shows that
> there's a dependency between the two workqueues and the
> "((wq_completion)events_unbound)" and "((wq_completion)events)", and
> there would have to be for it to deadlock this way because of that?
>
Given the locks held [1],

	kworker/1:0/39480	kworker/u32:11/34989
	rtnl_mutex
				&rdev->wiphy.mtx
				__lru_add_drain_all
				  flush_work(&per_cpu(lru_add_drain_work, cpu))
	&rdev->wiphy.mtx

__if__ there is one work item queued __before__ one of the flush targets on
the workqueue and it acquires the rtnl mutex, then no deadlock can arise,
because worker-xyz goes off CPU after failing to take the rtnl lock, then
worker-xyz+1 dequeues the flush target and completes it, as the flush target
has nothing to do with rtnl. The same applies to the wiphy lock.
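
To make the argument above concrete, here is a minimal userspace sketch of it.
It is a toy model under stated assumptions, not kernel code: the names
work_item, flush_target_completes() and demo() are hypothetical, a worker that
hits the held lock is treated as asleep for good, and the pool simply hands the
next queued item to the next free worker. It only shows that a blocked item
queued ahead of the flush target does not stall the target as long as another
worker is available.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
enum lock_id { LOCK_NONE, LOCK_RTNL, LOCK_WIPHY };

struct work_item {
	const char *name;
	enum lock_id needs;	/* lock the item takes first, if any */
	bool done;
};

/*
 * Walk the queue in FIFO order with up to nr_workers workers.
 * `held` is a lock owned by a task outside the pool that will not
 * release it until the flush target has finished.
 * A worker that hits the held lock sleeps and stays occupied (an
 * assumption of this model); the next free worker dequeues the next item.
 * Returns true if items[target] ran to completion.
 */
static bool flush_target_completes(struct work_item *items, size_t n,
				   size_t target, enum lock_id held,
				   size_t nr_workers)
{
	size_t free_workers = nr_workers;

	for (size_t i = 0; i < n && free_workers > 0; i++) {
		if (items[i].needs != LOCK_NONE && items[i].needs == held) {
			free_workers--;	/* this worker is blocked on the lock */
			continue;
		}
		items[i].done = true;	/* completed, worker is free again */
	}
	return items[target].done;
}

/* Scenario from the text: an rtnl user is queued before the drain work. */
static bool demo(size_t nr_workers)
{
	struct work_item q[] = {
		{ "work_taking_rtnl",   LOCK_RTNL, false },
		{ "lru_add_drain_work", LOCK_NONE, false },
	};

	return flush_target_completes(q, 2, 1, LOCK_RTNL, nr_workers);
}
```

Called from a main(), demo(2) returns true (the second worker completes the
flush target) while demo(1) returns false. The single-worker case is included
only for contrast within this model and is not a claim about how any
particular kernel workqueue is configured.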

BTW, is there any chance that work acquiring the rtnl lock gets queued on
mm_percpu_wq?

[1] Subject: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
https://lore.kernel.org/linux-wireless/fa4e82ee-eb14-3930-c76c-f3bd59c5f258@candelatech.com/


Thread overview: 15+ messages
2026-02-23 22:36 Ben Greear
2026-02-27 16:31 ` Ben Greear
2026-03-01 15:38   ` Ben Greear
2026-03-02  8:07     ` Johannes Berg
2026-03-02 15:26       ` Ben Greear
2026-03-02 15:38         ` Johannes Berg
2026-03-02 15:50           ` Ben Greear
2026-03-03 11:49             ` Johannes Berg
2026-03-03 20:52               ` Tejun Heo
2026-03-03 21:03                 ` Johannes Berg
2026-03-03 21:12                 ` Johannes Berg
2026-03-03 21:40                   ` Ben Greear
2026-03-03 21:54                     ` Tejun Heo
2026-03-04  0:02                       ` Ben Greear
2026-03-04  3:08               ` Hillf Danton [this message]
