From: Chen Ridong <chenridong@huaweicloud.com>
To: Frederic Weisbecker <frederic@kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Cc: "Michal Koutný" <mkoutny@suse.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Chen Ridong" <chenridong@huawei.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David S . Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Gabriele Monaco" <gmonaco@redhat.com>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Ingo Molnar" <mingo@redhat.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Jens Axboe" <axboe@kernel.dk>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Lai Jiangshan" <jiangshanlai@gmail.com>,
"Marco Crivellari" <marco.crivellari@suse.com>,
"Michal Hocko" <mhocko@suse.com>,
"Muchun Song" <muchun.song@linux.dev>,
"Paolo Abeni" <pabeni@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Phil Auld" <pauld@redhat.com>,
"Rafael J . Wysocki" <rafael@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Shakeel Butt" <shakeel.butt@linux.dev>,
"Simon Horman" <horms@kernel.org>, "Tejun Heo" <tj@kernel.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Waiman Long" <longman@redhat.com>,
"Will Deacon" <will@kernel.org>,
cgroups@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-block@vger.kernel.org, linux-mm@kvack.org,
linux-pci@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH 17/33] PCI: Flush PCI probe workqueue on cpuset isolated partition change
Date: Fri, 26 Dec 2025 16:48:10 +0800
Message-ID: <c724aac7-5647-4253-bf7b-4ea92ea5d167@huaweicloud.com>
In-Reply-To: <20251224134520.33231-18-frederic@kernel.org>
On 2025/12/24 21:45, Frederic Weisbecker wrote:
> The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In
> order to synchronize against PCI probe works and make sure that no
> asynchronous probing is still pending or executing on a newly isolated
> CPU, the housekeeping subsystem must flush the PCI probe works.
>
> However the PCI probe works can't be flushed easily since they are
> queued to the main per-CPU workqueue pool.
>
> Solve this with creating a PCI probe-specific pool and provide and use
> the appropriate flushing API.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  drivers/pci/pci-driver.c | 17 ++++++++++++++++-
>  include/linux/pci.h      |  3 +++
>  kernel/sched/isolation.c |  2 ++
>  3 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 786d6ce40999..d87f781e5ce9 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -337,6 +337,8 @@ static int local_pci_probe(struct drv_dev_and_id *ddi)
>  	return 0;
>  }
>
> +static struct workqueue_struct *pci_probe_wq;
> +
>  struct pci_probe_arg {
>  	struct drv_dev_and_id *ddi;
>  	struct work_struct work;
> @@ -407,7 +409,11 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>  		cpu = cpumask_any_and(cpumask_of_node(node),
>  				      wq_domain_mask);
>  		if (cpu < nr_cpu_ids) {
> -			schedule_work_on(cpu, &arg.work);
> +			struct workqueue_struct *wq = pci_probe_wq;
> +
> +			if (WARN_ON_ONCE(!wq))
> +				wq = system_percpu_wq;
> +			queue_work_on(cpu, wq, &arg.work);
>  			rcu_read_unlock();
>  			flush_work(&arg.work);
>  			error = arg.ret;
> @@ -425,6 +431,11 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>  	return error;
>  }
>
> +void pci_probe_flush_workqueue(void)
> +{
> +	flush_workqueue(pci_probe_wq);
> +}
> +
>  /**
>   * __pci_device_probe - check if a driver wants to claim a specific PCI device
>   * @drv: driver to call to check if it wants the PCI device
> @@ -1762,6 +1773,10 @@ static int __init pci_driver_init(void)
>  {
>  	int ret;
>
> +	pci_probe_wq = alloc_workqueue("sync_wq", WQ_PERCPU, 0);
> +	if (!pci_probe_wq)
> +		return -ENOMEM;
> +
>  	ret = bus_register(&pci_bus_type);
>  	if (ret)
>  		return ret;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 864775651c6f..f14f467e50de 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1206,6 +1206,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
>  				    struct pci_ops *ops, void *sysdata,
>  				    struct list_head *resources);
>  int pci_host_probe(struct pci_host_bridge *bridge);
> +void pci_probe_flush_workqueue(void);
>  int pci_bus_insert_busn_res(struct pci_bus *b, int bus, int busmax);
>  int pci_bus_update_busn_res_end(struct pci_bus *b, int busmax);
>  void pci_bus_release_busn_res(struct pci_bus *b);
> @@ -2079,6 +2080,8 @@ static inline int pci_has_flag(int flag) { return 0; }
>  _PCI_NOP_ALL(read, *)
>  _PCI_NOP_ALL(write,)
>
> +static inline void pci_probe_flush_workqueue(void) { }
> +
>  static inline struct pci_dev *pci_get_device(unsigned int vendor,
>  					     unsigned int device,
>  					     struct pci_dev *from)
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 8aac3c9f7c7f..7dbe037ea8df 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -8,6 +8,7 @@
>   *
>   */
>  #include <linux/sched/isolation.h>
> +#include <linux/pci.h>
>  #include "sched.h"
>
>  enum hk_flags {
> @@ -145,6 +146,7 @@ int housekeeping_update(struct cpumask *isol_mask, enum hk_type type)
>
>  	synchronize_rcu();
>
> +	pci_probe_flush_workqueue();
>  	mem_cgroup_flush_workqueue();
>  	vmstat_flush_workqueue();
>
I am concerned that this flush may slow down writes to the cpuset interface, although I am not sure
how significant the impact would be.
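As a rough illustration, here is a userspace model I put together under my own assumptions. It is not the kernel workqueue code. flush_workqueue() only returns after every work item queued before the call has finished, so a cpuset write that reaches housekeeping_update() blocks for as long as the pending PCI probes take. The dedicated pci_probe_wq at least limits that wait to PCI probe works. All names below (model_wq, model_flush(), measure_flush_ms()) are hypothetical. The model uses a single worker thread, so "probes" run back to back instead of possibly running concurrently on several CPUs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MODEL_MAX_WORK 16

/* Hypothetical single-worker queue: a model only, not the kernel API. */
struct model_wq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int queued;			/* items queued so far */
	int done;			/* items finished so far */
	int stop;			/* ask the worker to exit */
	long work_ms[MODEL_MAX_WORK];	/* simulated probe durations */
};

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	while (nanosleep(&ts, &ts) != 0)
		;	/* interrupted by a signal: sleep the remaining time */
}

static void *model_worker(void *arg)
{
	struct model_wq *wq = arg;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		while (wq->done == wq->queued && !wq->stop)
			pthread_cond_wait(&wq->cond, &wq->lock);
		if (wq->done == wq->queued)
			break;			/* stop requested, nothing pending */
		long ms = wq->work_ms[wq->done];
		pthread_mutex_unlock(&wq->lock);
		sleep_ms(ms);			/* run the "probe" without the lock */
		pthread_mutex_lock(&wq->lock);
		wq->done++;
		pthread_cond_broadcast(&wq->cond);	/* wake any flusher */
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

static void model_queue(struct model_wq *wq, long ms)
{
	pthread_mutex_lock(&wq->lock);
	assert(wq->queued < MODEL_MAX_WORK);
	wq->work_ms[wq->queued++] = ms;
	pthread_cond_broadcast(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
}

/* Analogue of flush_workqueue(): wait for all items queued before the call. */
static void model_flush(struct model_wq *wq)
{
	pthread_mutex_lock(&wq->lock);
	int target = wq->queued;
	while (wq->done < target)
		pthread_cond_wait(&wq->cond, &wq->lock);
	pthread_mutex_unlock(&wq->lock);
}

/* Queue two "probes", flush, and return how long the caller was blocked (ms). */
long measure_flush_ms(long probe1_ms, long probe2_ms)
{
	struct model_wq wq = { .queued = 0 };
	pthread_t worker;
	struct timespec t0, t1;
	long long ns;

	pthread_mutex_init(&wq.lock, NULL);
	pthread_cond_init(&wq.cond, NULL);
	pthread_create(&worker, NULL, model_worker, &wq);

	/* Start the clock before queueing so the full probe time is counted. */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	model_queue(&wq, probe1_ms);
	model_queue(&wq, probe2_ms);
	model_flush(&wq);	/* the caller blocks here, like a cpuset write */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Tell the worker to exit, then reap it. */
	pthread_mutex_lock(&wq.lock);
	wq.stop = 1;
	pthread_cond_broadcast(&wq.cond);
	pthread_mutex_unlock(&wq.lock);
	pthread_join(worker, NULL);
	pthread_cond_destroy(&wq.cond);
	pthread_mutex_destroy(&wq.lock);

	ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	     (t1.tv_nsec - t0.tv_nsec);
	return (long)(ns / 1000000LL);
}
```

For example, measure_flush_ms(30, 50) cannot return less than 80 in this model, because the flush has to wait for both simulated probes. Real numbers would depend on how long actual driver probes take and where they run.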
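I'm also concerned about potential deadlock risks. While my preliminary investigation hasn't uncovered
any issues, we must ensure that the PCI probe work never takes the CPU hotplug write lock
(cpus_write_lock()). Writing to the cpuset interface takes the CPU hotplug read lock, so if the work
blocked on the write lock, the flush would wait for it forever.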
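To make the deadlock scenario concrete, here is a userspace analogy. It is my own assumption, not kernel code. A pthread rwlock stands in for the CPU hotplug lock. The "cpuset writer" holds the read side, and the queued "probe work" wants the write side. Waiting for that work while holding the read side can never complete. For that reason the sketch calls pthread_rwlock_trywrlock(), which reports EBUSY instead of hanging. The kernel's percpu rw-semaphore has different fairness rules, so this only illustrates the basic read-vs-write case. probe_work() and flush_under_read_lock_would_deadlock() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Userspace analogy only: this rwlock stands in for the CPU hotplug lock
 * (cpus_read_lock()/cpus_write_lock() in the kernel).
 */
static pthread_rwlock_t model_hotplug_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Hypothetical "probe work" that needs the write side of the lock. */
static void *probe_work(void *arg)
{
	bool *would_block = arg;
	/* trywrlock instead of wrlock, so the demo reports rather than hangs. */
	int ret = pthread_rwlock_trywrlock(&model_hotplug_lock);

	if (ret == 0) {
		*would_block = false;
		pthread_rwlock_unlock(&model_hotplug_lock);
	} else {
		*would_block = (ret == EBUSY);
	}
	return NULL;
}

/*
 * Hypothetical cpuset writer: hold the read side, then wait for the work,
 * which is what flushing it amounts to. Returns true when the work could
 * not get the write lock, i.e. a blocking wrlock would never return.
 */
bool flush_under_read_lock_would_deadlock(void)
{
	pthread_t worker;
	bool would_block = false;
	int rc;

	rc = pthread_rwlock_rdlock(&model_hotplug_lock);
	assert(rc == 0);
	rc = pthread_create(&worker, NULL, probe_work, &would_block);
	assert(rc == 0);
	pthread_join(worker, NULL);	/* the "flush" */
	pthread_rwlock_unlock(&model_hotplug_lock);
	return would_block;
}
```

Replacing pthread_rwlock_trywrlock() with pthread_rwlock_wrlock() turns this into a real hang. The worker waits for the reader to release the lock, while the reader waits in pthread_join() for the worker to finish.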
--
Best regards,
Ridong