* [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup
@ 2025-02-05 20:09 Mateusz Guzik
2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
To: ebiederm, oleg
Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik
The clone side contends against exit side in a way which avoidably
exacerbates the problem by the latter waiting on locks held by the
former while holding the tasklist_lock.
Whacking this for both add_device_randomness and pids allocation gives
me a 15% speed up for thread creation/destruction in a 24-core vm.
The random patch is worth about 4%.
nothing blew up with lockdep, lightly tested so far
Bench (plop into will-it-scale):
$ cat tests/threadspawn1.c
char *testcase_description = "Thread creation and teardown";
static void *worker(void *arg)
{
return (NULL);
}
void testcase(unsigned long long *iterations, unsigned long nr)
{
pthread_t thread;
int error;
while (1) {
error = pthread_create(&thread, NULL, worker, NULL);
assert(error == 0);
error = pthread_join(thread, NULL);
assert(error == 0);
(*iterations)++;
}
}
v5:
- whack scripts/selinux/genheaders/genheaders which accidentally got in
- rebased on next-20250205
v4:
- justify moving get_pid in the commit message with a one-liner
- drop the tty unref patch -- it is completely optional and Oleg has his
own variant
- add the ACK by Oleg
v3:
- keep procfs flush where it was, instead hoist get_pid outside of the
lock
- make detach_pid et al accept an array argument of pids to populate
- sprinkle asserts
- drop irq trips around pidmap_lock
- move tty unref outside of tasklist_lock
Mateusz Guzik (5):
exit: perform add_device_randomness() without tasklist_lock
exit: hoist get_pid() in release_task() outside of tasklist_lock
pid: sprinkle tasklist_lock asserts
pid: perform free_pid() calls outside of tasklist_lock
pid: drop irq disablement around pidmap_lock
include/linux/pid.h | 7 ++--
kernel/exit.c | 36 +++++++++++++-------
kernel/pid.c | 82 +++++++++++++++++++++++++--------------------
kernel/sys.c | 14 +++++---
4 files changed, 82 insertions(+), 57 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock 2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik @ 2025-02-05 20:09 ` Mateusz Guzik 2025-02-05 20:56 ` Oleg Nesterov 2025-02-05 20:09 ` [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock Mateusz Guzik ` (4 subsequent siblings) 5 siblings, 1 reply; 12+ messages in thread From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw) To: ebiederm, oleg Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik Parallel calls to add_device_randomness() contend on their own. The clone side aleady runs outside of tasklist_lock, which in turn means any caller on the exit side extends the tasklist_lock hold time while contending on the random-private lock. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> --- kernel/exit.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/exit.c b/kernel/exit.c index 3485e5fc499e..c79b41509cd3 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -174,9 +174,6 @@ static void __exit_signal(struct task_struct *tsk) sig->curr_target = next_thread(tsk); } - add_device_randomness((const void*) &tsk->se.sum_exec_runtime, - sizeof(unsigned long long)); - /* * Accumulate here the counters for all threads as they die. We could * skip the group leader because it is the last user of signal_struct, @@ -278,6 +275,9 @@ void release_task(struct task_struct *p) write_unlock_irq(&tasklist_lock); proc_flush_pid(thread_pid); put_pid(thread_pid); + add_device_randomness(&p->se.sum_exec_runtime, + sizeof(p->se.sum_exec_runtime)); + free_pids(post.pids); release_thread(p); put_task_struct_rcu_user(p); -- 2.43.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock 2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik @ 2025-02-05 20:56 ` Oleg Nesterov 2025-02-06 10:40 ` Christian Brauner 0 siblings, 1 reply; 12+ messages in thread From: Oleg Nesterov @ 2025-02-05 20:56 UTC (permalink / raw) To: Mateusz Guzik Cc: ebiederm, brauner, akpm, Liam.Howlett, linux-mm, linux-kernel On 02/05, Mateusz Guzik wrote: > > Parallel calls to add_device_randomness() contend on their own. ... > + add_device_randomness(&p->se.sum_exec_runtime, > + sizeof(p->se.sum_exec_runtime)); OK, but > + free_pids(post.pids); wait... free_pids() comes later in 4/5 ? Oleg. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock 2025-02-05 20:56 ` Oleg Nesterov @ 2025-02-06 10:40 ` Christian Brauner 0 siblings, 0 replies; 12+ messages in thread From: Christian Brauner @ 2025-02-06 10:40 UTC (permalink / raw) To: Oleg Nesterov Cc: Mateusz Guzik, ebiederm, akpm, Liam.Howlett, linux-mm, linux-kernel On Wed, Feb 05, 2025 at 09:56:56PM +0100, Oleg Nesterov wrote: > On 02/05, Mateusz Guzik wrote: > > > > Parallel calls to add_device_randomness() contend on their own. > ... > > + add_device_randomness(&p->se.sum_exec_runtime, > > + sizeof(p->se.sum_exec_runtime)); > > OK, but > > > + free_pids(post.pids); > > wait... free_pids() comes later in 4/5 ? Yes, that seems to be an accidental leftover. A git rebase -S -i v6.14-rc1 -x "make -j7 O=build.v1/ kernel/" caught this right away. Removed. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock 2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik 2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik @ 2025-02-05 20:09 ` Mateusz Guzik 2025-02-05 20:09 ` [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts Mateusz Guzik ` (3 subsequent siblings) 5 siblings, 0 replies; 12+ messages in thread From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw) To: ebiederm, oleg Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik reduces hold time as get_pid() contains an atomic Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> --- kernel/exit.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/exit.c b/kernel/exit.c index c79b41509cd3..b5c0cbc6bdfb 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -248,9 +248,10 @@ void release_task(struct task_struct *p) cgroup_release(p); + thread_pid = get_pid(p->thread_pid); + write_lock_irq(&tasklist_lock); ptrace_release_task(p); - thread_pid = get_pid(p->thread_pid); __exit_signal(p); /* -- 2.43.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts 2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik 2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik 2025-02-05 20:09 ` [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock Mateusz Guzik @ 2025-02-05 20:09 ` Mateusz Guzik 2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik ` (2 subsequent siblings) 5 siblings, 0 replies; 12+ messages in thread From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw) To: ebiederm, oleg Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> --- kernel/pid.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/kernel/pid.c b/kernel/pid.c index 924084713be8..2ae872f689a7 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -339,17 +339,23 @@ static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type) */ void attach_pid(struct task_struct *task, enum pid_type type) { - struct pid *pid = *task_pid_ptr(task, type); + struct pid *pid; + + lockdep_assert_held_write(&tasklist_lock); + + pid = *task_pid_ptr(task, type); hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]); } static void __change_pid(struct task_struct *task, enum pid_type type, struct pid *new) { - struct pid **pid_ptr = task_pid_ptr(task, type); - struct pid *pid; + struct pid **pid_ptr, *pid; int tmp; + lockdep_assert_held_write(&tasklist_lock); + + pid_ptr = task_pid_ptr(task, type); pid = *pid_ptr; hlist_del_rcu(&task->pid_links[type]); @@ -386,6 +392,8 @@ void exchange_tids(struct task_struct *left, struct task_struct *right) struct hlist_head *head1 = &pid1->tasks[PIDTYPE_PID]; struct hlist_head *head2 = &pid2->tasks[PIDTYPE_PID]; + lockdep_assert_held_write(&tasklist_lock); + /* Swap the single entry tid lists */ hlists_swap_heads_rcu(head1, head2); @@ -403,6 +411,7 @@ void transfer_pid(struct task_struct *old, struct task_struct *new, enum pid_type type) { WARN_ON_ONCE(type == PIDTYPE_PID); + lockdep_assert_held_write(&tasklist_lock); hlist_replace_rcu(&old->pid_links[type], &new->pid_links[type]); } -- 2.43.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock 2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik ` (2 preceding siblings ...) 2025-02-05 20:09 ` [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts Mateusz Guzik @ 2025-02-05 20:09 ` Mateusz Guzik 2025-02-07 18:31 ` Mark Brown 2025-02-05 20:09 ` [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock Mateusz Guzik 2025-02-05 20:29 ` [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Oleg Nesterov 5 siblings, 1 reply; 12+ messages in thread From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw) To: ebiederm, oleg Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik As the clone side already executes pid allocation with only pidmap_lock held, issuing free_pid() while still holding tasklist_lock exacerbates total hold time of the latter. The pid array is smuggled through newly added release_task_post struct so that any extra things want to get moved out have means to do it. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> --- include/linux/pid.h | 7 ++++--- kernel/exit.c | 27 +++++++++++++++++++-------- kernel/pid.c | 44 ++++++++++++++++++++++---------------------- kernel/sys.c | 14 +++++++++----- 4 files changed, 54 insertions(+), 38 deletions(-) diff --git a/include/linux/pid.h b/include/linux/pid.h index 98837a1ff0f3..311ecebd7d56 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -101,9 +101,9 @@ extern struct pid *get_task_pid(struct task_struct *task, enum pid_type type); * these helpers must be called with the tasklist_lock write-held. */ extern void attach_pid(struct task_struct *task, enum pid_type); -extern void detach_pid(struct task_struct *task, enum pid_type); -extern void change_pid(struct task_struct *task, enum pid_type, - struct pid *pid); +void detach_pid(struct pid **pids, struct task_struct *task, enum pid_type); +void change_pid(struct pid **pids, struct task_struct *task, enum pid_type, + struct pid *pid); extern void exchange_tids(struct task_struct *task, struct task_struct *old); extern void transfer_pid(struct task_struct *old, struct task_struct *new, enum pid_type); @@ -129,6 +129,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *); extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, size_t set_tid_size); extern void free_pid(struct pid *pid); +void free_pids(struct pid **pids); extern void disable_pid_allocation(struct pid_namespace *ns); /* diff --git a/kernel/exit.c b/kernel/exit.c index b5c0cbc6bdfb..0d6df671c8a8 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -122,14 +122,22 @@ static __init int kernel_exit_sysfs_init(void) late_initcall(kernel_exit_sysfs_init); #endif -static void __unhash_process(struct task_struct *p, bool group_dead) +/* + * For things release_task() would like to do *after* tasklist_lock is released. + */ +struct release_task_post { + struct pid *pids[PIDTYPE_MAX]; +}; + +static void __unhash_process(struct release_task_post *post, struct task_struct *p, + bool group_dead) { nr_threads--; - detach_pid(p, PIDTYPE_PID); + detach_pid(post->pids, p, PIDTYPE_PID); if (group_dead) { - detach_pid(p, PIDTYPE_TGID); - detach_pid(p, PIDTYPE_PGID); - detach_pid(p, PIDTYPE_SID); + detach_pid(post->pids, p, PIDTYPE_TGID); + detach_pid(post->pids, p, PIDTYPE_PGID); + detach_pid(post->pids, p, PIDTYPE_SID); list_del_rcu(&p->tasks); list_del_init(&p->sibling); @@ -141,7 +149,7 @@ static void __unhash_process(struct task_struct *p, bool group_dead) /* * This function expects the tasklist_lock write-locked. */ -static void __exit_signal(struct task_struct *tsk) +static void __exit_signal(struct release_task_post *post, struct task_struct *tsk) { struct signal_struct *sig = tsk->signal; bool group_dead = thread_group_leader(tsk); @@ -194,7 +202,7 @@ static void __exit_signal(struct task_struct *tsk) task_io_accounting_add(&sig->ioac, &tsk->ioac); sig->sum_sched_runtime += tsk->se.sum_exec_runtime; sig->nr_threads--; - __unhash_process(tsk, group_dead); + __unhash_process(post, tsk, group_dead); write_sequnlock(&sig->stats_lock); /* @@ -236,10 +244,13 @@ void __weak release_thread(struct task_struct *dead_task) void release_task(struct task_struct *p) { + struct release_task_post post; struct task_struct *leader; struct pid *thread_pid; int zap_leader; repeat: + memset(&post, 0, sizeof(post)); + /* don't need to get the RCU readlock here - the process is dead and * can't be modifying its own credentials. But shut RCU-lockdep up */ rcu_read_lock(); @@ -252,7 +263,7 @@ void release_task(struct task_struct *p) write_lock_irq(&tasklist_lock); ptrace_release_task(p); - __exit_signal(p); + __exit_signal(&post, p); /* * If we are the last non-leader member of the thread diff --git a/kernel/pid.c b/kernel/pid.c index 2ae872f689a7..73625f28c166 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -88,20 +88,6 @@ struct pid_namespace init_pid_ns = { }; EXPORT_SYMBOL_GPL(init_pid_ns); -/* - * Note: disable interrupts while the pidmap_lock is held as an - * interrupt might come in and do read_lock(&tasklist_lock). - * - * If we don't disable interrupts there is a nasty deadlock between - * detach_pid()->free_pid() and another cpu that does - * spin_lock(&pidmap_lock) followed by an interrupt routine that does - * read_lock(&tasklist_lock); - * - * After we clean up the tasklist_lock and know there are no - * irq handlers that take it we can leave the interrupts enabled. - * For now it is easier to be safe than to prove it can't happen. - */ - static __cacheline_aligned_in_smp DEFINE_SPINLOCK(pidmap_lock); seqcount_spinlock_t pidmap_lock_seq = SEQCNT_SPINLOCK_ZERO(pidmap_lock_seq, &pidmap_lock); @@ -128,10 +114,11 @@ static void delayed_put_pid(struct rcu_head *rhp) void free_pid(struct pid *pid) { - /* We can be called with write_lock_irq(&tasklist_lock) held */ int i; unsigned long flags; + lockdep_assert_not_held(&tasklist_lock); + spin_lock_irqsave(&pidmap_lock, flags); for (i = 0; i <= pid->level; i++) { struct upid *upid = pid->numbers + i; @@ -160,6 +147,18 @@ void free_pid(struct pid *pid) call_rcu(&pid->rcu, delayed_put_pid); } +void free_pids(struct pid **pids) +{ + int tmp; + + /* + * This can batch pidmap_lock. + */ + for (tmp = PIDTYPE_MAX; --tmp >= 0; ) + if (pids[tmp]) + free_pid(pids[tmp]); +} + struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, size_t set_tid_size) { @@ -347,8 +346,8 @@ void attach_pid(struct task_struct *task, enum pid_type type) hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]); } -static void __change_pid(struct task_struct *task, enum pid_type type, - struct pid *new) +static void __change_pid(struct pid **pids, struct task_struct *task, + enum pid_type type, struct pid *new) { struct pid **pid_ptr, *pid; int tmp; @@ -370,18 +369,19 @@ static void __change_pid(struct task_struct *task, enum pid_type type, if (pid_has_task(pid, tmp)) return; - free_pid(pid); + WARN_ON(pids[type]); + pids[type] = pid; } -void detach_pid(struct task_struct *task, enum pid_type type) +void detach_pid(struct pid **pids, struct task_struct *task, enum pid_type type) { - __change_pid(task, type, NULL); + __change_pid(pids, task, type, NULL); } -void change_pid(struct task_struct *task, enum pid_type type, +void change_pid(struct pid **pids, struct task_struct *task, enum pid_type type, struct pid *pid) { - __change_pid(task, type, pid); + __change_pid(pids, task, type, pid); attach_pid(task, type); } diff --git a/kernel/sys.c b/kernel/sys.c index cb366ff8703a..4efca8a97d62 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1085,6 +1085,7 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid) { struct task_struct *p; struct task_struct *group_leader = current->group_leader; + struct pid *pids[PIDTYPE_MAX] = { 0 }; struct pid *pgrp; int err; @@ -1142,13 +1143,14 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid) goto out; if (task_pgrp(p) != pgrp) - change_pid(p, PIDTYPE_PGID, pgrp); + change_pid(pids, p, PIDTYPE_PGID, pgrp); err = 0; out: /* All paths lead to here, thus we are safe. -DaveM */ write_unlock_irq(&tasklist_lock); rcu_read_unlock(); + free_pids(pids); return err; } @@ -1222,21 +1224,22 @@ SYSCALL_DEFINE1(getsid, pid_t, pid) return retval; } -static void set_special_pids(struct pid *pid) +static void set_special_pids(struct pid **pids, struct pid *pid) { struct task_struct *curr = current->group_leader; if (task_session(curr) != pid) - change_pid(curr, PIDTYPE_SID, pid); + change_pid(pids, curr, PIDTYPE_SID, pid); if (task_pgrp(curr) != pid) - change_pid(curr, PIDTYPE_PGID, pid); + change_pid(pids, curr, PIDTYPE_PGID, pid); } int ksys_setsid(void) { struct task_struct *group_leader = current->group_leader; struct pid *sid = task_pid(group_leader); + struct pid *pids[PIDTYPE_MAX] = { 0 }; pid_t session = pid_vnr(sid); int err = -EPERM; @@ -1252,13 +1255,14 @@ int ksys_setsid(void) goto out; group_leader->signal->leader = 1; - set_special_pids(sid); + set_special_pids(pids, sid); proc_clear_tty(group_leader); err = session; out: write_unlock_irq(&tasklist_lock); + free_pids(pids); if (err > 0) { proc_sid_connector(group_leader); sched_autogroup_create_attach(group_leader); -- 2.43.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock 2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik @ 2025-02-07 18:31 ` Mark Brown 2025-02-07 19:45 ` Mateusz Guzik 2025-02-07 19:51 ` Mateusz Guzik 0 siblings, 2 replies; 12+ messages in thread From: Mark Brown @ 2025-02-07 18:31 UTC (permalink / raw) To: Mateusz Guzik Cc: ebiederm, oleg, brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Aishwarya.TCV [-- Attachment #1: Type: text/plain, Size: 6137 bytes --] On Wed, Feb 05, 2025 at 09:09:28PM +0100, Mateusz Guzik wrote: > As the clone side already executes pid allocation with only pidmap_lock > held, issuing free_pid() while still holding tasklist_lock exacerbates > total hold time of the latter. > > The pid array is smuggled through newly added release_task_post struct > so that any extra things want to get moved out have means to do it. We are seeing failures in -next with the clone3 clone3_set_tid selftest on at least arm and arm64 systems which have been bisected to this commit, in -next as 88dec855ce117f52bcf88748ddcbb25b10f4f2fc (same bisect for both): For arm64 we're seeing a bunch of new failures including: # # [389] Trying clone3() with CLONE_SET_TID to 391 and 0x0 # # File exists - Failed to create new process # # [389] clone3() with CLONE_SET_TID 391 says: -17 - expected 0 # not ok 19 reallocate child TID with 1 TIDs and flags 0x0 # # [389] Trying clone3() with CLONE_SET_TID to 1 and 0x20000000 # # File exists - Failed to create new process # # [389] clone3() with CLONE_SET_TID 1 says: -17 - expected 0 # not ok 21 create PID 1 in new NS with 2 TIDs and flags 0x20000000 # # [1] Trying clone3() with CLONE_SET_TID to 43 and 0x0 # # File exists - Failed to create new process # # [1] clone3() with CLONE_SET_TID 43 says: -17 - expected 0 # not ok 24 check leak on invalid specific TID with 2 TIDs and flags 0x0 but there's more, 32 bit only seems to see one new failure (at least on Beaglebone Black which is where I have that test running). Full log for a run on arm64: https://lava.sirena.org.uk/scheduler/job/1102114 and arm: https://lava.sirena.org.uk/scheduler/job/1102088 Bisect log, the start is my tooling feeding in test results it already has to hand in the commit range to seed the bisect: # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207 # good: [653cd79f296fc6fa592cb9fa2f7b8494d5573a43] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git # good: [4c7518062d638837cea915e0ffe30f846780639a] ASoC: SOF: ipc4: Add support for split firmware releases # good: [0a7c85b516830c0bb088b0bdb2f2c50c76fc531a] regulator: ad5398: Fix incorrect power down bit mask # good: [215705db51eb23052c73126d2efb6acbc2db0424] spi: Replace custom fsleep() implementation # good: [6603c5133daadbb3277fbd93be0d0d5b8ec928e8] ASoC: dt-bindings: atmel,at91-ssc: Convert to YAML format # good: [25fac20edd09b60651eabcc57c187b1277f43d08] spi: gpio: Support a single always-selected device # good: [652ffad172d089acb1a20e5fde1b66e687832b06] spi: fsi: Batch TX operations # good: [6eab7034579917f207ca6d8e3f4e11e85e0ab7d5] ASoC: soc-core: Stop using of_property_read_bool() for non-boolean properties # good: [f5aab0438ef17f01c5ecd25e61ae6a03f82a4586] regulator: pca9450: Fix enable register for LDO5 # good: [c1ac98492d1584d31f335d233a5cd7a4d4116e5a] spi: realtek-rtl-snand: Drop unneeded assignment for cache_type # good: [5a6a461079decea452fdcae955bccecf92e07e97] regulator: ad5398: Add device tree support # good: [995cf0e014b0144edf1125668a97c252c5ab775e] regmap: Reorder 'struct regmap' # good: [89785306453ce6d949e783f6936821a0b7649ee2] spi: zynqmp-gqspi: Always acknowledge interrupts # good: [0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d] Merge branch 'for-5.19/cleanup' into for-next git bisect start 'ed58d103e6da15a442ff87567898768dc3a66987' '653cd79f296fc6fa592cb9fa2f7b8494d5573a43' '4c7518062d638837cea915e0ffe30f846780639a' '0a7c85b516830c0bb088b0bdb2f2c50c76fc531a' '215705db51eb23052c73126d2efb6acbc2db0424' '6603c5133daadbb3277fbd93be0d0d5b8ec928e8' '25fac20edd09b60651eabcc57c187b1277f43d08' '652ffad172d089acb1a20e5fde1b66e687832b06' '6eab7034579917f207ca6d8e3f4e11e85e0ab7d5' 'f5aab0438ef17f01c5ecd25e61ae6a03f82a4586' 'c1ac98492d1584d31f335d233a5cd7a4d4116e5a' '5a6a461079decea452fdcae955bccecf92e07e97' '995cf0e014b0144edf1125668a97c252c5ab775e' '89785306453ce6d949e783f6936821a0b7649ee2' '0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d' # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207 git bisect bad ed58d103e6da15a442ff87567898768dc3a66987 # bad: [5a44f71c6ba5b0a7623c25047ac61ed852afbd84] Merge branch 'for-linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git git bisect bad 5a44f71c6ba5b0a7623c25047ac61ed852afbd84 # bad: [953e43d94b2048d260774e9ddbfd3f378aa2c256] Merge branch 'fs-next' of linux-next git bisect bad 953e43d94b2048d260774e9ddbfd3f378aa2c256 # good: [2ab7baf7ebddff9faff6846648fd0753e5ee58c1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git git bisect good 2ab7baf7ebddff9faff6846648fd0753e5ee58c1 # good: [09c7fd0d4465df7fd6cda5668bd32267884b6cbf] Merge branch '9p-next' of git://github.com/martinetd/linux git bisect good 09c7fd0d4465df7fd6cda5668bd32267884b6cbf # bad: [7a1f00b09c09aadbf33660a31f3f3d8ee6302c45] Merge branch 'vfs-6.15.iomap' into vfs.all git bisect bad 7a1f00b09c09aadbf33660a31f3f3d8ee6302c45 # good: [9bc19073026d0dce6b7d17d22e74643c80283f8f] Merge branch 'vfs-6.15.mount' into vfs.all git bisect good 9bc19073026d0dce6b7d17d22e74643c80283f8f # bad: [ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c] Merge branch 'kernel-6.15.tasklist_lock' into vfs.all git bisect bad ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c # good: [d7c340391cb0f0882bc8d5d9798ef32b577a89bc] Merge branch 'vfs-6.15.pipe' into vfs.all git bisect good d7c340391cb0f0882bc8d5d9798ef32b577a89bc # good: [e88fed94388f62a28acfef4954348abe79557d19] pid: sprinkle tasklist_lock asserts git bisect good e88fed94388f62a28acfef4954348abe79557d19 # bad: [5ca27e0557d722dd648f949dde6e4997f6255f18] pid: drop irq disablement around pidmap_lock git bisect bad 5ca27e0557d722dd648f949dde6e4997f6255f18 # bad: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock git bisect bad 88dec855ce117f52bcf88748ddcbb25b10f4f2fc # first bad commit: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock 2025-02-07 18:31 ` Mark Brown @ 2025-02-07 19:45 ` Mateusz Guzik 2025-02-07 19:51 ` Mateusz Guzik 1 sibling, 0 replies; 12+ messages in thread From: Mateusz Guzik @ 2025-02-07 19:45 UTC (permalink / raw) To: Mark Brown Cc: ebiederm, oleg, brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Aishwarya.TCV thanks for the report, i'll prod it over the weekend On Fri, Feb 7, 2025 at 7:31 PM Mark Brown <broonie@kernel.org> wrote: > > On Wed, Feb 05, 2025 at 09:09:28PM +0100, Mateusz Guzik wrote: > > As the clone side already executes pid allocation with only pidmap_lock > > held, issuing free_pid() while still holding tasklist_lock exacerbates > > total hold time of the latter. > > > > The pid array is smuggled through newly added release_task_post struct > > so that any extra things want to get moved out have means to do it. > > We are seeing failures in -next with the clone3 clone3_set_tid selftest > on at least arm and arm64 systems which have been bisected to this > commit, in -next as 88dec855ce117f52bcf88748ddcbb25b10f4f2fc (same > bisect for both): > > For arm64 we're seeing a bunch of new failures including: > > # # [389] Trying clone3() with CLONE_SET_TID to 391 and 0x0 > # # File exists - Failed to create new process > # # [389] clone3() with CLONE_SET_TID 391 says: -17 - expected 0 > # not ok 19 reallocate child TID with 1 TIDs and flags 0x0 > > # # [389] Trying clone3() with CLONE_SET_TID to 1 and 0x20000000 > # # File exists - Failed to create new process > # # [389] clone3() with CLONE_SET_TID 1 says: -17 - expected 0 > # not ok 21 create PID 1 in new NS with 2 TIDs and flags 0x20000000 > > # # [1] Trying clone3() with CLONE_SET_TID to 43 and 0x0 > # # File exists - Failed to create new process > # # [1] clone3() with CLONE_SET_TID 43 says: -17 - expected 0 > # not ok 24 check leak on invalid specific TID with 2 TIDs and flags 0x0 > > but there's more, 32 bit only seems to see one new failure (at least on > Beaglebone Black which is where I have that test running). > > Full log for a run on arm64: > > https://lava.sirena.org.uk/scheduler/job/1102114 > > and arm: > > https://lava.sirena.org.uk/scheduler/job/1102088 > > Bisect log, the start is my tooling feeding in test results it already > has to hand in the commit range to seed the bisect: > > # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207 > # good: [653cd79f296fc6fa592cb9fa2f7b8494d5573a43] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git > # good: [4c7518062d638837cea915e0ffe30f846780639a] ASoC: SOF: ipc4: Add support for split firmware releases > # good: [0a7c85b516830c0bb088b0bdb2f2c50c76fc531a] regulator: ad5398: Fix incorrect power down bit mask > # good: [215705db51eb23052c73126d2efb6acbc2db0424] spi: Replace custom fsleep() implementation > # good: [6603c5133daadbb3277fbd93be0d0d5b8ec928e8] ASoC: dt-bindings: atmel,at91-ssc: Convert to YAML format > # good: [25fac20edd09b60651eabcc57c187b1277f43d08] spi: gpio: Support a single always-selected device > # good: [652ffad172d089acb1a20e5fde1b66e687832b06] spi: fsi: Batch TX operations > # good: [6eab7034579917f207ca6d8e3f4e11e85e0ab7d5] ASoC: soc-core: Stop using of_property_read_bool() for non-boolean properties > # good: [f5aab0438ef17f01c5ecd25e61ae6a03f82a4586] regulator: pca9450: Fix enable register for LDO5 > # good: [c1ac98492d1584d31f335d233a5cd7a4d4116e5a] spi: realtek-rtl-snand: Drop unneeded assignment for cache_type > # good: [5a6a461079decea452fdcae955bccecf92e07e97] regulator: ad5398: Add device tree support > # good: [995cf0e014b0144edf1125668a97c252c5ab775e] regmap: Reorder 'struct regmap' > # good: [89785306453ce6d949e783f6936821a0b7649ee2] spi: zynqmp-gqspi: Always acknowledge interrupts > # good: [0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d] Merge branch 'for-5.19/cleanup' into for-next > git bisect start 'ed58d103e6da15a442ff87567898768dc3a66987' '653cd79f296fc6fa592cb9fa2f7b8494d5573a43' '4c7518062d638837cea915e0ffe30f846780639a' '0a7c85b516830c0bb088b0bdb2f2c50c76fc531a' '215705db51eb23052c73126d2efb6acbc2db0424' '6603c5133daadbb3277fbd93be0d0d5b8ec928e8' '25fac20edd09b60651eabcc57c187b1277f43d08' '652ffad172d089acb1a20e5fde1b66e687832b06' '6eab7034579917f207ca6d8e3f4e11e85e0ab7d5' 'f5aab0438ef17f01c5ecd25e61ae6a03f82a4586' 'c1ac98492d1584d31f335d233a5cd7a4d4116e5a' '5a6a461079decea452fdcae955bccecf92e07e97' '995cf0e014b0144edf1125668a97c252c5ab775e' '89785306453ce6d949e783f6936821a0b7649ee2' '0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d' > # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207 > git bisect bad ed58d103e6da15a442ff87567898768dc3a66987 > # bad: [5a44f71c6ba5b0a7623c25047ac61ed852afbd84] Merge branch 'for-linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git > git bisect bad 5a44f71c6ba5b0a7623c25047ac61ed852afbd84 > # bad: [953e43d94b2048d260774e9ddbfd3f378aa2c256] Merge branch 'fs-next' of linux-next > git bisect bad 953e43d94b2048d260774e9ddbfd3f378aa2c256 > # good: [2ab7baf7ebddff9faff6846648fd0753e5ee58c1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > git bisect good 2ab7baf7ebddff9faff6846648fd0753e5ee58c1 > # good: [09c7fd0d4465df7fd6cda5668bd32267884b6cbf] Merge branch '9p-next' of git://github.com/martinetd/linux > git bisect good 09c7fd0d4465df7fd6cda5668bd32267884b6cbf > # bad: [7a1f00b09c09aadbf33660a31f3f3d8ee6302c45] Merge branch 'vfs-6.15.iomap' into vfs.all > git bisect bad 7a1f00b09c09aadbf33660a31f3f3d8ee6302c45 > # good: [9bc19073026d0dce6b7d17d22e74643c80283f8f] Merge branch 'vfs-6.15.mount' into vfs.all > git bisect good 9bc19073026d0dce6b7d17d22e74643c80283f8f > # bad: [ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c] Merge branch 'kernel-6.15.tasklist_lock' into vfs.all > git bisect bad ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c > # good: [d7c340391cb0f0882bc8d5d9798ef32b577a89bc] Merge branch 'vfs-6.15.pipe' into vfs.all > git bisect good d7c340391cb0f0882bc8d5d9798ef32b577a89bc > # good: [e88fed94388f62a28acfef4954348abe79557d19] pid: sprinkle tasklist_lock asserts > git bisect good e88fed94388f62a28acfef4954348abe79557d19 > # bad: [5ca27e0557d722dd648f949dde6e4997f6255f18] pid: drop irq disablement around pidmap_lock > git bisect bad 5ca27e0557d722dd648f949dde6e4997f6255f18 > # bad: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock > git bisect bad 88dec855ce117f52bcf88748ddcbb25b10f4f2fc > # first bad commit: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock 2025-02-07 18:31 ` Mark Brown 2025-02-07 19:45 ` Mateusz Guzik @ 2025-02-07 19:51 ` Mateusz Guzik 1 sibling, 0 replies; 12+ messages in thread From: Mateusz Guzik @ 2025-02-07 19:51 UTC (permalink / raw) To: Mark Brown Cc: ebiederm, oleg, brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Aishwarya.TCV ... and found next-20250207 is somehow missing a call to free_pids() in release_task(). There was some fat-fingering in v5 and maybe previous versions which made the call land in the wrong commit, I presume there was further crappery elsewhere which concluded in the call not being present to begin with. I just confirmed the version is is intended to land has the call in the right place: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=kernel-6.15.tasklist_lock&id=7903f907a226058ed99f86e9924e082aea57fc45 As in this chunk fell out in -next: diff --git a/kernel/exit.c b/kernel/exit.c index b4fc3cc96ea4..0d6df671c8a8 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -289,6 +289,7 @@ void release_task(struct task_struct *p) put_pid(thread_pid); add_device_randomness(&p->se.sum_exec_runtime, sizeof(p->se.sum_exec_runtime)); + free_pids(post.pids); release_thread(p); put_task_struct_rcu_user(p); On Fri, Feb 7, 2025 at 7:31 PM Mark Brown <broonie@kernel.org> wrote: > > On Wed, Feb 05, 2025 at 09:09:28PM +0100, Mateusz Guzik wrote: > > As the clone side already executes pid allocation with only pidmap_lock > > held, issuing free_pid() while still holding tasklist_lock exacerbates > > total hold time of the latter. > > > > The pid array is smuggled through newly added release_task_post struct > > so that any extra things want to get moved out have means to do it. > > We are seeing failures in -next with the clone3 clone3_set_tid selftest > on at least arm and arm64 systems which have been bisected to this > commit, in -next as 88dec855ce117f52bcf88748ddcbb25b10f4f2fc (same > bisect for both): > > For arm64 we're seeing a bunch of new failures including: > > # # [389] Trying clone3() with CLONE_SET_TID to 391 and 0x0 > # # File exists - Failed to create new process > # # [389] clone3() with CLONE_SET_TID 391 says: -17 - expected 0 > # not ok 19 reallocate child TID with 1 TIDs and flags 0x0 > > # # [389] Trying clone3() with CLONE_SET_TID to 1 and 0x20000000 > # # File exists - Failed to create new process > # # [389] clone3() with CLONE_SET_TID 1 says: -17 - expected 0 > # not ok 21 create PID 1 in new NS with 2 TIDs and flags 0x20000000 > > # # [1] Trying clone3() with CLONE_SET_TID to 43 and 0x0 > # # File exists - Failed to create new process > # # [1] clone3() with CLONE_SET_TID 43 says: -17 - expected 0 > # not ok 24 check leak on invalid specific TID with 2 TIDs and flags 0x0 > > but there's more, 32 bit only seems to see one new failure (at least on > Beaglebone Black which is where I have that test running). > > Full log for a run on arm64: > > https://lava.sirena.org.uk/scheduler/job/1102114 > > and arm: > > https://lava.sirena.org.uk/scheduler/job/1102088 > > Bisect log, the start is my tooling feeding in test results it already > has to hand in the commit range to seed the bisect: > > # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207 > # good: [653cd79f296fc6fa592cb9fa2f7b8494d5573a43] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git > # good: [4c7518062d638837cea915e0ffe30f846780639a] ASoC: SOF: ipc4: Add support for split firmware releases > # good: [0a7c85b516830c0bb088b0bdb2f2c50c76fc531a] regulator: ad5398: Fix incorrect power down bit mask > # good: [215705db51eb23052c73126d2efb6acbc2db0424] spi: Replace custom fsleep() implementation > # good: [6603c5133daadbb3277fbd93be0d0d5b8ec928e8] ASoC: dt-bindings: atmel,at91-ssc: Convert to YAML format > # good: [25fac20edd09b60651eabcc57c187b1277f43d08] spi: gpio: Support a single always-selected device > # good: [652ffad172d089acb1a20e5fde1b66e687832b06] spi: fsi: Batch TX operations > # good: [6eab7034579917f207ca6d8e3f4e11e85e0ab7d5] ASoC: soc-core: Stop using of_property_read_bool() for non-boolean properties > # good: [f5aab0438ef17f01c5ecd25e61ae6a03f82a4586] regulator: pca9450: Fix enable register for LDO5 > # good: [c1ac98492d1584d31f335d233a5cd7a4d4116e5a] spi: realtek-rtl-snand: Drop unneeded assignment for cache_type > # good: [5a6a461079decea452fdcae955bccecf92e07e97] regulator: ad5398: Add device tree support > # good: [995cf0e014b0144edf1125668a97c252c5ab775e] regmap: Reorder 'struct regmap' > # good: [89785306453ce6d949e783f6936821a0b7649ee2] spi: zynqmp-gqspi: Always acknowledge interrupts > # good: [0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d] Merge branch 'for-5.19/cleanup' into for-next > git bisect start 'ed58d103e6da15a442ff87567898768dc3a66987' '653cd79f296fc6fa592cb9fa2f7b8494d5573a43' '4c7518062d638837cea915e0ffe30f846780639a' '0a7c85b516830c0bb088b0bdb2f2c50c76fc531a' '215705db51eb23052c73126d2efb6acbc2db0424' '6603c5133daadbb3277fbd93be0d0d5b8ec928e8' '25fac20edd09b60651eabcc57c187b1277f43d08' '652ffad172d089acb1a20e5fde1b66e687832b06' '6eab7034579917f207ca6d8e3f4e11e85e0ab7d5' 'f5aab0438ef17f01c5ecd25e61ae6a03f82a4586' 'c1ac98492d1584d31f335d233a5cd7a4d4116e5a' '5a6a461079decea452fdcae955bccecf92e07e97' '995cf0e014b0144edf1125668a97c252c5ab775e' '89785306453ce6d949e783f6936821a0b7649ee2' '0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d' > # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207 > git bisect bad ed58d103e6da15a442ff87567898768dc3a66987 > # bad: [5a44f71c6ba5b0a7623c25047ac61ed852afbd84] Merge branch 'for-linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git > git bisect bad 5a44f71c6ba5b0a7623c25047ac61ed852afbd84 > # bad: [953e43d94b2048d260774e9ddbfd3f378aa2c256] Merge branch 'fs-next' of linux-next > git bisect bad 953e43d94b2048d260774e9ddbfd3f378aa2c256 > # good: [2ab7baf7ebddff9faff6846648fd0753e5ee58c1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git > git bisect good 2ab7baf7ebddff9faff6846648fd0753e5ee58c1 > # good: [09c7fd0d4465df7fd6cda5668bd32267884b6cbf] Merge branch '9p-next' of git://github.com/martinetd/linux > git bisect good 09c7fd0d4465df7fd6cda5668bd32267884b6cbf > # bad: [7a1f00b09c09aadbf33660a31f3f3d8ee6302c45] Merge branch 'vfs-6.15.iomap' into vfs.all > git bisect bad 7a1f00b09c09aadbf33660a31f3f3d8ee6302c45 > # good: [9bc19073026d0dce6b7d17d22e74643c80283f8f] Merge branch 'vfs-6.15.mount' into vfs.all > git bisect good 9bc19073026d0dce6b7d17d22e74643c80283f8f > # bad: [ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c] Merge branch 'kernel-6.15.tasklist_lock' into vfs.all > git bisect bad ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c > # good: [d7c340391cb0f0882bc8d5d9798ef32b577a89bc] Merge branch 'vfs-6.15.pipe' into vfs.all > git bisect good d7c340391cb0f0882bc8d5d9798ef32b577a89bc > # good: [e88fed94388f62a28acfef4954348abe79557d19] pid: sprinkle tasklist_lock asserts > git bisect good e88fed94388f62a28acfef4954348abe79557d19 > # bad: [5ca27e0557d722dd648f949dde6e4997f6255f18] pid: drop irq disablement around pidmap_lock > git bisect bad 5ca27e0557d722dd648f949dde6e4997f6255f18 > # bad: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock > git bisect bad 88dec855ce117f52bcf88748ddcbb25b10f4f2fc > # first bad commit: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock -- Mateusz Guzik <mjguzik gmail.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock 2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik ` (3 preceding siblings ...) 2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik @ 2025-02-05 20:09 ` Mateusz Guzik 2025-02-05 20:29 ` [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Oleg Nesterov 5 siblings, 0 replies; 12+ messages in thread From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw) To: ebiederm, oleg Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik It no longer serves any purpose now that the tasklist_lock -> pidmap_lock ordering got eliminated. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> --- kernel/pid.c | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/kernel/pid.c b/kernel/pid.c index 73625f28c166..900193de4232 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -115,11 +115,10 @@ static void delayed_put_pid(struct rcu_head *rhp) void free_pid(struct pid *pid) { int i; - unsigned long flags; lockdep_assert_not_held(&tasklist_lock); - spin_lock_irqsave(&pidmap_lock, flags); + spin_lock(&pidmap_lock); for (i = 0; i <= pid->level; i++) { struct upid *upid = pid->numbers + i; struct pid_namespace *ns = upid->ns; @@ -142,7 +141,7 @@ void free_pid(struct pid *pid) idr_remove(&ns->idr, upid->nr); } pidfs_remove_pid(pid); - spin_unlock_irqrestore(&pidmap_lock, flags); + spin_unlock(&pidmap_lock); call_rcu(&pid->rcu, delayed_put_pid); } @@ -210,7 +209,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, } idr_preload(GFP_KERNEL); - spin_lock_irq(&pidmap_lock); + spin_lock(&pidmap_lock); if (tid) { nr = idr_alloc(&tmp->idr, NULL, tid, @@ -237,7 +236,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min, pid_max, GFP_ATOMIC); } - spin_unlock_irq(&pidmap_lock); + spin_unlock(&pidmap_lock); idr_preload_end(); if (nr < 0) { @@ -271,7 +270,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, upid = pid->numbers + ns->level; idr_preload(GFP_KERNEL); - spin_lock_irq(&pidmap_lock); + spin_lock(&pidmap_lock); if (!(ns->pid_allocated & PIDNS_ADDING)) goto out_unlock; pidfs_add_pid(pid); @@ -280,18 +279,18 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, idr_replace(&upid->ns->idr, pid, upid->nr); upid->ns->pid_allocated++; } - spin_unlock_irq(&pidmap_lock); + spin_unlock(&pidmap_lock); idr_preload_end(); return pid; out_unlock: - spin_unlock_irq(&pidmap_lock); + spin_unlock(&pidmap_lock); idr_preload_end(); put_pid_ns(ns); out_free: - spin_lock_irq(&pidmap_lock); + spin_lock(&pidmap_lock); while (++i <= ns->level) { upid = pid->numbers + i; idr_remove(&upid->ns->idr, upid->nr); @@ -301,7 +300,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, if (ns->pid_allocated == PIDNS_ADDING) idr_set_cursor(&ns->idr, 0); - spin_unlock_irq(&pidmap_lock); + spin_unlock(&pidmap_lock); kmem_cache_free(ns->pid_cachep, pid); return ERR_PTR(retval); @@ -309,9 +308,9 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid, void disable_pid_allocation(struct pid_namespace *ns) { - spin_lock_irq(&pidmap_lock); + spin_lock(&pidmap_lock); ns->pid_allocated &= ~PIDNS_ADDING; - spin_unlock_irq(&pidmap_lock); + spin_unlock(&pidmap_lock); } struct pid *find_pid_ns(int nr, struct pid_namespace *ns) -- 2.43.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup 2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik ` (4 preceding siblings ...) 2025-02-05 20:09 ` [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock Mateusz Guzik @ 2025-02-05 20:29 ` Oleg Nesterov 5 siblings, 0 replies; 12+ messages in thread From: Oleg Nesterov @ 2025-02-05 20:29 UTC (permalink / raw) To: Mateusz Guzik Cc: ebiederm, brauner, akpm, Liam.Howlett, linux-mm, linux-kernel On 02/05, Mateusz Guzik wrote: > > Whacking this for both add_device_randomness and pids allocation gives > me a 15% speed up for thread creation/destruction in a 24-core vm. Nice. The whole series look good to me. Oleg. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-02-07 19:52 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik 2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik 2025-02-05 20:56 ` Oleg Nesterov 2025-02-06 10:40 ` Christian Brauner 2025-02-05 20:09 ` [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock Mateusz Guzik 2025-02-05 20:09 ` [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts Mateusz Guzik 2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik 2025-02-07 18:31 ` Mark Brown 2025-02-07 19:45 ` Mateusz Guzik 2025-02-07 19:51 ` Mateusz Guzik 2025-02-05 20:09 ` [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock Mateusz Guzik 2025-02-05 20:29 ` [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox