linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup
@ 2025-02-05 20:09 Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik

The clone side contends against exit side in a way which avoidably
exacerbates the problem by the latter waiting on locks held by the
former while holding the tasklist_lock.

Whacking this for both add_device_randomness and pids allocation gives
me a 15% speed up for thread creation/destruction in a 24-core vm.

The random patch is worth about 4%.

nothing blew up with lockdep, lightly tested so far

Bench (plop into will-it-scale):
$ cat tests/threadspawn1.c

char *testcase_description = "Thread creation and teardown";

static void *worker(void *arg)
{
	return (NULL);
}

void testcase(unsigned long long *iterations, unsigned long nr)
{
	pthread_t thread;
	int error;

	while (1) {
		error = pthread_create(&thread, NULL, worker, NULL);
		assert(error == 0);
		error = pthread_join(thread, NULL);
		assert(error == 0);
		(*iterations)++;
	}
}

v5:
- whack scripts/selinux/genheaders/genheaders which accidentally got in
- rebased on next-20250205

v4:
- justify moving get_pid in the commit message with a one-liner
- drop the tty unref patch -- it is completely optional and Oleg has his
  own variant
- add the ACK by Oleg

v3:
- keep procfs flush where it was, instead hoist get_pid outside of the
  lock
- make detach_pid et al accept an array argument of pids to populate
- sprinkle asserts
- drop irq trips around pidmap_lock
- move tty unref outside of tasklist_lock

Mateusz Guzik (5):
  exit: perform add_device_randomness() without tasklist_lock
  exit: hoist get_pid() in release_task() outside of tasklist_lock
  pid: sprinkle tasklist_lock asserts
  pid: perform free_pid() calls outside of tasklist_lock
  pid: drop irq disablement around pidmap_lock

 include/linux/pid.h |  7 ++--
 kernel/exit.c       | 36 +++++++++++++-------
 kernel/pid.c        | 82 +++++++++++++++++++++++++--------------------
 kernel/sys.c        | 14 +++++---
 4 files changed, 82 insertions(+), 57 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
@ 2025-02-05 20:09 ` Mateusz Guzik
  2025-02-05 20:56   ` Oleg Nesterov
  2025-02-05 20:09 ` [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock Mateusz Guzik
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik

Parallel calls to add_device_randomness() contend on their own.

The clone side aleady runs outside of tasklist_lock, which in turn means
any caller on the exit side extends the tasklist_lock hold time while
contending on the random-private lock.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 kernel/exit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 3485e5fc499e..c79b41509cd3 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -174,9 +174,6 @@ static void __exit_signal(struct task_struct *tsk)
 			sig->curr_target = next_thread(tsk);
 	}
 
-	add_device_randomness((const void*) &tsk->se.sum_exec_runtime,
-			      sizeof(unsigned long long));
-
 	/*
 	 * Accumulate here the counters for all threads as they die. We could
 	 * skip the group leader because it is the last user of signal_struct,
@@ -278,6 +275,9 @@ void release_task(struct task_struct *p)
 	write_unlock_irq(&tasklist_lock);
 	proc_flush_pid(thread_pid);
 	put_pid(thread_pid);
+	add_device_randomness(&p->se.sum_exec_runtime,
+			      sizeof(p->se.sum_exec_runtime));
+	free_pids(post.pids);
 	release_thread(p);
 	put_task_struct_rcu_user(p);
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik
@ 2025-02-05 20:09 ` Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts Mateusz Guzik
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik

reduces hold time as get_pid() contains an atomic

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 kernel/exit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index c79b41509cd3..b5c0cbc6bdfb 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -248,9 +248,10 @@ void release_task(struct task_struct *p)
 
 	cgroup_release(p);
 
+	thread_pid = get_pid(p->thread_pid);
+
 	write_lock_irq(&tasklist_lock);
 	ptrace_release_task(p);
-	thread_pid = get_pid(p->thread_pid);
 	__exit_signal(p);
 
 	/*
-- 
2.43.0



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock Mateusz Guzik
@ 2025-02-05 20:09 ` Mateusz Guzik
  2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 kernel/pid.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index 924084713be8..2ae872f689a7 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -339,17 +339,23 @@ static struct pid **task_pid_ptr(struct task_struct *task, enum pid_type type)
  */
 void attach_pid(struct task_struct *task, enum pid_type type)
 {
-	struct pid *pid = *task_pid_ptr(task, type);
+	struct pid *pid;
+
+	lockdep_assert_held_write(&tasklist_lock);
+
+	pid = *task_pid_ptr(task, type);
 	hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]);
 }
 
 static void __change_pid(struct task_struct *task, enum pid_type type,
 			struct pid *new)
 {
-	struct pid **pid_ptr = task_pid_ptr(task, type);
-	struct pid *pid;
+	struct pid **pid_ptr, *pid;
 	int tmp;
 
+	lockdep_assert_held_write(&tasklist_lock);
+
+	pid_ptr = task_pid_ptr(task, type);
 	pid = *pid_ptr;
 
 	hlist_del_rcu(&task->pid_links[type]);
@@ -386,6 +392,8 @@ void exchange_tids(struct task_struct *left, struct task_struct *right)
 	struct hlist_head *head1 = &pid1->tasks[PIDTYPE_PID];
 	struct hlist_head *head2 = &pid2->tasks[PIDTYPE_PID];
 
+	lockdep_assert_held_write(&tasklist_lock);
+
 	/* Swap the single entry tid lists */
 	hlists_swap_heads_rcu(head1, head2);
 
@@ -403,6 +411,7 @@ void transfer_pid(struct task_struct *old, struct task_struct *new,
 			   enum pid_type type)
 {
 	WARN_ON_ONCE(type == PIDTYPE_PID);
+	lockdep_assert_held_write(&tasklist_lock);
 	hlist_replace_rcu(&old->pid_links[type], &new->pid_links[type]);
 }
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
                   ` (2 preceding siblings ...)
  2025-02-05 20:09 ` [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts Mateusz Guzik
@ 2025-02-05 20:09 ` Mateusz Guzik
  2025-02-07 18:31   ` Mark Brown
  2025-02-05 20:09 ` [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock Mateusz Guzik
  2025-02-05 20:29 ` [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Oleg Nesterov
  5 siblings, 1 reply; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik

As the clone side already executes pid allocation with only pidmap_lock
held, issuing free_pid() while still holding tasklist_lock exacerbates
total hold time of the latter.

The pid array is smuggled through newly added release_task_post struct
so that any extra things want to get moved out have means to do it.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 include/linux/pid.h |  7 ++++---
 kernel/exit.c       | 27 +++++++++++++++++++--------
 kernel/pid.c        | 44 ++++++++++++++++++++++----------------------
 kernel/sys.c        | 14 +++++++++-----
 4 files changed, 54 insertions(+), 38 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 98837a1ff0f3..311ecebd7d56 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -101,9 +101,9 @@ extern struct pid *get_task_pid(struct task_struct *task, enum pid_type type);
  * these helpers must be called with the tasklist_lock write-held.
  */
 extern void attach_pid(struct task_struct *task, enum pid_type);
-extern void detach_pid(struct task_struct *task, enum pid_type);
-extern void change_pid(struct task_struct *task, enum pid_type,
-			struct pid *pid);
+void detach_pid(struct pid **pids, struct task_struct *task, enum pid_type);
+void change_pid(struct pid **pids, struct task_struct *task, enum pid_type,
+		struct pid *pid);
 extern void exchange_tids(struct task_struct *task, struct task_struct *old);
 extern void transfer_pid(struct task_struct *old, struct task_struct *new,
 			 enum pid_type);
@@ -129,6 +129,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
 extern struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 			     size_t set_tid_size);
 extern void free_pid(struct pid *pid);
+void free_pids(struct pid **pids);
 extern void disable_pid_allocation(struct pid_namespace *ns);
 
 /*
diff --git a/kernel/exit.c b/kernel/exit.c
index b5c0cbc6bdfb..0d6df671c8a8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -122,14 +122,22 @@ static __init int kernel_exit_sysfs_init(void)
 late_initcall(kernel_exit_sysfs_init);
 #endif
 
-static void __unhash_process(struct task_struct *p, bool group_dead)
+/*
+ * For things release_task() would like to do *after* tasklist_lock is released.
+ */
+struct release_task_post {
+	struct pid *pids[PIDTYPE_MAX];
+};
+
+static void __unhash_process(struct release_task_post *post, struct task_struct *p,
+			     bool group_dead)
 {
 	nr_threads--;
-	detach_pid(p, PIDTYPE_PID);
+	detach_pid(post->pids, p, PIDTYPE_PID);
 	if (group_dead) {
-		detach_pid(p, PIDTYPE_TGID);
-		detach_pid(p, PIDTYPE_PGID);
-		detach_pid(p, PIDTYPE_SID);
+		detach_pid(post->pids, p, PIDTYPE_TGID);
+		detach_pid(post->pids, p, PIDTYPE_PGID);
+		detach_pid(post->pids, p, PIDTYPE_SID);
 
 		list_del_rcu(&p->tasks);
 		list_del_init(&p->sibling);
@@ -141,7 +149,7 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
 /*
  * This function expects the tasklist_lock write-locked.
  */
-static void __exit_signal(struct task_struct *tsk)
+static void __exit_signal(struct release_task_post *post, struct task_struct *tsk)
 {
 	struct signal_struct *sig = tsk->signal;
 	bool group_dead = thread_group_leader(tsk);
@@ -194,7 +202,7 @@ static void __exit_signal(struct task_struct *tsk)
 	task_io_accounting_add(&sig->ioac, &tsk->ioac);
 	sig->sum_sched_runtime += tsk->se.sum_exec_runtime;
 	sig->nr_threads--;
-	__unhash_process(tsk, group_dead);
+	__unhash_process(post, tsk, group_dead);
 	write_sequnlock(&sig->stats_lock);
 
 	/*
@@ -236,10 +244,13 @@ void __weak release_thread(struct task_struct *dead_task)
 
 void release_task(struct task_struct *p)
 {
+	struct release_task_post post;
 	struct task_struct *leader;
 	struct pid *thread_pid;
 	int zap_leader;
 repeat:
+	memset(&post, 0, sizeof(post));
+
 	/* don't need to get the RCU readlock here - the process is dead and
 	 * can't be modifying its own credentials. But shut RCU-lockdep up */
 	rcu_read_lock();
@@ -252,7 +263,7 @@ void release_task(struct task_struct *p)
 
 	write_lock_irq(&tasklist_lock);
 	ptrace_release_task(p);
-	__exit_signal(p);
+	__exit_signal(&post, p);
 
 	/*
 	 * If we are the last non-leader member of the thread
diff --git a/kernel/pid.c b/kernel/pid.c
index 2ae872f689a7..73625f28c166 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -88,20 +88,6 @@ struct pid_namespace init_pid_ns = {
 };
 EXPORT_SYMBOL_GPL(init_pid_ns);
 
-/*
- * Note: disable interrupts while the pidmap_lock is held as an
- * interrupt might come in and do read_lock(&tasklist_lock).
- *
- * If we don't disable interrupts there is a nasty deadlock between
- * detach_pid()->free_pid() and another cpu that does
- * spin_lock(&pidmap_lock) followed by an interrupt routine that does
- * read_lock(&tasklist_lock);
- *
- * After we clean up the tasklist_lock and know there are no
- * irq handlers that take it we can leave the interrupts enabled.
- * For now it is easier to be safe than to prove it can't happen.
- */
-
 static  __cacheline_aligned_in_smp DEFINE_SPINLOCK(pidmap_lock);
 seqcount_spinlock_t pidmap_lock_seq = SEQCNT_SPINLOCK_ZERO(pidmap_lock_seq, &pidmap_lock);
 
@@ -128,10 +114,11 @@ static void delayed_put_pid(struct rcu_head *rhp)
 
 void free_pid(struct pid *pid)
 {
-	/* We can be called with write_lock_irq(&tasklist_lock) held */
 	int i;
 	unsigned long flags;
 
+	lockdep_assert_not_held(&tasklist_lock);
+
 	spin_lock_irqsave(&pidmap_lock, flags);
 	for (i = 0; i <= pid->level; i++) {
 		struct upid *upid = pid->numbers + i;
@@ -160,6 +147,18 @@ void free_pid(struct pid *pid)
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
 
+void free_pids(struct pid **pids)
+{
+	int tmp;
+
+	/*
+	 * This can batch pidmap_lock.
+	 */
+	for (tmp = PIDTYPE_MAX; --tmp >= 0; )
+		if (pids[tmp])
+			free_pid(pids[tmp]);
+}
+
 struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		      size_t set_tid_size)
 {
@@ -347,8 +346,8 @@ void attach_pid(struct task_struct *task, enum pid_type type)
 	hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]);
 }
 
-static void __change_pid(struct task_struct *task, enum pid_type type,
-			struct pid *new)
+static void __change_pid(struct pid **pids, struct task_struct *task,
+			 enum pid_type type, struct pid *new)
 {
 	struct pid **pid_ptr, *pid;
 	int tmp;
@@ -370,18 +369,19 @@ static void __change_pid(struct task_struct *task, enum pid_type type,
 		if (pid_has_task(pid, tmp))
 			return;
 
-	free_pid(pid);
+	WARN_ON(pids[type]);
+	pids[type] = pid;
 }
 
-void detach_pid(struct task_struct *task, enum pid_type type)
+void detach_pid(struct pid **pids, struct task_struct *task, enum pid_type type)
 {
-	__change_pid(task, type, NULL);
+	__change_pid(pids, task, type, NULL);
 }
 
-void change_pid(struct task_struct *task, enum pid_type type,
+void change_pid(struct pid **pids, struct task_struct *task, enum pid_type type,
 		struct pid *pid)
 {
-	__change_pid(task, type, pid);
+	__change_pid(pids, task, type, pid);
 	attach_pid(task, type);
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index cb366ff8703a..4efca8a97d62 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1085,6 +1085,7 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
 {
 	struct task_struct *p;
 	struct task_struct *group_leader = current->group_leader;
+	struct pid *pids[PIDTYPE_MAX] = { 0 };
 	struct pid *pgrp;
 	int err;
 
@@ -1142,13 +1143,14 @@ SYSCALL_DEFINE2(setpgid, pid_t, pid, pid_t, pgid)
 		goto out;
 
 	if (task_pgrp(p) != pgrp)
-		change_pid(p, PIDTYPE_PGID, pgrp);
+		change_pid(pids, p, PIDTYPE_PGID, pgrp);
 
 	err = 0;
 out:
 	/* All paths lead to here, thus we are safe. -DaveM */
 	write_unlock_irq(&tasklist_lock);
 	rcu_read_unlock();
+	free_pids(pids);
 	return err;
 }
 
@@ -1222,21 +1224,22 @@ SYSCALL_DEFINE1(getsid, pid_t, pid)
 	return retval;
 }
 
-static void set_special_pids(struct pid *pid)
+static void set_special_pids(struct pid **pids, struct pid *pid)
 {
 	struct task_struct *curr = current->group_leader;
 
 	if (task_session(curr) != pid)
-		change_pid(curr, PIDTYPE_SID, pid);
+		change_pid(pids, curr, PIDTYPE_SID, pid);
 
 	if (task_pgrp(curr) != pid)
-		change_pid(curr, PIDTYPE_PGID, pid);
+		change_pid(pids, curr, PIDTYPE_PGID, pid);
 }
 
 int ksys_setsid(void)
 {
 	struct task_struct *group_leader = current->group_leader;
 	struct pid *sid = task_pid(group_leader);
+	struct pid *pids[PIDTYPE_MAX] = { 0 };
 	pid_t session = pid_vnr(sid);
 	int err = -EPERM;
 
@@ -1252,13 +1255,14 @@ int ksys_setsid(void)
 		goto out;
 
 	group_leader->signal->leader = 1;
-	set_special_pids(sid);
+	set_special_pids(pids, sid);
 
 	proc_clear_tty(group_leader);
 
 	err = session;
 out:
 	write_unlock_irq(&tasklist_lock);
+	free_pids(pids);
 	if (err > 0) {
 		proc_sid_connector(group_leader);
 		sched_autogroup_create_attach(group_leader);
-- 
2.43.0



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
                   ` (3 preceding siblings ...)
  2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik
@ 2025-02-05 20:09 ` Mateusz Guzik
  2025-02-05 20:29 ` [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Oleg Nesterov
  5 siblings, 0 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-05 20:09 UTC (permalink / raw)
  To: ebiederm, oleg
  Cc: brauner, akpm, Liam.Howlett, linux-mm, linux-kernel, Mateusz Guzik

It no longer serves any purpose now that the tasklist_lock ->
pidmap_lock ordering got eliminated.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 kernel/pid.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index 73625f28c166..900193de4232 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -115,11 +115,10 @@ static void delayed_put_pid(struct rcu_head *rhp)
 void free_pid(struct pid *pid)
 {
 	int i;
-	unsigned long flags;
 
 	lockdep_assert_not_held(&tasklist_lock);
 
-	spin_lock_irqsave(&pidmap_lock, flags);
+	spin_lock(&pidmap_lock);
 	for (i = 0; i <= pid->level; i++) {
 		struct upid *upid = pid->numbers + i;
 		struct pid_namespace *ns = upid->ns;
@@ -142,7 +141,7 @@ void free_pid(struct pid *pid)
 		idr_remove(&ns->idr, upid->nr);
 	}
 	pidfs_remove_pid(pid);
-	spin_unlock_irqrestore(&pidmap_lock, flags);
+	spin_unlock(&pidmap_lock);
 
 	call_rcu(&pid->rcu, delayed_put_pid);
 }
@@ -210,7 +209,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		}
 
 		idr_preload(GFP_KERNEL);
-		spin_lock_irq(&pidmap_lock);
+		spin_lock(&pidmap_lock);
 
 		if (tid) {
 			nr = idr_alloc(&tmp->idr, NULL, tid,
@@ -237,7 +236,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 			nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
 					      pid_max, GFP_ATOMIC);
 		}
-		spin_unlock_irq(&pidmap_lock);
+		spin_unlock(&pidmap_lock);
 		idr_preload_end();
 
 		if (nr < 0) {
@@ -271,7 +270,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 
 	upid = pid->numbers + ns->level;
 	idr_preload(GFP_KERNEL);
-	spin_lock_irq(&pidmap_lock);
+	spin_lock(&pidmap_lock);
 	if (!(ns->pid_allocated & PIDNS_ADDING))
 		goto out_unlock;
 	pidfs_add_pid(pid);
@@ -280,18 +279,18 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		idr_replace(&upid->ns->idr, pid, upid->nr);
 		upid->ns->pid_allocated++;
 	}
-	spin_unlock_irq(&pidmap_lock);
+	spin_unlock(&pidmap_lock);
 	idr_preload_end();
 
 	return pid;
 
 out_unlock:
-	spin_unlock_irq(&pidmap_lock);
+	spin_unlock(&pidmap_lock);
 	idr_preload_end();
 	put_pid_ns(ns);
 
 out_free:
-	spin_lock_irq(&pidmap_lock);
+	spin_lock(&pidmap_lock);
 	while (++i <= ns->level) {
 		upid = pid->numbers + i;
 		idr_remove(&upid->ns->idr, upid->nr);
@@ -301,7 +300,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 	if (ns->pid_allocated == PIDNS_ADDING)
 		idr_set_cursor(&ns->idr, 0);
 
-	spin_unlock_irq(&pidmap_lock);
+	spin_unlock(&pidmap_lock);
 
 	kmem_cache_free(ns->pid_cachep, pid);
 	return ERR_PTR(retval);
@@ -309,9 +308,9 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 
 void disable_pid_allocation(struct pid_namespace *ns)
 {
-	spin_lock_irq(&pidmap_lock);
+	spin_lock(&pidmap_lock);
 	ns->pid_allocated &= ~PIDNS_ADDING;
-	spin_unlock_irq(&pidmap_lock);
+	spin_unlock(&pidmap_lock);
 }
 
 struct pid *find_pid_ns(int nr, struct pid_namespace *ns)
-- 
2.43.0



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup
  2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
                   ` (4 preceding siblings ...)
  2025-02-05 20:09 ` [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock Mateusz Guzik
@ 2025-02-05 20:29 ` Oleg Nesterov
  5 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2025-02-05 20:29 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: ebiederm, brauner, akpm, Liam.Howlett, linux-mm, linux-kernel

On 02/05, Mateusz Guzik wrote:
>
> Whacking this for both add_device_randomness and pids allocation gives
> me a 15% speed up for thread creation/destruction in a 24-core vm.

Nice. The whole series look good to me.

Oleg.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock
  2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik
@ 2025-02-05 20:56   ` Oleg Nesterov
  2025-02-06 10:40     ` Christian Brauner
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2025-02-05 20:56 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: ebiederm, brauner, akpm, Liam.Howlett, linux-mm, linux-kernel

On 02/05, Mateusz Guzik wrote:
>
> Parallel calls to add_device_randomness() contend on their own.
...
> +	add_device_randomness(&p->se.sum_exec_runtime,
> +			      sizeof(p->se.sum_exec_runtime));

OK, but

> +	free_pids(post.pids);

wait... free_pids() comes later in 4/5 ?

Oleg.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock
  2025-02-05 20:56   ` Oleg Nesterov
@ 2025-02-06 10:40     ` Christian Brauner
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2025-02-06 10:40 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mateusz Guzik, ebiederm, akpm, Liam.Howlett, linux-mm, linux-kernel

On Wed, Feb 05, 2025 at 09:56:56PM +0100, Oleg Nesterov wrote:
> On 02/05, Mateusz Guzik wrote:
> >
> > Parallel calls to add_device_randomness() contend on their own.
> ...
> > +	add_device_randomness(&p->se.sum_exec_runtime,
> > +			      sizeof(p->se.sum_exec_runtime));
> 
> OK, but
> 
> > +	free_pids(post.pids);
> 
> wait... free_pids() comes later in 4/5 ?

Yes, that seems to be an accidental leftover. A

git rebase -S -i v6.14-rc1 -x "make -j7 O=build.v1/ kernel/"

caught this right away. Removed.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock
  2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik
@ 2025-02-07 18:31   ` Mark Brown
  2025-02-07 19:45     ` Mateusz Guzik
  2025-02-07 19:51     ` Mateusz Guzik
  0 siblings, 2 replies; 12+ messages in thread
From: Mark Brown @ 2025-02-07 18:31 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: ebiederm, oleg, brauner, akpm, Liam.Howlett, linux-mm,
	linux-kernel, Aishwarya.TCV

[-- Attachment #1: Type: text/plain, Size: 6137 bytes --]

On Wed, Feb 05, 2025 at 09:09:28PM +0100, Mateusz Guzik wrote:
> As the clone side already executes pid allocation with only pidmap_lock
> held, issuing free_pid() while still holding tasklist_lock exacerbates
> total hold time of the latter.
> 
> The pid array is smuggled through newly added release_task_post struct
> so that any extra things want to get moved out have means to do it.

We are seeing failures in -next with the clone3 clone3_set_tid selftest
on at least arm and arm64 systems which have been bisected to this
commit, in -next as 88dec855ce117f52bcf88748ddcbb25b10f4f2fc (same
bisect for both):

For arm64 we're seeing a bunch of new failures including:

# # [389] Trying clone3() with CLONE_SET_TID to 391 and 0x0
# # File exists - Failed to create new process
# # [389] clone3() with CLONE_SET_TID 391 says: -17 - expected 0
# not ok 19 reallocate child TID with 1 TIDs and flags 0x0

# # [389] Trying clone3() with CLONE_SET_TID to 1 and 0x20000000
# # File exists - Failed to create new process
# # [389] clone3() with CLONE_SET_TID 1 says: -17 - expected 0
# not ok 21 create PID 1 in new NS with 2 TIDs and flags 0x20000000

# # [1] Trying clone3() with CLONE_SET_TID to 43 and 0x0
# # File exists - Failed to create new process
# # [1] clone3() with CLONE_SET_TID 43 says: -17 - expected 0
# not ok 24 check leak on invalid specific TID with 2 TIDs and flags 0x0

but there's more, 32 bit only seems to see one new failure (at least on
Beaglebone Black which is where I have that test running).

Full log for a run on arm64:

   https://lava.sirena.org.uk/scheduler/job/1102114

and arm:

   https://lava.sirena.org.uk/scheduler/job/1102088

Bisect log, the start is my tooling feeding in test results it already
has to hand in the commit range to seed the bisect:

# bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
# good: [653cd79f296fc6fa592cb9fa2f7b8494d5573a43] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
# good: [4c7518062d638837cea915e0ffe30f846780639a] ASoC: SOF: ipc4: Add support for split firmware releases
# good: [0a7c85b516830c0bb088b0bdb2f2c50c76fc531a] regulator: ad5398: Fix incorrect power down bit mask
# good: [215705db51eb23052c73126d2efb6acbc2db0424] spi: Replace custom fsleep() implementation
# good: [6603c5133daadbb3277fbd93be0d0d5b8ec928e8] ASoC: dt-bindings: atmel,at91-ssc: Convert to YAML format
# good: [25fac20edd09b60651eabcc57c187b1277f43d08] spi: gpio: Support a single always-selected device
# good: [652ffad172d089acb1a20e5fde1b66e687832b06] spi: fsi: Batch TX operations
# good: [6eab7034579917f207ca6d8e3f4e11e85e0ab7d5] ASoC: soc-core: Stop using of_property_read_bool() for non-boolean properties
# good: [f5aab0438ef17f01c5ecd25e61ae6a03f82a4586] regulator: pca9450: Fix enable register for LDO5
# good: [c1ac98492d1584d31f335d233a5cd7a4d4116e5a] spi: realtek-rtl-snand: Drop unneeded assignment for cache_type
# good: [5a6a461079decea452fdcae955bccecf92e07e97] regulator: ad5398: Add device tree support
# good: [995cf0e014b0144edf1125668a97c252c5ab775e] regmap: Reorder 'struct regmap'
# good: [89785306453ce6d949e783f6936821a0b7649ee2] spi: zynqmp-gqspi: Always acknowledge interrupts
# good: [0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d] Merge branch 'for-5.19/cleanup' into for-next
git bisect start 'ed58d103e6da15a442ff87567898768dc3a66987' '653cd79f296fc6fa592cb9fa2f7b8494d5573a43' '4c7518062d638837cea915e0ffe30f846780639a' '0a7c85b516830c0bb088b0bdb2f2c50c76fc531a' '215705db51eb23052c73126d2efb6acbc2db0424' '6603c5133daadbb3277fbd93be0d0d5b8ec928e8' '25fac20edd09b60651eabcc57c187b1277f43d08' '652ffad172d089acb1a20e5fde1b66e687832b06' '6eab7034579917f207ca6d8e3f4e11e85e0ab7d5' 'f5aab0438ef17f01c5ecd25e61ae6a03f82a4586' 'c1ac98492d1584d31f335d233a5cd7a4d4116e5a' '5a6a461079decea452fdcae955bccecf92e07e97' '995cf0e014b0144edf1125668a97c252c5ab775e' '89785306453ce6d949e783f6936821a0b7649ee2' '0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d'
# bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
git bisect bad ed58d103e6da15a442ff87567898768dc3a66987
# bad: [5a44f71c6ba5b0a7623c25047ac61ed852afbd84] Merge branch 'for-linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git
git bisect bad 5a44f71c6ba5b0a7623c25047ac61ed852afbd84
# bad: [953e43d94b2048d260774e9ddbfd3f378aa2c256] Merge branch 'fs-next' of linux-next
git bisect bad 953e43d94b2048d260774e9ddbfd3f378aa2c256
# good: [2ab7baf7ebddff9faff6846648fd0753e5ee58c1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git
git bisect good 2ab7baf7ebddff9faff6846648fd0753e5ee58c1
# good: [09c7fd0d4465df7fd6cda5668bd32267884b6cbf] Merge branch '9p-next' of git://github.com/martinetd/linux
git bisect good 09c7fd0d4465df7fd6cda5668bd32267884b6cbf
# bad: [7a1f00b09c09aadbf33660a31f3f3d8ee6302c45] Merge branch 'vfs-6.15.iomap' into vfs.all
git bisect bad 7a1f00b09c09aadbf33660a31f3f3d8ee6302c45
# good: [9bc19073026d0dce6b7d17d22e74643c80283f8f] Merge branch 'vfs-6.15.mount' into vfs.all
git bisect good 9bc19073026d0dce6b7d17d22e74643c80283f8f
# bad: [ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c] Merge branch 'kernel-6.15.tasklist_lock' into vfs.all
git bisect bad ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c
# good: [d7c340391cb0f0882bc8d5d9798ef32b577a89bc] Merge branch 'vfs-6.15.pipe' into vfs.all
git bisect good d7c340391cb0f0882bc8d5d9798ef32b577a89bc
# good: [e88fed94388f62a28acfef4954348abe79557d19] pid: sprinkle tasklist_lock asserts
git bisect good e88fed94388f62a28acfef4954348abe79557d19
# bad: [5ca27e0557d722dd648f949dde6e4997f6255f18] pid: drop irq disablement around pidmap_lock
git bisect bad 5ca27e0557d722dd648f949dde6e4997f6255f18
# bad: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock
git bisect bad 88dec855ce117f52bcf88748ddcbb25b10f4f2fc
# first bad commit: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock
  2025-02-07 18:31   ` Mark Brown
@ 2025-02-07 19:45     ` Mateusz Guzik
  2025-02-07 19:51     ` Mateusz Guzik
  1 sibling, 0 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-07 19:45 UTC (permalink / raw)
  To: Mark Brown
  Cc: ebiederm, oleg, brauner, akpm, Liam.Howlett, linux-mm,
	linux-kernel, Aishwarya.TCV

thanks for the report, i'll prod it over the weekend

On Fri, Feb 7, 2025 at 7:31 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Wed, Feb 05, 2025 at 09:09:28PM +0100, Mateusz Guzik wrote:
> > As the clone side already executes pid allocation with only pidmap_lock
> > held, issuing free_pid() while still holding tasklist_lock exacerbates
> > total hold time of the latter.
> >
> > The pid array is smuggled through newly added release_task_post struct
> > so that any extra things want to get moved out have means to do it.
>
> We are seeing failures in -next with the clone3 clone3_set_tid selftest
> on at least arm and arm64 systems which have been bisected to this
> commit, in -next as 88dec855ce117f52bcf88748ddcbb25b10f4f2fc (same
> bisect for both):
>
> For arm64 we're seeing a bunch of new failures including:
>
> # # [389] Trying clone3() with CLONE_SET_TID to 391 and 0x0
> # # File exists - Failed to create new process
> # # [389] clone3() with CLONE_SET_TID 391 says: -17 - expected 0
> # not ok 19 reallocate child TID with 1 TIDs and flags 0x0
>
> # # [389] Trying clone3() with CLONE_SET_TID to 1 and 0x20000000
> # # File exists - Failed to create new process
> # # [389] clone3() with CLONE_SET_TID 1 says: -17 - expected 0
> # not ok 21 create PID 1 in new NS with 2 TIDs and flags 0x20000000
>
> # # [1] Trying clone3() with CLONE_SET_TID to 43 and 0x0
> # # File exists - Failed to create new process
> # # [1] clone3() with CLONE_SET_TID 43 says: -17 - expected 0
> # not ok 24 check leak on invalid specific TID with 2 TIDs and flags 0x0
>
> but there's more, 32 bit only seems to see one new failure (at least on
> Beaglebone Black which is where I have that test running).
>
> Full log for a run on arm64:
>
>    https://lava.sirena.org.uk/scheduler/job/1102114
>
> and arm:
>
>    https://lava.sirena.org.uk/scheduler/job/1102088
>
> Bisect log, the start is my tooling feeding in test results it already
> has to hand in the commit range to seed the bisect:
>
> # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
> # good: [653cd79f296fc6fa592cb9fa2f7b8494d5573a43] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
> # good: [4c7518062d638837cea915e0ffe30f846780639a] ASoC: SOF: ipc4: Add support for split firmware releases
> # good: [0a7c85b516830c0bb088b0bdb2f2c50c76fc531a] regulator: ad5398: Fix incorrect power down bit mask
> # good: [215705db51eb23052c73126d2efb6acbc2db0424] spi: Replace custom fsleep() implementation
> # good: [6603c5133daadbb3277fbd93be0d0d5b8ec928e8] ASoC: dt-bindings: atmel,at91-ssc: Convert to YAML format
> # good: [25fac20edd09b60651eabcc57c187b1277f43d08] spi: gpio: Support a single always-selected device
> # good: [652ffad172d089acb1a20e5fde1b66e687832b06] spi: fsi: Batch TX operations
> # good: [6eab7034579917f207ca6d8e3f4e11e85e0ab7d5] ASoC: soc-core: Stop using of_property_read_bool() for non-boolean properties
> # good: [f5aab0438ef17f01c5ecd25e61ae6a03f82a4586] regulator: pca9450: Fix enable register for LDO5
> # good: [c1ac98492d1584d31f335d233a5cd7a4d4116e5a] spi: realtek-rtl-snand: Drop unneeded assignment for cache_type
> # good: [5a6a461079decea452fdcae955bccecf92e07e97] regulator: ad5398: Add device tree support
> # good: [995cf0e014b0144edf1125668a97c252c5ab775e] regmap: Reorder 'struct regmap'
> # good: [89785306453ce6d949e783f6936821a0b7649ee2] spi: zynqmp-gqspi: Always acknowledge interrupts
> # good: [0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d] Merge branch 'for-5.19/cleanup' into for-next
> git bisect start 'ed58d103e6da15a442ff87567898768dc3a66987' '653cd79f296fc6fa592cb9fa2f7b8494d5573a43' '4c7518062d638837cea915e0ffe30f846780639a' '0a7c85b516830c0bb088b0bdb2f2c50c76fc531a' '215705db51eb23052c73126d2efb6acbc2db0424' '6603c5133daadbb3277fbd93be0d0d5b8ec928e8' '25fac20edd09b60651eabcc57c187b1277f43d08' '652ffad172d089acb1a20e5fde1b66e687832b06' '6eab7034579917f207ca6d8e3f4e11e85e0ab7d5' 'f5aab0438ef17f01c5ecd25e61ae6a03f82a4586' 'c1ac98492d1584d31f335d233a5cd7a4d4116e5a' '5a6a461079decea452fdcae955bccecf92e07e97' '995cf0e014b0144edf1125668a97c252c5ab775e' '89785306453ce6d949e783f6936821a0b7649ee2' '0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d'
> # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
> git bisect bad ed58d103e6da15a442ff87567898768dc3a66987
> # bad: [5a44f71c6ba5b0a7623c25047ac61ed852afbd84] Merge branch 'for-linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git
> git bisect bad 5a44f71c6ba5b0a7623c25047ac61ed852afbd84
> # bad: [953e43d94b2048d260774e9ddbfd3f378aa2c256] Merge branch 'fs-next' of linux-next
> git bisect bad 953e43d94b2048d260774e9ddbfd3f378aa2c256
> # good: [2ab7baf7ebddff9faff6846648fd0753e5ee58c1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git
> git bisect good 2ab7baf7ebddff9faff6846648fd0753e5ee58c1
> # good: [09c7fd0d4465df7fd6cda5668bd32267884b6cbf] Merge branch '9p-next' of git://github.com/martinetd/linux
> git bisect good 09c7fd0d4465df7fd6cda5668bd32267884b6cbf
> # bad: [7a1f00b09c09aadbf33660a31f3f3d8ee6302c45] Merge branch 'vfs-6.15.iomap' into vfs.all
> git bisect bad 7a1f00b09c09aadbf33660a31f3f3d8ee6302c45
> # good: [9bc19073026d0dce6b7d17d22e74643c80283f8f] Merge branch 'vfs-6.15.mount' into vfs.all
> git bisect good 9bc19073026d0dce6b7d17d22e74643c80283f8f
> # bad: [ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c] Merge branch 'kernel-6.15.tasklist_lock' into vfs.all
> git bisect bad ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c
> # good: [d7c340391cb0f0882bc8d5d9798ef32b577a89bc] Merge branch 'vfs-6.15.pipe' into vfs.all
> git bisect good d7c340391cb0f0882bc8d5d9798ef32b577a89bc
> # good: [e88fed94388f62a28acfef4954348abe79557d19] pid: sprinkle tasklist_lock asserts
> git bisect good e88fed94388f62a28acfef4954348abe79557d19
> # bad: [5ca27e0557d722dd648f949dde6e4997f6255f18] pid: drop irq disablement around pidmap_lock
> git bisect bad 5ca27e0557d722dd648f949dde6e4997f6255f18
> # bad: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock
> git bisect bad 88dec855ce117f52bcf88748ddcbb25b10f4f2fc
> # first bad commit: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock



-- 
Mateusz Guzik <mjguzik gmail.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock
  2025-02-07 18:31   ` Mark Brown
  2025-02-07 19:45     ` Mateusz Guzik
@ 2025-02-07 19:51     ` Mateusz Guzik
  1 sibling, 0 replies; 12+ messages in thread
From: Mateusz Guzik @ 2025-02-07 19:51 UTC (permalink / raw)
  To: Mark Brown
  Cc: ebiederm, oleg, brauner, akpm, Liam.Howlett, linux-mm,
	linux-kernel, Aishwarya.TCV

... and found

next-20250207 is somehow missing a call to free_pids() in release_task().

There was some fat-fingering in v5 and maybe previous versions which
made the call land in the wrong commit, I presume there was further
crappery elsewhere which concluded in the call not being present to
begin with.

I just confirmed the version is is intended to land has the call in
the right place:
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=kernel-6.15.tasklist_lock&id=7903f907a226058ed99f86e9924e082aea57fc45

As in this chunk fell out in -next:
diff --git a/kernel/exit.c b/kernel/exit.c
index b4fc3cc96ea4..0d6df671c8a8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -289,6 +289,7 @@ void release_task(struct task_struct *p)
        put_pid(thread_pid);
        add_device_randomness(&p->se.sum_exec_runtime,
                              sizeof(p->se.sum_exec_runtime));
+       free_pids(post.pids);
        release_thread(p);
        put_task_struct_rcu_user(p);

On Fri, Feb 7, 2025 at 7:31 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Wed, Feb 05, 2025 at 09:09:28PM +0100, Mateusz Guzik wrote:
> > As the clone side already executes pid allocation with only pidmap_lock
> > held, issuing free_pid() while still holding tasklist_lock exacerbates
> > total hold time of the latter.
> >
> > The pid array is smuggled through newly added release_task_post struct
> > so that any extra things want to get moved out have means to do it.
>
> We are seeing failures in -next with the clone3 clone3_set_tid selftest
> on at least arm and arm64 systems which have been bisected to this
> commit, in -next as 88dec855ce117f52bcf88748ddcbb25b10f4f2fc (same
> bisect for both):
>
> For arm64 we're seeing a bunch of new failures including:
>
> # # [389] Trying clone3() with CLONE_SET_TID to 391 and 0x0
> # # File exists - Failed to create new process
> # # [389] clone3() with CLONE_SET_TID 391 says: -17 - expected 0
> # not ok 19 reallocate child TID with 1 TIDs and flags 0x0
>
> # # [389] Trying clone3() with CLONE_SET_TID to 1 and 0x20000000
> # # File exists - Failed to create new process
> # # [389] clone3() with CLONE_SET_TID 1 says: -17 - expected 0
> # not ok 21 create PID 1 in new NS with 2 TIDs and flags 0x20000000
>
> # # [1] Trying clone3() with CLONE_SET_TID to 43 and 0x0
> # # File exists - Failed to create new process
> # # [1] clone3() with CLONE_SET_TID 43 says: -17 - expected 0
> # not ok 24 check leak on invalid specific TID with 2 TIDs and flags 0x0
>
> but there's more, 32 bit only seems to see one new failure (at least on
> Beaglebone Black which is where I have that test running).
>
> Full log for a run on arm64:
>
>    https://lava.sirena.org.uk/scheduler/job/1102114
>
> and arm:
>
>    https://lava.sirena.org.uk/scheduler/job/1102088
>
> Bisect log, the start is my tooling feeding in test results it already
> has to hand in the commit range to seed the bisect:
>
> # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
> # good: [653cd79f296fc6fa592cb9fa2f7b8494d5573a43] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
> # good: [4c7518062d638837cea915e0ffe30f846780639a] ASoC: SOF: ipc4: Add support for split firmware releases
> # good: [0a7c85b516830c0bb088b0bdb2f2c50c76fc531a] regulator: ad5398: Fix incorrect power down bit mask
> # good: [215705db51eb23052c73126d2efb6acbc2db0424] spi: Replace custom fsleep() implementation
> # good: [6603c5133daadbb3277fbd93be0d0d5b8ec928e8] ASoC: dt-bindings: atmel,at91-ssc: Convert to YAML format
> # good: [25fac20edd09b60651eabcc57c187b1277f43d08] spi: gpio: Support a single always-selected device
> # good: [652ffad172d089acb1a20e5fde1b66e687832b06] spi: fsi: Batch TX operations
> # good: [6eab7034579917f207ca6d8e3f4e11e85e0ab7d5] ASoC: soc-core: Stop using of_property_read_bool() for non-boolean properties
> # good: [f5aab0438ef17f01c5ecd25e61ae6a03f82a4586] regulator: pca9450: Fix enable register for LDO5
> # good: [c1ac98492d1584d31f335d233a5cd7a4d4116e5a] spi: realtek-rtl-snand: Drop unneeded assignment for cache_type
> # good: [5a6a461079decea452fdcae955bccecf92e07e97] regulator: ad5398: Add device tree support
> # good: [995cf0e014b0144edf1125668a97c252c5ab775e] regmap: Reorder 'struct regmap'
> # good: [89785306453ce6d949e783f6936821a0b7649ee2] spi: zynqmp-gqspi: Always acknowledge interrupts
> # good: [0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d] Merge branch 'for-5.19/cleanup' into for-next
> git bisect start 'ed58d103e6da15a442ff87567898768dc3a66987' '653cd79f296fc6fa592cb9fa2f7b8494d5573a43' '4c7518062d638837cea915e0ffe30f846780639a' '0a7c85b516830c0bb088b0bdb2f2c50c76fc531a' '215705db51eb23052c73126d2efb6acbc2db0424' '6603c5133daadbb3277fbd93be0d0d5b8ec928e8' '25fac20edd09b60651eabcc57c187b1277f43d08' '652ffad172d089acb1a20e5fde1b66e687832b06' '6eab7034579917f207ca6d8e3f4e11e85e0ab7d5' 'f5aab0438ef17f01c5ecd25e61ae6a03f82a4586' 'c1ac98492d1584d31f335d233a5cd7a4d4116e5a' '5a6a461079decea452fdcae955bccecf92e07e97' '995cf0e014b0144edf1125668a97c252c5ab775e' '89785306453ce6d949e783f6936821a0b7649ee2' '0e11f2076e7fa3efa3e0a694bc4d30ef185f0f7d'
> # bad: [ed58d103e6da15a442ff87567898768dc3a66987] Add linux-next specific files for 20250207
> git bisect bad ed58d103e6da15a442ff87567898768dc3a66987
> # bad: [5a44f71c6ba5b0a7623c25047ac61ed852afbd84] Merge branch 'for-linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git
> git bisect bad 5a44f71c6ba5b0a7623c25047ac61ed852afbd84
> # bad: [953e43d94b2048d260774e9ddbfd3f378aa2c256] Merge branch 'fs-next' of linux-next
> git bisect bad 953e43d94b2048d260774e9ddbfd3f378aa2c256
> # good: [2ab7baf7ebddff9faff6846648fd0753e5ee58c1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git
> git bisect good 2ab7baf7ebddff9faff6846648fd0753e5ee58c1
> # good: [09c7fd0d4465df7fd6cda5668bd32267884b6cbf] Merge branch '9p-next' of git://github.com/martinetd/linux
> git bisect good 09c7fd0d4465df7fd6cda5668bd32267884b6cbf
> # bad: [7a1f00b09c09aadbf33660a31f3f3d8ee6302c45] Merge branch 'vfs-6.15.iomap' into vfs.all
> git bisect bad 7a1f00b09c09aadbf33660a31f3f3d8ee6302c45
> # good: [9bc19073026d0dce6b7d17d22e74643c80283f8f] Merge branch 'vfs-6.15.mount' into vfs.all
> git bisect good 9bc19073026d0dce6b7d17d22e74643c80283f8f
> # bad: [ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c] Merge branch 'kernel-6.15.tasklist_lock' into vfs.all
> git bisect bad ba734a751d30f59e3b10d6792a1e1b50e7b1fc0c
> # good: [d7c340391cb0f0882bc8d5d9798ef32b577a89bc] Merge branch 'vfs-6.15.pipe' into vfs.all
> git bisect good d7c340391cb0f0882bc8d5d9798ef32b577a89bc
> # good: [e88fed94388f62a28acfef4954348abe79557d19] pid: sprinkle tasklist_lock asserts
> git bisect good e88fed94388f62a28acfef4954348abe79557d19
> # bad: [5ca27e0557d722dd648f949dde6e4997f6255f18] pid: drop irq disablement around pidmap_lock
> git bisect bad 5ca27e0557d722dd648f949dde6e4997f6255f18
> # bad: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock
> git bisect bad 88dec855ce117f52bcf88748ddcbb25b10f4f2fc
> # first bad commit: [88dec855ce117f52bcf88748ddcbb25b10f4f2fc] pid: perform free_pid() calls outside of tasklist_lock



-- 
Mateusz Guzik <mjguzik gmail.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-02-07 19:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-05 20:09 [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Mateusz Guzik
2025-02-05 20:09 ` [PATCH v5 1/5] exit: perform add_device_randomness() without tasklist_lock Mateusz Guzik
2025-02-05 20:56   ` Oleg Nesterov
2025-02-06 10:40     ` Christian Brauner
2025-02-05 20:09 ` [PATCH v5 2/5] exit: hoist get_pid() in release_task() outside of tasklist_lock Mateusz Guzik
2025-02-05 20:09 ` [PATCH v5 3/5] pid: sprinkle tasklist_lock asserts Mateusz Guzik
2025-02-05 20:09 ` [PATCH v5 4/5] pid: perform free_pid() calls outside of tasklist_lock Mateusz Guzik
2025-02-07 18:31   ` Mark Brown
2025-02-07 19:45     ` Mateusz Guzik
2025-02-07 19:51     ` Mateusz Guzik
2025-02-05 20:09 ` [PATCH v5 5/5] pid: drop irq disablement around pidmap_lock Mateusz Guzik
2025-02-05 20:29 ` [PATCH v5 0/5] reduce tasklist_lock hold time on exit and do some pid cleanup Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox