From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE9BEC3DA6E for ; Wed, 3 Jan 2024 14:05:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5FDD98D0061; Wed, 3 Jan 2024 09:05:05 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5AC116B02EC; Wed, 3 Jan 2024 09:05:05 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 475838D0061; Wed, 3 Jan 2024 09:05:05 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 36FA36B02EB for ; Wed, 3 Jan 2024 09:05:05 -0500 (EST) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 0970740918 for ; Wed, 3 Jan 2024 14:05:05 +0000 (UTC) X-FDA: 81638171370.21.3BB9136 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by imf20.hostedemail.com (Postfix) with ESMTP id EDCD21C0034 for ; Wed, 3 Jan 2024 14:05:01 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=rKTr7qNL; dmarc=pass (policy=none) header.from=kernel.org; spf=pass (imf20.hostedemail.com: domain of jarkko@kernel.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=jarkko@kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1704290702; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=CJXuRQ+L1uc0B3zPG8tjeYfPng0Czqpowvs0bLYewJw=; b=ZYxZW4OkGtauO3uVXCk9qKQZpflH87jFwFHjNGkPnt6ZFbE7GFEeoNR82db9hyKWxMhMUX 5TOZZG4Q2TVSS6vVHm6mzmvpMm6Bo4D1jmwy8shPozWFR01DqP6tW++v9UXQ1jdeWa82Js MGkl3/mLtgDPELuhijfNoe3w6Imveqs= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=rKTr7qNL; dmarc=pass (policy=none) header.from=kernel.org; spf=pass (imf20.hostedemail.com: domain of jarkko@kernel.org designates 145.40.68.75 as permitted sender) smtp.mailfrom=jarkko@kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1704290702; a=rsa-sha256; cv=none; b=6Q+vJp7AXqIAN2M8d6x8IUvU5bZDABQnOzeM+BGeoxIp+fzJQOQ5WgnkfSVVn6Ie4Y2+uL 8gDjW+VINs9F1GJPRqN9/QJyqxAevpGUmcC+1Jhi8ikI6c8BUNSTE357qDCzdPekztOjLK 6mUz2KnWflU7lZb39zEaA3gmLsqUc3w= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by ams.source.kernel.org (Postfix) with ESMTP id 5B545B81186; Wed, 3 Jan 2024 14:05:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 35CE6C433C7; Wed, 3 Jan 2024 14:04:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704290699; bh=RYNKFXB1pA6ACfuaqdpya+quA/X8Ww41O670fD+ZNNg=; h=Date:Cc:Subject:From:To:References:In-Reply-To:From; b=rKTr7qNLBKgIMnnNZD5kydCgVoB0ShDhCqxZzuZqnQaHxHN+3n8eAD5Q1loCLVjJR uU1d0eneow4rgu1o41DaW+lS1T3SB/GWC+sAZEwEzhpsz6huPi7L5ffd+A/Osns3xs 7guQ8ydipmDJb1pAO2A/VCrWuk4yQep6SMzb5AfD3U77wP7mA/+hlmJnmLhgOvoAC0 COBE5pracc5vYmGzZV2GMs3Sos2+unnWsqjHeTlXI//wPfvHXLMVJNxASe2Dt+0fkt aYVcIIjciGqJKBRkXKDfoRcP2UEytgT1ESXUAN6VvIJUMkEXdR8r+HD0owcPqTQJdy NYdIoSbmFVuKw== Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 Date: Wed, 03 Jan 2024 16:04:54 +0200 Message-Id: Cc: , , , , , , , , , , , , , , , Subject: Re: [PATCH] kernel: Introduce a write lock/unlock wrapper for tasklist_lock From: "Jarkko Sakkinen" To: "Maria Yu" , X-Mailer: aerc 0.15.2 References: <20231225081932.17752-1-quic_aiquny@quicinc.com> In-Reply-To: <20231225081932.17752-1-quic_aiquny@quicinc.com> X-Rspamd-Queue-Id: EDCD21C0034 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: hxtamqdd4sbs5d6qcwpznz4bgwctquiq X-HE-Tag: 1704290701-309595 X-HE-Meta: U2FsdGVkX18EW6Gktbaa8vpExJtoX1VOyAiOtUFtxi0/7JBuHhp1DA+hzWRBFWpchEJSHE1K5fo6+Sln/aDuGgibqifAL1KEKllIfkSIMiHFIE6YQcTv/p3dmOGUeh5gGZ7L0w3qN6dYgKWwcYI2UiFjOK5qbdQZt2/gksFPlcakBhbHIVEV6tFe+VEb3aorZPoAk9vHj8Ep89mkV96jOnkvtK3ccobzNCLLdoUcKwtb10YjcoFMoojeJRu2jCvpY9RYVVMEveDAg5w9uSF0D+CY6osnvy4sx7Ue7urGcorE0ZuZN0lO82fVIYfF6nFA4fuY6O9diLG3Iu4fjxYYlJKP62eAUCaR2oLk9yJbecfdH4w0DaH6wr94HKXdxiKAgmGvPmsXfLNtN0HE7XIOuPXylmrGG8XOWRCyaVmj5ypXsONgJ49v5aoVUOgfz2ZaOVSYjoYmibfYZ8MguU8vidu/97Ki1POHjbNZaQkPObG7ECDxsyPNqhJe0UQNcMYEQiuxQTeRBVA+rXmNrL07qhH2AM3ej7XcDi0U0Z9C7+lfeV/MKFqWIa46vFQU3oTc6GL77tdSffRq++U8R03naY6lsFqJfy7FVlLWrzxE9TMSRPNA/GDVGHfvxSFVYmFWe85WENyghf598sBpjS2fQZ+WoAJ+bQOXyqQzSQhHYm9IXvy/6JtyC5Zc2V89Pq3FR6FkwETX99aNSYpEltQsqDvMWTNKwIPTTISQlhpFykItl7jKOSSlotiliwGod4Vz4nw67aJw3QAzU894VocVL8b+oaiI66BNEqKZmfUVZMbxgwsXcCC6PnRTO9RczNq17+l8MyB4K9a/YUdcxr54LjSK1wxDqGYG8W0S1XBRyYRODqaqZ6rGwb9NHDFqmhCkLIs4SLcDBQ0qBI+Lu81PImL1D120+V/VTSsv5yTY2UT/R28MO4XerkN/tD3mxfP4onBFt2J9IpcBIbSuoYl 2PNyMu2J MqA6EU5wOoVbLFs9HvB8d8eS1aQyn92642yVZSTLbsDYFdyEAAI7x5UBLSprwXVuY0YiM1n7mIKJTmWW2erAg/GGxw53uz5GmtgGTz1xTfmN6rG+sOvwgduNlhA24F0kAnAx/4znHzmiqTUy+SkekqfAwED9wmayrsxtuaVr8oKdKsebkccpxmbjiMk7CJ7e54PfaBR3bfzXvHEu0KKAE3K2E9Upr3VAmSJ2PB902fJZx4YAN6w13Yc7C1+m89kIP/0sM4+02DGwHGv/zEn3KFHTrQ4ggvBCdOr7oqGCYQWqN7eGOevZb8x37AsuVYkxJRWvsk0wcslPL8UDCFggclQwtaRzCFHRifmca9MDCu7fhWxM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon Dec 25, 2023 at 10:19 AM EET, Maria Yu wrote: > As a rwlock for tasklist_lock, there are multiple scenarios to acquire > read lock which write lock needed to be waiting for. > In freeze_process/thaw_processes it can take about 200+ms for holding rea= d > lock of tasklist_lock by walking and freezing/thawing tasks in commercial > devices. And write_lock_irq will have preempt disabled and local irq > disabled to spin until the tasklist_lock can be acquired. This leading to > a bad responsive performance of current system. > Take an example: > 1. cpu0 is holding read lock of tasklist_lock to thaw_processes. > 2. cpu1 is waiting write lock of tasklist_lock to exec a new thread with > preempt_disabled and local irq disabled. > 3. cpu2 is waiting write lock of tasklist_lock to do_exit with > preempt_disabled and local irq disabled. > 4. cpu3 is waiting write lock of tasklist_lock to do_exit with > preempt_disabled and local irq disabled. > So introduce a write lock/unlock wrapper for tasklist_lock specificly. > The current taskslist_lock writers all have write_lock_irq to hold > tasklist_lock, and write_unlock_irq to release tasklist_lock, that means > the writers are not suitable or workable to wait on tasklist_lock in irq > disabled scenarios. So the write lock/unlock wrapper here only follow the > current design of directly use local_irq_disable and local_irq_enable, > and not take already irq disabled writer callers into account. > Use write_trylock in the loop and enabled irq for cpu to repsond if lock > cannot be taken. > > Signed-off-by: Maria Yu > --- > fs/exec.c | 10 +++++----- > include/linux/sched/task.h | 29 +++++++++++++++++++++++++++++ > kernel/exit.c | 16 ++++++++-------- > kernel/fork.c | 6 +++--- > kernel/ptrace.c | 12 ++++++------ > kernel/sys.c | 8 ++++---- > security/keys/keyctl.c | 4 ++-- > 7 files changed, 57 insertions(+), 28 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 4aa19b24f281..030eef6852eb 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1086,7 +1086,7 @@ static int de_thread(struct task_struct *tsk) > =20 > for (;;) { > cgroup_threadgroup_change_begin(tsk); > - write_lock_irq(&tasklist_lock); > + write_lock_tasklist_lock(); > /* > * Do this under tasklist_lock to ensure that > * exit_notify() can't miss ->group_exec_task > @@ -1095,7 +1095,7 @@ static int de_thread(struct task_struct *tsk) > if (likely(leader->exit_state)) > break; > __set_current_state(TASK_KILLABLE); > - write_unlock_irq(&tasklist_lock); > + write_unlock_tasklist_lock(); > cgroup_threadgroup_change_end(tsk); > schedule(); > if (__fatal_signal_pending(tsk)) > @@ -1150,7 +1150,7 @@ static int de_thread(struct task_struct *tsk) > */ > if (unlikely(leader->ptrace)) > __wake_up_parent(leader, leader->parent); > - write_unlock_irq(&tasklist_lock); > + write_unlock_tasklist_lock(); > cgroup_threadgroup_change_end(tsk); > =20 > release_task(leader); > @@ -1198,13 +1198,13 @@ static int unshare_sighand(struct task_struct *me= ) > =20 > refcount_set(&newsighand->count, 1); > =20 > - write_lock_irq(&tasklist_lock); > + write_lock_tasklist_lock(); > spin_lock(&oldsighand->siglock); > memcpy(newsighand->action, oldsighand->action, > sizeof(newsighand->action)); > rcu_assign_pointer(me->sighand, newsighand); > spin_unlock(&oldsighand->siglock); > - write_unlock_irq(&tasklist_lock); > + write_unlock_tasklist_lock(); > =20 > __cleanup_sighand(oldsighand); > } > diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h > index a23af225c898..6f69d9a3c868 100644 > --- a/include/linux/sched/task.h > +++ b/include/linux/sched/task.h > @@ -50,6 +50,35 @@ struct kernel_clone_args { > * a separate lock). > */ > extern rwlock_t tasklist_lock; > + > +/* > + * Tasklist_lock is a special lock, it takes a good amount of time of > + * taskslist_lock readers to finish, and the pure write_irq_lock api > + * will do local_irq_disable at the very first, and put the current cpu > + * waiting for the lock while is non-responsive for interrupts. > + * > + * The current taskslist_lock writers all have write_lock_irq to hold > + * tasklist_lock, and write_unlock_irq to release tasklist_lock, that > + * means the writers are not suitable or workable to wait on > + * tasklist_lock in irq disabled scenarios. So the write lock/unlock > + * wrapper here only follow the current design of directly use > + * local_irq_disable and local_irq_enable. > + */ > +static inline void write_lock_tasklist_lock(void) > +{ > + while (1) { > + local_irq_disable(); > + if (write_trylock(&tasklist_lock)) > + break; > + local_irq_enable(); > + cpu_relax(); > + } Maybe: local_irq_disable(); while (!write_trylock(&tasklist_lock)) { local_irq_enable(); cpu_relax(); local_irq_disable(); } BR, Jarkko