From: Hillf Danton <hdanton@sina.com>
To: Peter Zijlstra
Cc: Waiman Long, Ingo Molnar, Will Deacon, Boqun Feng,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	john.p.donnelly@oracle.com, Mukesh Ojha, Ting11 Wang 王婷
Subject: Re: [PATCH v2 1/2] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
Date: Thu, 13 Oct 2022 21:15:05 +0800
Message-Id: <20221013131505.212-1-hdanton@sina.com>
On 13 Oct 2022 12:02:09 +0200 Peter Zijlstra wrote:
> On Wed, Oct 12, 2022 at 09:33:32AM -0400, Waiman Long wrote:
> > A non-first waiter can potentially spin in the for loop of
> > rwsem_down_write_slowpath() without sleeping but fail to acquire the
> > lock even if the rwsem is free if the following sequence happens:
> >
> >   Non-first waiter              First waiter              Lock holder
> >   ----------------              ------------              -----------
> >                                 Acquire wait_lock
> >                                 rwsem_try_write_lock():
> >                                   Set handoff bit if RT or
> >                                     wait too long
> >                                   Set waiter->handoff_set
> >                                 Release wait_lock
> >   Acquire wait_lock
> >   Inherit waiter->handoff_set
> >   Release wait_lock
> >                                                           Clear owner
> >                                                           Release lock
> >   if (waiter.handoff_set) {
> >     rwsem_spin_on_owner();
> >     if (OWNER_NULL)
> >       goto trylock_again;
> >   }
> >
> >   trylock_again:
> >   Acquire wait_lock
> >   rwsem_try_write_lock():
> >     if (first->handoff_set && (waiter != first))
> >       return false;
> >   Release wait_lock
> >
> > It is especially problematic if the non-first waiter is an RT task and
> > it is running on the same CPU as the first waiter as this can lead to
> > live lock.
>
> So why not do a better handoff? Specifically, have the owner set owner
> to first-waiter instead of NULL ? (like the normal mutex code)

Given the simple implementation of that "better handoff" below, written
with care to avoid adding anything to the fast path, I see no benefit
from it other than preventing non-first waiters from spinning.

Hillf

+++ b/kernel/locking/rwsem.c
@@ -429,6 +429,7 @@ static void rwsem_mark_wake(struct rw_se
 		if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
 			if (wake_type == RWSEM_WAKE_ANY) {
+				atomic_long_set(&sem->owner, (long)waiter->task);
 				/*
 				 * Mark writer at the front of the queue for wakeup.
 				 * Until the task is actually later awoken later by
@@ -752,7 +753,7 @@ rwsem_owner_state(struct task_struct *ow
 static noinline enum owner_state
 rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
-	struct task_struct *new, *owner;
+	struct task_struct *new, *owner, *me = current;
 	unsigned long flags, new_flags;
 	enum owner_state state;
 
@@ -762,6 +763,8 @@ rwsem_spin_on_owner(struct rw_semaphore
 	state = rwsem_owner_state(owner, flags);
 	if (state != OWNER_WRITER)
 		return state;
+	if (owner == me)
+		return OWNER_NULL;
 
 	for (;;) {
 		/*
@@ -772,7 +775,10 @@ rwsem_spin_on_owner(struct rw_semaphore
 		 */
 		new = rwsem_owner_flags(sem, &new_flags);
 		if ((new != owner) || (new_flags != flags)) {
-			state = rwsem_owner_state(new, new_flags);
+			if (new == me)
+				state = OWNER_NULL;
+			else
+				state = rwsem_owner_state(new, new_flags);
 			break;
 		}