From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAF37C433FE for ; Tue, 11 Oct 2022 10:46:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 26D426B0072; Tue, 11 Oct 2022 06:46:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 21C886B0073; Tue, 11 Oct 2022 06:46:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0E5696B0074; Tue, 11 Oct 2022 06:46:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id F1EA46B0072 for ; Tue, 11 Oct 2022 06:46:37 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 9CACD1C65F4 for ; Tue, 11 Oct 2022 10:46:37 +0000 (UTC) X-FDA: 80008340034.08.BA369AB Received: from mail3-166.sinamail.sina.com.cn (mail3-166.sinamail.sina.com.cn [202.108.3.166]) by imf10.hostedemail.com (Postfix) with ESMTP id B6C7BC0020 for ; Tue, 11 Oct 2022 10:46:35 +0000 (UTC) Received: from unknown (HELO localhost.localdomain)([114.249.60.223]) by sina.com (172.16.97.23) with ESMTP id 6345492900006984; Tue, 11 Oct 2022 18:44:58 +0800 (CST) X-Sender: hdanton@sina.com X-Auth-ID: hdanton@sina.com X-SMAIL-MID: 89087554919955 From: Hillf Danton To: Mukesh Ojha Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com, linux-mm@kvack.org, Waiman Long , Peter Zijlstra , Will Deacon , Boqun Feng , Ingo Molnar Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Date: Tue, 11 Oct 2022 18:46:21 +0800 Message-Id: <20221011104621.231-1-hdanton@sina.com> In-Reply-To: <7cbf49c9-d122-30e6-68b3-c61eca63e5dc@quicinc.com> References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1665485197; a=rsa-sha256; cv=none; b=IRgSm/xTEtbshyWfekqpqWbDLGtFUAVLuE3alQxNtGO8B/5FxoGYsQdJkbKhBZgHDHegLh Xm28sz9YVNbShs/9/73OZ1350XvZrsIAteWf/lr8McNa2aSqzRovJDoHtDNMx+jPSMHoFe D5nkpE5ptpR1JYNQTL1WAmw8ByaTA/o= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf10.hostedemail.com: domain of hdanton@sina.com designates 202.108.3.166 as permitted sender) smtp.mailfrom=hdanton@sina.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1665485197; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=G5chEuh48YqHWyoKSK9P/3MhpGUaFze1+sdIubC0WTo=; b=Xqm7VGyGItAkJOGTj1II6lqPxQirqm4QdPEO616O1hTlQcuhVJ989EmO5tIjzjuRiMSGxs 4zSxgSz3xQygtEGdm0SXrH1fgLGSB0b7x3jxKNfAy5wR4yB6KntlOo5pMngEkr7QFd2AGj D2n8Zg83MdgiQ3ADZJpMuheBefb7i/c= Authentication-Results: imf10.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf10.hostedemail.com: domain of hdanton@sina.com designates 202.108.3.166 as permitted sender) smtp.mailfrom=hdanton@sina.com X-Stat-Signature: g73nmtoun95zybyhf9hqu9wmcx4gx4er X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: B6C7BC0020 X-Rspam-User: X-HE-Tag: 1665485195-898137 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 10/10/22 06:24 Mukesh Ojha > Hi Waiman, > > On 9/29/2022 11:36 PM, Waiman Long wrote: >> On 9/29/22 14:04, Waiman Long wrote: >>> A non-first waiter can potentially spin in the for loop of >>> rwsem_down_write_slowpath() without sleeping but fail to acquire the >>> lock even if the rwsem is free if the following sequence happens: >>> >>> Non-first waiter First waiter Lock holder >>> ---------------- ------------ ----------- >>> Acquire wait_lock >>> rwsem_try_write_lock(): >>> Set handoff bit if RT or >>> wait too long >>> Set waiter->handoff_set >>> Release wait_lock >>> Acquire wait_lock >>> Inherit waiter->handoff_set >>> Release wait_lock >>> Clear owner >>> Release lock >>> if (waiter.handoff_set) { >>> rwsem_spin_on_owner((); >>> if (OWNER_NULL) >>> goto trylock_again; >>> } >>> trylock_again: >>> Acquire wait_lock >>> rwsem_try_write_lock(): >>> if (first->handoff_set && (waiter != first)) >>> return false; >>> Release wait_lock >>> >>> It is especially problematic if the non-first waiter is an RT task and >>> it is running on the same CPU as the first waiter as this can lead to >>> live lock. >>> >>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more >>> consistent") >>> Signed-off-by: Waiman Long >>> --- >>> kernel/locking/rwsem.c | 13 ++++++++++--- >>> 1 file changed, 10 insertions(+), 3 deletions(-) >> >> Mukesh, can you test if this patch can fix the RT task lockup problem? >> > > Looks like, There is still a window for a race. > > There is a chance when a reader who came first added it's BIAS and > goes to slowpath and before it gets added to wait list it got > preempted by RT task which goes to slowpath as well and being the > first waiter gets its hand-off bit set and not able to get the lock > due to following condition in rwsem_try_write_lock() > > 630 if (count & RWSEM_LOCK_MASK) { ==> reader has > sets its bias > .. > ... > > 634 > 635 new |= RWSEM_FLAG_HANDOFF; > 636 } else { > 637 new |= RWSEM_WRITER_LOCKED; > > > ---------------------->----------------------->------------------------- > > First reader (1) writer(2) RT task Lock holder(3) > > It sets > RWSEM_READER_BIAS. > while it is going to > slowpath(as the lock > was held by (3)) and > before it got added > to the waiters list > it got preempted > by (2). > RT task also takes > the slowpath and add release the > itself into waiting list rwsem lock > and since it is the first clear the > it is the next one to get owner. > the lock but it can not > get the lock as (count & > RWSEM_LOCK_MASK) is set > as (1) has added it but > not able to remove its > adjustment. > Hey Mukesh, Can you test the diff if it makes sense to you? It simply prevents the first waiter from spinning any longer after detecting it barely makes any progress to spin without lock owner. Hillf --- mainline/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock( long count, new; lockdep_assert_held(&sem->wait_lock); + waiter->handoff_set = false; count = atomic_long_read(&sem->count); do { bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF); if (has_handoff) { - /* - * Honor handoff bit and yield only when the first - * waiter is the one that set it. Otherwisee, we - * still try to acquire the rwsem. - */ - if (first->handoff_set && (waiter != first)) + if (waiter != first) return false; - - /* - * First waiter can inherit a previously set handoff - * bit and spin on rwsem if lock acquisition fails. - */ - if (waiter == first) - waiter->handoff_set = true; } new = count;