From: Mukesh Ojha <quic_mojha@quicinc.com>
To: Hillf Danton <hdanton@sina.com>
Cc: <linux-kernel@vger.kernel.org>, <john.p.donnelly@oracle.com>,
<linux-mm@kvack.org>, Waiman Long <longman@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Will Deacon <will@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
Date: Wed, 12 Oct 2022 19:12:59 +0530
Message-ID: <edbb017b-2677-cd30-2091-ff347b00183d@quicinc.com>
In-Reply-To: <20221012040410.403-1-hdanton@sina.com>

Hi,
On 10/12/2022 9:34 AM, Hillf Danton wrote:
> On 11 Oct 2022 18:46:20 +0530 Mukesh Ojha <quic_mojha@quicinc.com>
>> On 10/11/2022 4:16 PM, Hillf Danton wrote:
>>> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
>>>> Hi Waiman,
>>>>
>>>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>>>> On 9/29/22 14:04, Waiman Long wrote:
>>>>>> A non-first waiter can potentially spin in the for loop of
>>>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>>>> lock even if the rwsem is free if the following sequence happens:
>>>>>>
>>>>>>   Non-first waiter           First waiter                 Lock holder
>>>>>>   ----------------           ------------                 -----------
>>>>>>   Acquire wait_lock
>>>>>>   rwsem_try_write_lock():
>>>>>>     Set handoff bit if RT or
>>>>>>       wait too long
>>>>>>     Set waiter->handoff_set
>>>>>>   Release wait_lock
>>>>>>                              Acquire wait_lock
>>>>>>                              Inherit waiter->handoff_set
>>>>>>                              Release wait_lock
>>>>>>                                                           Clear owner
>>>>>>                                                           Release lock
>>>>>>   if (waiter.handoff_set) {
>>>>>>     rwsem_spin_on_owner();
>>>>>>     if (OWNER_NULL)
>>>>>>       goto trylock_again;
>>>>>>   }
>>>>>>   trylock_again:
>>>>>>   Acquire wait_lock
>>>>>>   rwsem_try_write_lock():
>>>>>>      if (first->handoff_set && (waiter != first))
>>>>>>          return false;
>>>>>>   Release wait_lock
>>>>>>
>>>>>> It is especially problematic if the non-first waiter is an RT task and
>>>>>> it is running on the same CPU as the first waiter as this can lead to
>>>>>> live lock.
>>>>>>
>>>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>>>> consistent")
>>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>>> ---
>>>>>> kernel/locking/rwsem.c | 13 ++++++++++---
>>>>>> 1 file changed, 10 insertions(+), 3 deletions(-)
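
(For reference, the spin-then-retry path the diagram above describes looks
roughly like the code below. This is my paraphrase of the writer wait loop in
rwsem_down_write_slowpath() on a ~v6.0 tree, with the tracing/lockevent lines
trimmed, so the exact code in your tree may differ.)

	for (;;) {
		if (rwsem_try_write_lock(sem, &waiter))
			break;		/* lock acquired */

		raw_spin_unlock_irq(&sem->wait_lock);

		if (signal_pending_state(state, current))
			goto out_nolock;

		/*
		 * With waiter.handoff_set, spin on the owner instead of
		 * sleeping; a cleared owner sends us straight back to the
		 * trylock, which is where a non-first waiter can keep
		 * failing without ever going to sleep.
		 */
		if (waiter.handoff_set) {
			enum owner_state owner_state;

			preempt_disable();
			owner_state = rwsem_spin_on_owner(sem);
			preempt_enable();

			if (owner_state == OWNER_NULL)
				goto trylock_again;
		}

		schedule();
		set_current_state(state);
trylock_again:
		raw_spin_lock_irq(&sem->wait_lock);
	}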
>>>>>
>>>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>>>
>>>>
>>>> Looks like there is still a window for a race.
>>>>
>>>> There is a chance that a reader which came first adds its BIAS and
>>>> heads to the slowpath, but before it gets added to the wait list it is
>>>> preempted by an RT task which takes the slowpath as well. Being the
>>>> first waiter, the RT task gets its handoff bit set, yet it is not able
>>>> to get the lock due to the following condition in rwsem_try_write_lock():
>>
>> []
>>
>>>>
>>>> 630         if (count & RWSEM_LOCK_MASK) {      ==> reader has set its bias
>>>> ...
>>>>
>>>> 634
>>>> 635                 new |= RWSEM_FLAG_HANDOFF;
>>>> 636         } else {
>>>> 637                 new |= RWSEM_WRITER_LOCKED;
>>>>
>>>> --------------------->----------------------->--------------------------
>>>>
>>>> First reader (1)          writer(2) RT task          Lock holder(3)
>>>>
>>>> It sets
>>>> RWSEM_READER_BIAS
>>>> while it is going to
>>>> the slowpath (as the
>>>> lock was held by (3))
>>>> and before it got
>>>> added to the waiters
>>>> list it got preempted
>>>> by (2).
>>>>                           RT task also takes the     Releases the
>>>>                           slowpath and adds itself   rwsem lock and
>>>>                           to the wait list; since    clears the
>>>>                           it is the first waiter,    owner.
>>>>                           it is the next one to
>>>>                           get the lock, but it
>>>>                           cannot get the lock as
>>>>                           (count & RWSEM_LOCK_MASK)
>>>>                           is set because (1) has
>>>>                           added its bias but was
>>>>                           not able to remove it.
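
(To spell out why that check keeps tripping: RWSEM_READER_BIAS sits inside
RWSEM_READER_MASK, which is part of RWSEM_LOCK_MASK, so a reader's in-flight
bias alone keeps the test true even though nobody owns the rwsem any more.
The relevant bits, paraphrased from the top of kernel/locking/rwsem.c and
trimmed, look roughly like this; check your tree for the authoritative
definitions.)

/* count-field layout, paraphrased from kernel/locking/rwsem.c */
#define RWSEM_WRITER_LOCKED	(1UL << 0)	/* writer owns the rwsem  */
#define RWSEM_FLAG_WAITERS	(1UL << 1)	/* wait list is non-empty */
#define RWSEM_FLAG_HANDOFF	(1UL << 2)	/* handoff requested      */

#define RWSEM_READER_SHIFT	8
#define RWSEM_READER_BIAS	(1UL << RWSEM_READER_SHIFT)	/* one reader   */
#define RWSEM_READER_MASK	(~(RWSEM_READER_BIAS - 1))	/* reader count */
#define RWSEM_WRITER_MASK	RWSEM_WRITER_LOCKED
#define RWSEM_LOCK_MASK		(RWSEM_WRITER_MASK | RWSEM_READER_MASK)

So until (1) gets to run again and backs its bias out, the RT writer at the
head of the wait list keeps failing rwsem_try_write_lock().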
>>
>> []
>>
>>>>
>>> Hey Mukesh,
>>>
>>> Can you test the diff if it makes sense to you?
>>>
>>> It simply prevents the first waiter from spinning any longer after detecting
>>> that it barely makes any progress spinning without a lock owner.
>>>
>>> Hillf
>>>
>>> --- mainline/kernel/locking/rwsem.c
>>> +++ b/kernel/locking/rwsem.c
>>> @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
>>>  	long count, new;
>>>  
>>>  	lockdep_assert_held(&sem->wait_lock);
>>> +	waiter->handoff_set = false;
>>>  
>>>  	count = atomic_long_read(&sem->count);
>>>  	do {
>>>  		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>>  
>>>  		if (has_handoff) {
>>> -			/*
>>> -			 * Honor handoff bit and yield only when the first
>>> -			 * waiter is the one that set it. Otherwisee, we
>>> -			 * still try to acquire the rwsem.
>>> -			 */
>>> -			if (first->handoff_set && (waiter != first))
>>> +			if (waiter != first)
>>>  				return false;
>>
>> You mean you want to check and change waiter->handoff_set on every run of
>> rwsem_try_write_lock()?
>>
> Yes. With RWSEM_FLAG_HANDOFF set, it is too late for non-first waiters to
> spin, and with both RWSEM_LOCK_MASK and RWSEM_FLAG_HANDOFF set, the rivals
> counted in RWSEM_LOCK_MASK have the upper hand over the first waiter wrt
> acquiring the lock, so it is not a bad option for the first waiter to take
> a step back and back off.
>
> 	if (count & RWSEM_LOCK_MASK) {
> 		if (has_handoff || (!rt_task(waiter->task) &&
> 				    !time_after(jiffies, waiter->timeout)))
> 			return false;
> 
> 		new |= RWSEM_FLAG_HANDOFF;
> 	} else {
>
>> But does it break optimistic spinning, @Waiman?
>
> Waiters spin to acquire the lock instead of locking up, and your report
> shows that spinning too much causes trouble. The key is to stop spinning
> neither too late nor too early. My proposal is a simple one with as few
> heuristics added as possible.
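
If I fold your two hunks together, rwsem_try_write_lock() would read roughly
as below. This is hand-merged by me on top of a ~v6.0 rwsem.c purely for
discussion, not a tested patch, and the tail of the function is elided:

static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
					struct rwsem_waiter *waiter)
{
	struct rwsem_waiter *first = rwsem_first_waiter(sem);
	long count, new;

	lockdep_assert_held(&sem->wait_lock);
	waiter->handoff_set = false;	/* re-evaluated on every attempt */

	count = atomic_long_read(&sem->count);
	do {
		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);

		if (has_handoff) {
			/* Only the first waiter may proceed past the handoff bit. */
			if (waiter != first)
				return false;
			/* ... rest of this branch not quoted in your diff ... */
		}

		new = count;

		if (count & RWSEM_LOCK_MASK) {
			/*
			 * Lock (or a leftover reader bias) still held: even
			 * the first waiter backs off here once has_handoff
			 * is true, instead of spinning on it.
			 */
			if (has_handoff || (!rt_task(waiter->task) &&
					    !time_after(jiffies, waiter->timeout)))
				return false;

			new |= RWSEM_FLAG_HANDOFF;
		} else {
			new |= RWSEM_WRITER_LOCKED;
		}
	} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));

	/* ... remainder unchanged: record handoff_set or take ownership ... */
}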
From a high level, it looks like it will work.
Let me check and get back on this.

-Mukesh
>
> Hillf