* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
From: Hillf Danton @ 2022-10-11 10:46 UTC
To: Mukesh Ojha
Cc: linux-kernel, john.p.donnelly, linux-mm, Waiman Long, Peter Zijlstra,
    Will Deacon, Boqun Feng, Ingo Molnar

On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
> Hi Waiman,
>
> On 9/29/2022 11:36 PM, Waiman Long wrote:
>> On 9/29/22 14:04, Waiman Long wrote:
>>> A non-first waiter can potentially spin in the for loop of
>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>> lock even if the rwsem is free if the following sequence happens:
>>>
>>>   Non-first waiter            First waiter               Lock holder
>>>   ----------------            ------------               -----------
>>>                               Acquire wait_lock
>>>                               rwsem_try_write_lock():
>>>                                 Set handoff bit if RT or
>>>                                   wait too long
>>>                                 Set waiter->handoff_set
>>>                               Release wait_lock
>>>   Acquire wait_lock
>>>   Inherit waiter->handoff_set
>>>   Release wait_lock
>>>                                                          Clear owner
>>>                                                          Release lock
>>>   if (waiter.handoff_set) {
>>>     rwsem_spin_on_owner(();
>>>     if (OWNER_NULL)
>>>       goto trylock_again;
>>>   }
>>>   trylock_again:
>>>   Acquire wait_lock
>>>   rwsem_try_write_lock():
>>>     if (first->handoff_set && (waiter != first))
>>>       return false;
>>>   Release wait_lock
>>>
>>> It is especially problematic if the non-first waiter is an RT task and
>>> it is running on the same CPU as the first waiter as this can lead to
>>> live lock.
>>>
>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>> consistent")
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>
>
> Looks like, There is still a window for a race.
>
> There is a chance when a reader who came first added it's BIAS and
> goes to slowpath and before it gets added to wait list it got
> preempted by RT task which goes to slowpath as well and being the
> first waiter gets its hand-off bit set and not able to get the lock
> due to following condition in rwsem_try_write_lock()
>
>  630         if (count & RWSEM_LOCK_MASK) {    ==> reader has
>                                                    sets its bias
>  ..
>  ...
>
>  634
>  635                 new |= RWSEM_FLAG_HANDOFF;
>  636         } else {
>  637                 new |= RWSEM_WRITER_LOCKED;
>
>
> ---------------------->----------------------->-------------------------
>
>  First reader (1)           writer(2) RT task            Lock holder(3)
>
>  It sets
>  RWSEM_READER_BIAS.
>  while it is going to
>  slowpath(as the lock
>  was held by (3)) and
>  before it got added
>  to the waiters list
>  it got preempted
>  by (2).
>                             RT task also takes
>                             the slowpath and add          release the
>                             itself into waiting list      rwsem lock
>                             and since it is the first     clear the
>                             it is the next one to get     owner.
>                             the lock but it can not
>                             get the lock as (count &
>                             RWSEM_LOCK_MASK) is set
>                             as (1) has added it but
>                             not able to remove its
>                             adjustment.
>
Hey Mukesh,

Can you test the diff if it makes sense to you?

It simply prevents the first waiter from spinning any longer after detecting
it barely makes any progress to spin without lock owner.
Hillf

--- mainline/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
 	long count, new;
 
 	lockdep_assert_held(&sem->wait_lock);
+	waiter->handoff_set = false;
 
 	count = atomic_long_read(&sem->count);
 	do {
 		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
 
 		if (has_handoff) {
-			/*
-			 * Honor handoff bit and yield only when the first
-			 * waiter is the one that set it. Otherwisee, we
-			 * still try to acquire the rwsem.
-			 */
-			if (first->handoff_set && (waiter != first))
+			if (waiter != first)
 				return false;
-
-			/*
-			 * First waiter can inherit a previously set handoff
-			 * bit and spin on rwsem if lock acquisition fails.
-			 */
-			if (waiter == first)
-				waiter->handoff_set = true;
 		}
 
 		new = count;
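The count arithmetic behind Mukesh's scenario can be replayed outside the
kernel. The standalone program below is only an illustrative sketch: the bit
layout approximates the mainline rwsem count word (writer lock in bit 0,
handoff flag in bit 2, reader bias from bit 8), while the helper names and
the hard-coded step sequence are invented for this example and are not
kernel code.

/*
 * rwsem_race.c - replay of the count word in the scenario above.
 * Build and run: gcc -o rwsem_race rwsem_race.c && ./rwsem_race
 */
#include <stdio.h>

#define RWSEM_WRITER_LOCKED	(1UL << 0)
#define RWSEM_FLAG_HANDOFF	(1UL << 2)
#define RWSEM_READER_BIAS	(1UL << 8)
#define RWSEM_READER_MASK	(~(RWSEM_READER_BIAS - 1))
#define RWSEM_WRITER_MASK	RWSEM_WRITER_LOCKED
#define RWSEM_LOCK_MASK		(RWSEM_WRITER_MASK | RWSEM_READER_MASK)

static void show(const char *step, unsigned long count)
{
	printf("%-50s count=0x%03lx  LOCK_MASK %s\n", step, count,
	       (count & RWSEM_LOCK_MASK) ? "set" : "clear");
}

int main(void)
{
	unsigned long count = RWSEM_WRITER_LOCKED;	/* (3) holds the rwsem */

	show("lock holder (3) owns the rwsem", count);

	count += RWSEM_READER_BIAS;			/* (1) fast-path attempt */
	show("reader (1) adds READER_BIAS, then is preempted", count);

	count |= RWSEM_FLAG_HANDOFF;			/* (2) first waiter, RT */
	show("RT writer (2) queues first and sets handoff", count);

	count &= ~RWSEM_WRITER_LOCKED;			/* (3) releases */
	show("holder (3) releases the lock, owner cleared", count);

	/*
	 * rwsem_try_write_lock() still sees (count & RWSEM_LOCK_MASK)
	 * because the preempted reader (1) never got to remove its bias,
	 * so the RT writer (2) keeps failing even though nobody holds
	 * the lock - the window Mukesh describes.
	 */
	show("writer (2) retries rwsem_try_write_lock()", count);
	return 0;
}

With the RT writer and the preempted reader pinned to one CPU, the last
state persists: the writer keeps retrying while the reader cannot run to
drop its bias, which is the livelock under discussion.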
* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
From: Mukesh Ojha @ 2022-10-11 13:16 UTC
To: Hillf Danton
Cc: linux-kernel, john.p.donnelly, linux-mm, Waiman Long, Peter Zijlstra,
    Will Deacon, Boqun Feng, Ingo Molnar

Hi Hillf,

Thanks for looking into this issue.

On 10/11/2022 4:16 PM, Hillf Danton wrote:
> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
>> Hi Waiman,
>>
>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>> On 9/29/22 14:04, Waiman Long wrote:
>>>> A non-first waiter can potentially spin in the for loop of
>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>> lock even if the rwsem is free if the following sequence happens:
>>>>
>>>>   Non-first waiter            First waiter               Lock holder
>>>>   ----------------            ------------               -----------
>>>>                               Acquire wait_lock
>>>>                               rwsem_try_write_lock():
>>>>                                 Set handoff bit if RT or
>>>>                                   wait too long
>>>>                                 Set waiter->handoff_set
>>>>                               Release wait_lock
>>>>   Acquire wait_lock
>>>>   Inherit waiter->handoff_set
>>>>   Release wait_lock
>>>>                                                          Clear owner
>>>>                                                          Release lock
>>>>   if (waiter.handoff_set) {
>>>>     rwsem_spin_on_owner(();
>>>>     if (OWNER_NULL)
>>>>       goto trylock_again;
>>>>   }
>>>>   trylock_again:
>>>>   Acquire wait_lock
>>>>   rwsem_try_write_lock():
>>>>     if (first->handoff_set && (waiter != first))
>>>>       return false;
>>>>   Release wait_lock
>>>>
>>>> It is especially problematic if the non-first waiter is an RT task and
>>>> it is running on the same CPU as the first waiter as this can lead to
>>>> live lock.
>>>>
>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>> consistent")
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>
>>
>> Looks like, There is still a window for a race.
>>
>> There is a chance when a reader who came first added it's BIAS and
>> goes to slowpath and before it gets added to wait list it got
>> preempted by RT task which goes to slowpath as well and being the
>> first waiter gets its hand-off bit set and not able to get the lock
>> due to following condition in rwsem_try_write_lock()

[]

>>
>>  630         if (count & RWSEM_LOCK_MASK) {    ==> reader has
>>                                                    sets its bias
>>  ..
>>  ...
>>
>>  634
>>  635                 new |= RWSEM_FLAG_HANDOFF;
>>  636         } else {
>>  637                 new |= RWSEM_WRITER_LOCKED;
>>
>>
>> ---------------------->----------------------->-------------------------
>>
>>  First reader (1)           writer(2) RT task            Lock holder(3)
>>
>>  It sets
>>  RWSEM_READER_BIAS.
>>  while it is going to
>>  slowpath(as the lock
>>  was held by (3)) and
>>  before it got added
>>  to the waiters list
>>  it got preempted
>>  by (2).
>>                             RT task also takes
>>                             the slowpath and add          release the
>>                             itself into waiting list      rwsem lock
>>                             and since it is the first     clear the
>>                             it is the next one to get     owner.
>>                             the lock but it can not
>>                             get the lock as (count &
>>                             RWSEM_LOCK_MASK) is set
>>                             as (1) has added it but
>>                             not able to remove its
>>                             adjustment.

[]

>>
> Hey Mukesh,
>
> Can you test the diff if it makes sense to you?
>
> It simply prevents the first waiter from spinning any longer after detecting
> it barely makes any progress to spin without lock owner.
>
> Hillf
>
> --- mainline/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -611,26 -611,15 @@ static inline bool rwsem_try_write_lock(
>  	long count, new;
>  
>  	lockdep_assert_held(&sem->wait_lock);
> +	waiter->handoff_set = false;
>  
>  	count = atomic_long_read(&sem->count);
>  	do {
>  		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>  
>  		if (has_handoff) {
> -			/*
> -			 * Honor handoff bit and yield only when the first
> -			 * waiter is the one that set it. Otherwisee, we
> -			 * still try to acquire the rwsem.
> -			 */
> -			if (first->handoff_set && (waiter != first))
> +			if (waiter != first)
>  				return false;

you mean, you want to check and change waiter->handoff_set on every run
rwsem_try_write_lock().

But does it break optimistic spinning ? @waiman ?

-Mukesh

> -
> -			/*
> -			 * First waiter can inherit a previously set handoff
> -			 * bit and spin on rwsem if lock acquisition fails.
> -			 */
> -			if (waiter == first)
> -				waiter->handoff_set = true;
>  		}
>  
>  		new = count;
* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
From: Hillf Danton @ 2022-10-12  4:04 UTC
To: Mukesh Ojha
Cc: linux-kernel, john.p.donnelly, linux-mm, Waiman Long, Peter Zijlstra,
    Will Deacon, Boqun Feng, Ingo Molnar

On 11 Oct 2022 18:46:20 +0530 Mukesh Ojha <quic_mojha@quicinc.com>
>On 10/11/2022 4:16 PM, Hillf Danton wrote:
>> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
>>> Hi Waiman,
>>>
>>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>>> On 9/29/22 14:04, Waiman Long wrote:
>>>>> A non-first waiter can potentially spin in the for loop of
>>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>>> lock even if the rwsem is free if the following sequence happens:
>>>>>
>>>>>   Non-first waiter            First waiter               Lock holder
>>>>>   ----------------            ------------               -----------
>>>>>                               Acquire wait_lock
>>>>>                               rwsem_try_write_lock():
>>>>>                                 Set handoff bit if RT or
>>>>>                                   wait too long
>>>>>                                 Set waiter->handoff_set
>>>>>                               Release wait_lock
>>>>>   Acquire wait_lock
>>>>>   Inherit waiter->handoff_set
>>>>>   Release wait_lock
>>>>>                                                          Clear owner
>>>>>                                                          Release lock
>>>>>   if (waiter.handoff_set) {
>>>>>     rwsem_spin_on_owner(();
>>>>>     if (OWNER_NULL)
>>>>>       goto trylock_again;
>>>>>   }
>>>>>   trylock_again:
>>>>>   Acquire wait_lock
>>>>>   rwsem_try_write_lock():
>>>>>     if (first->handoff_set && (waiter != first))
>>>>>       return false;
>>>>>   Release wait_lock
>>>>>
>>>>> It is especially problematic if the non-first waiter is an RT task and
>>>>> it is running on the same CPU as the first waiter as this can lead to
>>>>> live lock.
>>>>>
>>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>>> consistent")
>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>> ---
>>>>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>>
>>>
>>> Looks like, There is still a window for a race.
>>>
>>> There is a chance when a reader who came first added it's BIAS and
>>> goes to slowpath and before it gets added to wait list it got
>>> preempted by RT task which goes to slowpath as well and being the
>>> first waiter gets its hand-off bit set and not able to get the lock
>>> due to following condition in rwsem_try_write_lock()
>
>[]
>
>>>
>>>  630         if (count & RWSEM_LOCK_MASK) {    ==> reader has
>>>                                                    sets its bias
>>>  ..
>>>  ...
>>>
>>>  634
>>>  635                 new |= RWSEM_FLAG_HANDOFF;
>>>  636         } else {
>>>  637                 new |= RWSEM_WRITER_LOCKED;
>>>
>>>
>>> ---------------------->----------------------->-------------------------
>>>
>>>  First reader (1)           writer(2) RT task            Lock holder(3)
>>>
>>>  It sets
>>>  RWSEM_READER_BIAS.
>>>  while it is going to
>>>  slowpath(as the lock
>>>  was held by (3)) and
>>>  before it got added
>>>  to the waiters list
>>>  it got preempted
>>>  by (2).
>>>                             RT task also takes
>>>                             the slowpath and add          release the
>>>                             itself into waiting list      rwsem lock
>>>                             and since it is the first     clear the
>>>                             it is the next one to get     owner.
>>>                             the lock but it can not
>>>                             get the lock as (count &
>>>                             RWSEM_LOCK_MASK) is set
>>>                             as (1) has added it but
>>>                             not able to remove its
>>>                             adjustment.
>
>[]
>
>>>
>> Hey Mukesh,
>>
>> Can you test the diff if it makes sense to you?
>>
>> It simply prevents the first waiter from spinning any longer after detecting
>> it barely makes any progress to spin without lock owner.
>>
>> Hillf
>>
>> --- mainline/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -611,26 -611,15 @@ static inline bool rwsem_try_write_lock(
>>  	long count, new;
>>
>>  	lockdep_assert_held(&sem->wait_lock);
>> +	waiter->handoff_set = false;
>>
>>  	count = atomic_long_read(&sem->count);
>>  	do {
>>  		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>
>>  		if (has_handoff) {
>> -			/*
>> -			 * Honor handoff bit and yield only when the first
>> -			 * waiter is the one that set it. Otherwisee, we
>> -			 * still try to acquire the rwsem.
>> -			 */
>> -			if (first->handoff_set && (waiter != first))
>> +			if (waiter != first)
>>  				return false;
>
> you mean, you want to check and change waiter->handoff_set on every run
> rwsem_try_write_lock().
>
Yes, with RWSEM_FLAG_HANDOFF set, it is too late for non-first waiters to
spin, and with both RWSEM_LOCK_MASK and RWSEM_FLAG_HANDOFF set, the rivals
in the RWSEM_LOCK_MASK have an upper hand over the first waiter wrt
acquiring the lock, and it is not a bad option for the first waiter to take
a step back off.

	if (count & RWSEM_LOCK_MASK) {
		if (has_handoff || (!rt_task(waiter->task) &&
				    !time_after(jiffies, waiter->timeout)))
			return false;

		new |= RWSEM_FLAG_HANDOFF;
	} else {

> But does it break optimistic spinning ? @waiman ?

Waiters spin for acquiring lock instead of lockup, and your report shows
spinning too much makes trouble. The key is to stop spinning neither too
late nor too early. My proposal is a simple one with as few heuristics
added as possible.

Hillf
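Hillf's proposal is spread over two snippets (the handoff hunk in his first
mail and the RWSEM_LOCK_MASK hunk above). The consolidated, userspace-only
sketch below shows the resulting decision logic in one place. It models the
count flags as plain booleans, omits the wait_lock and the atomics, and the
step that sets handoff_set after newly setting the handoff bit follows the
unquoted mainline tail of rwsem_try_write_lock(), so treat the whole thing
as an approximation rather than the actual patch; all names here are made up
for illustration.

/*
 * trylock_sketch.c - consolidated decision logic of rwsem_try_write_lock()
 * with both of Hillf's snippets applied.  No atomics, no wait_lock, no
 * reader side: only the branch structure discussed in the thread.
 * Build and run: gcc -o trylock_sketch trylock_sketch.c && ./trylock_sketch
 */
#include <stdbool.h>
#include <stdio.h>

struct waiter {
	bool handoff_set;
	bool is_rt;		/* stands in for rt_task(waiter->task) */
	bool timed_out;		/* stands in for time_after(jiffies, waiter->timeout) */
};

struct count {
	bool lock_mask;		/* RWSEM_LOCK_MASK: reader/writer bias present */
	bool handoff;		/* RWSEM_FLAG_HANDOFF */
};

static bool try_write_lock(struct count *c, struct waiter *w, bool is_first)
{
	w->handoff_set = false;			/* Hillf: recompute, do not inherit */

	if (c->handoff && !is_first)
		return false;			/* only the first waiter may go on */

	if (c->lock_mask) {
		/* Hillf: back off as well once the handoff bit is already set */
		if (c->handoff || (!w->is_rt && !w->timed_out))
			return false;
		c->handoff = true;		/* new |= RWSEM_FLAG_HANDOFF */
		w->handoff_set = true;		/* per the unquoted mainline tail */
		return false;
	}

	c->lock_mask = true;			/* new |= RWSEM_WRITER_LOCKED */
	return true;
}

int main(void)
{
	struct count c = { .lock_mask = true, .handoff = false };
	struct waiter rt = { .is_rt = true };
	bool got;

	/* reader bias still present; the RT first waiter sets the handoff bit */
	got = try_write_lock(&c, &rt, true);
	printf("attempt 1: acquired=%d handoff_set=%d\n", got, rt.handoff_set);

	/* handoff already set: back off instead of re-arming the spin */
	got = try_write_lock(&c, &rt, true);
	printf("attempt 2: acquired=%d handoff_set=%d\n", got, rt.handoff_set);
	return 0;
}

With the RT first waiter driving, the expected output is acquired=0 with
handoff_set=1 on the first call and acquired=0 with handoff_set=0 on the
second, i.e. exactly one optimistic respin is armed per handoff.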
* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
From: Waiman Long @ 2022-10-12 13:19 UTC
To: Hillf Danton, Mukesh Ojha
Cc: linux-kernel, john.p.donnelly, linux-mm, Peter Zijlstra, Will Deacon,
    Boqun Feng, Ingo Molnar

On 10/12/22 00:04, Hillf Danton wrote:
>> you mean, you want to check and change waiter->handoff_set on every run
>> rwsem_try_write_lock().
>>
> Yes, with RWSEM_FLAG_HANDOFF set, it is too late for non-first waiters to
> spin, and with both RWSEM_LOCK_MASK and RWSEM_FLAG_HANDOFF set, the rivals
> in the RWSEM_LOCK_MASK have an upper hand over the first waiter wrt
> acquiring the lock, and it is not a bad option for the first waiter to take
> a step back off.
>
> 	if (count & RWSEM_LOCK_MASK) {
> 		if (has_handoff || (!rt_task(waiter->task) &&
> 				    !time_after(jiffies, waiter->timeout)))
> 			return false;
>
> 		new |= RWSEM_FLAG_HANDOFF;
> 	} else {
>
>> But does it break optimistic spinning ? @waiman ?
> Waiters spin for acquiring lock instead of lockup, and your report shows
> spinning too much makes trouble. The key is to stop spinning neither too
> late nor too early. My proposal is a simple one with as few heuristics
> added as possible.

Yes, too much spinning is bad if we have RT tasks in the mix, otherwise it
should be fine.

Cheers,
Longman
* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
From: Mukesh Ojha @ 2022-10-12 13:42 UTC
To: Hillf Danton
Cc: linux-kernel, john.p.donnelly, linux-mm, Waiman Long, Peter Zijlstra,
    Will Deacon, Boqun Feng, Ingo Molnar

Hi,

On 10/12/2022 9:34 AM, Hillf Danton wrote:
> On 11 Oct 2022 18:46:20 +0530 Mukesh Ojha <quic_mojha@quicinc.com>
>> On 10/11/2022 4:16 PM, Hillf Danton wrote:
>>> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
>>>> Hi Waiman,
>>>>
>>>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>>>> On 9/29/22 14:04, Waiman Long wrote:
>>>>>> A non-first waiter can potentially spin in the for loop of
>>>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>>>> lock even if the rwsem is free if the following sequence happens:
>>>>>>
>>>>>>   Non-first waiter            First waiter               Lock holder
>>>>>>   ----------------            ------------               -----------
>>>>>>                               Acquire wait_lock
>>>>>>                               rwsem_try_write_lock():
>>>>>>                                 Set handoff bit if RT or
>>>>>>                                   wait too long
>>>>>>                                 Set waiter->handoff_set
>>>>>>                               Release wait_lock
>>>>>>   Acquire wait_lock
>>>>>>   Inherit waiter->handoff_set
>>>>>>   Release wait_lock
>>>>>>                                                          Clear owner
>>>>>>                                                          Release lock
>>>>>>   if (waiter.handoff_set) {
>>>>>>     rwsem_spin_on_owner(();
>>>>>>     if (OWNER_NULL)
>>>>>>       goto trylock_again;
>>>>>>   }
>>>>>>   trylock_again:
>>>>>>   Acquire wait_lock
>>>>>>   rwsem_try_write_lock():
>>>>>>     if (first->handoff_set && (waiter != first))
>>>>>>       return false;
>>>>>>   Release wait_lock
>>>>>>
>>>>>> It is especially problematic if the non-first waiter is an RT task and
>>>>>> it is running on the same CPU as the first waiter as this can lead to
>>>>>> live lock.
>>>>>>
>>>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>>>> consistent")
>>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>>> ---
>>>>>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>>>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>>>>
>>>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>>>
>>>>
>>>> Looks like, There is still a window for a race.
>>>>
>>>> There is a chance when a reader who came first added it's BIAS and
>>>> goes to slowpath and before it gets added to wait list it got
>>>> preempted by RT task which goes to slowpath as well and being the
>>>> first waiter gets its hand-off bit set and not able to get the lock
>>>> due to following condition in rwsem_try_write_lock()
>>
>> []
>>
>>>>
>>>>  630         if (count & RWSEM_LOCK_MASK) {    ==> reader has
>>>>                                                    sets its bias
>>>>  ..
>>>>  ...
>>>>
>>>>  634
>>>>  635                 new |= RWSEM_FLAG_HANDOFF;
>>>>  636         } else {
>>>>  637                 new |= RWSEM_WRITER_LOCKED;
>>>>
>>>>
>>>> ---------------------->----------------------->-------------------------
>>>>
>>>>  First reader (1)           writer(2) RT task            Lock holder(3)
>>>>
>>>>  It sets
>>>>  RWSEM_READER_BIAS.
>>>>  while it is going to
>>>>  slowpath(as the lock
>>>>  was held by (3)) and
>>>>  before it got added
>>>>  to the waiters list
>>>>  it got preempted
>>>>  by (2).
>>>>                             RT task also takes
>>>>                             the slowpath and add          release the
>>>>                             itself into waiting list      rwsem lock
>>>>                             and since it is the first     clear the
>>>>                             it is the next one to get     owner.
>>>>                             the lock but it can not
>>>>                             get the lock as (count &
>>>>                             RWSEM_LOCK_MASK) is set
>>>>                             as (1) has added it but
>>>>                             not able to remove its
>>>>                             adjustment.
>>
>> []
>>
>>>>
>>> Hey Mukesh,
>>>
>>> Can you test the diff if it makes sense to you?
>>>
>>> It simply prevents the first waiter from spinning any longer after detecting
>>> it barely makes any progress to spin without lock owner.
>>>
>>> Hillf
>>>
>>> --- mainline/kernel/locking/rwsem.c
>>> +++ b/kernel/locking/rwsem.c
>>> @@ -611,26 -611,15 @@ static inline bool rwsem_try_write_lock(
>>>  	long count, new;
>>>
>>>  	lockdep_assert_held(&sem->wait_lock);
>>> +	waiter->handoff_set = false;
>>>
>>>  	count = atomic_long_read(&sem->count);
>>>  	do {
>>>  		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>>
>>>  		if (has_handoff) {
>>> -			/*
>>> -			 * Honor handoff bit and yield only when the first
>>> -			 * waiter is the one that set it. Otherwisee, we
>>> -			 * still try to acquire the rwsem.
>>> -			 */
>>> -			if (first->handoff_set && (waiter != first))
>>> +			if (waiter != first)
>>>  				return false;
>>
>> you mean, you want to check and change waiter->handoff_set on every run
>> rwsem_try_write_lock().
>>
> Yes, with RWSEM_FLAG_HANDOFF set, it is too late for non-first waiters to
> spin, and with both RWSEM_LOCK_MASK and RWSEM_FLAG_HANDOFF set, the rivals
> in the RWSEM_LOCK_MASK have an upper hand over the first waiter wrt
> acquiring the lock, and it is not a bad option for the first waiter to take
> a step back off.
>
> 	if (count & RWSEM_LOCK_MASK) {
> 		if (has_handoff || (!rt_task(waiter->task) &&
> 				    !time_after(jiffies, waiter->timeout)))
> 			return false;
>
> 		new |= RWSEM_FLAG_HANDOFF;
> 	} else {
>
>> But does it break optimistic spinning ? @waiman ?
>
> Waiters spin for acquiring lock instead of lockup, and your report shows
> spinning too much makes trouble. The key is to stop spinning neither too
> late nor too early. My proposal is a simple one with as few heuristics
> added as possible.

From the high level, it looks like it will work.
Let me check and get back on this.

-Mukesh

>
> Hillf
* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
From: Waiman Long @ 2022-10-12 13:16 UTC
To: Mukesh Ojha, Hillf Danton
Cc: linux-kernel, john.p.donnelly, linux-mm, Peter Zijlstra, Will Deacon,
    Boqun Feng, Ingo Molnar

On 10/11/22 09:16, Mukesh Ojha wrote:
> Hi Hillf,
>
> Thanks for looking into this issue.
>
> On 10/11/2022 4:16 PM, Hillf Danton wrote:
>> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
>>> Hi Waiman,
>>>
>>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>>> On 9/29/22 14:04, Waiman Long wrote:
>>>>> A non-first waiter can potentially spin in the for loop of
>>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>>> lock even if the rwsem is free if the following sequence happens:
>>>>>
>>>>>   Non-first waiter            First waiter               Lock holder
>>>>>   ----------------            ------------               -----------
>>>>>                               Acquire wait_lock
>>>>>                               rwsem_try_write_lock():
>>>>>                                 Set handoff bit if RT or
>>>>>                                   wait too long
>>>>>                                 Set waiter->handoff_set
>>>>>                               Release wait_lock
>>>>>   Acquire wait_lock
>>>>>   Inherit waiter->handoff_set
>>>>>   Release wait_lock
>>>>>                                                          Clear owner
>>>>>                                                          Release lock
>>>>>   if (waiter.handoff_set) {
>>>>>     rwsem_spin_on_owner(();
>>>>>     if (OWNER_NULL)
>>>>>       goto trylock_again;
>>>>>   }
>>>>>   trylock_again:
>>>>>   Acquire wait_lock
>>>>>   rwsem_try_write_lock():
>>>>>     if (first->handoff_set && (waiter != first))
>>>>>       return false;
>>>>>   Release wait_lock
>>>>>
>>>>> It is especially problematic if the non-first waiter is an RT task
>>>>> and
>>>>> it is running on the same CPU as the first waiter as this can lead to
>>>>> live lock.
>>>>>
>>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>>> consistent")
>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>> ---
>>>>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>>
>>>
>>> Looks like, There is still a window for a race.
>>>
>>> There is a chance when a reader who came first added it's BIAS and
>>> goes to slowpath and before it gets added to wait list it got
>>> preempted by RT task which goes to slowpath as well and being the
>>> first waiter gets its hand-off bit set and not able to get the lock
>>> due to following condition in rwsem_try_write_lock()
>
> []
>
>>>
>>>  630         if (count & RWSEM_LOCK_MASK) {    ==> reader has
>>>                                                    sets its bias
>>>  ..
>>>  ...
>>>
>>>  634
>>>  635                 new |= RWSEM_FLAG_HANDOFF;
>>>  636         } else {
>>>  637                 new |= RWSEM_WRITER_LOCKED;
>>>
>>>
>>> ---------------------->----------------------->-------------------------
>>>
>>>  First reader (1)           writer(2) RT task            Lock holder(3)
>>>
>>>  It sets
>>>  RWSEM_READER_BIAS.
>>>  while it is going to
>>>  slowpath(as the lock
>>>  was held by (3)) and
>>>  before it got added
>>>  to the waiters list
>>>  it got preempted
>>>  by (2).
>>>                             RT task also takes
>>>                             the slowpath and add          release the
>>>                             itself into waiting list      rwsem lock
>>>                             and since it is the first     clear the
>>>                             it is the next one to get     owner.
>>>                             the lock but it can not
>>>                             get the lock as (count &
>>>                             RWSEM_LOCK_MASK) is set
>>>                             as (1) has added it but
>>>                             not able to remove its
>>>                             adjustment.
>
> []
>
>>>
>> Hey Mukesh,
>>
>> Can you test the diff if it makes sense to you?
>>
>> It simply prevents the first waiter from spinning any longer after
>> detecting
>> it barely makes any progress to spin without lock owner.
>>
>> Hillf
>>
>> --- mainline/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -611,26 -611,15 @@ static inline bool rwsem_try_write_lock(
>>  	long count, new;
>>  	lockdep_assert_held(&sem->wait_lock);
>> +	waiter->handoff_set = false;
>>  	count = atomic_long_read(&sem->count);
>>  	do {
>>  		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>  		if (has_handoff) {
>> -			/*
>> -			 * Honor handoff bit and yield only when the first
>> -			 * waiter is the one that set it. Otherwisee, we
>> -			 * still try to acquire the rwsem.
>> -			 */
>> -			if (first->handoff_set && (waiter != first))
>> +			if (waiter != first)
>>  				return false;
>
> you mean, you want to check and change waiter->handoff_set on every
> run rwsem_try_write_lock().
>
> But does it break optimistic spinning ? @waiman ?
>
As I said in my previous mail, this is equivalent to allowing only one
optimistic spinning attempt after setting the handoff bit. That will
likely reduce the performance benefit provided by this feature.

Cheers,
Longman
* Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
From: Waiman Long @ 2022-10-12 13:14 UTC
To: Hillf Danton, Mukesh Ojha
Cc: linux-kernel, john.p.donnelly, linux-mm, Peter Zijlstra, Will Deacon,
    Boqun Feng, Ingo Molnar

On 10/11/22 06:46, Hillf Danton wrote:
> On 10/10/22 06:24 Mukesh Ojha <quic_mojha@quicinc.com>
>> Hi Waiman,
>>
>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>> On 9/29/22 14:04, Waiman Long wrote:
>>>> A non-first waiter can potentially spin in the for loop of
>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>> lock even if the rwsem is free if the following sequence happens:
>>>>
>>>>   Non-first waiter            First waiter               Lock holder
>>>>   ----------------            ------------               -----------
>>>>                               Acquire wait_lock
>>>>                               rwsem_try_write_lock():
>>>>                                 Set handoff bit if RT or
>>>>                                   wait too long
>>>>                                 Set waiter->handoff_set
>>>>                               Release wait_lock
>>>>   Acquire wait_lock
>>>>   Inherit waiter->handoff_set
>>>>   Release wait_lock
>>>>                                                          Clear owner
>>>>                                                          Release lock
>>>>   if (waiter.handoff_set) {
>>>>     rwsem_spin_on_owner(();
>>>>     if (OWNER_NULL)
>>>>       goto trylock_again;
>>>>   }
>>>>   trylock_again:
>>>>   Acquire wait_lock
>>>>   rwsem_try_write_lock():
>>>>     if (first->handoff_set && (waiter != first))
>>>>       return false;
>>>>   Release wait_lock
>>>>
>>>> It is especially problematic if the non-first waiter is an RT task and
>>>> it is running on the same CPU as the first waiter as this can lead to
>>>> live lock.
>>>>
>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>> consistent")
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>
>> Looks like, There is still a window for a race.
>>
>> There is a chance when a reader who came first added it's BIAS and
>> goes to slowpath and before it gets added to wait list it got
>> preempted by RT task which goes to slowpath as well and being the
>> first waiter gets its hand-off bit set and not able to get the lock
>> due to following condition in rwsem_try_write_lock()
>>
>>  630         if (count & RWSEM_LOCK_MASK) {    ==> reader has
>>                                                    sets its bias
>>  ..
>>  ...
>>
>>  634
>>  635                 new |= RWSEM_FLAG_HANDOFF;
>>  636         } else {
>>  637                 new |= RWSEM_WRITER_LOCKED;
>>
>>
>> ---------------------->----------------------->-------------------------
>>
>>  First reader (1)           writer(2) RT task            Lock holder(3)
>>
>>  It sets
>>  RWSEM_READER_BIAS.
>>  while it is going to
>>  slowpath(as the lock
>>  was held by (3)) and
>>  before it got added
>>  to the waiters list
>>  it got preempted
>>  by (2).
>>                             RT task also takes
>>                             the slowpath and add          release the
>>                             itself into waiting list      rwsem lock
>>                             and since it is the first     clear the
>>                             it is the next one to get     owner.
>>                             the lock but it can not
>>                             get the lock as (count &
>>                             RWSEM_LOCK_MASK) is set
>>                             as (1) has added it but
>>                             not able to remove its
>>                             adjustment.
>>
> Hey Mukesh,
>
> Can you test the diff if it makes sense to you?
>
> It simply prevents the first waiter from spinning any longer after detecting
> it barely makes any progress to spin without lock owner.
>
> Hillf
>
> --- mainline/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -611,26 -611,15 @@ static inline bool rwsem_try_write_lock(
>  	long count, new;
>  
>  	lockdep_assert_held(&sem->wait_lock);
> +	waiter->handoff_set = false;
>  
>  	count = atomic_long_read(&sem->count);
>  	do {
>  		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>  
>  		if (has_handoff) {
> -			/*
> -			 * Honor handoff bit and yield only when the first
> -			 * waiter is the one that set it. Otherwisee, we
> -			 * still try to acquire the rwsem.
> -			 */
> -			if (first->handoff_set && (waiter != first))
> +			if (waiter != first)
>  				return false;
> -
> -			/*
> -			 * First waiter can inherit a previously set handoff
> -			 * bit and spin on rwsem if lock acquisition fails.
> -			 */
> -			if (waiter == first)
> -				waiter->handoff_set = true;
>  		}
>  
>  		new = count;

That is somewhat equivalent to allowing the first waiter to spin only once
after setting the handoff bit. It also removes the code that handles some
of the corner cases.

Cheers,
Longman
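Waiman's "spin only once" reading (here and in his previous mail) can be
made concrete with a small replay of the slowpath loop from the patch
description. Everything below is a toy model: the loop shape follows the
fragments quoted in the changelog, the lock is assumed to stay reader-biased
(the RT livelock case), and the function names and iteration budget are
invented for illustration rather than taken from the kernel.

/*
 * slowpath_replay.c - toy replay of the down_write() slowpath loop.
 * The reader bias never goes away here, so the trylock always fails;
 * what differs is how often the waiter takes the OWNER_NULL respin
 * path before it would schedule() and sleep.
 * Build and run: gcc -o slowpath_replay slowpath_replay.c && ./slowpath_replay
 */
#include <stdbool.h>
#include <stdio.h>

struct waiter { bool handoff_set; };

static bool try_write_lock(struct waiter *w, bool *handoff, bool hillf_diff)
{
	if (hillf_diff)
		w->handoff_set = false;		/* recomputed on every call */

	if (!*handoff) {
		/* RT or timed-out first waiter sets the handoff bit */
		*handoff = true;
		w->handoff_set = true;
	} else if (!hillf_diff) {
		/* current code: the first waiter inherits/keeps handoff_set */
		w->handoff_set = true;
	}
	return false;				/* reader bias still present */
}

static int respins_before_sleep(bool hillf_diff, int budget)
{
	struct waiter w = { .handoff_set = false };
	bool handoff = false;
	int respins = 0;

	while (budget-- > 0) {
		if (try_write_lock(&w, &handoff, hillf_diff))
			break;			/* lock acquired */
		if (!w.handoff_set)
			break;			/* would schedule() here */
		/* rwsem_spin_on_owner() saw OWNER_NULL: goto trylock_again */
		respins++;
	}
	return respins;
}

int main(void)
{
	printf("current code: %d respins (bounded only by the harness budget)\n",
	       respins_before_sleep(false, 5));
	printf("hillf's diff: %d respin before sleeping\n",
	       respins_before_sleep(true, 5));
	return 0;
}

The first line comes out as 5 only because the harness stops after 5
iterations; with the reader unable to run, the real loop would keep
respinning, while the second line shows the single respin allowed once
handoff_set is recomputed on every trylock.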
Thread overview: 7+ messages
[not found] <7cbf49c9-d122-30e6-68b3-c61eca63e5dc@quicinc.com>
2022-10-11 10:46 ` [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Hillf Danton
2022-10-11 13:16 ` Mukesh Ojha
2022-10-12 4:04 ` Hillf Danton
2022-10-12 13:19 ` Waiman Long
2022-10-12 13:42 ` Mukesh Ojha
2022-10-12 13:16 ` Waiman Long
2022-10-12 13:14 ` Waiman Long