From: Hillf Danton <hdanton@sina.com>
To: Waiman Long, Eric Dumazet
Cc: linux-mm@kvack.org, linux-kernel
Subject: Re: [PATCH] locking/rwlocks: do not starve writers
Date: Sat, 18 Jun 2022 16:43:22 +0800
Message-Id: <20220618084322.81-1-hdanton@sina.com>
In-Reply-To: <980387bb-f46d-e9ab-96b0-7293df08c447@redhat.com>
References: <20220617091039.2257083-1-eric.dumazet@gmail.com>
	<2dd754f9-3a79-ed17-e423-6b411c3afb69@redhat.com>
	<2730b855-8f99-5a9e-707e-697d3bd9811d@redhat.com>
	<7499dd05-30d1-669c-66b4-5cb06452b476@redhat.com>

On Fri, 17 Jun 2022 14:57:55 -0400 Waiman Long wrote:
> On 6/17/22 13:45, Eric Dumazet wrote:
>> On Fri, Jun 17, 2022 at 7:42 PM Waiman Long wrote:
>>> On 6/17/22 11:24, Eric Dumazet wrote:
>>>> On Fri, Jun 17, 2022 at 5:00 PM Waiman Long wrote:
>>>>> On 6/17/22 10:57, Shakeel Butt wrote:
>>>>>> On Fri, Jun 17, 2022 at 7:43 AM Waiman Long wrote:
>>>>>>> On 6/17/22 08:07, Peter Zijlstra wrote:
>>>>>>>> On Fri, Jun 17, 2022 at 02:10:39AM -0700, Eric Dumazet wrote:
>>>>>>>>> --- a/kernel/locking/qrwlock.c
>>>>>>>>> +++ b/kernel/locking/qrwlock.c
>>>>>>>>> @@ -23,16 +23,6 @@ void queued_read_lock_slowpath(struct qrwlock *lock)
>>>>>>>>>  	/*
>>>>>>>>>  	 * Readers come here when they cannot get the lock without waiting
>>>>>>>>>  	 */
>>>>>>>>> -	if (unlikely(in_interrupt())) {
>>>>>>>>> -		/*
>>>>>>>>> -		 * Readers in interrupt context will get the lock immediately
>>>>>>>>> -		 * if the writer is just waiting (not holding the lock yet),
>>>>>>>>> -		 * so spin with ACQUIRE semantics until the lock is available
>>>>>>>>> -		 * without waiting in the queue.
>>>>>>>>> -		 */
>>>>>>>>> -		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
>>>>>>>>> -		return;
>>>>>>>>> -	}
>>>>>>>>>  	atomic_sub(_QR_BIAS, &lock->cnts);
>>>>>>>>>
>>>>>>>>>  	trace_contention_begin(lock, LCB_F_SPIN | LCB_F_READ);
>>>>>>>> This is known to break tasklist_lock.
>>>>>>>>
>>>>>>> We certainly can't break the current usage of tasklist_lock.
>>>>>>>
>>>>>>> I am aware of this problem with the networking code and am thinking
>>>>>>> about either relaxing the check to exclude softirq or providing a
>>>>>>> read_lock_unfair() variant for networking use.
>>>>>> read_lock_unfair() for networking use or tasklist_lock use?
>>>>> I meant to say read_lock_fair(), but it could also be the other way
>>>>> around. Thanks for spotting that.
>>>>>
>>>> If only tasklist_lock is problematic and needs the unfair variant,
>>>> then changing a few read_lock() calls for tasklist_lock will be less
>>>> invasive than ~1000 read_lock() calls elsewhere....
>>> After a second thought, I think the right way is to introduce a fair
>>> variant, if needed. If an arch isn't using qrwlock, the native rwlock
>>> implementation will be unfair. In that sense, unfair rwlock is the
>>> default. We would only need to change the relevant network read_lock()
>>> calls to use the fair variant, which will still be unfair if qrwlock
>>> isn't used. We are not going to touch other read_lock() calls that
>>> don't care about fairness.
>>>
>> Hmm... backporting this kind of invasive change to stable kernels will
>> be a daunting task.
>>
>> Were rwlocks always unfair, and have we just been lucky?
>>
> Yes, rwlocks were always unfair and always had this kind of soft lockup
> problem, plus a scalability problem because of cacheline bouncing. That
> was the reason for creating qrwlock, which can at least provide a fair
> rwlock in task context. Now we have systems with more and more CPUs,
> and that is why you are seeing it all over again with the networking
> code.

No fair play without paying the price. If the writer wants to play a fair
game with readers, it has to wait in the queue for at least a tick and only
then force readers to take the slow path by setting _QW_WAITING.

Only for thoughts now.
Hillf

+++ b/kernel/locking/qrwlock.c
@@ -22,16 +22,20 @@ void queued_read_lock_slowpath(struct qr
 	/*
 	 * Readers come here when they cannot get the lock without waiting
 	 */
+	if (_QW_WAITING & atomic_read(&lock->cnts))
+		goto queue;
+
 	if (unlikely(in_interrupt())) {
 		/*
 		 * Readers in interrupt context will get the lock immediately
-		 * if the writer is just waiting (not holding the lock yet),
+		 * if the writer is not waiting long enough,
 		 * so spin with ACQUIRE semantics until the lock is available
 		 * without waiting in the queue.
 		 */
 		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
 		return;
 	}
+queue:
 	atomic_sub(_QR_BIAS, &lock->cnts);

 	/*
@@ -60,6 +64,8 @@ EXPORT_SYMBOL(queued_read_lock_slowpath)
  */
 void queued_write_lock_slowpath(struct qrwlock *lock)
 {
+	unsigned long start;
+	int qw;
 	int cnts;

 	/* Put the writer into the wait queue */
@@ -70,12 +76,20 @@ void queued_write_lock_slowpath(struct q
 	    atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED))
 		goto unlock;

-	/* Set the waiting flag to notify readers that a writer is pending */
-	atomic_or(_QW_WAITING, &lock->cnts);
-
+	start = jiffies;
+	qw = 0;
 	/* When no more readers or writers, set the locked flag */
 	do {
-		cnts = atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
+		if (qw == 0 && start + 2 < jiffies) {
+			qw = _QW_WAITING;
+			/*
+			 * Set the waiting flag to notify readers that a writer is pending
+			 * only after waiting long enough - that is the price writer pays
+			 * for fairness
+			 */
+			atomic_or(_QW_WAITING, &lock->cnts);
+		}
+		cnts = atomic_cond_read_relaxed(&lock->cnts, VAL == qw);
 	} while (!atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED));
 unlock:
 	arch_spin_unlock(&lock->wait_lock);
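
Purely as an illustration of the trade-off, below is a minimal userspace model
of the same "delayed pending bit" idea, written with C11 atomics rather than
the kernel's qrwlock machinery. Everything in it is made up for the sketch -
the model_rwlock type, the QW_*/QR_BIAS values, the nanosecond grace period
standing in for jiffies - and writer-vs-writer queueing (the kernel's
wait_lock) is left out entirely, so treat it as a sketch of the idea, not a
drop-in for the diff above.

/*
 * Userspace model only; none of these names exist in the kernel.
 */
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define QW_LOCKED	0x1U	/* a writer holds the lock */
#define QW_WAITING	0x2U	/* a writer is queued and has gone public */
#define QR_BIAS		0x4U	/* one reader reference */

struct model_rwlock {
	atomic_uint cnts;
};

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Reader fast path: back off only once a writer has published QW_WAITING. */
static int model_read_trylock(struct model_rwlock *l)
{
	unsigned int c = atomic_fetch_add(&l->cnts, QR_BIAS) + QR_BIAS;

	if (!(c & (QW_LOCKED | QW_WAITING)))
		return 1;
	atomic_fetch_sub(&l->cnts, QR_BIAS);	/* undo; caller retries or sleeps */
	return 0;
}

static void model_read_unlock(struct model_rwlock *l)
{
	atomic_fetch_sub(&l->cnts, QR_BIAS);
}

/* Writer: spin quietly for grace_ns before raising QW_WAITING. */
static void model_write_lock(struct model_rwlock *l, long long grace_ns)
{
	long long start = now_ns();
	unsigned int pending = 0;

	for (;;) {
		unsigned int expected = pending;

		/* No readers and no other writer: grab the lock. */
		if (atomic_compare_exchange_weak(&l->cnts, &expected,
						 pending | QW_LOCKED))
			return;

		/* Grace period over: publish the pending bit exactly once. */
		if (!pending && now_ns() - start > grace_ns) {
			pending = QW_WAITING;
			atomic_fetch_or(&l->cnts, QW_WAITING);
		}
	}
}

static void model_write_unlock(struct model_rwlock *l)
{
	atomic_fetch_and(&l->cnts, ~(QW_LOCKED | QW_WAITING));
}

int main(void)
{
	struct model_rwlock lock;

	atomic_init(&lock.cnts, 0);

	if (model_read_trylock(&lock)) {
		printf("reader took the fast path\n");
		model_read_unlock(&lock);
	}

	model_write_lock(&lock, 1000000);	/* ~1ms grace, standing in for a tick */
	printf("writer holds the lock\n");
	model_write_unlock(&lock);
	return 0;
}

The point is the same one the diff makes: while the writer is still inside its
grace period it has not set the pending bit, so concurrent readers keep hitting
the fast path exactly as interrupt-context readers do today; only after the
writer has paid its waiting time do new readers start backing off behind it.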