References: <20230109205336.3665937-1-surenb@google.com> <20230109205336.3665937-42-surenb@google.com>
From: Suren Baghdasaryan <surenb@google.com>
Date: Tue, 17 Jan 2023 10:28:40 -0800
Subject: Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock
To: Matthew Wilcox
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>, akpm@linux-foundation.org, michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, paulmck@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, hughlynch@google.com, leewalsh@google.com, posk@google.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com
Content-Type: text/plain; charset="UTF-8"
On Tue, Jan 17, 2023 at 10:23 AM Matthew Wilcox wrote:
>
> On Mon, Jan 16, 2023 at 09:58:35PM -0800, Suren Baghdasaryan wrote:
> > On Mon, Jan 16, 2023 at 9:46 PM Matthew Wilcox wrote:
> > >
> > > On Mon, Jan 16, 2023 at 08:34:36PM -0800, Suren Baghdasaryan wrote:
> > > > On Mon, Jan 16, 2023 at 8:14 PM Matthew Wilcox wrote:
> > > > >
> > > > > On Mon, Jan 16, 2023 at 11:14:38AM +0000, Hyeonggon Yoo wrote:
> > > > > > > @@ -643,20 +647,28 @@ static inline void vma_write_lock(struct vm_area_struct *vma)
> > > > > > >  static inline bool vma_read_trylock(struct vm_area_struct *vma)
> > > > > > >  {
> > > > > > >  	/* Check before locking. A race might cause false locked result. */
> > > > > > > -	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > > > > > +	if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > > > > >  		return false;
> > > > > > >
> > > > > > > -	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
> > > > > > > +	if (unlikely(!atomic_inc_unless_negative(&vma->vm_lock->count)))
> > > > > > >  		return false;
> > > > > > >
> > > > > > > +	/* If atomic_t overflows, restore and fail to lock. */
> > > > > > > +	if (unlikely(atomic_read(&vma->vm_lock->count) < 0)) {
> > > > > > > +		if (atomic_dec_and_test(&vma->vm_lock->count))
> > > > > > > +			wake_up(&vma->vm_mm->vma_writer_wait);
> > > > > > > +		return false;
> > > > > > > +	}
> > > > > > > +
> > > > > > >  	/*
> > > > > > >  	 * Overflow might produce false locked result.
> > > > > > >  	 * False unlocked result is impossible because we modify and check
> > > > > > >  	 * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
> > > > > > >  	 * modification invalidates all existing locks.
> > > > > > >  	 */
> > > > > > > -	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
> > > > > > > -		up_read(&vma->vm_lock->lock);
> > > > > > > +	if (unlikely(vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
> > > > > > > +		if (atomic_dec_and_test(&vma->vm_lock->count))
> > > > > > > +			wake_up(&vma->vm_mm->vma_writer_wait);
> > > > > > >  		return false;
> > > > > > >  	}
> > > > > >
> > > > > > With this change readers can cause writers to starve.
> > > > > > What about checking waitqueue_active() before or after increasing
> > > > > > vma->vm_lock->count?
> > > > >
> > > > > I don't understand how readers can starve a writer. Readers do
> > > > > atomic_inc_unless_negative() so a writer can always force readers
> > > > > to fail.
> > > >
> > > > I think the point here was that if page faults keep occurring and they
> > > > prevent vm_lock->count from reaching 0, then a writer will be blocked
> > > > and there is no reader throttling mechanism (no max time that the
> > > > writer will be waiting).
> > >
> > > Perhaps I misunderstood your description; I thought that a _waiting_
> > > writer would make the count negative, not a successfully acquiring
> > > writer.
> >
> > A waiting writer does not modify the counter; instead it is placed on
> > the wait queue, and the last reader, which sets the count to 0 while
> > releasing its read lock, wakes it up. Once the writer is woken, it
> > tries to set the count to negative and, if successful, owns the
> > lock; otherwise it goes back to sleep.
>
> Then yes, that's a starvable lock. Preventing starvation on the mmap
> sem was the original motivation for making rwsems non-starvable, so
> changing that behaviour now seems like a bad idea. For efficiency, I'd
> suggest that a waiting writer set the top bit of the counter. That way,
> all new readers will back off without needing to check a second variable,
> and old readers will know that they *may* need to do the wakeup when
> atomic_sub_return_release() is negative.
>
> (rwsem.c has a more complex bitfield, but I don't think we need to go
> that far; the important point is that the waiting writer indicates its
> presence in the count field so that readers can modify their behaviour.)

Got it. OK, I think we can figure something out to check if there are
waiting write-lockers and prevent new readers from taking the lock.