From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Dec 2024 23:38:50 +0100
From: Peter Zijlstra
To: Suren Baghdasaryan
Cc: Matthew Wilcox, akpm@linux-foundation.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz,
	hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
	brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com,
	hughd@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com,
	pasha.tatashin@soleen.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock
Message-ID: <20241210223850.GA2484@noisy.programming.kicks-ass.net>
References: <20241111205506.3404479-1-surenb@google.com>
	<20241111205506.3404479-4-surenb@google.com>

On Tue, Nov 12, 2024 at 07:18:45AM -0800, Suren Baghdasaryan wrote:
> On Mon, Nov 11, 2024 at 8:58 PM Matthew Wilcox wrote:
> >
> > On Mon, Nov 11, 2024 at 12:55:05PM -0800, Suren Baghdasaryan wrote:
> > > When a reader takes read lock, it increments the atomic, unless the
> > > top two bits are set indicating a writer is present.
> > > When writer takes write lock, it sets VMA_LOCK_WR_LOCKED bit if there
> > > are no readers or VMA_LOCK_WR_WAIT bit if readers are holding the lock
> > > and puts itself onto newly introduced mm.vma_writer_wait. Since all
> > > writers take mmap_lock in write mode first, there can be only one writer
> > > at a time. The last reader to release the lock will signal the writer
> > > to wake up.
> >
> > I don't think you need two bits. You can do it this way:
> >
> > 0x8000'0000 - No readers, no writers
> > 0x1-7fff'ffff - Some number of readers
> > 0x0 - Writer held
> > 0x8000'0001-0xffff'ffff - Reader held, writer waiting
> >
> > A prospective writer subtracts 0x8000'0000. If the result is 0, it got
> > the lock, otherwise it sleeps until it is 0.
> >
> > A writer unlocks by adding 0x8000'0000 (not by setting the value to
> > 0x8000'0000).
> >
> > A reader unlocks by adding 1. If the result is 0, it wakes the writer.
> >
> > A prospective reader subtracts 1.
> > If the result is positive, it got the
> > lock, otherwise it does the unlock above (this might be the one which
> > wakes the writer).
> >
> > And ... that's it. See how we use the CPU arithmetic flags to tell us
> > everything we need to know without doing arithmetic separately?
>
> Yes, this is neat! You are using the fact that write-locked == no
> readers to eliminate unnecessary state. I'll give that a try. Thanks!

The reason I got here is that Vlastimil poked me about the whole
TYPESAFE_BY_RCU thing.

So the normal way those things work is with a refcount: if the refcount
is non-zero, the identifying fields should be stable and you can
determine if you have the right object, otherwise tough luck.

And I was thinking that since you abuse this rwsem you have, you might
as well turn that into a refcount with some extra.

So I would propose a slightly different solution.

Replace vm_lock with vm_refcnt. Replace vm_detached with vm_refcnt == 0
-- that is, attach sets refcount to 1 to indicate it is part of the mas,
detached is the final 'put'.

RCU lookup does the inc_not_zero thing; when the increment succeeds,
compare mm/addr to validate.

vma_start_write() already relies on mmap_lock being held for writing,
and thus does not have to worry about writer-vs-writer contention; that
is fully resolved by mmap_lock. This means we only need to wait for
readers to drop out.

vma_start_write()
	add(0x8000'0001); // could fetch_add and double check the high
			  // bit wasn't already set.
	wait-until(refcnt == 0x8000'0002); // mas + writer ref
	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
	sub(0x8000'0000);

vma_end_write()
	put();

vma_start_read() then becomes something like:

	if (vm_lock_seq == mm_lock_seq)
		return false;

	cnt = fetch_inc(1);
	if (cnt & msb || vm_lock_seq == mm_lock_seq) {
		put();
		return false;
	}

	return true;

vma_end_read() then becomes:
	put();

and the down_read() from uffffffd requires mmap_read_lock() and thus
does not have to worry about writers, it can simply be inc() and put(),
no?
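
For concreteness, the pseudocode above could be modelled in userspace
roughly as below. This is only a sketch under assumptions, not the
kernel implementation: struct vma_model, VMA_WRITER_BIT, vma_attach()
and vma_put() are made-up names, C11 atomics stand in for the kernel's
atomic/refcount primitives, a busy-wait stands in for sleeping, and
mm_lock_seq is assumed to be bumped when mmap_lock is write-unlocked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define VMA_WRITER_BIT 0x80000000u	/* hypothetical name for the msb */

struct vma_model {
	atomic_uint vm_refcnt;		/* 0: detached, 1: attached and idle */
	unsigned int vm_lock_seq;	/* kernel code would use READ_ONCE/WRITE_ONCE */
};

/* Stand-in for mm->mm_lock_seq; assumed to be bumped on mmap_write_unlock(). */
static unsigned int mm_lock_seq = 1;

static void vma_attach(struct vma_model *vma)
{
	atomic_store(&vma->vm_refcnt, 1);	/* the "mas" reference */
}

static void vma_put(struct vma_model *vma)
{
	atomic_fetch_sub(&vma->vm_refcnt, 1);
}

static bool vma_start_read(struct vma_model *vma)
{
	unsigned int old;

	if (vma->vm_lock_seq == mm_lock_seq)
		return false;

	/* inc_not_zero: never revive a detached vma (refcnt == 0). */
	old = atomic_load(&vma->vm_refcnt);
	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&vma->vm_refcnt, &old, old + 1));

	/* Writer present, or write-locked in this mmap_lock cycle: back off. */
	if ((old & VMA_WRITER_BIT) || vma->vm_lock_seq == mm_lock_seq) {
		vma_put(vma);
		return false;
	}
	return true;
}

static void vma_end_read(struct vma_model *vma)
{
	vma_put(vma);
}

/* Caller is assumed to hold mmap_lock for writing, so there is one writer. */
static void vma_start_write(struct vma_model *vma)
{
	unsigned int old = atomic_fetch_add(&vma->vm_refcnt, VMA_WRITER_BIT + 1);

	assert(!(old & VMA_WRITER_BIT));	/* double check the high bit */

	/* Wait for readers to drop out: only mas + writer refs remain. */
	while (atomic_load(&vma->vm_refcnt) != VMA_WRITER_BIT + 2)
		;	/* a real implementation would sleep here */

	vma->vm_lock_seq = mm_lock_seq;
	atomic_fetch_sub(&vma->vm_refcnt, VMA_WRITER_BIT);
}

static void vma_end_write(struct vma_model *vma)
{
	vma_put(vma);
}
```

In this model a reader that races with the writer only perturbs the
count transiently (inc, then put), and after vma_end_write() readers
stay locked out through the vm_lock_seq check until mm_lock_seq moves
on.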