From: Suren Baghdasaryan
Date: Tue, 10 Dec 2024 15:37:50 -0800
Subject: Re: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock
To: Peter Zijlstra
Cc: Matthew Wilcox, akpm@linux-foundation.org, liam.howlett@oracle.com,
    lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz,
    hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
    mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
    oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
    brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com,
    hughd@google.com, minchan@google.com, jannh@google.com,
    shakeel.butt@linux.dev, souravpanda@google.com,
    pasha.tatashin@soleen.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com
In-Reply-To: <20241210223850.GA2484@noisy.programming.kicks-ass.net>
References: <20241111205506.3404479-1-surenb@google.com>
    <20241111205506.3404479-4-surenb@google.com>
    <20241210223850.GA2484@noisy.programming.kicks-ass.net>

On Tue, Dec 10, 2024 at 2:39 PM Peter Zijlstra wrote:
>
> On Tue, Nov 12, 2024 at 07:18:45AM -0800, Suren Baghdasaryan wrote:
> > On Mon, Nov 11, 2024 at 8:58 PM Matthew Wilcox wrote:
> > >
> > > On Mon, Nov 11, 2024 at 12:55:05PM -0800, Suren Baghdasaryan wrote:
> > > > When a reader takes read lock, it increments the atomic, unless the
> > > > top two bits are set indicating a writer is present.
> > > > When writer takes write lock, it sets VMA_LOCK_WR_LOCKED bit if there
> > > > are no readers or VMA_LOCK_WR_WAIT bit if readers are holding the lock
> > > > and puts itself onto newly introduced mm.vma_writer_wait. Since all
> > > > writers take mmap_lock in write mode first, there can be only one writer
> > > > at a time. The last reader to release the lock will signal the writer
> > > > to wake up.
> > >
> > > I don't think you need two bits. You can do it this way:
> > >
> > > 0x8000'0000 - No readers, no writers
> > > 0x1-7fff'ffff - Some number of readers
> > > 0x0 - Writer held
> > > 0x8000'0001-0xffff'ffff - Reader held, writer waiting
> > >
> > > A prospective writer subtracts 0x8000'0000. If the result is 0, it got
> > > the lock, otherwise it sleeps until it is 0.
> > >
> > > A writer unlocks by adding 0x8000'0000 (not by setting the value to
> > > 0x8000'0000).
> > >
> > > A reader unlocks by adding 1. If the result is 0, it wakes the writer.
> > >
> > > A prospective reader subtracts 1. If the result is positive, it got the
> > > lock, otherwise it does the unlock above (this might be the one which
> > > wakes the writer).
> > >
> > > And ... that's it. See how we use the CPU arithmetic flags to tell us
> > > everything we need to know without doing arithmetic separately?
> >
> > Yes, this is neat! You are using the fact that write-locked == no
> > readers to eliminate unnecessary state. I'll give that a try. Thanks!
>
> The reason I got here is that Vlastimil poked me about the whole
> TYPESAFE_BY_RCU thing.
>
> So the normal way those things work is with a refcount, if the refcount
> is non-zero, the identifying fields should be stable and you can
> determine if you have the right object, otherwise tough luck.
>
> And I was thinking that since you abuse this rwsem you have, you might
> as well turn that into a refcount with some extra.
>
> So I would propose a slightly different solution.
>
> Replace vm_lock with vm_refcnt. Replace vm_detached with vm_refcnt == 0
> -- that is, attach sets refcount to 1 to indicate it is part of the mas,
> detached is the final 'put'.

I need to double-check whether we ever write-lock a detached vma. I don't
think we do, but better to be safe. If we do, then that wait-until() should
accept 0x8000'0001 as well.

>
> RCU lookup does the inc_not_zero thing, when increment succeeds, compare
> mm/addr to validate.
>
> vma_start_write() already relies on mmap_lock being held for writing,
> and thus does not have to worry about writer-vs-writer contention, that
> is fully resolved by mmap_sem. This means we only need to wait for
> readers to drop out.
>
> vma_start_write()
>   add(0x8000'0001); // could fetch_add and double check the high
>                     // bit wasn't already set.
>   wait-until(refcnt == 0x8000'0002); // mas + writer ref
>   WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
>   sub(0x8000'0000);
>
> vma_end_write()
>   put();

We don't really have vma_end_write(). Instead we have
vma_end_write_all(), which increments mm_lock_seq and thereby unlocks
all write-locked VMAs at once.
Therefore I think vma_start_write() can do sub(0x8000'0001) at the end,
dropping the writer's reference together with the high bit.

>
> vma_start_read() then becomes something like:
>
>   if (vm_lock_seq == mm_lock_seq)
>     return false;
>
>   cnt = fetch_inc(1);
>   if (cnt & msb || vm_lock_seq == mm_lock_seq) {
>     put();
>     return false;
>   }
>
>   return true;
>
> vma_end_read() then becomes:
>   put();
>
>
> and the down_read() from uffffffd requires mmap_read_lock() and thus
> does not have to worry about writers, it can simply be inc() and put(),
> no?

I think your proposal should work. Let me try to code it and see if
something breaks.

Thanks Peter!
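
For reference, here is a rough userspace sketch of how the scheme above might fit together, with sub(0x8000'0001) at the end of vma_start_write(). It is only a model under stated assumptions, not kernel code: struct mm_model, struct vma_model, vma_model_init() and VMA_WRITER_BIT are made-up names, C11 atomics stand in for the kernel's atomic_t, a busy loop stands in for sleeping, and mmap_lock, the RCU lookup with inc_not_zero, and the detached case are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical name for the msb: a writer is waiting for readers to drain. */
#define VMA_WRITER_BIT 0x80000000u

/* Hypothetical userspace stand-ins; not the kernel's structures. */
struct mm_model {
	atomic_uint mm_lock_seq;	/* bumped by vma_end_write_all() */
};

struct vma_model {
	atomic_uint vm_refcnt;		/* 0: detached, 1: attached, +1 per reader */
	atomic_uint vm_lock_seq;	/* equals mm_lock_seq while write-locked */
	struct mm_model *mm;
};

static void vma_model_init(struct vma_model *vma, struct mm_model *mm)
{
	atomic_init(&mm->mm_lock_seq, 1);
	atomic_init(&vma->vm_refcnt, 1);	/* attach: the tree's reference */
	atomic_init(&vma->vm_lock_seq, 0);	/* differs from mm_lock_seq: unlocked */
	vma->mm = mm;
}

static void vma_put(struct vma_model *vma)
{
	unsigned int old = atomic_fetch_sub(&vma->vm_refcnt, 1);

	assert(old != 0);	/* dropping a reference we do not hold is a bug */
	(void)old;
}

static bool vma_start_read(struct vma_model *vma)
{
	/* Cheap early exit: the vma is write-locked in this mmap_lock cycle. */
	if (atomic_load(&vma->vm_lock_seq) == atomic_load(&vma->mm->mm_lock_seq))
		return false;

	unsigned int old = atomic_fetch_add(&vma->vm_refcnt, 1);

	/* Back off if a writer is draining readers or has just locked the vma. */
	if ((old & VMA_WRITER_BIT) ||
	    atomic_load(&vma->vm_lock_seq) == atomic_load(&vma->mm->mm_lock_seq)) {
		vma_put(vma);
		return false;
	}
	return true;
}

static void vma_end_read(struct vma_model *vma)
{
	vma_put(vma);
}

/* Assumes the caller holds mmap_lock for writing, so there is one writer. */
static void vma_start_write(struct vma_model *vma)
{
	/* Set the writer bit and take a temporary writer reference. */
	atomic_fetch_add(&vma->vm_refcnt, VMA_WRITER_BIT + 1);

	/* Wait for readers to drop out: only the tree ref and ours remain. */
	while (atomic_load(&vma->vm_refcnt) != VMA_WRITER_BIT + 2)
		;	/* the kernel would sleep here instead of spinning */

	atomic_store(&vma->vm_lock_seq, atomic_load(&vma->mm->mm_lock_seq));

	/* Clear the bit and drop the writer reference in one step. */
	atomic_fetch_sub(&vma->vm_refcnt, VMA_WRITER_BIT + 1);
}

/* Unlocks every write-locked vma at once by moving the sequence on. */
static void vma_end_write_all(struct mm_model *mm)
{
	atomic_fetch_add(&mm->mm_lock_seq, 1);
}
```

In this model the write lock is carried entirely by vm_lock_seq == mm_lock_seq once vma_start_write() returns, so vm_refcnt is back to 1 for an attached vma with no readers, and a reader that races with the writer either sees the high bit or sees the matching sequence and backs off.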