References: <20241111205506.3404479-4-surenb@google.com>
 <20241210223850.GA2484@noisy.programming.kicks-ass.net>
 <20241211082541.GQ21636@noisy.programming.kicks-ass.net>
 <20241212091659.GU21636@noisy.programming.kicks-ass.net>
 <20241213092223.GB2484@noisy.programming.kicks-ass.net>
In-Reply-To: <20241213092223.GB2484@noisy.programming.kicks-ass.net>
From: Suren Baghdasaryan
Date: Fri, 13 Dec 2024 09:41:22 -0800
Subject: Re: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock
To: Peter Zijlstra
Cc: Matthew Wilcox, akpm@linux-foundation.org, liam.howlett@oracle.com,
 lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz,
 hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
 mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
 oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
 brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com,
 hughd@google.com, minchan@google.com, jannh@google.com,
 shakeel.butt@linux.dev, souravpanda@google.com,
 pasha.tatashin@soleen.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, kernel-team@android.com

On Fri, Dec 13, 2024 at 1:22 AM Peter Zijlstra wrote:
>
> On Thu, Dec 12, 2024 at 06:17:44AM -0800, Suren Baghdasaryan wrote:
> > On Thu, Dec 12, 2024 at 1:17 AM Peter Zijlstra wrote:
> > >
> > > On Wed, Dec 11, 2024 at 07:01:16PM -0800, Suren Baghdasaryan wrote:
> > >
> > > > > > > I think your proposal should work. Let me try to code it and see if
> > > > > > > something breaks.
> > > >
> > > > Ok, I tried it out and things are a bit more complex:
> > > > 1. We should allow write-locking a detached VMA, IOW vma_start_write()
> > > > can be called when vm_refcnt is 0.
> > >
> > > This sounds dodgy, refcnt being zero basically means the object is dead
> > > and you shouldn't be touching it no more. Where does this happen and
> > > why?
> > >
> > > Notably, it being 0 means it is no longer in the mas tree and can't be
> > > found anymore.
> >
> > It happens when a newly created vma that was not yet attached
> > (vma->vm_refcnt = 0) is write-locked before being added into the vma
> > tree. For example:
> > mmap()
> >     mmap_write_lock()
> >     vma = vm_area_alloc() // vma->vm_refcnt = 0 (detached)
> >     //vma attributes are initialized
> >     vma_start_write() // write 0x8000 0001 into vma->vm_refcnt
> >     mas_store_gfp()
> >     vma_mark_attached()
> >     mmap_write_lock() // vma_end_write_all()
> >
> > In this scenario, we write-lock the VMA before adding it into the tree
> > to prevent readers (pagefaults) from using it until we drop the
> > mmap_write_lock().
>
> Ah, but you can do that by setting vma->vm_lock_seq and setting the ref
> to 1 before adding it (its not visible before adding anyway, so nobody
> cares).
>
> You'll note that the read thing checks both the msb (or other high bit
> depending on the actual type you're going with) *and* the seq. That is
> needed because we must not set the sequence number before all existing
> readers are drained, but since this is pre-add that is not a concern.

Yes, I realized there is a useful rule that helps in this case:
vma_mark_attached() is called only on a newly created vma, which nobody
else can find, or on a vma that is already locked. Therefore
vma_start_write() can never race with vma_mark_attached(). Since
vma_mark_attached() is the only function that can raise vma->vm_refcnt
from 0, vma_start_write() can set vma->vm_lock_seq without modifying
vma->vm_refcnt at all when vma->vm_refcnt == 0.

>
> > > > 2. Adding 0x80000000 saturates refcnt, so I have to use a lower bit
> > > > 0x40000000 to denote writers.
> > >
> > > I'm confused, what? We're talking about atomic_t, right?
> >
> > I thought you suggested using refcount_t. According to
> > https://elixir.bootlin.com/linux/v6.13-rc2/source/include/linux/refcount.h#L22
> > valid values would be [0..0x7fff_ffff] and 0x80000000 is outside of
> > that range. What am I missing?
>
> I was talking about atomic_t :-), but yeah, maybe we can use refcount_t,
> but I hadn't initially considered that.

My current implementation uses refcount_t, but I might have to change
that to atomic_t to handle vm_refcnt overflows in vma_start_read().

>