Date: Fri, 13 Dec 2024 10:22:23 +0100
From: Peter Zijlstra
To: Suren Baghdasaryan
Cc: Matthew Wilcox, akpm@linux-foundation.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz,
	hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
	brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com,
	hughd@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com,
	pasha.tatashin@soleen.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock
Message-ID: <20241213092223.GB2484@noisy.programming.kicks-ass.net>
References: <20241111205506.3404479-4-surenb@google.com>
	<20241210223850.GA2484@noisy.programming.kicks-ass.net>
	<20241211082541.GQ21636@noisy.programming.kicks-ass.net>
	<20241212091659.GU21636@noisy.programming.kicks-ass.net>

On Thu, Dec 12, 2024 at 06:17:44AM -0800, Suren Baghdasaryan wrote:
> On Thu, Dec 12, 2024 at 1:17 AM Peter Zijlstra wrote:
> >
> > On Wed, Dec 11, 2024 at 07:01:16PM -0800, Suren Baghdasaryan wrote:
> >
> > > > > > I think your proposal should work.
> > > > > > Let me try to code it and see if
> > > > > > something breaks.
> > >
> > > Ok, I tried it out and things are a bit more complex:
> > > 1. We should allow write-locking a detached VMA, IOW vma_start_write()
> > > can be called when vm_refcnt is 0.
> >
> > This sounds dodgy, refcnt being zero basically means the object is dead
> > and you shouldn't be touching it no more. Where does this happen and
> > why?
> >
> > Notably, it being 0 means it is no longer in the mas tree and can't be
> > found anymore.
>
> It happens when a newly created vma that was not yet attached
> (vma->vm_refcnt = 0) is write-locked before being added into the vma
> tree. For example:
> mmap()
>     mmap_write_lock()
>     vma = vm_area_alloc() // vma->vm_refcnt = 0 (detached)
>     //vma attributes are initialized
>     vma_start_write() // write 0x8000 0001 into vma->vm_refcnt
>     mas_store_gfp()
>     vma_mark_attached()
>     mmap_write_lock() // vma_end_write_all()
>
> In this scenario, we write-lock the VMA before adding it into the tree
> to prevent readers (pagefaults) from using it until we drop the
> mmap_write_lock().

Ah, but you can do that by setting vma->vm_lock_seq and setting the ref
to 1 before adding it (it's not visible before adding anyway, so nobody
cares).

You'll note that the read thing checks both the msb (or other high bit
depending on the actual type you're going with) *and* the seq. That is
needed because we must not set the sequence number before all existing
readers are drained, but since this is pre-add that is not a concern.

> > > 2. Adding 0x80000000 saturates refcnt, so I have to use a lower bit
> > > 0x40000000 to denote writers.
> >
> > I'm confused, what? We're talking about atomic_t, right?
>
> I thought you suggested using refcount_t. According to
> https://elixir.bootlin.com/linux/v6.13-rc2/source/include/linux/refcount.h#L22
> valid values would be [0..0x7fff_ffff] and 0x80000000 is outside of
> that range. What am I missing?
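
For reference, the range arithmetic in the quoted question can be sketched
in userspace C. This is only an illustration of the [0..0x7fff_ffff] range
mentioned above, not the kernel's refcount_t implementation; the helper
names in_valid_range(), flag_leaves_valid_range() and check_writer_bits()
are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel API: true if v lies in [0 .. 0x7fffffff],
 * the range the quoted question describes as valid for refcount_t. */
static bool in_valid_range(uint32_t v)
{
	return v <= 0x7fffffffu;
}

/* Hypothetical helper: does adding a writer flag to a counter currently
 * holding `count` push the value outside that range? Unsigned arithmetic
 * keeps the addition well defined in C. */
static bool flag_leaves_valid_range(uint32_t count, uint32_t flag)
{
	return !in_valid_range(count + flag);
}

/* Self-check of the two candidate writer bits from the thread. */
static void check_writer_bits(void)
{
	assert(!in_valid_range(0x80000000u));            /* bit 31 alone is out of range */
	assert(in_valid_range(0x40000000u));             /* bit 30 alone is in range */
	assert(flag_leaves_valid_range(1, 0x80000000u)); /* one ref plus bit 31 */
	assert(!flag_leaves_valid_range(1, 0x40000000u));/* one ref plus bit 30 */
}
```

Under this model, a writer flag at bit 30 keeps the value in range as long
as the reference count itself stays at or below 0x3fffffff.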
I was talking about atomic_t :-), but yeah, maybe we can use refcount_t, but I hadn't initially considered that.
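
The pre-add initialisation and the two-part reader check described above
can be sketched roughly as follows. This is a userspace model using C11
atomics, not kernel code: struct vma_model, struct mm_model, VMA_WRITER_BIT
and every function name here are assumptions made for the example, and the
writer bit is placed at the msb on the assumption of a plain 32-bit counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define VMA_WRITER_BIT 0x80000000u	/* assumed: msb marks a writer */

struct mm_model {
	atomic_uint mm_lock_seq;	/* bumped when all write locks are dropped */
};

struct vma_model {
	atomic_uint vm_refcnt;		/* 0 means detached */
	atomic_uint vm_lock_seq;	/* equals mm_lock_seq while write-locked */
};

static void mm_model_init(struct mm_model *mm)
{
	atomic_init(&mm->mm_lock_seq, 1);
}

static void vma_model_init(struct vma_model *vma)
{
	atomic_init(&vma->vm_refcnt, 0);
	atomic_init(&vma->vm_lock_seq, 0);
}

/* Pre-add setup: the VMA is not yet reachable, so there are no readers to
 * drain and the sequence number can be set directly, with the ref at 1. */
static void vma_init_write_locked(struct vma_model *vma, struct mm_model *mm)
{
	atomic_store(&vma->vm_lock_seq, atomic_load(&mm->mm_lock_seq));
	atomic_store(&vma->vm_refcnt, 1);
	/* the tree insert (mas_store_gfp() in the quoted flow) would follow */
}

/* Reader side: check the writer bit first, then the sequence number. */
static bool vma_read_trylock(struct vma_model *vma, struct mm_model *mm)
{
	unsigned int old = atomic_load(&vma->vm_refcnt);

	do {
		if (old == 0 || (old & VMA_WRITER_BIT))
			return false;	/* detached, or a writer is present */
	} while (!atomic_compare_exchange_weak(&vma->vm_refcnt, &old, old + 1));

	if (atomic_load(&vma->vm_lock_seq) == atomic_load(&mm->mm_lock_seq)) {
		atomic_fetch_sub(&vma->vm_refcnt, 1);
		return false;	/* still write-locked in this cycle */
	}
	return true;
}

/* Dropping the mmap write lock releases every VMA write lock at once. */
static void mm_end_write_all(struct mm_model *mm)
{
	atomic_fetch_add(&mm->mm_lock_seq, 1);
}
```

In this model a freshly initialised VMA rejects readers because its
sequence number matches the mm's, without the writer bit ever being set;
readers get in only after mm_end_write_all() advances the mm sequence.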