Date: Thu, 19 Dec 2024 18:42:35 +0100
From: Peter Zijlstra
To: "Liam R. Howlett", Suren Baghdasaryan, akpm@linux-foundation.org,
	willy@infradead.org, lorenzo.stoakes@oracle.com, mhocko@suse.com,
	vbabka@suse.cz, hannes@cmpxchg.org, mjguzik@gmail.com,
	oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com,
	peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net,
	paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com,
	hdanton@sina.com, hughd@google.com, lokeshgidra@google.com,
	minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
	souravpanda@google.com, pasha.tatashin@soleen.com,
	klarasmodin@gmail.com, corbet@lwn.net, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@android.com
Subject: Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count
Message-ID: <20241219174235.GD26279@noisy.programming.kicks-ass.net>
References: <20241219091334.GC26551@noisy.programming.kicks-ass.net>
	<20241219112011.GA34942@noisy.programming.kicks-ass.net>

On Thu, Dec 19, 2024 at 12:16:45PM -0500, Liam R. Howlett wrote:

> Well, hold on - it is taken out of the rmap/anon vma chain here.  It is
> completely unhooked except the vma tree at this point.  We're not adding
> complexity, we're dealing with it.

So I'm not entirely sure I understand the details here -- this is again
about being able to do rollback when things fail?

There is a comment above the vms_clean_up_area() call in __mmap_prepare(),
but it's not making sense atm.

> > > Is there anything that would prevent a concurrent gup_fast() from
> > > doing the same -- touch a cleared PTE?
>
> Where does gup_fast() install PTEs?  Doesn't it bail once a READ_ONCE()
> on any level returns no PTE?

I think you're right, GUP doesn't, but any 'normal' page-table walker
will.

> > > AFAICT two threads, one doing overlapping mmap() and the other doing
> > > gup_fast() can result in exactly this scenario.
>
> The mmap() call will race with the gup_fast(), but either the nr_pinned
> will be returned from gup_fast() before vms_clean_up_area() removes the
> page table (or any higher level), or gup_fast() will find nothing.

Agreed.

> > > If we don't care about the GUP case, when I'm thinking we should not
> > > care about the lockless RCU case either.
> >
> > Also, at this point we'll just fail to find a page, and that is nothing
> > special.  The problem with accessing an unmapped VMA is that the
> > page-table walk will instantiate page-tables.
>
> I think there is a problem if we are reinstalling page tables on a vma
> that's about to be removed.  I think we are avoiding this with our
> locking though?

So this is purely about the overlapping part, right? We need to remove
the old pages, install the new mapping and have new pages populate the
thing.

But either way around, the range stays valid and page-tables stay
needed.

> > Given this is an overlapping mmap -- we're going to need to those
> > page-tables anyway, so no harm done.
>
> Well, maybe?  The mapping may now be an anon vma vs a file backed, or
> maybe it's PROT_NONE?

The page-tables don't care about all that, no? The only thing where it
matters is for things like THP, because that affects the level of
page-tables, but otherwise it's all page-table content (ptes).

> > Only after the VMA is unlinked must we ensure we don't accidentally
> > re-instantiate page-tables.
>
> It's not as simple as that, unfortunately.
> There are vma callbacks for
> drivers (or hugetlbfs, or whatever) that do other things.  So we need to
> clean up the area before we are able to replace the vma and part of that
> clean up is the page tables, or anon vma chain, and/or closing a file.
>
> There are other ways of finding the vma as well, besides the vma tree.
> We are following the locking so that we are safe from those perspectives
> as well, and so the vma has to be unlinked in a few places in a certain
> order.

For RCU lookups only the maple tree matters -- and it's left present there.

If you really want to block RCU readers, I would suggest punching a hole
in the mm_mt. All the traditional code won't notice anyway, this is all
with mmap_lock held for writing.
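
A rough userspace sketch of the hole-punching idea above, not kernel
code and completely unverified against the actual patch. It assumes a
flat array with one pointer per page-sized slot in place of the real
mm_mt maple tree, and C11 acquire/release atomics in place of the RCU
accessors. Every toy_* name is hypothetical and exists only for
illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_SLOTS 16	/* toy address space: 16 page-sized slots */

/* Hypothetical stand-in for a VMA: just the slot range it covers. */
struct toy_vma {
	size_t first;	/* first slot covered, inclusive */
	size_t last;	/* last slot covered, inclusive */
};

/* Hypothetical stand-in for mm->mm_mt: one published pointer per slot. */
struct toy_mt {
	_Atomic(struct toy_vma *) slot[TOY_SLOTS];
};

/* Start with an empty index: every slot is a hole. */
static void toy_init(struct toy_mt *mt)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		atomic_init(&mt->slot[i], NULL);
}

/*
 * Writer side. The caller is assumed to hold the equivalent of
 * mmap_lock for writing, so writers never race with each other.
 */
static void toy_store_range(struct toy_mt *mt, size_t first, size_t last,
			    struct toy_vma *entry)
{
	assert(first <= last && last < TOY_SLOTS);
	for (size_t i = first; i <= last; i++)
		atomic_store_explicit(&mt->slot[i], entry,
				      memory_order_release);
}

/* Lockless reader: the VMA covering @index, or NULL if there is a hole. */
static struct toy_vma *toy_lookup(struct toy_mt *mt, size_t index)
{
	if (index >= TOY_SLOTS)
		return NULL;
	return atomic_load_explicit(&mt->slot[index], memory_order_acquire);
}

/* Punch a hole: lockless readers find nothing in [first, last]. */
static void toy_punch_hole(struct toy_mt *mt, size_t first, size_t last)
{
	toy_store_range(mt, first, last, NULL);
}
```

In this model the writer would punch the hole over the overlapping
range before cleaning up the old mapping, and store the new VMA
afterwards. A lockless reader that hits the hole gets NULL rather than
the half-torn-down VMA, while anything that takes the lock is excluded
for the whole sequence and never observes the hole.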