Date: Thu, 19 Dec 2024 18:23:20 +0100
From: Peter Zijlstra
To: Suren Baghdasaryan
Cc: "Liam R. Howlett", akpm@linux-foundation.org, willy@infradead.org,
    lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz,
    hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
    mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
    oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
    brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com,
    hughd@google.com, lokeshgidra@google.com, minchan@google.com,
    jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com,
    pasha.tatashin@soleen.com, klarasmodin@gmail.com, corbet@lwn.net,
    linux-doc@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count
Message-ID: <20241219172320.GC26279@noisy.programming.kicks-ass.net>
References: <20241219091334.GC26551@noisy.programming.kicks-ass.net>

On Thu, Dec 19, 2024 at 08:14:24AM -0800, Suren Baghdasaryan wrote:
> On Thu, Dec 19, 2024 at 1:13 AM Peter Zijlstra wrote:
> >
> > On Wed, Dec 18, 2024 at 01:53:17PM -0800, Suren Baghdasaryan wrote:
> >
> > > Ah, ok I see now. I completely misunderstood what for_each_vma_range()
> > > was doing.
> > >
> > > Then I think vma_start_write() should remain inside
> > > vms_gather_munmap_vmas() and all vmas in mas_detach should be
> >
> > No, it must not. You really are not modifying anything yet (except the
> > split, which we've already noted mark write themselves).
> >
> > > write-locked, even the ones we are not modifying. Otherwise what would
> > > prevent the race I mentioned before?
> > >
> > > __mmap_region
> > >     __mmap_prepare
> > >         vms_gather_munmap_vmas // adds vmas to be unmapped into mas_detach,
> > >                                // some locked by __split_vma(), some not locked
> > >
> > >                                      lock_vma_under_rcu()
> > >                                          vma = mas_walk // finds unlocked vma also in mas_detach
> > >                                          vma_start_read(vma) // succeeds since vma is not locked
> > >                                          // vma->detached, vm_start, vm_end checks pass
> > >                                          // vma is successfully read-locked
> > >
> > >         vms_clean_up_area(mas_detach)
> > >             vms_clear_ptes
> > >                                          // steps on a cleared PTE
> >
> > So here we have the added complexity that the vma is not unhooked at
> > all. Is there anything that would prevent a concurrent gup_fast() from
> > doing the same -- touch a cleared PTE?
> >
> > AFAICT two threads, one doing overlapping mmap() and the other doing
> > gup_fast() can result in exactly this scenario.
> >
> > If we don't care about the GUP case, then I'm thinking we should not
> > care about the lockless RCU case either.
> >
> > >     __mmap_new_vma
> > >         vma_set_range // installs new vma in the range
> > >     __mmap_complete
> > >         vms_complete_munmap_vmas // vmas are write-locked and detached but it's too late
> >
> > But at this point that old vma really is unhooked, and the
> > vma_start_write() here will ensure readers are gone and it will clear
> > PTEs *again*.
>
> So, to summarize, you want vma_start_write() and vma_mark_detached()
> to be done when we are removing the vma from the tree, right?

*after*

> Something like:

	vma_iter_store()
	vma_start_write()
	vma_mark_detached()

By having vma_start_write() after being unlinked you get the guarantee
of no concurrency. New lookups cannot find you (because of that
vma_iter_store()) and existing readers will be waited for.

> And the race I described is not a real problem since the vma is still
> in the tree, so gup_fast() does exactly that and will simply reinstall
> the ptes.

Just so.
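
For readers following the thread, the ordering argument above (unlink via
vma_iter_store(), then vma_start_write(), then vma_mark_detached()) can be
illustrated with a small userspace model. This is only a sketch under stated
assumptions: every toy_* name is hypothetical and none of it is kernel code.
The "tree" is a single slot, the read lock is a plain counter, and
toy_start_write() merely reports whether readers remain, where the real
vma_start_write() would wait for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; all toy_* names are hypothetical, not kernel API. */
struct toy_vma {
	int readers;    /* number of read-lock holders */
	bool detached;  /* set once the vma is unlinked and has no readers */
};

struct toy_tree {
	struct toy_vma *slot;  /* stands in for the tree entry covering one range */
};

/* Reader side: find the vma through the tree and take a read reference. */
static struct toy_vma *toy_lookup_and_read_lock(struct toy_tree *tree)
{
	struct toy_vma *vma = tree->slot;

	if (!vma || vma->detached)
		return NULL;
	vma->readers++;
	return vma;
}

static void toy_read_unlock(struct toy_vma *vma)
{
	vma->readers--;
}

/* Step 1: overwrite the tree entry, so new lookups cannot find the old vma. */
static void toy_iter_store(struct toy_tree *tree, struct toy_vma *new_vma)
{
	tree->slot = new_vma;
}

/* Step 2: true once no readers remain (the real call would block instead). */
static bool toy_start_write(const struct toy_vma *vma)
{
	return vma->readers == 0;
}

/* Step 3: only valid once the vma is unlinked and all readers are gone. */
static void toy_mark_detached(struct toy_vma *vma)
{
	assert(vma->readers == 0);
	vma->detached = true;
}

/*
 * Walk through the sequence with one reader already in flight.
 * Returns 0 if every step behaves as the ordering argument predicts.
 */
int toy_demo(void)
{
	struct toy_vma old_vma = { 0, false }, new_vma = { 0, false };
	struct toy_tree tree = { &old_vma };
	struct toy_vma *reader, *late;

	/* A reader gets in before the unlink; that is still allowed. */
	reader = toy_lookup_and_read_lock(&tree);
	if (reader != &old_vma)
		return 1;

	toy_iter_store(&tree, &new_vma);  /* unlink old_vma */

	/* Lookups after the store land on new_vma, never on old_vma. */
	late = toy_lookup_and_read_lock(&tree);
	if (late != &new_vma)
		return 2;
	toy_read_unlock(late);

	if (toy_start_write(&old_vma))    /* existing reader must be waited for */
		return 3;
	toy_read_unlock(reader);
	if (!toy_start_write(&old_vma))   /* reader gone, write side may proceed */
		return 4;

	toy_mark_detached(&old_vma);
	return old_vma.detached ? 0 : 5;
}
```

The point of the model is the order: because the store happens first, the set
of readers that can hold the old vma can only shrink, so waiting for it to
drain is enough before marking the vma detached.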