From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 27 Jul 2023 15:57:47 +0100
From: Will Deacon <will@kernel.org>
To: Jann Horn
Cc: paulmck@kernel.org, Andrew Morton, Linus Torvalds, Peter Zijlstra,
	Suren Baghdasaryan, Matthew Wilcox, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Alan Stern, Andrea Parri, Boqun Feng,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Akira Yokosawa, Daniel Lustig, Joel Fernandes
Subject: Re: [PATCH 0/2] fix vma->anon_vma check for per-VMA locking; fix
 anon_vma memory ordering
Message-ID: <20230727145747.GB19940@willie-the-truck>
References: <20230726214103.3261108-1-jannh@google.com>
 <31df93bd-4862-432c-8135-5595ffd2bd43@paulmck-laptop>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
On Thu, Jul 27, 2023 at 04:39:34PM +0200, Jann Horn wrote:
> On Thu, Jul 27, 2023 at 1:19 AM Paul E. McKenney wrote:
> >
> > On Wed, Jul 26, 2023 at 11:41:01PM +0200, Jann Horn wrote:
> > > Hi!
> > >
> > > Patch 1 here is a straightforward fix for a race in per-VMA locking code
> > > that can lead to use-after-free; I hope we can get this one into
> > > mainline and stable quickly.
> > >
> > > Patch 2 is a fix for what I believe is a longstanding memory ordering
> > > issue in how vma->anon_vma is used across the MM subsystem; I expect
> > > that this one will have to go through a few iterations of review and
> > > potentially rewrites, because memory ordering is tricky.
> > > (If someone else wants to take over patch 2, I would be very happy.)
> > >
> > > These patches don't really belong together all that much, I'm just
> > > sending them as a series because they'd otherwise conflict.
> > >
> > > I am CCing:
> > >
> > > - Suren because patch 1 touches his code
> > > - Matthew Wilcox because he is also currently working on per-VMA
> > >   locking stuff
> > > - all the maintainers/reviewers for the Kernel Memory Consistency Model
> > >   so they can help figure out the READ_ONCE() vs smp_load_acquire()
> > >   thing
> >
> > READ_ONCE() has weaker ordering properties than smp_load_acquire().
> >
> > For example, given a pointer gp:
> >
> >	p = whichever(gp);
> >	a = 1;
> >	r1 = p->b;
> >	if ((uintptr_t)p & 0x1)
> >		WRITE_ONCE(b, 1);
> >	WRITE_ONCE(c, 1);
> >
> > Leaving aside the "&" needed by smp_load_acquire(), if "whichever" is
> > "READ_ONCE", then the load from p->b and the WRITE_ONCE() to "b" are
> > ordered after the load from gp (the former due to an address dependency
> > and the latter due to a (fragile) control dependency).  The compiler
> > is within its rights to reorder the store to "a" to precede the load
> > from gp.  The compiler is forbidden from reordering the store to "c"
> > with the load from gp (because both are volatile accesses), but the CPU
> > is completely within its rights to do this reordering.
> >
> > But if "whichever" is "smp_load_acquire()", all four of the subsequent
> > memory accesses are ordered after the load from gp.
> >
> > Similarly, for WRITE_ONCE() and smp_store_release():
> >
> >	p = READ_ONCE(gp);
> >	r1 = READ_ONCE(gi);
> >	r2 = READ_ONCE(gj);
> >	a = 1;
> >	WRITE_ONCE(b, 1);
> >	if (r1 & 0x1)
> >		whichever(p->q, r2);
> >
> > Again leaving aside the "&" needed by smp_store_release(), if "whichever"
> > is WRITE_ONCE(), then the load from gp, the load from gi, and the load
> > from gj are all ordered before the store to p->q (by address dependency,
> > control dependency, and data dependency, respectively).  The store to "a"
> > can be reordered with the store to p->q by the compiler.  The store to
> > "b" cannot be reordered with the store to p->q by the compiler (again,
> > both are volatile), but the CPU is free to reorder them, especially when
> > whichever() is implemented as a conditional store.
> >
> > But if "whichever" is "smp_store_release()", all five of the earlier
> > memory accesses are ordered before the store to p->q.
> >
> > Does that help, or am I missing the point of your question?
>
> My main question is how permissible/ugly you think the following use
> of READ_ONCE() would be, and whether you think it ought to be an
> smp_load_acquire() instead.
>
> Assume that we are holding some kind of lock that ensures that the
> only possible concurrent update to "vma->anon_vma" is that it changes
> from a NULL pointer to a non-NULL pointer (using smp_store_release()).
>
>
> if (READ_ONCE(vma->anon_vma) != NULL) {
>   // we now know that vma->anon_vma cannot change anymore
>
>   // access the same memory location again with a plain load
>   struct anon_vma *a = vma->anon_vma;
>
>   // this needs to be address-dependency-ordered against one of
>   // the loads from vma->anon_vma
>   struct anon_vma *root = a->root;
> }
>
>
> Is this fine? If it is not fine just because the compiler might
> reorder the plain load of vma->anon_vma before the READ_ONCE() load,
> would it be fine after adding a barrier() directly after the
> READ_ONCE()?

I'm _very_ wary of mixing READ_ONCE() and plain loads to the same
variable, as I've run into cases where you have sequences such as:

	// Assume *ptr is initially 0 and somebody else writes it to 1
	// concurrently
	foo = *ptr;
	bar = READ_ONCE(*ptr);
	baz = *ptr;

and you can get foo == baz == 0 but bar == 1 because the compiler only
ends up reading from memory twice.

That was the root cause behind f069faba6887 ("arm64: mm: Use READ_ONCE
when dereferencing pointer to pte table"), which was very unpleasant to
debug.

Will