Date: Mon, 16 Jan 2023 16:47:10 +0300
From: "Kirill A. Shutemov"
To: David Hildenbrand
Cc: Jann Horn, Andrew Morton, linux-mm@kvack.org, "Kirill A. Shutemov",
 Zach O'Keefe, linux-kernel@vger.kernel.org, Yang Shi
Subject: Re: [PATCH] mm/khugepaged: Fix ->anon_vma race
Message-ID: <20230116134710.n4dgtrutt6rqif62@box.shutemov.name>
References: <20230111133351.807024-1-jannh@google.com>
 <20230112085649.gvriasb2t5xwmxkm@box.shutemov.name>
 <20230115190654.mehtlyz2rxtg34sl@box.shutemov.name>
 <20230116123403.fiyv22esqgh7bzp3@box.shutemov.name>
 <5a7fdfa7-5b25-0ed4-2479-661d387b397b@redhat.com>
In-Reply-To: <5a7fdfa7-5b25-0ed4-2479-661d387b397b@redhat.com>

On Mon, Jan 16, 2023 at 02:07:41PM +0100, David
Hildenbrand wrote:
> On 16.01.23 13:34, Kirill A. Shutemov wrote:
> > On Mon, Jan 16, 2023 at 01:06:59PM +0100, Jann Horn wrote:
> > > On Sun, Jan 15, 2023 at 8:07 PM Kirill A. Shutemov wrote:
> > > > On Fri, Jan 13, 2023 at 08:28:59PM +0100, Jann Horn wrote:
> > > > > No, that lockdep assert has to be there. Page table traversal is
> > > > > allowed under any one of the mmap lock, the anon_vma lock (if the VMA
> > > > > is associated with an anon_vma), and the mapping lock (if the VMA is
> > > > > associated with a mapping); and so to be able to remove page tables,
> > > > > we must hold all three of them.
> > > >
> > > > Okay, that's fair. I agree with the patch now. Maybe adjust the commit
> > > > message a bit?
> > >
> > > Just to make sure we're on the same page: Are you suggesting that I
> > > add this text?
> > > "Page table traversal is allowed under any one of the mmap lock, the
> > > anon_vma lock (if the VMA is associated with an anon_vma), and the
> > > mapping lock (if the VMA is associated with a mapping); and so to be
> > > able to remove page tables, we must hold all three of them."
> > > Or something else?
> >
> > Looks good to me.
> >
> > > > Anyway:
> > > >
> > > > Acked-by: Kirill A. Shutemov
> > >
> > > Thanks!
> > >
> > > > BTW, I've noticed that you recently added tlb_remove_table_sync_one().
> > > > I'm not sure why it is needed. Why is the IPI in pmdp_collapse_flush()
> > > > not good enough to serialize against GUP fast?
> > >
> > > If that sent an IPI, it would be good enough; but
> > > pmdp_collapse_flush() is not guaranteed to send an IPI.
> > > It does a TLB flush, but on some architectures (including arm64 and
> > > also virtualized x86), a remote TLB flush can be done without an IPI.
> > > For example, arm64 has some fancy hardware support for remote TLB
> > > invalidation without IPIs ("broadcast TLB invalidation"), and
> > > virtualized x86 has (depending on the hypervisor) things like TLB
> > > shootdown hypercalls (under Hyper-V, see hyperv_flush_tlb_multi) or
> > > TLB shootdown signalling for preempted CPUs through shared memory
> > > (under KVM, see kvm_flush_tlb_multi).
> >
> > I think such architectures must provide proper pmdp_collapse_flush()
> > with the required serialization. Power and S390 already do that.
> >
>
> The plan is to eventually move away from (ab)using IPI to synchronize with
> GUP-fast. Moving further in that direction is wrong.
>
> The flush was added as a quick fix for all architectures by Jann, until
> we can do better.
>
> Even for ppc64, see:
>
> commit bedf03416913d88c796288f9dca109a53608c745
> Author: Yang Shi
> Date: Wed Sep 7 11:01:44 2022 -0700
>
> powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush
> The IPI broadcast is used to serialize against fast-GUP, but fast-GUP will
> move to use RCU instead of disabling local interrupts in fast-GUP. Using
> an IPI is the old-styled way of serializing against fast-GUP although it
> still works as expected now.
> And fast-GUP now fixed the potential race with THP collapse by checking
> whether PMD is changed or not. So IPI broadcast in radix pmd collapse
> flush is not necessary anymore. But it is still needed for hash TLB.

Okay.

But I think tlb_remove_table_sync_one() belongs inside
pmdp_collapse_flush(). Collapsing a PMD table into a huge page without
serialization is a bug. The two should not be separate.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
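
For readers following the locking rule quoted from Jann above (traversal is
allowed under any one of the locks, so removal needs all of them), here is a
minimal userspace sketch in C. It is only a model of the stated rule, not
kernel code: the names pt_lock, vma_model, relevant_locks(), may_walk(),
may_remove() and remove_page_table_model() are hypothetical and do not exist
in the kernel. It also ignores lock modes (read vs. write) and treats each
lock as simply held or not held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only: none of these names are kernel APIs. */
enum pt_lock {
	PT_LOCK_MMAP     = 1 << 0,	/* models the mmap lock */
	PT_LOCK_ANON_VMA = 1 << 1,	/* models the anon_vma lock */
	PT_LOCK_MAPPING  = 1 << 2,	/* models the mapping lock */
};

struct vma_model {
	bool has_anon_vma;	/* VMA is associated with an anon_vma */
	bool has_mapping;	/* VMA is associated with a mapping */
};

/* Locks that some page table walker could be relying on for this VMA. */
static inline unsigned int relevant_locks(const struct vma_model *vma)
{
	unsigned int mask = PT_LOCK_MMAP;

	if (vma->has_anon_vma)
		mask |= PT_LOCK_ANON_VMA;
	if (vma->has_mapping)
		mask |= PT_LOCK_MAPPING;
	return mask;
}

/* Traversal: holding any one relevant lock is enough. */
static inline bool may_walk(const struct vma_model *vma, unsigned int held)
{
	return (held & relevant_locks(vma)) != 0;
}

/* Removal: every lock a walker might hold must be held by the remover. */
static inline bool may_remove(const struct vma_model *vma, unsigned int held)
{
	unsigned int need = relevant_locks(vma);

	return (held & need) == need;
}

/* Stands in for the lockdep assertion discussed in the thread. */
static inline void remove_page_table_model(const struct vma_model *vma,
					   unsigned int held)
{
	assert(may_remove(vma, held));
	/* A real implementation would tear down the page table here. */
}
```

In this model, a walker on an anonymous VMA that holds only the anon_vma lock
passes may_walk(), which is why a remover holding only the mmap lock fails
may_remove(): it would not exclude that walker.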