From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0EF0C54EBC for ; Thu, 12 Jan 2023 08:56:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2B5B18E0002; Thu, 12 Jan 2023 03:56:57 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 265E08E0001; Thu, 12 Jan 2023 03:56:57 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 107928E0002; Thu, 12 Jan 2023 03:56:57 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id F205A8E0001 for ; Thu, 12 Jan 2023 03:56:56 -0500 (EST) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id C60B01C61A6 for ; Thu, 12 Jan 2023 08:56:56 +0000 (UTC) X-FDA: 80345542032.23.DE60102 Received: from out2-smtp.messagingengine.com (out2-smtp.messagingengine.com [66.111.4.26]) by imf09.hostedemail.com (Postfix) with ESMTP id E9F72140004 for ; Thu, 12 Jan 2023 08:56:53 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=shutemov.name header.s=fm1 header.b=qOTbh7F0; dkim=pass header.d=messagingengine.com header.s=fm3 header.b=f0vnSQEr; spf=pass (imf09.hostedemail.com: domain of kirill@shutemov.name designates 66.111.4.26 as permitted sender) smtp.mailfrom=kirill@shutemov.name; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1673513814; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=/mZ4Vm+ef7Fcse4BfKsJqYMNeCYam4CxxvzvAMJvOpI=; b=Mpth8XGUGMYkLIzh+K+miyh5/riKdB4udPk/7e2DzaGM7SLEFFt/g+ilNPn6OxJ+K06Lng XH+DK9M+32YTteopOY8aKDxxn4DB4tQ6YLDSSL1f3GOUs4+UYOw4Yo6zO/G0UTwInNxYJi NIy2oZF/vsCN3oL9LABQMEy+dOChJJ4= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=shutemov.name header.s=fm1 header.b=qOTbh7F0; dkim=pass header.d=messagingengine.com header.s=fm3 header.b=f0vnSQEr; spf=pass (imf09.hostedemail.com: domain of kirill@shutemov.name designates 66.111.4.26 as permitted sender) smtp.mailfrom=kirill@shutemov.name; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1673513814; a=rsa-sha256; cv=none; b=bBT929KJcue9zP1kL7+TIilpe9yq3dHcpDkL6iVH6oqO7P3x1VKcr3ZD0QMysRDKDnbf80 5BKNsoWa05ulXqvTMAfyWk2w3nRH0QEe2sXy1FeVkFbjvTvk6gGgcXTVJ+dVOEKwTeADqY qRa8cLP+ZQnnGucu0ftO0256qPvIMGs= Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 702885C009B; Thu, 12 Jan 2023 03:56:53 -0500 (EST) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Thu, 12 Jan 2023 03:56:53 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov.name; h=cc:cc:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to; s=fm1; t=1673513813; x=1673600213; bh=/m Z4Vm+ef7Fcse4BfKsJqYMNeCYam4CxxvzvAMJvOpI=; b=qOTbh7F0sUjm9HN6fw z+jXZhYpIF1EoBCHqwJnw+/w0Rur1WFX54+gBl0w76UxRKPQUlSNNZnOKvVv/rrP KNUTNIWKT5/86SbO2DZ7R7fgwl3gKOjTzSNmtbvj4hDdIY+oIXqdJNAV/dWFGgBy m2r98cQH/wahk28Fa+HE2SfJ1+h0QiGdV6SY/7rHNh5rq0BgdLsGICMVNKsNEK1B KoeFwMdLqHv6tTmH2i3HVYO/LpcLe7AVoiGWK0VdYMxJro/e2BAcT+JwAJdjP2tS vwgTWh2pHcrR9s/XOsEzN80JTlUKFBRfw/HD+PnSvA+P7DsKdaUltWz8nZgckLCp l7SQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:date:date:feedback-id :feedback-id:from:from:in-reply-to:in-reply-to:message-id :mime-version:references:reply-to:sender:subject:subject:to:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; t=1673513813; x=1673600213; bh=/mZ4Vm+ef7Fcse4BfKsJqYMNeCYa m4CxxvzvAMJvOpI=; b=f0vnSQErjrLUJGF/V1xrO7b3Q9Y2Ph3Uo+NacnV/P/FL Bv5aWGC4Z7+oIm2MhP7hxt1gThrmfYYLb+IPNVCqSJF/qV3rhtZpXY+CxmAMEruh 8JnEL29EJeDRaKfp4Z7aLhtNhZcqIIJ18zY0+oQkVXXRcUh4OBkyV8d3jOPJtcfD VT2mUzjuQY9gDAbntgJAE5H9xtW0wGIvuGDsT63iu1eSm76uMeayHddh5cWJNNQI kwdb3zsjMay0CYYzlFBxVub2abVO7nNgi2vfm+8A6i1/uofyWLdGwrQ7raD5GPbY ntcXlLtZHgqHsyet9Ukb6ieAQDZ9kYWhXtiu//vylA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrleehgdduvdehucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvfevuffkfhggtggujgesthdttddttddtvdenucfhrhhomhepfdfmihhr ihhllhcutedrucfuhhhuthgvmhhovhdfuceokhhirhhilhhlsehshhhuthgvmhhovhdrnh grmhgvqeenucggtffrrghtthgvrhhnpeelvdduleeluedtgedvhedvtedulefhjeffudeu heehheehhffgfffhveeifefgveenucffohhmrghinhepvhdurdhmmhenucevlhhushhtvg hrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehkihhrihhllhesshhhuhht vghmohhvrdhnrghmvg X-ME-Proxy: Feedback-ID: ie3994620:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 12 Jan 2023 03:56:52 -0500 (EST) Received: by box.shutemov.name (Postfix, from userid 1000) id 8B97D109AF0; Thu, 12 Jan 2023 11:56:49 +0300 (+03) Date: Thu, 12 Jan 2023 11:56:49 +0300 From: "Kirill A. Shutemov" To: Jann Horn Cc: Andrew Morton , linux-mm@kvack.org, "Kirill A. Shutemov" , Zach O'Keefe , linux-kernel@vger.kernel.org, David Hildenbrand , Yang Shi Subject: Re: [PATCH] mm/khugepaged: Fix ->anon_vma race Message-ID: <20230112085649.gvriasb2t5xwmxkm@box.shutemov.name> References: <20230111133351.807024-1-jannh@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230111133351.807024-1-jannh@google.com> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: E9F72140004 X-Stat-Signature: a5bkop8t37f1jro41h7wtzxagk6ya1hx X-Rspam-User: X-HE-Tag: 1673513813-500943 X-HE-Meta: U2FsdGVkX1/MF0sCDToWYxAx6J5wxmTf41BOZ6BjV0xOned1hxauAqMweVQX14DjIT349OCkF3D7t9bUGsfgW9qFf3W0gLh2mKkMOvB9RJLrF8LjylIU+5dBOSEl+SEf82FxDHAoJLCuf5T0aoUHS5t7qzxyAVHsnnO6gCZUAvHFB4kWc8df2q83+TNG0wms4GIyXUOiiIc88DzLEJ7JU9ibui3h/cms8ehMLA91w4nrsibPHyoTMf8W31NC3UOp3Hy/kIXgZ+syXkyG2+aMIZJCfYlroY5p/okwMQtxjXkPW/4Kmk29F0rwBSVDz/OG7/Sm4iMYXnjSc3tcC969TzbV4Sx7BsZX4uJ+uF2pmpz4bKF5n4gsFaofwGhwTsI6axXWQoKGtH0gHamNRHfF/jJugddLHYD4dJXcVBJvBkUJifW0sqIm34ZE/cxkpZV/uw7GzYvyIAUZQ5D7SYoV3HAnE4ZgAtFhyrtARzVLJNaG+hSmSht+VP0L/4otz4bCYtUazBXrKIi+HFcuBM0hXgGPuIbBL/Fdum45YxYNginqX39LCMLrUePlmHAB8NRe7VPnKcQ3N0eiV7UgbbOkTRRhioSnoBH7219NPH0zu8QYolaLlWltRtHEHp2/i2zie79xtIzGX8Q4ITuOPsb1KpP5FHm+61nAp0itdlxZaCx7Zg+I6cea1F3okMTO2Jyr75uKEfo/ABeAlpiiu9AJBfVwXOkzNXJ1cnVdXhKSQXagfwDKj85mRpmk2Ci9DiuxAkb9TzO/28ohZQOVaU2IsHt0CD+DRVOzEChb9lCgZvnA0ORDejGPPlcV+dHqxFX7uBUOcQ0nOZKR7Be9SUzrkNIjCYNkCFnh7rQVuCCBsC6uB0eFQk3ty1DXW+UETx3p5qFWPKqIWuXCX9Jn8hwWySLWdqwNjib/2BD/YK4DQEtMMEEM11rG3e3DqF46O0mWSIJY3NiXP4We4VmA0W3 UAYRNDy/ czYc6su+p+M2tW5a+YVSR2ikOGg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Jan 11, 2023 at 02:33:51PM +0100, Jann Horn wrote: > If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires > it to be locked. retract_page_tables() bails out if an ->anon_vma is > attached, but does this check before holding the mmap lock (as the comment > above the check explains). > > If we racily merge an existing ->anon_vma (shared with a child process) > from a neighboring VMA, subsequent rmap traversals on pages belonging to > the child will be able to see the page tables that we are concurrently > removing while assuming that nothing else can access them. > > Repeat the ->anon_vma check once we hold the mmap lock to ensure that there > really is no concurrent page table access. > > Reported-by: Zach O'Keefe > Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages") > Cc: stable@vger.kernel.org > Signed-off-by: Jann Horn > --- > zokeefe@ pointed out to me that the current code (after my last round of patches) > can hit a lockdep assert by racing, and after staring at it a bit I've > convinced myself that this is a real, preexisting bug. > (I haven't written a reproducer for it though. One way to hit it might be > something along the lines of: > > - set up a process A with a private-file-mapping VMA V1 > - let A fork() to create process B, thereby copying V1 in A to V1' in B > - let B extend the end of V1' > - let B put some anon pages into the extended part of V1' At this point V1' gets it's own ->anon_vma, not connected to V1, right? > - let A map a new private-file-mapping VMA V2 directly behind V1, without > an anon_vma > [race begins here] > - in A's thread 1: begin retract_page_tables() on V2, run through first > ->anon_vma check > - in A's thread 2: run __anon_vma_prepare() on V2 and ensure that it > merges the anon_vma of V1 (which implies V1 and V2 must be mapping the > same file at compatible offsets) > - in B: trigger rmap traversal on anon page in V1' I don't follow the race. rmap on V1' will not reach V1. > mm/khugepaged.c | 14 +++++++++++++- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index 5cb401aa2b9d..0bfed37f3a3b 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -1644,7 +1644,7 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff, > * has higher cost too. It would also probably require locking > * the anon_vma. > */ > - if (vma->anon_vma) { > + if (READ_ONCE(vma->anon_vma)) { > result = SCAN_PAGE_ANON; > goto next; > } This makes perfect sense. At least for readability. But I think false-negative should not lead to bad results. > @@ -1672,6 +1672,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff, > result = SCAN_PTE_MAPPED_HUGEPAGE; > if ((cc->is_khugepaged || is_target) && > mmap_write_trylock(mm)) { > + /* > + * Re-check whether we have an ->anon_vma, because > + * collapse_and_free_pmd() requires that either no > + * ->anon_vma exists or the anon_vma is locked. > + * We already checked ->anon_vma above, but that check > + * is racy because ->anon_vma can be populated under the > + * mmap lock in read mode. > + */ > + if (vma->anon_vma) { > + result = SCAN_PAGE_ANON; > + goto unlock_next; > + } This is totally wrong direction. Or I don't understand the race. At this point we already paid nearly all price of of pagetable retraction. I don't see any correctness reason to stop here, except for the assert. I think lockdep assert in collapse_and_free_pmd() is wrong and has to be dropped. > /* > * When a vma is registered with uffd-wp, we can't > * recycle the pmd pgtable because there can be pte > > base-commit: 7dd4b804e08041ff56c88bdd8da742d14b17ed25 > -- > 2.39.0.314.g84b9a713c41-goog > -- Kiryl Shutsemau / Kirill A. Shutemov