From: Joel Fernandes
Date: Tue, 8 Oct 2024 19:58:08 -0400
Subject: Re: [PATCH] mm/mremap: Fix move_normal_pmd/retract_page_tables race
To: Jann Horn
Cc: akpm@linux-foundation.org, david@redhat.com, linux-mm@kvack.org,
 willy@infradead.org, hughd@google.com, lorenzo.stoakes@oracle.com,
 linux-kernel@vger.kernel.org, stable@vger.kernel.org
References: <20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5ead9631f2ea@google.com>
In-Reply-To: <20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5ead9631f2ea@google.com>

On Mon, Oct 7, 2024 at 5:42 PM Jann Horn wrote:
>
> In mremap(), move_page_tables() looks at the type of the PMD entry and the
> specified address range to figure out by which method the next chunk of
> page table entries should be moved.
> At that point, the mmap_lock is held in write mode, but no rmap locks are
> held yet. For PMD entries that point to page tables and are fully covered
> by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called,
> which first takes rmap locks, then does move_normal_pmd().
> move_normal_pmd() takes the necessary page table locks at source and
> destination, then moves an entire page table from the source to the
> destination.
>
> The problem is: The rmap locks, which protect against concurrent page table
> removal by retract_page_tables() in the THP code, are only taken after the
> PMD entry has been read and it has been decided how to move it.
> So we can race as follows (with two processes that have mappings of the
> same tmpfs file that is stored on a tmpfs mount with huge=advise); note
> that process A accesses page tables through the MM while process B does it
> through the file rmap:
>
>
> process A                      process B
> =========                      =========
> mremap
>   mremap_to
>     move_vma
>       move_page_tables
>         get_old_pmd
>         alloc_new_pmd
>                       *** PREEMPT ***
>                                madvise(MADV_COLLAPSE)
>                                  do_madvise
>                                    madvise_walk_vmas
>                                      madvise_vma_behavior
>                                        madvise_collapse
>                                          hpage_collapse_scan_file
>                                            collapse_file
>                                              retract_page_tables
>                                                i_mmap_lock_read(mapping)
>                                                pmdp_collapse_flush
>                                                i_mmap_unlock_read(mapping)
>         move_pgt_entry(NORMAL_PMD, ...)
>           take_rmap_locks
>           move_normal_pmd
>           drop_rmap_locks
>
> When this happens, move_normal_pmd() can end up creating bogus PMD entries
> in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`.
> The effect depends on arch-specific and machine-specific details; on x86,
> you can end up with physical page 0 mapped as a page table, which is likely
> exploitable for user->kernel privilege escalation.
>
>
> Fix the race by letting process B recheck that the PMD still points to a
> page table after the rmap locks have been taken. Otherwise, we bail and let
> the caller fall back to the PTE-level copying path, which will then bail
> immediately at the pmd_none() check.
>
> Bug reachability: Reaching this bug requires that you can create shmem/file
> THP mappings - anonymous THP uses different code that doesn't zap stuff
> under rmap locks. File THP is gated on an experimental config flag
> (CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need shmem
> THP to hit this bug. As far as I know, getting shmem THP normally requires
> that you can mount your own tmpfs with the right mount flags, which would
> require creating your own user+mount namespace; though I don't know if some
> distros maybe enable shmem THP by default or something like that.

Not to overthink it, but do you have any insight into why copy_vma()
only requires the rmap lock under this condition?

*need_rmap_locks = (new_vma->vm_pgoff <= vma->vm_pgoff);

Could a collapse still occur when need_rmap_locks is false,
potentially triggering the bug you described? My assumption is no, but
I wanted to double-check.

The patch looks good to me overall. I was also curious if
move_normal_pud() would require a similar change, though I'm inclined
to think that path doesn't lead to a bug.

thanks,

 - Joel

>
> Bug impact: This issue can likely be used for user->kernel privilege
> escalation when it is reachable.
>
> Cc: stable@vger.kernel.org
> Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
> Closes: https://project-zero.issues.chromium.org/371047675
> Co-developed-by: David Hildenbrand
> Signed-off-by: Jann Horn
> ---
> @David: please confirm we can add your Signed-off-by to this patch after
> the Co-developed-by.
> (Context: David basically wrote the entire patch except for the commit
> message.)
>
> @akpm: This replaces the previous "[PATCH] mm/mremap: Prevent racing
> change of old pmd type".
> ---
>  mm/mremap.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 24712f8dbb6b..dda09e957a5d 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -238,6 +238,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>  {
>  	spinlock_t *old_ptl, *new_ptl;
>  	struct mm_struct *mm = vma->vm_mm;
> +	bool res = false;
>  	pmd_t pmd;
>
>  	if (!arch_supports_page_table_move())
> @@ -277,19 +278,25 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>  	if (new_ptl != old_ptl)
>  		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
>
> -	/* Clear the pmd */
>  	pmd = *old_pmd;
> +
> +	/* Racing with collapse? */
> +	if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
> +		goto out_unlock;
> +	/* Clear the pmd */
>  	pmd_clear(old_pmd);
> +	res = true;
>
>  	VM_BUG_ON(!pmd_none(*new_pmd));
>
>  	pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
>  	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
> +out_unlock:
>  	if (new_ptl != old_ptl)
>  		spin_unlock(new_ptl);
>  	spin_unlock(old_ptl);
>
> -	return true;
> +	return res;
>  }
>  #else
>  static inline bool move_normal_pmd(struct vm_area_struct *vma,
>
> ---
> base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
> change-id: 20241007-move_normal_pmd-vs-collapse-fix-2-387e9a68c7d6
> --
> Jann Horn
>
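
For reference, the recheck-under-lock pattern that the quoted patch applies
can be sketched in plain userspace C. This is only an illustration under
stated assumptions, not the kernel code: struct slot, lock_slot(),
unlock_slot(), retract_slot() and move_slot() are hypothetical names, a
single atomic_flag spinlock stands in for the rmap and page table locks, and
a NULL pointer plays the role of pmd_none().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PMD slot; NULL plays the role of pmd_none(). */
struct slot {
	void *table;
};

/* Hypothetical stand-in for the rmap / page table locks (a tiny spinlock). */
static atomic_flag slot_lock = ATOMIC_FLAG_INIT;

void lock_slot(void)
{
	while (atomic_flag_test_and_set_explicit(&slot_lock,
						 memory_order_acquire))
		; /* spin until the flag was previously clear */
}

void unlock_slot(void)
{
	atomic_flag_clear_explicit(&slot_lock, memory_order_release);
}

/* Plays the role of retract_page_tables(): clears the slot under the lock. */
void retract_slot(struct slot *s)
{
	lock_slot();
	s->table = NULL;
	unlock_slot();
}

/*
 * Plays the role of move_normal_pmd(). The caller chose this path based on
 * an unlocked read of *src, so the slot is read again once the lock is held.
 * Returns false, leaving both slots untouched, if the table is gone.
 */
bool move_slot(struct slot *src, struct slot *dst)
{
	bool res = false;
	void *table;

	lock_slot();
	table = src->table;
	if (table == NULL)		/* raced with retract_slot() */
		goto out_unlock;

	assert(dst->table == NULL);	/* like VM_BUG_ON(!pmd_none(*new_pmd)) */
	src->table = NULL;
	dst->table = table;
	res = true;
out_unlock:
	unlock_slot();
	return res;
}
```

As in the patch, a false return is the signal for the caller to fall back to
a slower path instead of installing a stale entry at the destination.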