From: Jann Horn
Date: Fri, 6 Jun 2025 14:37:32 +0200
Subject: Re: [PATCH v1] mm: Close theoretical race where stale TLB entries could linger
To: Ryan Roberts
Cc: Andrew Morton, "Liam R. Howlett", Lorenzo Stoakes, David Hildenbrand,
 Vlastimil Babka, Mel Gorman, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org
In-Reply-To: <20250606092809.4194056-1-ryan.roberts@arm.com>

On Fri, Jun 6, 2025 at 11:28 AM Ryan Roberts wrote:
> Commit 3ea277194daa ("mm, mprotect: flush TLB if potentially racing with
> a parallel reclaim leaving stale TLB entries") described a theoretical
> race as such:
>
> """
> Nadav Amit identified a theoritical race between page reclaim and
> mprotect due to TLB flushes being batched outside of the PTL being held.
>
> He described the race as follows:
>
>         CPU0                            CPU1
>         ----                            ----
>                                         user accesses memory using RW PTE
>                                         [PTE now cached in TLB]
>         try_to_unmap_one()
>         ==> ptep_get_and_clear()
>         ==> set_tlb_ubc_flush_pending()
>                                         mprotect(addr, PROT_READ)
>                                         ==> change_pte_range()
>                                         ==> [ PTE non-present - no flush ]
>
>                                         user writes using cached RW PTE
>         ...
>
>         try_to_unmap_flush()
>
> The same type of race exists for reads when protecting for PROT_NONE and
> also exists for operations that can leave an old TLB entry behind such
> as munmap, mremap and madvise.
> """
>
> The solution was to introduce flush_tlb_batched_pending() and call it
> under the PTL from mprotect/madvise/munmap/mremap to complete any
> pending tlb flushes.
>
> However, while madvise_free_pte_range() and
> madvise_cold_or_pageout_pte_range() were both retro-fitted to call
> flush_tlb_batched_pending() immediately after initially acquiring the
> PTL, they both temporarily release the PTL to split a large folio if
> they stumble upon one. In this case, when re-acquiring the PTL,
> flush_tlb_batched_pending() must be called again, but previously it was
> not. Let's fix that.
>
> There are 2 Fixes: tags here: the first is the commit that fixed
> madvise_free_pte_range(). The second is the commit that added
> madvise_cold_or_pageout_pte_range(), which looks like it copy/pasted the
> faulty pattern from madvise_free_pte_range().
>
> This is a theoretical bug discovered during code review.

Yeah, good point. So we could race like this:

CPU 0                                CPU 1
madvise_free_pte_range
  pte_offset_map_lock
  flush_tlb_batched_pending
  pte_unmap_unlock
                                     try_to_unmap_one
                                       get_and_clear_full_ptes
                                       set_tlb_ubc_flush_pending
  pte_offset_map_lock
  [old PTE still cached in TLB]

which is not a security problem for the kernel (a TLB flush will happen
before the page is actually freed) but affects userspace correctness.

(Maybe we could at some point refactor this into tlb_finish_mmu(), and
give tlb_finish_mmu() a boolean parameter for "did we maybe try to
unmap/protect some range of memory"; just like how tlb_finish_mmu()
already does the safety flush against concurrent mmu_gather operations.
Maybe that would make it harder to mess this up?)

> Cc: stable@vger.kernel.org
> Fixes: 3ea277194daa ("mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries")
> Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD")
> Signed-off-by: Ryan Roberts

Reviewed-by: Jann Horn
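For concreteness, the tlb_finish_mmu() idea above might end up looking
something like this. This is purely a sketch, not a real proposal: the
parameter name is invented and the existing internals are only hinted at.

void tlb_finish_mmu(struct mmu_gather *tlb, bool may_have_changed_mappings)
{
	/*
	 * Analogous to the existing safety flush against concurrent
	 * mmu_gather operations: if the caller may have unmapped or
	 * changed protection on some range, force a flush so that a
	 * reclaim flush batched while the PTL was temporarily dropped
	 * cannot leave stale TLB entries behind.
	 */
	if (may_have_changed_mappings || mm_tlb_flush_nested(tlb->mm)) {
		tlb->fullmm = 1;
		tlb->freed_tables = 1;
	}

	tlb_flush_mmu(tlb);
	/* ... rest of the existing teardown ... */
}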