From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jann Horn
Date: Tue, 03 Jun 2025 20:21:02 +0200
Subject: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <20250603-fork-tearing-v1-1-a7f64b7cfc96@google.com>
References: <20250603-fork-tearing-v1-0-a7f64b7cfc96@google.com>
In-Reply-To: <20250603-fork-tearing-v1-0-a7f64b7cfc96@google.com>
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
 "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, linux-mm@kvack.org
Cc: Peter Xu, linux-kernel@vger.kernel.org, Jann Horn,
 stable@vger.kernel.org
X-Mailer: b4 0.15-dev

When fork() encounters possibly-pinned pages, those pages are immediately
copied instead of just marking PTEs to make CoW happen later. If the parent
is multithreaded, this can cause the child to see memory contents that are
inconsistent in multiple ways:

1. We are copying the contents of a page with a memcpy() while userspace
   may be writing to it. This can cause the resulting data in the child to
   be inconsistent.
2. After we've copied this page, future writes to other pages may
   continue to be visible to the child while future writes to this page are
   no longer visible to the child.

This means the child could theoretically see incoherent states where
allocator freelists point to objects that are actually in use or stuff like
that. A mitigating factor is that, unless userspace already has a deadlock
bug, userspace can pretty much only observe such issues when fancy lockless
data structures are used (because if another thread was in the middle of
mutating data during fork() and the post-fork child tried to take the mutex
protecting that data, it might wait forever).

On top of that, this issue is only observable when pages are either
DMA-pinned or appear false-positive-DMA-pinned due to a page having >=1024
references and the parent process having used DMA-pinning at least once
before.

Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn
---
 mm/memory.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 49199410805c..b406dfda976b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -917,7 +917,25 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 	/*
 	 * We have a prealloc page, all good! Take it
 	 * over and copy the page & arm it.
+	 *
+	 * One nasty aspect is that we could be in a multithreaded process or
+	 * such, where another thread is in the middle of writing to memory
+	 * while this thread is forking. As long as we're just marking PTEs as
+	 * read-only to make copy-on-write happen *later*, that's easy; we just
+	 * need to do a single TLB flush before dropping the mmap/VMA locks, and
+	 * that's enough to guarantee that the child gets a coherent snapshot of
+	 * memory.
+	 * But here, where we're doing an immediate copy, we must ensure that
+	 * threads in the parent process can no longer write into the page being
+	 * copied until we're done forking.
+	 * This means that we still need to mark the source PTE as read-only,
+	 * with an immediate TLB flush.
+	 * (To make the source PTE writable again after fork() is done, we can
+	 * rely on the page fault handler to do that lazily, thanks to
+	 * PageAnonExclusive().)
 	 */
+	ptep_set_wrprotect(src_vma->vm_mm, addr, src_pte);
+	flush_tlb_page(src_vma, addr);
 
 	if (copy_mc_user_highpage(&new_folio->page, page, addr, src_vma))
 		return -EHWPOISON;
-- 
2.49.0.1204.g71687c7c1d-goog
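
For readers who want to see the ordering argument in isolation, here is a
minimal userspace sketch. It is not kernel code and not part of the patch:
struct model_page, the model_* helpers and MODEL_PAGE_SIZE are all
hypothetical names invented for illustration, and the "concurrent" parent
write is simulated deterministically between two halves of the copy rather
than run on a real second thread. It only models case 1 above (a write
landing in the middle of the copy) and shows why write-protecting the
source before copying avoids a torn snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64	/* hypothetical tiny "page" for this model */

/* Hypothetical stand-in for a parent page plus its PTE write bit. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
	bool writable;
};

/*
 * Models a parent thread updating two bytes that must stay equal.
 * Returns false if the write is blocked; in the real kernel the thread
 * would take a write fault instead.
 */
static bool model_parent_write(struct model_page *pg, unsigned char val)
{
	if (!pg->writable)
		return false;
	pg->data[0] = val;
	pg->data[MODEL_PAGE_SIZE - 1] = val;
	return true;
}

/*
 * Copies the page in two halves with a parent write attempted in between.
 * With wrprotect_first set, the page is made read-only before the copy,
 * which is the ordering this model borrows from the patch.
 */
static void model_fork_copy(struct model_page *src, unsigned char *dst,
			    bool wrprotect_first, unsigned char racing_val)
{
	size_t half = MODEL_PAGE_SIZE / 2;

	assert(src && dst);

	if (wrprotect_first)
		src->writable = false;

	memcpy(dst, src->data, half);
	model_parent_write(src, racing_val);	/* simulated racing writer */
	memcpy(dst + half, src->data + half, MODEL_PAGE_SIZE - half);
}

/* In this model a snapshot is coherent if both marker bytes agree. */
static bool model_snapshot_coherent(const unsigned char *dst)
{
	return dst[0] == dst[MODEL_PAGE_SIZE - 1];
}
```

Calling model_fork_copy() with wrprotect_first == false on a page whose
marker bytes were previously written yields a copy for which
model_snapshot_coherent() returns false; with wrprotect_first == true the
racing write is rejected and the copy stays coherent. The model leaves out
the TLB flush, which in the real code is what makes the write protection
take effect on other CPUs.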