From: Jann Horn
Date: Fri, 6 Jun 2025 14:55:25 +0200
Subject: Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
To: Vlastimil Babka
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett", Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm@kvack.org, Pedro Falcato, Peter Xu, linux-kernel@vger.kernel.org, stable@vger.kernel.org
References: <20250603-fork-tearing-v1-0-a7f64b7cfc96@google.com> <20250603-fork-tearing-v1-1-a7f64b7cfc96@google.com>

On Thu, Jun 5, 2025 at 9:33 AM Vlastimil Babka wrote:
> On 6/3/25 20:21, Jann Horn wrote:
> > When fork() encounters possibly-pinned pages, those pages are immediately
> > copied instead of just marking PTEs to make CoW happen later. If the parent
> > is multithreaded, this can cause the child to see memory contents that are
> > inconsistent in multiple ways:
> >
> > 1. We are copying the contents of a page with a memcpy() while userspace
> >    may be writing to it. This can cause the resulting data in the child to
> >    be inconsistent.
> > 2. After we've copied this page, future writes to other pages may
> >    continue to be visible to the child while future writes to this page are
> >    no longer visible to the child.
> >
> > This means the child could theoretically see incoherent states where
> > allocator freelists point to objects that are actually in use or stuff like
> > that. A mitigating factor is that, unless userspace already has a deadlock
> > bug, userspace can pretty much only observe such issues when fancy lockless
> > data structures are used (because if another thread was in the middle of
> > mutating data during fork() and the post-fork child tried to take the mutex
> > protecting that data, it might wait forever).
> >
> > On top of that, this issue is only observable when pages are either
> > DMA-pinned or appear false-positive-DMA-pinned due to a page having >=1024
> > references and the parent process having used DMA-pinning at least once
> > before.
>
> Seems the changelog is missing the part describing what it's doing
> to fix the issue? Some details are not immediately obvious (the writing
> threads become blocked in page fault) as the conversation has shown.

I tried to document this in patch 2/2, where I wrote this (though I guess
I should maybe make it more verbose and not just say "subsequent writes
are delayed until mmap_write_unlock()"):

+ * - Before mmap_write_unlock(), a TLB flush ensures that parent threads can't
+ *   write to copy-on-write pages anymore.
+ * - Before dup_mmap() copies page contents (which happens rarely), the
+ *   parent's PTE for the page is made read-only and a TLB flush is issued, so
+ *   subsequent writes are delayed until mmap_write_unlock().

But I guess this way makes it hard to review patch 1/2 individually.
Should I just squash the two patches together, and then write in the
commit message "see the comment blocks I'm adding for the fix approach"?
Or is there value in repeating the explanation in the commit message?

> > Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Jann Horn
>
> Given how the fix seems to be localized to the already rare slowpath and
> doesn't require us to pessimize every trivial fork(), it seems reasonable to
> me even if we don't have a concrete example of sane code in the wild that's
> broken by the current behavior, so:
>
> Acked-by: Vlastimil Babka

Thanks!