From: Jann Horn
Date: Tue, 3 Jun 2025 21:03:48 +0200
Subject: Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
To: Matthew Wilcox
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 linux-mm@kvack.org, Peter Xu, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org
References: <20250603-fork-tearing-v1-0-a7f64b7cfc96@google.com>
 <20250603-fork-tearing-v1-1-a7f64b7cfc96@google.com>

On Tue, Jun 3, 2025 at 8:29 PM Matthew Wilcox wrote:
> On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
> > When fork()
> > encounters possibly-pinned pages, those pages are immediately
> > copied instead of just marking PTEs to make CoW happen later. If the parent
> > is multithreaded, this can cause the child to see memory contents that are
> > inconsistent in multiple ways:
> >
> > 1. We are copying the contents of a page with a memcpy() while userspace
> >    may be writing to it. This can cause the resulting data in the child to
> >    be inconsistent.
> > 2. After we've copied this page, future writes to other pages may
> >    continue to be visible to the child while future writes to this page are
> >    no longer visible to the child.
> >
> > This means the child could theoretically see incoherent states where
> > allocator freelists point to objects that are actually in use or stuff like
> > that. A mitigating factor is that, unless userspace already has a deadlock
> > bug, userspace can pretty much only observe such issues when fancy lockless
> > data structures are used (because if another thread was in the middle of
> > mutating data during fork() and the post-fork child tried to take the mutex
> > protecting that data, it might wait forever).
>
> Um, OK, but isn't that expected behaviour?  POSIX says:

I don't think it is expected behavior that locklessly-updated data
structures in application code could break.

> : A process shall be created with a single thread. If a multi-threaded
> : process calls fork(), the new process shall contain a replica of the
> : calling thread and its entire address space, possibly including the
> : states of mutexes and other resources. Consequently, the application
> : shall ensure that the child process only executes async-signal-safe
> : operations until such time as one of the exec functions is successful.

I think that is only talking about ways in which you can interact with
libc, and does not mean that application code couldn't access its own
lockless data structures or such.
Though admittedly that is a fairly theoretical point, since in practice
the most likely place where you'd encounter this kind of issue would be
in an allocator implementation or such.

> It's always been my understanding that you really, really shouldn't call
> fork() from a multithreaded process.