Date: Tue, 3 Jun 2025 21:32:25 +0100
From: Pedro Falcato
To: Jann Horn
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
    Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
    linux-mm@kvack.org, Peter Xu, linux-kernel@vger.kernel.org,
    stable@vger.kernel.org
Subject: Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
References: <20250603-fork-tearing-v1-0-a7f64b7cfc96@google.com>
    <20250603-fork-tearing-v1-1-a7f64b7cfc96@google.com>
In-Reply-To: <20250603-fork-tearing-v1-1-a7f64b7cfc96@google.com>

On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
> When fork() encounters possibly-pinned pages, those pages are immediately
> copied instead of just marking PTEs to make CoW happen later. If the parent
> is multithreaded, this can cause the child to see memory contents that are
> inconsistent in multiple ways:
>
> 1. We are copying the contents of a page with a memcpy() while userspace
>    may be writing to it. This can cause the resulting data in the child to
>    be inconsistent.

This is an interesting problem, but we'll get to it later.

> 2. After we've copied this page, future writes to other pages may
>    continue to be visible to the child while future writes to this page are
>    no longer visible to the child.
>

Yes, and this is not fixable. It's also a problem for the regular
write-protect PTE path, where inevitably only a part of the address space
will be write-protected at any given moment. This would only be fixable if,
e.g., we suspended every thread on a multi-threaded fork.

> This means the child could theoretically see incoherent states where
> allocator freelists point to objects that are actually in use or stuff like
> that. A mitigating factor is that, unless userspace already has a deadlock
> bug, userspace can pretty much only observe such issues when fancy lockless
> data structures are used (because if another thread was in the middle of
> mutating data during fork() and the post-fork child tried to take the mutex
> protecting that data, it might wait forever).
>

Ok, so the issue here is that atomics + memcpy (or our kernel variants) will
possibly observe tearing. This is indeed a problem, and POSIX doesn't _really_
tell us anything about this.
_However_:

POSIX says:
> Any locks held by any thread in the calling process that have been set to be process-shared
> shall not be held by the child process. For locks held by any thread in the calling process
> that have not been set to be process-shared, any attempt by the child process to perform
> any operation on the lock results in undefined behavior (regardless of whether the calling
> process is single-threaded or multi-threaded).

The interesting bit here is "For locks held by any thread [...] any attempt by
the child [...] results in UB". I don't think it's entirely far-fetched to say
the spirit of the law is that atomics may also be UB (just like a lock[1] that
was held by a separate thread, then unlocked mid-concurrent-fork, is in a UB
state).

In any case, I think the bottom line is that fork memory snapshot coherency is
a fallacy. It's really impossible to reach without adding insane constraints
(like the aforementioned thread suspend + resume). It's not even possible
when going through the normal write-protect paths that have been conceptually
stable since the BSDs in the 1980s (due to the write-protect-a-page-at-a-time
problem).

Thus, personally I don't think this is worth fixing.

[1] This (at least in theory) covers every lock, so it also encompasses
pthread spinlocks

-- 
Pedro
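
To make the write-protect-a-page-at-a-time problem concrete, here is a rough
toy model. It is not kernel code: it is a deterministic, single-threaded
simulation, and every name in it (PAGE_SIZE, writer_update, snapshot_page,
is_coherent, and the two fork_* helpers) is hypothetical. It assumes a parent
with two "pages", and a writer thread that always stores the same version
number to both. The fork-like walk handles one page at a time, so if the
writer runs between the two steps, the child ends up with a mix of versions.

```python
PAGE_SIZE = 8  # toy page size, purely illustrative


def writer_update(parent, version):
    """Model a parent thread storing the same version to every page."""
    for page in parent:
        page[:] = bytes([version]) * PAGE_SIZE


def snapshot_page(parent, child, i):
    """Model the fork walk handling a single page (copy or write-protect)."""
    child[i][:] = parent[i]


def is_coherent(child):
    """A snapshot is coherent here if all pages hold the same version."""
    return len({bytes(page) for page in child}) == 1


def new_address_spaces():
    parent = [bytearray(PAGE_SIZE) for _ in range(2)]
    child = [bytearray(PAGE_SIZE) for _ in range(2)]
    return parent, child


def fork_with_running_writer():
    """The writer keeps running between the per-page steps of the walk."""
    parent, child = new_address_spaces()
    writer_update(parent, 1)
    snapshot_page(parent, child, 0)  # child page 0 holds version 1
    writer_update(parent, 2)         # parent thread runs mid-fork
    snapshot_page(parent, child, 1)  # child page 1 holds version 2
    return child


def fork_with_suspended_writer():
    """The writer is stopped for the whole walk (suspend + resume)."""
    parent, child = new_address_spaces()
    writer_update(parent, 1)
    snapshot_page(parent, child, 0)
    snapshot_page(parent, child, 1)
    writer_update(parent, 2)         # resumes only after the walk is done
    return child


if __name__ == "__main__":
    print("running writer coherent:  ", is_coherent(fork_with_running_writer()))
    print("suspended writer coherent:", is_coherent(fork_with_suspended_writer()))
```

In this model only the suspended-writer variant yields a coherent child, which
is the constraint argued above to be unreasonable. The model ignores tearing
within a single page entirely.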