From: Jann Horn
Date: Tue, 3 Jun 2025 21:09:14 +0200
Subject: Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
To: David Hildenbrand
Cc: Matthew Wilcox, Andrew Morton, Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm@kvack.org, Peter Xu, linux-kernel@vger.kernel.org, stable@vger.kernel.org

On Tue, Jun 3, 2025 at 8:37 PM David Hildenbrand wrote:
> On 03.06.25 20:29, Matthew Wilcox wrote:
> > On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
> >> When fork() encounters possibly-pinned pages, those pages are immediately
> >> copied instead of just marking PTEs to make CoW happen later. If the parent
> >> is multithreaded, this can cause the child to see memory contents that are
> >> inconsistent in multiple ways:
> >>
> >> 1. We are copying the contents of a page with a memcpy() while userspace
> >>    may be writing to it. This can cause the resulting data in the child to
> >>    be inconsistent.
> >> 2. After we've copied this page, future writes to other pages may
> >>    continue to be visible to the child while future writes to this page are
> >>    no longer visible to the child.
> >>
> >> This means the child could theoretically see incoherent states where
> >> allocator freelists point to objects that are actually in use or stuff like
> >> that. A mitigating factor is that, unless userspace already has a deadlock
> >> bug, userspace can pretty much only observe such issues when fancy lockless
> >> data structures are used (because if another thread was in the middle of
> >> mutating data during fork() and the post-fork child tried to take the mutex
> >> protecting that data, it might wait forever).
> >
> > Um, OK, but isn't that expected behaviour? POSIX says:
> >
> > : A process shall be created with a single thread. If a multi-threaded
> > : process calls fork(), the new process shall contain a replica of the
> > : calling thread and its entire address space, possibly including the
> > : states of mutexes and other resources. Consequently, the application
> > : shall ensure that the child process only executes async-signal-safe
> > : operations until such time as one of the exec functions is successful.
> >
> > It's always been my understanding that you really, really shouldn't call
> > fork() from a multithreaded process.
>
> I have the same recollection, but rather because of concurrent O_DIRECT
> and locking (pthread_atfork ...).
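A minimal userspace sketch (not part of the thread) of the POSIX rule Matthew quotes: a hypothetical worker thread repeatedly takes a hypothetical busy_lock and calls malloc(), the main thread forks, and the child sticks to async-signal-safe calls (write() and _exit()) because only a replica of the forking thread exists in the child. It assumes a POSIX system with pthreads and C11 <stdatomic.h>; the names worker, busy_lock and fork_from_multithreaded_process are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared state: a lock the worker holds on and off while it
 * also exercises malloc(). */
static pthread_mutex_t busy_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int stop_worker;

static void *worker(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_worker)) {
		pthread_mutex_lock(&busy_lock);
		free(malloc(64)); /* allocator state changes while busy_lock is held */
		pthread_mutex_unlock(&busy_lock);
	}
	return NULL;
}

/* Hypothetical helper: fork() while `worker` is running.
 * Returns the child's exit status, or -1 on error. */
static int fork_from_multithreaded_process(void)
{
	pthread_t tid;
	pid_t pid;
	int status, rc;

	atomic_store(&stop_worker, 0);
	if (pthread_create(&tid, NULL, worker, NULL) != 0)
		return -1;

	pid = fork();
	if (pid == 0) {
		/* Child: only the forking thread exists here. busy_lock may be
		 * held by a worker that was not copied, so it is off limits, as
		 * is malloc() as far as POSIX is concerned. write() and _exit()
		 * are async-signal-safe; exit() is not. */
		static const char msg[] = "child: async-signal-safe calls only\n";
		ssize_t n = write(STDOUT_FILENO, msg, sizeof(msg) - 1);
		_exit(n < 0 ? 1 : 0);
	}

	atomic_store(&stop_worker, 1);
	rc = pthread_join(tid, NULL);
	assert(rc == 0);
	(void)rc;
	if (pid < 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Called from a program with a valid stdout, fork_from_multithreaded_process() should return 0. Using _exit() rather than exit() matters here: exit() is not async-signal-safe, since it runs atexit handlers and flushes stdio buffers inherited from the parent.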
>
> Using the allocator above example: what makes sure that no other thread
> is halfway through modifying allocator state? You really have to sync
> somehow before calling fork() -- e.g., grabbing allocator locks in
> pthread_atfork().

Yeah, like what glibc does for its malloc implementation to prevent
allocator calls from racing with fork(), so that malloc() keeps working
in the child after fork(), even though POSIX says that the libc doesn't
have to guarantee that.

> For Linux we document in the man page
>
> "After a fork() in a multithreaded program, the child can safely call
> only async-signal-safe functions (see signal-safety(7)) until such time
> as it calls execve(2)."
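A minimal sketch (not from the thread, and not glibc's actual malloc code) of the pthread_atfork() approach David describes: a hypothetical pool allocator registers a prepare handler that takes its lock just before fork(), plus parent and child handlers that release it afterwards, so no thread can be halfway through updating the free list when the address space is copied. The names pool_*, hammer and fork_with_atfork_handlers are hypothetical. Note that the child calling pool_alloc(), which takes a mutex, goes beyond what POSIX guarantees after forking a multithreaded process; it only works because of the handlers, in the same spirit as glibc's malloc.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define POOL_BLOCKS 16

/* Hypothetical toy allocator: a free list protected by a single mutex. */
struct block { struct block *next; };

static struct block pool[POOL_BLOCKS];
static struct block *free_list;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t pool_once = PTHREAD_ONCE_INIT;

/* prepare: runs in the forking thread just before fork(). Holding the lock
 * guarantees no other thread is halfway through updating free_list. */
static void pool_prepare(void) { pthread_mutex_lock(&pool_lock); }

/* parent and child: run after fork() returns and release the lock. */
static void pool_release(void) { pthread_mutex_unlock(&pool_lock); }

static void pool_init(void)
{
	for (size_t i = 0; i + 1 < POOL_BLOCKS; i++)
		pool[i].next = &pool[i + 1];
	pool[POOL_BLOCKS - 1].next = NULL;
	free_list = &pool[0];
	int rc = pthread_atfork(pool_prepare, pool_release, pool_release);
	assert(rc == 0);
	(void)rc;
}

static void *pool_alloc(void)
{
	pthread_mutex_lock(&pool_lock);
	struct block *b = free_list;
	if (b)
		free_list = b->next;
	pthread_mutex_unlock(&pool_lock);
	return b;
}

static void pool_free(void *p)
{
	struct block *b = p;
	pthread_mutex_lock(&pool_lock);
	b->next = free_list;
	free_list = b;
	pthread_mutex_unlock(&pool_lock);
}

static atomic_int stop_hammer;

/* Hypothetical worker that keeps the pool busy. */
static void *hammer(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_hammer)) {
		void *p = pool_alloc();
		if (p)
			pool_free(p);
	}
	return NULL;
}

/* Returns the child's exit status (0 if it could allocate), or -1. */
static int fork_with_atfork_handlers(void)
{
	pthread_t tid;
	pid_t pid;
	int status;

	/* Register the handlers before any other thread starts. */
	pthread_once(&pool_once, pool_init);
	atomic_store(&stop_hammer, 0);
	if (pthread_create(&tid, NULL, hammer, NULL) != 0)
		return -1;

	pid = fork(); /* pool_prepare runs first, so free_list is consistent */
	if (pid == 0) {
		/* pool_release has already unlocked pool_lock in the child. */
		_exit(pool_alloc() != NULL ? 0 : 1);
	}

	atomic_store(&stop_hammer, 1);
	pthread_join(tid, NULL);
	if (pid < 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Without the handlers, the child's pool_alloc() could block forever on a pool_lock held by a thread that does not exist in the child. Even with them, a block the worker had allocated at fork time simply stays allocated in the child; the handlers only guarantee a consistent free list, not a quiescent application.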