From: Jann Horn
Date: Tue, 03 Jun 2025 20:21:03 +0200
Subject: [PATCH 2/2] mm/memory: Document how we make a coherent memory snapshot
Message-Id: <20250603-fork-tearing-v1-2-a7f64b7cfc96@google.com>
References: <20250603-fork-tearing-v1-0-a7f64b7cfc96@google.com>
In-Reply-To: <20250603-fork-tearing-v1-0-a7f64b7cfc96@google.com>
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
    "Liam R. Howlett", Vlastimil Babka, Mike Rapoport,
    Suren Baghdasaryan, Michal Hocko, linux-mm@kvack.org
Cc: Peter Xu, linux-kernel@vger.kernel.org, Jann Horn

It is not currently documented that the child of fork() should receive a
coherent snapshot of the parent's memory, or how we get such a snapshot.
Add a comment block to explain this.

Signed-off-by: Jann Horn
---
 kernel/fork.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/kernel/fork.c b/kernel/fork.c
index 85afccfdf3b1..f78f5df596a9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -604,6 +604,40 @@ static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
 }
 
 #ifdef CONFIG_MMU
+/*
+ * Anonymous memory inherited by the child MM must, on success, contain a
+ * coherent snapshot of corresponding anonymous memory in the parent MM.
+ * (An exception are anonymous memory regions which are concurrently written
+ * by kernel code or hardware devices through page references obtained via GUP.)
+ * We effectively snapshot the parent's memory just before
+ * mmap_write_unlock(oldmm); any writes after that point are invisible to the
+ * child, while attempted writes before that point are either visible to the
+ * child or delayed until after mmap_write_unlock(oldmm).
+ *
+ * To make that work while only needing a single pass through the parent's VMA
+ * tree and page tables, we follow these rules:
+ *
+ * - Before mmap_write_unlock(), a TLB flush ensures that parent threads can't
+ *   write to copy-on-write pages anymore.
+ * - Before dup_mmap() copies page contents (which happens rarely), the
+ *   parent's PTE for the page is made read-only and a TLB flush is issued, so
+ *   subsequent writes are delayed until mmap_write_unlock().
+ * - Before dup_mmap() starts walking the page tables of a VMA in the parent,
+ *   the VMA is write-locked to ensure that the parent can't perform writes
+ *   that won't be visible in the child before mmap_write_unlock():
+ *   a) through concurrent copy-on-write handling
+ *   b) by upgrading read-only PTEs to writable
+ *
+ * Not following these rules, and giving the child a torn copy of the parent's
+ * memory contents where different segments come from different points in time,
+ * would likely _mostly_ work:
+ * Any memory to which a concurrent parent thread could be writing under a lock
+ * can't be accessed from the child without risking deadlocks (since the child
+ * might inherit the lock in a locked state, in which case the lock will stay
+ * locked forever in the child).
+ * But if userspace is using trylock or lock-free algorithms, providing a torn
+ * view of memory could break the child.
+ */
 static __latent_entropy int dup_mmap(struct mm_struct *mm,
 					struct mm_struct *oldmm)
 {

-- 
2.49.0.1204.g71687c7c1d-goog
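The coherence guarantee documented in the comment can be probed from userspace with a lock-free invariant of the kind the comment's last paragraph mentions. What follows is a hypothetical test sketch, not part of the patch. The names `struct shared_counters`, `writer()` and `count_torn_snapshots()` were invented for illustration. The sketch assumes Linux with POSIX threads; on glibc older than 2.34, link with `-pthread`.

A writer thread repeatedly stores the same counter value first to `x` and then, with release ordering, to `y`. The two counters sit on separate pages of a private anonymous mapping. At any single point in time `x >= y` holds, so a child that receives a coherent snapshot should never observe `y > x`. A torn copy, with the two pages captured at different times, could violate that.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical test layout: two counters on separate anonymous pages. */
struct shared_counters {
	_Atomic long *x;   /* first page: always written first */
	_Atomic long *y;   /* second page: always written second */
	atomic_bool stop;  /* tells the writer thread to exit */
};

/*
 * Writer thread: stores i to x, then i to y with release ordering, so the
 * new x is visible before the new y and x >= y holds at every point in time.
 */
static void *writer(void *arg)
{
	struct shared_counters *c = arg;

	for (long i = 1; !atomic_load_explicit(&c->stop, memory_order_relaxed); i++) {
		atomic_store_explicit(c->x, i, memory_order_relaxed);
		atomic_store_explicit(c->y, i, memory_order_release);
	}
	return NULL;
}

/*
 * Forks `iterations` children while writer() runs and returns how many of
 * them saw y > x (a torn snapshot), or -1 if setup, fork() or waitpid() fails.
 */
int count_torn_snapshots(int iterations)
{
	long page = sysconf(_SC_PAGESIZE);
	if (page <= 0)
		return -1;

	size_t len = 2 * (size_t)page;
	char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	struct shared_counters c;
	c.x = (_Atomic long *)mem;
	c.y = (_Atomic long *)(mem + page);
	atomic_init(c.x, 0);
	atomic_init(c.y, 0);
	atomic_init(&c.stop, false);

	pthread_t thread;
	if (pthread_create(&thread, NULL, writer, &c) != 0) {
		munmap(mem, len);
		return -1;
	}

	int torn = 0;
	for (int i = 0; i < iterations; i++) {
		pid_t pid = fork();
		if (pid == 0) {
			/* Child: only the forking thread exists, so nothing writes
			 * x or y here anymore and relaxed loads suffice. */
			long xs = atomic_load_explicit(c.x, memory_order_relaxed);
			long ys = atomic_load_explicit(c.y, memory_order_relaxed);
			_exit(xs >= ys ? 0 : 1);
		}
		int status;
		if (pid < 0 || waitpid(pid, &status, 0) != pid) {
			torn = -1;
			break;
		}
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			torn++;
	}

	atomic_store_explicit(&c.stop, true, memory_order_relaxed);
	pthread_join(thread, NULL);
	munmap(mem, len);
	return torn;
}
```

A caller would run something like `count_torn_snapshots(200)` and expect 0 on a kernel that provides the snapshot described in the comment. A zero result is only weak evidence, because the race window is narrow. A positive result would mean some child received a torn copy.

The counters are deliberately placed on different pages. A torn copy, as the comment describes it, has different segments taken from different points in time. Two counters on the same page would be much less likely to expose that.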