From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42665C0015E for ; Tue, 15 Aug 2023 14:20:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 91A0C94001C; Tue, 15 Aug 2023 10:20:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8CA668D0001; Tue, 15 Aug 2023 10:20:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7917594001C; Tue, 15 Aug 2023 10:20:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 6B00C8D0001 for ; Tue, 15 Aug 2023 10:20:19 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id EF2151C9C2B for ; Tue, 15 Aug 2023 14:20:18 +0000 (UTC) X-FDA: 81126548916.17.C5F8279 Received: from mail-wm1-f50.google.com (mail-wm1-f50.google.com [209.85.128.50]) by imf12.hostedemail.com (Postfix) with ESMTP id 1D4064001D for ; Tue, 15 Aug 2023 14:20:16 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=BVXUKbGS; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf12.hostedemail.com: domain of jannh@google.com designates 209.85.128.50 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1692109217; a=rsa-sha256; cv=none; b=dllYsXZf0C0yX9Xv5Z21wYAyP+/cxpDLlYxe4c1mkI5Hhdey7mSOtVTbPSRCzzEA5t3SpS +fD/cfTA+31NqJxArDt793ubA2OsY8on0gKJS6hfL4tWk1zFfGqclSzPhSWJl2h8SLHHoW 4fErxX4noZ6UdtgSyniiMbhUmxXxItM= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=BVXUKbGS; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf12.hostedemail.com: domain of jannh@google.com designates 209.85.128.50 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1692109217; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Yfq66RGthh/jDqLnx2XGtN7DUux5G0QrMF1Og9o03N4=; b=Vz/OIDqWFy/hgMkj/XF14oeTgJj0WxBRO85EESIIGwQkp+yAlSLbuhvlxAVcQHIXboFJHf yRsWNh6T0naPX1M1vMsJMyw1zGTB5G8GLiSpBwvEWgvON3zFJCVtx2aBQOys9EwfRXDg6+ M4sG40Vgc2GGZMPTDAVQ8renPE1Bks0= Received: by mail-wm1-f50.google.com with SMTP id 5b1f17b1804b1-3fe2a116565so74465e9.1 for ; Tue, 15 Aug 2023 07:20:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1692109215; x=1692714015; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Yfq66RGthh/jDqLnx2XGtN7DUux5G0QrMF1Og9o03N4=; b=BVXUKbGSM5V7XRh64ju6Qv9qJH+89VMWUSArDhe4Bpmz+kwCgWOoAOtk5qvjRUF9K4 LE+kaCv9g/Zt3nm2XEb8qZF87hcQV7Y2wWg48nz2cgMPPU9Nuo16QQTMr19QnBIcuLjc XWx1xHtaWaBxZFKSd3JN7F9SmjnGxkiEIGKfefemy5RLBRq9f6+SfDGpCWSudcyiOhLk ojDMyu/5bjL3HGNTDL0K0JoLPBIbA0v+lfVKsD22174SivhjsZaGDzOyWdPn/H/Jr2sj 4nli5743tvc0w52S0izo1ZpHd61Wn9uq5AB0UI21YvYfYCChc/le0LVK1ILfrbsKDjBv TTyQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692109215; x=1692714015; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Yfq66RGthh/jDqLnx2XGtN7DUux5G0QrMF1Og9o03N4=; b=WgYCmCeMZiDYZS6vUDysLhxEdTH8+X6LkCwIpIgATAIJKQNa3ZLBOJw1fuvSZ6nA4x Xnn2P4vH5ePfIdE9WpTPt9lNol+am1jnxp1UUXr9FOkEBmgGPZzVdhSMU9n0Qv9USdvn z7Bob9y50Wqfas0ovTWk7nh+TISQNFerv+YFEdWbbQRqslF8gEBSGz2TaD/g/KAlAhAQ MqkABl3suCZTU77fl1sAlzdYYPDwW3sxYwPiJ22HH/ITBM36u+MajnXOxfPyrMcyI1Uf 4ut9WpmMItZ0N1mv/G8Y7kU8kklXtYEP2KAlQDER1+VU2DRQB9PgfpBn/GasVLA4bdIs tLng== X-Gm-Message-State: AOJu0YywK7GVJQl8TrJQCaimtTJkmxdWTJEkYaGMlf1X+epgEMHvBWf4 UeBSTxjKfEEPgmitPYyDj5nP0jCBWPcRc7Jvo1usgg== X-Google-Smtp-Source: AGHT+IHaqR5H5RDLoB3/4IaWducuvBcmXayHOupbcGnkyyMS+481wUOHO1Qsp00nIl/NatF0z0rPhJZD9YUumpwsljA= X-Received: by 2002:a05:600c:1d8e:b0:3fe:5ec3:447f with SMTP id p14-20020a05600c1d8e00b003fe5ec3447fmr391880wms.1.1692109215503; Tue, 15 Aug 2023 07:20:15 -0700 (PDT) MIME-Version: 1.0 References: <20230724183157.3939892-1-Liam.Howlett@oracle.com> <20230724183157.3939892-16-Liam.Howlett@oracle.com> <20230814191835.jzj7ryvhi6dqwruy@revolver> <20230815072907.fsvetn4dzohgt2z5@revolver> In-Reply-To: <20230815072907.fsvetn4dzohgt2z5@revolver> From: Jann Horn Date: Tue, 15 Aug 2023 16:19:38 +0200 Message-ID: Subject: Re: [PATCH v3 15/15] mm/mmap: Change vma iteration order in do_vmi_align_munmap() To: "Liam R. Howlett" , Jann Horn , Andrew Morton , Suren Baghdasaryan , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 1D4064001D X-Stat-Signature: xmyai79w13ebbm8e564dbtmfdzr64xuf X-HE-Tag: 1692109216-508018 X-HE-Meta: U2FsdGVkX1920dsunMAgkHxfDiZ/c2QpJHIJdWtGOCRODRv1IFIrD3Ni/0KaN60E0RK+/IzVMVkLyEF3Z62wapGgK3wb507mVvXmOIJCR2Aa61YIO79pQAipsxKYi0ZkiLWg5qeCK0X34Pv1q4i4neMWIcu6A/gpNhNPNVn5GUIOKKZFgAeQPLGfoCL0K0071NuMHerttsiUjaaSm86+pknaaZc3nnY/VJXnW2UVUoMMBk4aJdiYwOxs63Wf7O+CHYM6mosY4JdPeWHgvsNhVzhSm/CLkTC0d0ySPMBYQzTwGbAmn+B8G0viEU+HznFV413Ue0dGtGQvpVSHRNNvxRtrPPUDINlZtRqr2cKuoGGXqjdfU4JNrDMVMMqqhzqcpx6XiPzdHhkIjvgDcqM0xc0wXdziD9lyNNgjdQPXMfbB8YNf+yta0li57We6fNLfyKbVgCuhV3e/efjerI8ZHgJG+7WRsKeXLGdbGQn2k+gaYNwnYBa6Kr3Zmo2iTykmdNY0WL9j926l2agrJHtxomv8SepCxpZn0Kwf3V93HvYkFUSy/XyJSYNoXbC1Fi3txIVmIRJxYBUhHSh2n+VjKMT+vf25B6nYA6w0Bqp+V25elk7ghpn1yJffz2R4L9YHu+7hoeXhzmUyz2fRAJlGztJCEQQncVLP5V1i8P9iRw704FJwL3Wbm3TuwOthRdWxFrJbMW9RzoWnWFZbhoHIkDjhhf6q3rRh+9wt7YDUMLIP/yEcRT9/wotvz2fdHQlBgJwAo/roDtYa1Escq1sciqR4pkxUYjdIDnRzZBFqIZBajY6Yjozom7Cmf2JPrMp/7eLqp27BO2TQwN/Fzsifoa2ylzNyQVmbrggIxTeiUvj6Q/iPbh7D4boc27ci/ySqyAdxONBLu45wGpyDz5tsODHAOJzUtDiTZhVcwOT7N9zP8CuBtp9E6wv2x/sZyqrQ/GIKfFkOgwSw5EKT0LW cmI/GPHu VqHGD36/pYO7go6PNmdC5T2M70da/acKVWZ13hOnbstFEH/o0zrzirka3tA1jJk5HArZAi1SqCl9HkLk0Pm0alTuYvxZ2PxcMzKlVEmojqBCqiRBmjwvvqY1EkfbHkyJ93aw605yxlBGnR/dXjCf+1jz/bDRSfbYKmkdNvTFInDwj3ARbeapfD6MTfVVgD/Axm9ccyldyN0imrCTs/FiLCztwGAuXDE0g/5Kxyay19QFMq9cFKcQCsVAKqCmA2og19dKzDb07SVZE2EEl/QtTuuwgX7wNyaVQqJSBoRcVcz+cJ5uEs2k+O5GdI1o1uNePFM4A X-Bogosity: Ham, tests=bogofilter, spamicity=0.000124, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Aug 15, 2023 at 9:29=E2=80=AFAM Liam R. Howlett wrote: > > * Jann Horn [230814 17:22]: > > On Mon, Aug 14, 2023 at 10:32=E2=80=AFPM Liam R. Howlett > > wrote: > > > * Jann Horn [230814 11:44]: > > > > @akpm > > > > > > > > On Mon, Jul 24, 2023 at 8:31=E2=80=AFPM Liam R. Howlett wrote: > > > > > Since prev will be set later in the function, it is better to rev= erse > > > > > the splitting direction of the start VMA (modify the new_below ar= gument > > > > > to __split_vma). > > > > > > > > It might be a good idea to reorder "mm: always lock new vma before > > > > inserting into vma tree" before this patch. > > > > > > > > If you apply this patch without "mm: always lock new vma before > > > > inserting into vma tree", I think move_vma(), when called with a st= art > > > > address in the middle of a VMA, will behave like this: > > > > > > > > - vma_start_write() [lock the VMA to be moved] > > > > - move_page_tables() [moves page table entries] > > > > - do_vmi_munmap() > > > > - do_vmi_align_munmap() > > > > - __split_vma() > > > > - creates a new VMA **covering the moved range** that is **n= ot locked** > > > > - stores the new VMA in the VMA tree **without locking it** = [1] > > > > - new VMA is locked and removed again [2] > > > > [...] > > > > > > > > So after the page tables in the region have already been moved, I > > > > believe there will be a brief window (between [1] and [2]) where pa= ge > > > > faults in the region can happen again, which could probably cause n= ew > > > > page tables and PTEs to be created in the region again in that wind= ow. > > > > (This can't happen in Linus' current tree because the new VMA creat= ed > > > > by __split_vma() only covers the range that is not being moved.) > > > > > > Ah, so my reversing of which VMA to keep to the first split call open= s a > > > window where the VMA being removed is not locked. Good catch. > > Looking at this again, I think it exists in Linus' tree and my change > actually removes this window: > > - error =3D __split_vma(vmi, vma, start, 0); > + error =3D __split_vma(vmi, vma, start, 1); > if (error) > goto start_split_failed; > > The last argument is "new_below", which means the new VMA will be at the > lower address. I don't love the argument of int or the name, also the > documentation is lacking for the split function. > > So, once we split at "start", vm_end =3D "start" in the new VMA while > start will be in the old VMA. I then lock the old vma to be removed > (again) and add it to the detached maple tree. > > Before my patch, we split the VMA and took the new unlocked VMA for > removal.. until I locked the new vma to be removed and add it to the > detached maple tree. So there is a window that we write the new split > VMA into the tree prior to locking the VMA, but it is locked before > removal. > > This change actually aligns the splitting with the other callers who use > the split_vma() wrapper. Oooh, you're right. Sorry, I misread that. > > > > > > > > > > > Though I guess that's not going to lead to anything bad, since > > > > do_vmi_munmap() anyway cleans up PTEs and page tables in the region= ? > > > > So maybe it's not that important. > > > > > > do_vmi_munmap() will clean up PTEs from the end of the previous VMA t= o > > > the start of the next > > > > Alright, I guess no action is needed here then. > > I don't see a difference between this and the race that exists after the > page fault ends and a task unmaps the area prior to the first task using > the faulted areas? Yeah, you're right. I was thinking about it in terms of "this is a weird situation and it would be dangerous if something relied on there not being any non-empty PTEs in the range", but there's nothing that relies on it, so that's fine.