From: Suren Baghdasaryan
Date: Thu, 28 Sep 2023 13:11:12 -0700
Subject: Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI
To: David Hildenbrand
Cc: Jann Horn, akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
    brauner@kernel.org, shuah@kernel.org, aarcange@redhat.com,
    lokeshgidra@google.com, peterx@redhat.com, hughd@google.com,
    mhocko@suse.com, axelrasmussen@google.com, rppt@kernel.org,
    willy@infradead.org, Liam.Howlett@oracle.com, zhangpeng362@huawei.com,
    bgeffon@google.com, kaleshsingh@google.com, ngeoffray@google.com,
    jdduke@google.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    kernel-team@android.com
References: <20230923013148.1390521-1-surenb@google.com>
    <20230923013148.1390521-3-surenb@google.com>
    <03f95e90-82bd-6ee2-7c0d-d4dc5d3e15ee@redhat.com>
    <9101f70c-0c0a-845b-4ab7-82edf71c7bac@redhat.com>

On Thu, Sep 28, 2023 at 11:32 AM Suren Baghdasaryan wrote:
>
> On Thu, Sep 28, 2023 at 10:15 AM David Hildenbrand wrote:
> >
> > On 27.09.23 20:25, Suren Baghdasaryan wrote:
> > >>
> > >> I have some cleanups pending for page_move_anon_rmap(), that moves the
> > >> SetPageAnonExclusive hunk out. Here we should be using
> > >> page_move_anon_rmap() [or rather, folio_move_anon_rmap() after my cleanups]
> > >>
> > >> I'll send them out soonish.
> > >
> > > Should I keep this as is in my next version until you post the
> > > cleanups? I can add a TODO comment to convert it to
> > > folio_move_anon_rmap() once it's ready.
> >
> > You should just be able to use page_move_anon_rmap() and whatever gets
> > in first cleans it up :)
>
> Ack.
>
> >
> > >
> > >>
> > >>>> +       WRITE_ONCE(src_folio->index, linear_page_index(dst_vma,
> > >>>> +                                                      dst_addr));
> > >>>> +
> > >>>> +       orig_src_pte = ptep_clear_flush(src_vma, src_addr, src_pte);
> > >>>> +       orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
> > >>>> +       orig_dst_pte = maybe_mkwrite(pte_mkdirty(orig_dst_pte),
> > >>>> +                                    dst_vma);
> > >>>
> > >>> I think there's still a theoretical issue here that you could fix by
> > >>> checking for the AnonExclusive flag, similar to the huge page case.
> > >>>
> > >>> Consider the following scenario:
> > >>>
> > >>> 1. process P1 does a write fault in a private anonymous VMA, creating
> > >>> and mapping a new anonymous page A1
> > >>> 2. process P1 forks and creates two children P2 and P3. afterwards, A1
> > >>> is mapped in P1, P2 and P3 as a COW page, with mapcount 3.
> > >>> 3. process P1 removes its mapping of A1, dropping its mapcount to 2.
> > >>> 4. process P2 uses vmsplice() to grab a reference to A1 with get_user_pages()
> > >>> 5. process P2 removes its mapping of A1, dropping its mapcount to 1.
> > >>>
> > >>> If at this point P3 does a write fault on its mapping of A1, it will
> > >>> still trigger copy-on-write thanks to the AnonExclusive mechanism; and
> > >>> this is necessary to avoid P3 mapping A1 as writable and writing data
> > >>> into it that will become visible to P2, if P2 and P3 are in different
> > >>> security contexts.
> > >>>
> > >>> But if P3 instead moves its mapping of A1 to another address with
> > >>> remap_anon_pte() which only does a page mapcount check, the
> > >>> maybe_mkwrite() will directly make the mapping writable, circumventing
> > >>> the AnonExclusive mechanism.
> > >>>
> > >>
> > >> Yes, can_change_pte_writable() contains the exact logic when we can turn
> > >> something easily writable even if it wasn't writable before, which
> > >> includes that PageAnonExclusive is set. (but with uffd-wp or softdirty
> > >> tracking, there is more to consider)
> > >
> > > For uffd_remap can_change_pte_writable() would fail if VM_WRITE is not
> > > set, but we want remapping to work for RO memory as well. Are you
> >
> > In a VMA without VM_WRITE you certainly wouldn't want to make PTEs
> > writable :) That's why that function just does a sanity check that it is
> > not called in strange context. So one would only call it if VM_WRITE is set.
> >
> > > saying that a PageAnonExclusive() check alone would not be enough
> > > here?
> >
> > There are some interesting questions to ask here:
> >
> > 1) What happens if the old VMA has VM_SOFTDIRTY set but the new one not?
> > You most probably have to mark the PTE softdirty and not make it writable.
> >
> > 2) VM_UFFD_WP requires similar care I assume? Peter might know.
>
> Let me look closer into these cases.
> I'll also double-check if we need to support uffd_remap for R/O vmas.
> I assumed we do but I actually never checked.

Ok, I confirmed that we don't need remapping of R/O areas. So, I can
use can_change_pte_writable() and keep things simple. Does that sound
good?

> Thanks!
>
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
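
For anyone following along, the P1/P2/P3 scenario quoted above can be
modelled in a few lines of userspace C. This is only a sketch under
stated assumptions: struct model_page, struct model_vma and every helper
below are hypothetical names made up for illustration, not kernel APIs,
and the real can_change_pte_writable() looks at more than this (softdirty
and uffd-wp tracking, for example). It only shows why a mapcount check
alone and an exclusivity check can disagree once steps 1-5 have run.

```c
/*
 * Userspace model only. Nothing here is a kernel API; all names are
 * hypothetical and exist purely to illustrate the quoted scenario.
 */
#include <stdbool.h>

struct model_page {
	int mapcount;        /* number of page table mappings of the page */
	int gup_refs;        /* extra references, e.g. taken via vmsplice() */
	bool anon_exclusive; /* stands in for PageAnonExclusive() */
};

struct model_vma {
	bool vm_write;       /* stands in for VM_WRITE on the destination VMA */
};

/* Walk steps 1-5 of the scenario and return A1 as P3 sees it. */
static inline struct model_page scenario_page(void)
{
	struct model_page a1 = { 0, 0, false };

	/* 1. P1 write-faults: A1 is mapped once and exclusive to P1. */
	a1.mapcount = 1;
	a1.anon_exclusive = true;
	/* 2. P1 forks P2 and P3: A1 becomes a shared COW page. */
	a1.mapcount += 2;
	a1.anon_exclusive = false;
	/* 3. P1 unmaps A1. */
	a1.mapcount--;
	/* 4. P2 grabs a reference via vmsplice(); mapcount is unchanged. */
	a1.gup_refs++;
	/* 5. P2 unmaps A1; only P3 still maps it. */
	a1.mapcount--;
	return a1;
}

/* Models a mapcount-only check followed by maybe_mkwrite(). */
static inline bool mapcount_only_allows_write(const struct model_page *page,
					      const struct model_vma *dst)
{
	return dst->vm_write && page->mapcount == 1;
}

/* Models additionally requiring the exclusive marker before mkwrite. */
static inline bool exclusive_check_allows_write(const struct model_page *page,
						const struct model_vma *dst)
{
	return dst->vm_write && page->anon_exclusive;
}
```

With a writable destination VMA, the page returned by scenario_page()
passes the mapcount-only check but fails the exclusivity check, which is
the gap described in the review: the moved mapping would become writable
while P2 still holds its reference.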