From: Suren Baghdasaryan
Date: Thu, 28 Sep 2023 11:32:34 -0700
Subject: Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI
To: David Hildenbrand
Cc: Jann Horn, akpm@linux-foundation.org, viro@zeniv.linux.org.uk, brauner@kernel.org, shuah@kernel.org, aarcange@redhat.com, lokeshgidra@google.com, peterx@redhat.com, hughd@google.com, mhocko@suse.com, axelrasmussen@google.com, rppt@kernel.org, willy@infradead.org, Liam.Howlett@oracle.com, zhangpeng362@huawei.com, bgeffon@google.com, kaleshsingh@google.com, ngeoffray@google.com, jdduke@google.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, kernel-team@android.com
References: <20230923013148.1390521-1-surenb@google.com> <20230923013148.1390521-3-surenb@google.com> <03f95e90-82bd-6ee2-7c0d-d4dc5d3e15ee@redhat.com> <9101f70c-0c0a-845b-4ab7-82edf71c7bac@redhat.com>
In-Reply-To: <9101f70c-0c0a-845b-4ab7-82edf71c7bac@redhat.com>

On Thu, Sep 28, 2023 at 10:15 AM David Hildenbrand wrote:
>
> On 27.09.23 20:25, Suren Baghdasaryan wrote:
> >>
> >> I have some cleanups pending for page_move_anon_rmap(), that moves the
> >> SetPageAnonExclusive hunk out. Here we should be using
> >> page_move_anon_rmap() [or rather, folio_move_anon_rmap() after my cleanups]
> >>
> >> I'll send them out soonish.
> >
> > Should I keep this as is in my next version until you post the
> > cleanups? I can add a TODO comment to convert it to
> > folio_move_anon_rmap() once it's ready.
>
> You should just be able to use page_move_anon_rmap() and whatever gets
> in first cleans it up :)

Ack.

>
> >
> >>
> >>>> +       WRITE_ONCE(src_folio->index, linear_page_index(dst_vma,
> >>>> +                                                     dst_addr));
> >>>> +
> >>>> +       orig_src_pte = ptep_clear_flush(src_vma, src_addr, src_pte);
> >>>> +       orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
> >>>> +       orig_dst_pte = maybe_mkwrite(pte_mkdirty(orig_dst_pte),
> >>>> +                                    dst_vma);
> >>>
> >>> I think there's still a theoretical issue here that you could fix by
> >>> checking for the AnonExclusive flag, similar to the huge page case.
> >>>
> >>> Consider the following scenario:
> >>>
> >>> 1. process P1 does a write fault in a private anonymous VMA, creating
> >>> and mapping a new anonymous page A1
> >>> 2. process P1 forks and creates two children P2 and P3. afterwards, A1
> >>> is mapped in P1, P2 and P3 as a COW page, with mapcount 3.
> >>> 3. process P1 removes its mapping of A1, dropping its mapcount to 2.
> >>> 4. process P2 uses vmsplice() to grab a reference to A1 with get_user_pages()
> >>> 5. process P2 removes its mapping of A1, dropping its mapcount to 1.
> >>>
> >>> If at this point P3 does a write fault on its mapping of A1, it will
> >>> still trigger copy-on-write thanks to the AnonExclusive mechanism; and
> >>> this is necessary to avoid P3 mapping A1 as writable and writing data
> >>> into it that will become visible to P2, if P2 and P3 are in different
> >>> security contexts.
> >>>
> >>> But if P3 instead moves its mapping of A1 to another address with
> >>> remap_anon_pte() which only does a page mapcount check, the
> >>> maybe_mkwrite() will directly make the mapping writable, circumventing
> >>> the AnonExclusive mechanism.
> >>>
> >>
> >> Yes, can_change_pte_writable() contains the exact logic when we can turn
> >> something easily writable even if it wasn't writable before, which
> >> includes that PageAnonExclusive is set.
> >> (but with uffd-wp or softdirty
> >> tracking, there is more to consider)
> >
> > For uffd_remap can_change_pte_writable() would fail if VM_WRITE is not
> > set, but we want remapping to work for RO memory as well. Are you
>
> In a VMA without VM_WRITE you certainly wouldn't want to make PTEs
> writable :) That's why that function just does a sanity check that it is
> not called in strange context. So one would only call it if VM_WRITE is set.
>
> > saying that a PageAnonExclusive() check alone would not be enough
> > here?
>
> There are some interesting questions to ask here:
>
> 1) What happens if the old VMA has VM_SOFTDIRTY set but the new one not?
> You most probably have to mark the PTE softdirty and not make it writable.
>
> 2) VM_UFFD_WP requires similar care I assume? Peter might know.

Let me look closer into these cases.
I'll also double-check if we need to support uffd_remap for R/O vmas.
I assumed we do but I actually never checked.
Thanks!

>
> --
> Cheers,
>
> David / dhildenb
>
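
The writability conditions raised in this thread can be sketched as a small user-space C model. This is only an illustration under stated assumptions: struct move_ctx and may_map_writable() are hypothetical names that do not exist in the kernel, the fields are plain booleans rather than real VMA/PTE state, and the soft-dirty and uffd-wp rules mirror the open questions above rather than a settled design.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model, NOT kernel code. Each field stands in
 * for a piece of state discussed in the thread.
 */
struct move_ctx {
	bool dst_vm_write;           /* destination VMA has VM_WRITE */
	bool page_anon_exclusive;    /* PageAnonExclusive is set on the page */
	bool dst_softdirty_tracking; /* assumed: soft-dirty tracking active in dst */
	bool pte_soft_dirty;         /* PTE already marked soft-dirty */
	bool pte_uffd_wp;            /* assumed: PTE write-protected by uffd-wp */
};

/* Return true only if the moved PTE could be mapped writable right away. */
static bool may_map_writable(const struct move_ctx *c)
{
	if (!c->dst_vm_write)
		return false; /* never make PTEs writable in a read-only VMA */
	if (!c->page_anon_exclusive)
		return false; /* page may still be shared: leave it to COW */
	if (c->dst_softdirty_tracking && !c->pte_soft_dirty)
		return false; /* a write fault is needed to record the write */
	if (c->pte_uffd_wp)
		return false; /* uffd-wp must see the write fault first */
	return true;
}
```

In the vmsplice() scenario quoted above, P3's page A1 is not AnonExclusive, so this model returns false and the moved mapping would stay read-only, leaving copy-on-write to the later write fault. A mapcount check alone would not catch that case.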