From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 454A5E82CA1 for ; Wed, 27 Sep 2023 18:25:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8C8AB8D0093; Wed, 27 Sep 2023 14:25:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 878E88D001C; Wed, 27 Sep 2023 14:25:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 740728D0093; Wed, 27 Sep 2023 14:25:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 64EBF8D001C for ; Wed, 27 Sep 2023 14:25:41 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 304591CA73A for ; Wed, 27 Sep 2023 18:25:41 +0000 (UTC) X-FDA: 81283205682.22.38CFA83 Received: from mail-wr1-f52.google.com (mail-wr1-f52.google.com [209.85.221.52]) by imf26.hostedemail.com (Postfix) with ESMTP id 3AAEB140026 for ; Wed, 27 Sep 2023 18:25:38 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=0TuMPcIV; spf=pass (imf26.hostedemail.com: domain of surenb@google.com designates 209.85.221.52 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695839139; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=WFhF8RUb+Bu9+rM6xrYeWt5o4VbnL73BwlvF+/CLnlY=; b=ltLK1Z0Lf93DHQ/r29KJw+3OJ1Y4XzKDOtsEqiYTarXP7L/STb32Vaap/Lz00+2wEUrpxW WwbP1fUcZx++rUCjDYqE604odsQtgFkHBrfyax/kXjaps/jNgHatt7mvQsMYIZ4AA6cixK sBVub2Q4CdPFh+/l5plNmOV7tBrn/g0= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695839139; a=rsa-sha256; cv=none; b=aIttVkuLjS1e+LvXVvFpyj8ajckLxzNeJ/TTaXLcpQhkMk5xWNBUYDAbFRAqWTPa1Hvpik QRUuNyFlPNVvlquMHbzP/JY6ZvRYkbiRpa5TdWaZR+JYQw8vaBLPQIJWkMUZuJLr2E1O1b nQ+Djh8sCBJ+bhnl5HQg95VakS2BDUI= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=0TuMPcIV; spf=pass (imf26.hostedemail.com: domain of surenb@google.com designates 209.85.221.52 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-wr1-f52.google.com with SMTP id ffacd0b85a97d-32325534cfaso7131157f8f.3 for ; Wed, 27 Sep 2023 11:25:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1695839138; x=1696443938; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=WFhF8RUb+Bu9+rM6xrYeWt5o4VbnL73BwlvF+/CLnlY=; b=0TuMPcIVDbu8LquScC/aNstkW9AM/GXrsL+bqR2jjMJbA2Mg1o+8w3mk4c9ga2XZPO BgYNWocmdVrHBOmx1mqpCIEP2lb6bASvCj71nZPbgNeo1PeH6lNU8sfjvr3lpqsgDTEN l6Lu0haPkF2KpEvY+n+8ibfhk2VvE0T0dul0lVETTZrf2r6TUQw+upOT9rdjyiXICcQr FihJXj7lJHMYAR9CVR/lTo0YkTSqqrMiE3/xeInGVumUeZ4OeLxZOw58t32MDCkwO7a9 dhu/gcBqYctWr13z77ain/3FPuq8KI+JsPqMwNSvXl3OCb8WSwa1RcC9e4PBaW83LKZ0 tTlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695839138; x=1696443938; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WFhF8RUb+Bu9+rM6xrYeWt5o4VbnL73BwlvF+/CLnlY=; b=s/XuupuAt6Xa3ljHBgmMYb+tfR09VArsl2l4UdAaW3L3ZCIkslXgglCmGoGa3SL4VK 2vV/JO/Pjyc0DD1wCrm+fkZEBRq7Yo7Y3rxEprNGZYXJ/LkojtF1mnJSWXDIXbp+YnW0 b4wXaBEytP9MxMSHwX4qWut8XpFqYUmKLQJxTwxiyf8T3N2Bz+FUrW8M9bHfG75JIcft m4+dmRfqLZKa9GmQXShyWwlVF1d5NYaLPC6at1Z/AopUsJPHWmDoboGyXl+SoamaCFio xxhKBQF2DRG1zBnrH2G9lNpLfN1p2IowT03ymcVA0Vinj3jOw8qnYt/8v781LC/4h5WZ e4wQ== X-Gm-Message-State: AOJu0Yz7svoYUegjfxloo9iz1sOeGK3oox0MAOn4LQZ4jYjm+pUjO7Al lstRTO1q/+I7GFGt6PXx5tGwaGSHhHiPcnWsecZWlg== X-Google-Smtp-Source: AGHT+IHIYVkQLwWoPb+88gb1yMyqkYxT1InPOY5PMBoDofZ8kX8dESzReiqpG2SpAsaSdlnLjgyCI6biew8zpQzuFIY= X-Received: by 2002:adf:f986:0:b0:322:5d58:99b4 with SMTP id f6-20020adff986000000b003225d5899b4mr2685547wrr.0.1695839137441; Wed, 27 Sep 2023 11:25:37 -0700 (PDT) MIME-Version: 1.0 References: <20230923013148.1390521-1-surenb@google.com> <20230923013148.1390521-3-surenb@google.com> <03f95e90-82bd-6ee2-7c0d-d4dc5d3e15ee@redhat.com> In-Reply-To: <03f95e90-82bd-6ee2-7c0d-d4dc5d3e15ee@redhat.com> From: Suren Baghdasaryan Date: Wed, 27 Sep 2023 11:25:22 -0700 Message-ID: Subject: Re: [PATCH v2 2/3] userfaultfd: UFFDIO_REMAP uABI To: David Hildenbrand Cc: Jann Horn , akpm@linux-foundation.org, viro@zeniv.linux.org.uk, brauner@kernel.org, shuah@kernel.org, aarcange@redhat.com, lokeshgidra@google.com, peterx@redhat.com, hughd@google.com, mhocko@suse.com, axelrasmussen@google.com, rppt@kernel.org, willy@infradead.org, Liam.Howlett@oracle.com, zhangpeng362@huawei.com, bgeffon@google.com, kaleshsingh@google.com, ngeoffray@google.com, jdduke@google.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, kernel-team@android.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: mp5qpcedgr9qqr8dj7jibbeg6tbxgcuq X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 3AAEB140026 X-Rspam-User: X-HE-Tag: 1695839138-514327 X-HE-Meta: U2FsdGVkX195sskg5tXH1M75r2AagV38m7OmwaJ/ANZoV3+NBhQJMM9d74hMiJEJUbP892LsOC7FhdHgiLmHvnMvVrgs97Z6zw8HSTniiEY+F4Y98piSaS+ImGKd6O198rqV3yYZSBIDw6Z2dJxHARRzWrKWdFMWqvuoYZNdCR1KIvuQJTaEFMpJqIDPrQxiZr4YAuOOCZXgTFOYQkULhC68tIFq5vzanM6AzfJfNhi1qos3g3S0Xo+VGwqS7v0RkZ5P70c3fq9JIEAkkrb76v+hMPMpJbrz9gFiUZ9hBsNuFiPJ52WAthp8I0EA0+lhKQ1biUUEGN/TWRnDGyCbBc7Qr+iqMcYZwZ49napPPfbBrA6YosAx9+kKnYvekrFp5c2Lbty1oOdqs3F2fOgKerTiu0LzrIFHKK6ebzYzcpS6FUY+oDpCO9u9plylcPXxsEcTNgUWm93yqy/OJjZy4v3vwDlqX1kMlWQ7QvIz7WPaQX989ak038ZUkMm4HuN9Yen3KF07hoyvY78pmkxyM8EdxMVKkAtShpB3GFXqYiQECXbJ6BVaHPUBayTOPcnD2OZGiTSFSOByo0GhSLQkd7do3x9jX/PE+o7EZcLXXSA6rCjaJ1Qa1YpQRp+ukZ6ATDrB+PSgan+nHOIdJk2b01g+QJ4m6mqzs1zA3UBO8CFB01QWV+72YP199udf7/oVwleXg6HFnT8T7828Mk+pxuB//bAnBjWylckHbfotvHXTj+Qh13ldn3IbWU9dZySxnBUFvi1oJlt/wvAkB9xZfBz726ssOds2gMtNxG1wzccJWxsqJy1Z0omKbpvB1k50xKw5kvxNnHHVFszUjLjXQHYV0gxZFWA5opovqt7cDQVWvxoazQFePwgvmWy+L38o8DcSa7fIw5TZKrs55huxeQ3gOhKbCf/D8pr78xepRh3GIL51GLMRW/sNiIgeURPF92HZL+PUbKw7vxoIU73 V8PN1ou5 USrssIsAAunw+h/i3MwoSsb5lGoFELTeGPDz3B/dS9e0p0z+SDBDddZS+prHBPPIEWEVYi3hq/LhKehvnpEG3wpiRtDnL3DHoOKAjX7GcnITmC895Vhys/UQ15fDEffOOG+nrklumw6pSRwAvH0pdZmNXP49pLyZDmyuYYLhuG5zSo97myTQZLJXZn+WID9eaNEaxE0//N1JssAqQ6HCd1N6BhxWoIvcuiaIN7J3XujuJNXz8PwhGROv+1s4ynY22NIrfi9S2nBjcAuQ+5S6Ab23MfefwS5NxunMLfzcPpEL3OGU= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Sep 27, 2023 at 6:29=E2=80=AFAM David Hildenbrand wrote: > > >> +static int remap_anon_pte(struct mm_struct *dst_mm, struct mm_struct = *src_mm, > >> + struct vm_area_struct *dst_vma, > >> + struct vm_area_struct *src_vma, > >> + unsigned long dst_addr, unsigned long src_ad= dr, > >> + pte_t *dst_pte, pte_t *src_pte, > >> + pte_t orig_dst_pte, pte_t orig_src_pte, > >> + spinlock_t *dst_ptl, spinlock_t *src_ptl, > >> + struct folio *src_folio) > >> +{ > >> + struct anon_vma *dst_anon_vma; > >> + > >> + double_pt_lock(dst_ptl, src_ptl); > >> + > >> + if (!pte_same(*src_pte, orig_src_pte) || > >> + !pte_same(*dst_pte, orig_dst_pte) || > >> + folio_test_large(src_folio) || > >> + folio_estimated_sharers(src_folio) !=3D 1) { > > ^ here you should check PageAnonExclusive. Please get rid of any > implicit explicit/implcit mapcount checks. Ack. > > >> + double_pt_unlock(dst_ptl, src_ptl); > >> + return -EAGAIN; > >> + } > >> + > >> + BUG_ON(!folio_test_anon(src_folio)); > >> + > >> + dst_anon_vma =3D (void *)dst_vma->anon_vma + PAGE_MAPPING_ANON= ; > >> + WRITE_ONCE(src_folio->mapping, > >> + (struct address_space *) dst_anon_vma); > > I have some cleanups pending for page_move_anon_rmap(), that moves the > SetPageAnonExclusive hunk out. Here we should be using > page_move_anon_rmap() [or rather, folio_move_anon_rmap() after my cleanup= s] > > I'll send them out soonish. Should I keep this as is in my next version until you post the cleanups? I can add a TODO comment to convert it to folio_move_anon_rmap() once it's ready. > > >> + WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, > >> + dst_addr)); >> + > >> + orig_src_pte =3D ptep_clear_flush(src_vma, src_addr, src_pte); > >> + orig_dst_pte =3D mk_pte(&src_folio->page, dst_vma->vm_page_pro= t); > >> + orig_dst_pte =3D maybe_mkwrite(pte_mkdirty(orig_dst_pte), > >> + dst_vma); > > > > I think there's still a theoretical issue here that you could fix by > > checking for the AnonExclusive flag, similar to the huge page case. > > > > Consider the following scenario: > > > > 1. process P1 does a write fault in a private anonymous VMA, creating > > and mapping a new anonymous page A1 > > 2. process P1 forks and creates two children P2 and P3. afterwards, A1 > > is mapped in P1, P2 and P3 as a COW page, with mapcount 3. > > 3. process P1 removes its mapping of A1, dropping its mapcount to 2. > > 4. process P2 uses vmsplice() to grab a reference to A1 with get_user_p= ages() > > 5. process P2 removes its mapping of A1, dropping its mapcount to 1. > > > > If at this point P3 does a write fault on its mapping of A1, it will > > still trigger copy-on-write thanks to the AnonExclusive mechanism; and > > this is necessary to avoid P3 mapping A1 as writable and writing data > > into it that will become visible to P2, if P2 and P3 are in different > > security contexts. > > > > But if P3 instead moves its mapping of A1 to another address with > > remap_anon_pte() which only does a page mapcount check, the > > maybe_mkwrite() will directly make the mapping writable, circumventing > > the AnonExclusive mechanism. > > > > Yes, can_change_pte_writable() contains the exact logic when we can turn > something easily writable even if it wasn't writable before. which > includes that PageAnonExclusive is set. (but with uffd-wp or softdirty > tracking, there is more to consider) For uffd_remap can_change_pte_writable() would fail it VM_WRITE is not set, but we want remapping to work for RO memory as well. Are you saying that a PageAnonExclusive() check alone would not be enough here? Thanks, Suren. > > -- > Cheers, > > David / dhildenb >