MIME-Version: 1.0
References: <20230914152620.2743033-1-surenb@google.com> <20230914152620.2743033-3-surenb@google.com>
 <CAG48ez0gN_nC8NrMOeq44QmUDT27EpT0bFuNu1ReVKDBt3zy7Q@mail.gmail.com> <CAJuCfpGdbc70aZPu=cNgemK1EFUyvLfZU8ELSjseZtfpSF+EEg@mail.gmail.com>
In-Reply-To: <CAJuCfpGdbc70aZPu=cNgemK1EFUyvLfZU8ELSjseZtfpSF+EEg@mail.gmail.com>
From: Jann Horn <jannh@google.com>
Date: Wed, 20 Sep 2023 01:50:57 +0200
Message-ID: <CAG48ez212+UjQMB94vKvyV4YAEgg=jBhdzg_1b4BRe6=SO09fA@mail.gmail.com>
Subject: Re: [PATCH 2/3] userfaultfd: UFFDIO_REMAP uABI
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, viro@zeniv.linux.org.uk, brauner@kernel.org, 
	shuah@kernel.org, aarcange@redhat.com, lokeshgidra@google.com, 
	peterx@redhat.com, david@redhat.com, hughd@google.com, mhocko@suse.com, 
	axelrasmussen@google.com, rppt@kernel.org, willy@infradead.org, 
	Liam.Howlett@oracle.com, zhangpeng362@huawei.com, bgeffon@google.com, 
	kaleshsingh@google.com, ngeoffray@google.com, jdduke@google.com, 
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, 
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, 
	kernel-team@android.com
Content-Type: text/plain; charset="UTF-8"

On Wed, Sep 20, 2023 at 1:08 AM Suren Baghdasaryan <surenb@google.com> wrote:
> On Thu, Sep 14, 2023 at 7:28 PM Jann Horn <jannh@google.com> wrote:
> > On Thu, Sep 14, 2023 at 5:26 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > From: Andrea Arcangeli <aarcange@redhat.com>
> > >
> > > This implements the uABI of UFFDIO_REMAP.
> > >
> > > Notably one mode bitflag is also forwarded (and in turn known) by the
> > > lowlevel remap_pages method.
[...]
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
[...]
> > > +int remap_pages_huge_pmd(struct mm_struct *dst_mm,
> > > +                        struct mm_struct *src_mm,
> > > +                        pmd_t *dst_pmd, pmd_t *src_pmd,
> > > +                        pmd_t dst_pmdval,
> > > +                        struct vm_area_struct *dst_vma,
> > > +                        struct vm_area_struct *src_vma,
> > > +                        unsigned long dst_addr,
> > > +                        unsigned long src_addr)
> > > +{
> > > +       pmd_t _dst_pmd, src_pmdval;
> > > +       struct page *src_page;
> > > +       struct anon_vma *src_anon_vma, *dst_anon_vma;
> > > +       spinlock_t *src_ptl, *dst_ptl;
> > > +       pgtable_t pgtable;
> > > +       struct mmu_notifier_range range;
> > > +
> > > +       src_pmdval = *src_pmd;
> > > +       src_ptl = pmd_lockptr(src_mm, src_pmd);
> > > +
> > > +       BUG_ON(!pmd_trans_huge(src_pmdval));
> > > +       BUG_ON(!pmd_none(dst_pmdval));
> >
> > Why can we assert that pmd_none(dst_pmdval) is true here? Can we not
> > have concurrent faults (or userfaultfd operations) populating that
> > PMD?
>
> IIUC dst_pmdval is a copy of the value from dst_pmd, so that local
> copy should not change even if some concurrent operation changes
> dst_pmd. We can assert that it's pmd_none because we checked for that
> before calling remap_pages_huge_pmd. Later on we check if dst_pmd
> changed from under us (see pmd_same(*dst_pmd, dst_pmdval) check) and
> retry if that happened.

Oh, right, I don't know what I was thinking when I typed that.
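To make the snapshot-and-revalidate argument concrete, here is a minimal userspace model (not kernel code; model_pmd_t and model_commit() are hypothetical names). It assumes a PMD can be reduced to a plain 64-bit word with 0 standing in for pmd_none(). The assertion is made on the local snapshot, and the live entry is only trusted after a pmd_same()-style recheck, which in the kernel happens with the page table lock held.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Model only: a "PMD" is a 64-bit word and 0 plays the role of pmd_none(). */
typedef uint64_t model_pmd_t;

/*
 * Install new_val into *dst_pmd if it still matches the snapshot taken
 * earlier. Returns 0 on success or -EAGAIN if the live entry changed.
 */
static int model_commit(model_pmd_t *dst_pmd, model_pmd_t dst_pmdval,
                        model_pmd_t new_val)
{
        /*
         * Safe to assert: dst_pmdval is a local copy that no concurrent
         * writer can modify (the analogue of BUG_ON(!pmd_none(dst_pmdval))).
         */
        assert(dst_pmdval == 0);

        /* The kernel does this comparison with the PT lock held. */
        if (*dst_pmd != dst_pmdval)
                return -EAGAIN;  /* populated concurrently: caller retries */

        *dst_pmd = new_val;
        return 0;
}
```

A caller that gets -EAGAIN would drop its locks and retry, which matches what the patch does when pmd_same(*dst_pmd, dst_pmdval) fails.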

But now I wonder about the check directly above that: What does this
code do for swap PMDs? It looks like that might splat on the
BUG_ON(!pmd_trans_huge(src_pmdval)). All we've checked on the path to
here is that the virtual memory area is aligned, that the destination
PMD is empty, and that pmd_trans_huge_lock() succeeded; but
pmd_trans_huge_lock() explicitly permits swap PMDs (which are the
swapped-out form of transhuge PMDs):

static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
                struct vm_area_struct *vma)
{
        if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
                return __pmd_trans_huge_lock(pmd, vma);
        else
                return NULL;
}
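A userspace model of why that BUG_ON can fire, assuming a PMD can be reduced to one of a few states (enum model_pmd_kind and both functions are hypothetical, not kernel API). model_trans_huge_lock_ok() mirrors the condition in pmd_trans_huge_lock() above; model_remap_huge_pmd() shows one possible alternative to the BUG_ON, returning an error for states the lock helper accepts but that are not a present THP. The error codes are placeholders, not a proposal for the real return values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified PMD states; not kernel types. */
enum model_pmd_kind {
        MODEL_PMD_NONE,
        MODEL_PMD_TABLE,      /* points to a PTE page table */
        MODEL_PMD_TRANS_HUGE, /* present, maps a THP */
        MODEL_PMD_SWAP,       /* non-present, swap-format (is_swap_pmd()) */
        MODEL_PMD_DEVMAP,
};

/* Same condition as in pmd_trans_huge_lock() quoted above. */
static bool model_trans_huge_lock_ok(enum model_pmd_kind k)
{
        return k == MODEL_PMD_SWAP || k == MODEL_PMD_TRANS_HUGE ||
               k == MODEL_PMD_DEVMAP;
}

/*
 * Instead of BUG_ON(!pmd_trans_huge(src_pmdval)), reject entries that the
 * lock helper accepts but that this path cannot move. Error codes are
 * placeholders chosen for the model.
 */
static int model_remap_huge_pmd(enum model_pmd_kind src)
{
        if (!model_trans_huge_lock_ok(src))
                return -ENOENT;  /* lock helper would have returned NULL */
        if (src != MODEL_PMD_TRANS_HUGE)
                return -EAGAIN;  /* e.g. a swap PMD: would have hit the BUG_ON */
        return 0;
}
```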

> >
> > > +       BUG_ON(!spin_is_locked(src_ptl));
> > > +       mmap_assert_locked(src_mm);
> > > +       mmap_assert_locked(dst_mm);
> > > +       BUG_ON(src_addr & ~HPAGE_PMD_MASK);
> > > +       BUG_ON(dst_addr & ~HPAGE_PMD_MASK);
> > > +
> > > +       src_page = pmd_page(src_pmdval);
> > > +       BUG_ON(!PageHead(src_page));
> > > +       BUG_ON(!PageAnon(src_page));
> > > +       if (unlikely(page_mapcount(src_page) != 1)) {
> > > +               spin_unlock(src_ptl);
> > > +               return -EBUSY;
> > > +       }
> > > +
> > > +       get_page(src_page);
> > > +       spin_unlock(src_ptl);
> > > +
> > > +       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, src_mm, src_addr,
> > > +                               src_addr + HPAGE_PMD_SIZE);
> > > +       mmu_notifier_invalidate_range_start(&range);
> > > +
> > > +       /* block all concurrent rmap walks */
> > > +       lock_page(src_page);
> > > +
> > > +       /*
> > > +        * split_huge_page walks the anon_vma chain without the page
> > > +        * lock. Serialize against it with the anon_vma lock, the page
> > > +        * lock is not enough.
> > > +        */
> > > +       src_anon_vma = folio_get_anon_vma(page_folio(src_page));
> > > +       if (!src_anon_vma) {
> > > +               unlock_page(src_page);
> > > +               put_page(src_page);
> > > +               mmu_notifier_invalidate_range_end(&range);
> > > +               return -EAGAIN;
> > > +       }
> > > +       anon_vma_lock_write(src_anon_vma);
> > > +
> > > +       dst_ptl = pmd_lockptr(dst_mm, dst_pmd);
> > > +       double_pt_lock(src_ptl, dst_ptl);
> > > +       if (unlikely(!pmd_same(*src_pmd, src_pmdval) ||
> > > +                    !pmd_same(*dst_pmd, dst_pmdval) ||
> > > +                    page_mapcount(src_page) != 1)) {
> > > +               double_pt_unlock(src_ptl, dst_ptl);
> > > +               anon_vma_unlock_write(src_anon_vma);
> > > +               put_anon_vma(src_anon_vma);
> > > +               unlock_page(src_page);
> > > +               put_page(src_page);
> > > +               mmu_notifier_invalidate_range_end(&range);
> > > +               return -EAGAIN;
> > > +       }
> > > +
> > > +       BUG_ON(!PageHead(src_page));
> > > +       BUG_ON(!PageAnon(src_page));
> > > +       /* the PT lock is enough to keep the page pinned now */
> > > +       put_page(src_page);
> > > +
> > > +       dst_anon_vma = (void *) dst_vma->anon_vma + PAGE_MAPPING_ANON;
> > > +       WRITE_ONCE(src_page->mapping, (struct address_space *) dst_anon_vma);
> > > +       WRITE_ONCE(src_page->index, linear_page_index(dst_vma, dst_addr));
> > > +
> > > +       if (!pmd_same(pmdp_huge_clear_flush(src_vma, src_addr, src_pmd),
> > > +                     src_pmdval))
> > > +               BUG_ON(1);
> >
> > I'm not sure we can assert that the PMDs are exactly equal; the CPU
> > might have changed the A/D bits under us?
>
> Yes. I wonder if I can simply remove the BUG_ON here like this:
>
> src_pmdval = pmdp_huge_clear_flush(src_vma, src_addr, src_pmd);
>
> Technically we don't use src_pmdval after this, but it would keep
> things correct for possible future use. If the A/D bits changed under
> us, we will still copy correct values into dst_pmd.

And when we set up the dst_pmd, we always mark it as dirty and
accessed... so I guess that's fine.
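To illustrate the A/D concern, a small model contrasting an exact comparison with one that ignores the Accessed and Dirty bits. The bit positions are x86's (_PAGE_BIT_ACCESSED is 5, _PAGE_BIT_DIRTY is 6); the MODEL_* macros and both helpers are hypothetical. The exact check fails as soon as hardware sets either bit after the snapshot was taken, which is why asserting pmd_same() against the value returned by pmdp_huge_clear_flush() is fragile. Using that return value directly, as proposed, avoids the comparison altogether.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * x86 bit positions of the Accessed and Dirty flags (_PAGE_BIT_ACCESSED = 5,
 * _PAGE_BIT_DIRTY = 6). Other architectures lay these out differently.
 */
#define MODEL_PAGE_ACCESSED (UINT64_C(1) << 5)
#define MODEL_PAGE_DIRTY    (UINT64_C(1) << 6)
#define MODEL_AD_MASK       (MODEL_PAGE_ACCESSED | MODEL_PAGE_DIRTY)

/* Exact comparison of raw entries, like x86's pmd_same(). */
static bool model_pmd_same(uint64_t a, uint64_t b)
{
        return a == b;
}

/* Comparison that tolerates hardware setting A/D bits in the meantime. */
static bool model_pmd_same_ignoring_ad(uint64_t a, uint64_t b)
{
        return ((a ^ b) & ~MODEL_AD_MASK) == 0;
}
```

Since the new dst_pmd is built from the page and dst_vma's protection and then marked dirty, nothing from the old source A/D state needs to survive the move.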

> > > +       _dst_pmd = mk_huge_pmd(src_page, dst_vma->vm_page_prot);
> > > +       _dst_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_dst_pmd), dst_vma);
> > > +       set_pmd_at(dst_mm, dst_addr, dst_pmd, _dst_pmd);
> > > +
> > > +       pgtable = pgtable_trans_huge_withdraw(src_mm, src_pmd);
> > > +       pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
> >
> > Are we allowed to move page tables between mm_structs on all
> > architectures? The first example I found that looks a bit dodgy,
> > looking through various architectures' pte_alloc_one(), is s390's
> > page_table_alloc() which looks like page tables are tied to per-MM
> > lists sometimes.
> > If that's not allowed, we might have to allocate a new deposit table
> > and free the old one or something like that.
>
> Hmm. Yeah, looks like in the case of !CONFIG_PGSTE the table can be
> linked to mm->context.pgtable_list, so can't be moved to another mm. I
> guess I'll have to keep a pgtable allocated, ready to be deposited, and
> free the old one. Maybe it's worth having an arch-specific function
> indicating whether moving a pgtable between MMs is supported? Or do it
> separately as an optimization. WDYT?

Hm, dunno. I guess you could have architectures opt in with some
config flag similar to how flags like
ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH are wired up - define it in
init/Kconfig, select it in the architectures that support it, and then
gate the fast version on that with #ifdef?
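A rough userspace sketch of that opt-in idea: an architecture whose deposited page tables can change owners would select a Kconfig bool, and the C side picks the fast path under #ifdef. CONFIG_ARCH_SUPPORTS_PGTABLE_MM_MOVE is a made-up name, and struct model_pgtable with its owner field only stands in for something like s390's per-mm list linkage. Without the symbol, the sketch deposits a table preallocated for the destination (presumably allocated before taking the page table spinlocks, since a sleeping allocation is not allowed under them) and frees the old one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical Kconfig symbol ARCH_SUPPORTS_PGTABLE_MM_MOVE: a bool that
 * architectures able to move page tables between mms would select.
 */

struct model_mm {
        int id;
};

/* Stand-in for a deposited page table that may be tied to its mm. */
struct model_pgtable {
        struct model_mm *owner;
};

/*
 * Hand the deposited table over to dst and return the table now deposited
 * for dst. On the slow path, prealloc must be a table allocated earlier,
 * outside the page table locks; on the fast path it may be NULL.
 */
static struct model_pgtable *model_move_deposit(struct model_pgtable *old,
                                                struct model_mm *dst,
                                                struct model_pgtable *prealloc)
{
#ifdef CONFIG_ARCH_SUPPORTS_PGTABLE_MM_MOVE
        /* Fast path: the table is not tied to its mm, so just reuse it. */
        (void)prealloc;
        old->owner = dst;
        return old;
#else
        /* Slow path: deposit the preallocated table and free the old one. */
        assert(prealloc != NULL);
        prealloc->owner = dst;
        free(old);
        return prealloc;
#endif
}
```

Gating at compile time keeps the common case free of an extra allocation on architectures that opt in, at the cost of maintaining both paths.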

> > > +       if (dst_mm != src_mm) {
> > > +               add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
> > > +               add_mm_counter(src_mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> > > +       }
> > > +       double_pt_unlock(src_ptl, dst_ptl);
> > > +
> > > +       anon_vma_unlock_write(src_anon_vma);
> > > +       put_anon_vma(src_anon_vma);
> > > +
> > > +       /* unblock rmap walks */
> > > +       unlock_page(src_page);
> > > +
> > > +       mmu_notifier_invalidate_range_end(&range);
> > > +       return 0;
> > > +}
> > > +#endif /* CONFIG_USERFAULTFD */
> > > +
> > >  /*
> > >   * Returns page table lock pointer if a given pmd maps a thp, NULL otherwise.
> > >   *
> > [...]
> > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > > index 96d9eae5c7cc..0cca60dfa8f8 100644
> > > --- a/mm/userfaultfd.c
> > > +++ b/mm/userfaultfd.c
> > [...]
> > > +ssize_t remap_pages(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > > +                   unsigned long dst_start, unsigned long src_start,
> > > +                   unsigned long len, __u64 mode)
> > > +{
> > [...]
> > > +
> > > +       if (pgprot_val(src_vma->vm_page_prot) !=
> > > +           pgprot_val(dst_vma->vm_page_prot))
> > > +               goto out;
> >
> > Does this check intentionally allow moving pages from a
> > PROT_READ|PROT_WRITE anonymous private VMA into a PROT_READ anonymous
> > private VMA (on architectures like x86 and arm64 where CoW memory has
> > the same protection flags as read-only memory), but forbid moving them
> > from a PROT_READ|PROT_EXEC VMA into a PROT_READ VMA? I think this
> > check needs at least a comment to explain what's going on here.
>
> The check is simply to ensure the VMAs have the same access
> permissions to prevent page copies that should have different
> permissions. The fact that x86 and arm64 have the same protection bits
> for R/O and COW memory is a "side-effect" IMHO. I'm not sure what
> comment would be good here but I'm open to suggestions.

I'm not sure if you can do a meaningful security check on the
->vm_page_prot. I also don't think it matters for anonymous VMAs.
I guess if you want to keep this check but make this behavior more
consistent, you could put another check in front of this that rejects
VMAs where vm_flags like VM_READ, VM_WRITE, VM_SHARED or VM_EXEC are
different?
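One way to express the suggested vm_flags check, as a self-contained sketch. The VM_* values match include/linux/mm.h; REMAP_MATCH_FLAGS and remap_vma_flags_compatible() are hypothetical names. Inside remap_pages() the equivalent would be something like "if ((src_vma->vm_flags ^ dst_vma->vm_flags) & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) goto out;" placed in front of the vm_page_prot comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the kernel's definitions in include/linux/mm.h. */
#define VM_READ   0x00000001UL
#define VM_WRITE  0x00000002UL
#define VM_EXEC   0x00000004UL
#define VM_SHARED 0x00000008UL

/* Hypothetical: the flags that must agree between src and dst VMAs. */
#define REMAP_MATCH_FLAGS (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)

/* True if src and dst agree on every flag in REMAP_MATCH_FLAGS. */
static bool remap_vma_flags_compatible(unsigned long src_flags,
                                       unsigned long dst_flags)
{
        /* XOR leaves exactly the bits that differ; mask to the ones we care about. */
        return ((src_flags ^ dst_flags) & REMAP_MATCH_FLAGS) == 0;
}
```

With this in place, moving from PROT_READ|PROT_WRITE to PROT_READ and from PROT_READ|PROT_EXEC to PROT_READ are both rejected, so the outcome no longer depends on whether an architecture encodes CoW and read-only mappings identically.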

[...]
> > > +       /*
> > > +        * Ensure the dst_vma has an anon_vma or this page
> > > +        * would get a NULL anon_vma when moved in the
> > > +        * dst_vma.
> > > +        */
> > > +       err = -ENOMEM;
> > > +       if (unlikely(anon_vma_prepare(dst_vma)))
> > > +               goto out;
> > > +
> > > +       for (src_addr = src_start, dst_addr = dst_start;
> > > +            src_addr < src_start + len;) {
> > > +               spinlock_t *ptl;
> > > +               pmd_t dst_pmdval;
> > > +
> > > +               BUG_ON(dst_addr >= dst_start + len);
> > > +               src_pmd =3D mm_find_pmd(src_mm, src_addr);
> >
> > (this would blow up pretty badly if we could have transparent huge PUD
> > in the region but I think that's limited to file VMAs so it's fine as
> > it currently is)
>
> Should I add a comment here as a warning, in case we decide in the
> future to implement support for file-backed pages?

Hm, yeah, I guess that might be a good idea.