From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84E47C433FE for ; Thu, 10 Nov 2022 21:53:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AC9BC6B0074; Thu, 10 Nov 2022 16:53:28 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A793D6B0075; Thu, 10 Nov 2022 16:53:28 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9401F8E0001; Thu, 10 Nov 2022 16:53:28 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 864806B0074 for ; Thu, 10 Nov 2022 16:53:28 -0500 (EST) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 5016414085B for ; Thu, 10 Nov 2022 21:53:28 +0000 (UTC) X-FDA: 80118884496.19.36C7EFA Received: from mail-ed1-f48.google.com (mail-ed1-f48.google.com [209.85.208.48]) by imf12.hostedemail.com (Postfix) with ESMTP id C086540006 for ; Thu, 10 Nov 2022 21:53:27 +0000 (UTC) Received: by mail-ed1-f48.google.com with SMTP id v27so5230632eda.1 for ; Thu, 10 Nov 2022 13:53:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=codesandbox-io.20210112.gappssmtp.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=fnGD3CTLAv0N9U1NgjsVahY1elEnHGukoySM1xlqJyg=; b=3yfRTmNZc4iDG4clJkJWpeZMkYnRC+zoDlwUAfz92kpVhutYn/iwoq+1hXgryuCWVj T/zo0Ftlsiv3QBpdb8hAtcD1sPU6Y7lfA55OvkiXRyLYs/2LeQRJxSOQFMveIuCYZdLo VT4lzTKdHwW6Alv3OX8OsFFoc36N3lEL/zs3/p2yvPOGCoTcTbmQE7//VVk0yoYqVjA5 ehSAikjaRdJJ9zZp8DFvjDUvVNERzZspF7BpCfguqzsYu2xmbzbqnNR+3jjm6oEFlqXt pH8WuA7iNSWQHdUmiZ7tCBzoiig3pa142Qv7xAXL+YrZc9eqpzuRUluLDW4vHJ0p1PVP /Otw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=fnGD3CTLAv0N9U1NgjsVahY1elEnHGukoySM1xlqJyg=; b=dimbyvKan0IN5JIdaF558eVbYltrJkHwMACjjLcFDZYmMba9/cH1WIo2AJjPfBOhSg XnPDPPRDNJePwTaIhuvZ0wvUgBF6y6W+wC86ek26aCSJpmXJ7bwadFWnKKxAz+RpTQep Y6YjBEOBOs/T8TooSxo68vBMgl+CmpYhFf2CH8/xccwQy72NTaXf+n1b5KcSAz9we4Dl qQ2bRTLGXbq+VhpJjDU9CzX5A0mxAWmiM2AFktSzVHAMdrUY0Rxeia9Rd6Kix26S1+N7 91pdPlELXJWdEVS3kYai57MB8XdnpLnLE+twoLhTziVR1OlLOy26+Ap4UwLWOM4oJ4iI BrqA== X-Gm-Message-State: ACrzQf1xb1QReXIHddh7GXMPaMCBT04QLYZwwRLrQVHxUB7djqwjmxCs E1V5AKwk5Yuzaif466kXmofwjTf/SCPxhSreoeAe/g== X-Google-Smtp-Source: AMsMyM4TDfRsX5a99DWS0Kf8VCLlYdFt9ugeBqAjvTENH2+YtuIn/nxzjoFbKnAxCOCZNOxpOJ/G0ioX1hbhfhBSzaE= X-Received: by 2002:a05:6402:4287:b0:458:c66a:3664 with SMTP id g7-20020a056402428700b00458c66a3664mr3559956edc.79.1668117206207; Thu, 10 Nov 2022 13:53:26 -0800 (PST) Received: from 649336022844 named unknown by gmailapi.google.com with HTTPREST; Thu, 10 Nov 2022 13:53:25 -0800 Mime-Version: 1.0 References: <20221110203132.1498183-1-peterx@redhat.com> <20221110203132.1498183-2-peterx@redhat.com> X-Mailer: Superhuman Desktop (2022-11-09T23:06:01Z) X-Superhuman-ID: lablwhaj.fe8c0bcf-19a1-4d9c-9675-8e092a3fa094 In-Reply-To: <20221110203132.1498183-2-peterx@redhat.com> X-Superhuman-Draft-ID: draft00c68bc7a7969bc3 From: Ives van Hoorne Date: Thu, 10 Nov 2022 13:53:25 -0800 Message-ID: Subject: Re: [PATCH v2 1/2] mm/migrate: Fix read-only page got writable when recover pte To: Peter Xu Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrea Arcangeli , Axel Rasmussen , Nadav Amit , Andrew Morton , Mike Rapoport , stable@vger.kernel.org Content-Type: multipart/alternative; boundary="00000000000085256905ed24cdec" ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=codesandbox-io.20210112.gappssmtp.com header.s=20210112 header.b=3yfRTmNZ; spf=pass (imf12.hostedemail.com: domain of ives@codesandbox.io designates 209.85.208.48 as permitted sender) smtp.mailfrom=ives@codesandbox.io; dmarc=pass (policy=quarantine) header.from=codesandbox.io ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1668117208; a=rsa-sha256; cv=none; b=2XPwvjdSwI/4qaRozIxW+AaTV4vIQ6lKrfR3DbRYJMWKs5TJbgFches6Mbsls5EzdweYel MTksgH7okBqBDHJRZ0hzNrDEq6OagXSJKFnoVtfwS2CffFjonczxG5fnLo/g1jiyRmhjNu HsKJAj87Q9Ceoot3+ePKx0Lkdft9VPw= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1668117208; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=fnGD3CTLAv0N9U1NgjsVahY1elEnHGukoySM1xlqJyg=; b=FCiXw8ge2ntIYXtV1YZMcZDKL4xpT71NN3DZqJsOJzjGwH2I2+KdIkAQK700wYYQBadnHa 874Dy0/MbJKOGLiZOILcwKGh/j4LHUslbLQ3MmXllrrAWQ3qAdSw9fbNq4WVcIDpAjAoiR y3rGpjWfojfEfK20kiATRMK2UhkDPA4= X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=codesandbox-io.20210112.gappssmtp.com header.s=20210112 header.b=3yfRTmNZ; spf=pass (imf12.hostedemail.com: domain of ives@codesandbox.io designates 209.85.208.48 as permitted sender) smtp.mailfrom=ives@codesandbox.io; dmarc=pass (policy=quarantine) header.from=codesandbox.io X-Stat-Signature: cy5tcss568farkzkzc58rb95wzpk7c4f X-Rspamd-Queue-Id: C086540006 X-Rspamd-Server: rspam09 X-HE-Tag: 1668117207-183513 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: --00000000000085256905ed24cdec Content-Type: text/plain; charset="UTF-8" I verified these latest changes with my initial test case, and can confirm that I haven't been able to reproduce the problem anymore! Usually, I would be able to reproduce the error after ~1-5 runs, given that I would fill the page cache before the run, now I have been running the same test case for 50+ times, and I haven't seen the error anymore. Thanks again Peter for taking the time to find the issue and coming up with a fix! On Thu, Nov 10, 2022 at 21:31:31, Peter Xu wrote: > Ives van Hoorne from codesandbox.io reported an issue regarding possible > data loss of uffd-wp when applied to memfds on heavily loaded systems. The > sympton is some read page got data mismatch from the snapshot child VMs. > > Here I can also reproduce with a Rust reproducer that was provided by Ives > that keeps taking snapshot of a 256MB VM, on a 32G system when I initiate > 80 instances I can trigger the issues in ten minutes. > > It turns out that we got some pages write-through even if uffd-wp is > applied to the pte. > > The problem is, when removing migration entries, we didn't really worry > about write bit as long as we know it's not a write migration entry. That > may not be true, for some memory types (e.g. writable shmem) mk_pte can > return a pte with write bit set, then to recover the migration entry to its > original state we need to explicit wr-protect the pte or it'll has the > write bit set if it's a read migration entry. > > For uffd it can cause write-through. I didn't verify, but I think it'll be > the same for mprotect()ed pages and after migration we can miss the sigbus > instead. > > The relevant code on uffd was introduced in the anon support, which is > commit f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration", > 2020-04-07). However anon shouldn't suffer from this problem because anon > should already have the write bit cleared always, so that may not be a > proper Fixes target. To satisfy the need on the backport, I'm attaching the > Fixes tag to the uffd-wp shmem support. Since no one had issue with > mprotect, so I assume that's also the kernel version we should start to > backport for stable, and we shouldn't need to worry before that. > > Cc: Andrea Arcangeli > Cc: stable@vger.kernel.org > Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & > hugetlbfs") Reported-by: Ives van Hoorne > Signed-off-by: Peter Xu > --- > mm/migrate.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/mm/migrate.c b/mm/migrate.c > index dff333593a8a..8b6351c08c78 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -213,8 +213,14 @@ static bool remove_migration_pte(struct folio *folio, > pte = pte_mkdirty(pte); > if (is_writable_migration_entry(entry)) > pte = maybe_mkwrite(pte, vma); > - else if (pte_swp_uffd_wp(*pvmw.pte)) > + else > + /* NOTE: mk_pte can have write bit set */ > + pte = pte_wrprotect(pte); > + > + if (pte_swp_uffd_wp(*pvmw.pte)) { > + WARN_ON_ONCE(pte_write(pte)); > pte = pte_mkuffd_wp(pte); > + } > > if (folio_test_anon(folio) && !is_readable_migration_entry(entry)) > rmap_flags |= RMAP_EXCLUSIVE; > -- > 2.37.3 > --00000000000085256905ed24cdec Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
I verified these la= test changes with my initial test case, and can confirm that I haven't = been able to reproduce the problem anymore! Usually, I would be able to rep= roduce the error after ~1-5 runs, given that I would fill the page cache be= fore the run, now I have been running the same test case for 50+ times, and= I haven't seen the error anymore.

<= div class=3D"">Thanks again Peter for taking the time to find the issue and= coming up with a fix!


On Thu, Nov 10, 2022 a= t 21:31:31, Peter Xu <peterx@redhat.com> wrote:

Ives van Hoor= ne from codesandbox.io repo= rted an issue regarding possible data loss of uffd-wp when applied to memfds on heavily loaded systems. The sympton is some read page got data mismatch from the snapshot child VMs.

Here I can also reproduce with a Rust reproducer that was provided by Ives that keeps taking snapshot of a 256MB VM, on a 32G system when I initiate 80 instances I can trigger the issues in ten minutes.

It turns out that we got some pages write-through even if uffd-wp is applied to the pte.

The problem is, when removing migration entries, we didn't really worry about write bit as long as we know it's not a write migration entry. T= hat may not be true, for some memory types (e.g. writable shmem) mk_pte can return a pte with write bit set, then to recover the migration entry to its original state we need to explicit wr-protect the pte or it'll has the write bit set if it's a read migration entry.

For uffd it can cause write-through. I didn't verify, but I think it&#= 39;ll be the same for mprotect()ed pages and after migration we can miss the sigbus instead.

The relevant code on uffd was introduced in the anon support, which is commit f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration= ", 2020-04-07). However anon shouldn't suffer from this problem because a= non should already have the write bit cleared always, so that may not be a proper Fixes target. To satisfy the need on the backport, I'm attachin= g the Fixes tag to the uffd-wp shmem support. Since no one had issue with mprotect, so I assume that's also the kernel version we should start to backport for stable, and we shouldn't need to worry before that.

Cc: Andrea Arcangeli <aarcang= e@redhat.com>
Cc: stable@vger.ker= nel.org
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem &= hugetlbfs") Reported-by: Ives van Hoorne <ives@codesandbox.io> Signed-off-by: Peter Xu <peterx= @redhat.com>
---
mm/migrate.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index dff333593a8a..8b6351c08c78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -213,8 +213,14 @@ static bool remove_migration_pte(struct folio *folio, pte =3D pte_mkdirty(pte);
if (is_writable_migration_entry(entry))
pte =3D maybe_mkwrite(pte, vma);
- else if (pte_swp_uffd_wp(*pvmw.pte))
+ else
+ /* NOTE: mk_pte can have write bit set */
+ pte =3D pte_wrprotect(pte);
+
+ if (pte_swp_uffd_wp(*pvmw.pte)) {
+ WARN_ON_ONCE(pte_write(pte));
pte =3D pte_mkuffd_wp(pte);
+ }

if (folio_test_anon(folio) && !is_readable_migration_entry(entry)= ) rmap_flags |=3D RMAP_EXCLUSIVE;
--=20
2.37.3


--00000000000085256905ed24cdec--