From: Alex Williamson <alex.williamson@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
akpm@linux-foundation.org, minchan@kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
paulmck@kernel.org, jhubbard@nvidia.com, joaodias@google.com
Subject: Re: [PATCH] mm: Re-allow pinning of zero pfns
Date: Thu, 23 Jun 2022 14:21:39 -0600 [thread overview]
Message-ID: <20220623142139.462a0841.alex.williamson@redhat.com> (raw)
In-Reply-To: <cb7efb0f-5537-5ce4-7aec-bb10ea81d5de@redhat.com>
On Thu, 23 Jun 2022 20:07:14 +0200
David Hildenbrand <david@redhat.com> wrote:
> On 15.06.22 17:56, Jason Gunthorpe wrote:
> > On Sat, Jun 11, 2022 at 08:29:47PM +0200, David Hildenbrand wrote:
> >> On 11.06.22 00:35, Alex Williamson wrote:
> >>> The commit referenced below subtly and inadvertently changed the logic
> >>> to disallow pinning of zero pfns. This breaks device assignment with
> >>> vfio and potentially various other users of gup. Exclude the zero page
> >>> test from the negation.
> >>
> >> I wonder which setups can reliably work with a long-term pin on a shared
> >> zeropage. In a MAP_PRIVATE mapping, any write access via the page tables
> >> will end up replacing the shared zeropage with an anonymous page.
> >> Something similar should apply in MAP_SHARED mappings, when lazily
> >> allocating disk blocks.
>
> ^ correction, the shared zeropage is never used in MAP_SHARED mappings
> (fortunately).
>
> >>
> >> In the future, we might trigger unsharing when taking a R/O pin for the
> >> shared zeropage, just like we do as of now upstream for shared anonymous
> >> pages (!PageAnonExclusive). And something similar could then be done
> >> when finding a !anon page in a MAP_SHARED mapping.
> >
> > I'm also confused how qemu is hitting this and it isn't already a bug?
> >
>
> I assume it's just some random thingy mapped into the guest physical
> address space (by the bios? R/O?), that actually never ends up getting
> used by a device.
>
> So vfio simply needs this to keep working ... but we won't actually
> ever use that data.
>
> But this is just my best guess after thinking about it.
Good guess.
> > It is arising because vfio doesn't use FOLL_FORCE|FOLL_WRITE to move
> > away the zero page in most cases.
> >
> > And why does Yishai say it causes an infinite loop in the kernel?
>
>
> Good question. Maybe $something keeps retrying if pinning fails, either
> in the kernel (which would be bad) or in user space. At least QEMU seems
> to just fail if pinning fails, but maybe it's a different user space?
The loop is in __gup_longterm_locked():
	do {
		rc = __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
					     NULL, gup_flags);
		if (rc <= 0)
			break;
		rc = check_and_migrate_movable_pages(rc, pages, gup_flags);
	} while (!rc);
It appears we're pinning a 32-page (128K) range,
__get_user_pages_locked() returns 32, but
check_and_migrate_movable_pages() perpetually returns zero. I believe
this is because folio_is_pinnable() previously returned true, and now
returns false. Therefore we drop down to fail at folio_isolate_lru(),
incrementing isolation_error_count. From there we do nothing more than
unpin the pages, return zero, and hope for better luck next time, which
obviously doesn't happen.
If I generate an errno here, QEMU reports failing on the pc.rom memory
region at 0xc0000. Thanks,
Alex
Thread overview: 13+ messages (newest: 2022-06-23 20:21 UTC)
2022-06-10 22:35 Alex Williamson
2022-06-11 0:21 ` Minchan Kim
2022-06-11 18:29 ` David Hildenbrand
2022-06-15 15:56 ` Jason Gunthorpe
2022-06-23 18:07 ` David Hildenbrand
2022-06-23 20:21 ` Alex Williamson [this message]
2022-06-23 20:47 ` Jason Gunthorpe
2022-06-24 0:11 ` Alistair Popple
2022-06-24 1:34 ` Jason Gunthorpe
2022-06-24 1:55 ` Alistair Popple
2022-07-28 8:45 ` Alistair Popple
2022-07-28 9:23 ` David Hildenbrand
2022-07-29 2:49 ` Alistair Popple