From: John Hubbard <jhubbard@nvidia.com>
To: David Hildenbrand <david@redhat.com>, Jason Gunthorpe <jgg@nvidia.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Hugh Dickins <hughd@google.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in GUP-fast
Date: Tue, 30 Aug 2022 12:18:41 -0700
Message-ID: <9a4fe603-950e-785b-6281-2e309256463f@nvidia.com>
In-Reply-To: <9ce3aaaa-71a6-5a81-16a3-36e6763feb91@redhat.com>
On 8/30/22 11:53, David Hildenbrand wrote:
> Good, I managed to attract the attention of someone who understands that machinery :)
>
> While validating whether GUP-fast and PageAnonExclusive code work correctly,
> I started looking at the whole RCU GUP-fast machinery. I do have a patch to
> improve PageAnonExclusive clearing (I think we're missing memory barriers to
> make it work as expected in any possible case), but I also stumbled eventually
> over a more generic issue that might need memory barriers.
>
> Any thoughts whether I am missing something or this is actually missing
> memory barriers?
>
It's actually missing memory barriers.
In fact, others have had that same thought! [1] :) In that 2019 thread,
I recall that this got dismissed because of a focus on the IPI-based
aspect of gup fast synchronization (there was some hand waving, perhaps
accurate waving, about memory barriers vs. CPU interrupts). But now that the
RCU (non-IPI) implementation is more widely used than it used to be, the
issue is clearer.
>
> From ce8c941c11d1f60cea87a3e4d941041dc6b79900 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Mon, 29 Aug 2022 16:57:07 +0200
> Subject: [PATCH] mm/gup: update refcount+pincount before testing if the PTE
> changed
>
> mm/ksm.c:write_protect_page() has to make sure that no unknown
> references to a mapped page exist and that no additional ones with write
> permissions are possible -- unknown references could have write permissions
> and modify the page afterwards.
>
> Conceptually, mm/ksm.c:write_protect_page() consists of:
> (1) Clear/invalidate PTE
> (2) Check if there are unknown references; back off if so.
> (3) Update PTE (e.g., map it R/O)
>
> Conceptually, GUP-fast code consists of:
> (1) Read the PTE
> (2) Increment refcount/pincount of the mapped page
> (3) Check if the PTE changed by re-reading it; back off if so.
>
> To make sure GUP-fast won't be able to grab additional references after
> clearing the PTE, but will properly detect the change and back off, we
> need a memory barrier between updating the refcount/pincount and checking
> if the PTE changed.
>
> try_grab_folio() doesn't necessarily imply a memory barrier, so add an
> explicit smp_mb__after_atomic() after the atomic RMW operation to
> increment the refcount and pincount.
>
> ptep_clear_flush() used to clear the PTE and flush the TLB should imply
> a memory barrier for flushing the TLB, so don't add another one for now.
>
> PageAnonExclusive handling requires further care and will be handled
> separately.
>
> Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> mm/gup.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 5abdaf487460..0008b808f484 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2392,6 +2392,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> goto pte_unmap;
> }
>
> + /*
> + * Update refcount/pincount before testing for changed PTE. This
> + * is required for code like mm/ksm.c:write_protect_page() that
> + * wants to make sure that a page has no unknown references
> + * after clearing the PTE.
> + */
> + smp_mb__after_atomic();
> +
> if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> gup_put_folio(folio, 1, flags);
> goto pte_unmap;
> @@ -2577,6 +2585,9 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
> if (!folio)
> return 0;
>
> + /* See gup_pte_range(). */
Don't we usually also identify what each mb pairs with, in the comments? That would help.
> + smp_mb__after_atomic();
> +
> if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> gup_put_folio(folio, refs, flags);
> return 0;
> @@ -2643,6 +2654,9 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> if (!folio)
> return 0;
>
> + /* See gup_pte_range(). */
> + smp_mb__after_atomic();
> +
> if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
> gup_put_folio(folio, refs, flags);
> return 0;
> @@ -2683,6 +2697,9 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> if (!folio)
> return 0;
>
> + /* See gup_pte_range(). */
> + smp_mb__after_atomic();
> +
> if (unlikely(pud_val(orig) != pud_val(*pudp))) {
> gup_put_folio(folio, refs, flags);
> return 0;
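
For anyone who wants to poke at the ordering argument outside the kernel, here
is a rough userspace model of the two three-step sequences from the patch
description. It is only a sketch under stated assumptions: it uses C11 atomics
and pthreads rather than kernel primitives, a seq_cst fence stands in for
smp_mb__after_atomic() on the GUP-fast side and for the barrier that
ptep_clear_flush() is expected to imply on the KSM side, and every name in it
(pte, refcount, gup_fast_side(), write_protect_side(),
count_forbidden_outcomes()) is made up for illustration. The outcome that must
never happen is "GUP-fast kept its reference" together with "the write-protect
side saw no references".

```c
/* Userspace model only (C11 atomics + pthreads); this is NOT kernel code. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int pte;       /* hypothetical: 1 = mapped, 0 = cleared */
static atomic_int refcount;  /* hypothetical: references taken by "GUP-fast" */
static bool gup_got_ref;     /* GUP-fast side kept its reference */
static bool wp_saw_no_refs;  /* write-protect side saw no references */

static void *gup_fast_side(void *unused)
{
	(void)unused;
	/* (1) Read the PTE. */
	int old = atomic_load_explicit(&pte, memory_order_relaxed);
	if (!old)
		return NULL;
	/* (2) Increment the refcount (relaxed RMW: no ordering implied). */
	atomic_fetch_add_explicit(&refcount, 1, memory_order_relaxed);
	/*
	 * Stand-in for smp_mb__after_atomic(). Pairs with the fence in
	 * write_protect_side().
	 */
	atomic_thread_fence(memory_order_seq_cst);
	/* (3) Re-read the PTE; back off if it changed. */
	if (atomic_load_explicit(&pte, memory_order_relaxed) != old) {
		atomic_fetch_sub_explicit(&refcount, 1, memory_order_relaxed);
		return NULL;
	}
	gup_got_ref = true;
	return NULL;
}

static void *write_protect_side(void *unused)
{
	(void)unused;
	/* (1) Clear the PTE. */
	atomic_store_explicit(&pte, 0, memory_order_relaxed);
	/*
	 * Stand-in for the barrier assumed to be implied by
	 * ptep_clear_flush(). Pairs with the fence in gup_fast_side().
	 */
	atomic_thread_fence(memory_order_seq_cst);
	/* (2) Check for unknown references; the caller backs off if any. */
	wp_saw_no_refs =
		atomic_load_explicit(&refcount, memory_order_relaxed) == 0;
	return NULL;
}

/*
 * Race the two sides "trials" times. Returns how often both sides
 * believed they had won, or -1 if a thread could not be created.
 */
int count_forbidden_outcomes(int trials)
{
	int forbidden = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t gup, wp;

		atomic_store(&pte, 1);
		atomic_store(&refcount, 0);
		gup_got_ref = false;
		wp_saw_no_refs = false;

		if (pthread_create(&gup, NULL, gup_fast_side, NULL))
			return -1;
		if (pthread_create(&wp, NULL, write_protect_side, NULL)) {
			pthread_join(gup, NULL);
			return -1;
		}
		pthread_join(gup, NULL);
		pthread_join(wp, NULL);

		if (gup_got_ref && wp_saw_no_refs)
			forbidden++;
	}
	return forbidden;
}
```

With both fences in place, the C11 memory model rules out the forbidden
outcome, so count_forbidden_outcomes() should return 0 for any trial count.
Dropping either fence turns this into the classic store-buffering pattern, in
which both sides are allowed to miss each other's write, although a
short-lived thread race like this one may not reproduce it in practice.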
[1] https://lore.kernel.org/lkml/9465df76-0229-1b44-5646-5cced1bc1718@nvidia.com/
thanks,
--
John Hubbard
NVIDIA