linux-mm.kvack.org archive mirror
From: John Hubbard <jhubbard@nvidia.com>
To: David Hildenbrand <david@redhat.com>, Jason Gunthorpe <jgg@nvidia.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Hugh Dickins <hughd@google.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in GUP-fast
Date: Tue, 30 Aug 2022 12:18:41 -0700
Message-ID: <9a4fe603-950e-785b-6281-2e309256463f@nvidia.com>
In-Reply-To: <9ce3aaaa-71a6-5a81-16a3-36e6763feb91@redhat.com>

On 8/30/22 11:53, David Hildenbrand wrote:
> Good, I managed to attract the attention of someone who understands that machinery :)
> 
> While validating whether GUP-fast and PageAnonExclusive code work correctly,
> I started looking at the whole RCU GUP-fast machinery. I do have a patch to
> improve PageAnonExclusive clearing (I think we're missing memory barriers to
> make it work as expected in any possible case), but I also stumbled eventually
> over a more generic issue that might need memory barriers.
> 
> Any thoughts whether I am missing something or this is actually missing
> memory barriers?
> 

It's actually missing memory barriers.

In fact, others have had that same thought! [1] :) In that 2019 thread,
I recall that this got dismissed because of a focus on the IPI-based
aspect of GUP-fast synchronization (there was some hand waving, perhaps
accurate waving, about memory barriers vs. CPU interrupts). But now that
the RCU (non-IPI) implementation is more widely used than it used to be,
the issue is clearer.

> 
> From ce8c941c11d1f60cea87a3e4d941041dc6b79900 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Mon, 29 Aug 2022 16:57:07 +0200
> Subject: [PATCH] mm/gup: update refcount+pincount before testing if the PTE
>  changed
> 
> mm/ksm.c:write_protect_page() has to make sure that no unknown
> references to a mapped page exist and that no additional ones with write
> permissions are possible -- unknown references could have write permissions
> and modify the page afterwards.
> 
> Conceptually, mm/ksm.c:write_protect_page() consists of:
>   (1) Clear/invalidate PTE
>   (2) Check if there are unknown references; back off if so.
>   (3) Update PTE (e.g., map it R/O)
> 
> Conceptually, GUP-fast code consists of:
>   (1) Read the PTE
>   (2) Increment refcount/pincount of the mapped page
>   (3) Check if the PTE changed by re-reading it; back off if so.
> 
> To make sure GUP-fast won't be able to grab additional references after
> clearing the PTE, but will properly detect the change and back off, we
> need a memory barrier between updating the refcount/pincount and checking
> if the PTE changed.
> 
> try_grab_folio() doesn't necessarily imply a memory barrier, so add an
> explicit smp_mb__after_atomic() after the atomic RMW operation to
> increment the refcount and pincount.
> 
> ptep_clear_flush() used to clear the PTE and flush the TLB should imply
> a memory barrier for flushing the TLB, so don't add another one for now.
> 
> PageAnonExclusive handling requires further care and will be handled
> separately.
> 
> Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/gup.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 5abdaf487460..0008b808f484 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2392,6 +2392,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  			goto pte_unmap;
>  		}
>  
> +		/*
> +		 * Update refcount/pincount before testing for changed PTE. This
> +		 * is required for code like mm/ksm.c:write_protect_page() that
> +		 * wants to make sure that a page has no unknown references
> +		 * after clearing the PTE.
> +		 */
> +		smp_mb__after_atomic();
> +
>  		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
>  			gup_put_folio(folio, 1, flags);
>  			goto pte_unmap;
> @@ -2577,6 +2585,9 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
>  	if (!folio)
>  		return 0;
>  
> +	/* See gup_pte_range(). */

Don't we usually also identify what each mb pairs with, in the comments? That would help.

> +	smp_mb__after_atomic();
> +
>  	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
>  		gup_put_folio(folio, refs, flags);
>  		return 0;
> @@ -2643,6 +2654,9 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>  	if (!folio)
>  		return 0;
>  
> +	/* See gup_pte_range(). */
> +	smp_mb__after_atomic();
> +
>  	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
>  		gup_put_folio(folio, refs, flags);
>  		return 0;
> @@ -2683,6 +2697,9 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>  	if (!folio)
>  		return 0;
>  
> +	/* See gup_pte_range(). */
> +	smp_mb__after_atomic();
> +
>  	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
>  		gup_put_folio(folio, refs, flags);
>  		return 0;
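
To make the pairing concrete, here is a rough userspace model of the two
sequences from the commit message. This is only a sketch, not kernel code:
C11 seq_cst fences stand in for smp_mb__after_atomic() on the GUP-fast side
and for the barrier assumed to be implied by ptep_clear_flush() on the KSM
side. The names pte, refcount, ksm_side(), gup_side() and
count_forbidden_outcomes() are made up for illustration.

```c
/* Userspace model only: C11 atomics stand in for kernel primitives. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int pte;       /* 1 = present, 0 = cleared (hypothetical) */
static atomic_int refcount;  /* 1 = only the known reference exists */
static int ksm_proceeds;     /* KSM side saw no unknown reference */
static int gup_proceeds;     /* GUP side saw an unchanged PTE */

static void *ksm_side(void *arg)
{
	(void)arg;
	/* (1) Clear the PTE. */
	atomic_store_explicit(&pte, 0, memory_order_relaxed);
	/* Stand-in for the barrier assumed to come with the TLB flush. */
	atomic_thread_fence(memory_order_seq_cst);
	/* (2) Check for unknown references; back off if there are any. */
	ksm_proceeds =
		atomic_load_explicit(&refcount, memory_order_relaxed) == 1;
	return NULL;
}

static void *gup_side(void *arg)
{
	int old;

	(void)arg;
	gup_proceeds = 0;
	/* (1) Read the PTE. */
	old = atomic_load_explicit(&pte, memory_order_relaxed);
	if (!old)
		return NULL;
	/* (2) Take a reference with a relaxed atomic RMW. */
	atomic_fetch_add_explicit(&refcount, 1, memory_order_relaxed);
	/* Stand-in for smp_mb__after_atomic(); pairs with ksm_side(). */
	atomic_thread_fence(memory_order_seq_cst);
	/* (3) Re-read the PTE; drop the reference if it changed. */
	if (atomic_load_explicit(&pte, memory_order_relaxed) != old) {
		atomic_fetch_sub_explicit(&refcount, 1, memory_order_relaxed);
		return NULL;
	}
	gup_proceeds = 1;
	return NULL;
}

/* Count trials in which both sides proceeded (should never happen). */
static int count_forbidden_outcomes(int trials)
{
	int bad = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t ksm, gup;
		int ret;

		atomic_store(&pte, 1);
		atomic_store(&refcount, 1);
		ret = pthread_create(&ksm, NULL, ksm_side, NULL);
		assert(ret == 0);
		ret = pthread_create(&gup, NULL, gup_side, NULL);
		assert(ret == 0);
		pthread_join(ksm, NULL);
		pthread_join(gup, NULL);
		if (ksm_proceeds && gup_proceeds)
			bad++;
	}
	return bad;
}
```

With both fences in place, the C11 fence ordering rules forbid the outcome
where KSM sees no extra reference and GUP-fast also sees an unchanged PTE,
so count_forbidden_outcomes() should return 0. Without the fence in
gup_side(), the model no longer forbids that outcome, although a given run
may still not observe it.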


[1] https://lore.kernel.org/lkml/9465df76-0229-1b44-5646-5cced1bc1718@nvidia.com/


thanks,

-- 
John Hubbard
NVIDIA

