linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	Alistair Popple <apopple@nvidia.com>
Subject: Re: [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in GUP-fast
Date: Wed, 31 Aug 2022 18:31:23 +0200	[thread overview]
Message-ID: <4d067a99-1112-3b3d-bedf-35c1124904fd@redhat.com> (raw)
In-Reply-To: <Yw+KnRTrZ74qFUAA@xz-m1.local>

[...]

>> +	/*
>> +	 * We have to make sure that while we clear PageAnonExclusive, that
>> +	 * the page is not pinned and that concurrent GUP-fast won't succeed in
>> +	 * concurrently pinning the page.
>> +	 *
>> +	 * Conceptually, GUP-fast pinning code of anon pages consists of:
>> +	 * (1) Read the PTE
>> +	 * (2) Pin the mapped page
>> +	 * (3) Check if the PTE changed by re-reading it; back off if so.
>> +	 * (4) Check if PageAnonExclusive is not set; back off if so.
>> +	 *
>> +	 * Conceptually, PageAnonExclusive clearing code consists of:
>> +	 * (1) Clear PTE
>> +	 * (2) Check if the page is pinned; back off if so.
>> +	 * (3) Clear PageAnonExclusive
>> +	 * (4) Restore PTE (optional)
>> +	 *
>> +	 * In GUP-fast, we have to make sure that (2),(3) and (4) happen in
>> +	 * the right order. Memory order between (2) and (3) is handled by
>> +	 * GUP-fast, independent of PageAnonExclusive.
>> +	 *
>> +	 * When clearing PageAnonExclusive(), we have to make sure that (1),
>> +	 * (2), (3) and (4) happen in the right order.
>> +	 *
>> +	 * Note that (4) has to happen after (3) in both cases to handle the
>> +	 * corner case whereby the PTE is restored to the original value after
>> +	 * clearing PageAnonExclusive and while GUP-fast might not detect the
>> +	 * PTE change, it will detect the PageAnonExclusive change.
>> +	 *
>> +	 * We assume that there might not be a memory barrier after
>> +	 * clearing/invalidating the PTE (1) and before restoring the PTE (4),
>> +	 * so we use explicit ones here.
>> +	 *
>> +	 * These memory barriers are paired with memory barriers in GUP-fast
>> +	 * code, including gup_must_unshare().
>> +	 */
>> +
>> +	/* Clear/invalidate the PTE before checking for PINs. */
>> +	if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
>> +		smp_mb();
> 
> Wondering whether this could be smp_mb__before_atomic().

We'll read via atomic_read().

That's a non-RMW operation. smp_mb__before_atomic() only applies to
RMW (Read Modify Write) operations.



I have an updated patch with improved description/comments, that includes the
following explanation/example and showcases how the two barrier pairs
interact:


    
            Thread 0 (KSM)                  Thread 1 (GUP-fast)
    
                                            (B1) Read the PTE
                                            # (B2) skipped without FOLL_WRITE
            (A1) Clear PTE
            smb_mb()
            (A2) Check pinned
                                            (B3) Pin the mapped page
                                            smb_mb()
            (A3) Clear PageAnonExclusive
            smb_wmb()
            (A4) Restore PTE
                                            (B4) Check if the PTE changed
                                            smb_rmb()
                                            (B5) Check PageAnonExclusive


> 
>> +
>> +	if (unlikely(page_maybe_dma_pinned(page)))
>> +		return -EBUSY;
>>  	ClearPageAnonExclusive(page);
>> +
>> +	/* Clear PageAnonExclusive() before eventually restoring the PTE. */
>> +	if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
>> +		smp_mb__after_atomic();
>>  	return 0;
>>  }
>>  
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index e9414ee57c5b..2aef8d76fcf2 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2140,6 +2140,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>  		 *
>>  		 * In case we cannot clear PageAnonExclusive(), split the PMD
>>  		 * only and let try_to_migrate_one() fail later.
>> +		 *
>> +		 * See page_try_share_anon_rmap(): invalidate PMD first.
>>  		 */
>>  		anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
>>  		if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
>> @@ -3177,6 +3179,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>>  	flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
>>  	pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
>>  
>> +	/* See page_try_share_anon_rmap(): invalidate PMD first. */
>>  	anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
>>  	if (anon_exclusive && page_try_share_anon_rmap(page)) {
>>  		set_pmd_at(mm, address, pvmw->pmd, pmdval);
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index d7526c705081..971cf923c0eb 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -1091,6 +1091,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
>>  			goto out_unlock;
>>  		}
>>  
>> +		/* See page_try_share_anon_rmap(): clear PTE first. */
>>  		if (anon_exclusive && page_try_share_anon_rmap(page)) {
>>  			set_pte_at(mm, pvmw.address, pvmw.pte, entry);
>>  			goto out_unlock;
>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> index 27fb37d65476..47e955212f15 100644
>> --- a/mm/migrate_device.c
>> +++ b/mm/migrate_device.c
>> @@ -193,20 +193,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>  			bool anon_exclusive;
>>  			pte_t swp_pte;
>>  
> 
> flush_cache_page() missing here?

Hmm, wouldn't that already be missing on the !anon path right now?

> 
> Better copy Alistair too when post formally since this will have a slight
> conflict with the other thread.

Yes, I'll give him a heads-up right away: full patch in
https://lkml.kernel.org/r/68b38ac4-c680-b694-21a9-1971396d63b9@redhat.com


Thanks for having a look Peter1

-- 
Thanks,

David / dhildenb



  reply	other threads:[~2022-08-31 16:31 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-25 16:46 [PATCH v1 0/3] mm: minor cleanups around NUMA hinting David Hildenbrand
2022-08-25 16:46 ` [PATCH v1 1/3] mm/gup: replace FOLL_NUMA by gup_can_follow_protnone() David Hildenbrand
2022-08-25 16:46 ` [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in GUP-fast David Hildenbrand
2022-08-26 14:59   ` David Hildenbrand
2022-08-30 18:23     ` David Hildenbrand
2022-08-30 18:45       ` Jason Gunthorpe
2022-08-30 18:53         ` David Hildenbrand
2022-08-30 19:18           ` John Hubbard
2022-08-30 19:23             ` David Hildenbrand
2022-08-30 23:44               ` Jason Gunthorpe
2022-08-31  7:44                 ` David Hildenbrand
2022-08-31 16:21               ` Peter Xu
2022-08-31 16:31                 ` David Hildenbrand [this message]
2022-08-31 18:23                   ` Peter Xu
2022-08-31 19:25                     ` David Hildenbrand
2022-09-01  7:55                       ` Alistair Popple
2022-08-30 19:57           ` Jason Gunthorpe
2022-08-30 20:12             ` John Hubbard
2022-08-30 22:39               ` Jason Gunthorpe
2022-08-31  7:15             ` David Hildenbrand
2022-08-25 16:46 ` [PATCH v1 3/3] mm: fixup documentation regarding pte_numa() and PROT_NUMA David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4d067a99-1112-3b3d-bedf-35c1124904fd@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=hughd@google.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=peterx@redhat.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox