From: Jerome Glisse <jglisse@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	stable@vger.kernel.org
Subject: Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages
Date: Wed, 29 Aug 2018 14:14:25 -0400
Message-ID: <20180829181424.GB3784@redhat.com>
In-Reply-To: <9209043d-3240-105b-72a3-b4cd30f1b1f1@oracle.com>

On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote:
> On 08/27/2018 06:46 AM, Jerome Glisse wrote:
> > On Mon, Aug 27, 2018 at 09:46:45AM +0200, Michal Hocko wrote:
> >> On Fri 24-08-18 11:08:24, Mike Kravetz wrote:
> >>> Here is an updated patch which does as you suggest above.
> >> [...]
> >>> @@ -1409,6 +1419,32 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >>>  		subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
> >>>  		address = pvmw.address;
> >>>  
> >>> +		if (PageHuge(page)) {
> >>> +			if (huge_pmd_unshare(mm, &address, pvmw.pte)) {
> >>> +				/*
> >>> +				 * huge_pmd_unshare unmapped an entire PMD
> >>> +				 * page.  There is no way of knowing exactly
> >>> +				 * which PMDs may be cached for this mm, so
> >>> +				 * we must flush them all.  start/end were
> >>> +				 * already adjusted above to cover this range.
> >>> +				 */
> >>> +				flush_cache_range(vma, start, end);
> >>> +				flush_tlb_range(vma, start, end);
> >>> +				mmu_notifier_invalidate_range(mm, start, end);
> >>> +
> >>> +				/*
> >>> +				 * The ref count of the PMD page was dropped
> >>> +				 * which is part of the way map counting
> >>> +				 * is done for shared PMDs.  Return 'true'
> >>> +				 * here.  When there is no other sharing,
> >>> +				 * huge_pmd_unshare returns false and we will
> >>> +				 * unmap the actual page and drop map count
> >>> +				 * to zero.
> >>> +				 */
> >>> +				page_vma_mapped_walk_done(&pvmw);
> >>> +				break;
> >>> +			}
> >>
> >> This still calls into notifier while holding the ptl lock. Either I am
> >> missing something or the invalidation is broken in this loop (not also
> >> for other invalidations).
> > 
> > mmu_notifier_invalidate_range() is done with the pt lock held; only the
> > start and end versions need to happen outside the pt lock.
> 
> Hi Jerome (and anyone else having good understanding of mmu notifier API),
> 
> Michal and I have been looking at backports to stable releases.  If you look
> at the v4.4 version of try_to_unmap_one(), it does not use the
> mmu_notifier_invalidate_range_start/end interfaces. Rather, it uses the
> mmu_notifier_invalidate_page(), passing in the address of the page it
> unmapped.  This is done after releasing the ptl lock.  I'm not even sure if
> this works for huge pages, as it appears some THP supporting code was added
> to try_to_unmap_one() after v4.4.
> 
> But, we were wondering what mmu notifier interface to use in the case where
> try_to_unmap_one() unmaps a shared pmd huge page as addressed in the patch
> above.  In this case, a PUD sized area is effectively unmapped.  In the
> code/patch above we have the invalidate range (start and end as well) take
> the PUD sized area into account.
> 
> What would be the best mmu notifier interface to use where there are no
> start/end calls?
> Or, is the best solution to add the start/end calls as is done in later
> versions of the code?  If that is the suggestion, has there been any change
> in invalidate start/end semantics that we should take into account?

start/end would be the ones to add; 4.4 seems broken with respect to THP
and mmu notification. Another solution is to fix the users of the mmu
notifier; there were only a handful of them back then. For instance,
properly adjusting the address to match the first address covered by the
pmd or pud, and passing the correct page size down to
mmu_notifier_invalidate_page(), would allow fixing this easily.
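
As a purely hypothetical sketch of that second option (none of this is
existing 4.4 code: 'unshared' stands in for the return value of
huge_pmd_unshare(), and the _size() call is invented only to show what
"passing down the correct page size" would look like):

	/*
	 * Hypothetical sketch only: widen the per-page invalidation to the
	 * range that was actually torn down.  mmu_notifier_invalidate_page()
	 * takes no size argument in 4.4, so this assumes a modified
	 * interface.
	 */
	if (PageHuge(page) && unshared) {
		/* a shared PMD page maps a PUD_SIZE aligned range */
		unsigned long start = address & PUD_MASK;

		mmu_notifier_invalidate_page_size(mm, start, PUD_SIZE);
	} else {
		mmu_notifier_invalidate_page(mm, address);
	}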

This is OK because users of try_to_unmap_one() replace the pte/pmd/pud
with an invalid one (either a poison, migration, or swap entry) inside the
function. Anyone racing will synchronize on those special entries, which
is why it is fine to delay mmu_notifier_invalidate_page() until after
dropping the page table lock.
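
To make the ordering explicit, this is roughly what 4.4's
try_to_unmap_one() already does for the migration case (condensed sketch;
error handling, counters, and the other unmap cases are omitted):

	spinlock_t *ptl;
	pte_t *pte;
	pte_t pteval;
	swp_entry_t entry;

	/* takes the page table lock */
	pte = page_check_address(page, mm, address, &ptl, 0);

	pteval = ptep_clear_flush(vma, address, pte);

	/* install a migration entry: racing faults will wait on it */
	entry = make_migration_entry(page, pte_write(pteval));
	set_pte_at(mm, address, pte, swp_entry_to_pte(entry));

	page_remove_rmap(page);
	pte_unmap_unlock(pte, ptl);		/* drop the page table lock */

	/* only now, outside the lock, notify */
	mmu_notifier_invalidate_page(mm, address);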

Adding start/end might be the solution with the least code churn, as you
would only need to change try_to_unmap_one().
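
Roughly something like this (sketch only, using the 4.4-era notifier
signatures; adjust_range_if_pmd_sharing_possible() comes from the patch
above and would have to be backported along with it):

	/*
	 * Sketch: bracket the body of a backported try_to_unmap_one()
	 * with range notifiers sized to cover a possible shared-PMD unmap.
	 */
	unsigned long start = address;
	unsigned long end = address + PAGE_SIZE;

	if (PageHuge(page)) {
		/* worst case huge_pmd_unshare drops a whole shared PMD page */
		end = start + huge_page_size(page_hstate(page));
		adjust_range_if_pmd_sharing_possible(vma, &start, &end);
	}

	mmu_notifier_invalidate_range_start(mm, start, end);

	/* ... existing pte lookup / flush / unmap logic, unchanged ... */

	mmu_notifier_invalidate_range_end(mm, start, end);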

Cheers,
Jerome

Thread overview: 29+ messages
2018-08-23 20:59 [PATCH v6 0/2] huge_pmd_unshare migration and flushing Mike Kravetz
2018-08-23 20:59 ` [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages Mike Kravetz
2018-08-24  2:59   ` Naoya Horiguchi
2018-08-24  8:41   ` Michal Hocko
2018-08-24 18:08     ` Mike Kravetz
2018-08-27  7:46       ` Michal Hocko
2018-08-27 13:46         ` Jerome Glisse
2018-08-27 19:09           ` Michal Hocko
2018-08-29 17:24           ` Mike Kravetz
2018-08-29 18:14             ` Jerome Glisse [this message]
2018-08-29 18:39               ` Michal Hocko
2018-08-29 21:11                 ` Jerome Glisse
2018-08-30  0:40                   ` Mike Kravetz
2018-08-30 10:56                   ` Michal Hocko
2018-08-30 14:08                     ` Jerome Glisse
2018-08-30 16:19                       ` Michal Hocko
2018-08-30 16:57                         ` Jerome Glisse
2018-08-30 18:05                           ` Mike Kravetz
2018-08-30 18:39                             ` Jerome Glisse
2018-09-03  5:56                               ` Michal Hocko
2018-09-04 14:00                                 ` Jerome Glisse
2018-09-04 17:55                                   ` Mike Kravetz
2018-09-05  6:57                                   ` Michal Hocko
2018-08-27 16:42         ` Mike Kravetz
2018-08-27 19:11       ` Michal Hocko
2018-08-24  9:25   ` Michal Hocko
2018-08-23 20:59 ` [PATCH v6 2/2] hugetlb: take PMD sharing into account when flushing tlb/caches Mike Kravetz
2018-08-24  3:07   ` Naoya Horiguchi
2018-08-24 11:35 ` [PATCH v6 0/2] huge_pmd_unshare migration and flushing Kirill A. Shutemov
