From: Jerome Glisse <jglisse@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Andrew Morton <akpm@linux-foundation.org>,
stable@vger.kernel.org
Subject: Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages
Date: Wed, 29 Aug 2018 14:14:25 -0400 [thread overview]
Message-ID: <20180829181424.GB3784@redhat.com> (raw)
In-Reply-To: <9209043d-3240-105b-72a3-b4cd30f1b1f1@oracle.com>
On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote:
> On 08/27/2018 06:46 AM, Jerome Glisse wrote:
> > On Mon, Aug 27, 2018 at 09:46:45AM +0200, Michal Hocko wrote:
> >> On Fri 24-08-18 11:08:24, Mike Kravetz wrote:
> >>> Here is an updated patch which does as you suggest above.
> >> [...]
> >>> @@ -1409,6 +1419,32 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >>> subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
> >>> address = pvmw.address;
> >>>
> >>> + if (PageHuge(page)) {
> >>> + if (huge_pmd_unshare(mm, &address, pvmw.pte)) {
> >>> + /*
> >>> + * huge_pmd_unshare unmapped an entire PMD
> >>> + * page. There is no way of knowing exactly
> >>> + * which PMDs may be cached for this mm, so
> >>> + * we must flush them all. start/end were
> >>> + * already adjusted above to cover this range.
> >>> + */
> >>> + flush_cache_range(vma, start, end);
> >>> + flush_tlb_range(vma, start, end);
> >>> + mmu_notifier_invalidate_range(mm, start, end);
> >>> +
> >>> + /*
> >>> + * The ref count of the PMD page was dropped
> >>> + * which is part of the way map counting
> >>> + * is done for shared PMDs. Return 'true'
> >>> + * here. When there is no other sharing,
> >>> + * huge_pmd_unshare returns false and we will
> >>> + * unmap the actual page and drop map count
> >>> + * to zero.
> >>> + */
> >>> + page_vma_mapped_walk_done(&pvmw);
> >>> + break;
> >>> + }
> >>
> >> This still calls into notifier while holding the ptl lock. Either I am
> >> missing something or the invalidation is broken in this loop (not also
> >> for other invalidations).
> >
> > mmu_notifier_invalidate_range() is done with pt lock held only the start
> > and end versions need to happen outside pt lock.
>
> Hi Jerome (and anyone else having good understanding of mmu notifier API),
>
> Michal and I have been looking at backports to stable releases. If you look
> at the v4.4 version of try_to_unmap_one(), it does not use the
> mmu_notifier_invalidate_range_start/end interfaces. Rather, it uses the
> mmu_notifier_invalidate_page(), passing in the address of the page it
> unmapped. This is done after releasing the ptl lock. I'm not even sure if
> this works for huge pages, as it appears some THP supporting code was added
> to try_to_unmap_one() after v4.4.
>
> But, we were wondering what mmu notifier interface to use in the case where
> try_to_unmap_one() unmaps a shared pmd huge page as addressed in the patch
> above. In this case, a PUD sized area is effectively unmapped. In the
> code/patch above we have the invalidate range (start and end as well) take
> the PUD sized area into account.
>
> What would be the best mmu notifier interface to use where there are no
> start/end calls?
> Or, is the best solution to add the start/end calls as is done in later
> versions of the code? If that is the suggestion, has there been any change
> in invalidate start/end semantics that we should take into account?
start/end would be the one to add, 4.4 seems broken in respect to THP
and mmu notification. Another solution is to fix user of mmu notifier,
they were only a handful back then. For instance properly adjust the
address to match first address covered by pmd or pud and passing down
correct page size to mmu_notifier_invalidate_page() would allow to fix
this easily.
This is ok because user of try_to_unmap_one() replace the pte/pmd/pud
with an invalid one (either poison, migration or swap) inside the
function. So anyone racing would synchronize on those special entry
hence why it is fine to delay mmu_notifier_invalidate_page() to after
dropping the page table lock.
Adding start/end might the solution with less code churn as you would
only need to change try_to_unmap_one().
Cheers,
Jerome
next prev parent reply other threads:[~2018-08-29 18:14 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-23 20:59 [PATCH v6 0/2] huge_pmd_unshare migration and flushing Mike Kravetz
2018-08-23 20:59 ` [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages Mike Kravetz
2018-08-24 2:59 ` Naoya Horiguchi
2018-08-24 8:41 ` Michal Hocko
2018-08-24 18:08 ` Mike Kravetz
2018-08-27 7:46 ` Michal Hocko
2018-08-27 13:46 ` Jerome Glisse
2018-08-27 19:09 ` Michal Hocko
2018-08-29 17:24 ` Mike Kravetz
2018-08-29 18:14 ` Jerome Glisse [this message]
2018-08-29 18:39 ` Michal Hocko
2018-08-29 21:11 ` Jerome Glisse
2018-08-30 0:40 ` Mike Kravetz
2018-08-30 10:56 ` Michal Hocko
2018-08-30 14:08 ` Jerome Glisse
2018-08-30 16:19 ` Michal Hocko
2018-08-30 16:57 ` Jerome Glisse
2018-08-30 18:05 ` Mike Kravetz
2018-08-30 18:39 ` Jerome Glisse
2018-09-03 5:56 ` Michal Hocko
2018-09-04 14:00 ` Jerome Glisse
2018-09-04 17:55 ` Mike Kravetz
2018-09-05 6:57 ` Michal Hocko
2018-08-27 16:42 ` Mike Kravetz
2018-08-27 19:11 ` Michal Hocko
2018-08-24 9:25 ` Michal Hocko
2018-08-23 20:59 ` [PATCH v6 2/2] hugetlb: take PMD sharing into account when flushing tlb/caches Mike Kravetz
2018-08-24 3:07 ` Naoya Horiguchi
2018-08-24 11:35 ` [PATCH v6 0/2] huge_pmd_unshare migration and flushing Kirill A. Shutemov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180829181424.GB3784@redhat.com \
--to=jglisse@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=dave@stgolabs.net \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mike.kravetz@oracle.com \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=stable@vger.kernel.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox