From: Mike Kravetz <mike.kravetz@oracle.com>
To: Jerome Glisse <jglisse@redhat.com>, Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Andrew Morton <akpm@linux-foundation.org>,
stable@vger.kernel.org, linux-rdma@vger.kernel.org,
Matan Barak <matanb@mellanox.com>,
Leon Romanovsky <leonro@mellanox.com>,
Dimitri Sivanich <sivanich@sgi.com>
Subject: Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages
Date: Wed, 29 Aug 2018 17:40:13 -0700
Message-ID: <8689d0ba-1303-9765-4cae-ad24d2a1435b@oracle.com>
In-Reply-To: <20180829211106.GC3784@redhat.com>
On 08/29/2018 02:11 PM, Jerome Glisse wrote:
> On Wed, Aug 29, 2018 at 08:39:06PM +0200, Michal Hocko wrote:
>> On Wed 29-08-18 14:14:25, Jerome Glisse wrote:
>>> On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote:
>> [...]
>>>> What would be the best mmu notifier interface to use where there are no
>>>> start/end calls?
>>>> Or, is the best solution to add the start/end calls as is done in later
>>>> versions of the code? If that is the suggestion, has there been any change
>>>> in invalidate start/end semantics that we should take into account?
>>>
>>> start/end would be the ones to add; 4.4 seems broken with respect to THP
>>> and mmu notification. Another solution is to fix the users of mmu notifier,
>>> as there were only a handful back then. For instance, properly adjusting
>>> the address to match the first address covered by the pmd or pud, and
>>> passing down the correct page size to mmu_notifier_invalidate_page(),
>>> would allow this to be fixed easily.
>>>
>>> This is ok because users of try_to_unmap_one() replace the pte/pmd/pud
>>> with an invalid one (either poison, migration or swap) inside the
>>> function. So anyone racing would synchronize on those special entries,
>>> which is why it is fine to delay mmu_notifier_invalidate_page() until
>>> after dropping the page table lock.
>>>
>>> Adding start/end might be the solution with less code churn, as you would
>>> only need to change try_to_unmap_one().
>>
>> What about dependencies? 369ea8242c0fb sounds like it needs more work, as
>> all the notifiers would need to be updated as well.
>
> This commit removes mmu_notifier_invalidate_page(), which is why everything
> needs to be updated. But in 4.4 you can get away with just adding start/
> end and keeping mmu_notifier_invalidate_page() around to minimize disruption.
>
> So the new semantic in 369ea8242c0fb is that all page table changes are
> bracketed with mmu notifier start/end calls, with invalidate_range called
> right after the tlb flush. This simplifies things and makes it more
> reliable for mmu notifier users like IOMMU, ODP or GPU drivers.
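Just to make sure I understand the bracketing you describe, the shape I am
aiming for in a 4.4 backport is roughly the following (only a sketch of the
ordering with the details elided, not actual code):

	mmu_notifier_invalidate_range_start(mm, start, end);

	/*
	 * Take the page table lock, install the special (migration/swap)
	 * entry in place of the pte/pmd, flush the CPU TLB for the range.
	 */

	/*
	 * Secondary TLBs (IOMMU, ODP, GPU drivers) are invalidated right
	 * after the CPU TLB flush, while the special entry is in place.
	 */
	mmu_notifier_invalidate_range(mm, start, end);

	/* Drop the page table lock, finish the rest of the bookkeeping. */

	mmu_notifier_invalidate_range_end(mm, start, end);
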
Here is what I came up with by adding the start/end calls to the 4.4 version
of try_to_unmap_one. Note that this assumes/uses the new routine
adjust_range_if_pmd_sharing_possible to adjust the notifier/flush range if
huge pmd sharing is possible. I changed the mmu_notifier_invalidate_page
call to mmu_notifier_invalidate_range, but am not sure whether that needs to
happen earlier in the routine (i.e. right after the tlb flush, as you said
above).
Does this look reasonable?
diff --git a/mm/rmap.c b/mm/rmap.c
index b577fbb98d4b..7ba8bfeddb4b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1302,11 +1302,30 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
pte_t pteval;
spinlock_t *ptl;
int ret = SWAP_AGAIN;
+ unsigned long start = address, end;
enum ttu_flags flags = (enum ttu_flags)arg;
/* munlock has nothing to gain from examining un-locked vmas */
if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
- goto out;
+ return ret;
+
+ /*
+ * For THP, we have to assume the worst case, i.e. pmd for invalidation.
+ * For hugetlb, it could be much worse if we need to do pud
+ * invalidation in the case of pmd sharing.
+ *
+ * Note that the page can not be freed in this function as the caller of
+ * try_to_unmap() must hold a reference on the page.
+ */
+ end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page)));
+ if (PageHuge(page)) {
+ /*
+ * If sharing is possible, start and end will be adjusted
+ * accordingly.
+ */
+ adjust_range_if_pmd_sharing_possible(vma, &start, &end);
+ }
+ mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);
pte = page_check_address(page, mm, address, &ptl, 0);
if (!pte)
@@ -1334,6 +1353,29 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
}
}
+ if (PageHuge(page) && huge_pmd_unshare(mm, &address, pte)) {
+ /*
+ * huge_pmd_unshare unmapped an entire PMD page. There is
+ * no way of knowing exactly which PMDs may be cached for
+ * this mm, so flush them all. start/end were already
+ * adjusted to cover this range.
+ */
+ flush_cache_range(vma, start, end);
+ flush_tlb_range(vma, start, end);
+
+ /*
+ * The ref count of the PMD page was dropped which is part
+ * of the way map counting is done for shared PMDs. When
+ * there is no other sharing, huge_pmd_unshare returns false
+ * and we will unmap the actual page and drop map count
+ * to zero.
+ *
+ * Note that huge_pmd_unshare modified 'address', so its value is likely
+ * not what you would expect.
+ */
+ goto out_unmap;
+ }
+
/* Nuke the page table entry. */
flush_cache_page(vma, address, page_to_pfn(page));
if (should_defer_flush(mm, flags)) {
@@ -1424,10 +1466,11 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
page_cache_release(page);
out_unmap:
- pte_unmap_unlock(pte, ptl);
if (ret != SWAP_FAIL && ret != SWAP_MLOCK && !(flags & TTU_MUNLOCK))
- mmu_notifier_invalidate_page(mm, address);
+ mmu_notifier_invalidate_range(mm, start, end);
+ pte_unmap_unlock(pte, ptl);
out:
+ mmu_notifier_invalidate_range_end(vma->vm_mm, start, end);
return ret;
}
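For reference, the adjust_range_if_pmd_sharing_possible() call above is the
new helper mentioned earlier. The idea is just that when the vma could be
sharing huge PMDs, start/end are rounded out to cover the full PUD_SIZE
aligned regions within the vma, so the notifier/flush range also covers any
shared page table. Roughly along these lines (a simplified sketch of the
idea with a made-up name, not the exact helper):

	static void sharing_range_sketch(struct vm_area_struct *vma,
					 unsigned long *start,
					 unsigned long *end)
	{
		unsigned long a_start, a_end;

		/* pmd sharing is only possible for shared mappings */
		if (!(vma->vm_flags & VM_MAYSHARE))
			return;

		/* round out to PUD_SIZE (shared page table) boundaries */
		a_start = *start & PUD_MASK;
		a_end = ALIGN(*end, PUD_SIZE);

		/* only widen the range while it stays within the vma */
		if (a_start >= vma->vm_start && a_start < *start)
			*start = a_start;
		if (a_end <= vma->vm_end && a_end > *end)
			*end = a_end;
	}
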
--
Mike Kravetz