Subject: Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic
From: Nadav Amit
Date: Tue, 29 Aug 2017 20:16:44 -0700
In-Reply-To: <20170830025910.GB2386@redhat.com>
To: Jerome Glisse
Cc: Linux Kernel Mailing List, "open list:MEMORY MANAGEMENT", Dan Williams,
    Ross Zwisler, Linus Torvalds, Bernhard Held, Adam Borowski,
    Andrea Arcangeli, Radim Krčmář, Wanpeng Li, Paolo Bonzini, Takashi Iwai,
    Mike Galbraith, "Kirill A. Shutemov", axie, Andrew Morton

Jerome Glisse wrote:

> On Tue, Aug 29, 2017 at 07:46:07PM -0700, Nadav Amit wrote:
>> Jérôme Glisse wrote:
>> 
>>> Replace all mmu_notifier_invalidate_page() calls by mmu_notifier_invalidate_range()
>>> and make sure they are bracketed by calls to mmu_notifier_invalidate_range_start/
>>> end.
>>> 
>>> Note that because we cannot presume the pmd value or pte value, we have to
>>> assume the worst and unconditionally report an invalidation as happening.
>>> 
>>> Signed-off-by: Jérôme Glisse
>>> Cc: Dan Williams
>>> Cc: Ross Zwisler
>>> Cc: Linus Torvalds
>>> Cc: Bernhard Held
>>> Cc: Adam Borowski
>>> Cc: Andrea Arcangeli
>>> Cc: Radim Krčmář
>>> Cc: Wanpeng Li
>>> Cc: Paolo Bonzini
>>> Cc: Takashi Iwai
>>> Cc: Nadav Amit
>>> Cc: Mike Galbraith
>>> Cc: Kirill A. Shutemov
>>> Cc: axie
>>> Cc: Andrew Morton
>>> ---
>>>  mm/rmap.c | 44 +++++++++++++++++++++++++++++++++++++++++---
>>>  1 file changed, 41 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index c8993c63eb25..da97ed525088 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -887,11 +887,21 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>>>  		.address = address,
>>>  		.flags = PVMW_SYNC,
>>>  	};
>>> +	unsigned long start = address, end;
>>>  	int *cleaned = arg;
>>>  
>>> +	/*
>>> +	 * We have to assume the worst case, ie pmd, for invalidation. Note that
>>> +	 * the page cannot be freed from this function.
>>> +	 */
>>> +	end = min(vma->vm_end, (start & PMD_MASK) + PMD_SIZE);
>>> +	mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);
>>> +
>>>  	while (page_vma_mapped_walk(&pvmw)) {
>>> +		unsigned long cstart, cend;
>>>  		int ret = 0;
>>> -		address = pvmw.address;
>>> +
>>> +		cstart = address = pvmw.address;
>>>  		if (pvmw.pte) {
>>>  			pte_t entry;
>>>  			pte_t *pte = pvmw.pte;
>>> @@ -904,6 +914,7 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>>>  			entry = pte_wrprotect(entry);
>>>  			entry = pte_mkclean(entry);
>>>  			set_pte_at(vma->vm_mm, address, pte, entry);
>>> +			cend = cstart + PAGE_SIZE;
>>>  			ret = 1;
>>>  		} else {
>>>  #ifdef CONFIG_TRANSPARENT_HUGE_PAGECACHE
>>> @@ -918,6 +929,8 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>>>  			entry = pmd_wrprotect(entry);
>>>  			entry = pmd_mkclean(entry);
>>>  			set_pmd_at(vma->vm_mm, address, pmd, entry);
>>> +			cstart &= PMD_MASK;
>>> +			cend = cstart + PMD_SIZE;
>>>  			ret = 1;
>>>  #else
>>>  			/* unexpected pmd-mapped page? */
>>> @@ -926,11 +939,13 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>>>  		}
>>>  
>>>  		if (ret) {
>>> -			mmu_notifier_invalidate_page(vma->vm_mm, address);
>>> +			mmu_notifier_invalidate_range(vma->vm_mm, cstart, cend);
>>>  			(*cleaned)++;
>>>  		}
>>>  	}
>>>  
>>> +	mmu_notifier_invalidate_range_end(vma->vm_mm, start, end);
>>> +
>>>  	return true;
>>>  }
>>>  
>>> @@ -1324,6 +1339,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>>  	pte_t pteval;
>>>  	struct page *subpage;
>>>  	bool ret = true;
>>> +	unsigned long start = address, end;
>>>  	enum ttu_flags flags = (enum ttu_flags)arg;
>>>  
>>>  	/* munlock has nothing to gain from examining un-locked vmas */
>>> @@ -1335,6 +1351,14 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>>  					flags & TTU_MIGRATION, page);
>>>  	}
>>>  
>>> +	/*
>>> +	 * We have to assume the worst case, ie pmd, for invalidation. Note that
>>> +	 * the page cannot be freed in this function as the caller of try_to_unmap()
>>> +	 * must hold a reference on the page.
>>> +	 */
>>> +	end = min(vma->vm_end, (start & PMD_MASK) + PMD_SIZE);
>>> +	mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);
>>> +
>>>  	while (page_vma_mapped_walk(&pvmw)) {
>>>  		/*
>>>  		 * If the page is mlock()d, we cannot swap it out.
>>> @@ -1408,6 +1432,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>>  				set_huge_swap_pte_at(mm, address,
>>>  						     pvmw.pte, pteval,
>>>  						     vma_mmu_pagesize(vma));
>>> +				mmu_notifier_invalidate_range(mm, address,
>>> +						address + vma_mmu_pagesize(vma));
>> 
>> I don't think that the notifier should be called after the PTE is set, but
>> after the PTE is cleared, PTE permissions are demoted (e.g., RW->RO) or
>> access/dirty bits are cleared. [There is an exception: if the PFN in the PTE
>> is changed without clearing the PTE first, but that does not apply here, and
>> there is a different notifier for that case.]
>> 
>> Therefore, IIUC, try_to_unmap_one() should only call
>> mmu_notifier_invalidate_range() after ptep_get_and_clear() and
>> ptep_clear_flush() are called. All the other calls to
>> mmu_notifier_invalidate_range() in this function can be removed.
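
(To make the suggested ordering concrete, here is a rough sketch -- pseudo-code
only, reusing the helpers already used in rmap.c, not the actual patch -- of
how one mapping would be handled inside the invalidate_range_start/end bracket:)

	/*
	 * Sketch, not the real code: the whole page_vma_mapped_walk() stays
	 * bracketed by mmu_notifier_invalidate_range_start()/_end(), and the
	 * range callback fires only once the primary PTE has actually been
	 * cleared and flushed.
	 */
	mmu_notifier_invalidate_range_start(mm, start, end);

	pteval = ptep_clear_flush(vma, address, pvmw.pte);	/* CPU mapping gone */
	mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE);

	/* ... compute the replacement entry (swap/migration/none) ... */
	set_pte_at(mm, address, pvmw.pte, swp_entry_to_pte(entry));

	mmu_notifier_invalidate_range_end(mm, start, end);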
> 
> Yes, it would simplify the patch. I was trying to optimize for the case
> where we restore the pte to its original value after ptep_clear_flush()
> or ptep_get_and_clear(), since in that case there is no need to invalidate
> any secondary page table, but that's an overkill optimization that just
> adds too much complexity.

Interesting. Actually, prior to your changes, it seems that the break
statements would skip mmu_notifier_invalidate_page() when the PTE is not
changed. So why not just change mmu_notifier_invalidate_page() into
mmu_notifier_invalidate_range()?

Regards,
Nadav
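
(For reference, the minimal conversion suggested above would look roughly like
this at each existing call site -- a sketch only, invalidating a single page
per call and leaving the surrounding skip/break logic untouched:)

 	if (ret) {
-		mmu_notifier_invalidate_page(vma->vm_mm, address);
+		mmu_notifier_invalidate_range(vma->vm_mm, address,
+					      address + PAGE_SIZE);
 		(*cleaned)++;
 	}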