* [RFC 0/4] mm: Introduce lazy exec permission setting on a page
@ 2019-02-13 8:06 Anshuman Khandual
From: Anshuman Khandual @ 2019-02-13 8:06 UTC
To: linux-mm, akpm
Cc: mhocko, kirill, kirill.shutemov, vbabka, will.deacon,
catalin.marinas, dave.hansen
Setting exec permission on a page normally triggers an I-cache invalidation,
which can be expensive. The invalidation is not required for a page that is
not about to be executed. Non-fault modifications of user page tables from
generic memory paths such as migration can therefore be made cheaper if
setting the exec permission on the page is deferred until it is actually
used. A performance report [1] highlighted the problem.
This introduces [pte|pmd]_mklazyexec(), which clears the exec permission on
a page during migration. The deferred exec permission is restored with
maybe_[pmd_]mkexec() on an exec page fault (FAULT_FLAG_INSTRUCTION) if the
corresponding VMA has the exec flag (VM_EXEC) set.
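To make the intended cost shift concrete, here is a minimal userspace model. It is a sketch under stated assumptions, not kernel code: `struct toy_pte`, `struct toy_vma`, `toy_migrate()`, `toy_exec_fault()` and the invalidation counter are hypothetical, and a single `exec` flag stands in for the architecture's execute permission. In eager mode every migrated executable page pays for I-cache maintenance; in lazy mode migration drops exec and only an instruction-fetch fault on a VM_EXEC mapping pays.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types: not the kernel's pte_t or vm_area_struct. */
struct toy_pte {
	bool exec;		/* stands in for the arch exec permission */
};

struct toy_vma {
	bool vm_exec;		/* stands in for vma->vm_flags & VM_EXEC */
};

static unsigned long icache_invalidations;

/* Analogue of pte_mklazyexec(): drop exec, defer I-cache maintenance. */
static struct toy_pte toy_mklazyexec(struct toy_pte pte)
{
	pte.exec = false;
	return pte;
}

/* Analogue of maybe_mkexec(): restore exec only for VM_EXEC mappings. */
static struct toy_pte toy_maybe_mkexec(struct toy_pte pte, const struct toy_vma *vma)
{
	if (vma->vm_exec)
		pte.exec = true;
	return pte;
}

/* Re-install a PTE after migration, either eagerly or lazily. */
static struct toy_pte toy_migrate(struct toy_pte pte, bool lazy)
{
	if (lazy)
		return toy_mklazyexec(pte);
	if (pte.exec)
		icache_invalidations++;	/* eager: pay now, used or not */
	return pte;
}

/* Instruction-fetch fault (FAULT_FLAG_INSTRUCTION) on a non-exec PTE. */
static struct toy_pte toy_exec_fault(struct toy_pte pte, const struct toy_vma *vma)
{
	struct toy_pte new_pte = toy_maybe_mkexec(pte, vma);

	if (!pte.exec && new_pte.exec)
		icache_invalidations++;	/* lazy: pay on the first real exec */
	return new_pte;
}

/* Migrate npages executable pages, then execute the first nexecuted. */
static unsigned long run(bool lazy, int npages, int nexecuted)
{
	const struct toy_vma vma = { .vm_exec = true };

	assert(nexecuted <= npages);
	icache_invalidations = 0;
	for (int i = 0; i < npages; i++) {
		struct toy_pte pte = { .exec = true };

		pte = toy_migrate(pte, lazy);
		/* Only a PTE without exec permission faults on instruction fetch. */
		if (i < nexecuted && !pte.exec)
			pte = toy_exec_fault(pte, &vma);
	}
	return icache_invalidations;
}
```

With `run(false, 10, 2)` all ten migrations pay for maintenance, while `run(true, 10, 2)` pays only for the two pages that are executed. If every migrated page is executed afterwards, both modes pay the same number of times, so the model says nothing about the extra cost of taking the fault itself.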
The framework is guarded by CONFIG_ARCH_SUPPORTS_LAZY_EXEC so that
architectures which do not subscribe to it take no performance hit. For now
only the generic memory migration path uses it, but it can later be extended
to other generic memory paths as well.
The series also enables CONFIG_ARCH_SUPPORTS_LAZY_EXEC on arm64, defines the
required helper functions, and changes ptep_set_access_flags() to allow the
non-exec to exec transition.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-December/620357.html
Anshuman Khandual (4):
mm: Introduce lazy exec permission setting on a page
arm64/mm: Identify user level instruction faults
arm64/mm: Allow non-exec to exec transition in ptep_set_access_flags()
arm64/mm: Enable ARCH_SUPPORTS_LAZY_EXEC
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/pgtable.h | 17 +++++++++++++++++
arch/arm64/mm/fault.c | 22 ++++++++++++++--------
include/asm-generic/pgtable.h | 12 ++++++++++++
include/linux/mm.h | 26 ++++++++++++++++++++++++++
mm/Kconfig | 9 +++++++++
mm/huge_memory.c | 5 +++++
mm/hugetlb.c | 2 ++
mm/memory.c | 4 ++++
mm/migrate.c | 2 ++
10 files changed, 92 insertions(+), 8 deletions(-)
--
2.7.4
* [RFC 1/4] mm: Introduce lazy exec permission setting on a page
From: Anshuman Khandual @ 2019-02-13 8:06 UTC
To: linux-mm, akpm
Cc: mhocko, kirill, kirill.shutemov, vbabka, will.deacon,
catalin.marinas, dave.hansen

Setting exec permission on a page normally triggers an I-cache invalidation,
which can be expensive. The invalidation is not required for a page that is
not about to be executed. Non-fault modifications of user page tables from
generic memory paths such as migration can therefore be made cheaper if
setting the exec permission on the page is deferred until it is actually
used.

This introduces [pte|pmd]_mklazyexec(), which clears the exec permission on
a page during migration. The deferred exec permission is restored with
maybe_[pmd_]mkexec() on an exec page fault (FAULT_FLAG_INSTRUCTION) if the
corresponding VMA has the exec flag (VM_EXEC) set.

The framework is guarded by CONFIG_ARCH_SUPPORTS_LAZY_EXEC so that
architectures which do not subscribe to it take no performance hit. For now
only the generic memory migration path uses it, but it can later be extended
to other generic memory paths as well.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 include/asm-generic/pgtable.h | 12 ++++++++++++
 include/linux/mm.h            | 26 ++++++++++++++++++++++++++
 mm/Kconfig                    |  9 +++++++++
 mm/huge_memory.c              |  5 +++++
 mm/hugetlb.c                  |  2 ++
 mm/memory.c                   |  4 ++++
 mm/migrate.c                  |  2 ++
 7 files changed, 60 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 05e61e6..d35d129 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -26,6 +26,18 @@
 #define USER_PGTABLES_CEILING	0UL
 #endif
 
+#ifndef CONFIG_ARCH_SUPPORTS_LAZY_EXEC
+static inline pte_t pte_mklazyexec(pte_t entry)
+{
+	return entry;
+}
+
+static inline pmd_t pmd_mklazyexec(pmd_t entry)
+{
+	return entry;
+}
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pte_t *ptep,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80bb640..04d7a0a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -755,6 +755,32 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 	return pte;
 }
 
+#ifdef CONFIG_ARCH_SUPPORTS_LAZY_EXEC
+static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
+{
+	if (unlikely(vma->vm_flags & VM_EXEC))
+		return pte_mkexec(entry);
+	return entry;
+}
+
+static inline pmd_t maybe_pmd_mkexec(pmd_t entry, struct vm_area_struct *vma)
+{
+	if (unlikely(vma->vm_flags & VM_EXEC))
+		return pmd_mkexec(entry);
+	return entry;
+}
+#else
+static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
+{
+	return entry;
+}
+
+static inline pmd_t maybe_pmd_mkexec(pmd_t entry, struct vm_area_struct *vma)
+{
+	return entry;
+}
+#endif
+
 vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 		struct page *page);
 vm_fault_t finish_fault(struct vm_fault *vmf);
diff --git a/mm/Kconfig b/mm/Kconfig
index 25c71eb..5c046cb 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -322,6 +322,15 @@ config DEFAULT_MMAP_MIN_ADDR
 	  This value can be changed after boot using the
 	  /proc/sys/vm/mmap_min_addr tunable.
 
+config ARCH_SUPPORTS_LAZY_EXEC
+	bool "Architecture supports deferred exec permission setting"
+	help
+	  Some architectures can improve performance during non-fault page
+	  table modifications paths with deferred exec permission setting
+	  which helps in avoiding expensive I-cache invalidations. This
+	  requires arch implementation of ptep_set_access_flags() to allow
+	  non-exec to exec transition.
+
 config ARCH_SUPPORTS_MEMORY_FAILURE
 	bool
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index faf357e..9ef7662 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1126,6 +1126,8 @@ void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd)
 	if (write)
 		entry = pmd_mkdirty(entry);
 	haddr = vmf->address & HPAGE_PMD_MASK;
+	if (vmf->flags & FAULT_FLAG_INSTRUCTION)
+		entry = maybe_pmd_mkexec(entry, vmf->vma);
 	if (pmdp_set_access_flags(vmf->vma, haddr, vmf->pmd, entry, write))
 		update_mmu_cache_pmd(vmf->vma, vmf->address, vmf->pmd);
 
@@ -1290,6 +1292,8 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+		if (vmf->flags & FAULT_FLAG_INSTRUCTION)
+			entry = maybe_pmd_mkexec(entry, vma);
 		if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry,  1))
 			update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 		ret |= VM_FAULT_WRITE;
@@ -2944,6 +2948,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 		pmde = pmd_mksoft_dirty(pmde);
 	if (is_write_migration_entry(entry))
 		pmde = maybe_pmd_mkwrite(pmde, vma);
+	pmde = pmd_mklazyexec(pmde);
 
 	flush_cache_range(vma, mmun_start, mmun_start + HPAGE_PMD_SIZE);
 	if (PageAnon(new))
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index afef616..ea41832 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4018,6 +4018,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		entry = huge_pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);
+	if (flags & FAULT_FLAG_INSTRUCTION)
+		entry = maybe_mkexec(entry, vma);
 	if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
 						flags & FAULT_FLAG_WRITE))
 		update_mmu_cache(vma, haddr, ptep);
diff --git a/mm/memory.c b/mm/memory.c
index e11ca9d..74c406b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2218,6 +2218,8 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
 	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 	entry = pte_mkyoung(vmf->orig_pte);
 	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	if (vmf->flags & FAULT_FLAG_INSTRUCTION)
+		entry = maybe_mkexec(entry, vma);
 	if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
 		update_mmu_cache(vma, vmf->address, vmf->pte);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -3804,6 +3806,8 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 		entry = pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);
+	if (vmf->flags & FAULT_FLAG_INSTRUCTION)
+		entry = maybe_mkexec(entry, vmf->vma);
 	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
 				vmf->flags & FAULT_FLAG_WRITE)) {
 		update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
diff --git a/mm/migrate.c b/mm/migrate.c
index d4fd680..7587717 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -257,6 +257,7 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 		if (PageHuge(new)) {
 			pte = pte_mkhuge(pte);
 			pte = arch_make_huge_pte(pte, vma, new, 0);
+			pte = pte_mklazyexec(pte);
 			set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
 			if (PageAnon(new))
 				hugepage_add_anon_rmap(new, vma, pvmw.address);
@@ -265,6 +266,7 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 		} else
 #endif
 		{
+			pte = pte_mklazyexec(pte);
 			set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
 
 			if (PageAnon(new))
--
2.7.4
* Re: [RFC 1/4] mm: Introduce lazy exec permission setting on a page
From: Matthew Wilcox @ 2019-02-13 13:17 UTC
To: Anshuman Khandual
Cc: linux-mm, akpm, mhocko, kirill, kirill.shutemov, vbabka, will.deacon,
catalin.marinas, dave.hansen

On Wed, Feb 13, 2019 at 01:36:28PM +0530, Anshuman Khandual wrote:
> +#ifdef CONFIG_ARCH_SUPPORTS_LAZY_EXEC
> +static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
> +{
> +	if (unlikely(vma->vm_flags & VM_EXEC))
> +		return pte_mkexec(entry);
> +	return entry;
> +}
> +#else
> +static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
> +{
> +	return entry;
> +}
> +#endif

> +++ b/mm/memory.c
> @@ -2218,6 +2218,8 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
>  	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
>  	entry = pte_mkyoung(vmf->orig_pte);
>  	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +	if (vmf->flags & FAULT_FLAG_INSTRUCTION)
> +		entry = maybe_mkexec(entry, vma);

I don't understand this bit.  We have a fault based on an instruction
fetch.  But we're only going to _maybe_ set the exec bit?  Why not call
pte_mkexec() unconditionally?
* Re: [RFC 1/4] mm: Introduce lazy exec permission setting on a page
From: Anshuman Khandual @ 2019-02-13 13:53 UTC
To: Matthew Wilcox
Cc: linux-mm, akpm, mhocko, kirill, kirill.shutemov, vbabka, will.deacon,
catalin.marinas, dave.hansen

On 02/13/2019 06:47 PM, Matthew Wilcox wrote:
> On Wed, Feb 13, 2019 at 01:36:28PM +0530, Anshuman Khandual wrote:
>> +#ifdef CONFIG_ARCH_SUPPORTS_LAZY_EXEC
>> +static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
>> +{
>> +	if (unlikely(vma->vm_flags & VM_EXEC))
>> +		return pte_mkexec(entry);
>> +	return entry;
>> +}
>> +#else
>> +static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
>> +{
>> +	return entry;
>> +}
>> +#endif
>
>> +++ b/mm/memory.c
>> @@ -2218,6 +2218,8 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
>>  	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
>>  	entry = pte_mkyoung(vmf->orig_pte);
>>  	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>> +	if (vmf->flags & FAULT_FLAG_INSTRUCTION)
>> +		entry = maybe_mkexec(entry, vma);
>
> I don't understand this bit.  We have a fault based on an instruction
> fetch.  But we're only going to _maybe_ set the exec bit?  Why not call
> pte_mkexec() unconditionally?

Because the arch might not have subscribed to this, in which case the fall
back function does nothing and returns the same entry. But in case this is
enabled it also checks for VMA exec flag (VM_EXEC) before calling into
pte_mkexec() something similar to existing maybe_mkwrite().
* Re: [RFC 1/4] mm: Introduce lazy exec permission setting on a page
From: Mike Rapoport @ 2019-02-14 9:06 UTC
To: Anshuman Khandual
Cc: Matthew Wilcox, linux-mm, akpm, mhocko, kirill, kirill.shutemov, vbabka,
will.deacon, catalin.marinas, dave.hansen

On Wed, Feb 13, 2019 at 07:23:18PM +0530, Anshuman Khandual wrote:
> On 02/13/2019 06:47 PM, Matthew Wilcox wrote:
> > On Wed, Feb 13, 2019 at 01:36:28PM +0530, Anshuman Khandual wrote:
> >> +#ifdef CONFIG_ARCH_SUPPORTS_LAZY_EXEC
> >> +static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
> >> +{
> >> +	if (unlikely(vma->vm_flags & VM_EXEC))
> >> +		return pte_mkexec(entry);
> >> +	return entry;
> >> +}
> >> +#else
> >> +static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
> >> +{
> >> +	return entry;
> >> +}
> >> +#endif
> >
> >> +++ b/mm/memory.c
> >> @@ -2218,6 +2218,8 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
> >>  	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
> >>  	entry = pte_mkyoung(vmf->orig_pte);
> >>  	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> >> +	if (vmf->flags & FAULT_FLAG_INSTRUCTION)
> >> +		entry = maybe_mkexec(entry, vma);
> >
> > I don't understand this bit.  We have a fault based on an instruction
> > fetch.  But we're only going to _maybe_ set the exec bit?  Why not call
> > pte_mkexec() unconditionally?
>
> Because the arch might not have subscribed to this, in which case the fall
> back function does nothing and returns the same entry. But in case this is
> enabled it also checks for VMA exec flag (VM_EXEC) before calling into
> pte_mkexec() something similar to existing maybe_mkwrite().

Then why not pass vmf->flags to maybe_mkexec() so that only arches
subscribed to this will have the check for 'flags & FAULT_FLAG_INSTRUCTION' ?

--
Sincerely yours,
Mike.
* Re: [RFC 1/4] mm: Introduce lazy exec permission setting on a page
From: Anshuman Khandual @ 2019-02-15 8:11 UTC
To: Mike Rapoport
Cc: Matthew Wilcox, linux-mm, akpm, mhocko, kirill, kirill.shutemov, vbabka,
will.deacon, catalin.marinas, dave.hansen

On 02/14/2019 02:36 PM, Mike Rapoport wrote:
> On Wed, Feb 13, 2019 at 07:23:18PM +0530, Anshuman Khandual wrote:
>> On 02/13/2019 06:47 PM, Matthew Wilcox wrote:
>>> On Wed, Feb 13, 2019 at 01:36:28PM +0530, Anshuman Khandual wrote:
>>>> +#ifdef CONFIG_ARCH_SUPPORTS_LAZY_EXEC
>>>> +static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
>>>> +{
>>>> +	if (unlikely(vma->vm_flags & VM_EXEC))
>>>> +		return pte_mkexec(entry);
>>>> +	return entry;
>>>> +}
>>>> +#else
>>>> +static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
>>>> +{
>>>> +	return entry;
>>>> +}
>>>> +#endif
>>>
>>>> +++ b/mm/memory.c
>>>> @@ -2218,6 +2218,8 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
>>>>  	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
>>>>  	entry = pte_mkyoung(vmf->orig_pte);
>>>>  	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>>>> +	if (vmf->flags & FAULT_FLAG_INSTRUCTION)
>>>> +		entry = maybe_mkexec(entry, vma);
>>>
>>> I don't understand this bit.  We have a fault based on an instruction
>>> fetch.  But we're only going to _maybe_ set the exec bit?  Why not call
>>> pte_mkexec() unconditionally?
>>
>> Because the arch might not have subscribed to this, in which case the fall
>> back function does nothing and returns the same entry. But in case this is
>> enabled it also checks for VMA exec flag (VM_EXEC) before calling into
>> pte_mkexec() something similar to existing maybe_mkwrite().
>
> Then why not pass vmf->flags to maybe_mkexec() so that only arches
> subscribed to this will have the check for 'flags & FAULT_FLAG_INSTRUCTION' ?

Right, it can help remove a couple of instructions from un-subscribing archs.
* Re: [RFC 1/4] mm: Introduce lazy exec permission setting on a page
From: Catalin Marinas @ 2019-02-15 9:49 UTC
To: Anshuman Khandual
Cc: Mike Rapoport, Matthew Wilcox, linux-mm, akpm, mhocko, kirill,
kirill.shutemov, vbabka, will.deacon, dave.hansen

On Fri, Feb 15, 2019 at 01:41:16PM +0530, Anshuman Khandual wrote:
> On 02/14/2019 02:36 PM, Mike Rapoport wrote:
> > On Wed, Feb 13, 2019 at 07:23:18PM +0530, Anshuman Khandual wrote:
> >> On 02/13/2019 06:47 PM, Matthew Wilcox wrote:
> >>> On Wed, Feb 13, 2019 at 01:36:28PM +0530, Anshuman Khandual wrote:
> >>>> +#ifdef CONFIG_ARCH_SUPPORTS_LAZY_EXEC
> >>>> +static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
> >>>> +{
> >>>> +	if (unlikely(vma->vm_flags & VM_EXEC))
> >>>> +		return pte_mkexec(entry);
> >>>> +	return entry;
> >>>> +}
> >>>> +#else
> >>>> +static inline pte_t maybe_mkexec(pte_t entry, struct vm_area_struct *vma)
> >>>> +{
> >>>> +	return entry;
> >>>> +}
> >>>> +#endif
> >>>
> >>>> +++ b/mm/memory.c
> >>>> @@ -2218,6 +2218,8 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
> >>>>  	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
> >>>>  	entry = pte_mkyoung(vmf->orig_pte);
> >>>>  	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> >>>> +	if (vmf->flags & FAULT_FLAG_INSTRUCTION)
> >>>> +		entry = maybe_mkexec(entry, vma);
> >>>
> >>> I don't understand this bit.  We have a fault based on an instruction
> >>> fetch.  But we're only going to _maybe_ set the exec bit?  Why not call
> >>> pte_mkexec() unconditionally?
> >>
> >> Because the arch might not have subscribed to this, in which case the fall
> >> back function does nothing and returns the same entry. But in case this is
> >> enabled it also checks for VMA exec flag (VM_EXEC) before calling into
> >> pte_mkexec() something similar to existing maybe_mkwrite().
> >
> > Then why not pass vmf->flags to maybe_mkexec() so that only arches
> > subscribed to this will have the check for 'flags & FAULT_FLAG_INSTRUCTION' ?
>
> Right, it can help remove a couple of instructions from un-subscribing archs.

If the arch does not enable CONFIG_ARCH_SUPPORTS_LAZY_EXEC, wouldn't the
compiler eliminate the FAULT_FLAG_INSTRUCTION check anyway? The current
maybe_mkexec() proposal here looks slightly nicer as it matches the
maybe_mkwrite() prototype.

--
Catalin
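The two API shapes discussed above can be compared side by side. This is a hypothetical userspace sketch: the flag values, the `sk_` names, the runtime `lazy_exec` switch standing in for the Kconfig option, and passing `vm_flags` instead of a `struct vm_area_struct *` are all simplifications, not the kernel's definitions. It checks that testing FAULT_FLAG_INSTRUCTION at the call site (the RFC's shape, matching maybe_mkwrite()) and testing it inside the helper (the suggested shape) give the same result. With the identity stub, both outcomes of the caller's `if` yield the same value, which is why an optimizing compiler is free to drop the check, as suggested in the reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; they are not copied from the kernel headers. */
#define SK_VM_EXEC			0x4u
#define SK_FAULT_FLAG_INSTRUCTION	0x100u
#define SK_PTE_UXN			(UINT64_C(1) << 54)

typedef struct { uint64_t val; } sk_pte_t;

static sk_pte_t sk_pte_mkexec(sk_pte_t e)
{
	e.val &= ~SK_PTE_UXN;
	return e;
}

/* RFC shape: same prototype style as maybe_mkwrite(); the caller tests the fault flag. */
static sk_pte_t sk_maybe_mkexec(sk_pte_t e, unsigned int vm_flags)
{
	if (vm_flags & SK_VM_EXEC)
		return sk_pte_mkexec(e);
	return e;
}

/* Fallback when the arch does not opt in: an identity function. */
static sk_pte_t sk_maybe_mkexec_stub(sk_pte_t e, unsigned int vm_flags)
{
	(void)vm_flags;
	return e;
}

/* Suggested alternative: the helper receives the fault flags and tests them itself. */
static sk_pte_t sk_maybe_mkexec_flags(sk_pte_t e, unsigned int vm_flags,
				      unsigned int fault_flags)
{
	if ((fault_flags & SK_FAULT_FLAG_INSTRUCTION) && (vm_flags & SK_VM_EXEC))
		return sk_pte_mkexec(e);
	return e;
}

/* RFC-style call site; lazy_exec stands in for the Kconfig option at run time. */
static sk_pte_t sk_fault_site(sk_pte_t e, unsigned int vm_flags,
			      unsigned int fault_flags, bool lazy_exec)
{
	if (fault_flags & SK_FAULT_FLAG_INSTRUCTION)
		e = lazy_exec ? sk_maybe_mkexec(e, vm_flags)
			      : sk_maybe_mkexec_stub(e, vm_flags);
	return e;
}

/* Both shapes agree; with the stub, the caller's test changes nothing. */
static void sk_check_shapes(void)
{
	const uint64_t ptes[] = { 0, SK_PTE_UXN, SK_PTE_UXN | 0x3 };
	const unsigned int vm[] = { 0, SK_VM_EXEC };
	const unsigned int ff[] = { 0, SK_FAULT_FLAG_INSTRUCTION };

	for (size_t p = 0; p < 3; p++)
		for (size_t v = 0; v < 2; v++)
			for (size_t f = 0; f < 2; f++) {
				sk_pte_t e = { ptes[p] };

				assert(sk_fault_site(e, vm[v], ff[f], true).val ==
				       sk_maybe_mkexec_flags(e, vm[v], ff[f]).val);
				assert(sk_fault_site(e, vm[v], ff[f], false).val == e.val);
			}
}
```

Functionally the two shapes are interchangeable, so the choice comes down to prototype consistency with maybe_mkwrite() versus keeping the flag test out of call sites.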
* [RFC 2/4] arm64/mm: Identify user level instruction faults
From: Anshuman Khandual @ 2019-02-13 8:06 UTC
To: linux-mm, akpm
Cc: mhocko, kirill, kirill.shutemov, vbabka, will.deacon,
catalin.marinas, dave.hansen

Page fault flags (FAULT_FLAG_XXX) need to be passed down the fault handling
path for appropriate action and reporting. Identify user instruction fetch
faults and mark them with FAULT_FLAG_INSTRUCTION.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/mm/fault.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index efb7b2c..591670d 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -468,6 +468,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 		mm_flags |= FAULT_FLAG_WRITE;
 	}
 
+	if (is_el0_instruction_abort(esr))
+		mm_flags |= FAULT_FLAG_INSTRUCTION;
+
 	if (is_ttbr0_addr(addr) && is_el1_permission_fault(addr, esr, regs)) {
 		/* regs->orig_addr_limit may be 0 if we entered from EL0 */
 		if (regs->orig_addr_limit == KERNEL_DS)
--
2.7.4
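is_el0_instruction_abort() decides from the syndrome register whether the fault is an instruction fetch from user space. As a standalone sketch: the Exception Class field occupies ESR bits [31:26], and class 0x20 is an instruction abort from a lower exception level (EL0 here). The ESR_ELx_* values below are reproduced from arm64's asm/esr.h as I understand them so the example compiles on its own; `SKETCH_FAULT_FLAG_INSTRUCTION`, `sketch_make_esr()` and the other `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Exception Class field of ESR_ELx, bits [31:26] (values mirror asm/esr.h). */
#define ESR_ELx_EC_SHIFT	26
#define ESR_ELx_EC_MASK		(UINT32_C(0x3F) << ESR_ELx_EC_SHIFT)
#define ESR_ELx_EC(esr)		(((esr) & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT)
#define ESR_ELx_EC_IABT_LOW	0x20	/* instruction abort from a lower EL (EL0) */
#define ESR_ELx_EC_IABT_CUR	0x21	/* instruction abort from the current EL */
#define ESR_ELx_EC_DABT_LOW	0x24	/* data abort from a lower EL */

/* Hypothetical flag value for this sketch only. */
#define SKETCH_FAULT_FLAG_INSTRUCTION	0x100u

static bool sketch_is_el0_instruction_abort(uint32_t esr)
{
	return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW;
}

/* Mirrors the hunk above: tag user instruction-fetch faults. */
static unsigned int sketch_fault_flags(uint32_t esr, unsigned int mm_flags)
{
	if (sketch_is_el0_instruction_abort(esr))
		mm_flags |= SKETCH_FAULT_FLAG_INSTRUCTION;
	return mm_flags;
}

/* Build an ESR value from an exception class and ISS bits [24:0]. */
static uint32_t sketch_make_esr(uint32_t ec, uint32_t iss)
{
	assert(ec <= 0x3F);
	return (ec << ESR_ELx_EC_SHIFT) | (iss & UINT32_C(0x1FFFFFF));
}
```

Only the exception class matters: an instruction abort taken at EL1 or a data abort from EL0 leaves the flags untouched.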
* [RFC 3/4] arm64/mm: Allow non-exec to exec transition in ptep_set_access_flags()
From: Anshuman Khandual @ 2019-02-13 8:06 UTC
To: linux-mm, akpm
Cc: mhocko, kirill, kirill.shutemov, vbabka, will.deacon,
catalin.marinas, dave.hansen

ptep_set_access_flags() updates the page table entry of a mapped page that
still took a fault, typically because the access needed a permission other
than the one the page is mapped with. Until now an exec-enabled page always
got the required permission in its page table entry, so
ptep_set_access_flags() never had to move an entry from non-exec to exec.
This changes with the deferred exec permission setting in later patches, so
allow the non-exec to exec transition here and do the required I-cache
invalidation.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/mm/fault.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 591670d..1540fc1 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -227,22 +227,25 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
 	if (pte_same(pte, entry))
 		return 0;
 
-	/* only preserve the access flags and write permission */
-	pte_val(entry) &= PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY;
+	/* only preserve the access flags, write and exec permission */
+	pte_val(entry) &= PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY | PTE_UXN;
+
+	if (pte_user_exec(entry))
+		__sync_icache_dcache(pte);
 
 	/*
 	 * Setting the flags must be done atomically to avoid racing with the
-	 * hardware update of the access/dirty state. The PTE_RDONLY bit must
-	 * be set to the most permissive (lowest value) of *ptep and entry
-	 * (calculated as: a & b == ~(~a | ~b)).
+	 * hardware update of the access/dirty state. The PTE_RDONLY bit and
+	 * PTE_UXN must be set to the most permissive (lowest value) of *ptep
+	 * and entry (calculated as: a & b == ~(~a | ~b)).
 	 */
-	pte_val(entry) ^= PTE_RDONLY;
+	pte_val(entry) ^= PTE_RDONLY | PTE_UXN;
 	pteval = pte_val(pte);
 	do {
 		old_pteval = pteval;
-		pteval ^= PTE_RDONLY;
+		pteval ^= PTE_RDONLY | PTE_UXN;
 		pteval |= pte_val(entry);
-		pteval ^= PTE_RDONLY;
+		pteval ^= PTE_RDONLY | PTE_UXN;
 		pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, pteval);
 	} while (pteval != old_pteval);
 
--
2.7.4
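The XOR/OR/XOR sequence in the loop is the identity a & b == ~(~a | ~b) applied only to the bits where "set" means less permission (PTE_RDONLY and, with this patch, PTE_UXN); every other preserved bit is simply ORed in. Below is a single-threaded sketch of that merge, assuming the arm64 bit positions from asm/pgtable-hwdef.h and asm/pgtable-prot.h (copied here so it is standalone). It omits the cmpxchg_relaxed() retry loop and the __sync_icache_dcache() call, and `merge_access_flags()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* arm64 descriptor bits, copied here so the sketch is standalone. */
#define PTE_RDONLY	(UINT64_C(1) << 7)	/* AP[2]: set means read-only */
#define PTE_AF		(UINT64_C(1) << 10)	/* access flag */
#define PTE_WRITE	(UINT64_C(1) << 51)	/* DBM, used as the write bit */
#define PTE_UXN		(UINT64_C(1) << 54)	/* set means user execute-never */
#define PTE_DIRTY	(UINT64_C(1) << 55)	/* software dirty bit */

/*
 * Single-threaded model of the merge in ptep_set_access_flags() after this
 * patch. The kernel performs the same steps inside a cmpxchg_relaxed() loop.
 */
static uint64_t merge_access_flags(uint64_t old, uint64_t entry)
{
	/* Bits where a set value means less permission. */
	const uint64_t inverted = PTE_RDONLY | PTE_UXN;

	/* Only these bits of the new entry are allowed to change *ptep. */
	entry &= PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY | PTE_UXN;

	/* Invert, OR, invert back: yields old & entry for the inverted bits. */
	entry ^= inverted;
	old ^= inverted;
	old |= entry;
	old ^= inverted;
	return old;
}
```

For example, merging a read-only, non-exec entry (`PTE_UXN | PTE_RDONLY | PTE_AF`) with a new read-only, executable entry (`PTE_RDONLY | PTE_AF`) clears PTE_UXN and keeps PTE_RDONLY, while a new entry can never set PTE_UXN or PTE_RDONLY on an existing mapping that lacks them.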
* [RFC 4/4] arm64/mm: Enable ARCH_SUPPORTS_LAZY_EXEC
From: Anshuman Khandual @ 2019-02-13 8:06 UTC
To: linux-mm, akpm
Cc: mhocko, kirill, kirill.shutemov, vbabka, will.deacon,
catalin.marinas, dave.hansen

Make arm64 subscribe to the ARCH_SUPPORTS_LAZY_EXEC framework and provide
all the required helpers. As intended, this moves the I-cache maintenance
cost from the migration path to the exec fault path.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/pgtable.h | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4168d3..3cdb3e4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -59,6 +59,7 @@ config ARM64
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
+	select ARCH_SUPPORTS_LAZY_EXEC
 	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_INT128 if GCC_VERSION >= 50000 || CC_IS_CLANG
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index de70c1e..f2a5716 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -217,6 +217,18 @@ static inline pmd_t pmd_mkcont(pmd_t pmd)
 	return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
 }
 
+#ifdef CONFIG_ARCH_SUPPORTS_LAZY_EXEC
+static inline pte_t pte_mkexec(pte_t pte)
+{
+	return clear_pte_bit(pte, __pgprot(PTE_UXN));
+}
+
+static inline pte_t pte_mklazyexec(pte_t pte)
+{
+	return set_pte_bit(pte, __pgprot(PTE_UXN));
+}
+#endif
+
 static inline void set_pte(pte_t *ptep, pte_t pte)
 {
 	WRITE_ONCE(*ptep, pte);
@@ -355,6 +367,11 @@ static inline int pmd_protnone(pmd_t pmd)
 }
 #endif
 
+#ifdef CONFIG_ARCH_SUPPORTS_LAZY_EXEC
+#define pmd_mkexec(pmd)		pte_pmd(pte_mkexec(pmd_pte(pmd)))
+#define pmd_mklazyexec(pmd)	pte_pmd(pte_mklazyexec(pmd_pte(pmd)))
+#endif
+
 /*
  * THP definitions.
  */
--
2.7.4
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page
From: Catalin Marinas @ 2019-02-13 11:21 UTC
To: Anshuman Khandual
Cc: linux-mm, akpm, mhocko, kirill, kirill.shutemov, vbabka, will.deacon,
dave.hansen

On Wed, Feb 13, 2019 at 01:36:27PM +0530, Anshuman Khandual wrote:
> Setting an exec permission on a page normally triggers I-cache invalidation
> which might be expensive. I-cache invalidation is not mandatory on a given
> page if there is no immediate exec access on it. Non-fault modification of
> user page table from generic memory paths like migration can be improved if
> setting of the exec permission on the page can be deferred till actual use.
> There was a performance report [1] which highlighted the problem.
[...]
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-December/620357.html

FTR, this performance regression has been addressed by commit
132fdc379eb1 ("arm64: Do not issue IPIs for user executable ptes"). That
said, I still think this patch series is valuable for further optimising
the page migration path on arm64 (and can be extended to other
architectures that currently require I/D cache maintenance for
executable pages).

BTW, if you are going to post new versions of this series, please include
linux-arch and linux-arm-kernel.

--
Catalin
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page
From: Michal Hocko @ 2019-02-13 15:38 UTC
To: Catalin Marinas
Cc: Anshuman Khandual, linux-mm, akpm, kirill, kirill.shutemov, vbabka,
will.deacon, dave.hansen

On Wed 13-02-19 11:21:36, Catalin Marinas wrote:
> On Wed, Feb 13, 2019 at 01:36:27PM +0530, Anshuman Khandual wrote:
> > Setting an exec permission on a page normally triggers I-cache invalidation
> > which might be expensive. I-cache invalidation is not mandatory on a given
> > page if there is no immediate exec access on it. Non-fault modification of
> > user page table from generic memory paths like migration can be improved if
> > setting of the exec permission on the page can be deferred till actual use.
> > There was a performance report [1] which highlighted the problem.
> [...]
> > [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-December/620357.html
>
> FTR, this performance regression has been addressed by commit
> 132fdc379eb1 ("arm64: Do not issue IPIs for user executable ptes"). That
> said, I still think this patch series is valuable for further optimising
> the page migration path on arm64 (and can be extended to other
> architectures that currently require I/D cache maintenance for
> executable pages).

Are there any numbers to show the optimization impact?

--
Michal Hocko
SUSE Labs
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page
From: Anshuman Khandual @ 2019-02-14 6:04 UTC
To: Michal Hocko, Catalin Marinas
Cc: linux-mm, akpm, kirill, kirill.shutemov, vbabka, will.deacon,
dave.hansen

On 02/13/2019 09:08 PM, Michal Hocko wrote:
> On Wed 13-02-19 11:21:36, Catalin Marinas wrote:
>> On Wed, Feb 13, 2019 at 01:36:27PM +0530, Anshuman Khandual wrote:
>>> Setting an exec permission on a page normally triggers I-cache invalidation
>>> which might be expensive. I-cache invalidation is not mandatory on a given
>>> page if there is no immediate exec access on it. Non-fault modification of
>>> user page table from generic memory paths like migration can be improved if
>>> setting of the exec permission on the page can be deferred till actual use.
>>> There was a performance report [1] which highlighted the problem.
>> [...]
>>> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-December/620357.html
>>
>> FTR, this performance regression has been addressed by commit
>> 132fdc379eb1 ("arm64: Do not issue IPIs for user executable ptes"). That
>> said, I still think this patch series is valuable for further optimising
>> the page migration path on arm64 (and can be extended to other
>> architectures that currently require I/D cache maintenance for
>> executable pages).
>
> Are there any numbers to show the optimization impact?

This series transfers execution cost linearly with nr_pages from migration path
to subsequent exec access path for normal, THP and HugeTLB pages. The experiment
is on mainline kernel (1f947a7a011fcceb14cb912f548) along with some patches for
HugeTLB and THP migration enablement on arm64 platform.

A. [Normal Pages]

nr_pages  migration1  migration2  execfault1  execfault2
    1000           7           3          24          31
    5000          38          18         127         153
   10000          80          40         289         343
   15000         120          60         435         514
   19900         159          79         576         681

B. [THP Pages]

nr_pages  migration1  migration2  execfault1  execfault2
      10          22           3         131         146
      30          72          15         443         503
      50         121          24         739         837
     100         242          49        1485        1673
     199         473          98        2685        3327

C. [HugeTLB Pages]

nr_pages  migration1  migration2  execfault1  execfault2
      10          97          79         125         144
      30         292         235         408         463
      50         487         392         674         777
     100         995         802        1480        1671
     130        1300        1048        1925        2172

NOTE:
migration1: Execution time (ms) for migrating nr_pages without patches
migration2: Execution time (ms) for migrating nr_pages with patches
execfault1: Execution time (ms) for executing nr_pages without patches
execfault2: Execution time (ms) for executing nr_pages with patches
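One way to frame the "overall win" question is to net the two columns against each other. The sketch below only does arithmetic on the largest row of each table above (struct and function names are hypothetical): it subtracts migration2 from migration1 (time saved in migration) and execfault1 from execfault2 (time added on the later exec access). In these runs, where it appears every migrated page is later executed, the net comes out at +25 ms for normal pages, +267 ms for THP and -5 ms for HugeTLB, so any overall gain would presumably depend on how many migrated executable pages are never executed again.

```c
#include <assert.h>
#include <stdio.h>

/* Largest row of each table above; all times in ms. */
struct lazy_exec_sample {
	const char *kind;
	int nr_pages;
	int migration1, migration2;	/* without / with patches */
	int execfault1, execfault2;	/* without / with patches */
};

static const struct lazy_exec_sample samples[] = {
	{ "Normal",  19900,  159,   79,  576,  681 },
	{ "THP",       199,  473,   98, 2685, 3327 },
	{ "HugeTLB",   130, 1300, 1048, 1925, 2172 },
};

#define NR_SAMPLES (sizeof(samples) / sizeof(samples[0]))

static const struct lazy_exec_sample *sample(size_t i)
{
	assert(i < NR_SAMPLES);
	return &samples[i];
}

static int migration_saved_ms(const struct lazy_exec_sample *s)
{
	return s->migration1 - s->migration2;
}

static int execfault_added_ms(const struct lazy_exec_sample *s)
{
	return s->execfault2 - s->execfault1;
}

/* Positive: the later exec access costs more than migration saved. */
static int net_cost_ms(const struct lazy_exec_sample *s)
{
	return execfault_added_ms(s) - migration_saved_ms(s);
}

static void print_summary(void)
{
	for (size_t i = 0; i < NR_SAMPLES; i++) {
		const struct lazy_exec_sample *s = sample(i);

		printf("%-8s saved %4d ms in migration, added %4d ms on exec, net %+d ms\n",
		       s->kind, migration_saved_ms(s), execfault_added_ms(s),
		       net_cost_ms(s));
	}
}
```

Calling `print_summary()` prints one line per page type; the same helpers could be pointed at a workload where only a fraction of the migrated pages is executed.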
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page 2019-02-14 6:04 ` Anshuman Khandual @ 2019-02-14 8:38 ` Michal Hocko 2019-02-14 10:19 ` Catalin Marinas 2019-02-14 15:38 ` Dave Hansen 1 sibling, 1 reply; 28+ messages in thread From: Michal Hocko @ 2019-02-14 8:38 UTC (permalink / raw) To: Anshuman Khandual Cc: Catalin Marinas, linux-mm, akpm, kirill, kirill.shutemov, vbabka, will.deacon, dave.hansen On Thu 14-02-19 11:34:09, Anshuman Khandual wrote: > > > On 02/13/2019 09:08 PM, Michal Hocko wrote: > > On Wed 13-02-19 11:21:36, Catalin Marinas wrote: > >> On Wed, Feb 13, 2019 at 01:36:27PM +0530, Anshuman Khandual wrote: > >>> Setting an exec permission on a page normally triggers I-cache invalidation > >>> which might be expensive. I-cache invalidation is not mandatory on a given > >>> page if there is no immediate exec access on it. Non-fault modification of > >>> user page table from generic memory paths like migration can be improved if > >>> setting of the exec permission on the page can be deferred till actual use. > >>> There was a performance report [1] which highlighted the problem. > >> [...] > >>> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-December/620357.html > >> > >> FTR, this performance regression has been addressed by commit > >> 132fdc379eb1 ("arm64: Do not issue IPIs for user executable ptes"). That > >> said, I still think this patch series is valuable for further optimising > >> the page migration path on arm64 (and can be extended to other > >> architectures that currently require I/D cache maintenance for > >> executable pages). > > > > Are there any numbers to show the optimization impact? > > This series transfers execution cost linearly with nr_pages from migration path > to subsequent exec access path for normal, THP and HugeTLB pages. The experiment > is on mainline kernel (1f947a7a011fcceb14cb912f548) along with some patches for > HugeTLB and THP migration enablement on arm64 platform. 
Please make sure that these numbers are in the changelog. I am also missing an explanation of why this is an overall win. Why should we pay on the later access rather than on migration, which is arguably a slower path? What is the use case that benefits from the cost shift? -- Michal Hocko SUSE Labs
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page 2019-02-14 8:38 ` Michal Hocko @ 2019-02-14 10:19 ` Catalin Marinas 2019-02-14 12:28 ` Michal Hocko 0 siblings, 1 reply; 28+ messages in thread From: Catalin Marinas @ 2019-02-14 10:19 UTC (permalink / raw) To: Michal Hocko Cc: Anshuman Khandual, linux-mm, akpm, kirill, kirill.shutemov, vbabka, will.deacon, dave.hansen On Thu, Feb 14, 2019 at 09:38:44AM +0100, Michal Hocko wrote: > On Thu 14-02-19 11:34:09, Anshuman Khandual wrote: > > On 02/13/2019 09:08 PM, Michal Hocko wrote: > > > Are there any numbers to show the optimization impact? > > > > This series transfers execution cost linearly with nr_pages from migration path > > to subsequent exec access path for normal, THP and HugeTLB pages. The experiment > > is on mainline kernel (1f947a7a011fcceb14cb912f548) along with some patches for > > HugeTLB and THP migration enablement on arm64 platform. > > Please make sure that these numbers are in the changelog. I am also > missing an explanation why this is an overal win. Why should we pay > on the later access rather than the migration which is arguably a slower > path. What is the usecase that benefits from the cost shift? Originally the investigation started because of a regression we had sending IPIs on each set_pte_at(PROT_EXEC). This has been fixed separately, so the original value of this patchset has been diminished. Trying to frame the problem, let's analyse the overall cost of migration + execute. 
Removing other invariants like the cost of the initial mapping of the
pages or the mapping of new pages after migration, we have:

M - number of mapped executable pages just before migration
N - number of previously mapped pages that will be executed after
    migration (N <= M)
D - cost of migrating page data
I - cost of I-cache maintenance for a page
F - cost of an instruction fault (handle_mm_fault() + set_pte_at()
    without the actual I-cache maintenance)

Tc - total migration cost with the current kernel (including executing)
Tp - total migration cost with the patched kernel (including executing)

Tc = M * (D + I)
Tp = M * D + N * (F + I)

To be useful, we want this patchset to lead to:

Tp < Tc

Simplifying:

M * D + N * (F + I) < M * (D + I)
...
F < I * (M - N) / N

So the question is, in a *real-world* scenario, what proportion of the
mapped executable pages would still be executed from after migration.
I'd leave this as a task for Anshuman to investigate and come up with
some numbers (and it's fine if it's just in the noise, we won't need
this patchset).

Also note that there are ARM CPU implementations that don't need I-cache
maintenance (the I side can snoop the D side), so for those this
patchset introduces an additional cost. But we can make the decision in
the arch code via pte_mklazyexec().

We implemented something similar in arm64 KVM (d0e22b4ac3ba "KVM:
arm/arm64: Limit icache invalidation to prefetch aborts") but the
use-case was different: previously KVM considered all pages executable
though the vast majority were only data pages in guests.

--
Catalin
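The cost model above can be written down directly, including the algebra hidden behind the "..." step. This is a minimal sketch: struct lazy_exec_model and the function names are hypothetical, and the costs are plain numbers in whatever unit one measures D, I and F in.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Parameters of the cost model: m mapped exec pages before migration,
 * n of them executed after migration (n <= m), d = data copy cost,
 * i = I-cache maintenance cost, f = exec fault cost without maintenance.
 */
struct lazy_exec_model {
	double m, n, d, i, f;
};

/* Tc = M * (D + I): eager maintenance for every mapped exec page. */
static double cost_current(const struct lazy_exec_model *p)
{
	return p->m * (p->d + p->i);
}

/* Tp = M * D + N * (F + I): maintenance deferred to the N exec faults. */
static double cost_patched(const struct lazy_exec_model *p)
{
	assert(p->n <= p->m);
	return p->m * p->d + p->n * (p->f + p->i);
}

/*
 * Tp < Tc
 *   <=> M*D + N*F + N*I < M*D + M*I
 *   <=> N*F < (M - N) * I
 *   <=> F < I * (M - N) / N        (for N > 0)
 * Kept in multiplied-out form so that N == 0 needs no special case.
 */
static bool lazy_exec_wins(const struct lazy_exec_model *p)
{
	return p->n * p->f < (p->m - p->n) * p->i;
}
```

With N = M the condition can never hold (F >= 0), while with N = 0 it holds whenever I > 0. So the benefit depends entirely on how many of the migrated exec pages are never executed again.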
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page 2019-02-14 10:19 ` Catalin Marinas @ 2019-02-14 12:28 ` Michal Hocko 2019-02-15 8:45 ` Anshuman Khandual 0 siblings, 1 reply; 28+ messages in thread From: Michal Hocko @ 2019-02-14 12:28 UTC (permalink / raw) To: Catalin Marinas Cc: Anshuman Khandual, linux-mm, akpm, kirill, kirill.shutemov, vbabka, will.deacon, dave.hansen On Thu 14-02-19 10:19:37, Catalin Marinas wrote: > On Thu, Feb 14, 2019 at 09:38:44AM +0100, Michal Hocko wrote: > > On Thu 14-02-19 11:34:09, Anshuman Khandual wrote: > > > On 02/13/2019 09:08 PM, Michal Hocko wrote: > > > > Are there any numbers to show the optimization impact? > > > > > > This series transfers execution cost linearly with nr_pages from migration path > > > to subsequent exec access path for normal, THP and HugeTLB pages. The experiment > > > is on mainline kernel (1f947a7a011fcceb14cb912f548) along with some patches for > > > HugeTLB and THP migration enablement on arm64 platform. > > > > Please make sure that these numbers are in the changelog. I am also > > missing an explanation why this is an overal win. Why should we pay > > on the later access rather than the migration which is arguably a slower > > path. What is the usecase that benefits from the cost shift? > > Originally the investigation started because of a regression we had > sending IPIs on each set_pte_at(PROT_EXEC). This has been fixed > separately, so the original value of this patchset has been diminished. > > Trying to frame the problem, let's analyse the overall cost of migration > + execute. 
Removing other invariants like cost of the initial mapping of > the pages or the mapping of new pages after migration, we have: > > M - number of mapped executable pages just before migration > N - number of previously mapped pages that will be executed after > migration (N <= M) > D - cost of migrating page data > I - cost of I-cache maintenance for a page > F - cost of an instruction fault (handle_mm_fault() + set_pte_at() > without the actual I-cache maintenance) > > Tc - total migration cost current kernel (including executing) > Tp - total migration cost patched kernel (including executing) > > Tc = M * (D + I) > Tp = M * D + N * (F + I) > > To be useful, we want this patchset to lead to: > > Tp < Tc > > Simplifying: > > M * D + N * (F + I) < M * (D + I) > ... > F < I * (M - N) / N > > So the question is, in a *real-world* scenario, what proportion of the > mapped executable pages would still be executed from after migration. > I'd leave this as a task for Anshuman to investigate and come up with > some numbers (and it's fine if it's just in the noise, we won't need > this patchset). Yeah, betting on accessing only a smaller subset of the migrated memory is something I figured out. But I am really missing a use case, or a larger set of them, that would actually benefit from it. We have different triggers for a migration, e.g. NUMA balancing. I would expect that migrated pages are likely to be accessed after migration because the primary reason to migrate them is that they are accessed from a remote node. Then we have compaction, which is a completely different story. It is hard to assume any further access for migrated pages here. Then we have an explicit move_pages syscall and I would expect this to be somewhere in the middle. One would expect that the caller knows why the memory is migrated and that it will be used but, again, we cannot really assume anything. This would suggest that this depends on the migration reason quite a lot.
So I would really like to see a more comprehensive analysis of different workloads to see whether this is really worth it. Thanks! -- Michal Hocko SUSE Labs
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page
From: Anshuman Khandual @ 2019-02-15 8:45 UTC
To: Michal Hocko, Catalin Marinas
Cc: linux-mm, akpm, kirill, kirill.shutemov, vbabka, will.deacon, dave.hansen

On 02/14/2019 05:58 PM, Michal Hocko wrote:
> On Thu 14-02-19 10:19:37, Catalin Marinas wrote:
>> On Thu, Feb 14, 2019 at 09:38:44AM +0100, Michal Hocko wrote:
>>> On Thu 14-02-19 11:34:09, Anshuman Khandual wrote:
>>>> On 02/13/2019 09:08 PM, Michal Hocko wrote:
>>>>> Are there any numbers to show the optimization impact?
>>>>
>>>> This series transfers execution cost linearly with nr_pages from migration path
>>>> to subsequent exec access path for normal, THP and HugeTLB pages. The experiment
>>>> is on mainline kernel (1f947a7a011fcceb14cb912f548) along with some patches for
>>>> HugeTLB and THP migration enablement on arm64 platform.
>>>
>>> Please make sure that these numbers are in the changelog. I am also
>>> missing an explanation why this is an overal win. Why should we pay
>>> on the later access rather than the migration which is arguably a slower
>>> path. What is the usecase that benefits from the cost shift?
>>
>> Originally the investigation started because of a regression we had
>> sending IPIs on each set_pte_at(PROT_EXEC). This has been fixed
>> separately, so the original value of this patchset has been diminished.
>>
>> Trying to frame the problem, let's analyse the overall cost of migration
>> + execute.
Removing other invariants like cost of the initial mapping of >> the pages or the mapping of new pages after migration, we have: >> >> M - number of mapped executable pages just before migration >> N - number of previously mapped pages that will be executed after >> migration (N <= M) >> D - cost of migrating page data >> I - cost of I-cache maintenance for a page >> F - cost of an instruction fault (handle_mm_fault() + set_pte_at() >> without the actual I-cache maintenance) >> >> Tc - total migration cost current kernel (including executing) >> Tp - total migration cost patched kernel (including executing) >> >> Tc = M * (D + I) >> Tp = M * D + N * (F + I) >> >> To be useful, we want this patchset to lead to: >> >> Tp < Tc >> >> Simplifying: >> >> M * D + N * (F + I) < M * (D + I) >> ... >> F < I * (M - N) / N >> >> So the question is, in a *real-world* scenario, what proportion of the >> mapped executable pages would still be executed from after migration. >> I'd leave this as a task for Anshuman to investigate and come up with >> some numbers (and it's fine if it's just in the noise, we won't need >> this patchset). > > Yeah, betting on accessing only a smaller subset of the migrated memory > is something I figured out. But I am really missing a usecase or a > larger set of them to actually benefit from it. We have different > triggers for a migration. E.g. numa balancing. I would expect that > migrated pages are likely to be accessed after migration because > the primary reason to migrate them is that they are accessed from a > remote node. Then we a compaction which is a completely different story. That access might not have been an exec fault it could have been bunch of write faults which triggered NUMA migration. So NUMA triggered migration does not necessarily mean continuing exec faults before and after migration. 
Compaction might move mapped pages with exec permission that have no recent history of exec access before compaction, and that might never see an exec access afterwards either. > It is hard to assume any further access for migrated pages here. Then we > have an explicit move_pages syscall and I would expect this to be > somewhere in the middle. One would expect that the caller knows why the > memory is migrated and it will be used but again, we cannot really > assume anything. What if the caller knows that the memory won't be used again, or not in the near future, and is therefore migrating it to a different node with cheaper and slower memory? The kernel should not assume either way, but it can decide to be conservative about spending time preparing for future exec faults. But being conservative during migration risks additional exec faults, which would have been avoided had the exec permission stayed on and the I-cache been invalidated at migration time. Deferring the I-cache invalidation requires removing the exec permission completely (unless there is some magic I am not aware of), i.e. unmapping the page for exec and risking an exec fault next time around. This problem is particularly amplified for mixed-permission (WRITE | EXEC) user space mappings, where NUMA migration, compaction etc. are probably triggered by write faults and the additional exec permission never really gets used. > > This would suggest that this depends on the migration reason quite a > lot. So I would really like to see a more comprehensive analysis of > different workloads to see whether this is really worth it. Sure. Could you please give some more details on how to go about this and what specifically you are looking for? User-initiated migration through system calls seems a bit tricky, as an application can be written primarily to benefit from this series. If real-world applications can give better insight, I wonder which ones.
Or do we need to understand more about compaction and NUMA-triggered migrations, which are kernel driven? Statistics from compaction/NUMA migration can reveal what fraction of exec-enabled mappings gets exec-faulted again after kernel-driven migrations (compaction/NUMA), which are more or less random and do not depend much on application behavior. - Anshuman
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page 2019-02-15 8:45 ` Anshuman Khandual @ 2019-02-15 9:27 ` Michal Hocko 2019-02-18 3:07 ` Anshuman Khandual 0 siblings, 1 reply; 28+ messages in thread From: Michal Hocko @ 2019-02-15 9:27 UTC (permalink / raw) To: Anshuman Khandual Cc: Catalin Marinas, linux-mm, akpm, kirill, kirill.shutemov, vbabka, will.deacon, dave.hansen On Fri 15-02-19 14:15:58, Anshuman Khandual wrote: > On 02/14/2019 05:58 PM, Michal Hocko wrote: > > It is hard to assume any further access for migrated pages here. Then we > > have an explicit move_pages syscall and I would expect this to be > > somewhere in the middle. One would expect that the caller knows why the > > memory is migrated and it will be used but again, we cannot really > > assume anything. > > What if the caller knows that it wont be used ever again or in near future > and hence trying to migrate to a different node which has less expensive and > slower memory. Kernel should not assume either way on it but can decide to > be conservative in spending time in preparing for future exec faults. > > But being conservative during migration risks additional exec faults which > would have been avoided if exec permission should have stayed on followed > by an I-cache invalidation. Deferral of the I-cache invalidation requires > removing the exec permission completely (unless there is some magic which > I am not aware about) i.e unmapping page for exec permission and risking > an exec fault next time around. > > This problem gets particularly amplified for mixed permission (WRITE | EXEC) > user space mappings where things like NUMA migration, compaction etc probably > gets triggered by write faults and additional exec permission there never > really gets used. Please quantify that and provide us with some _data_ > > This would suggest that this depends on the migration reason quite a > > lot. 
So I would really like to see a more comprehensive analysis of > > different workloads to see whether this is really worth it. > > Sure. Could you please give some more details on how to go about this and > what specifically you are looking for ? You are proposing an optimization without actually providing any justification. The overhead is not removed, it is just shifted from one path to another. So you should have some pretty convincing arguments to back that shift as a general win. You can go and test on a wider range of workloads and isolate the worst/best case behavior. I fully realize that this is tedious. Another option would be to define conditions under which the optimization is going to be a huge win and have some convincing arguments that many/most workloads fall into that category while pathological ones do not suffer much. This is no different from any other optimizations/heuristics we have. Btw. have you considered making this optimization conditional on the migration reason or vma flags? -- Michal Hocko SUSE Labs
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page 2019-02-15 9:27 ` Michal Hocko @ 2019-02-18 3:07 ` Anshuman Khandual 0 siblings, 0 replies; 28+ messages in thread From: Anshuman Khandual @ 2019-02-18 3:07 UTC (permalink / raw) To: Michal Hocko Cc: Catalin Marinas, linux-mm, akpm, kirill, kirill.shutemov, vbabka, will.deacon, dave.hansen On 02/15/2019 02:57 PM, Michal Hocko wrote: > On Fri 15-02-19 14:15:58, Anshuman Khandual wrote: >> On 02/14/2019 05:58 PM, Michal Hocko wrote: >>> It is hard to assume any further access for migrated pages here. Then we >>> have an explicit move_pages syscall and I would expect this to be >>> somewhere in the middle. One would expect that the caller knows why the >>> memory is migrated and it will be used but again, we cannot really >>> assume anything. >> >> What if the caller knows that it wont be used ever again or in near future >> and hence trying to migrate to a different node which has less expensive and >> slower memory. Kernel should not assume either way on it but can decide to >> be conservative in spending time in preparing for future exec faults. >> >> But being conservative during migration risks additional exec faults which >> would have been avoided if exec permission should have stayed on followed >> by an I-cache invalidation. Deferral of the I-cache invalidation requires >> removing the exec permission completely (unless there is some magic which >> I am not aware about) i.e unmapping page for exec permission and risking >> an exec fault next time around. >> >> This problem gets particularly amplified for mixed permission (WRITE | EXEC) >> user space mappings where things like NUMA migration, compaction etc probably >> gets triggered by write faults and additional exec permission there never >> really gets used. > > Please quantify that and provide us with some _data_> >>> This would suggest that this depends on the migration reason quite a >>> lot. 
So I would really like to see a more comprehensive analysis of >>> different workloads to see whether this is really worth it. >> >> Sure. Could you please give some more details on how to go about this and >> what specifically you are looking for ? > > You are proposing an optimization without actually providing any > justification. The overhead is not removed it is just shifted from one > path to another. So you should have some pretty convincing arguments > to back that shift as a general win. You can go an test on wider range > of workloads and isolate the worst/best case behavior. I fully realize > that this is tedious. Another option would be to define conditions when > the optimization is going to be a huge win and have some convincing Yeah, a conditional approach might narrow down the field and give a better chance of a general win. System call (move_pages/mbind) based migrations from user space are better placed for a win, because the user might just want to put those pages aside for rare exec accesses in the future, and the worst-case cost of the deferral is not too high either. A hint to the kernel about probable rare exec access in the future would have been better, but I am afraid that would require a new user interface. But I think the lazy exec decision can be taken right away for MR_SYSCALL triggered migrations of VMAs with mixed permissions ([VM_READ]|VM_WRITE|VM_EXEC), knowing that in the worst case the cost just moves from the migration path to the exec fault path. MR_NUMA_MISPLACED triggered migrations require explicit tracking of fault types (exec/write/[read]) per VMA, along with its applicable permissions, to determine whether exec permission deferral would help. These stats could also be used for all other kernel or user initiated migrations like MR_COMPACTION, MR_MEMORY_FAILURE, MR_MEMORY_HOTPLUG and MR_CONTIG_RANGE. Would it be worth adding explicit fault type tracking per VMA? Could it be used for some other purpose as well?
> arguments that many/most workloads are falling into that category while > pathological ones are not suffering much. > > This is no different from any other optimizations/heuristics we have. Sure. Will think about this further. > > Btw. have you considered to have this optimization conditional based on > the migration reason or vma flags? I started considering it after our discussion here. It makes sense to look into the migration reason and the VMA flags right away but, as I mentioned earlier, per-VMA fault type stats could really help here as well.
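A policy along the lines proposed here (decide right away for MR_SYSCALL on W|X VMAs, otherwise consult per-VMA fault-type history) could be modelled roughly as follows. Everything in this sketch is hypothetical: the enum and VM_* macros are local stand-ins for the kernel's migrate reasons and vm_flags, struct vma_fault_stats is the per-VMA tracking that does not exist yet, and the 8:1 write-to-exec threshold is an arbitrary placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the migrate reasons named in the mail. */
enum mig_reason {
	MR_COMPACTION,
	MR_MEMORY_FAILURE,
	MR_MEMORY_HOTPLUG,
	MR_SYSCALL,
	MR_NUMA_MISPLACED,
	MR_CONTIG_RANGE,
};

/* Local stand-ins for the VMA permission flags. */
#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define VM_EXEC  0x4UL

/* Hypothetical per-VMA fault-type counters. */
struct vma_fault_stats {
	unsigned long exec_faults;
	unsigned long write_faults;
};

/* Hypothetical threshold: defer only if write faults outnumber exec faults 8:1. */
#define LAZY_EXEC_WRITE_RATIO 8UL

/* Should migration drop exec and defer the I-cache maintenance? */
static bool defer_exec_on_migration(enum mig_reason reason,
				    unsigned long vm_flags,
				    const struct vma_fault_stats *st)
{
	assert(reason <= MR_CONTIG_RANGE);

	/* Nothing to defer for a non-executable VMA. */
	if (!(vm_flags & VM_EXEC))
		return false;

	/* User-requested migration of a W|X mapping: decide right away. */
	if (reason == MR_SYSCALL)
		return (vm_flags & VM_WRITE) != 0;

	/* Kernel-driven migrations need fault history; without it, stay eager. */
	if (st == NULL)
		return false;

	return st->exec_faults * LAZY_EXEC_WRITE_RATIO < st->write_faults;
}
```

Falling back to eager maintenance when no history is available keeps today's behaviour as the default, so only VMAs with evidence of rare exec use would pay the extra instruction fault.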
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page 2019-02-14 6:04 ` Anshuman Khandual 2019-02-14 8:38 ` Michal Hocko @ 2019-02-14 15:38 ` Dave Hansen 2019-02-18 3:19 ` Anshuman Khandual 1 sibling, 1 reply; 28+ messages in thread From: Dave Hansen @ 2019-02-14 15:38 UTC (permalink / raw) To: Anshuman Khandual, Michal Hocko, Catalin Marinas Cc: linux-mm, akpm, kirill, kirill.shutemov, vbabka, will.deacon On 2/13/19 10:04 PM, Anshuman Khandual wrote: >> Are there any numbers to show the optimization impact? > This series transfers execution cost linearly with nr_pages from migration path > to subsequent exec access path for normal, THP and HugeTLB pages. The experiment > is on mainline kernel (1f947a7a011fcceb14cb912f548) along with some patches for > HugeTLB and THP migration enablement on arm64 platform. > > A. [Normal Pages] > > nr_pages migration1 migration2 execfault1 execfault2 > > 1000 7.000000 3.000000 24.000000 31.000000 > 5000 38.000000 18.000000 127.000000 153.000000 > 10000 80.000000 40.000000 289.000000 343.000000 > 15000 120.000000 60.000000 435.000000 514.000000 > 19900 159.000000 79.000000 576.000000 681.000000 Do these numbers comprehend the increased fault costs or just the decreased migration costs? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page 2019-02-14 15:38 ` Dave Hansen @ 2019-02-18 3:19 ` Anshuman Khandual 0 siblings, 0 replies; 28+ messages in thread From: Anshuman Khandual @ 2019-02-18 3:19 UTC (permalink / raw) To: Dave Hansen, Michal Hocko, Catalin Marinas Cc: linux-mm, akpm, kirill, kirill.shutemov, vbabka, will.deacon On 02/14/2019 09:08 PM, Dave Hansen wrote: > On 2/13/19 10:04 PM, Anshuman Khandual wrote: >>> Are there any numbers to show the optimization impact? >> This series transfers execution cost linearly with nr_pages from migration path >> to subsequent exec access path for normal, THP and HugeTLB pages. The experiment >> is on mainline kernel (1f947a7a011fcceb14cb912f548) along with some patches for >> HugeTLB and THP migration enablement on arm64 platform. >> >> A. [Normal Pages] >> >> nr_pages migration1 migration2 execfault1 execfault2 >> >> 1000 7.000000 3.000000 24.000000 31.000000 >> 5000 38.000000 18.000000 127.000000 153.000000 >> 10000 80.000000 40.000000 289.000000 343.000000 >> 15000 120.000000 60.000000 435.000000 514.000000 >> 19900 159.000000 79.000000 576.000000 681.000000 > > Do these numbers comprehend the increased fault costs or just the > decreased migration costs? Both. It transfers cost from migration path to exec fault path. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page 2019-02-13 8:06 [RFC 0/4] mm: Introduce lazy exec permission setting on a page Anshuman Khandual ` (4 preceding siblings ...) 2019-02-13 11:21 ` [RFC 0/4] mm: Introduce lazy exec permission setting on a page Catalin Marinas @ 2019-02-13 15:44 ` Dave Hansen 2019-02-14 4:12 ` Anshuman Khandual 5 siblings, 1 reply; 28+ messages in thread From: Dave Hansen @ 2019-02-13 15:44 UTC (permalink / raw) To: Anshuman Khandual, linux-mm, akpm Cc: mhocko, kirill, kirill.shutemov, vbabka, will.deacon, catalin.marinas On 2/13/19 12:06 AM, Anshuman Khandual wrote: > Setting an exec permission on a page normally triggers I-cache invalidation > which might be expensive. I-cache invalidation is not mandatory on a given > page if there is no immediate exec access on it. Non-fault modification of > user page table from generic memory paths like migration can be improved if > setting of the exec permission on the page can be deferred till actual use. > There was a performance report [1] which highlighted the problem. How does this happen? If the page was not executed, then it'll (presumably) be non-present which won't require icache invalidation. So, this would only be for pages that have been executed (and won't again before the next migration), *or* for pages that were mapped executable but never executed. Any idea which one it is? If it's pages that got mapped in but were never executed, how did that happen? Was it fault-around? If so, maybe it would just be simpler to not do fault-around for executable pages on these platforms. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page 2019-02-13 15:44 ` Dave Hansen @ 2019-02-14 4:12 ` Anshuman Khandual 2019-02-14 16:55 ` Dave Hansen 0 siblings, 1 reply; 28+ messages in thread From: Anshuman Khandual @ 2019-02-14 4:12 UTC (permalink / raw) To: Dave Hansen, linux-mm, akpm Cc: mhocko, kirill, kirill.shutemov, vbabka, will.deacon, catalin.marinas On 02/13/2019 09:14 PM, Dave Hansen wrote: > On 2/13/19 12:06 AM, Anshuman Khandual wrote: >> Setting an exec permission on a page normally triggers I-cache invalidation >> which might be expensive. I-cache invalidation is not mandatory on a given >> page if there is no immediate exec access on it. Non-fault modification of >> user page table from generic memory paths like migration can be improved if >> setting of the exec permission on the page can be deferred till actual use. >> There was a performance report [1] which highlighted the problem. > > How does this happen? If the page was not executed, then it'll > (presumably) be non-present which won't require icache invalidation. > So, this would only be for pages that have been executed (and won't > again before the next migration), *or* for pages that were mapped > executable but never executed. I-cache invalidation happens while migrating a 'mapped and executable' page irrespective whether that page was really executed for being mapped there in the first place. > > Any idea which one it is? > I am not sure about this particular reported case. But was able to reproduce the problem through a test case where a buffer was mapped with R|W|X, get it faulted/mapped through write, migrate and then execute from it. > If it's pages that got mapped in but were never executed, how did that > happen? Was it fault-around? If so, maybe it would just be simpler to > not do fault-around for executable pages on these platforms. Page can get mapped through a different access (write) without being executed. 
Even if it got mapped through an exec access (and the resulting invalidation), the invalidation does not need to be repeated after migration until the page is actually executed again. This series just tries to hold off the invalidation after migration until the next exec access.
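The deferral described here (one I-cache invalidation per page, paid on the first exec access after migration instead of during migration) can be modelled in a few lines of user-space C. This is a toy model, not kernel code: struct sim_pte, sim_migrate(), sim_exec_access() and sim_run() are made-up names, and the series' helpers are only referenced in comments for orientation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one user PTE; only the exec permission is tracked. */
struct sim_pte {
	bool exec;
};

struct sim_stats {
	unsigned int icache_flushes; /* I-cache maintenance operations */
	unsigned int exec_faults;    /* instruction faults taken */
};

/*
 * Install the PTE for the migrated page. Eager (current behaviour):
 * an exec PTE gets I-cache maintenance now. Lazy (the idea behind
 * [pte|pmd]_mklazyexec()): drop exec and postpone the maintenance.
 */
static void sim_migrate(struct sim_pte *pte, bool lazy, struct sim_stats *st)
{
	if (!pte->exec)
		return;
	if (lazy)
		pte->exec = false;
	else
		st->icache_flushes++;
}

/*
 * Instruction fetch from the page. Without exec permission this is an
 * instruction fault; if the VMA has VM_EXEC, exec is restored (the role
 * of maybe_mkexec() in the series) and the deferred maintenance is done.
 * Returns false for a genuine permission fault.
 */
static bool sim_exec_access(struct sim_pte *pte, bool vma_exec,
			    struct sim_stats *st)
{
	if (pte->exec)
		return true;
	st->exec_faults++;
	if (!vma_exec)
		return false;
	pte->exec = true;
	st->icache_flushes++;
	return true;
}

/* Migrate one R|W|X page (e.g. first faulted in by a write), then execute from it `runs` times. */
static struct sim_stats sim_run(bool lazy, unsigned int runs)
{
	struct sim_pte pte = { .exec = true };
	struct sim_stats st = { 0, 0 };

	sim_migrate(&pte, lazy, &st);
	for (unsigned int k = 0; k < runs; k++) {
		bool ok = sim_exec_access(&pte, true, &st);

		assert(ok);
		(void)ok;
	}
	return st;
}
```

In the model, a page that is never executed after migration saves its one invalidation, while a page that is executed again pays the same invalidation plus one extra instruction fault. That is the trade-off captured by Catalin's F < I * (M - N) / N condition.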
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page 2019-02-14 4:12 ` Anshuman Khandual @ 2019-02-14 16:55 ` Dave Hansen 2019-02-18 8:31 ` Anshuman Khandual 0 siblings, 1 reply; 28+ messages in thread From: Dave Hansen @ 2019-02-14 16:55 UTC (permalink / raw) To: Anshuman Khandual, linux-mm, akpm Cc: mhocko, kirill, kirill.shutemov, vbabka, will.deacon, catalin.marinas On 2/13/19 8:12 PM, Anshuman Khandual wrote: > On 02/13/2019 09:14 PM, Dave Hansen wrote: >> On 2/13/19 12:06 AM, Anshuman Khandual wrote: >>> Setting an exec permission on a page normally triggers I-cache invalidation >>> which might be expensive. I-cache invalidation is not mandatory on a given >>> page if there is no immediate exec access on it. Non-fault modification of >>> user page table from generic memory paths like migration can be improved if >>> setting of the exec permission on the page can be deferred till actual use. >>> There was a performance report [1] which highlighted the problem. >> >> How does this happen? If the page was not executed, then it'll >> (presumably) be non-present which won't require icache invalidation. >> So, this would only be for pages that have been executed (and won't >> again before the next migration), *or* for pages that were mapped >> executable but never executed. > I-cache invalidation happens while migrating a 'mapped and executable' page > irrespective whether that page was really executed for being mapped there > in the first place. Ahh, got it. I also assume that the Accessed bit on these platforms is also managed similar to how we do it on x86 such that it can't be used to drive invalidation decisions? >> Any idea which one it is? > > I am not sure about this particular reported case. But was able to reproduce > the problem through a test case where a buffer was mapped with R|W|X, get it > faulted/mapped through write, migrate and then execute from it. Could you make sure, please? 
Write and Execute at the same time are generally a "bad idea". Given the hardware, I'm not surprised that this problem pops up, but it would be great to find out if this is a real application, or a "doctor it hurts when I do this." >> If it's pages that got mapped in but were never executed, how did that >> happen? Was it fault-around? If so, maybe it would just be simpler to >> not do fault-around for executable pages on these platforms. > Page can get mapped through a different access (write) without being executed. > Even if it got mapped through execution and subsequent invalidation, the > invalidation does not have to be repeated again after migration without first > getting an exec access subsequently. This series just tries to hold off the > invalidation after migration till subsequent exec access. This set generally seems to be assuming an environment with "lots of migration, and not much execution". That seems like a kinda odd situation to me.
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page
2019-02-14 16:55 ` Dave Hansen
@ 2019-02-18 8:31 ` Anshuman Khandual
2019-02-18 9:04 ` Catalin Marinas
2019-02-18 18:20 ` Dave Hansen
0 siblings, 2 replies; 28+ messages in thread
From: Anshuman Khandual @ 2019-02-18 8:31 UTC
To: Dave Hansen, linux-mm, akpm
Cc: mhocko, kirill, kirill.shutemov, vbabka, will.deacon,
catalin.marinas
On 02/14/2019 10:25 PM, Dave Hansen wrote:
> On 2/13/19 8:12 PM, Anshuman Khandual wrote:
>> On 02/13/2019 09:14 PM, Dave Hansen wrote:
>>> On 2/13/19 12:06 AM, Anshuman Khandual wrote:
>>>> Setting an exec permission on a page normally triggers I-cache invalidation
>>>> which might be expensive. I-cache invalidation is not mandatory on a given
>>>> page if there is no immediate exec access on it. Non-fault modification of
>>>> user page table from generic memory paths like migration can be improved if
>>>> setting of the exec permission on the page can be deferred till actual use.
>>>> There was a performance report [1] which highlighted the problem.
>>>
>>> How does this happen? If the page was not executed, then it'll
>>> (presumably) be non-present which won't require icache invalidation.
>>> So, this would only be for pages that have been executed (and won't
>>> again before the next migration), *or* for pages that were mapped
>>> executable but never executed.
>> I-cache invalidation happens while migrating a 'mapped and executable' page
>> irrespective of whether that page was really executed for being mapped there
>> in the first place.
>
> Ahh, got it. I also assume that the Accessed bit on these platforms is
> also managed similar to how we do it on x86 such that it can't be used
> to drive invalidation decisions?

Drive I-cache invalidation? Could you please elaborate on this? Isn't the
access bit mechanism meant to identify dirty pages after write faults when
it is SW-updated, or after write accesses when it is HW-updated? In the
SW-updated method, a given PTE goes through pte_young() during the page
fault. Then how do we differentiate an exec fault/access from a write
fault/access and decide to invalidate the I-cache? Just being curious.

>
>>> Any idea which one it is?
>>
>> I am not sure about this particular reported case. But I was able to reproduce
>> the problem through a test case where a buffer was mapped with R|W|X,
>> faulted/mapped through a write, migrated and then executed.
>
> Could you make sure, please?

The test in the report [1] does not create any explicit PROT_EXEC mappings and
just attempts to migrate all pages of the process (which has 10 child
processes), including the exec pages. So the only exec mappings would be the
primary text segment and the mapped shared glibc segment. It looks like the
shared libraries have some mapped pages.

$cat /proc/[PID]/numa_maps | grep libc

ffffaa4c9000 default file=/lib/aarch64-linux-gnu/libc-2.28.so mapped=150 mapmax=57 N0=150 kernelpagesize_kB=4
ffffaa621000 default file=/lib/aarch64-linux-gnu/libc-2.28.so
ffffaa630000 default file=/lib/aarch64-linux-gnu/libc-2.28.so anon=4 dirty=4 mapmax=11 N0=4 kernelpagesize_kB=4
ffffaa634000 default file=/lib/aarch64-linux-gnu/libc-2.28.so anon=2 dirty=2 mapmax=11 N0=2 kernelpagesize_kB=4

I will keep looking into this.

>
> Write and Execute at the same time are generally a "bad idea". Given

But won't this be the case for all run-time generated code, which gets written
to a buffer before being executed from there?

> the hardware, I'm not surprised that this problem pops up, but it would
> be great to find out if this is a real application, or a "doctor it
> hurts when I do this."

Isn't that a problem, though? :)

>
>>> If it's pages that got mapped in but were never executed, how did that
>>> happen? Was it fault-around? If so, maybe it would just be simpler to
>>> not do fault-around for executable pages on these platforms.
>> A page can get mapped through a different access (write) without being executed.
>> Even if it got mapped through execution and subsequent invalidation, the
>> invalidation does not have to be repeated after migration unless there is a
>> subsequent exec access. This series just tries to hold off the
>> invalidation after migration till a subsequent exec access.
>
> This set generally seems to be assuming an environment with "lots of
> migration, and not much execution". That seems like a kinda odd
> situation to me.

Irrespective of the reported problem, which is user-driven, there are many
kernel-triggered migrations that can accumulate I-cache invalidation cost over
time on a memory-heavy system with a high number of exec-enabled user pages.
Would that be such a rare situation?

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-December/620357.html
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page
2019-02-18 8:31 ` Anshuman Khandual
@ 2019-02-18 9:04 ` Catalin Marinas
2019-02-18 9:16 ` Anshuman Khandual
2019-02-18 18:20 ` Dave Hansen
1 sibling, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2019-02-18 9:04 UTC
To: Anshuman Khandual
Cc: Dave Hansen, linux-mm, akpm, mhocko, kirill, kirill.shutemov,
vbabka, will.deacon
On Mon, Feb 18, 2019 at 02:01:55PM +0530, Anshuman Khandual wrote:
> On 02/14/2019 10:25 PM, Dave Hansen wrote:
> > On 2/13/19 8:12 PM, Anshuman Khandual wrote:
> >> On 02/13/2019 09:14 PM, Dave Hansen wrote:
> >>> On 2/13/19 12:06 AM, Anshuman Khandual wrote:
> >>>> Setting an exec permission on a page normally triggers I-cache invalidation
> >>>> which might be expensive. I-cache invalidation is not mandatory on a given
> >>>> page if there is no immediate exec access on it. Non-fault modification of
> >>>> user page table from generic memory paths like migration can be improved if
> >>>> setting of the exec permission on the page can be deferred till actual use.
> >>>> There was a performance report [1] which highlighted the problem.
> >>>
> >>> How does this happen? If the page was not executed, then it'll
> >>> (presumably) be non-present which won't require icache invalidation.
> >>> So, this would only be for pages that have been executed (and won't
> >>> again before the next migration), *or* for pages that were mapped
> >>> executable but never executed.
> >> I-cache invalidation happens while migrating a 'mapped and executable' page
> >> irrespective of whether that page was really executed for being mapped there
> >> in the first place.
> >
> > Ahh, got it. I also assume that the Accessed bit on these platforms is
> > also managed similar to how we do it on x86 such that it can't be used
> > to drive invalidation decisions?
>
> Drive I-cache invalidation? Could you please elaborate on this? Isn't the
> access bit mechanism meant to identify dirty pages after write faults when
> it is SW-updated, or after write accesses when it is HW-updated? In the
> SW-updated method, a given PTE goes through pte_young() during the page
> fault. Then how do we differentiate an exec fault/access from a write
> fault/access and decide to invalidate the I-cache? Just being curious.

The access flag is used to identify young/old pages only (the dirty bit
is used to track writes to a page). Depending on the Arm implementation,
the access bit/flag could be managed by hardware transparently, so no
fault is taken to the kernel on accessing through an 'old' pte.

--
Catalin
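Catalin's point, that the Access Flag only records young/old and says nothing about the kind of access, can be seen in the arm64 descriptor layout. The sketch below is a user-space model, assuming the bit positions used by the arm64 kernel headers (pgtable-hwdef.h and pgtable-prot.h: AF at bit 10, DBM at bit 51, UXN at bit 54, software dirty at bit 55). The toy_* helpers are hypothetical stand-ins for pte_young(), pte_dirty() and pte_user_exec(), not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Bit positions as assumed from the arm64 kernel headers. */
#define PTE_USER    ((pteval_t)1 << 6)   /* AP[1]: accessible from EL0 */
#define PTE_RDONLY  ((pteval_t)1 << 7)   /* AP[2]: read-only */
#define PTE_AF      ((pteval_t)1 << 10)  /* Access Flag */
#define PTE_DBM     ((pteval_t)1 << 51)  /* Dirty Bit Management (Linux PTE_WRITE) */
#define PTE_UXN     ((pteval_t)1 << 54)  /* User eXecute Never */
#define PTE_DIRTY   ((pteval_t)1 << 55)  /* software dirty bit */

static bool toy_pte_young(pteval_t pte)     { return (pte & PTE_AF) != 0; }
static bool toy_pte_user_exec(pteval_t pte) { return (pte & PTE_UXN) == 0; }

/* Dirty if software marked it, or if hardware DBM cleared RDONLY on a write. */
static bool toy_pte_dirty(pteval_t pte)
{
    return (pte & PTE_DIRTY) || ((pte & PTE_DBM) && !(pte & PTE_RDONLY));
}

/*
 * Hardware Access Flag update for an access through an 'old' PTE: the same
 * bit is set whether the access was a load, a store or an instruction fetch.
 */
static pteval_t toy_hw_set_af(pteval_t pte, bool is_fetch)
{
    (void)is_fetch;   /* deliberately unused: AF does not record access type */
    return pte | PTE_AF;
}
```

toy_hw_set_af() ignores is_fetch on purpose: a load, a store and an instruction fetch through an 'old' PTE all leave the same bit behind, so the PTE alone cannot tell an exec access from a data access.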
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page
2019-02-18 9:04 ` Catalin Marinas
@ 2019-02-18 9:16 ` Anshuman Khandual
0 siblings, 0 replies; 28+ messages in thread
From: Anshuman Khandual @ 2019-02-18 9:16 UTC
To: Catalin Marinas
Cc: Dave Hansen, linux-mm, akpm, mhocko, kirill, kirill.shutemov,
vbabka, will.deacon
On 02/18/2019 02:34 PM, Catalin Marinas wrote:
> On Mon, Feb 18, 2019 at 02:01:55PM +0530, Anshuman Khandual wrote:
>> On 02/14/2019 10:25 PM, Dave Hansen wrote:
>>> On 2/13/19 8:12 PM, Anshuman Khandual wrote:
>>>> On 02/13/2019 09:14 PM, Dave Hansen wrote:
>>>>> On 2/13/19 12:06 AM, Anshuman Khandual wrote:
>>>>>> Setting an exec permission on a page normally triggers I-cache invalidation
>>>>>> which might be expensive. I-cache invalidation is not mandatory on a given
>>>>>> page if there is no immediate exec access on it. Non-fault modification of
>>>>>> user page table from generic memory paths like migration can be improved if
>>>>>> setting of the exec permission on the page can be deferred till actual use.
>>>>>> There was a performance report [1] which highlighted the problem.
>>>>>
>>>>> How does this happen? If the page was not executed, then it'll
>>>>> (presumably) be non-present which won't require icache invalidation.
>>>>> So, this would only be for pages that have been executed (and won't
>>>>> again before the next migration), *or* for pages that were mapped
>>>>> executable but never executed.
>>>> I-cache invalidation happens while migrating a 'mapped and executable' page
>>>> irrespective of whether that page was really executed for being mapped there
>>>> in the first place.
>>>
>>> Ahh, got it. I also assume that the Accessed bit on these platforms is
>>> also managed similar to how we do it on x86 such that it can't be used
>>> to drive invalidation decisions?
>>
>> Drive I-cache invalidation? Could you please elaborate on this? Isn't the
>> access bit mechanism meant to identify dirty pages after write faults when
>> it is SW-updated, or after write accesses when it is HW-updated? In the
>> SW-updated method, a given PTE goes through pte_young() during the page
>> fault. Then how do we differentiate an exec fault/access from a write
>> fault/access and decide to invalidate the I-cache? Just being curious.
>
> The access flag is used to identify young/old pages only (the dirty bit
> is used to track writes to a page). Depending on the Arm implementation,
> the access bit/flag could be managed by hardware transparently, so no
> fault is taken to the kernel on accessing through an 'old' pte.

Then there is no way to identify an exec fault with either the access/reference
bit or the dirty bit, whether managed by SW or HW. I am still wondering about
the previous comment where Dave mentioned how it could be used for I-cache
invalidation.
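As Anshuman notes, neither the Access Flag nor the dirty bit tells an exec access apart from a data access. On arm64 that information comes from the fault itself: the Exception Class field (bits [31:26]) of the ESR distinguishes an instruction abort from a data abort. The sketch below decodes it in user space. The constant names follow the arm64 kernel's asm/esr.h and is_el0_instruction_abort() mirrors the helper of that name in arch/arm64/mm/fault.c, while is_el0_data_abort() is a hypothetical companion; this thread does not show how patch 2/4 wires the result to FAULT_FLAG_INSTRUCTION, so treat that connection as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx layout: Exception Class (EC) in bits [31:26]. */
#define ESR_ELx_EC_SHIFT    26
#define ESR_ELx_EC_MASK     (UINT64_C(0x3F) << ESR_ELx_EC_SHIFT)
#define ESR_ELx_EC(esr)     (((esr) & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT)

#define ESR_ELx_EC_IABT_LOW 0x20    /* Instruction Abort from a lower EL */
#define ESR_ELx_EC_DABT_LOW 0x24    /* Data Abort from a lower EL */

/* True for an instruction-fetch fault taken from EL0 (user space). */
static bool is_el0_instruction_abort(uint64_t esr)
{
    return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW;
}

/* Hypothetical companion: true for a load/store fault taken from EL0. */
static bool is_el0_data_abort(uint64_t esr)
{
    return ESR_ELx_EC(esr) == ESR_ELx_EC_DABT_LOW;
}
```

The decision is made on the syndrome of the fault being handled rather than on any PTE bit, which is why it can tell an exec fault from a write fault where the young/dirty bits cannot.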
* Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page
2019-02-18 8:31 ` Anshuman Khandual
2019-02-18 9:04 ` Catalin Marinas
@ 2019-02-18 18:20 ` Dave Hansen
1 sibling, 0 replies; 28+ messages in thread
From: Dave Hansen @ 2019-02-18 18:20 UTC
To: Anshuman Khandual, linux-mm, akpm
Cc: mhocko, kirill, kirill.shutemov, vbabka, will.deacon,
catalin.marinas
On 2/18/19 12:31 AM, Anshuman Khandual wrote:
>> Ahh, got it. I also assume that the Accessed bit on these platforms is
>> also managed similar to how we do it on x86 such that it can't be used
>> to drive invalidation decisions?
>
> Drive I-cache invalidation? Could you please elaborate on this? Isn't the
> access bit mechanism meant to identify dirty pages after write faults when
> it is SW-updated, or after write accesses when it is HW-updated? In the
> SW-updated method, a given PTE goes through pte_young() during the page
> fault. Then how do we differentiate an exec fault/access from a write
> fault/access and decide to invalidate the I-cache? Just being curious.

Let's say this was on x86 where the Accessed bit is set by the hardware
on any access. Let's also say that Linux invalidated the TLB any time
that bit was cleared in software (it doesn't, but let's pretend it did).
In that case, if we needed to do icache invalidation, we could optimize
it by only invalidating the icache when we see the Accessed bit set.
That's because any execution would first set the Accessed bit before the
icache was populated.

So, my question

>>>> Any idea which one it is?
>>>
>>> I am not sure about this particular reported case. But I was able to reproduce
>>> the problem through a test case where a buffer was mapped with R|W|X,
>>> faulted/mapped through a write, migrated and then executed.
>>
>> Could you make sure, please?
>
> The test in the report [1] does not create any explicit PROT_EXEC mappings and
> just attempts to migrate all pages of the process (which has 10 child
> processes), including the exec pages. So the only exec mappings would be the
> primary text segment and the mapped shared glibc segment. It looks like the
> shared libraries have some mapped pages.

Yeah, but the executable ones are also read-only in your example:

> $cat /proc/[PID]/numa_maps | grep libc
>
> ffffaa4c9000 default file=/lib/aarch64-linux-gnu/libc-2.28.so mapped=150 mapmax=57 N0=150 kernelpagesize_kB=4

^ These are all page-cache, executable and read-only.

> ffffaa621000 default file=/lib/aarch64-linux-gnu/libc-2.28.so
> ffffaa630000 default file=/lib/aarch64-linux-gnu/libc-2.28.so anon=4 dirty=4 mapmax=11 N0=4 kernelpagesize_kB=4
> ffffaa634000 default file=/lib/aarch64-linux-gnu/libc-2.28.so anon=2 dirty=2 mapmax=11 N0=2 kernelpagesize_kB=4

This last one is the only read-write one and it's not executable.

>> Write and Execute at the same time are generally a "bad idea". Given
>
> But won't this be the case for all run-time generated code, which gets written
> to a buffer before being executed from there?

No. They usually are r=1,w=1,x=0, then transition to r=1,w=0,x=1. It's
never simultaneously executable and writable.

>> the hardware, I'm not surprised that this problem pops up, but it would
>> be great to find out if this is a real application, or a "doctor it
>> hurts when I do this."
>
> Isn't that a problem, though? :)

The point is that it's not a real-world problem. You can certainly
expose this, but do *real* apps do this rather than something entirely
synthetic?

>> This set generally seems to be assuming an environment with "lots of
>> migration, and not much execution". That seems like a kinda odd
>> situation to me.
>
> Irrespective of the reported problem, which is user-driven, there are many
> kernel-triggered migrations that can accumulate I-cache invalidation cost over
> time on a memory-heavy system with a high number of exec-enabled user pages.
> Would that be such a rare situation?
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-December/620357.html

I translate "trivial C application" to "highly synthetic
microbenchmark". I suspect what's happening here is that somebody wrote
a micro that worked well on x86, although it was being rather stupid.
Somebody got an arm system, and voila: it's slower. Someone says "Oh,
this arm system is slower than x86!"

Again, the big question: do you have real-world applications with
writable, executable pages? The kernel essentially has *zero* of these
because they're such a massive security risk. Adding this feature will
encourage folks to replicate this massive security risk in userspace.
Seems like a bad idea.
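Dave's description of how run-time code generators usually avoid W+X mappings (r=1,w=1,x=0, then r=1,w=0,x=1) looks roughly like this in user space. This is a minimal sketch, assuming Linux with mmap()/mprotect() on either x86-64 or arm64; the hand-assembled instructions simply return 42, and run_wx_jit() is a hypothetical name. __builtin___clear_cache() is the GCC/Clang builtin that performs the cache maintenance arm64 requires before newly written instructions are fetched.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS under strict -std */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*answer_fn)(void);

#if defined(__x86_64__)
static const unsigned char jit_code[] = {
    0xb8, 0x2a, 0x00, 0x00, 0x00,   /* mov eax, 42 */
    0xc3,                           /* ret */
};
#elif defined(__aarch64__)
static const unsigned char jit_code[] = {
    0x40, 0x05, 0x80, 0x52,         /* mov w0, #42 */
    0xc0, 0x03, 0x5f, 0xd6,         /* ret */
};
#endif

/*
 * Returns the generated function's result (42), or -1 if the architecture
 * is not covered or the kernel/LSM refuses the mapping or the protection change.
 */
static int run_wx_jit(void)
{
#if defined(__x86_64__) || defined(__aarch64__)
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    answer_fn fn;
    int ret;
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return -1;
    assert(sizeof(jit_code) <= len);

    /* Phase 1 (r=1, w=1, x=0): emit the instructions. */
    memcpy(buf, jit_code, sizeof(jit_code));
    /* Make them visible to instruction fetch (a no-op on x86). */
    __builtin___clear_cache((char *)buf, (char *)buf + sizeof(jit_code));

    /* Phase 2 (r=1, w=0, x=1): never writable and executable at once. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return -1;
    }

    memcpy(&fn, &buf, sizeof(fn));   /* data pointer to function pointer */
    ret = fn();
    munmap(buf, len);
    return ret;
#else
    return -1;
#endif
}
```

The final mapping is still executable, so migrating it would presumably still involve the I-cache maintenance discussed in this thread; the sketch only illustrates that the page is never writable and executable at the same time.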
end of thread

Thread overview: 28+ messages
2019-02-13 8:06 [RFC 0/4] mm: Introduce lazy exec permission setting on a page Anshuman Khandual
2019-02-13 8:06 ` [RFC 1/4] " Anshuman Khandual
2019-02-13 13:17 ` Matthew Wilcox
2019-02-13 13:53 ` Anshuman Khandual
2019-02-14 9:06 ` Mike Rapoport
2019-02-15 8:11 ` Anshuman Khandual
2019-02-15 9:49 ` Catalin Marinas
2019-02-13 8:06 ` [RFC 2/4] arm64/mm: Identify user level instruction faults Anshuman Khandual
2019-02-13 8:06 ` [RFC 3/4] arm64/mm: Allow non-exec to exec transition in ptep_set_access_flags() Anshuman Khandual
2019-02-13 8:06 ` [RFC 4/4] arm64/mm: Enable ARCH_SUPPORTS_LAZY_EXEC Anshuman Khandual
2019-02-13 11:21 ` [RFC 0/4] mm: Introduce lazy exec permission setting on a page Catalin Marinas
2019-02-13 15:38 ` Michal Hocko
2019-02-14 6:04 ` Anshuman Khandual
2019-02-14 8:38 ` Michal Hocko
2019-02-14 10:19 ` Catalin Marinas
2019-02-14 12:28 ` Michal Hocko
2019-02-15 8:45 ` Anshuman Khandual
2019-02-15 9:27 ` Michal Hocko
2019-02-18 3:07 ` Anshuman Khandual
2019-02-14 15:38 ` Dave Hansen
2019-02-18 3:19 ` Anshuman Khandual
2019-02-13 15:44 ` Dave Hansen
2019-02-14 4:12 ` Anshuman Khandual
2019-02-14 16:55 ` Dave Hansen
2019-02-18 8:31 ` Anshuman Khandual
2019-02-18 9:04 ` Catalin Marinas
2019-02-18 9:16 ` Anshuman Khandual
2019-02-18 18:20 ` Dave Hansen