linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, ak@linux.intel.com,
	kirill.shutemov@linux.intel.com, mhocko@suse.com,
	akpm@linux-foundation.org, hpa@zytor.com,
	harish.srinivasappa@intel.com, lukasz.odzioba@intel.com,
	grzegorz.andrejczuk@intel.com, lukasz.daniluk@intel.com
Subject: Re: [PATCH v2] Linux VM workaround for Knights Landing A/D leak
Date: Tue, 14 Jun 2016 20:10:02 +0200	[thread overview]
Message-ID: <20160614181002.GA30049@pd.tnic> (raw)
In-Reply-To: <1465923672-14232-1-git-send-email-lukasz.anaczkowski@intel.com>

On Tue, Jun 14, 2016 at 07:01:12PM +0200, Lukasz Anaczkowski wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Knights Landing has a issue that a thread setting A or D bits
> may not do so atomically against checking the present bit.
> A thread which is going to page fault may still set those
> bits, even though the present bit was already atomically cleared.
> 
> This implies that when the kernel clears present atomically,
> some time later the supposed to be zero entry could be corrupted
> with stray A or D bits.
> 
> Since the PTE could be already used for storing a swap index,
> or a NUMA migration index, this cannot be tolerated. Most
> of the time the kernel detects the problem, but in some
> rare cases it may not.
> 
> This patch enforces that the page unmap path in vmscan/direct reclaim
> always flushes other CPUs after clearing each page, and also
> clears the PTE again after the flush.
> 
> For reclaim this brings the performance back to before Mel's
> flushing changes, but for unmap it disables batching.
> 
> This makes sure any leaked A/D bits are immediately cleared before the entry
> is used for something else.
> 
> Any parallel faults that check for entry is zero may loop,
> but they should eventually recover after the entry is written.
> 
> Also other users may spin in the page table lock until we
> "fixed" the PTE. This is ensured by always taking the page table lock
> even for the swap cache case. Previously this was only done
> on architectures with non atomic PTE accesses (such as 32bit PTE),
> but now it is also done when this bug workaround is active.
> 
> I audited apply_pte_range and other users of arch_enter_lazy...
> and they seem to all not clear the present bit.
> 
> Right now the extra flush is done in the low level
> architecture code, while the higher level code still
> does batched TLB flush. This means there is always one extra
> unnecessary TLB flush now. As a followon optimization
> this could be avoided by telling the callers that
> the flush already happenend.
> 
> v2 (Lukasz Anaczkowski):
>     () added call to smp_mb__after_atomic() to synchornize with
>        switch_mm, based on Nadav's comment
>     () fixed compilation breakage
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
> ---
>  arch/x86/include/asm/cpufeatures.h |  1 +
>  arch/x86/include/asm/hugetlb.h     |  9 ++++++++-
>  arch/x86/include/asm/pgtable.h     |  5 +++++
>  arch/x86/include/asm/pgtable_64.h  |  6 ++++++
>  arch/x86/kernel/cpu/intel.c        | 10 ++++++++++
>  arch/x86/mm/tlb.c                  | 22 ++++++++++++++++++++++
>  include/linux/mm.h                 |  4 ++++
>  mm/memory.c                        |  3 ++-
>  8 files changed, 58 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 4a41348..2c48011 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -303,6 +303,7 @@
>  #define X86_BUG_SYSRET_SS_ATTRS	X86_BUG(8) /* SYSRET doesn't fix up SS attrs */
>  #define X86_BUG_NULL_SEG	X86_BUG(9) /* Nulling a selector preserves the base */
>  #define X86_BUG_SWAPGS_FENCE	X86_BUG(10) /* SWAPGS without input dep on GS */
> +#define X86_BUG_PTE_LEAK        X86_BUG(11) /* PTE may leak A/D bits after clear */
>  
>  
>  #ifdef CONFIG_X86_32
> diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
> index 3a10616..58e1ca9 100644
> --- a/arch/x86/include/asm/hugetlb.h
> +++ b/arch/x86/include/asm/hugetlb.h
> @@ -41,10 +41,17 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>  	set_pte_at(mm, addr, ptep, pte);
>  }
>  
> +extern void fix_pte_leak(struct mm_struct *mm, unsigned long addr,
> +			 pte_t *ptep);
> +
>  static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>  					    unsigned long addr, pte_t *ptep)
>  {
> -	return ptep_get_and_clear(mm, addr, ptep);
> +	pte_t pte = ptep_get_and_clear(mm, addr, ptep);
> +
> +	if (boot_cpu_has_bug(X86_BUG_PTE_LEAK))

static_cpu_has_bug()

> +		fix_pte_leak(mm, addr, ptep);
> +	return pte;
>  }
>  
>  static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 1a27396..9769355 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -794,11 +794,16 @@ extern int ptep_test_and_clear_young(struct vm_area_struct *vma,
>  extern int ptep_clear_flush_young(struct vm_area_struct *vma,
>  				  unsigned long address, pte_t *ptep);
>  
> +extern void fix_pte_leak(struct mm_struct *mm, unsigned long addr,
> +			 pte_t *ptep);
> +
>  #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
>  static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
>  				       pte_t *ptep)
>  {
>  	pte_t pte = native_ptep_get_and_clear(ptep);
> +	if (boot_cpu_has_bug(X86_BUG_PTE_LEAK))

static_cpu_has_bug()

> +		fix_pte_leak(mm, addr, ptep);
>  	pte_update(mm, addr, ptep);
>  	return pte;
>  }
> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
> index 2ee7811..6fa4079 100644
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
> @@ -178,6 +178,12 @@ extern void cleanup_highmap(void);
>  extern void init_extra_mapping_uc(unsigned long phys, unsigned long size);
>  extern void init_extra_mapping_wb(unsigned long phys, unsigned long size);
>  
> +#define ARCH_HAS_NEEDS_SWAP_PTL 1
> +static inline bool arch_needs_swap_ptl(void)
> +{
> +       return boot_cpu_has_bug(X86_BUG_PTE_LEAK);

static_cpu_has_bug()

> +}
> +
>  #endif /* !__ASSEMBLY__ */
>  
>  #endif /* _ASM_X86_PGTABLE_64_H */
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 6e2ffbe..f499513 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -181,6 +181,16 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>  		}
>  	}
>  
> +	if (c->x86_model == 87) {

That should be INTEL_FAM6_XEON_PHI_KNL, AFAICT.

> +		static bool printed;
> +
> +		if (!printed) {
> +			pr_info("Enabling PTE leaking workaround\n");
> +			printed = true;
> +		}

pr_info_once

> +		set_cpu_bug(c, X86_BUG_PTE_LEAK);
> +	}
> +
>  	/*
>  	 * Intel Quark Core DevMan_001.pdf section 6.4.11
>  	 * "The operating system also is required to invalidate (i.e., flush)
-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2016-06-14 18:10 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-14 15:58 [PATCH] " Lukasz Anaczkowski
2016-06-14 16:31 ` kbuild test robot
2016-06-14 16:47 ` Nadav Amit
2016-06-14 16:54   ` Anaczkowski, Lukasz
2016-06-14 17:01   ` [PATCH v2] " Lukasz Anaczkowski
2016-06-14 17:24     ` Dave Hansen
2016-06-14 18:34       ` One Thousand Gnomes
2016-06-14 18:54         ` Dave Hansen
2016-06-14 19:19           ` Borislav Petkov
2016-06-14 20:20             ` H. Peter Anvin
2016-06-14 20:47               ` Borislav Petkov
2016-06-14 20:54                 ` H. Peter Anvin
2016-06-14 21:02                   ` Borislav Petkov
2016-06-14 21:08                     ` H. Peter Anvin
2016-06-14 21:13                     ` H. Peter Anvin
2016-06-14 18:10     ` Borislav Petkov [this message]
2016-06-15 13:12       ` Anaczkowski, Lukasz
2016-06-14 18:38     ` Nadav Amit
2016-06-15 13:12       ` Anaczkowski, Lukasz
2016-06-15 20:04         ` Nadav Amit
2016-06-15 20:10           ` Dave Hansen
2016-06-15 20:26             ` Nadav Amit
2016-06-16 15:14     ` [PATCH v3] " Lukasz Anaczkowski
2016-06-16 16:43       ` Nadav Amit
2016-06-16 20:23       ` Dave Hansen
2016-06-14 17:18   ` [PATCH] " Dave Hansen
2016-06-14 20:16     ` Nadav Amit
2016-06-14 21:37       ` Dave Hansen
2016-06-15  2:20         ` Andy Lutomirski
2016-06-15  2:35           ` Nadav Amit
2016-06-15  2:36             ` Andy Lutomirski
2016-06-15  2:44               ` Nadav Amit
2016-06-15  3:09                 ` Andy Lutomirski
2016-06-15  3:20         ` Nadav Amit
2016-06-14 16:58 ` kbuild test robot
2016-06-14 17:19 ` Dave Hansen
2016-06-15 13:06   ` Anaczkowski, Lukasz
2016-06-14 17:47 ` kbuild test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160614181002.GA30049@pd.tnic \
    --to=bp@alien8.de \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=grzegorz.andrejczuk@intel.com \
    --cc=harish.srinivasappa@intel.com \
    --cc=hpa@zytor.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lukasz.anaczkowski@intel.com \
    --cc=lukasz.daniluk@intel.com \
    --cc=lukasz.odzioba@intel.com \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox