Re: [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Vlastimil Babka <vbabka@suse.cz>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Dave Hansen <dave@sr71.net>,
	linux-kernel@vger.kernel.org
Cc: x86@kernel.org, linux-mm@kvack.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	bp@alien8.de, ak@linux.intel.com, mhocko@suse.com
Subject: Re: [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum
Date: Wed, 13 Jul 2016 13:37:21 +0200	[thread overview]
Message-ID: <9c09c63c-5c2a-20a4-d68b-a6dc2f88ecaa@suse.cz> (raw)
In-Reply-To: <1467412092.7422.56.camel@kernel.crashing.org>

On 07/02/2016 12:28 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2016-07-01 at 10:46 -0700, Dave Hansen wrote:
>> The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
>> Landing) has an erratum where a processor thread setting the Accessed
>> or Dirty bits may not do so atomically against its checks for the
>> Present bit.  This may cause a thread (which is about to page fault)
>> to set A and/or D, even though the Present bit had already been
>> atomically cleared.
>
> Interesting.... I always wondered where in the Intel docs did it specify
> that present was tested atomically with setting of A and D ... I couldn't
> find it.
>
> Isn't there a more fundamental issue however that you may actually lose
> those bits ? For example if we do an munmap, in zap_pte_range()
>
> We first exchange all the PTEs with 0 with ptep_get_and_clear_full()
> and we then transfer D that we just read into the struct page.
>
> We rely on the fact that D will never be set again, what we go it a
> "final" D bit. IE. We rely on the fact that a processor either:
>
>    - Has a cached PTE in its TLB with D set, in which case it can still
> write to the page until we flush the TLB or
>
>    - Doesn't have a cached PTE in its TLB with D set and so will fail
> to do so due to the atomic P check, thus never writing.
>
> With the errata, don't you have a situation where a processor in the second
> category will write and set D despite P having been cleared (due to the
> race) and thus causing us to miss the transfer of that D to the struct
> page and essentially completely miss that the physical page is dirty ?

Seems to me like this is indeed possible, but...

> (Leading to memory corruption).

... what memory corruption, exactly? If a process is writing to its 
memory from one thread and unmapping it from other thread at the same 
time, there are no guarantees anyway? Would anything sensible rely on 
the guarantee that if the write in such racy scenario didn't end up as a 
segfault (i.e. unmapping was faster), then it must hit the disk? Or are 
there any other scenarios where zap_pte_range() is called? Hmm, but how 
does this affect the page migration scenario, can we lose the D bit there?

And maybe related thing that just occured to me, what if page is made 
non-writable during fork() to catch COW? Any race in that one, or just 
the P bit? But maybe the argument would be the same as above...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2016-07-13 11:37 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-01 17:46 Dave Hansen
2016-07-01 17:47 ` [PATCH 1/4] x86, swap: move swap offset/type up in PTE to work around erratum Dave Hansen
2016-07-01 17:47 ` [PATCH 2/4] x86, pagetable: ignore A/D bits in pte/pmd/pud_none() Dave Hansen
2016-07-01 17:47 ` [PATCH 3/4] x86: disallow running with 32-bit PTEs to work around erratum Dave Hansen
2016-07-01 17:47 ` [PATCH 4/4] x86: use pte_none() to test for empty PTE Dave Hansen
2016-07-01 22:28 ` [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum Benjamin Herrenschmidt
2016-07-13 11:37   ` Vlastimil Babka [this message]
2016-07-13 12:10     ` Vlastimil Babka
2016-07-13 14:04     ` Dave Hansen
2016-07-08  0:19 Dave Hansen
2016-07-13  9:54 ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9c09c63c-5c2a-20a4-d68b-a6dc2f88ecaa@suse.cz \
    --to=vbabka@suse.cz \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=bp@alien8.de \
    --cc=dave@sr71.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox