Re: PTE access rules & abstraction - Benjamin Herrenschmidt

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Hugh Dickins <hugh@veritas.com>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Linux Kernel list <linux-kernel@vger.kernel.org>,
	Nick Piggin <npiggin@suse.de>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Peter Chubb <peterc@gelato.unsw.edu.au>
Subject: Re: PTE access rules & abstraction
Date: Thu, 25 Sep 2008 08:07:21 +1000	[thread overview]
Message-ID: <1222294041.8277.104.camel@pasglop> (raw)
In-Reply-To: <48DAB7E2.5030009@goop.org>

> What do you propose then?  Ideally one would like to get something that
> works for powerpc, s390, all the wacky ia64 modes as well as x86.  The
> ia64 folks proposed something, but I've not looked at it closely.  From
> an x86 virtualization perspective, something that's basically x86 with
> as much scope for batching and deferring as possible would be fine.

That's where things get interesting. I liked Nick ideas of doing
something transactional that could encompass the lock, bach and flushing
but that may be too much at this stage...

> As a start, what's the state machine for a pte?  What states can it be
> in, and how does it move from state to state?  It sounds like powerpc
> has at least one extra state above x86 (hashed, with the hash key stored
> in the pte itself?).

We store in the PTE whether it was hashed, and the location within a
hash bucket. (For each hash value, there's 8 buckets, or rather 16 if
you count our secondary hashing).

We must never write a new valid PTE after we cleared a hashed one
without having a flush in between.

On 32 bits we have less state (only the 'hashed' bit) but the problem is
similar, though we handle it differently: we never clear the hash bit
until we flush the hash, ie, pte_clear doesn't clear the hash bit. On
64-bit we do things differently, we do clear PTEs and pile up in a
per-cpu batch what needs to be flushed, the flush then happens when
leaving lazy mode.

> ptep_get_and_clear() is not batchable anyway, because the x86
> implementation requires an atomic xchg on the pte, which will likely
> result in some sort of trap (and if it doesn't then it doesn't need
> batching).

Well, ptep_get_and_clear() used to be used by zap_pte_range() which I
_HOPE_ was batchable on x86 :-)

Nowadays, there's this new ptep_get_and_clear_full() (yet another
totally meaningless name for an ad-hoc API added for some random special
purpose) that zap_pte_range() uses. Maybe that one is now subtly
different such as it can be used to batch on x86 ?

In any case, powerpc batches -everything- (unless it's called *_flush in
which case the flush is immediate) in a private per-cpu batch and
flushes the hash when leaving lazy mode.

>   The start/commit API was specifically so that we can do the
> mprotect (and fork COW updates) in a batchable way (in Xen its
> implemented with a pte update hypercall which updates the pte without
> affecting the A/D bits).

I think we have different ideas of what batch means but yeah, we do
batch everything including these on powerpc without the new start/commit
interface.

Ben.

>     J
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2008-09-24 22:07 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-19 17:42 Benjamin Herrenschmidt
2008-09-22  6:22 ` Jeremy Fitzhardinge
2008-09-22 21:05   ` Benjamin Herrenschmidt
2008-09-23  3:10     ` Nick Piggin
2008-09-23  3:16       ` David Miller, Nick Piggin
2008-09-23  5:35         ` Benjamin Herrenschmidt
2008-09-23  6:18           ` Nick Piggin
2008-09-23  5:31       ` Benjamin Herrenschmidt
2008-09-23  6:13         ` Jeremy Fitzhardinge
2008-09-23  6:49           ` Benjamin Herrenschmidt
2008-09-23  9:50             ` Nick Piggin
2008-09-23 11:54               ` peter
2008-09-24 18:45     ` Hugh Dickins
2008-09-24 21:20       ` Benjamin Herrenschmidt
2008-09-24 21:57         ` Jeremy Fitzhardinge
2008-09-24 22:07           ` Benjamin Herrenschmidt [this message]
2008-09-24 22:43             ` Jeremy Fitzhardinge
2008-09-24 22:53               ` Benjamin Herrenschmidt
2008-09-24 23:55         ` Hugh Dickins
2008-09-25  1:04           ` Benjamin Herrenschmidt
2008-09-25 18:15             ` Jeremy Fitzhardinge
2008-09-25 21:44               ` Benjamin Herrenschmidt
2008-09-25 22:27                 ` Jeremy Fitzhardinge
2008-09-25 23:02                   ` Benjamin Herrenschmidt
2008-09-24 22:17       ` Martin Schwidefsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1222294041.8277.104.camel@pasglop \
    --to=benh@kernel.crashing.org \
    --cc=hugh@veritas.com \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=peterc@gelato.unsw.edu.au \
    --cc=schwidefsky@de.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox