From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Hugh Dickins <hugh@veritas.com>,
Linux Memory Management List <linux-mm@kvack.org>,
Linux Kernel list <linux-kernel@vger.kernel.org>,
Nick Piggin <npiggin@suse.de>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Peter Chubb <peterc@gelato.unsw.edu.au>
Subject: Re: PTE access rules & abstraction
Date: Thu, 25 Sep 2008 08:07:21 +1000
Message-ID: <1222294041.8277.104.camel@pasglop>
In-Reply-To: <48DAB7E2.5030009@goop.org>
> What do you propose then? Ideally one would like to get something that
> works for powerpc, s390, all the wacky ia64 modes as well as x86. The
> ia64 folks proposed something, but I've not looked at it closely. From
> an x86 virtualization perspective, something that's basically x86 with
> as much scope for batching and deferring as possible would be fine.
That's where things get interesting. I liked Nick's idea of doing
something transactional that could encompass the locking, batching and
flushing, but that may be too much at this stage...
> As a start, what's the state machine for a pte? What states can it be
> in, and how does it move from state to state? It sounds like powerpc
> has at least one extra state above x86 (hashed, with the hash key stored
> in the pte itself?).
We store in the PTE whether it was hashed, and the location within a
hash bucket. (Each hash value selects a bucket of 8 slots, or rather 16
slots if you count our secondary hash.)
We must never write a new valid PTE after we cleared a hashed one
without having a flush in between.
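As a rough illustration of the rule above, here is a minimal userspace sketch in C. It is not the powerpc implementation: the bit layout, the toy_* names and the flush_pending flag are all hypothetical, chosen only to show a PTE that records "hashed" plus a slot index (0..15), and a setter that refuses a new valid PTE while a cleared hashed entry has not been flushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define PTE_VALID      (1u << 0)
#define PTE_HASHED     (1u << 1)   /* a hash table entry exists for this PTE */
#define PTE_SLOT_SHIFT 2
#define PTE_SLOT_MASK  (0xfu << PTE_SLOT_SHIFT)   /* slot 0..15 */

struct toy_pte {
    uint32_t val;
    bool flush_pending;   /* a hashed entry was cleared but not yet flushed */
};

static inline bool toy_pte_hashed(const struct toy_pte *p)
{
    return (p->val & PTE_HASHED) != 0;
}

static inline unsigned toy_pte_slot(const struct toy_pte *p)
{
    return (p->val & PTE_SLOT_MASK) >> PTE_SLOT_SHIFT;
}

/* Record that the translation was inserted in the hash table at 'slot'. */
static void toy_pte_mark_hashed(struct toy_pte *p, unsigned slot)
{
    assert(slot < 16);
    p->val = (p->val & ~PTE_SLOT_MASK) | PTE_HASHED | (slot << PTE_SLOT_SHIFT);
}

/* Clearing a hashed PTE leaves a stale hash entry behind until it is flushed. */
static void toy_pte_clear(struct toy_pte *p)
{
    if (toy_pte_hashed(p))
        p->flush_pending = true;
    p->val = 0;
}

/* Stand-in for the real hash invalidation. */
static void toy_hash_flush(struct toy_pte *p)
{
    p->flush_pending = false;
}

/* Returns false if the "flush in between" rule would be violated. */
static bool toy_set_pte(struct toy_pte *p, uint32_t newval)
{
    if ((newval & PTE_VALID) && p->flush_pending)
        return false;
    p->val = newval;
    return true;
}
```

In this model, toy_pte_clear() followed directly by toy_set_pte() with a valid value fails, while the same sequence with toy_hash_flush() in between succeeds.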
On 32-bit we have less state (only the 'hashed' bit) but the problem is
similar, though we handle it differently: we never clear the hash bit
until we flush the hash, i.e. pte_clear doesn't clear the hash bit. On
64-bit we do clear PTEs, and pile up in a per-cpu batch what needs to
be flushed; the flush then happens when leaving lazy mode.
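The 32-bit approach described above can be sketched as a small userspace model in C. The P32_* bit positions and p32_* function names are assumptions made up for this example, and the hash flush is reduced to a counter; the point is only that clearing a PTE keeps the hashed bit, and the bit goes away solely as part of a flush.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define P32_PRESENT (1u << 0)
#define P32_HASHPTE (1u << 1)   /* a hash entry may exist for this PTE */

static uint32_t p32_flush_count;   /* counts modelled hash flushes */

/* Clear everything except the hashed bit. */
static void p32_pte_clear(uint32_t *ptep)
{
    assert(ptep);
    *ptep &= P32_HASHPTE;
}

/* Flush the hash entry (modelled as a counter), and only then drop the bit. */
static void p32_flush_hash(uint32_t *ptep)
{
    if (*ptep & P32_HASHPTE) {
        p32_flush_count++;
        *ptep &= ~P32_HASHPTE;
    }
}

/*
 * Install a new value while carrying over a still-set hashed bit, so the
 * obligation to flush is not lost (an assumption of this model).
 */
static void p32_set_pte(uint32_t *ptep, uint32_t newval)
{
    *ptep = (newval & ~P32_HASHPTE) | (*ptep & P32_HASHPTE);
}
```

Because the hashed bit survives p32_pte_clear() and p32_set_pte(), a later p32_flush_hash() still knows there is a stale entry to remove.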
> ptep_get_and_clear() is not batchable anyway, because the x86
> implementation requires an atomic xchg on the pte, which will likely
> result in some sort of trap (and if it doesn't then it doesn't need
> batching).
Well, ptep_get_and_clear() used to be used by zap_pte_range() which I
_HOPE_ was batchable on x86 :-)
Nowadays, there's this new ptep_get_and_clear_full() (yet another
totally meaningless name for an ad-hoc API added for some random special
purpose) that zap_pte_range() uses. Maybe that one is now subtly
different such that it can be used to batch on x86?
In any case, powerpc batches -everything- (unless it's called *_flush in
which case the flush is immediate) in a private per-cpu batch and
flushes the hash when leaving lazy mode.
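To make the batching behaviour above concrete, here is a minimal single-threaded sketch in C. It is a toy model, not kernel code: struct toy_batch, BATCH_MAX and the toy_* functions are hypothetical names, a plain static variable stands in for the per-cpu batch, and "flushing" just counts entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_MAX 64   /* hypothetical capacity */

struct toy_batch {
    bool lazy;                      /* inside lazy mode? */
    size_t n;                       /* entries currently queued */
    unsigned long addr[BATCH_MAX];  /* addresses awaiting a hash flush */
    size_t flushed;                 /* total entries flushed so far */
};

static struct toy_batch cpu_batch;  /* stand-in for a per-cpu variable */

/* Flush everything queued so far (modelled as a counter update). */
static void toy_batch_flush(struct toy_batch *b)
{
    b->flushed += b->n;
    b->n = 0;
}

static void toy_enter_lazy(struct toy_batch *b)
{
    b->lazy = true;
}

/* Leaving lazy mode is where the pending flushes actually happen. */
static void toy_leave_lazy(struct toy_batch *b)
{
    toy_batch_flush(b);
    b->lazy = false;
}

/* Queue an invalidation; outside lazy mode, or when full, flush at once. */
static void toy_queue_invalidate(struct toy_batch *b, unsigned long addr)
{
    assert(b->n < BATCH_MAX);
    b->addr[b->n++] = addr;
    if (!b->lazy || b->n == BATCH_MAX)
        toy_batch_flush(b);
}

/* Model of a *_flush style call: the flush is immediate even in lazy mode. */
static void toy_clear_flush(struct toy_batch *b, unsigned long addr)
{
    toy_queue_invalidate(b, addr);
    toy_batch_flush(b);
}
```

A caller would wrap a run of toy_queue_invalidate() calls between toy_enter_lazy() and toy_leave_lazy(); nothing is flushed until the batch fills, a toy_clear_flush() is issued, or lazy mode is left.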
> The start/commit API was specifically so that we can do the
> mprotect (and fork COW updates) in a batchable way (in Xen it's
> implemented with a pte update hypercall which updates the pte without
> affecting the A/D bits).
I think we have different ideas of what batch means but yeah, we do
batch everything including these on powerpc without the new start/commit
interface.
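For readers unfamiliar with the start/commit pattern being discussed, the following is a simplified userspace sketch in C. It is loosely modelled on the kernel's ptep_modify_prot_start()/ptep_modify_prot_commit() pair, but the toy_* names, the reduced signatures and the T_* bit values are assumptions for illustration, and the plain load/store stands in for what would have to be an atomic exchange on real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits, not taken from any kernel header. */
#define T_PRESENT  (1u << 0)
#define T_WRITE    (1u << 1)
#define T_ACCESSED (1u << 5)
#define T_DIRTY    (1u << 6)

/*
 * Start: take the PTE out of the page table so that no accessed/dirty
 * update can land between start and commit. A real implementation would
 * use an atomic exchange here; this model is single-threaded.
 */
static uint32_t toy_modify_prot_start(uint32_t *ptep)
{
    uint32_t old = *ptep;
    *ptep = 0;
    return old;
}

/* Commit: write the modified value back. */
static void toy_modify_prot_commit(uint32_t *ptep, uint32_t newval)
{
    *ptep = newval;
}

/* mprotect-style update: drop write permission, keep the A/D bits. */
static void toy_write_protect(uint32_t *ptep)
{
    uint32_t pte = toy_modify_prot_start(ptep);

    pte &= ~T_WRITE;
    toy_modify_prot_commit(ptep, pte);
}
```

The value of splitting the update into two calls is that a hypervisor backend can, in principle, turn the commit into a single PTE-update request that leaves the A/D bits alone, instead of trapping on the exchange.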
Ben.
> J
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .