* update_mmu_cache(): fault or not fault ?
@ 2005-09-26 6:22 Benjamin Herrenschmidt
2005-09-26 7:41 ` David S. Miller, Benjamin Herrenschmidt
2005-09-26 8:05 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2005-09-26 6:22 UTC (permalink / raw)
To: linux-mm; +Cc: Linux Kernel list
Hi !
I've been toying with using update_mmu_cache() to actually fill the TLB
entry directly when taking a fault on some PPC CPUs with software TLB
reload (among other optimizations I have in mind). Most CPUs with software
TLB reload currently take double TLB faults on Linux page faults.
The problem is that I want to only ever do that kind of hw TLB pre-fill
when update_mmu_cache() is called as the result of an actual fault.
However, for some reason that I'm not 100% sure about (*),
update_mmu_cache() is called from other places, typically in mm/fremap.c,
which aren't direct results of faults.
So I suggest adding an argument to it "int is_fault", that would
basically be '1' on all the call sites in mm/memory.c and '0' in all the
call sites in mm/fremap.c.
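A minimal user-space sketch of what the suggested interface might look like. The stand-in types, the counter and arch_tlb_prefill() are hypothetical and exist only so the example compiles outside the kernel; only the extra "int is_fault" argument comes from the proposal itself.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types so the sketch compiles outside the kernel (hypothetical). */
typedef struct { unsigned long val; } pte_t;
struct vm_area_struct { unsigned long vm_start, vm_end; };

/* Number of simulated hardware TLB pre-fills performed so far. */
static unsigned int tlb_prefill_count;

/* Hypothetical arch helper: stands in for writing the entry into the TLB. */
static void arch_tlb_prefill(unsigned long address, pte_t pte)
{
	(void)address;
	(void)pte;
	tlb_prefill_count++;
}

/* update_mmu_cache() with the proposed extra "is_fault" argument. */
static void update_mmu_cache(struct vm_area_struct *vma,
			     unsigned long address, pte_t pte, int is_fault)
{
	assert(vma != NULL);

	/* Callers such as mm/fremap.c would pass 0: no TLB pre-fill. */
	if (!is_fault)
		return;

	/* Callers in mm/memory.c would pass 1: pre-fill the TLB. */
	arch_tlb_prefill(address, pte);
}
```

The point of the flag is that the decision is made by the call site, not by the architecture code, so every caller on every arch would have to be touched.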
Any objection, comment, whatever, before I come up with a patch adding
it to all archs ?
Ben.
(*) I suspect because update_mmu_cache() has historically been hijacked
to do the icache/dcache sync on some architectures, and thus was added to
all call sites that can populate a PTE out of the blue, though it's a
bit dodgy that it's not called in mremap(), thus people with hw execute
permission using that trick should be careful... but then, if you have
execute permission, you probably don't need that trick. This is what
ppc32 and ppc64 do on older CPUs, in an SMP racy way even ;) But that's
a different discussion and I'll have to fix it some day.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: update_mmu_cache(): fault or not fault ?
2005-09-26 6:22 update_mmu_cache(): fault or not fault ? Benjamin Herrenschmidt
@ 2005-09-26 7:41 ` David S. Miller, Benjamin Herrenschmidt
2005-09-26 8:03 ` Benjamin Herrenschmidt
2005-09-26 8:05 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 6+ messages in thread
From: David S. Miller, Benjamin Herrenschmidt @ 2005-09-26 7:41 UTC (permalink / raw)
To: benh; +Cc: linux-mm, linux-kernel
> The problem is that I want to only ever do that kind of hw TLB pre-fill
> when update_mmu_cache() is called as the result of an actual fault.
> However, for some reason that I'm not 100% sure about (*),
> update_mmu_cache() is called from other places, typically in mm/fremap.c,
> which aren't direct results of faults.
>
> So I suggest adding an argument to it "int is_fault", that would
> basically be '1' on all the call sites in mm/memory.c and '0' in all the
> call sites in mm/fremap.c.
You can track this in your port specific code. That's what I do on
sparc64 to deal with this case. I record the TLB miss type (D or I
tlb), and also whether a write occurred, in a bitmask. Then I check
this in update_mmu_cache() to decide whether to prefill.
I store it in current_thread_info() and clear it at the end of fault
processing.
Just grep for "FAULT_CODE_*" in the sparc64 code to see how this
works.
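As a rough user-space illustration of the idea described above (not the actual sparc64 code): the low-level trap entry records the miss type in a per-thread bitmask, update_mmu_cache() consults it, and the mask is cleared when fault processing ends. All names and bit values below are hypothetical stand-ins, not the real FAULT_CODE_* definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values; the real sparc64 definitions may differ. */
#define SKETCH_FAULT_WRITE 0x01u	/* the access was a write */
#define SKETCH_FAULT_DTLB  0x02u	/* miss came from the data TLB */
#define SKETCH_FAULT_ITLB  0x04u	/* miss came from the instruction TLB */

/* Stand-in for the per-thread info block (hypothetical). */
struct sketch_thread_info {
	unsigned int fault_code;
};

static struct sketch_thread_info sketch_current;
static unsigned int sketch_prefills;

/* Port-specific hook: pre-fill only while a real TLB miss is being handled. */
static void sketch_update_mmu_cache(unsigned long address)
{
	(void)address;
	if (sketch_current.fault_code & (SKETCH_FAULT_DTLB | SKETCH_FAULT_ITLB))
		sketch_prefills++;	/* stands in for the actual TLB load */
}

/* Fault path: record the miss type, run the generic code, then clear it. */
static void sketch_do_fault(unsigned long address, unsigned int code)
{
	assert(code != 0);
	sketch_current.fault_code = code;
	sketch_update_mmu_cache(address);	/* called by generic mm code */
	sketch_current.fault_code = 0;		/* end of fault processing */
}
```

With this scheme the generic update_mmu_cache() signature stays untouched, and calls made outside a recorded fault simply see an empty mask.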
Although, I'm ambivalent as to whether prefilling helps at all.
* Re: update_mmu_cache(): fault or not fault ?
2005-09-26 7:41 ` David S. Miller, Benjamin Herrenschmidt
@ 2005-09-26 8:03 ` Benjamin Herrenschmidt
2005-09-26 19:52 ` David S. Miller, Benjamin Herrenschmidt
0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2005-09-26 8:03 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-mm, linux-kernel
> You can track this in your port specific code. That's what I do on
> sparc64 to deal with this case. I record the TLB miss type (D or I
> tlb), and also whether a write occurred, in a bitmask. Then I check
> this in update_mmu_cache() to decide whether to prefill.
>
> I store it in current_thread_info() and clear it at the end of fault
> processing.
>
> Just grep for "FAULT_CODE_*" in the sparc64 code to see how this
> works.
Yup, that would work, thanks. I'll look into it. I just did something
similar on ppc64 for i/d cache coherency. On CPUs with support for
non-executable pages, we map pages non-exec and do the cache flush on the
resulting exec fault. That means however that when faulting in text
pages that haven't been used yet (typically app launch), we would take
the linux page fault, put a PTE in, have update_mmu_cache() put a read
HPTE in the hash table without exec permission, then take a new fault
(exec permission violation), do the flush & return.
I just hacked in some code to test in update_mmu_cache() (just using
current->thread.regs->trap for now) if we come from an instruction
access exception, then do the cache sync and hash in an executable HPTE
(if the linux PTE is executable of course) directly so we avoid the
double fault. It's currently deep into a patch that does many more
things, so I didn't yet have a chance to bench separately, but I'll try
to get some numbers, might grab a little bit more perfs on app launch on
my G5 :)
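A simplified sketch of that test, written as plain user-space C. The structures and the helper name are hypothetical; the trap numbers are the usual PowerPC data storage (0x300) and instruction storage (0x400) vectors, and the counter stands in for the real icache/dcache sync.

```c
#include <assert.h>
#include <stdbool.h>

#define TRAP_DATA_STORAGE 0x300UL	/* data access exception */
#define TRAP_INSN_STORAGE 0x400UL	/* instruction access exception */

/* Hypothetical, heavily reduced views of a Linux PTE and a hash PTE. */
struct sketch_pte  { bool exec; };
struct sketch_hpte { bool valid; bool exec; };

/* Counts simulated icache/dcache synchronisations. */
static unsigned int icache_syncs;

/*
 * Decide which HPTE to insert from update_mmu_cache(), given the trap
 * number of the exception that led here.
 */
static struct sketch_hpte sketch_hash_preload(unsigned long trap,
					      struct sketch_pte pte)
{
	struct sketch_hpte hpte = { .valid = true, .exec = false };

	assert(trap == TRAP_DATA_STORAGE || trap == TRAP_INSN_STORAGE);

	/* Instruction fault on an executable page: sync caches now and
	 * insert an executable HPTE, avoiding the second (exec) fault. */
	if (trap == TRAP_INSN_STORAGE && pte.exec) {
		icache_syncs++;
		hpte.exec = true;
	}
	return hpte;
}
```

A data fault still gets a non-exec HPTE, so the lazy cache flush on the first exec fault is preserved for pages that are only ever read.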
> Although, I'm ambivalent as to whether prefilling helps at all.
If it's really only ever done on faults, I fail to see how it can hurt
at least, since we are basically just removing the cost of a second
exception. Whether it's useful in practice probably depends on the cost
of taking such an exception on a given CPU. Difficult to say without
some benchmarking...
Ben.
* Re: update_mmu_cache(): fault or not fault ?
2005-09-26 6:22 update_mmu_cache(): fault or not fault ? Benjamin Herrenschmidt
2005-09-26 7:41 ` David S. Miller, Benjamin Herrenschmidt
@ 2005-09-26 8:05 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2005-09-26 8:05 UTC (permalink / raw)
To: linux-mm; +Cc: Linux Kernel list
> So I suggest adding an argument to it "int is_fault", that would
> basically be '1' on all the call sites in mm/memory.c and '0' in all the
> call sites in mm/fremap.c.
>
> Any objection, comment, whatever, before I come up with a patch adding
> it to all archs ?
Actually, that wouldn't work for calls to get_user_pages() which will
run the fault code path on non-faults... looks like David's solution
is the best one at this point.
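The problem can be shown with a toy model, assuming (as stated above) that get_user_pages() goes through the same handler as a hardware fault. All function names are hypothetical stand-ins; the counters only exist to make the mismatch visible.

```c
#include <assert.h>
#include <stdbool.h>

static bool in_hw_exception;		/* true only during a real exception */
static unsigned int prefills;		/* pre-fills requested via is_fault */
static unsigned int spurious_prefills;	/* pre-fills with no exception pending */

static void sketch_update_mmu_cache(int is_fault)
{
	if (!is_fault)
		return;
	prefills++;
	if (!in_hw_exception)
		spurious_prefills++;
}

/* Shared fault-handling path: its call site always says "is_fault = 1". */
static void sketch_handle_mm_fault(unsigned long address)
{
	assert(address != 0);
	sketch_update_mmu_cache(1);
}

/* Entry from a real hardware exception. */
static void sketch_hw_page_fault(unsigned long address)
{
	in_hw_exception = true;
	sketch_handle_mm_fault(address);
	in_hw_exception = false;
}

/* Entry from get_user_pages(): same path, but nothing actually faulted. */
static void sketch_get_user_pages(unsigned long address)
{
	sketch_handle_mm_fault(address);
}
```

Because the flag is fixed at the call site inside the shared path, it cannot tell the two entry points apart, whereas state recorded at exception entry can.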
Ben.
* Re: update_mmu_cache(): fault or not fault ?
2005-09-26 8:03 ` Benjamin Herrenschmidt
@ 2005-09-26 19:52 ` David S. Miller, Benjamin Herrenschmidt
2005-09-26 21:28 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 6+ messages in thread
From: David S. Miller, Benjamin Herrenschmidt @ 2005-09-26 19:52 UTC (permalink / raw)
To: benh; +Cc: linux-mm, linux-kernel
> > Although, I'm ambivalent as to whether prefilling helps at all.
>
> If it's really only ever done on faults, I fail to see how it can hurt
> at least, since we are basically just removing the cost of a second
> exception. Whether it's useful in practice probably depends on the cost
> of taking such an exception on a given CPU. Difficult to say without
> some benchmarking...
I guess my ambivalence comes from some aspects of how sparc64 TLB
refilling works.
When you take the TLB miss, the cpu sets up all of these things for
the TLB reload that you have to do by hand if you want to do the
TLB refill in some other context.
There is an MMU register which holds the page aligned virtual address
and the MMU context value. Next, there is a register where you
write the TLB "tag" which contains the PTE entry, and the write to
this register is what performs the TLB load up. (it uses the virtual
address + context value to figure out where to place the PTE entry,
and the PTE itself comes from the store source register)
At TLB miss time, the MMU automatically fills in the virtual address
+ context register, and all you have to do is store the PTE value
and you're done. Whereas in a context like update_mmu_cache() I
have to set up that value as well.
Things get more complicated on UltraSPARC-III+ and later, which have
one 16-entry CAM D-TLB and two indexed 512-entry D-TLBs. You can
configure each 512-entry D-TLB to hold a particular page size. (So
for the kernel, for example, I configure the first one to hold 4MB
pages, and the second one for 8K pages) It is configurable by context.
So to do a TLB refill on these chips it has to know which of these 3
TLBs gets the write enable when you load in the PTE value. It does
this with a register that holds the page size configuration for the
active context at the time of the TLB miss.
So this is yet another register I'd have to load by hand to load the
TLB at update_mmu_cache() time.
I also have to disable interrupts so that TLB loading (which requires
multiple stores and is thus not atomic) does not get interrupted by a
cross-cpu call that flushes the TLB or similar.
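To make the comparison concrete, here is a toy model of the two paths. It is not real sparc64 code: the structure, field names and the 13-bit context field are assumptions for illustration, and "reg_writes" just counts the MMU register stores software has to issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 13	/* 8K base pages */
#define SKETCH_PAGE_MASK  (~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))

/* Hypothetical model of the MMU state involved in a TLB load. */
struct sketch_mmu {
	uint64_t tag_access;	/* page-aligned vaddr | context */
	uint64_t pgsz_cfg;	/* page size config for the context */
	uint64_t loaded_tag;	/* what ended up in the TLB */
	uint64_t loaded_pte;
	bool irqs_enabled;
	unsigned int reg_writes;	/* stores issued by software */
};

/* What the hardware does for free when a TLB miss trap is taken. */
static void sketch_hw_miss(struct sketch_mmu *m, uint64_t vaddr,
			   uint64_t ctx, uint64_t pgsz)
{
	m->tag_access = (vaddr & SKETCH_PAGE_MASK) | ctx;
	m->pgsz_cfg = pgsz;
}

/* The single store the miss handler needs: PTE in, TLB entry loaded. */
static void sketch_store_pte(struct sketch_mmu *m, uint64_t pte)
{
	m->loaded_tag = m->tag_access;
	m->loaded_pte = pte;
	m->reg_writes++;
}

/* Pre-fill outside the miss handler: everything is set up by hand. */
static void sketch_prefill(struct sketch_mmu *m, uint64_t vaddr,
			   uint64_t ctx, uint64_t pgsz, uint64_t pte)
{
	bool saved = m->irqs_enabled;

	assert(ctx < (UINT64_C(1) << SKETCH_PAGE_SHIFT));
	m->irqs_enabled = false;	/* the sequence is not atomic */
	m->tag_access = (vaddr & SKETCH_PAGE_MASK) | ctx;
	m->reg_writes++;
	m->pgsz_cfg = pgsz;
	m->reg_writes++;
	sketch_store_pte(m, pte);
	m->irqs_enabled = saved;
}
```

In this model the miss handler issues one store while the pre-fill path issues three plus the interrupt masking, which is the extra work being weighed against the cost of the second trap.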
So this is a ton of complication, which is straightforwardly done in
the TLB miss handler. And if you think about it, since we've been
writing the PTE entries and walking the page tables for fault
processing, all of this will be hot in the L2 cache when we take
the nearly immediate TLB miss.
Anyways, I'm very likely going to remove the prefilling of TLB entries
on sparc64. I hope it's more beneficial and less complicated for ppc64
:-)
* Re: update_mmu_cache(): fault or not fault ?
2005-09-26 19:52 ` David S. Miller, Benjamin Herrenschmidt
@ 2005-09-26 21:28 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2005-09-26 21:28 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-mm, linux-kernel
On Mon, 2005-09-26 at 12:52 -0700, David S. Miller wrote:
> So this is a ton of complication, which is straightforwardly done in
> the TLB miss handler. And if you think about it, since we've been
> writing the PTE entries and walking the page tables for fault
> processing, all of this will be hot in the L2 cache when we take
> the nearly immediate TLB miss.
>
> Anyways, I'm very likely going to remove the prefilling of TLB entries
> on sparc64. I hope it's more beneficial and less complicated for ppc64
> :-)
Ok, makes sense. On most ppc, things are pretty much equivalent on
real faults and pre-fill (except for masking interrupts which we have to
add to the pre-fill case). Anyway, best is to get real numbers with some
benchmarks, I'll see if I can get something from the 4xx folks.
Ben.