* PTE aging, ptep_test_and_clear_young() and TLB
@ 2004-04-17 20:15 Russell King
2004-04-17 20:43 ` William Lee Irwin III
2004-04-17 22:27 ` Hugh Dickins
0 siblings, 2 replies; 17+ messages in thread
From: Russell King @ 2004-04-17 20:15 UTC (permalink / raw)
To: linux-mm
Hi,
Marc Singer has been investigating some issues with ARM where we
appear to unmap pages which are in active use by the application.
While Bill Irwin has been looking at them (see
<http://marc.theaimsgroup.com/?l=linux-mm&m=108218227006508&w=2>),
I'm a little concerned about the page aging.
We implement the page age tracking by causing faults when the page is
marked "old". It turns out that the implementation is "lazy" because
ptep_test_and_clear_young() does not flush the TLB to get rid of the
existing entry. This means that even though we update the PTE to cause
a fault on the next access, the MMU doesn't see the change until:
(1) the next context switch which changes user space mappings, or
(2) there is sufficient TLB replacement to cause older entries to
be evicted. (where older does not depend on use of that entry.)
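[Illustration, not part of the original message.] The effect described above can be reproduced with a small user-space toy model. This is only a sketch under stated assumptions: one PTE with a software-maintained young bit, a one-entry TLB, and hypothetical toy_* helpers that merely mimic the roles of ptep_test_and_clear_young() and ptep_clear_flush_young(). It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical names): a PTE with a software "young" bit
 * and a one-entry TLB that may hold a cached translation for it. */
struct toy_pte { bool young; };
struct toy_tlb { bool cached; };

/* Access the page. Returns true if an aging fault was taken. */
static bool toy_access(struct toy_pte *pte, struct toy_tlb *tlb)
{
    bool faulted = false;

    if (tlb->cached)
        return false;           /* TLB hit: the updated PTE is never consulted */
    if (!pte->young) {
        pte->young = true;      /* fault handler marks the page young again */
        faulted = true;
    }
    tlb->cached = true;         /* translation is (re)loaded */
    return faulted;
}

/* Lazy aging: report and clear young, leave the TLB alone. */
static bool toy_test_and_clear_young(struct toy_pte *pte)
{
    bool was_young = pte->young;

    pte->young = false;
    return was_young;
}

/* Eager aging: also drop the cached translation if the bit was set. */
static bool toy_clear_flush_young(struct toy_pte *pte, struct toy_tlb *tlb)
{
    bool was_young = toy_test_and_clear_young(pte);

    if (was_young)
        tlb->cached = false;
    return was_young;
}

/* access, scan, access again, scan again: does the second scan
 * notice that the page was used in between? */
bool toy_second_scan_sees_young(bool flush)
{
    struct toy_pte pte = { false };
    struct toy_tlb tlb = { false };
    bool first = toy_access(&pte, &tlb);

    assert(first);              /* first touch of an old page faults */
    (void)first;
    if (flush)
        toy_clear_flush_young(&pte, &tlb);
    else
        toy_test_and_clear_young(&pte);
    toy_access(&pte, &tlb);     /* the application keeps using the page */
    return toy_test_and_clear_young(&pte);
}
```

In this model toy_second_scan_sees_young(false) is false: the second access hits the stale TLB entry, so the scan sees an "old" page that is in fact in active use. With flush set to true the second access faults and the page looks young again.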
This same issue came up with 2.4 kernels, where it appears to be less
of a problem. IIRC it was decided that the TLB flush when we mark
PTEs "old" was not necessary, even for systems which maintain the page
age state by software means, since we won't evict the page even after
unmapping it until we have unmapped it from all processes.
However, I'm led to believe that the current 2.6 VM is more aggressive,
and needs the young bit to prevent pages being thrown out and needing
to be re-read from disk/network. Essentially, I'm led to believe that
when a page is marked "old", it is up for eviction on the very next
rescan if it hasn't been marked "young".
So, it seems to me that maintaining the PTE age state is far more
important, and a lazy approach is no longer possible.
This in turn means that we need to replace ptep_test_and_clear_young()
with ptep_clear_flush_young(), which in turn means we need the VMA and
address. However, this implies introducing more code into
page_referenced().
Comments?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-17 20:15 PTE aging, ptep_test_and_clear_young() and TLB Russell King
@ 2004-04-17 20:43 ` William Lee Irwin III
2004-04-18 9:36 ` Russell King
2004-04-17 22:27 ` Hugh Dickins
1 sibling, 1 reply; 17+ messages in thread
From: William Lee Irwin III @ 2004-04-17 20:43 UTC (permalink / raw)
To: Russell King; +Cc: linux-mm
On Sat, Apr 17, 2004 at 09:15:06PM +0100, Russell King wrote:
> This in turn means that we need to replace ptep_test_and_clear_young()
> with ptep_clear_flush_young(), which in turn means we need the VMA and
> address. However, this implies introducing more code into
> page_referenced().
> Comments?
The address and mm should already be recoverable via the pte page
tagging technique. The vma is recoverable from that, albeit at some
cost (mm->page_table_lock acquisition + find_vma() call). OTOH unless
kswapd's going wild it should largely count as a slow path anyway.
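[Illustration, not part of the original message.] A rough user-space sketch of the idea behind pte page tagging: each page of PTEs carries a tag naming the owning mm and the base user address it maps, so the address can be rebuilt from a PTE pointer alone. Everything here is an assumption made for the example (the toy_* names, the sizes, and the linear table search standing in for the kernel's page lookup). It is not the actual asm-generic/rmap.h code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE     4096UL   /* assumed page size */
#define TOY_PTRS_PER_PTE  1024     /* assumed PTEs per pte page */

struct toy_mm { int id; };
typedef unsigned long toy_pte_t;

/* A pte page plus its tag: owning mm and first user address covered. */
struct toy_pte_page {
    struct toy_mm *mm;
    unsigned long  base;
    toy_pte_t      ptes[TOY_PTRS_PER_PTE];
};

/* Find the pte page containing ptep (linear search, toy stand-in). */
static struct toy_pte_page *toy_ptep_to_page(struct toy_pte_page *tables,
                                             size_t n, const toy_pte_t *ptep)
{
    uintptr_t p = (uintptr_t)ptep;

    for (size_t i = 0; i < n; i++) {
        uintptr_t lo = (uintptr_t)tables[i].ptes;
        uintptr_t hi = (uintptr_t)(tables[i].ptes + TOY_PTRS_PER_PTE);

        if (p >= lo && p < hi)
            return &tables[i];
    }
    return NULL;
}

/* Rebuild the user address: tag base + slot index * page size. */
static unsigned long toy_ptep_to_address(const struct toy_pte_page *pg,
                                         const toy_pte_t *ptep)
{
    return pg->base + (unsigned long)(ptep - pg->ptes) * TOY_PAGE_SIZE;
}

unsigned long toy_demo_address(void)
{
    static struct toy_mm mm = { 1 };
    static struct toy_pte_page tables[2];
    const toy_pte_t *ptep = &tables[1].ptes[3];
    const struct toy_pte_page *pg;

    tables[0].mm = &mm; tables[0].base = 0x00400000UL;
    tables[1].mm = &mm; tables[1].base = 0x00800000UL;

    pg = toy_ptep_to_page(tables, 2, ptep);
    assert(pg != NULL && pg->mm == &mm);   /* the mm comes from the tag */
    return toy_ptep_to_address(pg, ptep);
}
```

Here toy_demo_address() yields 0x00803000: slot 3 of the table whose tag says it starts at 0x00800000.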
-- wli
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-17 20:15 PTE aging, ptep_test_and_clear_young() and TLB Russell King
2004-04-17 20:43 ` William Lee Irwin III
@ 2004-04-17 22:27 ` Hugh Dickins
2004-04-17 23:47 ` Anton Blanchard
1 sibling, 1 reply; 17+ messages in thread
From: Hugh Dickins @ 2004-04-17 22:27 UTC (permalink / raw)
To: Russell King; +Cc: linux-mm
On Sat, 17 Apr 2004, Russell King wrote:
>
> So, it seems to me that maintaining the PTE age state is far more
> important, and a lazy approach is no longer possible.
>
> This in turn means that we need to replace ptep_test_and_clear_young()
> with ptep_clear_flush_young(), which in turn means we need the VMA and
> address. However, this implies introducing more code into
> page_referenced().
>
> Comments?
I think you're quite likely right on all counts; and this may be why
ppc and ppc64 have arranged their ptep_test_and_clear_young to flush TLB.
But I don't much like the thought of flushing TLB on all cpus each time
page_referenced finds the referenced bit set in a pte, perhaps many times
even for the one page. We'd prefer page_referenced to remain lightweight
in contrast to try_to_unmap. Need to do some kind of gathering before
TLB flush.
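[Illustration, not part of the original message.] One way to picture "gathering" is to record which address spaces had a young bit cleared during the scan and flush each of them once at the end. The sketch below only counts flushes in a toy model. The toy_* names and the per-mm granularity are assumptions for the example, and no such batching scheme is proposed in detail in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_MM 4    /* assumed upper bound on address spaces */

/* One mapping of the page being scanned. */
struct toy_ref { int mm_id; bool young; };

/* Eager: one flush for every young pte that gets cleared. */
static int toy_flushes_eager(struct toy_ref *refs, size_t n)
{
    int flushes = 0;

    for (size_t i = 0; i < n; i++) {
        if (refs[i].young) {
            refs[i].young = false;
            flushes++;
        }
    }
    return flushes;
}

/* Gathered: remember which mms need a flush, flush each one once. */
static int toy_flushes_gathered(struct toy_ref *refs, size_t n)
{
    bool pending[TOY_MAX_MM] = { false };
    int flushes = 0;

    for (size_t i = 0; i < n; i++) {
        if (refs[i].young) {
            assert(refs[i].mm_id >= 0 && refs[i].mm_id < TOY_MAX_MM);
            refs[i].young = false;
            pending[refs[i].mm_id] = true;
        }
    }
    for (int m = 0; m < TOY_MAX_MM; m++)
        if (pending[m])
            flushes++;
    return flushes;
}

/* Five mappings across two mms, four of them young. */
int toy_flush_savings(void)
{
    struct toy_ref a[] = { {0, true}, {0, true}, {1, true}, {1, false}, {0, true} };
    struct toy_ref b[] = { {0, true}, {0, true}, {1, true}, {1, false}, {0, true} };

    return toy_flushes_eager(a, sizeof a / sizeof a[0])
         - toy_flushes_gathered(b, sizeof b / sizeof b[0]);
}
```

For this input the eager variant issues 4 flushes and the gathered one 2, so toy_flush_savings() returns 2. The trade-off the model hides is that a gathered flush is coarser (per mm here rather than per page), which may discard more TLB entries than strictly needed.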
(Andrea will know one reason why I'm afraid of vmas in page_referenced ;)
Hugh
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-17 22:27 ` Hugh Dickins
@ 2004-04-17 23:47 ` Anton Blanchard
0 siblings, 0 replies; 17+ messages in thread
From: Anton Blanchard @ 2004-04-17 23:47 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Russell King, linux-mm
> I think you're quite likely right on all counts; and this may be why
> ppc and ppc64 have arranged their ptep_test_and_clear_young to flush TLB.
Yep, that's the reason we do it.
Anton
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-17 20:43 ` William Lee Irwin III
@ 2004-04-18 9:36 ` Russell King
2004-04-18 9:39 ` William Lee Irwin III
2004-04-18 10:42 ` Russell King
0 siblings, 2 replies; 17+ messages in thread
From: Russell King @ 2004-04-18 9:36 UTC (permalink / raw)
To: William Lee Irwin III, linux-mm
On Sat, Apr 17, 2004 at 01:43:02PM -0700, William Lee Irwin III wrote:
> On Sat, Apr 17, 2004 at 09:15:06PM +0100, Russell King wrote:
> > This in turn means that we need to replace ptep_test_and_clear_young()
> > with ptep_clear_flush_young(), which in turn means we need the VMA and
> > address. However, this implies introducing more code into
> > page_referenced().
> > Comments?
>
> The address and mm should already be recoverable via the pte page
> tagging technique. The vma is recoverable from that, albeit at some
> cost (mm->page_table_lock acquisition + find_vma() call). OTOH unless
> kswapd's going wild it should largely count as a slow path anyway.
Actually, we don't need the VMA - if you look at flush_tlb_page()
in include/asm-arm/tlbflush.h, we only really need the MM. Therefore,
it's pointless digging up the VMA. (I did think that we didn't flush
the I-TLB if VM_EXEC wasn't set, but I think that was a previous
incarnation.)
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 9:36 ` Russell King
@ 2004-04-18 9:39 ` William Lee Irwin III
2004-04-18 10:58 ` Hugh Dickins
2004-04-18 10:42 ` Russell King
1 sibling, 1 reply; 17+ messages in thread
From: William Lee Irwin III @ 2004-04-18 9:39 UTC (permalink / raw)
To: Russell King; +Cc: linux-mm
On Sat, Apr 17, 2004 at 01:43:02PM -0700, William Lee Irwin III wrote:
>> The address and mm should already be recoverable via the pte page
>> tagging technique. The vma is recoverable from that, albeit at some
>> cost (mm->page_table_lock acquisition + find_vma() call). OTOH unless
>> kswapd's going wild it should largely count as a slow path anyway.
On Sun, Apr 18, 2004 at 10:36:16AM +0100, Russell King wrote:
> Actually, we don't actually need the VMA - if you look at flush_tlb_page()
> in include/asm-arm/tlbflush.h, we only really need the MM. Therefore,
> it's pointless digging up the VMA. (I did think that we didn't flush
> the I-TLB if VM_EXEC wasn't set, but I think that was a previous
> incarnation.)
This sounds like when hugh's stuff to prep for either his or andrea's
try_to_unmap() reimplementation goes in, something akin to current ppc64
may be needed for ARM. That should preserve the mm/address tagging by
shoving the pte page tagging into arch code.
-- wli
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 9:36 ` Russell King
2004-04-18 9:39 ` William Lee Irwin III
@ 2004-04-18 10:42 ` Russell King
2004-04-18 15:12 ` William Lee Irwin III
1 sibling, 1 reply; 17+ messages in thread
From: Russell King @ 2004-04-18 10:42 UTC (permalink / raw)
To: William Lee Irwin III, linux-mm
On Sun, Apr 18, 2004 at 10:36:16AM +0100, Russell King wrote:
> Actually, we don't actually need the VMA - if you look at flush_tlb_page()
> in include/asm-arm/tlbflush.h, we only really need the MM. Therefore,
> it's pointless digging up the VMA. (I did think that we didn't flush
> the I-TLB if VM_EXEC wasn't set, but I think that was a previous
> incarnation.)
Grumble - there's one big problem here - it's the kernel include
dependencies.
In file included from include/linux/mm.h:25,
from arch/arm/kernel/asm-offsets.c:14:
include/asm/pgtable.h: In function `ptep_test_and_clear_young':
include/asm/pgtable.h:404: warning: implicit declaration of function `flush_tlb_mm_page'
include/asm/pgtable.h:404: warning: implicit declaration of function `ptep_to_mm'
include/asm/pgtable.h:404: warning: implicit declaration of function `ptep_to_address'
Ok, so linux/mm.h includes asm/pgtable.h, which in turn includes
asm-generic/pgtable.h. I need to get at the mm and address in my
implementation of ptep_test_and_clear_young() - and the functions
are defined in asm-generic/rmap.h. This includes linux/mm.h, so
I can't include it in asm/pgtable.h. Moreover, mm_struct hasn't
been declared yet.
Converting ptep_test_and_clear_young() to be a macro doesn't look
sane either, not without creating some rather disgusting code.
So, how do I get at the mm_struct and address in asm/pgtable.h ?
Maybe we need to split out the pte manipulation into asm/pte.h rather
than overloading pgtable.h with it?
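[Illustration, not part of the original message.] A generic C way around "the struct hasn't been declared yet" is a forward declaration plus an out-of-line helper: a header only needs the complete type if it dereferences the pointer. The single-file sketch below marks which part would live in which file. All toy_* names are hypothetical, and this is a different technique from the header split suggested above, at the price of a real function call instead of an inline.

```c
#include <assert.h>

/* ---- stand-in for the low-level header (think asm/pgtable.h) ---- */
struct toy_mm;                          /* forward declaration only */
typedef unsigned long toy_pte_t;
#define TOY_YOUNG 0x1UL                 /* assumed software young bit */

/* Prototype only: the body needs the full struct, so it lives elsewhere. */
void toy_flush_tlb_mm_page(struct toy_mm *mm, unsigned long addr);

static inline int toy_clear_young_and_flush(toy_pte_t *ptep,
                                            struct toy_mm *mm,
                                            unsigned long addr)
{
    int young = (*ptep & TOY_YOUNG) != 0;

    *ptep &= ~TOY_YOUNG;
    if (young)
        toy_flush_tlb_mm_page(mm, addr);    /* mm is only passed along */
    return young;
}

/* ---- stand-in for the later header (think linux/mm.h) ---- */
struct toy_mm { int flushes; unsigned long last_addr; };

/* ---- stand-in for a .c file that sees both headers ---- */
void toy_flush_tlb_mm_page(struct toy_mm *mm, unsigned long addr)
{
    mm->flushes++;
    mm->last_addr = addr;
}

int toy_demo_flushes(void)
{
    struct toy_mm mm = { 0, 0 };
    toy_pte_t pte = TOY_YOUNG | 0x1000UL;
    int first = toy_clear_young_and_flush(&pte, &mm, 0x8000UL);
    int second = toy_clear_young_and_flush(&pte, &mm, 0x8000UL);

    assert(first == 1 && second == 0);  /* only the first call saw young */
    (void)first;
    (void)second;
    return mm.flushes;
}
```

toy_demo_flushes() returns 1: the inline helper compiles while struct toy_mm is still incomplete, and only the first call triggers the out-of-line flush.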
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 9:39 ` William Lee Irwin III
@ 2004-04-18 10:58 ` Hugh Dickins
2004-04-18 11:23 ` Russell King
2004-04-18 15:52 ` William Lee Irwin III
0 siblings, 2 replies; 17+ messages in thread
From: Hugh Dickins @ 2004-04-18 10:58 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Russell King, linux-mm
On Sun, 18 Apr 2004, William Lee Irwin III wrote:
> On Sun, Apr 18, 2004 at 10:36:16AM +0100, Russell King wrote:
> > Actually, we don't actually need the VMA - if you look at flush_tlb_page()
> > in include/asm-arm/tlbflush.h, we only really need the MM. Therefore,
> > it's pointless digging up the VMA. (I did think that we didn't flush
> > the I-TLB if VM_EXEC wasn't set, but I think that was a previous
> > incarnation.)
>
> This sounds like when hugh's stuff to prep for either his or andrea's
> try_to_unmap() reimplementation goes in, something akin to current ppc64
> may be needed for ARM. That should preserve the mm/address tagging by
> shoving the pte page tagging into arch code.
mm and address are directly available in both mine and Andrea's (the
difference between us is finding vma: mine needs find_vma in the anon
case, on Andrea's it's directly available), shouldn't be any need to
add in that ppc/ppc64 code.
Hmm, maybe I didn't look hard enough at it, and could have just taken
it out of ppc/ppc64, instead of moving it from generic; I'll go back
and check on that sometime.
I'm not surprised Russell's found he just needs mm rather than vma,
I did try briefly yesterday to understand just what it is that vma
gives to flush TLB. Needs thorough research through all the arches,
the ARM case is not necessarily representative.
Wouldn't surprise me if it turns out vma necessary on some in the
file-backed case, but on none in the anon case (would then cease
to be a differentiator between anonmm and anon_vma if so).
But I still think that we'd want to cut down on the intercpu TLB
flushes for page_referenced, should batch them up to some extent.
Russell may well be right that we're much too lazy about the
referenced bit in 2.6, but that doesn't mean we now have to
jump and get it exactly right all the time: the dirty bit is
vital, the referenced bit never more than a hint.
Hugh
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 10:58 ` Hugh Dickins
@ 2004-04-18 11:23 ` Russell King
2004-04-18 12:36 ` Hugh Dickins
2004-04-18 15:52 ` William Lee Irwin III
1 sibling, 1 reply; 17+ messages in thread
From: Russell King @ 2004-04-18 11:23 UTC (permalink / raw)
To: Hugh Dickins; +Cc: William Lee Irwin III, linux-mm
On Sun, Apr 18, 2004 at 11:58:21AM +0100, Hugh Dickins wrote:
> I'm not surprised Russell's found he just needs mm rather than vma,
> I did try briefly yesterday to understand just what it is that vma
> gives to flush TLB. Needs thorough research through all the arches,
> the ARM case is not necessarily representative.
For flushing TLBs, my understanding is the vma gives you access to
vm_flags, specifically the VM_EXEC flag. This can be used as an
optimisation by Harvard architectures to avoid touching the I-TLB
if the page is not executable.
If this is the only reason, and we need to spend cycles looking up
the VMA, it becomes questionable whether the optimisation is really
valid in every case. If the VMA is already available for some other
purpose then it makes sense, but otherwise it doesn't.
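[Illustration, not part of the original message.] The trade-off can be put in numbers with a toy model that counts flushes on a split I-TLB/D-TLB machine. The toy_* names are hypothetical and TOY_VM_EXEC is just a flag bit chosen for the example. Real flush_tlb_page() implementations differ per architecture.

```c
#include <assert.h>

#define TOY_VM_EXEC 0x4UL   /* toy "mapping is executable" flag */

struct toy_tlbs { int dtlb_flushes; int itlb_flushes; };

/* With the VMA flags at hand: skip the I-TLB for non-executable pages. */
static void toy_flush_page_with_flags(struct toy_tlbs *t, unsigned long vm_flags)
{
    t->dtlb_flushes++;
    if (vm_flags & TOY_VM_EXEC)
        t->itlb_flushes++;
}

/* With only the mm: nothing is known about the page, flush both TLBs. */
static void toy_flush_page_mm_only(struct toy_tlbs *t)
{
    t->dtlb_flushes++;
    t->itlb_flushes++;
}

/* I-TLB flushes avoided by knowing the flags, for a given page mix. */
int toy_itlb_flushes_saved(int data_pages, int exec_pages)
{
    struct toy_tlbs with = { 0, 0 }, without = { 0, 0 };

    for (int i = 0; i < data_pages; i++) {
        toy_flush_page_with_flags(&with, 0);
        toy_flush_page_mm_only(&without);
    }
    for (int i = 0; i < exec_pages; i++) {
        toy_flush_page_with_flags(&with, TOY_VM_EXEC);
        toy_flush_page_mm_only(&without);
    }
    assert(with.dtlb_flushes == without.dtlb_flushes);
    return without.itlb_flushes - with.itlb_flushes;
}
```

For three data pages and one executable page, toy_itlb_flushes_saved(3, 1) is 3. Whether saving those I-TLB operations pays for a find_vma() per page is exactly the open question.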
> Russell may well be right that we're much too lazy about the
> referenced bit in 2.6, but that doesn't mean we now have to
> jump and get it exactly right all the time: the dirty bit is
> vital, the referenced bit never more than a hint.
The evidence from Marc appears to imply that it is far more than a
hint. His case appears to show that if we flush the TLB (due to
a context switch) his problems vanish completely. This will be
because the referenced bit will be updated shortly after each
switch.
However, consider the following case: a TLB with ASIDs and we only
flush the TLB when we have used up all ASIDs. The only other way
entries are purged from the TLB is when they are recycled.
The lifetime of a TLB entry is now much longer - the context
switch boundary is now eliminated. This means that unless the
TLB entry is flushed, we'll _never_ know if the page has been
referenced after the VM scan has aged the entry.
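[Illustration, not part of the original message.] A toy model of this scenario, under the assumptions that a context switch flushes the TLB only on a machine without ASIDs, that the entry is never recycled, and that the aging scan is lazy (no flush). The toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_cpu { bool has_asids; bool tlb_cached; bool pte_young; };

/* The application touches the page. */
static void toy_touch(struct toy_cpu *c)
{
    if (!c->tlb_cached) {       /* miss: the PTE is consulted and aged */
        c->pte_young = true;
        c->tlb_cached = true;
    }
}

/* Without ASIDs every context switch throws the TLB away. */
static void toy_context_switch(struct toy_cpu *c)
{
    if (!c->has_asids)
        c->tlb_cached = false;
}

/* Lazy aging scan: test and clear young, no TLB flush. */
static bool toy_lazy_scan(struct toy_cpu *c)
{
    bool was_young = c->pte_young;

    c->pte_young = false;
    return was_young;
}

/* How many scans see a page that is touched in every round as young? */
int toy_scans_seeing_young(bool has_asids, int rounds)
{
    struct toy_cpu c = { has_asids, false, false };
    int seen = 0;

    assert(rounds >= 0);
    for (int r = 0; r < rounds; r++) {
        toy_touch(&c);
        toy_context_switch(&c);
        toy_touch(&c);
        if (toy_lazy_scan(&c))
            seen++;
    }
    return seen;
}
```

Over three rounds the model without ASIDs reports the page young in all 3 scans, while the ASID model reports it young only once, on the very first scan, although the page is used just as heavily.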
So, I think we definitely need the flush there. The available data
so far from Marc appears to confirm this, and the theory surrounding
ASID-based MMUs (which are coming on ARM) also requires it.
This leaves one major problem - implementation. The kernel include
files are a mess which makes it hard to get to the information
required to implement this. We certainly can't get at the mm_struct
in asm/pgtable.h because it hasn't been defined at the point pgtable.h
is included.
I'm going to be looking into what can be done to relieve the include
mess today, and then see about implementing the flush in the private
architecture code. However, there is most certainly a dependency
between the two activities. ;(
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 11:23 ` Russell King
@ 2004-04-18 12:36 ` Hugh Dickins
2004-04-18 12:42 ` Russell King
0 siblings, 1 reply; 17+ messages in thread
From: Hugh Dickins @ 2004-04-18 12:36 UTC (permalink / raw)
To: Russell King; +Cc: William Lee Irwin III, linux-mm
On Sun, 18 Apr 2004, Russell King wrote:
>
> So, I think we definitely need the flush there. The available data
> so far from Marc appears to confirm this, and the theory surrounding
> ASID-based MMUs (which are coming on ARM) also require it.
I agree that we need to flush TLB more, that if we keep on ignoring a
hint forever then things go awry. I disagree that it needs to be done
so immediately, in the young/referenced/accessed case. But go ahead,
we can always optimize some of it out later on.
Hugh
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 12:36 ` Hugh Dickins
@ 2004-04-18 12:42 ` Russell King
2004-04-18 19:55 ` Russell King
0 siblings, 1 reply; 17+ messages in thread
From: Russell King @ 2004-04-18 12:42 UTC (permalink / raw)
To: Hugh Dickins; +Cc: William Lee Irwin III, linux-mm
On Sun, Apr 18, 2004 at 01:36:11PM +0100, Hugh Dickins wrote:
> On Sun, 18 Apr 2004, Russell King wrote:
> >
> > So, I think we definitely need the flush there. The available data
> > so far from Marc appears to confirm this, and the theory surrounding
> > ASID-based MMUs (which are coming on ARM) also require it.
>
> I agree that we need to flush TLB more, that if we keep on ignoring a
> hint forever then things go awry. I disagree that it needs to be done
> so immediately, in the young/referenced/accessed case. But go ahead,
> we can always optimize some of it out later on.
Well, having struggled with the kernel's include mess to try to get at
the information I need to flush the TLB from an asm-arm header file,
I'm just considering whether to just say "fuck it" and add
#ifdef __arm__
flush_tlb_mm_page(ptep_to_mm(pte), ptep_to_address(pte));
#endif
directly into page_referenced() and be done with it.
Basically, to be able to use either ptep_to_mm() or ptep_to_address()
in asm/pgtable.h, you need to:
1. remove linux/mm.h from asm-generic/rmap.h
2. somehow work around linux/highmem.h which includes linux/mm.h so
asm-generic/rmap.h can have a definition of kmap_atomic_to_page()
3. remove asm/pgtable.h from linux/mm.h and linux/page-flags.h
I've managed to get so far with that, but the real killer seems to
be (2).
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 10:42 ` Russell King
@ 2004-04-18 15:12 ` William Lee Irwin III
0 siblings, 0 replies; 17+ messages in thread
From: William Lee Irwin III @ 2004-04-18 15:12 UTC (permalink / raw)
To: Russell King; +Cc: linux-mm
On Sun, Apr 18, 2004 at 11:42:11AM +0100, Russell King wrote:
> Ok, so linux/mm.h includes asm/pgtable.h, which in turn includes
> asm-generic/pgtable.h. I need to get at the mm and address in my
> implementation of ptep_test_and_clear_young() - and the functions
> are defined in asm-generic/rmap.h. This includes linux/mm.h, so
> I can't include it in asm/pgtable.h. Moreover, mm_struct hasn't
> been declared yet.
> Converting ptep_test_and_clear_young() to be a macro doesn't look
> sane either, not without creating some rather disgusting code.
> So, how do I get at the mm_struct and address in asm/pgtable.h ?
> Maybe we need to split out the pte manipulation into asm/pte.h rather
> than overloading pgtable.h with it?
I think the usual answer is "lots of giant macros." =(
-- wli
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 10:58 ` Hugh Dickins
2004-04-18 11:23 ` Russell King
@ 2004-04-18 15:52 ` William Lee Irwin III
1 sibling, 0 replies; 17+ messages in thread
From: William Lee Irwin III @ 2004-04-18 15:52 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Russell King, linux-mm
On Sun, Apr 18, 2004 at 11:58:21AM +0100, Hugh Dickins wrote:
> mm and address are directly available in both mine and Andrea's (the
> difference between us is finding vma: mine needs find_vma in the anon
> case, on Andrea's it's directly available), shouldn't be any need to
> add in that ppc/ppc64 code.
> Hmm, maybe I didn't look hard enough at it, and could have just taken
> it out of ppc/ppc64, instead of moving it from generic; I'll go back
> and check on that sometime.
> I'm not surprised Russell's found he just needs mm rather than vma,
> I did try briefly yesterday to understand just what it is that vma
> gives to flush TLB. Needs thorough research through all the arches,
> the ARM case is not necessarily representative.
> Wouldn't surprise me if it turns out vma necessary on some in the
> file-backed case, but on none in the anon case (would then cease
> to be a differentiator between anonmm and anon_vma if so).
I have to confess to not looking closely at the recent merge-oriented
code. Passing the things in when they're available will do it.
On Sun, Apr 18, 2004 at 11:58:21AM +0100, Hugh Dickins wrote:
> But I still think that we'd want to cut down on the intercpu TLB
> flushes for page_referenced, should batch them up to some extent.
> Russell may well be right that we're much too lazy about the
> referenced bit in 2.6, but that doesn't mean we now have to
> jump and get it exactly right all the time: the dirty bit is
> vital, the referenced bit never more than a hint.
I'm not foreseeing many effective algorithms for batching TLB flushes
there. Maybe something will get brewed up that surprises me.
-- wli
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 12:42 ` Russell King
@ 2004-04-18 19:55 ` Russell King
2004-04-18 20:01 ` Russell King
2004-04-18 23:14 ` Hugh Dickins
0 siblings, 2 replies; 17+ messages in thread
From: Russell King @ 2004-04-18 19:55 UTC (permalink / raw)
To: Hugh Dickins; +Cc: William Lee Irwin III, linux-mm
On Sun, Apr 18, 2004 at 01:42:28PM +0100, Russell King wrote:
> Basically, to be able to use either ptep_to_mm() or ptep_to_address()
> in asm/pgtable.h, you need to:
>
> 1. remove linux/mm.h from asm-generic/rmap.h
> 2. somehow work around linux/highmem.h which includes linux/mm.h so
> asm-generic/rmap.h can have a definition of kmap_atomic_to_page()
> 3. remove asm/pgtable.h from linux/mm.h and linux/page-flags.h
>
> I've managed to get so far with that, but the real killer seems to
> be (2).
Ok, I've found a solution.
There are three things which linux/mm.h requires from asm/pgtable.h:
- pte_addr_t
- pgd_none (in pmd_alloc)
- pmd_offset (in pmd_alloc)
However, there are some hidden dependencies between asm/pgalloc.h and
asm/pgtable.h which have a nasty sting in the tail - and eliminating
asm/pgtable.h from linux/mm.h is enough to trigger it.
Essentially, the plan to solve this sanely boils down to:
1. move pte_addr_t from asm-*/pgtable.h into asm-*/page.h
This should be safe to do because pte_addr_t is used by linux/mm.h
and all files which use pte_addr_t include linux/mm.h. Secondly,
linux/mm.h includes both these files, asm-*/page.h before
asm-*/pgtable.h
pte_addr_t also appears to sit more naturally in asm-*/page.h -
it's where pte_t is defined, and most of the architectures
derive its definition from pte_t.
2. Eliminate asm/pgalloc.h from most files.
Many files appear not to use anything from this header file, but
include it anyway. Grepping around for uses of the definitions
in asm/pgalloc.h reveals 52 files using or providing pgalloc.h
definitions (including pgalloc.h files). However, a whopping
557 files include pgalloc.h.
The only files which need pgalloc.h include are:
./arch/alpha/mm/init.c
./arch/arm/mm/mm-armv.c
./arch/arm26/mm/mm-memc.c
./arch/i386/mm/pgtable.c
+./arch/ia64/kernel/process.c
+./arch/ia64/mm/init.c
+./arch/parisc/kernel/process.c
+./arch/parisc/mm/init.c
./arch/ppc/mm/pgtable.c
+./arch/ppc64/mm/tlb.c
./arch/s390/mm/init.c
./arch/sparc/kernel/process.c
./arch/sparc/mm/init.c
./arch/sparc/mm/srmmu.c
./arch/sparc/mm/sun4c.c
./arch/sparc64/kernel/process.c
./arch/sparc64/mm/init.c
./arch/um/kernel/mem.c
+./include/asm-alpha/tlb.h
+./include/asm-arm/tlb.h
+./include/asm-arm26/tlb.h
+./include/asm-generic/tlb.h
+./include/asm-ia64/tlb.h
+./include/asm-m68k/pgtable.h
+./include/asm-parisc/tlb.h
+./include/asm-ppc64/tlb.h
+./include/asm-sparc64/pgtable.h
+./include/asm-sparc64/tlb.h
+./include/asm-x86_64/pgtable.h
./kernel/fork.c
./mm/memory.c
Files prefixed with '+' use pgalloc.h definitions but do not
directly include it. I intend to clean ARM up as far as
this header goes. It may be worth others gradually (over time)
submitting tested patches to remove the gross needless includes
of asm/pgalloc.h.
Why is this such an issue? When we come to (3), all the files
which incorrectly include asm/pgalloc.h suddenly break.
3. Move asm/pgtable.h include, along with xxx_alloc prototypes
into linux/pgtable.h, add linux/mm.h, and update files to use
linux/pgtable.h instead of linux/mm.h if they manipulate
pgd/pmd/ptes. Eliminate asm/pgtable.h includes from all files
except linux/pgtable.h
This is the most difficult part (no kidding) because it's going
to cause problems all over the place. I suspect it may be
possible to grep around to find everywhere which needs to be
updated like in (2).
Once (3) is done, we no longer have the restriction that pgtable.h
is included without having struct page, struct mm_struct,
struct vm_area_struct etc. defined, and we can also get at things like
the mm_struct and the userspace address from a pte.
Now, given all that, I'm going to expect a "whoa, that's too much
especially in a stable kernel series" so I think this is something
I'll keep for 2.7. However, I think (1) and (2) should at least
get sorted out for sanity's sake.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 19:55 ` Russell King
@ 2004-04-18 20:01 ` Russell King
2004-04-18 23:14 ` Hugh Dickins
1 sibling, 0 replies; 17+ messages in thread
From: Russell King @ 2004-04-18 20:01 UTC (permalink / raw)
To: Hugh Dickins; +Cc: William Lee Irwin III, linux-mm
On Sun, Apr 18, 2004 at 08:55:13PM +0100, Russell King wrote:
> 2. Eliminate asm/pgalloc.h from most files.
>
> Many files appear not to use anything from this header file, but
> include it anyway. Grepping around for uses of the definitions
> in asm/pgalloc.h reveals 52 files using or providing pgalloc.h
> definitions (including pgalloc.h files). However, a whopping
> 557 files include pgalloc.h.
B*****. grepped for the wrong include file. 203 files not 557.
> The only files which need pgalloc.h include are:
The correct list is:
./arch/alpha/mm/init.c
./arch/arm/mm/mm-armv.c
./arch/arm26/mm/mm-memc.c
./arch/i386/mm/pgtable.c
./arch/ia64/kernel/process.c
./arch/ia64/mm/init.c
./arch/parisc/kernel/process.c
./arch/parisc/mm/init.c
./arch/ppc/mm/pgtable.c
./arch/ppc64/mm/tlb.c
./arch/s390/mm/init.c
./arch/sparc/kernel/process.c
./arch/sparc/mm/init.c
./arch/sparc/mm/srmmu.c
./arch/sparc/mm/sun4c.c
./arch/sparc64/kernel/process.c
./arch/sparc64/mm/init.c
./arch/um/kernel/mem.c
+./include/asm-alpha/tlb.h
+./include/asm-arm/tlb.h
+./include/asm-arm26/tlb.h
+./include/asm-generic/tlb.h
./include/asm-ia64/tlb.h
+./include/asm-m68k/pgtable.h
+./include/asm-parisc/tlb.h
+./include/asm-ppc64/tlb.h
+./include/asm-sparc64/pgtable.h
+./include/asm-sparc64/tlb.h
+./include/asm-x86_64/pgtable.h
./kernel/fork.c
./mm/memory.c
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 19:55 ` Russell King
2004-04-18 20:01 ` Russell King
@ 2004-04-18 23:14 ` Hugh Dickins
2004-04-18 23:34 ` Russell King
1 sibling, 1 reply; 17+ messages in thread
From: Hugh Dickins @ 2004-04-18 23:14 UTC (permalink / raw)
To: Russell King; +Cc: William Lee Irwin III, linux-mm
On Sun, 18 Apr 2004, Russell King wrote:
> On Sun, Apr 18, 2004 at 01:42:28PM +0100, Russell King wrote:
> > Basically, to be able to use either ptep_to_mm() or ptep_to_address()
> > in asm/pgtable.h, you need to:
> >
> > 1. remove linux/mm.h from asm-generic/rmap.h
> > 2. somehow work around linux/highmem.h which includes linux/mm.h so
> > asm-generic/rmap.h can have a definition of kmap_atomic_to_page()
> > 3. remove asm/pgtable.h from linux/mm.h and linux/page-flags.h
> >
> > I've managed to get so far with that, but the real killer seems to
> > be (2).
>
> Ok, I've found a solution.
I think you're choosing the wrong moment to get into all of this.
Assuming one or another form of object-based rmap really does go in,
pte_addr_t, ptep_to_mm, include/asm*/rmap.h all disappear. The
patch for that went to Andrew on Friday, you were on the CC list.
Revisit in a couple of weeks?
Hugh
* Re: PTE aging, ptep_test_and_clear_young() and TLB
2004-04-18 23:14 ` Hugh Dickins
@ 2004-04-18 23:34 ` Russell King
0 siblings, 0 replies; 17+ messages in thread
From: Russell King @ 2004-04-18 23:34 UTC (permalink / raw)
To: Hugh Dickins; +Cc: William Lee Irwin III, linux-mm
On Mon, Apr 19, 2004 at 12:14:01AM +0100, Hugh Dickins wrote:
> I think you're choosing the wrong moment to get into all of this.
>
> Assuming one or another form of object-based rmap really does go in,
> pte_addr_t, ptep_to_mm, include/asm*/rmap.h all disappear. The
> patch for that went to Andrew on Friday, you were on the CC list.
>
> Revisit in a couple of weeks?
Nevertheless, getting rid of all those needless asm/pgalloc.h includes
is something worth doing anyway. This isn't the first time not being
able to get at mm_struct / vm_area_struct in asm/pgtable.h has been
a problem.
So I think it's worth sorting this out anyway, independent of the
rmap changes.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
end of thread
Thread overview: 17+ messages
2004-04-17 20:15 PTE aging, ptep_test_and_clear_young() and TLB Russell King
2004-04-17 20:43 ` William Lee Irwin III
2004-04-18 9:36 ` Russell King
2004-04-18 9:39 ` William Lee Irwin III
2004-04-18 10:58 ` Hugh Dickins
2004-04-18 11:23 ` Russell King
2004-04-18 12:36 ` Hugh Dickins
2004-04-18 12:42 ` Russell King
2004-04-18 19:55 ` Russell King
2004-04-18 20:01 ` Russell King
2004-04-18 23:14 ` Hugh Dickins
2004-04-18 23:34 ` Russell King
2004-04-18 15:52 ` William Lee Irwin III
2004-04-18 10:42 ` Russell King
2004-04-18 15:12 ` William Lee Irwin III
2004-04-17 22:27 ` Hugh Dickins
2004-04-17 23:47 ` Anton Blanchard