linux-mm.kvack.org archive mirror
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
       [not found]     ` <20070313035250.f908a50e.akpm@linux-foundation.org>
@ 2007-03-13 11:06       ` Nick Piggin
  2007-03-13 12:15         ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Nick Piggin @ 2007-03-13 11:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: clameter, linux-mm, linux-kernel

Andrew Morton wrote:
>>On Tue, 13 Mar 2007 19:03:38 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:

>>Page allocator still requires interrupts to be disabled, which this doesn't.
> 
> 
> Bah.  How many cli/sti statements fit into a single cachemiss?

On a Pentium 4? ;)

Sure, that is a minor detail, considering that you'll usually be allocating
an order of magnitude or three more anon/pagecache pages than page tables.

>>Considering there isn't much else that frees known zeroed pages, I wonder if
>>it is worthwhile.
> 
> 
> If you want a zeroed page for pagecache and someone has just stuffed a
> known-zero, cache-hot page into the pagetable quicklists, you have good
> reason to be upset.

The thing is, pagetable pages are the one really good exception to the
rule that we should keep cache hot and initialise-on-demand. They
typically are fairly sparsely populated and sparsely accessed. Even
for last level page tables, I think it is reasonable to assume they will
usually be pretty cold.

And you want to allocate cache cold pages as well, for the same reasons
(you want to keep your cache hot pages for when they actually will be
used - eg. for the anon/pagecache itself).

> In fact, if you want a _non_-zeroed page and someone has just stuffed a
> known-zero, cache-hot page into the pagetable quicklists, you still have
> reason to be upset.  You *want* that cache-hot page.
> 
> Generally, all these little private lists of pages (such as the ones which
> slab had/has) are a bad deal.  Cache effects preponderate and I do think
> we're generally better off tossing the things into a central pool.

For slab I understand. And a lot of users of slab constructors were also
silly, precisely because we should initialise on demand to keep the cache
hits up.

But cold(ish?) pagetable quicklists make sense, IMO (that is, if you *must*
avoid using slab).

>>Last time the zeroidle discussion came up was IIRC not actually real performance
>>gain, just cooking the 1024 CPU threaded pagefault numbers ;)
> 
> 
> Maybe, dunno.  It was apparently a win on powerpc many years ago.  I had a
> fiddle with it 5-6 years ago on x86 using a cache-disabled mapping of the
> page.  But it needed too much support in core VM to bother.  Since then
> we've grown per-cpu page magazines and __GFP_ZERO.  Plus I'm not aware of
> anyone having tried doing it on x86 with non-temporal stores.

You can win on specifically constructed benchmarks, easily.

But considering all the other problems you're going to introduce, we'd need
a significant win on a significant something, IMO.

You waste memory bandwidth. You also use more CPU and memory cycles
speculatively, ergo you waste more power.

-- 
SUSE Labs, Novell Inc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 12:15         ` Andrew Morton
@ 2007-03-13 11:20           ` Christoph Lameter
  2007-03-13 12:30             ` Andrew Morton
  2007-03-13 11:30           ` Nick Piggin
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2007-03-13 11:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, linux-mm, linux-kernel

On Tue, 13 Mar 2007, Andrew Morton wrote:

> Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
> anyone having tried it properly...

Ok, then what did I do wrong 3 years ago with the prezeroing patchsets?



* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 12:15         ` Andrew Morton
  2007-03-13 11:20           ` Christoph Lameter
@ 2007-03-13 11:30           ` Nick Piggin
  2007-03-13 12:47             ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread
From: Nick Piggin @ 2007-03-13 11:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: clameter, linux-mm, linux-kernel

Andrew Morton wrote:
>>On Tue, 13 Mar 2007 22:06:46 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>Andrew Morton wrote:
>>
>>>>On Tue, 13 Mar 2007 19:03:38 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>
>>...
>>
>>
>>>>Page allocator still requires interrupts to be disabled, which this doesn't.
> 
> 
>>>>it is worthwhile.
>>>
>>>
>>>If you want a zeroed page for pagecache and someone has just stuffed a
>>>known-zero, cache-hot page into the pagetable quicklists, you have good
>>>reason to be upset.
>>
>>The thing is, pagetable pages are the one really good exception to the
>>rule that we should keep cache hot and initialise-on-demand. They
>>typically are fairly sparsely populated and sparsely accessed. Even
>>for last level page tables, I think it is reasonable to assume they will
>>usually be pretty cold.
> 
> 
> eh?  I'd have thought that a pte page which has just gone through
> zap_pte_range() will very often have a _lot_ of hot cachelines, and
> that's a common case.
> 
> Still.   It's pretty easy to test.

Well I guess that would be the case if you had just unmapped a 4MB
chunk that was pretty dense with pages.

My malloc seems to allocate and free in blocks of 128K, so that's
only going to give us 3% of the last level pte being cache hot when
it gets freed. Not sure what common mmap(file) access patterns
look like.
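
The 3% figure can be checked with a little arithmetic. The sketch below
assumes a non-PAE 32-bit x86 layout (4 KiB pages, 4-byte PTEs, so 1024
entries per last-level table and 4 MiB mapped per table). The sketch_*
names are made up for illustration and are not kernel symbols.

```c
#include <assert.h>

/* Assumed layout: non-PAE 32-bit x86. Illustrative constants only. */
#define SKETCH_PAGE_SIZE    4096u
#define SKETCH_PTE_SIZE     4u
#define SKETCH_PTRS_PER_PTE (SKETCH_PAGE_SIZE / SKETCH_PTE_SIZE) /* 1024 */

/* Number of PTEs needed to map `bytes` of contiguous, aligned memory. */
static unsigned int sketch_ptes_for(unsigned long bytes)
{
    return (unsigned int)((bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/*
 * Share of one last-level table covered by `bytes`, in hundredths of a
 * percent (so 312 means 3.12%). The result is truncated, not rounded.
 */
static unsigned int sketch_table_share(unsigned long bytes)
{
    unsigned int ptes = sketch_ptes_for(bytes);

    /* A single table cannot be more than fully populated. */
    assert(ptes <= SKETCH_PTRS_PER_PTE);
    return ptes * 10000u / SKETCH_PTRS_PER_PTE;
}
```

Under these assumptions a 128K block needs 32 PTEs, which is 128 bytes of
a 4096-byte table, or roughly 3% of it.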

The majority of programs I run have a smattering of llpt pages
pretty sparsely populated, covering text, libraries, heap, stack,
vdso.

We don't actually have to zap_pte_range the entire page table in
order to free it (IIRC we used to have to, before the 4lpt patches).

But yeah let's see some tests. I would definitely want to avoid this
extra layer of complexity if it is just as good to return the pages
to the pcp lists.

>>>Maybe, dunno.  It was apparently a win on powerpc many years ago.  I had a
>>>fiddle with it 5-6 years ago on x86 using a cache-disabled mapping of the
>>>page.  But it needed too much support in core VM to bother.  Since then
>>>we've grown per-cpu page magazines and __GFP_ZERO.  Plus I'm not aware of
>>>anyone having tried doing it on x86 with non-temporal stores.
>>
>>You can win on specifically constructed benchmarks, easily.
>>
>>But considering all the other problems you're going to introduce, we'd need
>>a significant win on a significant something, IMO.
>>
>>You waste memory bandwidth. You also use more CPU and memory cycles
>>speculatively, ergo you waste more power.
> 
> 
> Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
> anyone having tried it properly...

-- 
SUSE Labs, Novell Inc.

* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 12:47             ` Andrew Morton
@ 2007-03-13 12:01               ` Nick Piggin
  2007-03-13 13:11                 ` Andrew Morton
  2007-03-13 17:30                 ` Jeremy Fitzhardinge
  2007-03-14  1:12               ` William Lee Irwin III
  1 sibling, 2 replies; 21+ messages in thread
From: Nick Piggin @ 2007-03-13 12:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: clameter, linux-mm, linux-kernel

Andrew Morton wrote:
>>On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>We don't actually have to zap_pte_range the entire page table in
>>order to free it (IIRC we used to have to, before the 4lpt patches).
> 
> 
> I'm trying to remember why we ever would have needed to zero out the pagetable
> pages if we're taking down the whole mm?  Maybe it's because "oh, the
> arch wants to put this page into a quicklist to recycle it", which is
> all rather circular.
> 
> It would be interesting to look at a) leave the page full of random garbage
> if we're releasing the whole mm and b) return it straight to the page allocator.

Well we have the 'fullmm' case, which avoids all the locked pte operations
(for those architectures where hardware pt walking requires atomicity).

However we still have to visit those to-be-unmapped parts of the page table,
to find the pages and free them. So we still at least need to bring it into
cache for the read... at which point, the store probably isn't a big burden.
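
The trade-off can be mimicked with a toy model in userspace C. This is
only a sketch under assumptions: the table is a plain array, a zero entry
means "nothing mapped", and td_teardown() and struct td_stats are
hypothetical names, not kernel interfaces. It shows that skipping the
store still leaves one load per entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy counters, purely illustrative; not kernel data structures. */
struct td_stats {
    unsigned int loads;   /* entries read */
    unsigned int stores;  /* entries written back as zero */
    unsigned int freed;   /* mapped pages "freed" */
};

/*
 * Walk one toy page table of n entries. When fullmm is true the entries
 * are left as garbage (suggestion (a) in the quoted mail); otherwise
 * each populated entry is cleared after its page is released.
 */
static struct td_stats td_teardown(uint64_t *table, size_t n, bool fullmm)
{
    struct td_stats st = { 0, 0, 0 };

    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t e = table[i];   /* this load cannot be skipped */

        st.loads++;
        if (!e)
            continue;
        st.freed++;              /* stand-in for freeing the page */
        if (!fullmm) {
            table[i] = 0;        /* the store under discussion */
            st.stores++;
        }
    }
    return st;
}
```

In both modes the loop reads every entry, so the cache lines are pulled
in either way; only the dirtying of those lines differs.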

-- 
SUSE Labs, Novell Inc.

* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 11:06       ` [QUICKLIST 0/4] Arch independent quicklists V2 Nick Piggin
@ 2007-03-13 12:15         ` Andrew Morton
  2007-03-13 11:20           ` Christoph Lameter
  2007-03-13 11:30           ` Nick Piggin
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2007-03-13 12:15 UTC (permalink / raw)
  To: Nick Piggin; +Cc: clameter, linux-mm, linux-kernel

> On Tue, 13 Mar 2007 22:06:46 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Andrew Morton wrote:
> >>On Tue, 13 Mar 2007 19:03:38 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> ...
>
> >>Page allocator still requires interrupts to be disabled, which this doesn't.

> >>it is worthwhile.
> > 
> > 
> > If you want a zeroed page for pagecache and someone has just stuffed a
> > known-zero, cache-hot page into the pagetable quicklists, you have good
> > reason to be upset.
> 
> The thing is, pagetable pages are the one really good exception to the
> rule that we should keep cache hot and initialise-on-demand. They
> typically are fairly sparsely populated and sparsely accessed. Even
> for last level page tables, I think it is reasonable to assume they will
> usually be pretty cold.

eh?  I'd have thought that a pte page which has just gone through
zap_pte_range() will very often have a _lot_ of hot cachelines, and
that's a common case.

Still.   It's pretty easy to test.

> > 
> > Maybe, dunno.  It was apparently a win on powerpc many years ago.  I had a
> > fiddle with it 5-6 years ago on x86 using a cache-disabled mapping of the
> > page.  But it needed too much support in core VM to bother.  Since then
> > we've grown per-cpu page magazines and __GFP_ZERO.  Plus I'm not aware of
> > anyone having tried doing it on x86 with non-temporal stores.
> 
> You can win on specifically constructed benchmarks, easily.
> 
> But considering all the other problems you're going to introduce, we'd need
> a significant win on a significant something, IMO.
> 
> You waste memory bandwidth. You also use more CPU and memory cycles
> speculatively, ergo you waste more power.

Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
anyone having tried it properly...


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 13:11                 ` Andrew Morton
@ 2007-03-13 12:18                   ` Nick Piggin
  0 siblings, 0 replies; 21+ messages in thread
From: Nick Piggin @ 2007-03-13 12:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: clameter, linux-mm, linux-kernel

Andrew Morton wrote:
>>On Tue, 13 Mar 2007 23:01:11 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>Andrew Morton wrote:

>>>It would be interesting to look at a) leave the page full of random garbage
>>>if we're releasing the whole mm and b) return it straight to the page allocator.
>>
>>Well we have the 'fullmm' case, which avoids all the locked pte operations
>>(for those architectures where hardware pt walking requires atomicity).
> 
> 
> I suspect there are some tlb operations which could be skipped in that case
> too.

Depends on the tlb flush implementation. The generic one doesn't look like
it is all that smart about optimising the fullmm case. It does skip some
tlb flushing though.

>>However we still have to visit those to-be-unmapped parts of the page table
>>to find the pages and free them. So we still at least need to bring it into
>>cache for the read... at which point, the store probably isn't a big burden.
> 
> 
> It means all that data has to be written back.  Yes, I expect it'll prove
> to be less costly than the initial load.

Still, it is something we could try.

-- 
SUSE Labs, Novell Inc.

* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 11:20           ` Christoph Lameter
@ 2007-03-13 12:30             ` Andrew Morton
  2007-03-15 20:23               ` Christoph Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2007-03-13 12:30 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: nickpiggin, linux-mm, linux-kernel

> On Tue, 13 Mar 2007 04:20:48 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 13 Mar 2007, Andrew Morton wrote:
> 
> > Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
> > anyone having tried it properly...
> 
> Ok, then what did I do wrong 3 years ago with the prezeroing patchsets?

Failed to provide us a link to it?


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 11:30           ` Nick Piggin
@ 2007-03-13 12:47             ` Andrew Morton
  2007-03-13 12:01               ` Nick Piggin
  2007-03-14  1:12               ` William Lee Irwin III
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2007-03-13 12:47 UTC (permalink / raw)
  To: Nick Piggin; +Cc: clameter, linux-mm, linux-kernel

> On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> We don't actually have to zap_pte_range the entire page table in
> order to free it (IIRC we used to have to, before the 4lpt patches).

I'm trying to remember why we ever would have needed to zero out the pagetable
pages if we're taking down the whole mm?  Maybe it's because "oh, the
arch wants to put this page into a quicklist to recycle it", which is
all rather circular.

It would be interesting to look at a) leave the page full of random garbage
if we're releasing the whole mm and b) return it straight to the page allocator.


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 12:01               ` Nick Piggin
@ 2007-03-13 13:11                 ` Andrew Morton
  2007-03-13 12:18                   ` Nick Piggin
  2007-03-13 17:30                 ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2007-03-13 13:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: clameter, linux-mm, linux-kernel

> On Tue, 13 Mar 2007 23:01:11 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Andrew Morton wrote:
> >>On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >>We don't actually have to zap_pte_range the entire page table in
> >>order to free it (IIRC we used to have to, before the 4lpt patches).
> > 
> > 
> > I'm trying to remember why we ever would have needed to zero out the pagetable
> > pages if we're taking down the whole mm?  Maybe it's because "oh, the
> > arch wants to put this page into a quicklist to recycle it", which is
> > all rather circular.
> > 
> > It would be interesting to look at a) leave the page full of random garbage
> > if we're releasing the whole mm and b) return it straight to the page allocator.
> 
> Well we have the 'fullmm' case, which avoids all the locked pte operations
> (for those architectures where hardware pt walking requires atomicity).

I suspect there are some tlb operations which could be skipped in that case
too.

> However we still have to visit those to-be-unmapped parts of the page table
> to find the pages and free them. So we still at least need to bring it into
> cache for the read... at which point, the store probably isn't a big burden.

It means all that data has to be written back.  Yes, I expect it'll prove
to be less costly than the initial load.


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 12:01               ` Nick Piggin
  2007-03-13 13:11                 ` Andrew Morton
@ 2007-03-13 17:30                 ` Jeremy Fitzhardinge
  2007-03-13 20:03                   ` Matt Mackall
  1 sibling, 1 reply; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-13 17:30 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, clameter, linux-mm, linux-kernel

Nick Piggin wrote:
> However we still have to visit those to-be-unmapped parts of the page
> table,
> to find the pages and free them. So we still at least need to bring it
> into
> cache for the read... at which point, the store probably isn't a big
> burden.

Why not try to find a place to stash a linklist pointer and link them
all together?  Saves the pulldown pagetable walk altogether.

    J


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 17:30                 ` Jeremy Fitzhardinge
@ 2007-03-13 20:03                   ` Matt Mackall
  2007-03-13 20:17                     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 21+ messages in thread
From: Matt Mackall @ 2007-03-13 20:03 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, Andrew Morton, clameter, linux-mm, linux-kernel

On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
> Nick Piggin wrote:
> > However we still have to visit those to-be-unmapped parts of the page
> > table,
> > to find the pages and free them. So we still at least need to bring it
> > into
> > cache for the read... at which point, the store probably isn't a big
> > burden.
> 
> Why not try to find a place to stash a linklist pointer and link them
> all together?  Saves the pulldown pagetable walk altogether.

Because we'd need one link per mm that a page is mapped in?

-- 
Mathematics is the supreme nostalgia of our time.


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 20:03                   ` Matt Mackall
@ 2007-03-13 20:17                     ` Jeremy Fitzhardinge
  2007-03-13 20:21                       ` Matt Mackall
  0 siblings, 1 reply; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-13 20:17 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Nick Piggin, Andrew Morton, clameter, linux-mm, linux-kernel

Matt Mackall wrote:
> On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
>   
>> Nick Piggin wrote:
>>     
>>> However we still have to visit those to-be-unmapped parts of the page
>>> table,
>>> to find the pages and free them. So we still at least need to bring it
>>> into
>>> cache for the read... at which point, the store probably isn't a big
>>> burden.
>>>       
>> Why not try to find a place to stash a linklist pointer and link them
>> all together?  Saves the pulldown pagetable walk altogether.
>>     
>
> Because we'd need one link per mm that a page is mapped in?
>   

Can pagetable pages be shared between mms?  (Kernel pmds in PAE excepted.)

    J


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 20:17                     ` Jeremy Fitzhardinge
@ 2007-03-13 20:21                       ` Matt Mackall
  2007-03-13 21:07                         ` David Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Matt Mackall @ 2007-03-13 20:21 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, Andrew Morton, clameter, linux-mm, linux-kernel

On Tue, Mar 13, 2007 at 01:17:00PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> > On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
> >   
> >> Nick Piggin wrote:
> >>     
> >>> However we still have to visit those to-be-unmapped parts of the page
> >>> table,
> >>> to find the pages and free them. So we still at least need to bring it
> >>> into
> >>> cache for the read... at which point, the store probably isn't a big
> >>> burden.
> >>>       
> >> Why not try to find a place to stash a linklist pointer and link them
> >> all together?  Saves the pulldown pagetable walk altogether.
> >>     
> >
> > Because we'd need one link per mm that a page is mapped in?
> >   
> 
> Can pagetable pages be shared between mms?  (Kernel pmds in PAE excepted.)

Ahh, I think the issue is that we have to walk the page tables to drop
the reference count of the _actual pages_ they point to. The page
tables themselves could all be put on a list or two lists (one for
PMDs, one for everything else), but that wouldn't really be a win over
just walking the tree, especially given the extra list maintenance.

Because the fan-out is large, the bulk of the work is bringing the last
layer of the tree into cache to find all the pages in the address
space. And there's really no way around that.
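
To put rough numbers on the fan-out argument, here is a small sketch. It
assumes 4 KiB pages, 512 entries per table at every level, and a densely
mapped range; fo_tables_per_level() is a hypothetical helper, not a
kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define FO_PAGE_SHIFT 12u    /* assumed 4 KiB pages */
#define FO_FANOUT     512ull /* assumed entries per table page */

/*
 * For a densely mapped range of `bytes`, store in out[l] the number of
 * table pages needed at level l, where level 0 is the last level (the
 * tables that point at data pages) and higher indexes are upper levels.
 */
static void fo_tables_per_level(unsigned long long bytes,
                                unsigned int levels,
                                unsigned long long out[])
{
    /* Start with the number of data pages in the range. */
    unsigned long long units =
        (bytes + (1ull << FO_PAGE_SHIFT) - 1) >> FO_PAGE_SHIFT;

    assert(out != NULL);
    for (unsigned int l = 0; l < levels; l++) {
        /* Tables needed to point at `units` things one level down. */
        units = (units + FO_FANOUT - 1) / FO_FANOUT;
        out[l] = units;
    }
}
```

Under these assumptions, 1 GiB of dense mappings needs 512 last-level
tables but only one table at each level above, so nearly all of the
table pages touched during teardown are last-level ones.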

-- 
Mathematics is the supreme nostalgia of our time.


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 20:21                       ` Matt Mackall
@ 2007-03-13 21:07                         ` David Miller
  2007-03-13 21:14                           ` Matt Mackall
  0 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2007-03-13 21:07 UTC (permalink / raw)
  To: mpm; +Cc: jeremy, nickpiggin, akpm, clameter, linux-mm, linux-kernel

> Because the fan-out is large, the bulk of the work is bringing the last
> layer of the tree into cache to find all the pages in the address
> space. And there's really no way around that.

That's right.

And I will note that historically we used to be much worse
in this area, as we used to walk the page table tree twice
on address space teardown (once to hit the PTE entries, once
to free the page tables).

Happily it is a one-pass algorithm now.

But, within active VMA ranges, we do have to walk all
the bits at least one time.


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 21:07                         ` David Miller
@ 2007-03-13 21:14                           ` Matt Mackall
  2007-03-13 21:36                             ` Jeremy Fitzhardinge
  2007-03-13 21:48                             ` David Miller
  0 siblings, 2 replies; 21+ messages in thread
From: Matt Mackall @ 2007-03-13 21:14 UTC (permalink / raw)
  To: David Miller; +Cc: jeremy, nickpiggin, akpm, clameter, linux-mm, linux-kernel

On Tue, Mar 13, 2007 at 02:07:22PM -0700, David Miller wrote:
> From: Matt Mackall <mpm@selenic.com>
> Date: Tue, 13 Mar 2007 15:21:25 -0500
> 
> > Because the fan-out is large, the bulk of the work is bringing the last
> > layer of the tree into cache to find all the pages in the address
> > space. And there's really no way around that.
> 
> That's right.
> 
> And I will note that historically we used to be much worse
> in this area, as we used to walk the page table tree twice
> on address space teardown (once to hit the PTE entries, once
> to free the page tables).
> 
> Happily it is a one-pass algorithm now.
> 
> But, within active VMA ranges, we do have to walk all
> the bits at least one time.

Well you -could- do this:

- reuse a long in struct page as a used map that divides the page up
  into 32 or 64 segments
- every time you set a PTE, set the corresponding bit in the mask
- when we zap, only visit the regions set in the mask

Thus, you avoid visiting most of a PMD page in the sparse case,
assuming PTEs aren't evenly spread across the PMD.

This might not even be too horrible as the appropriate struct page
should be in cache with the appropriate bits of the mm already locked,
etc.
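
A userspace sketch of the idea, under stated assumptions: 512 eight-byte
entries per table page, a 64-bit used map (so each bit covers 8 entries,
which is one 64-byte cache line), and a zero entry meaning "none". The
sk_* names and struct sk_table are hypothetical; in the kernel the map
would live in a spare word of struct page rather than next to the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ENTRIES  512u                     /* entries per table (assumed) */
#define SK_SEGMENTS 64u                      /* bits in the used map */
#define SK_PER_SEG  (SK_ENTRIES / SK_SEGMENTS)

/* Stand-in for a page-table page plus its used map. */
struct sk_table {
    uint64_t entry[SK_ENTRIES];  /* 0 means "none" */
    uint64_t used_map;           /* bit s set => segment s may hold entries */
};

/* Set an entry and mark the segment that contains it. */
static void sk_set_entry(struct sk_table *t, unsigned int idx, uint64_t val)
{
    assert(idx < SK_ENTRIES);
    t->entry[idx] = val;
    t->used_map |= (uint64_t)1 << (idx / SK_PER_SEG);
}

/*
 * Zap the table, visiting only segments marked in the used map.
 * Returns how many entry slots were read; *freed counts populated
 * entries that were cleared.
 */
static unsigned int sk_zap(struct sk_table *t, unsigned int *freed)
{
    unsigned int visited = 0;

    *freed = 0;
    for (unsigned int s = 0; s < SK_SEGMENTS; s++) {
        if (!(t->used_map & ((uint64_t)1 << s)))
            continue;            /* untouched segment: skip its cache line */
        for (unsigned int i = s * SK_PER_SEG; i < (s + 1) * SK_PER_SEG; i++) {
            visited++;
            if (t->entry[i]) {
                t->entry[i] = 0;
                (*freed)++;
            }
        }
    }
    t->used_map = 0;
    return visited;
}
```

With three entries set in two different segments, sk_zap() reads 16
slots instead of all 512. Bits are only ever set here, so a segment
whose entries were all cleared individually would still be visited once
at zap time.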

-- 
Mathematics is the supreme nostalgia of our time.


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 21:14                           ` Matt Mackall
@ 2007-03-13 21:36                             ` Jeremy Fitzhardinge
  2007-03-13 21:46                               ` Peter Chubb
  2007-03-13 21:48                             ` David Miller
  1 sibling, 1 reply; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2007-03-13 21:36 UTC (permalink / raw)
  To: Matt Mackall
  Cc: David Miller, nickpiggin, akpm, clameter, linux-mm, linux-kernel

Matt Mackall wrote:
> Well you -could- do this:
>
> - reuse a long in struct page as a used map that divides the page up
>   into 32 or 64 segments
> - every time you set a PTE, set the corresponding bit in the mask
> - when we zap, only visit the regions set in the mask
>
> Thus, you avoid visiting most of a PMD page in the sparse case,
> assuming PTEs aren't evenly spread across the PMD.
>
> This might not even be too horrible as the appropriate struct page
> should be in cache with the appropriate bits of the mm already locked,
> etc.
>   

And do the same in pte pages for actual mapped pages?  Or do you think
they would be too densely populated for it to be worthwhile?

    J


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 21:36                             ` Jeremy Fitzhardinge
@ 2007-03-13 21:46                               ` Peter Chubb
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Chubb @ 2007-03-13 21:46 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, ianw
  Cc: Matt Mackall, David Miller, nickpiggin, akpm, clameter, linux-mm,
	linux-kernel

>>>>> "Jeremy" == Jeremy Fitzhardinge <jeremy@goop.org> writes:


Jeremy> And do the same in pte pages for actual mapped pages?  Or do
Jeremy> you think they would be too densely populated for it to be
Jeremy> worthwhile?

We've been doing some measurements on how densely clumped ptes are.
On 32-bit platforms, they're pretty dense.  On IA64, quite a bit
sparser, depending on the workload of course.  I think that's mostly because
of the larger pagesize on IA64 -- with 64k pages, you don't need very
many to map a small object.

I'm hoping IanW can give more details.


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 21:14                           ` Matt Mackall
  2007-03-13 21:36                             ` Jeremy Fitzhardinge
@ 2007-03-13 21:48                             ` David Miller
  1 sibling, 0 replies; 21+ messages in thread
From: David Miller @ 2007-03-13 21:48 UTC (permalink / raw)
  To: mpm; +Cc: jeremy, nickpiggin, akpm, clameter, linux-mm, linux-kernel

> Well you -could- do this:
> 
> - reuse a long in struct page as a used map that divides the page up
>   into 32 or 64 segments
> - every time you set a PTE, set the corresponding bit in the mask
> - when we zap, only visit the regions set in the mask
> 
> Thus, you avoid visiting most of a PMD page in the sparse case,
> assuming PTEs aren't evenly spread across the PMD.
> 
> This might not even be too horrible as the appropriate struct page
> should be in cache with the appropriate bits of the mm already locked,
> etc.

Yes, I've even had that idea before.

You can even hide it behind pmd_none() et al., the generic VM
doesn't even have to know that the page table macros are doing
this optimization.


* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 12:47             ` Andrew Morton
  2007-03-13 12:01               ` Nick Piggin
@ 2007-03-14  1:12               ` William Lee Irwin III
  2007-03-15 23:12                 ` William Lee Irwin III
  1 sibling, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2007-03-14  1:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, clameter, linux-mm, linux-kernel

On Tue, Mar 13, 2007 at 04:47:56AM -0800, Andrew Morton wrote:
> I'm trying to remember why we ever would have needed to zero out the
> pagetable pages if we're taking down the whole mm?  Maybe it's
> because "oh, the arch wants to put this page into a quicklist to
> recycle it", which is all rather circular.
> It would be interesting to look at a) leave the page full of random
> garbage if we're releasing the whole mm and b) return it straight to
> the page allocator.

We never did need to modify ptes on exit() or other pagetable prunings
(not that they were ever done outside exit() before 2.6.x). The only
subtlety is that, in the presence of hardware pagetable walkers, pruning
on munmap() needs a TLB flush so that the TLB itself drops its references
to the pages the pruned PTEs pointed to (in the exit() case there are no
user execution contexts left to use the dead translations, so it's less
important). That's handled by tlb_remove_page() and shouldn't need any
updates across such a change.

I believe the zeroing on teardown was largely a result of idiom rather
than any particular need: essentially, ptep_get_and_clear() was used to
handle the non-pruning munmap() case in a manner unified with other
pagetable teardowns. It is also likely a 2.4.x legacy, from when that and
possibly earlier kernels maintained arch-private quicklists for pagetables.

There are furthermore distinctions to make between fork() and execve().
fork() stomps over the entire process address space copying pagetables
en masse. After execve() a process incrementally faults in PTE's one at
a time. It should be clear that if case analyses are of interest at
all, fork() will want cache-hot pages (cache-preloaded pages?) where
such are largely wasted on incremental faults after execve(). The copy
operations in fork() should probably also be examined in the context of
shared pagetables at some point.


-- wli

* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-13 12:30             ` Andrew Morton
@ 2007-03-15 20:23               ` Christoph Lameter
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2007-03-15 20:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nickpiggin, linux-mm, linux-kernel

On Tue, 13 Mar 2007, Andrew Morton wrote:

> > On Tue, 13 Mar 2007 04:20:48 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> > On Tue, 13 Mar 2007, Andrew Morton wrote:
> > 
> > > Yeah, prezeroing in idle is probably pointless.  But I'm not aware of
> > > anyone having tried it properly...
> > 
> > Ok, then what did I do wrong 3 years ago with the prezeroing patchsets?
> 
> Failed to provide us a link to it?

You merged part of it and were involved in the discussions.

General overviews:

http://lwn.net/Articles/117881/
http://lwn.net/Articles/128225/

The details on the problems with prezeroing and touching multiple
cachelines of the page:

http://www.gelato.unsw.edu.au/archives/linux-ia64/0412/12252.html

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-14  1:12               ` William Lee Irwin III
@ 2007-03-15 23:12                 ` William Lee Irwin III
  0 siblings, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2007-03-15 23:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, clameter, linux-mm, linux-kernel

On Tue, Mar 13, 2007 at 06:12:44PM -0700, William Lee Irwin III wrote:
> There are furthermore distinctions to make between fork() and execve().
> fork() stomps over the entire process address space copying pagetables
> en masse. After execve() a process incrementally faults in PTE's one at
> a time. It should be clear that if case analyses are of interest at
> all, fork() will want cache-hot pages (cache-preloaded pages?) where
> such are largely wasted on incremental faults after execve(). The copy
> operations in fork() should probably also be examined in the context of
> shared pagetables at some point.

To make this perfectly clear, we can deal with the varying usage cases
with hot/cold flags to the pagetable allocator functions. Where bulk
copies such as fork() are happening, it makes perfect sense to
precharge the cache by eager zeroing. Where sparse single pte affairs
such as incrementally faulting things in after execve() are involved,
cache cold preconstructed pagetable pages are ideal. Address hints
could furthermore be used to precharge single cachelines (e.g. via
prefetch) in the sparse usage case.
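
A hedged userspace sketch of the hot/cold distinction described above. All names (`struct pt_pool`, `pt_alloc()`, `pt_pool_put()`, `PT_HOT`, `PT_COLD`) are hypothetical, malloc()/calloc() stand in for the page allocator, and `__builtin_prefetch` is the GCC/Clang builtin rather than a kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PT_PAGE_SIZE 4096  /* assumed pagetable page size */
#define POOL_MAX     8

enum pt_temp { PT_COLD, PT_HOT };

/* Hypothetical pool of preconstructed (already zeroed) cold pages. */
struct pt_pool {
    void *pages[POOL_MAX];
    int count;
};

/* Hand a known-zeroed page to the pool; false means the pool is full. */
static bool pt_pool_put(struct pt_pool *pool, void *page)
{
    if (pool->count >= POOL_MAX)
        return false;
    pool->pages[pool->count++] = page;
    return true;
}

/*
 * PT_HOT:  bulk users such as fork(); zero eagerly so the whole
 *          page is pulled into cache before the copy.
 * PT_COLD: sparse users such as faults after execve(); reuse a
 *          pre-zeroed page untouched and prefetch only the line
 *          at hint_offset.
 */
static void *pt_alloc(struct pt_pool *pool, enum pt_temp temp,
                      size_t hint_offset)
{
    void *page;

    assert(hint_offset < PT_PAGE_SIZE);
    if (temp == PT_HOT) {
        page = malloc(PT_PAGE_SIZE);
        if (page)
            memset(page, 0, PT_PAGE_SIZE);
        return page;
    }
    if (pool->count > 0)
        page = pool->pages[--pool->count];
    else
        page = calloc(1, PT_PAGE_SIZE);
    if (page)
        __builtin_prefetch((char *)page + hint_offset, 1);
    return page;
}
```

A fork()-style caller would pass PT_HOT, while a fault handler would pass PT_COLD together with the offset of the entry it is about to write.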


-- wli

end of thread, other threads:[~2007-03-15 23:12 UTC | newest]

Thread overview: 21+ messages
     [not found] <20070313071325.4920.82870.sendpatchset@schroedinger.engr.sgi.com>
     [not found] ` <20070313005334.853559ca.akpm@linux-foundation.org>
     [not found]   ` <45F65ADA.9010501@yahoo.com.au>
     [not found]     ` <20070313035250.f908a50e.akpm@linux-foundation.org>
2007-03-13 11:06       ` [QUICKLIST 0/4] Arch independent quicklists V2 Nick Piggin
2007-03-13 12:15         ` Andrew Morton
2007-03-13 11:20           ` Christoph Lameter
2007-03-13 12:30             ` Andrew Morton
2007-03-15 20:23               ` Christoph Lameter
2007-03-13 11:30           ` Nick Piggin
2007-03-13 12:47             ` Andrew Morton
2007-03-13 12:01               ` Nick Piggin
2007-03-13 13:11                 ` Andrew Morton
2007-03-13 12:18                   ` Nick Piggin
2007-03-13 17:30                 ` Jeremy Fitzhardinge
2007-03-13 20:03                   ` Matt Mackall
2007-03-13 20:17                     ` Jeremy Fitzhardinge
2007-03-13 20:21                       ` Matt Mackall
2007-03-13 21:07                         ` David Miller, Matt Mackall
2007-03-13 21:14                           ` Matt Mackall
2007-03-13 21:36                             ` Jeremy Fitzhardinge
2007-03-13 21:46                               ` Peter Chubb
2007-03-13 21:48                             ` David Miller, Matt Mackall
2007-03-14  1:12               ` William Lee Irwin III
2007-03-15 23:12                 ` William Lee Irwin III
