* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Nick Piggin @ 2007-03-13 11:06 UTC
To: Andrew Morton; +Cc: clameter, linux-mm, linux-kernel

Andrew Morton wrote:
>>On Tue, 13 Mar 2007 19:03:38 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>Page allocator still requires interrupts to be disabled, which this doesn't.
>
> Bah. How many cli/sti statements fit into a single cachemiss?

On a Pentium 4? ;)

Sure, that is a minor detail, considering that you'll usually be allocating
an order of magnitude or three more anon/pagecache pages than page tables.

>>Considering there isn't much else that frees known zeroed pages, I wonder if
>>it is worthwhile.
>
> If you want a zeroed page for pagecache and someone has just stuffed a
> known-zero, cache-hot page into the pagetable quicklists, you have good
> reason to be upset.

The thing is, pagetable pages are the one really good exception to the
rule that we should keep cache hot and initialise-on-demand. They
typically are fairly sparsely populated and sparsely accessed. Even
for last level page tables, I think it is reasonable to assume they will
usually be pretty cold.

And you want to allocate cache cold pages as well, for the same reasons
(you want to keep your cache hot pages for when they actually will be
used - eg. for the anon/pagecache itself).

> In fact, if you want a _non_-zeroed page and someone has just stuffed a
> known-zero, cache-hot page into the pagetable quicklists, you still have
> reason to be upset. You *want* that cache-hot page.
>
> Generally, all these little private lists of pages (such as the ones which
> slab had/has) are a bad deal. Cache effects preponderate and I do think
> we're generally better off tossing the things into a central pool.

For slab I understand. And a lot of users of slab constructors were also
silly, precisely because we should initialise on demand to keep the cache
hits up.

But cold(ish?) pagetable quicklists make sense, IMO (that is, if you *must*
avoid using slab).

>>Last time the zeroidle discussion came up was IIRC not actually real performance
>>gain, just cooking the 1024 CPU threaded pagefault numbers ;)
>
> Maybe, dunno. It was apparently a win on powerpc many years ago. I had a
> fiddle with it 5-6 years ago on x86 using a cache-disabled mapping of the
> page. But it needed too much support in core VM to bother. Since then
> we've grown per-cpu page magazines and __GFP_ZERO. Plus I'm not aware of
> anyone having tried doing it on x86 with non-temporal stores.

You can win on specifically constructed benchmarks, easily.

But considering all the other problems you're going to introduce, we'd need
a significant win on a significant something, IMO.

You waste memory bandwidth. You also use more CPU and memory cycles
speculatively, ergo you waste more power.

--
SUSE Labs, Novell Inc.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Andrew Morton @ 2007-03-13 12:15 UTC
To: Nick Piggin; +Cc: clameter, linux-mm, linux-kernel

> On Tue, 13 Mar 2007 22:06:46 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Andrew Morton wrote:
> >>On Tue, 13 Mar 2007 19:03:38 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> ...
>
> >>Page allocator still requires interrupts to be disabled, which this doesn't.
> >>it is worthwhile.
> >
> > If you want a zeroed page for pagecache and someone has just stuffed a
> > known-zero, cache-hot page into the pagetable quicklists, you have good
> > reason to be upset.
>
> The thing is, pagetable pages are the one really good exception to the
> rule that we should keep cache hot and initialise-on-demand. They
> typically are fairly sparsely populated and sparsely accessed. Even
> for last level page tables, I think it is reasonable to assume they will
> usually be pretty cold.

eh? I'd have thought that a pte page which has just gone through
zap_pte_range() will very often have a _lot_ of hot cachelines, and
that's a common case.

Still. It's pretty easy to test.

> >
> > Maybe, dunno. It was apparently a win on powerpc many years ago. I had a
> > fiddle with it 5-6 years ago on x86 using a cache-disabled mapping of the
> > page. But it needed too much support in core VM to bother. Since then
> > we've grown per-cpu page magazines and __GFP_ZERO. Plus I'm not aware of
> > anyone having tried doing it on x86 with non-temporal stores.
>
> You can win on specifically constructed benchmarks, easily.
>
> But considering all the other problems you're going to introduce, we'd need
> a significant win on a significant something, IMO.
>
> You waste memory bandwidth. You also use more CPU and memory cycles
> speculatively, ergo you waste more power.

Yeah, prezeroing in idle is probably pointless. But I'm not aware of
anyone having tried it properly...
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Christoph Lameter @ 2007-03-13 11:20 UTC
To: Andrew Morton; +Cc: Nick Piggin, linux-mm, linux-kernel

On Tue, 13 Mar 2007, Andrew Morton wrote:

> Yeah, prezeroing in idle is probably pointless. But I'm not aware of
> anyone having tried it properly...

Ok, then what did I do wrong 3 years ago with the prezeroing patchsets?
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Andrew Morton @ 2007-03-13 12:30 UTC
To: Christoph Lameter; +Cc: nickpiggin, linux-mm, linux-kernel

> On Tue, 13 Mar 2007 04:20:48 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 13 Mar 2007, Andrew Morton wrote:
>
> > Yeah, prezeroing in idle is probably pointless. But I'm not aware of
> > anyone having tried it properly...
>
> Ok, then what did I do wrong 3 years ago with the prezeroing patchsets?

Failed to provide us a link to it?
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Christoph Lameter @ 2007-03-15 20:23 UTC
To: Andrew Morton; +Cc: nickpiggin, linux-mm, linux-kernel

On Tue, 13 Mar 2007, Andrew Morton wrote:

> > On Tue, 13 Mar 2007 04:20:48 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> > On Tue, 13 Mar 2007, Andrew Morton wrote:
> >
> > > Yeah, prezeroing in idle is probably pointless. But I'm not aware of
> > > anyone having tried it properly...
> >
> > Ok, then what did I do wrong 3 years ago with the prezeroing patchsets?
>
> Failed to provide us a link to it?

You merged part of it and were involved in the discussions.

General overviews:

http://lwn.net/Articles/117881/
http://lwn.net/Articles/128225/

The details on the problems with prezeroing and touching multiple
cachelines of the page:

http://www.gelato.unsw.edu.au/archives/linux-ia64/0412/12252.html
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Nick Piggin @ 2007-03-13 11:30 UTC
To: Andrew Morton; +Cc: clameter, linux-mm, linux-kernel

Andrew Morton wrote:
>>On Tue, 13 Mar 2007 22:06:46 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>Andrew Morton wrote:
>>
>>>>On Tue, 13 Mar 2007 19:03:38 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>
>>...
>>
>>>>Page allocator still requires interrupts to be disabled, which this doesn't.
>
>>>>it is worthwhile.
>>>
>>>If you want a zeroed page for pagecache and someone has just stuffed a
>>>known-zero, cache-hot page into the pagetable quicklists, you have good
>>>reason to be upset.
>>
>>The thing is, pagetable pages are the one really good exception to the
>>rule that we should keep cache hot and initialise-on-demand. They
>>typically are fairly sparsely populated and sparsely accessed. Even
>>for last level page tables, I think it is reasonable to assume they will
>>usually be pretty cold.
>
> eh? I'd have thought that a pte page which has just gone through
> zap_pte_range() will very often have a _lot_ of hot cachelines, and
> that's a common case.
>
> Still. It's pretty easy to test.

Well I guess that would be the case if you had just unmapped a 4MB
chunk that was pretty dense with pages.

My malloc seems to allocate and free in blocks of 128K, so that's
only going to give us 3% of the last level pte being cache hot when
it gets freed. Not sure what common mmap(file) access patterns look
like.

The majority of programs I run have a smattering of llpt pages
pretty sparsely populated, covering text, libraries, heap, stack,
vdso.

We don't actually have to zap_pte_range the entire page table in
order to free it (IIRC we used to have to, before the 4lpt patches).

But yeah let's see some tests. I would definitely want to avoid this
extra layer of complexity if it is just as good to return the pages
to the pcp lists.

>>>Maybe, dunno. It was apparently a win on powerpc many years ago. I had a
>>>fiddle with it 5-6 years ago on x86 using a cache-disabled mapping of the
>>>page. But it needed too much support in core VM to bother. Since then
>>>we've grown per-cpu page magazines and __GFP_ZERO. Plus I'm not aware of
>>>anyone having tried doing it on x86 with non-temporal stores.
>>
>>You can win on specifically constructed benchmarks, easily.
>>
>>But considering all the other problems you're going to introduce, we'd need
>>a significant win on a significant something, IMO.
>>
>>You waste memory bandwidth. You also use more CPU and memory cycles
>>speculatively, ergo you waste more power.
>
> Yeah, prezeroing in idle is probably pointless. But I'm not aware of
> anyone having tried it properly...

--
SUSE Labs, Novell Inc.
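The 3% figure in the message above follows from simple arithmetic. A minimal sketch, assuming 4 KiB pages and 1024 entries per last-level table (the 32-bit x86 layout without PAE, which matches the 4MB span mentioned); the function names are made up for illustration and are not kernel interfaces:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of address space covered by one last-level pagetable page. */
static size_t table_span(size_t page_size, size_t entries)
{
    return page_size * entries;
}

/*
 * Percentage of a table's entries populated by one block of `block`
 * bytes, assuming the block is page-aligned and fits within the span
 * of a single table.
 */
static double table_fill_percent(size_t block, size_t page_size, size_t entries)
{
    assert(page_size > 0 && entries > 0);
    assert(block % page_size == 0 && block <= page_size * entries);
    return 100.0 * (double)(block / page_size) / (double)entries;
}
```

With these assumptions, table_span(4096, 1024) is 4 MiB, and a 128K block populates 32 of 1024 entries, i.e. 3.125% of the table, which is the "3%" quoted above.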
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Andrew Morton @ 2007-03-13 12:47 UTC
To: Nick Piggin; +Cc: clameter, linux-mm, linux-kernel

> On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> We don't actually have to zap_pte_range the entire page table in
> order to free it (IIRC we used to have to, before the 4lpt patches).

I'm trying to remember why we ever would have needed to zero out the pagetable
pages if we're taking down the whole mm? Maybe it's because "oh, the
arch wants to put this page into a quicklist to recycle it", which is
all rather circular.

It would be interesting to look at a) leave the page full of random garbage
if we're releasing the whole mm and b) return it straight to the page allocator.
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Nick Piggin @ 2007-03-13 12:01 UTC
To: Andrew Morton; +Cc: clameter, linux-mm, linux-kernel

Andrew Morton wrote:
>>On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>We don't actually have to zap_pte_range the entire page table in
>>order to free it (IIRC we used to have to, before the 4lpt patches).
>
> I'm trying to remember why we ever would have needed to zero out the pagetable
> pages if we're taking down the whole mm? Maybe it's because "oh, the
> arch wants to put this page into a quicklist to recycle it", which is
> all rather circular.
>
> It would be interesting to look at a) leave the page full of random garbage
> if we're releasing the whole mm and b) return it straight to the page allocator.

Well we have the 'fullmm' case, which avoids all the locked pte operations
(for those architectures where hardware pt walking requires atomicity).

However we still have to visit those to-be-unmapped parts of the page table,
to find the pages and free them. So we still at least need to bring it into
cache for the read... at which point, the store probably isn't a big burden.

--
SUSE Labs, Novell Inc.
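The distinction drawn above between the 'fullmm' case and ordinary unmapping can be modelled in user-space C11. This is only a sketch under stated assumptions: `model_pte_t` and `model_pte_get_and_clear()` are hypothetical names, not the kernel's helpers, and a C11 atomic exchange stands in for whatever locked operation a given architecture would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one pagetable entry slot. */
typedef _Atomic uint64_t model_pte_t;

/*
 * Return the old entry and leave the slot cleared.
 *
 * fullmm == false: something else (modelled here as another thread) may
 * still update the entry, so the read and the clear must be one atomic
 * exchange.
 * fullmm == true: the whole address space is going away and nothing
 * else touches the entry, so a plain load followed by a plain store is
 * enough.
 *
 * Either way the cache line holding the entry is read and then dirtied.
 */
static uint64_t model_pte_get_and_clear(model_pte_t *pte, bool fullmm)
{
    uint64_t old;

    assert(pte != NULL);
    if (fullmm) {
        old = atomic_load_explicit(pte, memory_order_relaxed);
        atomic_store_explicit(pte, 0, memory_order_relaxed);
    } else {
        old = atomic_exchange(pte, 0);
    }
    return old;
}
```

Both paths still load the entry to find the page it refers to, which is the point made above: once the line is in cache for the read, the extra cost of the clearing store is the later writeback rather than another miss.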
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Andrew Morton @ 2007-03-13 13:11 UTC
To: Nick Piggin; +Cc: clameter, linux-mm, linux-kernel

> On Tue, 13 Mar 2007 23:01:11 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Andrew Morton wrote:
> >>On Tue, 13 Mar 2007 22:30:19 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >>We don't actually have to zap_pte_range the entire page table in
> >>order to free it (IIRC we used to have to, before the 4lpt patches).
> >
> > I'm trying to remember why we ever would have needed to zero out the pagetable
> > pages if we're taking down the whole mm? Maybe it's because "oh, the
> > arch wants to put this page into a quicklist to recycle it", which is
> > all rather circular.
> >
> > It would be interesting to look at a) leave the page full of random garbage
> > if we're releasing the whole mm and b) return it straight to the page allocator.
>
> Well we have the 'fullmm' case, which avoids all the locked pte operations
> (for those architectures where hardware pt walking requires atomicity).

I suspect there are some tlb operations which could be skipped in that case
too.

> However we still have to visit those to-be-unmapped parts of the page table
> to find the pages and free them. So we still at least need to bring it into
> cache for the read... at which point, the store probably isn't a big burden.

It means all that data has to be written back. Yes, I expect it'll prove
to be less costly than the initial load.
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Nick Piggin @ 2007-03-13 12:18 UTC
To: Andrew Morton; +Cc: clameter, linux-mm, linux-kernel

Andrew Morton wrote:
>>On Tue, 13 Mar 2007 23:01:11 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>Andrew Morton wrote:

>>>It would be interesting to look at a) leave the page full of random garbage
>>>if we're releasing the whole mm and b) return it straight to the page allocator.
>>
>>Well we have the 'fullmm' case, which avoids all the locked pte operations
>>(for those architectures where hardware pt walking requires atomicity).
>
> I suspect there are some tlb operations which could be skipped in that case
> too.

Depends on the tlb flush implementation. The generic one doesn't look like
it is all that smart about optimising the fullmm case. It does skip some
tlb flushing though.

>>However we still have to visit those to-be-unmapped parts of the page table
>>to find the pages and free them. So we still at least need to bring it into
>>cache for the read... at which point, the store probably isn't a big burden.
>
> It means all that data has to be written back. Yes, I expect it'll prove
> to be less costly than the initial load.

Still, it is something we could try.

--
SUSE Labs, Novell Inc.
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Jeremy Fitzhardinge @ 2007-03-13 17:30 UTC
To: Nick Piggin; +Cc: Andrew Morton, clameter, linux-mm, linux-kernel

Nick Piggin wrote:
> However we still have to visit those to-be-unmapped parts of the page
> table, to find the pages and free them. So we still at least need to
> bring it into cache for the read... at which point, the store probably
> isn't a big burden.

Why not try to find a place to stash a linklist pointer and link them
all together? Saves the pulldown pagetable walk altogether.

    J
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Matt Mackall @ 2007-03-13 20:03 UTC
To: Jeremy Fitzhardinge
Cc: Nick Piggin, Andrew Morton, clameter, linux-mm, linux-kernel

On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
> Nick Piggin wrote:
> > However we still have to visit those to-be-unmapped parts of the page
> > table, to find the pages and free them. So we still at least need to
> > bring it into cache for the read... at which point, the store probably
> > isn't a big burden.
>
> Why not try to find a place to stash a linklist pointer and link them
> all together? Saves the pulldown pagetable walk altogether.

Because we'd need one link per mm that a page is mapped in?

--
Mathematics is the supreme nostalgia of our time.
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Jeremy Fitzhardinge @ 2007-03-13 20:17 UTC
To: Matt Mackall; +Cc: Nick Piggin, Andrew Morton, clameter, linux-mm, linux-kernel

Matt Mackall wrote:
> On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
>
>> Nick Piggin wrote:
>>
>>> However we still have to visit those to-be-unmapped parts of the page
>>> table, to find the pages and free them. So we still at least need to
>>> bring it into cache for the read... at which point, the store probably
>>> isn't a big burden.
>>>
>> Why not try to find a place to stash a linklist pointer and link them
>> all together? Saves the pulldown pagetable walk altogether.
>>
>
> Because we'd need one link per mm that a page is mapped in?
>

Can pagetable pages be shared between mms? (Kernel pmds in PAE excepted.)

    J
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Matt Mackall @ 2007-03-13 20:21 UTC
To: Jeremy Fitzhardinge
Cc: Nick Piggin, Andrew Morton, clameter, linux-mm, linux-kernel

On Tue, Mar 13, 2007 at 01:17:00PM -0700, Jeremy Fitzhardinge wrote:
> Matt Mackall wrote:
> > On Tue, Mar 13, 2007 at 10:30:10AM -0700, Jeremy Fitzhardinge wrote:
> >
> >> Nick Piggin wrote:
> >>
> >>> However we still have to visit those to-be-unmapped parts of the page
> >>> table, to find the pages and free them. So we still at least need to
> >>> bring it into cache for the read... at which point, the store probably
> >>> isn't a big burden.
> >>>
> >> Why not try to find a place to stash a linklist pointer and link them
> >> all together? Saves the pulldown pagetable walk altogether.
> >>
> >
> > Because we'd need one link per mm that a page is mapped in?
> >
>
> Can pagetable pages be shared between mms? (Kernel pmds in PAE excepted.)

Ahh, I think the issue is that we have to walk the page tables to drop
the reference count of the _actual pages_ they point to. The page
tables themselves could all be put on a list or two lists (one for
PMDs, one for everything else), but that wouldn't really be a win over
just walking the tree, especially given the extra list maintenance.

Because the fan-out is large, the bulk of the work is bringing the last
layer of the tree into cache to find all the pages in the address
space. And there's really no way around that.

--
Mathematics is the supreme nostalgia of our time.
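The fan-out argument above can be made concrete with a little arithmetic. A sketch only, assuming a densely mapped contiguous range, a uniform fan-out at every level, and illustrative numbers (4 KiB pages, 512 entries per table, 4 levels); `table_pages()` is a made-up helper, not a kernel function:

```c
#include <assert.h>
#include <stddef.h>

static size_t div_round_up(size_t n, size_t d)
{
    return (n + d - 1) / d;
}

/*
 * Count the pagetable pages needed to map `mapped_pages` contiguous
 * pages with `fanout` entries per table and `levels` levels.
 * per_level[0] receives the last (leaf) level, per_level[levels - 1]
 * the top level. Returns the total number of table pages.
 */
static size_t table_pages(size_t mapped_pages, size_t fanout,
                          unsigned levels, size_t *per_level)
{
    size_t total = 0;
    size_t n = mapped_pages;

    assert(mapped_pages > 0 && fanout > 1 && per_level != NULL);
    for (unsigned l = 0; l < levels; l++) {
        n = div_round_up(n, fanout);   /* tables needed at this level */
        per_level[l] = n;
        total += n;
    }
    return total;
}
```

For 1 GiB of dense mappings (262144 pages) under these assumptions, the walk touches 512 last-level tables and one table at each of the three upper levels, so 512 of the 515 table pages are in the last layer.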
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: David Miller @ 2007-03-13 21:07 UTC
To: mpm; +Cc: jeremy, nickpiggin, akpm, clameter, linux-mm, linux-kernel

From: Matt Mackall <mpm@selenic.com>
Date: Tue, 13 Mar 2007 15:21:25 -0500

> Because the fan-out is large, the bulk of the work is bringing the last
> layer of the tree into cache to find all the pages in the address
> space. And there's really no way around that.

That's right.

And I will note that historically we used to be much worse
in this area, as we used to walk the page table tree twice
on address space teardown (once to hit the PTE entries, once
to free the page tables).

Happily it is a one-pass algorithm now.

But, within active VMA ranges, we do have to walk all
the bits at least one time.
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Matt Mackall @ 2007-03-13 21:14 UTC
To: David Miller; +Cc: jeremy, nickpiggin, akpm, clameter, linux-mm, linux-kernel

On Tue, Mar 13, 2007 at 02:07:22PM -0700, David Miller wrote:
> From: Matt Mackall <mpm@selenic.com>
> Date: Tue, 13 Mar 2007 15:21:25 -0500
>
> > Because the fan-out is large, the bulk of the work is bringing the last
> > layer of the tree into cache to find all the pages in the address
> > space. And there's really no way around that.
>
> That's right.
>
> And I will note that historically we used to be much worse
> in this area, as we used to walk the page table tree twice
> on address space teardown (once to hit the PTE entries, once
> to free the page tables).
>
> Happily it is a one-pass algorithm now.
>
> But, within active VMA ranges, we do have to walk all
> the bits at least one time.

Well you -could- do this:

- reuse a long in struct page as a used map that divides the page up
  into 32 or 64 segments
- every time you set a PTE, set the corresponding bit in the mask
- when we zap, only visit the regions set in the mask

Thus, you avoid visiting most of a PMD page in the sparse case,
assuming PTEs aren't evenly spread across the PMD.

This might not even be too horrible as the appropriate struct page
should be in cache with the appropriate bits of the mm already locked,
etc.

--
Mathematics is the supreme nostalgia of our time.
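The three-step used-map proposal above can be sketched as a small user-space model. Everything here is an assumption made for illustration: `struct table_page`, `table_set()` and `table_zap()` are hypothetical names, the table is taken to hold 512 eight-byte entries, and the mask is a 64-bit word kept next to the table rather than inside a real struct page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_TABLE  512u                       /* entries per table page (assumed) */
#define SEGMENTS        64u                        /* one mask bit per segment */
#define ENTRIES_PER_SEG (PTRS_PER_TABLE / SEGMENTS)

/* Hypothetical model: a table page plus the spare long from its struct page. */
struct table_page {
    uint64_t used_mask;                 /* bit i set: segment i may hold entries */
    uint64_t entry[PTRS_PER_TABLE];     /* 0 means "no entry" */
};

/* Install an entry and record its segment in the used map. */
static void table_set(struct table_page *t, size_t idx, uint64_t val)
{
    assert(t != NULL && idx < PTRS_PER_TABLE);
    t->entry[idx] = val;
    t->used_mask |= UINT64_C(1) << (idx / ENTRIES_PER_SEG);
}

/*
 * Clear all entries, reading only the segments flagged in the mask.
 * Stores the number of entries cleared in *freed and returns the
 * number of entry slots that had to be read.
 */
static size_t table_zap(struct table_page *t, size_t *freed)
{
    size_t visited = 0;

    assert(t != NULL && freed != NULL);
    *freed = 0;
    for (unsigned seg = 0; seg < SEGMENTS; seg++) {
        if (!(t->used_mask & (UINT64_C(1) << seg)))
            continue;                   /* segment never populated: skip it */
        for (size_t i = seg * ENTRIES_PER_SEG;
             i < (seg + 1) * ENTRIES_PER_SEG; i++) {
            visited++;
            if (t->entry[i]) {
                t->entry[i] = 0;
                (*freed)++;
            }
        }
    }
    t->used_mask = 0;
    return visited;
}
```

With 512 eight-byte entries and 64 mask bits, each bit covers 8 entries, or 64 bytes, so a sparse table with entries in two segments is zapped by reading 16 slots instead of 512. The model only ever sets bits; clearing a bit when the last entry in a segment goes away would need a per-segment count or a rescan, which this sketch does not attempt.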
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Jeremy Fitzhardinge @ 2007-03-13 21:36 UTC
To: Matt Mackall
Cc: David Miller, nickpiggin, akpm, clameter, linux-mm, linux-kernel

Matt Mackall wrote:
> Well you -could- do this:
>
> - reuse a long in struct page as a used map that divides the page up
>   into 32 or 64 segments
> - every time you set a PTE, set the corresponding bit in the mask
> - when we zap, only visit the regions set in the mask
>
> Thus, you avoid visiting most of a PMD page in the sparse case,
> assuming PTEs aren't evenly spread across the PMD.
>
> This might not even be too horrible as the appropriate struct page
> should be in cache with the appropriate bits of the mm already locked,
> etc.
>

And do the same in pte pages for actual mapped pages? Or do you think
they would be too densely populated for it to be worthwhile?

    J
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: Peter Chubb @ 2007-03-13 21:46 UTC
To: Jeremy Fitzhardinge, ianw
Cc: Matt Mackall, David Miller, nickpiggin, akpm, clameter, linux-mm, linux-kernel

>>>>> "Jeremy" == Jeremy Fitzhardinge <jeremy@goop.org> writes:

Jeremy> And do the same in pte pages for actual mapped pages? Or do
Jeremy> you think they would be too densely populated for it to be
Jeremy> worthwhile?

We've been doing some measurements on how densely clumped ptes are.
On 32-bit platforms, they're pretty dense. On IA64, quite a bit
sparser, depending on the workload of course. I think that's mostly
because of the larger pagesize on IA64 -- with 64k pages, you don't
need very many to map a small object.

I'm hoping IanW can give more details.
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: David Miller @ 2007-03-13 21:48 UTC
To: mpm; +Cc: jeremy, nickpiggin, akpm, clameter, linux-mm, linux-kernel

> Well you -could- do this:
>
> - reuse a long in struct page as a used map that divides the page up
>   into 32 or 64 segments
> - every time you set a PTE, set the corresponding bit in the mask
> - when we zap, only visit the regions set in the mask
>
> Thus, you avoid visiting most of a PMD page in the sparse case,
> assuming PTEs aren't evenly spread across the PMD.
>
> This might not even be too horrible as the appropriate struct page
> should be in cache with the appropriate bits of the mm already locked,
> etc.

Yes, I've even had that idea before.

You can even hide it behind pmd_none() et al., the generic VM
doesn't even have to know that the page table macros are doing
this optimization.
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
From: William Lee Irwin III @ 2007-03-14 1:12 UTC
To: Andrew Morton; +Cc: Nick Piggin, clameter, linux-mm, linux-kernel

On Tue, Mar 13, 2007 at 04:47:56AM -0800, Andrew Morton wrote:
> I'm trying to remember why we ever would have needed to zero out the
> pagetable pages if we're taking down the whole mm? Maybe it's
> because "oh, the arch wants to put this page into a quicklist to
> recycle it", which is all rather circular.
> It would be interesting to look at a) leave the page full of random
> garbage if we're releasing the whole mm and b) return it straight to
> the page allocator.

We never did need to modify ptes on exit() or other pagetable prunings
(not that they were ever done outside exit() before 2.6.x). The only
subtlety is that pruning on munmap() needs a TLB flush for the TLB
itself to drop the references to the pages referred to by the PTE's on
pruning in the presence of hardware pagetable walkers (in the exit()
case there are no user execution contexts left to potentially utilize
the dead translations so it's less important). That's handled by
tlb_remove_page() and shouldn't need any updates across such a change.

I believe the zeroing on teardown was largely a result of idiom vs. any
particular need. Essentially using ptep_get_and_clear() to handle the
non-pruning munmap() case in a manner unified with other pagetable
teardowns. Also likely is 2.4.x legacy from when that and possibly
earlier kernels maintained arch-private quicklists for pagetables.

There are furthermore distinctions to make between fork() and execve().
fork() stomps over the entire process address space copying pagetables
en masse. After execve() a process incrementally faults in PTE's one at
a time. It should be clear that if case analyses are of interest at
all, fork() will want cache-hot pages (cache-preloaded pages?) where
such are largely wasted on incremental faults after execve(). The copy
operations in fork() should probably also be examined in the context of
shared pagetables at some point.

-- wli
* Re: [QUICKLIST 0/4] Arch independent quicklists V2
  2007-03-14  1:12 ` William Lee Irwin III
@ 2007-03-15 23:12 ` William Lee Irwin III
  0 siblings, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2007-03-15 23:12 UTC
To: Andrew Morton; +Cc: Nick Piggin, clameter, linux-mm, linux-kernel

On Tue, Mar 13, 2007 at 06:12:44PM -0700, William Lee Irwin III wrote:
> There are furthermore distinctions to make between fork() and execve().
> fork() stomps over the entire process address space copying pagetables
> en masse. After execve() a process incrementally faults in PTE's one at
> a time. It should be clear that if case analyses are of interest at
> all, fork() will want cache-hot pages (cache-preloaded pages?) where
> such are largely wasted on incremental faults after execve(). The copy
> operations in fork() should probably also be examined in the context of
> shared pagetables at some point.

To make this perfectly clear, we can deal with the varying usage cases
with hot/cold flags to the pagetable allocator functions. Where bulk
copies such as fork() are happening, it makes perfect sense to
precharge the cache by eager zeroing. Where sparse single pte affairs
such as incrementally faulting things in after execve() are involved,
cache cold preconstructed pagetable pages are ideal. Address hints
could furthermore be used to precharge single cachelines (e.g. via
prefetch) in the sparse usage case.

-- wli
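A rough userspace sketch of the hot/cold hint described above might look like the following. Everything named here is an assumption made for illustration: `enum pt_hint`, `pt_alloc()`, `pt_free_zeroed()` and the 4096-byte table size are hypothetical and do not correspond to an existing kernel interface, and `__builtin_prefetch` is a GCC/Clang builtin standing in for whatever prefetch primitive an architecture provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 4096      /* size of one table page (assumed) */
#define CACHELINE  64        /* cacheline size (assumed) */

enum pt_hint { PT_HOT, PT_COLD };   /* hypothetical hint values */

/* List of tables zeroed at some earlier time and presumed cache cold.
 * The link pointer is stored in the first word of the table itself. */
struct prezeroed { struct prezeroed *next; };
static struct prezeroed *cold_list;

/* Give back a table that the caller guarantees is entirely zero. */
static inline void pt_free_zeroed(void *table)
{
    struct prezeroed *p = table;
    p->next = cold_list;
    cold_list = p;
}

/* PT_HOT:  zero a fresh table now, pulling it into the cache (bulk copy case).
 * PT_COLD: reuse a preconstructed table and prefetch only the cacheline
 *          that offset_hint says is about to be written (sparse case). */
static inline void *pt_alloc(enum pt_hint hint, size_t offset_hint)
{
    assert(offset_hint < TABLE_SIZE);

    if (hint == PT_COLD && cold_list != NULL) {
        struct prezeroed *p = cold_list;
        cold_list = p->next;
        p->next = NULL;      /* the table is all zero again */
        __builtin_prefetch((char *)p +
                           (offset_hint & ~(size_t)(CACHELINE - 1)), 1);
        return p;
    }

    /* Hot request, or no preconstructed table available: zero eagerly. */
    void *table = aligned_alloc(TABLE_SIZE, TABLE_SIZE);
    if (table != NULL)
        memset(table, 0, TABLE_SIZE);
    return table;
}
```

A fork()-like caller would pass PT_HOT, while a fault handler filling in a single entry would pass PT_COLD together with the byte offset of that entry. One consequence of this particular layout is that clearing the embedded link pointer touches the first cacheline of every table taken from the cold list, so the table is never completely cold on return.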
Thread overview: 21+ messages
[not found] <20070313071325.4920.82870.sendpatchset@schroedinger.engr.sgi.com>
[not found] ` <20070313005334.853559ca.akpm@linux-foundation.org>
[not found] ` <45F65ADA.9010501@yahoo.com.au>
[not found] ` <20070313035250.f908a50e.akpm@linux-foundation.org>
2007-03-13 11:06 ` [QUICKLIST 0/4] Arch independent quicklists V2 Nick Piggin
2007-03-13 12:15 ` Andrew Morton
2007-03-13 11:20 ` Christoph Lameter
2007-03-13 12:30 ` Andrew Morton
2007-03-15 20:23 ` Christoph Lameter
2007-03-13 11:30 ` Nick Piggin
2007-03-13 12:47 ` Andrew Morton
2007-03-13 12:01 ` Nick Piggin
2007-03-13 13:11 ` Andrew Morton
2007-03-13 12:18 ` Nick Piggin
2007-03-13 17:30 ` Jeremy Fitzhardinge
2007-03-13 20:03 ` Matt Mackall
2007-03-13 20:17 ` Jeremy Fitzhardinge
2007-03-13 20:21 ` Matt Mackall
2007-03-13 21:07 ` David Miller, Matt Mackall
2007-03-13 21:14 ` Matt Mackall
2007-03-13 21:36 ` Jeremy Fitzhardinge
2007-03-13 21:46 ` Peter Chubb
2007-03-13 21:48 ` David Miller, Matt Mackall
2007-03-14 1:12 ` William Lee Irwin III
2007-03-15 23:12 ` William Lee Irwin III