When you have lots of tasks, the pagetables start taking up lots of lowmem. We have the ability to push the PTE pages into highmem, but the atomic kmaps needed to reach them exact a penalty which, depending on the workload, can be a 10-15% performance hit.

The following patches implement something which we like to call UKVA. It's a Kernel Virtual Area which is private to a process, just like userspace. You can put any process-local data that you want in the area but, for now, I just put PTE pages in there.

It has some really nice attributes, which this patch doesn't yet take full advantage of. For one, since the PTE pages are laid out contiguously in virtual memory, in address order, it's really easy to figure out where the PTE that maps a particular address is sitting. The PTE that maps 0x00000000 is always virtually at *FIRST_UKVA_PTE, just as the PTE that maps 0xFFFFFFFF is always mapped at *LAST_UKVA_PTE. In effect, the hardware page-table layout gives us directly what we usually need software constructs like follow_page() to work out.

Since only the current process's PTEs are mapped into the area, you still need to use kmap_atomic() to get to another process's pagetables. That is why I started passing the mm around everywhere.

If anyone wants to play with it, be my guest. But don't go applying it to anything important. It certainly won't compile or boot without highpte and 64GB support. I've done all of the work on top of 2.5.70-mjb1. There are 3 patches on which this is built:

  reslabify-pmd-pgd-2.5.70-mjb1-0.patch
  sepmd-2.5.70-mjb1-0.patch
  banana_split-2.5.70-mjb1-1.patch

Here's a differential profile. Higher numbers mean worse with UKVA, lower numbers mean better. I'm not sure why the total is so much bigger; I think my profiling script screwed up and forgot to stop the profiler at the right time. Everything else looks OK.

  158930 total
  154829 default_idle
    1523 pmd_free_ukva
    1190 do_anonymous_page
     896 pmd_alloc_ukva
     754 free_hot_cold_page
     616 .text.lock.namei
     535 buffered_rmqueue
     454 __d_lookup
     ...
    -238 fd_install
    -394 .text.lock.libfs
    -445 filemap_nopage
    -506 pte_alloc_map
    -696 kmap_atomic_to_page
   -3747 kmap_atomic

Notice that there are a lot fewer kmap_atomic() calls, and kmap_atomic_to_page() is called less, because UKVA is used instead. The increases in pmd_free_ukva, pmd_alloc_ukva, and free_hot_cold_page are all due to the extra 4 pages per process that must be allocated. The do_anonymous_page increase is probably due to the extra TLB overhead from disabling lazy TLB mode (which I plan to fix). pmd_free_ukva() and pmd_alloc_ukva() probably don't need to be clearing the pages anyway.

--
Dave Hansen
haveblue@us.ibm.com