[ The attached patch is Proof of Concept (POC) code only. It only works
on x86_64, it only supports the slab allocator, it only relocates the
lowest level of page tables, it's less efficient than it should be, and
I'm convinced the locking is deficient. It does work well enough to
play around with, though. The patch is a unified diff against a clean
2.6.23. ]

I'd like to propose 4 somewhat interdependent code changes.

1) Add a separate meta-data allocation to the slab and slub allocators
and allocate full pages through kmem_cache_alloc instead of get_page.

The primary motivation is that we could shrink struct page by using
kmem_cache_alloc to allocate whole pages and putting the supporting
data in the meta_data area instead of struct page. The downside is that
we might end up using more memory because of alignment issues. I
believe we can keep the code as efficient as the current code by
allocating many pages at once with known alignment and locating the
meta data in the first few pages. The meta data for a page can then be
located with

    (page_address & mask) + ((page_address >> foo) & mask) * meta_data_size + offset

which should be just as fast as the current calculation. This is
different from the proof of concept implementation. I also believe this
would reduce kernel memory fragmentation.

2) Add support for relocating memory allocated via kmem_cache_alloc.

When a cache is created, optional relocation information can be
provided. If a relocation function is provided, caches can be
defragmented and overall memory consumption can be reduced.

3) Create a handle struct for holding references to memory that might
be moved out from under you.

This is one of those things that looks really good on paper, but in
practice isn't very useful. While I'm sure there are a few cases in
sysfs and /proc where handles could be put to good use, in general the
overhead involved does not justify their use.
I worry that they could become a fad and that people will start using
them when they should not be used. The reason for including them is
that they are really good for setting up synthetic tests for relocating
memory.

And finally, the real reason for doing all of the above:

4) Modify pte_alloc/free and friends to use kmem_cache_alloc and make
page tables relocatable.

I believe this would go a long way towards keeping kernel memory from
fragmenting. The biggest downside is the number of tlb flushes
involved. The POC code uses RCU to free the old copies of the page
tables, which should reduce the flushes. However, it blindly flushes
the tlbs on all of the cpus, when it really only needs to flush the tlb
on cpus using the mm in question. I believe that by flushing only on
those cpus, we can reduce the flushes to an acceptable level. One
alternative is to create an RCU class for tlb flushes, so that the old
table only gets freed after all the cpus have flushed their tlbs.

I believe that the above opens the door to shrinking struct page and
greatly reducing kernel memory fragmentation, with the only real
downsides being an increase in code complexity and a possible increase
in memory usage if we are not careful.

I'm willing to code all of this, but I'd like to get others' opinions
on what's appropriate and what's already being done. With the exception
of tlb flushes and meta data location, I believe the POC code
demonstrates how I intend to solve most of the problems that will be
encountered.

One thing I am worried about is the performance impact of the changes,
and I would like pointers to any micro benchmarks that might be
relevant.

Ross
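
P.S. To make the meta data lookup from (1) a little more concrete, here
is a minimal user-space sketch of the calculation. It is not what the
POC patch does. The block size, the per-page meta data size, the offset
and the name page_meta_addr are all hypothetical values picked for
illustration; the only assumption that matters is that pages are
allocated in naturally aligned power-of-two blocks with the meta data
at the start of each block.

```c
#include <assert.h>
#include <stdint.h>

/* All constants below are hypothetical, chosen only for illustration. */
#define PAGE_SHIFT      12
#define PAGE_SIZE       (1UL << PAGE_SHIFT)
#define BLOCK_ORDER     9                         /* 512 pages per block */
#define PAGES_PER_BLOCK (1UL << BLOCK_ORDER)
#define BLOCK_SIZE      (PAGES_PER_BLOCK << PAGE_SHIFT)
#define META_SIZE       16UL                      /* meta data bytes per page */
#define META_OFFSET     0UL                       /* meta area offset in block */

/* Pages at the start of each block that are consumed by the meta area. */
#define META_PAGES \
	((META_OFFSET + PAGES_PER_BLOCK * META_SIZE + PAGE_SIZE - 1) >> PAGE_SHIFT)

static_assert(META_OFFSET + PAGES_PER_BLOCK * META_SIZE <= BLOCK_SIZE,
	      "meta data must fit inside one block");

/*
 * Locate the meta data slot for the page containing page_address:
 * mask down to the aligned block base, take the page index within the
 * block, and index into the meta area at the start of the block.
 */
static uintptr_t page_meta_addr(uintptr_t page_address)
{
	uintptr_t base  = page_address & ~(uintptr_t)(BLOCK_SIZE - 1);
	uintptr_t index = (page_address >> PAGE_SHIFT) & (PAGES_PER_BLOCK - 1);

	return base + META_OFFSET + index * META_SIZE;
}
```

With these numbers, page_meta_addr(0x203000) is the block base 0x200000
plus 3 * 16, i.e. 0x200030, and the meta area costs the first 2 pages
of every 512. The lookup is one mask, one shift-and-mask, one multiply
and an add, with no memory access.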