Hi,

The attached patch, against 2.1.78, contains some simple memory
management changes. They reduce memory usage, and give a slight
increase in the performance of the page-cache (when a required page
is not present). I'd appreciate any comments and/or test results.

Changes:

o In mm/slab.c, SLAB_BREAK_GFP_ORDER has been dropped from 2 to 1.
  This reduces the page order (size) of the slabs for some caches,
  which avoids stressing the page-allocator, at only a small cost in
  performance. (Stability is more important than performance.)

o The SLAB cache used for "files_struct" (files_cachep) has been split
  into two slab-caches (files_cachep and fds_cachep). The fds_cachep
  is used for the "struct file" pointers. Previously, the size of
  objects in files_cachep did not pack well into any possible
  slab-cache size, so the SLAB used a large slab size to reduce
  internal fragmentation. This was the main cause of "fork(): Out of
  memory" errors on loaded systems. With the default value of NR_OPEN
  (1024), fds_cachep has an object size of 4096 bytes (or 8192 on
  64-bit), which is a nice size for the SLAB. (A sketch appears after
  this list.)
  NOTE: This also required a change to arch/i386/kernel/init_task.c,
  to give the initial (idle) task file-descriptors. (Does it actually
  need any?)

o The members of vm_area_struct have been re-arranged so that all the
  members used during a find_vma() search fall on the same cache-line,
  for archs which have a small line size (eg. Intel's 486). (Sketch
  below.)
  NOTE: This change also needs a complementary change to INIT_MMAP in
  asm/processor.h.

o Three members of 'struct page' have been unionised (sketch below):
      inode         - only needed for named pages
      buffers       - only needed for buffer pages
      pg_swap_entry - only needed for anonymous pages
  Named (inode) pages are marked PageNamed(pg), and buffer pages are
  marked PageBuffer(pg). While there is currently no need to mark
  anonymous pages, I'll probably mark them in the future (but that
  hits a lot of code).

o In 'struct page', the member "struct page *prev" has been changed to
  "struct page **pprev". This simplifies the code to add/remove a
  named page from an inode's page queue. (Sketch below.)

o page_unuse(), in filemap.c, has been simplified. It is only ever
  called for a named page, and if the page is still shared it returns
  0. A zero tells the address-space scanning function in vmscan.c that
  no page has been found, _and_ there has been no blocking (ie. the
  context which is being scanned could not have exited).

o In mm/page_alloc.c a new allocation function, __get_user_page(), has
  been added. This function takes _no_ arguments, and allocates a
  single page at priority GFP_USER (although the priority isn't
  actually used during page-reaping). It offers a slight performance
  improvement over the generic __get_free_pages(), and returns a
  "struct page *". Returning a pointer is only beneficial in a few
  places at the moment. By changing some of the vm_operations (eg.
  ->nopage) to return/take a 'struct page *', some code paths/error
  handling can be improved. (I haven't changed the vm_ops in this
  patch, as it hits a _lot_ of code.)

  Also in mm/page_alloc.c there is a new releasing function,
  __free_freed_page(). This is used with the new inline function (in
  include/linux/mm.h), release_page() (sketch below). This new inline
  removes the need to always call __free_page() in mm/filemap.c. (A
  page's count is used to 'lock' a page against reaping when a block
  may occur. Normally, filemap.c just needs to drop this lock.
  However, the inode associated with the named-page may have been
  truncated/invalidated. The truncation/invalidation removes the page
  from the cache, but it is then up to the last page-user to return it
  to the free-page pool.)
  NOTE: release_page() is also used in mm/memory.c when zapping pages
  from a context. As some of these pages are named pages, the
  'freeing' of the page only reduces the reference count, ie. we avoid
  the overhead of some unnecessary function calls.

o In mm/filemap.c, the need to always re-check the page-cache
  (find_page()) after a _possible_ blocking allocation has been
  removed. Before an allocation, a 'cookie' is taken from the
  page-cache hash line where the page would/will appear. After the
  allocation, the cookie is re-checked; if it has changed, the
  page-cache needs to be searched, otherwise there is no need to
  re-check. Rather than add a cookie counter to each page-cache hash
  line (which would be 'fat'), the cookie is the first page pointer.
  As all pages are added to the head of a hash line, this pointer is
  sufficient. (It can give a false-positive if the first page is
  removed, but this is a small price to pay.) (Sketch below.)

o The swap code has been changed to dynamically allocate the
  swap_info_struct when a new swap-area is added. Also, rather than
  use indices to link the swap-areas together (as was previously
  done), two sets of pointers are used: one set for all swap-areas
  ordered by priority, and one set for all swap-areas with the same
  priority. With these two sets of pointers, the code to find a
  swap-page of the highest available priority is simplified (sketch
  below). An array of swap-area pointers is still needed, but it is
  better than an array of swap-area structures.

o rw_swap_page() has been changed to take a "struct page *" of an
  I/O-locked page, and to return an error. Handling a swap-page write
  error is difficult, and is not currently done correctly. The problem
  is that rw_swap_page() may block, so it is not possible to determine
  if the PTE is still around (the context may have exited during the
  block) to be reloaded. I need to re-check; perhaps rw_swap_page()
  cannot block _and_ return an error....

o There are a few other changes, such as using "struct page *" rather
  than unsigned long.
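For the files_cachep/fds_cachep split, this is roughly how the two
caches get created (the flags and the NULL ctor/dtor shown here are
illustrative, not verbatim from the patch):

    files_cachep = kmem_cache_create("files_cache",
                                     sizeof(struct files_struct),
                                     0, SLAB_HWCACHE_ALIGN, NULL, NULL);

    /* NR_OPEN * sizeof(struct file *) = 1024 * 4 = 4096 bytes on
     * 32-bit (8192 on 64-bit): whole pages, so the objects pack
     * into slabs without internal fragmentation. */
    fds_cachep   = kmem_cache_create("fds_cache",
                                     NR_OPEN * sizeof(struct file *),
                                     0, SLAB_HWCACHE_ALIGN, NULL, NULL);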
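To show why the vm_area_struct re-arrangement helps, here is the
linear half of the find_vma() search, paraphrased (the member ordering
shown is one possibility, not necessarily the patch's; the AVL path is
omitted):

    struct vm_area_struct {
            unsigned long vm_start;             /* hot */
            unsigned long vm_end;               /* hot */
            struct vm_area_struct *vm_next;     /* hot */
            /* ... colder members follow ... */
    };

    static struct vm_area_struct *
    find_vma(struct vm_area_struct *vma, unsigned long addr)
    {
            /* Each step reads only vm_end and vm_next; the caller
             * then checks vm_start. With the three members packed
             * together, one cache-line is touched per vma, even
             * with a 16-byte line (486). */
            for (; vma; vma = vma->vm_next)
                    if (addr < vma->vm_end)
                            return vma;
            return 0;
    }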
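The unionised members look like this (the placement within 'struct
page', and the 'u' name, are illustrative):

    struct inode;
    struct buffer_head;

    struct page {
            /* ... queue/hash links, count, flags, ... */
            union {
                    struct inode *inode;         /* PageNamed(pg)  */
                    struct buffer_head *buffers; /* PageBuffer(pg) */
                    unsigned long pg_swap_entry; /* anonymous page */
            } u;
    };

As only one of the three is ever meaningful for a given page, they can
share storage, saving two words per mem_map entry.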
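The pprev trick, for reference: pprev holds the address of whatever
pointer currently points at the page (the queue head, or the previous
page's 'next'), so insertion has no empty-queue special case and
removal never walks the queue (function names here are illustrative):

    struct page {
            struct page *next;
            struct page **pprev;
            /* ... */
    };

    static void add_page(struct page **head, struct page *pg)
    {
            if ((pg->next = *head) != 0)
                    (*head)->pprev = &pg->next;
            *head = pg;
            pg->pprev = head;
    }

    static void remove_page(struct page *pg)
    {
            if (pg->next)
                    pg->next->pprev = pg->pprev;
            *pg->pprev = pg->next;
            pg->pprev = 0;  /* mark as not queued */
    }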
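And the release_page() idea, as a sketch (the real inline is in
include/linux/mm.h; the exact count handling may differ):

    #include <asm/atomic.h>

    struct page { atomic_t count; /* ... */ };

    extern void __free_freed_page(struct page *);

    static inline void release_page(struct page *pg)
    {
            /* Common case: just drop the reference that 'locked' the
             * page against reaping. If a truncate/invalidate has
             * already removed the page from the cache, we are the
             * last user and must return it to the free-page pool. */
            if (atomic_dec_and_test(&pg->count))
                    __free_freed_page(pg);
    }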
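The cookie check in filemap.c, in outline (the helper names here are
illustrative, not the exact code):

    struct inode;
    struct page;

    extern struct page **page_hash_slot(struct inode *, unsigned long);
    extern struct page *find_page(struct inode *, unsigned long);
    extern struct page *__get_user_page(void);
    extern void release_page(struct page *);

    static struct page *grab_page(struct inode *inode, unsigned long off)
    {
            struct page **slot = page_hash_slot(inode, off);
            struct page *cookie = *slot;            /* sample the head */
            struct page *pg = __get_user_page();    /* may block       */
            struct page *old;

            /* Pages are only ever added at the head of a hash line,
             * so an unchanged head proves nothing was inserted while
             * we slept, and find_page() can be skipped. */
            if (*slot == cookie || !(old = find_page(inode, off)))
                    return pg;      /* caller adds pg to the cache */

            release_page(pg);       /* raced; use the winner's page */
            return old;
    }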
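Finally, the new swap-area linkage (field names illustrative; shown
singly-linked for brevity):

    struct swap_info_struct {
            int prio;
            struct swap_info_struct *next;  /* all areas, priority order */
            struct swap_info_struct *same;  /* areas of equal priority   */
            /* ... flags, device, swap-map, counts, ... */
    };

get_swap_page() can then start at the highest-priority area, try each
area of equal priority via 'same', and fall through via 'next' when a
whole priority level is full, without any index arithmetic.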
Regards,
markhe