Hi,

The attached patch, against 2.1.78, contains some simple memory
management changes. They reduce memory usage, and give a slight
increase in the performance of the page-cache (when a required page
is not present). I'd appreciate any comments and/or test results.

Changes:

o In mm/slab.c, SLAB_BREAK_GFP_ORDER has been dropped from 2 to 1.
  This reduces the page order (size) of the slabs for some caches,
  which avoids stressing the page-allocator, at only a small cost in
  performance. (Stability is more important than performance.)

o The SLAB cache used for "files_struct" (files_cachep) has been split
  into two slab-caches (files_cachep and fds_cachep). The fds_cachep
  is used for the "struct file" pointers. Previously, the size of
  objects in files_cachep did not pack well into any possible
  slab-cache size, so the SLAB used a large slab size to reduce
  internal fragmentation. This was the main cause of "fork(): Out of
  memory" errors on loaded systems. With the default value of NR_OPEN
  (1024), fds_cachep has an object size of 4096 bytes (or 8192 on
  64-bit), which is a nice size for the SLAB. (A sketch appears after
  this list.)
  NOTE: This also required a change to arch/i386/kernel/init_task.c,
  to give the initial (idle) task file-descriptors. (Does it actually
  need any?)

o The members of vm_area_struct have been re-arranged so that all the
  members used during a find_vma() search fall on the same cache-line,
  for archs which have a small line size (eg. Intel's 486). (Sketch
  below.)
  NOTE: This change also needs a complementary change to INIT_MMAP in
  asm/processor.h.

o Three members of 'struct page' have been unionised (sketch below):
      inode         - only needed for named pages
      buffers       - only needed for buffer pages
      pg_swap_entry - only needed for anonymous pages
  Named (inode) pages are marked PageNamed(pg), and buffer pages are
  marked PageBuffer(pg). While there is currently no need to mark
  anonymous pages, I'll probably mark them in the future (but that
  hits a lot of code).

o In 'struct page', the member "struct page *prev" has been changed to
  "struct page **pprev". This simplifies the code to add/remove a
  named page from an inode's page queue. (Sketch below.)

o page_unuse(), in filemap.c, has been simplified. It is only ever
  called for a named page, and if the page is still shared it returns
  0. A zero tells the address-space scanning function in vmscan.c that
  no page has been found, _and_ there has been no blocking (ie. the
  context which is being scanned could not have exited).

o In mm/page_alloc.c a new allocation function, __get_user_page(), has
  been added. This function takes _no_ arguments, and allocates a
  single page at priority GFP_USER (although the priority isn't
  actually used during page-reaping). It offers a slight performance
  improvement over the generic __get_free_pages(), and returns a
  "struct page *". Returning a pointer is only beneficial in a few
  places at the moment. By changing some of the vm_operations (eg.
  ->nopage) to return/take a 'struct page *', some code paths/error
  handling can be improved. (I haven't changed the vm_ops in this
  patch, as it hits a _lot_ of code.)

  Also in mm/page_alloc.c there is a new releasing function,
  __free_freed_page(). This is used with the new inline function (in
  include/linux/mm.h), release_page() (sketch below). This new inline
  removes the need to always call __free_page() in mm/filemap.c. (A
  page's count is used to 'lock' a page against reaping when a block
  may occur. Normally, filemap.c just needs to drop this lock.
  However, the inode associated with the named-page may have been
  truncated/invalidated. The truncation/invalidation removes the page
  from the cache, but it is then up to the last page-user to return it
  to the free-page pool.)
  NOTE: release_page() is also used in mm/memory.c when zapping pages
  from a context. As some of these pages are named pages, the
  'freeing' of the page only reduces the reference count, ie. we avoid
  the overhead of some unnecessary function calls.

o In mm/filemap.c, the need to always re-check the page-cache
  (find_page()) after a _possible_ blocking allocation has been
  removed. Before an allocation, a 'cookie' is taken from the
  page-cache hash line where the page would/will appear. After the
  allocation, the cookie is re-checked; if it has changed, the
  page-cache needs to be searched, otherwise there is no need to
  re-check. Rather than add a cookie counter to each page-cache hash
  line (which would be 'fat'), the cookie is the first page pointer.
  As all pages are added to the head of a hash line, this pointer is
  sufficient. (It can give a false-positive if the first page is
  removed, but this is a small price to pay.) (Sketch below.)

o The swap code has been changed to dynamically allocate the
  swap_info_struct when a new swap-area is added. Also, rather than
  use indices to link the swap-areas together (as was previously
  done), two sets of pointers are used: one set for all swap-areas
  ordered by priority, and one set for all swap-areas with the same
  priority. With these two sets of pointers, the code to find a
  swap-page of the highest available priority is simplified (sketch
  below). An array of swap-area pointers is still needed, but it is
  better than an array of swap-area structures.

o rw_swap_page() has been changed to take a "struct page *" of an
  I/O-locked page, and to return an error. Handling a swap-page write
  error is difficult, and is not currently done correctly. The problem
  is that rw_swap_page() may block, so it is not possible to determine
  if the PTE is still around (the context may have exited during the
  block) to be reloaded. I need to re-check; perhaps rw_swap_page()
  cannot block _and_ return an error....

o There are a few other changes, such as using "struct page *" rather
  than unsigned long.
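For the files_cachep/fds_cachep split, this is roughly how the two
caches get created (the flags and the NULL ctor/dtor shown here are
illustrative, not verbatim from the patch):

    files_cachep = kmem_cache_create("files_cache",
                                     sizeof(struct files_struct),
                                     0, SLAB_HWCACHE_ALIGN, NULL, NULL);

    /* NR_OPEN * sizeof(struct file *) = 1024 * 4 = 4096 bytes on
     * 32-bit (8192 on 64-bit): whole pages, so the objects pack
     * into slabs without internal fragmentation. */
    fds_cachep   = kmem_cache_create("fds_cache",
                                     NR_OPEN * sizeof(struct file *),
                                     0, SLAB_HWCACHE_ALIGN, NULL, NULL);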
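To show why the vm_area_struct re-arrangement helps, here is the
linear half of the find_vma() search, paraphrased (the member ordering
shown is one possibility, not necessarily the patch's; the AVL path is
omitted):

    struct vm_area_struct {
            unsigned long vm_start;             /* hot */
            unsigned long vm_end;               /* hot */
            struct vm_area_struct *vm_next;     /* hot */
            /* ... colder members follow ... */
    };

    static struct vm_area_struct *
    find_vma(struct vm_area_struct *vma, unsigned long addr)
    {
            /* Each step reads only vm_end and vm_next; the caller
             * then checks vm_start. With the three members packed
             * together, one cache-line is touched per vma, even
             * with a 16-byte line (486). */
            for (; vma; vma = vma->vm_next)
                    if (addr < vma->vm_end)
                            return vma;
            return 0;
    }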
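The unionised members look like this (the placement within 'struct
page', and the 'u' name, are illustrative):

    struct inode;
    struct buffer_head;

    struct page {
            /* ... queue/hash links, count, flags, ... */
            union {
                    struct inode *inode;         /* PageNamed(pg)  */
                    struct buffer_head *buffers; /* PageBuffer(pg) */
                    unsigned long pg_swap_entry; /* anonymous page */
            } u;
    };

As only one of the three is ever meaningful for a given page, they can
share storage, saving two words per mem_map entry.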
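The pprev trick, for reference: pprev holds the address of whatever
pointer currently points at the page (the queue head, or the previous
page's 'next'), so insertion has no empty-queue special case and
removal never walks the queue (function names here are illustrative):

    struct page {
            struct page *next;
            struct page **pprev;
            /* ... */
    };

    static void add_page(struct page **head, struct page *pg)
    {
            if ((pg->next = *head) != 0)
                    (*head)->pprev = &pg->next;
            *head = pg;
            pg->pprev = head;
    }

    static void remove_page(struct page *pg)
    {
            if (pg->next)
                    pg->next->pprev = pg->pprev;
            *pg->pprev = pg->next;
            pg->pprev = 0;  /* mark as not queued */
    }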
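And the release_page() idea, as a sketch (the real inline is in
include/linux/mm.h; the exact count handling may differ):

    #include <asm/atomic.h>

    struct page { atomic_t count; /* ... */ };

    extern void __free_freed_page(struct page *);

    static inline void release_page(struct page *pg)
    {
            /* Common case: just drop the reference that 'locked' the
             * page against reaping. If a truncate/invalidate has
             * already removed the page from the cache, we are the
             * last user and must return it to the free-page pool. */
            if (atomic_dec_and_test(&pg->count))
                    __free_freed_page(pg);
    }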
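The cookie check in filemap.c, in outline (the helper names here are
illustrative, not the exact code):

    struct inode;
    struct page;

    extern struct page **page_hash_slot(struct inode *, unsigned long);
    extern struct page *find_page(struct inode *, unsigned long);
    extern struct page *__get_user_page(void);
    extern void release_page(struct page *);

    static struct page *grab_page(struct inode *inode, unsigned long off)
    {
            struct page **slot = page_hash_slot(inode, off);
            struct page *cookie = *slot;            /* sample the head */
            struct page *pg = __get_user_page();    /* may block       */
            struct page *old;

            /* Pages are only ever added at the head of a hash line,
             * so an unchanged head proves nothing was inserted while
             * we slept, and find_page() can be skipped. */
            if (*slot == cookie || !(old = find_page(inode, off)))
                    return pg;      /* caller adds pg to the cache */

            release_page(pg);       /* raced; use the winner's page */
            return old;
    }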
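Finally, the new swap-area linkage (field names illustrative; shown
singly-linked for brevity):

    struct swap_info_struct {
            int prio;
            struct swap_info_struct *next;  /* all areas, priority order */
            struct swap_info_struct *same;  /* areas of equal priority   */
            /* ... flags, device, swap-map, counts, ... */
    };

get_swap_page() can then start at the highest-priority area, try each
area of equal priority via 'same', and fall through via 'next' when a
whole priority level is full, without any index arithmetic.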
Regards,
markhe