linux-mm.kvack.org archive mirror
* inactive_dirty list
@ 2002-09-06 20:42 Andrew Morton
From: Andrew Morton @ 2002-09-06 20:42 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm

Rik, it seems that the time has come...

I was doing some testing overnight with mem=1024m.  Page reclaim
was pretty inefficient at that level: kswapd consumed 6% of CPU
on a permanent basis (workload was heavy dbench plus looping
make -j6 bzImage).  kswapd was reclaiming only 3% of the pages
which it was looking at.

This doesn't happen at mem=768m, and I'm sure it won't happen at
mem=1.5G.

What is happening here is that the logic which clamps dirty+writeback
pagecache at 40% of memory is working nicely, and the allocate-from-
highmem-first logic is ensuring that all of ZONE_HIGHMEM is dirty
or under writeback all the time.  kswapd isn't allowed to block
against that pagecache, so it's scanning zillions of pages.
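
To make the scanning cost concrete, here is a toy userspace model (not kernel code) of a non-blocking scanner walking one mixed inactive list. The names `scan_page`, `scan_stats`, `scan_mixed_list` and `fill_mostly_dirty` are invented for this sketch, and the 3-in-100 clean ratio is simply chosen to mirror the 3% figure above; it is not measured data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page: only the two state bits discussed in this mail. */
struct scan_page {
    bool dirty;
    bool writeback;
};

struct scan_stats {
    size_t scanned;     /* pages the scanner looked at */
    size_t reclaimed;   /* pages it could actually free */
};

/*
 * Walk a single mixed inactive list the way a scanner that may not
 * block would: every page is examined, but only pages that are neither
 * dirty nor under writeback can be reclaimed.
 */
static struct scan_stats scan_mixed_list(const struct scan_page *pages, size_t n)
{
    struct scan_stats st = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        st.scanned++;
        if (pages[i].dirty || pages[i].writeback)
            continue;   /* cannot wait for it, so skip and keep scanning */
        st.reclaimed++;
    }
    return st;
}

/* Hypothetical workload: 3 pages in every 100 are clean, the rest are not. */
static void fill_mostly_dirty(struct scan_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        bool clean = (i % 100) < 3;

        pages[i].dirty = !clean && (i % 2 == 0);
        pages[i].writeback = !clean && (i % 2 == 1);
    }
}
```

Filling 1000 pages with fill_mostly_dirty() and passing them to scan_mixed_list() gives scanned == 1000 and reclaimed == 30: the scanner pays for every page it visits but frees very few.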

This is a fundamental problem when the size of the highmem zone is
approximately equal to 40% of total memory.

We could fix it by changing the page allocator to balance its
allocations across zones, but I don't think we want to do that.

I think it's best to split the inactive list into reclaimable
and unreclaimable.  (inactive_clean/inactive_dirty).

I'll code that tonight; please let me run some thoughts by you:

- inactive_dirty holds pages which are dirty or under writeback.

- end_page_writeback() will move the page onto inactive_clean.

- everywhere we add a page to the inactive list, we will now
  add it to either inactive_clean or inactive_dirty, based on
  its PageDirty || PageWriteback state.

- the inactive target logic will remain the same.  So
  zone->nr_inactive_pages will be the sum of the pages on
  zone->inactive_clean and zone->inactive_dirty.

- swapcache pages don't go on inactive_dirty(!).  They remain on
  inactive_clean, so if a page allocator or kswapd hits a swapcache
  page, they block on it (swapout throttling).

  A result of this is that we never need to scan inactive_dirty.
  Those pages will always be written out in balance_dirty_pages
  by the write(2) caller, or by pdflush.

  (Hence: we don't need inactive_dirty at all.  We could just cut
  those pages off the LRU altogether.  But let's not do that).

- Hence: the only pages which are written out from within the VM
  are swapcache.

- So the only real source of throttling for tasks which aren't
  running generic_file_write() is the call to blk_congestion_wait()
  in try_to_free_pages().  Which seems sane to me - this will wake
  up after 1/4 of a second, or after someone frees a write request
  against *any* queue.  We know that the pages which were covered
  by that request were just placed onto inactive_clean, so off
  we go again.  Should work (heh).

- with this scheme, we don't actually need zone->nr_inactive_dirty_pages
  and zone->nr_inactive_clean_pages, but I may as well do that - it's
  easy enough.

- MAP_SHARED pages will be on inactive_clean, but if we change the
  logic in there to give these pages a second round on the LRU then
  the pages will automatically be added to inactive_dirty on the
  way out of shrink_zone().
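
The placement rules in the list above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the patch: `toy_page`, `toy_zone`, `pick_inactive_list`, `add_to_inactive`, `del_from_inactive` and `toy_end_page_writeback` are hypothetical names, list membership is tracked with a tag instead of real list heads, and keeping a page that was redirtied during writeback on inactive_dirty is an assumption of the sketch that the proposal does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_list { ON_NONE, ON_INACTIVE_CLEAN, ON_INACTIVE_DIRTY };

struct toy_page {
    bool dirty;
    bool writeback;
    bool swapcache;
    enum toy_list list;     /* which inactive list the page is on */
};

struct toy_zone {
    size_t nr_inactive_clean_pages;
    size_t nr_inactive_dirty_pages;
};

/* The inactive target keeps seeing one number: the sum of both lists. */
static size_t nr_inactive_pages(const struct toy_zone *z)
{
    return z->nr_inactive_clean_pages + z->nr_inactive_dirty_pages;
}

/* Swapcache always goes to inactive_clean; otherwise page state decides. */
static enum toy_list pick_inactive_list(const struct toy_page *p)
{
    if (p->swapcache)
        return ON_INACTIVE_CLEAN;
    if (p->dirty || p->writeback)
        return ON_INACTIVE_DIRTY;
    return ON_INACTIVE_CLEAN;
}

static void del_from_inactive(struct toy_zone *z, struct toy_page *p)
{
    if (p->list == ON_INACTIVE_CLEAN)
        z->nr_inactive_clean_pages--;
    else if (p->list == ON_INACTIVE_DIRTY)
        z->nr_inactive_dirty_pages--;
    p->list = ON_NONE;
}

static void add_to_inactive(struct toy_zone *z, struct toy_page *p)
{
    assert(p->list == ON_NONE);
    p->list = pick_inactive_list(p);
    if (p->list == ON_INACTIVE_CLEAN)
        z->nr_inactive_clean_pages++;
    else
        z->nr_inactive_dirty_pages++;
}

/*
 * Writeback completion: clear the bit and reclassify. A page that was
 * redirtied meanwhile stays on inactive_dirty (assumption of this toy).
 */
static void toy_end_page_writeback(struct toy_zone *z, struct toy_page *p)
{
    p->writeback = false;
    if (p->list == ON_INACTIVE_DIRTY) {
        del_from_inactive(z, p);
        add_to_inactive(z, p);
    }
}
```

In this model a dirty swapcache page lands on inactive_clean, a page under writeback lands on inactive_dirty, and toy_end_page_writeback() moves the latter across while nr_inactive_pages() stays unchanged.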

How does that all sound?

btw, it is approximately the case that the pages will come clean
in LRU order (oldest-first) because of the writeback logic.  fs-writeback.c
walks the inodes in oldest-dirtied to newest-dirtied order, and
it walks the inode pages in oldest-dirtied to newest-dirtied
order.   But I think that end_page_writeback() should still move
cleaned pages onto the far (hot) end of inactive_clean?
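
A minimal sketch of why the insertion end matters, assuming reclaim takes pages from the opposite (cold) end of inactive_clean. The names `lru_page`, `lru_list`, `lru_add_hot` and `lru_reclaim_cold` are made up for the illustration and are not kernel interfaces. If pages come clean oldest-first and each is added at the hot end, reclaim from the cold end sees them in that same oldest-first order.

```c
#include <assert.h>
#include <stddef.h>

struct lru_page {
    int id;
    struct lru_page *prev, *next;
};

struct lru_list {
    struct lru_page *head;   /* hot end: most recently added page */
    struct lru_page *tail;   /* cold end: next reclaim candidate */
};

/* Add a freshly cleaned page at the hot end of the list. */
static void lru_add_hot(struct lru_list *l, struct lru_page *p)
{
    p->prev = NULL;
    p->next = l->head;
    if (l->head)
        l->head->prev = p;
    else
        l->tail = p;         /* list was empty, p is also the tail */
    l->head = p;
}

/* Take the page at the cold end, or return NULL if the list is empty. */
static struct lru_page *lru_reclaim_cold(struct lru_list *l)
{
    struct lru_page *p = l->tail;

    if (!p)
        return NULL;
    l->tail = p->prev;
    if (l->tail)
        l->tail->next = NULL;
    else
        l->head = NULL;      /* list is now empty */
    p->prev = p->next = NULL;
    return p;
}
```

Adding pages 1, 2, 3 in that order with lru_add_hot() and then calling lru_reclaim_cold() three times returns them as 1, 2, 3, so the writeback order is preserved on the way out.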

I think all of this will not result in the zone balancing logic
going into a tailspin.  I'm just a bit worried about corner cases
when the number of reclaimable pages in highmem is getting low - the
classzone balancing code may keep on trying to refill that zone's free
memory pools too much.   We'll see...
