From: Andrew Morton <akpm@digeo.com>
To: Rik van Riel <riel@conectiva.com.br>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: inactive_dirty list
Date: Fri, 06 Sep 2002 13:42:06 -0700
Message-ID: <3D79131E.837F08B3@digeo.com>
Rik, it seems that the time has come...
I was doing some testing overnight with mem=1024m. Page reclaim
was pretty inefficient at that level: kswapd consumed 6% of CPU
on a permanent basis (workload was heavy dbench plus looping
make -j6 bzImage). kswapd was reclaiming only 3% of the pages
which it was looking at.
This doesn't happen at mem=768m, and I'm sure it won't happen at
mem=1.5G.
What is happening here is that the logic which clamps dirty+writeback
pagecache at 40% of memory is working nicely, and the allocate-from-
highmem-first logic is ensuring that all of ZONE_HIGHMEM is dirty
or under writeback all the time. kswapd isn't allowed to block
against that pagecache, so it's scanning zillions of pages.
This is a fundamental problem when the size of the highmem zone is
approximately equal to 40% of total memory.
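
To make the scanning cost concrete, here is a minimal user-space sketch in C. It is not kernel code: scan_stats, scan_inactive() and scan_efficiency() are hypothetical names invented for illustration, and each page is reduced to a single flag meaning "dirty or under writeback". It models a scanner that may not block on such pages and so has to step over them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters; the kernel keeps its own statistics elsewhere. */
struct scan_stats {
    size_t scanned;   /* pages looked at */
    size_t reclaimed; /* pages actually freed */
};

/*
 * Walk a single mixed inactive list. unreclaimable[i] is true when page i
 * is dirty or under writeback; the scanner cannot block on such a page,
 * so it skips it and keeps going until it has nr_wanted pages or runs out.
 */
static struct scan_stats scan_inactive(const bool *unreclaimable,
                                       size_t nr_pages, size_t nr_wanted)
{
    struct scan_stats st = { 0, 0 };

    assert(unreclaimable != NULL || nr_pages == 0);
    for (size_t i = 0; i < nr_pages && st.reclaimed < nr_wanted; i++) {
        st.scanned++;
        if (!unreclaimable[i])
            st.reclaimed++;
    }
    return st;
}

/* Fraction of scanned pages that were reclaimed (0.0 if nothing scanned). */
static double scan_efficiency(struct scan_stats st)
{
    return st.scanned ? (double)st.reclaimed / (double)st.scanned : 0.0;
}
```

With 100 pages of which only 3 are clean, a scan that wants more pages than it can find touches all 100 and reclaims 3, which is the 3% figure reported above. If the scanner only ever walked a list of clean pages, every page it looked at would be reclaimable.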
We could fix it by changing the page allocator to balance its
allocations across zones, but I don't think we want to do that.
I think it's best to split the inactive list into reclaimable
and unreclaimable. (inactive_clean/inactive_dirty).
I'll code that tonight; please let me run some thoughts by you:
- inactive_dirty holds pages which are dirty or under writeback.
- end_page_writeback() will move the page onto inactive_clean.
- every place that adds a page to the inactive list will now
add it to either inactive_clean or inactive_dirty, based on
its PageDirty || PageWriteback state.
- the inactive target logic will remain the same. So
zone->nr_inactive_pages will be the sum of the pages on
zone->inactive_clean and zone->inactive_dirty.
- swapcache pages don't go on inactive_dirty(!). They remain on
inactive_clean, so if a page allocator or kswapd hits a swapcache
page, they block on it (swapout throttling).
A result of this is that we never need to scan inactive_dirty.
Those pages will always be written out in balance_dirty_pages
by the write(2) caller, or by pdflush.
(Hence: we don't need inactive_dirty at all. We could just cut
those pages off the LRU altogether. But let's not do that).
- Hence: the only pages which are written out from within the VM
are swapcache.
- So the only real source of throttling for tasks which aren't
running generic_file_write() is the call to blk_congestion_wait()
in try_to_free_pages(). Which seems sane to me - this will wake
up after 1/4 of a second, or after someone frees a write request
against *any* queue. We know that the pages which were covered
by that request were just placed onto inactive_clean, so off
we go again. Should work (heh).
- with this scheme, we don't actually need zone->nr_inactive_dirty_pages
and zone->nr_inactive_clean_pages, but I may as well do that - it's
easy enough.
- MAP_SHARED pages will be on inactive_clean, but if we change the
logic in there to give these pages a second round on the LRU then
the pages will automatically be added to inactive_dirty on the
way out of shrink_zone().
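
The placement rules in the list above can be sketched as a small user-space model in C. This is an illustration under stated assumptions, not the patch itself: page_model, zone_model, add_to_inactive(), model_end_page_writeback() and the other names are hypothetical, the three bool fields stand in for PageDirty, PageWriteback and PageSwapCache, and the list head is treated as the hot end (the choice of end for cleaned pages is still an open question in this mail).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these are the kernel's structures. */
enum lru_list { LRU_NONE, LRU_INACTIVE_CLEAN, LRU_INACTIVE_DIRTY };

struct page_model {
    bool dirty;      /* stands in for PageDirty */
    bool writeback;  /* stands in for PageWriteback */
    bool swapcache;  /* stands in for PageSwapCache */
    enum lru_list list;
    struct page_model *prev, *next;
};

struct list_model { struct page_model *head, *tail; size_t nr; };

struct zone_model {
    struct list_model inactive_clean;
    struct list_model inactive_dirty;
};

static void list_add_head(struct list_model *l, struct page_model *p)
{
    p->prev = NULL;
    p->next = l->head;
    if (l->head)
        l->head->prev = p;
    else
        l->tail = p;
    l->head = p;
    l->nr++;
}

static void list_del(struct list_model *l, struct page_model *p)
{
    if (p->prev) p->prev->next = p->next; else l->head = p->next;
    if (p->next) p->next->prev = p->prev; else l->tail = p->prev;
    p->prev = p->next = NULL;
    l->nr--;
}

static struct list_model *zone_list(struct zone_model *z, enum lru_list which)
{
    return which == LRU_INACTIVE_DIRTY ? &z->inactive_dirty
                                       : &z->inactive_clean;
}

/* Swapcache pages stay on inactive_clean; otherwise dirty || writeback decides. */
static enum lru_list pick_inactive_list(const struct page_model *p)
{
    if (p->swapcache)
        return LRU_INACTIVE_CLEAN;
    return (p->dirty || p->writeback) ? LRU_INACTIVE_DIRTY
                                      : LRU_INACTIVE_CLEAN;
}

static void add_to_inactive(struct zone_model *z, struct page_model *p)
{
    assert(p->list == LRU_NONE);
    p->list = pick_inactive_list(p);
    list_add_head(zone_list(z, p->list), p);
}

/* Writeback finished: re-evaluate the page and move it if its list changed. */
static void model_end_page_writeback(struct zone_model *z, struct page_model *p)
{
    p->writeback = false;
    if (p->list == LRU_NONE)
        return;
    enum lru_list target = pick_inactive_list(p);
    if (target != p->list) {
        list_del(zone_list(z, p->list), p);
        p->list = target;
        list_add_head(zone_list(z, target), p); /* assumed: hot end */
    }
}

/* The inactive target logic sees the sum of both lists. */
static size_t nr_inactive_pages(const struct zone_model *z)
{
    return z->inactive_clean.nr + z->inactive_dirty.nr;
}
```

In this model a page under writeback lands on inactive_dirty, a dirty swapcache page lands on inactive_clean, and completing writeback moves the first page to the head of inactive_clean while nr_inactive_pages() stays constant.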
How does that all sound?
btw, it is approximately the case that the pages will come clean
in LRU order (oldest-first) because of the writeback logic. fs-writeback.c
walks the inodes in oldest-dirtied to newest-dirtied order, and
it walks the inode pages in oldest-dirtied to newest-dirtied
order. But I think that end_page_writeback() should still move
cleaned pages onto the far (hot) end of inactive_clean?
I think all of this will not result in the zone balancing logic
going into a tailspin. I'm just a bit worried about corner cases
when the number of reclaimable pages in highmem is getting low - the
classzone balancing code may keep on trying to refill that zone's free
memory pools too much. We'll see...