From: Andrew Morton <akpm@digeo.com>
To: Rik van Riel <riel@conectiva.com.br>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: inactive_dirty list
Date: Fri, 06 Sep 2002 22:28:44 -0700
Message-ID: <3D798E8C.3DCD883C@digeo.com>
In-Reply-To: <Pine.LNX.4.44L.0209062309580.1857-100000@imladris.surriel.com>
Rik van Riel wrote:
>
> On Fri, 6 Sep 2002, Andrew Morton wrote:
>
> > I have a silly feeling that setting DEF_PRIORITY to "12" will
> > simply fix this.
> >
> > Duh.
>
> Ideally we'd get rid of DEF_PRIORITY altogether and would
> just scan each zone once.
>
What I'm doing now is:
#define DEF_PRIORITY 12		/* puke */
	for (priority = DEF_PRIORITY; priority; priority--) {
		int total_scanned = 0;
		shrink_caches(priority, &total_scanned);
		if (that didn't work) {
			wakeup_bdflush(total_scanned);
			blk_congestion_wait(WRITE, HZ/4);
		}
	}
and in shrink_caches():
		max_scan = zone->nr_inactive >> priority;
		if (max_scan < nr_pages * 2)
			max_scan = nr_pages * 2;
		nr_pages = shrink_zone(zone, max_scan, gfp_mask, nr_pages);
So in effect, for a 32-page reclaim attempt we'll scan 64 pages
of ZONE_HIGHMEM, then 128 pages of ZONE_NORMAL/DMA (64 from each). If
that doesn't yield 32 pages we ask pdflush to write back 3*64 = 192 pages
(the number just scanned), then take a nap.
Then do it again: scan 64 pages of ZONE_HIGHMEM, then 128 of ZONE_NORMAL/DMA,
then write back 192 pages, then nap.
Then do it again: scan 128 pages of ZONE_HIGHMEM, then 256 of ZONE_NORMAL/DMA,
then write back 384 pages, then nap.
And so on. On top of that come the pages we started I/O against
during the LRU scan itself; there can be up to 32 of those.
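The scan-budget arithmetic above can be checked with a small userspace simulation. This is a sketch under stated assumptions, not kernel code: the zone sizes in example_nr_inactive are hypothetical, the helper names zone_max_scan() and pass_total_scanned() are made up for illustration, and every pass is assumed to reclaim nothing, so nr_pages stays at 32 and the clamp never relaxes. Only the max_scan formula comes from the shrink_caches() snippet.

```c
#include <assert.h>

#define DEF_PRIORITY 12		/* the value from the mail */

/* Hypothetical inactive-list sizes, in pages, for
 * ZONE_HIGHMEM, ZONE_NORMAL and ZONE_DMA (assumed, not measured). */
static const int example_nr_inactive[] = { 131072, 65536, 2048 };
#define EXAMPLE_NR_ZONES 3

/* Per-zone budget, as in the shrink_caches() snippet: scan
 * nr_inactive >> priority pages, but at least twice the number
 * of pages we still want to reclaim. */
static int zone_max_scan(int nr_inactive, int priority, int nr_pages)
{
	int max_scan;

	assert(priority > 0 && priority <= DEF_PRIORITY);
	max_scan = nr_inactive >> priority;
	if (max_scan < nr_pages * 2)
		max_scan = nr_pages * 2;
	return max_scan;
}

/* Pages scanned over all zones in one pass at this priority, assuming
 * (hypothetically) that nothing is reclaimed, so nr_pages is constant.
 * This is the total_scanned figure the loop hands to wakeup_bdflush(). */
static int pass_total_scanned(const int nr_inactive[], int nr_zones,
			      int priority, int nr_pages)
{
	int total = 0;

	for (int i = 0; i < nr_zones; i++)
		total += zone_max_scan(nr_inactive[i], priority, nr_pages);
	return total;
}
```

With these assumed zone sizes the passes at priorities 12 and 11 each scan 192 pages (3*64), as in the mail. At priority 10 ZONE_HIGHMEM gets 128 pages as described, but ZONE_NORMAL/DMA together get 128 rather than 256 because the assumed zones are smaller; the exact figures depend on zone sizes, while the doubling of the larger zones' budgets with each pass does not.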
BTW, it turns out that the main reason kswapd was going silly was
that the VM is *not* treating `priority' as a logarithmic thing at
all:
	int max_scan = nr_inactive_pages / priority;
so the claims about scanning 1/64th of the list are crap. That
thing scans 1/6th of the queue on the first pass. In the mem=1G
case, that's 30,000 damn pages. Maybe someone should take a look
at Marcelo's kernel (2.4)?
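To see why the linear divisor hurts, the two budgets can be compared directly. This is an illustrative sketch only: the function names are hypothetical, a starting priority of 6 is what the mail's 1/6th and 1/64th figures imply, and the 180,000-page inactive list is an assumed figure chosen so that 1/6th comes to roughly the mail's 30,000 pages.

```c
#include <assert.h>

/* The budget criticised in the mail: linear in priority, so each
 * step down in priority raises the scan only by a factor of about
 * priority / (priority - 1). */
static long linear_max_scan(long nr_inactive, int priority)
{
	assert(priority > 0);
	return nr_inactive / priority;
}

/* A logarithmic budget: 1/2^priority of the list, so each step
 * down in priority doubles the scan. */
static long log_max_scan(long nr_inactive, int priority)
{
	assert(priority >= 0 && priority < 31);
	return nr_inactive >> priority;
}
```

At priority 6 the linear form asks for 30,000 pages where the shift asks for 2,812. Dropping to priority 5 raises the linear budget only to 36,000 (a factor of 6/5), while the shift doubles it to 5,625: the linear scheme starts out enormous and then barely ramps up.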
There are a few warts: pdflush_operation() will fail if all pdflush threads
are busy doing something else (pretty unlikely with the nonblocking writeback
code; it might happen if writeback has to run get_block()). But we'll be
writing back stuff anyway.
I changed blk_congestion_wait() a bit too. The first version would
return immediately if no queues were congested (more than 75% full). Now
it sleeps even if no queues are congested, and returns as soon as someone
puts back a write request. If someone is silly enough to call
blk_congestion_wait() when there are no write requests in flight at all,
they get to take the full 1/4-second sleep.
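The new blk_congestion_wait() behaviour amounts to an unconditional sleep that ends early when a write request is put back. A userspace analogue, assuming POSIX threads in place of kernel wait queues, might look like the following; the names congestion_wait_ms(), put_back_write_request() and writes_put_back are hypothetical, and this is not the kernel implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for the kernel's wait queue and request pool. */
static pthread_mutex_t congestion_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t congestion_cond = PTHREAD_COND_INITIALIZER;
static unsigned long writes_put_back;	/* completed write requests so far */

/* Called when a write request is returned to the free pool. */
static void put_back_write_request(void)
{
	pthread_mutex_lock(&congestion_lock);
	writes_put_back++;
	pthread_cond_broadcast(&congestion_cond);
	pthread_mutex_unlock(&congestion_lock);
}

/* Sleep even if nothing is congested, until a write request is put back
 * or timeout_ms elapses.  Returns 1 if woken by a put-back, 0 on timeout. */
static int congestion_wait_ms(long timeout_ms)
{
	struct timespec deadline;
	unsigned long start;
	int woken;

	assert(timeout_ms >= 0);
	/* Default condvars time out against CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&congestion_lock);
	start = writes_put_back;	/* put-backs before this point do not count */
	while (writes_put_back == start) {
		/* 0 means signalled (or spurious wakeup); anything else,
		 * such as ETIMEDOUT, ends the wait. */
		if (pthread_cond_timedwait(&congestion_cond, &congestion_lock,
					   &deadline) != 0)
			break;
	}
	woken = writes_put_back != start;
	pthread_mutex_unlock(&congestion_lock);
	return woken;
}
```

As in the mail, a caller with no writes in flight sleeps out the whole timeout (HZ/4 in the kernel caller). In this sketch a put-back that happened before the call does not cut the sleep short. Build with -pthread where the threads library is separate from libc.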
The mem=1G corner case is fixed, and page reclaim doesn't even show up in
the profile (columns: address, sample count, percentage of all samples, symbol):
c012c034 288 0.317709 do_wp_page
c0144ae0 316 0.348597 __block_commit_write
c012c910 342 0.377279 do_anonymous_page
c0143efc 353 0.389414 __find_get_block
c012f7e0 356 0.392724 find_lock_page
c012f9f0 356 0.392724 do_generic_file_read
c01832bc 367 0.404858 ext2_free_branches
c0136e70 371 0.409271 __free_pages_ok
c010e7b4 386 0.425818 timer_interrupt
c01e3cfc 414 0.456707 radix_tree_lookup
c0141894 434 0.47877 vfs_write
c012f580 474 0.522896 unlock_page
c0134348 500 0.551578 kmem_cache_alloc
c01347d0 531 0.585776 kmem_cache_free
c013712c 574 0.633212 rmqueue
c0141320 605 0.667409 generic_file_llseek
c0156924 616 0.679544 count_list
c0142c04 617 0.680647 fget
c01091e0 793 0.874803 system_call
c0155914 860 0.948714 __d_lookup
c0144674 1076 1.187 __block_prepare_write
c014c63c 1184 1.30614 link_path_walk
c012fcd4 10932 12.0597 file_read_actor
c0130674 16443 18.1392 generic_file_write_nolock
c0107048 31293 34.5211 poll_idle
The balancing of the zones looks OK at first glance and, of course,
the change in system behaviour under heavy writeout loads is profound.
Let's do the MAP_SHARED-pages-get-a-second-round thing, and it'd
be good if we could come up with some algorithm for setting the
current dirty pagecache clamping level rather than relying on the
dopey /proc/sys/vm/dirty_async_ratio magic number.
I'm thinking that dirty_async_ratio becomes a maximum ratio, and
that we dynamically lower it when large amounts of dirty pagecache
would be embarrassing. Or maybe there's just no need for this. Dunno.
Thread overview: 18+ messages
2002-09-06 20:42 Andrew Morton
2002-09-06 21:03 ` Rik van Riel
2002-09-06 21:40 ` Andrew Morton
2002-09-06 21:49 ` Rik van Riel
2002-09-06 21:58 ` Andrew Morton
2002-09-06 22:04 ` Rik van Riel
2002-09-06 22:19 ` Andrew Morton
2002-09-06 22:23 ` Rik van Riel
2002-09-06 22:48 ` Andrew Morton
2002-09-06 23:03 ` Rik van Riel
2002-09-06 23:34 ` Andrew Morton
2002-09-07 0:00 ` Rik van Riel
2002-09-07 0:29 ` Andrew Morton
2002-09-08 21:21 ` Daniel Phillips
2002-09-06 22:22 ` Rik van Riel
2002-09-07 2:14 ` Andrew Morton
2002-09-07 2:10 ` Rik van Riel
2002-09-07 5:28 ` Andrew Morton [this message]