* Re: balance_pgdat(): where is total_scanned ever updated?
From: Marcelo Tosatti @ 2004-11-11 14:49 UTC
  To: Nikita Danilov; +Cc: linux-mm

switching to linux-mm

On Wed, Nov 10, 2004 at 04:24:45PM +0300, Nikita Danilov wrote:
> Marcelo Tosatti writes:
> 
> [...]
> 
>  > 
>  > Another related thing I noted this afternoon is that right now kswapd will
>  > always block on full queues:
>  > 
>  > static int may_write_to_queue(struct backing_dev_info *bdi)
>  > {
>  >         if (current_is_kswapd())
>  >                 return 1;
>  >         if (current_is_pdflush())       /* This is unlikely, but why not... */
>  >                 return 1;
>  >         if (!bdi_write_congested(bdi))
>  >                 return 1;
>  >         if (bdi == current->backing_dev_info)
>  >                 return 1;
>  >         return 0;
>  > }
>  > 
>  > We should make kswapd use the "bdi_write_congested" information and avoid
>  > blocking on full queues. It should improve performance on multi-device 
>  > systems with intense VM loads.
> 
> This will have the following undesirable side effect: if
> may_write_to_queue() returns false, the page is not paged out; instead
> it is thrown to the head of the inactive queue, thus destroying "LRU
> ordering": shrink_list() will dive deeper into the inactive list,
> reclaiming hotter pages.
> It's OK to accidentally skip pageout in the direct reclaim path, because
> 
>  - we hope most pageout is done by kswapd, and
> 
>  - we don't want __alloc_pages() to stall
> 
> but _something_ in the kernel should take the pain of actually writing
> pages out in LRU order.

I see - it breaks LRU ordering of pageout. 
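
To make the trade-off concrete, here is a minimal userspace sketch of the
check with and without the kswapd exemption. struct task_model, its fields,
the write_congested flag and the name may_write_to_queue_nonblocking() are
stand-ins invented for illustration, not kernel API; the real function
reads "current" and calls bdi_write_congested().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel state; hypothetical, illustration only. */
struct backing_dev_info {
	bool write_congested;	/* models bdi_write_congested(bdi) */
};

struct task_model {
	bool is_kswapd;		/* models current_is_kswapd() */
	bool is_pdflush;	/* models current_is_pdflush() */
	const struct backing_dev_info *backing_dev_info;
};

/* Behaviour quoted above: kswapd may always write, congested or not. */
static int may_write_to_queue(const struct task_model *cur,
			      const struct backing_dev_info *bdi)
{
	assert(cur != NULL && bdi != NULL);
	if (cur->is_kswapd)
		return 1;
	if (cur->is_pdflush)
		return 1;
	if (!bdi->write_congested)
		return 1;
	if (bdi == cur->backing_dev_info)
		return 1;
	return 0;
}

/* Variant under discussion: kswapd loses its exemption, so it skips
 * congested queues like any other task instead of blocking on them. */
static int may_write_to_queue_nonblocking(const struct task_model *cur,
					  const struct backing_dev_info *bdi)
{
	assert(cur != NULL && bdi != NULL);
	if (cur->is_pdflush)
		return 1;
	if (!bdi->write_congested)
		return 1;
	if (bdi == cur->backing_dev_info)
		return 1;
	return 0;
}
```

Every 0 returned to kswapd by the second variant is a page that goes back
to the head of the inactive list unwritten, which is exactly the LRU
ordering cost described above.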

>  > Maybe something along the lines 
>  > 
>  > "if the reclaim ratio is high, do not writepage"
>  > "if the reclaim ratio is below high, writepage but not block"
>  > "if the reclaim ratio is low, writepage and block"
> 
> If kswapd blocking is a concern, inactive list scanning should be
> decoupled from actual page-out (a la Solaris): kswapd queues pages to
> the yet another kernel thread that calls pageout().

It's just a concern; I have no numbers to back it up.

But it's pretty obvious that kswapd's behaviour is suboptimal when you
think about multi-device systems. kswapd may block, for example,
in get_block() (there is a comment on top of pageout() about
that), which makes the situation even worse.
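
The three-level rule quoted above could be sketched roughly as follows.
This assumes "reclaim ratio" means pages reclaimed per pages scanned; the
enum, the function name and both thresholds are hypothetical and untuned.

```c
#include <assert.h>

/* Hypothetical thresholds, in percent of scanned pages that were reclaimed. */
#define RECLAIM_RATIO_HIGH_PCT	75
#define RECLAIM_RATIO_LOW_PCT	25

enum pageout_action {
	PAGEOUT_SKIP,		/* ratio is high: do not writepage */
	PAGEOUT_WRITE_NOBLOCK,	/* ratio below high: writepage, do not block */
	PAGEOUT_WRITE_BLOCK	/* ratio is low: writepage and block */
};

static enum pageout_action choose_pageout_action(unsigned long nr_reclaimed,
						 unsigned long nr_scanned)
{
	unsigned long long pct;

	/* A scanner cannot reclaim more pages than it looked at. */
	assert(nr_reclaimed <= nr_scanned);

	/* Nothing scanned yet: no evidence of pressure, so do not write. */
	if (nr_scanned == 0)
		return PAGEOUT_SKIP;

	pct = (unsigned long long)nr_reclaimed * 100 / nr_scanned;
	if (pct >= RECLAIM_RATIO_HIGH_PCT)
		return PAGEOUT_SKIP;
	if (pct > RECLAIM_RATIO_LOW_PCT)
		return PAGEOUT_WRITE_NOBLOCK;
	return PAGEOUT_WRITE_BLOCK;
}
```

Skipping writepage only while clean pages are still easy to find keeps the
LRU damage bounded: once the ratio drops, kswapd goes back to writing in
list order and eventually to blocking.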

> I played with this idea (see
> http://nikita.w3.to/code/patches/2-6-10-rc1/async-writepage.txt note
> that async_writepage() has to be adjusted to work for kswapd), but while
> in some cases (large concurrent builds) it does provide a benefit, in
> other cases (heavy write through mmap) it makes throughput slightly
> worse.

Very sweet, I like it.

Why do you think the heavy write through mmap decreased throughput?

Would be nice if you had those numbers saved somewhere.

> Besides, this doesn't completely avoid the problem of destroying LRU
> ordering, as kswapd still proceeds further through inactive list while
> pages are sent out asynchronously.

Well, pages are being sent out in order, which should do fine, no?

kswapd proceeds further through the inactive list while pages are sent
out asynchronously with the current design too: pageout() writes,
moves the pages (now under IO) to the head of the inactive list and
continues.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>


* Re: balance_pgdat(): where is total_scanned ever updated?
From: Nikita Danilov @ 2004-11-11 19:37 UTC
  To: Marcelo Tosatti; +Cc: linux-mm

Marcelo Tosatti writes:
 > 

[...]

 > 
 > > I played with this idea (see
 > > http://nikita.w3.to/code/patches/2-6-10-rc1/async-writepage.txt note
 > > that async_writepage() has to be adjusted to work for kswapd), but while
 > > in some cases (large concurrent builds) it does provide a benefit, in
 > > other cases (heavy write through mmap) it makes throughput slightly
 > > worse.
 > 
 > Very sweet, I like it.

An additional advantage of async-writepage is that one then has a
whole queue of dirty pages ready for page-out, so that some smarter
clustering can be implemented.
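
As a rough illustration of what such clustering might look like (this is
not taken from the cluster-pageout patch): sort the queued pages by file
and offset, then count the contiguous runs, each of which could go out as
one request. struct queued_page and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical queue entry: which file, and which page within it. */
struct queued_page {
	unsigned long mapping_id;	/* stands in for page->mapping */
	unsigned long index;		/* page offset within the file */
};

static int cmp_queued_page(const void *a, const void *b)
{
	const struct queued_page *x = a, *y = b;

	if (x->mapping_id != y->mapping_id)
		return x->mapping_id < y->mapping_id ? -1 : 1;
	if (x->index != y->index)
		return x->index < y->index ? -1 : 1;
	return 0;
}

/* Sort the queue in place and return the number of contiguous runs,
 * i.e. how many write requests a clustering pageout would submit. */
static size_t cluster_queue(struct queued_page *q, size_t n)
{
	size_t i, runs;

	assert(q != NULL || n == 0);
	if (n == 0)
		return 0;

	qsort(q, n, sizeof(*q), cmp_queued_page);

	runs = 1;
	for (i = 1; i < n; i++) {
		if (q[i].mapping_id != q[i - 1].mapping_id ||
		    q[i].index != q[i - 1].index + 1)
			runs++;
	}
	return runs;
}
```

With synchronous writepage from shrink_list() there is only ever one page
in hand, so this kind of reordering is not available.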

 > 
 > Why do you think the heavy write through mmap decreased throughput?

Because I thought I measured it, but see below :)

 > 
 > Would be nice if you had those numbers saved somewhere.

The second column is the average number of microseconds it takes to
dirty 1G through mmap (a big file, larger than RAM, is mmapped in 1G
chunks and one byte in each of its pages is touched in a loop). Rows
correspond to patches from http://nikita.w3.to/code/patches/2-6-10-rc1/
applied one after another.

2.6.10-rc1                      77370854.641026
skip-writepage                  72766988.375000
dont-rotate-active-list         71440066.068966
async-writepage                 75028707.083333 /* regression */
batch-mark_page_accessed        74183312.078947
page_referenced-move-dirty      72947326.125000
dont-unmap-on-pageout           72702028.843750
ignore-page_referenced          74188417.156250 /* regression */
cluster-pageout                 69449001.583333

Err... now that I pasted this, I recall that the async-writepage patch
tested above does _not_ allow kswapd to do asynchronous page-out:

----------------------------------------------------------------------
/*
 * check whether writepage should be done asynchronously by kaiod.
 */
static int
async_writepage(struct page *page, int nr_dirty)
{
	/* goal of doing writepage asynchronously is to decrease latency of
	 * memory allocations involving direct reclaim, which is inapplicable
	 * to the kswapd */
	if (current_is_kswapd())
		return 0;
	/* limit number of pending async-writepage requests */
	else if (kaio_nr_requests > KAIO_THROTTLE)
		return 0;
	/* if we are under memory pressure---do pageout synchronously to
	 * throttle scanner. */
	else if (page_zone(page)->prev_priority != DEF_PRIORITY)
		return 0;
	/* if expected number of writepage requests submitted by this
	 * invocation of shrink_list() is small enough---do them
	 * asynchronously */
	else if (nr_dirty <= KAIO_CLUSTER_SIZE)
		return 1;
	else
		return 0;
}
----------------------------------------------------------------------

The first if ... return 0; should be removed.
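
With that check dropped, the function would look roughly like the
userspace model below. struct zone, struct page and page_zone() are
reduced to stubs here so that the logic can be compiled on its own. The
KAIO_THROTTLE and KAIO_CLUSTER_SIZE values are made up, and DEF_PRIORITY
is assumed to be 12; the actual patch may define them differently.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed or hypothetical values, for illustration only. */
#define DEF_PRIORITY		12
#define KAIO_THROTTLE		128
#define KAIO_CLUSTER_SIZE	8

/* Minimal stubs standing in for the kernel structures. */
struct zone {
	int prev_priority;
};

struct page {
	struct zone *zone;
};

static int kaio_nr_requests;	/* pending async-writepage requests */

static struct zone *page_zone(struct page *page)
{
	return page->zone;
}

/*
 * check whether writepage should be done asynchronously by kaiod.
 * Unlike the version quoted above, kswapd is no longer excluded.
 */
static int
async_writepage(struct page *page, int nr_dirty)
{
	assert(page != NULL);
	/* limit number of pending async-writepage requests */
	if (kaio_nr_requests > KAIO_THROTTLE)
		return 0;
	/* under memory pressure: do pageout synchronously to throttle
	 * the scanner */
	else if (page_zone(page)->prev_priority != DEF_PRIORITY)
		return 0;
	/* few enough writepage requests expected from this invocation
	 * of shrink_list(): do them asynchronously */
	else if (nr_dirty <= KAIO_CLUSTER_SIZE)
		return 1;
	else
		return 0;
}
```

The remaining checks still force synchronous pageout under pressure, so
kswapd would only go asynchronous while the zone is at DEF_PRIORITY.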

Nikita.