From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, Dave Chinner <david@fromorbit.com>,
Chris Mason <chris.mason@oracle.com>,
Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>
Subject: Re: [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible
Date: Thu, 10 Jun 2010 09:38:42 +0900
Message-ID: <20100610093842.6a038ab0.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20100609095200.GA5650@csn.ul.ie>
On Wed, 9 Jun 2010 10:52:00 +0100
Mel Gorman <mel@csn.ul.ie> wrote:
> On Wed, Jun 09, 2010 at 11:52:11AM +0900, KAMEZAWA Hiroyuki wrote:
> > > <SNIP>
> >
> > My concern is how memcg should work. IOW, what changes will be necessary for
> > memcg to work with the new vmscan logic as no-direct-writeback.
> >
>
> At worst, memcg waits on background flushers to clean their pages but
> obviously this could lead to stalls in containers if it happened to be full
> of dirty pages.
>
yes.
> Do you have test scenarios already setup for functional and performance
> regression testing of containers? If so, can you run tests with this series
> and see what sort of impact you find? I haven't done performance testing
> with containers to date so I don't know what the expected values are.
>
Maybe kernbench is enough; I think it does enough writing and allocation.
The 'limit' size for the test depends on your host. I sometimes do this on
an 8-CPU SMP box:
# mount -t cgroup none /cgroups -o memory
# mkdir /cgroups/A
# echo $$ > /cgroups/A/tasks
# echo 300M > /cgroups/A/memory.limit_in_bytes
# make -j 8 or make -j 16
Comparing swap usage and speed will be interesting.
(The 300M above is small enough because my test machine has 24G of memory.)
Or
# mount -t cgroup none /cgroups -o memory
# mkdir /cgroups/A
# echo $$ > /cgroups/A/tasks
# echo 50M > /cgroups/A/memory.limit_in_bytes
# dd if=/dev/zero of=./tmpfile bs=65536 count=100000
or something similar. When I tested Dave Chinner's original patch for
avoiding writeback, I saw 2 OOMs in 10 tests.
Without the patch, I never saw an OOM.
> > Maybe an ideal solution will be
> > - support buffered I/O tracking in I/O cgroup.
> > - flusher threads should work with I/O cgroup.
> > - memcg itself should support dirty ratio. and add a trigger to kick flusher
> > threads for dirty pages in a memcg.
> > But I know it's a long way.
> >
>
> I'm not very familiar with memcg I'm afraid or its requirements so I am
> having trouble guessing which of these would behave the best. You could take
> a gamble on having memcg doing writeback in direct reclaim but you may run
> into the same problem of overflowing stacks.
>
maybe.
> I'm not sure how a flusher thread would work just within a cgroup. It
> would have to do a lot of searching to find the pages it needs
> considering that it's looking at inodes rather than pages.
>
Yes. So I (we) need some way of coloring inodes for selectable writeback.
But people in this area are very nervous about performance (me too ;), and
I haven't found the answer yet.
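Just to make "coloring" concrete, a strawman only; nothing below exists in
the kernel, the field and helper names are made up:

struct mem_cgroup;

/*
 * Strawman: suppose each inode remembered the memcg that last dirtied
 * it (a hypothetical "i_dirty_memcg" field).  A per-memcg flusher
 * walking the dirty-inode list could then skip other containers'
 * inodes with a predicate like this:
 */
static inline bool inode_colored_for(const struct mem_cgroup *flusher_memcg,
                                     const struct mem_cgroup *i_dirty_memcg)
{
        return flusher_memcg == i_dirty_memcg;
}

The hard part, of course, is maintaining that coloring cheaply in the
dirtying path, which is exactly the performance worry above.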
> One possibility I guess would be to create a flusher-like thread if a direct
> reclaimer finds that the dirty pages in the container are above the dirty
> ratio. It would scan and clean all dirty pages in the container LRU on behalf
> of dirty reclaimers.
>
Yes, that's possible. But Andrew recommends not adding more threads, so
I'll use a workqueue if necessary.
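A rough sketch of the workqueue version (only INIT_WORK/schedule_work and
container_of are real kernel API here; the memcg hook and the writeback
helper are hypothetical):

#include <linux/workqueue.h>

struct mem_cgroup;
/* hypothetical: write back dirty pages on this memcg's LRU */
void memcg_writeback_dirty_pages(struct mem_cgroup *memcg);

struct memcg_flush_work {
        struct work_struct work;
        struct mem_cgroup *memcg;
};

static void memcg_flush_fn(struct work_struct *work)
{
        struct memcg_flush_work *mfw =
                container_of(work, struct memcg_flush_work, work);

        memcg_writeback_dirty_pages(mfw->memcg);
}

/*
 * Hypothetical hook: called from reclaim when the memcg is found to be
 * over its dirty ratio.  mfw->work must have been set up once with
 * INIT_WORK(&mfw->work, memcg_flush_fn) when the memcg was created.
 */
static void memcg_kick_flusher(struct memcg_flush_work *mfw)
{
        schedule_work(&mfw->work);
}

schedule_work() is a no-op if the work is already queued, so reclaimers
could kick this repeatedly without throttling themselves.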
> Another possibility would be to have kswapd work in containers.
> Specifically, if wakeup_kswapd() is called with a cgroup that it's added
> to a list. kswapd gives priority to global reclaim but would
> occasionally check if there is a container that needs kswapd on a
> pending list and if so, work within the container. Is there a good
> reason why kswapd does not work within container groups?
>
One reason is node vs. memcg. Because memcg doesn't limit memory placement,
a container can contain pages from all nodes, so it's a bit of a problem
deciding which node's kswapd should do the work (but yes, probably a small
problem). Another is memory-reclaim priority between memcgs (I don't want
to add such a knob...). Maybe it's time to think about that.
We already use kswapd for the soft limit, so I think similar hints for
kswapd should work, yes.
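Something in the spirit of that soft-limit hinting, very roughly (all names
below are made up; only the list/spinlock primitives are real kernel API):

#include <linux/list.h>
#include <linux/spinlock.h>

struct mem_cgroup;

/* hypothetical per-memcg hint, embedded in (or hung off) the memcg */
struct memcg_kswapd_hint {
        struct list_head list;          /* INIT_LIST_HEAD()'d at memcg creation */
        struct mem_cgroup *memcg;
};

static LIST_HEAD(memcg_kswapd_pending);
static DEFINE_SPINLOCK(memcg_kswapd_lock);

/* hypothetical: called from a memcg-aware wakeup_kswapd() */
static void memcg_queue_for_kswapd(struct memcg_kswapd_hint *hint)
{
        spin_lock(&memcg_kswapd_lock);
        if (list_empty(&hint->list))            /* not already queued */
                list_add_tail(&hint->list, &memcg_kswapd_pending);
        spin_unlock(&memcg_kswapd_lock);
}

/* hypothetical: kswapd polls this after its global work is done */
static struct mem_cgroup *memcg_next_pending(void)
{
        struct memcg_kswapd_hint *hint = NULL;

        spin_lock(&memcg_kswapd_lock);
        if (!list_empty(&memcg_kswapd_pending)) {
                hint = list_first_entry(&memcg_kswapd_pending,
                                        struct memcg_kswapd_hint, list);
                list_del_init(&hint->list);
        }
        spin_unlock(&memcg_kswapd_lock);

        return hint ? hint->memcg : NULL;
}

The node question above would still have to be answered (which node's
kswapd consumes the list), but as noted, that is probably a small problem.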
> Finally, you could just allow reclaim within a memcg do writeback. Right
> now, the check is based on current_is_kswapd() but I could create a helper
> function that also checked for sc->mem_cgroup. Direct reclaim from the
> page allocator never appears to work within a container group (which
> raises questions in itself such as why a process in a container would
> reclaim pages outside the container?) so it would remain safe.
>
isolate_lru_pages() for memcg finds only pages in a memcg ;)
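For what it's worth, the helper you describe could be as small as this,
inside mm/vmscan.c where struct scan_control lives (the name is made up;
current_is_kswapd() and sc->mem_cgroup are the existing pieces):

/*
 * Allow writeback from kswapd and from memcg (container) reclaim, but
 * not from global direct reclaim, where stack depth is the worry.
 */
static bool reclaim_may_writepage(struct scan_control *sc)
{
        return current_is_kswapd() || sc->mem_cgroup != NULL;
}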
> > How the new logic works with memcg ? Because memcg doesn't trigger kswapd,
> > memcg has to wait for a flusher thread make pages clean ?
>
> Right now, memcg has to wait for a flusher thread to make pages clean.
>
ok.
> > Or memcg should have kswapd-for-memcg ?
> >
> > Is it okay to call writeback directly when !scanning_global_lru() ?
> > memcg's reclaim routine is only called from specific positions, so, I guess
> > no stack problem.
>
> It's a judgement call from you really. I see that direct reclaimers do
> not set mem_cgroup so it's down to - are you reasonably sure that all
> the paths that reclaim based on a container are not deep?
One concern is add_to_page_cache(). If it's called from a deep stack, my
assumption is wrong.
> I looked
> around for a while and the bulk appeared to be in the fault path so I
> would guess "yes" but as I'm not familiar with the memcg implementation
> I'll have missed a lot.
>
> > But we just have I/O pattern problem.
>
> True.
>
Okay, I'll think about how to kick kswapd from memcg, or a flusher-for-memcg.
Please go ahead as you like. I love a good I/O pattern, too.
Thanks,
-Kame