From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, Dave Chinner <david@fromorbit.com>,
Chris Mason <chris.mason@oracle.com>,
Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>
Subject: Re: [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible
Date: Thu, 10 Jun 2010 09:38:42 +0900
Message-ID: <20100610093842.6a038ab0.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20100609095200.GA5650@csn.ul.ie>
On Wed, 9 Jun 2010 10:52:00 +0100
Mel Gorman <mel@csn.ul.ie> wrote:
> On Wed, Jun 09, 2010 at 11:52:11AM +0900, KAMEZAWA Hiroyuki wrote:
> > > <SNIP>
> >
> > My concern is how memcg should work. IOW, what changes will be necessary for
> > memcg to work with the new vmscan logic as no-direct-writeback.
> >
>
> At worst, memcg waits on background flushers to clean their pages but
> obviously this could lead to stalls in containers if it happened to be full
> of dirty pages.
>
yes.
> Do you have test scenarios already setup for functional and performance
> regression testing of containers? If so, can you run tests with this series
> and see what sort of impact you find? I haven't done performance testing
> with containers to date so I don't know what the expected values are.
>
Maybe kernbench is enough; I think it does enough writing and allocation.
The 'limit' size for the test depends on your host. I sometimes do this on
an 8-CPU SMP box:
# mount -t cgroup none /cgroups -o memory
# mkdir /cgroups/A
# echo $$ > /cgroups/A/tasks
# echo 300M > /cgroups/A/memory.limit_in_bytes
# make -j 8 or make -j 16
Comparing swap usage and speed will be interesting.
(The 300M above is small enough because my test machine has 24G of memory.)
Or
# mount -t cgroup none /cgroups -o memory
# mkdir /cgroups/A
# echo $$ > /cgroups/A/tasks
# echo 50M > /cgroups/A/memory.limit_in_bytes
# dd if=/dev/zero of=./tmpfile bs=65536 count=100000
or something similar. When I tested Dave Chinner's original patch for
avoiding writeback, I saw 2 OOMs in 10 tests.
Without the patch, I never saw an OOM.
> > Maybe an ideal solution will be
> > - support buffered I/O tracking in I/O cgroup.
> > - flusher threads should work with I/O cgroup.
> > - memcg itself should support dirty ratio. and add a trigger to kick flusher
> > threads for dirty pages in a memcg.
> > But I know it's a long way.
> >
>
> I'm not very familiar with memcg I'm afraid or its requirements so I am
> having trouble guessing which of these would behave the best. You could take
> a gamble on having memcg doing writeback in direct reclaim but you may run
> into the same problem of overflowing stacks.
>
maybe.
> I'm not sure how a flusher thread would work just within a cgroup. It
> would have to do a lot of searching to find the pages it needs
> considering that it's looking at inodes rather than pages.
>
Yes. So I (we) need some way of coloring inodes for selectable writeback.
But people in this area are very nervous about performance (me too ;), and
I haven't found the answer yet.
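Just to make "coloring" concrete, a strawman only; nothing below exists in
the kernel, the field and helper names are made up:

struct mem_cgroup;

/*
 * Strawman: suppose each inode remembered the memcg that last dirtied
 * it (a hypothetical "i_dirty_memcg" field).  A per-memcg flusher
 * walking the dirty-inode list could then skip other containers'
 * inodes with a predicate like this:
 */
static inline bool inode_colored_for(const struct mem_cgroup *flusher_memcg,
                                     const struct mem_cgroup *i_dirty_memcg)
{
        return flusher_memcg == i_dirty_memcg;
}

The hard part, of course, is maintaining that coloring cheaply in the
dirtying path, which is exactly the performance worry above.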
> One possibility I guess would be to create a flusher-like thread if a direct
> reclaimer finds that the dirty pages in the container are above the dirty
> ratio. It would scan and clean all dirty pages in the container LRU on behalf
> of dirty reclaimers.
>
Yes, that's possible. But Andrew recommends not adding more threads, so
I'll use a workqueue if necessary.
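A rough sketch of the workqueue version (only INIT_WORK/schedule_work and
container_of are real kernel API here; the memcg hook and the writeback
helper are hypothetical):

#include <linux/workqueue.h>

struct mem_cgroup;
/* hypothetical: write back dirty pages on this memcg's LRU */
void memcg_writeback_dirty_pages(struct mem_cgroup *memcg);

struct memcg_flush_work {
        struct work_struct work;
        struct mem_cgroup *memcg;
};

static void memcg_flush_fn(struct work_struct *work)
{
        struct memcg_flush_work *mfw =
                container_of(work, struct memcg_flush_work, work);

        memcg_writeback_dirty_pages(mfw->memcg);
}

/*
 * Hypothetical hook: called from reclaim when the memcg is found to be
 * over its dirty ratio.  mfw->work must have been set up once with
 * INIT_WORK(&mfw->work, memcg_flush_fn) when the memcg was created.
 */
static void memcg_kick_flusher(struct memcg_flush_work *mfw)
{
        schedule_work(&mfw->work);
}

schedule_work() is a no-op if the work is already queued, so reclaimers
could kick this repeatedly without throttling themselves.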
> Another possibility would be to have kswapd work in containers.
> Specifically, if wakeup_kswapd() is called with a cgroup that it's added
> to a list. kswapd gives priority to global reclaim but would
> occasionally check if there is a container that needs kswapd on a
> pending list and if so, work within the container. Is there a good
> reason why kswapd does not work within container groups?
>
One reason is node vs. memcg. Because memcg doesn't limit memory placement,
a container can contain pages from all nodes, so it's a bit of a problem
deciding which node's kswapd should do the work (but yes, probably a small
problem). Another is memory-reclaim priority between memcgs (I don't want
to add such a knob...). Maybe it's time to think about that.
We already use kswapd for the soft limit, so I think similar hints for
kswapd should work, yes.
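Something in the spirit of that soft-limit hinting, very roughly (all names
below are made up; only the list/spinlock primitives are real kernel API):

#include <linux/list.h>
#include <linux/spinlock.h>

struct mem_cgroup;

/* hypothetical per-memcg hint, embedded in (or hung off) the memcg */
struct memcg_kswapd_hint {
        struct list_head list;          /* INIT_LIST_HEAD()'d at memcg creation */
        struct mem_cgroup *memcg;
};

static LIST_HEAD(memcg_kswapd_pending);
static DEFINE_SPINLOCK(memcg_kswapd_lock);

/* hypothetical: called from a memcg-aware wakeup_kswapd() */
static void memcg_queue_for_kswapd(struct memcg_kswapd_hint *hint)
{
        spin_lock(&memcg_kswapd_lock);
        if (list_empty(&hint->list))            /* not already queued */
                list_add_tail(&hint->list, &memcg_kswapd_pending);
        spin_unlock(&memcg_kswapd_lock);
}

/* hypothetical: kswapd polls this after its global work is done */
static struct mem_cgroup *memcg_next_pending(void)
{
        struct memcg_kswapd_hint *hint = NULL;

        spin_lock(&memcg_kswapd_lock);
        if (!list_empty(&memcg_kswapd_pending)) {
                hint = list_first_entry(&memcg_kswapd_pending,
                                        struct memcg_kswapd_hint, list);
                list_del_init(&hint->list);
        }
        spin_unlock(&memcg_kswapd_lock);

        return hint ? hint->memcg : NULL;
}

The node question above would still have to be answered (which node's
kswapd consumes the list), but as noted, that is probably a small problem.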
> Finally, you could just allow reclaim within a memcg do writeback. Right
> now, the check is based on current_is_kswapd() but I could create a helper
> function that also checked for sc->mem_cgroup. Direct reclaim from the
> page allocator never appears to work within a container group (which
> raises questions in itself such as why a process in a container would
> reclaim pages outside the container?) so it would remain safe.
>
isolate_lru_pages() for memcg finds only pages in a memcg ;)
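For what it's worth, the helper you describe could be as small as this,
inside mm/vmscan.c where struct scan_control lives (the name is made up;
current_is_kswapd() and sc->mem_cgroup are the existing pieces):

/*
 * Allow writeback from kswapd and from memcg (container) reclaim, but
 * not from global direct reclaim, where stack depth is the worry.
 */
static bool reclaim_may_writepage(struct scan_control *sc)
{
        return current_is_kswapd() || sc->mem_cgroup != NULL;
}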
> > How the new logic works with memcg ? Because memcg doesn't trigger kswapd,
> > memcg has to wait for a flusher thread make pages clean ?
>
> Right now, memcg has to wait for a flusher thread to make pages clean.
>
ok.
> > Or memcg should have kswapd-for-memcg ?
> >
> > Is it okay to call writeback directly when !scanning_global_lru() ?
> > memcg's reclaim routine is only called from specific positions, so, I guess
> > no stack problem.
>
> It's a judgement call from you really. I see that direct reclaimers do
> not set mem_cgroup so it's down to - are you reasonably sure that all
> the paths that reclaim based on a container are not deep?
One concern is add_to_page_cache(). If it's called from a deep stack, my
assumption is wrong.
> I looked
> around for a while and the bulk appeared to be in the fault path so I
> would guess "yes" but as I'm not familiar with the memcg implementation
> I'll have missed a lot.
>
> > But we just have I/O pattern problem.
>
> True.
>
Okay, I'll think about how to kick kswapd from memcg, or a flusher-for-memcg.
Please go ahead as you like. I love a good I/O pattern, too.
Thanks,
-Kame