From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Satoru Moriya <satoru.moriya@hds.com>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>,
Randy Dunlap <rdunlap@xenotime.net>,
Satoru Moriya <smoriya@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"lwoodman@redhat.com" <lwoodman@redhat.com>,
Seiji Aguchi <saguchi@redhat.com>,
"hughd@google.com" <hughd@google.com>,
"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH -v2 -mm] add extra free kbytes tunable
Date: Thu, 13 Oct 2011 17:09:07 +0900
Message-ID: <20111013170907.80775c54.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20111013073321.GA2784@barrios-desktop>
On Thu, 13 Oct 2011 16:33:21 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:
> On Fri, Sep 02, 2011 at 12:31:14PM -0400, Satoru Moriya wrote:
> > On 09/01/2011 05:58 PM, Andrew Morton wrote:
> > > On Thu, 1 Sep 2011 15:26:50 -0400
> > > Rik van Riel <riel@redhat.com> wrote:
> > >
> > >> Add a userspace visible knob
> > >
> > > argh. Fear and hostility at new knobs which need to be maintained for
> > > ever, even if the underlying implementation changes.
> > >
> > > Unfortunately, this one makes sense.
> > >
> > >> to tell the VM to keep an extra amount of memory free, by increasing
> > >> the gap between each zone's min and low watermarks.
> > >>
> > >> This is useful for realtime applications that call system calls and
> > >> have a bound on the number of allocations that happen in any short
> > >> time period. In this application, extra_free_kbytes would be left at
> > >> an amount equal to or larger than the maximum number of
> > >> allocations that happen in any burst.
> > >
> > > _is_ it useful? Proof?
> > >
> > > Who is requesting this? Have they tested it? Results?
> >
> > This is interesting to me.
> >
> > Some of our customers have realtime applications and they are concerned
> > about the fact that Linux uses free memory as pagecache. It means that
> > when their application allocates memory, the Linux kernel first tries to
> > reclaim memory and only then allocates it. This may increase memory
> > allocation latency.
> >
> > In many cases this is not a big issue because Linux has kswapd for
> > background reclaim and it is fast enough to avoid the direct reclaim
> > path if there is a lot of clean cache. But in some situations,
> > e.g. when an application allocates more memory than the delta
> > between watermark_low and watermark_min in a short time and kswapd
> > can't reclaim fast enough due to dirty page reclaim, direct reclaim
> > is executed and causes big latency.
> >
> > We can avoid the issue above by using preallocation and mlock.
> > But that can't cover kmalloc used in system calls. So I'd like to use
> > this patch together with mlock to keep memory allocation latency as
> > low as possible. It may not be a perfect solution but it is important
> > for customers in the enterprise area to be able to configure the amount
> > of free memory at their own risk.
>
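As a rough illustration of the scenario Satoru describes, here is a toy
Python model (not kernel code). The function names and all numbers are made
up for the example, and it assumes a single zone in which extra_free_kbytes
is simply added on top of the low watermark, which is a simplification of
what the patch description says.

```python
# Toy model, not kernel code: names and numbers are illustrative only.

def reclaim_gap_kb(min_wmark_kb, low_wmark_kb, extra_free_kbytes=0):
    """Free memory between the low and min watermarks, in kB.

    Assumption: extra_free_kbytes is added on top of the low watermark,
    which widens the min-to-low gap.
    """
    return (low_wmark_kb + extra_free_kbytes) - min_wmark_kb


def burst_enters_direct_reclaim(burst_kb, kswapd_freed_kb, gap_kb):
    """True if a burst that starts at the low watermark would push free
    memory below the min watermark before kswapd catches up."""
    return burst_kb - kswapd_freed_kb > gap_kb


# Default gap versus a gap widened by a hypothetical 16 MB of extra free memory.
gap = reclaim_gap_kb(min_wmark_kb=4096, low_wmark_kb=5120)
wide = reclaim_gap_kb(4096, 5120, extra_free_kbytes=16384)
```

In this model an 8 MB burst with slow kswapd progress overruns the default
gap but fits inside the widened one, which is the effect the knob is meant
to have.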
> I agree there is a need for such a feature, but I don't like exporting such
> a primitive interface to users.
>
> As Satoru said, we can reserve free pages for users through preallocation and mlocking.
> The issue is free pages for the kernel itself.
> The most desirable thing is to avoid syscalls in the critical realtime section.
> But if we can't avoid them, my crazy idea is to use memcg for kernel pages.
> Of course, we would have to implement it and it is not simple, but AFAIK memcg people
> have always considered it and will do it eventually. :)
> Recently, Glauber tried "Basic kernel memory functionality" but I haven't reviewed
> it yet. I am not sure we can reuse it, anyway. Kame?
>
I reviewed it and it looks good. It adds kmem.limit_in_bytes, so we are ready
to move forward with a kernel memory cgroup.
But for now it adds only the interfaces.
I think Greg Thelen <gthelen@google.com> has some ideas.
> My simple idea is as follows,
>
> We can assign a basic reserved page pool and/or a page pool of user-determined size
> to each task registered at memcg-slab.
Hmm, memcg-mempool ?
> The application has to notify memcg of the start of the RT section before it enters
> the RT section, so that memcg can fill up the page pool if it is short. In this case,
> the application can get stuck but that's okay as it hasn't entered the RT section yet.
> The application has to notify memcg of the end of the RT section, too, so that memcg
> can try to refill the reserved page pool in case of shortage.
>
That 'notification' doesn't sound good to me. If the application dies or is moved to
another group without sending the notification, memcg will become unstable.
It should be the task's state rather than memcg's state.
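One possible reading of the point above, as a toy Python sketch (not kernel
code): the "in RT section" flag is recorded per task, so that task exit or
migration can drop it even when no end-of-section notification arrives. The
class and method names are hypothetical.

```python
class MemcgRtState:
    """Toy model: RT-section state is tracked per task, not per group."""

    def __init__(self):
        self.rt_tasks = set()  # ids of tasks currently inside an RT section

    def enter_rt(self, tid):
        self.rt_tasks.add(tid)

    def leave_rt(self, tid):
        self.rt_tasks.discard(tid)

    def task_gone(self, tid):
        """Called when a task exits or moves to another group.

        Because the state belongs to the task, cleanup does not depend
        on the application sending an explicit notification.
        """
        self.rt_tasks.discard(tid)

    def must_keep_pool_full(self):
        return bool(self.rt_tasks)
```

If the group instead kept only a single flag or counter, a task that died
inside its RT section would leave the group stuck in the RT state.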
> The reason we need such a notification is that a high kswapd priority, a new knob and
> the like can never meet the application's deadline requirement in some situations (e.g.,
> there are very many dirty pages in the LRU, or anon pages fill up in the no-swap case, and so on),
> so the application might end up stuck at some point. That point must be outside the RT
> section of the task.
>
> For the implementation, we might need a new watermark setting for each memcg and/or
> something like kswapd priority promotion for urgent reclaim.
> Anyway, these are just implementation details and we could enhance them or add more through
> various techniques as time goes by.
>
> Personally, I think it could be a valuable feature.
>
Hmm. To avoid latency at allocation, the only thing we can do is pre-allocate before the
memory is required. But the problem is that applications cannot forecast when a 'burst'
allocation happens, so we always need to keep a memory pool prepared.
I think we need 2 implementations.
1. a free-page mempool for a memcg.
2. a background reclaim thread for a memcg. This is triggered by the mempool.
   The priority of this thread should be controllable in some way.
If we take care of memcg's limit, the watermark should trigger background reclaim?
But the memory reclaim routine should never sleep...
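The two pieces above could be sketched roughly as follows. This is a toy
Python model under stated assumptions, not a proposed kernel design: the
class, its fields and the refill policy are all hypothetical, and the
background thread is reduced to a method that is called by hand.

```python
class MemcgMempool:
    """Toy per-memcg free-page pool with a refill watermark."""

    def __init__(self, capacity, watermark):
        self.capacity = capacity      # pages the pool tries to hold
        self.watermark = watermark    # below this, request background reclaim
        self.free = capacity
        self.reclaim_pending = False  # stands in for waking the reclaim thread

    def alloc(self, pages):
        """Take pages from the pool without sleeping.

        Returns False if the pool cannot cover the request; the caller
        would then fall back to the normal, slower allocation path.
        """
        if pages > self.free:
            self.reclaim_pending = True
            return False
        self.free -= pages
        if self.free < self.watermark:
            # Pool is running low: ask the background thread to refill it.
            self.reclaim_pending = True
        return True

    def background_refill(self, reclaimed_pages):
        """Stand-in for the per-memcg reclaim thread refilling the pool."""
        self.free = min(self.capacity, self.free + reclaimed_pages)
        if self.free >= self.watermark:
            self.reclaim_pending = False
```

The allocation path only decrements a counter and sets a flag, so it never
waits; all of the slow work is pushed to the refill side, whose priority is
the part that would need to be tunable.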
Thanks,
-Kame