From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"hugh@veritas.com" <hugh@veritas.com>
Subject: Re: [RFC][PATCH] fix swap entries is not reclaimed in proper way for memg v3.
Date: Mon, 27 Apr 2009 14:13:47 +0530
Message-ID: <20090427084347.GJ4454@balbir.in.ibm.com>
In-Reply-To: <20090427172119.d84aaa68.kamezawa.hiroyu@jp.fujitsu.com>

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-04-27 17:21:19]:
> On Mon, 27 Apr 2009 13:42:06 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-04-24 16:28:40]:
> >
> > > This is a new one (using new logic). Maybe light-weight enough and caches all cases.
> >
> > You sure mean catches above :)
> >
> >
> > >
> > > Thanks,
> > > -Kame
> > > ==
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > >
> > > Because free_swap_and_cache() is called under spinlocks,
> > > it can't sleep and uses trylock_page() instead of lock_page().
> > > Because of this, a swp_entry which is not used after zap_xx can exist
> > > as SwapCache, which will never be used.
> > > This kind of SwapCache is reclaimed by the global LRU when it's found
> > > during LRU rotation.
> > >
> > > When memory cgroup is used, the global LRU will not be kicked and
> > > stale Swap Caches will not be reclaimed. This is problematic because
> > > memcg's swap entry accounting is leaked and memcg can't know about it.
> > > To catch this stale SwapCache, we have to chase it and check again
> > > whether the swap entry is still alive.
> > >
> > > This patch adds a function to chase stale swap cache and reclaim it
> > > in a moderate way. When zap_xxx fails to remove a swap entry, the entry
> > > is recorded into a buffer and memcg's "work" will reclaim it later.
> > > No sleep, no memory allocation under free_swap_and_cache().
> > >
> > > This patch also adds stale-swap-cache-congestion logic and tries to avoid
> > > having too many stale swap caches at the same time.
> > >
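
To make the "record now, reclaim later" idea above concrete, here is a
rough user-space sketch in C. It is not the code from the patch. The
names struct stale_buf, stale_buf_record(), stale_buf_drain(),
STALE_BUF_SIZE and demo_try_free() are all made up for illustration, and
the callback only stands in for "re-check the entry and free it if it is
really unused". The point is just that recording into a fixed-size
buffer needs neither sleeping nor allocation, and that a full buffer can
double as a congestion signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout: a user-space model, not the patch. */
#define STALE_BUF_SIZE 8 /* fixed capacity, nothing is allocated at record time */

struct stale_buf {
    unsigned long ents[STALE_BUF_SIZE]; /* recorded swap entry values */
    size_t head;                        /* next slot to drain */
    size_t count;                       /* entries currently recorded */
};

/*
 * Models the path that cannot sleep: just store the entry.
 * Returns false when the buffer is full, which a caller could
 * treat as "too many stale entries outstanding" (congestion).
 */
static bool stale_buf_record(struct stale_buf *b, unsigned long ent)
{
    if (b->count == STALE_BUF_SIZE)
        return false;
    b->ents[(b->head + b->count) % STALE_BUF_SIZE] = ent;
    b->count++;
    return true;
}

/*
 * Models the deferred "work": visit each recorded entry once.
 * Entries that try_free() cannot free are put back for a later pass.
 * Returns the number of entries freed in this pass.
 */
static size_t stale_buf_drain(struct stale_buf *b,
                              bool (*try_free)(unsigned long))
{
    size_t n = b->count, freed = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long ent = b->ents[b->head];

        b->head = (b->head + 1) % STALE_BUF_SIZE;
        b->count--;
        if (try_free(ent)) {
            freed++;
        } else {
            /* A slot was just vacated, so re-recording cannot fail. */
            bool ok = stale_buf_record(b, ent);
            assert(ok);
            (void)ok;
        }
    }
    return freed;
}

/* Demo stand-in for the real check: pretend even entries can be freed. */
static bool demo_try_free(unsigned long ent)
{
    return ent % 2 == 0;
}
```

In this model the zap path would call stale_buf_record() when its
trylock fails, and a worker would call stale_buf_drain() later from a
context that is allowed to sleep.
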
> > > The implementation is naive, but the cost is probably an acceptable trade-off.
> > >
> > > How to test:
> > > 1. Set the memory limit to a very small value (1-2M?).
> > > 2. Run some programs and trigger page reclaim/swap-in.
> > > 3. Kill the programs with SIGKILL etc. Stale Swap Cache will then
> > > increase. After this patch, stale swap caches are reclaimed
> > > and the mem+swap controller will not go to OOM.
> > >
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > Quick comment on the design
> >
> > 1. I like the marking of swap cache entries as stale
>
> I'd like to, but there is no space to record it as stale. And "race" makes
> that difficult even if we have enough space. If you read the whole thread,
> you know there are many patterns of race.

There have been several iterations of this discussion; a summary
would be nice. Let me find the thread.

>
> > 2. Can't we reclaim stale entries during memcg LRU reclaim? Why write
> > a GC for it?
> >
> Because they are not on the memcg LRU, we can't reclaim them via the memcg LRU.
> (See the first mail from Nishimura in this thread. It explains this well.)
>

Hmm, I can't find it; let me do a more exhaustive search on the web.
If the entry is stale and not on the memcg LRU, is it still accounted to
the memcg?

> One easy case is here.
>
> - CPU0 calls zap_pte()->free_swap_and_cache()
> - CPU1 tries to swap it in.
> In this case, free_swap_and_cache() doesn't free the swp_entry, and the
> swp_entry is read into memory. But it will never be added to memcg's LRU
> until it's mapped.

That is strange. It is not even added to the LRU as a cached page?

> (What we have to consider here is swapin-readahead. It can swap in memory
> even if it's not accessed. So this race window is larger than expected.)
>
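
To check my own reading of the race above, here is a small user-space
model in C. It is only a sketch of the sequence as described in the
quoted text, not kernel code: struct model_page, its fields and the
model_*() functions are all made-up names, and the locking is reduced to
a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model of the race; every name here is hypothetical. */
struct model_page {
    bool locked;       /* held by the swap-in path while it reads the page */
    bool in_swapcache; /* page is present in the swap cache */
    bool on_memcg_lru; /* only set once the page is mapped (per the mail) */
    int pte_refs;      /* ptes that still reference the swap entry */
};

static bool model_trylock(struct model_page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

/* CPU1: swap-in (or readahead) locks the page and adds it to swap cache. */
static void model_swapin_start(struct model_page *p)
{
    p->locked = true;
    p->in_swapcache = true;
}

/* CPU1: the read completes and the page lock is dropped. */
static void model_swapin_finish(struct model_page *p)
{
    p->locked = false;
}

/*
 * CPU0: the zap path drops its pte reference. It cannot sleep, so it
 * may only try the page lock. Returns true if nothing stale is left.
 */
static bool model_free_swap_and_cache(struct model_page *p)
{
    assert(p->pte_refs > 0);
    p->pte_refs--;
    if (!p->in_swapcache || p->pte_refs > 0)
        return true;      /* no unused swap cache page to worry about */
    if (!model_trylock(p))
        return false;     /* lost the race: the page stays in swap cache */
    p->in_swapcache = false;
    p->locked = false;
    return true;
}

/* Stale: in swap cache, referenced by no pte, and invisible to memcg LRU. */
static bool model_is_stale(const struct model_page *p)
{
    return p->in_swapcache && p->pte_refs == 0 && !p->on_memcg_lru;
}
```

If CPU1 runs model_swapin_start() before CPU0 runs
model_free_swap_and_cache(), the trylock fails and the page ends up
stale in the sense above. Without the concurrent swap-in, the same call
removes the page from the swap cache.
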
> So we can't use memcg's LRU. What we can do is:
>
> - scan the whole global LRU
> or
> - use some trick to reclaim them in a lazy way.
>

Thanks for being patient; I suppose some of these questions have been
discussed before. Let me dig out the thread.

--
Balbir