linux-mm.kvack.org archive mirror
From: nishimura@mxp.nes.nec.co.jp
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
	"hugh@veritas.com" <hugh@veritas.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	nishimura@mxp.nes.nec.co.jp
Subject: Re: [PATCH] fix leak of swap accounting as stale swap cache under memcg
Date: Tue, 28 Apr 2009 11:38:00 +0900
Message-ID: <isapiwc.d5d1bc3c.6e29.49f66c08.26940.be@mail.jp.nec.com>
In-Reply-To: <20090428101924.88f67e27.kamezawa.hiroyu@jp.fujitsu.com>

> On Tue, 28 Apr 2009 10:09:30 +0900
> nishimura@mxp.nes.nec.co.jp wrote:
> 
>> > On Mon, 27 Apr 2009 21:08:56 +0900
>> > Daisuke Nishimura <d-nishimura@mtf.biglobe.ne.jp> wrote:
>> > 
>> >> > Index: mmotm-2.6.30-Apr24/mm/vmscan.c
>> >> > ===================================================================
>> >> > --- mmotm-2.6.30-Apr24.orig/mm/vmscan.c
>> >> > +++ mmotm-2.6.30-Apr24/mm/vmscan.c
>> >> > @@ -661,6 +661,9 @@ static unsigned long shrink_page_list(st
>> >> >  		if (PageAnon(page) && !PageSwapCache(page)) {
>> >> >  			if (!(sc->gfp_mask & __GFP_IO))
>> >> >  				goto keep_locked;
>> >> > +			/* avoid making more stale swap caches */
>> >> > +			if (memcg_stale_swap_congestion())
>> >> > +				goto keep_locked;
>> >> >  			if (!add_to_swap(page))
>> >> >  				goto activate_locked;
>> >> >  			may_enter_fs = 1;
>> >> > 
>> >> Well, as I mentioned before (http://marc.info/?l=linux-kernel&m=124066623510867&w=2),
>> >> this cannot avoid type-2 (pages marked !PageCgroupUsed by the owner process via
>> >> page_remove_rmap()->mem_cgroup_uncharge_page() before being added to swap cache).
>> >> If such swap caches go through shrink_page_list() without being freed
>> >> for some reason, they don't go back to memcg's LRU.
>> >> 
>> >> Type-2 doesn't pressure memsw.usage, but you can see it by plotting
>> >> "grep SwapCached /proc/meminfo".
>> >> 
>> >> And I don't think it's a good idea to add memcg_stale_swap_congestion() here.
>> >> It reduces the chance of reclaiming pages.
>> >> 
>> > Hmm. Maybe add congestion_wait()?
>> > 
>> I don't think any hook before add_to_swap() is needed.
>> 
>> >> Do you dislike the patch I attached in the above mail ?
>> >> 
>> > I doubt whether the patch covers all type-2 cases.
>> > 
>> Hmm, I didn't see any more leaks when I tested the patch.
>> 
> 
> First, your patch
> ==
>  		if (PageAnon(page) && !PageSwapCache(page)) {
>  			if (!(sc->gfp_mask & __GFP_IO))
>  				goto keep_locked;
> -			/* avoid making more stale swap caches */
> -			if (memcg_stale_swap_congestion())
> -				goto keep_locked;
>  			if (!add_to_swap(page))
>  				goto activate_locked;
> +			/*
> +			 * The owner process might have uncharged the page
> +			 * (by page_remove_rmap()) before it has been added
> +			 * to swap cache.
> +			 * Check it here to avoid making it stale.
> +			 */
> +			if (memcg_free_unused_swapcache(page))
> +				goto keep_locked;
>  			may_enter_fs = 1;
>  		}
> ==
> Should be
> ==
> 
> 	if (PageAnon(page) && !PageSwapCache(page)) {
> 		... // don't modify here
> 	}
> 	if (PageAnon(page) && PageSwapCache(page) && !page_mapped(page)) {
> 		if (try_to_free_page(page)) // or memcg_free_unused_swapcache()
> 			goto free_it;
> 	}
> ==
> I think.
> 
It may work too.

But if the page is already in swap cache at the point of page_remove_rmap()
-> mem_cgroup_uncharge_page(), the page is not uncharged.
So, in the long run, it can be freed by memcg's LRU scanning via
shrink_page_list()->pageout()->swap_writepage()->try_to_free_swap().
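A minimal userspace sketch of the two orderings discussed in this thread, assuming a toy page whose hypothetical fields (charged, swapcache, mapcount) stand in for PageCgroupUsed, PageSwapCache and page_mapped(); none of the helpers below are kernel functions. If add_to_swap() happens first, the charge survives the last unmap; if the owner unmaps first, the page is uncharged and then becomes a swap cache page that nobody accounts for (type-2). As I read the patch comment, that second state is what the memcg_free_unused_swapcache() check after add_to_swap() is meant to catch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical fields, not the real struct page. */
struct toy_page {
	bool charged;   /* stands in for PageCgroupUsed */
	bool swapcache; /* stands in for PageSwapCache */
	int mapcount;   /* stands in for page_mapped() */
};

/* Rough model of page_remove_rmap() -> mem_cgroup_uncharge_page():
 * on the last unmap the charge is dropped, unless the page is
 * already in the swap cache. */
void toy_remove_rmap(struct toy_page *p)
{
	assert(p->mapcount > 0);
	if (--p->mapcount == 0 && !p->swapcache)
		p->charged = false;
}

/* Rough model of a successful add_to_swap() in shrink_page_list(). */
void toy_add_to_swap(struct toy_page *p)
{
	p->swapcache = true;
}

/* "Type-2" stale swap cache: in swap cache, unmapped, and uncharged. */
bool toy_is_stale(const struct toy_page *p)
{
	return p->swapcache && p->mapcount == 0 && !p->charged;
}
```

Running add_to_swap() then remove_rmap() leaves the page charged; running them in the opposite order leaves toy_is_stale() true.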

I added the hook there just because I wanted to clarify what the
problematic case is.
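To check whether the problematic case actually leaves pages behind, the SwapCached sampling suggested earlier in the thread can be scripted. A minimal sketch, assuming the usual "SwapCached:   <n> kB" line format of /proc/meminfo; parse_swapcached_kb() and read_swapcached_kb() are made-up names for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Return the SwapCached value (kB) if `line` is the SwapCached line of
 * /proc/meminfo, or -1 for any other line. */
long parse_swapcached_kb(const char *line)
{
	long kb;

	if (sscanf(line, "SwapCached: %ld", &kb) == 1)
		return kb;
	return -1;
}

/* Scan a meminfo-formatted stream and return SwapCached in kB,
 * or -1 if the field is not present. */
long read_swapcached_kb(FILE *fp)
{
	char line[256];

	assert(fp != NULL);
	while (fgets(line, sizeof(line), fp)) {
		long kb = parse_swapcached_kb(line);
		if (kb >= 0)
			return kb;
	}
	return -1;
}
```

For example, fopen("/proc/meminfo", "r") once per second (reopening it each time so the values are fresh), call read_swapcached_kb(), and plot the results; a value that keeps growing after the test workload has exited would be consistent with stale swap cache accumulating.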

And I don't think "goto free_it" is good.
It calls free_hot_cold_page(), but another process (swapoff, for example) might
already have taken a reference to the swap cache page and be waiting for its page lock.
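The concern with "goto free_it" can be illustrated with a reference-count model. A minimal sketch, assuming a toy page whose refcount stands in for page_count(), and assuming reclaim may only free the page when exactly two holders remain (the swap cache and reclaim's own isolation reference), loosely in the spirit of the count check the reclaim path does before detaching a page from its mapping. All names are hypothetical. A task such as swapoff that found the page in swap cache holds an extra reference while it sleeps on the page lock, so the check refuses to free the page instead of pulling it out from under that task.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical fields, not the real struct page. */
struct toy_ref_page {
	int refcount;   /* stands in for page_count() */
	bool swapcache; /* stands in for PageSwapCache */
};

/* Another task (e.g. swapoff) finds the page in the swap cache and
 * takes a reference before sleeping on the page lock. */
void toy_swapcache_lookup(struct toy_ref_page *p)
{
	assert(p->swapcache);
	p->refcount++;
}

/* Reclaim side: drop the page from swap cache and free it only if the
 * swap cache and reclaim itself are the sole holders (assumed count 2).
 * Returns true if the page was freed. */
bool toy_reclaim_try_free(struct toy_ref_page *p)
{
	if (!p->swapcache || p->refcount != 2)
		return false;   /* someone else may still use the page */
	p->swapcache = false;
	p->refcount = 0;
	return true;
}
```

Freeing unconditionally, as a plain "goto free_it" would, corresponds to skipping the refcount test above.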

> And we need a hook in free_swap_and_cache() to handle the PageWriteback() case.
> 
Ah, you're right.
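Why free_swap_and_cache() needs such a hook can be seen from the shape of its decision (as it looks around 2.6.30): once only the swap cache still references the swap entry, it removes the page from swap cache only if it can trylock the page and the page is not under writeback; otherwise the page is simply left in swap cache. A minimal sketch of that decision, assuming a toy page with hypothetical fields; the real function also drops a still-mapped page when vm_swap_full(), which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical fields, not the real struct page. */
struct toy_swap_page {
	bool swapcache; /* PageSwapCache */
	bool writeback; /* PageWriteback: I/O still in flight */
	bool locked;    /* page lock held elsewhere, so trylock fails */
	int mapcount;   /* page_mapped() */
};

/* Rough model of the tail of free_swap_and_cache(), reached when only
 * the swap cache still references the swap entry. Returns true if the
 * page was removed from swap cache, false if it was left behind. */
bool toy_free_swap_and_cache(struct toy_swap_page *p)
{
	assert(p->mapcount >= 0);
	if (p->locked)
		return false;           /* trylock failed: give up */
	if (p->swapcache && !p->writeback && p->mapcount == 0) {
		p->swapcache = false;   /* delete_from_swap_cache() */
		return true;
	}
	return false;                   /* e.g. under writeback: left behind */
}
```

A page that is under writeback when its owner zaps the swap entry therefore stays in swap cache with no remaining user, and some other path has to notice and free it; that is the case the proposed hook would handle.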


Thanks,
Daisuke Nishimura.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


Thread overview: 14+ messages
2009-04-27  9:12 KAMEZAWA Hiroyuki
2009-04-27 10:13 ` Balbir Singh
2009-04-27 11:35   ` Daisuke Nishimura
2009-04-27 19:17     ` Balbir Singh
2009-04-27 23:57       ` KAMEZAWA Hiroyuki
2009-04-28 21:46         ` Balbir Singh
2009-04-30  0:03           ` KAMEZAWA Hiroyuki
2009-04-28  0:41   ` KAMEZAWA Hiroyuki
2009-04-27 12:08 ` Daisuke Nishimura
2009-04-28  0:19   ` KAMEZAWA Hiroyuki
2009-04-28  1:09     ` nishimura
2009-04-28  1:19       ` KAMEZAWA Hiroyuki
2009-04-28  2:38         ` nishimura [this message]
2009-04-28  3:49       ` Daisuke Nishimura
