From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Hugh Dickins <hughd@google.com>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>
Subject: Re: [PATCH RFC 00/15] mm: memory book keeping and lru_lock splitting
Date: Fri, 17 Feb 2012 08:54:31 +0900
Message-ID: <20120217085431.80daa020.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <4F3CE243.9050203@openvz.org>

On Thu, 16 Feb 2012 15:02:27 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Thu, 16 Feb 2012 09:43:52 +0400
> > Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> >
> >> KAMEZAWA Hiroyuki wrote:
> >>> On Thu, 16 Feb 2012 02:57:04 +0400
> >>> Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> >
> >>>> * optimize page to book translations, move it upper in the call stack,
> >>>>     replace some struct zone arguments with struct book pointer.
> >>>>
> >>>
> >>> a page->book translator from patch 2/15
> >>>
> >>> +struct book *page_book(struct page *page)
> >>> +{
> >>> +	struct mem_cgroup_per_zone *mz;
> >>> +	struct page_cgroup *pc;
> >>> +
> >>> +	if (mem_cgroup_disabled())
> >>> +		return &page_zone(page)->book;
> >>> +
> >>> +	pc = lookup_page_cgroup(page);
> >>> +	if (!PageCgroupUsed(pc))
> >>> +		return &page_zone(page)->book;
> >>> +	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
> >>> +	smp_rmb();
> >>> +	mz = mem_cgroup_zoneinfo(pc->mem_cgroup,
> >>> +			page_to_nid(page), page_zonenum(page));
> >>> +	return &mz->book;
> >>> +}
> >>>
> >>> What happens when pc->mem_cgroup is rewritten by move_account() ?
> >>> Where is the guard for lockless access of this ?
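
For reference, the ordering that the smp_rmb() comment relies on can be
sketched in userspace with C11 atomics. This is only an illustration:
struct pc_sketch, pc_commit() and pc_book() are made-up names, and the
release/acquire pair stands in for the kernel's write and read barriers.
It covers only the first publication of the owner. It does nothing about
a later rewrite by move_account(), which is exactly the open question.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct book { int id; };

/* Hypothetical stand-in for struct page_cgroup. */
struct pc_sketch {
	struct book *owner;	/* plays the role of pc->mem_cgroup */
	atomic_bool used;	/* plays the role of PCG_USED */
};

/* Writer (charge path): store the owner first, then publish the flag. */
static void pc_commit(struct pc_sketch *pc, struct book *owner)
{
	assert(owner != NULL);
	pc->owner = owner;
	atomic_store_explicit(&pc->used, true, memory_order_release);
}

/* Reader: test the flag first, then read the owner. */
static struct book *pc_book(struct pc_sketch *pc, struct book *root)
{
	/* The acquire load plays the role of PageCgroupUsed() + smp_rmb(). */
	if (!atomic_load_explicit(&pc->used, memory_order_acquire))
		return root;
	return pc->owner;
}
```

A reader that sees the flag set is guaranteed to see the owner stored
before it, but nothing here stops the owner from changing afterwards.
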
> >>
> >> Initially this was supposed to be protected by lru_lock; in the final patch it is protected by RCU.
> >
> > Hmm, VM_BUG_ON(!PageLRU(page)) ?
> 
> Where?
> 

You said this is guarded by lru_lock, so the page should be on the LRU.



> >
> > move_account() overwrites pc->mem_cgroup after isolating the page from the LRU,
> > but it doesn't take lru_lock.
> 
> There are three kinds of lock_page_book() users:
> 1) The caller wants to catch the page on the LRU. It locks either the old or the new book and
>     rechecks PageLRU() after locking; if the page is not on the LRU, it doesn't touch anything.
>     Some of these functions hold a stable reference to the page, some do not.
>     [ There is actually a small race here. I knew about it, but forgot to pick this chunk from the old code. See below. ]
> 2) The page was isolated by the caller, which wants to put it back. The book link is stable, so no problems.
> 3) Page-release functions. The page counter is zero, so there are no references and no problems.
> 
> race for 1)
> 
> catcher					switcher
> 
> 					# isolate
> 					old_book = lock_page_book(page)
> 					ClearPageLRU(page)
> 					unlock_book(old_book)				
> 					# charge
> old_book = lock_page_book(page)		
> 					# switch
> 					page->book = new_book
> 					# putback
> 					lock_book(new_book)
> 					SetPageLRU(page)
> 					unlock_book(new_book)
> if (PageLRU(page))
> 	oops, page actually in new_book
> unlock_book(old_book)
> 
> 
> I'll protect the "switch" phase with the old_book lru lock:
> 
In linux-next, pc->mem_cgroup is modified only when the page is on the LRU.

When do we need to touch the "book" if !PageLRU()?


> lock_book(old_book)
> page->book = new_book
> unlock_book(old_book)
> 
> The other option is to recheck the page's book in the "catcher" after PageLRU().
> Maybe there are other variants as well.
> 
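
To make the two proposals concrete, here is a rough userspace sketch that
combines them: the switcher changes the link only under the old book's
lock, and the locking helper rechecks the link after taking the lock.
pthread mutexes stand in for the per-book lru locks, and struct
page_sketch, switch_page_book() and catch_page() are names invented for
this illustration, not code from the patch set. Book lifetime (the
"destroyer" case) is ignored completely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct book { pthread_mutex_t lru_lock; };

/* Hypothetical stand-in for struct page. */
struct page_sketch {
	_Atomic(struct book *) book;	/* plays the role of page->book */
	atomic_bool on_lru;		/* plays the role of PageLRU() */
};

/* Lock the book the page belongs to, retrying if it switched meanwhile. */
static struct book *lock_page_book(struct page_sketch *page)
{
	for (;;) {
		struct book *book = atomic_load(&page->book);

		pthread_mutex_lock(&book->lru_lock);
		if (atomic_load(&page->book) == book)
			return book;	/* link is stable while we hold the lock */
		pthread_mutex_unlock(&book->lru_lock);
	}
}

/* Switcher: rewrite the link only while holding the old book's lock. */
static void switch_page_book(struct page_sketch *page, struct book *new_book)
{
	struct book *old_book = lock_page_book(page);

	assert(new_book != NULL);
	atomic_store(&page->book, new_book);
	pthread_mutex_unlock(&old_book->lru_lock);
}

/* Catcher: return the locked book if the page is on its LRU, else NULL. */
static struct book *catch_page(struct page_sketch *page)
{
	struct book *book = lock_page_book(page);

	if (atomic_load(&page->on_lru))
		return book;	/* caller unlocks book->lru_lock when done */
	pthread_mutex_unlock(&book->lru_lock);
	return NULL;
}
```

With both rules in place, a catcher that sees the page on the LRU while
holding a book's lock knows the page cannot have moved to another book.
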
> > BTW, how much performance benefit is there?
> 
> It depends, but usually lru_lock is very, very hot.
> This lock splitting can be used without cgroups and containers:
> huge zones can now easily be sliced into arbitrary pieces, for example one book per 256 MB.
> 
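
As a worked example of the "one book per 256 MB" idea, the arithmetic
could look like the sketch below. It assumes 4 KiB pages and books laid
out contiguously from the start of the zone. The macro and function names
are invented for illustration; the patch set does not define them.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SKETCH_BOOK_SHIFT	28	/* 256 MiB per book */
#define SKETCH_BOOK_PFN_SHIFT	(SKETCH_BOOK_SHIFT - SKETCH_PAGE_SHIFT)

/* Number of books needed to cover a zone spanning spanned_pages pages. */
static size_t zone_nr_books(unsigned long spanned_pages)
{
	unsigned long per_book = 1UL << SKETCH_BOOK_PFN_SHIFT;

	return (spanned_pages + per_book - 1) / per_book;
}

/* Index of the book holding pfn, relative to the zone's first pfn. */
static size_t pfn_to_book_idx(unsigned long zone_start_pfn, unsigned long pfn)
{
	assert(pfn >= zone_start_pfn);
	return (pfn - zone_start_pfn) >> SKETCH_BOOK_PFN_SHIFT;
}
```

With these numbers each book covers 65536 pages, so a 4 GiB zone would be
split into 16 books, each with its own lru lock.
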
Personally, I think reducing lock contention with pagevecs works well enough.
So I'd like to see performance numbers on a real machine with real applications.


> 
> In my experience, one of the complicated things here is how to postpone destroying a "book"
> while some of its pages are isolated. For example, lumpy reclaim and memory compaction isolate pages
> from several books and then want to put them back. Currently this can break if someone removes
> the cgroup at the wrong moment. There are funny races with three players: catcher, switcher and destroyer.

Thank you for pointing that out. Hmm... can it happen? Currently, when a cgroup is destroyed,
force_empty() works as follows:

  1. find a page from LRU
  2. remove it from LRU
  3. move it or reclaim it (you said "switcher")
  4. if res.usage != 0 goto 1.

I think step "4" will ultimately keep the cgroup from being destroyed.
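
The loop above can be modelled very roughly as follows. This is a
simplified sketch with made-up names (struct memcg_sketch,
force_empty_sketch()), not the real force_empty() code: it only shows why
pages that are charged but held off the LRU by someone else keep the
usage non-zero, so step 4 never lets the destroy proceed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a memcg being emptied. */
struct memcg_sketch {
	size_t usage;		/* charged pages, like res.usage */
	size_t lru_pages;	/* charged pages currently on the LRU */
};

/*
 * Returns 0 once usage reaches zero, or -1 if charged pages remain but
 * none can be found on the LRU (they are isolated by another user).
 * The kernel would retry in that case; the sketch just reports it.
 */
static int force_empty_sketch(struct memcg_sketch *m)
{
	assert(m->usage >= m->lru_pages);

	while (m->usage != 0) {		/* 4. if usage != 0, go again */
		if (m->lru_pages == 0)
			return -1;	/* 1. no page found on the LRU */
		m->lru_pages--;		/* 2. remove it from the LRU */
		m->usage--;		/* 3. move it or reclaim it */
	}
	return 0;
}
```

In this model the cgroup can only go away after every isolated page has
been put back and then moved or reclaimed.
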


> This can be fixed with some extra reference counting or some other sleepable synchronization.
> In my RHEL6-based implementation I use extra reference counting, and it looks ugly. So I want to invent something better.
> Another option is to never release books and just reuse them after an RCU grace period, for RCU-list iteration.
> 

Adding another reference count would be very, very bad.



Thanks,
-Kame





