From: Hugh Dickins <hughd@google.com>
To: Mel Gorman <mgorman@suse.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Petr Holasek <pholasek@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Izik Eidus <izik.eidus@ravellosystems.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 7/11] ksm: make KSM page migration possible
Date: Fri, 8 Feb 2013 12:52:12 -0800 (PST)
Message-ID: <alpine.LNX.2.00.1302081133540.4233@eggly.anvils>
In-Reply-To: <20130205191102.GM21389@suse.de>

Paul, I've added you to the Cc in the hope that you can shed your light
on an smp_read_barrier_depends() question with which Mel taxes me below.
You may ask for more context: linux-next currently has an mm/ksm.c after
this patch is applied, but you may have questions beyond that - thanks!

On Tue, 5 Feb 2013, Mel Gorman wrote:
> On Fri, Jan 25, 2013 at 06:03:31PM -0800, Hugh Dickins wrote:
> > KSM page migration is already supported in the case of memory hotremove,
> > which takes the ksm_thread_mutex across all its migrations to keep life
> > simple.
> > 
> > But the new KSM NUMA merge_across_nodes knob introduces a problem, when
> > it's set to non-default 0: if a KSM page is migrated to a different NUMA
> > node, how do we migrate its stable node to the right tree?  And what if
> > that collides with an existing stable node?
> > 
> > So far there's no provision for that, and this patch does not attempt
> > to deal with it either.  But how will I test a solution, when I don't
> > know how to hotremove memory? 
> 
> Just reach in and yank it straight out with a chisel.

:)

> 
> > The best answer is to enable KSM page
> > migration in all cases now, and test more common cases.  With THP and
> > compaction added since KSM came in, page migration is now mainstream,
> > and it's a shame that a KSM page can frustrate freeing a page block.
> > 
> 
> THP will at least check if migration within a node works. It won't
> necessarily check we can migrate across nodes properly but it's a lot
> better than nothing.

No, I went back and dug out a hack-patch I was using three or four years
ago: occasionally on fault, just migrate every possible page in that mm
for no reason other than to test page migration.

> >  static struct page *get_ksm_page(struct stable_node *stable_node, bool locked)
> >  {
> >  	struct page *page;
> >  	void *expected_mapping;
> > +	unsigned long kpfn;
> >  
> > -	page = pfn_to_page(stable_node->kpfn);
> >  	expected_mapping = (void *)stable_node +
> >  				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
> > -	if (page->mapping != expected_mapping)
> > -		goto stale;
> > -	if (!get_page_unless_zero(page))
> > +again:
> > +	kpfn = ACCESS_ONCE(stable_node->kpfn);
> > +	page = pfn_to_page(kpfn);
> > +
> 
> Ok.
> 
> There should be no concern that hot-remove made the kpfn invalid because
> those stable tree entries should have been discarded.

Yes.

> 
> > +	/*
> > +	 * page is computed from kpfn, so on most architectures reading
> > +	 * page->mapping is naturally ordered after reading node->kpfn,
> > +	 * but on Alpha we need to be more careful.
> > +	 */
> > +	smp_read_barrier_depends();
> 
> The value of page is data dependent on pfn_to_page(). Is it really possible
> for that to be re-ordered even on Alpha?

My intuition (to say "understanding" would be an exaggeration) is that
on Alpha a very old value of page->mapping (in the line below) might be
lying around and read from one cache, which has not necessarily been
invalidated by ksm_migrate_page() pointing stable_node->kpfn to this
new page.

And if that happens, we could easily and mistakenly conclude that this
stable node is stale: although there's an smp_rmb() after goto stale,
stable_node->kpfn would still match kpfn, and we wrongly remove the node.

My confidence that I've expressed that clearly in words is lower than
my confidence that I've coded it right; and if I'm wrong, yes, surely
it's better to remove any cargo-cult smp_read_barrier_depends().
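
To make the ordering argument above concrete, here is a minimal userspace
sketch of the same publish-and-recheck pattern, written with C11 atomics
rather than the kernel primitives.  It is only an illustration under stated
assumptions: model_page, model_node, pages[], model_migrate() and
model_lookup() are hypothetical names, not anything in mm/ksm.c, and the
acquire load is stronger than the dependency barrier under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical userspace stand-ins; not the kernel's structures. */
struct model_page {
	_Atomic(void *) mapping;	/* points back at the owning node */
};

struct model_node {
	atomic_ulong kpfn;		/* index into pages[] */
};

static struct model_page pages[NPAGES];

/* Writer side, modelled on ksm_migrate_page(). */
static void model_migrate(struct model_node *node, unsigned long newpfn)
{
	unsigned long oldpfn =
		atomic_load_explicit(&node->kpfn, memory_order_relaxed);

	assert(newpfn < NPAGES);
	/* The new page's mapping is set in advance of publishing kpfn. */
	atomic_store_explicit(&pages[newpfn].mapping, (void *)node,
			      memory_order_relaxed);
	atomic_store_explicit(&node->kpfn, newpfn, memory_order_release);
	/* Plays the role of smp_wmb(): kpfn is visible before the old
	 * page's mapping can be seen to have gone stale. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&pages[oldpfn].mapping, NULL,
			      memory_order_relaxed);
}

/* Reader side, modelled on get_ksm_page(): returns the pfn, or -1 if stale. */
static long model_lookup(struct model_node *node)
{
	unsigned long kpfn;

	for (;;) {
		/* Acquire stands in for ACCESS_ONCE() plus the dependency
		 * barrier; it is stronger than strictly required. */
		kpfn = atomic_load_explicit(&node->kpfn, memory_order_acquire);
		if (atomic_load_explicit(&pages[kpfn].mapping,
					 memory_order_relaxed) == (void *)node)
			return (long)kpfn;
		/* Plays the role of smp_rmb(), pairing with the writer's fence. */
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&node->kpfn,
					 memory_order_relaxed) == kpfn)
			return -1;	/* kpfn unchanged: treat node as stale */
		/* kpfn changed under us: a migration completed, so retry. */
	}
}
```

The point of the recheck is that a NULL mapping on its own does not prove
the node is stale; only a NULL mapping together with an unchanged kpfn does.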

> 
> > +	if (ACCESS_ONCE(page->mapping) != expected_mapping)
> >  		goto stale;
> > -	if (page->mapping != expected_mapping) {
> > +
> > +	/*
> > +	 * We cannot do anything with the page while its refcount is 0.
> > +	 * Usually 0 means free, or tail of a higher-order page: in which
> > +	 * case this node is no longer referenced, and should be freed;
> > +	 * however, it might mean that the page is under page_freeze_refs().
> > +	 * The __remove_mapping() case is easy, again the node is now stale;
> > +	 * but if page is swapcache in migrate_page_move_mapping(), it might
> > +	 * still be our page, in which case it's essential to keep the node.
> > +	 */
> > +	while (!get_page_unless_zero(page)) {
> > +		/*
> > +		 * Another check for page->mapping != expected_mapping would
> > +		 * work here too.  We have chosen the !PageSwapCache test to
> > +		 * optimize the common case, when the page is or is about to
> > +		 * be freed: PageSwapCache is cleared (under spin_lock_irq)
> > +		 * in the freeze_refs section of __remove_mapping(); but Anon
> > +		 * page->mapping reset to NULL later, in free_pages_prepare().
> > +		 */
> > +		if (!PageSwapCache(page))
> > +			goto stale;
> > +		cpu_relax();
> > +	}
> 
> The recheck of stable_node->kpfn after a barrier distinguishes between
> a free and a completed migration, that's fine. I hesitate to ask because
> it must be obvious but where is the guarantee that a KSM page is in the
> swap cache?

Certainly none at all: it's the less common case that a KSM page is in
swap cache.  But if it is not in swap cache, how could its page count be
0 (causing get_page_unless_zero to fail)?  By being free, or well on its
way to being freed (hence stale); or reused as part of a compound page
(hence stale also); or reused for another purpose which arrives at a
page_freeze_refs() (hence stale also); other cases?

It's hard to see from the diff, but in the original version of
get_ksm_page(), !get_page_unless_zero goes straight to stale.
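
The case analysis above can be sketched in userspace as follows.  This is
a rough model only, assuming C11 atomics: ref_page, model_get_unless_zero(),
try_pin_once() and pin_page() are hypothetical names, with the refcount and
a swapcache flag standing in for the real struct page state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct ref_page {
	atomic_int refcount;	/* 0 means free, tail page, or frozen */
	atomic_bool swapcache;	/* models PageSwapCache() */
};

enum pin_result { PIN_OK, PIN_STALE, PIN_RETRY };

/* Models get_page_unless_zero(): take a reference only if refcount > 0. */
static bool model_get_unless_zero(struct ref_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		assert(old > 0);
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount,
						 &old, old + 1))
			return true;
	}
	return false;
}

/* One pass of the loop body: pinned, stale, or frozen so try again. */
static enum pin_result try_pin_once(struct ref_page *page)
{
	if (model_get_unless_zero(page))
		return PIN_OK;
	/* Refcount 0 and not in swap cache: freed or reused, hence stale. */
	if (!atomic_load(&page->swapcache))
		return PIN_STALE;
	/* Refcount 0 but in swap cache: possibly frozen for migration. */
	return PIN_RETRY;
}

/* Spins only while the page is frozen and still in swap cache. */
static bool pin_page(struct ref_page *page)
{
	enum pin_result r;

	while ((r = try_pin_once(page)) == PIN_RETRY)
		;	/* the kernel loop calls cpu_relax() here */
	return r == PIN_OK;
}
```

In this model only the frozen swap cache case waits; every other way of
reaching a zero refcount is reported as stale on the first pass.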

Don't for a moment imagine that this function sprang fully formed
from my mind: it was hard to get it working right (the swap cache
get_page_unless_zero failure during migration really caught me out),
and then to pare it down to its fairly simple final form.

Hugh

> 
> > +
> > +	if (ACCESS_ONCE(page->mapping) != expected_mapping) {
> >  		put_page(page);
> >  		goto stale;
> >  	}
> > +
> >  	if (locked) {
> >  		lock_page(page);
> > -		if (page->mapping != expected_mapping) {
> > +		if (ACCESS_ONCE(page->mapping) != expected_mapping) {
> >  			unlock_page(page);
> >  			put_page(page);
> >  			goto stale;
> >  		}
> >  	}
> >  	return page;
> > +
> >  stale:
> > +	/*
> > +	 * We come here from above when page->mapping or !PageSwapCache
> > +	 * suggests that the node is stale; but it might be under migration.
> > +	 * We need smp_rmb(), matching the smp_wmb() in ksm_migrate_page(),
> > +	 * before checking whether node->kpfn has been changed.
> > +	 */
> > +	smp_rmb();
> > +	if (ACCESS_ONCE(stable_node->kpfn) != kpfn)
> > +		goto again;
> >  	remove_node_from_stable_tree(stable_node);
> >  	return NULL;
> >  }
> > @@ -1903,6 +1947,14 @@ void ksm_migrate_page(struct page *newpa
> >  	if (stable_node) {
> >  		VM_BUG_ON(stable_node->kpfn != page_to_pfn(oldpage));
> >  		stable_node->kpfn = page_to_pfn(newpage);
> > +		/*
> > +		 * newpage->mapping was set in advance; now we need smp_wmb()
> > +		 * to make sure that the new stable_node->kpfn is visible
> > +		 * to get_ksm_page() before it can see that oldpage->mapping
> > +		 * has gone stale (or that PageSwapCache has been cleared).
> > +		 */
> > +		smp_wmb();
> > +		set_page_stable_node(oldpage, NULL);
> >  	}
> >  }
> >  #endif /* CONFIG_MIGRATION */
