From: Hugh Dickins <hughd@google.com>
To: Mel Gorman <mgorman@suse.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Petr Holasek <pholasek@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Izik Eidus <izik.eidus@ravellosystems.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 7/11] ksm: make KSM page migration possible
Date: Fri, 8 Feb 2013 12:52:12 -0800 (PST)
Message-ID: <alpine.LNX.2.00.1302081133540.4233@eggly.anvils>
In-Reply-To: <20130205191102.GM21389@suse.de>

Paul, I've added you to the Cc in the hope that you can shed your light
on an smp_read_barrier_depends() question with which Mel taxes me below.
You may ask for more context: linux-next currently has an mm/ksm.c after
this patch is applied, but you may have questions beyond that - thanks!

On Tue, 5 Feb 2013, Mel Gorman wrote:
> On Fri, Jan 25, 2013 at 06:03:31PM -0800, Hugh Dickins wrote:
> > KSM page migration is already supported in the case of memory hotremove,
> > which takes the ksm_thread_mutex across all its migrations to keep life
> > simple.
> >
> > But the new KSM NUMA merge_across_nodes knob introduces a problem, when
> > it's set to non-default 0: if a KSM page is migrated to a different NUMA
> > node, how do we migrate its stable node to the right tree? And what if
> > that collides with an existing stable node?
> >
> > So far there's no provision for that, and this patch does not attempt
> > to deal with it either. But how will I test a solution, when I don't
> > know how to hotremove memory?
>
> Just reach in and yank it straight out with a chisel.
:)
>
> > The best answer is to enable KSM page
> > migration in all cases now, and test more common cases. With THP and
> > compaction added since KSM came in, page migration is now mainstream,
> > and it's a shame that a KSM page can frustrate freeing a page block.
> >
>
> THP will at least check if migration within a node works. It won't
> necessarily check we can migrate across nodes properly but it's a lot
> better than nothing.
No, I went back and dug out a hack-patch I was using three or four years
ago: occasionally on fault, just migrate every possible page in that mm
for no reason other than to test page migration.
> >  static struct page *get_ksm_page(struct stable_node *stable_node, bool locked)
> >  {
> >  	struct page *page;
> >  	void *expected_mapping;
> > +	unsigned long kpfn;
> >
> > -	page = pfn_to_page(stable_node->kpfn);
> >  	expected_mapping = (void *)stable_node +
> >  				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
> > -	if (page->mapping != expected_mapping)
> > -		goto stale;
> > -	if (!get_page_unless_zero(page))
> > +again:
> > +	kpfn = ACCESS_ONCE(stable_node->kpfn);
> > +	page = pfn_to_page(kpfn);
> > +
>
> Ok.
>
> There should be no concern that hot-remove made the kpfn invalid because
> those stable tree entries should have been discarded.
Yes.
>
> > +	/*
> > +	 * page is computed from kpfn, so on most architectures reading
> > +	 * page->mapping is naturally ordered after reading node->kpfn,
> > +	 * but on Alpha we need to be more careful.
> > +	 */
> > +	smp_read_barrier_depends();
>
> The value of page is data dependent on pfn_to_page(). Is it really possible
> for that to be re-ordered even on Alpha?

My intuition (to say "understanding" would be an exaggeration) is that
on Alpha a very old value of page->mapping (in the line below) might be
lying around and read from one cache, which has not necessarily been
invalidated by ksm_migrate_page() pointing stable_node->kpfn to this
new page.

And if that happens, we could easily and mistakenly conclude that this
stable node is stale: although there's an smp_rmb() after goto stale,
stable_node->kpfn would still match kpfn, and we wrongly remove the node.

My confidence that I've expressed that clearly in words is lower than
my confidence that I've coded it right; and if I'm wrong, yes, surely
it's better to remove any cargo-cult smp_read_barrier_depends().
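
For readers who want the ordering argument in compilable form, here is a
rough userspace sketch, not kernel code. It assumes C11 atomics as
stand-ins for the kernel primitives: an acquire load of kpfn (stronger
than the address dependency plus smp_read_barrier_depends() under
discussion), a release fence for smp_wmb(), and an acquire fence for
smp_rmb(). Every model_* name is hypothetical, and the release store of
kpfn is only this model's way of expressing "newpage->mapping was set
in advance".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct page and struct stable_node. */
struct model_page {
	_Atomic(void *) mapping;
};

struct model_node {
	_Atomic(unsigned long) kpfn;
};

#define MODEL_NR_PAGES 4
static struct model_page model_mem[MODEL_NR_PAGES];

static struct model_page *model_pfn_to_page(unsigned long pfn)
{
	return &model_mem[pfn];
}

/* Writer side, loosely following the ksm_migrate_page() hunk. */
static void model_migrate(struct model_node *node,
			  unsigned long old_pfn, unsigned long new_pfn)
{
	/* The new page's mapping is set up before kpfn is switched. */
	atomic_store_explicit(&model_pfn_to_page(new_pfn)->mapping,
			      (void *)node, memory_order_relaxed);
	/* Assumption of this model: publish kpfn with release semantics. */
	atomic_store_explicit(&node->kpfn, new_pfn, memory_order_release);
	/* Stand-in for smp_wmb(): kpfn store ordered before the clear below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&model_pfn_to_page(old_pfn)->mapping,
			      NULL, memory_order_relaxed);
}

/* Reader side: returns the page, or NULL if the node looks stale. */
static struct model_page *model_get_page(struct model_node *node)
{
	for (;;) {
		/* Acquire load: at least as strong as a dependency barrier. */
		unsigned long kpfn = atomic_load_explicit(&node->kpfn,
							  memory_order_acquire);
		struct model_page *page = model_pfn_to_page(kpfn);

		if (atomic_load_explicit(&page->mapping,
					 memory_order_relaxed) == (void *)node)
			return page;

		/* Stand-in for smp_rmb(), pairing with the writer's fence. */
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&node->kpfn,
					 memory_order_relaxed) == kpfn)
			return NULL;	/* kpfn unchanged: treat node as stale */
		/* kpfn changed under us: a migration completed, so retry. */
	}
}
```

In this model, a reader that sees the old page's mapping cleared is
guaranteed by the fence pairing to see the new kpfn on its recheck, so it
retries instead of discarding the node. The worry described above is the
opposite failure: reading an out-of-date mapping from the new page while
kpfn still matches.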
>
> > +	if (ACCESS_ONCE(page->mapping) != expected_mapping)
> >  		goto stale;
> > -	if (page->mapping != expected_mapping) {
> > +
> > +	/*
> > +	 * We cannot do anything with the page while its refcount is 0.
> > +	 * Usually 0 means free, or tail of a higher-order page: in which
> > +	 * case this node is no longer referenced, and should be freed;
> > +	 * however, it might mean that the page is under page_freeze_refs().
> > +	 * The __remove_mapping() case is easy, again the node is now stale;
> > +	 * but if page is swapcache in migrate_page_move_mapping(), it might
> > +	 * still be our page, in which case it's essential to keep the node.
> > +	 */
> > +	while (!get_page_unless_zero(page)) {
> > +		/*
> > +		 * Another check for page->mapping != expected_mapping would
> > +		 * work here too. We have chosen the !PageSwapCache test to
> > +		 * optimize the common case, when the page is or is about to
> > +		 * be freed: PageSwapCache is cleared (under spin_lock_irq)
> > +		 * in the freeze_refs section of __remove_mapping(); but Anon
> > +		 * page->mapping reset to NULL later, in free_pages_prepare().
> > +		 */
> > +		if (!PageSwapCache(page))
> > +			goto stale;
> > +		cpu_relax();
> > +	}
>
> The recheck of stable_node->kpfn after a barrier distinguishes between
> a free and a completed migration, that's fine. I hesitate to ask because
> it must be obvious but where is the guarantee that a KSM page is in the
> swap cache?

Certainly none at all: it's the less common case that a KSM page is in
swap cache. But if it is not in swap cache, how could its page count be
0 (causing get_page_unless_zero to fail)? By being free, or well on its
way to being freed (hence stale); or reused as part of a compound page
(hence stale also); or reused for another purpose which arrives at a
page_freeze_refs() (hence stale also); other cases?

It's hard to see from the diff, but in the original version of
get_ksm_page(), !get_page_unless_zero goes straight to stale.

Don't for a moment imagine that this function sprang fully formed
from my mind: it was hard to get it working right (the swap cache
get_page_unless_zero failure during migration really caught me out),
and then to pare it down to its fairly simple final form.
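
The refcount case analysis above can also be written as a small userspace
model, offered only as an illustration. It assumes C11 atomics; the rc_*
names are hypothetical; the freeze helper is modelled on the idea of
page_freeze_refs() swinging the count to 0; and the bounded spin (with
its RC_BUSY result) is an assumption added so the sketch terminates,
whereas the patch loops with cpu_relax() until it gets a reference or
decides the node is stale.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two page fields the loop looks at. */
struct rc_page {
	atomic_int refcount;
	atomic_bool swapcache;
};

/* Take a reference only if the count is currently non-zero. */
static bool rc_get_unless_zero(struct rc_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&page->refcount,
						 &old, old + 1))
			return true;
		/* A failed exchange reloaded 'old'; loop and retry. */
	}
	return false;
}

/* Freeze: swing the count from 'expected' to 0, if it still matches. */
static bool rc_freeze(struct rc_page *page, int expected)
{
	return atomic_compare_exchange_strong(&page->refcount,
					      &expected, 0);
}

/* Unfreeze: restore a non-zero count after the frozen section. */
static void rc_unfreeze(struct rc_page *page, int count)
{
	atomic_store(&page->refcount, count);
}

enum rc_result { RC_GOT_REF, RC_STALE, RC_BUSY };

/*
 * Count 0 and not in swap cache: free, being freed, or reused, so stale.
 * Count 0 and in swap cache: possibly frozen for migration, so wait.
 */
static enum rc_result rc_try_get(struct rc_page *page, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		if (rc_get_unless_zero(page))
			return RC_GOT_REF;
		if (!atomic_load(&page->swapcache))
			return RC_STALE;
	}
	return RC_BUSY;	/* model-only outcome: still frozen after spinning */
}
```

A frozen page outside swap cache is reported stale at once, while a frozen
swap cache page is waited for and a reference is taken once it is
unfrozen, which is the distinction the !PageSwapCache test draws.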
Hugh
>
> > +
> > +	if (ACCESS_ONCE(page->mapping) != expected_mapping) {
> >  		put_page(page);
> >  		goto stale;
> >  	}
> > +
> >  	if (locked) {
> >  		lock_page(page);
> > -		if (page->mapping != expected_mapping) {
> > +		if (ACCESS_ONCE(page->mapping) != expected_mapping) {
> >  			unlock_page(page);
> >  			put_page(page);
> >  			goto stale;
> >  		}
> >  	}
> >  	return page;
> > +
> >  stale:
> > +	/*
> > +	 * We come here from above when page->mapping or !PageSwapCache
> > +	 * suggests that the node is stale; but it might be under migration.
> > +	 * We need smp_rmb(), matching the smp_wmb() in ksm_migrate_page(),
> > +	 * before checking whether node->kpfn has been changed.
> > +	 */
> > +	smp_rmb();
> > +	if (ACCESS_ONCE(stable_node->kpfn) != kpfn)
> > +		goto again;
> >  	remove_node_from_stable_tree(stable_node);
> >  	return NULL;
> >  }
> > @@ -1903,6 +1947,14 @@ void ksm_migrate_page(struct page *newpa
> >  	if (stable_node) {
> >  		VM_BUG_ON(stable_node->kpfn != page_to_pfn(oldpage));
> >  		stable_node->kpfn = page_to_pfn(newpage);
> > +		/*
> > +		 * newpage->mapping was set in advance; now we need smp_wmb()
> > +		 * to make sure that the new stable_node->kpfn is visible
> > +		 * to get_ksm_page() before it can see that oldpage->mapping
> > +		 * has gone stale (or that PageSwapCache has been cleared).
> > +		 */
> > +		smp_wmb();
> > +		set_page_stable_node(oldpage, NULL);
> >  	}
> >  }
> >  #endif /* CONFIG_MIGRATION */