From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Petr Holasek <pholasek@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Izik Eidus <izik.eidus@ravellosystems.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 6/11] ksm: remove old stable nodes more thoroughly
Date: Fri, 25 Jan 2013 18:01:59 -0800 (PST) [thread overview]
Message-ID: <alpine.LNX.2.00.1301251800550.29196@eggly.anvils> (raw)
In-Reply-To: <alpine.LNX.2.00.1301251747590.29196@eggly.anvils>
Switching merge_across_nodes after running KSM is liable to oops on stale
nodes still left over from the previous stable tree. It's not something
that people will often want to do, but it would be lame to demand a reboot
when they're trying to determine which merge_across_nodes setting is best.
How can this happen? We only permit switching merge_across_nodes when
pages_shared is 0, and usually set run 2 to force that beforehand, which
ought to unmerge everything: yet oopses still occur when you then run 1.
Three causes:
1. The old stable tree (built according to the inverse merge_across_nodes)
has not been fully torn down. A stable node lingers until get_ksm_page()
notices that the page it references no longer references it: but the page
is not necessarily freed as soon as expected, particularly when swapcache.
Fix this with a pass through the old stable tree, applying get_ksm_page()
to each of the remaining nodes (most found stale and removed immediately),
with forced removal of any left over. Unless the page is still mapped:
I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
and EBUSY than BUG.
2. __ksm_enter() has a nice little optimization, to insert the new mm
just behind ksmd's cursor, so there's a full pass for it to stabilize
(or be removed) before ksmd addresses it. Nice when ksmd is running,
but not so nice when we're trying to unmerge all mms: we were missing
those mms forked and inserted behind the unmerge cursor. Easily fixed
by inserting at the end when KSM_RUN_UNMERGE.
3. It is possible for a KSM page to be faulted back from swapcache into
an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
A long outstanding, unrelated bugfix sneaks in with that third fix:
ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
I/O error when read in from swap) to a page which it then marks Uptodate.
Fix this case by not copying, letting do_swap_page() discover the error.
Signed-off-by: Hugh Dickins <hughd@google.com>
---
include/linux/ksm.h | 18 ++-------
mm/ksm.c | 83 +++++++++++++++++++++++++++++++++++++++---
mm/memory.c | 19 ++++-----
3 files changed, 92 insertions(+), 28 deletions(-)
--- mmotm.orig/include/linux/ksm.h 2013-01-25 14:27:58.220193250 -0800
+++ mmotm/include/linux/ksm.h 2013-01-25 14:37:00.764206145 -0800
@@ -16,9 +16,6 @@
struct stable_node;
struct mem_cgroup;
-struct page *ksm_does_need_to_copy(struct page *page,
- struct vm_area_struct *vma, unsigned long address);
-
#ifdef CONFIG_KSM
int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
unsigned long end, int advice, unsigned long *vm_flags);
@@ -73,15 +70,8 @@ static inline void set_page_stable_node(
* We'd like to make this conditional on vma->vm_flags & VM_MERGEABLE,
* but what if the vma was unmerged while the page was swapped out?
*/
-static inline int ksm_might_need_to_copy(struct page *page,
- struct vm_area_struct *vma, unsigned long address)
-{
- struct anon_vma *anon_vma = page_anon_vma(page);
-
- return anon_vma &&
- (anon_vma->root != vma->anon_vma->root ||
- page->index != linear_page_index(vma, address));
-}
+struct page *ksm_might_need_to_copy(struct page *page,
+ struct vm_area_struct *vma, unsigned long address);
int page_referenced_ksm(struct page *page,
struct mem_cgroup *memcg, unsigned long *vm_flags);
@@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
return 0;
}
-static inline int ksm_might_need_to_copy(struct page *page,
+static inline struct page *ksm_might_need_to_copy(struct page *page,
struct vm_area_struct *vma, unsigned long address)
{
- return 0;
+ return page;
}
static inline int page_referenced_ksm(struct page *page,
--- mmotm.orig/mm/ksm.c 2013-01-25 14:36:58.856206099 -0800
+++ mmotm/mm/ksm.c 2013-01-25 14:37:00.768206145 -0800
@@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
/*
* Only called through the sysfs control interface:
*/
+static int remove_stable_node(struct stable_node *stable_node)
+{
+ struct page *page;
+ int err;
+
+ page = get_ksm_page(stable_node, true);
+ if (!page) {
+ /*
+ * get_ksm_page did remove_node_from_stable_tree itself.
+ */
+ return 0;
+ }
+
+ if (WARN_ON_ONCE(page_mapped(page)))
+ err = -EBUSY;
+ else {
+ /*
+ * This page might be in a pagevec waiting to be freed,
+ * or it might be PageSwapCache (perhaps under writeback),
+ * or it might have been removed from swapcache a moment ago.
+ */
+ set_page_stable_node(page, NULL);
+ remove_node_from_stable_tree(stable_node);
+ err = 0;
+ }
+
+ unlock_page(page);
+ put_page(page);
+ return err;
+}
+
+static int remove_all_stable_nodes(void)
+{
+ struct stable_node *stable_node;
+ int nid;
+ int err = 0;
+
+ for (nid = 0; nid < nr_node_ids; nid++) {
+ while (root_stable_tree[nid].rb_node) {
+ stable_node = rb_entry(root_stable_tree[nid].rb_node,
+ struct stable_node, node);
+ if (remove_stable_node(stable_node)) {
+ err = -EBUSY;
+ break; /* proceed to next nid */
+ }
+ cond_resched();
+ }
+ }
+ return err;
+}
+
static int unmerge_and_remove_all_rmap_items(void)
{
struct mm_slot *mm_slot;
@@ -691,6 +742,8 @@ static int unmerge_and_remove_all_rmap_i
}
}
+ /* Clean up stable nodes, but don't worry if some are still busy */
+ remove_all_stable_nodes();
ksm_scan.seqnr = 0;
return 0;
@@ -1586,11 +1639,19 @@ int __ksm_enter(struct mm_struct *mm)
spin_lock(&ksm_mmlist_lock);
insert_to_mm_slots_hash(mm, mm_slot);
/*
- * Insert just behind the scanning cursor, to let the area settle
+ * When KSM_RUN_MERGE (or KSM_RUN_STOP),
+ * insert just behind the scanning cursor, to let the area settle
* down a little; when fork is followed by immediate exec, we don't
* want ksmd to waste time setting up and tearing down an rmap_list.
+ *
+ * But when KSM_RUN_UNMERGE, it's important to insert ahead of its
+ * scanning cursor, otherwise KSM pages in newly forked mms will be
+ * missed: then we might as well insert at the end of the list.
*/
- list_add_tail(&mm_slot->mm_list, &ksm_scan.mm_slot->mm_list);
+ if (ksm_run & KSM_RUN_UNMERGE)
+ list_add_tail(&mm_slot->mm_list, &ksm_mm_head.mm_list);
+ else
+ list_add_tail(&mm_slot->mm_list, &ksm_scan.mm_slot->mm_list);
spin_unlock(&ksm_mmlist_lock);
set_bit(MMF_VM_MERGEABLE, &mm->flags);
@@ -1640,11 +1701,25 @@ void __ksm_exit(struct mm_struct *mm)
}
}
-struct page *ksm_does_need_to_copy(struct page *page,
+struct page *ksm_might_need_to_copy(struct page *page,
struct vm_area_struct *vma, unsigned long address)
{
+ struct anon_vma *anon_vma = page_anon_vma(page);
struct page *new_page;
+ if (PageKsm(page)) {
+ if (page_stable_node(page) &&
+ !(ksm_run & KSM_RUN_UNMERGE))
+ return page; /* no need to copy it */
+ } else if (!anon_vma) {
+ return page; /* no need to copy it */
+ } else if (anon_vma->root == vma->anon_vma->root &&
+ page->index == linear_page_index(vma, address)) {
+ return page; /* still no need to copy it */
+ }
+ if (!PageUptodate(page))
+ return page; /* let do_swap_page report the error */
+
new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
if (new_page) {
copy_user_highpage(new_page, page, address, vma);
@@ -2024,7 +2099,7 @@ static ssize_t merge_across_nodes_store(
mutex_lock(&ksm_thread_mutex);
if (ksm_merge_across_nodes != knob) {
- if (ksm_pages_shared)
+ if (ksm_pages_shared || remove_all_stable_nodes())
err = -EBUSY;
else
ksm_merge_across_nodes = knob;
--- mmotm.orig/mm/memory.c 2013-01-25 14:27:58.220193250 -0800
+++ mmotm/mm/memory.c 2013-01-25 14:37:00.768206145 -0800
@@ -2994,17 +2994,16 @@ static int do_swap_page(struct mm_struct
if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
goto out_page;
- if (ksm_might_need_to_copy(page, vma, address)) {
- swapcache = page;
- page = ksm_does_need_to_copy(page, vma, address);
-
- if (unlikely(!page)) {
- ret = VM_FAULT_OOM;
- page = swapcache;
- swapcache = NULL;
- goto out_page;
- }
+ swapcache = page;
+ page = ksm_might_need_to_copy(page, vma, address);
+ if (unlikely(!page)) {
+ ret = VM_FAULT_OOM;
+ page = swapcache;
+ swapcache = NULL;
+ goto out_page;
}
+ if (page == swapcache)
+ swapcache = NULL;
if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &ptr)) {
ret = VM_FAULT_OOM;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-01-26 2:01 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-26 1:53 [PATCH 0/11] ksm: NUMA trees and page migration Hugh Dickins
2013-01-26 1:54 ` [PATCH 1/11] ksm: allow trees per NUMA node Hugh Dickins
2013-01-27 1:14 ` Simon Jeons
2013-01-27 2:54 ` Hugh Dickins
2013-01-27 3:16 ` Simon Jeons
2013-01-27 21:55 ` Hugh Dickins
2013-01-28 23:03 ` Andrew Morton
2013-01-29 1:17 ` Hugh Dickins
2013-01-28 23:08 ` Andrew Morton
2013-01-29 1:38 ` Hugh Dickins
2013-02-05 16:41 ` Mel Gorman
2013-02-07 23:57 ` Hugh Dickins
2013-01-26 1:56 ` [PATCH 2/11] ksm: add sysfs ABI Documentation Hugh Dickins
2013-01-26 1:58 ` [PATCH 3/11] ksm: trivial tidyups Hugh Dickins
2013-01-28 23:11 ` Andrew Morton
2013-01-29 1:44 ` Hugh Dickins
2013-01-26 1:59 ` [PATCH 4/11] ksm: reorganize ksm_check_stable_tree Hugh Dickins
2013-02-05 16:48 ` Mel Gorman
2013-02-08 0:07 ` Hugh Dickins
2013-02-14 11:30 ` Mel Gorman
2013-01-26 2:00 ` [PATCH 5/11] ksm: get_ksm_page locked Hugh Dickins
2013-01-27 2:36 ` Simon Jeons
2013-01-27 22:08 ` Hugh Dickins
2013-01-28 0:36 ` Simon Jeons
2013-01-28 3:35 ` Hugh Dickins
2013-01-27 2:48 ` Simon Jeons
2013-01-27 22:10 ` Hugh Dickins
2013-02-05 17:18 ` Mel Gorman
2013-02-08 0:33 ` Hugh Dickins
2013-02-14 11:34 ` Mel Gorman
2013-01-26 2:01 ` Hugh Dickins [this message]
2013-01-27 4:55 ` [PATCH 6/11] ksm: remove old stable nodes more thoroughly Simon Jeons
2013-01-27 23:05 ` Hugh Dickins
2013-01-28 1:42 ` Simon Jeons
2013-01-28 4:14 ` Hugh Dickins
2013-01-28 2:12 ` Simon Jeons
2013-01-28 4:19 ` Hugh Dickins
2013-01-28 6:36 ` Simon Jeons
2013-01-28 23:44 ` Andrew Morton
2013-01-29 2:03 ` Hugh Dickins
2013-02-05 17:55 ` Mel Gorman
2013-02-08 19:33 ` Hugh Dickins
2013-02-14 11:58 ` Mel Gorman
2013-02-14 22:19 ` Hugh Dickins
2013-01-26 2:03 ` [PATCH 7/11] ksm: make KSM page migration possible Hugh Dickins
2013-01-27 5:47 ` Simon Jeons
2013-01-27 23:12 ` Hugh Dickins
2013-01-28 0:41 ` Simon Jeons
2013-01-28 3:44 ` Hugh Dickins
2013-02-05 19:11 ` Mel Gorman
2013-02-08 20:52 ` Hugh Dickins
2013-01-26 2:05 ` [PATCH 8/11] ksm: make !merge_across_nodes migration safe Hugh Dickins
2013-01-27 8:49 ` Simon Jeons
2013-01-27 23:25 ` Hugh Dickins
2013-01-28 3:44 ` Simon Jeons
2013-01-26 2:06 ` [PATCH 9/11] ksm: enable KSM page migration Hugh Dickins
2013-01-26 2:07 ` [PATCH 10/11] mm: remove offlining arg to migrate_pages Hugh Dickins
2013-01-26 2:10 ` [PATCH 11/11] ksm: stop hotremove lockdep warning Hugh Dickins
2013-01-27 6:23 ` Simon Jeons
2013-01-27 23:35 ` Hugh Dickins
2013-02-08 18:45 ` Gerald Schaefer
2013-02-11 22:13 ` Hugh Dickins
2013-01-28 23:54 ` [PATCH 0/11] ksm: NUMA trees and page migration Andrew Morton
2013-01-29 0:49 ` Izik Eidus
2013-01-29 2:26 ` Izik Eidus
2013-01-29 16:51 ` Andrea Arcangeli
2013-01-31 0:05 ` Ric Mason
2013-01-29 1:07 ` Hugh Dickins
2013-01-29 10:45 ` Gleb Natapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LNX.2.00.1301251800550.29196@eggly.anvils \
--to=hughd@google.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=izik.eidus@ravellosystems.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pholasek@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox