From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Petr Holasek <pholasek@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Izik Eidus <izik.eidus@ravellosystems.com>,
Gerald Schaefer <gerald.schaefer@de.ibm.com>,
KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 11/11] ksm: stop hotremove lockdep warning
Date: Fri, 25 Jan 2013 18:10:18 -0800 (PST) [thread overview]
Message-ID: <alpine.LNX.2.00.1301251808120.29196@eggly.anvils> (raw)
In-Reply-To: <alpine.LNX.2.00.1301251747590.29196@eggly.anvils>
Complaints are rare, but lockdep still does not understand the way
ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and
holds it until the ksm_memory_callback(MEM_OFFLINE): that appears
to be a problem because notifier callbacks are made under down_read
of blocking_notifier_head->rwsem (so first the mutex is taken while
holding the rwsem, then later the rwsem is taken while still holding
the mutex); but is not in fact a problem because mem_hotplug_mutex
is held throughout the dance.
There was an attempt to fix this with mutex_lock_nested(); but if that
happened to fool lockdep two years ago, apparently it does so no longer.
I had hoped to eradicate this issue in extending KSM page migration not
to need the ksm_thread_mutex. But then realized that although the page
migration itself is safe, we do still need to lock out ksmd and other
users of get_ksm_page() while offlining memory - at some point between
MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages themselves may
vanish, and get_ksm_page()'s accesses to them become a violation.
So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE to
MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and wait_while_offlining()
checks, to achieve the same lockout without being caught by lockdep.
This is less elegant for KSM, but it's more important to keep lockdep
useful to other users - and I apologize for how long it took to fix.
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
---
mm/ksm.c | 55 +++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 41 insertions(+), 14 deletions(-)
--- mmotm.orig/mm/ksm.c 2013-01-25 14:37:06.880206290 -0800
+++ mmotm/mm/ksm.c 2013-01-25 14:38:53.984208836 -0800
@@ -226,7 +226,9 @@ static unsigned int ksm_merge_across_nod
#define KSM_RUN_STOP 0
#define KSM_RUN_MERGE 1
#define KSM_RUN_UNMERGE 2
-static unsigned int ksm_run = KSM_RUN_STOP;
+#define KSM_RUN_OFFLINE 4
+static unsigned long ksm_run = KSM_RUN_STOP;
+static void wait_while_offlining(void);
static DECLARE_WAIT_QUEUE_HEAD(ksm_thread_wait);
static DEFINE_MUTEX(ksm_thread_mutex);
@@ -1700,6 +1702,7 @@ static int ksm_scan_thread(void *nothing
while (!kthread_should_stop()) {
mutex_lock(&ksm_thread_mutex);
+ wait_while_offlining();
if (ksmd_should_run())
ksm_do_scan(ksm_thread_pages_to_scan);
mutex_unlock(&ksm_thread_mutex);
@@ -2056,6 +2059,22 @@ void ksm_migrate_page(struct page *newpa
#endif /* CONFIG_MIGRATION */
#ifdef CONFIG_MEMORY_HOTREMOVE
+static int just_wait(void *word)
+{
+ schedule();
+ return 0;
+}
+
+static void wait_while_offlining(void)
+{
+ while (ksm_run & KSM_RUN_OFFLINE) {
+ mutex_unlock(&ksm_thread_mutex);
+ wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE),
+ just_wait, TASK_UNINTERRUPTIBLE);
+ mutex_lock(&ksm_thread_mutex);
+ }
+}
+
static void ksm_check_stable_tree(unsigned long start_pfn,
unsigned long end_pfn)
{
@@ -2098,15 +2117,15 @@ static int ksm_memory_callback(struct no
switch (action) {
case MEM_GOING_OFFLINE:
/*
- * Keep it very simple for now: just lock out ksmd and
- * MADV_UNMERGEABLE while any memory is going offline.
- * mutex_lock_nested() is necessary because lockdep was alarmed
- * that here we take ksm_thread_mutex inside notifier chain
- * mutex, and later take notifier chain mutex inside
- * ksm_thread_mutex to unlock it. But that's safe because both
- * are inside mem_hotplug_mutex.
+ * Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items()
+ * and remove_all_stable_nodes() while memory is going offline:
+ * it is unsafe for them to touch the stable tree at this time.
+ * But unmerge_ksm_pages(), rmap lookups and other entry points
+ * which do not need the ksm_thread_mutex are all safe.
*/
- mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
+ mutex_lock(&ksm_thread_mutex);
+ ksm_run |= KSM_RUN_OFFLINE;
+ mutex_unlock(&ksm_thread_mutex);
break;
case MEM_OFFLINE:
@@ -2122,11 +2141,20 @@ static int ksm_memory_callback(struct no
/* fallthrough */
case MEM_CANCEL_OFFLINE:
+ mutex_lock(&ksm_thread_mutex);
+ ksm_run &= ~KSM_RUN_OFFLINE;
mutex_unlock(&ksm_thread_mutex);
+
+ smp_mb(); /* wake_up_bit advises this */
+ wake_up_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE));
break;
}
return NOTIFY_OK;
}
+#else
+static void wait_while_offlining(void)
+{
+}
#endif /* CONFIG_MEMORY_HOTREMOVE */
#ifdef CONFIG_SYSFS
@@ -2189,7 +2217,7 @@ KSM_ATTR(pages_to_scan);
static ssize_t run_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
- return sprintf(buf, "%u\n", ksm_run);
+ return sprintf(buf, "%lu\n", ksm_run);
}
static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr,
@@ -2212,6 +2240,7 @@ static ssize_t run_store(struct kobject
*/
mutex_lock(&ksm_thread_mutex);
+ wait_while_offlining();
if (ksm_run != flags) {
ksm_run = flags;
if (flags & KSM_RUN_UNMERGE) {
@@ -2254,6 +2283,7 @@ static ssize_t merge_across_nodes_store(
return -EINVAL;
mutex_lock(&ksm_thread_mutex);
+ wait_while_offlining();
if (ksm_merge_across_nodes != knob) {
if (ksm_pages_shared || remove_all_stable_nodes())
err = -EBUSY;
@@ -2366,10 +2396,7 @@ static int __init ksm_init(void)
#endif /* CONFIG_SYSFS */
#ifdef CONFIG_MEMORY_HOTREMOVE
- /*
- * Choose a high priority since the callback takes ksm_thread_mutex:
- * later callbacks could only be taking locks which nest within that.
- */
+ /* There is no significance to this priority 100 */
hotplug_memory_notifier(ksm_memory_callback, 100);
#endif
return 0;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-01-26 2:10 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-26 1:53 [PATCH 0/11] ksm: NUMA trees and page migration Hugh Dickins
2013-01-26 1:54 ` [PATCH 1/11] ksm: allow trees per NUMA node Hugh Dickins
2013-01-27 1:14 ` Simon Jeons
2013-01-27 2:54 ` Hugh Dickins
2013-01-27 3:16 ` Simon Jeons
2013-01-27 21:55 ` Hugh Dickins
2013-01-28 23:03 ` Andrew Morton
2013-01-29 1:17 ` Hugh Dickins
2013-01-28 23:08 ` Andrew Morton
2013-01-29 1:38 ` Hugh Dickins
2013-02-05 16:41 ` Mel Gorman
2013-02-07 23:57 ` Hugh Dickins
2013-01-26 1:56 ` [PATCH 2/11] ksm: add sysfs ABI Documentation Hugh Dickins
2013-01-26 1:58 ` [PATCH 3/11] ksm: trivial tidyups Hugh Dickins
2013-01-28 23:11 ` Andrew Morton
2013-01-29 1:44 ` Hugh Dickins
2013-01-26 1:59 ` [PATCH 4/11] ksm: reorganize ksm_check_stable_tree Hugh Dickins
2013-02-05 16:48 ` Mel Gorman
2013-02-08 0:07 ` Hugh Dickins
2013-02-14 11:30 ` Mel Gorman
2013-01-26 2:00 ` [PATCH 5/11] ksm: get_ksm_page locked Hugh Dickins
2013-01-27 2:36 ` Simon Jeons
2013-01-27 22:08 ` Hugh Dickins
2013-01-28 0:36 ` Simon Jeons
2013-01-28 3:35 ` Hugh Dickins
2013-01-27 2:48 ` Simon Jeons
2013-01-27 22:10 ` Hugh Dickins
2013-02-05 17:18 ` Mel Gorman
2013-02-08 0:33 ` Hugh Dickins
2013-02-14 11:34 ` Mel Gorman
2013-01-26 2:01 ` [PATCH 6/11] ksm: remove old stable nodes more thoroughly Hugh Dickins
2013-01-27 4:55 ` Simon Jeons
2013-01-27 23:05 ` Hugh Dickins
2013-01-28 1:42 ` Simon Jeons
2013-01-28 4:14 ` Hugh Dickins
2013-01-28 2:12 ` Simon Jeons
2013-01-28 4:19 ` Hugh Dickins
2013-01-28 6:36 ` Simon Jeons
2013-01-28 23:44 ` Andrew Morton
2013-01-29 2:03 ` Hugh Dickins
2013-02-05 17:55 ` Mel Gorman
2013-02-08 19:33 ` Hugh Dickins
2013-02-14 11:58 ` Mel Gorman
2013-02-14 22:19 ` Hugh Dickins
2013-01-26 2:03 ` [PATCH 7/11] ksm: make KSM page migration possible Hugh Dickins
2013-01-27 5:47 ` Simon Jeons
2013-01-27 23:12 ` Hugh Dickins
2013-01-28 0:41 ` Simon Jeons
2013-01-28 3:44 ` Hugh Dickins
2013-02-05 19:11 ` Mel Gorman
2013-02-08 20:52 ` Hugh Dickins
2013-01-26 2:05 ` [PATCH 8/11] ksm: make !merge_across_nodes migration safe Hugh Dickins
2013-01-27 8:49 ` Simon Jeons
2013-01-27 23:25 ` Hugh Dickins
2013-01-28 3:44 ` Simon Jeons
2013-01-26 2:06 ` [PATCH 9/11] ksm: enable KSM page migration Hugh Dickins
2013-01-26 2:07 ` [PATCH 10/11] mm: remove offlining arg to migrate_pages Hugh Dickins
2013-01-26 2:10 ` Hugh Dickins [this message]
2013-01-27 6:23 ` [PATCH 11/11] ksm: stop hotremove lockdep warning Simon Jeons
2013-01-27 23:35 ` Hugh Dickins
2013-02-08 18:45 ` Gerald Schaefer
2013-02-11 22:13 ` Hugh Dickins
2013-01-28 23:54 ` [PATCH 0/11] ksm: NUMA trees and page migration Andrew Morton
2013-01-29 0:49 ` Izik Eidus
2013-01-29 2:26 ` Izik Eidus
2013-01-29 16:51 ` Andrea Arcangeli
2013-01-31 0:05 ` Ric Mason
2013-01-29 1:07 ` Hugh Dickins
2013-01-29 10:45 ` Gleb Natapov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LNX.2.00.1301251808120.29196@eggly.anvils \
--to=hughd@google.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=gerald.schaefer@de.ibm.com \
--cc=izik.eidus@ravellosystems.com \
--cc=kosaki.motohiro@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pholasek@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox