linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, bugzilla-daemon@bugzilla.kernel.org,
	bugme-daemon@bugzilla.kernel.org, qcui@redhat.com,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Li Zefan <lizf@cn.fujitsu.com>, Mel Gorman <mgorman@suse.de>
Subject: Re: [Bugme-new] [Bug 36192] New: Kernel panic when boot the 2.6.39+ kernel based off of 2.6.32 kernel
Date: Tue, 7 Jun 2011 09:51:31 +0200	[thread overview]
Message-ID: <20110607075131.GB22234@cmpxchg.org> (raw)
In-Reply-To: <20110607095708.6097689a.kamezawa.hiroyu@jp.fujitsu.com>

On Tue, Jun 07, 2011 at 09:57:08AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 6 Jun 2011 14:45:19 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > Hopefully he can test this one for us as well, thanks.
> > 
> 
> A  patch with better description (of mine) is here.
> Anyway, I felt I needed a fix for ARM special case.

It's a different issue that warrants a separate patch, I think.

> fix-init-page_cgroup-for-sparsemem-taking-care-of-broken-page-flags.patch
> Even with SPARSEMEM, there are some magical memmap.
> 
> If a Node is not aligned to SECTION, memmap of pfn which is out of
> Node's range is not initialized. And page->flags contains 0.
> 
> If Node(0) doesn't exist, NODE_DATA(pfn_to_nid(pfn)) causes error.
> 
> In another case, for example, ARM frees memmap which is never be used
> even under SPARSEMEM. In that case, page->flags will contain broken
> value.
> 
> This patch does a strict check on nid which is obtained by
> pfn_to_page() and use proper NID for page_cgroup allocation.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> ---
>  mm/page_cgroup.c |   36 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
> 
> Index: linux-3.0-rc1/mm/page_cgroup.c
> ===================================================================
> --- linux-3.0-rc1.orig/mm/page_cgroup.c
> +++ linux-3.0-rc1/mm/page_cgroup.c
> @@ -168,6 +168,7 @@ static int __meminit init_section_page_c
>  	struct mem_section *section;
>  	unsigned long table_size;
>  	unsigned long nr;
> +	unsigned long tmp;
>  	int nid, index;
>  
>  	nr = pfn_to_section_nr(pfn);
> @@ -175,8 +176,41 @@ static int __meminit init_section_page_c
>  
>  	if (section->page_cgroup)
>  		return 0;
> +	/*
> +	 * check Node-ID. Because we get 'pfn' which is obtained by calculation,
> +	 * the pfn may "not exist" or "alreay freed". Even if pfn_valid() returns
> +	 * true, page->flags may contain broken value and pfn_to_nid() returns
> +	 * bad value.
> +	 * (See CONFIG_ARCH_HAS_HOLES_MEMORYMODEL and ARM's free_memmap())
> +	 * So, we need to do careful check, here.
> +	 */
> +	for (tmp = pfn;
> +	     tmp < pfn + PAGES_PER_SECTION;
> +	     tmp += MAX_ORDER_NR_PAGES, nid = -1) {
> +		struct page *page;
> +
> +		if (!pfn_valid(tmp))
> +			continue;
> +
> +		page = pfn_to_page(tmp);
> +		nid = page_to_nid(page);
>  
> -	nid = page_to_nid(pfn_to_page(pfn));
> +		/*
> +		 * If 'page' isn't initialized or freed, it may contains broken
> +		 * information.
> +		 */
> +		if (!node_state(nid, N_NORMAL_MEMORY))
> +			continue;
> +		if (page_to_pfn(pfn_to_page(tmp)) != tmp)
> +			continue;

This looks quite elaborate just to figure out the node id.

Here is what I wrote before I went with the sparsemem model fix (with
a modified changelog, because it also fixed the off-node range
problem).

It just iterates nodes in the first place, so the node id is never a
question.  The memory hotplug callback still relies on pfn_to_nid()
but ARM, at least as of now, does not support hotplug anyway.

What do you think?

---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: [patch] page_cgroup: do not rely on memmap outside of node ranges

On ARM, memmap for present sections may partially be released.

As a consequence of this, a PFN walker like the page_cgroup array
allocator may not rely on the struct page that corresponds to a
pfn_present() PFN before it has been validated with something like
memmap_valid_within().

However, since this code only requires the node ID from the PFN range,
this patch changes it from a pure PFN walker to a PFN in nodes walker.
The node ID information is then inherently available through the node
that is currently being walked.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_cgroup.c |   41 +++++++++++++++++++++++++----------------
 1 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 74ccff6..46b6814 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -162,13 +162,13 @@ static void free_page_cgroup(void *addr)
 }
 #endif
 
-static int __meminit init_section_page_cgroup(unsigned long pfn)
+static int __meminit init_section_page_cgroup(int nid, unsigned long pfn)
 {
 	struct page_cgroup *base, *pc;
 	struct mem_section *section;
 	unsigned long table_size;
 	unsigned long nr;
-	int nid, index;
+	int index;
 
 	nr = pfn_to_section_nr(pfn);
 	section = __nr_to_section(nr);
@@ -176,7 +176,6 @@ static int __meminit init_section_page_cgroup(unsigned long pfn)
 	if (section->page_cgroup)
 		return 0;
 
-	nid = page_to_nid(pfn_to_page(pfn));
 	table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;
 	base = alloc_page_cgroup(table_size, nid);
 
@@ -222,13 +221,16 @@ int __meminit online_page_cgroup(unsigned long start_pfn,
 	unsigned long start, end, pfn;
 	int fail = 0;
 
+	if (nid < 0)
+		nid = pfn_to_nid(start_pfn);
+
 	start = start_pfn & ~(PAGES_PER_SECTION - 1);
 	end = ALIGN(start_pfn + nr_pages, PAGES_PER_SECTION);
 
 	for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION) {
 		if (!pfn_present(pfn))
 			continue;
-		fail = init_section_page_cgroup(pfn);
+		fail = init_section_page_cgroup(nid, pfn);
 	}
 	if (!fail)
 		return 0;
@@ -283,23 +285,30 @@ static int __meminit page_cgroup_callback(struct notifier_block *self,
 
 void __init page_cgroup_init(void)
 {
-	unsigned long pfn;
-	int fail = 0;
+	pg_data_t *pgdat;
 
 	if (mem_cgroup_disabled())
 		return;
 
-	for (pfn = 0; !fail && pfn < max_pfn; pfn += PAGES_PER_SECTION) {
-		if (!pfn_present(pfn))
-			continue;
-		fail = init_section_page_cgroup(pfn);
-	}
-	if (fail) {
-		printk(KERN_CRIT "try 'cgroup_disable=memory' boot option\n");
-		panic("Out of memory");
-	} else {
-		hotplug_memory_notifier(page_cgroup_callback, 0);
+	for_each_online_pgdat(pgdat) {
+		unsigned long start;
+		unsigned long end;
+		unsigned long pfn;
+
+		start = pgdat->node_start_pfn & ~(PAGES_PER_SECTION - 1);
+		end = ALIGN(pgdat->node_start_pfn + pgdat->node_spanned_pages,
+			    PAGES_PER_SECTION);
+		for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
+			if (!pfn_present(pfn))
+				continue;
+			if (!init_section_page_cgroup(pgdat->node_id, pfn))
+				continue;
+			printk(KERN_CRIT
+			       "try 'cgroup_disable=memory' boot option\n");
+			panic("Out of memory");
+		}
 	}
+	hotplug_memory_notifier(page_cgroup_callback, 0);
 	printk(KERN_INFO "allocated %ld bytes of page_cgroup\n", total_usage);
 	printk(KERN_INFO "please try 'cgroup_disable=memory' option if you don't"
 	" want memory cgroups\n");
-- 
1.7.5.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2011-06-07  7:51 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <bug-36192-10286@https.bugzilla.kernel.org/>
2011-05-30  6:19 ` Andrew Morton
2011-05-30  7:01   ` KAMEZAWA Hiroyuki
2011-05-30  7:12     ` Minchan Kim
2011-05-30  7:29     ` KAMEZAWA Hiroyuki
2011-05-30  7:54       ` KAMEZAWA Hiroyuki
2011-05-30  8:51         ` KAMEZAWA Hiroyuki
2011-06-06 12:54           ` Johannes Weiner
2011-06-06 21:45             ` Andrew Morton
2011-06-06 23:45               ` KAMEZAWA Hiroyuki
2011-06-07  8:45                 ` Mel Gorman
2011-06-07  8:43                   ` KAMEZAWA Hiroyuki
2011-06-07  9:09                     ` Mel Gorman
2011-06-07  9:33                       ` KAMEZAWA Hiroyuki
2011-06-07 10:18                         ` Mel Gorman
2011-06-07 23:40                           ` KAMEZAWA Hiroyuki
2011-06-08  0:42                             ` KAMEZAWA Hiroyuki
2011-06-08  7:43                               ` Mel Gorman
2011-06-08  8:45                                 ` KAMEZAWA Hiroyuki
2011-06-08  9:03                                   ` Mel Gorman
2011-06-08 10:15                                   ` Johannes Weiner
2011-06-09  1:04                                     ` KAMEZAWA Hiroyuki
2011-06-09  1:42                                       ` [PATCH] [BUGFIX] Avoid getting nid from invalid struct page at page_cgroup allocation (as " KAMEZAWA Hiroyuki
2011-06-07  0:57               ` KAMEZAWA Hiroyuki
2011-06-07  7:51                 ` Johannes Weiner [this message]
2011-06-07  7:55                   ` KAMEZAWA Hiroyuki
2011-06-07 10:26                     ` Johannes Weiner
2011-06-07 23:45                       ` KAMEZAWA Hiroyuki
2011-06-08  9:33                         ` Johannes Weiner
2011-06-07  9:03                 ` Mel Gorman
2011-06-07  9:06                   ` KAMEZAWA Hiroyuki
2011-06-07 10:13                     ` Mel Gorman
2011-06-07  8:37             ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110607075131.GB22234@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=bugme-daemon@bugzilla.kernel.org \
    --cc=bugzilla-daemon@bugzilla.kernel.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=mgorman@suse.de \
    --cc=nishimura@mxp.nes.nec.co.jp \
    --cc=qcui@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox