linux-mm.kvack.org archive mirror

* [PATCH 0/8] badpage: more resilient bad page pte and rmap
@ 2008-12-01  0:37 Hugh Dickins
  2008-12-01  0:40 ` [PATCH 1/8] badpage: simplify page_alloc flag check+clear Hugh Dickins
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Hugh Dickins @ 2008-12-01  0:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Manfred Spraul, Nick Piggin, Dave Jones, Arjan van de Ven,
	linux-kernel, linux-mm

Here's a batch of 8 mm patches, intended for 2.6.29: revisiting
bad_page() and print_bad_pte() and the page_remove_rmap() Eeek BUG.
Diffed to slot into the mmotm series just before "mmend".

The only clash with later mmotm patches is with Manfred's
mm-debug-dump-pageframes-on-bad_page.patch
which puts a hexdump in there.  Trivial to fix up, but I've never
actually found that patch helpful - perhaps because it isn't an -mm
tree that "Bad page state" reporters are running.  Time to drop it?

 include/linux/page-flags.h |   25 ++------
 include/linux/rmap.h       |    2 
 include/linux/swap.h       |   12 ---
 mm/filemap_xip.c           |    2 
 mm/fremap.c                |    2 
 mm/internal.h              |    1 
 mm/memory.c                |  109 ++++++++++++++++++++++++++---------
 mm/page_alloc.c            |  108 +++++++++++++++++++---------------
 mm/rmap.c                  |   24 -------
 mm/swapfile.c              |    7 +-
 10 files changed, 166 insertions(+), 126 deletions(-)


* [PATCH 1/8] badpage: simplify page_alloc flag check+clear
  2008-12-01  0:37 [PATCH 0/8] badpage: more resilient bad page pte and rmap Hugh Dickins
@ 2008-12-01  0:40 ` Hugh Dickins
  2008-12-01 14:47   ` Christoph Lameter
  2008-12-01  0:41 ` [PATCH 2/8] badpage: keep any bad page out of circulation Hugh Dickins
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2008-12-01  0:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Russ Anderson, Nick Piggin, Dave Jones, Arjan van de Ven,
	linux-kernel, linux-mm

Simplify the PAGE_FLAGS checking and clearing when freeing and allocating
a page: check the same flags as before when freeing, clear ALL of the
flags (unless PageReserved) when freeing, and check that ALL flags are
off when allocating.
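
In outline, the invariant becomes roughly this (just a sketch of the
intent - the diff below is what actually goes in, and it keeps the
PageReserved special case for now):

	/* freeing: check the usual suspect flags, then clear them all */
	if (page->flags & PAGE_FLAGS_CHECK_AT_FREE)
		bad_page(page);
	if (!PageReserved(page))
		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;	/* all flags */

	/* allocating: no flag bit at all should still be set */
	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
		bad_page(page);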

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 include/linux/page-flags.h |   25 ++++++++-----------------
 mm/page_alloc.c            |   19 ++++++-------------
 2 files changed, 14 insertions(+), 30 deletions(-)

--- badpage0/include/linux/page-flags.h	2008-11-26 12:18:59.000000000 +0000
+++ badpage1/include/linux/page-flags.h	2008-11-28 20:40:33.000000000 +0000
@@ -375,31 +375,22 @@ static inline void __ClearPageTail(struc
 #define __PG_MLOCKED		0
 #endif
 
-#define PAGE_FLAGS	(1 << PG_lru   | 1 << PG_private   | 1 << PG_locked | \
-			 1 << PG_buddy | 1 << PG_writeback | \
-			 1 << PG_slab  | 1 << PG_swapcache | 1 << PG_active | \
-			 __PG_UNEVICTABLE | __PG_MLOCKED)
-
-/*
- * Flags checked in bad_page().  Pages on the free list should not have
- * these flags set.  It they are, there is a problem.
- */
-#define PAGE_FLAGS_CLEAR_WHEN_BAD (PAGE_FLAGS | \
-		1 << PG_reclaim | 1 << PG_dirty | 1 << PG_swapbacked)
-
 /*
  * Flags checked when a page is freed.  Pages being freed should not have
  * these flags set.  It they are, there is a problem.
  */
-#define PAGE_FLAGS_CHECK_AT_FREE (PAGE_FLAGS | 1 << PG_reserved)
+#define PAGE_FLAGS_CHECK_AT_FREE \
+	(1 << PG_lru   | 1 << PG_private   | 1 << PG_locked | \
+	 1 << PG_buddy | 1 << PG_writeback | 1 << PG_reserved | \
+	 1 << PG_slab  | 1 << PG_swapcache | 1 << PG_active | \
+	 __PG_UNEVICTABLE | __PG_MLOCKED)
 
 /*
  * Flags checked when a page is prepped for return by the page allocator.
- * Pages being prepped should not have these flags set.  It they are, there
- * is a problem.
+ * Pages being prepped should not have any flags set.  It they are set,
+ * there has been a kernel bug or struct page corruption.
  */
-#define PAGE_FLAGS_CHECK_AT_PREP (PAGE_FLAGS | \
-		1 << PG_reserved | 1 << PG_dirty | 1 << PG_swapbacked)
+#define PAGE_FLAGS_CHECK_AT_PREP	((1 << NR_PAGEFLAGS) - 1)
 
 #endif /* !__GENERATING_BOUNDS_H */
 #endif	/* PAGE_FLAGS_H */
--- badpage0/mm/page_alloc.c	2008-11-26 12:19:00.000000000 +0000
+++ badpage1/mm/page_alloc.c	2008-11-28 20:40:33.000000000 +0000
@@ -231,7 +231,6 @@ static void bad_page(struct page *page)
 	printk(KERN_EMERG "Trying to fix it up, but a reboot is needed\n"
 		KERN_EMERG "Backtrace:\n");
 	dump_stack();
-	page->flags &= ~PAGE_FLAGS_CLEAR_WHEN_BAD;
 	set_page_count(page, 0);
 	reset_page_mapcount(page);
 	page->mapping = NULL;
@@ -468,16 +467,16 @@ static inline int free_pages_check(struc
 		(page_count(page) != 0)  |
 		(page->flags & PAGE_FLAGS_CHECK_AT_FREE)))
 		bad_page(page);
-	if (PageDirty(page))
-		__ClearPageDirty(page);
-	if (PageSwapBacked(page))
-		__ClearPageSwapBacked(page);
 	/*
 	 * For now, we report if PG_reserved was found set, but do not
 	 * clear it, and do not free the page.  But we shall soon need
 	 * to do more, for when the ZERO_PAGE count wraps negative.
 	 */
-	return PageReserved(page);
+	if (PageReserved(page))
+		return 1;
+	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
+		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+	return 0;
 }
 
 /*
@@ -621,13 +620,7 @@ static int prep_new_page(struct page *pa
 	if (PageReserved(page))
 		return 1;
 
-	page->flags &= ~(1 << PG_uptodate | 1 << PG_error | 1 << PG_reclaim |
-			1 << PG_referenced | 1 << PG_arch_1 |
-			1 << PG_owner_priv_1 | 1 << PG_mappedtodisk
-#ifdef CONFIG_UNEVICTABLE_LRU
-			| 1 << PG_mlocked
-#endif
-			);
+	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	set_page_private(page, 0);
 	set_page_refcounted(page);
 


* [PATCH 2/8] badpage: keep any bad page out of circulation
  2008-12-01  0:37 [PATCH 0/8] badpage: more resilient bad page pte and rmap Hugh Dickins
  2008-12-01  0:40 ` [PATCH 1/8] badpage: simplify page_alloc flag check+clear Hugh Dickins
@ 2008-12-01  0:41 ` Hugh Dickins
  2008-12-01 14:49   ` Christoph Lameter
  2008-12-01  0:42 ` [PATCH 3/8] badpage: replace page_remove_rmap Eeek and BUG Hugh Dickins
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2008-12-01  0:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Dave Jones, Arjan van de Ven, linux-kernel, linux-mm

Until now the bad_page() checkers have special-cased PageReserved, keeping
those pages out of circulation thereafter.  Now extend the special case to
all: we want to keep ANY page with bad state out of circulation - the
"free" page may well be in use by something.

Leave the bad state of those pages untouched, for examination by debuggers;
except for PageBuddy - leaving that set would risk bringing the page back.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 mm/page_alloc.c |   52 +++++++++++++++++++++-------------------------
 1 file changed, 24 insertions(+), 28 deletions(-)

--- badpage1/mm/page_alloc.c	2008-11-28 20:40:33.000000000 +0000
+++ badpage2/mm/page_alloc.c	2008-11-28 20:40:36.000000000 +0000
@@ -231,9 +231,9 @@ static void bad_page(struct page *page)
 	printk(KERN_EMERG "Trying to fix it up, but a reboot is needed\n"
 		KERN_EMERG "Backtrace:\n");
 	dump_stack();
-	set_page_count(page, 0);
-	reset_page_mapcount(page);
-	page->mapping = NULL;
+
+	/* Leave bad fields for debug, except PageBuddy could make trouble */
+	__ClearPageBuddy(page);
 	add_taint(TAINT_BAD_PAGE);
 }
 
@@ -290,25 +290,31 @@ void prep_compound_gigantic_page(struct 
 }
 #endif
 
-static void destroy_compound_page(struct page *page, unsigned long order)
+static int destroy_compound_page(struct page *page, unsigned long order)
 {
 	int i;
 	int nr_pages = 1 << order;
+	int bad = 0;
 
-	if (unlikely(compound_order(page) != order))
+	if (unlikely(compound_order(page) != order) ||
+	    unlikely(!PageHead(page))) {
 		bad_page(page);
+		bad++;
+	}
 
-	if (unlikely(!PageHead(page)))
-			bad_page(page);
 	__ClearPageHead(page);
+
 	for (i = 1; i < nr_pages; i++) {
 		struct page *p = page + i;
 
-		if (unlikely(!PageTail(p) |
-				(p->first_page != page)))
+		if (unlikely(!PageTail(p) | (p->first_page != page))) {
 			bad_page(page);
+			bad++;
+		}
 		__ClearPageTail(p);
 	}
+
+	return bad;
 }
 
 static inline void prep_zero_page(struct page *page, int order, gfp_t gfp_flags)
@@ -428,7 +434,8 @@ static inline void __free_one_page(struc
 	int migratetype = get_pageblock_migratetype(page);
 
 	if (unlikely(PageCompound(page)))
-		destroy_compound_page(page, order);
+		if (unlikely(destroy_compound_page(page, order)))
+			return;
 
 	page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
 
@@ -465,15 +472,10 @@ static inline int free_pages_check(struc
 	if (unlikely(page_mapcount(page) |
 		(page->mapping != NULL)  |
 		(page_count(page) != 0)  |
-		(page->flags & PAGE_FLAGS_CHECK_AT_FREE)))
+		(page->flags & PAGE_FLAGS_CHECK_AT_FREE))) {
 		bad_page(page);
-	/*
-	 * For now, we report if PG_reserved was found set, but do not
-	 * clear it, and do not free the page.  But we shall soon need
-	 * to do more, for when the ZERO_PAGE count wraps negative.
-	 */
-	if (PageReserved(page))
 		return 1;
+	}
 	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
 		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	return 0;
@@ -521,11 +523,11 @@ static void __free_pages_ok(struct page 
 {
 	unsigned long flags;
 	int i;
-	int reserved = 0;
+	int bad = 0;
 
 	for (i = 0 ; i < (1 << order) ; ++i)
-		reserved += free_pages_check(page + i);
-	if (reserved)
+		bad += free_pages_check(page + i);
+	if (bad)
 		return;
 
 	if (!PageHighMem(page)) {
@@ -610,17 +612,11 @@ static int prep_new_page(struct page *pa
 	if (unlikely(page_mapcount(page) |
 		(page->mapping != NULL)  |
 		(page_count(page) != 0)  |
-		(page->flags & PAGE_FLAGS_CHECK_AT_PREP)))
+		(page->flags & PAGE_FLAGS_CHECK_AT_PREP))) {
 		bad_page(page);
-
-	/*
-	 * For now, we report if PG_reserved was found set, but do not
-	 * clear it, and do not allocate the page: as a safety net.
-	 */
-	if (PageReserved(page))
 		return 1;
+	}
 
-	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	set_page_private(page, 0);
 	set_page_refcounted(page);
 


* [PATCH 3/8] badpage: replace page_remove_rmap Eeek and BUG
  2008-12-01  0:37 [PATCH 0/8] badpage: more resilient bad page pte and rmap Hugh Dickins
  2008-12-01  0:40 ` [PATCH 1/8] badpage: simplify page_alloc flag check+clear Hugh Dickins
  2008-12-01  0:41 ` [PATCH 2/8] badpage: keep any bad page out of circulation Hugh Dickins
@ 2008-12-01  0:42 ` Hugh Dickins
  2008-12-01  0:43 ` [PATCH 4/8] badpage: vm_normal_page use print_bad_pte Hugh Dickins
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Hugh Dickins @ 2008-12-01  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Dave Jones, Arjan van de Ven, linux-kernel, linux-mm

Now that bad pages are kept out of circulation, there is no need for the
infamous page_remove_rmap() BUG() - once such a page comes to be freed,
its negative mapcount will trigger a "Bad page state" message and the
page won't be freed.
Removing the BUG() allows more info, on subsequent pages, to be gathered.

We do have more info about the page at this point than bad_page() can know
- notably, what the pmd is, which might pinpoint something like low 64kB
corruption - but page_remove_rmap() isn't given the address to find that.

In practice, there is only one call to page_remove_rmap() which has ever
reported anything, that from zap_pte_range() (usually on exit, sometimes
on munmap).  It has all the info, so remove page_remove_rmap()'s "Eeek"
message and leave it all to zap_pte_range().

mm/memory.c already has a hardly used print_bad_pte() function, showing
some of the appropriate info: extend it to show what we want for the rmap
case: pte info, page info (when there is a page) and vma info to compare.
zap_pte_range() already knows the pmd, but print_bad_pte() is easier to
use if it works that out for itself.
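
So the zap path becomes, in essence (a sketch; the full diff is below):

	page_remove_rmap(page, vma);
	if (unlikely(page_mapcount(page) < 0))
		print_bad_pte(vma, addr, ptent, page);
	tlb_remove_page(tlb, page);

with print_bad_pte(vma, addr, pte, page) working the pmd out from
vma->vm_mm and addr itself, then dumping pte, pmd, page and vma details
before tainting.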

Some of this info is also shown in bad_page()'s "Bad page state" message.
Keep them separate, but adjust them to match each other as far as possible.
Say "Bad page map" in print_bad_pte(), and add a TAINT_BAD_PAGE there too.

print_bad_pte() shows current->comm unconditionally (though it should get
repeated in the usually irrelevant stack trace): sorry, I misled Nick
Piggin into making it conditional on vm_mm == current->mm, but current->mm
is already NULL in the exit case.  Usually current->comm is good, though
exceptionally it may not be that of the mm (when "swapoff" for example).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 mm/memory.c     |   50 +++++++++++++++++++++++++++++++++++-----------
 mm/page_alloc.c |   14 ++++++------
 mm/rmap.c       |   16 --------------
 3 files changed, 46 insertions(+), 34 deletions(-)

--- badpage2/mm/memory.c	2008-11-26 12:19:00.000000000 +0000
+++ badpage3/mm/memory.c	2008-11-28 20:40:40.000000000 +0000
@@ -52,6 +52,9 @@
 #include <linux/writeback.h>
 #include <linux/memcontrol.h>
 #include <linux/mmu_notifier.h>
+#include <linux/kallsyms.h>
+#include <linux/swapops.h>
+#include <linux/elf.h>
 
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -59,9 +62,6 @@
 #include <asm/tlbflush.h>
 #include <asm/pgtable.h>
 
-#include <linux/swapops.h>
-#include <linux/elf.h>
-
 #include "internal.h"
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
@@ -375,15 +375,41 @@ static inline void add_mm_rss(struct mm_
  *
  * The calling function must still handle the error.
  */
-static void print_bad_pte(struct vm_area_struct *vma, pte_t pte,
-			  unsigned long vaddr)
+static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
+			  pte_t pte, struct page *page)
 {
-	printk(KERN_ERR "Bad pte = %08llx, process = %s, "
-			"vm_flags = %lx, vaddr = %lx\n",
-		(long long)pte_val(pte),
-		(vma->vm_mm == current->mm ? current->comm : "???"),
-		vma->vm_flags, vaddr);
+	pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
+	pud_t *pud = pud_offset(pgd, addr);
+	pmd_t *pmd = pmd_offset(pud, addr);
+	struct address_space *mapping;
+	pgoff_t index;
+
+	mapping = vma->vm_file ? vma->vm_file->f_mapping : NULL;
+	index = linear_page_index(vma, addr);
+
+	printk(KERN_EMERG "Bad page map in process %s  pte:%08llx pmd:%08llx\n",
+		current->comm,
+		(long long)pte_val(pte), (long long)pmd_val(*pmd));
+	if (page) {
+		printk(KERN_EMERG
+		"page:%p flags:%p count:%d mapcount:%d mapping:%p index:%lx\n",
+		page, (void *)page->flags, page_count(page),
+		page_mapcount(page), page->mapping, page->index);
+	}
+	printk(KERN_EMERG
+		"addr:%p vm_flags:%08lx anon_vma:%p mapping:%p index:%lx\n",
+		(void *)addr, vma->vm_flags, vma->anon_vma, mapping, index);
+	/*
+	 * Choose text because data symbols depend on CONFIG_KALLSYMS_ALL=y
+	 */
+	if (vma->vm_ops)
+		print_symbol(KERN_EMERG "vma->vm_ops->fault: %s\n",
+				(unsigned long)vma->vm_ops->fault);
+	if (vma->vm_file && vma->vm_file->f_op)
+		print_symbol(KERN_EMERG "vma->vm_file->f_op->mmap: %s\n",
+				(unsigned long)vma->vm_file->f_op->mmap);
 	dump_stack();
+	add_taint(TAINT_BAD_PAGE);
 }
 
 static inline int is_cow_mapping(unsigned int flags)
@@ -763,6 +789,8 @@ static unsigned long zap_pte_range(struc
 				file_rss--;
 			}
 			page_remove_rmap(page, vma);
+			if (unlikely(page_mapcount(page) < 0))
+				print_bad_pte(vma, addr, ptent, page);
 			tlb_remove_page(tlb, page);
 			continue;
 		}
@@ -2657,7 +2685,7 @@ static int do_nonlinear_fault(struct mm_
 		/*
 		 * Page table corrupted: show pte and kill process.
 		 */
-		print_bad_pte(vma, orig_pte, address);
+		print_bad_pte(vma, address, orig_pte, NULL);
 		return VM_FAULT_OOM;
 	}
 
--- badpage2/mm/page_alloc.c	2008-11-28 20:40:36.000000000 +0000
+++ badpage3/mm/page_alloc.c	2008-11-28 20:40:40.000000000 +0000
@@ -222,14 +222,14 @@ static inline int bad_range(struct zone 
 
 static void bad_page(struct page *page)
 {
-	printk(KERN_EMERG "Bad page state in process '%s'\n" KERN_EMERG
-		"page:%p flags:0x%0*lx mapping:%p mapcount:%d count:%d\n",
-		current->comm, page, (int)(2*sizeof(unsigned long)),
-		(unsigned long)page->flags, page->mapping,
-		page_mapcount(page), page_count(page));
+	printk(KERN_EMERG "Bad page state in process %s  pfn:%05lx\n",
+		current->comm, page_to_pfn(page));
+	printk(KERN_EMERG
+		"page:%p flags:%p count:%d mapcount:%d mapping:%p index:%lx\n",
+		page, (void *)page->flags, page_count(page),
+		page_mapcount(page), page->mapping, page->index);
+	printk(KERN_EMERG "Trying to fix it up, but a reboot is needed\n");
 
-	printk(KERN_EMERG "Trying to fix it up, but a reboot is needed\n"
-		KERN_EMERG "Backtrace:\n");
 	dump_stack();
 
 	/* Leave bad fields for debug, except PageBuddy could make trouble */
--- badpage2/mm/rmap.c	2008-11-26 12:19:00.000000000 +0000
+++ badpage3/mm/rmap.c	2008-11-28 20:40:40.000000000 +0000
@@ -47,7 +47,6 @@
 #include <linux/rmap.h>
 #include <linux/rcupdate.h>
 #include <linux/module.h>
-#include <linux/kallsyms.h>
 #include <linux/memcontrol.h>
 #include <linux/mmu_notifier.h>
 #include <linux/migrate.h>
@@ -725,21 +724,6 @@ void page_dup_rmap(struct page *page, st
 void page_remove_rmap(struct page *page, struct vm_area_struct *vma)
 {
 	if (atomic_add_negative(-1, &page->_mapcount)) {
-		if (unlikely(page_mapcount(page) < 0)) {
-			printk (KERN_EMERG "Eeek! page_mapcount(page) went negative! (%d)\n", page_mapcount(page));
-			printk (KERN_EMERG "  page pfn = %lx\n", page_to_pfn(page));
-			printk (KERN_EMERG "  page->flags = %lx\n", page->flags);
-			printk (KERN_EMERG "  page->count = %x\n", page_count(page));
-			printk (KERN_EMERG "  page->mapping = %p\n", page->mapping);
-			print_symbol (KERN_EMERG "  vma->vm_ops = %s\n", (unsigned long)vma->vm_ops);
-			if (vma->vm_ops) {
-				print_symbol (KERN_EMERG "  vma->vm_ops->fault = %s\n", (unsigned long)vma->vm_ops->fault);
-			}
-			if (vma->vm_file && vma->vm_file->f_op)
-				print_symbol (KERN_EMERG "  vma->vm_file->f_op->mmap = %s\n", (unsigned long)vma->vm_file->f_op->mmap);
-			BUG();
-		}
-
 		/*
 		 * Now that the last pte has gone, s390 must transfer dirty
 		 * flag from storage key to struct page.  We can usually skip


* [PATCH 4/8] badpage: vm_normal_page use print_bad_pte
  2008-12-01  0:37 [PATCH 0/8] badpage: more resilient bad page pte and rmap Hugh Dickins
                   ` (2 preceding siblings ...)
  2008-12-01  0:42 ` [PATCH 3/8] badpage: replace page_remove_rmap Eeek and BUG Hugh Dickins
@ 2008-12-01  0:43 ` Hugh Dickins
  2008-12-01  0:44 ` [PATCH 5/8] badpage: zap print_bad_pte on swap and file Hugh Dickins
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Hugh Dickins @ 2008-12-01  0:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Manfred Spraul, Nick Piggin, Dave Jones, Arjan van de Ven,
	linux-kernel, linux-mm

print_bad_pte() is so far being called only when zap_pte_range() finds
negative page_mapcount, or there's a fault on a pte_file where it does
not belong.  That's weak coverage when we suspect pagetable corruption.

Originally, it was called when vm_normal_page() found an invalid pfn:
but pfn_valid is expensive on some architectures and configurations, so
2.6.24 put that under CONFIG_DEBUG_VM (which doesn't help in the field),
then 2.6.26 replaced it by a VM_BUG_ON (likewise).

Reinstate the print_bad_pte() in vm_normal_page(), but use a cheaper
test than pfn_valid(): memmap_init_zone() (used in bootup and hotplug)
keeps a __read_mostly note of the highest_memmap_pfn, and vm_normal_page()
then checks the pfn against that.  We could call this pfn_plausible() or
pfn_sane(), but I doubt we'll need it elsewhere: of course it's not
reliable, but it gives much stronger pagetable validation on many boxes.

Also use print_bad_pte() when the pte_special bit is found outside a
VM_PFNMAP or VM_MIXEDMAP area, instead of VM_BUG_ON.
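
In essence (a sketch; the real checks are in the diff below):

	/* memmap_init_zone(), at bootup and memory hotplug */
	if (highest_memmap_pfn < end_pfn - 1)
		highest_memmap_pfn = end_pfn - 1;

	/* vm_normal_page(): much cheaper sanity test than pfn_valid() */
	if (unlikely(pfn > highest_memmap_pfn)) {
		print_bad_pte(vma, addr, pte, NULL);
		return NULL;
	}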

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 mm/internal.h   |    1 +
 mm/memory.c     |   20 ++++++++++----------
 mm/page_alloc.c |    4 ++++
 3 files changed, 15 insertions(+), 10 deletions(-)

--- badpage3/mm/internal.h	2008-11-10 11:27:02.000000000 +0000
+++ badpage4/mm/internal.h	2008-11-28 20:40:42.000000000 +0000
@@ -49,6 +49,7 @@ extern void putback_lru_page(struct page
 /*
  * in mm/page_alloc.c
  */
+extern unsigned long highest_memmap_pfn;
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 
 /*
--- badpage3/mm/memory.c	2008-11-28 20:40:40.000000000 +0000
+++ badpage4/mm/memory.c	2008-11-28 20:40:42.000000000 +0000
@@ -467,21 +467,18 @@ static inline int is_cow_mapping(unsigne
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 				pte_t pte)
 {
-	unsigned long pfn;
+	unsigned long pfn = pte_pfn(pte);
 
 	if (HAVE_PTE_SPECIAL) {
-		if (likely(!pte_special(pte))) {
-			VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
-			return pte_page(pte);
-		}
-		VM_BUG_ON(!(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)));
+		if (likely(!pte_special(pte)))
+			goto check_pfn;
+		if (!(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)))
+			print_bad_pte(vma, addr, pte, NULL);
 		return NULL;
 	}
 
 	/* !HAVE_PTE_SPECIAL case follows: */
 
-	pfn = pte_pfn(pte);
-
 	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
 		if (vma->vm_flags & VM_MIXEDMAP) {
 			if (!pfn_valid(pfn))
@@ -497,11 +494,14 @@ struct page *vm_normal_page(struct vm_ar
 		}
 	}
 
-	VM_BUG_ON(!pfn_valid(pfn));
+check_pfn:
+	if (unlikely(pfn > highest_memmap_pfn)) {
+		print_bad_pte(vma, addr, pte, NULL);
+		return NULL;
+	}
 
 	/*
 	 * NOTE! We still have PageReserved() pages in the page tables.
-	 *
 	 * eg. VDSO mappings can cause them to exist.
 	 */
 out:
--- badpage3/mm/page_alloc.c	2008-11-28 20:40:40.000000000 +0000
+++ badpage4/mm/page_alloc.c	2008-11-28 20:40:42.000000000 +0000
@@ -69,6 +69,7 @@ EXPORT_SYMBOL(node_states);
 
 unsigned long totalram_pages __read_mostly;
 unsigned long totalreserve_pages __read_mostly;
+unsigned long highest_memmap_pfn __read_mostly;
 int percpu_pagelist_fraction;
 
 #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
@@ -2597,6 +2598,9 @@ void __meminit memmap_init_zone(unsigned
 	unsigned long pfn;
 	struct zone *z;
 
+	if (highest_memmap_pfn < end_pfn - 1)
+		highest_memmap_pfn = end_pfn - 1;
+
 	z = &NODE_DATA(nid)->node_zones[zone];
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		/*


* [PATCH 5/8] badpage: zap print_bad_pte on swap and file
  2008-12-01  0:37 [PATCH 0/8] badpage: more resilient bad page pte and rmap Hugh Dickins
                   ` (3 preceding siblings ...)
  2008-12-01  0:43 ` [PATCH 4/8] badpage: vm_normal_page use print_bad_pte Hugh Dickins
@ 2008-12-01  0:44 ` Hugh Dickins
  2008-12-01  0:45 ` [PATCH 6/8] badpage: remove vma from page_remove_rmap Hugh Dickins
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Hugh Dickins @ 2008-12-01  0:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Dave Jones, Arjan van de Ven, linux-kernel, linux-mm

Complete zap_pte_range()'s coverage of bad pagetable entries by calling
print_bad_pte() on a pte_file in a linear vma and on a bad swap entry.
That needs free_swap_and_cache() to report back whether the entry was good:
it will also have shown one of those "swap_free" errors (but with much
less information).
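
So the !pte_present leg of zap_pte_range() ends up roughly as below
(a sketch; see the diff):

	if (pte_file(ptent)) {
		if (unlikely(!(vma->vm_flags & VM_NONLINEAR)))
			print_bad_pte(vma, addr, ptent, NULL);
	} else if (unlikely(!free_swap_and_cache(pte_to_swp_entry(ptent))))
		print_bad_pte(vma, addr, ptent, NULL);
	pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);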

Similar checks in fork's copy_one_pte()?  No, that would be more noisy
than helpful: we'll see them when parent and child exec or exit.

Where do_nonlinear_fault() calls print_bad_pte(): omit the !VM_CAN_NONLINEAR
case, which could only be a bug in sys_remap_file_pages(), not a bad pte.
VM_FAULT_OOM rather than VM_FAULT_SIGBUS?  Well, okay, that is consistent
with what happens if do_swap_page() operates on a bad swap entry; but don't
we have patches to be more careful about killing when VM_FAULT_OOM?

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 include/linux/swap.h |   12 +++---------
 mm/memory.c          |   11 +++++++----
 mm/swapfile.c        |    7 ++++---

--- badpage4/include/linux/swap.h	2008-11-26 12:19:00.000000000 +0000
+++ badpage5/include/linux/swap.h	2008-11-28 20:40:46.000000000 +0000
@@ -305,7 +305,7 @@ extern swp_entry_t get_swap_page_of_type
 extern int swap_duplicate(swp_entry_t);
 extern int valid_swaphandles(swp_entry_t, unsigned long *);
 extern void swap_free(swp_entry_t);
-extern void free_swap_and_cache(swp_entry_t);
+extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
 extern sector_t map_swap_page(struct swap_info_struct *, pgoff_t);
@@ -355,14 +355,8 @@ static inline void show_swap_cache_info(
 {
 }
 
-static inline void free_swap_and_cache(swp_entry_t swp)
-{
-}
-
-static inline int swap_duplicate(swp_entry_t swp)
-{
-	return 0;
-}
+#define free_swap_and_cache(swp)	is_migration_entry(swp)
+#define swap_duplicate(swp)		is_migration_entry(swp)
 
 static inline void swap_free(swp_entry_t swp)
 {
--- badpage4/mm/memory.c	2008-11-28 20:40:42.000000000 +0000
+++ badpage5/mm/memory.c	2008-11-28 20:40:46.000000000 +0000
@@ -800,8 +800,12 @@ static unsigned long zap_pte_range(struc
 		 */
 		if (unlikely(details))
 			continue;
-		if (!pte_file(ptent))
-			free_swap_and_cache(pte_to_swp_entry(ptent));
+		if (pte_file(ptent)) {
+			if (unlikely(!(vma->vm_flags & VM_NONLINEAR)))
+				print_bad_pte(vma, addr, ptent, NULL);
+		} else if
+		  (unlikely(!free_swap_and_cache(pte_to_swp_entry(ptent))))
+			print_bad_pte(vma, addr, ptent, NULL);
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 	} while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
 
@@ -2680,8 +2684,7 @@ static int do_nonlinear_fault(struct mm_
 	if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
 		return 0;
 
-	if (unlikely(!(vma->vm_flags & VM_NONLINEAR) ||
-			!(vma->vm_flags & VM_CAN_NONLINEAR))) {
+	if (unlikely(!(vma->vm_flags & VM_NONLINEAR))) {
 		/*
 		 * Page table corrupted: show pte and kill process.
 		 */
--- badpage4/mm/swapfile.c	2008-11-28 20:37:16.000000000 +0000
+++ badpage5/mm/swapfile.c	2008-11-28 20:40:46.000000000 +0000
@@ -571,13 +571,13 @@ int try_to_free_swap(struct page *page)
  * Free the swap entry like above, but also try to
  * free the page cache entry if it is the last user.
  */
-void free_swap_and_cache(swp_entry_t entry)
+int free_swap_and_cache(swp_entry_t entry)
 {
-	struct swap_info_struct * p;
+	struct swap_info_struct *p;
 	struct page *page = NULL;
 
 	if (is_migration_entry(entry))
-		return;
+		return 1;
 
 	p = swap_info_get(entry);
 	if (p) {
@@ -603,6 +603,7 @@ void free_swap_and_cache(swp_entry_t ent
 		unlock_page(page);
 		page_cache_release(page);
 	}
+	return p != NULL;
 }
 
 #ifdef CONFIG_HIBERNATION


* [PATCH 6/8] badpage: remove vma from page_remove_rmap
  2008-12-01  0:37 [PATCH 0/8] badpage: more resilient bad page pte and rmap Hugh Dickins
                   ` (4 preceding siblings ...)
  2008-12-01  0:44 ` [PATCH 5/8] badpage: zap print_bad_pte on swap and file Hugh Dickins
@ 2008-12-01  0:45 ` Hugh Dickins
  2008-12-01  0:46 ` [PATCH 7/8] badpage: ratelimit print_bad_pte and bad_page Hugh Dickins
  2008-12-01  0:48 ` [PATCH 8/8] badpage: KERN_ALERT BUG instead of KERN_EMERG Hugh Dickins
  7 siblings, 0 replies; 21+ messages in thread
From: Hugh Dickins @ 2008-12-01  0:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Dave Jones, Arjan van de Ven, linux-kernel, linux-mm

Remove page_remove_rmap()'s vma arg, which was only for the Eeek message.
And remove the BUG_ON(page_mapcount(page) == 0) from CONFIG_DEBUG_VM's
page_dup_rmap(): we're trying to be more resilient about that than BUGs.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 include/linux/rmap.h |    2 +-
 mm/filemap_xip.c     |    2 +-
 mm/fremap.c          |    2 +-
 mm/memory.c          |    4 ++--
 mm/rmap.c            |    8 +++-----
 5 files changed, 8 insertions(+), 10 deletions(-)

--- badpage5/include/linux/rmap.h	2008-11-26 12:18:59.000000000 +0000
+++ badpage6/include/linux/rmap.h	2008-11-28 20:40:48.000000000 +0000
@@ -69,7 +69,7 @@ void __anon_vma_link(struct vm_area_stru
 void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
 void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
 void page_add_file_rmap(struct page *);
-void page_remove_rmap(struct page *, struct vm_area_struct *);
+void page_remove_rmap(struct page *);
 
 #ifdef CONFIG_DEBUG_VM
 void page_dup_rmap(struct page *page, struct vm_area_struct *vma, unsigned long address);
--- badpage5/mm/filemap_xip.c	2008-10-09 23:13:53.000000000 +0100
+++ badpage6/mm/filemap_xip.c	2008-11-28 20:40:48.000000000 +0000
@@ -193,7 +193,7 @@ retry:
 			/* Nuke the page table entry. */
 			flush_cache_page(vma, address, pte_pfn(*pte));
 			pteval = ptep_clear_flush_notify(vma, address, pte);
-			page_remove_rmap(page, vma);
+			page_remove_rmap(page);
 			dec_mm_counter(mm, file_rss);
 			BUG_ON(pte_dirty(pteval));
 			pte_unmap_unlock(pte, ptl);
--- badpage5/mm/fremap.c	2008-10-24 09:28:26.000000000 +0100
+++ badpage6/mm/fremap.c	2008-11-28 20:40:48.000000000 +0000
@@ -37,7 +37,7 @@ static void zap_pte(struct mm_struct *mm
 		if (page) {
 			if (pte_dirty(pte))
 				set_page_dirty(page);
-			page_remove_rmap(page, vma);
+			page_remove_rmap(page);
 			page_cache_release(page);
 			update_hiwater_rss(mm);
 			dec_mm_counter(mm, file_rss);
--- badpage5/mm/memory.c	2008-11-28 20:40:46.000000000 +0000
+++ badpage6/mm/memory.c	2008-11-28 20:40:48.000000000 +0000
@@ -788,7 +788,7 @@ static unsigned long zap_pte_range(struc
 					mark_page_accessed(page);
 				file_rss--;
 			}
-			page_remove_rmap(page, vma);
+			page_remove_rmap(page);
 			if (unlikely(page_mapcount(page) < 0))
 				print_bad_pte(vma, addr, ptent, page);
 			tlb_remove_page(tlb, page);
@@ -1996,7 +1996,7 @@ gotten:
 			 * mapcount is visible. So transitively, TLBs to
 			 * old page will be flushed before it can be reused.
 			 */
-			page_remove_rmap(old_page, vma);
+			page_remove_rmap(old_page);
 		}
 
 		/* Free the old page.. */
--- badpage5/mm/rmap.c	2008-11-28 20:40:40.000000000 +0000
+++ badpage6/mm/rmap.c	2008-11-28 20:40:48.000000000 +0000
@@ -707,7 +707,6 @@ void page_add_file_rmap(struct page *pag
  */
 void page_dup_rmap(struct page *page, struct vm_area_struct *vma, unsigned long address)
 {
-	BUG_ON(page_mapcount(page) == 0);
 	if (PageAnon(page))
 		__page_check_anon_rmap(page, vma, address);
 	atomic_inc(&page->_mapcount);
@@ -717,11 +716,10 @@ void page_dup_rmap(struct page *page, st
 /**
  * page_remove_rmap - take down pte mapping from a page
  * @page: page to remove mapping from
- * @vma: the vm area in which the mapping is removed
  *
  * The caller needs to hold the pte lock.
  */
-void page_remove_rmap(struct page *page, struct vm_area_struct *vma)
+void page_remove_rmap(struct page *page)
 {
 	if (atomic_add_negative(-1, &page->_mapcount)) {
 		/*
@@ -837,7 +835,7 @@ static int try_to_unmap_one(struct page 
 		dec_mm_counter(mm, file_rss);
 
 
-	page_remove_rmap(page, vma);
+	page_remove_rmap(page);
 	page_cache_release(page);
 
 out_unmap:
@@ -952,7 +950,7 @@ static int try_to_unmap_cluster(unsigned
 		if (pte_dirty(pteval))
 			set_page_dirty(page);
 
-		page_remove_rmap(page, vma);
+		page_remove_rmap(page);
 		page_cache_release(page);
 		dec_mm_counter(mm, file_rss);
 		(*mapcount)--;


* [PATCH 7/8] badpage: ratelimit print_bad_pte and bad_page
  2008-12-01  0:37 [PATCH 0/8] badpage: more resilient bad page pte and rmap Hugh Dickins
                   ` (5 preceding siblings ...)
  2008-12-01  0:45 ` [PATCH 6/8] badpage: remove vma from page_remove_rmap Hugh Dickins
@ 2008-12-01  0:46 ` Hugh Dickins
  2008-12-03  0:56   ` Andrew Morton
  2008-12-01  0:48 ` [PATCH 8/8] badpage: KERN_ALERT BUG instead of KERN_EMERG Hugh Dickins
  7 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2008-12-01  0:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Dave Jones, Arjan van de Ven, linux-kernel, linux-mm

print_bad_pte() and bad_page() might each need ratelimiting - especially
for their dump_stacks, almost never of interest, yet not quite dispensable.
Correlating corruption across neighbouring entries can be very helpful,
so allow a burst of 60 reports before keeping quiet for the remainder
of that minute (or allow a steady drip of one report per second).
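
The shape of the limiter, shared by both reporters (a sketch; see the
diffs below):

	static unsigned long resume;
	static unsigned long nr_shown;
	static unsigned long nr_unshown;

	if (nr_shown == 60) {
		if (time_before(jiffies, resume)) {
			nr_unshown++;		/* still in the quiet minute */
			return;
		}
		if (nr_unshown) {
			printk(KERN_EMERG "%lu messages suppressed\n",
				nr_unshown);
			nr_unshown = 0;
		}
		nr_shown = 0;
	}
	if (nr_shown++ == 0)
		resume = jiffies + 60 * HZ;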

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 mm/memory.c     |   23 +++++++++++++++++++++++
 mm/page_alloc.c |   26 +++++++++++++++++++++++++-
 2 files changed, 48 insertions(+), 1 deletion(-)

--- badpage6/mm/memory.c	2008-11-28 20:40:48.000000000 +0000
+++ badpage7/mm/memory.c	2008-11-28 20:40:50.000000000 +0000
@@ -383,6 +383,29 @@ static void print_bad_pte(struct vm_area
 	pmd_t *pmd = pmd_offset(pud, addr);
 	struct address_space *mapping;
 	pgoff_t index;
+	static unsigned long resume;
+	static unsigned long nr_shown;
+	static unsigned long nr_unshown;
+
+	/*
+	 * Allow a burst of 60 reports, then keep quiet for that minute;
+	 * or allow a steady drip of one report per second.
+	 */
+	if (nr_shown == 60) {
+		if (time_before(jiffies, resume)) {
+			nr_unshown++;
+			return;
+		}
+		if (nr_unshown) {
+			printk(KERN_EMERG
+				"Bad page map: %lu messages suppressed\n",
+				nr_unshown);
+			nr_unshown = 0;
+		}
+		nr_shown = 0;
+	}
+	if (nr_shown++ == 0)
+		resume = jiffies + 60 * HZ;
 
 	mapping = vma->vm_file ? vma->vm_file->f_mapping : NULL;
 	index = linear_page_index(vma, addr);
--- badpage6/mm/page_alloc.c	2008-11-28 20:40:42.000000000 +0000
+++ badpage7/mm/page_alloc.c	2008-11-28 20:40:50.000000000 +0000
@@ -223,6 +223,30 @@ static inline int bad_range(struct zone 
 
 static void bad_page(struct page *page)
 {
+	static unsigned long resume;
+	static unsigned long nr_shown;
+	static unsigned long nr_unshown;
+
+	/*
+	 * Allow a burst of 60 reports, then keep quiet for that minute;
+	 * or allow a steady drip of one report per second.
+	 */
+	if (nr_shown == 60) {
+		if (time_before(jiffies, resume)) {
+			nr_unshown++;
+			goto out;
+		}
+		if (nr_unshown) {
+			printk(KERN_EMERG
+				"Bad page state: %lu messages suppressed\n",
+				nr_unshown);
+			nr_unshown = 0;
+		}
+		nr_shown = 0;
+	}
+	if (nr_shown++ == 0)
+		resume = jiffies + 60 * HZ;
+
 	printk(KERN_EMERG "Bad page state in process %s  pfn:%05lx\n",
 		current->comm, page_to_pfn(page));
 	printk(KERN_EMERG
@@ -232,7 +256,7 @@ static void bad_page(struct page *page)
 	printk(KERN_EMERG "Trying to fix it up, but a reboot is needed\n");
 
 	dump_stack();
-
+out:
 	/* Leave bad fields for debug, except PageBuddy could make trouble */
 	__ClearPageBuddy(page);
 	add_taint(TAINT_BAD_PAGE);


* [PATCH 8/8] badpage: KERN_ALERT BUG instead of KERN_EMERG
  2008-12-01  0:37 [PATCH 0/8] badpage: more resilient bad page pte and rmap Hugh Dickins
                   ` (6 preceding siblings ...)
  2008-12-01  0:46 ` [PATCH 7/8] badpage: ratelimit print_bad_pte and bad_page Hugh Dickins
@ 2008-12-01  0:48 ` Hugh Dickins
  2008-12-01 14:40   ` Christoph Lameter
  7 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2008-12-01  0:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Dave Jones, Arjan van de Ven, linux-kernel, linux-mm

bad_page() and rmap Eeek messages have said KERN_EMERG for a few years,
which I've followed in print_bad_pte().  These are serious system errors,
on a par with BUGs, but they're not quite emergencies, and we do our best
to carry on: say KERN_ALERT "BUG: " like the x86 oops does.

And remove the "Trying to fix it up, but a reboot is needed" line:
it's not untrue, but I hope the KERN_ALERT "BUG: " conveys as much.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
I've left this proposal until last, expecting some opposition.

Considered adding oops_begin() and oops_end(),
but I'm not at all sure that would work out well for these.

 mm/memory.c     |   15 ++++++++-------
 mm/page_alloc.c |    9 ++++-----
 2 files changed, 12 insertions(+), 12 deletions(-)

--- badpage7/mm/memory.c	2008-11-28 20:40:50.000000000 +0000
+++ badpage8/mm/memory.c	2008-11-28 20:40:52.000000000 +0000
@@ -397,8 +397,8 @@ static void print_bad_pte(struct vm_area
 			return;
 		}
 		if (nr_unshown) {
-			printk(KERN_EMERG
-				"Bad page map: %lu messages suppressed\n",
+			printk(KERN_ALERT
+				"BUG: Bad page map: %lu messages suppressed\n",
 				nr_unshown);
 			nr_unshown = 0;
 		}
@@ -410,26 +410,27 @@ static void print_bad_pte(struct vm_area
 	mapping = vma->vm_file ? vma->vm_file->f_mapping : NULL;
 	index = linear_page_index(vma, addr);
 
-	printk(KERN_EMERG "Bad page map in process %s  pte:%08llx pmd:%08llx\n",
+	printk(KERN_ALERT
+		"BUG: Bad page map in process %s  pte:%08llx pmd:%08llx\n",
 		current->comm,
 		(long long)pte_val(pte), (long long)pmd_val(*pmd));
 	if (page) {
-		printk(KERN_EMERG
+		printk(KERN_ALERT
 		"page:%p flags:%p count:%d mapcount:%d mapping:%p index:%lx\n",
 		page, (void *)page->flags, page_count(page),
 		page_mapcount(page), page->mapping, page->index);
 	}
-	printk(KERN_EMERG
+	printk(KERN_ALERT
 		"addr:%p vm_flags:%08lx anon_vma:%p mapping:%p index:%lx\n",
 		(void *)addr, vma->vm_flags, vma->anon_vma, mapping, index);
 	/*
 	 * Choose text because data symbols depend on CONFIG_KALLSYMS_ALL=y
 	 */
 	if (vma->vm_ops)
-		print_symbol(KERN_EMERG "vma->vm_ops->fault: %s\n",
+		print_symbol(KERN_ALERT "vma->vm_ops->fault: %s\n",
 				(unsigned long)vma->vm_ops->fault);
 	if (vma->vm_file && vma->vm_file->f_op)
-		print_symbol(KERN_EMERG "vma->vm_file->f_op->mmap: %s\n",
+		print_symbol(KERN_ALERT "vma->vm_file->f_op->mmap: %s\n",
 				(unsigned long)vma->vm_file->f_op->mmap);
 	dump_stack();
 	add_taint(TAINT_BAD_PAGE);
--- badpage7/mm/page_alloc.c	2008-11-28 20:40:50.000000000 +0000
+++ badpage8/mm/page_alloc.c	2008-11-28 20:40:52.000000000 +0000
@@ -237,8 +237,8 @@ static void bad_page(struct page *page)
 			goto out;
 		}
 		if (nr_unshown) {
-			printk(KERN_EMERG
-				"Bad page state: %lu messages suppressed\n",
+			printk(KERN_ALERT
+			      "BUG: Bad page state: %lu messages suppressed\n",
 				nr_unshown);
 			nr_unshown = 0;
 		}
@@ -247,13 +247,12 @@ static void bad_page(struct page *page)
 	if (nr_shown++ == 0)
 		resume = jiffies + 60 * HZ;
 
-	printk(KERN_EMERG "Bad page state in process %s  pfn:%05lx\n",
+	printk(KERN_ALERT "BUG: Bad page state in process %s  pfn:%05lx\n",
 		current->comm, page_to_pfn(page));
-	printk(KERN_EMERG
+	printk(KERN_ALERT
 		"page:%p flags:%p count:%d mapcount:%d mapping:%p index:%lx\n",
 		page, (void *)page->flags, page_count(page),
 		page_mapcount(page), page->mapping, page->index);
-	printk(KERN_EMERG "Trying to fix it up, but a reboot is needed\n");
 
 	dump_stack();
 out:


* Re: [PATCH 8/8] badpage: KERN_ALERT BUG instead of KERN_EMERG
  2008-12-01  0:48 ` [PATCH 8/8] badpage: KERN_ALERT BUG instead of KERN_EMERG Hugh Dickins
@ 2008-12-01 14:40   ` Christoph Lameter
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2008-12-01 14:40 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Nick Piggin, Dave Jones, Arjan van de Ven,
	linux-kernel, linux-mm

On Mon, 1 Dec 2008, Hugh Dickins wrote:
> And remove the "Trying to fix it up, but a reboot is needed" line:
> it's not untrue, but I hope the KERN_ALERT "BUG: " conveys as much.

Thanks. That was a pretty strange message....


* Re: [PATCH 1/8] badpage: simplify page_alloc flag check+clear
  2008-12-01  0:40 ` [PATCH 1/8] badpage: simplify page_alloc flag check+clear Hugh Dickins
@ 2008-12-01 14:47   ` Christoph Lameter
  2008-12-01 23:50     ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2008-12-01 14:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Russ Anderson, Nick Piggin, Dave Jones,
	Arjan van de Ven, linux-kernel, linux-mm

On Mon, 1 Dec 2008, Hugh Dickins wrote:

>  /*
>   * Flags checked when a page is freed.  Pages being freed should not have
>   * these flags set.  It they are, there is a problem.
>   */
> -#define PAGE_FLAGS_CHECK_AT_FREE (PAGE_FLAGS | 1 << PG_reserved)
> +#define PAGE_FLAGS_CHECK_AT_FREE \
> +	(1 << PG_lru   | 1 << PG_private   | 1 << PG_locked | \
> +	 1 << PG_buddy | 1 << PG_writeback | 1 << PG_reserved | \
> +	 1 << PG_slab  | 1 << PG_swapcache | 1 << PG_active | \
> +	 __PG_UNEVICTABLE | __PG_MLOCKED)

Rename this to PAGE_FLAGS_CLEAR_WHEN_FREE?

> + * Pages being prepped should not have any flags set.  It they are set,
> + * there has been a kernel bug or struct page corruption.
>   */
> -#define PAGE_FLAGS_CHECK_AT_PREP (PAGE_FLAGS | \
> -		1 << PG_reserved | 1 << PG_dirty | 1 << PG_swapbacked)
> +#define PAGE_FLAGS_CHECK_AT_PREP	((1 << NR_PAGEFLAGS) - 1)

These are all the bits. Can we get rid of this definition?

>  	/*
>  	 * For now, we report if PG_reserved was found set, but do not
>  	 * clear it, and do not free the page.  But we shall soon need
>  	 * to do more, for when the ZERO_PAGE count wraps negative.
>  	 */
> -	return PageReserved(page);
> +	if (PageReserved(page))
> +		return 1;
> +	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
> +		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> +	return 0;

The name PAGE_FLAGS_CHECK_AT_PREP is strange. We clear these flags without
a message. This is equal to

page->flags &=~PAGE_FLAGS_CHECK_AT_PREP;

You can drop the if...


* Re: [PATCH 2/8] badpage: keep any bad page out of circulation
  2008-12-01  0:41 ` [PATCH 2/8] badpage: keep any bad page out of circulation Hugh Dickins
@ 2008-12-01 14:49   ` Christoph Lameter
  2008-12-01 23:19     ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2008-12-01 14:49 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Nick Piggin, Dave Jones, Arjan van de Ven,
	linux-kernel, linux-mm

On Mon, 1 Dec 2008, Hugh Dickins wrote:

> Until now the bad_page() checkers have special-cased PageReserved, keeping
> those pages out of circulation thereafter.  Now extend the special case to
> all: we want to keep ANY page with bad state out of circulation - the
> "free" page may well be in use by something.

If I screw up with a VM patch then my machine will now die because of OOM
instead of letting me shutdown and reboot?


* Re: [PATCH 2/8] badpage: keep any bad page out of circulation
  2008-12-01 14:49   ` Christoph Lameter
@ 2008-12-01 23:19     ` Hugh Dickins
  0 siblings, 0 replies; 21+ messages in thread
From: Hugh Dickins @ 2008-12-01 23:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Nick Piggin, Dave Jones, Arjan van de Ven,
	linux-kernel, linux-mm

On Mon, 1 Dec 2008, Christoph Lameter wrote:
> On Mon, 1 Dec 2008, Hugh Dickins wrote:
> 
> > Until now the bad_page() checkers have special-cased PageReserved, keeping
> > those pages out of circulation thereafter.  Now extend the special case to
> > all: we want to keep ANY page with bad state out of circulation - the
> > "free" page may well be in use by something.
> 
> If I screw up with a VM patch

Oh, perish the thought!

> then my machine will now die because of OOM
> instead of letting me shutdown and reboot?

If you screw up so royally as to allocate every page in the machine
and free it with bad state, yes, that's indeed the way it will tend.
Or, to the extent that you're relying on high orders and low zones,
it will happen even sooner.  Same as if you screw up and forget to
free your pages.

That's okay.  In more normal cases you'll see the warnings before
it's dead, and shut down and reboot (hopefully a different kernel!)
before it reaches that state.  By the time your patches reach -mm,
I'd hope you'll have weeded out the immediate OOM cases.

Hugh


* Re: [PATCH 1/8] badpage: simplify page_alloc flag check+clear
  2008-12-01 14:47   ` Christoph Lameter
@ 2008-12-01 23:50     ` Hugh Dickins
  2008-12-02  2:21       ` Christoph Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2008-12-01 23:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Russ Anderson, Nick Piggin, Dave Jones,
	Arjan van de Ven, Martin Schwidefsky, linux-kernel, linux-mm

On Mon, 1 Dec 2008, Christoph Lameter wrote:
> On Mon, 1 Dec 2008, Hugh Dickins wrote:
> >  /*
> >   * Flags checked when a page is freed.  Pages being freed should not have
> >   * these flags set.  It they are, there is a problem.
> >   */
> > -#define PAGE_FLAGS_CHECK_AT_FREE (PAGE_FLAGS | 1 << PG_reserved)
> > +#define PAGE_FLAGS_CHECK_AT_FREE \
> > +	(1 << PG_lru   | 1 << PG_private   | 1 << PG_locked | \
> > +	 1 << PG_buddy | 1 << PG_writeback | 1 << PG_reserved | \
> > +	 1 << PG_slab  | 1 << PG_swapcache | 1 << PG_active | \
> > +	 __PG_UNEVICTABLE | __PG_MLOCKED)
> 
> Rename this to PAGE_FLAGS_CLEAR_WHEN_FREE?

No, that's a list of just the ones it's checking at free;
it then (with this patch) goes on to clear all of them.

> 
> > + * Pages being prepped should not have any flags set.  It they are set,
> > + * there has been a kernel bug or struct page corruption.
> >   */
> > -#define PAGE_FLAGS_CHECK_AT_PREP (PAGE_FLAGS | \
> > -		1 << PG_reserved | 1 << PG_dirty | 1 << PG_swapbacked)
> > +#define PAGE_FLAGS_CHECK_AT_PREP	((1 << NR_PAGEFLAGS) - 1)
> 
> These are all the bits. Can we get rid of this definition?

PAGE_FLAGS_CHECK_AT_PREP may not be the best name for it now;
but I do think we need a definition for it, and I'm not sure
that it will remain "all the page flags".

As it was, I just took the existing name, and then included every
flag in it.  I'd love to include the empty space, if any, up as
far as the mmzone bits - is there a convenient way to do that?

It could as well be called PAGE_FLAGS_CLEAR_AT_FREE.  I'm not
sure that it's necessarily the same as all the flags - in fact,
I was rather surprised that the patch booted first time; I was
expecting to find that I'd overlooked some special cases.

I meant to, but didn't, look at Martin's guest page hinting: might
that be defining page flags which stay set even across the free/alloc gap?
Cc'ed Martin now, no need for him to answer, but let's at least
warn him of this patch, something he might need to change with his.

> >  	/*
> >  	 * For now, we report if PG_reserved was found set, but do not
> >  	 * clear it, and do not free the page.  But we shall soon need
> >  	 * to do more, for when the ZERO_PAGE count wraps negative.
> >  	 */
> > -	return PageReserved(page);
> > +	if (PageReserved(page))
> > +		return 1;
> > +	if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
> > +		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> > +	return 0;
> 
> The name PAGE_FLAGS_CHECK_AT_PREP is strange. We clear these flags without
> message.

Would you be happier with PAGE_FLAGS_CLEAR_AT_FREE, then?
That would be fine by me, even if we add the gap to mmzone later.

One of the problems with PREP is that it's not obvious that it
means ALLOC: yes, I'd be happier with PAGE_FLAGS_CLEAR_AT_FREE.

> This is equal to
> 
> page->flags &=~PAGE_FLAGS_CHECK_AT_PREP;
> 
> You can drop the if...

I was intentionally following the existing style of
	if (PageDirty(page))
		__ClearPageDirty(page);
	if (PageSwapBacked(page))
		__ClearPageSwapBacked(page);
which is going out of its way to avoid dirtying a cacheline.

In all the obvious cases, I think the cacheline will already
be dirty; but I guess there's an important case (high order
but not compound?) which has a lot of clean cachelines.

Hugh


* Re: [PATCH 1/8] badpage: simplify page_alloc flag check+clear
  2008-12-01 23:50     ` Hugh Dickins
@ 2008-12-02  2:21       ` Christoph Lameter
  2008-12-02 10:39         ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2008-12-02  2:21 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Russ Anderson, Nick Piggin, Dave Jones,
	Arjan van de Ven, Martin Schwidefsky, linux-kernel, linux-mm

On Mon, 1 Dec 2008, Hugh Dickins wrote:

> > Rename this to PAGE_FLAGS_CLEAR_WHEN_FREE?
>
> No, that's a list of just the ones it's checking at free;
> it then (with this patch) goes on to clear all of them.

But they are always clear on free. The checking is irrelevant.

> One of the problems with PREP is that it's not obvious that it
> means ALLOC: yes, I'd be happier with PAGE_FLAGS_CLEAR_AT_FREE.

Ok.

>
> > This is equal to
> >
> > page->flags &=~PAGE_FLAGS_CHECK_AT_PREP;
> >
> > You can drop the if...
>
> I was intentionally following the existing style of
> 	if (PageDirty(page))
> 		__ClearPageDirty(page);
> 	if (PageSwapBacked(page))
> 		__ClearPageSwapBacked(page);
> which is going out of its way to avoid dirtying a cacheline.
>
> In all the obvious cases, I think the cacheline will already
> be dirty; but I guess there's an important case (high order
> but not compound?) which has a lot of clean cachelines.

Free or alloc dirties the cacheline of the page struct regardless since
the LRU field is always modified.

Well, ok. The not compound high order case may be an exception.

But then let's at least make a single check

If (page->flags & (all the flags including dirty and SwapBacked))
	zap-em.



* Re: [PATCH 1/8] badpage: simplify page_alloc flag check+clear
  2008-12-02  2:21       ` Christoph Lameter
@ 2008-12-02 10:39         ` Hugh Dickins
  2008-12-02 13:12           ` Christoph Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2008-12-02 10:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Russ Anderson, Nick Piggin, Dave Jones,
	Arjan van de Ven, Martin Schwidefsky, linux-kernel, linux-mm

On Mon, 1 Dec 2008, Christoph Lameter wrote:
> On Mon, 1 Dec 2008, Hugh Dickins wrote:
> > > > PAGE_FLAGS_CHECK_AT_FREE
> 
> > > Rename this to PAGE_FLAGS_CLEAR_WHEN_FREE?
> >
> > No, that's a list of just the ones it's checking at free;
> > it then (with this patch) goes on to clear all of them.
> 
> But they are always clear on free. The checking is irrelevant.

How about CHECK_PAGE_FLAGS_CLEAR_AT_FREE?

> 
> > One of the problems with PREP is that it's not obvious that it
> > means ALLOC: yes, I'd be happier with PAGE_FLAGS_CLEAR_AT_FREE.
> 
> Ok.

If you like the suggestion above, then for this one
how about CHECK_PAGE_FLAGS_CLEAR_AT_ALLOC?

I just haven't changed those names in the patch, but continued to
use the names from before: they're probably not the names I'd
have chosen, but it is hard to write a paragraph in a name.

The one I really disliked was "PAGE_FLAGS" for an obscure
subset of page flags, and have got rid of that.

> > > This is equal to
> > >
> > > page->flags &=~PAGE_FLAGS_CHECK_AT_PREP;
> > >
> > > You can drop the if...
> >
> > I was intentionally following the existing style of
> > 	if (PageDirty(page))
> > 		__ClearPageDirty(page);
> > 	if (PageSwapBacked(page))
> > 		__ClearPageSwapBacked(page);
> > which is going out of its way to avoid dirtying a cacheline.
> >
> > In all the obvious cases, I think the cacheline will already
> > be dirty; but I guess there's an important case (high order
> > but not compound?) which has a lot of clean cachelines.
> 
> Free or alloc dirties the cacheline of the page struct regardless since
> the LRU field is always modified.
> 
> Well, ok. The not compound high order case may be an exception.
> 
> But then lets at least make a single check
> 
> If (page->flags & (all the flags including dirty and SwapBacked))
> 	zap-em.

That's exactly what I did, isn't it?

Hugh


* Re: [PATCH 1/8] badpage: simplify page_alloc flag check+clear
  2008-12-02 10:39         ` Hugh Dickins
@ 2008-12-02 13:12           ` Christoph Lameter
  2008-12-02 14:12             ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2008-12-02 13:12 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Russ Anderson, Nick Piggin, Dave Jones,
	Arjan van de Ven, Martin Schwidefsky, linux-kernel, linux-mm

On Tue, 2 Dec 2008, Hugh Dickins wrote:

> > But they are always clear on free. The checking is irrelevant.
>
> How about CHECK_PAGE_FLAGS_CLEAR_AT_FREE?

Strange name.

> The one I really disliked was "PAGE_FLAGS" for an obscure
> subset of page flags, and have got rid of that.

Good.

> > If (page->flags & (all the flags including dirty and SwapBacked))
> > 	zap-em.
>
> That's exactly what I did, isn't it?

Yes, but you added another instance of this.  Can you consolidate all
the checks and clears into one?


* Re: [PATCH 1/8] badpage: simplify page_alloc flag check+clear
  2008-12-02 13:12           ` Christoph Lameter
@ 2008-12-02 14:12             ` Hugh Dickins
  2008-12-03  0:57               ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2008-12-02 14:12 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Russ Anderson, Nick Piggin, Dave Jones,
	Arjan van de Ven, Martin Schwidefsky, linux-kernel, linux-mm

On Tue, 2 Dec 2008, Christoph Lameter wrote:
> On Tue, 2 Dec 2008, Hugh Dickins wrote:
> 
> > > But they are always clear on free. The checking is irrelevant.
> >
> > How about CHECK_PAGE_FLAGS_CLEAR_AT_FREE?
> 
> Strange name.

Looks like I'm not going to be able to satisfy you then.  I didn't
introduce the names in the patch, so let's leave them as is for now,
and everybody can muse on what they should get called in the end.

> > > If (page->flags & (all the flags including dirty and SwapBacked))
> > > 	zap-em.
> >
> > That's exactly what I did, isn't it?
> 
> Yes but you added another instance of this.

Did I?  Whereabouts?  I wonder if you're thinking of the
+	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
in prep_new_page(), which replaces the clearing of another
collection of flags which somehow didn't get named before.

That clearing is a temporary measure, to keep the handling
of PageReserved unchanged in that patch; then it vanishes in the
next patch, where we treat all bad_page candidates the same way.

> Can you consolidate all the check and clears into one?

You mean one test_and_clear_bits() that somehow covers the different
cases of what we expect at free time and what we need at alloc time?
I don't think so.

Hugh
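
For concreteness, the allocation-time counterpart referred to above
can be sketched as follows: it checks the wider PAGE_FLAGS_CHECK_AT_PREP
set, and the explicit clear afterwards is the temporary measure
mentioned.  Higher-order and compound-page preparation is left out,
so this is a simplified sketch rather than the code as merged.

	static int prep_new_page(struct page *page, int order, gfp_t gfp_flags)
	{
		if (unlikely(page_mapcount(page) |
			     (page->mapping != NULL) |
			     (page_count(page) != 0) |
			     (page->flags & PAGE_FLAGS_CHECK_AT_PREP))) {
			bad_page(page);
			return 1;	/* caller discards this page and retries */
		}

		/* temporary: keeps PageReserved handling unchanged for now */
		page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;

		set_page_private(page, 0);
		set_page_refcounted(page);
		return 0;
	}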


* Re: [PATCH 7/8] badpage: ratelimit print_bad_pte and bad_page
  2008-12-01  0:46 ` [PATCH 7/8] badpage: ratelimit print_bad_pte and bad_page Hugh Dickins
@ 2008-12-03  0:56   ` Andrew Morton
  2008-12-03 13:04     ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2008-12-03  0:56 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: nickpiggin, davej, arjan, linux-kernel, linux-mm

On Mon, 1 Dec 2008 00:46:53 +0000 (GMT)
Hugh Dickins <hugh@veritas.com> wrote:

> print_bad_pte() and bad_page() might each need ratelimiting - especially
> for their dump_stacks, almost never of interest, yet not quite dispensable.
> Correlating corruption across neighbouring entries can be very helpful,
> so allow a burst of 60 reports before keeping quiet for the remainder
> of that minute (or allow a steady drip of one report per second).
> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ---
> 
>  mm/memory.c     |   23 +++++++++++++++++++++++
>  mm/page_alloc.c |   26 +++++++++++++++++++++++++-
>  2 files changed, 48 insertions(+), 1 deletion(-)
> 
> --- badpage6/mm/memory.c	2008-11-28 20:40:48.000000000 +0000
> +++ badpage7/mm/memory.c	2008-11-28 20:40:50.000000000 +0000
> @@ -383,6 +383,29 @@ static void print_bad_pte(struct vm_area
>  	pmd_t *pmd = pmd_offset(pud, addr);
>  	struct address_space *mapping;
>  	pgoff_t index;
> +	static unsigned long resume;
> +	static unsigned long nr_shown;
> +	static unsigned long nr_unshown;
> +
> +	/*
> +	 * Allow a burst of 60 reports, then keep quiet for that minute;
> +	 * or allow a steady drip of one report per second.
> +	 */
> +	if (nr_shown == 60) {
> +		if (time_before(jiffies, resume)) {
> +			nr_unshown++;
> +			return;
> +		}
> +		if (nr_unshown) {
> +			printk(KERN_EMERG
> +				"Bad page map: %lu messages suppressed\n",
> +				nr_unshown);
> +			nr_unshown = 0;
> +		}
> +		nr_shown = 0;
> +	}
> +	if (nr_shown++ == 0)
> +		resume = jiffies + 60 * HZ;
>  
>  	mapping = vma->vm_file ? vma->vm_file->f_mapping : NULL;
>  	index = linear_page_index(vma, addr);
> --- badpage6/mm/page_alloc.c	2008-11-28 20:40:42.000000000 +0000
> +++ badpage7/mm/page_alloc.c	2008-11-28 20:40:50.000000000 +0000
> @@ -223,6 +223,30 @@ static inline int bad_range(struct zone 
>  
>  static void bad_page(struct page *page)
>  {
> +	static unsigned long resume;
> +	static unsigned long nr_shown;
> +	static unsigned long nr_unshown;
> +
> +	/*
> +	 * Allow a burst of 60 reports, then keep quiet for that minute;
> +	 * or allow a steady drip of one report per second.
> +	 */
> +	if (nr_shown == 60) {
> +		if (time_before(jiffies, resume)) {
> +			nr_unshown++;
> +			goto out;
> +		}
> +		if (nr_unshown) {
> +			printk(KERN_EMERG
> +				"Bad page state: %lu messages suppressed\n",
> +				nr_unshown);
> +			nr_unshown = 0;
> +		}
> +		nr_shown = 0;
> +	}
> +	if (nr_shown++ == 0)
> +		resume = jiffies + 60 * HZ;
> +

gee, that's pretty elaborate.  There's no way of using the
possibly-enhanced ratelimit.h?


* Re: [PATCH 1/8] badpage: simplify page_alloc flag check+clear
  2008-12-02 14:12             ` Hugh Dickins
@ 2008-12-03  0:57               ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2008-12-03  0:57 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: cl, rja, nickpiggin, davej, arjan, schwidefsky, linux-kernel, linux-mm

On Tue, 2 Dec 2008 14:12:05 +0000 (GMT)
Hugh Dickins <hugh@veritas.com> wrote:

> On Tue, 2 Dec 2008, Christoph Lameter wrote:
> > On Tue, 2 Dec 2008, Hugh Dickins wrote:
> > 
> > > > But they are always clear on free. The checking is irrelevant.
> > >
> > > How about CHECK_PAGE_FLAGS_CLEAR_AT_FREE?
> > 
> > Strange name.
> 
> Looks like I'm not going to be able to satisfy you then.  I didn't
> introduce the names in the patch, so let's leave them as is for now,
> and everybody can muse on what they should get called in the end.

It's unclear to me where your discussion with Christoph ended up, so I
went the merge-it-and-see-who-shouts-at-me route.


* Re: [PATCH 7/8] badpage: ratelimit print_bad_pte and bad_page
  2008-12-03  0:56   ` Andrew Morton
@ 2008-12-03 13:04     ` Hugh Dickins
  0 siblings, 0 replies; 21+ messages in thread
From: Hugh Dickins @ 2008-12-03 13:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nickpiggin, davej, arjan, linux-kernel, linux-mm

On Tue, 2 Dec 2008, Andrew Morton wrote:
> On Mon, 1 Dec 2008 00:46:53 +0000 (GMT)
> Hugh Dickins <hugh@veritas.com> wrote:
> > +	/*
> > +	 * Allow a burst of 60 reports, then keep quiet for that minute;
> > +	 * or allow a steady drip of one report per second.
> > +	 */
> > +	if (nr_shown == 60) {
> > +		if (time_before(jiffies, resume)) {
> > +			nr_unshown++;
> > +			goto out;
> > +		}
> > +		if (nr_unshown) {
> > +			printk(KERN_EMERG
> > +				"Bad page state: %lu messages suppressed\n",
> > +				nr_unshown);
> > +			nr_unshown = 0;
> > +		}
> > +		nr_shown = 0;
> > +	}
> > +	if (nr_shown++ == 0)
> > +		resume = jiffies + 60 * HZ;
> > +
> 
> gee, that's pretty elaborate.  There's no way of using the
> possibly-enhanced ratelimit.h?

Thanks a lot for the pointer: I'd browsed around kernel/printk.c and
not found what I needed, hadn't realized there's a lib/ratelimit.c.

It looks eerily like what I'm trying to do, just with a less specific
missed/suppressed message, but never mind that.  I'll try making a patch
later to replace this (in its subsequent KERN_ALERT form) with that -
perhaps I'll encounter a problem in doing so, but it should be fine.

Hugh
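
For reference, a minimal sketch of what using the generic helper might
look like, assuming the 2.6.27-era interface in include/linux/ratelimit.h
(DEFINE_RATELIMIT_STATE and __ratelimit()); the bad_page_ratelimit name
and the report line are placeholders, not the eventual patch.  As noted,
the tailored "messages suppressed" count would be traded for ratelimit's
own, less specific, suppression message.

	#include <linux/ratelimit.h>

	/* allow a burst of 60 reports per 60*HZ, as in the open-coded version */
	static DEFINE_RATELIMIT_STATE(bad_page_ratelimit, 60 * HZ, 60);

	static void bad_page(struct page *page)
	{
		if (!__ratelimit(&bad_page_ratelimit))
			return;	/* suppressed; __ratelimit() counts the misses */

		/* placeholder for the existing report and stack dump */
		printk(KERN_EMERG "Bad page state: pfn %05lx flags %lx\n",
			page_to_pfn(page), page->flags);
		dump_stack();
	}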



Thread overview: 21+ messages
-- links below jump to the message on this page --
2008-12-01  0:37 [PATCH 0/8] badpage: more resilient bad page pte and rmap Hugh Dickins
2008-12-01  0:40 ` [PATCH 1/8] badpage: simplify page_alloc flag check+clear Hugh Dickins
2008-12-01 14:47   ` Christoph Lameter
2008-12-01 23:50     ` Hugh Dickins
2008-12-02  2:21       ` Christoph Lameter
2008-12-02 10:39         ` Hugh Dickins
2008-12-02 13:12           ` Christoph Lameter
2008-12-02 14:12             ` Hugh Dickins
2008-12-03  0:57               ` Andrew Morton
2008-12-01  0:41 ` [PATCH 2/8] badpage: keep any bad page out of circulation Hugh Dickins
2008-12-01 14:49   ` Christoph Lameter
2008-12-01 23:19     ` Hugh Dickins
2008-12-01  0:42 ` [PATCH 3/8] badpage: replace page_remove_rmap Eeek and BUG Hugh Dickins
2008-12-01  0:43 ` [PATCH 4/8] badpage: vm_normal_page use print_bad_pte Hugh Dickins
2008-12-01  0:44 ` [PATCH 5/8] badpage: zap print_bad_pte on swap and file Hugh Dickins
2008-12-01  0:45 ` [PATCH 6/8] badpage: remove vma from page_remove_rmap Hugh Dickins
2008-12-01  0:46 ` [PATCH 7/8] badpage: ratelimit print_bad_pte and bad_page Hugh Dickins
2008-12-03  0:56   ` Andrew Morton
2008-12-03 13:04     ` Hugh Dickins
2008-12-01  0:48 ` [PATCH 8/8] badpage: KERN_ALERT BUG instead of KERN_EMERG Hugh Dickins
2008-12-01 14:40   ` Christoph Lameter
