linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Gene Heskett <gene.heskett@verizon.net>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Mel Gorman <mel@csn.ul.ie>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: Bad page state (was Re: Linux 2.6.31-rc7)
Date: Sun, 23 Aug 2009 04:20:46 -0400	[thread overview]
Message-ID: <200908230420.46228.gene.heskett@verizon.net> (raw)
In-Reply-To: <20090823072246.GA20028@localhost>

On Sunday 23 August 2009, Wu Fengguang wrote:
>On Sat, Aug 22, 2009 at 12:17:48PM +0800, Linus Torvalds wrote:
>> On Fri, 21 Aug 2009, Gene Heskett wrote:
>> > From messages, I already have a problem with lzma too:
>>
>> And for this too, can you tell what the last working kernel was?
>>
>> Does the problem happen consistently? (And btw, it's not probably so much
>> lzma, but something random that released a page without clearing some of
>> the page flags or something).
>>
>> Wu - I'm not seeing a lot of changes to compund page handling except for
>> commit 20a0307c0396c2edb651401d2f2db193dda2f3c9 ("mm: introduce
>> PageHuge() for testing huge/gigantic pages").
>>
>> That one removed the
>>
>> 	set_compound_page_dtor(page, free_compound_page);
>>
>> thing from prep_compound_gigantic_page(), which looks a bit odd and
>> suspicious (the commit message only talks about _moving_ it). But I don't
>> know the hugetlb code.
>
>Sorry for not describing the remove in changelog.  Remove of that line
>was proposed by Mel and I think it changed nothing in behavior.
>Because the only possible call train is:
>
>        gather_bootmem_prealloc()
>                prep_compound_huge_page()
>                        prep_compound_gigantic_page()
>==>                             set_compound_page_dtor(page,
> free_compound_page); prep_new_huge_page()
>==>                     set_compound_page_dtor(page, free_huge_page);
>
>So obviously the first set_compound_page_dtor() call is extraordinary.
>
>> But that commit went into -rc1 already.  Gene, I know you sent me email
>> about a later -rc release, but maybe you didn't test it on that machine
>> or with that config?
>>
>> > Aug 21 22:37:47 coyote kernel: [ 1030.152737] BUG: Bad page state in
>> > process lzma  pfn:a1093 Aug 21 22:37:47 coyote kernel: [ 1030.152743]
>> > page:c28fc260 flags:80004000 count:0 mapcount:0 mapping:(null) index:0
>> > Aug 21 22:37:47 coyote kernel: [ 1030.152747] Pid: 17927, comm: lzma
>> > Not tainted 2.6.31-rc7 #1 Aug 21 22:37:47 coyote kernel: [ 1030.152750]
>> > Call Trace:
>> > Aug 21 22:37:47 coyote kernel: [ 1030.152758]  [<c130e363>] ?
>> > printk+0x23/0x40 Aug 21 22:37:47 coyote kernel: [ 1030.152763] 
>> > [<c108404f>] bad_page+0xcf/0x150 Aug 21 22:37:47 coyote kernel: [
>> > 1030.152767]  [<c10850ed>] get_page_from_freelist+0x37d/0x480 Aug 21
>> > 22:37:47 coyote kernel: [ 1030.152771]  [<c10853cf>]
>> > __alloc_pages_nodemask+0xdf/0x520 Aug 21 22:37:47 coyote kernel: [
>> > 1030.152775]  [<c1096b19>] handle_mm_fault+0x4a9/0x9f0 Aug 21 22:37:47
>> > coyote kernel: [ 1030.152780]  [<c1020d61>] do_page_fault+0x141/0x290
>> > Aug 21 22:37:47 coyote kernel: [ 1030.152784]  [<c1020c20>] ?
>> > do_page_fault+0x0/0x290 Aug 21 22:37:47 coyote kernel: [ 1030.152787] 
>> > [<c1311bcb>] error_code+0x73/0x78 Aug 21 22:37:47 coyote kernel: [
>> > 1030.152789] Disabling lock debugging due to kernel taint
>>
>> It looks like 'flags' is the one that causes this problem at allocation
>> time (count, mapcount, mapping and index all look nicely zeroed).
>>
>> In particular, it's the 0x4000 bit (the high bit, which is also set, is
>> the upper field bits for page section/node/zone numbers etc), which is
>> either PG_head or PG_compound depending on CONFIG_PAGEFLAGS_EXTENDED.
>>
>> And in your case, since you have CONFIG_PAGEFLAGS_EXTENDED=y, it would be
>> PG_head.
>
>Right. btw it takes time to reverse engineer the page flag names each
>time it oops. Does it make sense to print a more readable form, eg.
>
>        flags:80004000 (MOVABLE,head)
>
>?
>
>> Btw guys, why don't we check PG_head etc at free time when we add the
>> page to the free list? Now we get that annoying error only when it is way
>> too late, and have no way to know who screwed up..
>
>And what puzzled me is that PG_head should have been cleared by
>free_pages_check():
>
>        if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
>                page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
>
>Thanks,
>Fengguang

I changed the vmlinuz compression to gzip and rebooted to it last night, and 
got this shortly after the bootup to -rc7 with the kernal cli argument that 
makes sensors work on an asus board again:

Aug 22 22:29:07 coyote kernel: [ 2449.053652] BUG: Bad page state in process python  pfn:a0e93                                            
Aug 22 22:29:07 coyote kernel: [ 2449.053658] page:c28fc260 flags:80004000 count:0 mapcount:0 mapping:(null) index:0                      
Aug 22 22:29:07 coyote kernel: [ 2449.053662] Pid: 4818, comm: python Not tainted 2.6.31-rc7 #3                                           
Aug 22 22:29:07 coyote kernel: [ 2449.053664] Call Trace:                                                                                 
Aug 22 22:29:07 coyote kernel: [ 2449.053672]  [<c130fb33>] ? printk+0x23/0x40                                                            
Aug 22 22:29:07 coyote kernel: [ 2449.053678]  [<c108352f>] bad_page+0xcf/0x150                                                           
Aug 22 22:29:07 coyote kernel: [ 2449.053682]  [<c10845cd>] get_page_from_freelist+0x37d/0x480                                            
Aug 22 22:29:07 coyote kernel: [ 2449.053686]  [<c10848af>] __alloc_pages_nodemask+0xdf/0x520                                             
Aug 22 22:29:07 coyote kernel: [ 2449.053691]  [<c1095ff9>] handle_mm_fault+0x4a9/0x9f0                                                   
Aug 22 22:29:07 coyote kernel: [ 2449.053695]  [<c105ca83>] ? tick_dev_program_event+0x43/0xf0                                            
Aug 22 22:29:07 coyote kernel: [ 2449.053699]  [<c105cbd6>] ? tick_program_event+0x36/0x60                                                
Aug 22 22:29:07 coyote kernel: [ 2449.053703]  [<c1020d61>] do_page_fault+0x141/0x290                                                     
Aug 22 22:29:07 coyote kernel: [ 2449.053707]  [<c1020c20>] ? do_page_fault+0x0/0x290                                                     
Aug 22 22:29:07 coyote kernel: [ 2449.053710]  [<c131339b>] error_code+0x73/0x78                                                          
Aug 22 22:29:07 coyote kernel: [ 2449.053712] Disabling lock debugging due to kernel taint

This doesn't look exactly like the previous one but the result is similar.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
The NRA is offering FREE Associate memberships to anyone who wants them.
<https://www.nrahq.org/nrabonus/accept-membership.asp>

Now I am depressed ...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-08-26  1:34 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <alpine.LFD.2.01.0908211810390.3158@localhost.localdomain>
     [not found] ` <200908212248.40987.gene.heskett@verizon.net>
2009-08-22  4:17   ` Linus Torvalds
2009-08-23  7:22     ` Wu Fengguang
2009-08-23  8:20       ` Gene Heskett [this message]
2009-08-23 16:44         ` Linus Torvalds
2009-08-23 17:04           ` Gene Heskett
2009-08-24  2:22           ` Gene Heskett
2009-08-24 13:55     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200908230420.46228.gene.heskett@verizon.net \
    --to=gene.heskett@verizon.net \
    --cc=akpm@linux-foundation.org \
    --cc=fengguang.wu@intel.com \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox