Oops in pte_chain_alloc (rmap 12h applied to vanilla 2.4.18) (fwd)

* Oops in pte_chain_alloc (rmap 12h applied to vanilla 2.4.18) (fwd)
@ 2002-06-05 20:45 Rik van Riel
  2002-06-05 21:48 ` Jonathan Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Rik van Riel @ 2002-06-05 20:45 UTC (permalink / raw)
  To: linux-mm; +Cc: Michael Chapman

forwarded to linux-mm ... I'm still on holidays ;)

In the meantime, do you have your test program available
somewhere so other people can reproduce the problem?

---------- Forwarded message ----------
Date: Thu, 6 Jun 2002 06:21:08 +1000 (EST)
From: Michael Chapman <mchapman@beren.hn.org>
Reply-To: Michael Chapman <mchapman@student.usyd.edu.au>
To: riel@nl.linux.org
Subject: Oops in pte_chain_alloc (rmap 12h applied to vanilla 2.4.18)

Firstly, my apologies for mailing you directly. Please tell me if it is
more appropriate to bring this issue to the kernel mailing list instead
(or directly to you _and_ lkml?)

I am able to consistently cause kernel 2.4.18, patched only with rmap 12h,
to oops in the pte_chain_alloc function in rmap.c.

Under normal load the system would stay up for a few hours before
crashing; however, I've found that a simple program that repeatedly
allocates and frees randomly sized blocks of memory makes it oops
almost immediately. Every single oops is on the same line in the
function:

static inline struct pte_chain * pte_chain_alloc(void)
{
        struct pte_chain * pte_chain;

        /* Allocate new pte_chain structs as needed. */
        if (!pte_chain_freelist)
                alloc_new_pte_chains();

        /* Grab the first pte_chain from the freelist. */
        pte_chain = pte_chain_freelist;
        pte_chain_freelist = pte_chain->next;  // *** OOPS OCCURS HERE
        pte_chain->next = NULL;

        return pte_chain;
}
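
As a hedged illustration of what this function does, here is a user-space sketch (not the rmap code itself) of the same singly linked freelist pop. The struct layout, the fixed `pool` array, and the names `freelist_init`, `in_pool`, and `checked_pte_chain_alloc` are assumptions made for the example; the kernel function above performs no such range check. The point is that the marked line loads `pte_chain->next` through whatever value the freelist head holds, so a stale or overwritten head faults there rather than at the place where the damage was done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct pte_chain: only the
 * 'next' link matters for this sketch. */
struct pte_chain {
        struct pte_chain *next;
        void *ptep;
};

#define POOL_SIZE 4

/* Assumed fixed pool; the real code allocates new chains as needed. */
static struct pte_chain pool[POOL_SIZE];
static struct pte_chain *pte_chain_freelist;

/* Thread every pool entry onto the freelist. */
static void freelist_init(void)
{
        size_t i;

        assert(POOL_SIZE >= 1);
        for (i = 0; i + 1 < POOL_SIZE; i++)
                pool[i].next = &pool[i + 1];
        pool[POOL_SIZE - 1].next = NULL;
        pte_chain_freelist = &pool[0];
}

/* True if p points at the start of an entry inside the pool. */
static int in_pool(const struct pte_chain *p)
{
        uintptr_t addr = (uintptr_t)p;
        uintptr_t lo = (uintptr_t)&pool[0];
        uintptr_t hi = (uintptr_t)&pool[POOL_SIZE];

        return addr >= lo && addr < hi &&
               (addr - lo) % sizeof(pool[0]) == 0;
}

/* Same pop as pte_chain_alloc(), but it refuses to dereference a head
 * pointer that is NULL or lies outside the pool. */
static struct pte_chain *checked_pte_chain_alloc(void)
{
        struct pte_chain *pte_chain = pte_chain_freelist;

        if (!in_pool(pte_chain))
                return NULL;    /* the unchecked version would fault here */

        pte_chain_freelist = pte_chain->next;
        pte_chain->next = NULL;
        return pte_chain;
}
```

After `freelist_init()`, the first call returns `&pool[0]`. If the head is then overwritten with a value such as 0x14000000, the checked version returns NULL instead of faulting on the load.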

It seems to be independent of any modules I have loaded at the time. In
fact, I have been able to cause the oops even in runlevel 1, with only the
ext3 and jbd modules loaded.

I compiled this kernel with gcc 2.96. The machine is an i686 with 384 MB
of RAM. The configuration I used was kernel-2.4.18-i686-debug.config,
provided in Red Hat's 2.4.18-4 kernel source package. I used this config
because I had originally seen this oops occur with the Red Hat kernel,
and I wanted this 2.4.18+rmap kernel to be configured as closely as
possible to the Red Hat one.

I've done extensive tests with memtest86 and cpuburn. Neither of these
indicated any problems at all.

I'm happy to provide more info if you need it.

Michael Chapman
mchapman@student.usyd.edu.au


----
  Oops trace: (ksymoops run post-mortem on 2.4.16)

ksymoops 2.4.4 on i686 2.4.16.  Options used
     -V (default)
     -K (specified)
     -L (specified)
     -O (specified)
     -m /boot/System.map-2.4.18 (specified)

Reading Oops report from the terminal
Jun  3 09:58:02 beren kernel: Unable to handle kernel paging request at virtual address 14000000
Jun  3 09:58:03 beren kernel: c0134845
Jun  3 09:58:03 beren kernel: *pde = 00000000
Jun  3 09:58:03 beren kernel: Oops: 0000
Jun  3 09:58:03 beren kernel: CPU:    0
Jun  3 09:58:03 beren kernel: EIP:    0010:[<c0134845>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Jun  3 09:58:03 beren kernel: EFLAGS: 00010206
Jun  3 09:58:03 beren kernel: eax: 00000048   ebx: c1419818   ecx: c0247764   edx: 14000000
Jun  3 09:58:03 beren kernel: esi: d2154d2c   edi: 12bdb067   ebp: 00000025   esp: d49e9e64
Jun  3 09:58:03 beren kernel: ds: 0018   es: 0018   ss: 0018
Jun  3 09:58:03 beren kernel: Process crash (pid: 1202, stackpage=d49e9000)
Jun  3 09:58:03 beren kernel: Stack: c013454e c1419818 d2154d2c c01244b6 00000001 d7812494 40f4b000 00000001
Jun  3 09:58:07 beren kernel:        c01244f7 d7812494 d6fbb65c d2154d2c 00000001 40f4b000 d49e8000 40eb1000
Jun  3 09:58:07 beren kernel:        d494840c d7812494 d6fbb65c d7812494 40f4b000 00000001 c012479a d7812494
Jun  3 09:58:07 beren kernel: Call Trace: [<c013454e>] [<c01244b6>] [<c01244f7>] [<c012479a>] [<c0124c34>]
Jun  3 09:58:07 beren kernel:    [<c0113a2a>] [<da8c98ad>] [<da8c98c0>] [<da8c98cb>] [<c011e3c3>] [<c011ac1b>]
Jun  3 09:58:07 beren kernel:    [<c0114632>] [<c01138a0>] [<c010700c>]
Jun  3 09:58:07 beren kernel: Code: 8b 02 a3 c8 d7 2a c0 89 d0 c7 02 00 00 00 00 c3 8d 74 26 00

>>EIP; c0134845 <pte_chain_alloc+15/30>   <=====
Trace; c013454e <page_add_rmap+2e/40>
Trace; c01244b6 <do_anonymous_page+f6/100>
Trace; c01244f7 <do_no_page+37/210>
Trace; c012479a <handle_mm_fault+ca/150>
Trace; c0124c34 <__vma_link+64/c0>
Trace; c0113a2a <do_page_fault+18a/4cb>
Trace; da8c98ad <END_OF_CODE+1a5de255/????>
Trace; da8c98c0 <END_OF_CODE+1a5de268/????>
Trace; da8c98cb <END_OF_CODE+1a5de273/????>
Trace; c011e3c3 <timer_bh+213/250>
Trace; c011ac1b <bh_action+1b/50>
Trace; c0114632 <schedule+2f2/320>
Trace; c01138a0 <do_page_fault+0/4cb>
Trace; c010700c <error_code+34/3c>
Code;  c0134845 <pte_chain_alloc+15/30>
00000000 <_EIP>:
Code;  c0134845 <pte_chain_alloc+15/30>   <=====
   0:   8b 02                     mov    (%edx),%eax   <=====
Code;  c0134847 <pte_chain_alloc+17/30>
   2:   a3 c8 d7 2a c0            mov    %eax,0xc02ad7c8
Code;  c013484c <pte_chain_alloc+1c/30>
   7:   89 d0                     mov    %edx,%eax
Code;  c013484e <pte_chain_alloc+1e/30>
   9:   c7 02 00 00 00 00         movl   $0x0,(%edx)
Code;  c0134854 <pte_chain_alloc+24/30>
   f:   c3                        ret
Code;  c0134855 <pte_chain_alloc+25/30>
  10:   8d 74 26 00               lea    0x0(%esi,1),%esi
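
For readers decoding the report by hand: on i386 the number after "Oops:" is the page-fault error code. Its low three bits say whether the fault was a protection violation or a not-present page, a write or a read, and whether it happened in user or kernel mode. The sketch below decodes those bits; the macro, struct, and function names are made up for this example and are not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative names for the low bits of the i386 page-fault error code. */
#define PF_PROTECTION 0x1UL  /* 0: page not present, 1: protection fault */
#define PF_WRITE      0x2UL  /* 0: read access,      1: write access     */
#define PF_USER       0x4UL  /* 0: kernel mode,      1: user mode        */

struct pf_decoded {
        int protection;
        int write;
        int user;
};

/* Split the error code into its three low flag bits. */
static struct pf_decoded decode_pf_error(unsigned long code)
{
        struct pf_decoded d;

        d.protection = (code & PF_PROTECTION) != 0;
        d.write = (code & PF_WRITE) != 0;
        d.user = (code & PF_USER) != 0;
        return d;
}

/* Format a one-line description into buf; returns snprintf's result. */
static int describe_pf_error(unsigned long code, char *buf, size_t len)
{
        struct pf_decoded d = decode_pf_error(code);

        assert(buf != NULL && len > 0);
        return snprintf(buf, len, "%s of a %s page in %s mode",
                        d.write ? "write" : "read",
                        d.protection ? "protected" : "not-present",
                        d.user ? "user" : "kernel");
}
```

For the `0000` in this report, `describe_pf_error()` produces "read of a not-present page in kernel mode". That is consistent with the faulting instruction `mov (%edx),%eax` being a load and with `*pde = 00000000`.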

----
  Program that causes the kernel to oops almost immediately:

#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
  This program is crude, but effective. Expect to oops the kernel
  after just a couple of seconds!

  Tweak these #defines as necessary.
  The current values work nicely for a box with 384 Megs of RAM.
*/
#define NUM_BUFFERS 100
#define MAX_BUFFER_SIZE 6553600

int main(void) {
        char *buffers[NUM_BUFFERS];
        int i, size;

        srandom(time(NULL));
        memset(buffers, 0, sizeof(buffers));
        while (1) {
                for (i = 0; i < NUM_BUFFERS; ++i) {
                        /* free(NULL) is a no-op, so no check is needed */
                        free(buffers[i]);
                        size = random() % (MAX_BUFFER_SIZE - 1) + 1;
                        buffers[i] = malloc(size);
                        /* Touch every byte so the pages really get faulted in */
                        if (buffers[i])
                                memset(buffers[i], 1, size);
                }
        }
}


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/


* Re: Oops in pte_chain_alloc (rmap 12h applied to vanilla 2.4.18) (fwd)
  2002-06-05 20:45 Oops in pte_chain_alloc (rmap 12h applied to vanilla 2.4.18) (fwd) Rik van Riel
@ 2002-06-05 21:48 ` Jonathan Morton
  2002-06-06  4:29   ` Michael Chapman
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Morton @ 2002-06-05 21:48 UTC (permalink / raw)
  To: Rik van Riel, linux-mm; +Cc: Michael Chapman

>I compiled this kernel with gcc 2.96.

I understood you weren't supposed to do that.  Try 2.95.3.

-- 
--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
website:  http://www.chromatix.uklinux.net/
geekcode: GCS$/E dpu(!) s:- a21 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$
           V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
tagline:  The key to knowledge is not to rely on people to teach you it.

* Re: Oops in pte_chain_alloc (rmap 12h applied to vanilla 2.4.18) (fwd)
  2002-06-05 21:48 ` Jonathan Morton
@ 2002-06-06  4:29   ` Michael Chapman
  2002-06-06  4:39     ` Benjamin LaHaise
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Chapman @ 2002-06-06  4:29 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Rik van Riel, linux-mm

On Wed, 5 Jun 2002, Jonathan Morton wrote:
> >I compiled this kernel with gcc 2.96.
> 
> I understood you weren't supposed to do that.  Try 2.95.3.

OK, I've now tried that. It still crashes on the same line of code.

Michael


* Re: Oops in pte_chain_alloc (rmap 12h applied to vanilla 2.4.18) (fwd)
  2002-06-06  4:29   ` Michael Chapman
@ 2002-06-06  4:39     ` Benjamin LaHaise
  2002-06-06  4:47       ` Michael Chapman
  0 siblings, 1 reply; 5+ messages in thread
From: Benjamin LaHaise @ 2002-06-06  4:39 UTC (permalink / raw)
  To: Michael Chapman; +Cc: Jonathan Morton, Rik van Riel, linux-mm

On Thu, Jun 06, 2002 at 02:29:18PM +1000, Michael Chapman wrote:
> On Wed, 5 Jun 2002, Jonathan Morton wrote:
> > >I compiled this kernel with gcc 2.96.
> > 
> > I understood you weren't supposed to do that.  Try 2.95.3.
> 
> OK, I've now tried that. It still crashes on the same line of code.

This looks like a memory corruption footprint:

Jun  3 09:58:02 beren kernel: Unable to handle kernel paging request at virtual address 14000000

Have you tried running memtest86 on the machine?  A few bugzilla reports 
have turned up with similar footprints that have all turned out to be 
bad ram, so it is worth investigating.
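
One quick way to see why the address looks wrong, sketched under assumptions: with the default i386 3G/1G split, kernel lowmem is mapped starting at 0xC0000000, so on a 384 MB machine a pointer into ordinary kernel memory should fall between 0xC0000000 and 0xD8000000. The constant and the helper name below are illustrative, and the sketch assumes the pte_chain structs live in that directly mapped region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed start of the kernel's direct mapping (default i386 3G/1G split). */
#define ASSUMED_PAGE_OFFSET UINT32_C(0xC0000000)

/* True if addr lies inside the direct mapping of ram_bytes of RAM. */
static bool in_direct_map(uint32_t addr, uint32_t ram_bytes)
{
        /* Keep the mapped range from running past the 32-bit space. */
        assert(ram_bytes <= UINT32_MAX - ASSUMED_PAGE_OFFSET);

        return addr >= ASSUMED_PAGE_OFFSET &&
               addr - ASSUMED_PAGE_OFFSET < ram_bytes;
}
```

With `ram_bytes` set to 384 * 1024 * 1024, `in_direct_map(0x14000000, ram_bytes)` is false, while other values from the same report, such as 0xc1419818 (ebx) and 0xd7812494 (on the stack), pass the check.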

		-ben

* Re: Oops in pte_chain_alloc (rmap 12h applied to vanilla 2.4.18) (fwd)
  2002-06-06  4:39     ` Benjamin LaHaise
@ 2002-06-06  4:47       ` Michael Chapman
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Chapman @ 2002-06-06  4:47 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Jonathan Morton, Rik van Riel, linux-mm

On Thu, 6 Jun 2002, Benjamin LaHaise wrote:
> On Thu, Jun 06, 2002 at 02:29:18PM +1000, Michael Chapman wrote:
> > On Wed, 5 Jun 2002, Jonathan Morton wrote:
> > > >I compiled this kernel with gcc 2.96.
> > > 
> > > I understood you weren't supposed to do that.  Try 2.95.3.
> > 
> > OK, I've now tried that. It still crashes on the same line of code.
> 
> This looks like a memory corruption footprint:
> 
> Jun  3 09:58:02 beren kernel: Unable to handle kernel paging request at virtual address 14000000
> 
> Have you tried running memtest86 on the machine?  A few bugzilla reports 
> have turned up with similar footprints that have all turned out to be 
> bad ram, so it is worth investigating.

Yes I have. I ran memtest (version 3.0) for over 12 hours a couple of days 
ago. Not one problem reported.

> 		-ben

Michael

