From mboxrd@z Thu Jan  1 00:00:00 1970
Received: by wr-out-0506.google.com with SMTP id c37so459109wra.26
        for <linux-mm@kvack.org>; Fri, 18 Apr 2008 06:22:58 -0700 (PDT)
Message-ID: <19f34abd0804180622l4f89191cp4cc7833822e058f5@mail.gmail.com>
Date: Fri, 18 Apr 2008 15:22:57 +0200
From: "Vegard Nossum" <vegard.nossum@gmail.com>
Subject: Re: 2.6.25-mm1: not looking good
In-Reply-To: <48089BCA.1090704@windriver.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20080417160331.b4729f0c.akpm@linux-foundation.org>
	 <20080417164034.e406ef53.akpm@linux-foundation.org>
	 <20080417171413.6f8458e4.akpm@linux-foundation.org>
	 <48080FE7.1070400@windriver.com> <20080418073732.GA22724@elte.hu>
	 <19f34abd0804180446u2d6f17damf391a8c0584358b8@mail.gmail.com>
	 <20080418123439.GA17013@elte.hu>
	 <19f34abd0804180541l7b4d14a6tb13bdd51dd533d70@mail.gmail.com>
	 <48089BCA.1090704@windriver.com>
Sender: owner-linux-mm@kvack.org
Return-Path: <owner-linux-mm@kvack.org>
To: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>, Andrew Morton <akpm@linux-foundation.org>, tglx@linutronix.de, penberg@cs.helsinki.fi, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, jmorris@namei.org, sds@tycho.nsa.gov
List-ID: <linux-mm.kvack.org>

On Fri, Apr 18, 2008 at 3:02 PM, Jason Wessel
<jason.wessel@windriver.com> wrote:
> Vegard Nossum wrote:
>  > On Fri, Apr 18, 2008 at 2:34 PM, Ingo Molnar <mingo@elte.hu> wrote:
>  >
>  >>  * Vegard Nossum <vegard.nossum@gmail.com> wrote:
>  >>
>  >>  > With the patch below, it seems 100% reproducible to me (7 out of 7
>  >>  > bootups hung).
>  >>  >
>  >>  > The number of loops it could do before hanging were, in order: 697,
>  >>  > 898, 237, 55, 45, 92, 59
>  >>
>  >>  cool! Jason: i think that particular self-test should be repeated 1000
>  >>  times before reporting success ;-)
>  >>
>  >
>  > BTW, I just tested a 32-bit config and it hung after 55 iterations as well.
>  >
>  > Vegard
>  >
>  >
>  >
>  I assume this was SMP?

Yes. With that in mind, I tried running the same kernel under qemu,
using -smp 16, and it seems to be stuck here:

[   16.562659] kgdb: Registered I/O driver kgdbts.
[   16.565875] kgdbts:RUN plant and detach test

and the code it is spinning in is this loop in kgdb_handle_exception():

        /*
         * Wait for the other CPUs to be notified and be waiting for us:
         */
        for_each_online_cpu(i) {
                while (!atomic_read(&cpu_in_kgdb[i]))
                        cpu_relax();
        }
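
The loop above has no exit condition other than every flag becoming
non-zero, so a single CPU that never checks in hangs the boot silently.
For debugging, the wait could be bounded so that it reports which CPU is
missing. What follows is only a minimal userspace sketch of that idea,
not the actual kgdb code: cpu_in_kgdb_sketch, NR_CPUS_SKETCH and
wait_for_cpus_bounded() are hypothetical names, C11 atomics stand in for
the kernel's atomic_read(), and the spin limit is an arbitrary assumption.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_SKETCH 16	/* hypothetical: matches the -smp 16 run */

/* Userspace stand-in for the kernel's per-CPU cpu_in_kgdb[] flags. */
static atomic_int cpu_in_kgdb_sketch[NR_CPUS_SKETCH];

/*
 * Bounded variant of the rendezvous loop: poll each CPU's flag, but
 * give up after max_spins polls of a single flag.
 *
 * Returns -1 if every CPU checked in, otherwise the index of the
 * first CPU whose flag was still zero when the limit was reached.
 */
static int wait_for_cpus_bounded(int ncpus, long max_spins)
{
	assert(ncpus >= 0 && ncpus <= NR_CPUS_SKETCH);

	for (int i = 0; i < ncpus; i++) {
		long spins = 0;

		while (!atomic_load(&cpu_in_kgdb_sketch[i])) {
			if (++spins >= max_spins)
				return i;	/* this CPU never showed up */
		}
	}
	return -1;
}
```

With all flags set the function returns -1; clearing the flag of one
CPU makes it return that CPU's index instead of spinning forever, which
is the piece of information missing from the hang above.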


>
>  While I had not tried it yet, my guess would have been this did not
>  happen on a UP kernel.  If it does occur on a UP kernel it means the
>  problem is squarely between the task scheduling after the exception is
>  handled and the kgdb state logic for re-entering the debug state after a
>  single step exception occurs.
>
>  It seems reasonable to go for 1000 iterations of this particular test to
>  declare success as pointed out by Ingo.  Previous versions of kgdb
>  handled some of the irq + single step + cpu sync slightly differently
>  and it is entirely possible there is a regression there.
>
>  Jason.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .