From: Jack Steiner <steiner@sgi.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: tglx@linutronix.de, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Mike Travis <travis@sgi.com>
Subject: Re: [bug] Re: [PATCH] - Fix stack overflow for large values of MAX_APICS
Date: Tue, 24 Jun 2008 17:03:35 -0500
Message-ID: <20080624220335.GA8039@sgi.com>
In-Reply-To: <20080624102401.GA27614@elte.hu>
On Tue, Jun 24, 2008 at 12:24:01PM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
> > * Jack Steiner <steiner@sgi.com> wrote:
> >
> > > physid_mask_of_physid() causes a huge stack (12k) to be created if
> > > the number of APICS is large. Replace physid_mask_of_physid() with a
> > > new function that does not create large stacks. This is a problem
> > > only on large x86_64 systems.
> >
> > this indeed fixes the crash i reported here:
> >
> > http://lkml.org/lkml/2008/6/19/98
> >
> > so i've added both this and the MAXAPICS patch to tip/x86/uv, and will
> > test it some more. Lets hope it goes all well this time :-)
>
> -tip auto-testing found a new boot failure on x86 which happens if
> NR_CPUS is changed from 8 to 4096. The hang goes like this:
>
Still looking, but here is what I have found so far.

The most obvious thing to try was reverting the patch that changed
MAX_APICS to 32k. With this patch reverted, the system still hangs at
the same spot.

I noticed that the hang is random. It usually occurs at acpi_event_init()
but sometimes it hangs at a different place.

I also observed that the hang does not always occur. When it does not
hang, the system boots to the point of mounting /root, then panics
because the mount fails. I expect that this is a different failure due
to missing drivers. I'll chase that down later.

I added trace code and isolated the hang to a call to synchronize_rcu(),
usually from netlink_change_ngroups().

If I boot with "maxcpus=1", it never hangs (obviously) but always fails
to mount /root.

Next I changed NR_CPUS to 128. I still see random hangs at the call
to acpi_event_init().

I'll chase this more tomorrow. Has anyone else seen any failures that
might be related?
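
For readers following the MAX_APICS change mentioned above, here is a rough sizing sketch of why a by-value APIC ID mask gets expensive at 32k entries. It assumes the mask is a plain bitmap with one bit per possible APIC ID, stored in 8-byte words. The helper name mask_bytes, the comparison value 256, and the count of three copies are illustrative assumptions, not taken from the thread.

```python
# Back-of-the-envelope sizing for a by-value APIC ID bitmap.
# Assumptions (not from the thread): one bit per possible APIC ID,
# stored in 8-byte words as on x86_64.

BYTES_PER_LONG = 8                  # assumed word size
BITS_PER_LONG = BYTES_PER_LONG * 8  # 64 bits per word


def mask_bytes(max_apics: int) -> int:
    """Bytes needed for a bitmap of max_apics bits, rounded up to whole words."""
    longs = (max_apics + BITS_PER_LONG - 1) // BITS_PER_LONG
    return longs * BYTES_PER_LONG


if __name__ == "__main__":
    # 256 is an arbitrary small value for comparison; 32 * 1024 is the "32k"
    # figure from the message. Three copies is a hypothetical count.
    for max_apics in (256, 32 * 1024):
        size = mask_bytes(max_apics)
        print(f"MAX_APICS={max_apics}: {size} bytes per mask, "
              f"{3 * size} bytes for three on-stack copies")
```

Under these assumptions a 32k-entry mask is 4096 bytes, so three on-stack copies would come to 12288 bytes, which is in the range of the 12k figure quoted earlier. Whether that is the actual breakdown is not stated in the thread.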
> Linux version 2.6.26-rc7-tip (mingo@dione) (gcc version 4.2.3) #10233 SMP
> Tue Jun 24 12:13:46 CEST 2008
> [...]
> initcall init_mnt_writers+0x0/0x8c returned 0 after 0 msecs
> calling eventpoll_init+0x0/0x9a
> initcall eventpoll_init+0x0/0x9a returned 0 after 0 msecs
> calling anon_inode_init+0x0/0x11a
> initcall anon_inode_init+0x0/0x11a returned 0 after 0 msecs
> calling pcie_aspm_init+0x0/0x27
> initcall pcie_aspm_init+0x0/0x27 returned 0 after 0 msecs
> calling acpi_event_init+0x0/0x57
> [... hard hang ...]
>
> on a good bootup, it would continue like this:
>
> initcall acpi_event_init+0x0/0x57 returned 0 after 38 msecs
> calling pnp_system_init+0x0/0x17
> [...]
>
> the config, full bootlog and reproducer bzImage is at:
>
> http://redhat.com/~mingo/misc/config-Tue_Jun_24_07_44_17_CEST_2008.bad
> http://redhat.com/~mingo/misc/log-Tue_Jun_24_07_44_17_CEST_2008.bad
> http://redhat.com/~mingo/misc/bzImage-Tue_Jun_24_07_44_17_CEST_2008.bad
>
> changing CONFIG_NR_CPUS from 4096 to 8 causes the system to boot up
> fine.
>
> Ingo