Hi,

We have now reproduced the freeze with a minimal set of running tasks and
modules:
- gettys
- ssh
- fillmem.py: Python script to fill memory (respawned by init when killed
  by the oom-killer)
- wget: to introduce network traffic (respawned by init when killed by the
  oom-killer)
- syslog

The freeze occurred some time after we loaded the md-crypt and mptsas
modules (without mounting any drives). Before we loaded these modules, the
system had been running with the same set of processes for several hours
without freezing. Some drivers, such as xfs, ipmi, ext2 and ext3, were
compiled into the kernel rather than built as modules.

putty_before_freeze: heavy logging before we noticed the freeze
putty_task_trace_frozen: state information in the frozen state
putty_blocked_trace: blocked task list, backtrace of CPUs and memory info
  in the frozen state

We were not able to kill any tasks with SysRq-kIll.

Kind regards,

Peter Trekels
QuESD nv.

-----Original Message-----
From: Arnout Vandecappelle [mailto:arnout@mind.be]
Sent: Friday, October 16, 2009 11:17 AM
To: Andrew Morton
Cc: bugzilla-daemon@bugzilla.kernel.org; linux-mm@kvack.org; Peter Trekels
Subject: Re: [Bug 14403] New: Kernel freeze when going out of memory

On Friday 16 Oct 2009 01:33:45 Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via
> the bugzilla web interface).
>
> On Wed, 14 Oct 2009 11:44:08 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=14403
> >
> >            Summary: Kernel freeze when going out of memory
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 2.6.24.6 through 2.6.31.1
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: Other
> >         AssignedTo: akpm@linux-foundation.org
> >         ReportedBy: arnout@mind.be
> >                 CC: arnout@mind.be
> >         Regression: No
> >
> >
> > Created an attachment (id=23404)
> >  --> (http://bugzilla.kernel.org/attachment.cgi?id=23404)
> > console log during freeze (bzip2)
> >
> > I get very frequent kernel freezes on two of my systems when they go
> > out of memory.  This happens with all kernels I tried (2.6.24
> > through 2.6.31).  These systems run a set of applications that
> > occupy most of the memory, they have no swap space, and they have
> > very high network and disk activity (xfs).  The network chip varies
> > (tg3, bnx2, r8169).
> >
> > Symptoms are that no user processes make any progress, though SysRq
> > interaction is still possible.  SysRq-I recovers the system (init
> > starts new gettys).
> >
> > During the freeze, there are a lot of page allocation failures from
> > the network interrupt handler.  There doesn't seem to be any
> > invocation of the OOM killer (I can't find any 'kill process ...
> > score ...' messages), although before the freeze the OOM killer is
> > usually called successfully a couple of times.  Note that the killed
> > processes are restarted soon after (but with lower memory
> > consumption).
> >
> > During the freeze, pinging and arping the system is (usually) still
> > possible.  There is very little traffic on the network interface,
> > most of it is broadcast.  There are also TCP ACKs still going around.
> > The amount of page allocation failures seems to correspond more or
> > less with the amount of traffic on the interface, but it's hard to
> > be sure (serial line has delay and printks are not timestamped).
> > Still, some skb allocations must be successful or the ping would
> > never get a reply.
> >
> > Manual invocation of the OOM killer doesn't seem to do anything
> > (nothing is killed, no memory is freed).
> >
> > Attached is a long log taken over the serial console.  In the
> > beginning there are some invocations of the OOM killer which bring
> > userspace back (as can be seen from the syslog output that appears
> > after a while).
> > Then, while the system is frozen there is a continuous stream of
> > page allocation failures (2158 in this hour).  This log corresponds
> > to about 1 hour of frozen time (from 11:48 till 12:47).  In this
> > time I did a couple of SysRq-T's, a SysRq-F with no results, a
> > SysRq-E with no results (not surprising since userspace is never
> > invoked), and finally a SysRq-I where the SysRq-M immediately before
> > and after show that it was successful.
> >
> > About the memory usage: 620MB is due to files in tmpfs that I
> > created in order to trigger the out of memory situation sooner.
>
> It would help if we could see the result of the sysrq-t output when
> the kernel is frozen.
>
> - enable and configure a serial console or netconsole
>   (Documentation/networking/netconsole.txt)
>
> - boot with log_buf_len=1M
>
> - run `dmesg -n 7'
>
> - freeze the kernel
>
> - hit sysrq-t
>
> - send us the resulting output.  Please don't let it get wordwrapped
>   by your email client!

 Hi,

 The SysRq-T output was already in my original bug report.  For your
convenience, I've extracted just the SysRq-T part in the attached log.  The
output was intermingled with some page allocation failures, which I have
removed.  I've left in a few page allocation failures, hung tasks and a
SysRq-L for good measure.
 I'm now trying to reproduce it with fewer processes and loaded modules.

 Regards,
 Arnout

-- 
Arnout Vandecappelle                               arnout at mind be
Senior Embedded Software Architect                 +32-16-286540
Essensium/Mind                                     http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium                BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  D206 D44B 5155 DF98 550D  3F2A 2213 88AA A1C7 C933
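
The bug report quotes a count of page allocation failures (2158 in about an hour of frozen time). A figure like that can be pulled from a saved serial console log with a short script. The following is only a minimal sketch: it assumes the log is plain text and that each failure is announced by a line of the form "<task>: page allocation failure", optionally preceded by a printk timestamp in square brackets. The function name count_allocation_failures and the command-line handling are hypothetical and are not taken from the attached logs.

```python
import re
import sys
from collections import Counter

# Assumption: a failure report starts with a line such as
#   "swapper: page allocation failure. order:0, mode:0x20"
# optionally prefixed by a printk timestamp like "[  123.456789] ".
FAILURE_RE = re.compile(
    r"^(?:\[[^\]]*\]\s*)?(?P<task>.+?): page allocation failure\b"
)


def count_allocation_failures(lines):
    """Return (total, per_task) for the given iterable of log lines.

    Only the first line of each failure report is counted; the stack
    trace and Mem-info lines that follow it are ignored.
    """
    per_task = Counter()
    for line in lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            per_task[match.group("task")] += 1
    return sum(per_task.values()), per_task


if __name__ == "__main__" and len(sys.argv) > 1:
    # Serial captures may contain stray bytes, so decode leniently.
    with open(sys.argv[1], errors="replace") as log:
        total, per_task = count_allocation_failures(log)
    print("page allocation failures:", total)
    for task, count in per_task.most_common():
        print(f"  {task}: {count}")
```

Run against a decompressed console log, this prints the total and a per-task breakdown, which would show whether the failures come mostly from interrupt context (reported against whichever task was running, often swapper) or from specific user processes.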