* 2.5.42-mm2 hangs system
@ 2002-10-13 16:04 Henrik Størner
2002-10-13 21:03 ` William Lee Irwin III
[not found] ` <3DA9CA28.155BA5CB@digeo.com>
0 siblings, 2 replies; 21+ messages in thread
From: Henrik Størner @ 2002-10-13 16:04 UTC (permalink / raw)
To: linux-mm
I gave 2.5.42-mm2 a test run yesterday, and it hung the box solid
while doing a kernel compile. The compile stopped dead in the middle
of a file, and there was no response when trying to access another
console (no X running). Alt-sysrq worked, so it wasn't completely dead
- sync/umount/reboot worked.
Nothing in the logs - no oops or other kernel messages.
Rebooted and repeated the experiment with the same result,
so it appears to be reproducible.
Stock 2.5.42 has worked OK for a day now, including kernel
compiles - the system has performed flawlessly for a
couple of years as my normal workstation.
PII processor, 384 MB RAM, SCSI disk (ncr53c8xx driver),
Intel eepro/100 network adapter. Kernel config at
http://www.hswn.dk/config-2.5.42-mm2
--
Henrik Storner <henrik@hswn.dk>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
* Re: 2.5.42-mm2 hangs system
2002-10-13 16:04 2.5.42-mm2 hangs system Henrik Størner
@ 2002-10-13 21:03 ` William Lee Irwin III
[not found] ` <3DA9CA28.155BA5CB@digeo.com>
1 sibling, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2002-10-13 21:03 UTC (permalink / raw)
To: Henrik Størner; +Cc: linux-mm
On Sun, Oct 13, 2002 at 06:04:51PM +0200, Henrik Størner wrote:
> I gave 2.5.42-mm2 a test run yesterday, and it hung the box solid
> while doing a kernel compile. The compile stopped dead in the middle
> of a file, and there was no response when trying to access another
> console (no X running). Alt-sysrq worked, so it wasn't completely dead
> - sync/umount/reboot worked.
> Nothing in the logs - no oops or other kernel messages.
> Rebooted and repeated the experiment with the same result,
> so it appears to be reproducible.
> Stock 2.5.42 has worked OK for a day now, including kernel
> compiles - the system has performed flawlessly for a
> couple of years as my normal workstation.
> PII processor, 384 MB RAM, SCSI disk (ncr53c8xx driver),
> Intel eepro/100 network adapter. Kernel config at
> http://www.hswn.dk/config-2.5.42-mm2
Please reproduce and pass on the output from sysrq-t.
Bill
[parent not found: <3DA9CA28.155BA5CB@digeo.com>]
* Re: 2.5.42-mm2 hangs system
[not found] ` <3DA9CA28.155BA5CB@digeo.com>
@ 2002-10-13 22:33 ` Henrik Størner
2002-10-13 22:57 ` Andrew Morton
2002-10-16 13:09 ` 2.5.42-mm2 hangs system Maneesh Soni
0 siblings, 2 replies; 21+ messages in thread
From: Henrik Størner @ 2002-10-13 22:33 UTC (permalink / raw)
To: linux-mm
On Sun, Oct 13, 2002 at 12:31:52PM -0700, Andrew Morton wrote:
> Henrik Storner wrote:
> >
> > I gave 2.5.42-mm2 a test run yesterday, and it hung the box solid
> > while doing a kernel compile. The compile stopped dead in the middle
> > of a file, and there was no response when trying to access another
> > console (no X running). Alt-sysrq worked, so it wasn't completely dead
> > - sync/umount/reboot worked.
> >
> > Nothing in the logs - no oops or other kernel messages.
> >
> > Rebooted and repeated the experiment with the same result,
> > so it appears to be reproducible.
> >
> > Stock 2.5.42 has worked OK for a day now, including kernel
> > compiles - the system has performed flawlessly for a
> > couple of years as my normal workstation.
> >
> > PII processor, 384 MB RAM, SCSI disk (ncr53c8xx driver),
> > Intel eepro/100 network adapter. Kernel config at
> > http://www.hswn.dk/config-2.5.42-mm2
>
> Very odd.
>
> If you have time, could you please enable "load all symbols"
> in the kernel hacking menu and capture a sysrq-T trace?
> Thanks.
Did so - built it again from a fresh kernel tree, just to be sure.
Compiler is gcc 3.2 from Red Hat 8, by the way.
Bug is still there. sysrq-T scrolls off the screen too fast for me to
read, but the last screenful has several processes like this (could
see sh, make, sh, gcc):
Call Trace:
  sys_wait4+0x209/0x4d0
  default_wake_function+0x0/0x40
  default_wake_function+0x0/0x40
  syscall_call+0x7/0xb
The last two tasks:
cc1 R d4d74080 20 2232 2231 2233 (NOTLB)
Call Trace:
  work_resched+0x5/0x16
as R d3c778c0 24 2233 2231 2232 (NOTLB)
Call Trace:
  pipe_wait+0x98/0xe0
  default_wake_function+0x0/0x40
  default_wake_function+0x0/0x40
  pipe_read+0xf9/0x240
  vfs_read+0xdc/0x150
  sys_mmap2+0x9f/0xe0
  sys_read+0x3e/0x60
  syscall_call+0x7/0xb
I captured the ALT+ScrollLock output also:
Pid 1739, comm: nfsd
EIP 0060:c0160250 CPU:0
EIP is at d_lookup+0x70/0x160
Eflags: 00000297 Not tainted
Call Trace
  cached_lookup+0x1b/0x70
  lookup_hash+0x72/0xe0
  lookup_one_len+0x5f/0x70
  find_exported_dentry+0x61f/0x730
  reiserfs_delete_solid_item+0xfd/0x2b0
  reiserfs_delete_solid_item+0xfd/0x2b0
  check_journal_end+0x18a/0x2b0
  rcu_check_callbacks+0x59/0x90
  schedule_tick+0x348/0x350
  update_process_times+0x46/0x60
  reiserfs_decode_fh+0xc2/0x100
  nfsd_acceptable+0x0/0xe0
  fh_verify+0x38e/0x570
  nfsd_acceptable+0x0/0xe0
  nfsd_statfs+0x2f/0x70
  nfsd3_proc_fsstat+0x37/0xc0
  nfs3svc_decode_fhandle+0x38/0xb0
  nfsd_dispatch+0xce/0x230
  svc_process+0x3f6/0x5e0
  nfsd+0x13f/0x250
  nfsd+0x0/0x250
  kernel_thread_helper+0x5/0x18
If you need the full sysrq-t output, I'll have to set up a serial
console to capture it.
--
Henrik Storner <henrik@hswn.dk>
* Re: 2.5.42-mm2 hangs system
2002-10-13 22:33 ` Henrik Størner
@ 2002-10-13 22:57 ` Andrew Morton
2002-10-14 12:25 ` 2.5.42-mm2 on small systems Ed Tomlinson
2002-10-16 13:09 ` 2.5.42-mm2 hangs system Maneesh Soni
1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2002-10-13 22:57 UTC (permalink / raw)
To: Henrik Størner; +Cc: linux-mm
Henrik Storner wrote:
>
> I captured the ALT+ScrollLock output also:
>
> Pid 1739, comm: nfsd
> EIP 0060:c0160250 CPU:0
> EIP is at d_lookup+0x70/0x160
> Eflags: 00000297 Not tainted
> Call Trace
> cached_lookup+0x1b/0x70
> lookup_hash+0x72/0xe0
> lookup_one_len+0x5f/0x70
> find_exported_dentry+0x61f/0x730
> reiserfs_delete_solid_item+0xfd/0x2b0
> reiserfs_delete_solid_item+0xfd/0x2b0
> check_journal_end+0x18a/0x2b0
> rcu_check_callbacks+0x59/0x90
> schedule_tick+0x348/0x350
> update_process_times+0x46/0x60
> reiserfs_decode_fh+0xc2/0x100
> nfsd_acceptable+0x0/0xe0
> fh_verify+0x38e/0x570
> nfsd_acceptable+0x0/0xe0
> nfsd_statfs+0x2f/0x70
> nfsd3_proc_fsstat+0x37/0xc0
> nfs3svc_decode_fhandle+0x38/0xb0
OK. This is possibly dentry hashtable corruption. I saw one instance
of this in about 2.5.41-mm3, followed by two other weird random memory
corruptions. So it could be that something in there is going for a
memory stomp. Don't really know any more than that at this time.
I _was_ suspecting oprofile or the latest addition to the shared
pagetable code. But you're not using either.
It would be interesting to enable all the memory debugging options
under the kernel hacking menu, see if that turns anything up.
I'll build a kernel with your config and beat on reiserfs for a bit,
see if I can make it happen.
Apart from that, one way to isolate it is to just keep backing off
the patches until it goes away. Which is not a ton of fun.
* 2.5.42-mm2 on small systems
2002-10-13 22:57 ` Andrew Morton
@ 2002-10-14 12:25 ` Ed Tomlinson
2002-10-14 14:34 ` Martin J. Bligh
2002-10-15 6:42 ` Andrew Morton
0 siblings, 2 replies; 21+ messages in thread
From: Ed Tomlinson @ 2002-10-14 12:25 UTC (permalink / raw)
To: Andrew Morton, Bill Davidsen; +Cc: linux-mm
Hi,
I have an old 486 with 64m and 512M of disk that I use as a serial
console. It does not have enough space to be useful for much else.
So I decided to test the low end and tried it with 2.5.42-mm2. It
boots and seems to work fine. Then I tried the resp1
(http://pages.prodigy.net/davidsen/) benchmark. With 2.4.18 it works:
Memory size 61 MB
Starting 1 CPU run with 61 MB RAM, minimum 5 data points at 20 sec intervals
. . . . . . . .
               _____________ delay ms. ____________
Test               low       high     median    average    S.D.   ratio
noload        2128.527   2138.035   2129.915   2131.269   0.003   1.000
smallwrite    4178.129  27436.634   4318.342  11111.745   8.927   5.214
largewrite    4157.574  78592.200   4222.064  16336.681  24.926   7.665
cpuload       6109.576   8018.156   6230.810   6425.307   0.600   3.015
spawnload     5508.218   6934.219   5556.992   5706.077   0.462   2.677
8ctx-mem     10090.974  22222.700  12662.532  13511.634   3.433   6.340
2ctx-mem      9330.010  21106.194  10745.474  11650.974   3.612   5.467
With 2.5.42-mm2 it does not finish. The machine is sort of usable
while it's running and control C has no problem ending the program.
I waited 11 hours for the spawnload test to complete - it was
looking very good before this....
Memory size 61 MB
Starting 1 CPU run with 61 MB RAM, minimum 5 data points at 20 sec intervals
. . . . . . . .
               _____________ delay ms. ____________
Test               low       high     median    average    S.D.   ratio
noload        2262.747   2269.895   2264.050   2264.796   0.002   1.000
smallwrite    3797.901  12132.336   3875.934   5364.276   2.815   2.369
largewrite    3857.445  35682.893   3875.064   8405.061  10.531   3.711
cpuload       5385.148   7589.479   5514.157   5771.985   0.729   2.549
The box was not limited by IO (no swapping nor was there much bi/bo in
vmstat). About 25% User and 75% system in cpu though.
Ed
* Re: 2.5.42-mm2 on small systems
2002-10-14 12:25 ` 2.5.42-mm2 on small systems Ed Tomlinson
@ 2002-10-14 14:34 ` Martin J. Bligh
2002-10-14 21:24 ` Bill Davidsen
2002-10-15 6:42 ` Andrew Morton
1 sibling, 1 reply; 21+ messages in thread
From: Martin J. Bligh @ 2002-10-14 14:34 UTC (permalink / raw)
To: Ed Tomlinson, Andrew Morton, Bill Davidsen; +Cc: linux-mm
> I have an old 486 with 64m and 512M of disk that I use as a serial
...
> with 2.5.42-mm2 it does not finish. The machine is sort of usable
> while its runing and control C has no problem ending the program.
> I waited 11 hours for the spawnload test to complete - it was
What does spawnload do (for those of us who don't have the inclination
to go source diving)?
M.
* Re: 2.5.42-mm2 on small systems
2002-10-14 14:34 ` Martin J. Bligh
@ 2002-10-14 21:24 ` Bill Davidsen
0 siblings, 0 replies; 21+ messages in thread
From: Bill Davidsen @ 2002-10-14 21:24 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Ed Tomlinson, Andrew Morton, linux-mm
On Mon, 14 Oct 2002, Martin J. Bligh wrote:
>
> > I have an old 486 with 64m and 512M of disk that I use as a serial
> ...
> > with 2.5.42-mm2 it does not finish. The machine is sort of usable
> > while its runing and control C has no problem ending the program.
> > I waited 11 hours for the spawnload test to complete - it was
>
> What does spawnload do (for those of us who don't have the inclination
> to go source diving)?
In this case half a screen of source diving is the best answer. It
forks a process which fork/exec's a shell, which runs either the
builtin pwd or /bin/pwd depending on what shell you have set. In most
cases that's bash, which uses the builtin. It does a bunch of process
creation and cleanup, and can generate some impressive context
switching.
    while (RunMe) {
        if (pid = fork()) {
            (void)wait();
            NumFork++;
        } else {
            // Do a 2nd level fork/exec a few times
            system("pwd >/dev/null");
            exit(0);
        }
    }
I will say that I ran 41-mm2 and 41-mm2v (Con Kolivas' patch) just
fine. I can't get 2.5.42 anything to even build; it's looking for NLS
and the config has no NLS, unless I have a bad patch. I'm going to
scan the list for patches later, but that's my current experience.
The README (choose text, Postscript or HTML) has a description of what
each test does. Or what I think it does.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
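The spawnload loop described above can be sketched as a small, self-contained C function. This is only an illustration under stated assumptions, not the benchmark's actual source: the name spawn_cycles and the fixed iteration count are hypothetical (the excerpt loops on a RunMe flag instead), and waitpid() with an explicit error check stands in for the bare wait() so that a failed fork() is not mistaken for the parent branch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from resp): run `iterations` spawn cycles.
 * Each cycle forks a child; the child uses system() to fork/exec a
 * shell that runs `pwd`, then exits. The parent reaps the child.
 * Returns the number of children successfully reaped.
 */
static long spawn_cycles(long iterations)
{
    long num_fork = 0;

    for (long i = 0; i < iterations; i++) {
        pid_t pid = fork();

        if (pid < 0) {
            /* fork() failed, e.g. the process table is full */
            perror("fork");
            break;
        }
        if (pid == 0) {
            /* Child: second-level fork/exec through the shell */
            int rc = system("pwd >/dev/null");
            _exit(rc == 0 ? 0 : 1);
        }
        /* Parent: wait for this child so no zombies pile up */
        if (waitpid(pid, NULL, 0) == pid)
            num_fork++;
    }
    return num_fork;
}
```

Calling spawn_cycles() with a large count from a main() and timing it would give a rough feel for process creation and teardown cost, which is the kind of load the description suggests spawnload puts on the kernel.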
* Re: 2.5.42-mm2 on small systems
2002-10-14 12:25 ` 2.5.42-mm2 on small systems Ed Tomlinson
2002-10-14 14:34 ` Martin J. Bligh
@ 2002-10-15 6:42 ` Andrew Morton
2002-10-16 20:55 ` Bill Davidsen
1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2002-10-15 6:42 UTC (permalink / raw)
To: Ed Tomlinson; +Cc: Bill Davidsen, linux-mm
Ed Tomlinson wrote:
>
> ...
>
> with 2.5.42-mm2 it does not finish. The machine is sort of usable while its runing
> and control C has no problem ending the program. I waited 11 hours for the spawnload
> test to complete - it was looking very good before this....
>
> Memory size 61 MB
> Starting 1 CPU run with 61 MB RAM, minimum 5 data points at 20 sec intervals
>
> . . . . . . . .
>                _____________ delay ms. ____________
> Test               low       high     median    average    S.D.   ratio
> noload        2262.747   2269.895   2264.050   2264.796   0.002   1.000
> smallwrite    3797.901  12132.336   3875.934   5364.276   2.815   2.369
> largewrite    3857.445  35682.893   3875.064   8405.061  10.531   3.711
> cpuload       5385.148   7589.479   5514.157   5771.985   0.729   2.549
>
> The box was not limited by IO (no swapping nor was there much bi/bo in
> vmstat). About 25% User and 75% system in cpu though.
hm. Works for me. The default setting are waaay too boring, so
I used ./resp -m2 -M5 -w5
Test               low       high     median    average  median     avg
noload         143.168    149.676    143.258    145.602   1.000   1.000
smallwrite     144.319   4350.325    269.161   1428.881   1.879   9.814
largewrite     230.759   1129.816    492.421    539.192   3.437   3.703
cpuload        142.833    207.206    143.374    159.036   1.001   1.092
spawnload      143.066    313.944    143.240    177.391   1.000   1.218
8ctx-mem       159.396   5823.791    810.837   2020.066   5.660  13.874
2ctx-mem       757.203   8192.148   1294.120   2538.975   9.033  17.438
Could be a scheduler thing? Maybe a bug in the test?
* Re: 2.5.42-mm2 on small systems
2002-10-15 6:42 ` Andrew Morton
@ 2002-10-16 20:55 ` Bill Davidsen
2002-10-16 22:43 ` Ed Tomlinson
0 siblings, 1 reply; 21+ messages in thread
From: Bill Davidsen @ 2002-10-16 20:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ed Tomlinson, linux-mm
On Mon, 14 Oct 2002, Andrew Morton wrote:
> hm. Works for me. The default setting are waaay too boring, so
> I used ./resp -m2 -M5 -w5
The problem with reducing the sleep is that it hides a kernel which is
swappy, since there isn't time to build up a big backlog of disk
writes, and the swap doesn't seem to happen right away. And I often
see jackpot cases which are less likely to happen if you reduce the
number of tests. Again it makes the kernel look good, but may not
reflect what's really happening.
I agree that it's slow, I've been debugging it for several weeks now,
but every time I think I've got the corner cases cornered I find
another corner. The next version will add -R to set the retry max
count, because some kernels don't recover from one test and return no
resources on fork() because they haven't cleaned up all terminated
processes.
This was intended to be a simple test of how the kernel feels, and it is
that, but some kernels I've tried get to one test or another and shit the
bed every time. It's not a stress test! How can I get my numbers if the
kernel keeps hanging solid? ;-)
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: 2.5.42-mm2 on small systems
2002-10-16 20:55 ` Bill Davidsen
@ 2002-10-16 22:43 ` Ed Tomlinson
0 siblings, 0 replies; 21+ messages in thread
From: Ed Tomlinson @ 2002-10-16 22:43 UTC (permalink / raw)
To: Bill Davidsen; +Cc: linux-mm
On October 16, 2002 04:55 pm, Bill Davidsen wrote:
> On Mon, 14 Oct 2002, Andrew Morton wrote:
> > hm. Works for me. The default setting are waaay too boring, so
> > I used ./resp -m2 -M5 -w5
> This was intended to be a simple test of how the kernel feels, and it is
> that, but some kernels I've tried get to one test or another and shit the
> bed every time. It's not a stress test! How can I get my numbers if the
> kernel keeps hanging solid? ;-)
You add sufficient tracing so you can find where it hangs... And
report it so it can get fixed. IMHO, while not a stress test, it can
put stress on the kernel - it needs to in order to test the
interactive response.
Still trying to figure out what is happening on my 64m 486.
Thanks for the interesting benchmark.
Ed
* Re: 2.5.42-mm2 hangs system
2002-10-13 22:33 ` Henrik Størner
2002-10-13 22:57 ` Andrew Morton
@ 2002-10-16 13:09 ` Maneesh Soni
2002-10-16 15:49 ` Henrik Størner
1 sibling, 1 reply; 21+ messages in thread
From: Maneesh Soni @ 2002-10-16 13:09 UTC (permalink / raw)
To: Henrik Størner; +Cc: linux-mm, akpm, Dipankar Sarma
On Sun, Oct 13, 2002 at 10:34:40PM +0000, Henrik Storner wrote:
> On Sun, Oct 13, 2002 at 12:31:52PM -0700, Andrew Morton wrote:
> > Henrik Storner wrote:
> > >
> > > I gave 2.5.42-mm2 a test run yesterday, and it hung the box solid
> > > while doing a kernel compile. The compile stopped dead in the middle
> > > of a file, and there was no response when trying to access another
> > > console (no X running). Alt-sysrq worked, so it wasn't completely dead
> > > - sync/umount/reboot worked.
> > >
> > > Nothing in the logs - no oops or other kernel messages.
> > >
> > > Rebooted and repeated the experiment with the same result,
> > > so it appears to be reproducible.
> > >
> > > Stock 2.5.42 has worked OK for a day now, including kernel
> > > compiles - the system has performed flawlessly for a
> > > couple of years as my normal workstation.
> > >
> > > PII processor, 384 MB RAM, SCSI disk (ncr53c8xx driver),
> > > Intel eepro/100 network adapter. Kernel config at
> > > http://www.hswn.dk/config-2.5.42-mm2
> >
> > Very odd.
> >
> > If you have time, could you please enable "load all symbols"
> > in the kernel hacking menu and capture a sysrq-T trace?
> > Thanks.
>
> Did so - built it again from a fresh kernel tree, just to be sure.
> Compiler is gcc 3.2 from Red Hat 8, by the way.
>
> Bug is still there. sysrq-T scrolls off the screen too fast for me to
> read, but the last screenful has several processes like this (could
> see sh, make, sh, gcc):
>
> Call Trace:
Hello Henrik,
I tried recreating the hang, but it did not occur. I could guess from the
call trace that you are using reiserfs and nfs, but I am not very clear
how you are recreating it. I created a reiserfs partition and exported it.
Then tried to compile a kernel over it. I used the config file from the
site you mentioned.
It would be nice if you could list the exact recreation steps, mentioning
the filesystems you are using.
As the hang looks like a loop in d_lookup, can you try
recreating it *without* dcache_rcu.patch? You can back out this patch:
http://www.zipworld.com.au/~akpm/linux/patches/2.5/2.5.42/2.5.42-mm2/broken-out/dcache_rcu.patch
Thanks
Maneesh
--
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
Phone: +91-80-5044999 email: maneesh@in.ibm.com
http://lse.sourceforge.net/
* Re: 2.5.42-mm2 hangs system
2002-10-16 13:09 ` 2.5.42-mm2 hangs system Maneesh Soni
@ 2002-10-16 15:49 ` Henrik Størner
2002-10-16 18:59 ` Henrik Størner
2002-10-17 14:38 ` Maneesh Soni
0 siblings, 2 replies; 21+ messages in thread
From: Henrik Størner @ 2002-10-16 15:49 UTC (permalink / raw)
To: Maneesh Soni; +Cc: linux-mm, akpm, Dipankar Sarma
Hi Maneesh,
sorry about not getting back with more info sooner. Daytime jobs
can be all-consuming.
I tried doing what Andrew suggested, and enabling all memory
debugging options. This did not produce anything.
The setup here:
Workstation where I see the problem is a PII/350, 392 MB RAM and
some swap. Just about all the software packages are from Red Hat 8
(recently upgraded from a 7.x installation).
SCSI disk off a Symbios Logic 53c875 controller is used for Linux.
There is an IDE disk in the system and the kernel has support for it,
but it is not used normally (nothing mounted).
Network is with an Intel eepro100 adapter, gets an IP via DHCP.
root-fs is a local filesystem on the scsi disk, reiserfs formatted.
/home is NFS-mounted from a Linux server running kernel 2.4.19
The kernel sources are located in /usr/src which is on the local
(combined root+usr) filesystem, but I normally go there via a
symlink in my home-dir, ~/kernel/linux-2.5-mm/ is the directory
for the 2.5+mm directory I use.
The system runs apmd, atd, crond, autofs (for mounting /home), gpm,
lpd, nfs-server (the /usr/src directory is exported), nfs-client,
ntpd, portmap, sshd, xfs and xinetd. A DHCP client is also running.
No X server has been running while I've tested these hangs.
To recreate it, I've booted up the 2.5.42-mm2 kernel, starting up all
the normal services. Log in (automounts home directory),
cd ~/kernel/linux-2.5-mm, make oldconfig, make clean, make
The system then hangs after a few minutes of working through the
kernel compile. Not the same place every time.
I've got some time tonight, so I will try un-doing the patch you
mention and see if that changes anything.
Thanks,
Henrik
On Wed, Oct 16, 2002 at 06:39:07PM +0530, Maneesh Soni wrote:
> On Sun, Oct 13, 2002 at 10:34:40PM +0000, Henrik Storner wrote:
> > On Sun, Oct 13, 2002 at 12:31:52PM -0700, Andrew Morton wrote:
> > > Henrik Storner wrote:
> > > >
> > > > I gave 2.5.42-mm2 a test run yesterday, and it hung the box solid
> > > > while doing a kernel compile. The compile stopped dead in the middle
> > > > of a file, and there was no response when trying to access another
> > > > console (no X running). Alt-sysrq worked, so it wasn't completely dead
> > > > - sync/umount/reboot worked.
> > > >
> > > > Nothing in the logs - no oops or other kernel messages.
> > > >
> > > > Rebooted and repeated the experiment with the same result,
> > > > so it appears to be reproducible.
> > > >
> > > > Stock 2.5.42 has worked OK for a day now, including kernel
> > > > compiles - the system has performed flawlessly for a
> > > > couple of years as my normal workstation.
> > > >
> > > > PII processor, 384 MB RAM, SCSI disk (ncr53c8xx driver),
> > > > Intel eepro/100 network adapter. Kernel config at
> > > > http://www.hswn.dk/config-2.5.42-mm2
> > >
> > > Very odd.
> > >
> > > If you have time, could you please enable "load all symbols"
> > > in the kernel hacking menu and capture a sysrq-T trace?
> > > Thanks.
> >
> > Did so - built it again from a fresh kernel tree, just to be sure.
> > Compiler is gcc 3.2 from Red Hat 8, by the way.
> >
> > Bug is still there. sysrq-T scrolls off the screen too fast for me to
> > read, but the last screenful has several processes like this (could
> > see sh, make, sh, gcc):
> >
> > Call Trace:
>
> Hello Henrik,
>
> I tired recreating the hang, but it didnot occur. I could guess from the
> call trace that you are using reiserfs and nfs but I not very clear how
> are you recreating it. I created a resierfs partition and exported it. Then
> tried to compile a kernel over it. I used the config file from the site
> you mentioned.
>
> It will be nice if you can list the exact recreation steps mentioning the
> filesystems you are using.
>
> As the hang looks like a loop in d_lookup can you try
> recreating it *without* dcache_rcu.patch. You can backout this patch
>
> http://www.zipworld.com.au/~akpm/linux/patches/2.5/2.5.42/2.5.42-mm2/broken-out/dcache_rcu.patch
>
>
> Thanks
> Maneesh
>
> --
> Maneesh Soni
> IBM Linux Technology Center,
> IBM India Software Lab, Bangalore.
> Phone: +91-80-5044999 email: maneesh@in.ibm.com
> http://lse.sourceforge.net/
--
Henrik Storner <henrik@hswn.dk>
If you want good, reliable info about Open Source and Linux,
consider supporting Linux Weekly News with a subscription.
http://lwn.net/Articles/10688/
* Re: 2.5.42-mm2 hangs system
2002-10-16 15:49 ` Henrik Størner
@ 2002-10-16 18:59 ` Henrik Størner
2002-10-16 19:31 ` Dipankar Sarma
2002-10-30 9:48 ` [FIX] " Maneesh Soni
2002-10-17 14:38 ` Maneesh Soni
1 sibling, 2 replies; 21+ messages in thread
From: Henrik Størner @ 2002-10-16 18:59 UTC (permalink / raw)
To: Maneesh Soni; +Cc: linux-mm, akpm, Dipankar Sarma
Hi Maneesh,
On Wed, Oct 16, 2002 at 05:49:43PM +0200, Henrik Storner wrote:
> On Wed, Oct 16, 2002 at 06:39:07PM +0530, Maneesh Soni wrote:
> > As the hang looks like a loop in d_lookup can you try
> > recreating it *without* dcache_rcu.patch. You can backout this patch
> >
> > http://www.zipworld.com.au/~akpm/linux/patches/2.5/2.5.42/2.5.42-mm2/broken-out/dcache_rcu.patch
> >
> I've got some time tonight, so I will try un-doing the patch you
> mention and see if that changes anything.
well, you hit the nail right on the head there.
I've just been running the 2.5.42-mm2 kernel except for the dcache_rcu
patch for a full hour, and I was unable to reproduce the hangs that I
saw with the full -mm2 patch installed. Did two full kernel builds
while reading some mail and doing other stuff - no problems whatsoever.
Just to be sure, I re-applied the dcache_rcu patch, rebuilt the
kernel, booted with the kernel containing dcache_rcu patch,
and the system died within a few minutes.
So it is definitely something in the dcache_rcu patch that does it.
--
Henrik Storner <henrik@hswn.dk>
* Re: 2.5.42-mm2 hangs system
2002-10-16 18:59 ` Henrik Størner
@ 2002-10-16 19:31 ` Dipankar Sarma
2002-10-16 19:43 ` Andrew Morton
2002-10-30 9:48 ` [FIX] " Maneesh Soni
1 sibling, 1 reply; 21+ messages in thread
From: Dipankar Sarma @ 2002-10-16 19:31 UTC (permalink / raw)
To: Henrik Størner; +Cc: Maneesh Soni, linux-mm, akpm
On Wed, Oct 16, 2002 at 08:59:08PM +0200, Henrik Storner wrote:
> well you hit the nail right on the head there.
>
> I've just been running the 2.5.42-mm2 kernel except for the dcache_rcu
> patch for a full hour, and I was unable to reproduce the hangs that I
> saw with the full -mm2 patch installed. Did two full kernel builds
> while reading some mail and doing other stuff - no problems what so
> ever.
>
> Just to be sure, I re-applied the dcache_rcu patch, rebuilt the
> kernel, booted with the kernel containing dcache_rcu patch,
> and the system died within a few minutes.
>
> So it is definitely something in the dcache_rcu patch that does it.
Well, I am not quite sure of this yet. Maneesh pointed out this earlier -
In this machine with 2.5.42-mm2 and no dcache_rcu, (with your .config),
we see this -
[root@llm04 dbench]# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda6              1004024    461168    491852  49% /
/dev/sda1               505605     38348    441153   8% /boot
/dev/sda5              2514172   1791560    594900  76% /usr
none                    257532         0    257532   0% /dev/shm
/dev/sdb5              6324896     23996   5979604   1% /mnt/sdb5
llm04:/mnt/sdb5        6324896     23968   5979616   1% /mnt/sdc1
/dev/sda2              9068648   3993040   4614948  47% /home
[root@llm04 dbench]# pwd
/mnt/sdc1/dbench
[root@llm04 dbench]# ./dbench 4
4 clients started
..........................................................................................................................................rmdir CLIENTS/CLIENT2/~DMTMP/WORDPRO failed (Directory not empty)
rmdir CLIENTS/CLIENT2/~DMTMP/PARADOX failed (Directory not empty)
rmdir CLIENTS/CLIENT2/~DMTMP failed (Directory not empty)
+.......rmdir CLIENTS/CLIENT0/~DMTMP/WORDPRO failed (Directory not empty)
rmdir CLIENTS/CLIENT0/~DMTMP/PARADOX failed (Directory not empty)
.rmdir CLIENTS/CLIENT0/~DMTMP failed (Directory not empty)
+.rmdir CLIENTS/CLIENT3/~DMTMP/WORDPRO failed (Directory not empty)
rmdir CLIENTS/CLIENT3/~DMTMP/PARADOX failed (Directory not empty)
rmdir CLIENTS/CLIENT3/~DMTMP failed (Directory not empty)
+.rmdir CLIENTS/CLIENT1/~DMTMP/WORDPRO failed (Directory not empty)
rmdir CLIENTS/CLIENT1/~DMTMP/PARADOX failed (Directory not empty)
rmdir CLIENTS/CLIENT1/~DMTMP failed (Directory not empty)
+**** Throughput 36.6733 MB/sec (NB=45.8417 MB/sec 366.733 MBit/sec)
This needs more investigation. I would be really surprised if
dcache_rcu has any effect on UP code.
Thanks
--
Dipankar Sarma <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
* Re: 2.5.42-mm2 hangs system
2002-10-16 19:31 ` Dipankar Sarma
@ 2002-10-16 19:43 ` Andrew Morton
2002-10-16 20:05 ` Dipankar Sarma
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2002-10-16 19:43 UTC (permalink / raw)
To: dipankar; +Cc: Henrik Størner, Maneesh Soni, linux-mm
Dipankar Sarma wrote:
>
> On Wed, Oct 16, 2002 at 08:59:08PM +0200, Henrik Storner wrote:
> > well you hit the nail right on the head there.
> >
> > I've just been running the 2.5.42-mm2 kernel except for the dcache_rcu
> > patch for a full hour, and I was unable to reproduce the hangs that I
> > saw with the full -mm2 patch installed. Did two full kernel builds
> > while reading some mail and doing other stuff - no problems what so
> > ever.
> >
> > Just to be sure, I re-applied the dcache_rcu patch, rebuilt the
> > kernel, booted with the kernel containing dcache_rcu patch,
> > and the system died within a few minutes.
> >
> > So it is definitely something in the dcache_rcu patch that does it.
>
> Well, I am not quite sure of this yet. Maneesh pointed out this earlier -
> In this machine with 2.5.42-mm2 and no dcache_rcu, (with your .config),
> we see this -
>
> [root@llm04 dbench]# df
> Filesystem           1k-blocks      Used Available Use% Mounted on
> /dev/sda6              1004024    461168    491852  49% /
> /dev/sda1               505605     38348    441153   8% /boot
> /dev/sda5              2514172   1791560    594900  76% /usr
> none                    257532         0    257532   0% /dev/shm
> /dev/sdb5              6324896     23996   5979604   1% /mnt/sdb5
> llm04:/mnt/sdb5        6324896     23968   5979616   1% /mnt/sdc1
> /dev/sda2              9068648   3993040   4614948  47% /home
> [root@llm04 dbench]# pwd
> /mnt/sdc1/dbench
> root@llm04 dbench]# ./dbench 4
> 4 clients started
> ..........................................................................................................................................rmdir CLIENTS/CLIENT2/~DMTMP/WORDPRO failed (Directory not empty)
> rmdir CLIENTS/CLIENT2/~DMTMP/PARADOX failed (Directory not empty)
Is this dbench-on-NFS? That has always failed - it's to do
with the funny NFS handling of unlinked-while-open files.
* Re: 2.5.42-mm2 hangs system
2002-10-16 19:43 ` Andrew Morton
@ 2002-10-16 20:05 ` Dipankar Sarma
0 siblings, 0 replies; 21+ messages in thread
From: Dipankar Sarma @ 2002-10-16 20:05 UTC (permalink / raw)
To: Andrew Morton; +Cc: Henrik Størner, Maneesh Soni, linux-mm

On Wed, Oct 16, 2002 at 12:43:06PM -0700, Andrew Morton wrote:
> Is this dbench-on-NFS? That has always failed - it's to do
> with the funny NFS handling of unlinked-while-open files.

Yes, it was. I guess the thing to do would be to investigate
NFS with dcache_rcu and see where they don't mix. IIRC, this
combination was tested a while ago, maybe in the 2.5.2x timeframe.
We'll see.

Thanks
--
Dipankar Sarma <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.
* [FIX] Re: 2.5.42-mm2 hangs system
2002-10-16 18:59 ` Henrik Størner
2002-10-16 19:31 ` Dipankar Sarma
@ 2002-10-30 9:48 ` Maneesh Soni
2002-10-31 7:54 ` Henrik Størner
1 sibling, 1 reply; 21+ messages in thread
From: Maneesh Soni @ 2002-10-30 9:48 UTC (permalink / raw)
To: Henrik Størner

Hello Henrik,

I hope the following patch will solve your problem. The patch is made
against the 2.5.44-mm6 kernel. The problem was due to anonymous dentries
getting connected with the DCACHE_UNHASHED flag set.

diff -urN linux-2.5.44-mm6/fs/dcache.c linux-2.5.44-mm6-fix/fs/dcache.c
--- linux-2.5.44-mm6/fs/dcache.c	Wed Oct 30 14:42:33 2002
+++ linux-2.5.44-mm6-fix/fs/dcache.c	Wed Oct 30 13:13:43 2002
@@ -788,12 +788,15 @@
 		res = tmp;
 		tmp = NULL;
 		if (res) {
+			spin_lock(&res->d_lock);
 			res->d_sb = inode->i_sb;
 			res->d_parent = res;
 			res->d_inode = inode;
 			res->d_flags |= DCACHE_DISCONNECTED;
+			res->d_vfs_flags &= ~DCACHE_UNHASHED;
 			list_add(&res->d_alias, &inode->i_dentry);
 			list_add(&res->d_hash, &inode->i_sb->s_anon);
+			spin_unlock(&res->d_lock);
 		}
 		inode = NULL; /* don't drop reference */
 	}

Regards,
Maneesh

On Wed, Oct 16, 2002 at 07:03:14PM +0000, Henrik Storner wrote:
> Hi Maneesh,
>
> On Wed, Oct 16, 2002 at 05:49:43PM +0200, Henrik Storner wrote:
> > On Wed, Oct 16, 2002 at 06:39:07PM +0530, Maneesh Soni wrote:
> > > As the hang looks like a loop in d_lookup can you try
> > > recreating it *without* dcache_rcu.patch. You can backout this patch
> > >
> > > http://www.zipworld.com.au/~akpm/linux/patches/2.5/2.5.42/2.5.42-mm2/broken-out/dcache_rcu.patch
> > >
> > I've got some time tonight, so I will try un-doing the patch you
> > mention and see if that changes anything.
>
> well you hit the nail right on the head there.
>
> I've just been running the 2.5.42-mm2 kernel except for the dcache_rcu
> patch for a full hour, and I was unable to reproduce the hangs that I
> saw with the full -mm2 patch installed. Did two full kernel builds
> while reading some mail and doing other stuff - no problems
> whatsoever.
>
> Just to be sure, I re-applied the dcache_rcu patch, rebuilt the
> kernel, booted with the kernel containing dcache_rcu patch,
> and the system died within a few minutes.
>
> So it is definitely something in the dcache_rcu patch that does it.
>
> --
> Henrik Storner <henrik@hswn.dk>

--
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
Phone: +91-80-5044999 email: maneesh@in.ibm.com
http://lse.sourceforge.net/
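The invariant that the patch above restores can be sketched in userspace C. This is a toy model under stated assumptions, not kernel code: struct toy_dentry, the TOY_* flag values, and every toy_* function are hypothetical names invented for illustration, and the idea that a new entry starts out flagged as unhashed is a simplifying assumption. The sketch only shows that linking an entry onto a list without clearing its "unhashed" flag leaves the flag and the list membership disagreeing, and that clearing the flag under the entry's lock keeps the two in step.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values for this toy model; not the kernel's. */
#define TOY_DISCONNECTED 0x1u
#define TOY_UNHASHED     0x2u

struct toy_dentry {
	atomic_flag lock;        /* stands in for d_lock */
	unsigned int flags;      /* stands in for d_flags */
	unsigned int vfs_flags;  /* stands in for d_vfs_flags */
	int on_list;             /* nonzero once linked onto a list */
};

void toy_lock(struct toy_dentry *d)
{
	while (atomic_flag_test_and_set(&d->lock))
		;  /* spin until the flag was previously clear */
}

void toy_unlock(struct toy_dentry *d)
{
	atomic_flag_clear(&d->lock);
}

/* Assumption: a new entry starts unlinked and flagged as unhashed. */
void toy_init(struct toy_dentry *d)
{
	atomic_flag_clear(&d->lock);
	d->flags = 0;
	d->vfs_flags = TOY_UNHASHED;
	d->on_list = 0;
}

/* Links the entry but leaves the stale "unhashed" flag set. */
void toy_attach_without_fix(struct toy_dentry *d)
{
	d->flags |= TOY_DISCONNECTED;
	d->on_list = 1;
}

/* Same steps, plus the lock and the flag clear that the patch adds. */
void toy_attach_with_fix(struct toy_dentry *d)
{
	assert(!d->on_list);
	toy_lock(d);
	d->flags |= TOY_DISCONNECTED;
	d->vfs_flags &= ~TOY_UNHASHED;
	d->on_list = 1;
	toy_unlock(d);
}

/* Returns 1 when the flag and the list membership agree. */
int toy_flag_matches_list(const struct toy_dentry *d)
{
	int hashed_by_flag = !(d->vfs_flags & TOY_UNHASHED);
	return hashed_by_flag == (d->on_list != 0);
}
```

Calling toy_init() and then toy_attach_without_fix() on an entry makes toy_flag_matches_list() report a mismatch, whereas toy_attach_with_fix() leaves the two consistent. A reader that trusts the flag alone would be misled in the first case.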
* Re: [FIX] Re: 2.5.42-mm2 hangs system
2002-10-30 9:48 ` [FIX] " Maneesh Soni
@ 2002-10-31 7:54 ` Henrik Størner
0 siblings, 0 replies; 21+ messages in thread
From: Henrik Størner @ 2002-10-31 7:54 UTC (permalink / raw)
To: Maneesh Soni; +Cc: linux-mm

Hi Maneesh,

On Wed, Oct 30, 2002 at 03:18:46PM +0530, Maneesh Soni wrote:
> Hello Henrik,
>
> I hope the following patch will solve your problem. The patch is made
> against the 2.5.44-mm6 kernel. The problem was due to anonymous dentries
> getting connected with the DCACHE_UNHASHED flag set.

the patch does fix the sudden halts that I was seeing with 2.5.42-mm2.
The system has now survived about 10 successive kernel compiles and
it is still running.

There are a couple of odd things going on, though - but I don't know
for sure if they are related to the mm patch or not.

I am seeing these messages regularly - disk activity seems to provoke
them.

Oct 30 23:14:44 osiris kernel: bad: scheduling while atomic!
Oct 30 23:14:44 osiris kernel: Call Trace:
Oct 30 23:14:44 osiris kernel: [do_schedule+763/768] do_schedule+0x2fb/0x300
Oct 30 23:14:44 osiris kernel: [<c011973b>] do_schedule+0x2fb/0x300
Oct 30 23:14:44 osiris kernel: [kswapd+236/284] kswapd+0xec/0x11c
Oct 30 23:14:44 osiris kernel: [<c013bd9c>] kswapd+0xec/0x11c
Oct 30 23:14:44 osiris kernel: [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
Oct 30 23:14:44 osiris kernel: [<c011ae70>] autoremove_wake_function+0x0/0x50
Oct 30 23:14:44 osiris kernel: [preempt_schedule+54/80] preempt_schedule+0x36/0x50
Oct 30 23:14:44 osiris kernel: [<c0119776>] preempt_schedule+0x36/0x50
Oct 30 23:14:44 osiris kernel: [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
Oct 30 23:14:44 osiris kernel: [<c011ae70>] autoremove_wake_function+0x0/0x50
Oct 30 23:14:44 osiris kernel: [kswapd+0/284] kswapd+0x0/0x11c
Oct 30 23:14:44 osiris kernel: [<c013bcb0>] kswapd+0x0/0x11c
Oct 30 23:14:44 osiris kernel: [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18
Oct 30 23:14:44 osiris kernel: [<c01074cd>] kernel_thread_helper+0x5/0x18
Oct 30 23:14:44 osiris kernel:

And one full blown Oops apparently when I tried to login to an X
session (I use KDE for the desktop):

Oct 31 08:38:11 osiris kernel: Unable to handle kernel paging request at virtual address 4172f058
Oct 31 08:38:11 osiris kernel: printing eip:
Oct 31 08:38:11 osiris kernel: 083b80d4
Oct 31 08:38:11 osiris kernel: *pde = 06437067
Oct 31 08:38:11 osiris kernel: *pte = 00000000
Oct 31 08:38:11 osiris kernel: Oops: 0006
Oct 31 08:38:11 osiris kernel: eepro100 mii sb sb_lib uart401 sound soundcore
Oct 31 08:38:11 osiris kernel: CPU: 0
Oct 31 08:38:11 osiris kernel: EIP: 0023:[serport_exit+138115172/-1072695408] Not tainted
Oct 31 08:38:11 osiris kernel: EIP: 0023:[<083b80d4>] Not tainted
Oct 31 08:38:11 osiris kernel: EFLAGS: 00013206
Oct 31 08:38:11 osiris kernel: eax: 0021449c ebx: 4172f058 ecx: 00000000 edx: 00000000
Oct 31 08:38:11 osiris kdm[8787]: Server for display :0 terminated unexpectedly
Oct 31 08:38:11 osiris kernel: esi: 088674dc edi: 0021449c ebp: 00000002 esp: bffff58c
Oct 31 08:38:12 osiris kernel: ds: 002b es: 002b ss: 002b
Oct 31 08:38:12 osiris kernel: Process X (pid: 25678, threadinfo=d1f54000 task=d675cce0)
Oct 31 08:38:12 osiris kernel: <6>note: X[25678] exited with preempt_count 2
Oct 31 08:38:12 osiris kernel: Debug: sleeping function called from illegal context at include/asm/semaphore.h:119
Oct 31 08:38:12 osiris kernel: Call Trace:
Oct 31 08:38:12 osiris kernel: [shm_close+48/192] shm_close+0x30/0xc0
Oct 31 08:38:12 osiris kernel: [<c0200190>] shm_close+0x30/0xc0
Oct 31 08:38:12 osiris kernel: [exit_mmap+214/224] exit_mmap+0xd6/0xe0
Oct 31 08:38:12 osiris kernel: [<c0133146>] exit_mmap+0xd6/0xe0
Oct 31 08:38:12 osiris kernel: [mmput+78/160] mmput+0x4e/0xa0
Oct 31 08:38:12 osiris kernel: [<c011b10e>] mmput+0x4e/0xa0
Oct 31 08:38:12 osiris kernel: [do_exit+197/688] do_exit+0xc5/0x2b0
Oct 31 08:38:12 osiris kernel: [<c0120aa5>] do_exit+0xc5/0x2b0
Oct 31 08:38:12 osiris kernel: [die+134/144] die+0x86/0x90
Oct 31 08:38:12 osiris kernel: [<c010a456>] die+0x86/0x90
Oct 31 08:38:12 osiris kernel: [do_page_fault+358/1268] do_page_fault+0x166/0x4f4
Oct 31 08:38:12 osiris kernel: [<c0118006>] do_page_fault+0x166/0x4f4
Oct 31 08:38:12 osiris kernel: [vfs_read+230/320] vfs_read+0xe6/0x140
Oct 31 08:38:12 osiris kernel: [<c0149cf6>] vfs_read+0xe6/0x140
Oct 31 08:38:12 osiris kernel: [sys_setitimer+86/192] sys_setitimer+0x56/0x160
Oct 31 08:38:12 osiris kernel: [<c0121c16>] sys_setitimer+0x56/0x160
Oct 31 08:38:12 osiris kernel: [sys_read+69/96] sys_read+0x45/0x60
Oct 31 08:38:12 osiris kernel: [<c0149f95>] sys_read+0x45/0x60
Oct 31 08:38:12 osiris kernel: [do_page_fault+0/1268] do_page_fault+0x0/0x4f4
Oct 31 08:38:12 osiris kernel: [<c0117ea0>] do_page_fault+0x0/0x4f4
Oct 31 08:38:12 osiris kernel: [error_code+45/56] error_code+0x2d/0x38
Oct 31 08:38:12 osiris kernel: [<c0109e75>] error_code+0x2d/0x38
Oct 31 08:38:12 osiris kernel:

--
Henrik Storner <henrik@hswn.dk>
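As background for the first warning in the report above: the scheduler prints "bad: scheduling while atomic!" when it is entered while the preemption count is nonzero, for example while a spinlock is held on a preemptible kernel. The bookkeeping can be sketched in userspace C as below. toy_preempt_count and the toy_* functions are hypothetical stand-ins, not kernel interfaces, and the sketch makes no claim about which code path triggered the warning on this system.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-task preemption counter. */
static int toy_preempt_count;

/* Taking a lock marks the caller as atomic. */
void toy_spin_lock(void)
{
	toy_preempt_count++;
}

/* Releasing the lock undoes that; an unmatched unlock is a bug. */
void toy_spin_unlock(void)
{
	assert(toy_preempt_count > 0);
	toy_preempt_count--;
}

/* Returns 1 if this call would be flagged, 0 otherwise. */
int toy_schedule(void)
{
	if (toy_preempt_count != 0) {
		fprintf(stderr, "bad: scheduling while atomic!\n");
		return 1;
	}
	return 0;
}
```

In this model a lock that is taken and never released leaves the counter above zero, so every later toy_schedule() call is flagged. The "exited with preempt_count 2" note in the Oops would be consistent with such an imbalance, but the log alone does not show where it arose.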
* Re: 2.5.42-mm2 hangs system
2002-10-16 15:49 ` Henrik Størner
2002-10-16 18:59 ` Henrik Størner
@ 2002-10-17 14:38 ` Maneesh Soni
2002-10-17 16:14 ` 2.5.43-mm2 gets network connection stuck Sebastian Benoit
1 sibling, 1 reply; 21+ messages in thread
From: Maneesh Soni @ 2002-10-17 14:38 UTC (permalink / raw)
To: Henrik Størner; +Cc: linux-mm, akpm, Dipankar Sarma

On Wed, Oct 16, 2002 at 05:49:43PM +0200, Henrik Storner wrote:
> The kernel sources are located in /usr/src which is on the local
> (combined root+usr) filesystem, but I normally go there via a
> symlink in my home-dir, ~/kernel/linux-2.5-mm/ is the directory
> for the 2.5+mm directory I use.
>
> The system runs apmd, atd, crond, autofs (for mounting /home), gpm,
> lpd, nfs-server (the /usr/src directory is exported), nfs-client,
> ntpd, portmap, sshd, xfs and xinetd. A DHCP client is also running.
> No X server has been running while I've tested these hangs.
>
> To recreate it, I've booted up the 2.5.42-mm2 kernel, starting up
> all the normal services. Log in (automounts home directory),
> cd ~/kernel/linux-2.5-mm, make oldconfig, make clean, make
>
> The system then hangs after a few minutes of working through the
> kernel compile. Not the same place everytime.

I tried a similar setup, that is, making a link from an NFS-mounted
partition to a local reiserfs partition. The NFS server was running on
a system with a 2.4.19 kernel. I had the following setup:

[root@llm04 root]# mount
/dev/sda6 on / type ext2 (rw)
none on /proc type proc (rw)
/dev/sda1 on /boot type ext2 (rw)
/dev/sda2 on /home type ext2 (rw)
/dev/sda5 on /usr type ext2 (rw)
none on /dev/shm type tmpfs (rw)
/dev/sdc3 on /mnt/sdc3 type reiserfs (rw)
/dev/sdb1 on /bm type ext2 (rw)
192.168.1.10:/home/maneesh/test on /mnt/sdc2 type nfs (rw,addr=192.168.1.10)

[root@llm04 tmp]# l
total 8
drwxr-xr-x    5 nfsnobod nfsnobod     4096 Oct 17 16:35 dbench
lrwxrwxrwx    1 root     root           10 Oct 17 16:08 dbench-link-to-ext2-local -> /bm/dbench
lrwxrwxrwx    1 root     root           17 Oct 17 15:03 dbench-link-to-rfs-local -> /mnt/sdc3/dbench/
lrwxrwxrwx    1 root     root           23 Oct 17 15:05 linux-2542-link-to-rfs-local -> /mnt/sdc3/linux-2.5.42/
drwxrwxr-x   17 1046     101          4096 Oct 17 14:39 linux-2.5.43
lrwxrwxrwx    1 root     root           19 Oct 17 15:08 linux-2543-link-to-ext2-local -> /src1/linux-2.5.43/

With this setup I could run make properly. dbench also runs fine
if it is run through the link.

I see the problem only when I run dbench directly on the
NFS-mounted partition (i.e., no symlink). dbench then gives errors
and _sometimes_ hangs the system. Whereas if I run the nfs-server on
the same machine, as I did yesterday, I see the hang occurring every time.

From your setup, I do not see that you need the nfs-server running. So
just to narrow down the problem, can you stop the nfs-server and then
do the make?

Thanks
Maneesh

--
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
Phone: +91-80-5044999 email: maneesh@in.ibm.com
http://lse.sourceforge.net/
* 2.5.43-mm2 gets network connection stuck
2002-10-17 14:38 ` Maneesh Soni
@ 2002-10-17 16:14 ` Sebastian Benoit
2002-10-17 17:22 ` Andrew Morton
0 siblings, 1 reply; 21+ messages in thread
From: Sebastian Benoit @ 2002-10-17 16:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm

[-- Attachment #1: Type: text/plain, Size: 1496 bytes --]

Hi,

funny problem w. 2.5.43-mm2:

I'm running 2.5.43-mm2 on my workstation. Normal workload, X-windows, a
few xterms, editor, mozilla, etc. (host A)

I have an NFS/SAMBA mount (both show the problem) to host B.
Host B runs 2.4.19rc5aa1.

I can get an xterm, in which I have an ssh connection to a third host C,
'stuck' by simply cat'ing a large file from the NFS/SAMBA server to
/dev/null.

The xterm/ssh seems stuck, that is, no key I press is received on the
other end, but output of the program running on host C is still updated
in the xterm.

I checked with tcpdump: the keypress does not generate a packet, my host
only sends ACKs on that ssh connection to host C.

The ssh connection is not unstuck by stopping the data transfer from
host B.

I checked that plain 2.5.42 and 2.5.43-mm1 do not have this problem:
there my input goes through to C. At least for small amounts of input; I
did not test anything beyond typing a few hundred chars.

recap:

"mount /mnt/hostB"
"ssh hostC" -> type random stuff in that connection
at the same time do "cat /mnt/hostB/bigfile > /dev/null"
ssh gets stuck.

hardware: PIII/600, 3c905B on 10baseT half-duplex

I'm sorry I can't do any further checks until Friday afternoon (MET).

/B.
--
Sebastian Benoit <benoit-lists@fb12.de>
My mail is GnuPG signed -- Unsigned ones are bogus -- http://www.gnupg.org/
GnuPG 0x5BA22F00 2001-07-31 2999 9839 6C9E E4BF B540 C44B 4EC4 E1BE 5BA2 2F00

Oxymoron #654: Fatally Injured

[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]
* Re: 2.5.43-mm2 gets network connection stuck
2002-10-17 16:14 ` 2.5.43-mm2 gets network connection stuck Sebastian Benoit
@ 2002-10-17 17:22 ` Andrew Morton
0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2002-10-17 17:22 UTC (permalink / raw)
To: Sebastian Benoit; +Cc: linux-mm

Sebastian Benoit wrote:
>
> Hi,
>
> funny problem w. 2.5.43-mm2:
>

I saw something like that last night as well. One ssh session
(sshd running on 2.5.43-mm2) just stopped doing anything.

The -mm patches always include Linus's current -bk snapshot, and
2.5.43-mm2 has a lot of networking changes:

 net/core/dst.c           |   25
 net/ipv4/af_inet.c       |   17
 net/ipv4/icmp.c          |    4
 net/ipv4/ip_output.c     |  880 ++++++++--
 net/ipv4/ip_proc.c       |   74
 net/ipv4/ip_sockglue.c   |    4
 net/ipv4/raw.c           |    7
 net/ipv4/tcp.c           |   49
 net/ipv4/tcp_ipv4.c      |    6
 net/ipv4/tcp_minisocks.c |   10
 net/ipv4/udp.c           |  296 +++

Looks like something may have broken there.
end of thread

Thread overview: 21+ messages
2002-10-13 16:04 2.5.42-mm2 hangs system Henrik Størner
2002-10-13 21:03 ` William Lee Irwin III
[not found] ` <3DA9CA28.155BA5CB@digeo.com>
2002-10-13 22:33 ` Henrik Størner
2002-10-13 22:57 ` Andrew Morton
2002-10-14 12:25 ` 2.5.42-mm2 on small systems Ed Tomlinson
2002-10-14 14:34 ` Martin J. Bligh
2002-10-14 21:24 ` Bill Davidsen
2002-10-15 6:42 ` Andrew Morton
2002-10-16 20:55 ` Bill Davidsen
2002-10-16 22:43 ` Ed Tomlinson
2002-10-16 13:09 ` 2.5.42-mm2 hangs system Maneesh Soni
2002-10-16 15:49 ` Henrik Størner
2002-10-16 18:59 ` Henrik Størner
2002-10-16 19:31 ` Dipankar Sarma
2002-10-16 19:43 ` Andrew Morton
2002-10-16 20:05 ` Dipankar Sarma
2002-10-30 9:48 ` [FIX] " Maneesh Soni
2002-10-31 7:54 ` Henrik Størner
2002-10-17 14:38 ` Maneesh Soni
2002-10-17 16:14 ` 2.5.43-mm2 gets network connection stuck Sebastian Benoit
2002-10-17 17:22 ` Andrew Morton