* 2.5.66-mm1
From: Andrew Morton @ 2003-03-26 9:38 UTC
To: linux-kernel, linux-mm

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.66/2.5.66-mm1/

. The anticipatory scheduler is in wrapup mode now. It is pretty much in its
  final form.

. The ext2 locking changes have been significantly redone. The per-blockgroup
  data structures had to go. For a 4TB filesystem we cannot even kmalloc that
  many pointers, let alone data structures.

  So the per-blockgroup spinlocking has been replaced with hashed spinlocking
  and the per-blockgroup accounting has been removed. A "per-cpu counter"
  thing has been invented to amortise the locking cost of the filesystem-wide
  counters.

. ext3 is now using spinlocking in its block allocator rather than a
  filesystem-wide semaphore. It is stability-tested but I have not yet
  performance tested this closely. It does appear to have improved the
  context switch problem (and the file fragmentation problem which the
  context switch problem causes). But there's a way to go here.


Changes since 2.5.65-mm4:

 linus.patch

 Latest -bk

-nfsd-32-bit-dev_t-fixes.patch
-i2c-fix.patch

 Merged

+kgdb-ga.patch

 George Anzinger's gdb stub

+ppa-null-pointer-fix.patch

 Might fix the parport scsi driver

+initcall-debug.patch

 Debugging support for misbehaving initcalls

+posix-timers-64-bit-fix.patch

 Timer fix for 64-bit machines

+slab-off-by-one-fix.patch

 Slab was using too much memory.

+install_page-flush_cache_page.patch

 Cache coherency bug in remap_file_pages()

+as-minor-tweaks.patch
+as-remove-stats.patch

 Anticipatory scheduler tuning and cleanups.

+posix-timer-double-expiration-fix.patch

 Posix timers were sending timer expiry info twice.

+hugh-01-no-SWAP_ERROR.patch
+hugh-02-try_to_unmap-CONFIG_SWAP.patch
+hugh-03-add_to_swap_cache.patch
+hugh-04-page_convert_anon-ENOMEM.patch
+hugh-05-page_convert_anon-unlocking.patch
+hugh-06-wrap-below-vm_start.patch
+hugh-07-objrmap-page_table_lock.patch
+hugh-08-rmap-comments.patch
+hugh-09-tmpfs-truncation.patch
+hugh-10-tmpfs-atomics.patch
+hugh-11-fix-unuse_pmd-fixme.patch
+hugh-12-vm_enough_memory-double-counts.patch

 Various vm/mm fixes and cleanups

+ext3-max-file-size-fix.patch

 Allow ext3 to create files larger than 32GB (should be nearly 2TB)

-ext2-no-lock_super.patch
-ext2-ialloc-no-lock_super.patch
+ext2-no-lock_super-ng.patch
+ext2-ialloc-no-lock_super-ng.patch

 Rework the ext2 block and inode allocator locking changes.

+dev_t-remove-B_FREE.patch

 Remove B_FREE.

+tty_io-cleanup.patch
+page_to_pfn-in-blk_queue_bounce.patch
+init_inode_once-bloat-fix.patch

 Cleanups and fixlets

+compound-page-warning-fix.patch

 Fix a warning

+slab-cache-sizes-cleanup.patch

 Unduplicate some tables in slab.

+stat_t-larger-dev_t.patch

 Large dev_t fix.

+acpi-build-fix.patch

 make acpi compile.

+sync_blockdev-on-final-close.patch

 Only write out blockdev mappings on the final close.

+ext3-concurrent-block-inode-allocation.patch
+ext3-concurrent-block-allocation-fix-1.patch

 Use spinlocking in the ext3 block allocator, not a fs-wide semaphore.


All 104 patches:

linus.patch

mm.patch
  add -mmN to EXTRAVERSION

kgdb-ga.patch
  kgdb stub for ia32 (George Anzinger's one)

ppa-null-pointer-fix.patch

initcall-debug.patch
  initcall debugging support

posix-timers-64-bit-fix.patch
  POSIX timers interface long/int cleanup

slab-off-by-one-fix.patch
  slab: fix off-by-one in size calculation

config_spinline.patch
  uninline spinlocks for profiling accuracy.

ppc64-reloc_hide.patch

ppc64-pci-patch.patch
  Subject: pci patch

ppc64-aio-32bit-emulation.patch
  32/64bit emulation for aio

ppc64-scruffiness.patch
  Fix some PPC64 compile warnings

sym-do-160.patch
  make the SYM driver do 160 MB/sec

install_page-flush_cache_page.patch
  add flush_cache_page() to install_page()

config-PAGE_OFFSET.patch
  Configurable kernel/user memory split

ptrace-flush.patch
  cache flushing in the ptrace code

buffer-debug.patch
  buffer.c debugging

warn-null-wakeup.patch

ext3-truncate-ordered-pages.patch
  ext3: explicitly free truncated pages

reiserfs_file_write-5.patch

rcu-stats.patch
  RCU statistics reporting

ext3-journalled-data-assertion-fix.patch
  Remove incorrect assertion from ext3

nfs-speedup.patch

nfs-oom-fix.patch
  nfs oom fix

sk-allocation.patch
  Subject: Re: nfs oom

nfs-more-oom-fix.patch

rpciod-atomic-allocations.patch
  Make rpciod use atomic allocations

linux-isp.patch

isp-update-1.patch

kblockd.patch
  Create `kblockd' workqueue

as-iosched.patch
  anticipatory I/O scheduler

as-np-reads-1.patch
  AS: read-vs-read fixes

as-np-reads-2.patch
  AS: more read-vs-read fixes

as-predict-data-direction.patch
  as: predict direction of next IO

as-remove-frontmerge.patch
  AS: remove frontmerge tunable

as-misc-cleanups.patch
  AS: misc cleanups

as-minor-tweaks.patch
  AS: tuning and tweaks

as-remove-stats.patch
  AS: remove statistics

cfq-2.patch
  CFQ scheduler, #2

unplug-use-kblockd.patch
  Use kblockd for running request queues

fremap-all-mappings.patch
  Make all executable mappings be nonlinear

objrmap-2.5.62-5.patch
  object-based rmap

sched-2.5.64-D3.patch
  sched-2.5.64-D3, more interactivity changes

scheduler-tunables.patch
  scheduler tunables

show_task-free-stack-fix.patch
  show_task() fix and cleanup

yellowfin-set_bit-fix.patch
  yellowfin driver set_bit fix

htree-nfs-fix.patch
  Fix ext3 htree / NFS compatibility problems

task_prio-fix.patch
  simple task_prio() fix

slab_store_user-large-objects.patch
  slab debug: perform redzoning against larger objects

pcmcia-2.patch

pcmcia-3b.patch

pcmcia-3.patch

pcmcia-4.patch

pcmcia-5.patch

pcmcia-6.patch

pcmcia-7b.patch

pcmcia-7.patch

pcmcia-8.patch

pcmcia-9.patch

pcmcia-10.patch

htree-nfs-fix-2.patch
  htree nfs fix

posix-timer-double-expiration-fix.patch
  posix timers: fix double-reporting of timer expiration

hugh-01-no-SWAP_ERROR.patch
  swap 01/13 no SWAP_ERROR

hugh-02-try_to_unmap-CONFIG_SWAP.patch
  Subject: [PATCH] swap 02/13 !CONFIG_SWAP try_to_unmap

hugh-03-add_to_swap_cache.patch
  swap 03/13 add_to_swap_cache

hugh-04-page_convert_anon-ENOMEM.patch
  swap 04/13 page_convert_anon -ENOMEM

hugh-05-page_convert_anon-unlocking.patch
  swap 05/13 page_convert_anon unlocking

hugh-06-wrap-below-vm_start.patch
  swap 06/13 wrap below vm_start

hugh-07-objrmap-page_table_lock.patch
  swap 07/13 objrmap page_table_lock

hugh-08-rmap-comments.patch
  swap 08/13 rmap comments

hugh-09-tmpfs-truncation.patch
  swap 09/13 tmpfs truncation

hugh-10-tmpfs-atomics.patch
  swap 10/13 tmpfs atomics

hugh-11-fix-unuse_pmd-fixme.patch
  swap 11/13 fix unuse_pmd fixme

hugh-12-vm_enough_memory-double-counts.patch
  swap 12/13 vm_enough_memory double counts

ext3-max-file-size-fix.patch
  ext3: fix max file size

ext2-no-lock_super-ng.patch

ext2-ialloc-no-lock_super-ng.patch

linear-oops-fix-1.patch
  md/linear oops fix

dev_t-32-bit.patch
  [for playing only] change type of dev_t

dev_t-remove-B_FREE.patch
  dev_t: eliminate B_FREE

dev_t-drm-warnings.patch
  dev_t: fix drm printk warnings

sg-dev_t-fix.patch
  32-bit dev_t fix for sg

oops-dump-preceding-code.patch
  i386 oops output: dump preceding code

x86-clock-override-option.patch
  x86 clock override boot option

tty_io-cleanup.patch
  tty_io cleanup

page_to_pfn-in-blk_queue_bounce.patch
  Subject: use page_to_pfn() in __blk_queue_bounce()

init_inode_once-bloat-fix.patch
  Subject: init_inode_once() wants sizeof(struct hlist_head)

conntrack-use-after-free-fix.patch
  fix use-after-free in ip_conntrack

VM_DONTEXPAND-fix.patch
  honour VM_DONTEXPAND in vma merging

compound-page-warning-fix.patch
  Fix 64bit warnings in mm/page_alloc.c

cdevname-irq-safety-fix.patch
  make cdevname() callable from interrupts

register_chrdev_region-leak-fix.patch
  register_chrdev_region() leak and race fix

slab-cache-sizes-cleanup.patch
  slab: cache sizes cleanup

stat_t-larger-dev_t.patch
  struct stat - support larger dev_t

acpi-build-fix.patch
  ACPI build fix

sync_blockdev-on-final-close.patch
  sync blockdevs on the final close only

ext3_mark_inode_dirty-speedup.patch
  ext3_mark_inode_dirty() speedup

ext3_mark_inode_dirty-less-calls.patch
  ext3_commit_write speedup

ext3-handle-cache.patch
  ext3: create a slab cache for transaction handles

ext3-no-bkl.patch

journal_dirty_metadata-speedup.patch

journal_get_write_access-speedup.patch

ext3-concurrent-block-inode-allocation.patch
  Subject: [PATCH] concurrent block/inode allocation for EXT3

ext3-concurrent-block-allocation-fix-1.patch

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
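The "per-cpu counter" idea mentioned in the ext2 notes of the announcement can be sketched in user space roughly as follows. This is only an illustration of the batching technique, not the code from the patch: the names `pcpu_counter`, `pcpu_counter_mod`, `pcpu_counter_read`, `NR_SLOTS` and `PCPU_BATCH` are hypothetical, and the sketch assumes each slot is only ever touched by its owning CPU.

```c
#include <assert.h>
#include <pthread.h>

#define NR_SLOTS   4    /* assumed number of CPUs, illustration only */
#define PCPU_BATCH 16   /* assumed batching threshold, illustration only */

/* Hypothetical model: one shared total plus a small local delta per CPU. */
struct pcpu_counter {
    pthread_mutex_t lock;     /* protects 'total' only */
    long total;               /* approximate filesystem-wide value */
    long local[NR_SLOTS];     /* per-CPU deltas; each slot has one owner */
};

static void pcpu_counter_init(struct pcpu_counter *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->total = 0;
    for (int i = 0; i < NR_SLOTS; i++)
        c->local[i] = 0;
}

/*
 * Add 'amount' on behalf of slot 'cpu'. The shared lock is taken only when
 * the local delta reaches the batch size, which amortises the locking cost
 * over many updates.
 */
static void pcpu_counter_mod(struct pcpu_counter *c, int cpu, long amount)
{
    long v = c->local[cpu] + amount;

    if (v >= PCPU_BATCH || v <= -PCPU_BATCH) {
        pthread_mutex_lock(&c->lock);
        c->total += v;
        pthread_mutex_unlock(&c->lock);
        v = 0;
    }
    c->local[cpu] = v;
}

/* Approximate read: deltas not yet folded into 'total' are ignored. */
static long pcpu_counter_read(struct pcpu_counter *c)
{
    long t;

    pthread_mutex_lock(&c->lock);
    t = c->total;
    pthread_mutex_unlock(&c->lock);
    return t;
}
```

The trade-off in this sketch is accuracy for contention: a reader can be off by up to `NR_SLOTS * (PCPU_BATCH - 1)` in either direction, which is usually acceptable for statistics such as free block counts.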
* Re: 2.5.66-mm1
From: Ed Tomlinson @ 2003-03-28 2:06 UTC
To: Andrew Morton, linux-kernel, linux-mm

Hi Andrew,

Got this oops after about 20 hours with mm1 (65-mm3 lasted 5 days
until I rebooted).

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c011516d
*pde = 00000000
Oops: 0002 [#1]
CPU: 0
EIP: 0060:[<c011516d>] Not tainted VLI
EFLAGS: 00010097
EIP is at schedule+0x8d/0x3a0
eax: 00000001 ebx: cf5e99c0 ecx: cf5e99c0 edx: ffffffff
esi: 00000000 edi: c031de00 ebp: cf5ebf08 esp: cf5ebef0
ds: 007b es: 007b ss: 0068
Process newsplex (pid: 1205, threadinfo=cf5ea000 task=cf5e99c0)
Stack: c011fbd7 c02bbc40 00000246 05261e41 cf5ebf14 cf5ebf50 cf5ebf3c c0120754
       cf5ebf14 c02bc538 c02bc538 05261e41 4b87ad6e c01206e0 cf5e99c0 c02bbc40
       c015abd6 000007d1 00000000 cf5ebf60 c015ac19 cf5ea000 cf5ea000 00000000
Call Trace:
 [<c011fbd7>] add_timer+0x57/0xa0
 [<c0120754>] schedule_timeout+0x54/0xa0
 [<c01206e0>] process_timeout+0x0/0x20
 [<c015abd6>] do_poll+0x56/0xc0
 [<c015ac19>] do_poll+0x99/0xc0
 [<c015ad88>] sys_poll+0x148/0x220
 [<c013eb3b>] sys_mprotect+0x21b/0x22f
 [<c01079ec>] sys_clone+0x2c/0x60
 [<c015a200>] __pollwait+0x0/0xc0
 [<c0109277>] syscall_call+0x7/0xb

Code: 40 17 04 75 4d 8b 03 85 c0 74 47 48 0f 84 da 02 00 00 ff 0d 00 de 31 c0 8b 43 68 ff 08 8b 03 83 f8 02 0f 84 b6 02 00 00 8b 73 28 <ff> 4e 00 8b 53 24 8b 43 20 89 50 04 89 02 8b 4b 18 8d 14 ce 8d
 <6>note: newsplex[1205] exited with preempt_count 2
Debug: sleeping function called from illegal context at include/linux/rwsem.h:43
Call Trace:
 [<c01168d3>] __might_sleep+0x53/0x60
 [<c01198d5>] profile_exit_task+0x15/0x60
 [<c011aee6>] do_exit+0x86/0x460
 [<c0109ab5>] die+0x75/0x80
 [<c0113854>] do_page_fault+0x134/0x45e
 [<c0114798>] try_to_wake_up+0x138/0x240
 [<c011fde4>] mod_timer+0x124/0x180
 [<c012a520>] nanosleep_wake_up+0x0/0x20
 [<c0131feb>] buffered_rmqueue+0xab/0x140
 [<c0132103>] __alloc_pages+0x83/0x280
 [<c0113720>] do_page_fault+0x0/0x45e
 [<c01094dd>] error_code+0x2d/0x40
 [<c011516d>] schedule+0x8d/0x3a0
 [<c011fbd7>] add_timer+0x57/0xa0
 [<c0120754>] schedule_timeout+0x54/0xa0
 [<c01206e0>] process_timeout+0x0/0x20
 [<c015abd6>] do_poll+0x56/0xc0
 [<c015ac19>] do_poll+0x99/0xc0
 [<c015ad88>] sys_poll+0x148/0x220
 [<c013eb3b>] sys_mprotect+0x21b/0x22f
 [<c01079ec>] sys_clone+0x2c/0x60
 [<c015a200>] __pollwait+0x0/0xc0
 [<c0109277>] syscall_call+0x7/0xb

Hope this helps
Ed Tomlinson
* Re: 2.5.66-mm1
From: Andrew Morton @ 2003-03-28 4:59 UTC
To: Ed Tomlinson; +Cc: linux-kernel, linux-mm, Ingo Molnar

Ed Tomlinson <tomlins@cam.org> wrote:
>
> Hi Andrew,
>
> Got this oops after about 20 hours with mm1 (65-mm3 lasted 5 days
> until I rebooted).
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>  printing eip:
> c011516d
> *pde = 00000000
> Oops: 0002 [#1]
> CPU: 0
> EIP: 0060:[<c011516d>] Not tainted VLI
> EFLAGS: 00010097
> EIP is at schedule+0x8d/0x3a0
> eax: 00000001 ebx: cf5e99c0 ecx: cf5e99c0 edx: ffffffff
> esi: 00000000 edi: c031de00 ebp: cf5ebf08 esp: cf5ebef0
> ds: 007b es: 007b ss: 0068
> Process newsplex (pid: 1205, threadinfo=cf5ea000 task=cf5e99c0)
> Stack: c011fbd7 c02bbc40 00000246 05261e41 cf5ebf14 cf5ebf50 cf5ebf3c c0120754
>        cf5ebf14 c02bc538 c02bc538 05261e41 4b87ad6e c01206e0 cf5e99c0 c02bbc40
>        c015abd6 000007d1 00000000 cf5ebf60 c015ac19 cf5ea000 cf5ea000 00000000
> Call Trace:
>  [<c011fbd7>] add_timer+0x57/0xa0
>  [<c0120754>] schedule_timeout+0x54/0xa0
>  [<c01206e0>] process_timeout+0x0/0x20
>  [<c015abd6>] do_poll+0x56/0xc0
>  [<c015ac19>] do_poll+0x99/0xc0
>  [<c015ad88>] sys_poll+0x148/0x220
>  [<c013eb3b>] sys_mprotect+0x21b/0x22f
>  [<c01079ec>] sys_clone+0x2c/0x60
>  [<c015a200>] __pollwait+0x0/0xc0
>  [<c0109277>] syscall_call+0x7/0xb
>
> Code: 40 17 04 75 4d 8b 03 85 c0 74 47 48 0f 84 da 02 00 00 ff 0d 00 de 31 c0 8b 43 68 ff 08 8b 03 83 f8 02 0f 84 b6 02 00 00 8b 73 28 <ff> 4e 00 8b 53 24 8b 43 20 89 50 04 89 02 8b 4b 18 8d 14 ce 8d

That longer Code: line is really handy.

You died in schedule()->deactivate_task()->dequeue_task().

static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
{
	array->nr_active--;

`array' is zero.

I'm going to Cc Ingo and run away. Ed uses preempt.
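Why a zero `array` produces a fault at virtual address 00000000 can be illustrated with a small user-space model. The marked bytes `<ff> 4e 00` in the Code: line appear to decode as a decrement of the word at offset 0 from %esi, and the register dump shows esi: 00000000. The struct below is an assumed layout with `nr_active` as the first member; `model_prio_array`, `nr_active_address` and the two constants are hypothetical names for illustration, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_PRIO     140  /* assumed, illustration only */
#define MODEL_BITMAP_LONGS 5    /* assumed, illustration only */

struct model_list_head {
    struct model_list_head *next, *prev;
};

/* Assumed layout: the counter that dequeue_task() decrements comes first. */
struct model_prio_array {
    int nr_active;
    unsigned long bitmap[MODEL_BITMAP_LONGS];
    struct model_list_head queue[MODEL_MAX_PRIO];
};

/*
 * Address that `array->nr_active--` would touch for a given pointer value.
 * Nothing is dereferenced here; this is only the address arithmetic.
 */
static uintptr_t nr_active_address(uintptr_t array_ptr)
{
    return array_ptr + offsetof(struct model_prio_array, nr_active);
}
```

With a NULL pointer the computed address is 0, which matches the faulting address in the report. A member further into the structure would have faulted at a small non-zero address instead.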
* Re: 2.5.66-mm1
From: Ingo Molnar @ 2003-03-28 10:45 UTC
To: Andrew Morton; +Cc: Ed Tomlinson, linux-kernel, linux-mm, Mike Galbraith

On Thu, 27 Mar 2003, Andrew Morton wrote:

> That longer Code: line is really handy.
>
> You died in schedule()->deactivate_task()->dequeue_task().
>
> static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
> {
> 	array->nr_active--;
>
> `array' is zero.
>
> I'm going to Cc Ingo and run away. Ed uses preempt.

hm, this is an 'impossible' scenario from the scheduler code POV. Whenever
we deactivate a task, we remove it from the runqueue and set p->array to
NULL. Whenever we activate a task again, we set p->array to non-NULL. A
double-deactivate is not possible. I tried to reproduce it with various
scheduler workloads, but didn't succeed.

Mike, do you have a backtrace of the crash you saw?

	Ingo
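The invariant Ingo describes (p->array is non-NULL exactly while the task is queued) can be modelled in a few lines. This is a simplified user-space sketch, not the scheduler code: `model_task`, `model_array`, `model_activate` and `model_deactivate` are hypothetical names, and the NULL check in `model_deactivate` exists only to make the failure case observable. The `dequeue_task()` quoted in the thread dereferences `array` unconditionally.

```c
#include <assert.h>
#include <stddef.h>

struct model_array {
    int nr_active;              /* number of tasks queued on this array */
};

struct model_task {
    struct model_array *array;  /* NULL while the task is not queued */
};

/* Queue the task: afterwards p->array is non-NULL. */
static void model_activate(struct model_task *p, struct model_array *a)
{
    a->nr_active++;
    p->array = a;
}

/*
 * Dequeue the task: afterwards p->array is NULL again.
 * Returns 0 on success and -1 if the task was not queued. The -1 case is
 * the "double deactivate" that would show up as a NULL dereference in code
 * that has no such check.
 */
static int model_deactivate(struct model_task *p)
{
    if (p->array == NULL)
        return -1;
    p->array->nr_active--;
    p->array = NULL;
    return 0;
}
```

In this model the only way to reach the -1 branch is to deactivate a task twice without an activation in between, which is the sequence the scheduler is supposed to rule out.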
[parent not found: <Pine.LNX.4.44.0303281139500.6678-100000@localhost.localdomain>]
* Re: 2.5.66-mm1
From: Mike Galbraith @ 2003-03-28 14:26 UTC
To: Ingo Molnar; +Cc: Andrew Morton, Ed Tomlinson, linux-kernel, linux-mm

At 11:45 AM 3/28/2003 +0100, Ingo Molnar wrote:

>On Thu, 27 Mar 2003, Andrew Morton wrote:
>
> > That longer Code: line is really handy.
> >
> > You died in schedule()->deactivate_task()->dequeue_task().
> >
> > static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
> > {
> > 	array->nr_active--;
> >
> > `array' is zero.
> >
> > I'm going to Cc Ingo and run away. Ed uses preempt.
>
>hm, this is an 'impossible' scenario from the scheduler code POV. Whenever
>we deactivate a task, we remove it from the runqueue and set p->array to
>NULL. Whenever we activate a task again, we set p->array to non-NULL. A
>double-deactivate is not possible. I tried to reproduce it with various
>scheduler workloads, but didn't succeed.
>
>Mike, do you have a backtrace of the crash you saw?

No, I didn't save it due to "grubby fingerprints".

	-Mike
* Re: 2.5.66-mm1
From: Zwane Mwaikambo @ 2003-03-28 14:56 UTC
To: Mike Galbraith
Cc: Ingo Molnar, Andrew Morton, Ed Tomlinson, linux-kernel, linux-mm

On Fri, 28 Mar 2003, Mike Galbraith wrote:

> >hm, this is an 'impossible' scenario from the scheduler code POV. Whenever
> >we deactivate a task, we remove it from the runqueue and set p->array to
> >NULL. Whenever we activate a task again, we set p->array to non-NULL. A
> >double-deactivate is not possible. I tried to reproduce it with various
> >scheduler workloads, but didn't succeed.
> >
> >Mike, do you have a backtrace of the crash you saw?
>
> No, I didn't save it due to "grubby fingerprints".

Hmm i think i may have hit this one but i never posted due to being unable
to reproduce it on a vanilla kernel or the same kernel afterwards (which
was hacked so i won't vouch for its cleanliness). I think preempt
might have bitten him in a bad place (mine is also CONFIG_PREEMPT), is it
possible that when we did the task_rq_unlock we got preempted and when we
got back we used the local variable requeue_waker which was set before
dropping the lock, and therefore might not be valid anymore due to
scheduler decisions done after dropping the runqueue lock?

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c011b8d9
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c011b8d9>] Not tainted
EFLAGS: 00010046
EIP is at try_to_wake_up+0x1e9/0x4f0
eax: c055a000 ebx: c04e5aa0 ecx: c0552fc0 edx: c04e5aa0
esi: 00000000 edi: 00000000 ebp: c055bee4 esp: c055beb8
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c055a000 task=c04e5aa0)
Stack: 00000001 c055a000 c0552fc0 00000000 cb1a0000 00000001 00000001 00000002
       00000000 c04e88e4 00000001 c055bf08 c011d172 c1694700 00000001 00000000
       c04e88e4 c04e88dc c055a000 00000001 c055bf3c c011d203 c04e88dc 00000001
Call Trace:
 [<c011d172>] __wake_up_common+0x32/0x60
 [<c011d203>] __wake_up+0x63/0xb0
 [<c0122fb5>] release_console_sem+0x165/0x170
 [<c0122d7b>] printk+0x1eb/0x270
 [<c015e210>] invalidate_bh_lru+0x0/0x60
 [<c015e210>] invalidate_bh_lru+0x0/0x60
 [<c015e210>] invalidate_bh_lru+0x0/0x60
 [<c01163f2>] smp_call_function_interrupt+0x42/0xb0
 [<c015e210>] invalidate_bh_lru+0x0/0x60
 [<c0106eb0>] default_idle+0x0/0x40
 [<c010a41a>] call_function_interrupt+0x1a/0x20
 [<c0106eb0>] default_idle+0x0/0x40
 [<c0106ede>] default_idle+0x2e/0x40
 [<c0106f6a>] cpu_idle+0x3a/0x50
 [<c0105000>] rest_init+0x0/0x80

Code: 8b 06 48 89 06 8b 4a 24 8b 42 20 89 01 89 48 04 8b 4a 18 8d

0xc011b8d9 is in try_to_wake_up (kernel/sched.c:282).
277	/*
278	 * Adding/removing a task to/from a priority array:
279	 */
280	static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
281	{
282		array->nr_active--;
283		list_del(&p->run_list);
284		if (list_empty(array->queue + p->prio))
285			__clear_bit(p->prio, array->bitmap);
286	}

(gdb) list *__wake_up_common+0x32
0xc011d1b2 is in __wake_up_common (kernel/sched.c:1424).
1419		list_for_each_safe(tmp, next, &q->task_list) {
1420			wait_queue_t *curr;
1421			unsigned flags;
1422			curr = list_entry(tmp, wait_queue_t, task_list);
1423			flags = curr->flags;
1424			if (curr->func(curr, mode, sync) &&
1425			    (flags & WQ_FLAG_EXCLUSIVE) &&
1426			    !--nr_exclusive)
1427				break;
1428		}

(gdb) list *__wake_up+0x62
0xc011d242 is in __wake_up (kernel/sched.c:1445).
1440
1441		if (unlikely(!q))
1442			return;
1443
1444		spin_lock_irqsave(&q->lock, flags);
1445		__wake_up_common(q, mode, nr_exclusive, 0);
1446		spin_unlock_irqrestore(&q->lock, flags);
1447	}
1448
1449	/*

--
function.linuxpower.ca
* Re: 2.5.66-mm1
From: Ingo Molnar @ 2003-03-28 15:25 UTC
To: Zwane Mwaikambo
Cc: Mike Galbraith, Andrew Morton, Ed Tomlinson, linux-kernel, linux-mm

On Fri, 28 Mar 2003, Zwane Mwaikambo wrote:

> Hmm i think i may have hit this one but i never posted due to being
> unable to reproduce it on a vanilla kernel or the same kernel afterwards
> (which was hacked so i won't vouch for its cleanliness). I think
> preempt might have bitten him in a bad place (mine is also
> CONFIG_PREEMPT), is it possible that when we did the task_rq_unlock we
> got preempted and when we got back we used the local variable
> requeue_waker which was set before dropping the lock, and therefore
> might not be valid anymore due to scheduler decisions done after
> dropping the runqueue lock?

yes, this one was my only suspect, but it should really never cause any
problems. We might change sleep_avg during the wakeup, and carry the
requeue_waker flag over a preemptible window, but the requeueing itself
re-takes the runqueue lock, and does not take anything for granted. The
flag could very well be random as well, and the code should still be
correct - there's no requirement to recalculate the priority every time we
change sleep_avg. (in fact we at times intentionally keep those values
detached.)

	Ingo
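Ingo's argument is that a flag carried across a preemptible window is harmless as long as the code acting on it re-takes the lock and re-checks the state it depends on. A minimal user-space sketch of that pattern follows. It is not the scheduler's requeue code: `model_rq`, `model_rq_init` and `model_requeue` are hypothetical names, and a pthread mutex stands in for the runqueue lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state protected by the runqueue lock. */
struct model_rq {
    pthread_mutex_t lock;
    bool on_queue;      /* is the task currently queued? */
    int queued_prio;    /* priority it is queued at */
    int wanted_prio;    /* priority implied by the latest sleep average */
};

static void model_rq_init(struct model_rq *rq, bool on_queue,
                          int queued_prio, int wanted_prio)
{
    pthread_mutex_init(&rq->lock, NULL);
    rq->on_queue = on_queue;
    rq->queued_prio = queued_prio;
    rq->wanted_prio = wanted_prio;
}

/*
 * 'hint' was computed before the lock was dropped and may be stale or even
 * random. It only decides whether to bother taking the lock; every fact
 * that matters is re-read under the lock before anything is changed.
 * Returns true if the task was actually moved to a new priority.
 */
static bool model_requeue(struct model_rq *rq, bool hint)
{
    bool moved = false;

    if (!hint)
        return false;

    pthread_mutex_lock(&rq->lock);
    if (rq->on_queue && rq->queued_prio != rq->wanted_prio) {
        rq->queued_prio = rq->wanted_prio;
        moved = true;
    }
    pthread_mutex_unlock(&rq->lock);
    return moved;
}
```

In this sketch a wrong hint costs at most one unnecessary lock round trip or one skipped requeue; it can never act on a task that is no longer queued.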
[parent not found: <Pine.LNX.4.44.0303281619530.9943-100000@localhost.localdomain>]
* Re: 2.5.66-mm1
From: Mike Galbraith @ 2003-03-28 16:05 UTC
To: Ingo Molnar
Cc: Zwane Mwaikambo, Andrew Morton, Ed Tomlinson, linux-kernel, linux-mm

At 04:25 PM 3/28/2003 +0100, Ingo Molnar wrote:

>On Fri, 28 Mar 2003, Zwane Mwaikambo wrote:
>
> > Hmm i think i may have hit this one but i never posted due to being
> > unable to reproduce it on a vanilla kernel or the same kernel afterwards
> > (which was hacked so i won't vouch for its cleanliness). I think
> > preempt might have bitten him in a bad place (mine is also
> > CONFIG_PREEMPT), is it possible that when we did the task_rq_unlock we
> > got preempted and when we got back we used the local variable
> > requeue_waker which was set before dropping the lock, and therefore
> > might not be valid anymore due to scheduler decisions done after
> > dropping the runqueue lock?
>
>yes, this one was my only suspect, but it should really never cause any
>problems. We might change sleep_avg during the wakeup, and carry the
>requeue_waker flag over a preemptible window, but the requeueing itself
>re-takes the runqueue lock, and does not take anything for granted. The
>flag could very well be random as well, and the code should still be
>correct - there's no requirement to recalculate the priority every time we
>change sleep_avg. (in fact we at times intentionally keep those values
>detached.)

In my 66-twiddle tree, I moved that under the lock out of pure paranoia. I
can try to see if printing under hefty (very) load will still trigger the
occasional explosion.

	-Mike
[parent not found: <Pine.LNX.4.50.0303280942420.2884-100000@montezuma.mastecende.com>]
* Re: 2.5.66-mm1
From: Mike Galbraith @ 2003-03-28 16:01 UTC
To: Zwane Mwaikambo
Cc: Ingo Molnar, Andrew Morton, Ed Tomlinson, linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 3414 bytes --]

At 09:56 AM 3/28/2003 -0500, Zwane Mwaikambo wrote:

>On Fri, 28 Mar 2003, Mike Galbraith wrote:
>
> > >hm, this is an 'impossible' scenario from the scheduler code POV. Whenever
> > >we deactivate a task, we remove it from the runqueue and set p->array to
> > >NULL. Whenever we activate a task again, we set p->array to non-NULL. A
> > >double-deactivate is not possible. I tried to reproduce it with various
> > >scheduler workloads, but didn't succeed.
> > >
> > >Mike, do you have a backtrace of the crash you saw?
> >
> > No, I didn't save it due to "grubby fingerprints".
>
>Hmm i think i may have hit this one but i never posted due to being unable
>to reproduce it on a vanilla kernel or the same kernel afterwards (which
>was hacked so i won't vouch for its cleanliness). I think preempt
>might have bitten him in a bad place (mine is also CONFIG_PREEMPT), is it
>possible that when we did the task_rq_unlock we got preempted and when we
>got back we used the local variable requeue_waker which was set before
>dropping the lock, and therefore might not be valid anymore due to
>scheduler decisions done after dropping the runqueue lock?

Dunno. I did have one lying around. The attached one was while printing
out array switch latency after starvation timeout. Others happened while
printing wakeup stats for p->state > 1 tasks in scheduler_tick() [under
lock w/ wakeup disabled in printk.c]. It's nothing I did to the scheduler
;-) I don't think, but this was in 65-mm3-twiddle-twiddle-twiddle.

>Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
>c011b8d9
>*pde = 00000000
>Oops: 0000 [#1]
>CPU: 0
>EIP: 0060:[<c011b8d9>] Not tainted
>EFLAGS: 00010046
>EIP is at try_to_wake_up+0x1e9/0x4f0
>eax: c055a000 ebx: c04e5aa0 ecx: c0552fc0 edx: c04e5aa0
>esi: 00000000 edi: 00000000 ebp: c055bee4 esp: c055beb8
>ds: 007b es: 007b ss: 0068
>Process swapper (pid: 0, threadinfo=c055a000 task=c04e5aa0)
>Stack: 00000001 c055a000 c0552fc0 00000000 cb1a0000 00000001 00000001 00000002
>       00000000 c04e88e4 00000001 c055bf08 c011d172 c1694700 00000001 00000000
>       c04e88e4 c04e88dc c055a000 00000001 c055bf3c c011d203 c04e88dc 00000001
>Call Trace:
> [<c011d172>] __wake_up_common+0x32/0x60
> [<c011d203>] __wake_up+0x63/0xb0
> [<c0122fb5>] release_console_sem+0x165/0x170
> [<c0122d7b>] printk+0x1eb/0x270
> [<c015e210>] invalidate_bh_lru+0x0/0x60
> [<c015e210>] invalidate_bh_lru+0x0/0x60
> [<c015e210>] invalidate_bh_lru+0x0/0x60
> [<c01163f2>] smp_call_function_interrupt+0x42/0xb0
> [<c015e210>] invalidate_bh_lru+0x0/0x60
> [<c0106eb0>] default_idle+0x0/0x40
> [<c010a41a>] call_function_interrupt+0x1a/0x20
> [<c0106eb0>] default_idle+0x0/0x40
> [<c0106ede>] default_idle+0x2e/0x40
> [<c0106f6a>] cpu_idle+0x3a/0x50
> [<c0105000>] rest_init+0x0/0x80
>
>Code: 8b 06 48 89 06 8b 4a 24 8b 42 20 89 01 89 48 04 8b 4a 18 8d
>
>0xc011b8d9 is in try_to_wake_up (kernel/sched.c:282).
>277	/*
>278	 * Adding/removing a task to/from a priority array:
>279	 */
>280	static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
>281	{
>282		array->nr_active--;
>283		list_del(&p->run_list);
>284		if (list_empty(array->queue + p->prio))
>285			__clear_bit(p->prio, array->bitmap);
>286	}

Same spot.

	-Mike

[-- Attachment #2: oops.txt --]
[-- Type: text/plain, Size: 3183 bytes --]

Loglevel set to 9
hmm.. 289 ms
hmm.. 6 ms
hmm.. 4 ms
hmm.. 7 ms
hmm.. 13 ms
hmm.. 15 ms
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c0114d0a
*pde = 00000000
Oops: 0002 [#1]
CPU: 0
EIP: 0060:[<c0114d0a>] Not tainted VLI
EFLAGS: 00010006
EIP is at try_to_wake_up+0x1e2/0x258
eax: 00000008 ebx: c02cb3c8 ecx: c0dcf360 edx: c0dcf360
esi: c0c24000 edi: 00000000 ebp: c0c25ed4 esp: c0c25eb8
ds: 007b es: 007b ss: 0068
Process gcc (pid: 592, threadinfo=c0c24000 task=c0dcf360)
Stack: 00000001 00000001 c0298ff4 c0c25ed0 00000001 00000001 00000002 c0c25ee8
       c0115887 c7b8a0a0 00000003 00000000 c0c25f08 c01158c2 c2d81e5c 00000003
       00000000 c0c24000 00000082 c0298fe8 c0c25f20 c011594a c0298ff0 00000003
Call Trace:
 [<c0115887>] default_wake_function+0x17/0x1c
 [<c01158c2>] __wake_up_common+0x36/0x50
 [<c011594a>] __wake_up_locked+0xe/0x14
 [<c0107cdc>] __down_trylock+0x34/0x54
 [<c0107d1b>] __down_failed_trylock+0x7/0xc
 [<c011928b>] .text.lock.printk+0x5/0x2a
 [<c01155f0>] schedule+0x13c/0x378
 [<c011ab07>] sys_wait4+0xab/0x234
 [<c011ac5d>] sys_wait4+0x201/0x234
 [<c0115870>] default_wake_function+0x0/0x1c
 [<c0115870>] default_wake_function+0x0/0x1c
 [<c0108b5f>] syscall_call+0x7/0xb

Code: ff 48 14 8b 40 08 a8 08 74 07 e8 3e 0b 00 00 89 f6 85 f6 74 7e 8b 55 f0 9c 8f 02 fa be 00 e0 ff ff 21 e6 ff 46 14 8b 16 8b 7a 28 <ff> 0f 8b 42 20 8b 4a 24 89 48 04 89 01 8b 52 18 8d 44 d7 18 39

(gdb) list *try_to_wake_up+0x1e2
0x26a is in try_to_wake_up (kernel/sched.c:310).
305	/*
306	 * Adding/removing a task to/from a priority array:
307	 */
308	static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
309	{
310		array->nr_active--;
311		list_del(&p->run_list);
312		if (list_empty(array->queue + p->prio))
313			__clear_bit(p->prio, array->bitmap);
314	}
(gdb)

 <6>note: gcc[592] exited with preempt_count 5
bad: scheduling while atomic!
Call Trace:
 [<c01154f0>] schedule+0x3c/0x378
 [<c0134b26>] unmap_vmas+0xea/0x1e0
 [<c011647b>] __cond_resched+0x17/0x1c
 [<c0134b86>] unmap_vmas+0x14a/0x1e0
 [<c0137fb8>] exit_mmap+0x64/0x158
 [<c0116dbd>] mmput+0x55/0x74
 [<c011a368>] do_exit+0x158/0x3b4
 [<c0109267>] die+0x87/0x88
 [<c0114068>] do_page_fault+0x2d8/0x404
 [<c0113d90>] do_page_fault+0x0/0x404
 [<c011b91a>] do_softirq+0x5a/0xac
 [<c010a170>] do_IRQ+0xfc/0x118
 [<c012e173>] __rmqueue+0xa3/0x10c
 [<c012e21f>] rmqueue_bulk+0x43/0x6c
 [<c0108d69>] error_code+0x2d/0x38
 [<c0114d0a>] try_to_wake_up+0x1e2/0x258
 [<c0115887>] default_wake_function+0x17/0x1c
 [<c01158c2>] __wake_up_common+0x36/0x50
 [<c011594a>] __wake_up_locked+0xe/0x14
 [<c0107cdc>] __down_trylock+0x34/0x54
 [<c0107d1b>] __down_failed_trylock+0x7/0xc
 [<c011928b>] .text.lock.printk+0x5/0x2a
 [<c01155f0>] schedule+0x13c/0x378
 [<c011ab07>] sys_wait4+0xab/0x234
 [<c011ac5d>] sys_wait4+0x201/0x234
 [<c0115870>] default_wake_function+0x0/0x1c
 [<c0115870>] default_wake_function+0x0/0x1c
 [<c0108b5f>] syscall_call+0x7/0xb

hmm.. 42 ms
hmm.. 24 ms
hmm.. 33 ms
hmm.. 23 ms
hmm.. 31 ms
hmm.. 30 ms
hmm.. 30 ms
Thread overview: 9+ messages
2003-03-26 9:38 2.5.66-mm1 Andrew Morton
2003-03-28 2:06 ` 2.5.66-mm1 Ed Tomlinson
2003-03-28 4:59 ` 2.5.66-mm1 Andrew Morton
2003-03-28 10:45 ` 2.5.66-mm1 Ingo Molnar
[not found] ` <Pine.LNX.4.44.0303281139500.6678-100000@localhost.localdomain>
2003-03-28 14:26 ` 2.5.66-mm1 Mike Galbraith
2003-03-28 14:56 ` 2.5.66-mm1 Zwane Mwaikambo
2003-03-28 15:25 ` 2.5.66-mm1 Ingo Molnar
[not found] ` <Pine.LNX.4.44.0303281619530.9943-100000@localhost.localdomain>
2003-03-28 16:05 ` 2.5.66-mm1 Mike Galbraith
[not found] ` <Pine.LNX.4.50.0303280942420.2884-100000@montezuma.mastecende.com>
2003-03-28 16:01 ` 2.5.66-mm1 Mike Galbraith