At 09:56 AM 3/28/2003 -0500, Zwane Mwaikambo wrote:
>On Fri, 28 Mar 2003, Mike Galbraith wrote:
>
> > >hm, this is an 'impossible' scenario from the scheduler code POV. Whenever
> > >we deactivate a task, we remove it from the runqueue and set p->array to
> > >NULL. Whenever we activate a task again, we set p->array to non-NULL. A
> > >double-deactivate is not possible. I tried to reproduce it with various
> > >scheduler workloads, but didn't succeed.
> > >
> > >Mike, do you have a backtrace of the crash you saw?
> >
> > No, I didn't save it due to "grubby fingerprints".
>
>Hmm I think I may have hit this one but I never posted due to being unable
>to reproduce it on a vanilla kernel or the same kernel afterwards (which
>was hacked so I won't vouch for its cleanliness). I think preempt
>might have bitten him in a bad place (mine is also CONFIG_PREEMPT), is it
>possible that when we did the task_rq_unlock we got preempted and when we
>got back we used the local variable requeue_waker which was set before
>dropping the lock, and therefore might not be valid anymore due to
>scheduler decisions done after dropping the runqueue lock?

Dunno. I did have one lying around. The attached one happened while
printing out array switch latency after starvation timeout. Others happened
while printing wakeup stats for p->state > 1 tasks in scheduler_tick()
[under lock w/ wakeup disabled in printk.c]. It's nothing I did to the
scheduler ;-) I don't think, but this was in 65-mm3-twiddle-twiddle-twiddle.

>Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
>c011b8d9
>*pde = 00000000
>Oops: 0000 [#1]
>CPU: 0
>EIP: 0060:[] Not tainted
>EFLAGS: 00010046
>EIP is at try_to_wake_up+0x1e9/0x4f0
>eax: c055a000 ebx: c04e5aa0 ecx: c0552fc0 edx: c04e5aa0
>esi: 00000000 edi: 00000000 ebp: c055bee4 esp: c055beb8
>ds: 007b es: 007b ss: 0068
>Process swapper (pid: 0, threadinfo=c055a000 task=c04e5aa0)
>Stack: 00000001 c055a000 c0552fc0 00000000 cb1a0000 00000001 00000001 00000002
>       00000000 c04e88e4 00000001 c055bf08 c011d172 c1694700 00000001 00000000
>       c04e88e4 c04e88dc c055a000 00000001 c055bf3c c011d203 c04e88dc 00000001
>Call Trace:
> [] __wake_up_common+0x32/0x60
> [] __wake_up+0x63/0xb0
> [] release_console_sem+0x165/0x170
> [] printk+0x1eb/0x270
> [] invalidate_bh_lru+0x0/0x60
> [] invalidate_bh_lru+0x0/0x60
> [] invalidate_bh_lru+0x0/0x60
> [] smp_call_function_interrupt+0x42/0xb0
> [] invalidate_bh_lru+0x0/0x60
> [] default_idle+0x0/0x40
> [] call_function_interrupt+0x1a/0x20
> [] default_idle+0x0/0x40
> [] default_idle+0x2e/0x40
> [] cpu_idle+0x3a/0x50
> [] rest_init+0x0/0x80
>
>Code: 8b 06 48 89 06 8b 4a 24 8b 42 20 89 01 89 48 04 8b 4a 18 8d
>
>0xc011b8d9 is in try_to_wake_up (kernel/sched.c:282).
>277     /*
>278      * Adding/removing a task to/from a priority array:
>279      */
>280     static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
>281     {
>282             array->nr_active--;
>283             list_del(&p->run_list);
>284             if (list_empty(array->queue + p->prio))
>285                     __clear_bit(p->prio, array->bitmap);
>286     }

Same spot.

        -Mike
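
For reference, the first Code bytes (8b 06 48 89 06) appear to decode as a load through %esi, a decrement, and a store back through %esi. With esi at 00000000 in the register dump, that matches array->nr_active-- at line 282 running with array == NULL, assuming nr_active is the first member of the array structure. That is what a dequeue of a task whose p->array was already NULL would look like. Below is a minimal user-space sketch of the invariant, not the kernel code: struct prio_array, struct task, activate() and deactivate_checked() are hypothetical stand-ins, and the per-priority counter replaces the real list and bitmap.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PRIO 4  /* toy value for the sketch, not the kernel's */

/* Hypothetical stand-in for prio_array_t. */
struct prio_array {
    int nr_active;         /* first member: a NULL array faults at address 0 */
    int queued[MAX_PRIO];  /* per-priority count; stands in for list + bitmap */
};

/* Hypothetical stand-in for task_struct. */
struct task {
    int prio;
    struct prio_array *array;  /* non-NULL only while the task is queued */
};

/* Queue the task and record which array it is on. */
static void activate(struct task *p, struct prio_array *array)
{
    array->queued[p->prio]++;
    array->nr_active++;
    p->array = array;
}

/*
 * Dequeue the task. Returns 0 on success, -1 if the task was not queued.
 * The -1 path is the case where an unguarded dequeue would write through
 * a NULL array pointer.
 */
static int deactivate_checked(struct task *p)
{
    struct prio_array *array = p->array;

    if (array == NULL)
        return -1;

    array->nr_active--;
    array->queued[p->prio]--;
    p->array = NULL;
    return 0;
}
```

Calling deactivate_checked() twice on the same task returns 0 and then -1, and the counters are left untouched the second time. A check like this only reports the double-deactivate; it does not say which path dequeued the task first.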