* swapping to death by stressing mlock
From: Rusty Lynch @ 2003-09-18 20:21 UTC
To: linux-mm; +Cc: rusty
While getting more familiar with the VM subsystem I discovered that it is
fairly easy to lock up my system by mlocking enough memory. I believe what
is happening is that I am reducing the amount of swappable physical RAM to
the point where try_to_free_pages() goes into an endless loop waiting for
bdflush to free up some pages.
I'm guessing this is not a condition a properly configured server would ever
hit, but since I'm not very confident in the explanation above, I'm not sure
whether this is something worth looking into.
On my 2.6.0-test5 kernel I run a little utility that allocates a large chunk
of memory, touches every page in the buffer, and then mlocks the buffer.
Just setting vm.overcommit_memory=2 and a very low vm.overcommit_ratio
doesn't help much, since all I have to do is squeeze out the remaining
physical RAM that can still be swapped out.
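Roughly, the utility boils down to something like the following minimal
sketch (not the exact program I run; the 512MB default size and the trailing
pause() are just for illustration):

	/* allocate a chunk of memory, touch every page, then mlock it */
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/mman.h>

	int main(int argc, char **argv)
	{
		size_t mb = argc > 1 ? strtoul(argv[1], NULL, 0) : 512;
		size_t sz = mb * 1024 * 1024;
		long page = sysconf(_SC_PAGESIZE);
		char *buf = malloc(sz);
		size_t i;

		if (!buf) {
			perror("malloc");
			return 1;
		}
		for (i = 0; i < sz; i += page)	/* fault in every page */
			buf[i] = 1;
		if (mlock(buf, sz) < 0) {	/* pin it so it can't be swapped */
			perror("mlock");
			return 1;
		}
		pause();	/* sit on the locked memory until killed */
		return 0;
	}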
This is what I see for my offending process when I hit meta-sysrq-t:
fat_bastard D 00000001 4293732848 598 550 (NOTLB)
cc9d3c78 00000082 c1285bc0 00000001 00000003 c1286580 c1285bc0 cc9d3c98
00000000 00000246 c014f520 cc9d3c6c cf033004 cf6ff000 00000007 00000000
00000000 ffff8258 cc9d3c8c 00000000 cc9d3cc4 c0134dde cc9d3c8c ffff8258
Call Trace:
[<c014f520>] background_writeout+0x0/0xe0
[<c0134dde>] schedule_timeout+0x6e/0xc0
[<c0134d60>] process_timeout+0x0/0x10
[<c012793b>] io_schedule_timeout+0x2b/0x40
[<c031d2bb>] blk_congestion_wait+0x8b/0xa0
[<c0128c30>] autoremove_wake_function+0x0/0x50
[<c0128c30>] autoremove_wake_function+0x0/0x50
[<c01581f2>] try_to_free_pages+0x102/0x1c0
[<c014e1a7>] __alloc_pages+0x1f7/0x3a0
[<c0166d31>] read_swap_cache_async+0xb1/0xbd
[<c015b8b2>] swapin_readahead+0x42/0x90
[<c015bb68>] do_swap_page+0x268/0x340
[<c011007b>] save_v86_state+0x4b/0x200
[<c015c521>] handle_mm_fault+0xf1/0x200
[<c015ab1e>] get_user_pages+0xee/0x3a0
[<c015f18d>] insert_vm_struct+0x6d/0x77
[<c015c74d>] make_pages_present+0x8d/0xa0
[<c015cd24>] mlock_fixup+0xe4/0x120
[<c0280e94>] capable+0x24/0x50
[<c015ce49>] do_mlock+0xe9/0x110
[<c015cf37>] sys_mlock+0xc7/0xe0
[<c010c873>] syscall_call+0x7/0xb
If I attempt to kill all processes with meta-sysrq-i, then I start seeing init
stuck in the same spot:
init D 00000001 21838320 606 1 605 (NOTLB)
cea9fc5c 00000082 c1285bc0 00000001 00000003 c1286580 c1285bc0 cea9fc7c
00000000 00000246 c014f520 cea9fc50 ce3d0004 cf6ff000 00000007 00000000
00000000 00076d98 cea9fc70 00000000 cea9fca8 c0134dde cea9fc70 00076d98
Call Trace:
[<c014f520>] background_writeout+0x0/0xe0
[<c0134dde>] schedule_timeout+0x6e/0xc0
[<c0134d60>] process_timeout+0x0/0x10
[<c012793b>] io_schedule_timeout+0x2b/0x40
[<c031d2bb>] blk_congestion_wait+0x8b/0xa0
[<c0128c30>] autoremove_wake_function+0x0/0x50
[<c0128c30>] autoremove_wake_function+0x0/0x50
[<c01581f2>] try_to_free_pages+0x102/0x1c0
[<c014e1a7>] __alloc_pages+0x1f7/0x3a0
[<c0150982>] __do_page_cache_readahead+0x182/0x21e
[<c014b18f>] filemap_nopage+0x11f/0x330
[<c015bff1>] do_no_page+0xd1/0x3f0
[<c015c548>] handle_mm_fault+0x118/0x200
[<c0123886>] do_page_fault+0x176/0x4dc
[<c0138c91>] sigprocmask+0x71/0x150
[<c0138e11>] sys_rt_sigprocmask+0xa1/0x1e0
[<c0123710>] do_page_fault+0x0/0x4dc
[<c010d2dd>] error_code+0x2d/0x38
The current process (as seen via meta-sysrq-p) seems to always be the swapper:
Pid: 0, comm: swapper
EIP: 0060:[<c010a070>] CPU: 0
EIP is at default_idle+0x30/0x40
EFLAGS: 00000246 Not tainted
EAX: 00000000 EBX: c0600000 ECX: 001d9b2e EDX: c0600000
ESI: c0600000 EDI: c010a040 EBP: c0601fb4 DS: 007b ES: 007b
CR0: 8005003b CR2: 0804d6a0 CR3: 0b9b8000 CR4: 00000680
Call Trace:
[<c010a106>] cpu_idle+0x46/0x50
[<c0105000>] rest_init+0x0/0x80
[<c0602961>] start_kernel+0x181/0x1b0
[<c0602500>] unknown_bootoption+0x0/0x100
I also noticed that try_to_free_pages() ignores the return value of
wakeup_bdflush(), so for kicks I changed:
- wakeup_bdflush(total_scanned);
+ WARN_ON(wakeup_bdflush(total_scanned));
After my system is nicely locked up, I start seeing tons of warnings
like:
Badness in try_to_free_pages at mm/vmscan.c:886
Call Trace:
[<c01582b8>] try_to_free_pages+0x1c8/0x1e0
[<c014e1a7>] __alloc_pages+0x1f7/0x3a0
[<c014e372>] __get_free_pages+0x22/0x50
[<c0152385>] cache_grow+0x125/0x400
[<c013437c>] del_timer_sync+0x2c/0x80
[<c0124819>] kernel_map_pages+0x29/0x64
[<c015279a>] cache_alloc_refill+0x13a/0x4c0
[<c0153185>] kmem_cache_alloc+0x1b5/0x1e0
[<c017ca59>] getname+0x29/0xd0
[<c017e28b>] __user_walk+0x1b/0x60
[<c018319e>] select_bits_alloc+0x1e/0x30
[<c01785ce>] vfs_stat+0x1e/0x60
[<c01833fb>] sys_select+0x23b/0x520
[<c0178ccb>] sys_stat64+0x1b/0x40
[<c012f105>] sys_time+0x35/0x70
[<c010c873>] syscall_call+0x7/0xb
So... is my explanation on target? Is this a condition that would only ever
pop up in crazy stress testing? If not, then maybe sys_mlock() should have
an additional threshold (something like the sketch below)?
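Purely to illustrate what I mean by a threshold (mlocked_total is a made-up
counter, not an existing kernel symbol, and capping at half of num_physpages
is arbitrary):

	/* hypothetical extra check near the top of sys_mlock(), before the
	 * existing RLIMIT_MEMLOCK test and the call to do_mlock(): refuse
	 * to pin more than half of physical RAM system-wide.
	 * 'mlocked_total' would be a new global atomic_t counting pages
	 * currently mlocked (incremented here, decremented on munlock). */
	unsigned long pages = PAGE_ALIGN(len + (start & ~PAGE_MASK)) >> PAGE_SHIFT;

	if (atomic_read(&mlocked_total) + pages > num_physpages / 2)
		return -ENOMEM;
	atomic_add(pages, &mlocked_total);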
--rustyl
* Re: swapping to death by stressing mlock
From: Rusty Lynch @ 2003-09-18 20:46 UTC
To: Rusty Lynch; +Cc: linux-mm
I just booted my 2.4.18 kernel and noticed that:
* I cannot allocate and mlock as large a chunk of memory, because mlock
fails, but I can start multiple allocate/mlock operations and reach the
same lockup.
* BUT the processes hogging the memory are now in a runnable state
(instead of an uninterruptible sleep), so I can use meta-sysrq-i to kill
off the offending processes and fully recover from the condition.
So maybe this really is buggy behavior?
--rustyl
* Re: swapping to death by stressing mlock
From: William Lee Irwin III @ 2003-09-18 21:15 UTC
To: Rusty Lynch; +Cc: linux-mm
On Thu, Sep 18, 2003 at 01:21:49PM -0700, Rusty Lynch wrote:
> While getting more familiar with the VM subsystem I discovered that it is
> fairly easy to lock up my system by mlocking enough memory. I believe what
> is happening is that I am reducing the amount of swappable physical RAM
> to the point where try_to_free_pages() goes into an endless loop waiting
> for bdflush to free up some pages.
> I'm guessing this is not a condition a properly configured server would
> ever hit, but since I'm not very confident in the explanation above, I'm
> not sure whether this is something worth looking into.
> On my 2.6.0-test5 kernel I run a little utility that allocates a large
> chunk of memory, touches every page in the buffer, and then mlocks the
> buffer. Just setting vm.overcommit_memory=2 and a very low
> vm.overcommit_ratio doesn't help much, since all I have to do is squeeze
> out the remaining physical RAM that can still be swapped out.
> This is what I see for my offending process when I hit meta-sysrq-t:
(a) mlock_fixup() ignores the return value of make_pages_present();
it might be a good idea to hand back -ENOMEM if it fails (roughly as
sketched after this list).
(b) get_user_pages() (and hence make_pages_present() etc.) should fail
when bad things would otherwise happen.
(c) apart from these two observations, the best I can tell is that
you're thrashing.
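A rough sketch of (a), assuming make_pages_present() were changed to return
a negative value on failure (the surrounding mlock_fixup() structure here is
approximate, not an exact patch):

	/* sketch: propagate a make_pages_present() failure as -ENOMEM.
	 * The vma splitting/merging that mlock_fixup() really does is elided. */
	static int mlock_fixup(struct vm_area_struct *vma,
			unsigned long start, unsigned long end, unsigned int newflags)
	{
		int pages, ret = 0;

		/* ... split/merge the vma and set vma->vm_flags = newflags ... */

		pages = (end - start) >> PAGE_SHIFT;
		if (newflags & VM_LOCKED) {
			pages = -pages;
			if (make_pages_present(start, end) < 0)
				ret = -ENOMEM;	/* assumes a negative return on failure */
		}
		vma->vm_mm->locked_vm -= pages;
		return ret;
	}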
So, you've seen that wakeup_bdflush() is called repeatedly; is there any
indication, such as, say, the address being worked on in get_user_pages(),
that tells you whether you're making forward progress? Just the fact that
it's called repeatedly isn't really enough to distinguish a livelock from
things merely being slow.
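For example, a throwaway printk inside the loop in get_user_pages() that
walks the requested range (illustrative instrumentation only, not a proposed
patch) would show whether the address advances between retries or stays
stuck on the same page:

	/* illustrative: log each address as the loop walks the range; if the
	 * same address repeats indefinitely it's a livelock, if it advances
	 * (however slowly) you're just thrashing */
	printk(KERN_DEBUG "get_user_pages: pid %d addr %08lx\n",
			current->pid, start);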
-- wli