From: Rusty Lynch <rusty@linux.co.intel.com>
To: Rusty Lynch <rusty@linux.co.intel.com>
Cc: linux-mm@kvack.org
Subject: Re: swapping to death by stressing mlock
Date: 18 Sep 2003 13:46:57 -0700 [thread overview]
Message-ID: <1063918017.12547.9.camel@vmhack> (raw)
In-Reply-To: <200309182021.h8IKLnqX006918@penguin.co.intel.com>
I just loaded my 2.4.18 kernel and noticed that:
* I can not allocate and mlock as large a chunk of memory because mlock
returns fails, but I can start multiple allocate/mlock operations and
get to the same lockup
* BUT, the processes that are hogging up memory are now in a runnable
state (instead of being in an uninterpretable sleep), so I can us
meta-sysrq-i to kill off the offending processes and totally recover
from the condition.
So, maybe this is a valid buggy behavior?
--rustyl
On Thu, 2003-09-18 at 13:21, Rusty Lynch wrote:
> While getting more familiar with the vm subsystem I discovered that it is
> fairly easy to lockup my system by mlocking enough memory. I believe what
> is happening is that I am reducing the amount of swappable physical ram
> to the point that try_to_free_pages() will go into an endless loop waiting
> for bdflush to free up some pages.
>
> I'm guessing this is not a valid condition for a properly configured server,
> but since I'm not feeling very confident about my above explanation, I'm not
> so sure this isn't something to look into.
>
> On my 2.6.0-test5 kernel I run a little utility that attempts to allocate
> a large enough chunk of memory, touch all pages in the buffer, and then
> mlock the buffer. Just setting vm.overcommit_memory=2 and a real low
> vm.overcommit_ratio doesn't help a lot since all I have to do is squeeze out
> the available physical ram that can be swapped out.
>
> This is what I see for my offending process if I meta-sysrq-t.
>
> fat_bastard D 00000001 4293732848 598 550 (NOTLB)
> cc9d3c78 00000082 c1285bc0 00000001 00000003 c1286580 c1285bc0 cc9d3c98
> 00000000 00000246 c014f520 cc9d3c6c cf033004 cf6ff000 00000007 00000000
> 00000000 ffff8258 cc9d3c8c 00000000 cc9d3cc4 c0134dde cc9d3c8c ffff8258
> Call Trace:
> [<c014f520>] background_writeout+0x0/0xe0
> [<c0134dde>] schedule_timeout+0x6e/0xc0
> [<c0134d60>] process_timeout+0x0/0x10
> [<c012793b>] io_schedule_timeout+0x2b/0x40
> [<c031d2bb>] blk_congestion_wait+0x8b/0xa0
> [<c0128c30>] autoremove_wake_function+0x0/0x50
> [<c0128c30>] autoremove_wake_function+0x0/0x50
> [<c01581f2>] try_to_free_pages+0x102/0x1c0
> [<c014e1a7>] __alloc_pages+0x1f7/0x3a0
> [<c0166d31>] read_swap_cache_async+0xb1/0xbd
> [<c015b8b2>] swapin_readahead+0x42/0x90
> [<c015bb68>] do_swap_page+0x268/0x340
> [<c011007b>] save_v86_state+0x4b/0x200
> [<c015c521>] handle_mm_fault+0xf1/0x200
> [<c015ab1e>] get_user_pages+0xee/0x3a0
> [<c015f18d>] insert_vm_struct+0x6d/0x77
> [<c015c74d>] make_pages_present+0x8d/0xa0
> [<c015cd24>] mlock_fixup+0xe4/0x120
> [<c0280e94>] capable+0x24/0x50
> [<c015ce49>] do_mlock+0xe9/0x110
> [<c015cf37>] sys_mlock+0xc7/0xe0
> [<c010c873>] syscall_call+0x7/0xb
>
> If I attempt to kill all processes with meta-sysrq-i, then I start seeing init
> stuck in the same spot:
>
> init D 00000001 21838320 606 1 605 (NOTLB)
> cea9fc5c 00000082 c1285bc0 00000001 00000003 c1286580 c1285bc0 cea9fc7c
> 00000000 00000246 c014f520 cea9fc50 ce3d0004 cf6ff000 00000007 00000000
> 00000000 00076d98 cea9fc70 00000000 cea9fca8 c0134dde cea9fc70 00076d98
> Call Trace:
> [<c014f520>] background_writeout+0x0/0xe0
> [<c0134dde>] schedule_timeout+0x6e/0xc0
> [<c0134d60>] process_timeout+0x0/0x10
> [<c012793b>] io_schedule_timeout+0x2b/0x40
> [<c031d2bb>] blk_congestion_wait+0x8b/0xa0
> [<c0128c30>] autoremove_wake_function+0x0/0x50
> [<c0128c30>] autoremove_wake_function+0x0/0x50
> [<c01581f2>] try_to_free_pages+0x102/0x1c0
> [<c014e1a7>] __alloc_pages+0x1f7/0x3a0
> [<c0150982>] __do_page_cache_readahead+0x182/0x21e
> [<c014b18f>] filemap_nopage+0x11f/0x330
> [<c015bff1>] do_no_page+0xd1/0x3f0
> [<c015c548>] handle_mm_fault+0x118/0x200
> [<c0123886>] do_page_fault+0x176/0x4dc
> [<c0138c91>] sigprocmask+0x71/0x150
> [<c0138e11>] sys_rt_sigprocmask+0xa1/0x1e0
> [<c0123710>] do_page_fault+0x0/0x4dc
> [<c010d2dd>] error_code+0x2d/0x38
>
> The current process (as seen via meta-sysrq-p) seems to always be the swapper:
> Pid: 0, comm: swapper
> EIP: 0060:[<c010a070>] CPU: 0
> EIP is at default_idle+0x30/0x40
> EFLAGS: 00000246 Not tainted
> EAX: 00000000 EBX: c0600000 ECX: 001d9b2e EDX: c0600000
> ESI: c0600000 EDI: c010a040 EBP: c0601fb4 DS: 007b ES: 007b
> CR0: 8005003b CR2: 0804d6a0 CR3: 0b9b8000 CR4: 00000680
> Call Trace:
> [<c010a106>] cpu_idle+0x46/0x50
> [<c0105000>] rest_init+0x0/0x80
> [<c0602961>] start_kernel+0x181/0x1b0
> [<c0602500>] unknown_bootoption+0x0/0x100
>
> I also noticed that try_to_free_pages() is ignoring the return value for
> wakeup_bdflush(), so for kicks I
>
> + WARN_ON(wakeup_bdflush(total_scanned));
> - wakeup_bdflush(total_scanned);
>
> After my system is nicely locked up, I start seeing tons of warnings
> like:
>
> Badness in try_to_free_pages at mm/vmscan.c:886
> Call Trace:
> [<c01582b8>] try_to_free_pages+0x1c8/0x1e0
> [<c014e1a7>] __alloc_pages+0x1f7/0x3a0
> [<c014e372>] __get_free_pages+0x22/0x50
> [<c0152385>] cache_grow+0x125/0x400
> [<c013437c>] del_timer_sync+0x2c/0x80
> [<c0124819>] kernel_map_pages+0x29/0x64
> [<c015279a>] cache_alloc_refill+0x13a/0x4c0
> [<c0153185>] kmem_cache_alloc+0x1b5/0x1e0
> [<c017ca59>] getname+0x29/0xd0
> [<c017e28b>] __user_walk+0x1b/0x60
> [<c018319e>] select_bits_alloc+0x1e/0x30
> [<c01785ce>] vfs_stat+0x1e/0x60
> [<c01833fb>] sys_select+0x23b/0x520
> [<c0178ccb>] sys_stat64+0x1b/0x40
> [<c012f105>] sys_time+0x35/0x70
> [<c010c873>] syscall_call+0x7/0xb
>
>
> So... is my explanation on target? Is this a condition that would really
> only pop up in crazy stress testing? If not then maybe sys_mlock should have
> an additional threshold?
>
> --rustyl
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
next prev parent reply other threads:[~2003-09-18 20:46 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-09-18 20:21 Rusty Lynch
2003-09-18 20:46 ` Rusty Lynch [this message]
2003-09-18 21:15 ` William Lee Irwin III
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1063918017.12547.9.camel@vmhack \
--to=rusty@linux.co.intel.com \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox