linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mark Hemment <markhe@nextd.demon.co.uk>
To: Karl Kleinpaste <karl@jprc.com>
Cc: linux-mm <linux-mm@kvack.org>
Subject: Re: 2.0.30: Lockups with huge proceses mallocing all VM
Date: Thu, 4 Dec 1997 10:02:44 +0000 (GMT)	[thread overview]
Message-ID: <Pine.LNX.3.95.971204092943.86B-100000@nextd.demon.co.uk> (raw)
In-Reply-To: <vxkiut6fiku.fsf@pocari-sweat.jprc.com>

On 3 Dec 1997, Karl Kleinpaste wrote:
> We can reproduce this fairly reliably.  Before anyone had looked at
> the code at all closely, some folks were surmising that perhaps Linux
> was not guaranteeing the availability of backing store for freshly-
> allocated pages, and that perhaps eventually Linux was getting stuck
> looking for a free page when none were to be found.

  Yep, Linux uses lazy swap-page allocation (some OSes use eager
allocation).  This means it can run out of pages.
  If a page allocation fails during a page fault, then the faulting task
is killed.  If an allocation fails for a "management structure" (eg.
vm_area_struct), then the system-call should fail with EAGAIN.
  Unfortunately, not all allocation failures are handled cleanly - some
work may have been done before the failure, which is not completely undone
after the allocation failure.  An example of this is munmap(), which
may partially unmap the given address range before an allocation failure
(yep, munmap() can cause allocations).  Another example is mlock() and
mprotect() - either may partially succeed.  This can confuse an
allocation, which may end up seg-faulting.

> I'm wondering whether this sort of lockup is analogous to the
> fragmentation lockups recently mentioned by Bill Hawes and others.  If
> so, could someone direct me toward Mark Hemment or others doing work
> of this sort?

I was (am) working on reducing the free-page pool fragmentation when my
page-colouring is being used.  It places a lower bound on the
fragmentation.
BTW, are you using NFS?  This requries largish orders of contigious pages
from the page allocator, and I believe this can cause NFS to stall the
machine until it gets the requried allocations - but I might be wrong
here, and/or this 'feature' may only be in 2.1.xx....

> I'm perfectly willing to wade into the kernel mem.mgmt code to figure
> out what I can about this, though it sounds like others may be way out
> in front on the issue.  In the meantime, we're working around the
> problem as best we can by imposing datasize limits (via ulimit) since
> the problem only presents itself when the machine is out of aggregate
> VM anyway -- it doesn't matter if we make this lone process die as
> long as the machine as a whole survives.

I think the bug is not in the VM sub-system, but the lack of available
pages is causing some other sub-system to lock your box.
If you have a free test target, try using the latest 2.1.x to see if the
problem is still there.

  Regards,

     markhe

      parent reply	other threads:[~1997-12-04 22:27 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <vxkiut6fiku.fsf@pocari-sweat.jprc.com>
1997-12-03 22:05 ` Rik van Riel
1997-12-04  2:26   ` Karl Kleinpaste
1997-12-04 10:02 ` Mark Hemment [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.3.95.971204092943.86B-100000@nextd.demon.co.uk \
    --to=markhe@nextd.demon.co.uk \
    --cc=karl@jprc.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox