Re: RFC: Noreclaim with "Keep Mlocked Pages off the LRU"

From: Nick Piggin <npiggin@suse.de>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: linux-mm <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>
Subject: Re: RFC:  Noreclaim with "Keep Mlocked Pages off the LRU"
Date: Wed, 29 Aug 2007 06:38:03 +0200
Message-ID: <20070829043803.GD25335@wotan.suse.de>
In-Reply-To: <1188312766.5079.77.camel@localhost>

On Tue, Aug 28, 2007 at 10:52:46AM -0400, Lee Schermerhorn wrote:
> On Tue, 2007-08-28 at 02:06 +0200, Nick Piggin wrote:
> > 
> > I don't have a problem with having a more unified approach, although if
> > we did that, then I'd prefer just to do it more simply and don't special
> > case mlocked pages _at all_. Ie. just slowly try to reclaim them and
> > eventually when everybody unlocks them, you will notice sooner or later.
> 
> I didn't think I was special casing mlocked pages.  I wanted to treat
> all !page_reclaimable() pages the same--i.e., put them on the noreclaim
> list.

But you are keeping track of the mlock count? Why not simply call
try_to_unmap and see if they are still mlocked?
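
A minimal user-space sketch of that lazy check, not kernel code. The
struct and function names (sketch_mapping, sketch_try_to_unmap) are
hypothetical, and the real try_to_unmap() walks the reverse mappings
rather than a flat array. The assumption here is only that an unmap
attempt fails as soon as it meets a locked mapping, so no per-page lock
count is needed to discover that a page is still mlocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one mapping of a page; not vm_area_struct. */
struct sketch_mapping {
    bool vm_locked;   /* stands in for VM_LOCKED on the mapping's VMA */
    bool present;     /* page is currently mapped through this entry */
};

/*
 * Try to unmap the page from every mapping. Stops and reports failure
 * at the first locked mapping, leaving it and any later ones in place.
 * Mappings visited before the locked one have already been torn down.
 */
static bool sketch_try_to_unmap(struct sketch_mapping *maps, size_t n)
{
    assert(maps != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!maps[i].present)
            continue;
        if (maps[i].vm_locked)
            return false;      /* still mlocked: not reclaimable yet */
        maps[i].present = false;
    }
    return true;               /* fully unmapped: reclaim may proceed */
}
```

Once every locker has gone away, the next scan simply succeeds, which
is the "you will notice sooner or later" behaviour.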


> > But once you do the code for mlock refcounting, that's most of the hard
> > part done so you may as well remove them completely from the LRU, no?
> > Then they become more or less transparent to the rest of the VM as well.
> 
> Well, no.  Depending on the reason for !reclaimable, the page would go
> on the noreclaim list or just be dropped--special handling.  More
> importantly [for me], we still have to handle them specially in
> migration, dumping them back onto the LRU so that we can arbitrate
> access.  If I'm ever successful in getting automatic/lazy page migration
> +replication accepted, I don't want that overhead in
> auto-migration/replication.

Oh OK. I don't know if there should be a whole lot of overhead involved
with that, though. I can't remember exactly what the problems were here
with my mlock patch, but I think it could have been made more optimal.


> > Could be possible. Tricky though. Probably take less code to use
> > ->lru ;)
> 
> Oh, certainly less code to use any separate field.  But the lru list
> field is the only link we have in the page struct, and a lot of VM
> depends on being able to pass around lists of pages.  I'd hate to lose
> that for mlocked pages, or to have to dump the lock count and
> reestablish it in those cases, like migration, where we need to put the
> page on a list.

Hmm, yes. Migration could possibly use a singly linked list.
But I'm only saying it _could_ be possible to do mlocked accounting
efficiently with one of the LRU pointers -- I would prefer the idea
of just using a single bit, for example, if that is sufficient. It
should cut down on code.
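
To make the two options concrete, here is a rough user-space sketch,
not kernel code. Everything in it (toy_page, toy_list, TOY_PG_MLOCKED
and the helper names) is made up for illustration. It contrasts
keeping a full mlock count in the storage the list link normally
occupies with keeping nothing but one flag bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's list_head. */
struct toy_list { struct toy_list *next, *prev; };

#define TOY_PG_MLOCKED (1UL << 0)   /* hypothetical flag bit */

struct toy_page {
    unsigned long flags;
    union {
        struct toy_list lru;        /* valid while the page is on a list */
        unsigned long mlock_count;  /* valid only while off the list */
    };
};

/* Option 1: count lockers in the space the list link occupies. */
static void toy_mlock_counted(struct toy_page *p)
{
    if (!(p->flags & TOY_PG_MLOCKED)) {
        p->flags |= TOY_PG_MLOCKED; /* page is taken off its list here */
        p->mlock_count = 0;
    }
    p->mlock_count++;
}

/* Returns true when the last locker is gone and the link is free again. */
static bool toy_munlock_counted(struct toy_page *p)
{
    assert(p->mlock_count > 0);
    if (--p->mlock_count > 0)
        return false;
    p->flags &= ~TOY_PG_MLOCKED;
    p->lru.next = p->lru.prev = &p->lru;  /* reinitialise the list link */
    return true;
}

/* Option 2: one bit and no count; the list link is never borrowed. */
static void toy_mlock_bit(struct toy_page *p)
{
    p->flags |= TOY_PG_MLOCKED;
}

static bool toy_is_mlocked(const struct toy_page *p)
{
    return (p->flags & TOY_PG_MLOCKED) != 0;
}
```

The trade-off in the sketch: the counted variant knows exactly when the
last locker leaves but cannot sit on a list while locked, whereas the
single bit keeps the link usable and leaves it to some later check to
decide whether the bit may be cleared.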


> > I don't know. I'd have thought efficient mlock handling might be useful
> > for realtime systems, probably many of which would be 32-bit.
> 
> I agree.  I just wonder if those systems have a sufficient number of
> pages that they're suffering from the long lru lists with a large
> fraction of unreclaimable pages...  If we do want to support keeping
> nonreclaimable pages off the [in]active lists for these systems, we'll
> need to find a place for the flag[s].

That's true, they will have a lot fewer pages (and probably won't
be using highmem).


> > Are you seeing mlock pinning heaps of memory in the field?
> 
> It is a common usage to mlock() large shared memory areas, as well as
> entire tasks [MCL_CURRENT|MCL_FUTURE].  I think it would be even
> more frequent if one could inherit MCL_FUTURE across fork and exec.
> Then one could write/enhance a prefix command, like numactl and taskset,
> to enable locking of unmodified applications.  I prototyped this once,
> but never updated it to do the mlock accounting [e.g., down in
> copy_page_range() during fork()] for your patch.
> 
> What we see more of is folks just figuring that they've got sufficient
> memory [100s of GB] for their apps and shared memory areas, so they
> don't add enough swap to back all of the anon and shmem regions.  Then,
> when they get under memory pressure--e.g., the old "backup ate my
> pagecache" scenario--the system more or less live-locks in vmscan
> shuffling non-reclaimable [unswappable] pages.  A large number of
> mlocked pages on the LRU produces the same symptom; as do excessively
> long anon_vma lists and huge i_mmap trees--the latter seen with some
> large Oracle workloads.

OK, thanks for the background.
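
For reference, locking an entire task as described above is a single
mlockall() call; the flags are spelled MCL_CURRENT and MCL_FUTURE in
<sys/mman.h>. A minimal sketch follows, where the wrapper name
lock_whole_task is hypothetical. The call can fail with EPERM or ENOMEM
depending on privileges and RLIMIT_MEMLOCK, so the caller has to check
the result.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Lock all current mappings and everything mapped later on.
 * Returns 0 on success, -1 on failure (errno is left set by mlockall).
 */
static int lock_whole_task(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        fprintf(stderr, "mlockall: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

munlockall() undoes both flags.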

Thanks,
Nick
