linux-mm.kvack.org archive mirror
From: David Rientjes <rientjes@google.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, compaction: avoid isolating pinned pages
Date: Mon, 3 Feb 2014 02:49:32 -0800 (PST)
Message-ID: <alpine.DEB.2.02.1402030231590.31061@chino.kir.corp.google.com>
In-Reply-To: <20140203095329.GH6732@suse.de>

On Mon, 3 Feb 2014, Mel Gorman wrote:

> > Page migration will fail for memory that is pinned in memory with, for
> > example, get_user_pages().  In this case, it is unnecessary to take
> > zone->lru_lock or to isolate the page and pass it to page migration,
> > which will ultimately fail.
> > 
> > This is a racy check, the page can still change from under us, but in
> > that case we'll just fail later when attempting to move the page.
> > 
> > This avoids very expensive memory compaction when faulting transparent
> > hugepages after pinning a lot of memory with a Mellanox driver.
> > 
> > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > see the enormous disparity in the number of page migration failures
> > because of the pinning (from /proc/vmstat):
> > 
> > compact_blocks_moved 7609
> > compact_pages_moved 3431
> > compact_pagemigrate_failed 133219
> > compact_stall 13
> > 
> > After the patch, it is much more efficient:
> > 
> > compact_blocks_moved 7998
> > compact_pages_moved 6403
> > compact_pagemigrate_failed 3
> > compact_stall 15
> > 
> > Signed-off-by: David Rientjes <rientjes@google.com>
> > ---
> >  mm/compaction.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> >  			continue;
> >  		}
> >  
> > +		/*
> > +		 * Migration will fail if an anonymous page is pinned in memory,
> > +		 * so avoid taking zone->lru_lock and isolating it unnecessarily
> > +		 * in an admittedly racy check.
> > +		 */
> > +		if (!page_mapping(page) && page_count(page))
> > +			continue;
> > +
> 
> Are you sure about this? The page_count check that migration does is this:
> 
>         int expected_count = 1 + extra_count;
>         if (!mapping) {
>                 if (page_count(page) != expected_count)
>                         return -EAGAIN;
>                 return MIGRATEPAGE_SUCCESS;
>         }
> 
>         spin_lock_irq(&mapping->tree_lock);
> 
>         pslot = radix_tree_lookup_slot(&mapping->page_tree,
>                                         page_index(page));
> 
>         expected_count += 1 + page_has_private(page);
> 
> Migration expects and can migrate pages with no mapping and a page count
> but you are now skipping them. I think you may have intended to split
> migration's page count check into a helper or copy the logic.
> 

Thanks for taking a look!

The patch is correct; it just shows my lack of a complete commit message,
which I've been struggling with recently.  In the case that this is
addressing, get_user_pages() already gives page_count(page) == 1, and then
__isolate_lru_page() does another get_page() that is dropped in
putback_lru_page() after the call into migrate_pages().  So in the code
you quote above we always have page_count(page) == 2 and
expected_count == 1, and the check returns -EAGAIN.
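
The accounting described above can be sketched as a userspace toy model.
This is not kernel code: struct pin_page and the pin_* helpers are
hypothetical names made up for illustration, and the model follows only
the reference counting as stated in this message, not the kernel's full
refcount rules.

```c
#include <stdbool.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct pin_page {
	int count;	/* models page_count(page) */
};

/* Models one reference taken by get_user_pages() or get_page(). */
static void pin_get_page(struct pin_page *page)
{
	page->count++;
}

/*
 * Models the !mapping branch of the migration check quoted above:
 * migration proceeds only if the count matches the expected count.
 */
static bool pin_migrate_would_succeed(const struct pin_page *page,
				      int extra_count)
{
	int expected_count = 1 + extra_count;

	return page->count == expected_count;
}

/* Optionally pin, then isolate, then ask whether migration could succeed. */
static bool pin_isolate_and_check(bool pinned)
{
	struct pin_page page = { .count = 0 };

	if (pinned)
		pin_get_page(&page);	/* reference held by the driver's pin */
	pin_get_page(&page);		/* reference taken at isolation time */

	return pin_migrate_would_succeed(&page, 0);
}
```

In this model pin_isolate_and_check(true) is false, because the count is 2
against an expected count of 1, while pin_isolate_and_check(false) is true.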

So what we desperately need to do is avoid isolating any page where
page_count(page) is non-zero and !page_mapping(page), and to do that before
the get_page() in __isolate_lru_page(), because we want to avoid taking
zone->lru_lock.  On my 128GB machine filled with ~120GB of pinned memory
for the driver, this lock gets highly contended under compaction and even
reclaim if the rest of userspace is using a lot of memory.
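
To show why the position of the check matters, here is a minimal
userspace sketch of a scan loop with and without the early skip.  The
names (struct scan_page, scan_should_skip(), scan_pages()) are
hypothetical, and counting one isolation attempt per page is a
simplification standing in for taking zone->lru_lock; the real scanner
does not take the lock once per page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page with just the two properties tested. */
struct scan_page {
	int count;		/* models page_count(page) */
	bool has_mapping;	/* models page_mapping(page) != NULL */
};

struct scan_stats {
	size_t isolate_attempts;	/* pages that reached the locked path */
	size_t skipped;			/* pages rejected by the early check */
};

/* Same condition as the check added by the patch quoted above. */
static bool scan_should_skip(const struct scan_page *page)
{
	return !page->has_mapping && page->count != 0;
}

/* Walk the pages, optionally applying the early check before "locking". */
static struct scan_stats scan_pages(const struct scan_page *pages, size_t n,
				    bool early_check)
{
	struct scan_stats stats = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (early_check && scan_should_skip(&pages[i])) {
			stats.skipped++;
			continue;
		}
		/* Stands in for taking zone->lru_lock and isolating. */
		stats.isolate_attempts++;
	}
	return stats;
}
```

With the early check enabled, pages that have no mapping and a non-zero
count never reach the locked path, which is the saving being argued for.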

It's not really relevant to the commit message, but I found that if all 
that ~120GB is faulted and I manually invoke compaction with the procfs 
trigger (with my fix to do cc.ignore_skip_hint = true), this lock gets 
taken ~450,000 times and only 0.05% of isolated pages are actually 
successfully migrated.
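
For comparison with the /proc/vmstat figures quoted from the commit
message, a success percentage can be worked out as below.  This is only
a sketch: migrate_success_pct() is a hypothetical helper, and treating
compact_pages_moved plus compact_pagemigrate_failed as the total number
of attempts is an assumption.

```c
/* Percentage of migration attempts that succeeded, from two counters. */
static double migrate_success_pct(unsigned long moved, unsigned long failed)
{
	unsigned long attempts = moved + failed;

	if (attempts == 0)
		return 0.0;	/* nothing attempted, avoid dividing by zero */

	return 100.0 * (double)moved / (double)attempts;
}
```

Under that assumption, migrate_success_pct(3431, 133219) is roughly 2.5
for the run before the patch and migrate_success_pct(6403, 3) is above
99.9 for the run after it.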

Deferred compaction will certainly help for compaction that isn't induced
via procfs, but we've encountered massive amounts of lock contention in
this path and extremely low success-to-failure ratios of page migration
(on average 2-3 out of 60 runs), and the fault path really does grind to a
halt without this patch (or without simply doing MADV_NOHUGEPAGE before the
driver does ib_umem_get() for 120GB of memory, but we want those hugepages!).

