From: Johannes Weiner <hannes@cmpxchg.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>, Joonsoo Kim <js1304@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kernel-team@fb.com
Subject: Re: Regression in mobility grouping?
Date: Wed, 28 Sep 2016 11:39:25 -0400 [thread overview]
Message-ID: <20160928153925.GA24966@cmpxchg.org> (raw)
In-Reply-To: <8c3b7dd8-ef6f-6666-2f60-8168d41202cf@suse.cz>
Hi Vlastimil,
On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
> On 09/28/2016 03:41 AM, Johannes Weiner wrote:
> > Hi guys,
> >
> > we noticed what looks like a regression in page mobility grouping
> > during an upgrade from 3.10 to 4.0. Identical machines, workloads, and
> > uptime, but /proc/pagetypeinfo on 3.10 looks like this:
> >
> > Number of blocks type Unmovable Reclaimable Movable Reserve Isolate
> > Node 1, zone Normal 815 433 31518 2 0
> >
> > and on 4.0 like this:
> >
> > Number of blocks type Unmovable Reclaimable Movable Reserve CMA Isolate
> > Node 1, zone Normal 3880 3530 25356 2 0 0
>
> It's worth keeping in mind that this doesn't reflect where the actual
> unmovable pages reside. It might be that in 3.10 they are spread within
> the movable pageblocks. IIRC enabling page_owner (not sure if that
> works in 4.0, there were some later fixes I think) can augment
> pagetypeinfo with at least some statistics on polluted pageblocks.
Thanks, I'll look at the mixed block counts. I failed to make clear
that we saw this issue in the switch from 3.10 to 4.0; I mentioned
those two kernels as last known good / first known bad. But later
kernels - we tried 4.6 - look the same. This appears to be a
regression in (higher-order) allocation service quality somewhere
after 3.10 that persists into current kernels.
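The scale of the shift is easier to see as fractions. A minimal sketch,
using only the Node 1 / zone Normal block counts quoted above
(Reserve/CMA/Isolate omitted since they are negligible here):

```python
# Pageblock counts for Node 1, zone Normal, as quoted above.
v3_10 = {"unmovable": 815, "reclaimable": 433, "movable": 31518}
v4_0 = {"unmovable": 3880, "reclaimable": 3530, "movable": 25356}

def polluted_fraction(counts):
    """Fraction of pageblocks unavailable as movable targets."""
    total = sum(counts.values())
    return (counts["unmovable"] + counts["reclaimable"]) / total

print(f"3.10: {polluted_fraction(v3_10):.1%} !movable")  # ~3.8%
print(f"4.0:  {polluted_fraction(v4_0):.1%} !movable")   # ~22.6%
```

So the share of pageblocks unavailable as movable allocation targets
grew roughly six-fold between the two kernels.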
> Does e.g. /proc/meminfo suggest how much unmovable/reclaimable memory
> should be allocated, and whether it would fill the respective
> pageblocks or leave them poorly utilized?
They are very poorly utilized. On a machine where anon/cache pages
alone make up 90% of memory, we saw 50% of the pageblocks marked
unmovable.
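A back-of-envelope bound on that utilization, taking the two figures
above at face value (90% movable memory, 50% unmovable-marked blocks;
round numbers, not a measurement):

```python
# If >= 90% of pages are movable anon/cache, at most 10% of memory can
# be unmovable/reclaimable -- yet 50% of pageblocks are marked unmovable.
movable_mem_fraction = 0.90
unmovable_block_fraction = 0.50

# Upper bound on how full the unmovable-marked blocks can be, on
# average, with pages that are actually unmovable:
max_avg_utilization = (1 - movable_mem_fraction) / unmovable_block_fraction
print(f"at most {max_avg_utilization:.0%} pinned pages per block")  # 20%
```

In other words, at least 80% of the space in those blocks is free or
holds movable pages, yet none of it is available for movable grouping.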
> > 4.0 is either polluting pageblocks more aggressively at allocation, or
> > is not able to make pageblocks movable again when the reclaimable and
> > unmovable allocations are released. Invoking compaction manually
> > (/proc/sys/vm/compact_memory) is not bringing them back, either.
> >
> > The problem we are debugging is that these machines have a very high
> > rate of order-3 allocations (fdtable during fork, network rx), and
> > after the upgrade allocstalls have increased dramatically. I'm not
> > entirely sure this is the same issue, since even order-0 allocations
> > are struggling, but the mobility grouping in itself looks problematic.
> >
> > I'm still going through the changes relevant to mobility grouping in
> > that timeframe, but if this rings a bell for anyone, it would help. I
> > hate blaming random patches, but these caught my eye:
> >
> > 9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
> > 3a1086f mm: always steal split buddies in fallback allocations
> > 99592d5 mm: when stealing freepages, also take pages created by splitting buddy page
>
> Check also the changelogs for mentions of earlier commits, e.g. 99592d5
> should be restoring behavior that changed in 3.12-3.13 and you are
> upgrading from 3.10.
Good point.
> > The changelog states that by aggressively stealing split buddy pages
> > during a fallback allocation we avoid subsequent stealing. But since
> > there are generally more movable/reclaimable pages available, and so
> > less falling back and stealing freepages on behalf of movable, won't
> > this mean that we could expect exactly that result - growing numbers
> > of unmovable blocks, while rarely stealing them back in movable alloc
> > fallbacks? And the expansion of !MOVABLE blocks would over time make
> > compaction less and less effective too, seeing as it doesn't consider
> > anything !MOVABLE suitable migration targets?
>
> Yeah this is an issue with compaction that was brought up recently and I
> want to tackle next.
Agreed, it would be nice if compaction could reclaim unmovable and
reclaimable blocks whose polluting allocations have since been freed.
But there is a limit to how lazy mobility grouping can be while still
expecting compaction to fix things up afterwards. If 50% of the
pageblocks are marked unmovable, incoming polluting allocations are no
longer packed into a small number of blocks. Spread out the wrong way,
even just a few of those allocations can have a devastating impact on
overall compactability.
So regardless of future compaction improvements, we need to get
anti-frag accuracy in the allocator closer to 3.10 levels again.
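To put rough numbers on how damaging a thin spread of pinned pages is,
here is a sketch assuming x86-64 defaults (4K pages, 2MB pageblocks,
i.e. 512 pages per block); the 64MB figure is an arbitrary example,
not taken from the machines above:

```python
PAGES_PER_BLOCK = 512  # 2MB pageblock / 4K pages on x86-64

def blocks_poisoned(pinned_pages, pages_per_poisoned_block):
    """Blocks that can never become free 2MB chunks, given pinned
    pages spread at `pages_per_poisoned_block` per block."""
    return -(-pinned_pages // pages_per_poisoned_block)  # ceil division

pinned = 16384  # 64MB worth of truly unmovable 4K pages
packed = blocks_poisoned(pinned, PAGES_PER_BLOCK)  # tightly grouped
spread = blocks_poisoned(pinned, 1)                # one page per block

print(f"packed: {packed} blocks (64MB) lost to compaction")     # 32
print(f"spread: {spread} blocks (32GB) lost to compaction")     # 16384
```

The same 64MB of pinned memory costs either 64MB or 32GB of
compactable address space, depending entirely on how it is grouped.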
> > Attached are the full /proc/pagetypeinfo and /proc/buddyinfo from both
> > kernels on machines with similar uptimes and directly after invoking
> > compaction. As you can see, the buddy lists are much more fragmented
> > on 4.0, with unmovable/reclaimable allocations polluting more blocks.
> >
> > Any thoughts on this would be greatly appreciated. I can test patches.
>
> I guess testing a revert of 9c0415e could give us some idea. Commit
> 3a1086f shouldn't result in pageblock marking differences, and as I
> said above, 99592d5 should just restore what 3.10 did.
I can give this a shot, but note that that commit only makes unmovable
stealing more aggressive, while we see reclaimable blocks increased as
well.
The workload is fairly variable, so it'll take about a day to smooth
out a meaningful average.
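For reference, the comparison itself is straightforward once the
samples exist; a sketch that computes an allocstall rate from two
/proc/vmstat-style snapshots (the counter is named `allocstall` in
4.6-era kernels; the snapshot numbers here are made up):

```python
def counter(snapshot, name):
    """Extract one counter from /proc/vmstat-style text."""
    for line in snapshot.splitlines():
        key, _, value = line.partition(" ")
        if key == name:
            return int(value)
    raise KeyError(name)

# Hypothetical snapshots taken 3600 seconds apart.
before = "pgalloc_normal 1000000\nallocstall 5000\n"
after = "pgalloc_normal 1900000\nallocstall 12200\n"

rate = (counter(after, "allocstall") - counter(before, "allocstall")) / 3600
print(f"{rate:.1f} allocstalls/sec")  # 2.0
```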
Thanks for your insights, Vlastimil!
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org