Re: [rfc 3/4] mm, page_alloc: avoid expensive reclaim when compaction may not succeed

From: Vlastimil Babka <vbabka@suse.cz>
To: Michal Hocko <mhocko@kernel.org>, David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@suse.de>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [rfc 3/4] mm, page_alloc: avoid expensive reclaim when compaction may not succeed
Date: Thu, 5 Sep 2019 13:22:39 +0200
Message-ID: <fab91766-da33-d62f-59fb-c226e4790a91@suse.cz>
In-Reply-To: <20190905090009.GF3838@dhcp22.suse.cz>

On 9/5/19 11:00 AM, Michal Hocko wrote:
> [Ccing Mike for checking on the hugetlb side of this change]
> 
> On Wed 04-09-19 12:54:22, David Rientjes wrote:
>> Memory compaction has a couple significant drawbacks as the allocation
>> order increases, specifically:
>>
>>  - isolate_freepages() is responsible for finding free pages to use as
>>    migration targets and is implemented as a linear scan of memory
>>    starting at the end of a zone,

Note that's no longer entirely true, see fast_isolate_freepages().

>>  - failing order-0 watermark checks in memory compaction does not account
>>    for how far below the watermarks the zone actually is: to enable
>>    migration, there must be *some* free memory available.  Per the above,
>>    watermarks are not always sufficient if isolate_freepages() cannot
>>    find the free memory but it could require hundreds of MBs of reclaim to
>>    even reach this threshold (read: potentially very expensive reclaim with
>>    no indication compaction can be successful), and

I doubt it's hundreds of MBs for a 2MB hugepage.
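
For a sense of scale: the part of the compaction threshold that depends
on the allocation order is compact_gap(order), which the kernel defines
as 2UL << order pages. Below is a minimal sketch of that arithmetic,
assuming 4 KiB base pages and order-9 hugepages as on x86. The helper
pages_to_kib() is illustrative only and not a kernel function; how far
the zone sits below its watermark is a separate quantity that this
sketch does not model.

```c
/* compact_gap() as the kernel defines it: twice the request size, in pages. */
static unsigned long compact_gap(unsigned int order)
{
	return 2UL << order;
}

/*
 * Illustrative helper (an assumption, not kernel code): convert a page
 * count to KiB, assuming 4 KiB base pages.
 */
static unsigned long pages_to_kib(unsigned long pages)
{
	return pages * 4;
}
```

For order 9, compact_gap() is 1024 pages, and pages_to_kib() turns that
into 4096 KiB, i.e. 4 MiB on top of the watermark itself.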

>>  - if compaction at this order has failed recently so that it does not even
>>    run as a result of deferred compaction, looping through reclaim can often
>>    be pointless.

Agreed.

>> For hugepage allocations, these are quite substantial drawbacks because
>> these are very high order allocations (order-9 on x86) and falling back to
>> doing reclaim can potentially be *very* expensive without any indication
>> that compaction would even be successful.

You seem to lump together hugetlbfs and THP here, by saying "hugepage",
but these are very different things - hugetlbfs reservations are
expected to be potentially expensive.

>> Reclaim itself is unlikely to free entire pageblocks and certainly no
>> reliance should be put on it to do so in isolation (recall lumpy reclaim).
>> This means we should avoid reclaim and simply fail hugepage allocation if
>> compaction is deferred.

It is however possible that reclaim frees enough to make even a
previously deferred compaction succeed.

>> It is also not helpful to thrash a zone by doing excessive reclaim if
>> compaction may not be able to access that memory.  If order-0 watermarks
>> fail and the allocation order is sufficiently large, it is likely better
>> to fail the allocation rather than thrashing the zone.
>>
>> Signed-off-by: David Rientjes <rientjes@google.com>
>> ---
>>  mm/page_alloc.c | 22 ++++++++++++++++++++++
>>  1 file changed, 22 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4458,6 +4458,28 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>  		if (page)
>>  			goto got_pg;
>>  
>> +		if (order >= pageblock_order && (gfp_mask & __GFP_IO)) {
>> +			/*
>> +			 * If allocating entire pageblock(s) and compaction
>> +			 * failed because all zones are below low watermarks
>> +			 * or is prohibited because it recently failed at this
>> +			 * order, fail immediately.
>> +			 *
>> +			 * Reclaim is
>> +			 *  - potentially very expensive because zones are far
>> +			 *    below their low watermarks or this is part of very
>> +			 *    bursty high order allocations,
>> +			 *  - not guaranteed to help because isolate_freepages()
>> +			 *    may not iterate over freed pages as part of its
>> +			 *    linear scan, and
>> +			 *  - unlikely to make entire pageblocks free on its
>> +			 *    own.
>> +			 */
>> +			if (compact_result == COMPACT_SKIPPED ||
>> +			    compact_result == COMPACT_DEFERRED)
>> +				goto nopage;

As I said, I expect this will make hugetlbfs reservations fail
prematurely - Mike can probably confirm or disprove that.
IMHO it also addresses the consequences, not the primary problem.
I believe the primary problem is that we reclaim something even if
there's enough memory for compaction. This won't change with your patch,
as compact_result won't be COMPACT_SKIPPED in that case. Then we continue
through __alloc_pages_direct_reclaim() to shrink_zones(), which calls
compaction_ready(), which only returns true and skips reclaim of the
zone if there are high_watermark (!!!) + compact_gap() free pages. But as
long as one zone isn't compaction_ready(), we enter shrink_node(), which
will reclaim something and call should_continue_reclaim(), where we might
finally notice that compaction_suitable() returns COMPACT_CONTINUE, and
abort reclaim.

Thus I think the right solution might be to really avoid reclaim for
zones where compaction is not skipped, while your patch avoids reclaim
when compaction is skipped. The per-node reclaim vs per-zone compaction
might complicate those decisions a lot, though.
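
The mismatch between the two thresholds can be sketched as below. This
is a simplified model, not the kernel code: struct zone_model and the
three function names are hypothetical, lowmem reserves and the other
checks in the real functions are ignored, and the use of the low
watermark on the compaction side for costly orders is my assumption
about compaction_suitable(). Only the high watermark + compact_gap()
check in compaction_ready() comes from the description above.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for struct zone (counts in pages). */
struct zone_model {
	unsigned long free_pages;
	unsigned long low_wmark;
	unsigned long high_wmark;
};

/* Same arithmetic as the kernel's compact_gap(). */
static unsigned long compact_gap(unsigned int order)
{
	return 2UL << order;
}

/*
 * Compaction side (assumed): enough order-0 memory to run compaction
 * means being above low watermark + gap.
 */
static bool compaction_can_run(const struct zone_model *z, unsigned int order)
{
	return z->free_pages > z->low_wmark + compact_gap(order);
}

/*
 * Reclaim side, modelled on compaction_ready(): the zone is skipped
 * only when it is above high watermark + gap.
 */
static bool reclaim_skips_zone(const struct zone_model *z, unsigned int order)
{
	return z->free_pages > z->high_wmark + compact_gap(order);
}

/* The problematic window: compaction could run, yet reclaim still shrinks. */
static bool reclaims_despite_compactable(const struct zone_model *z,
					 unsigned int order)
{
	return compaction_can_run(z, order) && !reclaim_skips_zone(z, order);
}
```

In this model, any zone whose free page count lies between low watermark
+ gap and high watermark + gap gets reclaimed even though compaction
would not have been skipped for it.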

>> +		}
>> +
>>  		/*
>>  		 * Checks for costly allocations with __GFP_NORETRY, which
>>  		 * includes THP page fault allocations
>
