linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages
Date: Tue, 29 Oct 2019 15:14:59 +0100	[thread overview]
Message-ID: <08a3f4dd-c3ce-0009-86c5-9ee51aba8557@suse.cz> (raw)
In-Reply-To: <alpine.DEB.2.21.1910241156370.130350@chino.kir.corp.google.com>

On 10/24/19 8:59 PM, David Rientjes wrote:
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 4ae967bcf954..2c48146f3ee2 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -2129,18 +2129,20 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
>>  		nmask = policy_nodemask(gfp, pol);
>>  		if (!nmask || node_isset(hpage_node, *nmask)) {
>>  			mpol_cond_put(pol);
>> +			/*
>> +			 * First, try to allocate THP only on local node, but
>> +			 * don't reclaim unnecessarily, just compact.
>> +			 */
>>  			page = __alloc_pages_node(hpage_node,
>> -						gfp | __GFP_THISNODE, order);
>> +				gfp | __GFP_THISNODE | __GFP_NORETRY, order);
>>  
>>  			/*
>> -			 * If hugepage allocations are configured to always
>> -			 * synchronous compact or the vma has been madvised
>> -			 * to prefer hugepage backing, retry allowing remote
>> -			 * memory as well.
>> +			 * If that fails, allow both compaction and reclaim,
>> +			 * but on all nodes.
>>  			 */
>> -			if (!page && (gfp & __GFP_DIRECT_RECLAIM))
>> +			if (!page)
>>  				page = __alloc_pages_node(hpage_node,
>> -						gfp | __GFP_NORETRY, order);
>> +								gfp, order);
>>  
>>  			goto out;
>>  		}
> Hi Vlastimil,
> 
> For the default case where thp enabled is not set to "always" and the VMA

I assume you meant "defrag" instead of "enabled".
 
> is not madvised for MADV_HUGEPAGE, how does this prefer to return node 
> local pages rather than remote hugepages?  The idea is to optimize for 
> access latency when the vma has not been explicitly madvised.
 
Right, you mentioned this before IIRC, and I forgot. How about this?
We could be also smarter and consolidate to a single attempt if there's
actually just a single NUMA node, but that can be optimized later.

----8<----
From 4c3a2217d0ee5ead00b1443010d07c664b6ac645 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Tue, 1 Oct 2019 14:20:58 +0200
Subject: [PATCH] mm, thp: tweak reclaim/compaction effort of local-only and
 all-node allocations

THP page faults now attempt a __GFP_THISNODE allocation first, which should
only compact existing free memory, followed by another attempt that can
allocate from any node using reclaim/compaction effort specified by global
defrag setting and madvise.

This patch makes the following changes to the scheme:

- before the patch, the first allocation relies on a check for pageblock order
  and __GFP_IO to prevent excessive reclaim. This however affects also the
  second attempt, which is not limited to single node. Instead of that, reuse
  the existing check for costly order __GFP_NORETRY allocations, and make sure
  the first THP attempt uses __GFP_NORETRY. As a side-effect, all costly order
  __GFP_NORETRY allocations will bail out if compaction needs reclaim, while
  previously they only bailed out when compaction was deferred due to previous
  failures. This should be still acceptable within the __GFP_NORETRY semantics.

- before the patch, the second allocation attempt (on all nodes) was passing
  __GFP_NORETRY. This is redundant as the check for pageblock order (discussed
  above) was stronger. It's also contrary to madvise(MADV_HUGEPAGE) which means
  some effort to allocate THP is requested. After this patch, the second
  attempt doesn't pass __GFP_THISNODE nor __GFP_NORETRY.

To sum up, THP page faults now try the following attempts:

1. local node only THP allocation with no reclaim, just compaction.
2. for madvised VMA's or when synchronous compaction is enabled always - THP
   allocation from any node with effort determined by global defrag setting
   and VMA madvise
3. fallback to base pages on any node

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/mempolicy.c  | 10 +++++++---
 mm/page_alloc.c | 24 +++++-------------------
 2 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4ae967bcf954..ed6fbc5b1e20 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2129,18 +2129,22 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		nmask = policy_nodemask(gfp, pol);
 		if (!nmask || node_isset(hpage_node, *nmask)) {
 			mpol_cond_put(pol);
+			/*
+			 * First, try to allocate THP only on local node, but
+			 * don't reclaim unnecessarily, just compact.
+			 */
 			page = __alloc_pages_node(hpage_node,
-						gfp | __GFP_THISNODE, order);
+				gfp | __GFP_THISNODE | __GFP_NORETRY, order);
 
 			/*
 			 * If hugepage allocations are configured to always
 			 * synchronous compact or the vma has been madvised
 			 * to prefer hugepage backing, retry allowing remote
-			 * memory as well.
+			 * memory with both reclaim and compact as well.
 			 */
 			if (!page && (gfp & __GFP_DIRECT_RECLAIM))
 				page = __alloc_pages_node(hpage_node,
-						gfp | __GFP_NORETRY, order);
+								gfp, order);
 
 			goto out;
 		}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ecc3dbad606b..36d7d852f7b1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4473,8 +4473,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		if (page)
 			goto got_pg;
 
-		 if (order >= pageblock_order && (gfp_mask & __GFP_IO) &&
-		     !(gfp_mask & __GFP_RETRY_MAYFAIL)) {
+		/*
+		 * Checks for costly allocations with __GFP_NORETRY, which
+		 * includes some THP page fault allocations
+		 */
+		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
 			/*
 			 * If allocating entire pageblock(s) and compaction
 			 * failed because all zones are below low watermarks
@@ -4495,23 +4498,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 			if (compact_result == COMPACT_SKIPPED ||
 			    compact_result == COMPACT_DEFERRED)
 				goto nopage;
-		}
-
-		/*
-		 * Checks for costly allocations with __GFP_NORETRY, which
-		 * includes THP page fault allocations
-		 */
-		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
-			/*
-			 * If compaction is deferred for high-order allocations,
-			 * it is because sync compaction recently failed. If
-			 * this is the case and the caller requested a THP
-			 * allocation, we do not want to heavily disrupt the
-			 * system, so we fail the allocation instead of entering
-			 * direct reclaim.
-			 */
-			if (compact_result == COMPACT_DEFERRED)
-				goto nopage;
 
 			/*
 			 * Looks like reclaim/compaction is worth trying, but
-- 
2.23.0



  reply	other threads:[~2019-10-29 14:15 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-04 19:54 David Rientjes
2019-09-04 19:54 ` [rfc 3/4] mm, page_alloc: avoid expensive reclaim when compaction may not succeed David Rientjes
2019-09-05  9:00   ` Michal Hocko
2019-09-05 11:22     ` Vlastimil Babka
2019-09-05 20:53       ` Mike Kravetz
2019-09-06 20:16         ` David Rientjes
2019-09-06 20:49       ` David Rientjes
2019-09-04 20:43 ` [patch for-5.3 0/4] revert immediate fallback to remote hugepages Linus Torvalds
2019-09-05 20:54   ` David Rientjes
2019-09-07 19:51     ` David Rientjes
2019-09-07 19:55       ` Linus Torvalds
2019-09-08  1:50         ` David Rientjes
2019-09-08 12:47           ` Vlastimil Babka
2019-09-08 20:45             ` David Rientjes
2019-09-09  8:37               ` Michal Hocko
2019-09-04 20:55 ` Andrea Arcangeli
2019-09-05 21:06   ` David Rientjes
2019-09-09 19:30     ` Michal Hocko
2019-09-25  7:08       ` Michal Hocko
2019-09-26 19:03         ` David Rientjes
2019-09-27  7:48           ` Michal Hocko
2019-09-28 20:59             ` Linus Torvalds
2019-09-30 11:28               ` Michal Hocko
2019-10-01  5:43                 ` Michal Hocko
2019-10-01  8:37                   ` Michal Hocko
2019-10-18 14:15                     ` Michal Hocko
2019-10-23 11:03                       ` Vlastimil Babka
2019-10-24 18:59                         ` David Rientjes
2019-10-29 14:14                           ` Vlastimil Babka [this message]
2019-10-29 15:15                             ` Michal Hocko
2019-10-29 21:33                               ` Andrew Morton
2019-10-29 21:45                                 ` Vlastimil Babka
2019-10-29 23:25                                 ` David Rientjes
2019-11-05 13:02                                   ` Michal Hocko
2019-11-06  1:01                                     ` David Rientjes
2019-11-06  7:35                                       ` Michal Hocko
2019-11-06 21:32                                         ` David Rientjes
2019-11-13 11:20                                           ` Mel Gorman
2019-11-25  0:10                                             ` David Rientjes
2019-11-25 11:47                                               ` Michal Hocko
2019-11-25 20:38                                                 ` David Rientjes
2019-11-25 21:34                                                   ` Vlastimil Babka
2019-10-01 13:50                   ` Vlastimil Babka
2019-10-01 20:31                     ` David Rientjes
2019-10-01 21:54                       ` Vlastimil Babka
2019-10-02 10:34                         ` Michal Hocko
2019-10-02 22:32                           ` David Rientjes
2019-10-03  8:00                             ` Vlastimil Babka
2019-10-04 12:18                               ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=08a3f4dd-c3ce-0009-86c5-9ee51aba8557@suse.cz \
    --to=vbabka@suse.cz \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=rientjes@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox