linux-mm.kvack.org archive mirror
* [UNTESTED RFC PATCH 0/8] compaction scanners rework
@ 2017-12-13  8:59 Vlastimil Babka
  2017-12-13  8:59 ` [RFC PATCH 1/8] mm, compaction: don't mark pageblocks unsuitable when not fully scanned Vlastimil Babka
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Vlastimil Babka @ 2017-12-13  8:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Johannes Weiner, Mel Gorman, Joonsoo Kim, David Rientjes,
	Vlastimil Babka

Hi,

I have been working on this in the past weeks, but probably won't have time to
finish and test it properly this year. So here's an UNTESTED RFC for those brave
enough to test, and also for review comments. I've been focusing on patches 1-7,
and patch 8 is unchanged since the last posting, so Mel's suggestions (wrt
fallbacks and scanning the pageblock where we get the free page from) are not
included yet.

For context, please see the recent threads [1] [2]. The main goal is to
eliminate the reported huge free scanner activity by replacing the scanner with
allocation from free lists. This has some danger of excessive migrations, as
described in the patch 8 commit log, so the earlier patches try to eliminate
most of them by making the migration scanner decide to actually migrate pages
only if it looks like it can succeed. This should be beneficial even in the
current scheme.

[1] https://lkml.kernel.org/r/20171122143321.29501-1-hannes@cmpxchg.org
[2] https://lkml.kernel.org/r/0168732b-d53f-a1b8-6623-4e4e26b85c5d@suse.cz

Vlastimil Babka (8):
  mm, compaction: don't mark pageblocks unsuitable when not fully
    scanned
  mm, compaction: skip_on_failure only for MIGRATE_MOVABLE allocations
  mm, compaction: pass valid_page to isolate_migratepages_block
  mm, compaction: skip on isolation failure also in sync compaction
  mm, compaction: factor out checking if page can be isolated for
    migration
  mm, compaction: prescan before isolating in skip_on_failure mode
  mm, compaction: prescan all MIGRATE_MOVABLE pageblocks
  mm, compaction: replace free scanner with direct freelist allocation

 include/linux/vm_event_item.h |   2 +
 mm/compaction.c               | 311 ++++++++++++++++++++++++++++++++----------
 mm/internal.h                 |   3 +
 mm/page_alloc.c               |  71 ++++++++++
 mm/vmstat.c                   |   3 +
 5 files changed, 316 insertions(+), 74 deletions(-)

-- 
2.15.1


* [RFC PATCH 1/8] mm, compaction: don't mark pageblocks unsuitable when not fully scanned
  2017-12-13  8:59 [UNTESTED RFC PATCH 0/8] compaction scanners rework Vlastimil Babka
@ 2017-12-13  8:59 ` Vlastimil Babka
  2017-12-13  8:59 ` [RFC PATCH 2/8] mm, compaction: skip_on_failure only for MIGRATE_MOVABLE allocations Vlastimil Babka
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2017-12-13  8:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Johannes Weiner, Mel Gorman, Joonsoo Kim, David Rientjes,
	Vlastimil Babka

The compaction migration scanner marks a pageblock as unsuitable (via the
pageblock skip bit) if it fails to isolate any pages in it. When scanning for
async direct compaction, it skips all pages of an order-aligned block once a
page fails isolation, because a single page is enough to prevent forming a free
page of the given order. But the skipped pages might still be migratable and
form a free page of a lower order. Therefore we should not mark the pageblock
unsuitable if skipping has happened. The worst example would be a THP
allocation attempt marking a pageblock unsuitable due to a single page, so that
a following lower-order and more critical allocation will skip the pageblock.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b8c23882c8ae..ce73badad464 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -698,7 +698,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	bool locked = false;
 	struct page *page = NULL, *valid_page = NULL;
 	unsigned long start_pfn = low_pfn;
-	bool skip_on_failure = false;
+	bool skip_on_failure = false, skipped_pages = false;
 	unsigned long next_skip_pfn = 0;
 
 	/*
@@ -920,13 +920,14 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			nr_isolated = 0;
 		}
 
-		if (low_pfn < next_skip_pfn) {
+		if (low_pfn < next_skip_pfn - 1) {
 			low_pfn = next_skip_pfn - 1;
 			/*
 			 * The check near the loop beginning would have updated
 			 * next_skip_pfn too, but this is a bit simpler.
 			 */
 			next_skip_pfn += 1UL << cc->order;
+			skipped_pages = true;
 		}
 	}
 
@@ -944,7 +945,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 * Update the pageblock-skip information and cached scanner pfn,
 	 * if the whole pageblock was scanned without isolating any page.
 	 */
-	if (low_pfn == end_pfn)
+	if (low_pfn == end_pfn && !skipped_pages)
 		update_pageblock_skip(cc, valid_page, nr_isolated, true);
 
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
-- 
2.15.1


* [RFC PATCH 2/8] mm, compaction: skip_on_failure only for MIGRATE_MOVABLE allocations
  2017-12-13  8:59 [UNTESTED RFC PATCH 0/8] compaction scanners rework Vlastimil Babka
  2017-12-13  8:59 ` [RFC PATCH 1/8] mm, compaction: don't mark pageblocks unsuitable when not fully scanned Vlastimil Babka
@ 2017-12-13  8:59 ` Vlastimil Babka
  2017-12-13  8:59 ` [RFC PATCH 3/8] mm, compaction: pass valid_page to isolate_migratepages_block Vlastimil Babka
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2017-12-13  8:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Johannes Weiner, Mel Gorman, Joonsoo Kim, David Rientjes,
	Vlastimil Babka

When the migration scanner skips the rest of a cc->order aligned block on
isolation failure, it avoids migrations that cannot help the allocation at hand
succeed. The potential downside is not freeing pages in !MIGRATE_MOVABLE
pageblocks, which could otherwise prevent allocations of the same migratetype
from falling back to other pageblock types. Therefore let's restrict the
skipping to MIGRATE_MOVABLE allocations, which in the async direct compaction
mode only scan MIGRATE_MOVABLE pageblocks.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ce73badad464..95b8b5ae59c5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -720,7 +720,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	if (compact_should_abort(cc))
 		return 0;
 
-	if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC)) {
+	if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC) &&
+			cc->migratetype == MIGRATE_MOVABLE) {
 		skip_on_failure = true;
 		next_skip_pfn = block_end_pfn(low_pfn, cc->order);
 	}
-- 
2.15.1


* [RFC PATCH 3/8] mm, compaction: pass valid_page to isolate_migratepages_block
  2017-12-13  8:59 [UNTESTED RFC PATCH 0/8] compaction scanners rework Vlastimil Babka
  2017-12-13  8:59 ` [RFC PATCH 1/8] mm, compaction: don't mark pageblocks unsuitable when not fully scanned Vlastimil Babka
  2017-12-13  8:59 ` [RFC PATCH 2/8] mm, compaction: skip_on_failure only for MIGRATE_MOVABLE allocations Vlastimil Babka
@ 2017-12-13  8:59 ` Vlastimil Babka
  2017-12-13  8:59 ` [RFC PATCH 4/8] mm, compaction: skip on isolation failure also in sync compaction Vlastimil Babka
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2017-12-13  8:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Johannes Weiner, Mel Gorman, Joonsoo Kim, David Rientjes,
	Vlastimil Babka

The valid_page pointer is needed to operate on pageblock bits. The next
patches will need it earlier in isolate_migratepages_block() than currently
established. Since isolate_migratepages() already has the pointer, pass it
down. CMA's isolate_migratepages_range() doesn't have it, but valid_page will
only be used for compaction, so passing NULL there is ok.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 95b8b5ae59c5..00dc46343093 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -674,6 +674,8 @@ static bool too_many_isolated(struct zone *zone)
  *				  a single pageblock
  * @cc:		Compaction control structure.
  * @low_pfn:	The first PFN to isolate
+ * @valid_page: Page belonging to same pageblock as low_pfn (for pageblock
+ *              flag operations). May be NULL.
  * @end_pfn:	The one-past-the-last PFN to isolate, within same pageblock
  * @isolate_mode: Isolation mode to be used.
  *
@@ -689,14 +691,15 @@ static bool too_many_isolated(struct zone *zone)
  */
 static unsigned long
 isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
-			unsigned long end_pfn, isolate_mode_t isolate_mode)
+		struct page *valid_page, unsigned long end_pfn,
+		isolate_mode_t isolate_mode)
 {
 	struct zone *zone = cc->zone;
 	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct lruvec *lruvec;
 	unsigned long flags = 0;
 	bool locked = false;
-	struct page *page = NULL, *valid_page = NULL;
+	struct page *page = NULL;
 	unsigned long start_pfn = low_pfn;
 	bool skip_on_failure = false, skipped_pages = false;
 	unsigned long next_skip_pfn = 0;
@@ -992,7 +995,7 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 					block_end_pfn, cc->zone))
 			continue;
 
-		pfn = isolate_migratepages_block(cc, pfn, block_end_pfn,
+		pfn = isolate_migratepages_block(cc, pfn, NULL, block_end_pfn,
 							ISOLATE_UNEVICTABLE);
 
 		if (!pfn)
@@ -1282,7 +1285,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 			continue;
 
 		/* Perform the isolation */
-		low_pfn = isolate_migratepages_block(cc, low_pfn,
+		low_pfn = isolate_migratepages_block(cc, low_pfn, page,
 						block_end_pfn, isolate_mode);
 
 		if (!low_pfn || cc->contended)
-- 
2.15.1


* [RFC PATCH 4/8] mm, compaction: skip on isolation failure also in sync compaction
  2017-12-13  8:59 [UNTESTED RFC PATCH 0/8] compaction scanners rework Vlastimil Babka
                   ` (2 preceding siblings ...)
  2017-12-13  8:59 ` [RFC PATCH 3/8] mm, compaction: pass valid_page to isolate_migratepages_block Vlastimil Babka
@ 2017-12-13  8:59 ` Vlastimil Babka
  2017-12-13  8:59 ` [RFC PATCH 5/8] mm, compaction: factor out checking if page can be isolated for migration Vlastimil Babka
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2017-12-13  8:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Johannes Weiner, Mel Gorman, Joonsoo Kim, David Rientjes,
	Vlastimil Babka

When scanning for async direct compaction for a movable allocation, the
migration scanner skips all pages of an order-aligned block once a page fails
isolation, because a single page is enough to prevent forming a free page of
the given order. The same is true for sync compaction, so extend the heuristic
there as well.

But make sure we don't skip inside !MOVABLE pageblocks, where we generally want
to migrate all movable pages away, to prevent non-movable allocations from
falling back to more movable blocks before all non-movable blocks are used up.
Until now this goal relied on async direct compaction for movable allocations
scanning only movable pageblocks, and on sync direct compaction not skipping
at all.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 00dc46343093..4f93a7307fb5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -703,6 +703,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	unsigned long start_pfn = low_pfn;
 	bool skip_on_failure = false, skipped_pages = false;
 	unsigned long next_skip_pfn = 0;
+	int pageblock_mt;
 
 	/*
 	 * Ensure that there are not too many pages isolated from the LRU
@@ -723,10 +724,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	if (compact_should_abort(cc))
 		return 0;
 
-	if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC) &&
-			cc->migratetype == MIGRATE_MOVABLE) {
-		skip_on_failure = true;
-		next_skip_pfn = block_end_pfn(low_pfn, cc->order);
+	if (cc->direct_compaction && !cc->finishing_block) {
+		pageblock_mt = get_pageblock_migratetype(valid_page);
+		if (pageblock_mt == MIGRATE_MOVABLE
+		    && cc->migratetype == MIGRATE_MOVABLE) {
+			skip_on_failure = true;
+			next_skip_pfn = block_end_pfn(low_pfn, cc->order);
+		}
 	}
 
 	/* Time to isolate some pages for migration */
-- 
2.15.1


* [RFC PATCH 5/8] mm, compaction: factor out checking if page can be isolated for migration
  2017-12-13  8:59 [UNTESTED RFC PATCH 0/8] compaction scanners rework Vlastimil Babka
                   ` (3 preceding siblings ...)
  2017-12-13  8:59 ` [RFC PATCH 4/8] mm, compaction: skip on isolation failure also in sync compaction Vlastimil Babka
@ 2017-12-13  8:59 ` Vlastimil Babka
  2017-12-13  8:59 ` [RFC PATCH 6/8] mm, compaction: prescan before isolating in skip_on_failure mode Vlastimil Babka
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2017-12-13  8:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Johannes Weiner, Mel Gorman, Joonsoo Kim, David Rientjes,
	Vlastimil Babka

The following patch will introduce pre-scanning in the migration scanner, which
will check for pages that can be isolated without actually isolating them. To
prepare for this, move the checks into a new function. No functional change.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 150 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 88 insertions(+), 62 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 4f93a7307fb5..1ef090aa96e6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -669,6 +669,80 @@ static bool too_many_isolated(struct zone *zone)
 	return isolated > (inactive + active) / 2;
 }
 
+enum candidate_status {
+	CANDIDATE_FAIL,
+	CANDIDATE_FREE,
+	CANDIDATE_LRU,
+	CANDIDATE_OK = CANDIDATE_LRU,
+	CANDIDATE_MOVABLE
+};
+
+static enum candidate_status
+check_isolate_candidate(struct page *page, unsigned long *pfn,
+						struct compact_control *cc)
+{
+	/*
+	 * Skip free pages. We read page order here without zone lock which is
+	 * generally unsafe, but the race window is small and the worst thing
+	 * that can happen is that we skip some potential isolation targets.
+	 */
+	if (PageBuddy(page)) {
+		unsigned long freepage_order = page_order_unsafe(page);
+
+		/*
+		 * Without lock, we cannot be sure that what we got is a valid
+		 * page order. Consider only values in the valid order range to
+		 * prevent _pfn overflow.
+		 */
+		if (freepage_order > 0 && freepage_order < MAX_ORDER)
+			*pfn += (1UL << freepage_order) - 1;
+		return CANDIDATE_FREE;
+	}
+
+	/*
+	 * Regardless of being on LRU, compound pages such as THP and hugetlbfs
+	 * are not to be compacted. We can potentially save a lot of iterations
+	 * if we skip them at once. The check is racy, but we can consider only
+	 * valid values and the only danger is skipping too much.
+	 */
+	if (PageCompound(page)) {
+		const unsigned int order = compound_order(page);
+
+		if (likely(order < MAX_ORDER))
+			*pfn += (1UL << order) - 1;
+		return CANDIDATE_FAIL;
+	}
+
+	/*
+	 * Check may be lockless but that's ok as we recheck later.  It's
+	 * possible to migrate LRU and non-lru movable pages.  Skip any other
+	 * type of page
+	 */
+	if (!PageLRU(page)) {
+		if (unlikely(__PageMovable(page)) && !PageIsolated(page))
+			return CANDIDATE_MOVABLE;
+
+		return CANDIDATE_FAIL;
+	}
+
+	/*
+	 * Migration will fail if an anonymous page is pinned in memory, so
+	 * avoid taking lru_lock and isolating it unnecessarily in an admittedly
+	 * racy check.
+	 */
+	if (!page_mapping(page) && page_count(page) > page_mapcount(page))
+		return CANDIDATE_FAIL;
+
+	/*
+	 * Only allow to migrate anonymous pages in GFP_NOFS context because
+	 * those do not depend on fs locks.
+	 */
+	if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
+		return CANDIDATE_FAIL;
+
+	return CANDIDATE_LRU;
+}
+
 /**
  * isolate_migratepages_block() - isolate all migrate-able pages within
  *				  a single pageblock
@@ -736,6 +810,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	/* Time to isolate some pages for migration */
 	for (; low_pfn < end_pfn; low_pfn++) {
 
+		enum candidate_status status;
+
 		if (skip_on_failure && low_pfn >= next_skip_pfn) {
 			/*
 			 * We have isolated all migration candidates in the
@@ -777,82 +853,32 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (!valid_page)
 			valid_page = page;
 
-		/*
-		 * Skip if free. We read page order here without zone lock
-		 * which is generally unsafe, but the race window is small and
-		 * the worst thing that can happen is that we skip some
-		 * potential isolation targets.
-		 */
-		if (PageBuddy(page)) {
-			unsigned long freepage_order = page_order_unsafe(page);
+		status = check_isolate_candidate(page, &low_pfn, cc);
 
-			/*
-			 * Without lock, we cannot be sure that what we got is
-			 * a valid page order. Consider only values in the
-			 * valid order range to prevent low_pfn overflow.
-			 */
-			if (freepage_order > 0 && freepage_order < MAX_ORDER)
-				low_pfn += (1UL << freepage_order) - 1;
+		if (status == CANDIDATE_FREE)
 			continue;
-		}
-
-		/*
-		 * Regardless of being on LRU, compound pages such as THP and
-		 * hugetlbfs are not to be compacted. We can potentially save
-		 * a lot of iterations if we skip them at once. The check is
-		 * racy, but we can consider only valid values and the only
-		 * danger is skipping too much.
-		 */
-		if (PageCompound(page)) {
-			const unsigned int order = compound_order(page);
-
-			if (likely(order < MAX_ORDER))
-				low_pfn += (1UL << order) - 1;
+		else if (status == CANDIDATE_FAIL)
 			goto isolate_fail;
-		}
 
-		/*
-		 * Check may be lockless but that's ok as we recheck later.
-		 * It's possible to migrate LRU and non-lru movable pages.
-		 * Skip any other type of page
-		 */
-		if (!PageLRU(page)) {
+		if (unlikely(status == CANDIDATE_MOVABLE)) {
 			/*
 			 * __PageMovable can return false positive so we need
 			 * to verify it under page_lock.
 			 */
-			if (unlikely(__PageMovable(page)) &&
-					!PageIsolated(page)) {
-				if (locked) {
-					spin_unlock_irqrestore(zone_lru_lock(zone),
-									flags);
-					locked = false;
-				}
-
-				if (!isolate_movable_page(page, isolate_mode))
-					goto isolate_success;
+			if (locked) {
+				spin_unlock_irqrestore(zone_lru_lock(zone),
+								flags);
+				locked = false;
 			}
 
-			goto isolate_fail;
+			if (!isolate_movable_page(page, isolate_mode))
+				goto isolate_success;
 		}
 
 		/*
-		 * Migration will fail if an anonymous page is pinned in memory,
-		 * so avoid taking lru_lock and isolating it unnecessarily in an
-		 * admittedly racy check.
+		 * The remaining case is CANDIDATE_LRU. If we already hold the
+		 * lock, we can skip some rechecking
 		 */
-		if (!page_mapping(page) &&
-		    page_count(page) > page_mapcount(page))
-			goto isolate_fail;
-
-		/*
-		 * Only allow to migrate anonymous pages in GFP_NOFS context
-		 * because those do not depend on fs locks.
-		 */
-		if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
-			goto isolate_fail;
-
-		/* If we already hold the lock, we can skip some rechecking */
 		if (!locked) {
 			locked = compact_trylock_irqsave(zone_lru_lock(zone),
 								&flags, cc);
-- 
2.15.1


* [RFC PATCH 6/8] mm, compaction: prescan before isolating in skip_on_failure mode
  2017-12-13  8:59 [UNTESTED RFC PATCH 0/8] compaction scanners rework Vlastimil Babka
                   ` (4 preceding siblings ...)
  2017-12-13  8:59 ` [RFC PATCH 5/8] mm, compaction: factor out checking if page can be isolated for migration Vlastimil Babka
@ 2017-12-13  8:59 ` Vlastimil Babka
  2017-12-13  8:59 ` [RFC PATCH 7/8] mm, compaction: prescan all MIGRATE_MOVABLE pageblocks Vlastimil Babka
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2017-12-13  8:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Johannes Weiner, Mel Gorman, Joonsoo Kim, David Rientjes,
	Vlastimil Babka

When the migration scanner skips a cc->order aligned block where a page cannot
be isolated, it could have already isolated some pages and they have to be put
back, which is wasted work. Worse, since we can only isolate and migrate up to
COMPACT_CLUSTER_MAX pages (which is 32) at a time, we might have already
migrated a number of pages before finding a page that can't be isolated. This
can be a lot of wasted effort, e.g. for a THP allocation (512 pages on x86).

This patch introduces "pre-scanning" in the migration scanner, which checks
that a whole cc->order aligned block contains only pages that can be isolated
(or are free) before actually starting to isolate them. There is a new vmstat
counter compact_migrate_prescanned to monitor its activity. The result is that
some pages will be scanned twice, but that should be relatively cheap.
Importantly, the patch should avoid isolations and migrations that do not lead
to a cc->order free page being formed.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/vm_event_item.h |   1 +
 mm/compaction.c               | 105 ++++++++++++++++++++++++++++++++++++++++++
 mm/internal.h                 |   1 +
 mm/vmstat.c                   |   1 +
 4 files changed, 108 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 5c7f010676a7..cf92b1f115ee 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -55,6 +55,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #endif
 #ifdef CONFIG_COMPACTION
 		COMPACTMIGRATE_SCANNED, COMPACTFREE_SCANNED,
+		COMPACTMIGRATE_PRESCANNED,
 		COMPACTISOLATED,
 		COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
 		KCOMPACTD_WAKE,
diff --git a/mm/compaction.c b/mm/compaction.c
index 1ef090aa96e6..99c34a903688 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -743,6 +743,92 @@ check_isolate_candidate(struct page *page, unsigned long *pfn,
 	return CANDIDATE_LRU;
 }
 
+/*
+ * Scan the pages between prescan_pfn and end_pfn for a cc->order aligned block
+ * of pages that all can be isolated for migration (or are free), but do not
+ * actually isolate them. Return the first pfn (of a non-free page) in that
+ * block, so that actual isolation can begin from there, or end_pfn if no such
+ * block was found.
+ *
+ * The highest prescanned page is stored in cc->prescan_pfn.
+ */
+static unsigned long
+prescan_migratepages_block(unsigned long prescan_pfn, unsigned long end_pfn,
+		struct compact_control *cc, struct page *valid_page,
+		bool *skipped_pages)
+{
+	bool prescan_found = false;
+	unsigned long scan_start_pfn = prescan_pfn;
+	unsigned long next_skip_pfn = block_end_pfn(prescan_pfn, cc->order);
+	struct page *page;
+	unsigned long nr_prescanned = 0;
+
+	for (; prescan_pfn < end_pfn; prescan_pfn++) {
+		enum candidate_status status;
+
+		if (prescan_pfn >= next_skip_pfn) {
+			/*
+			 * We found at least one candidate in the last block and
+			 * did not see any non-migratable pages. Go isolate.
+			 */
+			if (prescan_found)
+				break;
+
+			/*
+			 * No luck with the last block, try the next one. Also
+			 * make sure the proper scan skips the former.
+			 */
+			next_skip_pfn = block_end_pfn(prescan_pfn, cc->order);
+			scan_start_pfn = prescan_pfn;
+		}
+
+		if (!(prescan_pfn % SWAP_CLUSTER_MAX))
+			cond_resched();
+
+		if (!pfn_valid_within(prescan_pfn))
+			goto scan_fail;
+		nr_prescanned++;
+
+		page = pfn_to_page(prescan_pfn);
+		if (!valid_page)
+			valid_page = page;
+
+		status = check_isolate_candidate(page, &prescan_pfn, cc);
+
+		if (status == CANDIDATE_FREE) {
+			/*
+			 * if we have only seen free pages so far, update the
+			 * proper scanner's starting pfn to skip over them.
+			 */
+			if (!prescan_found)
+				scan_start_pfn = prescan_pfn;
+			continue;
+		}
+
+		if (status != CANDIDATE_FAIL) {
+			prescan_found = true;
+			continue;
+		}
+
+scan_fail:
+		/*
+		 * We found a page that doesn't seem migratable. Skip the rest
+		 * of the block.
+		 */
+		prescan_found = false;
+		if (prescan_pfn < next_skip_pfn - 1) {
+			prescan_pfn = next_skip_pfn - 1;
+			*skipped_pages = true;
+		}
+	}
+
+	cc->prescan_pfn = min(prescan_pfn, end_pfn);
+	if (nr_prescanned)
+		count_compact_events(COMPACTMIGRATE_PRESCANNED, nr_prescanned);
+
+	return scan_start_pfn;
+}
+
 /**
  * isolate_migratepages_block() - isolate all migrate-able pages within
  *				  a single pageblock
@@ -776,6 +862,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	struct page *page = NULL;
 	unsigned long start_pfn = low_pfn;
 	bool skip_on_failure = false, skipped_pages = false;
+	bool prescan_block = false;
 	unsigned long next_skip_pfn = 0;
 	int pageblock_mt;
 
@@ -798,15 +885,33 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	if (compact_should_abort(cc))
 		return 0;
 
+	/*
+	 * If we are skipping blocks where isolation has failed, we also don't
+	 * attempt to isolate, until we prescan the whole cc->order block ahead
+	 * to check that it contains only pages that can be isolated (or free).
+	 */
 	if (cc->direct_compaction && !cc->finishing_block) {
 		pageblock_mt = get_pageblock_migratetype(valid_page);
 		if (pageblock_mt == MIGRATE_MOVABLE
 		    && cc->migratetype == MIGRATE_MOVABLE) {
+			prescan_block = true;
 			skip_on_failure = true;
 			next_skip_pfn = block_end_pfn(low_pfn, cc->order);
 		}
 	}
 
+	/*
+	 * Because we can only isolate COMPACT_CLUSTER_MAX pages at a time, it's
+	 * possible that we already prescanned the block on the previous call of
+	 * this function.
+	 */
+	if (prescan_block && cc->prescan_pfn < next_skip_pfn) {
+		low_pfn = prescan_migratepages_block(low_pfn, end_pfn, cc,
+						valid_page, &skipped_pages);
+		if (skip_on_failure)
+			next_skip_pfn = block_end_pfn(low_pfn, cc->order);
+	}
+
 	/* Time to isolate some pages for migration */
 	for (; low_pfn < end_pfn; low_pfn++) {
 
diff --git a/mm/internal.h b/mm/internal.h
index 3e5dc95dc259..35ff677cf731 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -193,6 +193,7 @@ struct compact_control {
 	unsigned long total_free_scanned;
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+	unsigned long prescan_pfn;	/* highest migrate prescanned pfn */
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 40b2db6db6b1..cf445f8280e4 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1223,6 +1223,7 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_COMPACTION
 	"compact_migrate_scanned",
 	"compact_free_scanned",
+	"compact_migrate_prescanned",
 	"compact_isolated",
 	"compact_stall",
 	"compact_fail",
-- 
2.15.1


* [RFC PATCH 7/8] mm, compaction: prescan all MIGRATE_MOVABLE pageblocks
  2017-12-13  8:59 [UNTESTED RFC PATCH 0/8] compaction scanners rework Vlastimil Babka
                   ` (5 preceding siblings ...)
  2017-12-13  8:59 ` [RFC PATCH 6/8] mm, compaction: prescan before isolating in skip_on_failure mode Vlastimil Babka
@ 2017-12-13  8:59 ` Vlastimil Babka
  2017-12-13  8:59 ` [RFC PATCH 8/8] mm, compaction: replace free scanner with direct freelist allocation Vlastimil Babka
  2018-01-23 20:05 ` [UNTESTED RFC PATCH 0/8] compaction scanners rework Johannes Weiner
  8 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2017-12-13  8:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Johannes Weiner, Mel Gorman, Joonsoo Kim, David Rientjes,
	Vlastimil Babka

The migration scanner for direct compaction now prescans MIGRATE_MOVABLE
blocks for MIGRATE_MOVABLE allocations and skips those where it appears that
unmovable pages can prevent forming the high-order free page.

We can extend this strategy to !MIGRATE_MOVABLE allocations scanning
MIGRATE_MOVABLE blocks, in order to prevent wasteful migrations. The difference
is that for this kind of allocation we want to migrate all pages from the
pageblock, to prevent future allocations from falling back to different movable
blocks. This patch thus adds a prescanning mode for that goal.

For other types of pageblocks we still scan and migrate everything we can, to
make room for further allocations of the same migratetype.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 99c34a903688..3e6a37162d77 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -750,12 +750,16 @@ check_isolate_candidate(struct page *page, unsigned long *pfn,
  * block, so that actual isolation can begin from there, or end_pfn if no such
  * block was found.
  *
+ * When skip_on_failure is false, we want to know whether there is a suitable
+ * cc->order aligned block, but then we migrate all other pages in the
+ * pageblock as well. So we return the starting pfn unchanged.
+ *
  * The highest prescanned page is stored in cc->prescan_pfn.
  */
 static unsigned long
 prescan_migratepages_block(unsigned long prescan_pfn, unsigned long end_pfn,
 		struct compact_control *cc, struct page *valid_page,
-		bool *skipped_pages)
+		bool *skipped_pages, bool skip_on_failure)
 {
 	bool prescan_found = false;
 	unsigned long scan_start_pfn = prescan_pfn;
@@ -779,7 +783,8 @@ prescan_migratepages_block(unsigned long prescan_pfn, unsigned long end_pfn,
 			 * make sure the proper scan skips the former.
 			 */
 			next_skip_pfn = block_end_pfn(prescan_pfn, cc->order);
-			scan_start_pfn = prescan_pfn;
+			if (skip_on_failure)
+				scan_start_pfn = prescan_pfn;
 		}
 
 		if (!(prescan_pfn % SWAP_CLUSTER_MAX))
@@ -800,7 +805,7 @@ prescan_migratepages_block(unsigned long prescan_pfn, unsigned long end_pfn,
 			 * if we have only seen free pages so far, update the
 			 * proper scanner's starting pfn to skip over them.
 			 */
-			if (!prescan_found)
+			if (!prescan_found && skip_on_failure)
 				scan_start_pfn = prescan_pfn;
 			continue;
 		}
@@ -822,10 +827,14 @@ prescan_migratepages_block(unsigned long prescan_pfn, unsigned long end_pfn,
 		}
 	}
 
-	cc->prescan_pfn = min(prescan_pfn, end_pfn);
 	if (nr_prescanned)
 		count_compact_events(COMPACTMIGRATE_PRESCANNED, nr_prescanned);
 
+	if (!skip_on_failure && prescan_pfn < end_pfn)
+		cc->prescan_pfn = end_pfn;
+	else
+		cc->prescan_pfn = min(prescan_pfn, end_pfn);
+
 	return scan_start_pfn;
 }
 
@@ -889,14 +898,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 * If we are skipping blocks where isolation has failed, we also don't
 	 * attempt to isolate, until we prescan the whole cc->order block ahead
 	 * to check that it contains only pages that can be isolated (or free).
+	 *
+	 * For !MIGRATE_MOVABLE allocations we don't skip on failure, because
+	 * we want to migrate away everything to make space for future
+	 * allocations of the same type so that they don't have to fallback.
+	 * But we still don't isolate for migration in a movable pageblock where
+	 * we are not likely to succeed. So we also prescan it first.
 	 */
 	if (cc->direct_compaction && !cc->finishing_block) {
 		pageblock_mt = get_pageblock_migratetype(valid_page);
-		if (pageblock_mt == MIGRATE_MOVABLE
-		    && cc->migratetype == MIGRATE_MOVABLE) {
+		if (pageblock_mt == MIGRATE_MOVABLE) {
 			prescan_block = true;
-			skip_on_failure = true;
-			next_skip_pfn = block_end_pfn(low_pfn, cc->order);
+
+			if (cc->migratetype == MIGRATE_MOVABLE) {
+				skip_on_failure = true;
+				next_skip_pfn = block_end_pfn(low_pfn, cc->order);
+			}
 		}
 	}
 
@@ -907,7 +924,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 */
 	if (prescan_block && cc->prescan_pfn < next_skip_pfn) {
 		low_pfn = prescan_migratepages_block(low_pfn, end_pfn, cc,
-						valid_page, &skipped_pages);
+				valid_page, &skipped_pages, skip_on_failure);
 		if (skip_on_failure)
 			next_skip_pfn = block_end_pfn(low_pfn, cc->order);
 	}
-- 
2.15.1


* [RFC PATCH 8/8] mm, compaction: replace free scanner with direct freelist allocation
  2017-12-13  8:59 [UNTESTED RFC PATCH 0/8] compaction scanners rework Vlastimil Babka
                   ` (6 preceding siblings ...)
  2017-12-13  8:59 ` [RFC PATCH 7/8] mm, compaction: prescan all MIGRATE_MOVABLE pageblocks Vlastimil Babka
@ 2017-12-13  8:59 ` Vlastimil Babka
  2018-01-23 20:05 ` [UNTESTED RFC PATCH 0/8] compaction scanners rework Johannes Weiner
  8 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2017-12-13  8:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Johannes Weiner, Mel Gorman, Joonsoo Kim, David Rientjes,
	Vlastimil Babka

The goal of direct compaction is to quickly make a high-order page available
for the pending allocation. The free page scanner can add significant latency
when searching for migration targets, although for the compaction to succeed,
the only important constraint on the target free pages is that they must not
come from the same order-aligned block as the migrated pages.

This patch therefore makes compaction allocate freepages directly from the
freelists. Pages that do come from the same block (which we cannot simply
exclude from the freelist allocation) are skipped and put back to the tail of
the freelists.

In addition to the reduced stall, another advantage is that we split larger
free pages for migration targets only when smaller pages are depleted, while
the free scanner can split pages up to (order - 1) as it encounters them. A
further advantage is that the migration scanner can now compact the whole zone,
while in the current scheme it has been observed to meet the free scanner in
1/3 to 1/2 of the zone.

One danger of the new scheme is that pages will be migrated back and forth, as
the migration scanner would form a range of free pages (except non-movable and
THP pages) and then "slide" this range towards the end of the zone, as long as
the non-movable pages prevent it from succeeding. The previous patches in this
series should make this improbable for direct compaction thanks to the
pre-scanning approach. The same could be done for kcompactd, but it's not yet
clear how to handle manually triggered compaction from /proc, as that has no
success termination criteria.

For observational purposes, the patch introduces two new counters to
/proc/vmstat. compact_free_list_alloc counts how many pages were allocated
directly without scanning, and compact_free_list_skip counts the subset of
these allocations that were from the wrong range and had to be put back.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/vm_event_item.h |  1 +
 mm/compaction.c               | 10 ++++--
 mm/internal.h                 |  2 ++
 mm/page_alloc.c               | 71 +++++++++++++++++++++++++++++++++++++++++++
 mm/vmstat.c                   |  2 ++
 5 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index cf92b1f115ee..04c5dfb245b4 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -56,6 +56,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_COMPACTION
 		COMPACTMIGRATE_SCANNED, COMPACTFREE_SCANNED,
 		COMPACTMIGRATE_PRESCANNED,
+		COMPACTFREE_LIST_ALLOC, COMPACTFREE_LIST_SKIP,
 		COMPACTISOLATED,
 		COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
 		KCOMPACTD_WAKE,
diff --git a/mm/compaction.c b/mm/compaction.c
index 3e6a37162d77..0832c4a31181 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1327,14 +1327,20 @@ static struct page *compaction_alloc(struct page *migratepage,
 {
 	struct compact_control *cc = (struct compact_control *)data;
 	struct page *freepage;
+	int queued;
 
 	/*
 	 * Isolate free pages if necessary, and if we are not aborting due to
 	 * contention.
 	 */
 	if (list_empty(&cc->freepages)) {
-		if (!cc->contended)
-			isolate_freepages(cc);
+		if (!cc->contended) {
+			queued = alloc_pages_compact(cc->zone, &cc->freepages,
+				cc->nr_migratepages,
+				(cc->migrate_pfn - 1) >> pageblock_order);
+			cc->nr_freepages += queued;
+			map_pages(&cc->freepages);
+		}
 
 		if (list_empty(&cc->freepages))
 			return NULL;
diff --git a/mm/internal.h b/mm/internal.h
index 35ff677cf731..3e7a28caaa50 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -161,6 +161,8 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 }
 
 extern int __isolate_free_page(struct page *page, unsigned int order);
+extern int alloc_pages_compact(struct zone *zone, struct list_head *list,
+				int pages, unsigned long pageblock_exclude);
 extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
 					unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned int order);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0c9d97e1b0b7..5717135a9222 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2417,6 +2417,77 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 	return alloced;
 }
 
+static
+int __rmqueue_compact(struct zone *zone, struct list_head *list, int pages,
+						unsigned long pageblock_exclude)
+{
+	unsigned int order;
+	struct page *page, *next;
+	int mtype;
+	int fallback;
+	struct list_head *free_list;
+	LIST_HEAD(skip_list);
+	int queued_pages = 0;
+
+	for (order = 0; order < MAX_ORDER; ++order) {
+		for (mtype = MIGRATE_MOVABLE, fallback = 0;
+		     mtype != MIGRATE_TYPES;
+		     mtype = fallbacks[MIGRATE_MOVABLE][fallback++]) {
+
+			free_list = &zone->free_area[order].free_list[mtype];
+			list_for_each_entry_safe(page, next, free_list, lru) {
+				if (page_to_pfn(page) >> pageblock_order
+							== pageblock_exclude) {
+					list_move(&page->lru, &skip_list);
+					count_vm_event(COMPACTFREE_LIST_SKIP);
+					continue;
+				}
+
+
+				list_move(&page->lru, list);
+				zone->free_area[order].nr_free--;
+				rmv_page_order(page);
+				set_page_private(page, order);
+
+				__mod_zone_freepage_state(zone, -(1UL << order),
+					get_pageblock_migratetype(page));
+
+				queued_pages += 1 << order;
+				if (queued_pages >= pages)
+					break;
+			}
+			/*
+			 * Put skipped pages at the end of free list so we are
+			 * less likely to encounter them again.
+			 */
+			list_splice_tail_init(&skip_list, free_list);
+		}
+	}
+	count_vm_events(COMPACTFREE_LIST_ALLOC, queued_pages);
+	count_vm_events(COMPACTISOLATED, queued_pages);
+	return queued_pages;
+}
+
+int alloc_pages_compact(struct zone *zone, struct list_head *list, int pages,
+						unsigned long pageblock_exclude)
+{
+	unsigned long flags;
+	unsigned long watermark;
+	int queued_pages;
+
+	watermark = low_wmark_pages(zone) + pages;
+	if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
+		return 0;
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	queued_pages = __rmqueue_compact(zone, list, pages, pageblock_exclude);
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	return queued_pages;
+}
+
 #ifdef CONFIG_NUMA
 /*
  * Called from the vmstat counter updater to drain pagesets of this
diff --git a/mm/vmstat.c b/mm/vmstat.c
index cf445f8280e4..3c537237bda7 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1224,6 +1224,8 @@ const char * const vmstat_text[] = {
 	"compact_migrate_scanned",
 	"compact_free_scanned",
 	"compact_migrate_prescanned",
+	"compact_free_list_alloc",
+	"compact_free_list_skip",
 	"compact_isolated",
 	"compact_stall",
 	"compact_fail",
-- 
2.15.1


* Re: [UNTESTED RFC PATCH 0/8] compaction scanners rework
  2017-12-13  8:59 [UNTESTED RFC PATCH 0/8] compaction scanners rework Vlastimil Babka
                   ` (7 preceding siblings ...)
  2017-12-13  8:59 ` [RFC PATCH 8/8] mm, compaction: replace free scanner with direct freelist allocation Vlastimil Babka
@ 2018-01-23 20:05 ` Johannes Weiner
  2018-03-01 10:22   ` Vlastimil Babka
  8 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2018-01-23 20:05 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: linux-mm, Mel Gorman, Joonsoo Kim, David Rientjes

Hi Vlastimil,

On Wed, Dec 13, 2017 at 09:59:07AM +0100, Vlastimil Babka wrote:
> Hi,
> 
> I have been working on this in the past weeks, but probably won't have time to
> finish and test it properly this year. So here's an UNTESTED RFC for those brave
> enough to test, and also for review comments. I've been focusing on patches 1-7,
> and patch 8 is unchanged since the last posting, so Mel's suggestions (wrt
> fallbacks and scanning the pageblock where we get the free page from) are not
> included yet.
>
> For context, please see the recent threads [1] [2]. The main goal is to
> eliminate the reported huge free scanner activity by replacing the scanner with
> allocation from free lists. This has some danger of excessive migrations, as
> described in the patch 8 commit log, so the earlier patches try to eliminate
> most of them by making the migration scanner decide to actually migrate pages
> only if it looks like it can succeed. This should be beneficial even in the
> current scheme.

I'm interested in helping to push this along, since we suffer from the
current compaction free scanner.

On paper the patches make sense to me and the code looks reasonable as
well. However, testing them with our workload would probably not add
much to this series, since the new patches 1-7 are supposed to address
issues we didn't observe in practice.

Since Mel isn't comfortable with replacing the scanner with freelist
allocations - and I have to admit I also find the freelist allocations
harder to reason about - I wonder if a different approach to this is
workable.

The crux here is that this problem gets worse as memory sizes get
bigger. We don't have an issue on 16G machines. But the page orders we
want to allocate do not scale up the same way: THPs are still the same
size, MAX_ORDER is still the same. So why, on a 256G machine, do we
have to find matching used/free page candidates across the entire 256G
memory range? We allocate THPs on 8G machines all the time - cheaper.

Yes, we always have to compact all of memory. But we don't have to aim
for perfect defragmentation, with all used pages to the left and all
free pages to the right. Currently, on a 256G machine, we essentially
try to - although never getting there - compact a single order-25 free
page. That seems like an insane goal.

So I wonder if instead we could split large memory ranges into smaller
compaction chunks, each with their own pairs of migration and free
scanners. We could then cheaply skip over entire chunks for which
reclaim didn't produce any free pages. And we could compact all these
sub chunks in parallel.
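
Roughly, just to illustrate the idea (a made-up sketch, no relation to
any actual compact_control code):

struct compact_chunk {
	unsigned long start_pfn;	/* first pfn of this chunk */
	unsigned long end_pfn;		/* one past the last pfn */
	unsigned long migrate_pfn;	/* per-chunk migration scanner */
	unsigned long free_pfn;		/* per-chunk free scanner */
	bool skip;			/* reclaim freed nothing here lately */
};

Each chunk would span a fixed number of pageblocks, and its scanner pair
would never look outside its own [start_pfn, end_pfn) range.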

Splitting the "matchmaking pool" like this would of course cause us to
miss compaction opportunities between sources and targets in disjoint
subranges. But we only need compaction to produce the largest common
allocation requests; there has to be a maximum pool size on which a
migrate & free scanner pair operates beyond which the rising scan cost
yields diminishing returns, and beyond which divide and conquer would
scale much better for the potentially increased allocation frequencies
on larger machines.

Does this make sense? Am I missing something in the way the allocator
works that would make this impractical?


* Re: [UNTESTED RFC PATCH 0/8] compaction scanners rework
  2018-01-23 20:05 ` [UNTESTED RFC PATCH 0/8] compaction scanners rework Johannes Weiner
@ 2018-03-01 10:22   ` Vlastimil Babka
  0 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2018-03-01 10:22 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: linux-mm, Mel Gorman, Joonsoo Kim, David Rientjes

On 01/23/2018 09:05 PM, Johannes Weiner wrote:
> Hi Vlastimil,

Hi, sorry for the long delay!

> On Wed, Dec 13, 2017 at 09:59:07AM +0100, Vlastimil Babka wrote:
>> Hi,
>>
>> I have been working on this in the past weeks, but probably won't have time to
>> finish and test it properly this year. So here's an UNTESTED RFC for those brave
>> enough to test, and also for review comments. I've been focusing on patches 1-7,
>> and patch 8 is unchanged since the last posting, so Mel's suggestions (wrt
>> fallbacks and scanning the pageblock where we get the free page from) are not
>> included yet.
>>
>> For context, please see the recent threads [1] [2]. The main goal is to
>> eliminate the reported huge free scanner activity by replacing the scanner with
>> allocation from free lists. This has some danger of excessive migrations, as
>> described in the patch 8 commit log, so the earlier patches try to eliminate
>> most of them by making the migration scanner decide to actually migrate pages
>> only if it looks like it can succeed. This should be beneficial even in the
>> current scheme.
> 
> I'm interested in helping to push this along, since we suffer from the
> current compaction free scanner.
> 
> On paper the patches make sense to me and the code looks reasonable as
> well. However, testing them with our workload would probably not add
> much to this series, since the new patches 1-7 are supposed to address
> issues we didn't observe in practice.

Well, the main purpose of 1-7 was to minimize issues expected due to 8 :)

> Since Mel isn't comfortable with replacing the scanner with freelist
> allocations - and I have to admit I also find the freelist allocations

Hm, IIRC he said in the end that it would be OK, especially if the
freelist was used as a pointer to a pageblock which would then be
scanned. I don't think it's a fundamental difference from pure freelist
allocations.
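
Something like this is what I imagine (untested sketch; peek_free_page()
is a made-up helper that would return a suitable free page without
isolating it):

static void isolate_freepages_from_hint(struct compact_control *cc)
{
	struct page *hint = peek_free_page(cc->zone);
	unsigned long start_pfn, end_pfn;

	if (!hint)
		return;

	/* use the freelist page only to pick the pageblock to scan */
	start_pfn = pageblock_start_pfn(page_to_pfn(hint));
	end_pfn = start_pfn + pageblock_nr_pages;

	isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages,
				false);
}

so the freelists would only steer where the free scanner looks, instead
of feeding compaction directly.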

> harder to reason about - I wonder if a different approach to this is
> workable.
> 
> The crux here is that this problem gets worse as memory sizes get
> bigger. We don't have an issue on 16G machines. But the page orders we
> want to allocate do not scale up the same way: THPs are still the same
> size, MAX_ORDER is still the same. So why, on a 256G machine, do we
> have to find matching used/free page candidates across the entire 256G
> memory range? We allocate THPs on 8G machines all the time - cheaper.
> 
> Yes, we always have to compact all of memory. But we don't have to aim
> for perfect defragmentation, with all used pages to the left and all
> free pages to the right. Currently, on a 256G machine, we essentially
> try to - although never getting there - compact a single order-25 free
> page. That seems like an insane goal.
> 
> So I wonder if instead we could split large memory ranges into smaller
> compaction chunks, each with their own pairs of migration and free
> scanners.

At first sight that would mean the same number of pages would still be
scanned, just in a different order...

> We could then cheaply skip over entire chunks for which
> reclaim didn't produce any free pages.

OK, this could mean less scanning in theory, but we also already have
skipping of pageblocks where compaction failed to isolate anything, so I
don't immediately see whether this scheme would mean more efficient skipping.

> And we could compact all these
> sub chunks in parallel.

That could reduce allocation latency, but then multiple CPUs would be
burning time in compaction, and people would still be unhappy I guess.
Also, compaction needs the lru and zone locks for isolation, so those
would get contended and it wouldn't scale?

> Splitting the "matchmaking pool" like this would of course cause us to
> miss compaction opportunities between sources and targets in disjoint
> subranges. But we only need compaction to produce the largest common
> allocation requests; there has to be a maximum pool size on which a
> migrate & free scanner pair operates beyond which the rising scan cost
> yields diminishing returns, and beyond which divide and conquer would
> scale much better for the potentially increased allocation frequencies
> on larger machines.

I wonder if there's some other non-obvious underlying reason why
compaction works worse on 256G systems than on 8G. Could it be because
min_free_kbytes scales sub-linearly (IIRC?) with zone size? Compaction
performs better with more free memory.
Or higher zone/lru lock contention because there are also more CPUs?
That would make terminating async compaction more likely, and thus
retrying with the more expensive sync compaction.
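
(From memory, so take with a grain of salt: init_per_zone_wmark_min()
sets min_free_kbytes to roughly int_sqrt(16 * lowmem_kbytes), capped at
64MB, so e.g.

    8G:  int_sqrt(16 *   8388608) ~= 11585 kB  (~0.14% of memory)
  256G:  int_sqrt(16 * 268435456)  = 65536 kB  (~0.02% of memory, the cap)

i.e. the free memory that compaction can rely on shrinks a lot relative
to zone size.)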

Could some kind of experiments with fake numa splitting to smaller nodes
shed some more light here?

Vlastimil

> Does this make sense? Am I missing something in the way the allocator
> works that would make this impractical?
> 


end of thread, other threads:[~2018-03-01 10:22 UTC | newest]

Thread overview: 11+ messages
2017-12-13  8:59 [UNTESTED RFC PATCH 0/8] compaction scanners rework Vlastimil Babka
2017-12-13  8:59 ` [RFC PATCH 1/8] mm, compaction: don't mark pageblocks unsuitable when not fully scanned Vlastimil Babka
2017-12-13  8:59 ` [RFC PATCH 2/8] mm, compaction: skip_on_failure only for MIGRATE_MOVABLE allocations Vlastimil Babka
2017-12-13  8:59 ` [RFC PATCH 3/8] mm, compaction: pass valid_page to isolate_migratepages_block Vlastimil Babka
2017-12-13  8:59 ` [RFC PATCH 4/8] mm, compaction: skip on isolation failure also in sync compaction Vlastimil Babka
2017-12-13  8:59 ` [RFC PATCH 5/8] mm, compaction: factor out checking if page can be isolated for migration Vlastimil Babka
2017-12-13  8:59 ` [RFC PATCH 6/8] mm, compaction: prescan before isolating in skip_on_failure mode Vlastimil Babka
2017-12-13  8:59 ` [RFC PATCH 7/8] mm, compaction: prescan all MIGRATE_MOVABLE pageblocks Vlastimil Babka
2017-12-13  8:59 ` [RFC PATCH 8/8] mm, compaction: replace free scanner with direct freelist allocation Vlastimil Babka
2018-01-23 20:05 ` [UNTESTED RFC PATCH 0/8] compaction scanners rework Johannes Weiner
2018-03-01 10:22   ` Vlastimil Babka
