[PATCH v2 0/9] mm/compaction: redesign compaction

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/9] mm/compaction: redesign compaction
@ 2015-08-24  2:19 Joonsoo Kim
  2015-08-24  2:19 ` [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn Joonsoo Kim
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24  2:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
	Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim

Major changes from v1:
o Add one skip-bit on each pageblock to avoid the pageblock that
cannot be used for migration target

o Allow freepage scanner to scan non-movable pageblock only if zone
is in compaction depleted state:
To allow scanning on non-movable pageblock cannot be avoided because
there is a system where almost pageblocks are unmovable pageblock.
And, without this patch, as compaction progress, all of freepages are
moved to non-movable pageblock due to asymetric characteristic of
scanner's target pageblock and compaction will stop working due to
shortage of migration target freepage. In experiment, allowing
freepage scanner to scan non-movable pageblock only if zone is
in compaction depleted state doesn't fragment the system more than
before with ensuring great success rate improvement.

o Don't use high-order freepage higher than the order we try to make:
It prevents parallel freepage scanner undo migration scanner's work.

o Include elapsed time in stress-highalloc test

o Include page owner result to check fragmentation

o All result are refreshed

o Remove Success attribute in result:
Showing two similar metric make reader somewhat confused
so remove less important one. Remained one Success(N) is calculated
by following equation.

Success(N) = successful allocation * 100 / order-3 candidates

o Prevent freepage scanner to scan a zone many times

Orignial cover-letter with some refresh
Recently, I got a report that android get slow due to order-2 page
allocation. With some investigation, I found that compaction usually
fails and many pages are reclaimed to make order-2 freepage. I can't
analyze detailed reason that causes compaction fail because I don't
have reproducible environment and compaction code is changed so much
from that version, v3.10. But, I was inspired by this report and started
to think limitation of current compaction algorithm.

Limitation of current compaction algorithm:

1) Migrate scanner can't scan behind of free scanner, because
each scanner starts at both side of zone and go toward each other. If
they meet at some point, compaction is stopped and scanners' position
is reset to both side of zone again. From my experience, migrate scanner
usually doesn't scan beyond of half of the zone range.

2) Compaction capability is highly depends on amount of free memory.
If there is 50 MB free memory on 4 GB system, migrate scanner can
migrate 50 MB used pages at maximum and then will meet free scanner.
If compaction can't make enough high order freepages during this
amount of work, compaction would fail. There is no way to escape this
failure situation in current algorithm and it will scan same region and
fail again and again. And then, it goes into compaction deferring logic
and will be deferred for some times.

3) Compaction capability is highly depends on migratetype of memory,
because freepage scanner doesn't scan unmovable pageblock.

To investigate compaction limitations, I made some compaction benchmarks.
Base environment of this benchmark is fragmented memory. Before testing,
25% of total size of memory is allocated. With some tricks, these
allocations are evenly distributed to whole memory range. So, after
allocation is finished, memory is highly fragmented and possibility of
successful order-3 allocation is very low. Roughly 1500 order-3 allocation
can be successful. Tests attempt excessive amount of allocation request,
that is, 3000, to find out algorithm limitation.

There are two variations.

pageblock type (unmovable / movable):

One is that most pageblocks are unmovable migratetype and the other is
that most pageblocks are movable migratetype.

memory usage (memory hogger 200 MB / kernel build with -j8):

Memory hogger means that 200 MB free memory is occupied by hogger.
Kernel build means that kernel build is running on background and it
will consume free memory, but, amount of consumption will be very
fluctuated.

With these variations, I made 4 test cases by mixing them.

hogger-frag-unmovable
hogger-frag-movable
build-frag-unmovable
build-frag-movable

All tests are conducted on 512 MB QEMU virtual machine with 8 CPUs.

I can easily check weakness of compaction algorithm by following test.

To check 1), hogger-frag-movable benchmark is used. Result is as
following.

Kernel:	Base

Success(N)                    70
compact_stall                307
compact_success               64
compact_fail                 243
pgmigrate_success          34592
compact_isolated           73977
compact_migrate_scanned  2280770
compact_free_scanned     4710313

Column 'Success(N) are calculated by following equations.

Success(N) = successful allocation * 100 /
		number of order-3 candidates

As mentioned above, there are roughly 1500 high order page candidates,
but, compaction just returns 70% of them even if system is under low load.
With new compaction approach, it can be increased to 94%.

To check 2), hogger-frag-movable benchmark is used again, but, with some
tweaks. Amount of allocated memory by memory hogger varys.

Kernel:	Base

200MB-Success(N)	70
250MB-Success(N)	38
300MB-Success(N)	29

As background knowledge, up to 250MB, there is enough
memory to succeed all order-3 allocation attempts. In 300MB case,
available memory before starting allocation attempt is just 57MB,
so all of attempts cannot succeed.

Anyway, as free memory decreases, compaction success rate also decreases.
It is better to remove this dependency to get stable compaction result
in any case. System is usually under the low memory state because kernel
try to keeps page cache as much as possible. Even in this case,
compaction should work well so change is needed.

To check 3), build-frag-unmovable/movable benchmarks are used.
All factors are same except pageblock migratetypes.

Test: build-frag-unmovable
Success(N)                    37

Test: build-frag-movable
Success(N)                    71

Pageblock migratetype makes big difference on success rate. 3) would be
one of reason related to this result. Because freepage scanner doesn't
scan non-movable pageblock, compaction can't get enough freepage for
migration and compaction easily fails. This patchset try to solve it
by allowing freepage scanner to scan on non-movable pageblock.

Result show that we cannot get all possible high order page through
current compaction algorithm. And, in case that migratetype of
pageblock is unmovable, success rate get worse.

This patchset try to solve these limitations by introducing new compaction
approach. Main changes of this patchset are as following:

1) Make freepage scanner scans non-movable pageblock
Watermark check doesn't consider how many pages in non-movable pageblock.
To fully utilize existing freepage, freepage scanner should scan
non-movable pageblock. Otherwise, all freepage will be on non-movable
pageblock and compaction cannot progress.

2) Introduce compaction depletion state
Compaction algorithm will be changed to scan whole zone range. In this
approach, compaction inevitably do back and forth migration between
different iterations. If back and forth migration can make highorder
freepage, it can be justified. But, in case of depletion of compaction
possiblity, this back and forth migration causes unnecessary overhead.
Compaction depleteion state is introduced to avoid this useless
back and forth migration by detecting depletion of compaction possibility.

3) Change scanner's behaviour
Migration scanner is changed to scan whole zone range regardless freepage
scanner position. Freepage scanner also scans whole zone from
zone_start_pfn to zone_end_pfn. To prevent back and forth migration
within one compaction iteration, freepage scanner marks skip-bit when
scanning pageblock. Migration scanner will skip this marked pageblock.
Finish condition is very simple. If migration scanner reaches end of
the zone, compaction will be finished. If freepage scanner reaches end of
the zone first, it restart at zone_start_pfn. This helps us to overcome
dependency on amount of free memory.

Following is all test results of this patchset.

Kernel:	Base vs Limit vs Nonmovable vs Redesign vs Threshold

Test: hogger-frag-unmovable

Success(N)                    44              38              51              89              81
compact_stall               1268            5280            4949            3954            3891
compact_success               82              68             107             247             220
compact_fail                1186            5212            4841            3707            3671
pgmigrate_success          28053           14948           75829        16976894          233854
compact_isolated           60850          144501          306681        34055071          525070
compact_migrate_scanned  1199597         2965599        33270886        57323542         2010671
compact_free_scanned     3020346          949566        45376207        46796518         2241243

Test: hogger-frag-movable

Success(N)                    70              60              68              94              83
compact_stall                307            4555            4380            3642            4048
compact_success               64              41              79             144             212
compact_fail                 243            4514            4301            3498            3835
pgmigrate_success          34592           14243           54207        15897219          216387
compact_isolated           73977          105025          258877        31899553          487712
compact_migrate_scanned  2280770         2886304        37214115        59146745         2513245
compact_free_scanned     4710313         1472874        44372985        49566134         4124319

Test: build-frag-unmovable

Success(N)                    37              44              64              79              78
compact_stall                624            6886            5056            4623            4486
compact_success              103             180             419             397             346
compact_fail                 521            6706            4637            4226            4140
pgmigrate_success          22004           23100          277106         6574370          131729
compact_isolated           61021          247653         1056863        13477284          336305
compact_migrate_scanned  2609360         4186815        70252458        73017961         2312430
compact_free_scanned     4808989        13112142        23091292        19290141         2484755

Test: build-frag-movable

Success(N)                    71              66              76              89              87
compact_stall                432            4644            4243            4053            3642
compact_success              110              94             170             264             202
compact_fail                 322            4550            4073            3788            3440
pgmigrate_success          51265           34215          120132         6497642          153413
compact_isolated          124238          176080          756052        13292640          353445
compact_migrate_scanned  4497824         2786343        75556954        69714502         2307433
compact_free_scanned     3809018         3456820        26786674        20243121         2325295

Redesigned compaction version shows great success rate even if
almost pageblocks are unmovable migratetype. It shows that
almost order-3 candidates are allocated unlike previous compaction
algorithm. Overhead is dropped to reasonable number by assigning
threshold appropriately.

Test: stress-highalloc in mmtests
(tweaks to request order-7 unmovable allocation)

Kernel:	Base vs Limit vs Nonmovable vs Redesign vs Threshold

Ops 1                                   24              11              23              83              74
Ops 2                                   39              22              44              83              75
Ops 3                                   90              85              89              92              92
Elapsed                               1428            1354            1697            1993            1489

Compaction stalls                     6351           29199           26434           14348           15859
Compaction success                    2291            1343            2964            5081            4483
Compaction failures                   4060           27856           23470            9266           11376
Page migrate success               3264680         1174053        31952619        53015969         3099520
Compaction pages isolated          6644306         2871488        64857701       106723902         6645394
Compaction migrate scanned        62964165        19857623       569563971       748424010        38447292
Compaction free scanned          356586772       309061359      2129832292       670368329        95853699
Compaction cost                       3955            1412           38387           62298            3612

Result shows that much improvement comes from redesign algorithm but it
causes too much overhead. However, further optimization reduces this
overhead greatly with a little success rate degradation. It looks like
optimized version has less overhead than base in this test.

Test: repeats stress-highalloc 3 times without rebooting
(request order-9 movable allocation)

pb[N] means number of non-mixed pageblock after N run
of stress-highalloc test (Large number is better)

Kernel:	Base vs Threshold

pb[1]:DMA32:movable:		1365	1364
pb[1]:Normal:movable:		393	394
pb[2]:DMA32:movable:		1306	1309
pb[2]:Normal:movable:		368	368
pb[3]:DMA32:movable:		1272	1275
pb[3]:Normal:movable:		358	350

This series that include the patch allowing freepage scanner
to scan non-movable pageblock in compaction depleted state
doesn't fragment memory more than before.

Test: hogger-frag-movable with free memory variation

Kernel:	Base vs Limit vs Nonmovable vs Redesign vs Threshold

200MB-Success(N)	70	60	68	94	83
250MB-Success(N)	38	36	43	93	75
300MB-Success(N)	29	20	30	89	74

Please see result of "hogger-frag-movable with free memory variation".
It shows that as hogger takes more memory, success rate decreases greatly
until redesign version. After redesigning compaction, success rate still
decreases but just a little.

In conclusion, this series solves three limitations of current compaction
algorithm successfully. :)

This patchset is based on linux-next-20150515 and not targeted to merge.
After merge window is finished, I will rebase it to recent kernel and
send v3.

Feel free to comment.
Thanks.

Joonsoo Kim (9):
  mm/compaction: skip useless pfn when updating cached pfn
  mm/compaction: introduce compaction depleted state on zone
  mm/compaction: limit compaction activity in compaction depleted state
  mm/compaction: remove compaction deferring
  mm/compaction: allow to scan nonmovable pageblock when depleted state
  mm/compaction: manage separate skip-bits for migration and free
    scanner
  mm/compaction: redesign compaction
  mm/compaction: don't use higher order freepage than compaction aims at
  mm/compaction: new threshold for compaction depleted zone

 include/linux/compaction.h        |  14 +-
 include/linux/mmzone.h            |  10 +-
 include/linux/pageblock-flags.h   |  37 +++-
 include/trace/events/compaction.h |  30 ++-
 mm/compaction.c                   | 377 ++++++++++++++++++++++++--------------
 mm/internal.h                     |   1 +
 mm/page_alloc.c                   |   5 +-
 mm/vmscan.c                       |   4 +-
 8 files changed, 296 insertions(+), 182 deletions(-)

-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn
  2015-08-24  2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
@ 2015-08-24  2:19 ` Joonsoo Kim
  2015-08-24  9:07   ` Vlastimil Babka
  2015-08-24  2:19 ` [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone Joonsoo Kim
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24  2:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
	Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim

Cached pfn is used to determine the start position of scanner
at next compaction run. Current cached pfn points the skipped pageblock
so we uselessly checks whether pageblock is valid for compaction and
skip-bit is set or not. If we set scanner's cached pfn to next pfn of
skipped pageblock, we don't need to do this check.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/compaction.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 6ef2fdf..c2d3d6a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -261,10 +261,9 @@ void reset_isolation_suitable(pg_data_t *pgdat)
  */
 static void update_pageblock_skip(struct compact_control *cc,
 			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			unsigned long pfn, bool migrate_scanner)
 {
 	struct zone *zone = cc->zone;
-	unsigned long pfn;
 
 	if (cc->ignore_skip_hint)
 		return;
@@ -277,8 +276,6 @@ static void update_pageblock_skip(struct compact_control *cc,
 
 	set_pageblock_skip(page);
 
-	pfn = page_to_pfn(page);
-
 	/* Update where async and sync compaction should restart */
 	if (migrate_scanner) {
 		if (pfn > zone->compact_cached_migrate_pfn[0])
@@ -300,7 +297,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
 
 static void update_pageblock_skip(struct compact_control *cc,
 			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			unsigned long pfn, bool migrate_scanner)
 {
 }
 #endif /* CONFIG_COMPACTION */
@@ -509,7 +506,8 @@ isolate_fail:
 
 	/* Update the pageblock-skip if the whole pageblock was scanned */
 	if (blockpfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, total_isolated, false);
+		update_pageblock_skip(cc, valid_page, total_isolated,
+					end_pfn, false);
 
 	count_compact_events(COMPACTFREE_SCANNED, nr_scanned);
 	if (total_isolated)
@@ -811,7 +809,8 @@ isolate_success:
 	 * if the whole pageblock was scanned without isolating any page.
 	 */
 	if (low_pfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, nr_isolated, true);
+		update_pageblock_skip(cc, valid_page, nr_isolated,
+					end_pfn, true);
 
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
 						nr_scanned, nr_isolated);
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn
  2015-08-24  2:19 ` [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn Joonsoo Kim
@ 2015-08-24  9:07   ` Vlastimil Babka
  2015-09-07  5:35     ` Joonsoo Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-08-24  9:07 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
	Minchan Kim, Joonsoo Kim

On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> Cached pfn is used to determine the start position of scanner
> at next compaction run. Current cached pfn points the skipped pageblock
> so we uselessly checks whether pageblock is valid for compaction and
> skip-bit is set or not. If we set scanner's cached pfn to next pfn of
> skipped pageblock, we don't need to do this check.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>   mm/compaction.c | 13 ++++++-------
>   1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 6ef2fdf..c2d3d6a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -261,10 +261,9 @@ void reset_isolation_suitable(pg_data_t *pgdat)
>    */
>   static void update_pageblock_skip(struct compact_control *cc,
>   			struct page *page, unsigned long nr_isolated,
> -			bool migrate_scanner)
> +			unsigned long pfn, bool migrate_scanner)
>   {
>   	struct zone *zone = cc->zone;
> -	unsigned long pfn;
>
>   	if (cc->ignore_skip_hint)
>   		return;
> @@ -277,8 +276,6 @@ static void update_pageblock_skip(struct compact_control *cc,
>
>   	set_pageblock_skip(page);
>
> -	pfn = page_to_pfn(page);
> -
>   	/* Update where async and sync compaction should restart */
>   	if (migrate_scanner) {
>   		if (pfn > zone->compact_cached_migrate_pfn[0])
> @@ -300,7 +297,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
>
>   static void update_pageblock_skip(struct compact_control *cc,
>   			struct page *page, unsigned long nr_isolated,
> -			bool migrate_scanner)
> +			unsigned long pfn, bool migrate_scanner)
>   {
>   }
>   #endif /* CONFIG_COMPACTION */
> @@ -509,7 +506,8 @@ isolate_fail:
>
>   	/* Update the pageblock-skip if the whole pageblock was scanned */
>   	if (blockpfn == end_pfn)
> -		update_pageblock_skip(cc, valid_page, total_isolated, false);
> +		update_pageblock_skip(cc, valid_page, total_isolated,
> +					end_pfn, false);

In isolate_freepages_block() this means we actually go logically *back* 
one pageblock, as the direction is opposite? I know it's not an issue 
after the redesign patch so you wouldn't notice it when testing the 
whole series. But there's a non-zero chance that the smaller fixes are 
merged first and the redesign later...

>
>   	count_compact_events(COMPACTFREE_SCANNED, nr_scanned);
>   	if (total_isolated)
> @@ -811,7 +809,8 @@ isolate_success:
>   	 * if the whole pageblock was scanned without isolating any page.
>   	 */
>   	if (low_pfn == end_pfn)
> -		update_pageblock_skip(cc, valid_page, nr_isolated, true);
> +		update_pageblock_skip(cc, valid_page, nr_isolated,
> +					end_pfn, true);
>
>   	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
>   						nr_scanned, nr_isolated);
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn
  2015-08-24  9:07   ` Vlastimil Babka
@ 2015-09-07  5:35     ` Joonsoo Kim
  2015-09-08 16:30       ` Vlastimil Babka
  0 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-09-07  5:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
	David Rientjes, Minchan Kim

On Mon, Aug 24, 2015 at 11:07:12AM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> >Cached pfn is used to determine the start position of scanner
> >at next compaction run. Current cached pfn points the skipped pageblock
> >so we uselessly checks whether pageblock is valid for compaction and
> >skip-bit is set or not. If we set scanner's cached pfn to next pfn of
> >skipped pageblock, we don't need to do this check.
> >
> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >---
> >  mm/compaction.c | 13 ++++++-------
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> >
> >diff --git a/mm/compaction.c b/mm/compaction.c
> >index 6ef2fdf..c2d3d6a 100644
> >--- a/mm/compaction.c
> >+++ b/mm/compaction.c
> >@@ -261,10 +261,9 @@ void reset_isolation_suitable(pg_data_t *pgdat)
> >   */
> >  static void update_pageblock_skip(struct compact_control *cc,
> >  			struct page *page, unsigned long nr_isolated,
> >-			bool migrate_scanner)
> >+			unsigned long pfn, bool migrate_scanner)
> >  {
> >  	struct zone *zone = cc->zone;
> >-	unsigned long pfn;
> >
> >  	if (cc->ignore_skip_hint)
> >  		return;
> >@@ -277,8 +276,6 @@ static void update_pageblock_skip(struct compact_control *cc,
> >
> >  	set_pageblock_skip(page);
> >
> >-	pfn = page_to_pfn(page);
> >-
> >  	/* Update where async and sync compaction should restart */
> >  	if (migrate_scanner) {
> >  		if (pfn > zone->compact_cached_migrate_pfn[0])
> >@@ -300,7 +297,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
> >
> >  static void update_pageblock_skip(struct compact_control *cc,
> >  			struct page *page, unsigned long nr_isolated,
> >-			bool migrate_scanner)
> >+			unsigned long pfn, bool migrate_scanner)
> >  {
> >  }
> >  #endif /* CONFIG_COMPACTION */
> >@@ -509,7 +506,8 @@ isolate_fail:
> >
> >  	/* Update the pageblock-skip if the whole pageblock was scanned */
> >  	if (blockpfn == end_pfn)
> >-		update_pageblock_skip(cc, valid_page, total_isolated, false);
> >+		update_pageblock_skip(cc, valid_page, total_isolated,
> >+					end_pfn, false);
> 
> In isolate_freepages_block() this means we actually go logically
> *back* one pageblock, as the direction is opposite? I know it's not
> an issue after the redesign patch so you wouldn't notice it when
> testing the whole series. But there's a non-zero chance that the
> smaller fixes are merged first and the redesign later...

Hello, Vlastimil.
Sorry for long delay. I was on vacation. :)
I will fix it next time.

Btw, if possible, could you review the patchset in detail? or do you
have another plan on compaction improvement? Please let me know your
position to determine future plan of this patchset.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn
  2015-09-07  5:35     ` Joonsoo Kim
@ 2015-09-08 16:30       ` Vlastimil Babka
  0 siblings, 0 replies; 26+ messages in thread
From: Vlastimil Babka @ 2015-09-08 16:30 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
	David Rientjes, Minchan Kim

On 09/07/2015 07:35 AM, Joonsoo Kim wrote:
> On Mon, Aug 24, 2015 at 11:07:12AM +0200, Vlastimil Babka wrote:
>> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
>> 
>> In isolate_freepages_block() this means we actually go logically
>> *back* one pageblock, as the direction is opposite? I know it's not
>> an issue after the redesign patch so you wouldn't notice it when
>> testing the whole series. But there's a non-zero chance that the
>> smaller fixes are merged first and the redesign later...
> 
> Hello, Vlastimil.
> Sorry for long delay. I was on vacation. :)
> I will fix it next time.
> 
> Btw, if possible, could you review the patchset in detail? or do you

I'll try soon...

> have another plan on compaction improvement? Please let me know your
> position to determine future plan of this patchset.
> 
> Thanks.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone
  2015-08-24  2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
  2015-08-24  2:19 ` [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn Joonsoo Kim
@ 2015-08-24  2:19 ` Joonsoo Kim
  2015-09-25  8:04   ` Vlastimil Babka
  2015-08-24  2:19 ` [PATCH v2 3/9] mm/compaction: limit compaction activity in compaction depleted state Joonsoo Kim
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24  2:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
	Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim

Further compaction attempt is deferred when some of compaction attempts
already fails. But, after some number of trial are skipped, compaction
restarts work to check whether compaction is now possible or not. It
scans whole range of zone to determine this possibility and if compaction
possibility doesn't recover, this whole range scan is quite big overhead.
As a first step to reduce this overhead, this patch implement compaction
depleted state on zone.

The way to determine depletion of compaction possility is checking number
of success on previous compaction attempt. If number of successful
compaction is below than specified threshold, we guess that compaction
will not successful next time so mark the zone as compaction depleted.
In this patch, threshold is choosed by 1 to imitate current compaction
deferring algorithm. In the following patch, compaction algorithm will be
changed and this threshold is also adjusted to that change.

In this patch, only state definition is implemented. There is no action
for this new state so no functional change. But, following patch will
add some handling for this new state.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/mmzone.h |  3 +++
 mm/compaction.c        | 44 +++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 754c259..700e9b5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -517,6 +517,8 @@ struct zone {
 	unsigned int		compact_considered;
 	unsigned int		compact_defer_shift;
 	int			compact_order_failed;
+	unsigned long		compact_success;
+	unsigned long		compact_depletion_depth;
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -543,6 +545,7 @@ enum zone_flags {
 					 * many pages under writeback
 					 */
 	ZONE_FAIR_DEPLETED,		/* fair zone policy batch depleted */
+	ZONE_COMPACTION_DEPLETED,	/* compaction possiblity depleted */
 };
 
 static inline unsigned long zone_end_pfn(const struct zone *zone)
diff --git a/mm/compaction.c b/mm/compaction.c
index c2d3d6a..de96e9d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -129,6 +129,23 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
+#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
+
+static bool compaction_depleted(struct zone *zone)
+{
+	unsigned long threshold;
+	unsigned long success = zone->compact_success;
+
+	/*
+	 * Now, to imitate current compaction deferring approach,
+	 * choose threshold to 1. It will be changed in the future.
+	 */
+	threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
+	if (success >= threshold)
+		return false;
+
+	return true;
+}
 
 /*
  * Compaction is deferred when compaction fails to result in a page
@@ -223,6 +240,16 @@ static void __reset_isolation_suitable(struct zone *zone)
 	zone->compact_cached_free_pfn = end_pfn;
 	zone->compact_blockskip_flush = false;
 
+	if (compaction_depleted(zone)) {
+		if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
+			zone->compact_depletion_depth++;
+		else {
+			set_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
+			zone->compact_depletion_depth = 0;
+		}
+	}
+	zone->compact_success = 0;
+
 	/* Walk the zone and mark every pageblock as suitable for isolation */
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
 		struct page *page;
@@ -1185,22 +1212,28 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
 		bool can_steal;
 
 		/* Job done if page is free of the right migratetype */
-		if (!list_empty(&area->free_list[migratetype]))
+		if (!list_empty(&area->free_list[migratetype])) {
+			zone->compact_success++;
 			return COMPACT_PARTIAL;
+		}
 
 #ifdef CONFIG_CMA
 		/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
 		if (migratetype == MIGRATE_MOVABLE &&
-			!list_empty(&area->free_list[MIGRATE_CMA]))
+			!list_empty(&area->free_list[MIGRATE_CMA])) {
+			zone->compact_success++;
 			return COMPACT_PARTIAL;
+		}
 #endif
 		/*
 		 * Job done if allocation would steal freepages from
 		 * other migratetype buddy lists.
 		 */
 		if (find_suitable_fallback(area, order, migratetype,
-						true, &can_steal) != -1)
+						true, &can_steal) != -1) {
+			zone->compact_success++;
 			return COMPACT_PARTIAL;
+		}
 	}
 
 	return COMPACT_NO_SUITABLE_PAGE;
@@ -1440,6 +1473,11 @@ out:
 	trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
 				cc->free_pfn, end_pfn, sync, ret);
 
+	if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags)) {
+		if (!compaction_depleted(zone))
+			clear_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
+	}
+
 	return ret;
 }
 
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone
  2015-08-24  2:19 ` [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone Joonsoo Kim
@ 2015-09-25  8:04   ` Vlastimil Babka
  2015-09-30  8:12     ` Joonsoo Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-09-25  8:04 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
	Minchan Kim, Joonsoo Kim

On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> Further compaction attempt is deferred when some of compaction attempts
> already fails. But, after some number of trial are skipped, compaction
> restarts work to check whether compaction is now possible or not. It
> scans whole range of zone to determine this possibility and if compaction
> possibility doesn't recover, this whole range scan is quite big overhead.
> As a first step to reduce this overhead, this patch implement compaction
> depleted state on zone.
> 
> The way to determine depletion of compaction possility is checking number
> of success on previous compaction attempt. If number of successful
> compaction is below than specified threshold, we guess that compaction
> will not successful next time so mark the zone as compaction depleted.
> In this patch, threshold is choosed by 1 to imitate current compaction
> deferring algorithm. In the following patch, compaction algorithm will be
> changed and this threshold is also adjusted to that change.
> 
> In this patch, only state definition is implemented. There is no action
> for this new state so no functional change. But, following patch will
> add some handling for this new state.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  include/linux/mmzone.h |  3 +++
>  mm/compaction.c        | 44 +++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 44 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 754c259..700e9b5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -517,6 +517,8 @@ struct zone {
>  	unsigned int		compact_considered;
>  	unsigned int		compact_defer_shift;
>  	int			compact_order_failed;
> +	unsigned long		compact_success;
> +	unsigned long		compact_depletion_depth;
>  #endif
>  
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> @@ -543,6 +545,7 @@ enum zone_flags {
>  					 * many pages under writeback
>  					 */
>  	ZONE_FAIR_DEPLETED,		/* fair zone policy batch depleted */
> +	ZONE_COMPACTION_DEPLETED,	/* compaction possiblity depleted */
>  };
>  
>  static inline unsigned long zone_end_pfn(const struct zone *zone)
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c2d3d6a..de96e9d 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -129,6 +129,23 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>  
>  /* Do not skip compaction more than 64 times */
>  #define COMPACT_MAX_DEFER_SHIFT 6
> +#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
> +
> +static bool compaction_depleted(struct zone *zone)
> +{
> +	unsigned long threshold;
> +	unsigned long success = zone->compact_success;
> +
> +	/*
> +	 * Now, to imitate current compaction deferring approach,
> +	 * choose threshold to 1. It will be changed in the future.
> +	 */
> +	threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
> +	if (success >= threshold)
> +		return false;
> +
> +	return true;
> +}
>  
>  /*
>   * Compaction is deferred when compaction fails to result in a page
> @@ -223,6 +240,16 @@ static void __reset_isolation_suitable(struct zone *zone)
>  	zone->compact_cached_free_pfn = end_pfn;
>  	zone->compact_blockskip_flush = false;
>  
> +	if (compaction_depleted(zone)) {
> +		if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
> +			zone->compact_depletion_depth++;
> +		else {
> +			set_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
> +			zone->compact_depletion_depth = 0;
> +		}
> +	}
> +	zone->compact_success = 0;

It's possible that the following comment is made moot by further patches, but:

I assume doing this in __reset_isolation_suitable() is to react on the
compaction_restarting() state. But __reset_isolation_suitable() is called also
from manually invoked compaction, and from kswapd. What if compaction has
succeeded S times, but threshold is T, S < T and T is larger than 1 (by a later
patch). Then kswapd or manual compaction will reset S to zero, without giving it
chance to reach T, even when compaction would succeed?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone
  2015-09-25  8:04   ` Vlastimil Babka
@ 2015-09-30  8:12     ` Joonsoo Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-09-30  8:12 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
	David Rientjes, Minchan Kim

Hello, Vlastimil.

First of all, thanks for review!

On Fri, Sep 25, 2015 at 10:04:49AM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> > Further compaction attempt is deferred when some of compaction attempts
> > already fails. But, after some number of trial are skipped, compaction
> > restarts work to check whether compaction is now possible or not. It
> > scans whole range of zone to determine this possibility and if compaction
> > possibility doesn't recover, this whole range scan is quite big overhead.
> > As a first step to reduce this overhead, this patch implement compaction
> > depleted state on zone.
> > 
> > The way to determine depletion of compaction possility is checking number
> > of success on previous compaction attempt. If number of successful
> > compaction is below than specified threshold, we guess that compaction
> > will not successful next time so mark the zone as compaction depleted.
> > In this patch, threshold is choosed by 1 to imitate current compaction
> > deferring algorithm. In the following patch, compaction algorithm will be
> > changed and this threshold is also adjusted to that change.
> > 
> > In this patch, only state definition is implemented. There is no action
> > for this new state so no functional change. But, following patch will
> > add some handling for this new state.
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  include/linux/mmzone.h |  3 +++
> >  mm/compaction.c        | 44 +++++++++++++++++++++++++++++++++++++++++---
> >  2 files changed, 44 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 754c259..700e9b5 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -517,6 +517,8 @@ struct zone {
> >  	unsigned int		compact_considered;
> >  	unsigned int		compact_defer_shift;
> >  	int			compact_order_failed;
> > +	unsigned long		compact_success;
> > +	unsigned long		compact_depletion_depth;
> >  #endif
> >  
> >  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> > @@ -543,6 +545,7 @@ enum zone_flags {
> >  					 * many pages under writeback
> >  					 */
> >  	ZONE_FAIR_DEPLETED,		/* fair zone policy batch depleted */
> > +	ZONE_COMPACTION_DEPLETED,	/* compaction possiblity depleted */
> >  };
> >  
> >  static inline unsigned long zone_end_pfn(const struct zone *zone)
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index c2d3d6a..de96e9d 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -129,6 +129,23 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> >  
> >  /* Do not skip compaction more than 64 times */
> >  #define COMPACT_MAX_DEFER_SHIFT 6
> > +#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
> > +
> > +static bool compaction_depleted(struct zone *zone)
> > +{
> > +	unsigned long threshold;
> > +	unsigned long success = zone->compact_success;
> > +
> > +	/*
> > +	 * Now, to imitate current compaction deferring approach,
> > +	 * choose threshold to 1. It will be changed in the future.
> > +	 */
> > +	threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
> > +	if (success >= threshold)
> > +		return false;
> > +
> > +	return true;
> > +}
> >  
> >  /*
> >   * Compaction is deferred when compaction fails to result in a page
> > @@ -223,6 +240,16 @@ static void __reset_isolation_suitable(struct zone *zone)
> >  	zone->compact_cached_free_pfn = end_pfn;
> >  	zone->compact_blockskip_flush = false;
> >  
> > +	if (compaction_depleted(zone)) {
> > +		if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
> > +			zone->compact_depletion_depth++;
> > +		else {
> > +			set_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
> > +			zone->compact_depletion_depth = 0;
> > +		}
> > +	}
> > +	zone->compact_success = 0;
> 
> It's possible that the following comment is made moot by further patches, but:
> 
> I assume doing this in __reset_isolation_suitable() is to react on the
> compaction_restarting() state. But __reset_isolation_suitable() is called also
> from manually invoked compaction, and from kswapd. What if compaction has
> succeeded S times, but threshold is T, S < T and T is larger than 1 (by a later
> patch). Then kswapd or manual compaction will reset S to zero, without giving it
> chance to reach T, even when compaction would succeed?

Okay. I will fix it.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 3/9] mm/compaction: limit compaction activity in compaction depleted state
  2015-08-24  2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
  2015-08-24  2:19 ` [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn Joonsoo Kim
  2015-08-24  2:19 ` [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone Joonsoo Kim
@ 2015-08-24  2:19 ` Joonsoo Kim
  2015-08-24  2:19 ` [PATCH v2 4/9] mm/compaction: remove compaction deferring Joonsoo Kim
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24  2:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
	Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim

Compaction deferring was introduced to reduce overhead of compaction
when compaction attempt is expected to fail. But, it has a problem.
Whole zone is rescanned after some compaction attempts are deferred and
this rescan overhead is quite big. And, it imposes large latency to one
random requestor while others will get nearly zero latency to fail due
to deferring compaction. This patch try to handle this situation
differently to solve above problems.

At first, we should know when compaction will fail. Previous patch
defines compaction depleted state. In this state, compaction failure
is highly expected so we don't need to take much effort on compaction.
So, this patch forces migration scanner scan restricted number of pages
in this state. With this way, we can evenly distribute compaction overhead
to all compaction requestors. And, there is a way to escape from
compaction depleted state so we don't need to defer specific number of
compaction attempts unconditionally if compaction possibility recovers.

In this patch, migration scanner limit is defined to imitate current
compaction deferring approach. But, we can tune it easily if this
overhead doesn't look appropriate. It would be further work.

There would be a situation that compaction depleted state is maintained
for a long time. In this case, repeated compaction attempts would cause
useless overhead continually. To optimize this case, this patch uses
compaction depletion depth and make migration scanner limit diminished
according to this depth. It effectively reduce compaction overhead in
this situation.

Should note that this patch just introduces scan_limit infrastructure
and doesn't check scan_limit to finish the compaction. It will
be implemented in next patch with removing compaction deferring.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/compaction.c | 41 +++++++++++++++++++++++++++++++++++++++++
 mm/internal.h   |  1 +
 2 files changed, 42 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index de96e9d..c6b8277 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -130,6 +130,7 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_DEFER_SHIFT 6
 #define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
+#define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)

 static bool compaction_depleted(struct zone *zone)
 {
@@ -147,6 +148,42 @@ static bool compaction_depleted(struct zone *zone)
 	return true;
 }

+static void set_migration_scan_limit(struct compact_control *cc)
+{
+	struct zone *zone = cc->zone;
+	int order = cc->order;
+	unsigned long limit;
+
+	cc->migration_scan_limit = LONG_MAX;
+	if (order < 0)
+		return;
+
+	if (!test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
+		return;
+
+	if (!zone->compact_depletion_depth)
+		return;
+
+	/*
+	 * Experimental observation shows that migration scanner
+	 * normally scans 1/4 pages
+	 */
+	limit = zone->managed_pages >> 2;
+
+	/*
+	 * Deferred compaction restart compaction every 64 compaction
+	 * attempts and it rescans whole zone range. If we limit
+	 * migration scanner to scan 1/64 range when depleted, 64
+	 * compaction attempts will rescan whole zone range as same
+	 * as deferred compaction.
+	 */
+	limit >>= 6;
+	limit = max(limit, COMPACT_MIN_SCAN_LIMIT);
+
+	/* Degradation scan limit according to depletion depth. */
+	limit >>= zone->compact_depletion_depth;
+	cc->migration_scan_limit = max(limit, COMPACT_CLUSTER_MAX);
+}
 /*
  * Compaction is deferred when compaction fails to result in a page
  * allocation success. 1 << compact_defer_limit compactions are skipped up
@@ -839,6 +876,8 @@ isolate_success:
 		update_pageblock_skip(cc, valid_page, nr_isolated,
 					end_pfn, true);

+	cc->migration_scan_limit -= nr_scanned;
+
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
 						nr_scanned, nr_isolated);

@@ -1367,6 +1406,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
 	}

+	set_migration_scan_limit(cc);
+
 	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
 				cc->free_pfn, end_pfn, sync);

diff --git a/mm/internal.h b/mm/internal.h
index 36b23f1..a427695 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -182,6 +182,7 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+	long migration_scan_limit;	/* Limit migration scanner activity */
 	enum migrate_mode mode;		/* Async or sync migration mode */
 	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
 	int order;			/* order a direct compactor needs */
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 4/9] mm/compaction: remove compaction deferring
  2015-08-24  2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
                   ` (2 preceding siblings ...)
  2015-08-24  2:19 ` [PATCH v2 3/9] mm/compaction: limit compaction activity in compaction depleted state Joonsoo Kim
@ 2015-08-24  2:19 ` Joonsoo Kim
  2015-09-25  9:33   ` Vlastimil Babka
  2015-08-24  2:19 ` [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state Joonsoo Kim
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24  2:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
	Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim

Now, we have a way to determine compaction depleted state and compaction
activity will be limited according this state and depletion depth so
compaction overhead would be well controlled without compaction deferring.
So, this patch remove compaction deferring completely and enable
compaction activity limit.

Various functions are renamed and tracepoint outputs are changed due to
this removing.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/compaction.h        | 14 +------
 include/linux/mmzone.h            |  3 +-
 include/trace/events/compaction.h | 30 ++++++-------
 mm/compaction.c                   | 88 ++++++++++++++-------------------------
 mm/page_alloc.c                   |  2 +-
 mm/vmscan.c                       |  4 +-
 6 files changed, 49 insertions(+), 92 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index aa8f61c..8d98f3c 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -45,11 +45,8 @@ extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern unsigned long compaction_suitable(struct zone *zone, int order,
 					int alloc_flags, int classzone_idx);
 
-extern void defer_compaction(struct zone *zone, int order);
-extern bool compaction_deferred(struct zone *zone, int order);
-extern void compaction_defer_reset(struct zone *zone, int order,
+extern void compaction_failed_reset(struct zone *zone, int order,
 				bool alloc_success);
-extern bool compaction_restarting(struct zone *zone, int order);
 
 #else
 static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
@@ -74,15 +71,6 @@ static inline unsigned long compaction_suitable(struct zone *zone, int order,
 	return COMPACT_SKIPPED;
 }
 
-static inline void defer_compaction(struct zone *zone, int order)
-{
-}
-
-static inline bool compaction_deferred(struct zone *zone, int order)
-{
-	return true;
-}
-
 #endif /* CONFIG_COMPACTION */
 
 #if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 700e9b5..e13b732 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -514,8 +514,7 @@ struct zone {
 	 * are skipped before trying again. The number attempted since
 	 * last failure is tracked with compact_considered.
 	 */
-	unsigned int		compact_considered;
-	unsigned int		compact_defer_shift;
+	int			compact_failed;
 	int			compact_order_failed;
 	unsigned long		compact_success;
 	unsigned long		compact_depletion_depth;
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index 9a6a3fe..323e614 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -239,7 +239,7 @@ DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_suitable,
 );
 
 #ifdef CONFIG_COMPACTION
-DECLARE_EVENT_CLASS(mm_compaction_defer_template,
+DECLARE_EVENT_CLASS(mm_compaction_deplete_template,
 
 	TP_PROTO(struct zone *zone, int order),
 
@@ -249,8 +249,9 @@ DECLARE_EVENT_CLASS(mm_compaction_defer_template,
 		__field(int, nid)
 		__field(char *, name)
 		__field(int, order)
-		__field(unsigned int, considered)
-		__field(unsigned int, defer_shift)
+		__field(unsigned long, success)
+		__field(unsigned long, depletion_depth)
+		__field(int, failed)
 		__field(int, order_failed)
 	),
 
@@ -258,35 +259,30 @@ DECLARE_EVENT_CLASS(mm_compaction_defer_template,
 		__entry->nid = zone_to_nid(zone);
 		__entry->name = (char *)zone->name;
 		__entry->order = order;
-		__entry->considered = zone->compact_considered;
-		__entry->defer_shift = zone->compact_defer_shift;
+		__entry->success = zone->compact_success;
+		__entry->depletion_depth = zone->compact_depletion_depth;
+		__entry->failed = zone->compact_failed;
 		__entry->order_failed = zone->compact_order_failed;
 	),
 
-	TP_printk("node=%d zone=%-8s order=%d order_failed=%d consider=%u limit=%lu",
+	TP_printk("node=%d zone=%-8s order=%d failed=%d order_failed=%d consider=%lu depth=%lu",
 		__entry->nid,
 		__entry->name,
 		__entry->order,
+		__entry->failed,
 		__entry->order_failed,
-		__entry->considered,
-		1UL << __entry->defer_shift)
+		__entry->success,
+		__entry->depletion_depth)
 );
 
-DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_deferred,
+DEFINE_EVENT(mm_compaction_deplete_template, mm_compaction_fail_compaction,
 
 	TP_PROTO(struct zone *zone, int order),
 
 	TP_ARGS(zone, order)
 );
 
-DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_compaction,
-
-	TP_PROTO(struct zone *zone, int order),
-
-	TP_ARGS(zone, order)
-);
-
-DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_reset,
+DEFINE_EVENT(mm_compaction_deplete_template, mm_compaction_failed_reset,
 
 	TP_PROTO(struct zone *zone, int order),
 
diff --git a/mm/compaction.c b/mm/compaction.c
index c6b8277..1817564 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -128,7 +128,7 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
 #ifdef CONFIG_COMPACTION
 
 /* Do not skip compaction more than 64 times */
-#define COMPACT_MAX_DEFER_SHIFT 6
+#define COMPACT_MAX_FAILED 4
 #define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
 #define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
 
@@ -184,61 +184,28 @@ static void set_migration_scan_limit(struct compact_control *cc)
 	limit >>= zone->compact_depletion_depth;
 	cc->migration_scan_limit = max(limit, COMPACT_CLUSTER_MAX);
 }
-/*
- * Compaction is deferred when compaction fails to result in a page
- * allocation success. 1 << compact_defer_limit compactions are skipped up
- * to a limit of 1 << COMPACT_MAX_DEFER_SHIFT
- */
-void defer_compaction(struct zone *zone, int order)
-{
-	zone->compact_considered = 0;
-	zone->compact_defer_shift++;
-
-	if (order < zone->compact_order_failed)
-		zone->compact_order_failed = order;
-
-	if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
-		zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
-
-	trace_mm_compaction_defer_compaction(zone, order);
-}
 
-/* Returns true if compaction should be skipped this time */
-bool compaction_deferred(struct zone *zone, int order)
+void fail_compaction(struct zone *zone, int order)
 {
-	unsigned long defer_limit = 1UL << zone->compact_defer_shift;
-
-	if (order < zone->compact_order_failed)
-		return false;
-
-	/* Avoid possible overflow */
-	if (++zone->compact_considered > defer_limit)
-		zone->compact_considered = defer_limit;
-
-	if (zone->compact_considered >= defer_limit)
-		return false;
-
-	trace_mm_compaction_deferred(zone, order);
+	if (order < zone->compact_order_failed) {
+		zone->compact_failed = 0;
+		zone->compact_order_failed = order;
+	} else
+		zone->compact_failed++;
 
-	return true;
+	trace_mm_compaction_fail_compaction(zone, order);
 }
 
-/*
- * Update defer tracking counters after successful compaction of given order,
- * which means an allocation either succeeded (alloc_success == true) or is
- * expected to succeed.
- */
-void compaction_defer_reset(struct zone *zone, int order,
+void compaction_failed_reset(struct zone *zone, int order,
 		bool alloc_success)
 {
-	if (alloc_success) {
-		zone->compact_considered = 0;
-		zone->compact_defer_shift = 0;
-	}
+	if (alloc_success)
+		zone->compact_failed = 0;
+
 	if (order >= zone->compact_order_failed)
 		zone->compact_order_failed = order + 1;
 
-	trace_mm_compaction_defer_reset(zone, order);
+	trace_mm_compaction_failed_reset(zone, order);
 }
 
 /* Returns true if restarting compaction after many failures */
@@ -247,8 +214,7 @@ bool compaction_restarting(struct zone *zone, int order)
 	if (order < zone->compact_order_failed)
 		return false;
 
-	return zone->compact_defer_shift == COMPACT_MAX_DEFER_SHIFT &&
-		zone->compact_considered >= 1UL << zone->compact_defer_shift;
+	return zone->compact_failed < COMPACT_MAX_FAILED ? false : true;
 }
 
 /* Returns true if the pageblock should be scanned for pages to isolate. */
@@ -286,6 +252,7 @@ static void __reset_isolation_suitable(struct zone *zone)
 		}
 	}
 	zone->compact_success = 0;
+	zone->compact_failed = 0;
 
 	/* Walk the zone and mark every pageblock as suitable for isolation */
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
@@ -335,10 +302,15 @@ static void update_pageblock_skip(struct compact_control *cc,
 	if (!page)
 		return;
 
-	if (nr_isolated)
+	/*
+	 * Always update cached_pfn if compaction has scan_limit,
+	 * otherwise we would scan same range over and over.
+	 */
+	if (cc->migration_scan_limit == LONG_MAX && nr_isolated)
 		return;
 
-	set_pageblock_skip(page);
+	if (!nr_isolated)
+		set_pageblock_skip(page);
 
 	/* Update where async and sync compaction should restart */
 	if (migrate_scanner) {
@@ -1231,6 +1203,9 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
 		return COMPACT_COMPLETE;
 	}
 
+	if (cc->migration_scan_limit < 0)
+		return COMPACT_PARTIAL;
+
 	/*
 	 * order == -1 is expected when compacting via
 	 * /proc/sys/vm/compact_memory
@@ -1407,6 +1382,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 	}
 
 	set_migration_scan_limit(cc);
+	if (cc->migration_scan_limit < 0)
+		return COMPACT_SKIPPED;
 
 	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
 				cc->free_pfn, end_pfn, sync);
@@ -1588,9 +1565,6 @@ unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		int status;
 		int zone_contended;
 
-		if (compaction_deferred(zone, order))
-			continue;
-
 		status = compact_zone_order(zone, order, gfp_mask, mode,
 				&zone_contended, alloc_flags,
 				ac->classzone_idx);
@@ -1610,7 +1584,7 @@ unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			 * will repeat this with true if allocation indeed
 			 * succeeds in this zone.
 			 */
-			compaction_defer_reset(zone, order, false);
+			compaction_failed_reset(zone, order, false);
 			/*
 			 * It is possible that async compaction aborted due to
 			 * need_resched() and the watermarks were ok thanks to
@@ -1631,7 +1605,7 @@ unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			 * so we defer compaction there. If it ends up
 			 * succeeding after all, it will be reset.
 			 */
-			defer_compaction(zone, order);
+			fail_compaction(zone, order);
 		}
 
 		/*
@@ -1693,13 +1667,13 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 		if (cc->order == -1)
 			__reset_isolation_suitable(zone);
 
-		if (cc->order == -1 || !compaction_deferred(zone, cc->order))
+		if (cc->order == -1)
 			compact_zone(zone, cc);
 
 		if (cc->order > 0) {
 			if (zone_watermark_ok(zone, cc->order,
 						low_wmark_pages(zone), 0, 0))
-				compaction_defer_reset(zone, cc->order, false);
+				compaction_failed_reset(zone, cc->order, false);
 		}
 
 		VM_BUG_ON(!list_empty(&cc->freepages));
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e9cc98..c67f853 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2827,7 +2827,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		struct zone *zone = page_zone(page);
 
 		zone->compact_blockskip_flush = false;
-		compaction_defer_reset(zone, order, true);
+		compaction_failed_reset(zone, order, true);
 		count_vm_event(COMPACTSUCCESS);
 		return page;
 	}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 37e90db..a561b5f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2469,10 +2469,10 @@ static inline bool compaction_ready(struct zone *zone, int order)
 	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 0, 0);
 
 	/*
-	 * If compaction is deferred, reclaim up to a point where
+	 * If compaction is depleted, reclaim up to a point where
 	 * compaction will have a chance of success when re-enabled
 	 */
-	if (compaction_deferred(zone, order))
+	if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
 		return watermark_ok;
 
 	/*
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/9] mm/compaction: remove compaction deferring
  2015-08-24  2:19 ` [PATCH v2 4/9] mm/compaction: remove compaction deferring Joonsoo Kim
@ 2015-09-25  9:33   ` Vlastimil Babka
  2015-09-30  8:28     ` Joonsoo Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-09-25  9:33 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
	Minchan Kim, Joonsoo Kim

On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> Now, we have a way to determine compaction depleted state and compaction
> activity will be limited according this state and depletion depth so
> compaction overhead would be well controlled without compaction deferring.
> So, this patch remove compaction deferring completely and enable
> compaction activity limit.
> 
> Various functions are renamed and tracepoint outputs are changed due to
> this removing.

It's more like renaming "deferred" to "failed" and the whole result is somewhat
hard to follow, as the changelog doesn't describe a lot. So if I understand
correctly:
- compaction has to fail 4 times to cause __reset_isolation_suitable(), which
also resets the fail counter back to 0
- thus after each 4 failures, depletion depth is adjusted
- when successes cross the depletion threshold, compaction_depleted() becomes
false and then compact_zone will clear the flag
- with flag clear, scan limit is no longer applied at all, but depletion depth
stays as it is... it will be set to 0 when the flag is set again

Maybe the switch from "depleted with some depth" to "not depleted at all" could
be more gradual?
Also I have a suspicion that the main feature of this (IIUC) which is the scan
limiting (and which I do consider improvement! and IIRC David wished for
something like that too) could be achieved with less code churn that is renaming
"deferred" to "failed" and adding another "depleted" state. E.g.
compact_defer_shift looks similar to compact_depletion_depth, and deferring
could be repurposed for scan limiting instead of binary go/no-go decisions. BTW
the name "depleted" also suggests a binary state, so it's not that much better
name than "deferred" IMHO.

Also I think my objection from patch 2 stays - __reset_isolation_suitable()
called from kswapd will set zone->compact_success = 0, potentially increase
depletion depth etc, with no connection to the number of failed compactions.

[...]

> @@ -1693,13 +1667,13 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
>  		if (cc->order == -1)
>  			__reset_isolation_suitable(zone);
>  
> -		if (cc->order == -1 || !compaction_deferred(zone, cc->order))
> +		if (cc->order == -1)

This change means kswapd no longer compacts from balance_pgdat() ->
compact_pgdat(). Probably you meant to call compact_zone unconditionally here?

>  			compact_zone(zone, cc);
>  
>  		if (cc->order > 0) {
>  			if (zone_watermark_ok(zone, cc->order,
>  						low_wmark_pages(zone), 0, 0))
> -				compaction_defer_reset(zone, cc->order, false);
> +				compaction_failed_reset(zone, cc->order, false);
>  		}
>  
>  		VM_BUG_ON(!list_empty(&cc->freepages));
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0e9cc98..c67f853 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2827,7 +2827,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
>  		struct zone *zone = page_zone(page);
>  
>  		zone->compact_blockskip_flush = false;
> -		compaction_defer_reset(zone, order, true);
> +		compaction_failed_reset(zone, order, true);
>  		count_vm_event(COMPACTSUCCESS);
>  		return page;
>  	}
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 37e90db..a561b5f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2469,10 +2469,10 @@ static inline bool compaction_ready(struct zone *zone, int order)
>  	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 0, 0);
>  
>  	/*
> -	 * If compaction is deferred, reclaim up to a point where
> +	 * If compaction is depleted, reclaim up to a point where
>  	 * compaction will have a chance of success when re-enabled
>  	 */
> -	if (compaction_deferred(zone, order))
> +	if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
>  		return watermark_ok;

Hm this is a deviation from the "replace go/no-go with scan limit" principle.
Also compaction_deferred() could recover after some retries, and this flag won't.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/9] mm/compaction: remove compaction deferring
  2015-09-25  9:33   ` Vlastimil Babka
@ 2015-09-30  8:28     ` Joonsoo Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-09-30  8:28 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
	David Rientjes, Minchan Kim

On Fri, Sep 25, 2015 at 11:33:38AM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> > Now, we have a way to determine compaction depleted state and compaction
> > activity will be limited according this state and depletion depth so
> > compaction overhead would be well controlled without compaction deferring.
> > So, this patch remove compaction deferring completely and enable
> > compaction activity limit.
> > 
> > Various functions are renamed and tracepoint outputs are changed due to
> > this removing.
> 
> It's more like renaming "deferred" to "failed" and the whole result is somewhat
> hard to follow, as the changelog doesn't describe a lot. So if I understand
> correctly:
> - compaction has to fail 4 times to cause __reset_isolation_suitable(), which
> also resets the fail counter back to 0
> - thus after each 4 failures, depletion depth is adjusted
> - when successes cross the depletion threshold, compaction_depleted() becomes
> false and then compact_zone will clear the flag
> - with flag clear, scan limit is no longer applied at all, but depletion depth
> stays as it is... it will be set to 0 when the flag is set again

Correct! I will add this description at some place.

> 
> Maybe the switch from "depleted with some depth" to "not depleted at all" could
> be more gradual?
> Also I have a suspicion that the main feature of this (IIUC) which is the scan
> limiting (and which I do consider improvement! and IIRC David wished for
> something like that too) could be achieved with less code churn that is renaming
> "deferred" to "failed" and adding another "depleted" state. E.g.
> compact_defer_shift looks similar to compact_depletion_depth, and deferring
> could be repurposed for scan limiting instead of binary go/no-go decisions. BTW
> the name "depleted" also suggests a binary state, so it's not that much better
> name than "deferred" IMHO.

Okay. I will try to change current deferred logic to scan limit logic
and make code less churn.
Naming? I will think more.

> Also I think my objection from patch 2 stays - __reset_isolation_suitable()
> called from kswapd will set zone->compact_success = 0, potentially increase
> depletion depth etc, with no connection to the number of failed compactions.
> 
> [...]
> 
> > @@ -1693,13 +1667,13 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
> >  		if (cc->order == -1)
> >  			__reset_isolation_suitable(zone);
> >  
> > -		if (cc->order == -1 || !compaction_deferred(zone, cc->order))
> > +		if (cc->order == -1)
> 
> This change means kswapd no longer compacts from balance_pgdat() ->
> compact_pgdat(). Probably you meant to call compact_zone unconditionally here?

Yes, that's what I want. I will fix it.

> >  			compact_zone(zone, cc);
> >  
> >  		if (cc->order > 0) {
> >  			if (zone_watermark_ok(zone, cc->order,
> >  						low_wmark_pages(zone), 0, 0))
> > -				compaction_defer_reset(zone, cc->order, false);
> > +				compaction_failed_reset(zone, cc->order, false);
> >  		}
> >  
> >  		VM_BUG_ON(!list_empty(&cc->freepages));
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 0e9cc98..c67f853 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2827,7 +2827,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
> >  		struct zone *zone = page_zone(page);
> >  
> >  		zone->compact_blockskip_flush = false;
> > -		compaction_defer_reset(zone, order, true);
> > +		compaction_failed_reset(zone, order, true);
> >  		count_vm_event(COMPACTSUCCESS);
> >  		return page;
> >  	}
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 37e90db..a561b5f 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2469,10 +2469,10 @@ static inline bool compaction_ready(struct zone *zone, int order)
> >  	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 0, 0);
> >  
> >  	/*
> > -	 * If compaction is deferred, reclaim up to a point where
> > +	 * If compaction is depleted, reclaim up to a point where
> >  	 * compaction will have a chance of success when re-enabled
> >  	 */
> > -	if (compaction_deferred(zone, order))
> > +	if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
> >  		return watermark_ok;
> 
> Hm this is a deviation from the "replace go/no-go with scan limit" principle.
> Also compaction_deferred() could recover after some retries, and this flag won't.

This means that if compaction success possibility is depleted there is
no need to reclaim more pages above watermark because more reclaim effort
is waste in this situation. It is same with compaction_deferred case.
Recover is done by attemping actual compaction by kswapd or direct
compaction because there is no blocker like as compaction_deferring
logic. I think that this code change is okay.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state
  2015-08-24  2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
                   ` (3 preceding siblings ...)
  2015-08-24  2:19 ` [PATCH v2 4/9] mm/compaction: remove compaction deferring Joonsoo Kim
@ 2015-08-24  2:19 ` Joonsoo Kim
  2015-09-25 10:32   ` Vlastimil Babka
  2015-08-24  2:19 ` [PATCH v2 6/9] mm/compaction: manage separate skip-bits for migration and free scanner Joonsoo Kim
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24  2:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
	Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim

Currently, freescanner doesn't scan non-movable pageblock, because if
freepages in non-movable pageblock are exhausted, another movable
pageblock would be used for non-movable allocation and it could cause
fragmentation.

But, we should know that watermark check for compaction doesn't
distinguish where freepage is. If all freepages are in non-movable
pageblock, although, system has enough freepages and watermark check
is passed, freepage scanner can't get any freepage and compaction will
be failed. There is no way to get precise number of freepage on movable
pageblock and no way to reclaim only used pages in movable pageblock.
Therefore, I think that best way to overcome this situation is
to use freepage in non-movable pageblock in compaction.

My test setup for this situation is:

Memory is artificially fragmented to make order 3 allocation hard. And,
most of pageblocks are changed to unmovable migratetype.

  System: 512 MB with 32 MB Zram
  Memory: 25% memory is allocated to make fragmentation and kernel build
  	is running on background.
  Fragmentation: Successful order 3 allocation candidates may be around
  	1500 roughly.
  Allocation attempts: Roughly 3000 order 3 allocation attempts
  	with GFP_NORETRY. This value is determined to saturate allocation
  	success.

Below is the result of this test.

Test: build-frag-unmovable

Kernel:	Base vs Nonmovable

Success(N)                    37              64
compact_stall                624            5056
compact_success              103             419
compact_fail                 521            4637
pgmigrate_success          22004          277106
compact_isolated           61021         1056863
compact_migrate_scanned  2609360        70252458
compact_free_scanned     4808989        23091292

Column 'Success(N) are calculated by following equations.

Success(N) = successful allocation * 100 / order 3 candidates

Result shows that success rate is roughly doubled in this case
because we can search more area.

Because we just allow freepage scanner to scan non-movable pageblock
in very limited situation, more scanning events happen. But, allowing
in very limited situation results in a very important benefit that
memory isn't fragmented more than before. Fragmentation effect is
measured on following patch so please refer it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/mmzone.h |  1 +
 mm/compaction.c        | 27 +++++++++++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e13b732..5cae0ad 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -545,6 +545,7 @@ enum zone_flags {
 					 */
 	ZONE_FAIR_DEPLETED,		/* fair zone policy batch depleted */
 	ZONE_COMPACTION_DEPLETED,	/* compaction possiblity depleted */
+	ZONE_COMPACTION_SCANALLFREE,	/* scan all kinds of pageblocks */
 };
 
 static inline unsigned long zone_end_pfn(const struct zone *zone)
diff --git a/mm/compaction.c b/mm/compaction.c
index 1817564..b58f162 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -243,9 +243,17 @@ static void __reset_isolation_suitable(struct zone *zone)
 	zone->compact_cached_free_pfn = end_pfn;
 	zone->compact_blockskip_flush = false;
 
+	clear_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags);
 	if (compaction_depleted(zone)) {
 		if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
 			zone->compact_depletion_depth++;
+
+			/* Last resort to make high-order page */
+			if (!zone->compact_success) {
+				set_bit(ZONE_COMPACTION_SCANALLFREE,
+					&zone->flags);
+			}
+
 		else {
 			set_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
 			zone->compact_depletion_depth = 0;
@@ -914,7 +922,8 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 #ifdef CONFIG_COMPACTION
 
 /* Returns true if the page is within a block suitable for migration to */
-static bool suitable_migration_target(struct page *page)
+static bool suitable_migration_target(struct compact_control *cc,
+					struct page *page)
 {
 	/* If the page is a large free page, then disallow migration */
 	if (PageBuddy(page)) {
@@ -931,6 +940,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migrate_async_suitable(get_pageblock_migratetype(page)))
 		return true;
 
+	/*
+	 * Allow to scan all kinds of pageblock. Without this relaxation,
+	 * all freepage could be in non-movable pageblock and compaction
+	 * can be satuarated and cannot make high-order page even if there
+	 * is enough freepage in the system.
+	 */
+	if (cc->mode != MIGRATE_ASYNC &&
+		test_bit(ZONE_COMPACTION_SCANALLFREE, &cc->zone->flags))
+		return true;
+
 	/* Otherwise skip the block */
 	return false;
 }
@@ -992,7 +1011,7 @@ static void isolate_freepages(struct compact_control *cc)
 			continue;
 
 		/* Check the block is suitable for migration */
-		if (!suitable_migration_target(page))
+		if (!suitable_migration_target(cc, page))
 			continue;
 
 		/* If isolation recently failed, do not retry */
@@ -1494,6 +1513,10 @@ out:
 	if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags)) {
 		if (!compaction_depleted(zone))
 			clear_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
+
+		if (zone->compact_success &&
+			test_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags))
+			clear_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags);
 	}
 
 	return ret;
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state
  2015-08-24  2:19 ` [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state Joonsoo Kim
@ 2015-09-25 10:32   ` Vlastimil Babka
  2015-09-30  8:30     ` Joonsoo Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-09-25 10:32 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
	Minchan Kim, Joonsoo Kim

On 08/24/2015 04:19 AM, Joonsoo Kim wrote:

[...]

> 
> Because we just allow freepage scanner to scan non-movable pageblock
> in very limited situation, more scanning events happen. But, allowing
> in very limited situation results in a very important benefit that
> memory isn't fragmented more than before. Fragmentation effect is
> measured on following patch so please refer it.

AFAICS it's measured only for the whole series in the cover letter, no? Just to
be sure I didn't overlook something.

> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  include/linux/mmzone.h |  1 +
>  mm/compaction.c        | 27 +++++++++++++++++++++++++--
>  2 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index e13b732..5cae0ad 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -545,6 +545,7 @@ enum zone_flags {
>  					 */
>  	ZONE_FAIR_DEPLETED,		/* fair zone policy batch depleted */
>  	ZONE_COMPACTION_DEPLETED,	/* compaction possiblity depleted */
> +	ZONE_COMPACTION_SCANALLFREE,	/* scan all kinds of pageblocks */

"SCANALLFREE" is hard to read. Otherwise yeah, I agree scanning unmovable
pageblocks is necessary sometimes, and this seems to make a reasonable tradeoff.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state
  2015-09-25 10:32   ` Vlastimil Babka
@ 2015-09-30  8:30     ` Joonsoo Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-09-30  8:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
	David Rientjes, Minchan Kim

On Fri, Sep 25, 2015 at 12:32:02PM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> 
> [...]
> 
> > 
> > Because we just allow freepage scanner to scan non-movable pageblock
> > in very limited situation, more scanning events happen. But, allowing
> > in very limited situation results in a very important benefit that
> > memory isn't fragmented more than before. Fragmentation effect is
> > measured on following patch so please refer it.
> 
> AFAICS it's measured only for the whole series in the cover letter, no? Just to
> be sure I didn't overlook something.

It takes too much time so no measurement is done on every patch.
I will try to measure it on at least this patch in next revision.

> 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  include/linux/mmzone.h |  1 +
> >  mm/compaction.c        | 27 +++++++++++++++++++++++++--
> >  2 files changed, 26 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index e13b732..5cae0ad 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -545,6 +545,7 @@ enum zone_flags {
> >  					 */
> >  	ZONE_FAIR_DEPLETED,		/* fair zone policy batch depleted */
> >  	ZONE_COMPACTION_DEPLETED,	/* compaction possiblity depleted */
> > +	ZONE_COMPACTION_SCANALLFREE,	/* scan all kinds of pageblocks */
> 
> "SCANALLFREE" is hard to read. Otherwise yeah, I agree scanning unmovable
> pageblocks is necessary sometimes, and this seems to make a reasonable tradeoff.

Good! I will think better name.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 6/9] mm/compaction: manage separate skip-bits for migration and free scanner
  2015-08-24  2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
                   ` (4 preceding siblings ...)
  2015-08-24  2:19 ` [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state Joonsoo Kim
@ 2015-08-24  2:19 ` Joonsoo Kim
  2015-08-24  2:19 ` [PATCH v2 7/9] mm/compaction: redesign compaction Joonsoo Kim
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24  2:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
	Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim

Currently, just one skip-bit is used for migration and free scanner
at the sametime. This has problem if migrate scanner go into
the region where free scanner marks the skip-bit. Free scanner
just checks if there is freepage or not, so there would be migratable
page. But, due to skip-bit, migrate scanner would skip scanning.

Currently, this doesn't result in any problem because migration scanner
and free scanner always meets similar position in the zone and
stops scanning at that position.

But, following patch will change compaction algorithm that migration
scanner scans whole zone range in order to get much better success rate.
In this case, skip-bit marked from freepage scanner should be ignored
but at the sametime we need to check if there is migratable page and
skip that pageblock in next time. This cannot be achived by just one
skip-bit so this patch add one more skip-bit and use each one
for migrate and free scanner, respectively.

This patch incrases memory usage that each pageblock uses 4 bit more than
before. This means that if we have 1GB memory system we lose another
256 bytes. I think this is really marginal overhead.

Motivation for compaction algorithm change will be mentioned
on following patch. Please refer it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/mmzone.h          |  3 ---
 include/linux/pageblock-flags.h | 37 +++++++++++++++++++++++++++----------
 mm/compaction.c                 | 25 ++++++++++++++++---------
 mm/page_alloc.c                 |  3 ++-
 4 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5cae0ad..e641fd1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -75,9 +75,6 @@ enum {
 
 extern int page_group_by_mobility_disabled;
 
-#define NR_MIGRATETYPE_BITS (PB_migrate_end - PB_migrate + 1)
-#define MIGRATETYPE_MASK ((1UL << NR_MIGRATETYPE_BITS) - 1)
-
 #define get_pageblock_migratetype(page)					\
 	get_pfnblock_flags_mask(page, page_to_pfn(page),		\
 			PB_migrate_end, MIGRATETYPE_MASK)
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 2baeee1..de6997e 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -30,8 +30,13 @@ enum pageblock_bits {
 	PB_migrate,
 	PB_migrate_end = PB_migrate + 3 - 1,
 			/* 3 bits required for migrate types */
-	PB_migrate_skip,/* If set the block is skipped by compaction */
+	PB_padding1,	/* Padding for 4 byte aligned migrate types */
+	NR_MIGRATETYPE_BITS,
 
+	PB_skip_migratescan = 4,/* If set the block is skipped by compaction */
+	PB_skip_freescan,
+	PB_padding2,
+	PB_padding3,
 	/*
 	 * Assume the bits will always align on a word. If this assumption
 	 * changes then get/set pageblock needs updating.
@@ -39,6 +44,8 @@ enum pageblock_bits {
 	NR_PAGEBLOCK_BITS
 };
 
+#define MIGRATETYPE_MASK ((1UL << NR_MIGRATETYPE_BITS) - 1)
+
 #ifdef CONFIG_HUGETLB_PAGE
 
 #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
@@ -87,15 +94,25 @@ void set_pfnblock_flags_mask(struct page *page,
 			(1 << (end_bitidx - start_bitidx + 1)) - 1)
 
 #ifdef CONFIG_COMPACTION
-#define get_pageblock_skip(page) \
-			get_pageblock_flags_group(page, PB_migrate_skip,     \
-							PB_migrate_skip)
-#define clear_pageblock_skip(page) \
-			set_pageblock_flags_group(page, 0, PB_migrate_skip,  \
-							PB_migrate_skip)
-#define set_pageblock_skip(page) \
-			set_pageblock_flags_group(page, 1, PB_migrate_skip,  \
-							PB_migrate_skip)
+#define get_pageblock_skip_migratescan(page) \
+		get_pageblock_flags_group(page, PB_skip_migratescan,	\
+						PB_skip_migratescan)
+#define clear_pageblock_skip_migratescan(page) \
+		set_pageblock_flags_group(page, 0, PB_skip_migratescan,	\
+						PB_skip_migratescan)
+#define set_pageblock_skip_migratescan(page) \
+		set_pageblock_flags_group(page, 1, PB_skip_migratescan,	\
+						PB_skip_migratescan)
+#define get_pageblock_skip_freescan(page) \
+		get_pageblock_flags_group(page, PB_skip_freescan,	\
+						PB_skip_freescan)
+#define clear_pageblock_skip_freescan(page) \
+		set_pageblock_flags_group(page, 0, PB_skip_freescan,	\
+						PB_skip_freescan)
+#define set_pageblock_skip_freescan(page) \
+		set_pageblock_flags_group(page, 1, PB_skip_freescan,	\
+						PB_skip_freescan)
+
 #endif /* CONFIG_COMPACTION */
 
 #endif	/* PAGEBLOCK_FLAGS_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index b58f162..a259608 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -219,12 +219,15 @@ bool compaction_restarting(struct zone *zone, int order)
 
 /* Returns true if the pageblock should be scanned for pages to isolate. */
 static inline bool isolation_suitable(struct compact_control *cc,
-					struct page *page)
+					struct page *page, bool migrate_scanner)
 {
 	if (cc->ignore_skip_hint)
 		return true;
 
-	return !get_pageblock_skip(page);
+	if (migrate_scanner)
+		return !get_pageblock_skip_migratescan(page);
+	else
+		return !get_pageblock_skip_freescan(page);
 }
 
 /*
@@ -275,7 +278,8 @@ static void __reset_isolation_suitable(struct zone *zone)
 		if (zone != page_zone(page))
 			continue;
 
-		clear_pageblock_skip(page);
+		clear_pageblock_skip_migratescan(page);
+		clear_pageblock_skip_freescan(page);
 	}
 }
 
@@ -317,24 +321,27 @@ static void update_pageblock_skip(struct compact_control *cc,
 	if (cc->migration_scan_limit == LONG_MAX && nr_isolated)
 		return;
 
-	if (!nr_isolated)
-		set_pageblock_skip(page);
-
 	/* Update where async and sync compaction should restart */
 	if (migrate_scanner) {
+		if (!nr_isolated)
+			set_pageblock_skip_migratescan(page);
+
 		if (pfn > zone->compact_cached_migrate_pfn[0])
 			zone->compact_cached_migrate_pfn[0] = pfn;
 		if (cc->mode != MIGRATE_ASYNC &&
 		    pfn > zone->compact_cached_migrate_pfn[1])
 			zone->compact_cached_migrate_pfn[1] = pfn;
 	} else {
+		if (!nr_isolated)
+			set_pageblock_skip_freescan(page);
+
 		if (pfn < zone->compact_cached_free_pfn)
 			zone->compact_cached_free_pfn = pfn;
 	}
 }
 #else
 static inline bool isolation_suitable(struct compact_control *cc,
-					struct page *page)
+					struct page *page, bool migrate_scanner)
 {
 	return true;
 }
@@ -1015,7 +1022,7 @@ static void isolate_freepages(struct compact_control *cc)
 			continue;
 
 		/* If isolation recently failed, do not retry */
-		if (!isolation_suitable(cc, page))
+		if (!isolation_suitable(cc, page, false))
 			continue;
 
 		/* Found a block suitable for isolating free pages from. */
@@ -1154,7 +1161,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 			continue;
 
 		/* If isolation recently failed, do not retry */
-		if (!isolation_suitable(cc, page))
+		if (!isolation_suitable(cc, page, true))
 			continue;
 
 		/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c67f853..a9a78d1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6623,7 +6623,8 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
 	unsigned long bitidx, word_bitidx;
 	unsigned long old_word, word;
 
-	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
+	BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 8);
+	BUILD_BUG_ON(NR_MIGRATETYPE_BITS != 4);
 
 	zone = page_zone(page);
 	bitmap = get_pageblock_bitmap(zone, pfn);
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 7/9] mm/compaction: redesign compaction
  2015-08-24  2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
                   ` (5 preceding siblings ...)
  2015-08-24  2:19 ` [PATCH v2 6/9] mm/compaction: manage separate skip-bits for migration and free scanner Joonsoo Kim
@ 2015-08-24  2:19 ` Joonsoo Kim
  2015-10-14 11:38   ` Vlastimil Babka
  2015-08-24  2:19 ` [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at Joonsoo Kim
  2015-08-24  2:19 ` [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone Joonsoo Kim
  8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24  2:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
	Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim

Currently, compaction works as following.
1) migration scanner scans from zone_start_pfn to zone_end_pfn
to find migratable pages
2) free scanner scans from zone_end_pfn to zone_start_pfn to
find free pages
3) If both scanner crossed, compaction is finished.

This algorithm has some drawbacks. 1) Back of the zone cannot be
scanned by migration scanner because migration scanner can't pass
over freepage scanner. So, although there are some high order page
candidates at back of the zone, we can't utilize it.
Another weakness is 2) compaction's success highly depends on amount
of freepage. Compaction can migrate used pages by amount of freepage
at maximum. If we can't make high order page by this effort, both
scanner should meet and compaction will fail.

We can easily observe problem 1) by following test.

Memory is artificially fragmented to make order 3 allocation hard. And,
most of pageblocks are changed to movable migratetype.

  System: 512 MB with 32 MB Zram
  Memory: 25% memory is allocated to make fragmentation and 200 MB is
  	occupied by memory hogger. Most pageblocks are movable
  	migratetype.
  Fragmentation: Successful order 3 allocation candidates may be around
  	1500 roughly.
  Allocation attempts: Roughly 3000 order 3 allocation attempts
  	with GFP_NORETRY. This value is determined to saturate allocation
  	success.

Test: hogger-frag-movable

Success(N)                    70
compact_stall                307
compact_success               64
compact_fail                 243
pgmigrate_success          34592
compact_isolated           73977
compact_migrate_scanned  2280770
compact_free_scanned     4710313

Column 'Success(N) are calculated by following equations.

Success(N) = successful allocation * 100 /
		number of successful order-3 candidates

As mentioned above, there are roughly 1500 high order page candidates,
but, compaction just returns 70% of them, because migration scanner
can't pass over freepage scanner. With new compaction approach, it can
be increased to 94% by this patch.

To check 2), hogger-frag-movable benchmark is used again, but, with some
tweaks. Amount of allocated memory by memory hogger varys.

Test: hogger-frag-movable with free memory variation

Kernel:	Base

200MB-Success(N)	70
250MB-Success(N)	38
300MB-Success(N)	29

As background knowledge, up to 250MB, there is enough
memory to succeed all order-3 allocation attempts. In 300MB case,
available memory before starting allocation attempt is just 57MB,
so all of attempts cannot succeed.

Anyway, as free memory decreases, compaction success rate also decreases.
It is better to remove this dependency to get stable compaction result
in any case.

This patch solves these problems mentioned in above.
Freepage scanner is largely changed to scan zone from zone_start_pfn
to zone_end_pfn. And, by this change, compaction finish condition is also
changed that migration scanner reach zone_end_pfn. With these changes,
migration scanner can traverse anywhere in the zone.

To prevent back and forth migration within one compaction iteration,
freepage scanner marks skip-bit when scanning pageblock. migration scanner
checks it and will skip this marked pageblock so back and forth migration
cannot be possible in one compaction iteration.

If freepage scanner reachs the end of zone, it restarts at zone_start_pfn.
In this time, freepage scanner would scan the pageblock where migration
scanner has migrated some pages but fail to make high order page. This
leaved freepages means that they can't become high order page due to
the fragmentation so it is good source for freepage scanner.

With this change, above test result is:

Kernel:	Base vs Redesign

Test: hogger-frag-movable

Success(N)                    70              94
compact_stall                307            3642
compact_success               64             144
compact_fail                 243            3498
pgmigrate_success          34592        15897219
compact_isolated           73977        31899553
compact_migrate_scanned  2280770        59146745
compact_free_scanned     4710313        49566134

Test: hogger-frag-movable with free memory variation

200MB-Success(N)	70	94
250MB-Success(N)	38	93
300MB-Success(N)	29	89

Compaction gives us almost all possible high order page. Overhead is
highly increased, but, further patch will reduce it greatly
by adjusting depletion check with this new algorithm.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/compaction.c | 145 ++++++++++++++++++++++++++++++--------------------------
 1 file changed, 77 insertions(+), 68 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a259608..ca4d6d1 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -53,17 +53,17 @@ static const char *const compaction_status_string[] = {
 static unsigned long release_freepages(struct list_head *freelist)
 {
 	struct page *page, *next;
-	unsigned long high_pfn = 0;
+	unsigned long low_pfn = ULONG_MAX;
 
 	list_for_each_entry_safe(page, next, freelist, lru) {
 		unsigned long pfn = page_to_pfn(page);
 		list_del(&page->lru);
 		__free_page(page);
-		if (pfn > high_pfn)
-			high_pfn = pfn;
+		if (pfn < low_pfn)
+			low_pfn = pfn;
 	}
 
-	return high_pfn;
+	return low_pfn;
 }
 
 static void map_pages(struct list_head *list)
@@ -243,7 +243,7 @@ static void __reset_isolation_suitable(struct zone *zone)
 
 	zone->compact_cached_migrate_pfn[0] = start_pfn;
 	zone->compact_cached_migrate_pfn[1] = start_pfn;
-	zone->compact_cached_free_pfn = end_pfn;
+	zone->compact_cached_free_pfn = start_pfn;
 	zone->compact_blockskip_flush = false;
 
 	clear_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags);
@@ -335,7 +335,7 @@ static void update_pageblock_skip(struct compact_control *cc,
 		if (!nr_isolated)
 			set_pageblock_skip_freescan(page);
 
-		if (pfn < zone->compact_cached_free_pfn)
+		if (pfn > zone->compact_cached_free_pfn)
 			zone->compact_cached_free_pfn = pfn;
 	}
 }
@@ -869,8 +869,11 @@ isolate_success:
 						nr_scanned, nr_isolated);
 
 	count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
-	if (nr_isolated)
+	if (nr_isolated) {
 		count_compact_events(COMPACTISOLATED, nr_isolated);
+		if (valid_page)
+			clear_pageblock_skip_freescan(valid_page);
+	}
 
 	return low_pfn;
 }
@@ -969,12 +972,14 @@ static void isolate_freepages(struct compact_control *cc)
 {
 	struct zone *zone = cc->zone;
 	struct page *page;
+	unsigned long pfn;
 	unsigned long block_start_pfn;	/* start of current pageblock */
-	unsigned long isolate_start_pfn; /* exact pfn we start at */
 	unsigned long block_end_pfn;	/* end of current pageblock */
-	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
 	struct list_head *freelist = &cc->freepages;
+	unsigned long nr_isolated;
+	bool rotated = false;
 
+retry:
 	/*
 	 * Initialise the free scanner. The starting point is where we last
 	 * successfully isolated from, zone-cached value, or the end of the
@@ -986,22 +991,21 @@ static void isolate_freepages(struct compact_control *cc)
 	 * The low boundary is the end of the pageblock the migration scanner
 	 * is using.
 	 */
-	isolate_start_pfn = cc->free_pfn;
-	block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
-	block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
-						zone_end_pfn(zone));
-	low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
+	pfn = cc->free_pfn;
 
-	/*
-	 * Isolate free pages until enough are available to migrate the
-	 * pages on cc->migratepages. We stop searching if the migrate
-	 * and free page scanners meet or enough free pages are isolated.
-	 */
-	for (; block_start_pfn >= low_pfn &&
-			cc->nr_migratepages > cc->nr_freepages;
-				block_end_pfn = block_start_pfn,
-				block_start_pfn -= pageblock_nr_pages,
-				isolate_start_pfn = block_start_pfn) {
+	for (; pfn < zone_end_pfn(zone) &&
+		cc->nr_migratepages > cc->nr_freepages;) {
+
+		block_start_pfn = pfn & ~(pageblock_nr_pages-1);
+		block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
+
+		/* Skip the pageblock where migration scan is */
+		if (block_start_pfn ==
+			(cc->migrate_pfn & ~(pageblock_nr_pages-1))) {
+			pfn = block_end_pfn;
+			continue;
+		}
 
 		/*
 		 * This can iterate a massively long zone without finding any
@@ -1012,35 +1016,31 @@ static void isolate_freepages(struct compact_control *cc)
 						&& compact_should_abort(cc))
 			break;
 
-		page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
-									zone);
-		if (!page)
+		page = pageblock_pfn_to_page(pfn, block_end_pfn, zone);
+		if (!page) {
+			pfn = block_end_pfn;
 			continue;
+		}
 
 		/* Check the block is suitable for migration */
-		if (!suitable_migration_target(cc, page))
+		if (!suitable_migration_target(cc, page)) {
+			pfn = block_end_pfn;
 			continue;
+		}
 
 		/* If isolation recently failed, do not retry */
-		if (!isolation_suitable(cc, page, false))
+		if (!isolation_suitable(cc, page, false)) {
+			pfn = block_end_pfn;
 			continue;
+		}
 
 		/* Found a block suitable for isolating free pages from. */
-		isolate_freepages_block(cc, &isolate_start_pfn,
+		nr_isolated = isolate_freepages_block(cc, &pfn,
 					block_end_pfn, freelist, false);
 
-		/*
-		 * Remember where the free scanner should restart next time,
-		 * which is where isolate_freepages_block() left off.
-		 * But if it scanned the whole pageblock, isolate_start_pfn
-		 * now points at block_end_pfn, which is the start of the next
-		 * pageblock.
-		 * In that case we will however want to restart at the start
-		 * of the previous pageblock.
-		 */
-		cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
-				isolate_start_pfn :
-				block_start_pfn - pageblock_nr_pages;
+		/* To prevent back and forth migration */
+		if (nr_isolated)
+			set_pageblock_skip_migratescan(page);
 
 		/*
 		 * isolate_freepages_block() might have aborted due to async
@@ -1053,12 +1053,15 @@ static void isolate_freepages(struct compact_control *cc)
 	/* split_free_page does not map the pages */
 	map_pages(freelist);
 
-	/*
-	 * If we crossed the migrate scanner, we want to keep it that way
-	 * so that compact_finished() may detect this
-	 */
-	if (block_start_pfn < low_pfn)
-		cc->free_pfn = cc->migrate_pfn;
+	cc->free_pfn = pfn;
+	if (cc->free_pfn >= zone_end_pfn(zone)) {
+		cc->free_pfn = zone->zone_start_pfn;
+		zone->compact_cached_free_pfn = cc->free_pfn;
+		if (cc->nr_freepages == 0 && !rotated) {
+			rotated = true;
+			goto retry;
+		}
+	}
 }
 
 /*
@@ -1144,8 +1147,9 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	 * Iterate over whole pageblocks until we find the first suitable.
 	 * Do not cross the free scanner.
 	 */
-	for (; end_pfn <= cc->free_pfn;
-			low_pfn = end_pfn, end_pfn += pageblock_nr_pages) {
+	for (; low_pfn < zone_end_pfn(zone); low_pfn = end_pfn) {
+		end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
+		end_pfn = min(end_pfn, zone_end_pfn(zone));
 
 		/*
 		 * This can potentially iterate a massively long zone with
@@ -1191,12 +1195,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	}
 
 	acct_isolated(zone, cc);
-	/*
-	 * Record where migration scanner will be restarted. If we end up in
-	 * the same pageblock as the free scanner, make the scanners fully
-	 * meet so that compact_finished() terminates compaction.
-	 */
-	cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
+	cc->migrate_pfn = low_pfn;
 
 	return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
 }
@@ -1211,11 +1210,15 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
 		return COMPACT_PARTIAL;
 
 	/* Compaction run completes if the migrate and free scanner meet */
-	if (cc->free_pfn <= cc->migrate_pfn) {
+	if (cc->migrate_pfn >= zone_end_pfn(zone)) {
+		/* Stop the async compaction */
+		zone->compact_cached_migrate_pfn[0] = zone_end_pfn(zone);
+		if (cc->mode == MIGRATE_ASYNC)
+			return COMPACT_PARTIAL;
+
 		/* Let the next compaction start anew. */
 		zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
 		zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
-		zone->compact_cached_free_pfn = zone_end_pfn(zone);
 
 		/*
 		 * Mark that the PG_migrate_skip information should be cleared
@@ -1397,11 +1400,14 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 	 */
 	cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
 	cc->free_pfn = zone->compact_cached_free_pfn;
-	if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
-		cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
+	if (cc->mode == MIGRATE_ASYNC && cc->migrate_pfn >= end_pfn)
+		return COMPACT_SKIPPED;
+
+	if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
+		cc->free_pfn = start_pfn;
 		zone->compact_cached_free_pfn = cc->free_pfn;
 	}
-	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
+	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
 		cc->migrate_pfn = start_pfn;
 		zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
 		zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
@@ -1450,11 +1456,16 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		if (err) {
 			putback_movable_pages(&cc->migratepages);
 			/*
-			 * migrate_pages() may return -ENOMEM when scanners meet
-			 * and we want compact_finished() to detect it
+			 * migrate_pages() may return -ENOMEM when free scanner
+			 * doesn't find any freepage, and, in this case, more
+			 * compaction attempt is useless because it cannot
+			 * be successful due to no free memory. We'd like to
+			 * consider this state as COMPACT_COMPLETE although
+			 * migration scanner doesn't reach end_pfn of zone and
+			 * fail to compaction.
 			 */
-			if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
-				ret = COMPACT_PARTIAL;
+			if (err == -ENOMEM) {
+				ret = COMPACT_COMPLETE;
 				goto out;
 			}
 		}
@@ -1504,13 +1515,11 @@ out:
 
 		cc->nr_freepages = 0;
 		VM_BUG_ON(free_pfn == 0);
-		/* The cached pfn is always the first in a pageblock */
-		free_pfn &= ~(pageblock_nr_pages-1);
 		/*
 		 * Only go back, not forward. The cached pfn might have been
 		 * already reset to zone end in compact_finished()
 		 */
-		if (free_pfn > zone->compact_cached_free_pfn)
+		if (free_pfn < zone->compact_cached_free_pfn)
 			zone->compact_cached_free_pfn = free_pfn;
 	}
 
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/9] mm/compaction: redesign compaction
  2015-08-24  2:19 ` [PATCH v2 7/9] mm/compaction: redesign compaction Joonsoo Kim
@ 2015-10-14 11:38   ` Vlastimil Babka
  2015-10-15  2:38     ` Joonsoo Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-10-14 11:38 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
	Minchan Kim, Joonsoo Kim

On 08/24/2015 04:19 AM, Joonsoo Kim wrote:

[...]

> This patch solves these problems mentioned in above.
> Freepage scanner is largely changed to scan zone from zone_start_pfn
> to zone_end_pfn. And, by this change, compaction finish condition is also
> changed that migration scanner reach zone_end_pfn. With these changes,
> migration scanner can traverse anywhere in the zone.
> 
> To prevent back and forth migration within one compaction iteration,
> freepage scanner marks skip-bit when scanning pageblock. migration scanner
> checks it and will skip this marked pageblock so back and forth migration
> cannot be possible in one compaction iteration.
> 
> If freepage scanner reachs the end of zone, it restarts at zone_start_pfn.
> In this time, freepage scanner would scan the pageblock where migration
> scanner has migrated some pages but fail to make high order page. This
> leaved freepages means that they can't become high order page due to
> the fragmentation so it is good source for freepage scanner.

Hmm, there might be danger that freepage scanner will proceed faster than
migration scanner, as evidenced by the current code where the scanners meet
earlier than in the middle. Thus, it would likely mark pageblocks as unsuitable
for migration scanner, before it can reach them?

> With this change, above test result is:
> 
> Kernel:	Base vs Redesign
> 
> Test: hogger-frag-movable
> 
> Success(N)                    70              94
> compact_stall                307            3642
> compact_success               64             144
> compact_fail                 243            3498
> pgmigrate_success          34592        15897219
> compact_isolated           73977        31899553
> compact_migrate_scanned  2280770        59146745
> compact_free_scanned     4710313        49566134
> 
> Test: hogger-frag-movable with free memory variation
> 
> 200MB-Success(N)	70	94
> 250MB-Success(N)	38	93
> 300MB-Success(N)	29	89
> 
> Compaction gives us almost all possible high order page. Overhead is
> highly increased, but, further patch will reduce it greatly
> by adjusting depletion check with this new algorithm.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/compaction.c | 145 ++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 77 insertions(+), 68 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index a259608..ca4d6d1 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -53,17 +53,17 @@ static const char *const compaction_status_string[] = {
>  static unsigned long release_freepages(struct list_head *freelist)
>  {
>  	struct page *page, *next;
> -	unsigned long high_pfn = 0;
> +	unsigned long low_pfn = ULONG_MAX;
>  
>  	list_for_each_entry_safe(page, next, freelist, lru) {
>  		unsigned long pfn = page_to_pfn(page);
>  		list_del(&page->lru);
>  		__free_page(page);
> -		if (pfn > high_pfn)
> -			high_pfn = pfn;
> +		if (pfn < low_pfn)
> +			low_pfn = pfn;
>  	}
>  
> -	return high_pfn;
> +	return low_pfn;

This assumes that the freelist won't contain both pages from the end of the
zone, and from the beginning the zone after the free scanner has wrapped. Which
is true, but not immediately obvious, so it deserves some comment.

>  }
>  
>  static void map_pages(struct list_head *list)
> @@ -243,7 +243,7 @@ static void __reset_isolation_suitable(struct zone *zone)
>  
>  	zone->compact_cached_migrate_pfn[0] = start_pfn;
>  	zone->compact_cached_migrate_pfn[1] = start_pfn;
> -	zone->compact_cached_free_pfn = end_pfn;
> +	zone->compact_cached_free_pfn = start_pfn;
>  	zone->compact_blockskip_flush = false;
>  
>  	clear_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags);
> @@ -335,7 +335,7 @@ static void update_pageblock_skip(struct compact_control *cc,
>  		if (!nr_isolated)
>  			set_pageblock_skip_freescan(page);
>  
> -		if (pfn < zone->compact_cached_free_pfn)
> +		if (pfn > zone->compact_cached_free_pfn)
>  			zone->compact_cached_free_pfn = pfn;
>  	}
>  }
> @@ -869,8 +869,11 @@ isolate_success:
>  						nr_scanned, nr_isolated);
>  
>  	count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
> -	if (nr_isolated)
> +	if (nr_isolated) {
>  		count_compact_events(COMPACTISOLATED, nr_isolated);
> +		if (valid_page)
> +			clear_pageblock_skip_freescan(valid_page);

This is explained in changelog, but a comment here would be useful too.

> +	}
>  
>  	return low_pfn;
>  }
> @@ -969,12 +972,14 @@ static void isolate_freepages(struct compact_control *cc)
>  {
>  	struct zone *zone = cc->zone;
>  	struct page *page;
> +	unsigned long pfn;
>  	unsigned long block_start_pfn;	/* start of current pageblock */
> -	unsigned long isolate_start_pfn; /* exact pfn we start at */
>  	unsigned long block_end_pfn;	/* end of current pageblock */
> -	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
>  	struct list_head *freelist = &cc->freepages;
> +	unsigned long nr_isolated;
> +	bool rotated = false;
>  
> +retry:
>  	/*
>  	 * Initialise the free scanner. The starting point is where we last
>  	 * successfully isolated from, zone-cached value, or the end of the
> @@ -986,22 +991,21 @@ static void isolate_freepages(struct compact_control *cc)
>  	 * The low boundary is the end of the pageblock the migration scanner
>  	 * is using.
>  	 */
> -	isolate_start_pfn = cc->free_pfn;
> -	block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
> -	block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
> -						zone_end_pfn(zone));
> -	low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
> +	pfn = cc->free_pfn;
>  
> -	/*
> -	 * Isolate free pages until enough are available to migrate the
> -	 * pages on cc->migratepages. We stop searching if the migrate
> -	 * and free page scanners meet or enough free pages are isolated.
> -	 */
> -	for (; block_start_pfn >= low_pfn &&
> -			cc->nr_migratepages > cc->nr_freepages;
> -				block_end_pfn = block_start_pfn,
> -				block_start_pfn -= pageblock_nr_pages,
> -				isolate_start_pfn = block_start_pfn) {
> +	for (; pfn < zone_end_pfn(zone) &&
> +		cc->nr_migratepages > cc->nr_freepages;) {
> +
> +		block_start_pfn = pfn & ~(pageblock_nr_pages-1);
> +		block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
> +		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> +
> +		/* Skip the pageblock where migration scan is */
> +		if (block_start_pfn ==
> +			(cc->migrate_pfn & ~(pageblock_nr_pages-1))) {
> +			pfn = block_end_pfn;

Can't this "pfn = block_end_pfn" be in the for-loop header? Can it ever happen
that all of the following is true? I don't think it can?
- isolate_freepages_block() returns before reaching block_end_pfn
- it hasn't isolated enough, so cc->nr_migratepages > cc->nr_freepages is true
- cc->contended is not set

> +			continue;
> +		}
>  
>  		/*
>  		 * This can iterate a massively long zone without finding any
> @@ -1012,35 +1016,31 @@ static void isolate_freepages(struct compact_control *cc)
>  						&& compact_should_abort(cc))
>  			break;
>  
> -		page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
> -									zone);
> -		if (!page)
> +		page = pageblock_pfn_to_page(pfn, block_end_pfn, zone);
> +		if (!page) {
> +			pfn = block_end_pfn;
>  			continue;
> +		}
>  
>  		/* Check the block is suitable for migration */
> -		if (!suitable_migration_target(cc, page))
> +		if (!suitable_migration_target(cc, page)) {
> +			pfn = block_end_pfn;
>  			continue;
> +		}
>  
>  		/* If isolation recently failed, do not retry */
> -		if (!isolation_suitable(cc, page, false))
> +		if (!isolation_suitable(cc, page, false)) {
> +			pfn = block_end_pfn;
>  			continue;
> +		}
>  
>  		/* Found a block suitable for isolating free pages from. */
> -		isolate_freepages_block(cc, &isolate_start_pfn,
> +		nr_isolated = isolate_freepages_block(cc, &pfn,
>  					block_end_pfn, freelist, false);
>  
> -		/*
> -		 * Remember where the free scanner should restart next time,
> -		 * which is where isolate_freepages_block() left off.
> -		 * But if it scanned the whole pageblock, isolate_start_pfn
> -		 * now points at block_end_pfn, which is the start of the next
> -		 * pageblock.
> -		 * In that case we will however want to restart at the start
> -		 * of the previous pageblock.
> -		 */
> -		cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
> -				isolate_start_pfn :
> -				block_start_pfn - pageblock_nr_pages;
> +		/* To prevent back and forth migration */
> +		if (nr_isolated)
> +			set_pageblock_skip_migratescan(page);
>  
>  		/*
>  		 * isolate_freepages_block() might have aborted due to async
> @@ -1053,12 +1053,15 @@ static void isolate_freepages(struct compact_control *cc)
>  	/* split_free_page does not map the pages */
>  	map_pages(freelist);
>  
> -	/*
> -	 * If we crossed the migrate scanner, we want to keep it that way
> -	 * so that compact_finished() may detect this
> -	 */
> -	if (block_start_pfn < low_pfn)
> -		cc->free_pfn = cc->migrate_pfn;
> +	cc->free_pfn = pfn;
> +	if (cc->free_pfn >= zone_end_pfn(zone)) {
> +		cc->free_pfn = zone->zone_start_pfn;
> +		zone->compact_cached_free_pfn = cc->free_pfn;
> +		if (cc->nr_freepages == 0 && !rotated) {
> +			rotated = true;
> +			goto retry;

This too deserves explanation. If we isolated at least something, we can return
and not "rotate" (wrap) to the zone start immediately. But if we isolated
nothing, we need to continue immediately, otherwise migration fails on -ENOMEM
and so will the whole compaction. And the fact, that if we isolated something,
we return immediately, has implications for release_freepages() being correct,
as I noted above.

> +		}
> +	}
>  }
>  
>  /*
> @@ -1144,8 +1147,9 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	 * Iterate over whole pageblocks until we find the first suitable.
>  	 * Do not cross the free scanner.
>  	 */
> -	for (; end_pfn <= cc->free_pfn;
> -			low_pfn = end_pfn, end_pfn += pageblock_nr_pages) {
> +	for (; low_pfn < zone_end_pfn(zone); low_pfn = end_pfn) {
> +		end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
> +		end_pfn = min(end_pfn, zone_end_pfn(zone));
>  
>  		/*
>  		 * This can potentially iterate a massively long zone with
> @@ -1191,12 +1195,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	}
>  
>  	acct_isolated(zone, cc);
> -	/*
> -	 * Record where migration scanner will be restarted. If we end up in
> -	 * the same pageblock as the free scanner, make the scanners fully
> -	 * meet so that compact_finished() terminates compaction.
> -	 */
> -	cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
> +	cc->migrate_pfn = low_pfn;
>  
>  	return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
>  }
> @@ -1211,11 +1210,15 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
>  		return COMPACT_PARTIAL;
>  
>  	/* Compaction run completes if the migrate and free scanner meet */

This is no longer true :)

> -	if (cc->free_pfn <= cc->migrate_pfn) {
> +	if (cc->migrate_pfn >= zone_end_pfn(zone)) {
> +		/* Stop the async compaction */
> +		zone->compact_cached_migrate_pfn[0] = zone_end_pfn(zone);
> +		if (cc->mode == MIGRATE_ASYNC)
> +			return COMPACT_PARTIAL;

This is not explained in changelog? And I don't immediately see why this is needed?

> +
>  		/* Let the next compaction start anew. */
>  		zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
>  		zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> -		zone->compact_cached_free_pfn = zone_end_pfn(zone);
>  
>  		/*
>  		 * Mark that the PG_migrate_skip information should be cleared
> @@ -1397,11 +1400,14 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>  	 */
>  	cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
>  	cc->free_pfn = zone->compact_cached_free_pfn;
> -	if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
> -		cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
> +	if (cc->mode == MIGRATE_ASYNC && cc->migrate_pfn >= end_pfn)
> +		return COMPACT_SKIPPED;

Ditto. Is this needed for correctness, or another optimization that should be
separate patch?

> +
> +	if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
> +		cc->free_pfn = start_pfn;
>  		zone->compact_cached_free_pfn = cc->free_pfn;
>  	}
> -	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
> +	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
>  		cc->migrate_pfn = start_pfn;
>  		zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
>  		zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
> @@ -1450,11 +1456,16 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>  		if (err) {
>  			putback_movable_pages(&cc->migratepages);
>  			/*
> -			 * migrate_pages() may return -ENOMEM when scanners meet
> -			 * and we want compact_finished() to detect it
> +			 * migrate_pages() may return -ENOMEM when free scanner
> +			 * doesn't find any freepage, and, in this case, more
> +			 * compaction attempt is useless because it cannot
> +			 * be successful due to no free memory. We'd like to
> +			 * consider this state as COMPACT_COMPLETE although
> +			 * migration scanner doesn't reach end_pfn of zone and
> +			 * fail to compaction.

Hmm is there a danger of frequent completes like these leading to premature
cached pfn resets, so migration scanner will again not reach the whole zone?

>  			 */
> -			if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
> -				ret = COMPACT_PARTIAL;
> +			if (err == -ENOMEM) {
> +				ret = COMPACT_COMPLETE;
>  				goto out;
>  			}
>  		}
> @@ -1504,13 +1515,11 @@ out:
>  
>  		cc->nr_freepages = 0;
>  		VM_BUG_ON(free_pfn == 0);
> -		/* The cached pfn is always the first in a pageblock */
> -		free_pfn &= ~(pageblock_nr_pages-1);
>  		/*
>  		 * Only go back, not forward. The cached pfn might have been
>  		 * already reset to zone end in compact_finished()
>  		 */
> -		if (free_pfn > zone->compact_cached_free_pfn)
> +		if (free_pfn < zone->compact_cached_free_pfn)
>  			zone->compact_cached_free_pfn = free_pfn;
>  	}
>  
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/9] mm/compaction: redesign compaction
  2015-10-14 11:38   ` Vlastimil Babka
@ 2015-10-15  2:38     ` Joonsoo Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-10-15  2:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
	David Rientjes, Minchan Kim

On Wed, Oct 14, 2015 at 01:38:07PM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> 
> [...]
> 
> > This patch solves these problems mentioned in above.
> > Freepage scanner is largely changed to scan zone from zone_start_pfn
> > to zone_end_pfn. And, by this change, compaction finish condition is also
> > changed that migration scanner reach zone_end_pfn. With these changes,
> > migration scanner can traverse anywhere in the zone.
> > 
> > To prevent back and forth migration within one compaction iteration,
> > freepage scanner marks skip-bit when scanning pageblock. migration scanner
> > checks it and will skip this marked pageblock so back and forth migration
> > cannot be possible in one compaction iteration.
> > 
> > If freepage scanner reachs the end of zone, it restarts at zone_start_pfn.
> > In this time, freepage scanner would scan the pageblock where migration
> > scanner has migrated some pages but fail to make high order page. This
> > leaved freepages means that they can't become high order page due to
> > the fragmentation so it is good source for freepage scanner.
> 
> Hmm, there might be danger that freepage scanner will proceed faster than
> migration scanner, as evidenced by the current code where the scanners meet
> earlier than in the middle. Thus, it would likely mark pageblocks as unsuitable
> for migration scanner, before it can reach them?

That's right.

> 
> > With this change, above test result is:
> > 
> > Kernel:	Base vs Redesign
> > 
> > Test: hogger-frag-movable
> > 
> > Success(N)                    70              94
> > compact_stall                307            3642
> > compact_success               64             144
> > compact_fail                 243            3498
> > pgmigrate_success          34592        15897219
> > compact_isolated           73977        31899553
> > compact_migrate_scanned  2280770        59146745
> > compact_free_scanned     4710313        49566134
> > 
> > Test: hogger-frag-movable with free memory variation
> > 
> > 200MB-Success(N)	70	94
> > 250MB-Success(N)	38	93
> > 300MB-Success(N)	29	89
> > 
> > Compaction gives us almost all possible high order page. Overhead is
> > highly increased, but, further patch will reduce it greatly
> > by adjusting depletion check with this new algorithm.
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  mm/compaction.c | 145 ++++++++++++++++++++++++++++++--------------------------
> >  1 file changed, 77 insertions(+), 68 deletions(-)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index a259608..ca4d6d1 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -53,17 +53,17 @@ static const char *const compaction_status_string[] = {
> >  static unsigned long release_freepages(struct list_head *freelist)
> >  {
> >  	struct page *page, *next;
> > -	unsigned long high_pfn = 0;
> > +	unsigned long low_pfn = ULONG_MAX;
> >  
> >  	list_for_each_entry_safe(page, next, freelist, lru) {
> >  		unsigned long pfn = page_to_pfn(page);
> >  		list_del(&page->lru);
> >  		__free_page(page);
> > -		if (pfn > high_pfn)
> > -			high_pfn = pfn;
> > +		if (pfn < low_pfn)
> > +			low_pfn = pfn;
> >  	}
> >  
> > -	return high_pfn;
> > +	return low_pfn;
> 
> This assumes that the freelist won't contain both pages from the end of the
> zone, and from the beginning the zone after the free scanner has wrapped. Which
> is true, but not immediately obvious, so it deserves some comment.

Okay. Will add it.

> >  }
> >  
> >  static void map_pages(struct list_head *list)
> > @@ -243,7 +243,7 @@ static void __reset_isolation_suitable(struct zone *zone)
> >  
> >  	zone->compact_cached_migrate_pfn[0] = start_pfn;
> >  	zone->compact_cached_migrate_pfn[1] = start_pfn;
> > -	zone->compact_cached_free_pfn = end_pfn;
> > +	zone->compact_cached_free_pfn = start_pfn;
> >  	zone->compact_blockskip_flush = false;
> >  
> >  	clear_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags);
> > @@ -335,7 +335,7 @@ static void update_pageblock_skip(struct compact_control *cc,
> >  		if (!nr_isolated)
> >  			set_pageblock_skip_freescan(page);
> >  
> > -		if (pfn < zone->compact_cached_free_pfn)
> > +		if (pfn > zone->compact_cached_free_pfn)
> >  			zone->compact_cached_free_pfn = pfn;
> >  	}
> >  }
> > @@ -869,8 +869,11 @@ isolate_success:
> >  						nr_scanned, nr_isolated);
> >  
> >  	count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
> > -	if (nr_isolated)
> > +	if (nr_isolated) {
> >  		count_compact_events(COMPACTISOLATED, nr_isolated);
> > +		if (valid_page)
> > +			clear_pageblock_skip_freescan(valid_page);
> 
> This is explained in changelog, but a comment here would be useful too.

Okay.

> > +	}
> >  
> >  	return low_pfn;
> >  }
> > @@ -969,12 +972,14 @@ static void isolate_freepages(struct compact_control *cc)
> >  {
> >  	struct zone *zone = cc->zone;
> >  	struct page *page;
> > +	unsigned long pfn;
> >  	unsigned long block_start_pfn;	/* start of current pageblock */
> > -	unsigned long isolate_start_pfn; /* exact pfn we start at */
> >  	unsigned long block_end_pfn;	/* end of current pageblock */
> > -	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
> >  	struct list_head *freelist = &cc->freepages;
> > +	unsigned long nr_isolated;
> > +	bool rotated = false;
> >  
> > +retry:
> >  	/*
> >  	 * Initialise the free scanner. The starting point is where we last
> >  	 * successfully isolated from, zone-cached value, or the end of the
> > @@ -986,22 +991,21 @@ static void isolate_freepages(struct compact_control *cc)
> >  	 * The low boundary is the end of the pageblock the migration scanner
> >  	 * is using.
> >  	 */
> > -	isolate_start_pfn = cc->free_pfn;
> > -	block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
> > -	block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
> > -						zone_end_pfn(zone));
> > -	low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
> > +	pfn = cc->free_pfn;
> >  
> > -	/*
> > -	 * Isolate free pages until enough are available to migrate the
> > -	 * pages on cc->migratepages. We stop searching if the migrate
> > -	 * and free page scanners meet or enough free pages are isolated.
> > -	 */
> > -	for (; block_start_pfn >= low_pfn &&
> > -			cc->nr_migratepages > cc->nr_freepages;
> > -				block_end_pfn = block_start_pfn,
> > -				block_start_pfn -= pageblock_nr_pages,
> > -				isolate_start_pfn = block_start_pfn) {
> > +	for (; pfn < zone_end_pfn(zone) &&
> > +		cc->nr_migratepages > cc->nr_freepages;) {
> > +
> > +		block_start_pfn = pfn & ~(pageblock_nr_pages-1);
> > +		block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
> > +		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> > +
> > +		/* Skip the pageblock where migration scan is */
> > +		if (block_start_pfn ==
> > +			(cc->migrate_pfn & ~(pageblock_nr_pages-1))) {
> > +			pfn = block_end_pfn;
> 
> Can't this "pfn = block_end_pfn" be in the for-loop header? Can it ever happen

You mean following thing?
for (; pfn < zone_end_pfn(zone) && cc->nr_migratepages >
        cc->nr_freepages; pfn = block_end_pfn)

It is possible. Will do it.

> that all of the following is true? I don't think it can?
> - isolate_freepages_block() returns before reaching block_end_pfn
> - it hasn't isolated enough, so cc->nr_migratepages > cc->nr_freepages is true
> - cc->contended is not set

Do you want to move out this check to out of the loop? It isn't
possible because freepage scanner could skip some pageblock in the loop.
If you mean other thing, please explain more.

> > +			continue;
> > +		}
> >  
> >  		/*
> >  		 * This can iterate a massively long zone without finding any
> > @@ -1012,35 +1016,31 @@ static void isolate_freepages(struct compact_control *cc)
> >  						&& compact_should_abort(cc))
> >  			break;
> >  
> > -		page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
> > -									zone);
> > -		if (!page)
> > +		page = pageblock_pfn_to_page(pfn, block_end_pfn, zone);
> > +		if (!page) {
> > +			pfn = block_end_pfn;
> >  			continue;
> > +		}
> >  
> >  		/* Check the block is suitable for migration */
> > -		if (!suitable_migration_target(cc, page))
> > +		if (!suitable_migration_target(cc, page)) {
> > +			pfn = block_end_pfn;
> >  			continue;
> > +		}
> >  
> >  		/* If isolation recently failed, do not retry */
> > -		if (!isolation_suitable(cc, page, false))
> > +		if (!isolation_suitable(cc, page, false)) {
> > +			pfn = block_end_pfn;
> >  			continue;
> > +		}
> >  
> >  		/* Found a block suitable for isolating free pages from. */
> > -		isolate_freepages_block(cc, &isolate_start_pfn,
> > +		nr_isolated = isolate_freepages_block(cc, &pfn,
> >  					block_end_pfn, freelist, false);
> >  
> > -		/*
> > -		 * Remember where the free scanner should restart next time,
> > -		 * which is where isolate_freepages_block() left off.
> > -		 * But if it scanned the whole pageblock, isolate_start_pfn
> > -		 * now points at block_end_pfn, which is the start of the next
> > -		 * pageblock.
> > -		 * In that case we will however want to restart at the start
> > -		 * of the previous pageblock.
> > -		 */
> > -		cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
> > -				isolate_start_pfn :
> > -				block_start_pfn - pageblock_nr_pages;
> > +		/* To prevent back and forth migration */
> > +		if (nr_isolated)
> > +			set_pageblock_skip_migratescan(page);
> >  
> >  		/*
> >  		 * isolate_freepages_block() might have aborted due to async
> > @@ -1053,12 +1053,15 @@ static void isolate_freepages(struct compact_control *cc)
> >  	/* split_free_page does not map the pages */
> >  	map_pages(freelist);
> >  
> > -	/*
> > -	 * If we crossed the migrate scanner, we want to keep it that way
> > -	 * so that compact_finished() may detect this
> > -	 */
> > -	if (block_start_pfn < low_pfn)
> > -		cc->free_pfn = cc->migrate_pfn;
> > +	cc->free_pfn = pfn;
> > +	if (cc->free_pfn >= zone_end_pfn(zone)) {
> > +		cc->free_pfn = zone->zone_start_pfn;
> > +		zone->compact_cached_free_pfn = cc->free_pfn;
> > +		if (cc->nr_freepages == 0 && !rotated) {
> > +			rotated = true;
> > +			goto retry;
> 
> This too deserves explanation. If we isolated at least something, we can return
> and not "rotate" (wrap) to the zone start immediately. But if we isolated
> nothing, we need to continue immediately, otherwise migration fails on -ENOMEM
> and so will the whole compaction. And the fact, that if we isolated something,
> we return immediately, has implications for release_freepages() being correct,
> as I noted above.

Yes. you're right. I will add some comment.

> > +		}
> > +	}
> >  }
> >  
> >  /*
> > @@ -1144,8 +1147,9 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> >  	 * Iterate over whole pageblocks until we find the first suitable.
> >  	 * Do not cross the free scanner.
> >  	 */
> > -	for (; end_pfn <= cc->free_pfn;
> > -			low_pfn = end_pfn, end_pfn += pageblock_nr_pages) {
> > +	for (; low_pfn < zone_end_pfn(zone); low_pfn = end_pfn) {
> > +		end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
> > +		end_pfn = min(end_pfn, zone_end_pfn(zone));
> >  
> >  		/*
> >  		 * This can potentially iterate a massively long zone with
> > @@ -1191,12 +1195,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> >  	}
> >  
> >  	acct_isolated(zone, cc);
> > -	/*
> > -	 * Record where migration scanner will be restarted. If we end up in
> > -	 * the same pageblock as the free scanner, make the scanners fully
> > -	 * meet so that compact_finished() terminates compaction.
> > -	 */
> > -	cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
> > +	cc->migrate_pfn = low_pfn;
> >  
> >  	return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> >  }
> > @@ -1211,11 +1210,15 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
> >  		return COMPACT_PARTIAL;
> >  
> >  	/* Compaction run completes if the migrate and free scanner meet */
> 
> This is no longer true :)

Will remove.

> > -	if (cc->free_pfn <= cc->migrate_pfn) {
> > +	if (cc->migrate_pfn >= zone_end_pfn(zone)) {
> > +		/* Stop the async compaction */
> > +		zone->compact_cached_migrate_pfn[0] = zone_end_pfn(zone);
> > +		if (cc->mode == MIGRATE_ASYNC)
> > +			return COMPACT_PARTIAL;
> 
> This is not explained in changelog? And I don't immediately see why this is needed?

Yeah. I should explain this change. It's my mistake.
Purpose of this change is giving the opportunity that *sync* compaction
scans whole zone range. If async compaction reach the end of the zone and
compaction is complete and reset, compaction would not scan whole zone
in some cases like as many non-movable pageblock exists.

After async compaction reach the end of the zone, async compaction
will be skipped and only sync compaction will work. It prevents needless
overhead that async compaction scan the range where async compaction already
fail to make high-order page.

> > +
> >  		/* Let the next compaction start anew. */
> >  		zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> >  		zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> > -		zone->compact_cached_free_pfn = zone_end_pfn(zone);
> >  
> >  		/*
> >  		 * Mark that the PG_migrate_skip information should be cleared
> > @@ -1397,11 +1400,14 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> >  	 */
> >  	cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
> >  	cc->free_pfn = zone->compact_cached_free_pfn;
> > -	if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
> > -		cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
> > +	if (cc->mode == MIGRATE_ASYNC && cc->migrate_pfn >= end_pfn)
> > +		return COMPACT_SKIPPED;
> 
> Ditto. Is this needed for correctness, or another optimization that should be
> separate patch?

Explained above.

> > +
> > +	if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
> > +		cc->free_pfn = start_pfn;
> >  		zone->compact_cached_free_pfn = cc->free_pfn;
> >  	}
> > -	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
> > +	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
> >  		cc->migrate_pfn = start_pfn;
> >  		zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
> >  		zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
> > @@ -1450,11 +1456,16 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> >  		if (err) {
> >  			putback_movable_pages(&cc->migratepages);
> >  			/*
> > -			 * migrate_pages() may return -ENOMEM when scanners meet
> > -			 * and we want compact_finished() to detect it
> > +			 * migrate_pages() may return -ENOMEM when free scanner
> > +			 * doesn't find any freepage, and, in this case, more
> > +			 * compaction attempt is useless because it cannot
> > +			 * be successful due to no free memory. We'd like to
> > +			 * consider this state as COMPACT_COMPLETE although
> > +			 * migration scanner doesn't reach end_pfn of zone and
> > +			 * fail to compaction.
> 
> Hmm is there a danger of frequent completes like these leading to premature
> cached pfn resets, so migration scanner will again not reach the whole zone?

ENOMEM would happen when most of pageblocks are non-movable and freepage are
in non-movable pageblock. After some failure, freepage scanner will try to
scan non-movable pageblock so can escape this situation.

Thanks.

> >  			 */
> > -			if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
> > -				ret = COMPACT_PARTIAL;
> > +			if (err == -ENOMEM) {
> > +				ret = COMPACT_COMPLETE;
> >  				goto out;
> >  			}
> >  		}
> > @@ -1504,13 +1515,11 @@ out:
> >  
> >  		cc->nr_freepages = 0;
> >  		VM_BUG_ON(free_pfn == 0);
> > -		/* The cached pfn is always the first in a pageblock */
> > -		free_pfn &= ~(pageblock_nr_pages-1);
> >  		/*
> >  		 * Only go back, not forward. The cached pfn might have been
> >  		 * already reset to zone end in compact_finished()
> >  		 */
> > -		if (free_pfn > zone->compact_cached_free_pfn)
> > +		if (free_pfn < zone->compact_cached_free_pfn)
> >  			zone->compact_cached_free_pfn = free_pfn;
> >  	}
> >  
> > 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at
  2015-08-24  2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
                   ` (6 preceding siblings ...)
  2015-08-24  2:19 ` [PATCH v2 7/9] mm/compaction: redesign compaction Joonsoo Kim
@ 2015-08-24  2:19 ` Joonsoo Kim
  2015-09-25 11:41   ` Vlastimil Babka
  2015-08-24  2:19 ` [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone Joonsoo Kim
  8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24  2:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
	Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim

Purpose of compaction is to make high order page. To achive this purpose,
it is the best strategy that compaction migrates contiguous used pages
to fragmented unused freepages. Currently, freepage scanner don't
distinguish whether freepage is fragmented or not and blindly use
any freepage for migration target regardless of freepage's order.

Using higher order freepage than compaction aims at is not good because
what we do here is breaking high order freepage at somewhere and migrating
used pages from elsewhere to this broken high order freepages in order to
make new high order freepage. That is just position change of high order
freepage.

This is useless effort and doesn't help to make more high order freepages
because we can't be sure that migrating used pages makes high order
freepage. So, this patch makes freepage scanner only uses the ordered
freepage lower than compaction order.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/compaction.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index ca4d6d1..e61ee77 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -455,6 +455,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 	unsigned long flags = 0;
 	bool locked = false;
 	unsigned long blockpfn = *start_pfn;
+	unsigned long freepage_order;
 
 	cursor = pfn_to_page(blockpfn);
 
@@ -482,6 +483,20 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 		if (!PageBuddy(page))
 			goto isolate_fail;
 
+		if (!strict && cc->order != -1) {
+			freepage_order = page_order_unsafe(page);
+
+			if (freepage_order > 0 && freepage_order < MAX_ORDER) {
+				/*
+				 * Do not use high order freepage for migration
+				 * taret. It would not be beneficial for
+				 * compaction success rate.
+				 */
+				if (freepage_order >= cc->order)
+					goto isolate_fail;
+			}
+		}
+
 		/*
 		 * If we already hold the lock, we can skip some rechecking.
 		 * Note that if we hold the lock now, checked_pageblock was
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at
  2015-08-24  2:19 ` [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at Joonsoo Kim
@ 2015-09-25 11:41   ` Vlastimil Babka
  2015-09-30  8:38     ` Joonsoo Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-09-25 11:41 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
	Minchan Kim, Joonsoo Kim

On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> Purpose of compaction is to make high order page. To achive this purpose,
> it is the best strategy that compaction migrates contiguous used pages
> to fragmented unused freepages. Currently, freepage scanner don't
> distinguish whether freepage is fragmented or not and blindly use
> any freepage for migration target regardless of freepage's order.
> 
> Using higher order freepage than compaction aims at is not good because
> what we do here is breaking high order freepage at somewhere and migrating
> used pages from elsewhere to this broken high order freepages in order to
> make new high order freepage. That is just position change of high order
> freepage.
> 
> This is useless effort and doesn't help to make more high order freepages
> because we can't be sure that migrating used pages makes high order
> freepage. So, this patch makes freepage scanner only uses the ordered
> freepage lower than compaction order.

How often does this happen? If there's a free page of the order we need, then we
are done compacting anyway, no? Or is this happening because of the current
high-order watermark checking implementation? It would be interesting to measure
how often this skip would trigger. Also watermark checking should change with
Mel's patchset and then this patch shouldn't be needed?

> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/compaction.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ca4d6d1..e61ee77 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -455,6 +455,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  	unsigned long flags = 0;
>  	bool locked = false;
>  	unsigned long blockpfn = *start_pfn;
> +	unsigned long freepage_order;
>  
>  	cursor = pfn_to_page(blockpfn);
>  
> @@ -482,6 +483,20 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  		if (!PageBuddy(page))
>  			goto isolate_fail;
>  
> +		if (!strict && cc->order != -1) {
> +			freepage_order = page_order_unsafe(page);
> +
> +			if (freepage_order > 0 && freepage_order < MAX_ORDER) {
> +				/*
> +				 * Do not use high order freepage for migration
> +				 * taret. It would not be beneficial for
> +				 * compaction success rate.
> +				 */
> +				if (freepage_order >= cc->order)

It would be better to skip the whole freepage_order.

> +					goto isolate_fail;
> +			}
> +		}
> +
>  		/*
>  		 * If we already hold the lock, we can skip some rechecking.
>  		 * Note that if we hold the lock now, checked_pageblock was
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at
  2015-09-25 11:41   ` Vlastimil Babka
@ 2015-09-30  8:38     ` Joonsoo Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-09-30  8:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
	David Rientjes, Minchan Kim

On Fri, Sep 25, 2015 at 01:41:48PM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> > Purpose of compaction is to make high order page. To achive this purpose,
> > it is the best strategy that compaction migrates contiguous used pages
> > to fragmented unused freepages. Currently, freepage scanner don't
> > distinguish whether freepage is fragmented or not and blindly use
> > any freepage for migration target regardless of freepage's order.
> > 
> > Using higher order freepage than compaction aims at is not good because
> > what we do here is breaking high order freepage at somewhere and migrating
> > used pages from elsewhere to this broken high order freepages in order to
> > make new high order freepage. That is just position change of high order
> > freepage.
> > 
> > This is useless effort and doesn't help to make more high order freepages
> > because we can't be sure that migrating used pages makes high order
> > freepage. So, this patch makes freepage scanner only uses the ordered
> > freepage lower than compaction order.
> 
> How often does this happen? If there's a free page of the order we need, then we
> are done compacting anyway, no? Or is this happening because of the current
> high-order watermark checking implementation? It would be interesting to measure
> how often this skip would trigger. Also watermark checking should change with
> Mel's patchset and then this patch shouldn't be needed?

Yes, you are right. This would be happening because of the current
high-order watermakr checking implementation and Mel's patchset will
solve it and if it is merged, this patch isn't needed.

> 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  mm/compaction.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index ca4d6d1..e61ee77 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -455,6 +455,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> >  	unsigned long flags = 0;
> >  	bool locked = false;
> >  	unsigned long blockpfn = *start_pfn;
> > +	unsigned long freepage_order;
> >  
> >  	cursor = pfn_to_page(blockpfn);
> >  
> > @@ -482,6 +483,20 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> >  		if (!PageBuddy(page))
> >  			goto isolate_fail;
> >  
> > +		if (!strict && cc->order != -1) {
> > +			freepage_order = page_order_unsafe(page);
> > +
> > +			if (freepage_order > 0 && freepage_order < MAX_ORDER) {
> > +				/*
> > +				 * Do not use high order freepage for migration
> > +				 * taret. It would not be beneficial for
> > +				 * compaction success rate.
> > +				 */
> > +				if (freepage_order >= cc->order)
> 
> It would be better to skip the whole freepage_order.

Okay.

Thanks.

> 
> > +					goto isolate_fail;
> > +			}
> > +		}
> > +
> >  		/*
> >  		 * If we already hold the lock, we can skip some rechecking.
> >  		 * Note that if we hold the lock now, checked_pageblock was
> > 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone
  2015-08-24  2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
                   ` (7 preceding siblings ...)
  2015-08-24  2:19 ` [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at Joonsoo Kim
@ 2015-08-24  2:19 ` Joonsoo Kim
  2015-10-14 12:28   ` Vlastimil Babka
  8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24  2:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
	Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim

Now, compaction algorithm become powerful. Migration scanner traverses
whole zone range. So, old threshold for depleted zone which is designed
to imitate compaction deferring approach isn't appropriate for current
compaction algorithm. If we adhere to current threshold, 1, we can't
avoid excessive overhead caused by compaction, because one compaction
for low order allocation would be easily successful in any situation.

This patch re-implements threshold calculation based on zone size and
allocation requested order. We judge whther compaction possibility is
depleted or not by number of successful compaction. Roughly, 1/100
of future scanned area should be allocated for high order page during
one comaction iteration in order to determine whether zone's compaction
possiblity is depleted or not.

Below is test result with following setup.

Memory is artificially fragmented to make order 3 allocation hard. And,
most of pageblocks are changed to movable migratetype.

  System: 512 MB with 32 MB Zram
  Memory: 25% memory is allocated to make fragmentation and 200 MB is
  	occupied by memory hogger. Most pageblocks are movable
  	migratetype.
  Fragmentation: Successful order 3 allocation candidates may be around
  	1500 roughly.
  Allocation attempts: Roughly 3000 order 3 allocation attempts
  	with GFP_NORETRY. This value is determined to saturate allocation
  	success.

Test: hogger-frag-movable

Success(N)                    94              83
compact_stall               3642            4048
compact_success              144             212
compact_fail                3498            3835
pgmigrate_success       15897219          216387
compact_isolated        31899553          487712
compact_migrate_scanned 59146745         2513245
compact_free_scanned    49566134         4124319

This change results in greatly decreasing compaction overhead when
zone's compaction possibility is nearly depleted. But, I should admit
that it's not perfect because compaction success rate is decreased.
More precise tuning threshold would restore this regression, but,
it highly depends on workload so I'm not doing it here.

Other test doesn't show big regression.

  System: 512 MB with 32 MB Zram
  Memory: 25% memory is allocated to make fragmentation and kernel
  	build is running on background. Most pageblocks are movable
  	migratetype.
  Fragmentation: Successful order 3 allocation candidates may be around
  	1500 roughly.
  Allocation attempts: Roughly 3000 order 3 allocation attempts
  	with GFP_NORETRY. This value is determined to saturate allocation
  	success.

Test: build-frag-movable

Success(N)                    89              87
compact_stall               4053            3642
compact_success              264             202
compact_fail                3788            3440
pgmigrate_success        6497642          153413
compact_isolated        13292640          353445
compact_migrate_scanned 69714502         2307433
compact_free_scanned    20243121         2325295

This looks like reasonable trade-off.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/compaction.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index e61ee77..e1b44a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -129,19 +129,24 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,

 /* Do not skip compaction more than 64 times */
 #define COMPACT_MAX_FAILED 4
-#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
+#define COMPACT_MIN_DEPLETE_THRESHOLD 4UL
 #define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)

 static bool compaction_depleted(struct zone *zone)
 {
-	unsigned long threshold;
+	unsigned long nr_possible;
 	unsigned long success = zone->compact_success;
+	unsigned long threshold;

-	/*
-	 * Now, to imitate current compaction deferring approach,
-	 * choose threshold to 1. It will be changed in the future.
-	 */
-	threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
+	nr_possible = zone->managed_pages >> zone->compact_order_failed;
+
+	/* Migration scanner normally scans less than 1/4 range of zone */
+	nr_possible >>= 2;
+
+	/* We hope to succeed more than 1/100 roughly */
+	threshold = nr_possible >> 7;
+
+	threshold = max(threshold, COMPACT_MIN_DEPLETE_THRESHOLD);
 	if (success >= threshold)
 		return false;

-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone
  2015-08-24  2:19 ` [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone Joonsoo Kim
@ 2015-10-14 12:28   ` Vlastimil Babka
  2015-10-15  6:03     ` Joonsoo Kim
  2015-10-15  6:06     ` Joonsoo Kim
  0 siblings, 2 replies; 26+ messages in thread
From: Vlastimil Babka @ 2015-10-14 12:28 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
	Minchan Kim, Joonsoo Kim

On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> Now, compaction algorithm become powerful. Migration scanner traverses
> whole zone range. So, old threshold for depleted zone which is designed
> to imitate compaction deferring approach isn't appropriate for current
> compaction algorithm. If we adhere to current threshold, 1, we can't
> avoid excessive overhead caused by compaction, because one compaction
> for low order allocation would be easily successful in any situation.
> 
> This patch re-implements threshold calculation based on zone size and
> allocation requested order. We judge whther compaction possibility is
> depleted or not by number of successful compaction. Roughly, 1/100
> of future scanned area should be allocated for high order page during
> one comaction iteration in order to determine whether zone's compaction
> possiblity is depleted or not.

Finally finishing my review, sorry it took that long...

> Below is test result with following setup.
> 
> Memory is artificially fragmented to make order 3 allocation hard. And,
> most of pageblocks are changed to movable migratetype.
> 
>   System: 512 MB with 32 MB Zram
>   Memory: 25% memory is allocated to make fragmentation and 200 MB is
>   	occupied by memory hogger. Most pageblocks are movable
>   	migratetype.
>   Fragmentation: Successful order 3 allocation candidates may be around
>   	1500 roughly.
>   Allocation attempts: Roughly 3000 order 3 allocation attempts
>   	with GFP_NORETRY. This value is determined to saturate allocation
>   	success.
> 
> Test: hogger-frag-movable
> 
> Success(N)                    94              83
> compact_stall               3642            4048
> compact_success              144             212
> compact_fail                3498            3835
> pgmigrate_success       15897219          216387
> compact_isolated        31899553          487712
> compact_migrate_scanned 59146745         2513245
> compact_free_scanned    49566134         4124319

The decrease in scanned/isolated/migrated counts looks definitely nice, but why
did success regress when compact_success improved substantially?

> This change results in greatly decreasing compaction overhead when
> zone's compaction possibility is nearly depleted. But, I should admit
> that it's not perfect because compaction success rate is decreased.
> More precise tuning threshold would restore this regression, but,
> it highly depends on workload so I'm not doing it here.
> 
> Other test doesn't show big regression.
> 
>   System: 512 MB with 32 MB Zram
>   Memory: 25% memory is allocated to make fragmentation and kernel
>   	build is running on background. Most pageblocks are movable
>   	migratetype.
>   Fragmentation: Successful order 3 allocation candidates may be around
>   	1500 roughly.
>   Allocation attempts: Roughly 3000 order 3 allocation attempts
>   	with GFP_NORETRY. This value is determined to saturate allocation
>   	success.
> 
> Test: build-frag-movable
> 
> Success(N)                    89              87
> compact_stall               4053            3642
> compact_success              264             202
> compact_fail                3788            3440
> pgmigrate_success        6497642          153413
> compact_isolated        13292640          353445
> compact_migrate_scanned 69714502         2307433
> compact_free_scanned    20243121         2325295

Here compact_success decreased relatively a lot, while success just barely.
Less counterintuitive than the first result, but still a bit.

> This looks like reasonable trade-off.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/compaction.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e61ee77..e1b44a5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -129,19 +129,24 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>  
>  /* Do not skip compaction more than 64 times */
>  #define COMPACT_MAX_FAILED 4
> -#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
> +#define COMPACT_MIN_DEPLETE_THRESHOLD 4UL
>  #define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
>  
>  static bool compaction_depleted(struct zone *zone)
>  {
> -	unsigned long threshold;
> +	unsigned long nr_possible;
>  	unsigned long success = zone->compact_success;
> +	unsigned long threshold;
>  
> -	/*
> -	 * Now, to imitate current compaction deferring approach,
> -	 * choose threshold to 1. It will be changed in the future.
> -	 */
> -	threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
> +	nr_possible = zone->managed_pages >> zone->compact_order_failed;
> +
> +	/* Migration scanner normally scans less than 1/4 range of zone */
> +	nr_possible >>= 2;
> +
> +	/* We hope to succeed more than 1/100 roughly */
> +	threshold = nr_possible >> 7;
> +
> +	threshold = max(threshold, COMPACT_MIN_DEPLETE_THRESHOLD);
>  	if (success >= threshold)
>  		return false;

I wonder if compact_depletion_depth should play some "positive" role here. The
bigger the depth, the lower the migration_scan_limit, which means higher chance
of failing and so on. Ideally, the system should stabilize itself, so that
migration_scan_limit is set based how many pages on average have to be scanned
to succeed?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone
  2015-10-14 12:28   ` Vlastimil Babka
@ 2015-10-15  6:03     ` Joonsoo Kim
  2015-10-15  6:06     ` Joonsoo Kim
  1 sibling, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-10-15  6:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
	David Rientjes, Minchan Kim

On Wed, Oct 14, 2015 at 02:28:31PM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> > Now, compaction algorithm become powerful. Migration scanner traverses
> > whole zone range. So, old threshold for depleted zone which is designed
> > to imitate compaction deferring approach isn't appropriate for current
> > compaction algorithm. If we adhere to current threshold, 1, we can't
> > avoid excessive overhead caused by compaction, because one compaction
> > for low order allocation would be easily successful in any situation.
> > 
> > This patch re-implements threshold calculation based on zone size and
> > allocation requested order. We judge whther compaction possibility is
> > depleted or not by number of successful compaction. Roughly, 1/100
> > of future scanned area should be allocated for high order page during
> > one comaction iteration in order to determine whether zone's compaction
> > possiblity is depleted or not.
> 
> Finally finishing my review, sorry it took that long...
> 
> > Below is test result with following setup.
> > 
> > Memory is artificially fragmented to make order 3 allocation hard. And,
> > most of pageblocks are changed to movable migratetype.
> > 
> >   System: 512 MB with 32 MB Zram
> >   Memory: 25% memory is allocated to make fragmentation and 200 MB is
> >   	occupied by memory hogger. Most pageblocks are movable
> >   	migratetype.
> >   Fragmentation: Successful order 3 allocation candidates may be around
> >   	1500 roughly.
> >   Allocation attempts: Roughly 3000 order 3 allocation attempts
> >   	with GFP_NORETRY. This value is determined to saturate allocation
> >   	success.
> > 
> > Test: hogger-frag-movable
> > 
> > Success(N)                    94              83
> > compact_stall               3642            4048
> > compact_success              144             212
> > compact_fail                3498            3835
> > pgmigrate_success       15897219          216387
> > compact_isolated        31899553          487712
> > compact_migrate_scanned 59146745         2513245
> > compact_free_scanned    49566134         4124319
> 
> The decrease in scanned/isolated/migrated counts looks definitely nice, but why
> did success regress when compact_success improved substantially?

I don't know exact reason. I guess that more compaction success rate
comes from the fact that async compaction is stopped when depleted.
This compact_success metric doesn't fully reflect allocation success because
allocation success comes from not only compaction but also reclaim and
others. In the above test, compaction just succeed 200 times but
allocation succeed more than 1300 times.

> 
> > This change results in greatly decreasing compaction overhead when
> > zone's compaction possibility is nearly depleted. But, I should admit
> > that it's not perfect because compaction success rate is decreased.
> > More precise tuning threshold would restore this regression, but,
> > it highly depends on workload so I'm not doing it here.
> > 
> > Other test doesn't show big regression.
> > 
> >   System: 512 MB with 32 MB Zram
> >   Memory: 25% memory is allocated to make fragmentation and kernel
> >   	build is running on background. Most pageblocks are movable
> >   	migratetype.
> >   Fragmentation: Successful order 3 allocation candidates may be around
> >   	1500 roughly.
> >   Allocation attempts: Roughly 3000 order 3 allocation attempts
> >   	with GFP_NORETRY. This value is determined to saturate allocation
> >   	success.
> > 
> > Test: build-frag-movable
> > 
> > Success(N)                    89              87
> > compact_stall               4053            3642
> > compact_success              264             202
> > compact_fail                3788            3440
> > pgmigrate_success        6497642          153413
> > compact_isolated        13292640          353445
> > compact_migrate_scanned 69714502         2307433
> > compact_free_scanned    20243121         2325295
> 
> Here compact_success decreased relatively a lot, while success just barely.
> Less counterintuitive than the first result, but still a bit.

compact_stall on build test would be more unpredictable, because there
are a lot of background working threads and they allocate and free
memory. Anyway, I will investigate it and if I find something, I will
note. :)

To know precise reason, separating async/sync compaction stat will be
helpful.

> > This looks like reasonable trade-off.
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  mm/compaction.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index e61ee77..e1b44a5 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -129,19 +129,24 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> >  
> >  /* Do not skip compaction more than 64 times */
> >  #define COMPACT_MAX_FAILED 4
> > -#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
> > +#define COMPACT_MIN_DEPLETE_THRESHOLD 4UL
> >  #define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
> >  
> >  static bool compaction_depleted(struct zone *zone)
> >  {
> > -	unsigned long threshold;
> > +	unsigned long nr_possible;
> >  	unsigned long success = zone->compact_success;
> > +	unsigned long threshold;
> >  
> > -	/*
> > -	 * Now, to imitate current compaction deferring approach,
> > -	 * choose threshold to 1. It will be changed in the future.
> > -	 */
> > -	threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
> > +	nr_possible = zone->managed_pages >> zone->compact_order_failed;
> > +
> > +	/* Migration scanner normally scans less than 1/4 range of zone */
> > +	nr_possible >>= 2;
> > +
> > +	/* We hope to succeed more than 1/100 roughly */
> > +	threshold = nr_possible >> 7;
> > +
> > +	threshold = max(threshold, COMPACT_MIN_DEPLETE_THRESHOLD);
> >  	if (success >= threshold)
> >  		return false;
> 
> I wonder if compact_depletion_depth should play some "positive" role here. The
> bigger the depth, the lower the migration_scan_limit, which means higher chance
> of failing and so on. Ideally, the system should stabilize itself, so that
> migration_scan_limit is set based how many pages on average have to be scanned
> to succeed?

compact_depletion_depth is for throttling compaction when compaction
success possibility is depleted. Not for more success when depleted.
If compaction possibility is recovered, this little try will make
high-order page and stop throttling. This is a way to reduce excessive
compaction overhead like as deferring.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone
  2015-10-14 12:28   ` Vlastimil Babka
  2015-10-15  6:03     ` Joonsoo Kim
@ 2015-10-15  6:06     ` Joonsoo Kim
  1 sibling, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-10-15  6:06 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
	David Rientjes, Minchan Kim

On Wed, Oct 14, 2015 at 02:28:31PM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> > Now, compaction algorithm become powerful. Migration scanner traverses
> > whole zone range. So, old threshold for depleted zone which is designed
> > to imitate compaction deferring approach isn't appropriate for current
> > compaction algorithm. If we adhere to current threshold, 1, we can't
> > avoid excessive overhead caused by compaction, because one compaction
> > for low order allocation would be easily successful in any situation.
> > 
> > This patch re-implements threshold calculation based on zone size and
> > allocation requested order. We judge whther compaction possibility is
> > depleted or not by number of successful compaction. Roughly, 1/100
> > of future scanned area should be allocated for high order page during
> > one comaction iteration in order to determine whether zone's compaction
> > possiblity is depleted or not.
> 
> Finally finishing my review, sorry it took that long...
> 

Ah... I forgot to mention that I really appreciate your help.
Thanks for review!!


Thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-10-15  6:05 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-24  2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
2015-08-24  2:19 ` [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn Joonsoo Kim
2015-08-24  9:07   ` Vlastimil Babka
2015-09-07  5:35     ` Joonsoo Kim
2015-09-08 16:30       ` Vlastimil Babka
2015-08-24  2:19 ` [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone Joonsoo Kim
2015-09-25  8:04   ` Vlastimil Babka
2015-09-30  8:12     ` Joonsoo Kim
2015-08-24  2:19 ` [PATCH v2 3/9] mm/compaction: limit compaction activity in compaction depleted state Joonsoo Kim
2015-08-24  2:19 ` [PATCH v2 4/9] mm/compaction: remove compaction deferring Joonsoo Kim
2015-09-25  9:33   ` Vlastimil Babka
2015-09-30  8:28     ` Joonsoo Kim
2015-08-24  2:19 ` [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state Joonsoo Kim
2015-09-25 10:32   ` Vlastimil Babka
2015-09-30  8:30     ` Joonsoo Kim
2015-08-24  2:19 ` [PATCH v2 6/9] mm/compaction: manage separate skip-bits for migration and free scanner Joonsoo Kim
2015-08-24  2:19 ` [PATCH v2 7/9] mm/compaction: redesign compaction Joonsoo Kim
2015-10-14 11:38   ` Vlastimil Babka
2015-10-15  2:38     ` Joonsoo Kim
2015-08-24  2:19 ` [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at Joonsoo Kim
2015-09-25 11:41   ` Vlastimil Babka
2015-09-30  8:38     ` Joonsoo Kim
2015-08-24  2:19 ` [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone Joonsoo Kim
2015-10-14 12:28   ` Vlastimil Babka
2015-10-15  6:03     ` Joonsoo Kim
2015-10-15  6:06     ` Joonsoo Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox