* [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn
2015-08-24 2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
@ 2015-08-24 2:19 ` Joonsoo Kim
2015-08-24 9:07 ` Vlastimil Babka
2015-08-24 2:19 ` [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone Joonsoo Kim
` (7 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24 2:19 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim
The cached pfn is used to determine the scanner's start position at the
next compaction run. Currently, the cached pfn points into the skipped
pageblock, so we uselessly re-check whether that pageblock is valid for
compaction and whether its skip bit is set. If we instead set the
scanner's cached pfn to the pfn just past the skipped pageblock, we
don't need to do this check.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
mm/compaction.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 6ef2fdf..c2d3d6a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -261,10 +261,9 @@ void reset_isolation_suitable(pg_data_t *pgdat)
*/
static void update_pageblock_skip(struct compact_control *cc,
struct page *page, unsigned long nr_isolated,
- bool migrate_scanner)
+ unsigned long pfn, bool migrate_scanner)
{
struct zone *zone = cc->zone;
- unsigned long pfn;
if (cc->ignore_skip_hint)
return;
@@ -277,8 +276,6 @@ static void update_pageblock_skip(struct compact_control *cc,
set_pageblock_skip(page);
- pfn = page_to_pfn(page);
-
/* Update where async and sync compaction should restart */
if (migrate_scanner) {
if (pfn > zone->compact_cached_migrate_pfn[0])
@@ -300,7 +297,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
static void update_pageblock_skip(struct compact_control *cc,
struct page *page, unsigned long nr_isolated,
- bool migrate_scanner)
+ unsigned long pfn, bool migrate_scanner)
{
}
#endif /* CONFIG_COMPACTION */
@@ -509,7 +506,8 @@ isolate_fail:
/* Update the pageblock-skip if the whole pageblock was scanned */
if (blockpfn == end_pfn)
- update_pageblock_skip(cc, valid_page, total_isolated, false);
+ update_pageblock_skip(cc, valid_page, total_isolated,
+ end_pfn, false);
count_compact_events(COMPACTFREE_SCANNED, nr_scanned);
if (total_isolated)
@@ -811,7 +809,8 @@ isolate_success:
* if the whole pageblock was scanned without isolating any page.
*/
if (low_pfn == end_pfn)
- update_pageblock_skip(cc, valid_page, nr_isolated, true);
+ update_pageblock_skip(cc, valid_page, nr_isolated,
+ end_pfn, true);
trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
nr_scanned, nr_isolated);
--
1.9.1
* Re: [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn
2015-08-24 2:19 ` [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn Joonsoo Kim
@ 2015-08-24 9:07 ` Vlastimil Babka
2015-09-07 5:35 ` Joonsoo Kim
0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-08-24 9:07 UTC (permalink / raw)
To: Joonsoo Kim, Andrew Morton
Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
Minchan Kim, Joonsoo Kim
On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> Cached pfn is used to determine the start position of scanner
> at next compaction run. Current cached pfn points the skipped pageblock
> so we uselessly checks whether pageblock is valid for compaction and
> skip-bit is set or not. If we set scanner's cached pfn to next pfn of
> skipped pageblock, we don't need to do this check.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
> mm/compaction.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 6ef2fdf..c2d3d6a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -261,10 +261,9 @@ void reset_isolation_suitable(pg_data_t *pgdat)
> */
> static void update_pageblock_skip(struct compact_control *cc,
> struct page *page, unsigned long nr_isolated,
> - bool migrate_scanner)
> + unsigned long pfn, bool migrate_scanner)
> {
> struct zone *zone = cc->zone;
> - unsigned long pfn;
>
> if (cc->ignore_skip_hint)
> return;
> @@ -277,8 +276,6 @@ static void update_pageblock_skip(struct compact_control *cc,
>
> set_pageblock_skip(page);
>
> - pfn = page_to_pfn(page);
> -
> /* Update where async and sync compaction should restart */
> if (migrate_scanner) {
> if (pfn > zone->compact_cached_migrate_pfn[0])
> @@ -300,7 +297,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
>
> static void update_pageblock_skip(struct compact_control *cc,
> struct page *page, unsigned long nr_isolated,
> - bool migrate_scanner)
> + unsigned long pfn, bool migrate_scanner)
> {
> }
> #endif /* CONFIG_COMPACTION */
> @@ -509,7 +506,8 @@ isolate_fail:
>
> /* Update the pageblock-skip if the whole pageblock was scanned */
> if (blockpfn == end_pfn)
> - update_pageblock_skip(cc, valid_page, total_isolated, false);
> + update_pageblock_skip(cc, valid_page, total_isolated,
> + end_pfn, false);
In isolate_freepages_block() this means we actually go logically *back*
one pageblock, as the direction is opposite? I know it's not an issue
after the redesign patch so you wouldn't notice it when testing the
whole series. But there's a non-zero chance that the smaller fixes are
merged first and the redesign later...
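
To make the direction mismatch concrete, here is a rough user-space
sketch (not kernel code; the pfn value and the pageblock size of 512
pages are made-up example values):

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL

int main(void)
{
        unsigned long block_start = 0x10000;    /* pageblock just scanned */
        unsigned long end_pfn = block_start + PAGEBLOCK_NR_PAGES;

        /* The migration scanner walks upward, so caching end_pfn makes
         * its next run start just past the block marked for skipping. */
        printf("migrate scanner restarts at %#lx (next block up)\n", end_pfn);

        /* The free scanner walks downward, so caching end_pfn points at
         * the block *above* the one just scanned, i.e. one pageblock
         * backwards for its direction; the forward choice would be the
         * block below block_start. */
        printf("free scanner restarts at    %#lx (one block back)\n", end_pfn);
        printf("forward choice would be     %#lx\n",
               block_start - PAGEBLOCK_NR_PAGES);
        return 0;
}
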
>
> count_compact_events(COMPACTFREE_SCANNED, nr_scanned);
> if (total_isolated)
> @@ -811,7 +809,8 @@ isolate_success:
> * if the whole pageblock was scanned without isolating any page.
> */
> if (low_pfn == end_pfn)
> - update_pageblock_skip(cc, valid_page, nr_isolated, true);
> + update_pageblock_skip(cc, valid_page, nr_isolated,
> + end_pfn, true);
>
> trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
> nr_scanned, nr_isolated);
>
* Re: [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn
2015-08-24 9:07 ` Vlastimil Babka
@ 2015-09-07 5:35 ` Joonsoo Kim
2015-09-08 16:30 ` Vlastimil Babka
0 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-09-07 5:35 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
David Rientjes, Minchan Kim
On Mon, Aug 24, 2015 at 11:07:12AM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> >Cached pfn is used to determine the start position of scanner
> >at next compaction run. Current cached pfn points the skipped pageblock
> >so we uselessly checks whether pageblock is valid for compaction and
> >skip-bit is set or not. If we set scanner's cached pfn to next pfn of
> >skipped pageblock, we don't need to do this check.
> >
> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >---
> > mm/compaction.c | 13 ++++++-------
> > 1 file changed, 6 insertions(+), 7 deletions(-)
> >
> >diff --git a/mm/compaction.c b/mm/compaction.c
> >index 6ef2fdf..c2d3d6a 100644
> >--- a/mm/compaction.c
> >+++ b/mm/compaction.c
> >@@ -261,10 +261,9 @@ void reset_isolation_suitable(pg_data_t *pgdat)
> > */
> > static void update_pageblock_skip(struct compact_control *cc,
> > struct page *page, unsigned long nr_isolated,
> >- bool migrate_scanner)
> >+ unsigned long pfn, bool migrate_scanner)
> > {
> > struct zone *zone = cc->zone;
> >- unsigned long pfn;
> >
> > if (cc->ignore_skip_hint)
> > return;
> >@@ -277,8 +276,6 @@ static void update_pageblock_skip(struct compact_control *cc,
> >
> > set_pageblock_skip(page);
> >
> >- pfn = page_to_pfn(page);
> >-
> > /* Update where async and sync compaction should restart */
> > if (migrate_scanner) {
> > if (pfn > zone->compact_cached_migrate_pfn[0])
> >@@ -300,7 +297,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
> >
> > static void update_pageblock_skip(struct compact_control *cc,
> > struct page *page, unsigned long nr_isolated,
> >- bool migrate_scanner)
> >+ unsigned long pfn, bool migrate_scanner)
> > {
> > }
> > #endif /* CONFIG_COMPACTION */
> >@@ -509,7 +506,8 @@ isolate_fail:
> >
> > /* Update the pageblock-skip if the whole pageblock was scanned */
> > if (blockpfn == end_pfn)
> >- update_pageblock_skip(cc, valid_page, total_isolated, false);
> >+ update_pageblock_skip(cc, valid_page, total_isolated,
> >+ end_pfn, false);
>
> In isolate_freepages_block() this means we actually go logically
> *back* one pageblock, as the direction is opposite? I know it's not
> an issue after the redesign patch so you wouldn't notice it when
> testing the whole series. But there's a non-zero chance that the
> smaller fixes are merged first and the redesign later...
Hello, Vlastimil.
Sorry for the long delay. I was on vacation. :)
I will fix it next time.

Btw, if possible, could you review the patchset in detail? Or do you
have another plan for compaction improvement? Please let me know where
you stand so I can determine the future plan for this patchset.
Thanks.
* Re: [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn
2015-09-07 5:35 ` Joonsoo Kim
@ 2015-09-08 16:30 ` Vlastimil Babka
0 siblings, 0 replies; 26+ messages in thread
From: Vlastimil Babka @ 2015-09-08 16:30 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
David Rientjes, Minchan Kim
On 09/07/2015 07:35 AM, Joonsoo Kim wrote:
> On Mon, Aug 24, 2015 at 11:07:12AM +0200, Vlastimil Babka wrote:
>> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
>>
>> In isolate_freepages_block() this means we actually go logically
>> *back* one pageblock, as the direction is opposite? I know it's not
>> an issue after the redesign patch so you wouldn't notice it when
>> testing the whole series. But there's a non-zero chance that the
>> smaller fixes are merged first and the redesign later...
>
> Hello, Vlastimil.
> Sorry for long delay. I was on vacation. :)
> I will fix it next time.
>
> Btw, if possible, could you review the patchset in detail? or do you
I'll try soon...
> have another plan on compaction improvement? Please let me know your
> position to determine future plan of this patchset.
>
> Thanks.
>
* [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone
2015-08-24 2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
2015-08-24 2:19 ` [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn Joonsoo Kim
@ 2015-08-24 2:19 ` Joonsoo Kim
2015-09-25 8:04 ` Vlastimil Babka
2015-08-24 2:19 ` [PATCH v2 3/9] mm/compaction: limit compaction activity in compaction depleted state Joonsoo Kim
` (6 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24 2:19 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim
Further compaction attempts are deferred when previous compaction
attempts have already failed. But after some number of trials have been
skipped, compaction restarts to check whether compaction is now
possible or not. It scans the whole range of the zone to determine this
possibility, and if the compaction possibility doesn't recover, this
whole-range scan is quite a big overhead. As a first step toward
reducing this overhead, this patch implements a compaction depleted
state on the zone.

Depletion of the compaction possibility is determined by checking the
number of successes during the previous compaction attempt. If the
number of successful compactions is below a specified threshold, we
guess that compaction will not succeed next time either, so we mark the
zone as compaction depleted. In this patch, the threshold is chosen as
1 to imitate the current compaction deferring algorithm. In a following
patch, the compaction algorithm will be changed and this threshold will
be adjusted to that change.

This patch implements only the state definition. There is no action
taken on this new state, so there is no functional change, but a
following patch will add some handling for this new state.
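
As a reading aid, below is a minimal user-space sketch (not part of the
patch; the names and the threshold of 1 mirror the hunks that follow)
of how the flag and the depth are intended to evolve across
__reset_isolation_suitable() calls while compaction keeps failing:

#include <stdbool.h>
#include <stdio.h>

#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL

struct zone_sim {
        unsigned long compact_success;
        unsigned long compact_depletion_depth;
        bool depleted;          /* stands in for ZONE_COMPACTION_DEPLETED */
};

static bool compaction_depleted(struct zone_sim *z)
{
        return z->compact_success < COMPACT_MIN_DEPLETE_THRESHOLD;
}

/* mimics the hunk added to __reset_isolation_suitable() */
static void reset_isolation_suitable(struct zone_sim *z)
{
        if (compaction_depleted(z)) {
                if (z->depleted)
                        z->compact_depletion_depth++;
                else {
                        z->depleted = true;
                        z->compact_depletion_depth = 0;
                }
        }
        z->compact_success = 0;
}

int main(void)
{
        struct zone_sim z = { 0 };
        int i;

        /* three resets with no successful compactions in between */
        for (i = 0; i < 3; i++) {
                reset_isolation_suitable(&z);
                printf("depleted=%d depth=%lu\n",
                       z.depleted, z.compact_depletion_depth);
        }
        return 0;
}

Each reset that sees zero successes since the previous one either sets
the flag or deepens the depth; later patches in the series use the
depth to throttle the migration scanner.
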
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
include/linux/mmzone.h | 3 +++
mm/compaction.c | 44 +++++++++++++++++++++++++++++++++++++++++---
2 files changed, 44 insertions(+), 3 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 754c259..700e9b5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -517,6 +517,8 @@ struct zone {
unsigned int compact_considered;
unsigned int compact_defer_shift;
int compact_order_failed;
+ unsigned long compact_success;
+ unsigned long compact_depletion_depth;
#endif
#if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -543,6 +545,7 @@ enum zone_flags {
* many pages under writeback
*/
ZONE_FAIR_DEPLETED, /* fair zone policy batch depleted */
+ ZONE_COMPACTION_DEPLETED, /* compaction possiblity depleted */
};
static inline unsigned long zone_end_pfn(const struct zone *zone)
diff --git a/mm/compaction.c b/mm/compaction.c
index c2d3d6a..de96e9d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -129,6 +129,23 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
/* Do not skip compaction more than 64 times */
#define COMPACT_MAX_DEFER_SHIFT 6
+#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
+
+static bool compaction_depleted(struct zone *zone)
+{
+ unsigned long threshold;
+ unsigned long success = zone->compact_success;
+
+ /*
+ * Now, to imitate current compaction deferring approach,
+ * choose threshold to 1. It will be changed in the future.
+ */
+ threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
+ if (success >= threshold)
+ return false;
+
+ return true;
+}
/*
* Compaction is deferred when compaction fails to result in a page
@@ -223,6 +240,16 @@ static void __reset_isolation_suitable(struct zone *zone)
zone->compact_cached_free_pfn = end_pfn;
zone->compact_blockskip_flush = false;
+ if (compaction_depleted(zone)) {
+ if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
+ zone->compact_depletion_depth++;
+ else {
+ set_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
+ zone->compact_depletion_depth = 0;
+ }
+ }
+ zone->compact_success = 0;
+
/* Walk the zone and mark every pageblock as suitable for isolation */
for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
struct page *page;
@@ -1185,22 +1212,28 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
bool can_steal;
/* Job done if page is free of the right migratetype */
- if (!list_empty(&area->free_list[migratetype]))
+ if (!list_empty(&area->free_list[migratetype])) {
+ zone->compact_success++;
return COMPACT_PARTIAL;
+ }
#ifdef CONFIG_CMA
/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
if (migratetype == MIGRATE_MOVABLE &&
- !list_empty(&area->free_list[MIGRATE_CMA]))
+ !list_empty(&area->free_list[MIGRATE_CMA])) {
+ zone->compact_success++;
return COMPACT_PARTIAL;
+ }
#endif
/*
* Job done if allocation would steal freepages from
* other migratetype buddy lists.
*/
if (find_suitable_fallback(area, order, migratetype,
- true, &can_steal) != -1)
+ true, &can_steal) != -1) {
+ zone->compact_success++;
return COMPACT_PARTIAL;
+ }
}
return COMPACT_NO_SUITABLE_PAGE;
@@ -1440,6 +1473,11 @@ out:
trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
cc->free_pfn, end_pfn, sync, ret);
+ if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags)) {
+ if (!compaction_depleted(zone))
+ clear_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
+ }
+
return ret;
}
--
1.9.1
* Re: [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone
2015-08-24 2:19 ` [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone Joonsoo Kim
@ 2015-09-25 8:04 ` Vlastimil Babka
2015-09-30 8:12 ` Joonsoo Kim
0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-09-25 8:04 UTC (permalink / raw)
To: Joonsoo Kim, Andrew Morton
Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
Minchan Kim, Joonsoo Kim
On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> Further compaction attempt is deferred when some of compaction attempts
> already fails. But, after some number of trial are skipped, compaction
> restarts work to check whether compaction is now possible or not. It
> scans whole range of zone to determine this possibility and if compaction
> possibility doesn't recover, this whole range scan is quite big overhead.
> As a first step to reduce this overhead, this patch implement compaction
> depleted state on zone.
>
> The way to determine depletion of compaction possility is checking number
> of success on previous compaction attempt. If number of successful
> compaction is below than specified threshold, we guess that compaction
> will not successful next time so mark the zone as compaction depleted.
> In this patch, threshold is choosed by 1 to imitate current compaction
> deferring algorithm. In the following patch, compaction algorithm will be
> changed and this threshold is also adjusted to that change.
>
> In this patch, only state definition is implemented. There is no action
> for this new state so no functional change. But, following patch will
> add some handling for this new state.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
> include/linux/mmzone.h | 3 +++
> mm/compaction.c | 44 +++++++++++++++++++++++++++++++++++++++++---
> 2 files changed, 44 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 754c259..700e9b5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -517,6 +517,8 @@ struct zone {
> unsigned int compact_considered;
> unsigned int compact_defer_shift;
> int compact_order_failed;
> + unsigned long compact_success;
> + unsigned long compact_depletion_depth;
> #endif
>
> #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> @@ -543,6 +545,7 @@ enum zone_flags {
> * many pages under writeback
> */
> ZONE_FAIR_DEPLETED, /* fair zone policy batch depleted */
> + ZONE_COMPACTION_DEPLETED, /* compaction possiblity depleted */
> };
>
> static inline unsigned long zone_end_pfn(const struct zone *zone)
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c2d3d6a..de96e9d 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -129,6 +129,23 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>
> /* Do not skip compaction more than 64 times */
> #define COMPACT_MAX_DEFER_SHIFT 6
> +#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
> +
> +static bool compaction_depleted(struct zone *zone)
> +{
> + unsigned long threshold;
> + unsigned long success = zone->compact_success;
> +
> + /*
> + * Now, to imitate current compaction deferring approach,
> + * choose threshold to 1. It will be changed in the future.
> + */
> + threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
> + if (success >= threshold)
> + return false;
> +
> + return true;
> +}
>
> /*
> * Compaction is deferred when compaction fails to result in a page
> @@ -223,6 +240,16 @@ static void __reset_isolation_suitable(struct zone *zone)
> zone->compact_cached_free_pfn = end_pfn;
> zone->compact_blockskip_flush = false;
>
> + if (compaction_depleted(zone)) {
> + if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
> + zone->compact_depletion_depth++;
> + else {
> + set_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
> + zone->compact_depletion_depth = 0;
> + }
> + }
> + zone->compact_success = 0;
It's possible that the following comment is made moot by further patches, but:
I assume doing this in __reset_isolation_suitable() is to react to the
compaction_restarting() state. But __reset_isolation_suitable() is also called
from manually invoked compaction, and from kswapd. What if compaction has
succeeded S times, the threshold is T, S < T, and T is larger than 1 (by a later
patch)? Then kswapd or manual compaction will reset S to zero, without giving it
a chance to reach T, even when compaction would succeed.
* Re: [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone
2015-09-25 8:04 ` Vlastimil Babka
@ 2015-09-30 8:12 ` Joonsoo Kim
0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-09-30 8:12 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
David Rientjes, Minchan Kim
Hello, Vlastimil.
First of all, thanks for the review!
On Fri, Sep 25, 2015 at 10:04:49AM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> > Further compaction attempt is deferred when some of compaction attempts
> > already fails. But, after some number of trial are skipped, compaction
> > restarts work to check whether compaction is now possible or not. It
> > scans whole range of zone to determine this possibility and if compaction
> > possibility doesn't recover, this whole range scan is quite big overhead.
> > As a first step to reduce this overhead, this patch implement compaction
> > depleted state on zone.
> >
> > The way to determine depletion of compaction possility is checking number
> > of success on previous compaction attempt. If number of successful
> > compaction is below than specified threshold, we guess that compaction
> > will not successful next time so mark the zone as compaction depleted.
> > In this patch, threshold is choosed by 1 to imitate current compaction
> > deferring algorithm. In the following patch, compaction algorithm will be
> > changed and this threshold is also adjusted to that change.
> >
> > In this patch, only state definition is implemented. There is no action
> > for this new state so no functional change. But, following patch will
> > add some handling for this new state.
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> > include/linux/mmzone.h | 3 +++
> > mm/compaction.c | 44 +++++++++++++++++++++++++++++++++++++++++---
> > 2 files changed, 44 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 754c259..700e9b5 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -517,6 +517,8 @@ struct zone {
> > unsigned int compact_considered;
> > unsigned int compact_defer_shift;
> > int compact_order_failed;
> > + unsigned long compact_success;
> > + unsigned long compact_depletion_depth;
> > #endif
> >
> > #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> > @@ -543,6 +545,7 @@ enum zone_flags {
> > * many pages under writeback
> > */
> > ZONE_FAIR_DEPLETED, /* fair zone policy batch depleted */
> > + ZONE_COMPACTION_DEPLETED, /* compaction possiblity depleted */
> > };
> >
> > static inline unsigned long zone_end_pfn(const struct zone *zone)
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index c2d3d6a..de96e9d 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -129,6 +129,23 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> >
> > /* Do not skip compaction more than 64 times */
> > #define COMPACT_MAX_DEFER_SHIFT 6
> > +#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
> > +
> > +static bool compaction_depleted(struct zone *zone)
> > +{
> > + unsigned long threshold;
> > + unsigned long success = zone->compact_success;
> > +
> > + /*
> > + * Now, to imitate current compaction deferring approach,
> > + * choose threshold to 1. It will be changed in the future.
> > + */
> > + threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
> > + if (success >= threshold)
> > + return false;
> > +
> > + return true;
> > +}
> >
> > /*
> > * Compaction is deferred when compaction fails to result in a page
> > @@ -223,6 +240,16 @@ static void __reset_isolation_suitable(struct zone *zone)
> > zone->compact_cached_free_pfn = end_pfn;
> > zone->compact_blockskip_flush = false;
> >
> > + if (compaction_depleted(zone)) {
> > + if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
> > + zone->compact_depletion_depth++;
> > + else {
> > + set_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
> > + zone->compact_depletion_depth = 0;
> > + }
> > + }
> > + zone->compact_success = 0;
>
> It's possible that the following comment is made moot by further patches, but:
>
> I assume doing this in __reset_isolation_suitable() is to react on the
> compaction_restarting() state. But __reset_isolation_suitable() is called also
> from manually invoked compaction, and from kswapd. What if compaction has
> succeeded S times, but threshold is T, S < T and T is larger than 1 (by a later
> patch). Then kswapd or manual compaction will reset S to zero, without giving it
> chance to reach T, even when compaction would succeed?
Okay. I will fix it.
Thanks.
* [PATCH v2 3/9] mm/compaction: limit compaction activity in compaction depleted state
2015-08-24 2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
2015-08-24 2:19 ` [PATCH v2 1/9] mm/compaction: skip useless pfn when updating cached pfn Joonsoo Kim
2015-08-24 2:19 ` [PATCH v2 2/9] mm/compaction: introduce compaction depleted state on zone Joonsoo Kim
@ 2015-08-24 2:19 ` Joonsoo Kim
2015-08-24 2:19 ` [PATCH v2 4/9] mm/compaction: remove compaction deferring Joonsoo Kim
` (5 subsequent siblings)
8 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24 2:19 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim
Compaction deferring was introduced to reduce the overhead of compaction
when a compaction attempt is expected to fail. But it has a problem: the
whole zone is rescanned after some number of compaction attempts have
been deferred, and this rescan overhead is quite big. It also imposes a
large latency on one random requestor, while the others fail with nearly
zero latency because compaction is deferred for them. This patch tries
to handle this situation differently to solve the above problems.

First, we should know when compaction will fail. The previous patch
defines the compaction depleted state. In this state, compaction failure
is highly expected, so we don't need to put much effort into compaction.
Therefore, this patch forces the migration scanner to scan only a
restricted number of pages in this state. This way, we can distribute
the compaction overhead evenly to all compaction requestors. And since
there is a way to escape from the compaction depleted state, we don't
need to unconditionally defer a specific number of compaction attempts
if the compaction possibility recovers.

In this patch, the migration scanner limit is defined to imitate the
current compaction deferring approach. But we can tune it easily if this
overhead doesn't look appropriate. That would be further work.

There can be a situation where the compaction depleted state is
maintained for a long time. In this case, repeated compaction attempts
would continually cause useless overhead. To optimize this case, this
patch uses the compaction depletion depth and diminishes the migration
scanner limit according to this depth. This effectively reduces the
compaction overhead in this situation.

Note that this patch just introduces the scan_limit infrastructure and
doesn't yet check scan_limit to finish the compaction. That will be
implemented in the next patch, which removes compaction deferring.
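
For a feel of the numbers, here is a minimal user-space sketch of the
scan-limit arithmetic described above. It is not part of the patch and
assumes 4 KB pages, pageblock_nr_pages = 512, COMPACT_CLUSTER_MAX = 32,
and a 512 MB zone as in the test setup used later in the series:

#include <limits.h>
#include <stdio.h>

#define COMPACT_MIN_SCAN_LIMIT  512UL   /* pageblock_nr_pages */
#define COMPACT_CLUSTER_MAX     32UL

static long migration_scan_limit(unsigned long managed_pages,
                                 unsigned long depletion_depth)
{
        unsigned long limit;

        /* the patch keeps the limit at LONG_MAX until depth reaches 1 */
        if (!depletion_depth)
                return LONG_MAX;

        limit = managed_pages >> 2;     /* scanner typically covers ~1/4 of zone */
        limit >>= 6;                    /* spread a full rescan over ~64 attempts */
        if (limit < COMPACT_MIN_SCAN_LIMIT)
                limit = COMPACT_MIN_SCAN_LIMIT;
        limit >>= depletion_depth;      /* diminish further as depth grows */
        return limit > COMPACT_CLUSTER_MAX ? limit : COMPACT_CLUSTER_MAX;
}

int main(void)
{
        unsigned long managed = 131072; /* 512 MB of 4 KB pages */
        unsigned long depth;

        for (depth = 0; depth <= 5; depth++) {
                long limit = migration_scan_limit(managed, depth);

                if (limit == LONG_MAX)
                        printf("depth %lu -> no limit\n", depth);
                else
                        printf("depth %lu -> %ld pages per attempt\n",
                               depth, limit);
        }
        return 0;
}

With these assumptions the migration scanner is capped at 256 pages at
depth 1 and the cap halves with each further depth, bottoming out at
COMPACT_CLUSTER_MAX.
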
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
mm/compaction.c | 41 +++++++++++++++++++++++++++++++++++++++++
mm/internal.h | 1 +
2 files changed, 42 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index de96e9d..c6b8277 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -130,6 +130,7 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
/* Do not skip compaction more than 64 times */
#define COMPACT_MAX_DEFER_SHIFT 6
#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
+#define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
static bool compaction_depleted(struct zone *zone)
{
@@ -147,6 +148,42 @@ static bool compaction_depleted(struct zone *zone)
return true;
}
+static void set_migration_scan_limit(struct compact_control *cc)
+{
+ struct zone *zone = cc->zone;
+ int order = cc->order;
+ unsigned long limit;
+
+ cc->migration_scan_limit = LONG_MAX;
+ if (order < 0)
+ return;
+
+ if (!test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
+ return;
+
+ if (!zone->compact_depletion_depth)
+ return;
+
+ /*
+ * Experimental observation shows that migration scanner
+ * normally scans 1/4 pages
+ */
+ limit = zone->managed_pages >> 2;
+
+ /*
+ * Deferred compaction restart compaction every 64 compaction
+ * attempts and it rescans whole zone range. If we limit
+ * migration scanner to scan 1/64 range when depleted, 64
+ * compaction attempts will rescan whole zone range as same
+ * as deferred compaction.
+ */
+ limit >>= 6;
+ limit = max(limit, COMPACT_MIN_SCAN_LIMIT);
+
+ /* Degradation scan limit according to depletion depth. */
+ limit >>= zone->compact_depletion_depth;
+ cc->migration_scan_limit = max(limit, COMPACT_CLUSTER_MAX);
+}
/*
* Compaction is deferred when compaction fails to result in a page
* allocation success. 1 << compact_defer_limit compactions are skipped up
@@ -839,6 +876,8 @@ isolate_success:
update_pageblock_skip(cc, valid_page, nr_isolated,
end_pfn, true);
+ cc->migration_scan_limit -= nr_scanned;
+
trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
nr_scanned, nr_isolated);
@@ -1367,6 +1406,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
}
+ set_migration_scan_limit(cc);
+
trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
cc->free_pfn, end_pfn, sync);
diff --git a/mm/internal.h b/mm/internal.h
index 36b23f1..a427695 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -182,6 +182,7 @@ struct compact_control {
unsigned long nr_migratepages; /* Number of pages to migrate */
unsigned long free_pfn; /* isolate_freepages search base */
unsigned long migrate_pfn; /* isolate_migratepages search base */
+ long migration_scan_limit; /* Limit migration scanner activity */
enum migrate_mode mode; /* Async or sync migration mode */
bool ignore_skip_hint; /* Scan blocks even if marked skip */
int order; /* order a direct compactor needs */
--
1.9.1
* [PATCH v2 4/9] mm/compaction: remove compaction deferring
2015-08-24 2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
` (2 preceding siblings ...)
2015-08-24 2:19 ` [PATCH v2 3/9] mm/compaction: limit compaction activity in compaction depleted state Joonsoo Kim
@ 2015-08-24 2:19 ` Joonsoo Kim
2015-09-25 9:33 ` Vlastimil Babka
2015-08-24 2:19 ` [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state Joonsoo Kim
` (4 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24 2:19 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim
Now we have a way to determine the compaction depleted state, and
compaction activity will be limited according to this state and the
depletion depth, so the compaction overhead is well controlled without
compaction deferring. So this patch removes compaction deferring
completely and enables the compaction activity limit.

Various functions are renamed and tracepoint output is changed due to
this removal.
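
For reference, a minimal user-space sketch (the names and
COMPACT_MAX_FAILED mirror the patch below) of the new failure
bookkeeping that replaces the compact_considered/compact_defer_shift
pair:

#include <stdbool.h>
#include <stdio.h>

#define COMPACT_MAX_FAILED 4

struct zone_sim {
        int compact_failed;
        int compact_order_failed;
};

static void fail_compaction(struct zone_sim *z, int order)
{
        if (order < z->compact_order_failed) {
                z->compact_failed = 0;
                z->compact_order_failed = order;
        } else
                z->compact_failed++;
}

static bool compaction_restarting(struct zone_sim *z, int order)
{
        if (order < z->compact_order_failed)
                return false;
        return z->compact_failed >= COMPACT_MAX_FAILED;
}

int main(void)
{
        /* assume order 3 is already the tracked failing order */
        struct zone_sim z = { .compact_failed = 0, .compact_order_failed = 3 };
        int i;

        for (i = 1; i <= 5; i++) {
                fail_compaction(&z, 3);
                printf("after %d failed attempts: restarting=%d\n",
                       i, compaction_restarting(&z, 3));
        }
        return 0;
}

After four consecutive failures at the tracked order,
compaction_restarting() becomes true, which triggers
__reset_isolation_suitable() and clears the counter again, instead of
the old scheme of skipping up to 1 << COMPACT_MAX_DEFER_SHIFT attempts.
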
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
include/linux/compaction.h | 14 +------
include/linux/mmzone.h | 3 +-
include/trace/events/compaction.h | 30 ++++++-------
mm/compaction.c | 88 ++++++++++++++-------------------------
mm/page_alloc.c | 2 +-
mm/vmscan.c | 4 +-
6 files changed, 49 insertions(+), 92 deletions(-)
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index aa8f61c..8d98f3c 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -45,11 +45,8 @@ extern void reset_isolation_suitable(pg_data_t *pgdat);
extern unsigned long compaction_suitable(struct zone *zone, int order,
int alloc_flags, int classzone_idx);
-extern void defer_compaction(struct zone *zone, int order);
-extern bool compaction_deferred(struct zone *zone, int order);
-extern void compaction_defer_reset(struct zone *zone, int order,
+extern void compaction_failed_reset(struct zone *zone, int order,
bool alloc_success);
-extern bool compaction_restarting(struct zone *zone, int order);
#else
static inline unsigned long try_to_compact_pages(gfp_t gfp_mask,
@@ -74,15 +71,6 @@ static inline unsigned long compaction_suitable(struct zone *zone, int order,
return COMPACT_SKIPPED;
}
-static inline void defer_compaction(struct zone *zone, int order)
-{
-}
-
-static inline bool compaction_deferred(struct zone *zone, int order)
-{
- return true;
-}
-
#endif /* CONFIG_COMPACTION */
#if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 700e9b5..e13b732 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -514,8 +514,7 @@ struct zone {
* are skipped before trying again. The number attempted since
* last failure is tracked with compact_considered.
*/
- unsigned int compact_considered;
- unsigned int compact_defer_shift;
+ int compact_failed;
int compact_order_failed;
unsigned long compact_success;
unsigned long compact_depletion_depth;
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index 9a6a3fe..323e614 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -239,7 +239,7 @@ DEFINE_EVENT(mm_compaction_suitable_template, mm_compaction_suitable,
);
#ifdef CONFIG_COMPACTION
-DECLARE_EVENT_CLASS(mm_compaction_defer_template,
+DECLARE_EVENT_CLASS(mm_compaction_deplete_template,
TP_PROTO(struct zone *zone, int order),
@@ -249,8 +249,9 @@ DECLARE_EVENT_CLASS(mm_compaction_defer_template,
__field(int, nid)
__field(char *, name)
__field(int, order)
- __field(unsigned int, considered)
- __field(unsigned int, defer_shift)
+ __field(unsigned long, success)
+ __field(unsigned long, depletion_depth)
+ __field(int, failed)
__field(int, order_failed)
),
@@ -258,35 +259,30 @@ DECLARE_EVENT_CLASS(mm_compaction_defer_template,
__entry->nid = zone_to_nid(zone);
__entry->name = (char *)zone->name;
__entry->order = order;
- __entry->considered = zone->compact_considered;
- __entry->defer_shift = zone->compact_defer_shift;
+ __entry->success = zone->compact_success;
+ __entry->depletion_depth = zone->compact_depletion_depth;
+ __entry->failed = zone->compact_failed;
__entry->order_failed = zone->compact_order_failed;
),
- TP_printk("node=%d zone=%-8s order=%d order_failed=%d consider=%u limit=%lu",
+ TP_printk("node=%d zone=%-8s order=%d failed=%d order_failed=%d consider=%lu depth=%lu",
__entry->nid,
__entry->name,
__entry->order,
+ __entry->failed,
__entry->order_failed,
- __entry->considered,
- 1UL << __entry->defer_shift)
+ __entry->success,
+ __entry->depletion_depth)
);
-DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_deferred,
+DEFINE_EVENT(mm_compaction_deplete_template, mm_compaction_fail_compaction,
TP_PROTO(struct zone *zone, int order),
TP_ARGS(zone, order)
);
-DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_compaction,
-
- TP_PROTO(struct zone *zone, int order),
-
- TP_ARGS(zone, order)
-);
-
-DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_defer_reset,
+DEFINE_EVENT(mm_compaction_deplete_template, mm_compaction_failed_reset,
TP_PROTO(struct zone *zone, int order),
diff --git a/mm/compaction.c b/mm/compaction.c
index c6b8277..1817564 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -128,7 +128,7 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
#ifdef CONFIG_COMPACTION
/* Do not skip compaction more than 64 times */
-#define COMPACT_MAX_DEFER_SHIFT 6
+#define COMPACT_MAX_FAILED 4
#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
#define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
@@ -184,61 +184,28 @@ static void set_migration_scan_limit(struct compact_control *cc)
limit >>= zone->compact_depletion_depth;
cc->migration_scan_limit = max(limit, COMPACT_CLUSTER_MAX);
}
-/*
- * Compaction is deferred when compaction fails to result in a page
- * allocation success. 1 << compact_defer_limit compactions are skipped up
- * to a limit of 1 << COMPACT_MAX_DEFER_SHIFT
- */
-void defer_compaction(struct zone *zone, int order)
-{
- zone->compact_considered = 0;
- zone->compact_defer_shift++;
-
- if (order < zone->compact_order_failed)
- zone->compact_order_failed = order;
-
- if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
- zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
-
- trace_mm_compaction_defer_compaction(zone, order);
-}
-/* Returns true if compaction should be skipped this time */
-bool compaction_deferred(struct zone *zone, int order)
+void fail_compaction(struct zone *zone, int order)
{
- unsigned long defer_limit = 1UL << zone->compact_defer_shift;
-
- if (order < zone->compact_order_failed)
- return false;
-
- /* Avoid possible overflow */
- if (++zone->compact_considered > defer_limit)
- zone->compact_considered = defer_limit;
-
- if (zone->compact_considered >= defer_limit)
- return false;
-
- trace_mm_compaction_deferred(zone, order);
+ if (order < zone->compact_order_failed) {
+ zone->compact_failed = 0;
+ zone->compact_order_failed = order;
+ } else
+ zone->compact_failed++;
- return true;
+ trace_mm_compaction_fail_compaction(zone, order);
}
-/*
- * Update defer tracking counters after successful compaction of given order,
- * which means an allocation either succeeded (alloc_success == true) or is
- * expected to succeed.
- */
-void compaction_defer_reset(struct zone *zone, int order,
+void compaction_failed_reset(struct zone *zone, int order,
bool alloc_success)
{
- if (alloc_success) {
- zone->compact_considered = 0;
- zone->compact_defer_shift = 0;
- }
+ if (alloc_success)
+ zone->compact_failed = 0;
+
if (order >= zone->compact_order_failed)
zone->compact_order_failed = order + 1;
- trace_mm_compaction_defer_reset(zone, order);
+ trace_mm_compaction_failed_reset(zone, order);
}
/* Returns true if restarting compaction after many failures */
@@ -247,8 +214,7 @@ bool compaction_restarting(struct zone *zone, int order)
if (order < zone->compact_order_failed)
return false;
- return zone->compact_defer_shift == COMPACT_MAX_DEFER_SHIFT &&
- zone->compact_considered >= 1UL << zone->compact_defer_shift;
+ return zone->compact_failed < COMPACT_MAX_FAILED ? false : true;
}
/* Returns true if the pageblock should be scanned for pages to isolate. */
@@ -286,6 +252,7 @@ static void __reset_isolation_suitable(struct zone *zone)
}
}
zone->compact_success = 0;
+ zone->compact_failed = 0;
/* Walk the zone and mark every pageblock as suitable for isolation */
for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
@@ -335,10 +302,15 @@ static void update_pageblock_skip(struct compact_control *cc,
if (!page)
return;
- if (nr_isolated)
+ /*
+ * Always update cached_pfn if compaction has scan_limit,
+ * otherwise we would scan same range over and over.
+ */
+ if (cc->migration_scan_limit == LONG_MAX && nr_isolated)
return;
- set_pageblock_skip(page);
+ if (!nr_isolated)
+ set_pageblock_skip(page);
/* Update where async and sync compaction should restart */
if (migrate_scanner) {
@@ -1231,6 +1203,9 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
return COMPACT_COMPLETE;
}
+ if (cc->migration_scan_limit < 0)
+ return COMPACT_PARTIAL;
+
/*
* order == -1 is expected when compacting via
* /proc/sys/vm/compact_memory
@@ -1407,6 +1382,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
}
set_migration_scan_limit(cc);
+ if (cc->migration_scan_limit < 0)
+ return COMPACT_SKIPPED;
trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
cc->free_pfn, end_pfn, sync);
@@ -1588,9 +1565,6 @@ unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
int status;
int zone_contended;
- if (compaction_deferred(zone, order))
- continue;
-
status = compact_zone_order(zone, order, gfp_mask, mode,
&zone_contended, alloc_flags,
ac->classzone_idx);
@@ -1610,7 +1584,7 @@ unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
* will repeat this with true if allocation indeed
* succeeds in this zone.
*/
- compaction_defer_reset(zone, order, false);
+ compaction_failed_reset(zone, order, false);
/*
* It is possible that async compaction aborted due to
* need_resched() and the watermarks were ok thanks to
@@ -1631,7 +1605,7 @@ unsigned long try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
* so we defer compaction there. If it ends up
* succeeding after all, it will be reset.
*/
- defer_compaction(zone, order);
+ fail_compaction(zone, order);
}
/*
@@ -1693,13 +1667,13 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
if (cc->order == -1)
__reset_isolation_suitable(zone);
- if (cc->order == -1 || !compaction_deferred(zone, cc->order))
+ if (cc->order == -1)
compact_zone(zone, cc);
if (cc->order > 0) {
if (zone_watermark_ok(zone, cc->order,
low_wmark_pages(zone), 0, 0))
- compaction_defer_reset(zone, cc->order, false);
+ compaction_failed_reset(zone, cc->order, false);
}
VM_BUG_ON(!list_empty(&cc->freepages));
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e9cc98..c67f853 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2827,7 +2827,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
struct zone *zone = page_zone(page);
zone->compact_blockskip_flush = false;
- compaction_defer_reset(zone, order, true);
+ compaction_failed_reset(zone, order, true);
count_vm_event(COMPACTSUCCESS);
return page;
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 37e90db..a561b5f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2469,10 +2469,10 @@ static inline bool compaction_ready(struct zone *zone, int order)
watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 0, 0);
/*
- * If compaction is deferred, reclaim up to a point where
+ * If compaction is depleted, reclaim up to a point where
* compaction will have a chance of success when re-enabled
*/
- if (compaction_deferred(zone, order))
+ if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
return watermark_ok;
/*
--
1.9.1
* Re: [PATCH v2 4/9] mm/compaction: remove compaction deferring
2015-08-24 2:19 ` [PATCH v2 4/9] mm/compaction: remove compaction deferring Joonsoo Kim
@ 2015-09-25 9:33 ` Vlastimil Babka
2015-09-30 8:28 ` Joonsoo Kim
0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-09-25 9:33 UTC (permalink / raw)
To: Joonsoo Kim, Andrew Morton
Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
Minchan Kim, Joonsoo Kim
On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> Now, we have a way to determine compaction depleted state and compaction
> activity will be limited according this state and depletion depth so
> compaction overhead would be well controlled without compaction deferring.
> So, this patch remove compaction deferring completely and enable
> compaction activity limit.
>
> Various functions are renamed and tracepoint outputs are changed due to
> this removing.
It's more like renaming "deferred" to "failed" and the whole result is somewhat
hard to follow, as the changelog doesn't describe a lot. So if I understand
correctly:
- compaction has to fail 4 times to cause __reset_isolation_suitable(), which
also resets the fail counter back to 0
- thus after each 4 failures, depletion depth is adjusted
- when successes cross the depletion threshold, compaction_depleted() becomes
false and then compact_zone will clear the flag
- with flag clear, scan limit is no longer applied at all, but depletion depth
stays as it is... it will be set to 0 when the flag is set again
Maybe the switch from "depleted with some depth" to "not depleted at all" could
be more gradual?
Also I have a suspicion that the main feature of this (IIUC), which is the scan
limiting (and which I do consider an improvement! and IIRC David wished for
something like that too), could be achieved with less code churn than renaming
"deferred" to "failed" and adding another "depleted" state. E.g.
compact_defer_shift looks similar to compact_depletion_depth, and deferring
could be repurposed for scan limiting instead of binary go/no-go decisions. BTW
the name "depleted" also suggests a binary state, so it's not that much better a
name than "deferred" IMHO.
Also I think my objection from patch 2 stays - __reset_isolation_suitable()
called from kswapd will set zone->compact_success = 0, potentially increase
depletion depth etc, with no connection to the number of failed compactions.
[...]
> @@ -1693,13 +1667,13 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
> if (cc->order == -1)
> __reset_isolation_suitable(zone);
>
> - if (cc->order == -1 || !compaction_deferred(zone, cc->order))
> + if (cc->order == -1)
This change means kswapd no longer compacts from balance_pgdat() ->
compact_pgdat(). Probably you meant to call compact_zone unconditionally here?
> compact_zone(zone, cc);
>
> if (cc->order > 0) {
> if (zone_watermark_ok(zone, cc->order,
> low_wmark_pages(zone), 0, 0))
> - compaction_defer_reset(zone, cc->order, false);
> + compaction_failed_reset(zone, cc->order, false);
> }
>
> VM_BUG_ON(!list_empty(&cc->freepages));
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0e9cc98..c67f853 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2827,7 +2827,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
> struct zone *zone = page_zone(page);
>
> zone->compact_blockskip_flush = false;
> - compaction_defer_reset(zone, order, true);
> + compaction_failed_reset(zone, order, true);
> count_vm_event(COMPACTSUCCESS);
> return page;
> }
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 37e90db..a561b5f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2469,10 +2469,10 @@ static inline bool compaction_ready(struct zone *zone, int order)
> watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 0, 0);
>
> /*
> - * If compaction is deferred, reclaim up to a point where
> + * If compaction is depleted, reclaim up to a point where
> * compaction will have a chance of success when re-enabled
> */
> - if (compaction_deferred(zone, order))
> + if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
> return watermark_ok;
Hm this is a deviation from the "replace go/no-go with scan limit" principle.
Also compaction_deferred() could recover after some retries, and this flag won't.
* Re: [PATCH v2 4/9] mm/compaction: remove compaction deferring
2015-09-25 9:33 ` Vlastimil Babka
@ 2015-09-30 8:28 ` Joonsoo Kim
0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-09-30 8:28 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
David Rientjes, Minchan Kim
On Fri, Sep 25, 2015 at 11:33:38AM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> > Now, we have a way to determine compaction depleted state and compaction
> > activity will be limited according this state and depletion depth so
> > compaction overhead would be well controlled without compaction deferring.
> > So, this patch remove compaction deferring completely and enable
> > compaction activity limit.
> >
> > Various functions are renamed and tracepoint outputs are changed due to
> > this removing.
>
> It's more like renaming "deferred" to "failed" and the whole result is somewhat
> hard to follow, as the changelog doesn't describe a lot. So if I understand
> correctly:
> - compaction has to fail 4 times to cause __reset_isolation_suitable(), which
> also resets the fail counter back to 0
> - thus after each 4 failures, depletion depth is adjusted
> - when successes cross the depletion threshold, compaction_depleted() becomes
> false and then compact_zone will clear the flag
> - with flag clear, scan limit is no longer applied at all, but depletion depth
> stays as it is... it will be set to 0 when the flag is set again
Correct! I will add this description somewhere.
>
> Maybe the switch from "depleted with some depth" to "not depleted at all" could
> be more gradual?
> Also I have a suspicion that the main feature of this (IIUC) which is the scan
> limiting (and which I do consider improvement! and IIRC David wished for
> something like that too) could be achieved with less code churn that is renaming
> "deferred" to "failed" and adding another "depleted" state. E.g.
> compact_defer_shift looks similar to compact_depletion_depth, and deferring
> could be repurposed for scan limiting instead of binary go/no-go decisions. BTW
> the name "depleted" also suggests a binary state, so it's not that much better
> name than "deferred" IMHO.
Okay. I will try to change the current deferring logic into scan-limit
logic with less code churn.

As for the naming, I will think about it more.
> Also I think my objection from patch 2 stays - __reset_isolation_suitable()
> called from kswapd will set zone->compact_success = 0, potentially increase
> depletion depth etc, with no connection to the number of failed compactions.
>
> [...]
>
> > @@ -1693,13 +1667,13 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
> > if (cc->order == -1)
> > __reset_isolation_suitable(zone);
> >
> > - if (cc->order == -1 || !compaction_deferred(zone, cc->order))
> > + if (cc->order == -1)
>
> This change means kswapd no longer compacts from balance_pgdat() ->
> compact_pgdat(). Probably you meant to call compact_zone unconditionally here?
Yes, that's what I want. I will fix it.
> > compact_zone(zone, cc);
> >
> > if (cc->order > 0) {
> > if (zone_watermark_ok(zone, cc->order,
> > low_wmark_pages(zone), 0, 0))
> > - compaction_defer_reset(zone, cc->order, false);
> > + compaction_failed_reset(zone, cc->order, false);
> > }
> >
> > VM_BUG_ON(!list_empty(&cc->freepages));
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 0e9cc98..c67f853 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2827,7 +2827,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
> > struct zone *zone = page_zone(page);
> >
> > zone->compact_blockskip_flush = false;
> > - compaction_defer_reset(zone, order, true);
> > + compaction_failed_reset(zone, order, true);
> > count_vm_event(COMPACTSUCCESS);
> > return page;
> > }
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 37e90db..a561b5f 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2469,10 +2469,10 @@ static inline bool compaction_ready(struct zone *zone, int order)
> > watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 0, 0);
> >
> > /*
> > - * If compaction is deferred, reclaim up to a point where
> > + * If compaction is depleted, reclaim up to a point where
> > * compaction will have a chance of success when re-enabled
> > */
> > - if (compaction_deferred(zone, order))
> > + if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
> > return watermark_ok;
>
> Hm this is a deviation from the "replace go/no-go with scan limit" principle.
> Also compaction_deferred() could recover after some retries, and this flag won't.
This means that if the possibility of compaction success is depleted,
there is no need to reclaim more pages above the watermark, because
further reclaim effort is wasted in this situation. It is the same as
the compaction_deferred case. Recovery happens when kswapd or direct
compaction attempts actual compaction, because there is no blocker like
the compaction-deferring logic had. I think this code change is okay.
Thanks.
* [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state
2015-08-24 2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
` (3 preceding siblings ...)
2015-08-24 2:19 ` [PATCH v2 4/9] mm/compaction: remove compaction deferring Joonsoo Kim
@ 2015-08-24 2:19 ` Joonsoo Kim
2015-09-25 10:32 ` Vlastimil Babka
2015-08-24 2:19 ` [PATCH v2 6/9] mm/compaction: manage separate skip-bits for migration and free scanner Joonsoo Kim
` (3 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24 2:19 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim
Currently, the free scanner doesn't scan non-movable pageblocks, because
if the freepages in non-movable pageblocks were exhausted, another
movable pageblock would be used for non-movable allocations and that
could cause fragmentation.

But note that the watermark check for compaction doesn't distinguish
where the freepages are. If all freepages are in non-movable pageblocks
then, although the system has enough freepages and the watermark check
passes, the free scanner can't get any freepage and compaction will
fail. There is no way to get a precise count of freepages in movable
pageblocks and no way to reclaim only the used pages in movable
pageblocks. Therefore, I think the best way to overcome this situation
is to let compaction use the freepages in non-movable pageblocks.
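
As a reading aid, a simplified user-space sketch of the relaxed
free-scanner eligibility check implemented below (the real
suitable_migration_target() also rejects pageblocks that are a single
large buddy page):

#include <stdbool.h>
#include <stdio.h>

enum migratetype { MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE,
                   MIGRATE_MOVABLE, MIGRATE_CMA };

static bool suitable_migration_target(bool sync_mode, bool scan_all_free,
                                      enum migratetype block_mt)
{
        /* normal rule: only MOVABLE (and CMA) pageblocks feed the free scanner */
        if (block_mt == MIGRATE_MOVABLE || block_mt == MIGRATE_CMA)
                return true;

        /*
         * Last resort: any pageblock becomes eligible, but only for sync
         * compaction and only while the zone is flagged as having had zero
         * compaction successes (ZONE_COMPACTION_SCANALLFREE).
         */
        return sync_mode && scan_all_free;
}

int main(void)
{
        printf("unmovable block, async, scan-all-free: %d\n",
               suitable_migration_target(false, true, MIGRATE_UNMOVABLE));
        printf("unmovable block, sync,  scan-all-free: %d\n",
               suitable_migration_target(true, true, MIGRATE_UNMOVABLE));
        printf("movable block, any mode              : %d\n",
               suitable_migration_target(false, false, MIGRATE_MOVABLE));
        return 0;
}
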
My test setup for this situation is: memory is artificially fragmented
to make order-3 allocations hard, and most pageblocks are changed to the
unmovable migratetype.

System: 512 MB with 32 MB zram
Memory: 25% of memory is allocated to create fragmentation, and a kernel
build runs in the background.
Fragmentation: successful order-3 allocation candidates are roughly
around 1500.
Allocation attempts: roughly 3000 order-3 allocation attempts with
GFP_NORETRY. This value is chosen to saturate allocation success.
Below is the result of this test.
Test: build-frag-unmovable
Kernel:                       Base    Nonmovable
Success(N)                      37            64
compact_stall                  624          5056
compact_success                103           419
compact_fail                   521          4637
pgmigrate_success            22004        277106
compact_isolated             61021       1056863
compact_migrate_scanned    2609360      70252458
compact_free_scanned       4808989      23091292
The 'Success(N)' column is calculated by the following equation:

Success(N) = successful allocations * 100 / order-3 candidates

The result shows that the success rate is roughly doubled in this case
because we can search a larger area.

Because the free scanner is allowed to scan non-movable pageblocks only
in this very limited situation, more scanning events happen. But
allowing it only in this limited situation brings a very important
benefit: memory is not fragmented any more than before. The
fragmentation effect is measured in a following patch, so please refer
to that.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
include/linux/mmzone.h | 1 +
mm/compaction.c | 27 +++++++++++++++++++++++++--
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e13b732..5cae0ad 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -545,6 +545,7 @@ enum zone_flags {
*/
ZONE_FAIR_DEPLETED, /* fair zone policy batch depleted */
ZONE_COMPACTION_DEPLETED, /* compaction possiblity depleted */
+ ZONE_COMPACTION_SCANALLFREE, /* scan all kinds of pageblocks */
};
static inline unsigned long zone_end_pfn(const struct zone *zone)
diff --git a/mm/compaction.c b/mm/compaction.c
index 1817564..b58f162 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -243,9 +243,17 @@ static void __reset_isolation_suitable(struct zone *zone)
zone->compact_cached_free_pfn = end_pfn;
zone->compact_blockskip_flush = false;
+ clear_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags);
if (compaction_depleted(zone)) {
if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags))
zone->compact_depletion_depth++;
+
+ /* Last resort to make high-order page */
+ if (!zone->compact_success) {
+ set_bit(ZONE_COMPACTION_SCANALLFREE,
+ &zone->flags);
+ }
+
else {
set_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
zone->compact_depletion_depth = 0;
@@ -914,7 +922,8 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
#ifdef CONFIG_COMPACTION
/* Returns true if the page is within a block suitable for migration to */
-static bool suitable_migration_target(struct page *page)
+static bool suitable_migration_target(struct compact_control *cc,
+ struct page *page)
{
/* If the page is a large free page, then disallow migration */
if (PageBuddy(page)) {
@@ -931,6 +940,16 @@ static bool suitable_migration_target(struct page *page)
if (migrate_async_suitable(get_pageblock_migratetype(page)))
return true;
+ /*
+ * Allow to scan all kinds of pageblock. Without this relaxation,
+ * all freepage could be in non-movable pageblock and compaction
+ * can be saturated and cannot make high-order page even if there
+ * is enough freepage in the system.
+ */
+ if (cc->mode != MIGRATE_ASYNC &&
+ test_bit(ZONE_COMPACTION_SCANALLFREE, &cc->zone->flags))
+ return true;
+
/* Otherwise skip the block */
return false;
}
@@ -992,7 +1011,7 @@ static void isolate_freepages(struct compact_control *cc)
continue;
/* Check the block is suitable for migration */
- if (!suitable_migration_target(page))
+ if (!suitable_migration_target(cc, page))
continue;
/* If isolation recently failed, do not retry */
@@ -1494,6 +1513,10 @@ out:
if (test_bit(ZONE_COMPACTION_DEPLETED, &zone->flags)) {
if (!compaction_depleted(zone))
clear_bit(ZONE_COMPACTION_DEPLETED, &zone->flags);
+
+ if (zone->compact_success &&
+ test_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags))
+ clear_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags);
}
return ret;
--
1.9.1
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state
2015-08-24 2:19 ` [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state Joonsoo Kim
@ 2015-09-25 10:32 ` Vlastimil Babka
2015-09-30 8:30 ` Joonsoo Kim
0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-09-25 10:32 UTC (permalink / raw)
To: Joonsoo Kim, Andrew Morton
Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
Minchan Kim, Joonsoo Kim
On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
[...]
>
> Because the freepage scanner is only allowed to scan non-movable
> pageblocks in this very limited situation, more scanning events happen.
> But allowing it only in this limited situation has a very important
> benefit: memory isn't fragmented more than before. The fragmentation
> effect is measured in a following patch, so please refer to it.
AFAICS it's measured only for the whole series in the cover letter, no? Just to
be sure I didn't overlook something.
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
> include/linux/mmzone.h | 1 +
> mm/compaction.c | 27 +++++++++++++++++++++++++--
> 2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index e13b732..5cae0ad 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -545,6 +545,7 @@ enum zone_flags {
> */
> ZONE_FAIR_DEPLETED, /* fair zone policy batch depleted */
> ZONE_COMPACTION_DEPLETED, /* compaction possiblity depleted */
> + ZONE_COMPACTION_SCANALLFREE, /* scan all kinds of pageblocks */
"SCANALLFREE" is hard to read. Otherwise yeah, I agree scanning unmovable
pageblocks is necessary sometimes, and this seems to make a reasonable tradeoff.
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state
2015-09-25 10:32 ` Vlastimil Babka
@ 2015-09-30 8:30 ` Joonsoo Kim
0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-09-30 8:30 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
David Rientjes, Minchan Kim
On Fri, Sep 25, 2015 at 12:32:02PM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
>
> [...]
>
> >
> > Because the freepage scanner is only allowed to scan non-movable
> > pageblocks in this very limited situation, more scanning events happen.
> > But allowing it only in this limited situation has a very important
> > benefit: memory isn't fragmented more than before. The fragmentation
> > effect is measured in a following patch, so please refer to it.
>
> AFAICS it's measured only for the whole series in the cover letter, no? Just to
> be sure I didn't overlook something.
Measuring it takes too much time, so it wasn't done for every patch.
I will try to measure at least this patch in the next revision.
>
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> > include/linux/mmzone.h | 1 +
> > mm/compaction.c | 27 +++++++++++++++++++++++++--
> > 2 files changed, 26 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index e13b732..5cae0ad 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -545,6 +545,7 @@ enum zone_flags {
> > */
> > ZONE_FAIR_DEPLETED, /* fair zone policy batch depleted */
> > ZONE_COMPACTION_DEPLETED, /* compaction possiblity depleted */
> > + ZONE_COMPACTION_SCANALLFREE, /* scan all kinds of pageblocks */
>
> "SCANALLFREE" is hard to read. Otherwise yeah, I agree scanning unmovable
> pageblocks is necessary sometimes, and this seems to make a reasonable tradeoff.
Good! I will think of a better name.
Thanks.
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v2 6/9] mm/compaction: manage separate skip-bits for migration and free scanner
2015-08-24 2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
` (4 preceding siblings ...)
2015-08-24 2:19 ` [PATCH v2 5/9] mm/compaction: allow to scan nonmovable pageblock when depleted state Joonsoo Kim
@ 2015-08-24 2:19 ` Joonsoo Kim
2015-08-24 2:19 ` [PATCH v2 7/9] mm/compaction: redesign compaction Joonsoo Kim
` (2 subsequent siblings)
8 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24 2:19 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim
Currently, a single skip-bit is shared by the migration and free
scanners. This is a problem if the migration scanner goes into a region
where the free scanner has marked the skip-bit. The free scanner only
checks whether there are freepages or not, so the pageblock may still
contain migratable pages. But, due to the skip-bit, the migration
scanner would skip scanning it.
Currently, this doesn't cause any problem because the migration scanner
and the free scanner always meet at a similar position in the zone and
stop scanning at that position.
However, a following patch will change the compaction algorithm so that
the migration scanner scans the whole zone range in order to get a much
better success rate. In that case, a skip-bit marked by the freepage
scanner should be ignored, but at the same time we still need to check
whether a pageblock has migratable pages and skip it next time if it
doesn't. This cannot be achieved with just one skip-bit, so this patch
adds one more skip-bit and uses one for the migration scanner and one
for the free scanner, respectively.
This patch increases memory usage: each pageblock uses 4 more bits than
before. This means that on a 1GB memory system we lose another
256 bytes. I think this is a really marginal overhead.
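For reference, the arithmetic behind that number (assuming 4KB pages and
the usual pageblock_order of 9, i.e. 2MB pageblocks):

        /*
         * 1GB / 2MB per pageblock   =  512 pageblocks
         * 512 pageblocks * 4 bits   = 2048 bits
         * 2048 bits / 8             =  256 bytes of extra pageblock bitmap
         */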
The motivation for the compaction algorithm change is explained in a
following patch. Please refer to it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
include/linux/mmzone.h | 3 ---
include/linux/pageblock-flags.h | 37 +++++++++++++++++++++++++++----------
mm/compaction.c | 25 ++++++++++++++++---------
mm/page_alloc.c | 3 ++-
4 files changed, 45 insertions(+), 23 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5cae0ad..e641fd1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -75,9 +75,6 @@ enum {
extern int page_group_by_mobility_disabled;
-#define NR_MIGRATETYPE_BITS (PB_migrate_end - PB_migrate + 1)
-#define MIGRATETYPE_MASK ((1UL << NR_MIGRATETYPE_BITS) - 1)
-
#define get_pageblock_migratetype(page) \
get_pfnblock_flags_mask(page, page_to_pfn(page), \
PB_migrate_end, MIGRATETYPE_MASK)
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 2baeee1..de6997e 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -30,8 +30,13 @@ enum pageblock_bits {
PB_migrate,
PB_migrate_end = PB_migrate + 3 - 1,
/* 3 bits required for migrate types */
- PB_migrate_skip,/* If set the block is skipped by compaction */
+ PB_padding1, /* Padding for 4 byte aligned migrate types */
+ NR_MIGRATETYPE_BITS,
+ PB_skip_migratescan = 4,/* If set the block is skipped by compaction */
+ PB_skip_freescan,
+ PB_padding2,
+ PB_padding3,
/*
* Assume the bits will always align on a word. If this assumption
* changes then get/set pageblock needs updating.
@@ -39,6 +44,8 @@ enum pageblock_bits {
NR_PAGEBLOCK_BITS
};
+#define MIGRATETYPE_MASK ((1UL << NR_MIGRATETYPE_BITS) - 1)
+
#ifdef CONFIG_HUGETLB_PAGE
#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
@@ -87,15 +94,25 @@ void set_pfnblock_flags_mask(struct page *page,
(1 << (end_bitidx - start_bitidx + 1)) - 1)
#ifdef CONFIG_COMPACTION
-#define get_pageblock_skip(page) \
- get_pageblock_flags_group(page, PB_migrate_skip, \
- PB_migrate_skip)
-#define clear_pageblock_skip(page) \
- set_pageblock_flags_group(page, 0, PB_migrate_skip, \
- PB_migrate_skip)
-#define set_pageblock_skip(page) \
- set_pageblock_flags_group(page, 1, PB_migrate_skip, \
- PB_migrate_skip)
+#define get_pageblock_skip_migratescan(page) \
+ get_pageblock_flags_group(page, PB_skip_migratescan, \
+ PB_skip_migratescan)
+#define clear_pageblock_skip_migratescan(page) \
+ set_pageblock_flags_group(page, 0, PB_skip_migratescan, \
+ PB_skip_migratescan)
+#define set_pageblock_skip_migratescan(page) \
+ set_pageblock_flags_group(page, 1, PB_skip_migratescan, \
+ PB_skip_migratescan)
+#define get_pageblock_skip_freescan(page) \
+ get_pageblock_flags_group(page, PB_skip_freescan, \
+ PB_skip_freescan)
+#define clear_pageblock_skip_freescan(page) \
+ set_pageblock_flags_group(page, 0, PB_skip_freescan, \
+ PB_skip_freescan)
+#define set_pageblock_skip_freescan(page) \
+ set_pageblock_flags_group(page, 1, PB_skip_freescan, \
+ PB_skip_freescan)
+
#endif /* CONFIG_COMPACTION */
#endif /* PAGEBLOCK_FLAGS_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index b58f162..a259608 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -219,12 +219,15 @@ bool compaction_restarting(struct zone *zone, int order)
/* Returns true if the pageblock should be scanned for pages to isolate. */
static inline bool isolation_suitable(struct compact_control *cc,
- struct page *page)
+ struct page *page, bool migrate_scanner)
{
if (cc->ignore_skip_hint)
return true;
- return !get_pageblock_skip(page);
+ if (migrate_scanner)
+ return !get_pageblock_skip_migratescan(page);
+ else
+ return !get_pageblock_skip_freescan(page);
}
/*
@@ -275,7 +278,8 @@ static void __reset_isolation_suitable(struct zone *zone)
if (zone != page_zone(page))
continue;
- clear_pageblock_skip(page);
+ clear_pageblock_skip_migratescan(page);
+ clear_pageblock_skip_freescan(page);
}
}
@@ -317,24 +321,27 @@ static void update_pageblock_skip(struct compact_control *cc,
if (cc->migration_scan_limit == LONG_MAX && nr_isolated)
return;
- if (!nr_isolated)
- set_pageblock_skip(page);
-
/* Update where async and sync compaction should restart */
if (migrate_scanner) {
+ if (!nr_isolated)
+ set_pageblock_skip_migratescan(page);
+
if (pfn > zone->compact_cached_migrate_pfn[0])
zone->compact_cached_migrate_pfn[0] = pfn;
if (cc->mode != MIGRATE_ASYNC &&
pfn > zone->compact_cached_migrate_pfn[1])
zone->compact_cached_migrate_pfn[1] = pfn;
} else {
+ if (!nr_isolated)
+ set_pageblock_skip_freescan(page);
+
if (pfn < zone->compact_cached_free_pfn)
zone->compact_cached_free_pfn = pfn;
}
}
#else
static inline bool isolation_suitable(struct compact_control *cc,
- struct page *page)
+ struct page *page, bool migrate_scanner)
{
return true;
}
@@ -1015,7 +1022,7 @@ static void isolate_freepages(struct compact_control *cc)
continue;
/* If isolation recently failed, do not retry */
- if (!isolation_suitable(cc, page))
+ if (!isolation_suitable(cc, page, false))
continue;
/* Found a block suitable for isolating free pages from. */
@@ -1154,7 +1161,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
continue;
/* If isolation recently failed, do not retry */
- if (!isolation_suitable(cc, page))
+ if (!isolation_suitable(cc, page, true))
continue;
/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c67f853..a9a78d1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6623,7 +6623,8 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
unsigned long bitidx, word_bitidx;
unsigned long old_word, word;
- BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 4);
+ BUILD_BUG_ON(NR_PAGEBLOCK_BITS != 8);
+ BUILD_BUG_ON(NR_MIGRATETYPE_BITS != 4);
zone = page_zone(page);
bitmap = get_pageblock_bitmap(zone, pfn);
--
1.9.1
^ permalink raw reply [flat|nested] 26+ messages in thread* [PATCH v2 7/9] mm/compaction: redesign compaction
2015-08-24 2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
` (5 preceding siblings ...)
2015-08-24 2:19 ` [PATCH v2 6/9] mm/compaction: manage separate skip-bits for migration and free scanner Joonsoo Kim
@ 2015-08-24 2:19 ` Joonsoo Kim
2015-10-14 11:38 ` Vlastimil Babka
2015-08-24 2:19 ` [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at Joonsoo Kim
2015-08-24 2:19 ` [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone Joonsoo Kim
8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24 2:19 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim
Currently, compaction works as follows (see the sketch below the list).
1) The migration scanner scans from zone_start_pfn to zone_end_pfn
to find migratable pages.
2) The free scanner scans from zone_end_pfn to zone_start_pfn to
find free pages.
3) If both scanners cross, compaction is finished.
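A heavily simplified sketch of that flow (not the literal compact_zone()
loop; caching, locking, async aborts and error handling are omitted):

        /* 3) stop once the two scanners cross */
        while (cc->migrate_pfn < cc->free_pfn) {
                /* 1) migration scanner walks migrate_pfn upwards */
                isolate_migratepages(zone, cc);
                /* 2) free scanner walks free_pfn downwards, via the
                 * compaction_alloc() -> isolate_freepages() callback */
                migrate_pages(&cc->migratepages, compaction_alloc,
                              compaction_free, (unsigned long)cc,
                              cc->mode, MR_COMPACTION);
        }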
This algorithm has some drawbacks. 1) The back of the zone cannot be
scanned by the migration scanner, because the migration scanner can't
pass over the freepage scanner. So, although there are some high-order
page candidates at the back of the zone, we can't utilize them.
Another weakness is that 2) compaction's success highly depends on the
amount of freepages. Compaction can migrate used pages only up to the
amount of freepages, at maximum. If we can't make a high-order page with
that effort, both scanners meet and compaction fails.
We can easily observe problem 1) with the following test.
Memory is artificially fragmented to make order-3 allocations hard, and
most pageblocks are changed to the movable migratetype.
System: 512 MB with 32 MB Zram
Memory: 25% of memory is allocated to create fragmentation and 200 MB is
occupied by a memory hogger. Most pageblocks have the movable
migratetype.
Fragmentation: roughly 1500 order-3 allocation candidates can succeed.
Allocation attempts: roughly 3000 order-3 allocation attempts
with GFP_NORETRY. This value is chosen to saturate allocation
success.
Test: hogger-frag-movable
Success(N) 70
compact_stall 307
compact_success 64
compact_fail 243
pgmigrate_success 34592
compact_isolated 73977
compact_migrate_scanned 2280770
compact_free_scanned 4710313
The 'Success(N)' column is calculated by the following equation:
Success(N) = successful allocations * 100 /
number of successful order-3 candidates
As mentioned above, there are roughly 1500 high-order page candidates,
but compaction only delivers 70% of them, because the migration scanner
can't pass over the freepage scanner. With the new compaction approach
of this patch, that can be increased to 94%.
To check 2), the hogger-frag-movable benchmark is used again, but with a
tweak: the amount of memory allocated by the memory hogger varies.
Test: hogger-frag-movable with free memory variation
Kernel: Base
200MB-Success(N) 70
250MB-Success(N) 38
300MB-Success(N) 29
For background: up to 250MB, there is enough memory for all order-3
allocation attempts to succeed. In the 300MB case, the available memory
before starting the allocation attempts is just 57MB, so not all
attempts can succeed.
Either way, as free memory decreases, the compaction success rate also
decreases. It is better to remove this dependency and get stable
compaction results in any case.
This patch solves the problems mentioned above.
The freepage scanner is changed to scan the zone from zone_start_pfn
to zone_end_pfn, and the compaction finish condition is changed
accordingly: compaction completes when the migration scanner reaches
zone_end_pfn. With these changes, the migration scanner can traverse
anywhere in the zone.
To prevent back and forth migration within one compaction iteration,
the freepage scanner marks a skip-bit on the pageblocks it scans. The
migration scanner checks this bit and skips marked pageblocks, so back
and forth migration is not possible within one compaction iteration.
If the freepage scanner reaches the end of the zone, it restarts at
zone_start_pfn. It then scans pageblocks where the migration scanner
has migrated some pages but failed to make a high-order page. Such
leftover freepages can't become a high-order page due to fragmentation,
so they are a good source for the freepage scanner.
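A similarly simplified sketch of the redesigned flow, for contrast
(again not the literal code; see the diff below for the real changes):

        /* compaction now completes when the migration scanner hits the
         * end of the zone, not when the two scanners meet */
        while (cc->migrate_pfn < zone_end_pfn(zone)) {
                /* migration scanner walks upwards, honouring the
                 * migrate-scan skip-bit that the free scanner sets on
                 * blocks it has isolated freepages from */
                isolate_migratepages(zone, cc);
                /* free scanner also walks upwards, skips the block the
                 * migration scanner is in, and wraps around to
                 * zone_start_pfn after reaching the zone end */
                migrate_pages(&cc->migratepages, compaction_alloc,
                              compaction_free, (unsigned long)cc,
                              cc->mode, MR_COMPACTION);
        }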
With this change, above test result is:
Kernel: Base vs Redesign
Test: hogger-frag-movable
Success(N) 70 94
compact_stall 307 3642
compact_success 64 144
compact_fail 243 3498
pgmigrate_success 34592 15897219
compact_isolated 73977 31899553
compact_migrate_scanned 2280770 59146745
compact_free_scanned 4710313 49566134
Test: hogger-frag-movable with free memory variation
200MB-Success(N) 70 94
250MB-Success(N) 38 93
300MB-Success(N) 29 89
Compaction now gives us almost all possible high-order pages. The
overhead is much higher, but a further patch will reduce it greatly by
adjusting the depletion check to this new algorithm.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
mm/compaction.c | 145 ++++++++++++++++++++++++++++++--------------------------
1 file changed, 77 insertions(+), 68 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index a259608..ca4d6d1 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -53,17 +53,17 @@ static const char *const compaction_status_string[] = {
static unsigned long release_freepages(struct list_head *freelist)
{
struct page *page, *next;
- unsigned long high_pfn = 0;
+ unsigned long low_pfn = ULONG_MAX;
list_for_each_entry_safe(page, next, freelist, lru) {
unsigned long pfn = page_to_pfn(page);
list_del(&page->lru);
__free_page(page);
- if (pfn > high_pfn)
- high_pfn = pfn;
+ if (pfn < low_pfn)
+ low_pfn = pfn;
}
- return high_pfn;
+ return low_pfn;
}
static void map_pages(struct list_head *list)
@@ -243,7 +243,7 @@ static void __reset_isolation_suitable(struct zone *zone)
zone->compact_cached_migrate_pfn[0] = start_pfn;
zone->compact_cached_migrate_pfn[1] = start_pfn;
- zone->compact_cached_free_pfn = end_pfn;
+ zone->compact_cached_free_pfn = start_pfn;
zone->compact_blockskip_flush = false;
clear_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags);
@@ -335,7 +335,7 @@ static void update_pageblock_skip(struct compact_control *cc,
if (!nr_isolated)
set_pageblock_skip_freescan(page);
- if (pfn < zone->compact_cached_free_pfn)
+ if (pfn > zone->compact_cached_free_pfn)
zone->compact_cached_free_pfn = pfn;
}
}
@@ -869,8 +869,11 @@ isolate_success:
nr_scanned, nr_isolated);
count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
- if (nr_isolated)
+ if (nr_isolated) {
count_compact_events(COMPACTISOLATED, nr_isolated);
+ if (valid_page)
+ clear_pageblock_skip_freescan(valid_page);
+ }
return low_pfn;
}
@@ -969,12 +972,14 @@ static void isolate_freepages(struct compact_control *cc)
{
struct zone *zone = cc->zone;
struct page *page;
+ unsigned long pfn;
unsigned long block_start_pfn; /* start of current pageblock */
- unsigned long isolate_start_pfn; /* exact pfn we start at */
unsigned long block_end_pfn; /* end of current pageblock */
- unsigned long low_pfn; /* lowest pfn scanner is able to scan */
struct list_head *freelist = &cc->freepages;
+ unsigned long nr_isolated;
+ bool rotated = false;
+retry:
/*
* Initialise the free scanner. The starting point is where we last
* successfully isolated from, zone-cached value, or the end of the
@@ -986,22 +991,21 @@ static void isolate_freepages(struct compact_control *cc)
* The low boundary is the end of the pageblock the migration scanner
* is using.
*/
- isolate_start_pfn = cc->free_pfn;
- block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
- block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
- zone_end_pfn(zone));
- low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
+ pfn = cc->free_pfn;
- /*
- * Isolate free pages until enough are available to migrate the
- * pages on cc->migratepages. We stop searching if the migrate
- * and free page scanners meet or enough free pages are isolated.
- */
- for (; block_start_pfn >= low_pfn &&
- cc->nr_migratepages > cc->nr_freepages;
- block_end_pfn = block_start_pfn,
- block_start_pfn -= pageblock_nr_pages,
- isolate_start_pfn = block_start_pfn) {
+ for (; pfn < zone_end_pfn(zone) &&
+ cc->nr_migratepages > cc->nr_freepages;) {
+
+ block_start_pfn = pfn & ~(pageblock_nr_pages-1);
+ block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+ block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
+
+ /* Skip the pageblock where migration scan is */
+ if (block_start_pfn ==
+ (cc->migrate_pfn & ~(pageblock_nr_pages-1))) {
+ pfn = block_end_pfn;
+ continue;
+ }
/*
* This can iterate a massively long zone without finding any
@@ -1012,35 +1016,31 @@ static void isolate_freepages(struct compact_control *cc)
&& compact_should_abort(cc))
break;
- page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
- zone);
- if (!page)
+ page = pageblock_pfn_to_page(pfn, block_end_pfn, zone);
+ if (!page) {
+ pfn = block_end_pfn;
continue;
+ }
/* Check the block is suitable for migration */
- if (!suitable_migration_target(cc, page))
+ if (!suitable_migration_target(cc, page)) {
+ pfn = block_end_pfn;
continue;
+ }
/* If isolation recently failed, do not retry */
- if (!isolation_suitable(cc, page, false))
+ if (!isolation_suitable(cc, page, false)) {
+ pfn = block_end_pfn;
continue;
+ }
/* Found a block suitable for isolating free pages from. */
- isolate_freepages_block(cc, &isolate_start_pfn,
+ nr_isolated = isolate_freepages_block(cc, &pfn,
block_end_pfn, freelist, false);
- /*
- * Remember where the free scanner should restart next time,
- * which is where isolate_freepages_block() left off.
- * But if it scanned the whole pageblock, isolate_start_pfn
- * now points at block_end_pfn, which is the start of the next
- * pageblock.
- * In that case we will however want to restart at the start
- * of the previous pageblock.
- */
- cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
- isolate_start_pfn :
- block_start_pfn - pageblock_nr_pages;
+ /* To prevent back and forth migration */
+ if (nr_isolated)
+ set_pageblock_skip_migratescan(page);
/*
* isolate_freepages_block() might have aborted due to async
@@ -1053,12 +1053,15 @@ static void isolate_freepages(struct compact_control *cc)
/* split_free_page does not map the pages */
map_pages(freelist);
- /*
- * If we crossed the migrate scanner, we want to keep it that way
- * so that compact_finished() may detect this
- */
- if (block_start_pfn < low_pfn)
- cc->free_pfn = cc->migrate_pfn;
+ cc->free_pfn = pfn;
+ if (cc->free_pfn >= zone_end_pfn(zone)) {
+ cc->free_pfn = zone->zone_start_pfn;
+ zone->compact_cached_free_pfn = cc->free_pfn;
+ if (cc->nr_freepages == 0 && !rotated) {
+ rotated = true;
+ goto retry;
+ }
+ }
}
/*
@@ -1144,8 +1147,9 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
* Iterate over whole pageblocks until we find the first suitable.
* Do not cross the free scanner.
*/
- for (; end_pfn <= cc->free_pfn;
- low_pfn = end_pfn, end_pfn += pageblock_nr_pages) {
+ for (; low_pfn < zone_end_pfn(zone); low_pfn = end_pfn) {
+ end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
+ end_pfn = min(end_pfn, zone_end_pfn(zone));
/*
* This can potentially iterate a massively long zone with
@@ -1191,12 +1195,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
}
acct_isolated(zone, cc);
- /*
- * Record where migration scanner will be restarted. If we end up in
- * the same pageblock as the free scanner, make the scanners fully
- * meet so that compact_finished() terminates compaction.
- */
- cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
+ cc->migrate_pfn = low_pfn;
return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
}
@@ -1211,11 +1210,15 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
return COMPACT_PARTIAL;
/* Compaction run completes if the migrate and free scanner meet */
- if (cc->free_pfn <= cc->migrate_pfn) {
+ if (cc->migrate_pfn >= zone_end_pfn(zone)) {
+ /* Stop the async compaction */
+ zone->compact_cached_migrate_pfn[0] = zone_end_pfn(zone);
+ if (cc->mode == MIGRATE_ASYNC)
+ return COMPACT_PARTIAL;
+
/* Let the next compaction start anew. */
zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
- zone->compact_cached_free_pfn = zone_end_pfn(zone);
/*
* Mark that the PG_migrate_skip information should be cleared
@@ -1397,11 +1400,14 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
*/
cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
cc->free_pfn = zone->compact_cached_free_pfn;
- if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
- cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
+ if (cc->mode == MIGRATE_ASYNC && cc->migrate_pfn >= end_pfn)
+ return COMPACT_SKIPPED;
+
+ if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
+ cc->free_pfn = start_pfn;
zone->compact_cached_free_pfn = cc->free_pfn;
}
- if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
+ if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
cc->migrate_pfn = start_pfn;
zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
@@ -1450,11 +1456,16 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
if (err) {
putback_movable_pages(&cc->migratepages);
/*
- * migrate_pages() may return -ENOMEM when scanners meet
- * and we want compact_finished() to detect it
+ * migrate_pages() may return -ENOMEM when free scanner
+ * doesn't find any freepage, and, in this case, more
+ * compaction attempt is useless because it cannot
+ * be successful due to no free memory. We'd like to
+ * consider this state as COMPACT_COMPLETE although
+ * migration scanner doesn't reach end_pfn of zone and
+ * fail to compaction.
*/
- if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
- ret = COMPACT_PARTIAL;
+ if (err == -ENOMEM) {
+ ret = COMPACT_COMPLETE;
goto out;
}
}
@@ -1504,13 +1515,11 @@ out:
cc->nr_freepages = 0;
VM_BUG_ON(free_pfn == 0);
- /* The cached pfn is always the first in a pageblock */
- free_pfn &= ~(pageblock_nr_pages-1);
/*
* Only go back, not forward. The cached pfn might have been
* already reset to zone end in compact_finished()
*/
- if (free_pfn > zone->compact_cached_free_pfn)
+ if (free_pfn < zone->compact_cached_free_pfn)
zone->compact_cached_free_pfn = free_pfn;
}
--
1.9.1
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v2 7/9] mm/compaction: redesign compaction
2015-08-24 2:19 ` [PATCH v2 7/9] mm/compaction: redesign compaction Joonsoo Kim
@ 2015-10-14 11:38 ` Vlastimil Babka
2015-10-15 2:38 ` Joonsoo Kim
0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-10-14 11:38 UTC (permalink / raw)
To: Joonsoo Kim, Andrew Morton
Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
Minchan Kim, Joonsoo Kim
On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
[...]
> This patch solves the problems mentioned above.
> The freepage scanner is changed to scan the zone from zone_start_pfn
> to zone_end_pfn, and the compaction finish condition is changed
> accordingly: compaction completes when the migration scanner reaches
> zone_end_pfn. With these changes, the migration scanner can traverse
> anywhere in the zone.
>
> To prevent back and forth migration within one compaction iteration,
> the freepage scanner marks a skip-bit on the pageblocks it scans. The
> migration scanner checks this bit and skips marked pageblocks, so back
> and forth migration is not possible within one compaction iteration.
>
> If the freepage scanner reaches the end of the zone, it restarts at
> zone_start_pfn. It then scans pageblocks where the migration scanner
> has migrated some pages but failed to make a high-order page. Such
> leftover freepages can't become a high-order page due to fragmentation,
> so they are a good source for the freepage scanner.
Hmm, there might be danger that freepage scanner will proceed faster than
migration scanner, as evidenced by the current code where the scanners meet
earlier than in the middle. Thus, it would likely mark pageblocks as unsuitable
for migration scanner, before it can reach them?
> With this change, above test result is:
>
> Kernel: Base vs Redesign
>
> Test: hogger-frag-movable
>
> Success(N) 70 94
> compact_stall 307 3642
> compact_success 64 144
> compact_fail 243 3498
> pgmigrate_success 34592 15897219
> compact_isolated 73977 31899553
> compact_migrate_scanned 2280770 59146745
> compact_free_scanned 4710313 49566134
>
> Test: hogger-frag-movable with free memory variation
>
> 200MB-Success(N) 70 94
> 250MB-Success(N) 38 93
> 300MB-Success(N) 29 89
>
> Compaction now gives us almost all possible high-order pages. The
> overhead is much higher, but a further patch will reduce it greatly by
> adjusting the depletion check to this new algorithm.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
> mm/compaction.c | 145 ++++++++++++++++++++++++++++++--------------------------
> 1 file changed, 77 insertions(+), 68 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index a259608..ca4d6d1 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -53,17 +53,17 @@ static const char *const compaction_status_string[] = {
> static unsigned long release_freepages(struct list_head *freelist)
> {
> struct page *page, *next;
> - unsigned long high_pfn = 0;
> + unsigned long low_pfn = ULONG_MAX;
>
> list_for_each_entry_safe(page, next, freelist, lru) {
> unsigned long pfn = page_to_pfn(page);
> list_del(&page->lru);
> __free_page(page);
> - if (pfn > high_pfn)
> - high_pfn = pfn;
> + if (pfn < low_pfn)
> + low_pfn = pfn;
> }
>
> - return high_pfn;
> + return low_pfn;
This assumes that the freelist won't contain both pages from the end of the
zone, and from the beginning of the zone after the free scanner has wrapped. Which
is true, but not immediately obvious, so it deserves some comment.
> }
>
> static void map_pages(struct list_head *list)
> @@ -243,7 +243,7 @@ static void __reset_isolation_suitable(struct zone *zone)
>
> zone->compact_cached_migrate_pfn[0] = start_pfn;
> zone->compact_cached_migrate_pfn[1] = start_pfn;
> - zone->compact_cached_free_pfn = end_pfn;
> + zone->compact_cached_free_pfn = start_pfn;
> zone->compact_blockskip_flush = false;
>
> clear_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags);
> @@ -335,7 +335,7 @@ static void update_pageblock_skip(struct compact_control *cc,
> if (!nr_isolated)
> set_pageblock_skip_freescan(page);
>
> - if (pfn < zone->compact_cached_free_pfn)
> + if (pfn > zone->compact_cached_free_pfn)
> zone->compact_cached_free_pfn = pfn;
> }
> }
> @@ -869,8 +869,11 @@ isolate_success:
> nr_scanned, nr_isolated);
>
> count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
> - if (nr_isolated)
> + if (nr_isolated) {
> count_compact_events(COMPACTISOLATED, nr_isolated);
> + if (valid_page)
> + clear_pageblock_skip_freescan(valid_page);
This is explained in changelog, but a comment here would be useful too.
> + }
>
> return low_pfn;
> }
> @@ -969,12 +972,14 @@ static void isolate_freepages(struct compact_control *cc)
> {
> struct zone *zone = cc->zone;
> struct page *page;
> + unsigned long pfn;
> unsigned long block_start_pfn; /* start of current pageblock */
> - unsigned long isolate_start_pfn; /* exact pfn we start at */
> unsigned long block_end_pfn; /* end of current pageblock */
> - unsigned long low_pfn; /* lowest pfn scanner is able to scan */
> struct list_head *freelist = &cc->freepages;
> + unsigned long nr_isolated;
> + bool rotated = false;
>
> +retry:
> /*
> * Initialise the free scanner. The starting point is where we last
> * successfully isolated from, zone-cached value, or the end of the
> @@ -986,22 +991,21 @@ static void isolate_freepages(struct compact_control *cc)
> * The low boundary is the end of the pageblock the migration scanner
> * is using.
> */
> - isolate_start_pfn = cc->free_pfn;
> - block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
> - block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
> - zone_end_pfn(zone));
> - low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
> + pfn = cc->free_pfn;
>
> - /*
> - * Isolate free pages until enough are available to migrate the
> - * pages on cc->migratepages. We stop searching if the migrate
> - * and free page scanners meet or enough free pages are isolated.
> - */
> - for (; block_start_pfn >= low_pfn &&
> - cc->nr_migratepages > cc->nr_freepages;
> - block_end_pfn = block_start_pfn,
> - block_start_pfn -= pageblock_nr_pages,
> - isolate_start_pfn = block_start_pfn) {
> + for (; pfn < zone_end_pfn(zone) &&
> + cc->nr_migratepages > cc->nr_freepages;) {
> +
> + block_start_pfn = pfn & ~(pageblock_nr_pages-1);
> + block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
> + block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> +
> + /* Skip the pageblock where migration scan is */
> + if (block_start_pfn ==
> + (cc->migrate_pfn & ~(pageblock_nr_pages-1))) {
> + pfn = block_end_pfn;
Can't this "pfn = block_end_pfn" be in the for-loop header? Can it ever happen
that all of the following is true? I don't think it can?
- isolate_freepages_block() returns before reaching block_end_pfn
- it hasn't isolated enough, so cc->nr_migratepages > cc->nr_freepages is true
- cc->contended is not set
> + continue;
> + }
>
> /*
> * This can iterate a massively long zone without finding any
> @@ -1012,35 +1016,31 @@ static void isolate_freepages(struct compact_control *cc)
> && compact_should_abort(cc))
> break;
>
> - page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
> - zone);
> - if (!page)
> + page = pageblock_pfn_to_page(pfn, block_end_pfn, zone);
> + if (!page) {
> + pfn = block_end_pfn;
> continue;
> + }
>
> /* Check the block is suitable for migration */
> - if (!suitable_migration_target(cc, page))
> + if (!suitable_migration_target(cc, page)) {
> + pfn = block_end_pfn;
> continue;
> + }
>
> /* If isolation recently failed, do not retry */
> - if (!isolation_suitable(cc, page, false))
> + if (!isolation_suitable(cc, page, false)) {
> + pfn = block_end_pfn;
> continue;
> + }
>
> /* Found a block suitable for isolating free pages from. */
> - isolate_freepages_block(cc, &isolate_start_pfn,
> + nr_isolated = isolate_freepages_block(cc, &pfn,
> block_end_pfn, freelist, false);
>
> - /*
> - * Remember where the free scanner should restart next time,
> - * which is where isolate_freepages_block() left off.
> - * But if it scanned the whole pageblock, isolate_start_pfn
> - * now points at block_end_pfn, which is the start of the next
> - * pageblock.
> - * In that case we will however want to restart at the start
> - * of the previous pageblock.
> - */
> - cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
> - isolate_start_pfn :
> - block_start_pfn - pageblock_nr_pages;
> + /* To prevent back and forth migration */
> + if (nr_isolated)
> + set_pageblock_skip_migratescan(page);
>
> /*
> * isolate_freepages_block() might have aborted due to async
> @@ -1053,12 +1053,15 @@ static void isolate_freepages(struct compact_control *cc)
> /* split_free_page does not map the pages */
> map_pages(freelist);
>
> - /*
> - * If we crossed the migrate scanner, we want to keep it that way
> - * so that compact_finished() may detect this
> - */
> - if (block_start_pfn < low_pfn)
> - cc->free_pfn = cc->migrate_pfn;
> + cc->free_pfn = pfn;
> + if (cc->free_pfn >= zone_end_pfn(zone)) {
> + cc->free_pfn = zone->zone_start_pfn;
> + zone->compact_cached_free_pfn = cc->free_pfn;
> + if (cc->nr_freepages == 0 && !rotated) {
> + rotated = true;
> + goto retry;
This too deserves explanation. If we isolated at least something, we can return
and not "rotate" (wrap) to the zone start immediately. But if we isolated
nothing, we need to continue immediately, otherwise migration fails on -ENOMEM
and so will the whole compaction. And the fact, that if we isolated something,
we return immediately, has implications for release_freepages() being correct,
as I noted above.
> + }
> + }
> }
>
> /*
> @@ -1144,8 +1147,9 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> * Iterate over whole pageblocks until we find the first suitable.
> * Do not cross the free scanner.
> */
> - for (; end_pfn <= cc->free_pfn;
> - low_pfn = end_pfn, end_pfn += pageblock_nr_pages) {
> + for (; low_pfn < zone_end_pfn(zone); low_pfn = end_pfn) {
> + end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
> + end_pfn = min(end_pfn, zone_end_pfn(zone));
>
> /*
> * This can potentially iterate a massively long zone with
> @@ -1191,12 +1195,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> }
>
> acct_isolated(zone, cc);
> - /*
> - * Record where migration scanner will be restarted. If we end up in
> - * the same pageblock as the free scanner, make the scanners fully
> - * meet so that compact_finished() terminates compaction.
> - */
> - cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
> + cc->migrate_pfn = low_pfn;
>
> return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> }
> @@ -1211,11 +1210,15 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
> return COMPACT_PARTIAL;
>
> /* Compaction run completes if the migrate and free scanner meet */
This is no longer true :)
> - if (cc->free_pfn <= cc->migrate_pfn) {
> + if (cc->migrate_pfn >= zone_end_pfn(zone)) {
> + /* Stop the async compaction */
> + zone->compact_cached_migrate_pfn[0] = zone_end_pfn(zone);
> + if (cc->mode == MIGRATE_ASYNC)
> + return COMPACT_PARTIAL;
This is not explained in changelog? And I don't immediately see why this is needed?
> +
> /* Let the next compaction start anew. */
> zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> - zone->compact_cached_free_pfn = zone_end_pfn(zone);
>
> /*
> * Mark that the PG_migrate_skip information should be cleared
> @@ -1397,11 +1400,14 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> */
> cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
> cc->free_pfn = zone->compact_cached_free_pfn;
> - if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
> - cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
> + if (cc->mode == MIGRATE_ASYNC && cc->migrate_pfn >= end_pfn)
> + return COMPACT_SKIPPED;
Ditto. Is this needed for correctness, or another optimization that should be
separate patch?
> +
> + if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
> + cc->free_pfn = start_pfn;
> zone->compact_cached_free_pfn = cc->free_pfn;
> }
> - if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
> + if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
> cc->migrate_pfn = start_pfn;
> zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
> zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
> @@ -1450,11 +1456,16 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> if (err) {
> putback_movable_pages(&cc->migratepages);
> /*
> - * migrate_pages() may return -ENOMEM when scanners meet
> - * and we want compact_finished() to detect it
> + * migrate_pages() may return -ENOMEM when free scanner
> + * doesn't find any freepage, and, in this case, more
> + * compaction attempt is useless because it cannot
> + * be successful due to no free memory. We'd like to
> + * consider this state as COMPACT_COMPLETE although
> + * migration scanner doesn't reach end_pfn of zone and
> + * fail to compaction.
Hmm is there a danger of frequent completes like these leading to premature
cached pfn resets, so migration scanner will again not reach the whole zone?
> */
> - if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
> - ret = COMPACT_PARTIAL;
> + if (err == -ENOMEM) {
> + ret = COMPACT_COMPLETE;
> goto out;
> }
> }
> @@ -1504,13 +1515,11 @@ out:
>
> cc->nr_freepages = 0;
> VM_BUG_ON(free_pfn == 0);
> - /* The cached pfn is always the first in a pageblock */
> - free_pfn &= ~(pageblock_nr_pages-1);
> /*
> * Only go back, not forward. The cached pfn might have been
> * already reset to zone end in compact_finished()
> */
> - if (free_pfn > zone->compact_cached_free_pfn)
> + if (free_pfn < zone->compact_cached_free_pfn)
> zone->compact_cached_free_pfn = free_pfn;
> }
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH v2 7/9] mm/compaction: redesign compaction
2015-10-14 11:38 ` Vlastimil Babka
@ 2015-10-15 2:38 ` Joonsoo Kim
0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-10-15 2:38 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
David Rientjes, Minchan Kim
On Wed, Oct 14, 2015 at 01:38:07PM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
>
> [...]
>
> > This patch solves the problems mentioned above.
> > The freepage scanner is changed to scan the zone from zone_start_pfn
> > to zone_end_pfn, and the compaction finish condition is changed
> > accordingly: compaction completes when the migration scanner reaches
> > zone_end_pfn. With these changes, the migration scanner can traverse
> > anywhere in the zone.
> >
> > To prevent back and forth migration within one compaction iteration,
> > the freepage scanner marks a skip-bit on the pageblocks it scans. The
> > migration scanner checks this bit and skips marked pageblocks, so back
> > and forth migration is not possible within one compaction iteration.
> >
> > If the freepage scanner reaches the end of the zone, it restarts at
> > zone_start_pfn. It then scans pageblocks where the migration scanner
> > has migrated some pages but failed to make a high-order page. Such
> > leftover freepages can't become a high-order page due to fragmentation,
> > so they are a good source for the freepage scanner.
>
> Hmm, there might be danger that freepage scanner will proceed faster than
> migration scanner, as evidenced by the current code where the scanners meet
> earlier than in the middle. Thus, it would likely mark pageblocks as unsuitable
> for migration scanner, before it can reach them?
That's right.
>
> > With this change, above test result is:
> >
> > Kernel: Base vs Redesign
> >
> > Test: hogger-frag-movable
> >
> > Success(N) 70 94
> > compact_stall 307 3642
> > compact_success 64 144
> > compact_fail 243 3498
> > pgmigrate_success 34592 15897219
> > compact_isolated 73977 31899553
> > compact_migrate_scanned 2280770 59146745
> > compact_free_scanned 4710313 49566134
> >
> > Test: hogger-frag-movable with free memory variation
> >
> > 200MB-Success(N) 70 94
> > 250MB-Success(N) 38 93
> > 300MB-Success(N) 29 89
> >
> > Compaction now gives us almost all possible high-order pages. The
> > overhead is much higher, but a further patch will reduce it greatly by
> > adjusting the depletion check to this new algorithm.
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> > mm/compaction.c | 145 ++++++++++++++++++++++++++++++--------------------------
> > 1 file changed, 77 insertions(+), 68 deletions(-)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index a259608..ca4d6d1 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -53,17 +53,17 @@ static const char *const compaction_status_string[] = {
> > static unsigned long release_freepages(struct list_head *freelist)
> > {
> > struct page *page, *next;
> > - unsigned long high_pfn = 0;
> > + unsigned long low_pfn = ULONG_MAX;
> >
> > list_for_each_entry_safe(page, next, freelist, lru) {
> > unsigned long pfn = page_to_pfn(page);
> > list_del(&page->lru);
> > __free_page(page);
> > - if (pfn > high_pfn)
> > - high_pfn = pfn;
> > + if (pfn < low_pfn)
> > + low_pfn = pfn;
> > }
> >
> > - return high_pfn;
> > + return low_pfn;
>
> This assumes that the freelist won't contain both pages from the end of the
> zone, and from the beginning of the zone after the free scanner has wrapped. Which
> is true, but not immediately obvious, so it deserves some comment.
Okay. Will add it.
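Something along these lines, maybe (only a suggestion for the wording,
not the actual follow-up):

        /*
         * A single fill of the freelist comes from one isolate_freepages()
         * call, which wraps to zone_start_pfn only when it has isolated
         * nothing so far.  So the list never mixes pages from before and
         * after the wrap, and returning the lowest pfn is safe.
         */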
> > }
> >
> > static void map_pages(struct list_head *list)
> > @@ -243,7 +243,7 @@ static void __reset_isolation_suitable(struct zone *zone)
> >
> > zone->compact_cached_migrate_pfn[0] = start_pfn;
> > zone->compact_cached_migrate_pfn[1] = start_pfn;
> > - zone->compact_cached_free_pfn = end_pfn;
> > + zone->compact_cached_free_pfn = start_pfn;
> > zone->compact_blockskip_flush = false;
> >
> > clear_bit(ZONE_COMPACTION_SCANALLFREE, &zone->flags);
> > @@ -335,7 +335,7 @@ static void update_pageblock_skip(struct compact_control *cc,
> > if (!nr_isolated)
> > set_pageblock_skip_freescan(page);
> >
> > - if (pfn < zone->compact_cached_free_pfn)
> > + if (pfn > zone->compact_cached_free_pfn)
> > zone->compact_cached_free_pfn = pfn;
> > }
> > }
> > @@ -869,8 +869,11 @@ isolate_success:
> > nr_scanned, nr_isolated);
> >
> > count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
> > - if (nr_isolated)
> > + if (nr_isolated) {
> > count_compact_events(COMPACTISOLATED, nr_isolated);
> > + if (valid_page)
> > + clear_pageblock_skip_freescan(valid_page);
>
> This is explained in changelog, but a comment here would be useful too.
Okay.
> > + }
> >
> > return low_pfn;
> > }
> > @@ -969,12 +972,14 @@ static void isolate_freepages(struct compact_control *cc)
> > {
> > struct zone *zone = cc->zone;
> > struct page *page;
> > + unsigned long pfn;
> > unsigned long block_start_pfn; /* start of current pageblock */
> > - unsigned long isolate_start_pfn; /* exact pfn we start at */
> > unsigned long block_end_pfn; /* end of current pageblock */
> > - unsigned long low_pfn; /* lowest pfn scanner is able to scan */
> > struct list_head *freelist = &cc->freepages;
> > + unsigned long nr_isolated;
> > + bool rotated = false;
> >
> > +retry:
> > /*
> > * Initialise the free scanner. The starting point is where we last
> > * successfully isolated from, zone-cached value, or the end of the
> > @@ -986,22 +991,21 @@ static void isolate_freepages(struct compact_control *cc)
> > * The low boundary is the end of the pageblock the migration scanner
> > * is using.
> > */
> > - isolate_start_pfn = cc->free_pfn;
> > - block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
> > - block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
> > - zone_end_pfn(zone));
> > - low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
> > + pfn = cc->free_pfn;
> >
> > - /*
> > - * Isolate free pages until enough are available to migrate the
> > - * pages on cc->migratepages. We stop searching if the migrate
> > - * and free page scanners meet or enough free pages are isolated.
> > - */
> > - for (; block_start_pfn >= low_pfn &&
> > - cc->nr_migratepages > cc->nr_freepages;
> > - block_end_pfn = block_start_pfn,
> > - block_start_pfn -= pageblock_nr_pages,
> > - isolate_start_pfn = block_start_pfn) {
> > + for (; pfn < zone_end_pfn(zone) &&
> > + cc->nr_migratepages > cc->nr_freepages;) {
> > +
> > + block_start_pfn = pfn & ~(pageblock_nr_pages-1);
> > + block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
> > + block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> > +
> > + /* Skip the pageblock where migration scan is */
> > + if (block_start_pfn ==
> > + (cc->migrate_pfn & ~(pageblock_nr_pages-1))) {
> > + pfn = block_end_pfn;
>
> Can't this "pfn = block_end_pfn" be in the for-loop header? Can it ever happen
You mean the following?
for (; pfn < zone_end_pfn(zone) && cc->nr_migratepages >
cc->nr_freepages; pfn = block_end_pfn)
It is possible. Will do it.
> that all of the following is true? I don't think it can?
> - isolate_freepages_block() returns before reaching block_end_pfn
> - it hasn't isolated enough, so cc->nr_migratepages > cc->nr_freepages is true
> - cc->contended is not set
Do you want to move this check out of the loop? That isn't possible,
because the freepage scanner could skip some pageblocks in the loop.
If you mean something else, please explain more.
> > + continue;
> > + }
> >
> > /*
> > * This can iterate a massively long zone without finding any
> > @@ -1012,35 +1016,31 @@ static void isolate_freepages(struct compact_control *cc)
> > && compact_should_abort(cc))
> > break;
> >
> > - page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
> > - zone);
> > - if (!page)
> > + page = pageblock_pfn_to_page(pfn, block_end_pfn, zone);
> > + if (!page) {
> > + pfn = block_end_pfn;
> > continue;
> > + }
> >
> > /* Check the block is suitable for migration */
> > - if (!suitable_migration_target(cc, page))
> > + if (!suitable_migration_target(cc, page)) {
> > + pfn = block_end_pfn;
> > continue;
> > + }
> >
> > /* If isolation recently failed, do not retry */
> > - if (!isolation_suitable(cc, page, false))
> > + if (!isolation_suitable(cc, page, false)) {
> > + pfn = block_end_pfn;
> > continue;
> > + }
> >
> > /* Found a block suitable for isolating free pages from. */
> > - isolate_freepages_block(cc, &isolate_start_pfn,
> > + nr_isolated = isolate_freepages_block(cc, &pfn,
> > block_end_pfn, freelist, false);
> >
> > - /*
> > - * Remember where the free scanner should restart next time,
> > - * which is where isolate_freepages_block() left off.
> > - * But if it scanned the whole pageblock, isolate_start_pfn
> > - * now points at block_end_pfn, which is the start of the next
> > - * pageblock.
> > - * In that case we will however want to restart at the start
> > - * of the previous pageblock.
> > - */
> > - cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
> > - isolate_start_pfn :
> > - block_start_pfn - pageblock_nr_pages;
> > + /* To prevent back and forth migration */
> > + if (nr_isolated)
> > + set_pageblock_skip_migratescan(page);
> >
> > /*
> > * isolate_freepages_block() might have aborted due to async
> > @@ -1053,12 +1053,15 @@ static void isolate_freepages(struct compact_control *cc)
> > /* split_free_page does not map the pages */
> > map_pages(freelist);
> >
> > - /*
> > - * If we crossed the migrate scanner, we want to keep it that way
> > - * so that compact_finished() may detect this
> > - */
> > - if (block_start_pfn < low_pfn)
> > - cc->free_pfn = cc->migrate_pfn;
> > + cc->free_pfn = pfn;
> > + if (cc->free_pfn >= zone_end_pfn(zone)) {
> > + cc->free_pfn = zone->zone_start_pfn;
> > + zone->compact_cached_free_pfn = cc->free_pfn;
> > + if (cc->nr_freepages == 0 && !rotated) {
> > + rotated = true;
> > + goto retry;
>
> This too deserves explanation. If we isolated at least something, we can return
> and not "rotate" (wrap) to the zone start immediately. But if we isolated
> nothing, we need to continue immediately, otherwise migration fails on -ENOMEM
> and so will the whole compaction. And the fact, that if we isolated something,
> we return immediately, has implications for release_freepages() being correct,
> as I noted above.
Yes, you're right. I will add a comment.
> > + }
> > + }
> > }
> >
> > /*
> > @@ -1144,8 +1147,9 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> > * Iterate over whole pageblocks until we find the first suitable.
> > * Do not cross the free scanner.
> > */
> > - for (; end_pfn <= cc->free_pfn;
> > - low_pfn = end_pfn, end_pfn += pageblock_nr_pages) {
> > + for (; low_pfn < zone_end_pfn(zone); low_pfn = end_pfn) {
> > + end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
> > + end_pfn = min(end_pfn, zone_end_pfn(zone));
> >
> > /*
> > * This can potentially iterate a massively long zone with
> > @@ -1191,12 +1195,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> > }
> >
> > acct_isolated(zone, cc);
> > - /*
> > - * Record where migration scanner will be restarted. If we end up in
> > - * the same pageblock as the free scanner, make the scanners fully
> > - * meet so that compact_finished() terminates compaction.
> > - */
> > - cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
> > + cc->migrate_pfn = low_pfn;
> >
> > return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> > }
> > @@ -1211,11 +1210,15 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
> > return COMPACT_PARTIAL;
> >
> > /* Compaction run completes if the migrate and free scanner meet */
>
> This is no longer true :)
Will remove.
> > - if (cc->free_pfn <= cc->migrate_pfn) {
> > + if (cc->migrate_pfn >= zone_end_pfn(zone)) {
> > + /* Stop the async compaction */
> > + zone->compact_cached_migrate_pfn[0] = zone_end_pfn(zone);
> > + if (cc->mode == MIGRATE_ASYNC)
> > + return COMPACT_PARTIAL;
>
> This is not explained in changelog? And I don't immediately see why this is needed?
Yeah, I should have explained this change; my mistake.
The purpose of this change is to give *sync* compaction the opportunity to
scan the whole zone range. If async compaction reached the end of the zone
and compaction were then completed and reset, compaction would not scan the
whole zone in some cases, for example when many non-movable pageblocks exist.
After async compaction reaches the end of the zone, further async compaction
is skipped and only sync compaction runs. This prevents the needless overhead
of async compaction re-scanning a range where it has already failed to make a
high-order page.
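Putting the quoted hunks together (the compact_finished() change around this
reply and the compact_zone() check further below), the intended flow is roughly
the following consolidated sketch of those same hunks:

	/* compact_finished(): migration scanner has crossed the whole zone */
	if (cc->migrate_pfn >= zone_end_pfn(zone)) {
		/* Park async compaction at the zone end ... */
		zone->compact_cached_migrate_pfn[0] = zone_end_pfn(zone);
		if (cc->mode == MIGRATE_ASYNC)
			return COMPACT_PARTIAL;

		/* ... so only a completed sync run lets the next one start anew */
		zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
		zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
	}

	/* compact_zone(): a parked async attempt bails out immediately */
	if (cc->mode == MIGRATE_ASYNC && cc->migrate_pfn >= end_pfn)
		return COMPACT_SKIPPED;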
> > +
> > /* Let the next compaction start anew. */
> > zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> > zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> > - zone->compact_cached_free_pfn = zone_end_pfn(zone);
> >
> > /*
> > * Mark that the PG_migrate_skip information should be cleared
> > @@ -1397,11 +1400,14 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> > */
> > cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
> > cc->free_pfn = zone->compact_cached_free_pfn;
> > - if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
> > - cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
> > + if (cc->mode == MIGRATE_ASYNC && cc->migrate_pfn >= end_pfn)
> > + return COMPACT_SKIPPED;
>
> Ditto. Is this needed for correctness, or is it another optimization that should
> be a separate patch?
Explained above.
> > +
> > + if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
> > + cc->free_pfn = start_pfn;
> > zone->compact_cached_free_pfn = cc->free_pfn;
> > }
> > - if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
> > + if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
> > cc->migrate_pfn = start_pfn;
> > zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
> > zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
> > @@ -1450,11 +1456,16 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> > if (err) {
> > putback_movable_pages(&cc->migratepages);
> > /*
> > - * migrate_pages() may return -ENOMEM when scanners meet
> > - * and we want compact_finished() to detect it
> > + * migrate_pages() may return -ENOMEM when free scanner
> > + * doesn't find any freepage, and, in this case, more
> > + * compaction attempt is useless because it cannot
> > + * be successful due to no free memory. We'd like to
> > + * consider this state as COMPACT_COMPLETE although
> > + * migration scanner doesn't reach end_pfn of zone and
> > + * fail to compaction.
>
> Hmm, is there a danger of frequent completes like these leading to premature
> cached pfn resets, so that the migration scanner will again not reach the whole zone?
-ENOMEM would happen when most pageblocks are non-movable and the free pages
are in non-movable pageblocks. After some failures, the freepage scanner will
start scanning non-movable pageblocks too, so we can escape this situation.
Thanks.
> > */
> > - if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
> > - ret = COMPACT_PARTIAL;
> > + if (err == -ENOMEM) {
> > + ret = COMPACT_COMPLETE;
> > goto out;
> > }
> > }
> > @@ -1504,13 +1515,11 @@ out:
> >
> > cc->nr_freepages = 0;
> > VM_BUG_ON(free_pfn == 0);
> > - /* The cached pfn is always the first in a pageblock */
> > - free_pfn &= ~(pageblock_nr_pages-1);
> > /*
> > * Only go back, not forward. The cached pfn might have been
> > * already reset to zone end in compact_finished()
> > */
> > - if (free_pfn > zone->compact_cached_free_pfn)
> > + if (free_pfn < zone->compact_cached_free_pfn)
> > zone->compact_cached_free_pfn = free_pfn;
> > }
> >
> >
>
* [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at
2015-08-24 2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
` (6 preceding siblings ...)
2015-08-24 2:19 ` [PATCH v2 7/9] mm/compaction: redesign compaction Joonsoo Kim
@ 2015-08-24 2:19 ` Joonsoo Kim
2015-09-25 11:41 ` Vlastimil Babka
2015-08-24 2:19 ` [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone Joonsoo Kim
8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24 2:19 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim
The purpose of compaction is to make high-order pages. To achieve this purpose,
the best strategy is for compaction to migrate contiguous used pages to
fragmented unused freepages. Currently, the freepage scanner doesn't
distinguish whether a freepage is fragmented or not and blindly uses
any freepage as a migration target regardless of its order.

Using a freepage of higher order than the one compaction aims at is not good,
because what we do here is break a high-order freepage somewhere and migrate
used pages from elsewhere into this broken high-order freepage in order to
make a new high-order freepage. That is just a positional change of the
high-order freepage.

This is useless effort and doesn't help to make more high-order freepages,
because we can't be sure that migrating the used pages produces a high-order
freepage. So, this patch makes the freepage scanner use only freepages of
lower order than the compaction order.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
mm/compaction.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index ca4d6d1..e61ee77 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -455,6 +455,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
unsigned long flags = 0;
bool locked = false;
unsigned long blockpfn = *start_pfn;
+ unsigned long freepage_order;
cursor = pfn_to_page(blockpfn);
@@ -482,6 +483,20 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
if (!PageBuddy(page))
goto isolate_fail;
+ if (!strict && cc->order != -1) {
+ freepage_order = page_order_unsafe(page);
+
+ if (freepage_order > 0 && freepage_order < MAX_ORDER) {
+ /*
+ * Do not use high order freepage for migration
+ * target. It would not be beneficial for
+ * compaction success rate.
+ */
+ if (freepage_order >= cc->order)
+ goto isolate_fail;
+ }
+ }
+
/*
* If we already hold the lock, we can skip some rechecking.
* Note that if we hold the lock now, checked_pageblock was
--
1.9.1
* Re: [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at
2015-08-24 2:19 ` [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at Joonsoo Kim
@ 2015-09-25 11:41 ` Vlastimil Babka
2015-09-30 8:38 ` Joonsoo Kim
0 siblings, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2015-09-25 11:41 UTC (permalink / raw)
To: Joonsoo Kim, Andrew Morton
Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
Minchan Kim, Joonsoo Kim
On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> Purpose of compaction is to make high order page. To achive this purpose,
> it is the best strategy that compaction migrates contiguous used pages
> to fragmented unused freepages. Currently, freepage scanner don't
> distinguish whether freepage is fragmented or not and blindly use
> any freepage for migration target regardless of freepage's order.
>
> Using higher order freepage than compaction aims at is not good because
> what we do here is breaking high order freepage at somewhere and migrating
> used pages from elsewhere to this broken high order freepages in order to
> make new high order freepage. That is just position change of high order
> freepage.
>
> This is useless effort and doesn't help to make more high order freepages
> because we can't be sure that migrating used pages makes high order
> freepage. So, this patch makes freepage scanner only uses the ordered
> freepage lower than compaction order.
How often does this happen? If there's a free page of the order we need, then we
are done compacting anyway, no? Or is this happening because of the current
high-order watermark checking implementation? It would be interesting to measure
how often this skip would trigger. Also watermark checking should change with
Mel's patchset and then this patch shouldn't be needed?
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
> mm/compaction.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ca4d6d1..e61ee77 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -455,6 +455,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> unsigned long flags = 0;
> bool locked = false;
> unsigned long blockpfn = *start_pfn;
> + unsigned long freepage_order;
>
> cursor = pfn_to_page(blockpfn);
>
> @@ -482,6 +483,20 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> if (!PageBuddy(page))
> goto isolate_fail;
>
> + if (!strict && cc->order != -1) {
> + freepage_order = page_order_unsafe(page);
> +
> + if (freepage_order > 0 && freepage_order < MAX_ORDER) {
> + /*
> + * Do not use high order freepage for migration
> + * taret. It would not be beneficial for
> + * compaction success rate.
> + */
> + if (freepage_order >= cc->order)
It would be better to skip ahead over the whole freepage_order worth of pfns here
(see the sketch after the quoted hunk).
> + goto isolate_fail;
> + }
> + }
> +
> /*
> * If we already hold the lock, we can skip some rechecking.
> * Note that if we hold the lock now, checked_pageblock was
>
* Re: [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at
2015-09-25 11:41 ` Vlastimil Babka
@ 2015-09-30 8:38 ` Joonsoo Kim
0 siblings, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-09-30 8:38 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
David Rientjes, Minchan Kim
On Fri, Sep 25, 2015 at 01:41:48PM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> > Purpose of compaction is to make high order page. To achive this purpose,
> > it is the best strategy that compaction migrates contiguous used pages
> > to fragmented unused freepages. Currently, freepage scanner don't
> > distinguish whether freepage is fragmented or not and blindly use
> > any freepage for migration target regardless of freepage's order.
> >
> > Using higher order freepage than compaction aims at is not good because
> > what we do here is breaking high order freepage at somewhere and migrating
> > used pages from elsewhere to this broken high order freepages in order to
> > make new high order freepage. That is just position change of high order
> > freepage.
> >
> > This is useless effort and doesn't help to make more high order freepages
> > because we can't be sure that migrating used pages makes high order
> > freepage. So, this patch makes freepage scanner only uses the ordered
> > freepage lower than compaction order.
>
> How often does this happen? If there's a free page of the order we need, then we
> are done compacting anyway, no? Or is this happening because of the current
> high-order watermark checking implementation? It would be interesting to measure
> how often this skip would trigger. Also watermark checking should change with
> Mel's patchset and then this patch shouldn't be needed?
Yes, you are right. This happens because of the current high-order
watermark checking implementation. Mel's patchset will solve it, and
if it is merged, this patch isn't needed.
>
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> > mm/compaction.c | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index ca4d6d1..e61ee77 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -455,6 +455,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> > unsigned long flags = 0;
> > bool locked = false;
> > unsigned long blockpfn = *start_pfn;
> > + unsigned long freepage_order;
> >
> > cursor = pfn_to_page(blockpfn);
> >
> > @@ -482,6 +483,20 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> > if (!PageBuddy(page))
> > goto isolate_fail;
> >
> > + if (!strict && cc->order != -1) {
> > + freepage_order = page_order_unsafe(page);
> > +
> > + if (freepage_order > 0 && freepage_order < MAX_ORDER) {
> > + /*
> > + * Do not use high order freepage for migration
> > + * taret. It would not be beneficial for
> > + * compaction success rate.
> > + */
> > + if (freepage_order >= cc->order)
>
> It would be better to skip the whole freepage_order.
Okay.
Thanks.
>
> > + goto isolate_fail;
> > + }
> > + }
> > +
> > /*
> > * If we already hold the lock, we can skip some rechecking.
> > * Note that if we hold the lock now, checked_pageblock was
> >
>
* [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone
2015-08-24 2:19 [PATCH v2 0/9] mm/compaction: redesign compaction Joonsoo Kim
` (7 preceding siblings ...)
2015-08-24 2:19 ` [PATCH v2 8/9] mm/compaction: don't use higher order freepage than compaction aims at Joonsoo Kim
@ 2015-08-24 2:19 ` Joonsoo Kim
2015-10-14 12:28 ` Vlastimil Babka
8 siblings, 1 reply; 26+ messages in thread
From: Joonsoo Kim @ 2015-08-24 2:19 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, linux-mm, Vlastimil Babka, Mel Gorman,
Rik van Riel, David Rientjes, Minchan Kim, Joonsoo Kim
Now the compaction algorithm has become more powerful: the migration scanner
traverses the whole zone range. So the old threshold for a compaction-depleted
zone, which was designed to imitate the compaction deferring approach, isn't
appropriate for the current compaction algorithm. If we keep the current
threshold of 1, we can't avoid the excessive overhead caused by compaction,
because one compaction run for a low-order allocation is easily successful
in almost any situation.

This patch re-implements the threshold calculation based on the zone size and
the requested allocation order. We judge whether a zone's compaction potential
is depleted by the number of successful compactions. Roughly, 1/100 of the
area that will be scanned should be turned into high-order pages during one
compaction iteration for the zone to be considered not depleted.
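As a rough worked example (my numbers, not from the changelog): the 512 MB
test system described below has on the order of 131072 4 KB pages (a bit fewer
once reservations are subtracted), and the test uses order-3 allocations, so
compact_order_failed == 3 and the new calculation gives roughly:

	nr_possible = 131072 >> 3;	/* ~16384 order-3 chunks in the zone */
	nr_possible >>= 2;		/* ~4096: scanner covers ~1/4 of the zone */
	threshold = nr_possible >> 7;	/* 32: ~1/128, i.e. the "1/100" above */
	threshold = max(threshold, COMPACT_MIN_DEPLETE_THRESHOLD);	/* still 32 */

So the zone is treated as compaction-depleted once fewer than about 32
successful order-3 compactions are counted over whatever window
zone->compact_success is accumulated over (defined elsewhere in the series).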
Below is the test result with the following setup.

Memory is artificially fragmented to make order-3 allocation hard, and
most pageblocks are changed to the movable migratetype.
System: 512 MB with 32 MB Zram
Memory: 25% of memory is allocated to make fragmentation and 200 MB is
occupied by a memory hogger. Most pageblocks are the movable
migratetype.
Fragmentation: Successful order 3 allocation candidates may be around
1500 roughly.
Allocation attempts: Roughly 3000 order 3 allocation attempts
with GFP_NORETRY. This value is determined to saturate allocation
success.
Test: hogger-frag-movable

                              before       after
Success(N)                        94          83
compact_stall                   3642        4048
compact_success                  144         212
compact_fail                    3498        3835
pgmigrate_success           15897219      216387
compact_isolated            31899553      487712
compact_migrate_scanned     59146745     2513245
compact_free_scanned        49566134     4124319
This change greatly decreases compaction overhead when the zone's compaction
potential is nearly depleted. But I should admit that it's not perfect,
because the compaction success rate is decreased. A more precisely tuned
threshold would restore this regression, but it highly depends on the
workload, so I'm not doing it here.

The other test doesn't show a big regression.
System: 512 MB with 32 MB Zram
Memory: 25% of memory is allocated to make fragmentation and a kernel
build is running in the background. Most pageblocks are the movable
migratetype.
Fragmentation: Successful order 3 allocation candidates may be around
1500 roughly.
Allocation attempts: Roughly 3000 order 3 allocation attempts
with GFP_NORETRY. This value is determined to saturate allocation
success.
Test: build-frag-movable

                              before       after
Success(N)                        89          87
compact_stall                   4053        3642
compact_success                  264         202
compact_fail                    3788        3440
pgmigrate_success            6497642      153413
compact_isolated            13292640      353445
compact_migrate_scanned     69714502     2307433
compact_free_scanned        20243121     2325295
This looks like a reasonable trade-off.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
mm/compaction.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index e61ee77..e1b44a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -129,19 +129,24 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
/* Do not skip compaction more than 64 times */
#define COMPACT_MAX_FAILED 4
-#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
+#define COMPACT_MIN_DEPLETE_THRESHOLD 4UL
#define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
static bool compaction_depleted(struct zone *zone)
{
- unsigned long threshold;
+ unsigned long nr_possible;
unsigned long success = zone->compact_success;
+ unsigned long threshold;
- /*
- * Now, to imitate current compaction deferring approach,
- * choose threshold to 1. It will be changed in the future.
- */
- threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
+ nr_possible = zone->managed_pages >> zone->compact_order_failed;
+
+ /* Migration scanner normally scans less than 1/4 range of zone */
+ nr_possible >>= 2;
+
+ /* We hope to succeed more than 1/100 roughly */
+ threshold = nr_possible >> 7;
+
+ threshold = max(threshold, COMPACT_MIN_DEPLETE_THRESHOLD);
if (success >= threshold)
return false;
--
1.9.1
* Re: [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone
2015-08-24 2:19 ` [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone Joonsoo Kim
@ 2015-10-14 12:28 ` Vlastimil Babka
2015-10-15 6:03 ` Joonsoo Kim
2015-10-15 6:06 ` Joonsoo Kim
0 siblings, 2 replies; 26+ messages in thread
From: Vlastimil Babka @ 2015-10-14 12:28 UTC (permalink / raw)
To: Joonsoo Kim, Andrew Morton
Cc: linux-kernel, linux-mm, Mel Gorman, Rik van Riel, David Rientjes,
Minchan Kim, Joonsoo Kim
On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> Now, compaction algorithm become powerful. Migration scanner traverses
> whole zone range. So, old threshold for depleted zone which is designed
> to imitate compaction deferring approach isn't appropriate for current
> compaction algorithm. If we adhere to current threshold, 1, we can't
> avoid excessive overhead caused by compaction, because one compaction
> for low order allocation would be easily successful in any situation.
>
> This patch re-implements threshold calculation based on zone size and
> allocation requested order. We judge whther compaction possibility is
> depleted or not by number of successful compaction. Roughly, 1/100
> of future scanned area should be allocated for high order page during
> one comaction iteration in order to determine whether zone's compaction
> possiblity is depleted or not.
Finally finishing my review, sorry it took that long...
> Below is test result with following setup.
>
> Memory is artificially fragmented to make order 3 allocation hard. And,
> most of pageblocks are changed to movable migratetype.
>
> System: 512 MB with 32 MB Zram
> Memory: 25% memory is allocated to make fragmentation and 200 MB is
> occupied by memory hogger. Most pageblocks are movable
> migratetype.
> Fragmentation: Successful order 3 allocation candidates may be around
> 1500 roughly.
> Allocation attempts: Roughly 3000 order 3 allocation attempts
> with GFP_NORETRY. This value is determined to saturate allocation
> success.
>
> Test: hogger-frag-movable
>
> Success(N) 94 83
> compact_stall 3642 4048
> compact_success 144 212
> compact_fail 3498 3835
> pgmigrate_success 15897219 216387
> compact_isolated 31899553 487712
> compact_migrate_scanned 59146745 2513245
> compact_free_scanned 49566134 4124319
The decrease in scanned/isolated/migrated counts definitely looks nice, but why
did Success(N) regress when compact_success improved substantially?
> This change results in greatly decreasing compaction overhead when
> zone's compaction possibility is nearly depleted. But, I should admit
> that it's not perfect because compaction success rate is decreased.
> More precise tuning threshold would restore this regression, but,
> it highly depends on workload so I'm not doing it here.
>
> Other test doesn't show big regression.
>
> System: 512 MB with 32 MB Zram
> Memory: 25% memory is allocated to make fragmentation and kernel
> build is running on background. Most pageblocks are movable
> migratetype.
> Fragmentation: Successful order 3 allocation candidates may be around
> 1500 roughly.
> Allocation attempts: Roughly 3000 order 3 allocation attempts
> with GFP_NORETRY. This value is determined to saturate allocation
> success.
>
> Test: build-frag-movable
>
> Success(N) 89 87
> compact_stall 4053 3642
> compact_success 264 202
> compact_fail 3788 3440
> pgmigrate_success 6497642 153413
> compact_isolated 13292640 353445
> compact_migrate_scanned 69714502 2307433
> compact_free_scanned 20243121 2325295
Here compact_success decreased relatively a lot, while Success(N) dropped just
barely. That's less counterintuitive than the first result, but still a bit.
> This looks like reasonable trade-off.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
> mm/compaction.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e61ee77..e1b44a5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -129,19 +129,24 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>
> /* Do not skip compaction more than 64 times */
> #define COMPACT_MAX_FAILED 4
> -#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
> +#define COMPACT_MIN_DEPLETE_THRESHOLD 4UL
> #define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
>
> static bool compaction_depleted(struct zone *zone)
> {
> - unsigned long threshold;
> + unsigned long nr_possible;
> unsigned long success = zone->compact_success;
> + unsigned long threshold;
>
> - /*
> - * Now, to imitate current compaction deferring approach,
> - * choose threshold to 1. It will be changed in the future.
> - */
> - threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
> + nr_possible = zone->managed_pages >> zone->compact_order_failed;
> +
> + /* Migration scanner normally scans less than 1/4 range of zone */
> + nr_possible >>= 2;
> +
> + /* We hope to succeed more than 1/100 roughly */
> + threshold = nr_possible >> 7;
> +
> + threshold = max(threshold, COMPACT_MIN_DEPLETE_THRESHOLD);
> if (success >= threshold)
> return false;
I wonder if compact_depletion_depth should play some "positive" role here. The
bigger the depth, the lower the migration_scan_limit, which means a higher chance
of failing, and so on. Ideally, the system should stabilize itself, so that
migration_scan_limit is set based on how many pages on average have to be scanned
to succeed?
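One way to read this suggestion, as a purely illustrative sketch (not code from
this series; the avg_scan_per_success field, the 1/8 averaging weight and the
2x budget factor are all made up), is to track how many migrate-scanner pages
one success costs on average and derive the scan budget from that:

	static void update_scan_limit(struct zone *zone,
				      unsigned long scanned, bool succeeded)
	{
		long diff;

		if (!succeeded)
			return;

		/* Exponential moving average with weight 1/8 */
		diff = (long)scanned - (long)zone->avg_scan_per_success;
		zone->avg_scan_per_success += diff / 8;

		/* Budget the next attempt at twice the average cost of a success */
		zone->migration_scan_limit =
			max(2 * zone->avg_scan_per_success,
			    (unsigned long)COMPACT_MIN_SCAN_LIMIT);
	}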
* Re: [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone
2015-10-14 12:28 ` Vlastimil Babka
@ 2015-10-15 6:03 ` Joonsoo Kim
2015-10-15 6:06 ` Joonsoo Kim
1 sibling, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-10-15 6:03 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
David Rientjes, Minchan Kim
On Wed, Oct 14, 2015 at 02:28:31PM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> > Now, compaction algorithm become powerful. Migration scanner traverses
> > whole zone range. So, old threshold for depleted zone which is designed
> > to imitate compaction deferring approach isn't appropriate for current
> > compaction algorithm. If we adhere to current threshold, 1, we can't
> > avoid excessive overhead caused by compaction, because one compaction
> > for low order allocation would be easily successful in any situation.
> >
> > This patch re-implements threshold calculation based on zone size and
> > allocation requested order. We judge whther compaction possibility is
> > depleted or not by number of successful compaction. Roughly, 1/100
> > of future scanned area should be allocated for high order page during
> > one comaction iteration in order to determine whether zone's compaction
> > possiblity is depleted or not.
>
> Finally finishing my review, sorry it took that long...
>
> > Below is test result with following setup.
> >
> > Memory is artificially fragmented to make order 3 allocation hard. And,
> > most of pageblocks are changed to movable migratetype.
> >
> > System: 512 MB with 32 MB Zram
> > Memory: 25% memory is allocated to make fragmentation and 200 MB is
> > occupied by memory hogger. Most pageblocks are movable
> > migratetype.
> > Fragmentation: Successful order 3 allocation candidates may be around
> > 1500 roughly.
> > Allocation attempts: Roughly 3000 order 3 allocation attempts
> > with GFP_NORETRY. This value is determined to saturate allocation
> > success.
> >
> > Test: hogger-frag-movable
> >
> > Success(N) 94 83
> > compact_stall 3642 4048
> > compact_success 144 212
> > compact_fail 3498 3835
> > pgmigrate_success 15897219 216387
> > compact_isolated 31899553 487712
> > compact_migrate_scanned 59146745 2513245
> > compact_free_scanned 49566134 4124319
>
> The decrease in scanned/isolated/migrated counts looks definitely nice, but why
> did success regress when compact_success improved substantially?
I don't know the exact reason. I guess the higher compaction success rate
comes from the fact that async compaction is stopped when the zone is depleted.
The compact_success metric doesn't fully reflect allocation success, because
allocation success comes not only from compaction but also from reclaim and
other paths. In the above test, compaction succeeded only about 200 times, but
allocation succeeded more than 1300 times.
>
> > This change results in greatly decreasing compaction overhead when
> > zone's compaction possibility is nearly depleted. But, I should admit
> > that it's not perfect because compaction success rate is decreased.
> > More precise tuning threshold would restore this regression, but,
> > it highly depends on workload so I'm not doing it here.
> >
> > Other test doesn't show big regression.
> >
> > System: 512 MB with 32 MB Zram
> > Memory: 25% memory is allocated to make fragmentation and kernel
> > build is running on background. Most pageblocks are movable
> > migratetype.
> > Fragmentation: Successful order 3 allocation candidates may be around
> > 1500 roughly.
> > Allocation attempts: Roughly 3000 order 3 allocation attempts
> > with GFP_NORETRY. This value is determined to saturate allocation
> > success.
> >
> > Test: build-frag-movable
> >
> > Success(N) 89 87
> > compact_stall 4053 3642
> > compact_success 264 202
> > compact_fail 3788 3440
> > pgmigrate_success 6497642 153413
> > compact_isolated 13292640 353445
> > compact_migrate_scanned 69714502 2307433
> > compact_free_scanned 20243121 2325295
>
> Here compact_success decreased relatively a lot, while success just barely.
> Less counterintuitive than the first result, but still a bit.
compact_stall in the build test would be more unpredictable, because there
are a lot of background worker threads that allocate and free
memory. Anyway, I will investigate it, and if I find something, I will
note it. :)
To know the precise reason, separating the async/sync compaction stats would
be helpful.
> > This looks like reasonable trade-off.
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> > mm/compaction.c | 19 ++++++++++++-------
> > 1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index e61ee77..e1b44a5 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -129,19 +129,24 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> >
> > /* Do not skip compaction more than 64 times */
> > #define COMPACT_MAX_FAILED 4
> > -#define COMPACT_MIN_DEPLETE_THRESHOLD 1UL
> > +#define COMPACT_MIN_DEPLETE_THRESHOLD 4UL
> > #define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
> >
> > static bool compaction_depleted(struct zone *zone)
> > {
> > - unsigned long threshold;
> > + unsigned long nr_possible;
> > unsigned long success = zone->compact_success;
> > + unsigned long threshold;
> >
> > - /*
> > - * Now, to imitate current compaction deferring approach,
> > - * choose threshold to 1. It will be changed in the future.
> > - */
> > - threshold = COMPACT_MIN_DEPLETE_THRESHOLD;
> > + nr_possible = zone->managed_pages >> zone->compact_order_failed;
> > +
> > + /* Migration scanner normally scans less than 1/4 range of zone */
> > + nr_possible >>= 2;
> > +
> > + /* We hope to succeed more than 1/100 roughly */
> > + threshold = nr_possible >> 7;
> > +
> > + threshold = max(threshold, COMPACT_MIN_DEPLETE_THRESHOLD);
> > if (success >= threshold)
> > return false;
>
> I wonder if compact_depletion_depth should play some "positive" role here. The
> bigger the depth, the lower the migration_scan_limit, which means higher chance
> of failing and so on. Ideally, the system should stabilize itself, so that
> migration_scan_limit is set based how many pages on average have to be scanned
> to succeed?
compact_depletion_depth is for throttling compaction when the chance of
compaction success is depleted, not for getting more successes while depleted.
If the compaction potential recovers, this small, limited attempt will make a
high-order page and stop the throttling. This is a way to reduce excessive
compaction overhead, similar to deferring.
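To make the intent concrete, here is a rough sketch of the throttling idea
only (my illustration, not the code of this series; the halve-per-depth
factor is an assumption):

	/*
	 * Each extra level of depletion depth shrinks the migration scan
	 * budget of the next attempt, but never below one pageblock, so a
	 * cheap probe can still notice when the zone has recovered.
	 */
	static unsigned long throttled_scan_limit(unsigned long base_limit,
						  unsigned int depletion_depth)
	{
		unsigned long limit = base_limit >> depletion_depth;

		return max(limit, (unsigned long)COMPACT_MIN_SCAN_LIMIT);
	}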
Thanks.
* Re: [PATCH v2 9/9] mm/compaction: new threshold for compaction depleted zone
2015-10-14 12:28 ` Vlastimil Babka
2015-10-15 6:03 ` Joonsoo Kim
@ 2015-10-15 6:06 ` Joonsoo Kim
1 sibling, 0 replies; 26+ messages in thread
From: Joonsoo Kim @ 2015-10-15 6:06 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Rik van Riel,
David Rientjes, Minchan Kim
On Wed, Oct 14, 2015 at 02:28:31PM +0200, Vlastimil Babka wrote:
> On 08/24/2015 04:19 AM, Joonsoo Kim wrote:
> > Now, compaction algorithm become powerful. Migration scanner traverses
> > whole zone range. So, old threshold for depleted zone which is designed
> > to imitate compaction deferring approach isn't appropriate for current
> > compaction algorithm. If we adhere to current threshold, 1, we can't
> > avoid excessive overhead caused by compaction, because one compaction
> > for low order allocation would be easily successful in any situation.
> >
> > This patch re-implements threshold calculation based on zone size and
> > allocation requested order. We judge whther compaction possibility is
> > depleted or not by number of successful compaction. Roughly, 1/100
> > of future scanned area should be allocated for high order page during
> > one comaction iteration in order to determine whether zone's compaction
> > possiblity is depleted or not.
>
> Finally finishing my review, sorry it took that long...
>
Ah... I forgot to mention that I really appreciate your help.
Thanks for the review!!
Thanks!