* [RFC PATCH 1/7] sysfs interface for the boundary of movable zone
From: kaiyang2 @ 2024-03-20 2:42 UTC
To: linux-mm, linux-kernel; +Cc: Kaiyang Zhao, hannes, ziy, dskarlat
From: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Export the pfn and memory block id of the boundary between ZONE_MOVABLE and the unmovable zones through a per-node sysfs attribute, so that userspace can locate the boundary, e.g. when resizing ZONE_MOVABLE via memory hotplug.
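The attribute is registered as a per-node device attribute, so it should show up alongside meminfo and vmstat in each node's sysfs directory. The path and values below are illustrative assumptions, not output from this posting:
  $ cat /sys/devices/system/node/node0/movable_zone
  movable_zone_start_pfn 4718592
  movable_zone_start_block_id 144
  unmovable_zone_end_pfn 4718592
  unmovable_zone_end_block_id 144
(the block ids assume the common 128MB memory block size)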
Signed-off-by: Kaiyang Zhao <zh_kaiyang@hotmail.com>
---
drivers/base/memory.c | 2 +-
drivers/base/node.c | 32 ++++++++++++++++++++++++++++++++
include/linux/memory.h | 1 +
3 files changed, 34 insertions(+), 1 deletion(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..281b229d7019 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -55,7 +55,7 @@ static inline unsigned long memory_block_id(unsigned long section_nr)
return section_nr / sections_per_block;
}
-static inline unsigned long pfn_to_block_id(unsigned long pfn)
+unsigned long pfn_to_block_id(unsigned long pfn)
{
return memory_block_id(pfn_to_section_nr(pfn));
}
diff --git a/drivers/base/node.c b/drivers/base/node.c
index b46db17124f3..f29ce07565ba 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -486,6 +486,37 @@ static ssize_t node_read_meminfo(struct device *dev,
#undef K
static DEVICE_ATTR(meminfo, 0444, node_read_meminfo, NULL);
+static ssize_t node_read_movable_zone(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ int len = 0;
+ struct zone *unmovable_zone;
+ unsigned long movable_start_pfn, unmovable_end_pfn;
+ unsigned long movable_start_block_id, unmovable_end_block_id;
+
+ movable_start_pfn = NODE_DATA(dev->id)->node_zones[ZONE_MOVABLE].zone_start_pfn;
+ movable_start_block_id = pfn_to_block_id(movable_start_pfn);
+
+ if (populated_zone(&(NODE_DATA(dev->id)->node_zones[ZONE_NORMAL])))
+ unmovable_zone = &(NODE_DATA(dev->id)->node_zones[ZONE_NORMAL]);
+ else
+ unmovable_zone = &(NODE_DATA(dev->id)->node_zones[ZONE_DMA32]);
+
+ unmovable_end_pfn = zone_end_pfn(unmovable_zone);
+ unmovable_end_block_id = pfn_to_block_id(unmovable_end_pfn);
+
+ len = sysfs_emit_at(buf, len,
+ "movable_zone_start_pfn %lu\n"
+ "movable_zone_start_block_id %lu\n"
+ "unmovable_zone_end_pfn %lu\n"
+ "unmovable_zone_end_block_id %lu\n",
+ movable_start_pfn, movable_start_block_id,
+ unmovable_end_pfn, unmovable_end_block_id);
+
+ return len;
+}
+static DEVICE_ATTR(movable_zone, 0444, node_read_movable_zone, NULL);
+
static ssize_t node_read_numastat(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -565,6 +596,7 @@ static DEVICE_ATTR(distance, 0444, node_read_distance, NULL);
static struct attribute *node_dev_attrs[] = {
&dev_attr_meminfo.attr,
+ &dev_attr_movable_zone.attr,
&dev_attr_numastat.attr,
&dev_attr_distance.attr,
&dev_attr_vmstat.attr,
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 31343566c221..17a92a5c1ae5 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -92,6 +92,7 @@ struct memory_block {
int arch_get_memory_phys_device(unsigned long start_pfn);
unsigned long memory_block_size_bytes(void);
int set_memory_block_size_order(unsigned int order);
+unsigned long pfn_to_block_id(unsigned long pfn);
/* These states are exposed to userspace as text strings in sysfs */
#define MEM_ONLINE (1<<0) /* exposed to userspace */
--
2.40.1
* [RFC PATCH 2/7] Disallows high-order movable allocations in other zones if ZONE_MOVABLE is populated
From: kaiyang2 @ 2024-03-20 2:42 UTC
To: linux-mm, linux-kernel; +Cc: Kaiyang Zhao, hannes, ziy, dskarlat
From: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Use ZONE_MOVABLE exclusively for movable allocations of non-zero order: when ZONE_MOVABLE is populated on a node, get_page_from_freelist() skips that node's other zones for such allocations. For example, a movable order-9 (2MB) allocation is then only served from ZONE_MOVABLE, while order-0 movable allocations and all unmovable allocations are unaffected.
Signed-off-by: Kaiyang Zhao <zh_kaiyang@hotmail.com>
---
mm/page_alloc.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b..9ad9357e340a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3403,6 +3403,16 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
struct page *page;
unsigned long mark;
+ /*
+ * Disallows high-order movable allocations in other zones if
+ * ZONE_MOVABLE is populated on this node.
+ */
+ if (ac->highest_zoneidx >= ZONE_MOVABLE &&
+ order > 0 &&
+ zone_idx(zone) != ZONE_MOVABLE &&
+ populated_zone(&(zone->zone_pgdat->node_zones[ZONE_MOVABLE])))
+ continue;
+
if (cpusets_enabled() &&
(alloc_flags & ALLOC_CPUSET) &&
!__cpuset_zone_allowed(zone, gfp_mask))
--
2.40.1
* [RFC PATCH 3/7] compaction accepts a destination zone
From: kaiyang2 @ 2024-03-20 2:42 UTC
To: linux-mm, linux-kernel; +Cc: Kaiyang Zhao, hannes, ziy, dskarlat
From: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Distinguish the source and destination zones in compaction. struct compact_control gains an optional dst_zone; when it is set, the free scanner, its cached pfns and the watermark check in compaction_suitable() operate on dst_zone, while the migrate scanner keeps walking cc->zone. When dst_zone is NULL (as for all existing callers, which now pass NULL), behaviour is unchanged. A later patch in the series uses this to migrate pages from the unmovable zones into ZONE_MOVABLE.
Signed-off-by: Kaiyang Zhao <zh_kaiyang@hotmail.com>
---
include/linux/compaction.h | 4 +-
mm/compaction.c | 106 +++++++++++++++++++++++--------------
mm/internal.h | 1 +
mm/vmscan.c | 4 +-
4 files changed, 70 insertions(+), 45 deletions(-)
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index a6e512cfb670..11f5a1a83abb 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -90,7 +90,7 @@ extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
struct page **page);
extern void reset_isolation_suitable(pg_data_t *pgdat);
extern enum compact_result compaction_suitable(struct zone *zone, int order,
- unsigned int alloc_flags, int highest_zoneidx);
+ unsigned int alloc_flags, int highest_zoneidx, struct zone *dst_zone);
extern void compaction_defer_reset(struct zone *zone, int order,
bool alloc_success);
@@ -180,7 +180,7 @@ static inline void reset_isolation_suitable(pg_data_t *pgdat)
}
static inline enum compact_result compaction_suitable(struct zone *zone, int order,
- int alloc_flags, int highest_zoneidx)
+ int alloc_flags, int highest_zoneidx, struct zone *dst_zone)
{
return COMPACT_SKIPPED;
}
diff --git a/mm/compaction.c b/mm/compaction.c
index c8bcdea15f5f..03b5c4debc17 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -435,7 +435,7 @@ static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
static void update_pageblock_skip(struct compact_control *cc,
struct page *page, unsigned long pfn)
{
- struct zone *zone = cc->zone;
+ struct zone *dst_zone = cc->dst_zone ? cc->dst_zone : cc->zone;
if (cc->no_set_skip_hint)
return;
@@ -446,8 +446,8 @@ static void update_pageblock_skip(struct compact_control *cc,
set_pageblock_skip(page);
/* Update where async and sync compaction should restart */
- if (pfn < zone->compact_cached_free_pfn)
- zone->compact_cached_free_pfn = pfn;
+ if (pfn < dst_zone->compact_cached_free_pfn)
+ dst_zone->compact_cached_free_pfn = pfn;
}
#else
static inline bool isolation_suitable(struct compact_control *cc,
@@ -550,6 +550,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
bool locked = false;
unsigned long blockpfn = *start_pfn;
unsigned int order;
+ struct zone *dst_zone = cc->dst_zone ? cc->dst_zone : cc->zone;
/* Strict mode is for isolation, speed is secondary */
if (strict)
@@ -568,7 +569,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
* pending.
*/
if (!(blockpfn % COMPACT_CLUSTER_MAX)
- && compact_unlock_should_abort(&cc->zone->lock, flags,
+ && compact_unlock_should_abort(&dst_zone->lock, flags,
&locked, cc))
break;
@@ -596,7 +597,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
/* If we already hold the lock, we can skip some rechecking. */
if (!locked) {
- locked = compact_lock_irqsave(&cc->zone->lock,
+ locked = compact_lock_irqsave(&dst_zone->lock,
&flags, cc);
/* Recheck this is a buddy page under lock */
@@ -634,7 +635,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
}
if (locked)
- spin_unlock_irqrestore(&cc->zone->lock, flags);
+ spin_unlock_irqrestore(&dst_zone->lock, flags);
/*
* There is a tiny chance that we have read bogus compound_order(),
@@ -683,11 +684,12 @@ isolate_freepages_range(struct compact_control *cc,
{
unsigned long isolated, pfn, block_start_pfn, block_end_pfn;
LIST_HEAD(freelist);
+ struct zone *dst_zone = cc->dst_zone ? cc->dst_zone : cc->zone;
pfn = start_pfn;
block_start_pfn = pageblock_start_pfn(pfn);
- if (block_start_pfn < cc->zone->zone_start_pfn)
- block_start_pfn = cc->zone->zone_start_pfn;
+ if (block_start_pfn < dst_zone->zone_start_pfn)
+ block_start_pfn = dst_zone->zone_start_pfn;
block_end_pfn = pageblock_end_pfn(pfn);
for (; pfn < end_pfn; pfn += isolated,
@@ -710,7 +712,7 @@ isolate_freepages_range(struct compact_control *cc,
}
if (!pageblock_pfn_to_page(block_start_pfn,
- block_end_pfn, cc->zone))
+ block_end_pfn, dst_zone))
break;
isolated = isolate_freepages_block(cc, &isolate_start_pfn,
@@ -1359,6 +1361,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
{
unsigned long start_pfn, end_pfn;
struct page *page;
+ struct zone *dst_zone = cc->dst_zone ? cc->dst_zone : cc->zone;
/* Do not search around if there are enough pages already */
if (cc->nr_freepages >= cc->nr_migratepages)
@@ -1369,10 +1372,10 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
return;
/* Pageblock boundaries */
- start_pfn = max(pageblock_start_pfn(pfn), cc->zone->zone_start_pfn);
- end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone));
+ start_pfn = max(pageblock_start_pfn(pfn), dst_zone->zone_start_pfn);
+ end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(dst_zone));
- page = pageblock_pfn_to_page(start_pfn, end_pfn, cc->zone);
+ page = pageblock_pfn_to_page(start_pfn, end_pfn, dst_zone);
if (!page)
return;
@@ -1414,6 +1417,7 @@ fast_isolate_freepages(struct compact_control *cc)
struct page *page = NULL;
bool scan_start = false;
int order;
+ struct zone *dst_zone = cc->dst_zone ? cc->dst_zone : cc->zone;
/* Full compaction passes in a negative order */
if (cc->order <= 0)
@@ -1423,7 +1427,7 @@ fast_isolate_freepages(struct compact_control *cc)
* If starting the scan, use a deeper search and use the highest
* PFN found if a suitable one is not found.
*/
- if (cc->free_pfn >= cc->zone->compact_init_free_pfn) {
+ if (cc->free_pfn >= dst_zone->compact_init_free_pfn) {
limit = pageblock_nr_pages >> 1;
scan_start = true;
}
@@ -1448,7 +1452,7 @@ fast_isolate_freepages(struct compact_control *cc)
for (order = cc->search_order;
!page && order >= 0;
order = next_search_order(cc, order)) {
- struct free_area *area = &cc->zone->free_area[order];
+ struct free_area *area = &dst_zone->free_area[order];
struct list_head *freelist;
struct page *freepage;
unsigned long flags;
@@ -1458,7 +1462,7 @@ fast_isolate_freepages(struct compact_control *cc)
if (!area->nr_free)
continue;
- spin_lock_irqsave(&cc->zone->lock, flags);
+ spin_lock_irqsave(&dst_zone->lock, flags);
freelist = &area->free_list[MIGRATE_MOVABLE];
list_for_each_entry_reverse(freepage, freelist, lru) {
unsigned long pfn;
@@ -1469,7 +1473,7 @@ fast_isolate_freepages(struct compact_control *cc)
if (pfn >= highest)
highest = max(pageblock_start_pfn(pfn),
- cc->zone->zone_start_pfn);
+ dst_zone->zone_start_pfn);
if (pfn >= low_pfn) {
cc->fast_search_fail = 0;
@@ -1516,7 +1520,7 @@ fast_isolate_freepages(struct compact_control *cc)
}
}
- spin_unlock_irqrestore(&cc->zone->lock, flags);
+ spin_unlock_irqrestore(&dst_zone->lock, flags);
/*
* Smaller scan on next order so the total scan is related
@@ -1541,17 +1545,17 @@ fast_isolate_freepages(struct compact_control *cc)
if (cc->direct_compaction && pfn_valid(min_pfn)) {
page = pageblock_pfn_to_page(min_pfn,
min(pageblock_end_pfn(min_pfn),
- zone_end_pfn(cc->zone)),
- cc->zone);
+ zone_end_pfn(dst_zone)),
+ dst_zone);
cc->free_pfn = min_pfn;
}
}
}
}
- if (highest && highest >= cc->zone->compact_cached_free_pfn) {
+ if (highest && highest >= dst_zone->compact_cached_free_pfn) {
highest -= pageblock_nr_pages;
- cc->zone->compact_cached_free_pfn = highest;
+ dst_zone->compact_cached_free_pfn = highest;
}
cc->total_free_scanned += nr_scanned;
@@ -1569,7 +1573,7 @@ fast_isolate_freepages(struct compact_control *cc)
*/
static void isolate_freepages(struct compact_control *cc)
{
- struct zone *zone = cc->zone;
+ struct zone *zone = cc->dst_zone ? cc->dst_zone : cc->zone;
struct page *page;
unsigned long block_start_pfn; /* start of current pageblock */
unsigned long isolate_start_pfn; /* exact pfn we start at */
@@ -2089,11 +2093,19 @@ static enum compact_result __compact_finished(struct compact_control *cc)
unsigned int order;
const int migratetype = cc->migratetype;
int ret;
+ struct zone *dst_zone = cc->dst_zone ? cc->dst_zone : cc->zone;
- /* Compaction run completes if the migrate and free scanner meet */
- if (compact_scanners_met(cc)) {
+ /*
+ * Compaction run completes if the migrate and free scanner meet
+ * or when either the src or dst zone has been completely scanned
+ */
+ if (compact_scanners_met(cc) ||
+ cc->migrate_pfn >= zone_end_pfn(cc->zone) ||
+ cc->free_pfn < dst_zone->zone_start_pfn) {
/* Let the next compaction start anew. */
reset_cached_positions(cc->zone);
+ if (cc->dst_zone)
+ reset_cached_positions(cc->dst_zone);
/*
* Mark that the PG_migrate_skip information should be cleared
@@ -2196,10 +2208,13 @@ static enum compact_result compact_finished(struct compact_control *cc)
static enum compact_result __compaction_suitable(struct zone *zone, int order,
unsigned int alloc_flags,
int highest_zoneidx,
- unsigned long wmark_target)
+ unsigned long wmark_target, struct zone *dst_zone)
{
unsigned long watermark;
+ if (!dst_zone)
+ dst_zone = zone;
+
if (is_via_compact_memory(order))
return COMPACT_CONTINUE;
@@ -2227,9 +2242,9 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
* suitable migration targets
*/
watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
- low_wmark_pages(zone) : min_wmark_pages(zone);
+ low_wmark_pages(dst_zone) : min_wmark_pages(dst_zone);
watermark += compact_gap(order);
- if (!__zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
+ if (!__zone_watermark_ok(dst_zone, 0, watermark, highest_zoneidx,
ALLOC_CMA, wmark_target))
return COMPACT_SKIPPED;
@@ -2245,13 +2260,16 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
*/
enum compact_result compaction_suitable(struct zone *zone, int order,
unsigned int alloc_flags,
- int highest_zoneidx)
+ int highest_zoneidx, struct zone *dst_zone)
{
enum compact_result ret;
int fragindex;
+ if (!dst_zone)
+ dst_zone = zone;
+
ret = __compaction_suitable(zone, order, alloc_flags, highest_zoneidx,
- zone_page_state(zone, NR_FREE_PAGES));
+ zone_page_state(dst_zone, NR_FREE_PAGES), dst_zone);
/*
* fragmentation index determines if allocation failures are due to
* low memory or external fragmentation
@@ -2305,7 +2323,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
available = zone_reclaimable_pages(zone) / order;
available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
compact_result = __compaction_suitable(zone, order, alloc_flags,
- ac->highest_zoneidx, available);
+ ac->highest_zoneidx, available, NULL);
if (compact_result == COMPACT_CONTINUE)
return true;
}
@@ -2317,8 +2335,9 @@ static enum compact_result
compact_zone(struct compact_control *cc, struct capture_control *capc)
{
enum compact_result ret;
+ struct zone *dst_zone = cc->dst_zone ? cc->dst_zone : cc->zone;
unsigned long start_pfn = cc->zone->zone_start_pfn;
- unsigned long end_pfn = zone_end_pfn(cc->zone);
+ unsigned long end_pfn = zone_end_pfn(dst_zone);
unsigned long last_migrated_pfn;
const bool sync = cc->mode != MIGRATE_ASYNC;
bool update_cached;
@@ -2337,7 +2356,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
cc->migratetype = gfp_migratetype(cc->gfp_mask);
ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
- cc->highest_zoneidx);
+ cc->highest_zoneidx, dst_zone);
/* Compaction is likely to fail */
if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
return ret;
@@ -2346,14 +2365,19 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
* Clear pageblock skip if there were failures recently and compaction
* is about to be retried after being deferred.
*/
- if (compaction_restarting(cc->zone, cc->order))
+ if (compaction_restarting(cc->zone, cc->order)) {
__reset_isolation_suitable(cc->zone);
+ if (dst_zone != cc->zone)
+ __reset_isolation_suitable(dst_zone);
+ }
/*
* Setup to move all movable pages to the end of the zone. Used cached
* information on where the scanners should start (unless we explicitly
* want to compact the whole zone), but check that it is initialised
* by ensuring the values are within zone boundaries.
+ *
+ * If a destination zone is provided, use it for free pages.
*/
cc->fast_start_pfn = 0;
if (cc->whole_zone) {
@@ -2361,12 +2385,12 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
} else {
cc->migrate_pfn = cc->zone->compact_cached_migrate_pfn[sync];
- cc->free_pfn = cc->zone->compact_cached_free_pfn;
- if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
+ cc->free_pfn = dst_zone->compact_cached_free_pfn;
+ if (cc->free_pfn < dst_zone->zone_start_pfn || cc->free_pfn >= end_pfn) {
cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
- cc->zone->compact_cached_free_pfn = cc->free_pfn;
+ dst_zone->compact_cached_free_pfn = cc->free_pfn;
}
- if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
+ if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= zone_end_pfn(cc->zone)) {
cc->migrate_pfn = start_pfn;
cc->zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
cc->zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
@@ -2522,8 +2546,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
* Only go back, not forward. The cached pfn might have been
* already reset to zone end in compact_finished()
*/
- if (free_pfn > cc->zone->compact_cached_free_pfn)
- cc->zone->compact_cached_free_pfn = free_pfn;
+ if (free_pfn > dst_zone->compact_cached_free_pfn)
+ dst_zone->compact_cached_free_pfn = free_pfn;
}
count_compact_events(COMPACTMIGRATE_SCANNED, cc->total_migrate_scanned);
@@ -2834,7 +2858,7 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat)
continue;
if (compaction_suitable(zone, pgdat->kcompactd_max_order, 0,
- highest_zoneidx) == COMPACT_CONTINUE)
+ highest_zoneidx, NULL) == COMPACT_CONTINUE)
return true;
}
@@ -2871,7 +2895,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
if (compaction_deferred(zone, cc.order))
continue;
- if (compaction_suitable(zone, cc.order, 0, zoneid) !=
+ if (compaction_suitable(zone, cc.order, 0, zoneid, NULL) !=
COMPACT_CONTINUE)
continue;
diff --git a/mm/internal.h b/mm/internal.h
index 68410c6d97ac..349223cc0359 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -465,6 +465,7 @@ struct compact_control {
unsigned long migrate_pfn;
unsigned long fast_start_pfn; /* a pfn to start linear scan from */
struct zone *zone;
+ struct zone *dst_zone; /* use another zone as the destination */
unsigned long total_migrate_scanned;
unsigned long total_free_scanned;
unsigned short fast_search_fail;/* failures to use free list searches */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5bf98d0a22c9..aa21da983804 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6383,7 +6383,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
if (!managed_zone(zone))
continue;
- switch (compaction_suitable(zone, sc->order, 0, sc->reclaim_idx)) {
+ switch (compaction_suitable(zone, sc->order, 0, sc->reclaim_idx, NULL)) {
case COMPACT_SUCCESS:
case COMPACT_CONTINUE:
return false;
@@ -6580,7 +6580,7 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
unsigned long watermark;
enum compact_result suitable;
- suitable = compaction_suitable(zone, sc->order, 0, sc->reclaim_idx);
+ suitable = compaction_suitable(zone, sc->order, 0, sc->reclaim_idx, NULL);
if (suitable == COMPACT_SUCCESS)
/* Allocation should succeed already. Don't reclaim. */
return true;
--
2.40.1
* [RFC PATCH 4/7] vmstat counter for pages migrated across zones
From: kaiyang2 @ 2024-03-20 2:42 UTC
To: linux-mm, linux-kernel; +Cc: Kaiyang Zhao, hannes, ziy, dskarlat
From: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Add a counter for the number of pages migrated across zones in vmstat
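The counter is exposed as a new line in /proc/vmstat; the value below is illustrative only:
  $ grep compact_cross_zone_migrated /proc/vmstat
  compact_cross_zone_migrated 131072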
Signed-off-by: Kaiyang Zhao <zh_kaiyang@hotmail.com>
---
include/linux/vm_event_item.h | 1 +
mm/compaction.c | 2 ++
mm/vmstat.c | 1 +
3 files changed, 4 insertions(+)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 8abfa1240040..be88819085b6 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -79,6 +79,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
KCOMPACTD_WAKE,
KCOMPACTD_MIGRATE_SCANNED, KCOMPACTD_FREE_SCANNED,
+ COMPACT_CROSS_ZONE_MIGRATED,
#endif
#ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
diff --git a/mm/compaction.c b/mm/compaction.c
index 03b5c4debc17..dea10ad8ec64 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2552,6 +2552,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
count_compact_events(COMPACTMIGRATE_SCANNED, cc->total_migrate_scanned);
count_compact_events(COMPACTFREE_SCANNED, cc->total_free_scanned);
+ if (dst_zone != cc->zone)
+ count_compact_events(COMPACT_CROSS_ZONE_MIGRATED, nr_succeeded);
trace_mm_compaction_end(cc, start_pfn, end_pfn, sync, ret);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c28046371b45..98af82e65ad9 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1324,6 +1324,7 @@ const char * const vmstat_text[] = {
"compact_daemon_wake",
"compact_daemon_migrate_scanned",
"compact_daemon_free_scanned",
+ "compact_cross_zone_migrated",
#endif
#ifdef CONFIG_HUGETLB_PAGE
--
2.40.1
* [RFC PATCH 5/7] proactively move pages out of unmovable zones in kcompactd
From: kaiyang2 @ 2024-03-20 2:42 UTC
To: linux-mm, linux-kernel; +Cc: Kaiyang Zhao, hannes, ziy, dskarlat
From: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Proactively migrate movable pages out of the unmovable zones in kcompactd when ZONE_MOVABLE is populated on the node.
Debug only: each zone's start and end pfn are printed in /proc/zoneinfo.
Add counters for cross-zone compaction starts and for the pages scanned by the migrate and free scanners during cross-zone compaction.
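The new counters appear in /proc/vmstat next to the one added in the previous patch, and each zone in /proc/zoneinfo additionally reports its start and end pfn. The values below are illustrative only:
  $ grep compact_cross_zone /proc/vmstat
  compact_cross_zone_migrated 131072
  compact_cross_zone_start 12
  compact_cross_zone_migrate_scanned 262144
  compact_cross_zone_free_scanned 294912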
Signed-off-by: Kaiyang Zhao <zh_kaiyang@hotmail.com>
---
include/linux/vm_event_item.h | 3 +
mm/compaction.c | 101 +++++++++++++++++++++++++++++++---
mm/vmstat.c | 11 +++-
3 files changed, 104 insertions(+), 11 deletions(-)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index be88819085b6..c9183117c8f7 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -80,6 +80,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
KCOMPACTD_WAKE,
KCOMPACTD_MIGRATE_SCANNED, KCOMPACTD_FREE_SCANNED,
COMPACT_CROSS_ZONE_MIGRATED,
+ KCOMPACTD_CROSS_ZONE_START,
+ COMPACT_CROSS_ZONE_MIGRATE_SCANNED,
+ COMPACT_CROSS_ZONE_FREE_SCANNED,
#endif
#ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
diff --git a/mm/compaction.c b/mm/compaction.c
index dea10ad8ec64..94ce1282f17b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1436,7 +1436,10 @@ fast_isolate_freepages(struct compact_control *cc)
* Preferred point is in the top quarter of the scan space but take
* a pfn from the top half if the search is problematic.
*/
- distance = (cc->free_pfn - cc->migrate_pfn);
+ if (cc->zone != dst_zone)
+ distance = (cc->free_pfn - dst_zone->zone_start_pfn) >> 1;
+ else
+ distance = (cc->free_pfn - cc->migrate_pfn);
low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2));
min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1));
@@ -1602,7 +1605,10 @@ static void isolate_freepages(struct compact_control *cc)
block_start_pfn = pageblock_start_pfn(isolate_start_pfn);
block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
zone_end_pfn(zone));
- low_pfn = pageblock_end_pfn(cc->migrate_pfn);
+ if (cc->dst_zone && cc->zone != cc->dst_zone)
+ low_pfn = pageblock_end_pfn(cc->dst_zone->zone_start_pfn);
+ else
+ low_pfn = pageblock_end_pfn(cc->migrate_pfn);
stride = cc->mode == MIGRATE_ASYNC ? COMPACT_CLUSTER_MAX : 1;
/*
@@ -1822,7 +1828,11 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
* within the first eighth to reduce the chances that a migration
* target later becomes a source.
*/
- distance = (cc->free_pfn - cc->migrate_pfn) >> 1;
+ if (cc->dst_zone && cc->zone != cc->dst_zone)
+ distance = (zone_end_pfn(cc->zone) - cc->migrate_pfn) >> 1;
+ else
+ distance = (cc->free_pfn - cc->migrate_pfn) >> 1;
+
if (cc->migrate_pfn != cc->zone->zone_start_pfn)
distance >>= 2;
high_pfn = pageblock_start_pfn(cc->migrate_pfn + distance);
@@ -1897,7 +1907,7 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
{
unsigned long block_start_pfn;
unsigned long block_end_pfn;
- unsigned long low_pfn;
+ unsigned long low_pfn, high_pfn;
struct page *page;
const isolate_mode_t isolate_mode =
(sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
@@ -1924,11 +1934,16 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
/* Only scan within a pageblock boundary */
block_end_pfn = pageblock_end_pfn(low_pfn);
+ if (cc->dst_zone && cc->zone != cc->dst_zone)
+ high_pfn = zone_end_pfn(cc->zone);
+ else
+ high_pfn = cc->free_pfn;
+
/*
* Iterate over whole pageblocks until we find the first suitable.
* Do not cross the free scanner.
*/
- for (; block_end_pfn <= cc->free_pfn;
+ for (; block_end_pfn <= high_pfn;
fast_find_block = false,
cc->migrate_pfn = low_pfn = block_end_pfn,
block_start_pfn = block_end_pfn,
@@ -1954,6 +1969,7 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
* before making it "skip" so other compaction instances do
* not scan the same block.
*/
+
if (pageblock_aligned(low_pfn) &&
!fast_find_block && !isolation_suitable(cc, page))
continue;
@@ -1976,6 +1992,10 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
isolate_mode))
return ISOLATE_ABORT;
+ /* free_pfn may have changed. update high_pfn. */
+ if (!cc->dst_zone || cc->zone == cc->dst_zone)
+ high_pfn = cc->free_pfn;
+
/*
* Either we isolated something and proceed with migration. Or
* we failed and compact_zone should decide if we should
@@ -2141,7 +2161,9 @@ static enum compact_result __compact_finished(struct compact_control *cc)
goto out;
}
- if (is_via_compact_memory(cc->order))
+ /* Don't check if a suitable page is free if doing cross zone compaction. */
+ if (is_via_compact_memory(cc->order) ||
+ (cc->dst_zone && cc->dst_zone != cc->zone))
return COMPACT_CONTINUE;
/*
@@ -2224,7 +2246,8 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
* should be no need for compaction at all.
*/
if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
- alloc_flags))
+ alloc_flags) &&
+ dst_zone == zone)
return COMPACT_SUCCESS;
/*
@@ -2270,6 +2293,11 @@ enum compact_result compaction_suitable(struct zone *zone, int order,
ret = __compaction_suitable(zone, order, alloc_flags, highest_zoneidx,
zone_page_state(dst_zone, NR_FREE_PAGES), dst_zone);
+
+ /* Allow migrating movable pages to ZONE_MOVABLE regardless of frag index */
+ if (ret == COMPACT_CONTINUE && dst_zone != zone)
+ return ret;
+
/*
* fragmentation index determines if allocation failures are due to
* low memory or external fragmentation
@@ -2841,6 +2869,14 @@ void compaction_unregister_node(struct node *node)
}
#endif /* CONFIG_SYSFS && CONFIG_NUMA */
+static inline bool should_compact_unmovable_zones(pg_data_t *pgdat)
+{
+ if (populated_zone(&pgdat->node_zones[ZONE_MOVABLE]))
+ return true;
+ else
+ return false;
+}
+
static inline bool kcompactd_work_requested(pg_data_t *pgdat)
{
return pgdat->kcompactd_max_order > 0 || kthread_should_stop() ||
@@ -2942,6 +2978,48 @@ static void kcompactd_do_work(pg_data_t *pgdat)
pgdat->kcompactd_highest_zoneidx = pgdat->nr_zones - 1;
}
+static void kcompactd_clean_unmovable_zones(pg_data_t *pgdat)
+{
+ int zoneid;
+ struct zone *zone;
+ struct compact_control cc = {
+ .order = 0,
+ .search_order = 0,
+ .highest_zoneidx = ZONE_MOVABLE,
+ .mode = MIGRATE_SYNC,
+ .ignore_skip_hint = true,
+ .gfp_mask = GFP_KERNEL,
+ .dst_zone = &pgdat->node_zones[ZONE_MOVABLE],
+ .whole_zone = true
+ };
+ count_compact_event(KCOMPACTD_CROSS_ZONE_START);
+
+ for (zoneid = 0; zoneid < ZONE_MOVABLE; zoneid++) {
+ int status;
+
+ zone = &pgdat->node_zones[zoneid];
+ if (!populated_zone(zone))
+ continue;
+
+ if (compaction_suitable(zone, cc.order, 0, zoneid, cc.dst_zone) !=
+ COMPACT_CONTINUE)
+ continue;
+
+ if (kthread_should_stop())
+ return;
+
+ /* Not participating in compaction defer. */
+
+ cc.zone = zone;
+ status = compact_zone(&cc, NULL);
+
+ count_compact_events(COMPACT_CROSS_ZONE_MIGRATE_SCANNED,
+ cc.total_migrate_scanned);
+ count_compact_events(COMPACT_CROSS_ZONE_FREE_SCANNED,
+ cc.total_free_scanned);
+ }
+}
+
void wakeup_kcompactd(pg_data_t *pgdat, int order, int highest_zoneidx)
{
if (!order)
@@ -2994,9 +3072,10 @@ static int kcompactd(void *p)
/*
* Avoid the unnecessary wakeup for proactive compaction
- * when it is disabled.
+ * and cleanup of unmovable zones
+ * when they are disabled.
*/
- if (!sysctl_compaction_proactiveness)
+ if (!sysctl_compaction_proactiveness && !should_compact_unmovable_zones(pgdat))
timeout = MAX_SCHEDULE_TIMEOUT;
trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
@@ -3017,6 +3096,10 @@ static int kcompactd(void *p)
continue;
}
+ /* Migrates movable pages out of unmovable zones if ZONE_MOVABLE exists */
+ if (should_compact_unmovable_zones(pgdat))
+ kcompactd_clean_unmovable_zones(pgdat);
+
/*
* Start the proactive work with default timeout. Based
* on the fragmentation score, this timeout is updated.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 98af82e65ad9..444740605f2f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1325,6 +1325,9 @@ const char * const vmstat_text[] = {
"compact_daemon_migrate_scanned",
"compact_daemon_free_scanned",
"compact_cross_zone_migrated",
+ "compact_cross_zone_start",
+ "compact_cross_zone_migrate_scanned",
+ "compact_cross_zone_free_scanned",
#endif
#ifdef CONFIG_HUGETLB_PAGE
@@ -1692,7 +1695,9 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
"\n spanned %lu"
"\n present %lu"
"\n managed %lu"
- "\n cma %lu",
+ "\n cma %lu"
+ "\n start %lu"
+ "\n end %lu",
zone_page_state(zone, NR_FREE_PAGES),
zone->watermark_boost,
min_wmark_pages(zone),
@@ -1701,7 +1706,9 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
zone->spanned_pages,
zone->present_pages,
zone_managed_pages(zone),
- zone_cma_pages(zone));
+ zone_cma_pages(zone),
+ zone->zone_start_pfn,
+ zone_end_pfn(zone));
seq_printf(m,
"\n protection: (%ld",
--
2.40.1
* [RFC PATCH 6/7] pass gfp mask of the allocation that woke kswapd to track number of pages scanned on behalf of each alloc type
From: kaiyang2 @ 2024-03-20 2:42 UTC
To: linux-mm, linux-kernel; +Cc: Kaiyang Zhao, hannes, ziy, dskarlat
From: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
In preparation for exporting the number of pages scanned on behalf of each allocation type, record the gfp mask of the allocation that woke kswapd and pass it down to balance_pgdat().
Signed-off-by: Kaiyang Zhao <zh_kaiyang@hotmail.com>
---
include/linux/mmzone.h | 1 +
mm/vmscan.c | 13 +++++++++++--
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a4889c9d4055..abc9f1623c82 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1288,6 +1288,7 @@ typedef struct pglist_data {
struct task_struct *kswapd; /* Protected by kswapd_lock */
int kswapd_order;
enum zone_type kswapd_highest_zoneidx;
+ gfp_t kswapd_gfp;
int kswapd_failures; /* Number of 'reclaimed == 0' runs */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index aa21da983804..ed0f47e2e810 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7330,7 +7330,7 @@ clear_reclaim_active(pg_data_t *pgdat, int highest_zoneidx)
* or lower is eligible for reclaim until at least one usable zone is
* balanced.
*/
-static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
+static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx, gfp_t gfp_mask)
{
int i;
unsigned long nr_soft_reclaimed;
@@ -7345,6 +7345,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
.order = order,
.may_unmap = 1,
};
+ if (is_migrate_movable(gfp_migratetype(gfp_mask)))
+ sc.gfp_mask |= __GFP_MOVABLE;
set_task_reclaim_state(current, &sc.reclaim_state);
psi_memstall_enter(&pflags);
@@ -7659,6 +7661,7 @@ static int kswapd(void *p)
pg_data_t *pgdat = (pg_data_t *)p;
struct task_struct *tsk = current;
const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
+ gfp_t gfp_mask;
if (!cpumask_empty(cpumask))
set_cpus_allowed_ptr(tsk, cpumask);
@@ -7680,6 +7683,7 @@ static int kswapd(void *p)
WRITE_ONCE(pgdat->kswapd_order, 0);
WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES);
+ WRITE_ONCE(pgdat->kswapd_gfp, 0);
atomic_set(&pgdat->nr_writeback_throttled, 0);
for ( ; ; ) {
bool ret;
@@ -7687,6 +7691,7 @@ static int kswapd(void *p)
alloc_order = reclaim_order = READ_ONCE(pgdat->kswapd_order);
highest_zoneidx = kswapd_highest_zoneidx(pgdat,
highest_zoneidx);
+ gfp_mask = READ_ONCE(pgdat->kswapd_gfp);
kswapd_try_sleep:
kswapd_try_to_sleep(pgdat, alloc_order, reclaim_order,
@@ -7696,8 +7701,10 @@ static int kswapd(void *p)
alloc_order = READ_ONCE(pgdat->kswapd_order);
highest_zoneidx = kswapd_highest_zoneidx(pgdat,
highest_zoneidx);
+ gfp_mask = READ_ONCE(pgdat->kswapd_gfp);
WRITE_ONCE(pgdat->kswapd_order, 0);
WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES);
+ WRITE_ONCE(pgdat->kswapd_gfp, 0);
ret = try_to_freeze();
if (kthread_should_stop())
@@ -7721,7 +7728,7 @@ static int kswapd(void *p)
trace_mm_vmscan_kswapd_wake(pgdat->node_id, highest_zoneidx,
alloc_order);
reclaim_order = balance_pgdat(pgdat, alloc_order,
- highest_zoneidx);
+ highest_zoneidx, gfp_mask);
if (reclaim_order < alloc_order)
goto kswapd_try_sleep;
}
@@ -7759,6 +7766,8 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
if (READ_ONCE(pgdat->kswapd_order) < order)
WRITE_ONCE(pgdat->kswapd_order, order);
+ WRITE_ONCE(pgdat->kswapd_gfp, gfp_flags);
+
if (!waitqueue_active(&pgdat->kswapd_wait))
return;
--
2.40.1
* [RFC PATCH 7/7] exports the number of pages scanned on behalf of movable/unmovable allocations
From: kaiyang2 @ 2024-03-20 2:42 UTC
To: linux-mm, linux-kernel; +Cc: Kaiyang Zhao, hannes, ziy, dskarlat
From: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Export the number of pages scanned on behalf of movable and unmovable allocations as pgscan_by_movable and pgscan_by_unmovable in /proc/vmstat. A userspace tool can monitor these counters to approximate the memory pressure in the movable and unmovable parts of memory and make resizing decisions.
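The counters are exposed as two new lines in /proc/vmstat; the values below are illustrative only:
  $ grep pgscan_by /proc/vmstat
  pgscan_by_movable 1048576
  pgscan_by_unmovable 65536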
Signed-off-by: Kaiyang Zhao <zh_kaiyang@hotmail.com>
---
include/linux/vm_event_item.h | 2 ++
mm/vmscan.c | 11 +++++++++++
mm/vmstat.c | 2 ++
3 files changed, 15 insertions(+)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index c9183117c8f7..dcfff56c6d29 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -50,6 +50,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
PGSCAN_DIRECT_THROTTLE,
PGSCAN_ANON,
PGSCAN_FILE,
+ PGSCAN_MOVABLE, /* number of pages scanned on behalf of a movable allocation */
+ PGSCAN_UNMOVABLE,
PGSTEAL_ANON,
PGSTEAL_FILE,
#ifdef CONFIG_NUMA
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ed0f47e2e810..4eadf0254918 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -904,6 +904,12 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
cond_resched();
}
+ /* Arbitrarily consider 16 pages scanned */
+ if (is_migrate_movable(gfp_migratetype(shrinkctl->gfp_mask)))
+ count_vm_events(PGSCAN_MOVABLE, 16);
+ else
+ count_vm_events(PGSCAN_UNMOVABLE, 16);
+
/*
* The deferred work is increased by any new work (delta) that wasn't
* done, decreased by old deferred work that was done now.
@@ -2580,6 +2586,11 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
__count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
__count_vm_events(PGSCAN_ANON + file, nr_scanned);
+ if (is_migrate_movable(gfp_migratetype(sc->gfp_mask)))
+ __count_vm_events(PGSCAN_MOVABLE, nr_scanned);
+ else
+ __count_vm_events(PGSCAN_UNMOVABLE, nr_scanned);
+
spin_unlock_irq(&lruvec->lru_lock);
if (nr_taken == 0)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 444740605f2f..56062d53a36c 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1281,6 +1281,8 @@ const char * const vmstat_text[] = {
"pgscan_direct_throttle",
"pgscan_anon",
"pgscan_file",
+ "pgscan_by_movable",
+ "pgscan_by_unmovable",
"pgsteal_anon",
"pgsteal_file",
--
2.40.1
* Re: [RFC PATCH 0/7] mm: providing ample physical memory contiguity by confining unmovable allocations
From: Zi Yan @ 2024-03-20 2:47 UTC
To: Kaiyang Zhao; +Cc: linux-mm, linux-kernel, hannes, dskarlat
On 19 Mar 2024, at 22:42, kaiyang2@cs.cmu.edu wrote:
> From: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
>
> Memory capacity has increased dramatically over the last decades.
> Meanwhile, TLB capacity has stagnated, causing a significant virtual
> address translation overhead. As a collaboration between Carnegie Mellon
> University and Meta, we investigated the issue at Meta’s datacenters and
> found that about 20% of CPU cycles are spent doing page walks [1], and
> similar results are also reported by Google [2].
>
> To tackle the overhead, we need widespread uses of huge pages. And huge
> pages, when they can actually be created, work wonders: they provide up
> to 18% higher performance for Meta’s production workloads in our
> experiments [1].
>
> However, we observed that huge pages through THP are unreliable because
> sufficient physical contiguity may not exist and compaction to recover
> from memory fragmentation frequently fails. To ensure workloads get a
> reasonable number of huge pages, Meta could not rely on THP and had to
> use reserved huge pages. Proposals to add 1GB THP support [5] are even
> more dependent on ample availability of physical contiguity.
>
> A major reason for the lack of physical contiguity is the mixing of
> unmovable and movable allocations, causing compaction to fail. Quoting
> from [3], “in a broad sample of Meta servers, we find that unmovable
> allocations make up less than 7% of total memory on average, yet occupy
> 34% of the 2M blocks in the system. We also found that this effect isn't
> correlated with high uptimes, and that servers can get heavily
> fragmented within the first hour of running a workload.”
>
> Our proposed solution is to confine the unmovable allocations to a
> separate region in physical memory. We experimented with using a CMA
> region for the movable allocations, but in this version we use
> ZONE_MOVABLE for movable and all other zones for unmovable allocations.
> Movable allocations can temporarily reside in the unmovable zones, but
> will be proactively moved out by compaction.
>
> To resize ZONE_MOVABLE, we still rely on memory hotplug interfaces. We
> export the number of pages scanned on behalf of movable or unmovable
> allocations during reclaim to approximate the memory pressure in two
> parts of physical memory, and a userspace tool can monitor the metrics
> and make resizing decisions. Previously we augmented the PSI interface
> to break down memory pressure into movable and unmovable allocation
> types, but that approach enlarges the scheduler cacheline footprint.
> From our preliminary observations, just looking at the per-allocation
> type scanned counters and with a little tuning, it is sufficient to tell
> if there is not enough memory for unmovable allocations and make
> resizing decisions.
>
> This patch extends the idea of migratetype isolation at pageblock
> granularity posted earlier [3] by Johannes Weiner to an
> as-large-as-needed region to better support huge pages of bigger sizes
> and hardware TLB coalescing. We’re looking for feedback on the overall
> direction, particularly in relation to the recent THP allocator
> optimization proposal [4].
>
> The patches are based on 6.4 and are also available on github at
> https://github.com/magickaiyang/kernel-contiguous/tree/per_alloc_type_reclaim_counters_oct052023
Your reference links (1 to 4) are missing.
--
Best Regards,
Yan, Zi
* Re: [RFC PATCH 0/7] mm: providing ample physical memory contiguity by confining unmovable allocations
From: kaiyang2 @ 2024-03-20 2:57 UTC
To: kaiyang2; +Cc: dskarlat, hannes, linux-kernel, linux-mm, ziy
From: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Adding the missing citations.
[1]: https://dl.acm.org/doi/pdf/10.1145/3579371.3589079
[2]: https://www.usenix.org/conference/osdi21/presentation/hunter
[3]: https://lore.kernel.org/lkml/20230418191313.268131-1-hannes@cmpxchg.org/
[4]: https://lore.kernel.org/linux-mm/20240229183436.4110845-1-yuzhao@google.com/
[5]: https://lore.kernel.org/linux-mm/20200902180628.4052244-1-zi.yan@sent.com/
Best,
Kaiyang Zhao