linux-mm.kvack.org archive mirror
* [PATCH v6 0/4] Enable >0 order folio memory compaction
@ 2024-02-16 17:04 Zi Yan
  2024-02-16 17:04 ` [PATCH v6 1/4] mm/page_alloc: remove unused fpi_flags in free_pages_prepare() Zi Yan
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Zi Yan @ 2024-02-16 17:04 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Zi Yan, Huang, Ying, Ryan Roberts, Andrew Morton,
	Matthew Wilcox (Oracle),
	David Hildenbrand, Yin, Fengwei, Yu Zhao, Vlastimil Babka,
	Kirill A . Shutemov, Johannes Weiner, Baolin Wang, Kemeng Shi,
	Mel Gorman, Rohan Puri, Mcgrof Chamberlain, Adam Manzanares,
	Vishal Moola (Oracle)

From: Zi Yan <ziy@nvidia.com>

Hi all,

This patchset enables >0 order folio memory compaction, which is one of
the prerequisites for large folio support[1]. It is on top of
mm-everything-2024-02-16-01-35.

I am aware that splitting free pages is necessary for folio migration
in compaction: if >0 order free pages are never split and no order-0
free page is scanned, compaction will end prematurely because migration
returns -ENOMEM. Free page split becomes a must instead of an
optimization.

lkp ncompare results (on an 8-CPU (Intel Xeon E5-2650 v4 @2.20GHz) 16G VM)
for default LRU (-no-mglru) and CONFIG_LRU_GEN are shown at the bottom,
copied from V3[4].
In sum, most vm-scalability applications do not see a performance
change, and the others see a ~4% to ~26% performance boost under the default
LRU and a ~2% to ~6% performance boost under CONFIG_LRU_GEN.


Changelog
===

From V5 [6]
1. Removed the unused fpi_flags parameter in free_pages_prepare() and used it
instead of my old free_pages_prepare_fpi_none() (per Vlastimil Babka).

2. Removed unnecessary INIT_LIST_HEAD() in compaction_free()
(per Vlastimil Babka).

3. Fixed cc->nr_migratepages update in compaction_free()
(per Vlastimil Babka).


From V4 [5]:
1. Refactored code in compaction_alloc() in Patch 3 (per Yu Zhao).


From V3 [4]:
1. Restructured isolate_migratepages_block() to minimize PageHuge() use
in Patch 1 (per Vlastimil Babka).

2. Used folio_put_testzero() instead of folio_set_count() to properly
handle free pages in compaction_free() (per Vlastimil Babka).

3. Simplified code to use struct list_head instead of a new struct page_list
(per Vlastimil Babka).

4. Restructured compaction_alloc() code to reduce indentation and
increase readability (per Vlastimil Babka).


From V2 [3]:
1. Added missing free page count in fast isolation path. This fixed the
weird performance outcome.


From V1 [2]:
1. Used folio_test_large() instead of folio_order() > 0. (per Matthew
Wilcox)

2. Fixed code rebase error. (per Baolin Wang)

3. Used list_splice_init() instead of list_splice(). (per Ryan Roberts)

4. Added free_pages_prepare_fpi_none() to avoid duplicate free page code
in compaction_free().

5. Dropped source page order sorting patch.


From RFC [1]:
1. Enabled >0 order folio compaction in the first patch by splitting all
to-be-migrated folios. (per Huang, Ying)

2. Stopped isolating compound pages with order greater than cc->order
to avoid wasting effort, since cc->order hints that no free pages with
order greater than it exist, so migrating such compound pages would most
likely fail. (per Baolin Wang)

3. Retained the folio check within lru lock. (per Baolin Wang)

4. Made isolate_freepages_block() generate multiple order-sorted page lists.
(per Johannes Weiner)

Overview
===

To support >0 order folio compaction, the patchset changes how free pages used
for migration are kept during compaction. Free pages used to be split into
order-0 pages that are post-allocation processed (i.e., the PageBuddy flag is
cleared, the page order stored in page->private is zeroed, and the page
reference count is set to 1). Now all free pages are kept in a NR_PAGE_ORDERS
array of page lists based on their order, without the post-allocation
processing. When migrate_pages() asks for a new page, a free page of the
requested order is processed and given out only at that point. THPs smaller
than 2MB (mTHP) would need this feature.
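
To make the bookkeeping concrete, below is a minimal userspace sketch, not
the kernel code: NR_ORDERS, struct free_block, and the helper names are
invented stand-ins for NR_PAGE_ORDERS, struct page, and the compaction
helpers. It only shows the idea of per-order lists whose entries are
prepared at hand-out time:

#include <stdio.h>

#define NR_ORDERS 11			/* stand-in for NR_PAGE_ORDERS */

struct free_block {			/* stand-in for an isolated free page run */
	int order;
	struct free_block *next;
};

static struct free_block *freepages[NR_ORDERS];	/* one list per order */

/* isolation time: only record the order, no post-allocation processing */
static void keep_free_block(struct free_block *b, int order)
{
	b->order = order;
	b->next = freepages[order];
	freepages[order] = b;
}

/* hand-out time: pop a block of exactly the requested order, prepare it now */
static struct free_block *take_free_block(int order)
{
	struct free_block *b = freepages[order];

	if (!b)
		return NULL;	/* caller may fall back to splitting (patch 4) */
	freepages[order] = b->next;
	/* post_alloc_hook()-style preparation would happen only here */
	return b;
}

int main(void)
{
	struct free_block a = { 0 }, b = { 0 };

	keep_free_block(&a, 0);
	keep_free_block(&b, 2);
	printf("order-2 request: %s\n", take_free_block(2) ? "served" : "empty");
	printf("order-1 request: %s\n", take_free_block(1) ? "served" : "empty");
	return 0;
}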


Feel free to give comments and ask questions.

Thanks.

[1] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
[2] https://lore.kernel.org/linux-mm/20231113170157.280181-1-zi.yan@sent.com/
[3] https://lore.kernel.org/linux-mm/20240123034636.1095672-1-zi.yan@sent.com/
[4] https://lore.kernel.org/linux-mm/20240202161554.565023-1-zi.yan@sent.com/
[5] https://lore.kernel.org/linux-mm/20240212163510.859822-1-zi.yan@sent.com/
[6] https://lore.kernel.org/linux-mm/20240214220420.1229173-1-zi.yan@sent.com/


Hi Andrew,

Baolin's patch on nr_migratepages was based on this one, a better fixup
for it might be below. Since before my patchset, compaction only deals with
order-0 pages.

diff --git a/mm/compaction.c b/mm/compaction.c
index 01ec85cfd623f..e60135e2019d6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1798,7 +1798,7 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
 	dst = list_entry(cc->freepages.next, struct folio, lru);
 	list_del(&dst->lru);
 	cc->nr_freepages--;
-	cc->nr_migratepages -= 1 << order;
+	cc->nr_migratepages--;
 
 	return dst;
 }
@@ -1814,7 +1814,7 @@ static void compaction_free(struct folio *dst, unsigned long data)
 
 	list_add(&dst->lru, &cc->freepages);
 	cc->nr_freepages++;
-	cc->nr_migratepages += 1 << order;
+	cc->nr_migratepages++;
 }


vm-scalability results on CONFIG_LRU_GEN
===

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/defconfig/debian/300s/qemu-vm/mmap-xread-seq-mt/vm-scalability

commit: 
  6.8.0-rc1-mm-everything-2024-01-29-07-19+
  6.8.0-rc1-split-folio-in-compaction+
  6.8.0-rc1-folio-migration-in-compaction+
  6.8.0-rc1-folio-migration-free-page-split+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f 
---------------- --------------------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \  
  15107616            +3.2%   15590339            +1.3%   15297619            +3.0%   15567998        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/defconfig/debian/300s/qemu-vm/mmap-pread-seq/vm-scalability

commit: 
  6.8.0-rc1-mm-everything-2024-01-29-07-19+
  6.8.0-rc1-split-folio-in-compaction+
  6.8.0-rc1-folio-migration-in-compaction+
  6.8.0-rc1-folio-migration-free-page-split+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f 
---------------- --------------------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \  
  12611785            +1.8%   12832919            +0.9%   12724223            +1.6%   12812682        vm-scalability.throughput


=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/defconfig/debian/300s/qemu-vm/lru-file-readtwice/vm-scalability

commit: 
  6.8.0-rc1-mm-everything-2024-01-29-07-19+
  6.8.0-rc1-split-folio-in-compaction+
  6.8.0-rc1-folio-migration-in-compaction+
  6.8.0-rc1-folio-migration-free-page-split+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f 
---------------- --------------------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \  
   9833393            +5.7%   10390190            +3.0%   10126606            +5.9%   10408804        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/defconfig/debian/300s/qemu-vm/lru-file-mmap-read/vm-scalability

commit: 
  6.8.0-rc1-mm-everything-2024-01-29-07-19+
  6.8.0-rc1-split-folio-in-compaction+
  6.8.0-rc1-folio-migration-in-compaction+
  6.8.0-rc1-folio-migration-free-page-split+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f 
---------------- --------------------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \  
   7034709 ±  3%      +2.9%    7241429            +3.2%    7256680 ±  2%      +3.9%    7308375        vm-scalability.throughput



vm-scalability results on default LRU (with -no-mglru suffix)
===

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/defconfig/debian/300s/qemu-vm/mmap-xread-seq-mt/vm-scalability

commit: 
  6.8.0-rc1-mm-everything-2024-01-29-07-19-no-mglru+
  6.8.0-rc1-split-folio-in-compaction-no-mglru+
  6.8.0-rc1-folio-migration-in-compaction-no-mglru+
  6.8.0-rc1-folio-migration-free-page-split-no-mglru+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f 
---------------- --------------------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \  
  14401491            +3.7%   14940270            +2.4%   14748626            +4.0%   14975716        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/defconfig/debian/300s/qemu-vm/mmap-pread-seq/vm-scalability

commit: 
  6.8.0-rc1-mm-everything-2024-01-29-07-19-no-mglru+
  6.8.0-rc1-split-folio-in-compaction-no-mglru+
  6.8.0-rc1-folio-migration-in-compaction-no-mglru+
  6.8.0-rc1-folio-migration-free-page-split-no-mglru+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f 
---------------- --------------------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \  
  11407497            +5.1%   11989632            -0.5%   11349272            +4.8%   11957423        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/defconfig/debian/300s/qemu-vm/mmap-pread-seq-mt/vm-scalability

commit: 
  6.8.0-rc1-mm-everything-2024-01-29-07-19-no-mglru+
  6.8.0-rc1-split-folio-in-compaction-no-mglru+
  6.8.0-rc1-folio-migration-in-compaction-no-mglru+
  6.8.0-rc1-folio-migration-free-page-split-no-mglru+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f 
---------------- --------------------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \  
  11348474            +3.3%   11719453            -1.2%   11208759            +3.7%   11771926        vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/defconfig/debian/300s/qemu-vm/lru-file-readtwice/vm-scalability

commit: 
  6.8.0-rc1-mm-everything-2024-01-29-07-19-no-mglru+
  6.8.0-rc1-split-folio-in-compaction-no-mglru+
  6.8.0-rc1-folio-migration-in-compaction-no-mglru+
  6.8.0-rc1-folio-migration-free-page-split-no-mglru+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f 
---------------- --------------------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \  
   8065614 ±  3%      +7.7%    8686626 ±  2%      +5.0%    8467577 ±  4%     +11.8%    9016077 ±  2%  vm-scalability.throughput

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-13/defconfig/debian/300s/qemu-vm/lru-file-mmap-read/vm-scalability

commit: 
  6.8.0-rc1-mm-everything-2024-01-29-07-19-no-mglru+
  6.8.0-rc1-split-folio-in-compaction-no-mglru+
  6.8.0-rc1-folio-migration-in-compaction-no-mglru+
  6.8.0-rc1-folio-migration-free-page-split-no-mglru+

6.8.0-rc1-mm-eve 6.8.0-rc1-split-folio-in-co 6.8.0-rc1-folio-migration-i 6.8.0-rc1-folio-migration-f 
---------------- --------------------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \  
   6438422 ±  2%     +27.5%    8206734 ±  2%     +10.6%    7118390           +26.2%    8127192 ±  4%  vm-scalability.throughput

Zi Yan (4):
  mm/page_alloc: remove unused fpi_flags in free_pages_prepare()
  mm/compaction: enable compacting >0 order folios.
  mm/compaction: add support for >0 order folio memory compaction.
  mm/compaction: optimize >0 order folio compaction with free page
    split.

 mm/compaction.c | 225 ++++++++++++++++++++++++++++++++----------------
 mm/internal.h   |   4 +-
 mm/page_alloc.c |  12 +--
 3 files changed, 162 insertions(+), 79 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v6 1/4] mm/page_alloc: remove unused fpi_flags in free_pages_prepare()
  2024-02-16 17:04 [PATCH v6 0/4] Enable >0 order folio memory compaction Zi Yan
@ 2024-02-16 17:04 ` Zi Yan
  2024-02-16 17:19   ` Vlastimil Babka
  2024-02-20  8:48   ` David Hildenbrand
  2024-02-16 17:04 ` [PATCH v6 2/4] mm/compaction: enable compacting >0 order folios Zi Yan
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Zi Yan @ 2024-02-16 17:04 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Zi Yan, Huang, Ying, Ryan Roberts, Andrew Morton,
	Matthew Wilcox (Oracle),
	David Hildenbrand, Yin, Fengwei, Yu Zhao, Vlastimil Babka,
	Kirill A . Shutemov, Johannes Weiner, Baolin Wang, Kemeng Shi,
	Mel Gorman, Rohan Puri, Mcgrof Chamberlain, Adam Manzanares,
	Vishal Moola (Oracle)

From: Zi Yan <ziy@nvidia.com>

fpi_flags is only passed to should_skip_kasan_poison() but ignored
by the function. Remove the unused parameter.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/page_alloc.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7ae4b74c9e5c..70c1ed3addf3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1061,7 +1061,7 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
  * on-demand allocation and then freed again before the deferred pages
  * initialization is done, but this is not likely to happen.
  */
-static inline bool should_skip_kasan_poison(struct page *page, fpi_t fpi_flags)
+static inline bool should_skip_kasan_poison(struct page *page)
 {
 	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
 		return deferred_pages_enabled();
@@ -1081,10 +1081,10 @@ static void kernel_init_pages(struct page *page, int numpages)
 }
 
 static __always_inline bool free_pages_prepare(struct page *page,
-			unsigned int order, fpi_t fpi_flags)
+			unsigned int order)
 {
 	int bad = 0;
-	bool skip_kasan_poison = should_skip_kasan_poison(page, fpi_flags);
+	bool skip_kasan_poison = should_skip_kasan_poison(page);
 	bool init = want_init_on_free();
 	bool compound = PageCompound(page);
 
@@ -1266,7 +1266,7 @@ static void __free_pages_ok(struct page *page, unsigned int order,
 	unsigned long pfn = page_to_pfn(page);
 	struct zone *zone = page_zone(page);
 
-	if (!free_pages_prepare(page, order, fpi_flags))
+	if (!free_pages_prepare(page, order))
 		return;
 
 	/*
@@ -2379,7 +2379,7 @@ static bool free_unref_page_prepare(struct page *page, unsigned long pfn,
 {
 	int migratetype;
 
-	if (!free_pages_prepare(page, order, FPI_NONE))
+	if (!free_pages_prepare(page, order))
 		return false;
 
 	migratetype = get_pfnblock_migratetype(page, pfn);
-- 
2.43.0



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v6 2/4] mm/compaction: enable compacting >0 order folios.
  2024-02-16 17:04 [PATCH v6 0/4] Enable >0 order folio memory compaction Zi Yan
  2024-02-16 17:04 ` [PATCH v6 1/4] mm/page_alloc: remove unused fpi_flags in free_pages_prepare() Zi Yan
@ 2024-02-16 17:04 ` Zi Yan
  2024-02-20  9:03   ` David Hildenbrand
  2024-02-16 17:04 ` [PATCH v6 3/4] mm/compaction: add support for >0 order folio memory compaction Zi Yan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Zi Yan @ 2024-02-16 17:04 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Zi Yan, Huang, Ying, Ryan Roberts, Andrew Morton,
	Matthew Wilcox (Oracle),
	David Hildenbrand, Yin, Fengwei, Yu Zhao, Vlastimil Babka,
	Kirill A . Shutemov, Johannes Weiner, Baolin Wang, Kemeng Shi,
	Mel Gorman, Rohan Puri, Mcgrof Chamberlain, Adam Manzanares,
	Vishal Moola (Oracle)

From: Zi Yan <ziy@nvidia.com>

migrate_pages() supports >0 order folio migration and during compaction,
even if compaction_alloc() cannot provide >0 order free pages,
migrate_pages() can split the source page and try to migrate the base
pages from the split.  It can be a baseline and start point for adding
support for compacting >0 order folios.

Signed-off-by: Zi Yan <ziy@nvidia.com>
Suggested-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Adam Manzanares <a.manzanares@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/compaction.c | 66 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 52 insertions(+), 14 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index cc801ce099b4..aa6aad805c4d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -816,6 +816,21 @@ static bool too_many_isolated(struct compact_control *cc)
 	return too_many;
 }
 
+/*
+ * 1. if the page order is larger than or equal to target_order (i.e.,
+ * cc->order and when it is not -1 for global compaction), skip it since
+ * target_order already indicates no free page with larger than target_order
+ * exists and later migrating it will most likely fail;
+ *
+ * 2. compacting > pageblock_order pages does not improve memory fragmentation,
+ * skip them;
+ */
+static bool skip_isolation_on_order(int order, int target_order)
+{
+	return (target_order != -1 && order >= target_order) ||
+		order >= pageblock_order;
+}
+
 /**
  * isolate_migratepages_block() - isolate all migrate-able pages within
  *				  a single pageblock
@@ -947,7 +962,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			valid_page = page;
 		}
 
-		if (PageHuge(page) && cc->alloc_contig) {
+		if (PageHuge(page)) {
+			/*
+			 * skip hugetlbfs if we are not compacting for pages
+			 * bigger than its order. THPs and other compound pages
+			 * are handled below.
+			 */
+			if (!cc->alloc_contig) {
+				const unsigned int order = compound_order(page);
+
+				if (order <= MAX_PAGE_ORDER) {
+					low_pfn += (1UL << order) - 1;
+					nr_scanned += (1UL << order) - 1;
+				}
+				goto isolate_fail;
+			}
+			/* for alloc_contig case */
 			if (locked) {
 				unlock_page_lruvec_irqrestore(locked, flags);
 				locked = NULL;
@@ -1008,21 +1038,24 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 
 		/*
-		 * Regardless of being on LRU, compound pages such as THP and
-		 * hugetlbfs are not to be compacted unless we are attempting
-		 * an allocation much larger than the huge page size (eg CMA).
-		 * We can potentially save a lot of iterations if we skip them
-		 * at once. The check is racy, but we can consider only valid
-		 * values and the only danger is skipping too much.
+		 * Regardless of being on LRU, compound pages such as THP
+		 * (hugetlbfs is handled above) are not to be compacted unless
+		 * we are attempting an allocation larger than the compound
+		 * page size. We can potentially save a lot of iterations if we
+		 * skip them at once. The check is racy, but we can consider
+		 * only valid values and the only danger is skipping too much.
 		 */
 		if (PageCompound(page) && !cc->alloc_contig) {
 			const unsigned int order = compound_order(page);
 
-			if (likely(order <= MAX_PAGE_ORDER)) {
-				low_pfn += (1UL << order) - 1;
-				nr_scanned += (1UL << order) - 1;
+			/* Skip based on page order and compaction target order. */
+			if (skip_isolation_on_order(order, cc->order)) {
+				if (order <= MAX_PAGE_ORDER) {
+					low_pfn += (1UL << order) - 1;
+					nr_scanned += (1UL << order) - 1;
+				}
+				goto isolate_fail;
 			}
-			goto isolate_fail;
 		}
 
 		/*
@@ -1165,10 +1198,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			}
 
 			/*
-			 * folio become large since the non-locked check,
-			 * and it's on LRU.
+			 * Check LRU folio order under the lock
 			 */
-			if (unlikely(folio_test_large(folio) && !cc->alloc_contig)) {
+			if (unlikely(skip_isolation_on_order(folio_order(folio),
+							     cc->order) &&
+				     !cc->alloc_contig)) {
 				low_pfn += folio_nr_pages(folio) - 1;
 				nr_scanned += folio_nr_pages(folio) - 1;
 				folio_set_lru(folio);
@@ -1788,6 +1822,10 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
 	struct compact_control *cc = (struct compact_control *)data;
 	struct folio *dst;
 
+	/* this makes migrate_pages() split the source page and retry */
+	if (folio_test_large(src) > 0)
+		return NULL;
+
 	if (list_empty(&cc->freepages)) {
 		isolate_freepages(cc);
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v6 3/4] mm/compaction: add support for >0 order folio memory compaction.
  2024-02-16 17:04 [PATCH v6 0/4] Enable >0 order folio memory compaction Zi Yan
  2024-02-16 17:04 ` [PATCH v6 1/4] mm/page_alloc: remove unused fpi_flags in free_pages_prepare() Zi Yan
  2024-02-16 17:04 ` [PATCH v6 2/4] mm/compaction: enable compacting >0 order folios Zi Yan
@ 2024-02-16 17:04 ` Zi Yan
  2024-02-16 17:04 ` [PATCH v6 4/4] mm/compaction: optimize >0 order folio compaction with free page split Zi Yan
  2024-02-20  2:06 ` [PATCH v6 0/4] Enable >0 order folio memory compaction Andrew Morton
  4 siblings, 0 replies; 16+ messages in thread
From: Zi Yan @ 2024-02-16 17:04 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Zi Yan, Huang, Ying, Ryan Roberts, Andrew Morton,
	Matthew Wilcox (Oracle),
	David Hildenbrand, Yin, Fengwei, Yu Zhao, Vlastimil Babka,
	Kirill A . Shutemov, Johannes Weiner, Baolin Wang, Kemeng Shi,
	Mel Gorman, Rohan Puri, Mcgrof Chamberlain, Adam Manzanares,
	Vishal Moola (Oracle)

From: Zi Yan <ziy@nvidia.com>

Before last commit, memory compaction only migrates order-0 folios and
skips >0 order folios.  Last commit splits all >0 order folios during
compaction.  This commit migrates >0 order folios during compaction by
keeping isolated free pages at their original size without splitting them
into order-0 pages and using them directly during the migration process.

What is different from the prior implementation:
1. All isolated free pages are kept in a NR_PAGE_ORDERS array of page
   lists, where each page list stores free pages in the same order.
2. Free pages are neither post_alloc_hook() processed nor buddy pages,
   although their orders are stored in the first page's private field like
   buddy pages.
3. During migration, at new page allocation time (i.e., in
   compaction_alloc()), free pages are processed by post_alloc_hook().
   When migration fails and a new page is returned (i.e., in
   compaction_free()), free pages are restored by reversing the
   post_alloc_hook() operations using the newly exported
   free_pages_prepare().

Step 3 is done so that a later optimization, splitting and/or merging free
pages during compaction, becomes easier.

Note: without splitting free pages, compaction can end prematurely because
migration will return -ENOMEM even if there are free pages.  This happens
when no order-0 free page exists and compaction_alloc() returns NULL.
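
For intuition only, the round trip in step 3 can be modeled as below. This is
a userspace sketch with invented names (struct free_block, take_for_migration,
give_back), not the kernel code; the point is that a block is prepared when it
is handed to migration and un-prepared before it returns to its per-order
list, so listed blocks are always in the same raw state:

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct free_block {
	int order;
	bool prepared;		/* models post_alloc_hook() having been applied */
};

/* compaction_alloc() analogue: prepare the block as it is handed out */
static void take_for_migration(struct free_block *b)
{
	assert(!b->prepared);
	b->prepared = true;	/* post_alloc_hook() (+ prep_compound_page()) */
}

/* compaction_free() analogue: undo the preparation before re-listing */
static void give_back(struct free_block *b)
{
	assert(b->prepared);
	b->prepared = false;	/* free_pages_prepare() reverses the above */
	/* ...then the block goes back onto the order-indexed free list... */
}

int main(void)
{
	struct free_block b = { .order = 2, .prepared = false };

	take_for_migration(&b);	/* allocated as a migration target */
	give_back(&b);		/* migration failed, block restored */
	printf("order-%d block back on its list, prepared=%d\n",
	       b.order, b.prepared);
	return 0;
}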

Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Adam Manzanares <a.manzanares@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/compaction.c | 140 +++++++++++++++++++++++++++---------------------
 mm/internal.h   |   4 +-
 mm/page_alloc.c |   2 +-
 3 files changed, 83 insertions(+), 63 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index aa6aad805c4d..f21836ca243a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -66,45 +66,56 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
 #define COMPACTION_HPAGE_ORDER	(PMD_SHIFT - PAGE_SHIFT)
 #endif
 
-static unsigned long release_freepages(struct list_head *freelist)
+static void split_map_pages(struct list_head *freepages)
 {
+	unsigned int i, order;
 	struct page *page, *next;
-	unsigned long high_pfn = 0;
+	LIST_HEAD(tmp_list);
 
-	list_for_each_entry_safe(page, next, freelist, lru) {
-		unsigned long pfn = page_to_pfn(page);
-		list_del(&page->lru);
-		__free_page(page);
-		if (pfn > high_pfn)
-			high_pfn = pfn;
-	}
+	for (order = 0; order < NR_PAGE_ORDERS; order++) {
+		list_for_each_entry_safe(page, next, &freepages[order], lru) {
+			unsigned int nr_pages;
 
-	return high_pfn;
+			list_del(&page->lru);
+
+			nr_pages = 1 << order;
+
+			post_alloc_hook(page, order, __GFP_MOVABLE);
+			if (order)
+				split_page(page, order);
+
+			for (i = 0; i < nr_pages; i++) {
+				list_add(&page->lru, &tmp_list);
+				page++;
+			}
+		}
+		list_splice_init(&tmp_list, &freepages[0]);
+	}
 }
 
-static void split_map_pages(struct list_head *list)
+static unsigned long release_free_list(struct list_head *freepages)
 {
-	unsigned int i, order, nr_pages;
-	struct page *page, *next;
-	LIST_HEAD(tmp_list);
-
-	list_for_each_entry_safe(page, next, list, lru) {
-		list_del(&page->lru);
+	int order;
+	unsigned long high_pfn = 0;
 
-		order = page_private(page);
-		nr_pages = 1 << order;
+	for (order = 0; order < NR_PAGE_ORDERS; order++) {
+		struct page *page, *next;
 
-		post_alloc_hook(page, order, __GFP_MOVABLE);
-		if (order)
-			split_page(page, order);
+		list_for_each_entry_safe(page, next, &freepages[order], lru) {
+			unsigned long pfn = page_to_pfn(page);
 
-		for (i = 0; i < nr_pages; i++) {
-			list_add(&page->lru, &tmp_list);
-			page++;
+			list_del(&page->lru);
+			/*
+			 * Convert free pages into post allocation pages, so
+			 * that we can free them via __free_page.
+			 */
+			post_alloc_hook(page, order, __GFP_MOVABLE);
+			__free_pages(page, order);
+			if (pfn > high_pfn)
+				high_pfn = pfn;
 		}
 	}
-
-	list_splice(&tmp_list, list);
+	return high_pfn;
 }
 
 #ifdef CONFIG_COMPACTION
@@ -657,7 +668,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 		nr_scanned += isolated - 1;
 		total_isolated += isolated;
 		cc->nr_freepages += isolated;
-		list_add_tail(&page->lru, freelist);
+		list_add_tail(&page->lru, &freelist[order]);
 
 		if (!strict && cc->nr_migratepages <= cc->nr_freepages) {
 			blockpfn += isolated;
@@ -722,7 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
 			unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long isolated, pfn, block_start_pfn, block_end_pfn;
-	LIST_HEAD(freelist);
+	int order;
+	struct list_head tmp_freepages[NR_PAGE_ORDERS];
+
+	for (order = 0; order < NR_PAGE_ORDERS; order++)
+		INIT_LIST_HEAD(&tmp_freepages[order]);
 
 	pfn = start_pfn;
 	block_start_pfn = pageblock_start_pfn(pfn);
@@ -753,7 +768,7 @@ isolate_freepages_range(struct compact_control *cc,
 			break;
 
 		isolated = isolate_freepages_block(cc, &isolate_start_pfn,
-					block_end_pfn, &freelist, 0, true);
+					block_end_pfn, tmp_freepages, 0, true);
 
 		/*
 		 * In strict mode, isolate_freepages_block() returns 0 if
@@ -770,15 +785,15 @@ isolate_freepages_range(struct compact_control *cc,
 		 */
 	}
 
-	/* __isolate_free_page() does not map the pages */
-	split_map_pages(&freelist);
-
 	if (pfn < end_pfn) {
 		/* Loop terminated early, cleanup. */
-		release_freepages(&freelist);
+		release_free_list(tmp_freepages);
 		return 0;
 	}
 
+	/* __isolate_free_page() does not map the pages */
+	split_map_pages(tmp_freepages);
+
 	/* We don't use freelists for anything. */
 	return pfn;
 }
@@ -1494,7 +1509,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
 	if (!page)
 		return;
 
-	isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
+	isolate_freepages_block(cc, &start_pfn, end_pfn, cc->freepages, 1, false);
 
 	/* Skip this pageblock in the future as it's full or nearly full */
 	if (start_pfn == end_pfn && !cc->no_set_skip_hint)
@@ -1623,7 +1638,7 @@ static void fast_isolate_freepages(struct compact_control *cc)
 				nr_scanned += nr_isolated - 1;
 				total_isolated += nr_isolated;
 				cc->nr_freepages += nr_isolated;
-				list_add_tail(&page->lru, &cc->freepages);
+				list_add_tail(&page->lru, &cc->freepages[order]);
 				count_compact_events(COMPACTISOLATED, nr_isolated);
 			} else {
 				/* If isolation fails, abort the search */
@@ -1700,13 +1715,12 @@ static void isolate_freepages(struct compact_control *cc)
 	unsigned long isolate_start_pfn; /* exact pfn we start at */
 	unsigned long block_end_pfn;	/* end of current pageblock */
 	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
-	struct list_head *freelist = &cc->freepages;
 	unsigned int stride;
 
 	/* Try a small search of the free lists for a candidate */
 	fast_isolate_freepages(cc);
 	if (cc->nr_freepages)
-		goto splitmap;
+		return;
 
 	/*
 	 * Initialise the free scanner. The starting point is where we last
@@ -1766,7 +1780,7 @@ static void isolate_freepages(struct compact_control *cc)
 
 		/* Found a block suitable for isolating free pages from. */
 		nr_isolated = isolate_freepages_block(cc, &isolate_start_pfn,
-					block_end_pfn, freelist, stride, false);
+					block_end_pfn, cc->freepages, stride, false);
 
 		/* Update the skip hint if the full pageblock was scanned */
 		if (isolate_start_pfn == block_end_pfn)
@@ -1807,10 +1821,6 @@ static void isolate_freepages(struct compact_control *cc)
 	 * and the loop terminated due to isolate_start_pfn < low_pfn
 	 */
 	cc->free_pfn = isolate_start_pfn;
-
-splitmap:
-	/* __isolate_free_page() does not map the pages */
-	split_map_pages(freelist);
 }
 
 /*
@@ -1821,24 +1831,22 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
 {
 	struct compact_control *cc = (struct compact_control *)data;
 	struct folio *dst;
+	int order = folio_order(src);
 
-	/* this makes migrate_pages() split the source page and retry */
-	if (folio_test_large(src) > 0)
-		return NULL;
-
-	if (list_empty(&cc->freepages)) {
+	if (list_empty(&cc->freepages[order])) {
 		isolate_freepages(cc);
-
-		if (list_empty(&cc->freepages))
+		if (list_empty(&cc->freepages[order]))
 			return NULL;
 	}
 
-	dst = list_entry(cc->freepages.next, struct folio, lru);
+	dst = list_first_entry(&cc->freepages[order], struct folio, lru);
 	list_del(&dst->lru);
-	cc->nr_freepages--;
-	cc->nr_migratepages -= 1 << folio_order(src);
-
-	return dst;
+	post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
+	if (order)
+		prep_compound_page(&dst->page, order);
+	cc->nr_freepages -= 1 << order;
+	cc->nr_migratepages -= 1 << order;
+	return page_rmappable_folio(&dst->page);
 }
 
 /*
@@ -1849,10 +1857,19 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
 static void compaction_free(struct folio *dst, unsigned long data)
 {
 	struct compact_control *cc = (struct compact_control *)data;
+	int order = folio_order(dst);
+	struct page *page = &dst->page;
 
-	list_add(&dst->lru, &cc->freepages);
-	cc->nr_freepages++;
-	cc->nr_migratepages += 1 << folio_order(dst);
+	if (folio_put_testzero(dst)) {
+		free_pages_prepare(page, order);
+		list_add(&dst->lru, &cc->freepages[order]);
+		cc->nr_freepages += 1 << order;
+	}
+	cc->nr_migratepages += 1 << order;
+	/*
+	 * someone else has referenced the page, we cannot take it back to our
+	 * free list.
+	 */
 }
 
 /* possible outcome of isolate_migratepages */
@@ -2476,6 +2493,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 	bool update_cached;
 	unsigned int nr_succeeded = 0;
+	int order;
 
 	/*
 	 * These counters track activities during zone compaction.  Initialize
@@ -2485,7 +2503,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 	cc->total_free_scanned = 0;
 	cc->nr_migratepages = 0;
 	cc->nr_freepages = 0;
-	INIT_LIST_HEAD(&cc->freepages);
+	for (order = 0; order < NR_PAGE_ORDERS; order++)
+		INIT_LIST_HEAD(&cc->freepages[order]);
 	INIT_LIST_HEAD(&cc->migratepages);
 
 	cc->migratetype = gfp_migratetype(cc->gfp_mask);
@@ -2671,7 +2690,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 	 * so we don't leave any returned pages behind in the next attempt.
 	 */
 	if (cc->nr_freepages > 0) {
-		unsigned long free_pfn = release_freepages(&cc->freepages);
+		unsigned long free_pfn = release_free_list(cc->freepages);
 
 		cc->nr_freepages = 0;
 		VM_BUG_ON(free_pfn == 0);
@@ -2690,7 +2709,6 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 
 	trace_mm_compaction_end(cc, start_pfn, end_pfn, sync, ret);
 
-	VM_BUG_ON(!list_empty(&cc->freepages));
 	VM_BUG_ON(!list_empty(&cc->migratepages));
 
 	return ret;
diff --git a/mm/internal.h b/mm/internal.h
index 1e29c5821a1d..93e229112045 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -447,6 +447,8 @@ extern void prep_compound_page(struct page *page, unsigned int order);
 
 extern void post_alloc_hook(struct page *page, unsigned int order,
 					gfp_t gfp_flags);
+extern bool free_pages_prepare(struct page *page, unsigned int order);
+
 extern int user_min_free_kbytes;
 
 extern void free_unref_page(struct page *page, unsigned int order);
@@ -481,7 +483,7 @@ int split_free_page(struct page *free_page,
  * completes when free_pfn <= migrate_pfn
  */
 struct compact_control {
-	struct list_head freepages;	/* List of free pages to migrate to */
+	struct list_head freepages[NR_PAGE_ORDERS];	/* List of free pages to migrate to */
 	struct list_head migratepages;	/* List of pages being migrated */
 	unsigned int nr_freepages;	/* Number of isolated free pages */
 	unsigned int nr_migratepages;	/* Number of pages to migrate */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 70c1ed3addf3..b0b92ce997dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1080,7 +1080,7 @@ static void kernel_init_pages(struct page *page, int numpages)
 	kasan_enable_current();
 }
 
-static __always_inline bool free_pages_prepare(struct page *page,
+__always_inline bool free_pages_prepare(struct page *page,
 			unsigned int order)
 {
 	int bad = 0;
-- 
2.43.0



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v6 4/4] mm/compaction: optimize >0 order folio compaction with free page split.
  2024-02-16 17:04 [PATCH v6 0/4] Enable >0 order folio memory compaction Zi Yan
                   ` (2 preceding siblings ...)
  2024-02-16 17:04 ` [PATCH v6 3/4] mm/compaction: add support for >0 order folio memory compaction Zi Yan
@ 2024-02-16 17:04 ` Zi Yan
  2024-02-20  2:06 ` [PATCH v6 0/4] Enable >0 order folio memory compaction Andrew Morton
  4 siblings, 0 replies; 16+ messages in thread
From: Zi Yan @ 2024-02-16 17:04 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Zi Yan, Huang, Ying, Ryan Roberts, Andrew Morton,
	Matthew Wilcox (Oracle),
	David Hildenbrand, Yin, Fengwei, Yu Zhao, Vlastimil Babka,
	Kirill A . Shutemov, Johannes Weiner, Baolin Wang, Kemeng Shi,
	Mel Gorman, Rohan Puri, Mcgrof Chamberlain, Adam Manzanares,
	Vishal Moola (Oracle)

From: Zi Yan <ziy@nvidia.com>

During migration in a memory compaction, free pages are placed in an array
of page lists based on their order.  But the desired free page order (i.e.,
the order of a source page) might not always be present, thus leading to
migration failures and premature compaction termination.  Split a high order
free page when the source migration page has a lower order to increase the
migration success rate.

Note: merging free pages when a migration fails and a lower order free
page is returned via compaction_free() is possible, but it is too much
work.  Since the free pages are not buddy pages, it is hard to identify
them using the existing PFN-based page merging algorithm.
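
As a concrete example (numbers invented for illustration), if the source
folio is order-0 but only an order-3 free block is available, the block is
split buddy-style: order-2, order-1, and order-0 remainders go back onto
their lists and one order-0 page is used. The standalone sketch below mirrors
the halving loop; it is not the kernel code itself:

#include <stdio.h>

int main(void)
{
	int start_order = 3;	/* only an order-3 free block is available */
	int order = 0;		/* the source (migration) folio is order-0 */
	unsigned long size = 1UL << start_order;

	while (start_order > order) {
		start_order--;
		size >>= 1;
		/* the upper half goes back onto the order-indexed free list */
		printf("put back an order-%d block (%lu page(s))\n",
		       start_order, size);
	}
	printf("use the remaining order-%d block as the migration target\n",
	       order);
	return 0;
}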

Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Adam Manzanares <a.manzanares@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/compaction.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index f21836ca243a..b2a570e00a8e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1832,15 +1832,40 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
 	struct compact_control *cc = (struct compact_control *)data;
 	struct folio *dst;
 	int order = folio_order(src);
+	bool has_isolated_pages = false;
+	int start_order;
+	struct page *freepage;
+	unsigned long size;
+
+again:
+	for (start_order = order; start_order < NR_PAGE_ORDERS; start_order++)
+		if (!list_empty(&cc->freepages[start_order]))
+			break;
 
-	if (list_empty(&cc->freepages[order])) {
-		isolate_freepages(cc);
-		if (list_empty(&cc->freepages[order]))
+	/* no free pages in the list */
+	if (start_order == NR_PAGE_ORDERS) {
+		if (has_isolated_pages)
 			return NULL;
+		isolate_freepages(cc);
+		has_isolated_pages = true;
+		goto again;
+	}
+
+	freepage = list_first_entry(&cc->freepages[start_order], struct page,
+				lru);
+	size = 1 << start_order;
+
+	list_del(&freepage->lru);
+
+	while (start_order > order) {
+		start_order--;
+		size >>= 1;
+
+		list_add(&freepage[size].lru, &cc->freepages[start_order]);
+		set_page_private(&freepage[size], start_order);
 	}
+	dst = (struct folio *)freepage;
 
-	dst = list_first_entry(&cc->freepages[order], struct folio, lru);
-	list_del(&dst->lru);
 	post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
 	if (order)
 		prep_compound_page(&dst->page, order);
-- 
2.43.0



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/4] mm/page_alloc: remove unused fpi_flags in free_pages_prepare()
  2024-02-16 17:04 ` [PATCH v6 1/4] mm/page_alloc: remove unused fpi_flags in free_pages_prepare() Zi Yan
@ 2024-02-16 17:19   ` Vlastimil Babka
  2024-02-20  8:48   ` David Hildenbrand
  1 sibling, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2024-02-16 17:19 UTC (permalink / raw)
  To: Zi Yan, linux-mm, linux-kernel
  Cc: Huang, Ying, Ryan Roberts, Andrew Morton, Matthew Wilcox (Oracle),
	David Hildenbrand, Yin, Fengwei, Yu Zhao, Kirill A . Shutemov,
	Johannes Weiner, Baolin Wang, Kemeng Shi, Mel Gorman, Rohan Puri,
	Mcgrof Chamberlain, Adam Manzanares, Vishal Moola (Oracle)

On 2/16/24 18:04, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> fpi_flags is only passed to should_skip_kasan_poison() but ignored
> by the function. Remove the unused parameter.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks.




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/4] Enable >0 order folio memory compaction
  2024-02-16 17:04 [PATCH v6 0/4] Enable >0 order folio memory compaction Zi Yan
                   ` (3 preceding siblings ...)
  2024-02-16 17:04 ` [PATCH v6 4/4] mm/compaction: optimize >0 order folio compaction with free page split Zi Yan
@ 2024-02-20  2:06 ` Andrew Morton
  2024-02-20  2:31   ` Zi Yan
  4 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2024-02-20  2:06 UTC (permalink / raw)
  To: Zi Yan
  Cc: Zi Yan, linux-mm, linux-kernel, Huang, Ying, Ryan Roberts,
	Matthew Wilcox (Oracle),
	David Hildenbrand, Yin, Fengwei, Yu Zhao, Vlastimil Babka,
	Kirill A . Shutemov, Johannes Weiner, Baolin Wang, Kemeng Shi,
	Mel Gorman, Rohan Puri, Mcgrof Chamberlain, Adam Manzanares,
	Vishal Moola (Oracle)

On Fri, 16 Feb 2024 12:04:28 -0500 Zi Yan <zi.yan@sent.com> wrote:

> Baolin's patch

Baolin writes many patches and patches have names, please use them!

> on nr_migratepages was based on this one, a better fixup
> for it might be below. Since before my patchset, compaction only deals with
> order-0 pages.

I don't understand what this means.  The patchset you sent applies OK
to mm-unstable so what else is there to do?

> diff --git a/mm/compaction.c b/mm/compaction.c
> index 01ec85cfd623f..e60135e2019d6 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1798,7 +1798,7 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>  	dst = list_entry(cc->freepages.next, struct folio, lru);
>  	list_del(&dst->lru);
>  	cc->nr_freepages--;
> -	cc->nr_migratepages -= 1 << order;
> +	cc->nr_migratepages--;
>  
>  	return dst;
>  }
> @@ -1814,7 +1814,7 @@ static void compaction_free(struct folio *dst, unsigned long data)
>  
>  	list_add(&dst->lru, &cc->freepages);
>  	cc->nr_freepages++;
> -	cc->nr_migratepages += 1 << order;
> +	cc->nr_migratepages++;
>  }



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/4] Enable >0 order folio memory compaction
  2024-02-20  2:06 ` [PATCH v6 0/4] Enable >0 order folio memory compaction Andrew Morton
@ 2024-02-20  2:31   ` Zi Yan
  2024-02-20  3:00     ` Baolin Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Zi Yan @ 2024-02-20  2:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, "Huang, Ying",
	Ryan Roberts, "Matthew Wilcox (Oracle)",
	David Hildenbrand, "Yin, Fengwei",
	Yu Zhao, Vlastimil Babka, "Kirill A . Shutemov",
	Johannes Weiner, Baolin Wang, Kemeng Shi, Mel Gorman, Rohan Puri,
	Mcgrof Chamberlain, Adam Manzanares,
	"Vishal Moola (Oracle)"


On 19 Feb 2024, at 21:06, Andrew Morton wrote:

> On Fri, 16 Feb 2024 12:04:28 -0500 Zi Yan <zi.yan@sent.com> wrote:
>
>> Baolin's patch
>
> Baolin writes many patches and patches have names, please use them!
Sorry for not being specific. I mean this fixup:
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-16-01-35&id=97f749c7c82f677f89bbf4f10de7816ce9b071fe

to this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-16-01-35&id=ea87b0558293a5ad597bea606fe261f7b2650cda


The patch was based on top of my early version of this patchset, thus
uses "cc->nr_migratepages -= 1 << order;" and
"cc->nr_migratepages += 1 << order;", but now it is applied before
mine. The change should be "cc->nr_migratepages--;" and
"cc->nr_migratepages++;", respectively.

>
>> on nr_migratepages was based on this one, a better fixup
>> for it might be below. Since before my patchset, compaction only deals with
>> order-0 pages.
>
> I don't understand what this means.  The patchset you sent applies OK
> to mm-unstable so what else is there to do?

Your above fixup to Baolin's patch needs to be changed to the patch below
and my "mm/compaction: add support for >0 order folio memory compaction" will
need to be adjusted accordingly to be applied on top.

Let me know if anything is unclear.

>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 01ec85cfd623f..e60135e2019d6 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1798,7 +1798,7 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>>  	dst = list_entry(cc->freepages.next, struct folio, lru);
>>  	list_del(&dst->lru);
>>  	cc->nr_freepages--;
>> -	cc->nr_migratepages -= 1 << order;
>> +	cc->nr_migratepages--;
>>
>>  	return dst;
>>  }
>> @@ -1814,7 +1814,7 @@ static void compaction_free(struct folio *dst, unsigned long data)
>>
>>  	list_add(&dst->lru, &cc->freepages);
>>  	cc->nr_freepages++;
>> -	cc->nr_migratepages += 1 << order;
>> +	cc->nr_migratepages++;
>>  }


--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/4] Enable >0 order folio memory compaction
  2024-02-20  2:31   ` Zi Yan
@ 2024-02-20  3:00     ` Baolin Wang
  2024-02-20  3:30       ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Baolin Wang @ 2024-02-20  3:00 UTC (permalink / raw)
  To: Zi Yan, Andrew Morton
  Cc: linux-mm, linux-kernel, Huang, Ying, Ryan Roberts,
	Matthew Wilcox (Oracle),
	David Hildenbrand, Yin, Fengwei, Yu Zhao, Vlastimil Babka,
	Kirill A . Shutemov, Johannes Weiner, Kemeng Shi, Mel Gorman,
	Rohan Puri, Mcgrof Chamberlain, Adam Manzanares,
	Vishal Moola (Oracle)



On 2024/2/20 10:31, Zi Yan wrote:
> On 19 Feb 2024, at 21:06, Andrew Morton wrote:
> 
>> On Fri, 16 Feb 2024 12:04:28 -0500 Zi Yan <zi.yan@sent.com> wrote:
>>
>>> Baolin's patch
>>
>> Baolin writes many patches and patches have names, please use them!
> Sorry for not being specific. I mean this fixup:
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-16-01-35&id=97f749c7c82f677f89bbf4f10de7816ce9b071fe
> 
> to this patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-16-01-35&id=ea87b0558293a5ad597bea606fe261f7b2650cda
> 
> 
> The patch was based on top of my early version of this patchset, thus
> uses "cc->nr_migratepages -= 1 << order;" and
> "cc->nr_migratepages += 1 << order;", but now it is applied before
> mine. The change should be "cc->nr_migratepages--;" and
> "cc->nr_migratepages++;", respectively.
> 
>>
>>> on nr_migratepages was based on this one, a better fixup
>>> for it might be below. Since before my patchset, compaction only deals with
>>> order-0 pages.
>>
>> I don't understand what this means.  The patchset you sent applies OK
>> to mm-unstable so what else is there to do?
> 
> Your above fixup to Baolin's patch needs to be changed to the patch below
> and my "mm/compaction: add support for >0 order folio memory compaction" will
> need to be adjusted accordingly to be applied on top.
> 
> Let me know if anything is unclear.

Hi Andrew,

To avoid conflicts, you can drop these two patches, and I will send a 
new version fixing the issue pointed out by Vlastimil on top of 
"mm/compaction: add support for >0 order folio memory compaction".

https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-16-01-35&id=97f749c7c82f677f89bbf4f10de7816ce9b071fe

https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-16-01-35&id=ea87b0558293a5ad597bea606fe261f7b2650cda


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/4] Enable >0 order folio memory compaction
  2024-02-20  3:00     ` Baolin Wang
@ 2024-02-20  3:30       ` Andrew Morton
  2024-02-20  6:28         ` Baolin Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2024-02-20  3:30 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Zi Yan, linux-mm, linux-kernel, Huang, Ying, Ryan Roberts,
	Matthew Wilcox (Oracle),
	David Hildenbrand, Yin, Fengwei, Yu Zhao, Vlastimil Babka,
	Kirill A . Shutemov, Johannes Weiner, Kemeng Shi, Mel Gorman,
	Rohan Puri, Mcgrof Chamberlain, Adam Manzanares,
	Vishal Moola (Oracle)

On Tue, 20 Feb 2024 11:00:39 +0800 Baolin Wang <baolin.wang@linux.alibaba.com> wrote:

> > The patch was based on top of my early version of this patchset, thus
> > uses "cc->nr_migratepages -= 1 << order;" and
> > "cc->nr_migratepages += 1 << order;", but now it is applied before
> > mine. The change should be "cc->nr_migratepages--;" and
> > "cc->nr_migratepages++;", respectively.
> > 
> >>
> >>> on nr_migratepages was based on this one, a better fixup
> >>> for it might be below. Since before my patchset, compaction only deals with
> >>> order-0 pages.
> >>
> >> I don't understand what this means.  The patchset you sent applies OK
> >> to mm-unstable so what else is there to do?
> > 
> > Your above fixup to Baolin's patch needs to be changed to the patch below
> > and my "mm/compaction: add support for >0 order folio memory compaction" will
> > need to be adjusted accordingly to be applied on top.
> > 
> > Let me know if anything is unclear.
> 
> Hi Andrew,
> 
> To avoid conflicts, you can drop these two patches, and I will send a 
> new version with fixing the issue pointed by Vlastimilb on top of 
> "mm/compaction: add support for >0 order folio memory compaction".
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-16-01-35&id=97f749c7c82f677f89bbf4f10de7816ce9b071fe
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-16-01-35&id=ea87b0558293a5ad597bea606fe261f7b2650cda

Well I thought I'd fixed everything up 10 minutes ago.  Please take a
look at next mm-unstable.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/4] Enable >0 order folio memory compaction
  2024-02-20  3:30       ` Andrew Morton
@ 2024-02-20  6:28         ` Baolin Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Baolin Wang @ 2024-02-20  6:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Zi Yan, linux-mm, linux-kernel, Huang, Ying, Ryan Roberts,
	Matthew Wilcox (Oracle),
	David Hildenbrand, Yin, Fengwei, Yu Zhao, Vlastimil Babka,
	Kirill A . Shutemov, Johannes Weiner, Kemeng Shi, Mel Gorman,
	Rohan Puri, Mcgrof Chamberlain, Adam Manzanares,
	Vishal Moola (Oracle)



On 2024/2/20 11:30, Andrew Morton wrote:
> On Tue, 20 Feb 2024 11:00:39 +0800 Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
> 
>>> The patch was based on top of my early version of this patchset, thus
>>> uses "cc->nr_migratepages -= 1 << order;" and
>>> "cc->nr_migratepages += 1 << order;", but now it is applied before
>>> mine. The change should be "cc->nr_migratepages--;" and
>>> "cc->nr_migratepages++;", respectively.
>>>
>>>>
>>>>> on nr_migratepages was based on this one, a better fixup
>>>>> for it might be below. Since before my patchset, compaction only deals with
>>>>> order-0 pages.
>>>>
>>>> I don't understand what this means.  The patchset you sent applies OK
>>>> to mm-unstable so what else is there to do?
>>>
>>> Your above fixup to Baolin's patch needs to be changed to the patch below
>>> and my "mm/compaction: add support for >0 order folio memory compaction" will
>>> need to be adjusted accordingly to be applied on top.
>>>
>>> Let me know if anything is unclear.
>>
>> Hi Andrew,
>>
>> To avoid conflicts, you can drop these two patches, and I will send a
>> new version with fixing the issue pointed by Vlastimilb on top of
>> "mm/compaction: add support for >0 order folio memory compaction".
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-16-01-35&id=97f749c7c82f677f89bbf4f10de7816ce9b071fe
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything-2024-02-16-01-35&id=ea87b0558293a5ad597bea606fe261f7b2650cda
> 
> Well I thought I'd fixed everything up 10 minutes ago.  Please take a
> look at next mm-unstable.

Sure. And I found a minor rebase error in the compaction_alloc() 
function, whereas Zi Yan's original patch is correct.

+	cc->nr_freepages -= 1 << order;
  	cc->nr_migratepages--;
-
-	return dst;
+	return page_rmappable_folio(&dst->page);
  }

should change to be:
cc->nr_migratepages -= 1 << order;


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 1/4] mm/page_alloc: remove unused fpi_flags in free_pages_prepare()
  2024-02-16 17:04 ` [PATCH v6 1/4] mm/page_alloc: remove unused fpi_flags in free_pages_prepare() Zi Yan
  2024-02-16 17:19   ` Vlastimil Babka
@ 2024-02-20  8:48   ` David Hildenbrand
  1 sibling, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2024-02-20  8:48 UTC (permalink / raw)
  To: Zi Yan, linux-mm, linux-kernel
  Cc: Huang, Ying, Ryan Roberts, Andrew Morton, Matthew Wilcox (Oracle),
	Yin, Fengwei, Yu Zhao, Vlastimil Babka, Kirill A . Shutemov,
	Johannes Weiner, Baolin Wang, Kemeng Shi, Mel Gorman, Rohan Puri,
	Mcgrof Chamberlain, Adam Manzanares, Vishal Moola (Oracle)

On 16.02.24 18:04, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> fpi_flags is only passed to should_skip_kasan_poison() but ignored
> by the function. Remove the unused parameter.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   mm/page_alloc.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7ae4b74c9e5c..70c1ed3addf3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1061,7 +1061,7 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
>    * on-demand allocation and then freed again before the deferred pages
>    * initialization is done, but this is not likely to happen.
>    */
> -static inline bool should_skip_kasan_poison(struct page *page, fpi_t fpi_flags)
> +static inline bool should_skip_kasan_poison(struct page *page)
>   {
>   	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
>   		return deferred_pages_enabled();
> @@ -1081,10 +1081,10 @@ static void kernel_init_pages(struct page *page, int numpages)
>   }
>   
>   static __always_inline bool free_pages_prepare(struct page *page,
> -			unsigned int order, fpi_t fpi_flags)
> +			unsigned int order)
>   {
>   	int bad = 0;
> -	bool skip_kasan_poison = should_skip_kasan_poison(page, fpi_flags);
> +	bool skip_kasan_poison = should_skip_kasan_poison(page);
>   	bool init = want_init_on_free();
>   	bool compound = PageCompound(page);
>   
> @@ -1266,7 +1266,7 @@ static void __free_pages_ok(struct page *page, unsigned int order,
>   	unsigned long pfn = page_to_pfn(page);
>   	struct zone *zone = page_zone(page);
>   
> -	if (!free_pages_prepare(page, order, fpi_flags))
> +	if (!free_pages_prepare(page, order))
>   		return;
>   
>   	/*
> @@ -2379,7 +2379,7 @@ static bool free_unref_page_prepare(struct page *page, unsigned long pfn,
>   {
>   	int migratetype;
>   
> -	if (!free_pages_prepare(page, order, FPI_NONE))
> +	if (!free_pages_prepare(page, order))
>   		return false;
>   
>   	migratetype = get_pfnblock_migratetype(page, pfn);

Likely due to

commit 0a54864f8dfb64b64c84c9db6ff70e0e93690a33
Author: Peter Collingbourne <pcc@google.com>
Date:   Thu Mar 9 20:29:14 2023 -0800

     kasan: remove PG_skip_kasan_poison flag

That removed the usage of fpi_flags in should_skip_kasan_poison.

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/4] mm/compaction: enable compacting >0 order folios.
  2024-02-16 17:04 ` [PATCH v6 2/4] mm/compaction: enable compacting >0 order folios Zi Yan
@ 2024-02-20  9:03   ` David Hildenbrand
  2024-02-20  9:11     ` David Hildenbrand
  2024-02-20 15:27     ` Zi Yan
  0 siblings, 2 replies; 16+ messages in thread
From: David Hildenbrand @ 2024-02-20  9:03 UTC (permalink / raw)
  To: Zi Yan, linux-mm, linux-kernel
  Cc: Huang, Ying, Ryan Roberts, Andrew Morton, Matthew Wilcox (Oracle),
	Yin, Fengwei, Yu Zhao, Vlastimil Babka, Kirill A . Shutemov,
	Johannes Weiner, Baolin Wang, Kemeng Shi, Mel Gorman, Rohan Puri,
	Mcgrof Chamberlain, Adam Manzanares, Vishal Moola (Oracle)

On 16.02.24 18:04, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> migrate_pages() supports >0 order folio migration and during compaction,
> even if compaction_alloc() cannot provide >0 order free pages,
> migrate_pages() can split the source page and try to migrate the base
> pages from the split.  It can be a baseline and start point for adding
> support for compacting >0 order folios.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Suggested-by: Huang Ying <ying.huang@intel.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Tested-by: Yu Zhao <yuzhao@google.com>
> Cc: Adam Manzanares <a.manzanares@samsung.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Luis Chamberlain <mcgrof@kernel.org>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Yin Fengwei <fengwei.yin@intel.com>
> ---
>   mm/compaction.c | 66 ++++++++++++++++++++++++++++++++++++++-----------
>   1 file changed, 52 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index cc801ce099b4..aa6aad805c4d 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -816,6 +816,21 @@ static bool too_many_isolated(struct compact_control *cc)
>   	return too_many;
>   }
>   
> +/*


Can't you add these comments to the respective checks? Like

static bool skip_isolation_on_order(int order, int target_order)
{
	/*
	 * Unless we are performing global compaction (target_order <
	 * 0), skip any folios that are larger than the target order: we
	 * wouldn't be here if we'd have a free folio with the desired
	 * target_order, so migrating this folio would likely fail
	 * later.
	 */
	if (target_order != -1 && order >= target_order)
		return true;
	/*
	 * We limit memory compaction to pageblocks and won't try
	 * creating free blocks of memory that are larger than that.
	 */
	return order >= pageblock_order;
}

Then, add simple, expressive function documentation (if really
required) that doesn't contain all these details.
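
For instance, something as short as this seems sufficient (just a sketch of
the level of detail I have in mind, not a wording suggestion):

	/*
	 * Determine whether we should skip isolating a folio of the given
	 * order, based on the order the current compaction run is aiming for.
	 */
	static bool skip_isolation_on_order(int order, int target_order)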

> + * 1. if the page order is larger than or equal to target_order (i.e.,
> + * cc->order and when it is not -1 for global compaction), skip it since
> + * target_order already indicates no free page with larger than target_order
> + * exists and later migrating it will most likely fail;
> + *
> + * 2. compacting > pageblock_order pages does not improve memory fragmentation,

I'm pretty sure you meant "reduce" ?

> + * skip them;
> + */
> +static bool skip_isolation_on_order(int order, int target_order)
> +{
> +	return (target_order != -1 && order >= target_order) ||
> +		order >= pageblock_order;
> +}
> +
>   /**
>    * isolate_migratepages_block() - isolate all migrate-able pages within
>    *				  a single pageblock
> @@ -947,7 +962,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   			valid_page = page;
>   		}
>   
> -		if (PageHuge(page) && cc->alloc_contig) {
> +		if (PageHuge(page)) {
> +			/*
> +			 * skip hugetlbfs if we are not compacting for pages
> +			 * bigger than its order. THPs and other compound pages
> +			 * are handled below.
> +			 */
> +			if (!cc->alloc_contig) {
> +				const unsigned int order = compound_order(page);
> +
> +				if (order <= MAX_PAGE_ORDER) {
> +					low_pfn += (1UL << order) - 1;
> +					nr_scanned += (1UL << order) - 1;
> +				}
> +				goto isolate_fail;
> +			}
> +			/* for alloc_contig case */
>   			if (locked) {
>   				unlock_page_lruvec_irqrestore(locked, flags);
>   				locked = NULL;
> @@ -1008,21 +1038,24 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   		}
>   
>   		/*
> -		 * Regardless of being on LRU, compound pages such as THP and
> -		 * hugetlbfs are not to be compacted unless we are attempting
> -		 * an allocation much larger than the huge page size (eg CMA).
> -		 * We can potentially save a lot of iterations if we skip them
> -		 * at once. The check is racy, but we can consider only valid
> -		 * values and the only danger is skipping too much.
> +		 * Regardless of being on LRU, compound pages such as THP
> +		 * (hugetlbfs is handled above) are not to be compacted unless
> +		 * we are attempting an allocation larger than the compound
> +		 * page size. We can potentially save a lot of iterations if we
> +		 * skip them at once. The check is racy, but we can consider
> +		 * only valid values and the only danger is skipping too much.
>   		 */
>   		if (PageCompound(page) && !cc->alloc_contig) {
>   			const unsigned int order = compound_order(page);
>   
> -			if (likely(order <= MAX_PAGE_ORDER)) {
> -				low_pfn += (1UL << order) - 1;
> -				nr_scanned += (1UL << order) - 1;
> +			/* Skip based on page order and compaction target order. */
> +			if (skip_isolation_on_order(order, cc->order)) {
> +				if (order <= MAX_PAGE_ORDER) {
> +					low_pfn += (1UL << order) - 1;
> +					nr_scanned += (1UL << order) - 1;
> +				}
> +				goto isolate_fail;
>   			}
> -			goto isolate_fail;
>   		}
>   
>   		/*
> @@ -1165,10 +1198,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   			}
>   
>   			/*
> -			 * folio become large since the non-locked check,
> -			 * and it's on LRU.
> +			 * Check LRU folio order under the lock
>   			 */
> -			if (unlikely(folio_test_large(folio) && !cc->alloc_contig)) {
> +			if (unlikely(skip_isolation_on_order(folio_order(folio),
> +							     cc->order) &&
> +				     !cc->alloc_contig)) {
>   				low_pfn += folio_nr_pages(folio) - 1;
>   				nr_scanned += folio_nr_pages(folio) - 1;
>   				folio_set_lru(folio);
> @@ -1788,6 +1822,10 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>   	struct compact_control *cc = (struct compact_control *)data;
>   	struct folio *dst;
>   
> +	/* this makes migrate_pages() split the source page and retry */
> +	if (folio_test_large(src) > 0)
> +		return NULL;

Why the "> 0 " check ? Either it's large or it isn't.
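
Presumably the intent is just:

	/* let migrate_pages() split the large source folio and retry */
	if (folio_test_large(src))
		return NULL;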

Apart from that LGTM, but I am no compaction expert.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/4] mm/compaction: enable compacting >0 order folios.
  2024-02-20  9:03   ` David Hildenbrand
@ 2024-02-20  9:11     ` David Hildenbrand
  2024-02-20 15:27       ` Zi Yan
  2024-02-20 15:27     ` Zi Yan
  1 sibling, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2024-02-20  9:11 UTC (permalink / raw)
  To: Zi Yan, linux-mm, linux-kernel
  Cc: Huang, Ying, Ryan Roberts, Andrew Morton, Matthew Wilcox (Oracle),
	Yin, Fengwei, Yu Zhao, Vlastimil Babka, Kirill A . Shutemov,
	Johannes Weiner, Baolin Wang, Kemeng Shi, Mel Gorman, Rohan Puri,
	Mcgrof Chamberlain, Adam Manzanares, Vishal Moola (Oracle)

On 20.02.24 10:03, David Hildenbrand wrote:
> On 16.02.24 18:04, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> migrate_pages() supports >0 order folio migration and during compaction,
>> even if compaction_alloc() cannot provide >0 order free pages,
>> migrate_pages() can split the source page and try to migrate the base
>> pages from the split.  It can be a baseline and start point for adding
>> support for compacting >0 order folios.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> Suggested-by: Huang Ying <ying.huang@intel.com>
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Tested-by: Yu Zhao <yuzhao@google.com>
>> Cc: Adam Manzanares <a.manzanares@samsung.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Luis Chamberlain <mcgrof@kernel.org>
>> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Yin Fengwei <fengwei.yin@intel.com>
>> ---
>>    mm/compaction.c | 66 ++++++++++++++++++++++++++++++++++++++-----------
>>    1 file changed, 52 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index cc801ce099b4..aa6aad805c4d 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -816,6 +816,21 @@ static bool too_many_isolated(struct compact_control *cc)
>>    	return too_many;
>>    }
>>    
>> +/*
> 
> 
> Can't you add these comments to the respective checks? Like
> 
> static bool skip_isolation_on_order(int order, int target_order)
> {
> 	/*
> 	 * Unless we are performing global compaction (target_order <
> 	 * 0), skip any folios that are larger than the target order: we
> 	 * wouldn't be here if we'd have a free folio with the desired
> 	 * target_order, so migrating this folio would likely fail
> 	 * later.
> 	 */
> 	if (target_order != -1 && order >= target_order)
> 		return true;

I just stumbled over "is_via_compact_memory", likely that should be used 
instead of the "!= -1" check.
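
Something along these lines, I suppose (untested, and assuming the helper
keeps its current "order == -1 means compact_memory" meaning):

	if (!is_via_compact_memory(target_order) && order >= target_order)
		return true;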

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/4] mm/compaction: enable compacting >0 order folios.
  2024-02-20  9:03   ` David Hildenbrand
  2024-02-20  9:11     ` David Hildenbrand
@ 2024-02-20 15:27     ` Zi Yan
  1 sibling, 0 replies; 16+ messages in thread
From: Zi Yan @ 2024-02-20 15:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, "Huang, Ying",
	Ryan Roberts, Andrew Morton, "Matthew Wilcox (Oracle)",
	"Yin, Fengwei",
	Yu Zhao, Vlastimil Babka, "Kirill A . Shutemov",
	Johannes Weiner, Baolin Wang, Kemeng Shi, Mel Gorman, Rohan Puri,
	Mcgrof Chamberlain, Adam Manzanares,
	"Vishal Moola (Oracle)"

[-- Attachment #1: Type: text/plain, Size: 6863 bytes --]

On 20 Feb 2024, at 4:03, David Hildenbrand wrote:

> On 16.02.24 18:04, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> migrate_pages() supports >0 order folio migration and during compaction,
>> even if compaction_alloc() cannot provide >0 order free pages,
>> migrate_pages() can split the source page and try to migrate the base
>> pages from the split.  It can be a baseline and start point for adding
>> support for compacting >0 order folios.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> Suggested-by: Huang Ying <ying.huang@intel.com>
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Tested-by: Yu Zhao <yuzhao@google.com>
>> Cc: Adam Manzanares <a.manzanares@samsung.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Luis Chamberlain <mcgrof@kernel.org>
>> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Yin Fengwei <fengwei.yin@intel.com>
>> ---
>>   mm/compaction.c | 66 ++++++++++++++++++++++++++++++++++++++-----------
>>   1 file changed, 52 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index cc801ce099b4..aa6aad805c4d 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -816,6 +816,21 @@ static bool too_many_isolated(struct compact_control *cc)
>>   	return too_many;
>>   }
>>  +/*
>
>
> Can't you add these comments to the respective checks? Like
>
> static bool skip_isolation_on_order(int order, int target_order)
> {
> 	/*
> 	 * Unless we are performing global compaction (target_order <
> 	 * 0), skip any folios that are larger than the target order: we
> 	 * wouldn't be here if we'd have a free folio with the desired
> 	 * target_order, so migrating this folio would likely fail
> 	 * later.
> 	 */
> 	if (target_order != -1 && order >= target_order)
> 		return true;
> 	/*
> 	 * We limit memory compaction to pageblocks and won't try
> 	 * creating free blocks of memory that are larger than that.
> 	 */
> 	return order >= pageblock_order;
> }
>
> Then, add simple, expressive function documentation (if really required) that doesn't contain all these details.
>

OK. No problem.

>> + * 1. if the page order is larger than or equal to target_order (i.e.,
>> + * cc->order and when it is not -1 for global compaction), skip it since
>> + * target_order already indicates no free page with larger than target_order
>> + * exists and later migrating it will most likely fail;
>> + *
>> + * 2. compacting > pageblock_order pages does not improve memory fragmentation,
>
> I'm pretty sure you meant "reduce" ?

Yes.

>
>> + * skip them;
>> + */
>> +static bool skip_isolation_on_order(int order, int target_order)
>> +{
>> +	return (target_order != -1 && order >= target_order) ||
>> +		order >= pageblock_order;
>> +}
>> +
>>   /**
>>    * isolate_migratepages_block() - isolate all migrate-able pages within
>>    *				  a single pageblock
>> @@ -947,7 +962,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>   			valid_page = page;
>>   		}
>>  -		if (PageHuge(page) && cc->alloc_contig) {
>> +		if (PageHuge(page)) {
>> +			/*
>> +			 * skip hugetlbfs if we are not compacting for pages
>> +			 * bigger than its order. THPs and other compound pages
>> +			 * are handled below.
>> +			 */
>> +			if (!cc->alloc_contig) {
>> +				const unsigned int order = compound_order(page);
>> +
>> +				if (order <= MAX_PAGE_ORDER) {
>> +					low_pfn += (1UL << order) - 1;
>> +					nr_scanned += (1UL << order) - 1;
>> +				}
>> +				goto isolate_fail;
>> +			}
>> +			/* for alloc_contig case */
>>   			if (locked) {
>>   				unlock_page_lruvec_irqrestore(locked, flags);
>>   				locked = NULL;
>> @@ -1008,21 +1038,24 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>   		}
>>    		/*
>> -		 * Regardless of being on LRU, compound pages such as THP and
>> -		 * hugetlbfs are not to be compacted unless we are attempting
>> -		 * an allocation much larger than the huge page size (eg CMA).
>> -		 * We can potentially save a lot of iterations if we skip them
>> -		 * at once. The check is racy, but we can consider only valid
>> -		 * values and the only danger is skipping too much.
>> +		 * Regardless of being on LRU, compound pages such as THP
>> +		 * (hugetlbfs is handled above) are not to be compacted unless
>> +		 * we are attempting an allocation larger than the compound
>> +		 * page size. We can potentially save a lot of iterations if we
>> +		 * skip them at once. The check is racy, but we can consider
>> +		 * only valid values and the only danger is skipping too much.
>>   		 */
>>   		if (PageCompound(page) && !cc->alloc_contig) {
>>   			const unsigned int order = compound_order(page);
>>  -			if (likely(order <= MAX_PAGE_ORDER)) {
>> -				low_pfn += (1UL << order) - 1;
>> -				nr_scanned += (1UL << order) - 1;
>> +			/* Skip based on page order and compaction target order. */
>> +			if (skip_isolation_on_order(order, cc->order)) {
>> +				if (order <= MAX_PAGE_ORDER) {
>> +					low_pfn += (1UL << order) - 1;
>> +					nr_scanned += (1UL << order) - 1;
>> +				}
>> +				goto isolate_fail;
>>   			}
>> -			goto isolate_fail;
>>   		}
>>    		/*
>> @@ -1165,10 +1198,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>   			}
>>    			/*
>> -			 * folio become large since the non-locked check,
>> -			 * and it's on LRU.
>> +			 * Check LRU folio order under the lock
>>   			 */
>> -			if (unlikely(folio_test_large(folio) && !cc->alloc_contig)) {
>> +			if (unlikely(skip_isolation_on_order(folio_order(folio),
>> +							     cc->order) &&
>> +				     !cc->alloc_contig)) {
>>   				low_pfn += folio_nr_pages(folio) - 1;
>>   				nr_scanned += folio_nr_pages(folio) - 1;
>>   				folio_set_lru(folio);
>> @@ -1788,6 +1822,10 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>>   	struct compact_control *cc = (struct compact_control *)data;
>>   	struct folio *dst;
>>  +	/* this makes migrate_pages() split the source page and retry */
>> +	if (folio_test_large(src) > 0)
>> +		return NULL;
>
> Why the "> 0 " check ? Either it's large or it isn't.
Will fix it.

> Apart from that LGTM, but I am no compaction expert.

Thanks.


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/4] mm/compaction: enable compacting >0 order folios.
  2024-02-20  9:11     ` David Hildenbrand
@ 2024-02-20 15:27       ` Zi Yan
  0 siblings, 0 replies; 16+ messages in thread
From: Zi Yan @ 2024-02-20 15:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, "Huang, Ying",
	Ryan Roberts, Andrew Morton, "Matthew Wilcox (Oracle)",
	"Yin, Fengwei",
	Yu Zhao, Vlastimil Babka, "Kirill A . Shutemov",
	Johannes Weiner, Baolin Wang, Kemeng Shi, Mel Gorman, Rohan Puri,
	Mcgrof Chamberlain, Adam Manzanares,
	"Vishal Moola (Oracle)"

[-- Attachment #1: Type: text/plain, Size: 2556 bytes --]

On 20 Feb 2024, at 4:11, David Hildenbrand wrote:

> On 20.02.24 10:03, David Hildenbrand wrote:
>> On 16.02.24 18:04, Zi Yan wrote:
>>> From: Zi Yan <ziy@nvidia.com>
>>>
>>> migrate_pages() supports >0 order folio migration and during compaction,
>>> even if compaction_alloc() cannot provide >0 order free pages,
>>> migrate_pages() can split the source page and try to migrate the base
>>> pages from the split.  It can be a baseline and start point for adding
>>> support for compacting >0 order folios.
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> Suggested-by: Huang Ying <ying.huang@intel.com>
>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Tested-by: Yu Zhao <yuzhao@google.com>
>>> Cc: Adam Manzanares <a.manzanares@samsung.com>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
>>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> Cc: Luis Chamberlain <mcgrof@kernel.org>
>>> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
>>> Cc: Mel Gorman <mgorman@techsingularity.net>
>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>> Cc: Yin Fengwei <fengwei.yin@intel.com>
>>> ---
>>>    mm/compaction.c | 66 ++++++++++++++++++++++++++++++++++++++-----------
>>>    1 file changed, 52 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index cc801ce099b4..aa6aad805c4d 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -816,6 +816,21 @@ static bool too_many_isolated(struct compact_control *cc)
>>>    	return too_many;
>>>    }
>>>   +/*
>>
>>
>> Can't you add these comments to the respective checks? Like
>>
>> static bool skip_isolation_on_order(int order, int target_order)
>> {
>> 	/*
>> 	 * Unless we are performing global compaction (target_order <
>> 	 * 0), skip any folios that are larger than the target order: we
>> 	 * wouldn't be here if we'd have a free folio with the desired
>> 	 * target_order, so migrating this folio would likely fail
>> 	 * later.
>> 	 */
>> 	if (target_order != -1 && order >= target_order)
>> 		return true;
>
> I just stumbled over "is_via_compact_memory", likely that should be used instead of the "!= -1" check.

Thanks. Let me use it.


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-02-20 15:28 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-16 17:04 [PATCH v6 0/4] Enable >0 order folio memory compaction Zi Yan
2024-02-16 17:04 ` [PATCH v6 1/4] mm/page_alloc: remove unused fpi_flags in free_pages_prepare() Zi Yan
2024-02-16 17:19   ` Vlastimil Babka
2024-02-20  8:48   ` David Hildenbrand
2024-02-16 17:04 ` [PATCH v6 2/4] mm/compaction: enable compacting >0 order folios Zi Yan
2024-02-20  9:03   ` David Hildenbrand
2024-02-20  9:11     ` David Hildenbrand
2024-02-20 15:27       ` Zi Yan
2024-02-20 15:27     ` Zi Yan
2024-02-16 17:04 ` [PATCH v6 3/4] mm/compaction: add support for >0 order folio memory compaction Zi Yan
2024-02-16 17:04 ` [PATCH v6 4/4] mm/compaction: optimize >0 order folio compaction with free page split Zi Yan
2024-02-20  2:06 ` [PATCH v6 0/4] Enable >0 order folio memory compaction Andrew Morton
2024-02-20  2:31   ` Zi Yan
2024-02-20  3:00     ` Baolin Wang
2024-02-20  3:30       ` Andrew Morton
2024-02-20  6:28         ` Baolin Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox