* [PATCH v1 0/2] mm/page_alloc: rework conditional splitting >= pageblock_order pages when freeing
@ 2024-12-06 9:59 David Hildenbrand
2024-12-06 9:59 ` [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate() David Hildenbrand
2024-12-06 9:59 ` [PATCH v1 2/2] mm/page_isolation: fixup isolate_single_pageblock() comment regarding splitting free pages David Hildenbrand
0 siblings, 2 replies; 14+ messages in thread
From: David Hildenbrand @ 2024-12-06 9:59 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton, Zi Yan,
Vlastimil Babka, Yu Zhao
Looking into the recent alloc_contig_range(__GFP_COMP) support, I realized
that we now unconditionally split up high-order pages on the page freeing
path to free them in pageblock granularity, just for the buddy to
immediately merge them again in the common case.
Let's optimize for the common case (all pageblock migratetypes match), and
perform the splitting only in configs where it is strictly required.
Further, add some comments that explain why this special casing is
required at all.
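To illustrate, the freeing path of patch #1 essentially boils down to the
following fast path (a condensed sketch; see the patch below for the real
code, including the new pfnblock_migratetype_equal() helper):
	/* Common case: a single pageblock, or all pageblock migratetypes match. */
	if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
	    likely(order <= pageblock_order) ||
	    pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
		__free_one_page(page, pfn, zone, order, mt, fpi_flags);
		return;
	}
	/* Uncommon case: fall back to freeing in pageblock granularity. */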
Alongside, a fix for a stale comment in page isolation code.
Tested with runtime allocation of gigantic pages and virtio-mem.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
David Hildenbrand (2):
mm/page_alloc: conditionally split > pageblock_order pages in
free_one_page() and move_freepages_block_isolate()
mm/page_isolation: fixup isolate_single_pageblock() comment regarding
splitting free pages
mm/page_alloc.c | 71 ++++++++++++++++++++++++++++++++++++---------
mm/page_isolation.c | 9 +++---
2 files changed, 61 insertions(+), 19 deletions(-)
--
2.47.1
* [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-06 9:59 [PATCH v1 0/2] mm/page_alloc: rework conditional splitting >= pageblock_order pages when freeing David Hildenbrand
@ 2024-12-06 9:59 ` David Hildenbrand
2024-12-06 16:58 ` Zi Yan
` (2 more replies)
2024-12-06 9:59 ` [PATCH v1 2/2] mm/page_isolation: fixup isolate_single_pageblock() comment regarding splitting free pages David Hildenbrand
1 sibling, 3 replies; 14+ messages in thread
From: David Hildenbrand @ 2024-12-06 9:59 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton, Zi Yan,
Vlastimil Babka, Yu Zhao
Let's special-case for the common scenarios that:
(a) We are freeing pages <= pageblock_order
(b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
(especially, no mixture of isolated and non-isolated pageblocks)
When we encounter a > MAX_PAGE_ORDER page, it can only come from
alloc_contig_range(), and we can process it in MAX_PAGE_ORDER chunks.
When we encounter a > pageblock_order but <= MAX_PAGE_ORDER page,
check whether all pageblocks match, and if so (common case), don't
split them up just for the buddy to merge them back.
This makes sure that when we free MAX_PAGE_ORDER chunks to the buddy,
for example during system startup, memory onlining, or when isolating
consecutive pageblocks via alloc_contig_range()/memory offlining, we
don't unnecessarily split up what we'll immediately merge again,
because the migratetypes match.
Rename split_large_buddy() to __free_one_page_maybe_split(), to make it
clearer what's happening, and handle in it only natural buddy orders,
not the alloc_contig_range(__GFP_COMP) special case: handle that in
free_one_page() only.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/page_alloc.c | 71 +++++++++++++++++++++++++++++++++++++++----------
1 file changed, 57 insertions(+), 14 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48a291c485df4..ad19758a7779f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1225,24 +1225,50 @@ static void free_pcppages_bulk(struct zone *zone, int count,
spin_unlock_irqrestore(&zone->lock, flags);
}
-/* Split a multi-block free page into its individual pageblocks. */
-static void split_large_buddy(struct zone *zone, struct page *page,
- unsigned long pfn, int order, fpi_t fpi)
+static bool pfnblock_migratetype_equal(unsigned long pfn,
+ unsigned long end_pfn, int mt)
{
- unsigned long end = pfn + (1 << order);
+ VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | end_pfn, pageblock_nr_pages));
+ while (pfn != end_pfn) {
+ struct page *page = pfn_to_page(pfn);
+
+ if (unlikely(mt != get_pfnblock_migratetype(page, pfn)))
+ return false;
+ pfn += pageblock_nr_pages;
+ }
+ return true;
+}
+
+static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
+ unsigned long pfn, int order, fpi_t fpi_flags)
+{
+ const unsigned long end_pfn = pfn + (1 << order);
+ int mt = get_pfnblock_migratetype(page, pfn);
+
+ VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
/* Caller removed page from freelist, buddy info cleared! */
VM_WARN_ON_ONCE(PageBuddy(page));
- if (order > pageblock_order)
- order = pageblock_order;
-
- while (pfn != end) {
- int mt = get_pfnblock_migratetype(page, pfn);
+ /*
+ * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
+ * pages that cover pageblocks with different migratetypes; for example
+ * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
+ * case, fall back to freeing individual pageblocks so they get put
+ * onto the right lists.
+ */
+ if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
+ likely(order <= pageblock_order) ||
+ pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
+ __free_one_page(page, pfn, zone, order, mt, fpi_flags);
+ return;
+ }
- __free_one_page(page, pfn, zone, order, mt, fpi);
- pfn += 1 << order;
+ while (pfn != end_pfn) {
+ mt = get_pfnblock_migratetype(page, pfn);
+ __free_one_page(page, pfn, zone, pageblock_order, mt, fpi_flags);
+ pfn += pageblock_nr_pages;
page = pfn_to_page(pfn);
}
}
@@ -1254,7 +1280,24 @@ static void free_one_page(struct zone *zone, struct page *page,
unsigned long flags;
spin_lock_irqsave(&zone->lock, flags);
- split_large_buddy(zone, page, pfn, order, fpi_flags);
+ if (likely(order <= MAX_PAGE_ORDER)) {
+ __free_one_page_maybe_split(zone, page, pfn, order, fpi_flags);
+ } else if (IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
+ const unsigned long end_pfn = pfn + (1 << order);
+
+ /*
+ * The only way we can end up with order > MAX_PAGE_ORDER is
+ * through alloc_contig_range(__GFP_COMP).
+ */
+ while (pfn != end_pfn) {
+ __free_one_page_maybe_split(zone, page, pfn,
+ MAX_PAGE_ORDER, fpi_flags);
+ pfn += MAX_ORDER_NR_PAGES;
+ page = pfn_to_page(pfn);
+ }
+ } else {
+ WARN_ON_ONCE(1);
+ }
spin_unlock_irqrestore(&zone->lock, flags);
__count_vm_events(PGFREE, 1 << order);
@@ -1790,7 +1833,7 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page,
del_page_from_free_list(buddy, zone, order,
get_pfnblock_migratetype(buddy, pfn));
set_pageblock_migratetype(page, migratetype);
- split_large_buddy(zone, buddy, pfn, order, FPI_NONE);
+ __free_one_page_maybe_split(zone, buddy, pfn, order, FPI_NONE);
return true;
}
@@ -1801,7 +1844,7 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page,
del_page_from_free_list(page, zone, order,
get_pfnblock_migratetype(page, pfn));
set_pageblock_migratetype(page, migratetype);
- split_large_buddy(zone, page, pfn, order, FPI_NONE);
+ __free_one_page_maybe_split(zone, page, pfn, order, FPI_NONE);
return true;
}
move:
--
2.47.1
* [PATCH v1 2/2] mm/page_isolation: fixup isolate_single_pageblock() comment regarding splitting free pages
2024-12-06 9:59 [PATCH v1 0/2] mm/page_alloc: rework conditional splitting >= pageblock_order pages when freeing David Hildenbrand
2024-12-06 9:59 ` [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate() David Hildenbrand
@ 2024-12-06 9:59 ` David Hildenbrand
2024-12-06 16:59 ` Zi Yan
2024-12-09 22:13 ` Vlastimil Babka
1 sibling, 2 replies; 14+ messages in thread
From: David Hildenbrand @ 2024-12-06 9:59 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton, Zi Yan,
Vlastimil Babka, Yu Zhao
Let's fix up the comment, documenting how __free_one_page_maybe_split()
comes into play.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/page_isolation.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c608e9d728655..63fddf283e681 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -298,11 +298,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
* pageblocks.
* [ MAX_PAGE_ORDER ]
* [ pageblock0 | pageblock1 ]
- * When either pageblock is isolated, if it is a free page, the page is not
- * split into separate migratetype lists, which is supposed to; if it is an
- * in-use page and freed later, __free_one_page() does not split the free page
- * either. The function handles this by splitting the free page or migrating
- * the in-use page then splitting the free page.
+ * When either pageblock is isolated, if it is an in-use page and freed later,
+ * __free_one_page_maybe_split() will split the free page if required. If the
+ * page is already free, this function handles this by splitting the free page
+ * through move_freepages_block_isolate()->__free_one_page_maybe_split().
*/
static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
bool isolate_before, bool skip_isolation, int migratetype)
--
2.47.1
* Re: [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-06 9:59 ` [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate() David Hildenbrand
@ 2024-12-06 16:58 ` Zi Yan
2024-12-07 6:48 ` Yu Zhao
2024-12-09 19:01 ` Vlastimil Babka
2 siblings, 0 replies; 14+ messages in thread
From: Zi Yan @ 2024-12-06 16:58 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, Andrew Morton, Vlastimil Babka, Yu Zhao
On 6 Dec 2024, at 4:59, David Hildenbrand wrote:
> Let's special-case for the common scenarios that:
>
> (a) We are freeing pages <= pageblock_order
> (b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
> (especially, no mixture of isolated and non-isolated pageblocks)
>
> When we encounter a > MAX_PAGE_ORDER page, it can only come from
> alloc_contig_range(), and we can process it in MAX_PAGE_ORDER chunks.
>
> When we encounter a > pageblock_order but <= MAX_PAGE_ORDER page,
> check whether all pageblocks match, and if so (common case), don't
> split them up just for the buddy to merge them back.
>
> This makes sure that when we free MAX_PAGE_ORDER chunks to the buddy,
> for example during system startup, memory onlining, or when isolating
> consecutive pageblocks via alloc_contig_range()/memory offlining, we
> don't unnecessarily split up what we'll immediately merge again,
> because the migratetypes match.
>
> Rename split_large_buddy() to __free_one_page_maybe_split(), to make it
> clearer what's happening, and handle in it only natural buddy orders,
> not the alloc_contig_range(__GFP_COMP) special case: handle that in
> free_one_page() only.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> mm/page_alloc.c | 71 +++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 57 insertions(+), 14 deletions(-)
>
LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
* Re: [PATCH v1 2/2] mm/page_isolation: fixup isolate_single_pageblock() comment regarding splitting free pages
2024-12-06 9:59 ` [PATCH v1 2/2] mm/page_isolation: fixup isolate_single_pageblock() comment regarding splitting free pages David Hildenbrand
@ 2024-12-06 16:59 ` Zi Yan
2024-12-09 22:13 ` Vlastimil Babka
1 sibling, 0 replies; 14+ messages in thread
From: Zi Yan @ 2024-12-06 16:59 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, Andrew Morton, Vlastimil Babka, Yu Zhao
On 6 Dec 2024, at 4:59, David Hildenbrand wrote:
> Let's fix up the comment, documenting how __free_one_page_maybe_split()
> comes into play.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> mm/page_isolation.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index c608e9d728655..63fddf283e681 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -298,11 +298,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
> * pageblocks.
> * [ MAX_PAGE_ORDER ]
> * [ pageblock0 | pageblock1 ]
> - * When either pageblock is isolated, if it is a free page, the page is not
> - * split into separate migratetype lists, which is supposed to; if it is an
> - * in-use page and freed later, __free_one_page() does not split the free page
> - * either. The function handles this by splitting the free page or migrating
> - * the in-use page then splitting the free page.
> + * When either pageblock is isolated, if it is an in-use page and freed later,
> + * __free_one_page_maybe_split() will split the free page if required. If the
> + * page is already free, this function handles this by splitting the free page
> + * through move_freepages_block_isolate()->__free_one_page_maybe_split().
> */
> static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
> bool isolate_before, bool skip_isolation, int migratetype)
LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
* Re: [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-06 9:59 ` [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate() David Hildenbrand
2024-12-06 16:58 ` Zi Yan
@ 2024-12-07 6:48 ` Yu Zhao
2024-12-09 19:01 ` Vlastimil Babka
2 siblings, 0 replies; 14+ messages in thread
From: Yu Zhao @ 2024-12-07 6:48 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, Andrew Morton, Zi Yan, Vlastimil Babka
On Fri, Dec 6, 2024 at 3:00 AM David Hildenbrand <david@redhat.com> wrote:
>
> Let's special-case for the common scenarios that:
>
> (a) We are freeing pages <= pageblock_order
> (b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
> (especially, no mixture of isolated and non-isolated pageblocks)
>
> When we encounter a > MAX_PAGE_ORDER page, it can only come from
> alloc_contig_range(), and we can process it in MAX_PAGE_ORDER chunks.
>
> When we encounter a > pageblock_order but <= MAX_PAGE_ORDER page,
> check whether all pageblocks match, and if so (common case), don't
> split them up just for the buddy to merge them back.
>
> This makes sure that when we free MAX_PAGE_ORDER chunks to the buddy,
> for example during system startup, memory onlining, or when isolating
> consecutive pageblocks via alloc_contig_range()/memory offlining, we
> don't unnecessarily split up what we'll immediately merge again,
> because the migratetypes match.
>
> Rename split_large_buddy() to __free_one_page_maybe_split(), to make it
> clearer what's happening, and handle in it only natural buddy orders,
> not the alloc_contig_range(__GFP_COMP) special case: handle that in
> free_one_page() only.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Yu Zhao <yuzhao@google.com>
* Re: [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-06 9:59 ` [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate() David Hildenbrand
2024-12-06 16:58 ` Zi Yan
2024-12-07 6:48 ` Yu Zhao
@ 2024-12-09 19:01 ` Vlastimil Babka
2024-12-09 19:23 ` Zi Yan
2024-12-10 9:39 ` David Hildenbrand
2 siblings, 2 replies; 14+ messages in thread
From: Vlastimil Babka @ 2024-12-09 19:01 UTC (permalink / raw)
To: David Hildenbrand, linux-kernel, Johannes Weiner
Cc: linux-mm, Andrew Morton, Zi Yan, Yu Zhao
On 12/6/24 10:59, David Hildenbrand wrote:
> Let's special-case for the common scenarios that:
>
> (a) We are freeing pages <= pageblock_order
> (b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
> (especially, no mixture of isolated and non-isolated pageblocks)
Well in many of those cases we could also just adjust the pageblocks... But
perhaps they indeed shouldn't differ in the first place, unless there's an
isolation attempt.
> When we encounter a > MAX_PAGE_ORDER page, it can only come from
> alloc_contig_range(), and we can process it in MAX_PAGE_ORDER chunks.
>
> When we encounter a > pageblock_order but <= MAX_PAGE_ORDER page,
> check whether all pageblocks match, and if so (common case), don't
> split them up just for the buddy to merge them back.
>
> This makes sure that when we free MAX_PAGE_ORDER chunks to the buddy,
> for example during system startup, memory onlining, or when isolating
> consecutive pageblocks via alloc_contig_range()/memory offlining, we
> don't unnecessarily split up what we'll immediately merge again,
> because the migratetypes match.
>
> Rename split_large_buddy() to __free_one_page_maybe_split(), to make it
> clearer what's happening, and handle in it only natural buddy orders,
> not the alloc_contig_range(__GFP_COMP) special case: handle that in
> free_one_page() only.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Hm but noticed something:
> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
> + unsigned long pfn, int order, fpi_t fpi_flags)
> +{
> + const unsigned long end_pfn = pfn + (1 << order);
> + int mt = get_pfnblock_migratetype(page, pfn);
> +
> + VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
> VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
> /* Caller removed page from freelist, buddy info cleared! */
> VM_WARN_ON_ONCE(PageBuddy(page));
>
> - if (order > pageblock_order)
> - order = pageblock_order;
> -
> - while (pfn != end) {
> - int mt = get_pfnblock_migratetype(page, pfn);
> + /*
> + * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
> + * pages that cover pageblocks with different migratetypes; for example
> + * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
> + * case, fall back to freeing individual pageblocks so they get put
> + * onto the right lists.
> + */
> + if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
> + likely(order <= pageblock_order) ||
> + pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
> + __free_one_page(page, pfn, zone, order, mt, fpi_flags);
> + return;
> + }
>
> - __free_one_page(page, pfn, zone, order, mt, fpi);
> - pfn += 1 << order;
> + while (pfn != end_pfn) {
> + mt = get_pfnblock_migratetype(page, pfn);
> + __free_one_page(page, pfn, zone, pageblock_order, mt, fpi_flags);
> + pfn += pageblock_nr_pages;
> page = pfn_to_page(pfn);
This predates your patch, but seems potentially dangerous to attempt
pfn_to_page(end_pfn) with SPARSEMEM and no vmemmap and the end_pfn perhaps
being just outside of the valid range? Should we change that?
But seems this code was initially introduced as part of Johannes'
migratetype hygiene series.
* Re: [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-09 19:01 ` Vlastimil Babka
@ 2024-12-09 19:23 ` Zi Yan
2024-12-09 21:35 ` David Hildenbrand
2024-12-09 21:36 ` Vlastimil Babka
2024-12-10 9:39 ` David Hildenbrand
1 sibling, 2 replies; 14+ messages in thread
From: Zi Yan @ 2024-12-09 19:23 UTC (permalink / raw)
To: Vlastimil Babka
Cc: David Hildenbrand, linux-kernel, Johannes Weiner, linux-mm,
Andrew Morton, Yu Zhao
On 9 Dec 2024, at 14:01, Vlastimil Babka wrote:
> On 12/6/24 10:59, David Hildenbrand wrote:
>> Let's special-case for the common scenarios that:
>>
>> (a) We are freeing pages <= pageblock_order
>> (b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
>> (especially, no mixture of isolated and non-isolated pageblocks)
>
> Well in many of those cases we could also just adjust the pageblocks... But
> perhaps they indeed shouldn't differ in the first place, unless there's an
> isolation attempt.
>
>> When we encounter a > MAX_PAGE_ORDER page, it can only come from
>> alloc_contig_range(), and we can process it in MAX_PAGE_ORDER chunks.
>>
>> When we encounter a > pageblock_order but <= MAX_PAGE_ORDER page,
>> check whether all pageblocks match, and if so (common case), don't
>> split them up just for the buddy to merge them back.
>>
>> This makes sure that when we free MAX_PAGE_ORDER chunks to the buddy,
>> for example during system startup, memory onlining, or when isolating
>> consecutive pageblocks via alloc_contig_range()/memory offlining, we
>> don't unnecessarily split up what we'll immediately merge again,
>> because the migratetypes match.
>>
>> Rename split_large_buddy() to __free_one_page_maybe_split(), to make it
>> clearer what's happening, and handle in it only natural buddy orders,
>> not the alloc_contig_range(__GFP_COMP) special case: handle that in
>> free_one_page() only.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>
> Hm but noticed something:
>
>> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
>> + unsigned long pfn, int order, fpi_t fpi_flags)
>> +{
>> + const unsigned long end_pfn = pfn + (1 << order);
>> + int mt = get_pfnblock_migratetype(page, pfn);
>> +
>> + VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>> VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>> /* Caller removed page from freelist, buddy info cleared! */
>> VM_WARN_ON_ONCE(PageBuddy(page));
>>
>> - if (order > pageblock_order)
>> - order = pageblock_order;
>> -
>> - while (pfn != end) {
>> - int mt = get_pfnblock_migratetype(page, pfn);
>> + /*
>> + * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
>> + * pages that cover pageblocks with different migratetypes; for example
>> + * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
>> + * case, fall back to freeing individual pageblocks so they get put
>> + * onto the right lists.
>> + */
>> + if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
>> + likely(order <= pageblock_order) ||
>> + pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
>> + __free_one_page(page, pfn, zone, order, mt, fpi_flags);
>> + return;
>> + }
>>
>> - __free_one_page(page, pfn, zone, order, mt, fpi);
>> - pfn += 1 << order;
>> + while (pfn != end_pfn) {
>> + mt = get_pfnblock_migratetype(page, pfn);
>> + __free_one_page(page, pfn, zone, pageblock_order, mt, fpi_flags);
>> + pfn += pageblock_nr_pages;
>> page = pfn_to_page(pfn);
>
> This predates your patch, but seems potentially dangerous to attempt
> pfn_to_page(end_pfn) with SPARSEMEM and no vmemmap and the end_pfn perhaps
> being just outside of the valid range? Should we change that?
>
> But seems this code was initially introduced as part of Johannes'
> migratetype hygiene series.
It starts as split_free_page() from commit b2c9e2fbba32 ("mm: make
alloc_contig_range work at pageblock granularity"), but harmless since
it is only used to split a buddy page. Then commit fd919a85cd55 ("mm:
page_isolation: prepare for hygienic freelists") refactored it, which
should be fine, since it is still used for the same purpose in page
isolation. Then commit e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
used it for gigantic hugetlb.
For SPARSEMEM && !SPARSEMEM_VMEMMAP, PFNs are contiguous, the memmap might not
be. The code above using pfn in the loop might be fine. And since order
is provided, unless the caller is providing a falsely large order, pfn
should be valid. Or am I missing anything?
Best Regards,
Yan, Zi
* Re: [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-09 19:23 ` Zi Yan
@ 2024-12-09 21:35 ` David Hildenbrand
2024-12-09 21:42 ` Zi Yan
2024-12-09 21:36 ` Vlastimil Babka
1 sibling, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2024-12-09 21:35 UTC (permalink / raw)
To: Zi Yan, Vlastimil Babka
Cc: linux-kernel, Johannes Weiner, linux-mm, Andrew Morton, Yu Zhao
On 09.12.24 20:23, Zi Yan wrote:
> On 9 Dec 2024, at 14:01, Vlastimil Babka wrote:
>
>> On 12/6/24 10:59, David Hildenbrand wrote:
>>> Let's special-case for the common scenarios that:
>>>
>>> (a) We are freeing pages <= pageblock_order
>>> (b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
>>> (especially, no mixture of isolated and non-isolated pageblocks)
>>
>> Well in many of those cases we could also just adjust the pageblocks... But
>> perhaps they indeed shouldn't differ in the first place, unless there's an
>> isolation attempt.
>>
>>> When we encounter a > MAX_PAGE_ORDER page, it can only come from
>>> alloc_contig_range(), and we can process it in MAX_PAGE_ORDER chunks.
>>>
>>> When we encounter a > pageblock_order but <= MAX_PAGE_ORDER page,
>>> check whether all pageblocks match, and if so (common case), don't
>>> split them up just for the buddy to merge them back.
>>>
>>> This makes sure that when we free MAX_PAGE_ORDER chunks to the buddy,
>>> for example during system startup, memory onlining, or when isolating
>>> consecutive pageblocks via alloc_contig_range()/memory offlining, we
>>> don't unnecessarily split up what we'll immediately merge again,
>>> because the migratetypes match.
>>>
>>> Rename split_large_buddy() to __free_one_page_maybe_split(), to make it
>>> clearer what's happening, and handle in it only natural buddy orders,
>>> not the alloc_contig_range(__GFP_COMP) special case: handle that in
>>> free_one_page() only.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>>
>> Hm but noticed something:
>>
>>> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
>>> + unsigned long pfn, int order, fpi_t fpi_flags)
>>> +{
>>> + const unsigned long end_pfn = pfn + (1 << order);
>>> + int mt = get_pfnblock_migratetype(page, pfn);
>>> +
>>> + VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>>> VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>> /* Caller removed page from freelist, buddy info cleared! */
>>> VM_WARN_ON_ONCE(PageBuddy(page));
>>>
>>> - if (order > pageblock_order)
>>> - order = pageblock_order;
>>> -
>>> - while (pfn != end) {
>>> - int mt = get_pfnblock_migratetype(page, pfn);
>>> + /*
>>> + * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
>>> + * pages that cover pageblocks with different migratetypes; for example
>>> + * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
>>> + * case, fall back to freeing individual pageblocks so they get put
>>> + * onto the right lists.
>>> + */
>>> + if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
>>> + likely(order <= pageblock_order) ||
>>> + pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
>>> + __free_one_page(page, pfn, zone, order, mt, fpi_flags);
>>> + return;
>>> + }
>>>
>>> - __free_one_page(page, pfn, zone, order, mt, fpi);
>>> - pfn += 1 << order;
>>> + while (pfn != end_pfn) {
>>> + mt = get_pfnblock_migratetype(page, pfn);
>>> + __free_one_page(page, pfn, zone, pageblock_order, mt, fpi_flags);
>>> + pfn += pageblock_nr_pages;
>>> page = pfn_to_page(pfn);
>>
>> This predates your patch, but seems potentially dangerous to attempt
>> pfn_to_page(end_pfn) with SPARSEMEM and no vmemmap and the end_pfn perhaps
>> being just outside of the valid range? Should we change that?
>>
>> But seems this code was initially introduced as part of Johannes'
>> migratetype hygiene series.
>
> It starts as split_free_page() from commit b2c9e2fbba32 ("mm: make
> alloc_contig_range work at pageblock granularity"), but harmless since
> it is only used to split a buddy page. Then commit fd919a85cd55 ("mm:
> page_isolation: prepare for hygienic freelists") refactored it, which
> should be fine, since it is still used for the same purpose in page
> isolation. Then commit e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
> used it for gigantic hugetlb.
>
> For SPARSEMEM && !SPARSEMEM_VMEMMAP, PFNs are contiguous, the memmap might not
> be. The code above using pfn in the loop might be fine. And since order
> is provided, unless the caller is providing a falsely large order, pfn
> should be valid. Or am I missing anything?
I think the question is, what happens when we call pfn_to_page() on a
PFN that falls into a memory section that is either offline, doesn't
have a memmap, or does not exist.
With CONFIG_SPARSEMEM, we do a
struct mem_section *__sec = __pfn_to_section(__pfn);
__section_mem_map_addr(__sec) + __pfn;
__pfn_to_section() can return NULL, in which case
__section_mem_map_addr() would dereference NULL.
I assume it could happen in corner cases, if we'd exceed
NR_SECTION_ROOTS. (IOW, large memory, and we free a page that is at the
very end of physical memory).
Likely, we should do the pfn_to_page() before the __free_one_page() call.
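For example (untested), doing the translation at the top of each
iteration, so that pfn_to_page(end_pfn) is never evaluated:
	while (pfn != end_pfn) {
		/* Translate first; pfn is always < end_pfn here. */
		page = pfn_to_page(pfn);
		mt = get_pfnblock_migratetype(page, pfn);
		__free_one_page(page, pfn, zone, pageblock_order, mt, fpi_flags);
		pfn += pageblock_nr_pages;
	}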
--
Cheers,
David / dhildenb
* Re: [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-09 19:23 ` Zi Yan
2024-12-09 21:35 ` David Hildenbrand
@ 2024-12-09 21:36 ` Vlastimil Babka
1 sibling, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2024-12-09 21:36 UTC (permalink / raw)
To: Zi Yan
Cc: David Hildenbrand, linux-kernel, Johannes Weiner, linux-mm,
Andrew Morton, Yu Zhao
On 12/9/24 20:23, Zi Yan wrote:
> On 9 Dec 2024, at 14:01, Vlastimil Babka wrote:
>>> + /*
>>> + * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
>>> + * pages that cover pageblocks with different migratetypes; for example
>>> + * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
>>> + * case, fall back to freeing individual pageblocks so they get put
>>> + * onto the right lists.
>>> + */
>>> + if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
>>> + likely(order <= pageblock_order) ||
>>> + pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
>>> + __free_one_page(page, pfn, zone, order, mt, fpi_flags);
>>> + return;
>>> + }
>>>
>>> - __free_one_page(page, pfn, zone, order, mt, fpi);
>>> - pfn += 1 << order;
>>> + while (pfn != end_pfn) {
>>> + mt = get_pfnblock_migratetype(page, pfn);
>>> + __free_one_page(page, pfn, zone, pageblock_order, mt, fpi_flags);
>>> + pfn += pageblock_nr_pages;
>>> page = pfn_to_page(pfn);
>>
>> This predates your patch, but seems potentially dangerous to attempt
>> pfn_to_page(end_pfn) with SPARSEMEM and no vmemmap and the end_pfn perhaps
>> being just outside of the valid range? Should we change that?
>>
>> But seems this code was initially introduced as part of Johannes'
>> migratetype hygiene series.
>
> It starts as split_free_page() from commit b2c9e2fbba32 ("mm: make
> alloc_contig_range work at pageblock granularity"), but harmless since
> it is only used to split a buddy page. Then commit fd919a85cd55 ("mm:
> page_isolation: prepare for hygienic freelists") refactored it, which
> should be fine, since it is still used for the same purpose in page
> isolation. Then commit e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
> used it for gigantic hugetlb.
>
> For SPARSEMEM && !SPARSEMEM_VMEMMAP, PFNs are contiguous, the memmap might not
> be. The code above using pfn in the loop might be fine. And since order
> is provided, unless the caller is providing a falsely large order, pfn
> should be valid. Or am I missing anything?
I mean if we are in the last iteration and about to exit the loop because
pfn == end_pfn, and it's the very last MAX_ORDER block of a zone and
section, end_pfn is already outside of it, and pfn_to_page() might get a NULL
result from __pfn_to_section(), so __section_mem_map_addr() would oops, no?
> Best Regards,
> Yan, Zi
* Re: [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-09 21:35 ` David Hildenbrand
@ 2024-12-09 21:42 ` Zi Yan
2024-12-09 22:10 ` David Hildenbrand
0 siblings, 1 reply; 14+ messages in thread
From: Zi Yan @ 2024-12-09 21:42 UTC (permalink / raw)
To: David Hildenbrand, Vlastimil Babka
Cc: linux-kernel, Johannes Weiner, linux-mm, Andrew Morton, Yu Zhao
On 9 Dec 2024, at 16:35, David Hildenbrand wrote:
> On 09.12.24 20:23, Zi Yan wrote:
>> On 9 Dec 2024, at 14:01, Vlastimil Babka wrote:
>>
>>> On 12/6/24 10:59, David Hildenbrand wrote:
>>>> Let's special-case for the common scenarios that:
>>>>
>>>> (a) We are freeing pages <= pageblock_order
>>>> (b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
>>>> (especially, no mixture of isolated and non-isolated pageblocks)
>>>
>>> Well in many of those cases we could also just adjust the pageblocks... But
>>> perhaps they indeed shouldn't differ in the first place, unless there's an
>>> isolation attempt.
>>>
>>>> When we encounter a > MAX_PAGE_ORDER page, it can only come from
>>>> alloc_contig_range(), and we can process it in MAX_PAGE_ORDER chunks.
>>>>
>>>> When we encounter a > pageblock_order but <= MAX_PAGE_ORDER page,
>>>> check whether all pageblocks match, and if so (common case), don't
>>>> split them up just for the buddy to merge them back.
>>>>
>>>> This makes sure that when we free MAX_PAGE_ORDER chunks to the buddy,
>>>> for example during system startup, memory onlining, or when isolating
>>>> consecutive pageblocks via alloc_contig_range()/memory offlining, we
>>>> don't unnecessarily split up what we'll immediately merge again,
>>>> because the migratetypes match.
>>>>
>>>> Rename split_large_buddy() to __free_one_page_maybe_split(), to make it
>>>> clearer what's happening, and handle in it only natural buddy orders,
>>>> not the alloc_contig_range(__GFP_COMP) special case: handle that in
>>>> free_one_page() only.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>
>>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>>>
>>> Hm but noticed something:
>>>
>>>> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
>>>> + unsigned long pfn, int order, fpi_t fpi_flags)
>>>> +{
>>>> + const unsigned long end_pfn = pfn + (1 << order);
>>>> + int mt = get_pfnblock_migratetype(page, pfn);
>>>> +
>>>> + VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>>>> VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>>> /* Caller removed page from freelist, buddy info cleared! */
>>>> VM_WARN_ON_ONCE(PageBuddy(page));
>>>>
>>>> - if (order > pageblock_order)
>>>> - order = pageblock_order;
>>>> -
>>>> - while (pfn != end) {
>>>> - int mt = get_pfnblock_migratetype(page, pfn);
>>>> + /*
>>>> + * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
>>>> + * pages that cover pageblocks with different migratetypes; for example
>>>> + * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
>>>> + * case, fall back to freeing individual pageblocks so they get put
>>>> + * onto the right lists.
>>>> + */
>>>> + if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
>>>> + likely(order <= pageblock_order) ||
>>>> + pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
>>>> + __free_one_page(page, pfn, zone, order, mt, fpi_flags);
>>>> + return;
>>>> + }
>>>>
>>>> - __free_one_page(page, pfn, zone, order, mt, fpi);
>>>> - pfn += 1 << order;
>>>> + while (pfn != end_pfn) {
>>>> + mt = get_pfnblock_migratetype(page, pfn);
>>>> + __free_one_page(page, pfn, zone, pageblock_order, mt, fpi_flags);
>>>> + pfn += pageblock_nr_pages;
>>>> page = pfn_to_page(pfn);
>>>
>>> This predates your patch, but seems potentially dangerous to attempt
>>> pfn_to_page(end_pfn) with SPARSEMEM and no vmemmap and the end_pfn perhaps
>>> being just outside of the valid range? Should we change that?
>>>
>>> But seems this code was initially introduced as part of Johannes'
>>> migratetype hygiene series.
>>
>> It starts as split_free_page() from commit b2c9e2fbba32 ("mm: make
>> alloc_contig_range work at pageblock granularity"), but harmless since
>> it is only used to split a buddy page. Then commit fd919a85cd55 ("mm:
>> page_isolation: prepare for hygienic freelists") refactored it, which
>> should be fine, since it is still used for the same purpose in page
>> isolation. Then commit e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
>> used it for gigantic hugetlb.
>>
>> For SPARSEMEM && !SPARSEMEM_VMEMMAP, PFNs are contiguous, the memmap might not
>> be. The code above using pfn in the loop might be fine. And since order
>> is provided, unless the caller is providing a falsely large order, pfn
>> should be valid. Or am I missing anything?
>
> I think the question is, what happens when we call pfn_to_page() on a PFN that falls into a memory section that is either offline, doesn't have a memmap, or does not exist.
>
> With CONFIG_SPARSEMEM, we do a
>
> struct mem_section *__sec = __pfn_to_section(__pfn);
> __section_mem_map_addr(__sec) + __pfn;
>
> __pfn_to_section() can return NULL, in which case __section_mem_map_addr() would dereference NULL.
>
> I assume it could happen in corner cases, if we'd exceed NR_SECTION_ROOTS. (IOW, large memory, and we free a page that is at the very end of physical memory).
>
> Likely, we should do the pfn_to_page() before the __free_one_page() call.
Got it. Both you and Vlastimil pointed out the same corner case.
I agree that doing pfn_to_page() before the __free_one_page() call would
address the concern.
Thank you both.
Best Regards,
Yan, Zi
* Re: [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-09 21:42 ` Zi Yan
@ 2024-12-09 22:10 ` David Hildenbrand
0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2024-12-09 22:10 UTC (permalink / raw)
To: Zi Yan, Vlastimil Babka
Cc: linux-kernel, Johannes Weiner, linux-mm, Andrew Morton, Yu Zhao
On 09.12.24 22:42, Zi Yan wrote:
> On 9 Dec 2024, at 16:35, David Hildenbrand wrote:
>
>> On 09.12.24 20:23, Zi Yan wrote:
>>> On 9 Dec 2024, at 14:01, Vlastimil Babka wrote:
>>>
>>>> On 12/6/24 10:59, David Hildenbrand wrote:
>>>>> Let's special-case for the common scenarios that:
>>>>>
>>>>> (a) We are freeing pages <= pageblock_order
>>>>> (b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
>>>>> (especially, no mixture of isolated and non-isolated pageblocks)
>>>>
>>>> Well in many of those cases we could also just adjust the pageblocks... But
>>>> perhaps they indeed shouldn't differ in the first place, unless there's an
>>>> isolation attempt.
>>>>
>>>>> When we encounter a > MAX_PAGE_ORDER page, it can only come from
>>>>> alloc_contig_range(), and we can process it in MAX_PAGE_ORDER chunks.
>>>>>
>>>>> When we encounter a > pageblock_order but <= MAX_PAGE_ORDER page,
>>>>> check whether all pageblocks match, and if so (common case), don't
>>>>> split them up just for the buddy to merge them back.
>>>>>
>>>>> This makes sure that when we free MAX_PAGE_ORDER chunks to the buddy,
>>>>> for example during system startup, memory onlining, or when isolating
>>>>> consecutive pageblocks via alloc_contig_range()/memory offlining, we
>>>>> don't unnecessarily split up what we'll immediately merge again,
>>>>> because the migratetypes match.
>>>>>
>>>>> Rename split_large_buddy() to __free_one_page_maybe_split(), to make it
>>>>> clearer what's happening, and handle in it only natural buddy orders,
>>>>> not the alloc_contig_range(__GFP_COMP) special case: handle that in
>>>>> free_one_page() only.
>>>>>
>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>
>>>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>>>>
>>>> Hm but noticed something:
>>>>
>>>>> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
>>>>> + unsigned long pfn, int order, fpi_t fpi_flags)
>>>>> +{
>>>>> + const unsigned long end_pfn = pfn + (1 << order);
>>>>> + int mt = get_pfnblock_migratetype(page, pfn);
>>>>> +
>>>>> + VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>>>>> VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>>>> /* Caller removed page from freelist, buddy info cleared! */
>>>>> VM_WARN_ON_ONCE(PageBuddy(page));
>>>>>
>>>>> - if (order > pageblock_order)
>>>>> - order = pageblock_order;
>>>>> -
>>>>> - while (pfn != end) {
>>>>> - int mt = get_pfnblock_migratetype(page, pfn);
>>>>> + /*
>>>>> + * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
>>>>> + * pages that cover pageblocks with different migratetypes; for example
>>>>> + * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
>>>>> + * case, fall back to freeing individual pageblocks so they get put
>>>>> + * onto the right lists.
>>>>> + */
>>>>> + if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
>>>>> + likely(order <= pageblock_order) ||
>>>>> + pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
>>>>> + __free_one_page(page, pfn, zone, order, mt, fpi_flags);
>>>>> + return;
>>>>> + }
>>>>>
>>>>> - __free_one_page(page, pfn, zone, order, mt, fpi);
>>>>> - pfn += 1 << order;
>>>>> + while (pfn != end_pfn) {
>>>>> + mt = get_pfnblock_migratetype(page, pfn);
>>>>> + __free_one_page(page, pfn, zone, pageblock_order, mt, fpi_flags);
>>>>> + pfn += pageblock_nr_pages;
>>>>> page = pfn_to_page(pfn);
>>>>
>>>> This predates your patch, but seems potentially dangerous to attempt
>>>> pfn_to_page(end_pfn) with SPARSEMEM and no vmemmap and the end_pfn perhaps
>>>> being just outside of the valid range? Should we change that?
>>>>
>>>> But seems this code was initially introduced as part of Johannes'
>>>> migratetype hygiene series.
>>>
>>> It starts as split_free_page() from commit b2c9e2fbba32 ("mm: make
>>> alloc_contig_range work at pageblock granularity"), but harmless since
>>> it is only used to split a buddy page. Then commit fd919a85cd55 ("mm:
>>> page_isolation: prepare for hygienic freelists") refactored it, which
>>> should be fine, since it is still used for the same purpose in page
>>> isolation. Then commit e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
>>> used it for gigantic hugetlb.
>>>
>>> For SPARSEMEM && !SPARSEMEM_VMEMMAP, PFNs are contiguous, the memmap might not
>>> be. The code above using pfn in the loop might be fine. And since order
>>> is provided, unless the caller is providing a falsely large order, pfn
>>> should be valid. Or am I missing anything?
>>
>> I think the question is, what happens when we call pfn_to_page() on a PFN that falls into a memory section that is either offline, doesn't have a memmap, or does not exist.
>>
>> With CONFIG_SPARSEMEM, we do a
>>
>> struct mem_section *__sec = __pfn_to_section(__pfn);
>> __section_mem_map_addr(__sec) + __pfn;
>>
>> __pfn_to_section() can return NULL, in which case __section_mem_map_addr() would dereference NULL.
>>
>> I assume it could happen in corner cases, if we'd exceed NR_SECTION_ROOTS. (IOW, large memory, and we free a page that is at the very end of physical memory).
>>
>> Likely, we should do the pfn_to_page() before the __free_one_page() call.
>
> Got it. Both you and Vlastimil pointed out the same corner case.
> I agree that doing pfn_to_page() before the __free_one_page() call would
> address the concern.
Thank you both for the review. I'll resend a v2 tomorrow, including a
patch to fix that up first.
--
Cheers,
David / dhildenb
* Re: [PATCH v1 2/2] mm/page_isolation: fixup isolate_single_pageblock() comment regarding splitting free pages
2024-12-06 9:59 ` [PATCH v1 2/2] mm/page_isolation: fixup isolate_single_pageblock() comment regarding splitting free pages David Hildenbrand
2024-12-06 16:59 ` Zi Yan
@ 2024-12-09 22:13 ` Vlastimil Babka
1 sibling, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2024-12-09 22:13 UTC (permalink / raw)
To: David Hildenbrand, linux-kernel; +Cc: linux-mm, Andrew Morton, Zi Yan, Yu Zhao
On 12/6/24 10:59, David Hildenbrand wrote:
> Let's fix up the comment, documenting how __free_one_page_maybe_split()
> comes into play.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/page_isolation.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index c608e9d728655..63fddf283e681 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -298,11 +298,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
> * pageblocks.
> * [ MAX_PAGE_ORDER ]
> * [ pageblock0 | pageblock1 ]
> - * When either pageblock is isolated, if it is a free page, the page is not
> - * split into separate migratetype lists, which is supposed to; if it is an
> - * in-use page and freed later, __free_one_page() does not split the free page
> - * either. The function handles this by splitting the free page or migrating
> - * the in-use page then splitting the free page.
> + * When either pageblock is isolated, if it is an in-use page and freed later,
> + * __free_one_page_maybe_split() will split the free page if required. If the
> + * page is already free, this function handles this by splitting the free page
> + * through move_freepages_block_isolate()->__free_one_page_maybe_split().
> */
> static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
> bool isolate_before, bool skip_isolation, int migratetype)
* Re: [PATCH v1 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-09 19:01 ` Vlastimil Babka
2024-12-09 19:23 ` Zi Yan
@ 2024-12-10 9:39 ` David Hildenbrand
1 sibling, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2024-12-10 9:39 UTC (permalink / raw)
To: Vlastimil Babka, linux-kernel, Johannes Weiner
Cc: linux-mm, Andrew Morton, Zi Yan, Yu Zhao
On 09.12.24 20:01, Vlastimil Babka wrote:
> On 12/6/24 10:59, David Hildenbrand wrote:
>> Let's special-case for the common scenarios that:
>>
>> (a) We are freeing pages <= pageblock_order
>> (b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
>> (especially, no mixture of isolated and non-isolated pageblocks)
>
> Well in many of those cases we could also just adjust the pageblocks... But
> perhaps they indeed shouldn't differ in the first place, unless there's an
> isolation attempt.
Thanks for the review and finding that one flaw.
Yes, I agree: usually we expect this to only happen with isolated
pageblocks. At least in the scenarios I have in mind (boot, hotplug,
alloc_contig_range()), we should only ever have a mixture of isolated
and !isolated pageblocks.
Maybe one could even special case on zone->nr_isolate_pageblock: if 0,
just assume all pageblocks are equal. I'll look into some details
and might send a follow-up patch.
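A completely untested sketch of that idea, assuming zone->nr_isolate_pageblock
(which only exists with CONFIG_MEMORY_ISOLATION) is stable under the zone
lock that all callers hold:
	bool maybe_isolated = false;
#ifdef CONFIG_MEMORY_ISOLATION
	/* Without isolated pageblocks, the migratetypes cannot differ. */
	maybe_isolated = !!zone->nr_isolate_pageblock;
#endif
	if (likely(order <= pageblock_order) || !maybe_isolated ||
	    pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
		__free_one_page(page, pfn, zone, order, mt, fpi_flags);
		return;
	}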
--
Cheers,
David / dhildenb