linux-mm.kvack.org archive mirror
From: Wei Yang <richard.weiyang@gmail.com>
To: Yuan Liu <yuan1.liu@intel.com>
Cc: David Hildenbrand <david@kernel.org>,
	Oscar Salvador <osalvador@suse.de>,
	Mike Rapoport <rppt@kernel.org>,
	Wei Yang <richard.weiyang@gmail.com>,
	linux-mm@kvack.org, Yong Hu <yong.hu@intel.com>,
	Nanhai Zou <nanhai.zou@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Yu C Chen <yu.c.chen@intel.com>, Pan Deng <pan.deng@intel.com>,
	Tianyou Li <tianyou.li@intel.com>,
	Chen Zhang <zhangchen.kidd@jd.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
Date: Mon, 13 Apr 2026 13:06:33 +0000
Message-ID: <20260413130633.knzkliyqvjhuz2kd@master>
In-Reply-To: <20260408031615.1831922-1-yuan1.liu@intel.com>

On Tue, Apr 07, 2026 at 11:16:15PM -0400, Yuan Liu wrote:
[...]
> 
>-void set_zone_contiguous(struct zone *zone)
>-{
>-	unsigned long block_start_pfn = zone->zone_start_pfn;
>-	unsigned long block_end_pfn;
>-
>-	block_end_pfn = pageblock_end_pfn(block_start_pfn);
>-	for (; block_start_pfn < zone_end_pfn(zone);
>-			block_start_pfn = block_end_pfn,
>-			 block_end_pfn += pageblock_nr_pages) {
>-
>-		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
>-
>-		if (!__pageblock_pfn_to_page(block_start_pfn,
>-					     block_end_pfn, zone))
>-			return;
>-		cond_resched();
>-	}
>-
>-	/* We confirm that there is no hole */
>-	zone->contiguous = true;
>-}
>-

Hi,

I may be seeing a behavioral change after this patch.

  * A zone that was originally non-contiguous is detected as contiguous after this patch.

My test setup:

  I tested in a qemu guest with 6G of memory and memblock_debug enabled,
  and adjusted /proc/zoneinfo to display the zone->contiguous field.

  Originally, memblock_dump shows:

     MEMBLOCK configuration:
      memory size = 0x000000017ff7dc00 reserved size = 0x0000000005a9d9c2
      memory.cnt  = 0x3
      memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
      memory[0x1]     [0x0000000000100000-0x00000000bffdefff], 0x00000000bfedf000 bytes on node 0 flags: 0x0
  +-  memory[0x2]     [0x0000000100000000-0x00000001bfffffff], 0x00000000c0000000 bytes on node 1 flags: 0x0
    
  And zone range shows:

     Zone ranges:
       DMA      [mem 0x0000000000001000-0x0000000000ffffff]
       DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
       Normal   [mem 0x0000000100000000-0x00000001bfffffff]     <--- entire last memblock region

  The last memblock region fits exactly in Node 1 Zone Normal.

  Then I punched a 2M (subsection-sized) hole in this region with the
  following change, to mimic a hole in the memory range:

    @@ -1372,5 +1372,8 @@ __init void e820__memblock_setup(void)
            /* Throw away partial pages: */
            memblock_trim_memory(PAGE_SIZE);
     
    +       memblock_remove(0x140000000, 0x200000);
    +
            memblock_dump_all();
     }
    
  Then the memblock dump shows:

     MEMBLOCK configuration:
      memory size = 0x000000017fd7dc00 reserved size = 0x0000000005a979c2
      memory.cnt  = 0x4
      memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
      memory[0x1]     [0x0000000000100000-0x00000000bffdefff], 0x00000000bfedf000 bytes on node 0 flags: 0x0
  +-  memory[0x2]     [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 1 flags: 0x0
  +-  memory[0x3]     [0x0000000140200000-0x00000001bfffffff], 0x000000007fe00000 bytes on node 1 flags: 0x0

  We can see that the original single memblock region is split into two, with
  a 2M hole in the middle.

  I am not sure this is a reasonable way to mimic a memory hole. I also tried
  punching a larger hole, e.g. 10M, and still saw the behavioral change.
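
  To see which subsection the punched hole lands in, the address arithmetic
  can be sketched as below. This assumes the x86-64 values of a 128M section
  (SECTION_SIZE_BITS = 27) and a 2M subsection (SUBSECTION_SHIFT = 21). The
  helper names are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 values; other architectures may differ. */
#define SECTION_SIZE_BITS	27	/* 128M section */
#define SUBSECTION_SHIFT	21	/* 2M subsection */
#define SUBSECTIONS_PER_SECTION	(1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))

/* Hypothetical helper: section number containing a physical address. */
static unsigned long addr_to_section_nr(uint64_t addr)
{
	return (unsigned long)(addr >> SECTION_SIZE_BITS);
}

/* Hypothetical helper: index of the subsection within its section. */
static unsigned long addr_to_subsection_idx(uint64_t addr)
{
	return (unsigned long)((addr >> SUBSECTION_SHIFT) &
			       (SUBSECTIONS_PER_SECTION - 1));
}

/* Hypothetical helper: subsections touched by [addr, addr + size). */
static unsigned long nr_subsections_touched(uint64_t addr, uint64_t size)
{
	uint64_t first, last;

	assert(size > 0);
	first = addr >> SUBSECTION_SHIFT;
	last = (addr + size - 1) >> SUBSECTION_SHIFT;
	return (unsigned long)(last - first + 1);
}
```

  Under these assumptions the 2M hole at 0x140000000 is subsection 0 of
  section 40 and covers exactly one subsection, while a 10M hole at the same
  address covers five.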

The /proc/zoneinfo result:

    w/o patch

    Node 1, zone   Normal
      pages free     469271
            boost    0
            min      8567
            low      10708
            high     12849
            promo    14990
            spanned  786432
            present  785920
            contigu  0           <--- zone is non-contiguous
            managed  766024
            cma      0
    
    with patch

    Node 1, zone   Normal
      pages free     121098
            boost    0
            min      8665
            low      10831
            high     12997
            promo    15163
            spanned  786432
            present  785920
            contigu  1           <--- zone is contiguous
            managed  773041
            cma      0

  This shows that we treated Node 1 Zone Normal as non-contiguous before, but
  treat it as a contiguous zone after this patch.
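
  As a cross-check of the spanned and present numbers above, the page counts
  can be recomputed from the two node 1 memblock regions. This is a userspace
  sketch assuming 4K pages. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumed 4K pages */

/* Physical range with an inclusive end, as printed by memblock_dump. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Pages covered by the ranges themselves (holes excluded). */
static unsigned long present_pages(const struct mem_range *r, size_t n)
{
	unsigned long pages = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(r[i].end >= r[i].start);
		pages += (unsigned long)((r[i].end - r[i].start + 1) >> PAGE_SHIFT);
	}
	return pages;
}

/* Pages between the lowest start and the highest end (holes included). */
static unsigned long spanned_pages(const struct mem_range *r, size_t n)
{
	uint64_t lo, hi;
	size_t i;

	assert(n > 0);
	lo = r[0].start;
	hi = r[0].end;
	for (i = 1; i < n; i++) {
		if (r[i].start < lo)
			lo = r[i].start;
		if (r[i].end > hi)
			hi = r[i].end;
	}
	return (unsigned long)((hi - lo + 1) >> PAGE_SHIFT);
}
```

  With the ranges [0x100000000-0x13fffffff] and [0x140200000-0x1bfffffff]
  this gives spanned = 786432 and present = 785920. The difference of 512
  pages is the 2M hole.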

Reason:

  set_zone_contiguous()
      __pageblock_pfn_to_page()
          pfn_to_online_page()
              pfn_section_valid() <--- check subsection

  When SPARSEMEM_VMEMMAP is set, pfn_section_valid() checks the subsection bit
  to decide whether a pfn is valid. For a hole, the corresponding bit is not
  set, so the zone is non-contiguous before the patch.

  After this patch, the memory map in this hole also contributes to
  pages_with_online_memmap, so the zone is treated as contiguous.
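
  The difference between the two checks can be illustrated with a small
  userspace model. This is only a sketch of my understanding, not the kernel
  code: it assumes the old check fails as soon as one subsection bit is
  clear, and that the patched check compares a count of pages that have a
  memory map against the spanned page count. All names below are
  hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_SUBSECTION 512UL	/* 2M / 4K, assumed x86-64 values */

/* Simplified model of a zone, one entry per subsection. */
struct model_zone {
	size_t nr_subsections;
	const bool *subsection_valid;	/* models the subsection bit */
	const bool *has_memmap;		/* models an initialized memory map */
};

/* Model of the old behavior: any invalid subsection breaks contiguity. */
static bool contiguous_by_walk(const struct model_zone *z)
{
	size_t i;

	for (i = 0; i < z->nr_subsections; i++)
		if (!z->subsection_valid[i])
			return false;
	return true;
}

/* Model of the new behavior: compare pages with memmap against spanned. */
static bool contiguous_by_count(const struct model_zone *z)
{
	unsigned long spanned = z->nr_subsections * PAGES_PER_SUBSECTION;
	unsigned long with_memmap = 0;
	size_t i;

	assert(z->nr_subsections > 0);
	for (i = 0; i < z->nr_subsections; i++)
		if (z->has_memmap[i])
			with_memmap += PAGES_PER_SUBSECTION;
	return with_memmap == spanned;
}
```

  For a model zone of 1536 subsections (3G) where only subsection 512 has its
  valid bit cleared but every subsection has a memory map, the walk reports
  non-contiguous and the count reports contiguous, matching the two
  /proc/zoneinfo results.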

Some questions:

  I suspect that with !SPARSEMEM_VMEMMAP we always treat Zone Normal as
  contiguous, because we don't set subsection bits. So the behavior looks
  different from SPARSEMEM_VMEMMAP. But I didn't manage to build a kernel with
  !SPARSEMEM_VMEMMAP to verify this.

  I saw the discussion on defining zone->contiguous as meaning it is safe to
  use pfn_to_page() for the whole zone. For this purpose, the current change
  looks good to me, since we do allocate and init the memory map for holes.

  But pageblock_pfn_to_page() is used for compaction and other users. A pfn
  with a memory map but no actual memory does not seem guaranteed to be a
  usable page. So is the correct usage that, after pageblock_pfn_to_page()
  returns a page, we should validate each page in the range before using it?
  I am a little lost here.


> /*
>  * Check if a PFN range intersects multiple zones on one or more
>  * NUMA nodes. Specify the @nid argument if it is known that this
>-- 
>2.47.3

-- 
Wei Yang
Help you, Help me


