* [PATCH v7] arm64: mm: Populate vmemmap at the page level if not section aligned
@ 2025-02-17 9:29 Zhenhua Huang
2025-02-17 9:44 ` David Hildenbrand
0 siblings, 1 reply; 7+ messages in thread
From: Zhenhua Huang @ 2025-02-17 9:29 UTC (permalink / raw)
To: anshuman.khandual, catalin.marinas, david
Cc: will, ardb, ryan.roberts, mark.rutland, joey.gouly, dave.hansen,
akpm, chenfeiyang, chenhuacai, linux-mm, linux-arm-kernel,
linux-kernel, quic_tingweiz, Zhenhua Huang
On the arm64 platform with the 4K base page config, SECTION_SIZE_BITS is set
to 27, making one section 128M. The struct page area that vmemmap maps
for such a section is then 2M.
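(For reference, a back-of-the-envelope check of that 2M figure, assuming the
common 64-byte struct page:

	128M section / 4K base page       = 32768 pages per section
	32768 pages  * 64 bytes per page  = 2M of struct page per section

i.e. exactly one PMD-sized (2M) block of vmemmap per section with 4K pages.)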
Commit c1cc1552616d ("arm64: MMU initialisation") optimized the
vmemmap to populate at the PMD section level, which was suitable
initially since the hotplug granule was always one section (128M). However,
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
introduced a 2M (SUBSECTION_SIZE) hotplug granule, which broke the
existing arm64 assumptions.
The first problem is that if start or end is not aligned to a section
boundary, such as when a subsection is hot-added, populating the entire
section is wasteful.
The next problem is that if we hotplug something that spans part of a 128 MiB
section (a set of subsections, call it memblock1), then hotplug something
that spans another part of the same 128 MiB section (call it memblock2),
and subsequently unplug memblock1, vmemmap_free() will clear the entire
PMD entry, which also backs memblock2, even though memblock2 is still
active.
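A concrete sequence that hits the second problem (the offsets are made up,
for illustration only):

	hot-add memblock1: a 2M subsection at offset 0M of a 128M section
	    -> vmemmap_populate() maps the section's vmemmap with one PMD
	hot-add memblock2: a 2M subsection at offset 64M of the same section
	    -> its vmemmap falls into the already-populated PMD, which is reused
	hot-remove memblock1:
	    -> vmemmap_free() tears down the whole PMD, leaving memblock2's
	       struct pages unmapped while memblock2 is still online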
Assuming hotplug/unplug sizes are guaranteed to be symmetric, fix this
the same way x86-64 does: populate at the page level if start/end is not
aligned to a section boundary.
Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
---
arch/arm64/mm/mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..eec1666da368 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1178,7 +1178,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+ (end - start < PAGES_PER_SECTION * sizeof(struct page)))
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
--
2.25.1
* Re: [PATCH v7] arm64: mm: Populate vmemmap at the page level if not section aligned
2025-02-17 9:29 [PATCH v7] arm64: mm: Populate vmemmap at the page level if not section aligned Zhenhua Huang
@ 2025-02-17 9:44 ` David Hildenbrand
2025-02-17 10:34 ` Zhenhua Huang
0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2025-02-17 9:44 UTC (permalink / raw)
To: Zhenhua Huang, anshuman.khandual, catalin.marinas
Cc: will, ardb, ryan.roberts, mark.rutland, joey.gouly, dave.hansen,
akpm, chenfeiyang, chenhuacai, linux-mm, linux-arm-kernel,
linux-kernel, quic_tingweiz
On 17.02.25 10:29, Zhenhua Huang wrote:
> On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
> to 27, making one section 128M. The related page struct which vmemmap
> points to is 2M then.
> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
> vmemmap to populate at the PMD section level which was suitable
> initially since hot plug granule is always one section(128M). However,
> commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
> existing arm64 assumptions.
>
> The first problem is that if start or end is not aligned to a section
> boundary, such as when a subsection is hot added, populating the entire
> section is wasteful.
>
> The Next problem is if we hotplug something that spans part of 128 MiB
> section (subsections, let's call it memblock1), and then hotplug something
> that spans another part of a 128 MiB section(subsections, let's call it
> memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
> the entire PMD entry which also supports memblock2 even though memblock2
> is still active.
>
> Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
> fix similar to x86-64: populate to pages levels if start/end is not aligned
> with section boundary.
>
> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
> ---
> arch/arm64/mm/mmu.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index b4df5bc5b1b8..eec1666da368 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1178,7 +1178,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> {
> WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>
> - if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
> + if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
> + (end - start < PAGES_PER_SECTION * sizeof(struct page)))
> return vmemmap_populate_basepages(start, end, node, altmap);
> else
> return vmemmap_populate_hugepages(start, end, node, altmap);
Yes, this does mimic what x86 does. Still, that handling looks odd, because it
doesn't care about any address alignments, only about the size.
I wonder if we could do better and move this handling
into vmemmap_populate_hugepages(), where we already have a fallback
to vmemmap_populate_basepages().
Something like:
One thing that confuses me is the "altmap" handling in x86-64 code: in particular
why it is ignored in some cases. So that might need a bit of thought / double-checking.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 01ea7c6df3036..57542313c0000 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1546,10 +1546,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
VM_BUG_ON(!PAGE_ALIGNED(start));
VM_BUG_ON(!PAGE_ALIGNED(end));
- if (end - start < PAGES_PER_SECTION * sizeof(struct page))
- err = vmemmap_populate_basepages(start, end, node, NULL);
- else if (boot_cpu_has(X86_FEATURE_PSE))
+ if (boot_cpu_has(X86_FEATURE_PSE))
err = vmemmap_populate_hugepages(start, end, node, altmap);
+ else
+ err = vmemmap_populate_basepages(start, end, node, NULL);
else if (altmap) {
pr_err_once("%s: no cpu support for altmap allocations\n",
__func__);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 3287ebadd167d..8b217265b25b1 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -300,6 +300,10 @@ int __weak __meminit vmemmap_check_pmd(pmd_t *pmd, int node,
return 0;
}
+/*
+ * Try to populate PMDs, but fallback to populating base pages when ranges
+ * would only partially cover a PMD.
+ */
int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
int node, struct vmem_altmap *altmap)
{
@@ -313,6 +317,9 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
for (addr = start; addr < end; addr = next) {
next = pmd_addr_end(addr, end);
+ if (!IS_ALIGNED(addr, PMD_SIZE) || !IS_ALIGNED(next, PMD_SIZE))
+ goto fallback;
+
pgd = vmemmap_pgd_populate(addr, node);
if (!pgd)
return -ENOMEM;
@@ -346,6 +353,7 @@ int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned long end,
}
} else if (vmemmap_check_pmd(pmd, node, addr, next))
continue;
+fallback:
if (vmemmap_populate_basepages(addr, next, node, altmap))
return -ENOMEM;
}
--
Cheers,
David / dhildenb
* Re: [PATCH v7] arm64: mm: Populate vmemmap at the page level if not section aligned
2025-02-17 9:44 ` David Hildenbrand
@ 2025-02-17 10:34 ` Zhenhua Huang
2025-02-17 14:30 ` David Hildenbrand
0 siblings, 1 reply; 7+ messages in thread
From: Zhenhua Huang @ 2025-02-17 10:34 UTC (permalink / raw)
To: David Hildenbrand, anshuman.khandual, catalin.marinas
Cc: will, ardb, ryan.roberts, mark.rutland, joey.gouly, dave.hansen,
akpm, chenfeiyang, chenhuacai, linux-mm, linux-arm-kernel,
linux-kernel, quic_tingweiz
On 2025/2/17 17:44, David Hildenbrand wrote:
> On 17.02.25 10:29, Zhenhua Huang wrote:
>> On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
>> to 27, making one section 128M. The related page struct which vmemmap
>> points to is 2M then.
>> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
>> vmemmap to populate at the PMD section level which was suitable
>> initially since hot plug granule is always one section(128M). However,
>> commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
>> existing arm64 assumptions.
>>
>> The first problem is that if start or end is not aligned to a section
>> boundary, such as when a subsection is hot added, populating the entire
>> section is wasteful.
>>
>> The Next problem is if we hotplug something that spans part of 128 MiB
>> section (subsections, let's call it memblock1), and then hotplug
>> something
>> that spans another part of a 128 MiB section(subsections, let's call it
>> memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
>> the entire PMD entry which also supports memblock2 even though memblock2
>> is still active.
>>
>> Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
>> fix similar to x86-64: populate to pages levels if start/end is not
>> aligned
>> with section boundary.
>>
>> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
>> ---
>> arch/arm64/mm/mmu.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index b4df5bc5b1b8..eec1666da368 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1178,7 +1178,8 @@ int __meminit vmemmap_populate(unsigned long
>> start, unsigned long end, int node,
>> {
>> WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>> - if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
>> + if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
>> + (end - start < PAGES_PER_SECTION * sizeof(struct page)))
>> return vmemmap_populate_basepages(start, end, node, altmap);
>> else
>> return vmemmap_populate_hugepages(start, end, node, altmap);
>
> Yes, this does mimic what x86 does. That handling does look weird,
> because it
> doesn't care about any address alignments, only about the size, which is
> odd.
>
> I wonder if we could do better and move this handling
> into vmemmap_populate_hugepages(), where we already have a fallback
> to vmemmap_populate_basepages().
Hi David,
I had the same doubt initially.
After going through the code, I noticed that for vmemmap_populate(), the
"start" and "end" arguments passed down should already be within one
section.
early path:
for_each_present_section_nr
__populate_section_memmap
..
vmemmap_populate()
hotplug path:
__add_pages
section_activate
vmemmap_populate()
Therefore, focusing only on the size seems OK to me, and the fallback
solution below appears unnecessary?
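A minimal sketch of the invariant I'm relying on (pseudo-code, not a tested
change), assuming both callers only ever pass a range inside one section:

	/* start/end always lie within a single section's vmemmap */
	if (end - start < PAGES_PER_SECTION * sizeof(struct page)) {
		/* sub-section hot-add: map the vmemmap with base pages */
	} else {
		/* exactly one full section: its 2M vmemmap chunk is naturally
		 * PMD-aligned (that is what makes the existing section-level
		 * mapping work), so a PMD mapping is safe */
	}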
BTW, I have a few more doubts about the original code below, but they're
not bugs, so I have not raised them before. Please correct me if I'm wrong.
>
> Something like:
>
> One thing that confuses me is the "altmap" handling in x86-64 code: in
> particular
> why it is ignored in some cases. So that might need a bit of thought /
> double-checking.
>
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 01ea7c6df3036..57542313c0000 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1546,10 +1546,10 @@ int __meminit vmemmap_populate(unsigned long
> start, unsigned long end, int node,
> VM_BUG_ON(!PAGE_ALIGNED(start));
> VM_BUG_ON(!PAGE_ALIGNED(end));
>
> - if (end - start < PAGES_PER_SECTION * sizeof(struct page))
> - err = vmemmap_populate_basepages(start, end, node, NULL);
> - else if (boot_cpu_has(X86_FEATURE_PSE))
> + if (boot_cpu_has(X86_FEATURE_PSE))
> err = vmemmap_populate_hugepages(start, end, node,
> altmap);
> + else
> + err = vmemmap_populate_basepages(start, end, node, NULL);
> else if (altmap) {
> pr_err_once("%s: no cpu support for altmap allocations\n",
> __func__);
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 3287ebadd167d..8b217265b25b1 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -300,6 +300,10 @@ int __weak __meminit vmemmap_check_pmd(pmd_t *pmd,
> int node,
> return 0;
> }
>
> +/*
> + * Try to populate PMDs, but fallback to populating base pages when ranges
> + * would only partially cover a PMD.
> + */
> int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned
> long end,
> int node, struct vmem_altmap
> *altmap)
> {
> @@ -313,6 +317,9 @@ int __meminit vmemmap_populate_hugepages(unsigned
> long start, unsigned long end,
> for (addr = start; addr < end; addr = next) {
This for loop appears to be redundant for arm64 as well; as mentioned
above, a single call to pmd_addr_end() should suffice.
> next = pmd_addr_end(addr, end);
>
> + if (!IS_ALIGNED(addr, PMD_SIZE) || !IS_ALIGNED(next,
> PMD_SIZE))
> + goto fallback;
> +
> pgd = vmemmap_pgd_populate(addr, node);
> if (!pgd)
> return -ENOMEM;
> @@ -346,6 +353,7 @@ int __meminit vmemmap_populate_hugepages(unsigned
> long start, unsigned long end,
> }
> } else if (vmemmap_check_pmd(pmd, node, addr, next))
> continue;
> +fallback:
> if (vmemmap_populate_basepages(addr, next, node, altmap))
> return -ENOMEM;
It seems we would never get the chance to reach vmemmap_populate_basepages() here?
> }
>
>
* Re: [PATCH v7] arm64: mm: Populate vmemmap at the page level if not section aligned
2025-02-17 10:34 ` Zhenhua Huang
@ 2025-02-17 14:30 ` David Hildenbrand
2025-02-18 3:07 ` Zhenhua Huang
0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2025-02-17 14:30 UTC (permalink / raw)
To: Zhenhua Huang, anshuman.khandual, catalin.marinas
Cc: will, ardb, ryan.roberts, mark.rutland, joey.gouly, dave.hansen,
akpm, chenfeiyang, chenhuacai, linux-mm, linux-arm-kernel,
linux-kernel, quic_tingweiz
On 17.02.25 11:34, Zhenhua Huang wrote:
>
>
> On 2025/2/17 17:44, David Hildenbrand wrote:
>> On 17.02.25 10:29, Zhenhua Huang wrote:
>>> On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
>>> to 27, making one section 128M. The related page struct which vmemmap
>>> points to is 2M then.
>>> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
>>> vmemmap to populate at the PMD section level which was suitable
>>> initially since hot plug granule is always one section(128M). However,
>>> commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>> introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
>>> existing arm64 assumptions.
>>>
>>> The first problem is that if start or end is not aligned to a section
>>> boundary, such as when a subsection is hot added, populating the entire
>>> section is wasteful.
>>>
>>> The Next problem is if we hotplug something that spans part of 128 MiB
>>> section (subsections, let's call it memblock1), and then hotplug
>>> something
>>> that spans another part of a 128 MiB section(subsections, let's call it
>>> memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
>>> the entire PMD entry which also supports memblock2 even though memblock2
>>> is still active.
>>>
>>> Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
>>> fix similar to x86-64: populate to pages levels if start/end is not
>>> aligned
>>> with section boundary.
>>>
>>> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
>>> ---
>>> arch/arm64/mm/mmu.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index b4df5bc5b1b8..eec1666da368 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -1178,7 +1178,8 @@ int __meminit vmemmap_populate(unsigned long
>>> start, unsigned long end, int node,
>>> {
>>> WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>> - if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
>>> + if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
>>> + (end - start < PAGES_PER_SECTION * sizeof(struct page)))
>>> return vmemmap_populate_basepages(start, end, node, altmap);
>>> else
>>> return vmemmap_populate_hugepages(start, end, node, altmap);
>>
>> Yes, this does mimic what x86 does. That handling does look weird,
>> because it
>> doesn't care about any address alignments, only about the size, which is
>> odd.
>>
>> I wonder if we could do better and move this handling
>> into vmemmap_populate_hugepages(), where we already have a fallback
>> to vmemmap_populate_basepages().
>
> Hi David,
>
> I had the same doubt initially.
> After going through the codes, I noticed for vmemmap_populate(), the
> arguments "start" and "end" passed down should already be within one
> section.
> early path:
> for_each_present_section_nr
> __populate_section_memmap
> ..
> vmemmap_populate()
>
> hotplug path:
> __add_pages
> section_activate
> vmemmap_populate()
>
> Therefore.. focusing only on the size seems OK to me, and fall back
> solution below appears unnecessary?
Ah, in that case it is fine. Might make sense to document/enforce that
somehow for the time being ...
>> +/*
>> + * Try to populate PMDs, but fallback to populating base pages when ranges
>> + * would only partially cover a PMD.
>> + */
>> int __meminit vmemmap_populate_hugepages(unsigned long start, unsigned
>> long end,
>> int node, struct vmem_altmap
>> *altmap)
>> {
>> @@ -313,6 +317,9 @@ int __meminit vmemmap_populate_hugepages(unsigned
>> long start, unsigned long end,
>> for (addr = start; addr < end; addr = next) {
>
> This for loop appears to be redundant for arm64 as well, as above
> mentioned, a single call to pmd_addr_end() should suffice.
Right, that was what was confusing me in the first place.
>
>> next = pmd_addr_end(addr, end);
>>
>> + if (!IS_ALIGNED(addr, PMD_SIZE) || !IS_ALIGNED(next,
>> PMD_SIZE))
>> + goto fallback;
>> +
>> pgd = vmemmap_pgd_populate(addr, node);
>> if (!pgd)
>> return -ENOMEM;
>> @@ -346,6 +353,7 @@ int __meminit vmemmap_populate_hugepages(unsigned
>> long start, unsigned long end,
>> }
>> } else if (vmemmap_check_pmd(pmd, node, addr, next))
>> continue;
>> +fallback:
>> if (vmemmap_populate_basepages(addr, next, node, altmap))
>> return -ENOMEM;
>
> It seems we have no chance to call populate_basepages here?
Can you elaborate?
--
Cheers,
David / dhildenb
* Re: [PATCH v7] arm64: mm: Populate vmemmap at the page level if not section aligned
2025-02-17 14:30 ` David Hildenbrand
@ 2025-02-18 3:07 ` Zhenhua Huang
2025-02-26 17:13 ` David Hildenbrand
0 siblings, 1 reply; 7+ messages in thread
From: Zhenhua Huang @ 2025-02-18 3:07 UTC (permalink / raw)
To: David Hildenbrand, anshuman.khandual, catalin.marinas
Cc: will, ardb, ryan.roberts, mark.rutland, joey.gouly, dave.hansen,
akpm, chenfeiyang, chenhuacai, linux-mm, linux-arm-kernel,
linux-kernel, quic_tingweiz
On 2025/2/17 22:30, David Hildenbrand wrote:
> On 17.02.25 11:34, Zhenhua Huang wrote:
>>
>>
>> On 2025/2/17 17:44, David Hildenbrand wrote:
>>> On 17.02.25 10:29, Zhenhua Huang wrote:
>>>> On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is
>>>> set
>>>> to 27, making one section 128M. The related page struct which vmemmap
>>>> points to is 2M then.
>>>> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
>>>> vmemmap to populate at the PMD section level which was suitable
>>>> initially since hot plug granule is always one section(128M). However,
>>>> commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>>> introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
>>>> existing arm64 assumptions.
>>>>
>>>> The first problem is that if start or end is not aligned to a section
>>>> boundary, such as when a subsection is hot added, populating the entire
>>>> section is wasteful.
>>>>
>>>> The Next problem is if we hotplug something that spans part of 128 MiB
>>>> section (subsections, let's call it memblock1), and then hotplug
>>>> something
>>>> that spans another part of a 128 MiB section(subsections, let's call it
>>>> memblock2), and subsequently unplug memblock1, vmemmap_free() will
>>>> clear
>>>> the entire PMD entry which also supports memblock2 even though
>>>> memblock2
>>>> is still active.
>>>>
>>>> Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
>>>> fix similar to x86-64: populate to pages levels if start/end is not
>>>> aligned
>>>> with section boundary.
>>>>
>>>> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
>>>> ---
>>>> arch/arm64/mm/mmu.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>> index b4df5bc5b1b8..eec1666da368 100644
>>>> --- a/arch/arm64/mm/mmu.c
>>>> +++ b/arch/arm64/mm/mmu.c
>>>> @@ -1178,7 +1178,8 @@ int __meminit vmemmap_populate(unsigned long
>>>> start, unsigned long end, int node,
>>>> {
>>>> WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>>> - if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
>>>> + if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
>>>> + (end - start < PAGES_PER_SECTION * sizeof(struct page)))
>>>> return vmemmap_populate_basepages(start, end, node, altmap);
>>>> else
>>>> return vmemmap_populate_hugepages(start, end, node, altmap);
>>>
>>> Yes, this does mimic what x86 does. That handling does look weird,
>>> because it
>>> doesn't care about any address alignments, only about the size, which is
>>> odd.
>>>
>>> I wonder if we could do better and move this handling
>>> into vmemmap_populate_hugepages(), where we already have a fallback
>>> to vmemmap_populate_basepages().
>>
>> Hi David,
>>
>> I had the same doubt initially.
>> After going through the codes, I noticed for vmemmap_populate(), the
>> arguments "start" and "end" passed down should already be within one
>> section.
>> early path:
>> for_each_present_section_nr
>> __populate_section_memmap
>> ..
>> vmemmap_populate()
>>
>> hotplug path:
>> __add_pages
>> section_activate
>> vmemmap_populate()
>>
>> Therefore.. focusing only on the size seems OK to me, and fall back
>> solution below appears unnecessary?
>
> Ah, in that case it is fine. Might make sense to document/enforce that
> somehow for the time being ...
Shall I document this and WARN_ON if the size is exceeded? Like:
WARN_ON(end - start > PAGES_PER_SECTION * sizeof(struct page))
Since vmemmap_populate() is implemented per architecture, the change
would apply to other architectures as well. However, I have no setup to
test it on...
Therefore, may I implement it only for arm64 for now?
Additionally, from the previous discussion, the change is worth
backporting (apologies for missing to CC the stable kernel in this version).
Keeping it arm64-only should simplify backporting. WDYT?
>
>
>>> +/*
>>> + * Try to populate PMDs, but fallback to populating base pages when
>>> ranges
>>> + * would only partially cover a PMD.
>>> + */
>>> int __meminit vmemmap_populate_hugepages(unsigned long start,
>>> unsigned
>>> long end,
>>> int node, struct vmem_altmap
>>> *altmap)
>>> {
>>> @@ -313,6 +317,9 @@ int __meminit vmemmap_populate_hugepages(unsigned
>>> long start, unsigned long end,
>>> for (addr = start; addr < end; addr = next) {
>>
>> This for loop appears to be redundant for arm64 as well, as above
>> mentioned, a single call to pmd_addr_end() should suffice.
>
> Right, that was what was confusing me in the first place.
>
>>
>>> next = pmd_addr_end(addr, end);
>>>
>>> + if (!IS_ALIGNED(addr, PMD_SIZE) || !IS_ALIGNED(next,
>>> PMD_SIZE))
>>> + goto fallback;
>>> +
>>> pgd = vmemmap_pgd_populate(addr, node);
>>> if (!pgd)
>>> return -ENOMEM;
>>> @@ -346,6 +353,7 @@ int __meminit vmemmap_populate_hugepages(unsigned
>>> long start, unsigned long end,
>>> }
>>> } else if (vmemmap_check_pmd(pmd, node, addr, next))
>>> continue;
>>> +fallback:
>>> if (vmemmap_populate_basepages(addr, next, node,
>>> altmap))
>>> return -ENOMEM;
>>
>> It seems we have no chance to call populate_basepages here?
>
>
> Can you elaborate?
It's invoked within vmemmap_populate_hugepages(), which is called by
vmemmap_populate(). This implies that we are always performing a
whole-section hotplug there, so the fallback should never be reached.
However, since it's common code used by other architectures like x86,
RISC-V and LoongArch, it is still necessary to review the code for those
architectures as well. At the very least, it's not a BUG :)
>
>
* Re: [PATCH v7] arm64: mm: Populate vmemmap at the page level if not section aligned
2025-02-18 3:07 ` Zhenhua Huang
@ 2025-02-26 17:13 ` David Hildenbrand
2025-03-03 9:29 ` Zhenhua Huang
0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2025-02-26 17:13 UTC (permalink / raw)
To: Zhenhua Huang, anshuman.khandual, catalin.marinas
Cc: will, ardb, ryan.roberts, mark.rutland, joey.gouly, dave.hansen,
akpm, chenfeiyang, chenhuacai, linux-mm, linux-arm-kernel,
linux-kernel, quic_tingweiz
Sorry, I somehow missed this mail.
>>> Hi David,
>>>
>>> I had the same doubt initially.
>>> After going through the codes, I noticed for vmemmap_populate(), the
>>> arguments "start" and "end" passed down should already be within one
>>> section.
>>> early path:
>>> for_each_present_section_nr
>>> __populate_section_memmap
>>> ..
>>> vmemmap_populate()
>>>
>>> hotplug path:
>>> __add_pages
>>> section_activate
>>> vmemmap_populate()
>>>
>>> Therefore.. focusing only on the size seems OK to me, and fall back
>>> solution below appears unnecessary?
>>
>> Ah, in that case it is fine. Might make sense to document/enforce that
>> somehow for the time being ...
>
> Shall I document and WARN_ON if the size exceeds? like:
> WARN_ON(end - start > PAGES_PER_SECTION * sizeof(struct page))
Probably WARN_ON_ONCE() along with a comment that we should never exceed
a single memory section.
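For illustration only, a rough sketch of how that could look on top of the
arm64 change from this patch (untested, just to show the placement of the
warning and comment):

	/*
	 * Hotplugged memory is added section by section, so the vmemmap
	 * range handed to us should never span more than one section.
	 */
	WARN_ON_ONCE(end - start > PAGES_PER_SECTION * sizeof(struct page));

	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
	    (end - start < PAGES_PER_SECTION * sizeof(struct page)))
		return vmemmap_populate_basepages(start, end, node, altmap);
	else
		return vmemmap_populate_hugepages(start, end, node, altmap);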
>
> Since vmemmap_populate() is implemented per architecture, the change
> should apply for other architectures as well. However I have no setup to
> test it on...
> Therefore, May I implement it only for arm64 now ?
Would work for me; better than no warning.
> Additionally, from previous discussion, the change is worth
> backporting(apologies for missing to CC stable kernel in this version).
> Keeping it for arm64 should simplify for backporting. WDYT?
Jup. Of course, we could add a generic warning in a separate patch.
>
>>
>>
>>>> +/*
>>>> + * Try to populate PMDs, but fallback to populating base pages when
>>>> ranges
>>>> + * would only partially cover a PMD.
>>>> + */
>>>> int __meminit vmemmap_populate_hugepages(unsigned long start,
>>>> unsigned
>>>> long end,
>>>> int node, struct vmem_altmap
>>>> *altmap)
>>>> {
>>>> @@ -313,6 +317,9 @@ int __meminit vmemmap_populate_hugepages(unsigned
>>>> long start, unsigned long end,
>>>> for (addr = start; addr < end; addr = next) {
>>>
>>> This for loop appears to be redundant for arm64 as well, as above
>>> mentioned, a single call to pmd_addr_end() should suffice.
>>
>> Right, that was what was confusing me in the first place.
>>
>>>
>>>> next = pmd_addr_end(addr, end);
>>>>
>>>> + if (!IS_ALIGNED(addr, PMD_SIZE) || !IS_ALIGNED(next,
>>>> PMD_SIZE))
>>>> + goto fallback;
>>>> +
>>>> pgd = vmemmap_pgd_populate(addr, node);
>>>> if (!pgd)
>>>> return -ENOMEM;
>>>> @@ -346,6 +353,7 @@ int __meminit vmemmap_populate_hugepages(unsigned
>>>> long start, unsigned long end,
>>>> }
>>>> } else if (vmemmap_check_pmd(pmd, node, addr, next))
>>>> continue;
>>>> +fallback:
>>>> if (vmemmap_populate_basepages(addr, next, node,
>>>> altmap))
>>>> return -ENOMEM;
>>>
>>> It seems we have no chance to call populate_basepages here?
>>
>>
>> Can you elaborate?
>
> It's invoked within vmemmap_populate_hugepages(), which is called by
> vmemmap_populate(). This implies that we are always performing a whole
> section hotplug?
Ah, you meant only in the context of this change, yes. I was confused,
because there are other reasons why we run into that fallback (failing
to allocate a PMD).
--
Cheers,
David / dhildenb
* Re: [PATCH v7] arm64: mm: Populate vmemmap at the page level if not section aligned
2025-02-26 17:13 ` David Hildenbrand
@ 2025-03-03 9:29 ` Zhenhua Huang
0 siblings, 0 replies; 7+ messages in thread
From: Zhenhua Huang @ 2025-03-03 9:29 UTC (permalink / raw)
To: David Hildenbrand, anshuman.khandual, catalin.marinas
Cc: will, ardb, ryan.roberts, mark.rutland, joey.gouly, dave.hansen,
akpm, chenfeiyang, chenhuacai, linux-mm, linux-arm-kernel,
linux-kernel, quic_tingweiz
On 2025/2/27 1:13, David Hildenbrand wrote:
> Sorry, I somehow missed this mail.
>
No problem. Thanks for your reply.
>>>> Hi David,
>>>>
>>>> I had the same doubt initially.
>>>> After going through the codes, I noticed for vmemmap_populate(), the
>>>> arguments "start" and "end" passed down should already be within one
>>>> section.
>>>> early path:
>>>> for_each_present_section_nr
>>>> __populate_section_memmap
>>>> ..
>>>> vmemmap_populate()
>>>>
>>>> hotplug path:
>>>> __add_pages
>>>> section_activate
>>>> vmemmap_populate()
>>>>
>>>> Therefore.. focusing only on the size seems OK to me, and fall back
>>>> solution below appears unnecessary?
>>>
>>> Ah, in that case it is fine. Might make sense to document/enforce that
>>> somehow for the time being ...
>>
>> Shall I document and WARN_ON if the size exceeds? like:
>> WARN_ON(end - start > PAGES_PER_SECTION * sizeof(struct page))
>
> Probably WARN_ON_ONCE() along with a comment that we should never exceed
> a single memory section.
Got it.
>
>>
>> Since vmemmap_populate() is implemented per architecture, the change
>> should apply for other architectures as well. However I have no setup to
>> test it on...
>> Therefore, May I implement it only for arm64 now ?
>
> Would work for me; better than no warning.
I made a patch several days ago; could you please help review it?
https://lore.kernel.org/linux-mm/20250219084001.1272445-1-quic_zhenhuah@quicinc.com/T/
Is there anything else you would suggest besides preferring WARN_ON_ONCE()?
>
>> Additionally, from previous discussion, the change is worth
>> backporting(apologies for missing to CC stable kernel in this version).
>> Keeping it for arm64 should simplify for backporting. WDYT?
>
> Jup. Of course, we could add a generic warning in a separate patch.
>
>>
>>>
>>>
>>>>> +/*
>>>>> + * Try to populate PMDs, but fallback to populating base pages when
>>>>> ranges
>>>>> + * would only partially cover a PMD.
>>>>> + */
>>>>> int __meminit vmemmap_populate_hugepages(unsigned long start,
>>>>> unsigned
>>>>> long end,
>>>>> int node, struct
>>>>> vmem_altmap
>>>>> *altmap)
>>>>> {
>>>>> @@ -313,6 +317,9 @@ int __meminit vmemmap_populate_hugepages(unsigned
>>>>> long start, unsigned long end,
>>>>> for (addr = start; addr < end; addr = next) {
>>>>
>>>> This for loop appears to be redundant for arm64 as well, as above
>>>> mentioned, a single call to pmd_addr_end() should suffice.
>>>
>>> Right, that was what was confusing me in the first place.
>>>
>>>>
>>>>> next = pmd_addr_end(addr, end);
>>>>>
>>>>> + if (!IS_ALIGNED(addr, PMD_SIZE) || !IS_ALIGNED(next,
>>>>> PMD_SIZE))
>>>>> + goto fallback;
>>>>> +
>>>>> pgd = vmemmap_pgd_populate(addr, node);
>>>>> if (!pgd)
>>>>> return -ENOMEM;
>>>>> @@ -346,6 +353,7 @@ int __meminit vmemmap_populate_hugepages(unsigned
>>>>> long start, unsigned long end,
>>>>> }
>>>>> } else if (vmemmap_check_pmd(pmd, node, addr,
>>>>> next))
>>>>> continue;
>>>>> +fallback:
>>>>> if (vmemmap_populate_basepages(addr, next, node,
>>>>> altmap))
>>>>> return -ENOMEM;
>>>>
>>>> It seems we have no chance to call populate_basepages here?
>>>
>>>
>>> Can you elaborate?
>>
>> It's invoked within vmemmap_populate_hugepages(), which is called by
>> vmemmap_populate(). This implies that we are always performing a whole
>> section hotplug?
>
> Ah, you meant only in the context of this change, yes. I was confused,
> because there are other reasons why we run into that fallback (failing
> to allocate a PMD).
I observed that this logic was introduced in commit 2045a3b8911b
("mm/sparse-vmemmap: generalise vmemmap_populate_hugepages()"), which
moved it from arch-dependent code (such as vmemmap_set_pmd() in
arch/loongarch/mm/init.c or arch/x86/mm/init_64.c) into common,
arch-independent code (vmemmap_populate_hugepages()).
I suspect that is the source of the confusion, as it may not map
perfectly onto arm64. However, it does not seem to cause any bugs :)
>