From: Bharata B Rao <bharata@amd.com>
To: Dave Hansen <dave.hansen@intel.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
luto@kernel.org, peterz@infradead.org, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
nikunj@amd.com, Balbir Singh <balbirs@nvidia.com>,
kees@kernel.org, alexander.deucher@amd.com
Subject: Re: AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5()
Date: Wed, 23 Apr 2025 15:00:17 +0530
Message-ID: <83cf7fc7-23e0-46f5-916b-5341a0ab9599@amd.com>
In-Reply-To: <19353ca2-11f3-4718-b602-d898ff05ba87@intel.com>
On 22-Apr-25 8:43 PM, Dave Hansen wrote:
> On 4/21/25 23:34, Bharata B Rao wrote:
>> At the outset, it appears that the selection of vmemmap_base doesn't
>> consider whether there is going to be enough room to accommodate
>> future hot plugged pages.
>
> Is this future hotplug area in the memory map at boot?
The KVM guest isn't started with a maxmem= setting on -m, if that's what
you are hinting at. The e820 map from the guest looks like this:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000007ffdafff] usable
BIOS-e820: [mem 0x000000007ffdb000-0x000000007fffffff] reserved
BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x0000000100000000-0x000000f4a3ffffff] usable
BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
kaslr_region: base[0] ff4552df80000000 size_tb 1000
kaslr_region: end[0] fffffffffffff
kaslr_region: base[1] ff69c69640000000 size_tb 3200
kaslr_region: base[2] ffd3140680000000 size_tb 40
So vmemmap_base is 0xffd3140680000000
Also the last and max_arch pfns are reported like this:
last_pfn = 0x7ffdb max_arch_pfn = 0x10000000000
Here is some data for the hotplug that happens for the 8 GPUs.
The driver passes the following values for pgmap->range.start,
pgmap->range.end and pgmap->type to devm_memremap_pages():
amdgpu: kgd2kfd_init_zone_device: start fffc010000000 end fffffffffffff type 1
amdgpu: kgd2kfd_init_zone_device: start fff8020000000 end fffc00fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start fff4030000000 end fff801fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start fff0040000000 end fff402fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start ffec050000000 end fff003fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start ffe8060000000 end ffec04fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start ffe4070000000 end ffe805fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start ffe0080000000 end ffe406fffffff type 1
The pfn and the number of pages being added in response to the above:
__add_pages pfn fffc010000 nr_pages 67043328 nid 0
__add_pages pfn fff8020000 nr_pages 67043328 nid 0
__add_pages pfn fff4030000 nr_pages 67043328 nid 0
__add_pages pfn fff0040000 nr_pages 67043328 nid 0
__add_pages pfn ffec050000 nr_pages 67043328 nid 0
__add_pages pfn ffe8060000 nr_pages 67043328 nid 0
__add_pages pfn ffe4070000 nr_pages 67043328 nid 0
__add_pages pfn ffe0080000 nr_pages 67043328 nid 0
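As a cross-check, the pfn and nr_pages values follow directly from the
pgmap ranges if one assumes 4 KiB base pages (a page shift of 12). The
sketch below is only illustrative arithmetic in Python, not kernel code;
the helper name range_to_pfn is made up for this example.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB base pages

# Inclusive (start, end) byte ranges as logged by kgd2kfd_init_zone_device.
RANGES = [
    (0xfffc010000000, 0xfffffffffffff),
    (0xfff8020000000, 0xfffc00fffffff),
    (0xfff4030000000, 0xfff801fffffff),
    (0xfff0040000000, 0xfff402fffffff),
    (0xffec050000000, 0xfff003fffffff),
    (0xffe8060000000, 0xffec04fffffff),
    (0xffe4070000000, 0xffe805fffffff),
    (0xffe0080000000, 0xffe406fffffff),
]


def range_to_pfn(start, end):
    """Return (first pfn, page count) for an inclusive byte range."""
    return start >> PAGE_SHIFT, (end - start + 1) >> PAGE_SHIFT


for start, end in RANGES:
    pfn, nr_pages = range_to_pfn(start, end)
    print(f"pfn {pfn:x} nr_pages {nr_pages}")
```

Every range is 0x3ff0000000 bytes (255.75 GiB), which is where the
constant nr_pages of 67043328 comes from.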
For the above vmemmap_base, the (first) addresses seen in
sync_global_pgds_l5() for the above 8 hotplug cases are like this:
start ffd3540580400000, end = ffd35405805fffff
start ffd3540480800000, end = ffd35404809fffff
start ffd3540380c00000, end = ffd3540380dfffff
start ffd3540281000000, end = ffd35402811fffff
start ffd3540181400000, end = ffd35401815fffff
start ffd3540081800000, end = ffd35400819fffff
start ffd353ff81c00000, end = ffd353ff81dfffff
start ffd353fe82000000, end = ffd353fe821fffff
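These start addresses are consistent with the usual vmemmap layout,
vmemmap_base + pfn * sizeof(struct page), if struct page is taken to be
64 bytes. That size is an assumption here and depends on the kernel
config. Each end address is start + 2 MiB - 1, which would match a
single PMD-sized range. A rough sketch of the arithmetic in Python (the
function name vmemmap_range is hypothetical):

```python
VMEMMAP_BASE = 0xffd3140680000000  # base[2] from the kaslr_region output
STRUCT_PAGE_SIZE = 64              # assumption: sizeof(struct page)
PMD_SIZE = 2 << 20                 # assumption: 2 MiB per PMD mapping

# First pfn of each hotplugged range, from the __add_pages output.
PFNS = [
    0xfffc010000, 0xfff8020000, 0xfff4030000, 0xfff0040000,
    0xffec050000, 0xffe8060000, 0xffe4070000, 0xffe0080000,
]


def vmemmap_range(pfn):
    """Return (start, end) of the first PMD-sized vmemmap chunk for pfn."""
    start = VMEMMAP_BASE + pfn * STRUCT_PAGE_SIZE
    return start, start + PMD_SIZE - 1


for pfn in PFNS:
    start, end = vmemmap_range(pfn)
    print(f"start {start:x}, end = {end:x}")
```

Under those assumptions the computed values reproduce the eight
start/end pairs listed above.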
The data above is from a boot where the hotplug succeeds. I showed the
same data for the failing case in the first mail of this thread.
When randomization results in a bad vmemmap_base address, hotplug of the
first page for the first GPU hits the BUG_ON.
Regards,
Bharata.