From: "Christian König" <christian.koenig@amd.com>
To: Honglei Huang <honglei1.huang@amd.com>,
Felix.Kuehling@amd.com, alexander.deucher@amd.com,
Ray.Huang@amd.com
Cc: dmitry.osipenko@collabora.com, Xinhui.Pan@amd.com,
airlied@gmail.com, daniel@ffwll.ch,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
akpm@linux-foundation.org, honghuan@amd.com
Subject: Re: [PATCH v3 0/8] drm/amdkfd: Add batch userptr allocation support
Date: Fri, 6 Feb 2026 14:56:27 +0100
Message-ID: <da75eadd-865e-41fe-a86b-ed9d9aa45e5a@amd.com>
In-Reply-To: <20260206062557.3718801-1-honglei1.huang@amd.com>

On 2/6/26 07:25, Honglei Huang wrote:
> From: Honglei Huang <honghuan@amd.com>
>
> Hi all,
>
> This is v3 of the patch series to support allocating multiple non-contiguous
> CPU virtual address ranges that map to a single contiguous GPU virtual address.
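
As an illustration of the mapping described above, here is a minimal userspace sketch in plain C (not kernel code, and not taken from the series). It assumes the ranges are packed into the GPU VA window in array order, so each range's GPU address is the base plus the sum of the sizes before it. The struct `cpu_va_range` and the functions `layout_contiguous()` and `layout_contiguous_demo()` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one CPU range; not the UAPI struct from the series. */
struct cpu_va_range {
	uint64_t start; /* CPU virtual address of the range */
	uint64_t size;  /* length in bytes */
};

/*
 * Assign each CPU range a slot in one contiguous GPU VA window, packed in
 * array order (an assumption of this sketch). Writes the GPU address of
 * range i to gpu_va[i] and returns the total window size in bytes.
 */
uint64_t layout_contiguous(const struct cpu_va_range *r, size_t n,
			   uint64_t gpu_base, uint64_t *gpu_va)
{
	uint64_t offset = 0;

	for (size_t i = 0; i < n; i++) {
		gpu_va[i] = gpu_base + offset;
		offset += r[i].size;
	}
	return offset;
}

/* Three scattered CPU ranges end up back to back in GPU VA space. */
void layout_contiguous_demo(void)
{
	const struct cpu_va_range r[3] = {
		{ 0x7f0000001000, 0x2000 },
		{ 0x7f0000400000, 0x1000 },
		{ 0x7f0000010000, 0x4000 },
	};
	uint64_t gpu_va[3];

	assert(layout_contiguous(r, 3, 0x100000, gpu_va) == 0x7000);
	assert(gpu_va[0] == 0x100000);
	assert(gpu_va[1] == 0x102000);
	assert(gpu_va[2] == 0x103000);
}
```

The CPU addresses play no part in the GPU layout here; only the sizes and the order of the entries do.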
>
> v3:
> 1. No new ioctl: Reuses existing AMDKFD_IOC_ALLOC_MEMORY_OF_GPU
> - Adds only one flag: KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH

That is most likely not the best approach, but Felix or Philip will need to comment here, since I don't know these IOCTLs well either.

> - When flag is set, mmap_offset field points to range array
> - Minimal API surface change

Why a range of VA space for each entry?

> 2. Improved MMU notifier handling:
> - Single mmu_interval_notifier covering the VA span [va_min, va_max]
> - Interval tree for efficient lookup of affected ranges during invalidation
> - Avoids per-range notifier overhead mentioned in v2 review

That won't work unless you also modify hmm_range_fault() to take multiple virtual addresses (or ranges) at the same time.

The problem is that we must rely on hmm_range.notifier_seq to detect changes to the page tables in question, but that in turn works only if you have one hmm_range structure and not multiple.

What might work is doing an XOR or CRC over all the hmm_range.notifier_seq values you have, but that is a bit flaky.
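
To illustrate one way such a combined digest can fail, here is a minimal userspace sketch in plain C (not kernel code). It assumes each array entry stands in for one hmm_range.notifier_seq snapshot; the helpers `xor_digest()`, `any_seq_changed()` and `xor_digest_collision_demo()` are hypothetical names for this example only. Both sequence values advance, yet the XOR over the set stays the same, so comparing digests alone would miss the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model: fold all per-range sequence snapshots into one value. */
uint64_t xor_digest(const uint64_t *seq, size_t n)
{
	uint64_t d = 0;

	for (size_t i = 0; i < n; i++)
		d ^= seq[i];
	return d;
}

/* Exact check: true if any per-range sequence differs from its snapshot. */
bool any_seq_changed(const uint64_t *before, const uint64_t *after, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (before[i] != after[i])
			return true;
	return false;
}

/* Both ranges were invalidated, but 2 ^ 4 == 10 ^ 12 == 6. */
void xor_digest_collision_demo(void)
{
	const uint64_t before[2] = { 2, 4 };
	const uint64_t after[2] = { 10, 12 };

	assert(any_seq_changed(before, after, 2));
	assert(xor_digest(before, 2) == xor_digest(after, 2));
}
```

A CRC makes such collisions less likely than XOR but cannot rule them out, since any fixed-width digest maps many inputs to the same value.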
Regards,
Christian.
>
> 3. Better code organization: Split into 8 focused patches for easier review
>
> v2:
> - Each CPU VA range gets its own mmu_interval_notifier for invalidation
> - All ranges validated together and mapped to contiguous GPU VA
> - Single kgd_mem object with array of user_range_info structures
> - Unified eviction/restore path for all ranges in a batch
>
> Current Implementation Approach
> ===============================
>
> This series implements a practical solution within existing kernel constraints:
>
> 1. Single MMU notifier for VA span: Register one notifier covering the
> entire range from lowest to highest address in the batch
>
> 2. Interval tree filtering: Use interval tree to efficiently identify
> which specific ranges are affected during invalidation callbacks,
> avoiding unnecessary processing for unrelated address changes
>
> 3. Unified eviction/restore: All ranges in a batch share eviction and
> restore paths, maintaining consistency with existing userptr handling
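
The span-plus-filter idea in items 1 and 2 can be sketched in userspace C as follows. This is only a model under stated assumptions: `struct va_range`, `va_span()`, `mark_overlapping()` and `va_span_demo()` are hypothetical names, ranges are half-open `[start, end)`, and a linear scan stands in for the interval tree lookup that the series describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-range record; not the series' user_range_info layout. */
struct va_range {
	uint64_t start; /* inclusive */
	uint64_t end;   /* exclusive */
	bool invalid;
};

/* Compute the span [*lo, *hi) that one notifier would have to cover (n >= 1). */
void va_span(const struct va_range *r, size_t n, uint64_t *lo, uint64_t *hi)
{
	*lo = r[0].start;
	*hi = r[0].end;
	for (size_t i = 1; i < n; i++) {
		if (r[i].start < *lo)
			*lo = r[i].start;
		if (r[i].end > *hi)
			*hi = r[i].end;
	}
}

/*
 * Mark every range overlapping the invalidated interval [start, end) and
 * return how many were marked. A linear scan replaces the interval tree.
 */
size_t mark_overlapping(struct va_range *r, size_t n,
			uint64_t start, uint64_t end)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (r[i].start < end && start < r[i].end) {
			r[i].invalid = true;
			hits++;
		}
	}
	return hits;
}

/* An invalidation inside the span but between ranges affects nothing. */
void va_span_demo(void)
{
	struct va_range r[3] = {
		{ 0x1000, 0x3000, false },
		{ 0x10000, 0x11000, false },
		{ 0x20000, 0x24000, false },
	};
	uint64_t lo, hi;

	va_span(r, 3, &lo, &hi);
	assert(lo == 0x1000 && hi == 0x24000);
	assert(mark_overlapping(r, 3, 0x5000, 0x8000) == 0);
	assert(mark_overlapping(r, 3, 0x4000, 0x10800) == 1);
	assert(r[1].invalid && !r[0].invalid && !r[2].invalid);
}
```

The first call in the demo shows the cost of the span approach: the notifier fires for addresses in the gaps between ranges, and the filter is what discards those events.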
>
> Patch Series Overview
> =====================
>
> Patch 1/8: Add userptr batch allocation UAPI structures
> - KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH flag
> - kfd_ioctl_userptr_range and kfd_ioctl_userptr_ranges_data structures
>
> Patch 2/8: Add user_range_info infrastructure to kgd_mem
> - user_range_info structure for per-range tracking
> - Fields for batch allocation in kgd_mem
>
> Patch 3/8: Implement interval tree for userptr ranges
> - Interval tree for efficient range lookup during invalidation
> - mark_invalid_ranges() function
>
> Patch 4/8: Add batch MMU notifier support
> - Single notifier for entire VA span
> - Invalidation callback using interval tree filtering
>
> Patch 5/8: Implement batch userptr page management
> - get_user_pages_batch() and set_user_pages_batch()
> - Per-range page array management
>
> Patch 6/8: Add batch allocation function and export API
> - init_user_pages_batch() main initialization
> - amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu_batch() entry point
>
> Patch 7/8: Unify userptr cleanup and update paths
> - Shared eviction/restore handling for batch allocations
> - Integration with existing userptr validation flows
>
> Patch 8/8: Wire up batch allocation in ioctl handler
> - Input validation and range array parsing
> - Integration with existing alloc_memory_of_gpu path
>
> Testing
> =======
>
> - Multiple scattered malloc() allocations (2-4000+ ranges)
> - Various allocation sizes (4KB to 1GB+ per range)
> - Memory pressure scenarios and eviction/restore cycles
> - OpenCL CTS and HIP catch tests in KVM guest environment
> - AI workloads: Stable Diffusion, ComfyUI in virtualized environments
> - Small LLM inference (3B-7B models)
> - Benchmark score: 160,000 - 190,000 (80%-95% of bare metal)
> - Performance improvement: 2x-2.4x faster than userspace approach
>
> Thank you for your review and feedback.
>
> Best regards,
> Honglei Huang
>
> Honglei Huang (8):
> drm/amdkfd: Add userptr batch allocation UAPI structures
> drm/amdkfd: Add user_range_info infrastructure to kgd_mem
> drm/amdkfd: Implement interval tree for userptr ranges
> drm/amdkfd: Add batch MMU notifier support
> drm/amdkfd: Implement batch userptr page management
> drm/amdkfd: Add batch allocation function and export API
> drm/amdkfd: Unify userptr cleanup and update paths
> drm/amdkfd: Wire up batch allocation in ioctl handler
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 23 +
> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 539 +++++++++++++++++-
> drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 128 ++++-
> include/uapi/linux/kfd_ioctl.h | 31 +-
> 4 files changed, 697 insertions(+), 24 deletions(-)
>