This patch paves the path to enable huge mappings in vmalloc space and
linear map space by default on arm64. For this we must ensure that we can
handle any permission games on the kernel (init_mm) pagetable. Currently,
__change_memory_common() uses apply_to_page_range() which does not support
changing permissions for block mappings. We attempt to move away from this
by using the pagewalk API, similar to what riscv does right now; however,
it is the responsibility of the caller to ensure that we do not pass a
range overlapping a partial block mapping or cont mapping; in such a case,
the system must be able to support range splitting.

This patch is tied with Yang Shi's attempt [1] at using huge mappings in
the linear mapping in case the system supports BBML2, in which case we
will be able to split the linear mapping if needed without
break-before-make. Thus, Yang's series, IIUC, will be one such user of my
patch; suppose we are changing permissions on a range of the linear map
backed by PMD-hugepages, then the sequence of operations should look like
the following:

split_range(start)
split_range(end);
__change_memory_common(start, end);

However, this patch can be used independently of Yang's; since currently
permission games are being played only on pte mappings (due to
apply_to_page_range not supporting otherwise), this patch provides the
mechanism for enabling huge mappings for various kernel mappings like
linear map and vmalloc.

---------------------
Implementation
---------------------

arm64 currently changes permissions on vmalloc objects locklessly, via
apply_to_page_range, whose limitation is to deny changing permissions for
block mappings. Therefore, we move away to use the generic pagewalk API,
thus paving the path for enabling huge mappings by default on kernel space
mappings, thus leading to more efficient TLB usage. However, the API
currently enforces the init_mm.mmap_lock to be held.
To avoid the unnecessary bottleneck of the mmap_lock for our use case,
this patch extends this generic API to be used locklessly, so as to retain
the existing behaviour for changing permissions. Apart from this reason,
it is noted at [2] that KFENCE can manipulate kernel pgtable entries
during softirqs. It does this by calling set_memory_valid() ->
__change_memory_common(). Since this is a non-sleepable context, we cannot
take the init_mm mmap lock.

Add comments to highlight the conditions under which we can use the
lockless variant: no underlying VMA, and the user having exclusive control
over the range, thus guaranteeing no concurrent access.

We require that the start and end of a given range do not partially
overlap block mappings or cont mappings. Return -EINVAL in case a partial
block mapping is detected at any of the PGD/P4D/PUD/PMD levels; add a
corresponding comment in update_range_prot() to warn that eliminating such
a condition is the responsibility of the caller.

Note that the pte level callback may change permissions for a whole
contpte block, and that will be done one pte at a time, as opposed to an
atomic operation for the block mappings. This is fine as any access will
decode either the old or the new permission until the TLBI.

apply_to_page_range() currently performs all pte level callbacks while in
lazy mmu mode. Since arm64 can optimize performance by batching barriers
when modifying kernel pgtables in lazy mmu mode, we would like to continue
to benefit from this optimisation. Unfortunately,
walk_kernel_page_table_range() does not use lazy mmu mode. However, since
the pagewalk framework is not allocating any memory, we can safely bracket
the whole operation inside lazy mmu mode ourselves. Therefore, wrap the
call to walk_kernel_page_table_range() with the lazy MMU helpers.
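
The partial-block rule described above can be illustrated with a small
userspace model. This is only a sketch, not the code in this patch: the
names model_block_addr_end(), model_check_leaf_range() and MODEL_PMD_SIZE
are hypothetical, the 2 MiB block size assumes 4K pages, and the model
assumes every block in the range is a leaf (block) mapping. It mimics how a
per-level callback could compare (next - addr) against the block size and
return -EINVAL when the range covers only part of a block.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: PMD block size of 2 MiB, as with a 4K page granule. */
#define MODEL_PMD_SIZE (UINT64_C(1) << 21)

/*
 * Hypothetical helper: end of the block containing addr, clamped to end.
 * Address wrap-around near UINT64_MAX is deliberately not handled here.
 */
static inline uint64_t model_block_addr_end(uint64_t addr, uint64_t end,
					    uint64_t block_size)
{
	uint64_t boundary = (addr + block_size) & ~(block_size - 1);

	return boundary < end ? boundary : end;
}

/*
 * Walk [start, end) one block at a time, treating every block as a leaf
 * mapping. Return 0 if each step covers a whole block, or -EINVAL as soon
 * as a step would touch only part of a block.
 */
int model_check_leaf_range(uint64_t start, uint64_t end, uint64_t block_size)
{
	uint64_t addr = start;

	assert(start <= end);
	/* block_size must be a non-zero power of two. */
	assert(block_size && !(block_size & (block_size - 1)));

	while (addr < end) {
		uint64_t next = model_block_addr_end(addr, end, block_size);

		if (next - addr != block_size)
			return -EINVAL; /* partial block: caller must split first */
		addr = next;
	}
	return 0;
}
```

In this model, a range that starts or ends inside a block is rejected,
which is why a caller working on PMD-mapped memory would need to split at
both boundaries before changing permissions.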

[1] https://lore.kernel.org/all/20250304222018.615808-1-yang@os.amperecomputing.com/
[2] https://lore.kernel.org/linux-arm-kernel/89d0ad18-4772-4d8f-ae8a-7c48d26a927e@arm.com/

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
Forgot to carry: Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>