On 03/07/25 8:44 pm, Dev Jain wrote:
> This patch paves the path to enabling huge mappings in vmalloc space and
> the linear map by default on arm64. For this we must ensure that we can
> handle any permission games on the kernel (init_mm) pagetable. Currently,
> __change_memory_common() uses apply_to_page_range(), which does not support
> changing permissions for block mappings. We attempt to move away from this
> by using the pagewalk API, similar to what riscv does right now; however,
> it is the responsibility of the caller to ensure that we do not pass a
> range overlapping a partial block mapping or cont mapping; in such a case,
> the system must be able to support range splitting.
>
> This patch is tied to Yang Shi's attempt [1] at using huge mappings in
> the linear map when the system supports BBML2, in which case we will be
> able to split the linear map if needed without break-before-make. Thus,
> Yang's series, IIUC, will be one such user of my patch; suppose we are
> changing permissions on a range of the linear map backed by PMD
> hugepages, then the sequence of operations should look like the
> following:
>
>   split_range(start);
>   split_range(end);
>   __change_memory_common(start, end);
>
> However, this patch can be used independently of Yang's; since currently
> permission games are played only on pte mappings (because
> apply_to_page_range() supports nothing else), this patch provides the
> mechanism for enabling huge mappings for various kernel mappings such as
> the linear map and vmalloc.
>
> ---------------------
> Implementation
> ---------------------
>
> arm64 currently changes permissions on vmalloc objects locklessly via
> apply_to_page_range(), whose limitation is that it denies changing
> permissions for block mappings. Therefore, we switch to the generic
> pagewalk API, paving the path for enabling huge mappings by default on
> kernel-space mappings and leading to more efficient TLB usage.
> However, the API currently requires init_mm.mmap_lock to be held. To
> avoid the unnecessary bottleneck of the mmap_lock for our use case, this
> patch extends the generic API so it can be used locklessly, retaining the
> existing behaviour for changing permissions. Apart from this reason, it
> is noted at [2] that KFENCE can manipulate kernel pgtable entries during
> softirqs, by calling set_memory_valid() -> __change_memory_common().
> Since that is a non-sleepable context, we cannot take the init_mm mmap
> lock.
>
> Add comments to highlight the conditions under which we can use the
> lockless variant - no underlying VMA, and the user having exclusive
> control over the range, thus guaranteeing no concurrent access.
>
> We require that the start and end of a given range do not partially
> overlap block mappings or cont mappings. Return -EINVAL in case a partial
> block mapping is detected at any of the PGD/P4D/PUD/PMD levels; add a
> corresponding comment in update_range_prot() to warn that eliminating
> such a condition is the responsibility of the caller.
>
> Note that the pte-level callback may change permissions for a whole
> contpte block, and that will be done one pte at a time, as opposed to an
> atomic operation for block mappings. This is fine, as any access will
> decode either the old or the new permission until the TLBI.
>
> apply_to_page_range() currently performs all pte-level callbacks in lazy
> mmu mode. Since arm64 can optimize performance by batching barriers when
> modifying kernel pgtables in lazy mmu mode, we would like to continue to
> benefit from this optimisation. Unfortunately,
> walk_kernel_page_table_range() does not use lazy mmu mode. However,
> since the pagewalk framework does not allocate any memory, we can safely
> bracket the whole operation in lazy mmu mode ourselves. Therefore, wrap
> the call to walk_kernel_page_table_range() with the lazy MMU helpers.
>
> [1] https://lore.kernel.org/all/20250304222018.615808-1-yang@os.amperecomputing.com/
> [2] https://lore.kernel.org/linux-arm-kernel/89d0ad18-4772-4d8f-ae8a-7c48d26a927e@arm.com/
>
> Signed-off-by: Dev Jain
> ---
> Forgot to carry: Reviewed-by: Ryan Roberts