linux-mm.kvack.org archive mirror
From: Yang Shi <yang@os.amperecomputing.com>
To: Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	scott@os.amperecomputing.com, cl@gentwo.org
Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v7 0/6] arm64: support FEAT_BBM level 2 and large block mapping when rodata=full
Date: Thu, 4 Sep 2025 14:49:56 -0700
Message-ID: <39c2f841-9043-448d-b644-ac96612d520a@os.amperecomputing.com>
In-Reply-To: <bf1aa0a4-08de-443f-a1a3-aa6c05bab38c@os.amperecomputing.com>



On 9/4/25 10:47 AM, Yang Shi wrote:
>
>
> On 9/4/25 6:16 AM, Ryan Roberts wrote:
>> On 04/09/2025 14:14, Ryan Roberts wrote:
>>> On 03/09/2025 01:50, Yang Shi wrote:
>>>>>>>
>>>>>>> I am wondering whether we can just have a warn_on_once or something
>>>>>>> for the case when we fail to allocate a pagetable page. Or, Ryan had
>>>>>>> suggested in an off-the-list conversation that we can maintain a cache
>>>>>>> of PTE tables for every PMD block mapping, which will give us the same
>>>>>>> memory consumption as we do today, but not sure if this is worth it.
>>>>>>> x86 can already handle splitting but due to the callchains I have
>>>>>>> described above, it has the same problem, and the code has been working
>>>>>>> for years :)
>>>>>> I think it's preferable to avoid having to keep a cache of pgtable
>>>>>> memory if we can...
>>>>> Yes, I agree. We simply don't know how many pages we need to cache, and
>>>>> it still can't guarantee 100% allocation success.
>>>> This is wrong... We can know how many pages will be needed to split the
>>>> linear mapping to PTEs in the worst case once the linear mapping is
>>>> finalized. But it may require a few hundred megabytes of memory to
>>>> guarantee allocation success. I don't think it is worth it for such a
>>>> rare corner case.
>>> Indeed, we know exactly how much memory we need for pgtables to map the
>>> linear map by pte - that's exactly what we are doing today. So we _could_
>>> keep a cache. We would still get the benefit of improved performance but
>>> we would lose the benefit of reduced memory.
>>>
>>> I think we need to solve the vm_reset_perms() problem somehow, before we
>>> can enable this.
>> Sorry I realise this was not very clear... I am saying I think we need to
>> fix it somehow. A cache would likely work. But I'd prefer to avoid it if we
>> can find a better solution.
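
To put a rough number on the cache discussed above, here is a
back-of-the-envelope sketch in user-space C. It assumes a 4K granule with
8-byte descriptors, so one 4K PTE table covers 2M of linear map. It counts
only the last-level tables and ignores the much smaller PMD/PUD levels. The
macro and function names are made up for illustration and are not from the
series.

```c
#include <stdint.h>

/* Assumption: 4K granule, 8-byte descriptors (512 entries per table). */
#define SKETCH_PAGE_SIZE	UINT64_C(4096)
#define SKETCH_PTRS_PER_PTE	(SKETCH_PAGE_SIZE / 8)
/* One PTE table covers this much of the linear map (2M here). */
#define SKETCH_PMD_SIZE		(SKETCH_PAGE_SIZE * SKETCH_PTRS_PER_PTE)

/*
 * Bytes of last-level (PTE) tables needed to map linear_map_bytes entirely
 * by PTE. This is the worst-case size of a pre-allocated cache.
 */
uint64_t pte_table_bytes(uint64_t linear_map_bytes)
{
	/* Round up: a partially covered 2M range still needs a whole table. */
	uint64_t tables = (linear_map_bytes + SKETCH_PMD_SIZE - 1) / SKETCH_PMD_SIZE;

	return tables * SKETCH_PAGE_SIZE;
}
```

Under these assumptions the cost is 1/512 of the mapped memory, so a machine
with 256G of RAM would need 512M of PTE tables, which is in line with the
"few hundred megabytes" estimate above.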
>
> Took a deeper look at vm_reset_perms(). It was introduced by commit
> 868b104d7379 ("mm/vmalloc: Add flag for freeing of special
> permsissions"). The VM_FLUSH_RESET_PERMS flag is supposed to be set if
> the vmalloc memory is RO and/or ROX, so set_memory_ro() or
> set_memory_rox() is supposed to follow vmalloc(), and the page table
> should already be split before reaching vfree(). I think this is why
> vm_reset_perms() doesn't check the return value.
>
> I scrutinized all the callsites that set the VM_FLUSH_RESET_PERMS flag.
> Most of them are followed by set_memory_ro() or set_memory_rox(). But
> there are 3 places where I don't see set_memory_ro()/set_memory_rox()
> being called.
>
> 1. BPF trampoline allocation. The BPF trampoline calls
> arch_protect_bpf_trampoline(). The generic implementation does call
> set_memory_rox(). But the x86 and arm64 implementations simply
> return 0. For x86, it is because the execmem cache is used and it does
> call set_memory_rox(). ARM64 doesn't need to split the page table before
> this series, so it should never fail. I think we just need to use the
> generic implementation (remove the arm64 implementation) if this series
> is merged.
>
> 2. BPF dispatcher. It calls execmem_alloc(), which has
> VM_FLUSH_RESET_PERMS set. But it is used for an RW allocation, so
> VM_FLUSH_RESET_PERMS should be unnecessary IIUC. So it doesn't matter
> even if vm_reset_perms() fails.
>
> 3. kprobe. S390's alloc_insn_page() does call set_memory_rox(), and x86
> also called set_memory_rox() before switching to the execmem cache. The
> execmem cache calls set_memory_rox(). I don't know why ARM64 doesn't
> call it.
>
> So I think we just need to fix #1 and #3 per the above analysis. If
> this analysis looks correct to you guys, I will prepare two patches to
> fix them.
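
To make the reasoning quoted above concrete, here is a toy user-space model
(not kernel code). It assumes a region starts out covered by a block mapping,
that changing permissions requires splitting that block, and that the split
needs a page table allocation which can fail. All names (toy_region,
toy_split, toy_fail_next_alloc, ...) are hypothetical and exist only for this
sketch.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_region {
	bool split;	/* covering block mapping already split to PTEs */
	bool rox;	/* currently mapped read-only + executable */
};

/* When set, the next split fails as if the page table page allocation failed. */
bool toy_fail_next_alloc;

static int toy_split(struct toy_region *r)
{
	if (r->split)
		return 0;		/* already PTE-mapped: no allocation needed */
	if (toy_fail_next_alloc) {
		toy_fail_next_alloc = false;
		return -ENOMEM;
	}
	r->split = true;
	return 0;
}

/* Models set_memory_rox() right after allocation: may have to split. */
int toy_set_memory_rox(struct toy_region *r)
{
	int ret = toy_split(r);

	if (ret)
		return ret;
	r->rox = true;
	return 0;
}

/* Models the free-time permission reset done for VM_FLUSH_RESET_PERMS. */
int toy_reset_perms(struct toy_region *r)
{
	int ret = toy_split(r);

	if (ret)
		return ret;
	r->rox = false;
	return 0;
}
```

In this model, a region that went through toy_set_memory_rox() can always be
reset at free time, because the split already happened. A region that skipped
it (the arm64 BPF trampoline and kprobe cases above) can hit the allocation
failure in toy_reset_perms(), where nobody looks at the return value.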

Tested the below patch with bpftrace kfunc (allocates a bpf trampoline) and
kprobes. It seems to work well.

diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index 0c5d408afd95..c4f8c4750f1e 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -10,6 +10,7 @@

  #define pr_fmt(fmt) "kprobes: " fmt

+#include <linux/execmem.h>
  #include <linux/extable.h>
  #include <linux/kasan.h>
  #include <linux/kernel.h>
@@ -41,6 +42,17 @@ DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
  static void __kprobes
  post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *);

+void *alloc_insn_page(void)
+{
+       void *page;
+
+       page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
+       if (!page)
+               return NULL;
+       set_memory_rox((unsigned long)page, 1);
+       return page;
+}
+
  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
  {
         kprobe_opcode_t *addr = p->ainsn.xol_insn;
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 52ffe115a8c4..3e301bc2cd66 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -2717,11 +2717,6 @@ void arch_free_bpf_trampoline(void *image, unsigned int size)
         bpf_prog_pack_free(image, size);
  }

-int arch_protect_bpf_trampoline(void *image, unsigned int size)
-{
-       return 0;
-}
-
  int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
                                 void *ro_image_end, const struct btf_func_model *m,
                                 u32 flags, struct bpf_tramp_links *tlinks,
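
Note that alloc_insn_page() in the patch above ignores the return value of
set_memory_rox(). A possible checked variant is sketched below as a
user-space model, not as part of the tested patch. The *_stub functions and
the stub_* variables are stand-ins with simplified signatures for
execmem_alloc(), execmem_free() and set_memory_rox(); they are assumptions
made so the example builds outside the kernel.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_INSN_PAGE_SIZE	4096

int stub_rox_result;	/* value the stubbed set_memory_rox() returns */
int stub_live_pages;	/* pages allocated and not yet freed */

/* Stand-in for execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE). */
static void *execmem_alloc_stub(size_t size)
{
	void *p = malloc(size);

	if (p)
		stub_live_pages++;
	return p;
}

/* Stand-in for execmem_free(). */
static void execmem_free_stub(void *p)
{
	free(p);
	stub_live_pages--;
}

/* Stand-in for set_memory_rox(); only reports the configured result. */
static int set_memory_rox_stub(void *addr, int numpages)
{
	(void)addr;
	(void)numpages;
	return stub_rox_result;
}

/* Return NULL instead of a page whose permissions could not be changed. */
void *alloc_insn_page_checked(void)
{
	void *page = execmem_alloc_stub(SKETCH_INSN_PAGE_SIZE);

	if (!page)
		return NULL;
	if (set_memory_rox_stub(page, 1)) {
		execmem_free_stub(page);
		return NULL;
	}
	return page;
}
```

With this shape, a failed permission change is reported to the caller as an
allocation failure instead of handing back a page that is still writable.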


>
> Thanks,
> Yang
>
>>
>>
>>> Thanks,
>>> Ryan
>>>
>>>> Thanks,
>>>> Yang
>>>>
>>>>> Thanks,
>>>>> Yang
>>>>>
>>>>>> Thanks,
>>>>>> Ryan
>>>>>>
>>>>>>
>


