linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC PATCH 1/1] mm/vmalloc: Modify permission reset procedure to avoid invalid access
       [not found] <20240611131301.2988047-1-ycliang@andestech.com>
@ 2024-06-11 14:21 ` Edgecombe, Rick P
  2024-06-18 12:08   ` Leo Liang
  0 siblings, 1 reply; 3+ messages in thread
From: Edgecombe, Rick P @ 2024-06-11 14:21 UTC (permalink / raw)
  To: lstoakes, akpm, linux-mm, hch, linux-kernel, bpf, urezki, ycliang; +Cc: patrick

On Tue, 2024-06-11 at 21:13 +0800, Leo Yu-Chi Liang wrote:
> The previous reset procedure is
> 1. Set direct map attribute to invalid
> 2. Flush TLB
> 3. Reset direct map attribute to default
> 
> It is possible that kernel forks another process
> on another core that access the invalid mappings after
> sync_kernel_mappings.
> 
> We could reproduce this scenario by running LTP/bpf_prog
> multiple times on RV32 kernel on QEMU.
> 
> Therefore, the following procedure is proposed
> to avoid mappings being invalid.
> 1. Reset direct map attribute to default
> 2. Flush TLB

Can you explain more about what is happening in this scenario? Looking briefly,
riscv is doing something unique around sync_kernel_mappings(). If a RO mapping
is copied instead of a NP/invalid mapping, how is the problem avoided?

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC PATCH 1/1] mm/vmalloc: Modify permission reset procedure to avoid invalid access
  2024-06-11 14:21 ` [RFC PATCH 1/1] mm/vmalloc: Modify permission reset procedure to avoid invalid access Edgecombe, Rick P
@ 2024-06-18 12:08   ` Leo Liang
  2024-06-18 16:08     ` Edgecombe, Rick P
  0 siblings, 1 reply; 3+ messages in thread
From: Leo Liang @ 2024-06-18 12:08 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: lstoakes, akpm, linux-mm, hch, linux-kernel, bpf, urezki, patrick

On Tue, Jun 11, 2024 at 02:21:42PM +0000, Edgecombe, Rick P wrote:
> [EXTERNAL MAIL]
> 
> On Tue, 2024-06-11 at 21:13 +0800, Leo Yu-Chi Liang wrote:
> > The previous reset procedure is
> > 1. Set direct map attribute to invalid
> > 2. Flush TLB
> > 3. Reset direct map attribute to default
> >
> > It is possible that kernel forks another process
> > on another core that access the invalid mappings after
> > sync_kernel_mappings.
> >
> > We could reproduce this scenario by running LTP/bpf_prog
> > multiple times on RV32 kernel on QEMU.
> >
> > Therefore, the following procedure is proposed
> > to avoid mappings being invalid.
> > 1. Reset direct map attribute to default
> > 2. Flush TLB
> 
> Can you explain more about what is happening in this scenario? Looking briefly,
> riscv is doing something unique around sync_kernel_mappings(). If a RO mapping
> is copied instead of a NP/invalid mapping, how is the problem avoided?

Hi Edgecombe,

Sorry for the late reply and thank you for taking a look.

What we are seeing at first is that running LTP bpf_prog03 test fails randomly
on RV32 SMP QEMU with kernel 6.1 and the failed cause is a load page fault.

After a bit of inspection, we found that the faulting page is a part of
kernel's page table and the valid bit of that page's PTE is cleared due to this reset procedure.

The scenario of this fault is suspected to be the following:
1. Running bpf_prog03: Creates kernel pages with elevated 'X' permission so that bpf program can be executed.
2. Finishing bpf_prog03: vfree code path to reset permission to default: 
	a. Set the pages to invalid first
	b. Unmap the pages and flush TLB
	c. Reset them to default permission
3. Other core forkes new processes: sync_kernel_mappings copies the kernel page table.

If the 3rd step happens during 2a, then we get a kernel mapping with invalid PTE permission,
Therefore, if the invalid page is accessed, we'd get a page fault exception and the kernel panics.

But despite all of the above conjecture,
we still are wondering if setting the mappings to be invalid first is necessary.
IMHO, "set to invalid --> unmap & flush TLB --> set to default" is identical to "set to default --> unmap & flush TLB".
Could we not just reset them to default first and then flush TLB & free memory?

Best regards,
Leo


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC PATCH 1/1] mm/vmalloc: Modify permission reset procedure to avoid invalid access
  2024-06-18 12:08   ` Leo Liang
@ 2024-06-18 16:08     ` Edgecombe, Rick P
  0 siblings, 0 replies; 3+ messages in thread
From: Edgecombe, Rick P @ 2024-06-18 16:08 UTC (permalink / raw)
  To: ycliang; +Cc: lstoakes, akpm, linux-mm, patrick, hch, linux-kernel, bpf, urezki

On Tue, 2024-06-18 at 20:08 +0800, Leo Liang wrote:
> What we are seeing at first is that running LTP bpf_prog03 test fails randomly
> on RV32 SMP QEMU with kernel 6.1 and the failed cause is a load page fault.
> 
> After a bit of inspection, we found that the faulting page is a part of
> kernel's page table and the valid bit of that page's PTE is cleared due to this reset procedure.
> 
> The scenario of this fault is suspected to be the following:
> 1. Running bpf_prog03: Creates kernel pages with elevated 'X' permission so that bpf program can
> be executed.
> 2. Finishing bpf_prog03: vfree code path to reset permission to default: 
>         a. Set the pages to invalid first
>         b. Unmap the pages and flush TLB
>         c. Reset them to default permission
> 3. Other core forkes new processes: sync_kernel_mappings copies the kernel page table.
> 
> If the 3rd step happens during 2a, then we get a kernel mapping with invalid PTE permission,
> Therefore, if the invalid page is accessed, we'd get a page fault exception and the kernel panics.

So some other non-BPF related access takes the #PF I guess? How is this not a generic problem with
kernel memory permissions? There are other set_direct_map() callers.

Perhaps 32 bit riscv should not support the set_direct_map() functions if the implementation is
problematic like this. As in, something like:
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index be09c8836d56..125ba87d3c9d 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -34,7 +34,7 @@ config RISCV
        select ARCH_HAS_PMEM_API
        select ARCH_HAS_PREPARE_SYNC_CORE_CMD
        select ARCH_HAS_PTE_SPECIAL
-       select ARCH_HAS_SET_DIRECT_MAP if MMU
+       select ARCH_HAS_SET_DIRECT_MAP if MMU && 64BIT
        select ARCH_HAS_SET_MEMORY if MMU
        select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
        select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL


> 
> But despite all of the above conjecture,
> we still are wondering if setting the mappings to be invalid first is necessary.
> IMHO, "set to invalid --> unmap & flush TLB --> set to default" is identical to "set to default --
> > unmap & flush TLB".
> Could we not just reset them to default first and then flush TLB & free memory?

It is trying to reset the permissions from ROX to RW without leaving a transient writable alias
while there remains an executable one, and doing it all with a single flush.

On x86 and arm, the JIT pages will have a direct map alias that is read only and a module address is
RO+X. When cleaning up, the NP PTEs can prevent any writable TLBs from being created before the
executable ones are flushed.

See here for the original discussion:
https://lore.kernel.org/lkml/20181128000754.18056-1-rick.p.edgecombe@intel.com/

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-06-18 16:08 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20240611131301.2988047-1-ycliang@andestech.com>
2024-06-11 14:21 ` [RFC PATCH 1/1] mm/vmalloc: Modify permission reset procedure to avoid invalid access Edgecombe, Rick P
2024-06-18 12:08   ` Leo Liang
2024-06-18 16:08     ` Edgecombe, Rick P

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox