* [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash)
@ 2024-10-18 17:29 Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 01/12] powerpc: mm/fault: Fix kfence page fault reporting Ritesh Harjani (IBM)
` (12 more replies)
0 siblings, 13 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
v2 -> v3:
============
1. Addressed review comments from Christophe in patch-1: check for
is_kfence_address() before doing the search in the exception tables.
(Thanks for the review!)
2. Separated out the kfence kunit test patch (patch-1 of v2), which will need a
separate tree for inclusion and review from the kfence/kasan folks.
[v2]: https://lore.kernel.org/linuxppc-dev/cover.1728954719.git.ritesh.list@gmail.com/
Not much has changed from the last revision. I wanted to split this series up
and drop the RFC tag so that it starts to look ready for inclusion before the
merge window opens for powerpc-next testing.
Kindly let me know if anything is needed for this.
-ritesh
Summary:
==========
This patch series addresses the following to improve kfence support on powerpc:
1. Usage of copy_from_kernel_nofault() within the kernel, such as a read from
/proc/kcore, can cause kfence to report false positives.
This is similar to what was reported on s390. [1]
[1]: https://lore.kernel.org/all/20230213183858.1473681-1-hca@linux.ibm.com/
Patch-1 thus adds a fix to handle this case in ___do_page_fault() for
powerpc.
2. (book3s64) Kfence depends upon the debug_pagealloc infrastructure on Hash.
debug_pagealloc allocates a linear map based on the size of the DRAM, i.e.
1 byte for every 64K page. That means for 16TB of DRAM, it will need 256MB
of memory for the linear map. Memory for the linear map on pseries comes from
the RMA region, which has a size limitation. On P8 the RMA is 512MB, in which we
also have to fit the crash kernel at 256MB, paca allocations and emergency stacks.
That means there is not enough memory in the RMA region for a linear map
sized on DRAM (as required by debug_pagealloc).
Now kfence only requires memory for its kfence objects. kfence by default
requires only (255 + 1) * 2 pages, i.e. 32 MB with a 64K page size.
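Rough numbers for comparison (assuming the default CONFIG_KFENCE_NUM_OBJECTS=255
and a 64K page size):

  debug_pagealloc linear map: (16TB >> 16) entries * 1 byte = 256MB (must come from the RMA)
  kfence pool:                (255 + 1) * 2 pages * 64KB    = 32MB  (can come from anywhere)
  kfence linear map slots:    (32MB >> 16) entries * 1 byte = 512 bytes (easily fits in the RMA)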
Summary of patches
==================
Patch-1 adds a fix to handle these false positives from copy_from_kernel_nofault().
Patches [2-8] remove the direct dependency of kfence on the debug_pagealloc
infrastructure. The Hash kernel linear map functions are made to take the linear
map array as a parameter, so that they can support debug_pagealloc and kfence
individually. That means the linear map for kfence no longer needs to be sized as
DRAM_SIZE >> PAGE_SHIFT.
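Concretely, after the refactoring (patch-7) the low-level map/unmap helpers take
the slot array and its lock as arguments, so debug_pagealloc and kfence can each
pass their own (signatures as in the patches below):

  static void kernel_map_linear_page(unsigned long vaddr, unsigned long idx,
                                     u8 *slots, raw_spinlock_t *lock);
  static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long idx,
                                       u8 *slots, raw_spinlock_t *lock);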
Patch-9 adds kfence support on top of the above (abstracted out) kernel linear map
infrastructure. With it, this also fixes the boot failure problem seen when kfence
gets enabled on Hash with >= 16TB of RAM.
Patch-10 & Patch-11 ensure that late initialization of kfence is disabled for both
Hash and Radix due to linear mapping size limitations. The commit messages give
more details.
Patch-12 detects early if debug_pagealloc cannot be enabled (due to the RMA size
limitation) so that the linear mapping size can be set correctly during init.
Testing:
========
It passes kfence kunit tests with Hash and Radix.
[ 44.355173][ T1] # kfence: pass:27 fail:0 skip:0 total:27
[ 44.358631][ T1] # Totals: pass:27 fail:0 skip:0 total:27
[ 44.365570][ T1] ok 1 kfence
Future TODO:
============
When kfence is enabled on Hash, the kernel linear map uses PAGE_SIZE mappings
rather than 16M mappings. This should be improved in the future.
v1 -> v2:
=========
1. Added a kunit test case as patch-1.
2. Fixed a false positive with copy_from_kernel_nofault() in patch-2.
3. Addressed review comments from Christophe Leroy.
4. Added patch-13.
Ritesh Harjani (IBM) (12):
powerpc: mm/fault: Fix kfence page fault reporting
book3s64/hash: Remove kfence support temporarily
book3s64/hash: Refactor kernel linear map related calls
book3s64/hash: Add hash_debug_pagealloc_add_slot() function
book3s64/hash: Add hash_debug_pagealloc_alloc_slots() function
book3s64/hash: Refactor hash__kernel_map_pages() function
book3s64/hash: Make kernel_map_linear_page() generic
book3s64/hash: Disable debug_pagealloc if it requires more memory
book3s64/hash: Add kfence functionality
book3s64/radix: Refactoring common kfence related functions
book3s64/hash: Disable kfence if not early init
book3s64/hash: Early detect debug_pagealloc size requirement
arch/powerpc/include/asm/kfence.h | 8 +-
arch/powerpc/mm/book3s64/hash_utils.c | 364 +++++++++++++++++------
arch/powerpc/mm/book3s64/pgtable.c | 13 +
arch/powerpc/mm/book3s64/radix_pgtable.c | 12 -
arch/powerpc/mm/fault.c | 11 +-
arch/powerpc/mm/init-common.c | 1 +
6 files changed, 301 insertions(+), 108 deletions(-)
--
2.46.0
* [PATCH v3 01/12] powerpc: mm/fault: Fix kfence page fault reporting
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-10-18 17:40 ` Christophe Leroy
2024-10-22 2:42 ` Michael Ellerman
2024-10-18 17:29 ` [PATCH v3 02/12] book3s64/hash: Remove kfence support temporarily Ritesh Harjani (IBM)
` (11 subsequent siblings)
12 siblings, 2 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM),
Disha Goel
copy_from_kernel_nofault() can be called when doing a read of /proc/kcore.
/proc/kcore can have some unmapped kfence objects, which, when read via
copy_from_kernel_nofault(), can cause page faults. Since the *_nofault()
functions define their own fixup table for handling faults, use that
instead of asking kfence to handle such faults.
Hence we search the exception tables for the nip which generated the
fault. If there is an entry then we let the fixup table handler handle the
page fault by returning an error from within ___do_page_fault().
This can be easily triggered if someone tries to do dd from /proc/kcore.
dd if=/proc/kcore of=/dev/null bs=1M
<some example false positives>
===============================
BUG: KFENCE: invalid read in copy_from_kernel_nofault+0xb0/0x1c8
Invalid read at 0x000000004f749d2e:
copy_from_kernel_nofault+0xb0/0x1c8
0xc0000000057f7950
read_kcore_iter+0x41c/0x9ac
proc_reg_read_iter+0xe4/0x16c
vfs_read+0x2e4/0x3b0
ksys_read+0x88/0x154
system_call_exception+0x124/0x340
system_call_common+0x160/0x2c4
BUG: KFENCE: use-after-free read in copy_from_kernel_nofault+0xb0/0x1c8
Use-after-free read at 0x000000008fbb08ad (in kfence-#0):
copy_from_kernel_nofault+0xb0/0x1c8
0xc0000000057f7950
read_kcore_iter+0x41c/0x9ac
proc_reg_read_iter+0xe4/0x16c
vfs_read+0x2e4/0x3b0
ksys_read+0x88/0x154
system_call_exception+0x124/0x340
system_call_common+0x160/0x2c4
Fixes: 90cbac0e995d ("powerpc: Enable KFENCE for PPC32")
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reported-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/fault.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 81c77ddce2e3..316f5162ffc4 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -439,10 +439,17 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
/*
* The kernel should never take an execute fault nor should it
* take a page fault to a kernel address or a page fault to a user
- * address outside of dedicated places
+ * address outside of dedicated places.
+ *
+ * Rather than kfence directly reporting false negatives, search whether
+ * the NIP belongs to the fixup table for cases where fault could come
+ * from functions like copy_from_kernel_nofault().
*/
if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) {
- if (kfence_handle_page_fault(address, is_write, regs))
+
+ if (is_kfence_address((void *)address) &&
+ !search_exception_tables(instruction_pointer(regs)) &&
+ kfence_handle_page_fault(address, is_write, regs))
return 0;
return SIGSEGV;
--
2.46.0
* [PATCH v3 02/12] book3s64/hash: Remove kfence support temporarily
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 01/12] powerpc: mm/fault: Fix kfence page fault reporting Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 03/12] book3s64/hash: Refactor kernel linear map related calls Ritesh Harjani (IBM)
` (10 subsequent siblings)
12 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Kfence on book3s Hash on pseries is anyway broken. It fails to boot
due to the RMA size limitation. That is because kfence with Hash uses the
debug_pagealloc infrastructure. debug_pagealloc allocates a linear map
for the entire DRAM size instead of just the kfence-relevant objects.
This means for 16TB of DRAM it will require (16TB >> PAGE_SHIFT) bytes,
which is 256MB, i.e. half of the RMA region on P8.
The crash kernel reserves 256MB, and we also need 2048 * 16KB * 3 for the
emergency stacks and some more for paca allocations.
That means there is not enough memory for reserving the full linear map
in the RMA region if the DRAM size is too big (>= 16TB)
(the issue is seen above 8TB with a 256MB crash kernel reservation).
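A rough budget with the numbers above (the emergency stack figure presumably
being 2048 CPUs with three 16KB stacks each):

  RMA on P8                          :  512MB
  crash kernel reservation           : -256MB
  emergency stacks (2048 * 16KB * 3) :  -96MB
  debug_pagealloc linear map (16TB)  : -256MB
  -------------------------------------------
  => ~96MB over budget, before paca and other early allocations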
Now kfence does not require a linear map for the entire DRAM;
it only needs one for the kfence objects. So this patch temporarily removes the
kfence functionality on Hash, since the debug_pagealloc code needs some refactoring.
We will bring back kfence support on Hash in later patches.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/include/asm/kfence.h | 5 +++++
arch/powerpc/mm/book3s64/hash_utils.c | 16 +++++++++++-----
2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
index fab124ada1c7..f3a9476a71b3 100644
--- a/arch/powerpc/include/asm/kfence.h
+++ b/arch/powerpc/include/asm/kfence.h
@@ -10,6 +10,7 @@
#include <linux/mm.h>
#include <asm/pgtable.h>
+#include <asm/mmu.h>
#ifdef CONFIG_PPC64_ELF_ABI_V1
#define ARCH_FUNC_PREFIX "."
@@ -25,6 +26,10 @@ static inline void disable_kfence(void)
static inline bool arch_kfence_init_pool(void)
{
+#ifdef CONFIG_PPC64
+ if (!radix_enabled())
+ return false;
+#endif
return !kfence_disabled;
}
#endif
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index a408ef7d850e..e22a8f540193 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -431,7 +431,7 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
break;
cond_resched();
- if (debug_pagealloc_enabled_or_kfence() &&
+ if (debug_pagealloc_enabled() &&
(paddr >> PAGE_SHIFT) < linear_map_hash_count)
linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
}
@@ -814,7 +814,7 @@ static void __init htab_init_page_sizes(void)
bool aligned = true;
init_hpte_page_sizes();
- if (!debug_pagealloc_enabled_or_kfence()) {
+ if (!debug_pagealloc_enabled()) {
/*
* Pick a size for the linear mapping. Currently, we only
* support 16M, 1M and 4K which is the default
@@ -1134,7 +1134,7 @@ static void __init htab_initialize(void)
prot = pgprot_val(PAGE_KERNEL);
- if (debug_pagealloc_enabled_or_kfence()) {
+ if (debug_pagealloc_enabled()) {
linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
linear_map_hash_slots = memblock_alloc_try_nid(
linear_map_hash_count, 1, MEMBLOCK_LOW_LIMIT,
@@ -2120,7 +2120,7 @@ void hpt_do_stress(unsigned long ea, unsigned long hpte_group)
}
}
-#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
+#ifdef CONFIG_DEBUG_PAGEALLOC
static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
@@ -2194,7 +2194,13 @@ int hash__kernel_map_pages(struct page *page, int numpages, int enable)
local_irq_restore(flags);
return 0;
}
-#endif /* CONFIG_DEBUG_PAGEALLOC || CONFIG_KFENCE */
+#else /* CONFIG_DEBUG_PAGEALLOC */
+int hash__kernel_map_pages(struct page *page, int numpages,
+ int enable)
+{
+ return 0;
+}
+#endif /* CONFIG_DEBUG_PAGEALLOC */
void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
phys_addr_t first_memblock_size)
--
2.46.0
* [PATCH v3 03/12] book3s64/hash: Refactor kernel linear map related calls
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 01/12] powerpc: mm/fault: Fix kfence page fault reporting Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 02/12] book3s64/hash: Remove kfence support temporarily Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 04/12] book3s64/hash: Add hash_debug_pagealloc_add_slot() function Ritesh Harjani (IBM)
` (9 subsequent siblings)
12 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
This just brings all the linear map related handling to one place instead of
having those functions scattered throughout the hash_utils file.
It makes the review easier.
No functional changes in this patch.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 164 +++++++++++++-------------
1 file changed, 82 insertions(+), 82 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index e22a8f540193..fb2f717e9e74 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -273,6 +273,88 @@ void hash__tlbiel_all(unsigned int action)
WARN(1, "%s called on pre-POWER7 CPU\n", __func__);
}
+#ifdef CONFIG_DEBUG_PAGEALLOC
+static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
+
+static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
+{
+ unsigned long hash;
+ unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
+ unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
+ unsigned long mode = htab_convert_pte_flags(pgprot_val(PAGE_KERNEL), HPTE_USE_KERNEL_KEY);
+ long ret;
+
+ hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
+
+ /* Don't create HPTE entries for bad address */
+ if (!vsid)
+ return;
+
+ if (linear_map_hash_slots[lmi] & 0x80)
+ return;
+
+ ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode,
+ HPTE_V_BOLTED,
+ mmu_linear_psize, mmu_kernel_ssize);
+
+ BUG_ON (ret < 0);
+ raw_spin_lock(&linear_map_hash_lock);
+ BUG_ON(linear_map_hash_slots[lmi] & 0x80);
+ linear_map_hash_slots[lmi] = ret | 0x80;
+ raw_spin_unlock(&linear_map_hash_lock);
+}
+
+static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
+{
+ unsigned long hash, hidx, slot;
+ unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
+ unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
+
+ hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
+ raw_spin_lock(&linear_map_hash_lock);
+ if (!(linear_map_hash_slots[lmi] & 0x80)) {
+ raw_spin_unlock(&linear_map_hash_lock);
+ return;
+ }
+ hidx = linear_map_hash_slots[lmi] & 0x7f;
+ linear_map_hash_slots[lmi] = 0;
+ raw_spin_unlock(&linear_map_hash_lock);
+ if (hidx & _PTEIDX_SECONDARY)
+ hash = ~hash;
+ slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+ slot += hidx & _PTEIDX_GROUP_IX;
+ mmu_hash_ops.hpte_invalidate(slot, vpn, mmu_linear_psize,
+ mmu_linear_psize,
+ mmu_kernel_ssize, 0);
+}
+
+int hash__kernel_map_pages(struct page *page, int numpages, int enable)
+{
+ unsigned long flags, vaddr, lmi;
+ int i;
+
+ local_irq_save(flags);
+ for (i = 0; i < numpages; i++, page++) {
+ vaddr = (unsigned long)page_address(page);
+ lmi = __pa(vaddr) >> PAGE_SHIFT;
+ if (lmi >= linear_map_hash_count)
+ continue;
+ if (enable)
+ kernel_map_linear_page(vaddr, lmi);
+ else
+ kernel_unmap_linear_page(vaddr, lmi);
+ }
+ local_irq_restore(flags);
+ return 0;
+}
+#else /* CONFIG_DEBUG_PAGEALLOC */
+int hash__kernel_map_pages(struct page *page, int numpages,
+ int enable)
+{
+ return 0;
+}
+#endif /* CONFIG_DEBUG_PAGEALLOC */
+
/*
* 'R' and 'C' update notes:
* - Under pHyp or KVM, the updatepp path will not set C, thus it *will*
@@ -2120,88 +2202,6 @@ void hpt_do_stress(unsigned long ea, unsigned long hpte_group)
}
}
-#ifdef CONFIG_DEBUG_PAGEALLOC
-static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
-
-static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
-{
- unsigned long hash;
- unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
- unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
- unsigned long mode = htab_convert_pte_flags(pgprot_val(PAGE_KERNEL), HPTE_USE_KERNEL_KEY);
- long ret;
-
- hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
-
- /* Don't create HPTE entries for bad address */
- if (!vsid)
- return;
-
- if (linear_map_hash_slots[lmi] & 0x80)
- return;
-
- ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode,
- HPTE_V_BOLTED,
- mmu_linear_psize, mmu_kernel_ssize);
-
- BUG_ON (ret < 0);
- raw_spin_lock(&linear_map_hash_lock);
- BUG_ON(linear_map_hash_slots[lmi] & 0x80);
- linear_map_hash_slots[lmi] = ret | 0x80;
- raw_spin_unlock(&linear_map_hash_lock);
-}
-
-static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
-{
- unsigned long hash, hidx, slot;
- unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
- unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
-
- hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
- raw_spin_lock(&linear_map_hash_lock);
- if (!(linear_map_hash_slots[lmi] & 0x80)) {
- raw_spin_unlock(&linear_map_hash_lock);
- return;
- }
- hidx = linear_map_hash_slots[lmi] & 0x7f;
- linear_map_hash_slots[lmi] = 0;
- raw_spin_unlock(&linear_map_hash_lock);
- if (hidx & _PTEIDX_SECONDARY)
- hash = ~hash;
- slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
- slot += hidx & _PTEIDX_GROUP_IX;
- mmu_hash_ops.hpte_invalidate(slot, vpn, mmu_linear_psize,
- mmu_linear_psize,
- mmu_kernel_ssize, 0);
-}
-
-int hash__kernel_map_pages(struct page *page, int numpages, int enable)
-{
- unsigned long flags, vaddr, lmi;
- int i;
-
- local_irq_save(flags);
- for (i = 0; i < numpages; i++, page++) {
- vaddr = (unsigned long)page_address(page);
- lmi = __pa(vaddr) >> PAGE_SHIFT;
- if (lmi >= linear_map_hash_count)
- continue;
- if (enable)
- kernel_map_linear_page(vaddr, lmi);
- else
- kernel_unmap_linear_page(vaddr, lmi);
- }
- local_irq_restore(flags);
- return 0;
-}
-#else /* CONFIG_DEBUG_PAGEALLOC */
-int hash__kernel_map_pages(struct page *page, int numpages,
- int enable)
-{
- return 0;
-}
-#endif /* CONFIG_DEBUG_PAGEALLOC */
-
void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
phys_addr_t first_memblock_size)
{
--
2.46.0
* [PATCH v3 04/12] book3s64/hash: Add hash_debug_pagealloc_add_slot() function
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
` (2 preceding siblings ...)
2024-10-18 17:29 ` [PATCH v3 03/12] book3s64/hash: Refactor kernel linear map related calls Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 05/12] book3s64/hash: Add hash_debug_pagealloc_alloc_slots() function Ritesh Harjani (IBM)
` (8 subsequent siblings)
12 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
This adds a hash_debug_pagealloc_add_slot() function instead of open
coding that in htab_bolt_mapping(). This is required since we will be
separating the kfence functionality so that it does not depend upon
debug_pagealloc.
No functional change in this patch.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index fb2f717e9e74..de3cabd66812 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -328,6 +328,14 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
mmu_kernel_ssize, 0);
}
+static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot)
+{
+ if (!debug_pagealloc_enabled())
+ return;
+ if ((paddr >> PAGE_SHIFT) < linear_map_hash_count)
+ linear_map_hash_slots[paddr >> PAGE_SHIFT] = slot | 0x80;
+}
+
int hash__kernel_map_pages(struct page *page, int numpages, int enable)
{
unsigned long flags, vaddr, lmi;
@@ -353,6 +361,7 @@ int hash__kernel_map_pages(struct page *page, int numpages,
{
return 0;
}
+static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot) {}
#endif /* CONFIG_DEBUG_PAGEALLOC */
/*
@@ -513,9 +522,7 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
break;
cond_resched();
- if (debug_pagealloc_enabled() &&
- (paddr >> PAGE_SHIFT) < linear_map_hash_count)
- linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
+ hash_debug_pagealloc_add_slot(paddr, ret);
}
return ret < 0 ? ret : 0;
}
--
2.46.0
* [PATCH v3 05/12] book3s64/hash: Add hash_debug_pagealloc_alloc_slots() function
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
` (3 preceding siblings ...)
2024-10-18 17:29 ` [PATCH v3 04/12] book3s64/hash: Add hash_debug_pagealloc_add_slot() function Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 06/12] book3s64/hash: Refactor hash__kernel_map_pages() function Ritesh Harjani (IBM)
` (7 subsequent siblings)
12 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
This adds a hash_debug_pagealloc_alloc_slots() function instead of open
coding that in htab_initialize(). This is required since we will be
separating the kfence functionality so that it does not depend upon
debug_pagealloc.
Now that everything required for debug_pagealloc is under an #ifdef
config, bring the linear_map_hash_slots and linear_map_hash_count
variables under the same config too.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 29 ++++++++++++++++-----------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index de3cabd66812..0b63acf62d1d 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -123,8 +123,6 @@ EXPORT_SYMBOL_GPL(mmu_slb_size);
#ifdef CONFIG_PPC_64K_PAGES
int mmu_ci_restrictions;
#endif
-static u8 *linear_map_hash_slots;
-static unsigned long linear_map_hash_count;
struct mmu_hash_ops mmu_hash_ops __ro_after_init;
EXPORT_SYMBOL(mmu_hash_ops);
@@ -274,6 +272,8 @@ void hash__tlbiel_all(unsigned int action)
}
#ifdef CONFIG_DEBUG_PAGEALLOC
+static u8 *linear_map_hash_slots;
+static unsigned long linear_map_hash_count;
static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
@@ -328,6 +328,19 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
mmu_kernel_ssize, 0);
}
+static inline void hash_debug_pagealloc_alloc_slots(void)
+{
+ if (!debug_pagealloc_enabled())
+ return;
+ linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
+ linear_map_hash_slots = memblock_alloc_try_nid(
+ linear_map_hash_count, 1, MEMBLOCK_LOW_LIMIT,
+ ppc64_rma_size, NUMA_NO_NODE);
+ if (!linear_map_hash_slots)
+ panic("%s: Failed to allocate %lu bytes max_addr=%pa\n",
+ __func__, linear_map_hash_count, &ppc64_rma_size);
+}
+
static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot)
{
if (!debug_pagealloc_enabled())
@@ -361,6 +374,7 @@ int hash__kernel_map_pages(struct page *page, int numpages,
{
return 0;
}
+static inline void hash_debug_pagealloc_alloc_slots(void) {}
static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot) {}
#endif /* CONFIG_DEBUG_PAGEALLOC */
@@ -1223,16 +1237,7 @@ static void __init htab_initialize(void)
prot = pgprot_val(PAGE_KERNEL);
- if (debug_pagealloc_enabled()) {
- linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
- linear_map_hash_slots = memblock_alloc_try_nid(
- linear_map_hash_count, 1, MEMBLOCK_LOW_LIMIT,
- ppc64_rma_size, NUMA_NO_NODE);
- if (!linear_map_hash_slots)
- panic("%s: Failed to allocate %lu bytes max_addr=%pa\n",
- __func__, linear_map_hash_count, &ppc64_rma_size);
- }
-
+ hash_debug_pagealloc_alloc_slots();
/* create bolted the linear mapping in the hash table */
for_each_mem_range(i, &base, &end) {
size = end - base;
--
2.46.0
* [PATCH v3 06/12] book3s64/hash: Refactor hash__kernel_map_pages() function
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
` (4 preceding siblings ...)
2024-10-18 17:29 ` [PATCH v3 05/12] book3s64/hash: Add hash_debug_pagealloc_alloc_slots() function Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 07/12] book3s64/hash: Make kernel_map_linear_page() generic Ritesh Harjani (IBM)
` (6 subsequent siblings)
12 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
This refactors the hash__kernel_map_pages() function to call
hash_debug_pagealloc_map_pages(). This will come in useful when we add
kfence support.
No functional changes in this patch.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 0b63acf62d1d..ab50bb33a390 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -349,7 +349,8 @@ static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot)
linear_map_hash_slots[paddr >> PAGE_SHIFT] = slot | 0x80;
}
-int hash__kernel_map_pages(struct page *page, int numpages, int enable)
+static int hash_debug_pagealloc_map_pages(struct page *page, int numpages,
+ int enable)
{
unsigned long flags, vaddr, lmi;
int i;
@@ -368,6 +369,12 @@ int hash__kernel_map_pages(struct page *page, int numpages, int enable)
local_irq_restore(flags);
return 0;
}
+
+int hash__kernel_map_pages(struct page *page, int numpages, int enable)
+{
+ return hash_debug_pagealloc_map_pages(page, numpages, enable);
+}
+
#else /* CONFIG_DEBUG_PAGEALLOC */
int hash__kernel_map_pages(struct page *page, int numpages,
int enable)
--
2.46.0
* [PATCH v3 07/12] book3s64/hash: Make kernel_map_linear_page() generic
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
` (5 preceding siblings ...)
2024-10-18 17:29 ` [PATCH v3 06/12] book3s64/hash: Refactor hash__kernel_map_pages() function Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 08/12] book3s64/hash: Disable debug_pagealloc if it requires more memory Ritesh Harjani (IBM)
` (5 subsequent siblings)
12 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Currently the kernel_map_linear_page() function assumes it is working on the
linear_map_hash_slots array. But since later patches need a
separate linear map array for kfence, make
kernel_map_linear_page() take the linear map array and lock as
function arguments.
This is needed to separate out kfence from the debug_pagealloc
infrastructure.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 47 ++++++++++++++-------------
1 file changed, 25 insertions(+), 22 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index ab50bb33a390..11975a2f7403 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -272,11 +272,8 @@ void hash__tlbiel_all(unsigned int action)
}
#ifdef CONFIG_DEBUG_PAGEALLOC
-static u8 *linear_map_hash_slots;
-static unsigned long linear_map_hash_count;
-static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
-
-static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
+static void kernel_map_linear_page(unsigned long vaddr, unsigned long idx,
+ u8 *slots, raw_spinlock_t *lock)
{
unsigned long hash;
unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
@@ -290,7 +287,7 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
if (!vsid)
return;
- if (linear_map_hash_slots[lmi] & 0x80)
+ if (slots[idx] & 0x80)
return;
ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode,
@@ -298,36 +295,40 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
mmu_linear_psize, mmu_kernel_ssize);
BUG_ON (ret < 0);
- raw_spin_lock(&linear_map_hash_lock);
- BUG_ON(linear_map_hash_slots[lmi] & 0x80);
- linear_map_hash_slots[lmi] = ret | 0x80;
- raw_spin_unlock(&linear_map_hash_lock);
+ raw_spin_lock(lock);
+ BUG_ON(slots[idx] & 0x80);
+ slots[idx] = ret | 0x80;
+ raw_spin_unlock(lock);
}
-static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
+static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long idx,
+ u8 *slots, raw_spinlock_t *lock)
{
- unsigned long hash, hidx, slot;
+ unsigned long hash, hslot, slot;
unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
- raw_spin_lock(&linear_map_hash_lock);
- if (!(linear_map_hash_slots[lmi] & 0x80)) {
- raw_spin_unlock(&linear_map_hash_lock);
+ raw_spin_lock(lock);
+ if (!(slots[idx] & 0x80)) {
+ raw_spin_unlock(lock);
return;
}
- hidx = linear_map_hash_slots[lmi] & 0x7f;
- linear_map_hash_slots[lmi] = 0;
- raw_spin_unlock(&linear_map_hash_lock);
- if (hidx & _PTEIDX_SECONDARY)
+ hslot = slots[idx] & 0x7f;
+ slots[idx] = 0;
+ raw_spin_unlock(lock);
+ if (hslot & _PTEIDX_SECONDARY)
hash = ~hash;
slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
- slot += hidx & _PTEIDX_GROUP_IX;
+ slot += hslot & _PTEIDX_GROUP_IX;
mmu_hash_ops.hpte_invalidate(slot, vpn, mmu_linear_psize,
mmu_linear_psize,
mmu_kernel_ssize, 0);
}
+static u8 *linear_map_hash_slots;
+static unsigned long linear_map_hash_count;
+static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
static inline void hash_debug_pagealloc_alloc_slots(void)
{
if (!debug_pagealloc_enabled())
@@ -362,9 +363,11 @@ static int hash_debug_pagealloc_map_pages(struct page *page, int numpages,
if (lmi >= linear_map_hash_count)
continue;
if (enable)
- kernel_map_linear_page(vaddr, lmi);
+ kernel_map_linear_page(vaddr, lmi,
+ linear_map_hash_slots, &linear_map_hash_lock);
else
- kernel_unmap_linear_page(vaddr, lmi);
+ kernel_unmap_linear_page(vaddr, lmi,
+ linear_map_hash_slots, &linear_map_hash_lock);
}
local_irq_restore(flags);
return 0;
--
2.46.0
* [PATCH v3 08/12] book3s64/hash: Disable debug_pagealloc if it requires more memory
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
` (6 preceding siblings ...)
2024-10-18 17:29 ` [PATCH v3 07/12] book3s64/hash: Make kernel_map_linear_page() generic Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 09/12] book3s64/hash: Add kfence functionality Ritesh Harjani (IBM)
` (4 subsequent siblings)
12 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Cap the size of the linear map to be allocated in the RMA region at
ppc64_rma_size / 4. If debug_pagealloc requires more memory than that,
then do not allocate any memory and disable debug_pagealloc.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 11975a2f7403..f51f2cd9bf22 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -331,9 +331,19 @@ static unsigned long linear_map_hash_count;
static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
static inline void hash_debug_pagealloc_alloc_slots(void)
{
+ unsigned long max_hash_count = ppc64_rma_size / 4;
+
if (!debug_pagealloc_enabled())
return;
linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
+ if (unlikely(linear_map_hash_count > max_hash_count)) {
+ pr_info("linear map size (%llu) greater than 4 times RMA region (%llu). Disabling debug_pagealloc\n",
+ ((u64)linear_map_hash_count << PAGE_SHIFT),
+ ppc64_rma_size);
+ linear_map_hash_count = 0;
+ return;
+ }
+
linear_map_hash_slots = memblock_alloc_try_nid(
linear_map_hash_count, 1, MEMBLOCK_LOW_LIMIT,
ppc64_rma_size, NUMA_NO_NODE);
@@ -344,7 +354,7 @@ static inline void hash_debug_pagealloc_alloc_slots(void)
static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot)
{
- if (!debug_pagealloc_enabled())
+ if (!debug_pagealloc_enabled() || !linear_map_hash_count)
return;
if ((paddr >> PAGE_SHIFT) < linear_map_hash_count)
linear_map_hash_slots[paddr >> PAGE_SHIFT] = slot | 0x80;
@@ -356,6 +366,9 @@ static int hash_debug_pagealloc_map_pages(struct page *page, int numpages,
unsigned long flags, vaddr, lmi;
int i;
+ if (!debug_pagealloc_enabled() || !linear_map_hash_count)
+ return 0;
+
local_irq_save(flags);
for (i = 0; i < numpages; i++, page++) {
vaddr = (unsigned long)page_address(page);
--
2.46.0
* [PATCH v3 09/12] book3s64/hash: Add kfence functionality
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
` (7 preceding siblings ...)
2024-10-18 17:29 ` [PATCH v3 08/12] book3s64/hash: Disable debug_pagealloc if it requires more memory Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 10/12] book3s64/radix: Refactoring common kfence related functions Ritesh Harjani (IBM)
` (3 subsequent siblings)
12 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Now that the linear map functionality of debug_pagealloc has been made generic,
enable kfence to use this generic infrastructure.
1. Define kfence related linear map variables.
   - u8 *linear_map_kf_hash_slots;
   - unsigned long linear_map_kf_hash_count;
   - DEFINE_RAW_SPINLOCK(linear_map_kf_hash_lock);
2. The linear map size allocated in the RMA region is quite small:
   (KFENCE_POOL_SIZE >> PAGE_SHIFT), which is 512 bytes by default.
3. kfence pool memory is reserved using memblock_phys_alloc(), which can
   come from anywhere.
   (default 255 objects => ((1 + 255) * 2) << PAGE_SHIFT = 32MB)
4. The hash slot information for kfence memory gets added to the linear map
   in hash_linear_map_add_slot() (which also adds it for debug_pagealloc).
Reported-by: Pavithra Prakash <pavrampu@linux.vnet.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/include/asm/kfence.h | 5 -
arch/powerpc/mm/book3s64/hash_utils.c | 162 +++++++++++++++++++++++---
2 files changed, 149 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
index f3a9476a71b3..fab124ada1c7 100644
--- a/arch/powerpc/include/asm/kfence.h
+++ b/arch/powerpc/include/asm/kfence.h
@@ -10,7 +10,6 @@
#include <linux/mm.h>
#include <asm/pgtable.h>
-#include <asm/mmu.h>
#ifdef CONFIG_PPC64_ELF_ABI_V1
#define ARCH_FUNC_PREFIX "."
@@ -26,10 +25,6 @@ static inline void disable_kfence(void)
static inline bool arch_kfence_init_pool(void)
{
-#ifdef CONFIG_PPC64
- if (!radix_enabled())
- return false;
-#endif
return !kfence_disabled;
}
#endif
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index f51f2cd9bf22..558d6f5202b9 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -40,6 +40,7 @@
#include <linux/random.h>
#include <linux/elf-randomize.h>
#include <linux/of_fdt.h>
+#include <linux/kfence.h>
#include <asm/interrupt.h>
#include <asm/processor.h>
@@ -66,6 +67,7 @@
#include <asm/pte-walk.h>
#include <asm/asm-prototypes.h>
#include <asm/ultravisor.h>
+#include <asm/kfence.h>
#include <mm/mmu_decl.h>
@@ -271,7 +273,7 @@ void hash__tlbiel_all(unsigned int action)
WARN(1, "%s called on pre-POWER7 CPU\n", __func__);
}
-#ifdef CONFIG_DEBUG_PAGEALLOC
+#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
static void kernel_map_linear_page(unsigned long vaddr, unsigned long idx,
u8 *slots, raw_spinlock_t *lock)
{
@@ -325,11 +327,13 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long idx,
mmu_linear_psize,
mmu_kernel_ssize, 0);
}
+#endif
+#ifdef CONFIG_DEBUG_PAGEALLOC
static u8 *linear_map_hash_slots;
static unsigned long linear_map_hash_count;
static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
-static inline void hash_debug_pagealloc_alloc_slots(void)
+static void hash_debug_pagealloc_alloc_slots(void)
{
unsigned long max_hash_count = ppc64_rma_size / 4;
@@ -352,7 +356,8 @@ static inline void hash_debug_pagealloc_alloc_slots(void)
__func__, linear_map_hash_count, &ppc64_rma_size);
}
-static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot)
+static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr,
+ int slot)
{
if (!debug_pagealloc_enabled() || !linear_map_hash_count)
return;
@@ -386,20 +391,148 @@ static int hash_debug_pagealloc_map_pages(struct page *page, int numpages,
return 0;
}
-int hash__kernel_map_pages(struct page *page, int numpages, int enable)
+#else /* CONFIG_DEBUG_PAGEALLOC */
+static inline void hash_debug_pagealloc_alloc_slots(void) {}
+static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot) {}
+static int __maybe_unused
+hash_debug_pagealloc_map_pages(struct page *page, int numpages, int enable)
{
- return hash_debug_pagealloc_map_pages(page, numpages, enable);
+ return 0;
}
+#endif /* CONFIG_DEBUG_PAGEALLOC */
-#else /* CONFIG_DEBUG_PAGEALLOC */
-int hash__kernel_map_pages(struct page *page, int numpages,
- int enable)
+#ifdef CONFIG_KFENCE
+static u8 *linear_map_kf_hash_slots;
+static unsigned long linear_map_kf_hash_count;
+static DEFINE_RAW_SPINLOCK(linear_map_kf_hash_lock);
+
+static phys_addr_t kfence_pool;
+
+static inline void hash_kfence_alloc_pool(void)
+{
+
+ /* allocate linear map for kfence within RMA region */
+ linear_map_kf_hash_count = KFENCE_POOL_SIZE >> PAGE_SHIFT;
+ linear_map_kf_hash_slots = memblock_alloc_try_nid(
+ linear_map_kf_hash_count, 1,
+ MEMBLOCK_LOW_LIMIT, ppc64_rma_size,
+ NUMA_NO_NODE);
+ if (!linear_map_kf_hash_slots) {
+ pr_err("%s: memblock for linear map (%lu) failed\n", __func__,
+ linear_map_kf_hash_count);
+ goto err;
+ }
+
+ /* allocate kfence pool early */
+ kfence_pool = memblock_phys_alloc_range(KFENCE_POOL_SIZE, PAGE_SIZE,
+ MEMBLOCK_LOW_LIMIT, MEMBLOCK_ALLOC_ANYWHERE);
+ if (!kfence_pool) {
+ pr_err("%s: memblock for kfence pool (%lu) failed\n", __func__,
+ KFENCE_POOL_SIZE);
+ memblock_free(linear_map_kf_hash_slots,
+ linear_map_kf_hash_count);
+ linear_map_kf_hash_count = 0;
+ goto err;
+ }
+ memblock_mark_nomap(kfence_pool, KFENCE_POOL_SIZE);
+
+ return;
+err:
+ pr_info("Disabling kfence\n");
+ disable_kfence();
+}
+
+static inline void hash_kfence_map_pool(void)
+{
+ unsigned long kfence_pool_start, kfence_pool_end;
+ unsigned long prot = pgprot_val(PAGE_KERNEL);
+
+ if (!kfence_pool)
+ return;
+
+ kfence_pool_start = (unsigned long) __va(kfence_pool);
+ kfence_pool_end = kfence_pool_start + KFENCE_POOL_SIZE;
+ __kfence_pool = (char *) kfence_pool_start;
+ BUG_ON(htab_bolt_mapping(kfence_pool_start, kfence_pool_end,
+ kfence_pool, prot, mmu_linear_psize,
+ mmu_kernel_ssize));
+ memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
+}
+
+static inline void hash_kfence_add_slot(phys_addr_t paddr, int slot)
{
+ unsigned long vaddr = (unsigned long) __va(paddr);
+ unsigned long lmi = (vaddr - (unsigned long)__kfence_pool)
+ >> PAGE_SHIFT;
+
+ if (!kfence_pool)
+ return;
+ BUG_ON(!is_kfence_address((void *)vaddr));
+ BUG_ON(lmi >= linear_map_kf_hash_count);
+ linear_map_kf_hash_slots[lmi] = slot | 0x80;
+}
+
+static int hash_kfence_map_pages(struct page *page, int numpages, int enable)
+{
+ unsigned long flags, vaddr, lmi;
+ int i;
+
+ WARN_ON_ONCE(!linear_map_kf_hash_count);
+ local_irq_save(flags);
+ for (i = 0; i < numpages; i++, page++) {
+ vaddr = (unsigned long)page_address(page);
+ lmi = (vaddr - (unsigned long)__kfence_pool) >> PAGE_SHIFT;
+
+ /* Ideally this should never happen */
+ if (lmi >= linear_map_kf_hash_count) {
+ WARN_ON_ONCE(1);
+ continue;
+ }
+
+ if (enable)
+ kernel_map_linear_page(vaddr, lmi,
+ linear_map_kf_hash_slots,
+ &linear_map_kf_hash_lock);
+ else
+ kernel_unmap_linear_page(vaddr, lmi,
+ linear_map_kf_hash_slots,
+ &linear_map_kf_hash_lock);
+ }
+ local_irq_restore(flags);
return 0;
}
-static inline void hash_debug_pagealloc_alloc_slots(void) {}
-static inline void hash_debug_pagealloc_add_slot(phys_addr_t paddr, int slot) {}
-#endif /* CONFIG_DEBUG_PAGEALLOC */
+#else
+static inline void hash_kfence_alloc_pool(void) {}
+static inline void hash_kfence_map_pool(void) {}
+static inline void hash_kfence_add_slot(phys_addr_t paddr, int slot) {}
+static int __maybe_unused
+hash_kfence_map_pages(struct page *page, int numpages, int enable)
+{
+ return 0;
+}
+#endif
+
+#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
+int hash__kernel_map_pages(struct page *page, int numpages, int enable)
+{
+ void *vaddr = page_address(page);
+
+ if (is_kfence_address(vaddr))
+ return hash_kfence_map_pages(page, numpages, enable);
+ else
+ return hash_debug_pagealloc_map_pages(page, numpages, enable);
+}
+
+static void hash_linear_map_add_slot(phys_addr_t paddr, int slot)
+{
+ if (is_kfence_address(__va(paddr)))
+ hash_kfence_add_slot(paddr, slot);
+ else
+ hash_debug_pagealloc_add_slot(paddr, slot);
+}
+#else
+static void hash_linear_map_add_slot(phys_addr_t paddr, int slot) {}
+#endif
/*
* 'R' and 'C' update notes:
@@ -559,7 +692,8 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
break;
cond_resched();
- hash_debug_pagealloc_add_slot(paddr, ret);
+ /* add slot info in debug_pagealloc / kfence linear map */
+ hash_linear_map_add_slot(paddr, ret);
}
return ret < 0 ? ret : 0;
}
@@ -940,7 +1074,7 @@ static void __init htab_init_page_sizes(void)
bool aligned = true;
init_hpte_page_sizes();
- if (!debug_pagealloc_enabled()) {
+ if (!debug_pagealloc_enabled_or_kfence()) {
/*
* Pick a size for the linear mapping. Currently, we only
* support 16M, 1M and 4K which is the default
@@ -1261,6 +1395,7 @@ static void __init htab_initialize(void)
prot = pgprot_val(PAGE_KERNEL);
hash_debug_pagealloc_alloc_slots();
+ hash_kfence_alloc_pool();
/* create bolted the linear mapping in the hash table */
for_each_mem_range(i, &base, &end) {
size = end - base;
@@ -1277,6 +1412,7 @@ static void __init htab_initialize(void)
BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
prot, mmu_linear_psize, mmu_kernel_ssize));
}
+ hash_kfence_map_pool();
memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
/*
--
2.46.0
* [PATCH v3 10/12] book3s64/radix: Refactoring common kfence related functions
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
` (8 preceding siblings ...)
2024-10-18 17:29 ` [PATCH v3 09/12] book3s64/hash: Add kfence functionality Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 11/12] book3s64/hash: Disable kfence if not early init Ritesh Harjani (IBM)
` (2 subsequent siblings)
12 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Both radix and hash on book3s need to detect whether kfence
early init is enabled or not. Hash needs to disable kfence
if early init is not enabled, because with kfence the linear map is
mapped using PAGE_SIZE rather than 16M mappings.
We don't support multiple page sizes for the SLB entry used for the kernel
linear map in book3s64.
This patch refactors out the common functions required to detect whether
kfence early init is enabled or not.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/include/asm/kfence.h | 8 ++++++--
arch/powerpc/mm/book3s64/pgtable.c | 13 +++++++++++++
arch/powerpc/mm/book3s64/radix_pgtable.c | 12 ------------
arch/powerpc/mm/init-common.c | 1 +
4 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
index fab124ada1c7..1f7cab58ab2c 100644
--- a/arch/powerpc/include/asm/kfence.h
+++ b/arch/powerpc/include/asm/kfence.h
@@ -15,7 +15,7 @@
#define ARCH_FUNC_PREFIX "."
#endif
-#ifdef CONFIG_KFENCE
+extern bool kfence_early_init;
extern bool kfence_disabled;
static inline void disable_kfence(void)
@@ -27,7 +27,11 @@ static inline bool arch_kfence_init_pool(void)
{
return !kfence_disabled;
}
-#endif
+
+static inline bool kfence_early_init_enabled(void)
+{
+ return IS_ENABLED(CONFIG_KFENCE) && kfence_early_init;
+}
#ifdef CONFIG_PPC64
static inline bool kfence_protect_page(unsigned long addr, bool protect)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index f4d8d3c40e5c..1563a8c28feb 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -37,6 +37,19 @@ EXPORT_SYMBOL(__pmd_frag_nr);
unsigned long __pmd_frag_size_shift;
EXPORT_SYMBOL(__pmd_frag_size_shift);
+#ifdef CONFIG_KFENCE
+extern bool kfence_early_init;
+static int __init parse_kfence_early_init(char *arg)
+{
+ int val;
+
+ if (get_option(&arg, &val))
+ kfence_early_init = !!val;
+ return 0;
+}
+early_param("kfence.sample_interval", parse_kfence_early_init);
+#endif
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/*
* This is called when relaxing access to a hugepage. It's also called in the page
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index b0d927009af8..311e2112d782 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -363,18 +363,6 @@ static int __meminit create_physical_mapping(unsigned long start,
}
#ifdef CONFIG_KFENCE
-static bool __ro_after_init kfence_early_init = !!CONFIG_KFENCE_SAMPLE_INTERVAL;
-
-static int __init parse_kfence_early_init(char *arg)
-{
- int val;
-
- if (get_option(&arg, &val))
- kfence_early_init = !!val;
- return 0;
-}
-early_param("kfence.sample_interval", parse_kfence_early_init);
-
static inline phys_addr_t alloc_kfence_pool(void)
{
phys_addr_t kfence_pool;
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 2978fcbe307e..745097554bea 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -33,6 +33,7 @@ bool disable_kuep = !IS_ENABLED(CONFIG_PPC_KUEP);
bool disable_kuap = !IS_ENABLED(CONFIG_PPC_KUAP);
#ifdef CONFIG_KFENCE
bool __ro_after_init kfence_disabled;
+bool __ro_after_init kfence_early_init = !!CONFIG_KFENCE_SAMPLE_INTERVAL;
#endif
static int __init parse_nosmep(char *p)
--
2.46.0
* [PATCH v3 11/12] book3s64/hash: Disable kfence if not early init
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
` (9 preceding siblings ...)
2024-10-18 17:29 ` [PATCH v3 10/12] book3s64/radix: Refactoring common kfence related functions Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 12/12] book3s64/hash: Early detect debug_pagealloc size requirement Ritesh Harjani (IBM)
2024-11-07 8:42 ` [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Michael Ellerman
12 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Enable kfence on book3s64 hash only when early init is enabled.
This is because kfence could cause the kernel linear map to be mapped
at PAGE_SIZE level instead of 16M (which I guess we don't want).
Also, currently there is no way to:
1. Make multiple page size entries for the SLB used for the kernel linear
map.
2. Easily get the hash slot details after the page table mapping for the
kernel linear map has been set up. So even if kfence allocates the
pool in late init, we won't be able to get the hash slot details for the
kfence linear map.
Thus this patch disables kfence on hash if kfence early init is not
enabled.
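For illustration (assuming the generic kfence behaviour, where
kfence.sample_interval is also a runtime-writable parameter under
/sys/module/kfence/parameters/), the two scenarios look roughly like this:

  # early init: kfence enabled at boot, usable on Hash
  # (the kernel linear map is then mapped with PAGE_SIZE)
  kfence.sample_interval=100

  # late init: boot with kfence off and enable it later via
  #   echo 100 > /sys/module/kfence/parameters/sample_interval
  # with this patch, kfence stays disabled on Hash in this case
  kfence.sample_interval=0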
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 558d6f5202b9..2f5dd6310a8f 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -410,6 +410,8 @@ static phys_addr_t kfence_pool;
static inline void hash_kfence_alloc_pool(void)
{
+ if (!kfence_early_init_enabled())
+ goto err;
/* allocate linear map for kfence within RMA region */
linear_map_kf_hash_count = KFENCE_POOL_SIZE >> PAGE_SHIFT;
@@ -1074,7 +1076,7 @@ static void __init htab_init_page_sizes(void)
bool aligned = true;
init_hpte_page_sizes();
- if (!debug_pagealloc_enabled_or_kfence()) {
+ if (!debug_pagealloc_enabled() && !kfence_early_init_enabled()) {
/*
* Pick a size for the linear mapping. Currently, we only
* support 16M, 1M and 4K which is the default
--
2.46.0
* [PATCH v3 12/12] book3s64/hash: Early detect debug_pagealloc size requirement
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
` (10 preceding siblings ...)
2024-10-18 17:29 ` [PATCH v3 11/12] book3s64/hash: Disable kfence if not early init Ritesh Harjani (IBM)
@ 2024-10-18 17:29 ` Ritesh Harjani (IBM)
2024-11-07 8:42 ` [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Michael Ellerman
12 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani (IBM) @ 2024-10-18 17:29 UTC (permalink / raw)
To: linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML,
Ritesh Harjani (IBM)
Add a hash_supports_debug_pagealloc() helper to detect whether
debug_pagealloc can be supported on hash or not. This checks both
whether the debug_pagealloc config is enabled and whether the linear map
fits within rma_size / 4.
This can then be used early, during htab_init_page_sizes(), to decide the
linear map page size depending on whether hash supports either
debug_pagealloc or kfence.
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 2f5dd6310a8f..2674f763f5db 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -329,25 +329,26 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long idx,
}
#endif
+static inline bool hash_supports_debug_pagealloc(void)
+{
+ unsigned long max_hash_count = ppc64_rma_size / 4;
+ unsigned long linear_map_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
+
+ if (!debug_pagealloc_enabled() || linear_map_count > max_hash_count)
+ return false;
+ return true;
+}
+
#ifdef CONFIG_DEBUG_PAGEALLOC
static u8 *linear_map_hash_slots;
static unsigned long linear_map_hash_count;
static DEFINE_RAW_SPINLOCK(linear_map_hash_lock);
static void hash_debug_pagealloc_alloc_slots(void)
{
- unsigned long max_hash_count = ppc64_rma_size / 4;
-
- if (!debug_pagealloc_enabled())
- return;
- linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
- if (unlikely(linear_map_hash_count > max_hash_count)) {
- pr_info("linear map size (%llu) greater than 4 times RMA region (%llu). Disabling debug_pagealloc\n",
- ((u64)linear_map_hash_count << PAGE_SHIFT),
- ppc64_rma_size);
- linear_map_hash_count = 0;
+ if (!hash_supports_debug_pagealloc())
return;
- }
+ linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
linear_map_hash_slots = memblock_alloc_try_nid(
linear_map_hash_count, 1, MEMBLOCK_LOW_LIMIT,
ppc64_rma_size, NUMA_NO_NODE);
@@ -1076,7 +1077,7 @@ static void __init htab_init_page_sizes(void)
bool aligned = true;
init_hpte_page_sizes();
- if (!debug_pagealloc_enabled() && !kfence_early_init_enabled()) {
+ if (!hash_supports_debug_pagealloc() && !kfence_early_init_enabled()) {
/*
* Pick a size for the linear mapping. Currently, we only
* support 16M, 1M and 4K which is the default
--
2.46.0
* Re: [PATCH v3 01/12] powerpc: mm/fault: Fix kfence page fault reporting
2024-10-18 17:29 ` [PATCH v3 01/12] powerpc: mm/fault: Fix kfence page fault reporting Ritesh Harjani (IBM)
@ 2024-10-18 17:40 ` Christophe Leroy
2024-10-22 2:42 ` Michael Ellerman
1 sibling, 0 replies; 17+ messages in thread
From: Christophe Leroy @ 2024-10-18 17:40 UTC (permalink / raw)
To: Ritesh Harjani (IBM), linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Hari Bathini, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, LKML, Disha Goel
On 18/10/2024 at 19:29, Ritesh Harjani (IBM) wrote:
> copy_from_kernel_nofault() can be called when doing read of /proc/kcore.
> /proc/kcore can have some unmapped kfence objects which when read via
> copy_from_kernel_nofault() can cause page faults. Since *_nofault()
> functions define their own fixup table for handling fault, use that
> instead of asking kfence to handle such faults.
>
> Hence we search the exception tables for the nip which generated the
> fault. If there is an entry then we let the fixup table handler handle the
> page fault by returning an error from within ___do_page_fault().
>
> This can be easily triggered if someone tries to do dd from /proc/kcore.
> dd if=/proc/kcore of=/dev/null bs=1M
>
> <some example false negatives>
> ===============================
> BUG: KFENCE: invalid read in copy_from_kernel_nofault+0xb0/0x1c8
> Invalid read at 0x000000004f749d2e:
> copy_from_kernel_nofault+0xb0/0x1c8
> 0xc0000000057f7950
> read_kcore_iter+0x41c/0x9ac
> proc_reg_read_iter+0xe4/0x16c
> vfs_read+0x2e4/0x3b0
> ksys_read+0x88/0x154
> system_call_exception+0x124/0x340
> system_call_common+0x160/0x2c4
>
> BUG: KFENCE: use-after-free read in copy_from_kernel_nofault+0xb0/0x1c8
> Use-after-free read at 0x000000008fbb08ad (in kfence-#0):
> copy_from_kernel_nofault+0xb0/0x1c8
> 0xc0000000057f7950
> read_kcore_iter+0x41c/0x9ac
> proc_reg_read_iter+0xe4/0x16c
> vfs_read+0x2e4/0x3b0
> ksys_read+0x88/0x154
> system_call_exception+0x124/0x340
> system_call_common+0x160/0x2c4
>
> Fixes: 90cbac0e995d ("powerpc: Enable KFENCE for PPC32")
> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Nit below.
> Reported-by: Disha Goel <disgoel@linux.ibm.com>
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
> arch/powerpc/mm/fault.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 81c77ddce2e3..316f5162ffc4 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -439,10 +439,17 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
> /*
> * The kernel should never take an execute fault nor should it
> * take a page fault to a kernel address or a page fault to a user
> - * address outside of dedicated places
> + * address outside of dedicated places.
> + *
> + * Rather than kfence directly reporting false negatives, search whether
> + * the NIP belongs to the fixup table for cases where fault could come
> + * from functions like copy_from_kernel_nofault().
> */
> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) {
> - if (kfence_handle_page_fault(address, is_write, regs))
> +
Why do you need a blank line here?
> + if (is_kfence_address((void *)address) &&
> + !search_exception_tables(instruction_pointer(regs)) &&
> + kfence_handle_page_fault(address, is_write, regs))
> return 0;
>
> return SIGSEGV;
> --
> 2.46.0
>
* Re: [PATCH v3 01/12] powerpc: mm/fault: Fix kfence page fault reporting
2024-10-18 17:29 ` [PATCH v3 01/12] powerpc: mm/fault: Fix kfence page fault reporting Ritesh Harjani (IBM)
2024-10-18 17:40 ` Christophe Leroy
@ 2024-10-22 2:42 ` Michael Ellerman
2024-10-22 3:09 ` Ritesh Harjani
1 sibling, 1 reply; 17+ messages in thread
From: Michael Ellerman @ 2024-10-22 2:42 UTC (permalink / raw)
To: Ritesh Harjani (IBM), linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Nicholas Piggin, Madhavan Srinivasan,
Christophe Leroy, Hari Bathini, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, LKML, Ritesh Harjani (IBM),
Disha Goel
Hi Ritesh,
"Ritesh Harjani (IBM)" <ritesh.list@gmail.com> writes:
> copy_from_kernel_nofault() can be called when doing read of /proc/kcore.
> /proc/kcore can have some unmapped kfence objects which when read via
> copy_from_kernel_nofault() can cause page faults. Since *_nofault()
> functions define their own fixup table for handling fault, use that
> instead of asking kfence to handle such faults.
>
> Hence we search the exception tables for the nip which generated the
> fault. If there is an entry then we let the fixup table handler handle the
> page fault by returning an error from within ___do_page_fault().
>
> This can be easily triggered if someone tries to do dd from /proc/kcore.
> dd if=/proc/kcore of=/dev/null bs=1M
>
> <some example false negatives>
> ===============================
> BUG: KFENCE: invalid read in copy_from_kernel_nofault+0xb0/0x1c8
> Invalid read at 0x000000004f749d2e:
> copy_from_kernel_nofault+0xb0/0x1c8
> 0xc0000000057f7950
> read_kcore_iter+0x41c/0x9ac
> proc_reg_read_iter+0xe4/0x16c
> vfs_read+0x2e4/0x3b0
> ksys_read+0x88/0x154
> system_call_exception+0x124/0x340
> system_call_common+0x160/0x2c4
I haven't been able to reproduce this. Can you give some more details on
the exact machine/kernel-config/setup where you saw this?
cheers
* Re: [PATCH v3 01/12] powerpc: mm/fault: Fix kfence page fault reporting
2024-10-22 2:42 ` Michael Ellerman
@ 2024-10-22 3:09 ` Ritesh Harjani
0 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani @ 2024-10-22 3:09 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Nicholas Piggin, Madhavan Srinivasan,
Christophe Leroy, Hari Bathini, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, LKML, Disha Goel
Michael Ellerman <mpe@ellerman.id.au> writes:
> Hi Ritesh,
>
> "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> writes:
>> copy_from_kernel_nofault() can be called when doing read of /proc/kcore.
>> /proc/kcore can have some unmapped kfence objects which when read via
>> copy_from_kernel_nofault() can cause page faults. Since *_nofault()
>> functions define their own fixup table for handling fault, use that
>> instead of asking kfence to handle such faults.
>>
>> Hence we search the exception tables for the nip which generated the
>> fault. If there is an entry then we let the fixup table handler handle the
>> page fault by returning an error from within ___do_page_fault().
>>
>> This can be easily triggered if someone tries to do dd from /proc/kcore.
>> dd if=/proc/kcore of=/dev/null bs=1M
>>
>> <some example false negatives>
>> ===============================
>> BUG: KFENCE: invalid read in copy_from_kernel_nofault+0xb0/0x1c8
>> Invalid read at 0x000000004f749d2e:
>> copy_from_kernel_nofault+0xb0/0x1c8
>> 0xc0000000057f7950
>> read_kcore_iter+0x41c/0x9ac
>> proc_reg_read_iter+0xe4/0x16c
>> vfs_read+0x2e4/0x3b0
>> ksys_read+0x88/0x154
>> system_call_exception+0x124/0x340
>> system_call_common+0x160/0x2c4
>
> I haven't been able to reproduce this. Can you give some more details on
> the exact machine/kernel-config/setup where you saw this?
Without this patch I am able to hit this on book3s64 with both Radix
and Hash. I believe these configs should do the job; it should be
reproducible on qemu, an LPAR, or baremetal.
root-> cat .out-ppc/.config |grep -i KFENCE
CONFIG_HAVE_ARCH_KFENCE=y
CONFIG_KFENCE=y
CONFIG_KFENCE_SAMPLE_INTERVAL=100
CONFIG_KFENCE_NUM_OBJECTS=255
# CONFIG_KFENCE_DEFERRABLE is not set
# CONFIG_KFENCE_STATIC_KEYS is not set
CONFIG_KFENCE_STRESS_TEST_FAULTS=0
CONFIG_KFENCE_KUNIT_TEST=y
root-> cat .out-ppc/.config |grep -i KCORE
CONFIG_PROC_KCORE=y
root-> cat .out-ppc/.config |grep -i KUNIT
CONFIG_KFENCE_KUNIT_TEST=y
CONFIG_KUNIT=y
CONFIG_KUNIT_DEFAULT_ENABLED=y
Then running dd as below can hit the issue. Maybe let it run for a few
minutes and see?
~ # dd if=/proc/kcore of=/dev/null bs=1M
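
A minimal C equivalent of that dd invocation, in case dd is not handy
in the test initramfs (illustrative only, not from the original
report):

/* Illustrative only: sequentially read /proc/kcore and discard the data. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	static char buf[1 << 20];	/* 1MB reads, matching bs=1M */
	int fd = open("/proc/kcore", O_RDONLY);

	if (fd < 0) {
		perror("open /proc/kcore");
		return 1;
	}
	while (read(fd, buf, sizeof(buf)) > 0)
		;	/* the kernel reads unmapped kfence objects via copy_from_kernel_nofault() */
	close(fd);
	return 0;
}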
Otherwise, running this kfence kunit test can also reproduce the same
bug [1]. The above configs already enable the kfence kunit test, which
runs at boot time.
[1]: https://lore.kernel.org/linuxppc-dev/210e561f7845697a32de44b643393890f180069f.1729272697.git.ritesh.list@gmail.com/
Note: This was originally reported internally, where the tester was
running perf test 'Object code reading' [2].
[2]: https://github.com/torvalds/linux/blob/master/tools/perf/tests/code-reading.c#L737
Thanks for looking into this. Let me know if this helped.
-ritesh
* Re: [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash)
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
` (11 preceding siblings ...)
2024-10-18 17:29 ` [PATCH v3 12/12] book3s64/hash: Early detect debug_pagealloc size requirement Ritesh Harjani (IBM)
@ 2024-11-07 8:42 ` Michael Ellerman
12 siblings, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2024-11-07 8:42 UTC (permalink / raw)
To: linuxppc-dev, Ritesh Harjani (IBM)
Cc: kasan-dev, linux-mm, Marco Elver, Alexander Potapenko,
Heiko Carstens, Michael Ellerman, Nicholas Piggin,
Madhavan Srinivasan, Christophe Leroy, Hari Bathini,
Aneesh Kumar K . V, Donet Tom, Pavithra Prakash, LKML
On Fri, 18 Oct 2024 22:59:41 +0530, Ritesh Harjani (IBM) wrote:
> v2 -> v3:
> ============
> 1. Addressed review comments from Christophe in patch-1: To check for
> is_kfence_address before doing search in exception tables.
> (Thanks for the review!)
>
> 2. Separate out patch-1, which will need a separate tree for inclusion and
> review from kfence/kasan folks since it's a kfence kunit test.
>
> [...]
Applied to powerpc/next.
[01/12] powerpc: mm/fault: Fix kfence page fault reporting
https://git.kernel.org/powerpc/c/06dbbb4d5f7126b6307ab807cbf04ecfc459b933
[02/12] book3s64/hash: Remove kfence support temporarily
https://git.kernel.org/powerpc/c/47780e7eae783674b557cc16cf6852c0ce9dbbe9
[03/12] book3s64/hash: Refactor kernel linear map related calls
https://git.kernel.org/powerpc/c/8b1085523fd22bf29a097d53c669a7dcf017d5ea
[04/12] book3s64/hash: Add hash_debug_pagealloc_add_slot() function
https://git.kernel.org/powerpc/c/cc5734481b3c24ddee1551f9732d743453bca010
[05/12] book3s64/hash: Add hash_debug_pagealloc_alloc_slots() function
https://git.kernel.org/powerpc/c/ff8631cdc23ad42f662a8510c57aeb0555ac3d5f
[06/12] book3s64/hash: Refactor hash__kernel_map_pages() function
https://git.kernel.org/powerpc/c/43919f4154bebbef0a0d3004f1b022643d21082c
[07/12] book3s64/hash: Make kernel_map_linear_page() generic
https://git.kernel.org/powerpc/c/685d942d00d8b0edf8431869028e23eac6cc4bab
[08/12] book3s64/hash: Disable debug_pagealloc if it requires more memory
https://git.kernel.org/powerpc/c/47dd2e63d42a7a1b0a9c374d3a236f58b97c19e6
[09/12] book3s64/hash: Add kfence functionality
https://git.kernel.org/powerpc/c/8fec58f503b296af87ffca3898965e3054f2b616
[10/12] book3s64/radix: Refactoring common kfence related functions
https://git.kernel.org/powerpc/c/b5fbf7e2c6a403344e83139a14322f0c42911f2d
[11/12] book3s64/hash: Disable kfence if not early init
https://git.kernel.org/powerpc/c/76b7d6463fc504ac266472f5948b83902dfca4c6
[12/12] book3s64/hash: Early detect debug_pagealloc size requirement
https://git.kernel.org/powerpc/c/8846d9683884fa9ef5bb160011a748701216e186
cheers
Thread overview: 17+ messages
2024-10-18 17:29 [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 01/12] powerpc: mm/fault: Fix kfence page fault reporting Ritesh Harjani (IBM)
2024-10-18 17:40 ` Christophe Leroy
2024-10-22 2:42 ` Michael Ellerman
2024-10-22 3:09 ` Ritesh Harjani
2024-10-18 17:29 ` [PATCH v3 02/12] book3s64/hash: Remove kfence support temporarily Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 03/12] book3s64/hash: Refactor kernel linear map related calls Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 04/12] book3s64/hash: Add hash_debug_pagealloc_add_slot() function Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 05/12] book3s64/hash: Add hash_debug_pagealloc_alloc_slots() function Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 06/12] book3s64/hash: Refactor hash__kernel_map_pages() function Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 07/12] book3s64/hash: Make kernel_map_linear_page() generic Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 08/12] book3s64/hash: Disable debug_pagealloc if it requires more memory Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 09/12] book3s64/hash: Add kfence functionality Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 10/12] book3s64/radix: Refactoring common kfence related functions Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 11/12] book3s64/hash: Disable kfence if not early init Ritesh Harjani (IBM)
2024-10-18 17:29 ` [PATCH v3 12/12] book3s64/hash: Early detect debug_pagealloc size requirement Ritesh Harjani (IBM)
2024-11-07 8:42 ` [PATCH v3 00/12] powerpc/kfence: Improve kfence support (mainly Hash) Michael Ellerman