* [PATCH v3 00/12] Cover a guard gap corner case
@ 2024-03-12 22:28 Rick Edgecombe
2024-03-12 22:28 ` [PATCH v3 01/12] mm: Switch mm->get_unmapped_area() to a flag Rick Edgecombe
` (11 more replies)
0 siblings, 12 replies; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe
Hi,
For v3, the change is in the struct vm_unmapped_area_info zeroing patches.
Per discussion[0], they are switched to a method of initializing the struct
at the callers that doesn't leave useless statements behind as cleanup, and
is a bit easier to manually inspect for bugs. The arches that acked the
old versions are left as separate patches. What's left after that happens
in a treewide change.
It seems like a more straightforward change now, but I would still
appreciate it if anyone could double-check the treewide change.
Also, this is rebased to v6.8.
[0] https://lore.kernel.org/lkml/e617dea592ec336e991c4362e48cd8c648ba7b49.camel@intel.com/
v2:
https://lore.kernel.org/lkml/20240226190951.3240433-1-rick.p.edgecombe@intel.com/
v1:
https://lore.kernel.org/lkml/20240215231332.1556787-1-rick.p.edgecombe@intel.com/
=======
In working on x86's shadow stack feature, I came across some limitations
around the kernel's handling of guard gaps. AFAICT these limitations are
not too important for the traditional stack usage of guard gaps, but have
a bigger impact on shadow stack's usage. And now, in addition to x86, we
have two other architectures implementing shadow-stack-like features that
plan to use guard gaps. I wanted to see about addressing them, but I have
not worked on mmap() placement related code before, so would greatly
appreciate it if people could take a look and point me in the right
direction.
The nature of the limitations of concern is as follows. In order to ensure
guard gaps between mappings, mmap() would need to consider two things:
1. That the new mapping isn't placed in any existing mapping's guard
gap.
2. That the new mapping isn't placed such that any existing mapping
ends up in *its* guard gap.
Currently mmap() never considers (2), and (1) is not considered in some
situations.
When not passing an address hint, or passing one without
MAP_FIXED_NOREPLACE, (1) is enforced. With MAP_FIXED_NOREPLACE, (1) is not
enforced. With MAP_FIXED, (1) is not considered, but this seems to be
expected since MAP_FIXED can already clobber existing mappings. For
MAP_FIXED_NOREPLACE I would have guessed it should respect the guard gaps
of existing mappings, but it is probably a little ambiguous.
In this series I just tried to add enforcement of (2) for the normal (no
address hint) case, and only for the newer shadow stack memory (not
stacks). The reason is that in the no-address-hint situation, landing
next to a guard gap could come up naturally, and so be more influenceable
by attackers, such that two shadow stacks could end up adjacent without a
guard gap. Whereas the address-hint scenarios would require more control -
being able to call mmap() with specific arguments. As for why not just fix
the other corner cases anyway: that seemed to have a greater possibility
of affecting existing apps.
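To make case (2) concrete, here is a rough userspace sketch, assuming the
x86 map_shadow_stack() syscall (the wrapper, its zero flags, and running
with shadow stack enabled are illustrative assumptions, not part of this
series):

  #include <unistd.h>
  #include <sys/syscall.h>

  /* Ask the kernel to place a shadow stack anywhere it likes. */
  static unsigned long alloc_shstk(unsigned long size)
  {
          return syscall(__NR_map_shadow_stack, 0UL, size, 0U);
  }

  int main(void)
  {
          /* Without (2) being enforced, the second allocation may land
           * directly against the first, leaving no guard gap between
           * the two shadow stacks. */
          unsigned long a = alloc_shstk(0x1000);
          unsigned long b = alloc_shstk(0x1000);

          return (a && b) ? 0 : 1;
  }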
Thanks,
Rick
Rick Edgecombe (12):
mm: Switch mm->get_unmapped_area() to a flag
mm: Introduce arch_get_unmapped_area_vmflags()
mm: Use get_unmapped_area_vmflags()
thp: Add thp_get_unmapped_area_vmflags()
csky: Use initializer for struct vm_unmapped_area_info
parisc: Use initializer for struct vm_unmapped_area_info
powerpc: Use initializer for struct vm_unmapped_area_info
treewide: Use initializer for struct vm_unmapped_area_info
mm: Take placement mappings gap into account
x86/mm: Implement HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
x86/mm: Care about shadow stack guard gap during placement
selftests/x86: Add placement guard gap test for shstk
arch/alpha/kernel/osf_sys.c | 5 +-
arch/arc/mm/mmap.c | 4 +-
arch/arm/mm/mmap.c | 5 +-
arch/csky/abiv1/mmap.c | 12 +-
arch/loongarch/mm/mmap.c | 3 +-
arch/mips/mm/mmap.c | 3 +-
arch/parisc/kernel/sys_parisc.c | 6 +-
arch/powerpc/mm/book3s64/slice.c | 23 ++--
arch/s390/mm/hugetlbpage.c | 9 +-
arch/s390/mm/mmap.c | 15 +--
arch/sh/mm/mmap.c | 5 +-
arch/sparc/kernel/sys_sparc_32.c | 3 +-
arch/sparc/kernel/sys_sparc_64.c | 20 ++--
arch/sparc/mm/hugetlbpage.c | 9 +-
arch/x86/include/asm/pgtable_64.h | 1 +
arch/x86/kernel/cpu/sgx/driver.c | 2 +-
arch/x86/kernel/sys_x86_64.c | 42 +++++--
arch/x86/mm/hugetlbpage.c | 9 +-
arch/x86/mm/mmap.c | 4 +-
drivers/char/mem.c | 2 +-
drivers/dax/device.c | 6 +-
fs/hugetlbfs/inode.c | 11 +-
fs/proc/inode.c | 15 +--
fs/ramfs/file-mmu.c | 2 +-
include/linux/huge_mm.h | 11 ++
include/linux/mm.h | 12 +-
include/linux/mm_types.h | 6 +-
include/linux/sched/coredump.h | 5 +-
include/linux/sched/mm.h | 22 ++++
io_uring/io_uring.c | 2 +-
mm/debug.c | 6 -
mm/huge_memory.c | 26 +++--
mm/mmap.c | 106 +++++++++++++-----
mm/shmem.c | 11 +-
mm/util.c | 6 +-
.../testing/selftests/x86/test_shadow_stack.c | 67 ++++++++++-
36 files changed, 319 insertions(+), 177 deletions(-)
--
2.34.1
* [PATCH v3 01/12] mm: Switch mm->get_unmapped_area() to a flag
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
2024-03-13 7:19 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 02/12] mm: Introduce arch_get_unmapped_area_vmflags() Rick Edgecombe
` (10 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe
The mm_struct contains a function pointer *get_unmapped_area(), which
is set to either arch_get_unmapped_area() or
arch_get_unmapped_area_topdown() during the initialization of the mm.
Since the function pointer only ever points to two functions that are
named the same across all arches, a function pointer is not really
required. In addition, future changes will want to add versions of the
functions that take additional arguments. So to save a pointer's worth of
bytes in mm_struct, and to prevent adding additional function pointers to
mm_struct in future changes, remove it and keep the information about
which get_unmapped_area() to use in a flag.
Add the new flag to MMF_INIT_MASK so it doesn't get clobbered on fork by
mmf_init_flags(). Most MM flags get clobbered on fork. In the pre-existing
behavior mm->get_unmapped_area() would get copied to the new mm in
dup_mm(), so not clobbering the flag preserves the existing behavior
around inheriting the topdown-ness.
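For reference, fork's flag initialization boils down to something like
this (a simplified sketch; the real mmf_init_flags() also handles MDWE
inheritance bits):

  static inline unsigned long mmf_init_flags(unsigned long flags)
  {
          /* Bits outside MMF_INIT_MASK are cleared in the child, so
           * MMF_TOPDOWN must be in the mask to survive fork(). */
          return flags & MMF_INIT_MASK;
  }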
Introduce a helper, mm_get_unmapped_area(), to easily convert code that
refers to the old function pointer to instead select and call either
arch_get_unmapped_area() or arch_get_unmapped_area_topdown() based on the
flag. Then drop the mm->get_unmapped_area() function pointer. Leave the
get_unmapped_area() pointer in struct file_operations alone. The main
purpose of this change is to reorganize in preparation for future changes,
but it also converts the calls of mm->get_unmapped_area() from indirect
branches into direct ones.
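At a typical call site, the conversion looks like this (the pattern used
throughout the diff below):

  /* Before: indirect call through the function pointer */
  addr = current->mm->get_unmapped_area(file, addr, len, pgoff, flags);

  /* After: direct call, selected internally by the MMF_TOPDOWN flag */
  addr = mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);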
The stress-ng bigheap benchmark calls realloc a lot, which calls through
get_unmapped_area() in the kernel. On x86, the change yielded a ~1%
improvement there on a retpoline config.
In testing a few x86 configs, removing the pointer unfortunately didn't
result in any actual size reductions in the compiled layout of mm_struct.
But depending on compiler or arch alignment requirements, the change could
shrink the size of mm_struct.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
v3:
- Fix comment that still referred to mm->get_unmapped_area()
- Resolve trivial rebase conflicts with "mm: thp_get_unmapped_area must
honour topdown preference"
- Spelling fix in log
v2:
- Fix comment on MMF_TOPDOWN (Kirill, rppt)
- Move MMF_TOPDOWN to actually unused bit
- Add MMF_TOPDOWN to MMF_INIT_MASK so it doesn't get clobbered on fork,
and result in the children using the search up path.
- New lower performance results after above bug fix
- Add Reviews and Acks
---
arch/s390/mm/hugetlbpage.c | 2 +-
arch/s390/mm/mmap.c | 4 ++--
arch/sparc/kernel/sys_sparc_64.c | 15 ++++++---------
arch/sparc/mm/hugetlbpage.c | 2 +-
arch/x86/kernel/cpu/sgx/driver.c | 2 +-
arch/x86/mm/hugetlbpage.c | 2 +-
arch/x86/mm/mmap.c | 4 ++--
drivers/char/mem.c | 2 +-
drivers/dax/device.c | 6 +++---
fs/hugetlbfs/inode.c | 4 ++--
fs/proc/inode.c | 15 ++++++++-------
fs/ramfs/file-mmu.c | 2 +-
include/linux/mm_types.h | 6 +-----
include/linux/sched/coredump.h | 5 ++++-
include/linux/sched/mm.h | 5 +++++
io_uring/io_uring.c | 2 +-
mm/debug.c | 6 ------
mm/huge_memory.c | 9 ++++-----
mm/mmap.c | 21 ++++++++++++++++++---
mm/shmem.c | 11 +++++------
mm/util.c | 6 +++---
21 files changed, 70 insertions(+), 61 deletions(-)
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 297a6d897d5a..c2d2850ec8d5 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -328,7 +328,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
goto check_asce_limit;
}
- if (mm->get_unmapped_area == arch_get_unmapped_area)
+ if (!test_bit(MMF_TOPDOWN, &mm->flags))
addr = hugetlb_get_unmapped_area_bottomup(file, addr, len,
pgoff, flags);
else
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index fc9a7dc26c5e..cd52d72b59cf 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -182,10 +182,10 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
*/
if (mmap_is_legacy(rlim_stack)) {
mm->mmap_base = mmap_base_legacy(random_factor);
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
} else {
mm->mmap_base = mmap_base(random_factor, rlim_stack);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
}
}
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index 1e9a9e016237..1dbf7211666e 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -218,14 +218,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, unsigned long len, unsigned long pgoff, unsigned long flags)
{
unsigned long align_goal, addr = -ENOMEM;
- unsigned long (*get_area)(struct file *, unsigned long,
- unsigned long, unsigned long, unsigned long);
-
- get_area = current->mm->get_unmapped_area;
if (flags & MAP_FIXED) {
/* Ok, don't mess with it. */
- return get_area(NULL, orig_addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, NULL, orig_addr, len, pgoff, flags);
}
flags &= ~MAP_SHARED;
@@ -238,7 +234,8 @@ unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, u
align_goal = (64UL * 1024);
do {
- addr = get_area(NULL, orig_addr, len + (align_goal - PAGE_SIZE), pgoff, flags);
+ addr = mm_get_unmapped_area(current->mm, NULL, orig_addr,
+ len + (align_goal - PAGE_SIZE), pgoff, flags);
if (!(addr & ~PAGE_MASK)) {
addr = (addr + (align_goal - 1UL)) & ~(align_goal - 1UL);
break;
@@ -256,7 +253,7 @@ unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, u
* be obtained.
*/
if (addr & ~PAGE_MASK)
- addr = get_area(NULL, orig_addr, len, pgoff, flags);
+ addr = mm_get_unmapped_area(current->mm, NULL, orig_addr, len, pgoff, flags);
return addr;
}
@@ -292,7 +289,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
gap == RLIM_INFINITY ||
sysctl_legacy_va_layout) {
mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
} else {
/* We know it's 32-bit */
unsigned long task_size = STACK_TOP32;
@@ -303,7 +300,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
gap = (task_size / 6 * 5);
mm->mmap_base = PAGE_ALIGN(task_size - gap - random_factor);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
}
}
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index b432500c13a5..38a1bef47efb 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -123,7 +123,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
(!vma || addr + len <= vm_start_gap(vma)))
return addr;
}
- if (mm->get_unmapped_area == arch_get_unmapped_area)
+ if (!test_bit(MMF_TOPDOWN, &mm->flags))
return hugetlb_get_unmapped_area_bottomup(file, addr, len,
pgoff, flags);
else
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 262f5fb18d74..22b65a5f5ec6 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -113,7 +113,7 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
if (flags & MAP_FIXED)
return addr;
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
}
#ifdef CONFIG_COMPAT
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 5804bbae4f01..6d77c0039617 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -141,7 +141,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
}
get_unmapped_area:
- if (mm->get_unmapped_area == arch_get_unmapped_area)
+ if (!test_bit(MMF_TOPDOWN, &mm->flags))
return hugetlb_get_unmapped_area_bottomup(file, addr, len,
pgoff, flags);
else
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index c90c20904a60..a2cabb1c81e1 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -129,9 +129,9 @@ static void arch_pick_mmap_base(unsigned long *base, unsigned long *legacy_base,
void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
{
if (mmap_is_legacy())
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
else
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
arch_pick_mmap_base(&mm->mmap_base, &mm->mmap_legacy_base,
arch_rnd(mmap64_rnd_bits), task_size_64bit(0),
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 3c6670cf905f..9b80e622ae80 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -544,7 +544,7 @@ static unsigned long get_unmapped_area_zero(struct file *file,
}
/* Otherwise flags & MAP_PRIVATE: with no shmem object beneath it */
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
#else
return -ENOSYS;
#endif
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 93ebedc5ec8c..47c126d37b59 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -329,14 +329,14 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
if ((off + len_align) < off)
goto out;
- addr_align = current->mm->get_unmapped_area(filp, addr, len_align,
- pgoff, flags);
+ addr_align = mm_get_unmapped_area(current->mm, filp, addr, len_align,
+ pgoff, flags);
if (!IS_ERR_VALUE(addr_align)) {
addr_align += (off - addr_align) & (align - 1);
return addr_align;
}
out:
- return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
}
static const struct address_space_operations dev_dax_aops = {
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index d746866ae3b6..cd87ea5944a1 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -249,11 +249,11 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
}
/*
- * Use mm->get_unmapped_area value as a hint to use topdown routine.
+ * Use MMF_TOPDOWN flag as a hint to use topdown routine.
* If architectures have special needs, they should define their own
* version of hugetlb_get_unmapped_area.
*/
- if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
+ if (test_bit(MMF_TOPDOWN, &mm->flags))
return hugetlb_get_unmapped_area_topdown(file, addr, len,
pgoff, flags);
return hugetlb_get_unmapped_area_bottomup(file, addr, len,
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 05350f3c2812..017144a8516c 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -451,15 +451,16 @@ pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned lo
unsigned long len, unsigned long pgoff,
unsigned long flags)
{
- typeof_member(struct proc_ops, proc_get_unmapped_area) get_area;
-
- get_area = pde->proc_ops->proc_get_unmapped_area;
+ if (pde->proc_ops->proc_get_unmapped_area)
+ return pde->proc_ops->proc_get_unmapped_area(file, orig_addr,
+ len, pgoff,
+ flags);
#ifdef CONFIG_MMU
- if (!get_area)
- get_area = current->mm->get_unmapped_area;
+ else
+ return mm_get_unmapped_area(current->mm, file, orig_addr,
+ len, pgoff, flags);
#endif
- if (get_area)
- return get_area(file, orig_addr, len, pgoff, flags);
+
return orig_addr;
}
diff --git a/fs/ramfs/file-mmu.c b/fs/ramfs/file-mmu.c
index c7a1aa3c882b..b45c7edc3225 100644
--- a/fs/ramfs/file-mmu.c
+++ b/fs/ramfs/file-mmu.c
@@ -35,7 +35,7 @@ static unsigned long ramfs_mmu_get_unmapped_area(struct file *file,
unsigned long addr, unsigned long len, unsigned long pgoff,
unsigned long flags)
{
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
}
const struct file_operations ramfs_file_operations = {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8b611e13153e..d20869881214 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -749,11 +749,7 @@ struct mm_struct {
} ____cacheline_aligned_in_smp;
struct maple_tree mm_mt;
-#ifdef CONFIG_MMU
- unsigned long (*get_unmapped_area) (struct file *filp,
- unsigned long addr, unsigned long len,
- unsigned long pgoff, unsigned long flags);
-#endif
+
unsigned long mmap_base; /* base of mmap area */
unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 02f5090ffea2..e62ff805cfc9 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -92,9 +92,12 @@ static inline int get_dumpable(struct mm_struct *mm)
#define MMF_VM_MERGE_ANY 30
#define MMF_VM_MERGE_ANY_MASK (1 << MMF_VM_MERGE_ANY)
+#define MMF_TOPDOWN 31 /* mm searches top down by default */
+#define MMF_TOPDOWN_MASK (1 << MMF_TOPDOWN)
+
#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
- MMF_VM_MERGE_ANY_MASK)
+ MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
static inline unsigned long mmf_init_flags(unsigned long flags)
{
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 9a19f1b42f64..cde946e926d8 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -8,6 +8,7 @@
#include <linux/mm_types.h>
#include <linux/gfp.h>
#include <linux/sync_core.h>
+#include <linux/sched/coredump.h>
/*
* Routines for handling mm_structs
@@ -186,6 +187,10 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff,
unsigned long flags);
+unsigned long mm_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags);
+
unsigned long
generic_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cd9a137ad6ce..9eb3b2587031 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3513,7 +3513,7 @@ static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
#else
addr = 0UL;
#endif
- return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
}
#else /* !CONFIG_MMU */
diff --git a/mm/debug.c b/mm/debug.c
index ee533a5ceb79..32db5de8e1e7 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -162,9 +162,6 @@ EXPORT_SYMBOL(dump_vma);
void dump_mm(const struct mm_struct *mm)
{
pr_emerg("mm %px task_size %lu\n"
-#ifdef CONFIG_MMU
- "get_unmapped_area %px\n"
-#endif
"mmap_base %lu mmap_legacy_base %lu\n"
"pgd %px mm_users %d mm_count %d pgtables_bytes %lu map_count %d\n"
"hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %lx\n"
@@ -190,9 +187,6 @@ void dump_mm(const struct mm_struct *mm)
"def_flags: %#lx(%pGv)\n",
mm, mm->task_size,
-#ifdef CONFIG_MMU
- mm->get_unmapped_area,
-#endif
mm->mmap_base, mm->mmap_legacy_base,
mm->pgd, atomic_read(&mm->mm_users),
atomic_read(&mm->mm_count),
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 94c958f7ebb5..bc3bf441e768 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -822,8 +822,8 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
if (len_pad < len || (off + len_pad) < off)
return 0;
- ret = current->mm->get_unmapped_area(filp, addr, len_pad,
- off >> PAGE_SHIFT, flags);
+ ret = mm_get_unmapped_area(current->mm, filp, addr, len_pad,
+ off >> PAGE_SHIFT, flags);
/*
* The failure might be due to length padding. The caller will retry
@@ -841,8 +841,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
off_sub = (off - ret) & (size - 1);
- if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
- !off_sub)
+ if (test_bit(MMF_TOPDOWN, &current->mm->flags) && !off_sub)
return ret + size;
ret += off_sub;
@@ -859,7 +858,7 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
if (ret)
return ret;
- return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
}
EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
diff --git a/mm/mmap.c b/mm/mmap.c
index 3281287771c9..39e9a3ae3ca5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1815,7 +1815,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
{
unsigned long (*get_area)(struct file *, unsigned long,
- unsigned long, unsigned long, unsigned long);
+ unsigned long, unsigned long, unsigned long)
+ = NULL;
unsigned long error = arch_mmap_check(addr, len, flags);
if (error)
@@ -1825,7 +1826,6 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
if (len > TASK_SIZE)
return -ENOMEM;
- get_area = current->mm->get_unmapped_area;
if (file) {
if (file->f_op->get_unmapped_area)
get_area = file->f_op->get_unmapped_area;
@@ -1844,7 +1844,11 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
if (!file)
pgoff = 0;
- addr = get_area(file, addr, len, pgoff, flags);
+ if (get_area)
+ addr = get_area(file, addr, len, pgoff, flags);
+ else
+ addr = mm_get_unmapped_area(current->mm, file, addr, len,
+ pgoff, flags);
if (IS_ERR_VALUE(addr))
return addr;
@@ -1859,6 +1863,17 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
EXPORT_SYMBOL(get_unmapped_area);
+unsigned long
+mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags)
+{
+ if (test_bit(MMF_TOPDOWN, &mm->flags))
+ return arch_get_unmapped_area_topdown(file, addr, len, pgoff, flags);
+ return arch_get_unmapped_area(file, addr, len, pgoff, flags);
+}
+EXPORT_SYMBOL(mm_get_unmapped_area);
+
/**
* find_vma_intersection() - Look up the first VMA which intersects the interval
* @mm: The process address space.
diff --git a/mm/shmem.c b/mm/shmem.c
index d7c84ff62186..5452065faa46 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2240,8 +2240,6 @@ unsigned long shmem_get_unmapped_area(struct file *file,
unsigned long uaddr, unsigned long len,
unsigned long pgoff, unsigned long flags)
{
- unsigned long (*get_area)(struct file *,
- unsigned long, unsigned long, unsigned long, unsigned long);
unsigned long addr;
unsigned long offset;
unsigned long inflated_len;
@@ -2251,8 +2249,8 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (len > TASK_SIZE)
return -ENOMEM;
- get_area = current->mm->get_unmapped_area;
- addr = get_area(file, uaddr, len, pgoff, flags);
+ addr = mm_get_unmapped_area(current->mm, file, uaddr, len, pgoff,
+ flags);
if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
return addr;
@@ -2309,7 +2307,8 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (inflated_len < len)
return addr;
- inflated_addr = get_area(NULL, uaddr, inflated_len, 0, flags);
+ inflated_addr = mm_get_unmapped_area(current->mm, NULL, uaddr,
+ inflated_len, 0, flags);
if (IS_ERR_VALUE(inflated_addr))
return addr;
if (inflated_addr & ~PAGE_MASK)
@@ -4755,7 +4754,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
{
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
}
#endif
diff --git a/mm/util.c b/mm/util.c
index 5a6a9802583b..2b959553f9ce 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -452,17 +452,17 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
if (mmap_is_legacy(rlim_stack)) {
mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
} else {
mm->mmap_base = mmap_base(random_factor, rlim_stack);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
}
}
#elif defined(CONFIG_MMU) && !defined(HAVE_ARCH_PICK_MMAP_LAYOUT)
void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
{
mm->mmap_base = TASK_UNMAPPED_BASE;
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
}
#endif
--
2.34.1
* [PATCH v3 02/12] mm: Introduce arch_get_unmapped_area_vmflags()
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
2024-03-12 22:28 ` [PATCH v3 01/12] mm: Switch mm->get_unmapped_area() to a flag Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
2024-03-13 7:22 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 03/12] mm: Use get_unmapped_area_vmflags() Rick Edgecombe
` (9 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOW_STACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn't placed in any existing mapping's guard
gap.
2. That the new mapping isn't placed such that any existing mapping
ends up in *its* guard gap.
The long-standing behavior of mmap() is to ensure (1), but not take any
care around (2). So, for example, if there is a PAGE_SIZE free area, and a
PAGE_SIZE mapping of a type that needs a guard gap (e.g. a shadow stack)
is being placed, mmap() may place the shadow stack in the PAGE_SIZE free
area. Then the mapping that is supposed to have a guard gap will not have
a gap to the adjacent VMA.
In order to take the start gap into account, the maple tree search needs
to know the size of start gap the new mapping will need. The call chain
from do_mmap() to the actual maple tree search looks like this:
do_mmap(size, vm_flags, map_flags, ..)
mm/mmap.c:get_unmapped_area(size, map_flags, ...)
arch_get_unmapped_area(size, map_flags, ...)
vm_unmapped_area(struct vm_unmapped_area_info)
One option would be to add another MAP_ flag to mean a one page start gap
(as is needed for shadow stack), but this consumes a flag unnecessarily.
Another option could be to simply increase the size passed in do_mmap() by
the start gap size, and adjust after the fact, but this would interfere
with the alignment requirements passed in struct vm_unmapped_area_info,
which are unknown to mmap.c. Instead, introduce variants of
arch_get_unmapped_area/_topdown() that take vm_flags. In future changes,
these variants can be used in mmap.c:get_unmapped_area() to allow the
vm_flags to be passed through to vm_unmapped_area(), while preserving the
normal arch_get_unmapped_area/_topdown() for the existing callers.
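As a rough sketch, an arch that defines HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
could consume the vm_flags like this, once vm_unmapped_area() grows a
start gap member later in the series (the start_gap field name and the
one page VM_SHADOW_STACK gap are assumptions based on the later patches;
address hint handling is omitted for brevity):

  unsigned long
  arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
                                 unsigned long len, unsigned long pgoff,
                                 unsigned long flags, vm_flags_t vm_flags)
  {
          struct vm_unmapped_area_info info = {
                  .length = len,
                  .low_limit = current->mm->mmap_base,
                  .high_limit = TASK_SIZE,
                  /* Reserve a one page gap below shadow stacks */
                  .start_gap = (vm_flags & VM_SHADOW_STACK) ? PAGE_SIZE : 0,
          };

          return vm_unmapped_area(&info);
  }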
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
include/linux/sched/mm.h | 17 +++++++++++++++++
mm/mmap.c | 28 ++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index cde946e926d8..7b44441865c5 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -191,6 +191,23 @@ unsigned long mm_get_unmapped_area(struct mm_struct *mm, struct file *filp,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags);
+extern unsigned long
+arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags);
+extern unsigned long
+arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t);
+
+unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm,
+ struct file *filp,
+ unsigned long addr,
+ unsigned long len,
+ unsigned long pgoff,
+ unsigned long flags,
+ vm_flags_t vm_flags);
+
unsigned long
generic_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff,
diff --git a/mm/mmap.c b/mm/mmap.c
index 39e9a3ae3ca5..e23ce8ca24c9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1810,6 +1810,34 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
}
#endif
+#ifndef HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
+extern unsigned long
+arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
+{
+ return arch_get_unmapped_area(filp, addr, len, pgoff, flags);
+}
+
+extern unsigned long
+arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags)
+{
+ return arch_get_unmapped_area_topdown(filp, addr, len, pgoff, flags);
+}
+#endif
+
+unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *filp,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags,
+ vm_flags_t vm_flags)
+{
+ if (test_bit(MMF_TOPDOWN, &mm->flags))
+ return arch_get_unmapped_area_topdown_vmflags(filp, addr, len, pgoff,
+ flags, vm_flags);
+ return arch_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, vm_flags);
+}
+
unsigned long
get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
--
2.34.1
* [PATCH v3 03/12] mm: Use get_unmapped_area_vmflags()
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
2024-03-12 22:28 ` [PATCH v3 01/12] mm: Switch mm->get_unmapped_area() to a flag Rick Edgecombe
2024-03-12 22:28 ` [PATCH v3 02/12] mm: Introduce arch_get_unmapped_area_vmflags() Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
2024-03-13 9:05 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 04/12] thp: Add thp_get_unmapped_area_vmflags() Rick Edgecombe
` (8 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOW_STACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn't placed in any existing mapping's guard
gap.
2. That the new mapping isn't placed such that any existing mapping
ends up in *its* guard gap.
The long-standing behavior of mmap() is to ensure (1), but not take any
care around (2). So, for example, if there is a PAGE_SIZE free area, and a
PAGE_SIZE mapping of a type that needs a guard gap (e.g. a shadow stack)
is being placed, mmap() may place the shadow stack in the PAGE_SIZE free
area. Then the mapping that is supposed to have a guard gap will not have
a gap to the adjacent VMA.
Use mm_get_unmapped_area_vmflags() in do_mmap() so future changes
can cause shadow stack mappings to be placed with a guard gap. Also use
the THP variant that takes vm_flags, such that THP shadow stack can get
the same treatment. Adjust the vm_flags calculation to happen earlier, so
that the vm_flags can be passed into __get_unmapped_area().
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
v2:
- Make get_unmapped_area() a static inline (Kirill)
---
include/linux/mm.h | 11 ++++++++++-
mm/mmap.c | 34 ++++++++++++++++------------------
2 files changed, 26 insertions(+), 19 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f5a97dec5169..d91cde79aaee 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3363,7 +3363,16 @@ extern int install_special_mapping(struct mm_struct *mm,
unsigned long randomize_stack_top(unsigned long stack_top);
unsigned long randomize_page(unsigned long start, unsigned long range);
-extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+unsigned long
+__get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags);
+
+static inline unsigned long
+get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags)
+{
+ return __get_unmapped_area(file, addr, len, pgoff, flags, 0);
+}
extern unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
diff --git a/mm/mmap.c b/mm/mmap.c
index e23ce8ca24c9..a3128ed26676 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1257,18 +1257,6 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
if (mm->map_count > sysctl_max_map_count)
return -ENOMEM;
- /* Obtain the address to map to. we verify (or select) it and ensure
- * that it represents a valid section of the address space.
- */
- addr = get_unmapped_area(file, addr, len, pgoff, flags);
- if (IS_ERR_VALUE(addr))
- return addr;
-
- if (flags & MAP_FIXED_NOREPLACE) {
- if (find_vma_intersection(mm, addr, addr + len))
- return -EEXIST;
- }
-
if (prot == PROT_EXEC) {
pkey = execute_only_pkey(mm);
if (pkey < 0)
@@ -1282,6 +1270,18 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+ /* Obtain the address to map to. we verify (or select) it and ensure
+ * that it represents a valid section of the address space.
+ */
+ addr = __get_unmapped_area(file, addr, len, pgoff, flags, vm_flags);
+ if (IS_ERR_VALUE(addr))
+ return addr;
+
+ if (flags & MAP_FIXED_NOREPLACE) {
+ if (find_vma_intersection(mm, addr, addr + len))
+ return -EEXIST;
+ }
+
if (flags & MAP_LOCKED)
if (!can_do_mlock())
return -EPERM;
@@ -1839,8 +1839,8 @@ unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *fi
}
unsigned long
-get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
- unsigned long pgoff, unsigned long flags)
+__get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
{
unsigned long (*get_area)(struct file *, unsigned long,
unsigned long, unsigned long, unsigned long)
@@ -1875,8 +1875,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
if (get_area)
addr = get_area(file, addr, len, pgoff, flags);
else
- addr = mm_get_unmapped_area(current->mm, file, addr, len,
- pgoff, flags);
+ addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
+ pgoff, flags, vm_flags);
if (IS_ERR_VALUE(addr))
return addr;
@@ -1889,8 +1889,6 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
return error ? error : addr;
}
-EXPORT_SYMBOL(get_unmapped_area);
-
unsigned long
mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
unsigned long addr, unsigned long len,
--
2.34.1
* [PATCH v3 04/12] thp: Add thp_get_unmapped_area_vmflags()
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
` (2 preceding siblings ...)
2024-03-12 22:28 ` [PATCH v3 03/12] mm: Use get_unmapped_area_vmflags() Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
2024-03-13 9:04 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 05/12] csky: Use initializer for struct vm_unmapped_area_info Rick Edgecombe
` (7 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOW_STACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn't placed in any existing mapping's guard
gap.
2. That the new mapping isn't placed such that any existing mapping
ends up in *its* guard gap.
The long-standing behavior of mmap() is to ensure (1), but not take any
care around (2). So, for example, if there is a PAGE_SIZE free area, and a
PAGE_SIZE mapping of a type that needs a guard gap (e.g. a shadow stack)
is being placed, mmap() may place the shadow stack in the PAGE_SIZE free
area. Then the mapping that is supposed to have a guard gap will not have
a gap to the adjacent VMA.
Add a THP implementation of the vm_flags variant of get_unmapped_area().
Future changes will call this from mmap.c in the do_mmap() path to allow
shadow stacks to be placed with consideration taken for the start guard
gap. Shadow stack memory is always private and anonymous, and so special
guard gap logic is not needed in a lot of cases, but it can be mapped by
THP, so it needs to be handled.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
include/linux/huge_mm.h | 11 +++++++++++
mm/huge_memory.c | 23 ++++++++++++++++-------
mm/mmap.c | 12 +++++++-----
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 5adb86af35fc..8744c808d380 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -262,6 +262,9 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags);
+unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags,
+ vm_flags_t vm_flags);
void folio_prep_large_rmappable(struct folio *folio);
bool can_split_folio(struct folio *folio, int *pextra_pins);
@@ -416,6 +419,14 @@ static inline void folio_prep_large_rmappable(struct folio *folio) {}
#define thp_get_unmapped_area NULL
+static inline unsigned long
+thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags)
+{
+ return 0;
+}
+
static inline bool
can_split_folio(struct folio *folio, int *pextra_pins)
{
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bc3bf441e768..349c93a1a7c3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -806,7 +806,8 @@ static inline bool is_transparent_hugepage(struct folio *folio)
static unsigned long __thp_get_unmapped_area(struct file *filp,
unsigned long addr, unsigned long len,
- loff_t off, unsigned long flags, unsigned long size)
+ loff_t off, unsigned long flags, unsigned long size,
+ vm_flags_t vm_flags)
{
loff_t off_end = off + len;
loff_t off_align = round_up(off, size);
@@ -822,8 +823,8 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
if (len_pad < len || (off + len_pad) < off)
return 0;
- ret = mm_get_unmapped_area(current->mm, filp, addr, len_pad,
- off >> PAGE_SHIFT, flags);
+ ret = mm_get_unmapped_area_vmflags(current->mm, filp, addr, len_pad,
+ off >> PAGE_SHIFT, flags, vm_flags);
/*
* The failure might be due to length padding. The caller will retry
@@ -848,17 +849,25 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
return ret;
}
-unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
- unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags,
+ vm_flags_t vm_flags)
{
unsigned long ret;
loff_t off = (loff_t)pgoff << PAGE_SHIFT;
- ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE);
+ ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags);
if (ret)
return ret;
- return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area_vmflags(current->mm, filp, addr, len, pgoff, flags,
+ vm_flags);
+}
+
+unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+ return thp_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
}
EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
diff --git a/mm/mmap.c b/mm/mmap.c
index a3128ed26676..68381b90f906 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1863,20 +1863,22 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
* so use shmem's get_unmapped_area in case it can be huge.
*/
get_area = shmem_get_unmapped_area;
- } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
- /* Ensures that larger anonymous mappings are THP aligned. */
- get_area = thp_get_unmapped_area;
}
/* Always treat pgoff as zero for anonymous memory. */
if (!file)
pgoff = 0;
- if (get_area)
+ if (get_area) {
addr = get_area(file, addr, len, pgoff, flags);
- else
+ } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+ /* Ensures that larger anonymous mappings are THP aligned. */
+ addr = thp_get_unmapped_area_vmflags(file, addr, len,
+ pgoff, flags, vm_flags);
+ } else {
addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
pgoff, flags, vm_flags);
+ }
if (IS_ERR_VALUE(addr))
return addr;
--
2.34.1
* [PATCH v3 05/12] csky: Use initializer for struct vm_unmapped_area_info
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
` (3 preceding siblings ...)
2024-03-12 22:28 ` [PATCH v3 04/12] thp: Add thp_get_unmapped_area_vmflags() Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
2024-03-13 9:04 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 06/12] parisc: " Rick Edgecombe
` (6 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe, Guo Ren, linux-csky
Future changes will need to add a new member to struct
vm_unmapped_area_info. This would cause trouble for any call site that
doesn't initialize the struct. Currently every caller sets each member
manually, so if new members are added they will be uninitialized and the
core code parsing the struct will see garbage in the new member.
It could be possible to initialize the new member manually to 0 at each
call site. This and a couple other options were discussed, and a working
consensus (see links) was that in general the best way to accomplish this
would be via static initialization with designated member initializers.
Having some struct vm_unmapped_area_info instances not zero initialized
will put those sites at risk of feeding garbage into vm_unmapped_area() if
the convention is to zero initialize the struct and any new member addition
misses a call site that initializes each member manually.
It could be possible to leave the code mostly untouched, and just change
the line:
struct vm_unmapped_area_info info
to:
struct vm_unmapped_area_info info = {};
However, that would leave cleanup for the members that are manually set
to zero, as it would no longer be required.
So to reduce the chance of bugs via uninitialized members, instead
simply continue the process of initializing the struct this way tree-wide.
This will zero any unspecified members. Move the member initializers to
the struct declaration when they are known at that time. Leave out the
members that were manually initialized to zero, as this would be redundant
for designated initializers.
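As a generic sketch of the before/after pattern (illustrative members
only, not taken from a specific call site):

  /* Before: members not assigned below contain stack garbage */
  struct vm_unmapped_area_info info;

  info.length = len;
  info.low_limit = mm->mmap_base;
  info.high_limit = TASK_SIZE;

  /* After: any member not named is implicitly zero initialized */
  struct vm_unmapped_area_info info = {
          .length = len,
          .low_limit = mm->mmap_base,
          .high_limit = TASK_SIZE,
  };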
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
---
v3:
- Fixed spelling errors in log
- Be consistent about field vs member in log
Hi,
This patch was split and refactored out of a tree-wide change [0] to just
zero-init each struct vm_unmapped_area_info. The overall goal of the
series is to help with shadow stack guard gaps. Currently, there is only
one arch with shadow stacks, but two more are in progress. This patch is
compile tested only.
There was further discussion that this method of initializing the structs,
while nice in some ways, has a greater risk of introducing bugs in some of
the more complicated callers. Since this version was already reviewed by
the arch maintainers, leave it as acknowledged.
Thanks,
Rick
[0] https://lore.kernel.org/lkml/20240226190951.3240433-6-rick.p.edgecombe@intel.com/
---
arch/csky/abiv1/mmap.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/csky/abiv1/mmap.c b/arch/csky/abiv1/mmap.c
index 6792aca49999..7f826331d409 100644
--- a/arch/csky/abiv1/mmap.c
+++ b/arch/csky/abiv1/mmap.c
@@ -28,7 +28,12 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
int do_align = 0;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {
+ .length = len,
+ .low_limit = mm->mmap_base,
+ .high_limit = TASK_SIZE,
+ .align_offset = pgoff << PAGE_SHIFT
+ };
/*
* We only need to do colour alignment if either the I or D
@@ -61,11 +66,6 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
- info.length = len;
- info.low_limit = mm->mmap_base;
- info.high_limit = TASK_SIZE;
info.align_mask = do_align ? (PAGE_MASK & (SHMLBA - 1)) : 0;
- info.align_offset = pgoff << PAGE_SHIFT;
return vm_unmapped_area(&info);
}
--
2.34.1
* [PATCH v3 06/12] parisc: Use initializer for struct vm_unmapped_area_info
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
` (4 preceding siblings ...)
2024-03-12 22:28 ` [PATCH v3 05/12] csky: Use initializer for struct vm_unmapped_area_info Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
2024-03-13 9:04 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 07/12] powerpc: " Rick Edgecombe
` (5 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe, Helge Deller,
James E.J. Bottomley, linux-parisc
Future changes will need to add a new member to struct
vm_unmapped_area_info. This would cause trouble for any call site that
doesn't initialize the struct. Currently every caller sets each member
manually, so if new members are added they will be uninitialized and the
core code parsing the struct will see garbage in the new member.
It could be possible to initialize the new member manually to 0 at each
call site. This and a couple other options were discussed, and a working
consensus (see links) was that in general the best way to accomplish this
would be via static initialization with designated member initializers.
Having some struct vm_unmapped_area_info instances not zero initialized
will put those sites at risk of feeding garbage into vm_unmapped_area() if
the convention is to zero initialize the struct and any new member addition
misses a call site that initializes each member manually.
It could be possible to leave the code mostly untouched, and just change
the line:
struct vm_unmapped_area_info info
to:
struct vm_unmapped_area_info info = {};
However, that would leave cleanup for the members that are manually set
to zero, as it would no longer be required.
So to reduce the chance of bugs via uninitialized members, instead
simply continue the process of initializing the struct this way tree-wide.
This will zero any unspecified members. Move the member initializers to
the struct declaration when they are known at that time. Leave out the
members that were manually initialized to zero, as this would be redundant
for designated initializers.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
---
v3:
- Fixed spelling errors in log
- Be consistent about field vs member in log
Hi,
This patch was split and refactored out of a tree-wide change [0] to just
zero-init each struct vm_unmapped_area_info. The overall goal of the
series is to help with shadow stack guard gaps. Currently, there is only
one arch with shadow stacks, but two more are in progress. This patch is
compile tested only.
There was further discussion that this method of initializing the structs,
while nice in some ways, has a greater risk of introducing bugs in some of
the more complicated callers. Since this version was already reviewed by
the arch maintainers, leave it as acknowledged.
Thanks,
Rick
[0] https://lore.kernel.org/lkml/20240226190951.3240433-6-rick.p.edgecombe@intel.com/
---
arch/parisc/kernel/sys_parisc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index 98af719d5f85..f7722451276e 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -104,7 +104,9 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
struct vm_area_struct *vma, *prev;
unsigned long filp_pgoff;
int do_color_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {
+ .length = len
+ };
if (unlikely(len > TASK_SIZE))
return -ENOMEM;
@@ -139,7 +141,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
return addr;
}
- info.length = len;
info.align_mask = do_color_align ? (PAGE_MASK & (SHM_COLOUR - 1)) : 0;
info.align_offset = shared_align_offset(filp_pgoff, pgoff);
@@ -160,7 +161,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
*/
}
- info.flags = 0;
info.low_limit = mm->mmap_base;
info.high_limit = mmap_upper_limit(NULL);
return vm_unmapped_area(&info);
--
2.34.1
* [PATCH v3 07/12] powerpc: Use initializer for struct vm_unmapped_area_info
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
` (5 preceding siblings ...)
2024-03-12 22:28 ` [PATCH v3 06/12] parisc: " Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
2024-03-13 6:44 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 08/12] treewide: " Rick Edgecombe
` (4 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe, Michael Ellerman,
Nicholas Piggin, Aneesh Kumar K . V, Naveen N . Rao,
linuxppc-dev
Future changes will need to add a new member to struct
vm_unmapped_area_info. This would cause trouble for any call site that
doesn't initialize the struct. Currently every caller sets each member
manually, so if new members are added they will be uninitialized and the
core code parsing the struct will see garbage in the new member.
It could be possible to initialize the new member manually to 0 at each
call site. This and a couple other options were discussed, and a working
consensus (see links) was that in general the best way to accomplish this
would be via static initialization with designated member initializers.
Having some struct vm_unmapped_area_info instances not zero initialized
will put those sites at risk of feeding garbage into vm_unmapped_area() if
the convention is to zero initialize the struct and any new member addition
misses a call site that initializes each member manually.
It could be possible to leave the code mostly untouched, and just change
the line:
struct vm_unmapped_area_info info
to:
struct vm_unmapped_area_info info = {};
However, that would leave cleanup for the members that are manually set
to zero, as it would no longer be required.
So to reduce the chance of bugs via uninitialized members, instead
simply continue the process of initializing the struct this way tree-wide.
This will zero any unspecified members. Move the member initializers to
the struct declaration when they are known at that time. Leave out the
members that were manually initialized to zero, as this would be redundant
for designated initializers.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
---
v3:
- Fixed spelling errors in log
- Be consistent about field vs member in log
Hi,
This patch was split and refactored out of a tree-wide change [0] to just
zero-init each struct vm_unmapped_area_info. The overall goal of the
series is to help with shadow stack guard gaps. Currently, there is only
one arch with shadow stacks, but two more are in progress. This patch is
compile tested only.
There was further discussion that this method of initializing the structs,
while nice in some ways, has a greater risk of introducing bugs in some of
the more complicated callers. Since this version was already reviewed by
the arch maintainers, leave it as acknowledged.
Thanks,
Rick
[0] https://lore.kernel.org/lkml/20240226190951.3240433-6-rick.p.edgecombe@intel.com/
---
arch/powerpc/mm/book3s64/slice.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/slice.c b/arch/powerpc/mm/book3s64/slice.c
index c0b58afb9a47..6c7ac8c73a6c 100644
--- a/arch/powerpc/mm/book3s64/slice.c
+++ b/arch/powerpc/mm/book3s64/slice.c
@@ -282,12 +282,12 @@ static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
{
int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
unsigned long found, next_end;
- struct vm_unmapped_area_info info;
-
- info.flags = 0;
- info.length = len;
- info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
- info.align_offset = 0;
+ struct vm_unmapped_area_info info = {
+ .flags = 0,
+ .length = len,
+ .align_mask = PAGE_MASK & ((1ul << pshift) - 1),
+ .align_offset = 0
+ };
/*
* Check till the allow max value for this mmap request
*/
@@ -326,13 +326,14 @@ static unsigned long slice_find_area_topdown(struct mm_struct *mm,
{
int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
unsigned long found, prev;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {
+ .flags = VM_UNMAPPED_AREA_TOPDOWN,
+ .length = len,
+ .align_mask = PAGE_MASK & ((1ul << pshift) - 1),
+ .align_offset = 0
+ };
unsigned long min_addr = max(PAGE_SIZE, mmap_min_addr);
- info.flags = VM_UNMAPPED_AREA_TOPDOWN;
- info.length = len;
- info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
- info.align_offset = 0;
/*
* If we are trying to allocate above DEFAULT_MAP_WINDOW
* Add the different to the mmap_base.
--
2.34.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 08/12] treewide: Use initializer for struct vm_unmapped_area_info
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
` (6 preceding siblings ...)
2024-03-12 22:28 ` [PATCH v3 07/12] powerpc: " Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
2024-03-13 3:18 ` Kees Cook
2024-03-12 22:28 ` [PATCH v3 09/12] mm: Take placement mappings gap into account Rick Edgecombe
` (3 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe, linux-alpha,
linux-snps-arc, linux-arm-kernel, linux-csky, loongarch,
linux-mips, linux-s390, linux-sh, sparclinux
Future changes will need to add a new member to struct
vm_unmapped_area_info. This would cause trouble for any call site that
doesn't initialize the struct. Currently every caller sets each member
manually, so if new ones are added they will be uninitialized and the
core code parsing the struct will see garbage in the new member.
It could be possible to initialize the new member manually to 0 at each
call site. This and a couple other options were discussed. Having some
struct vm_unmapped_area_info instances not zero initialized will put
those sites at risk of feeding garbage into vm_unmapped_area(), if the
convention is to zero initialize the struct and any new field addition
misses a call site that initializes each field manually. So it is
useful to do things similarly across the kernel.
The consensus (see links) was that, taking into account both code
cleanliness and minimizing the chance of introducing bugs, the best
general way to accomplish this is C99 static initialization. As in:
struct vm_unmapped_area_info info = {};
With this method of initialization, the whole struct will be zero
initialized, and any statements setting fields to zero will be unneeded.
The change should not leave cleanup behind at the call sites.
While iterating through the possible solutions a few archs kindly acked
other variations that still zero initialized the struct. These sites have
been modified in previous changes using the pattern acked by the respective
arch.
So to reduce the chance of bugs via uninitialized fields, perform a
tree-wide change using the consensus for the best general way to do this
change. Use C99 static initialization to zero the struct and remove any
statements that simply set members to zero.
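As a minimal sketch of the resulting pattern (userspace illustration
with a simplified stand-in struct, not kernel code; the empty
initializer list is the same GNU extension the kernel uses):

	#include <assert.h>

	struct example_info { unsigned long flags, length, align_mask; };

	int main(void)
	{
		struct example_info info = {};	/* zeroes every member */

		info.length = 4096;	/* set only the nonzero members */
		assert(info.flags == 0 && info.align_mask == 0);
		return 0;
	}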
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-csky@vger.kernel.org
Cc: loongarch@lists.linux.dev
Cc: linux-mips@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
Link: https://lore.kernel.org/lkml/ec3e377a-c0a0-4dd3-9cb9-96517e54d17e@csgroup.eu/
---
Hi archs,
For some context, this is part of a larger series to improve shadow stack
guard gaps. It involves plumbing a new field via
struct vm_unmapped_area_info. The first user is x86, but arm and riscv will
likely use it as well. The change is compile tested only for non-x86.
Thanks,
Rick
---
arch/alpha/kernel/osf_sys.c | 5 +----
arch/arc/mm/mmap.c | 4 +---
arch/arm/mm/mmap.c | 5 ++---
arch/loongarch/mm/mmap.c | 3 +--
arch/mips/mm/mmap.c | 3 +--
arch/s390/mm/hugetlbpage.c | 7 ++-----
arch/s390/mm/mmap.c | 11 ++++-------
arch/sh/mm/mmap.c | 5 ++---
arch/sparc/kernel/sys_sparc_32.c | 3 +--
arch/sparc/kernel/sys_sparc_64.c | 5 ++---
arch/sparc/mm/hugetlbpage.c | 7 ++-----
arch/x86/kernel/sys_x86_64.c | 7 ++-----
arch/x86/mm/hugetlbpage.c | 7 ++-----
fs/hugetlbfs/inode.c | 7 ++-----
mm/mmap.c | 9 ++-------
15 files changed, 27 insertions(+), 61 deletions(-)
diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 5db88b627439..e5f881bc8288 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1218,14 +1218,11 @@ static unsigned long
arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
unsigned long limit)
{
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
- info.flags = 0;
info.length = len;
info.low_limit = addr;
info.high_limit = limit;
- info.align_mask = 0;
- info.align_offset = 0;
return vm_unmapped_area(&info);
}
diff --git a/arch/arc/mm/mmap.c b/arch/arc/mm/mmap.c
index 3c1c7ae73292..69a915297155 100644
--- a/arch/arc/mm/mmap.c
+++ b/arch/arc/mm/mmap.c
@@ -27,7 +27,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/*
* We enforce the MAP_FIXED case.
@@ -51,11 +51,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = mm->mmap_base;
info.high_limit = TASK_SIZE;
- info.align_mask = 0;
info.align_offset = pgoff << PAGE_SHIFT;
return vm_unmapped_area(&info);
}
diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index a0f8a0ca0788..d65d0e6ed10a 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -34,7 +34,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
struct vm_area_struct *vma;
int do_align = 0;
int aliasing = cache_is_vipt_aliasing();
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/*
* We only need to do colour alignment if either the I or D
@@ -68,7 +68,6 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = mm->mmap_base;
info.high_limit = TASK_SIZE;
@@ -87,7 +86,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
unsigned long addr = addr0;
int do_align = 0;
int aliasing = cache_is_vipt_aliasing();
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/*
* We only need to do colour alignment if either the I or D
diff --git a/arch/loongarch/mm/mmap.c b/arch/loongarch/mm/mmap.c
index a9630a81b38a..4bbd449b4a47 100644
--- a/arch/loongarch/mm/mmap.c
+++ b/arch/loongarch/mm/mmap.c
@@ -24,7 +24,7 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
struct vm_area_struct *vma;
unsigned long addr = addr0;
int do_color_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (unlikely(len > TASK_SIZE))
return -ENOMEM;
@@ -82,7 +82,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
*/
}
- info.flags = 0;
info.low_limit = mm->mmap_base;
info.high_limit = TASK_SIZE;
return vm_unmapped_area(&info);
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index 00fe90c6db3e..7e11d7b58761 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -34,7 +34,7 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
struct vm_area_struct *vma;
unsigned long addr = addr0;
int do_color_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (unlikely(len > TASK_SIZE))
return -ENOMEM;
@@ -92,7 +92,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
*/
}
- info.flags = 0;
info.low_limit = mm->mmap_base;
info.high_limit = TASK_SIZE;
return vm_unmapped_area(&info);
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index c2d2850ec8d5..51fb3806395b 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -258,14 +258,12 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
- info.flags = 0;
info.length = len;
info.low_limit = current->mm->mmap_base;
info.high_limit = TASK_SIZE;
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
return vm_unmapped_area(&info);
}
@@ -274,7 +272,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
unsigned long addr;
info.flags = VM_UNMAPPED_AREA_TOPDOWN;
@@ -282,7 +280,6 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
info.low_limit = PAGE_SIZE;
info.high_limit = current->mm->mmap_base;
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
/*
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index cd52d72b59cf..5c9d9f18a55f 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -77,7 +77,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (len > TASK_SIZE - mmap_min_addr)
return -ENOMEM;
@@ -93,14 +93,12 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
goto check_asce_limit;
}
- info.flags = 0;
info.length = len;
info.low_limit = mm->mmap_base;
info.high_limit = TASK_SIZE;
if (filp || (flags & MAP_SHARED))
info.align_mask = MMAP_ALIGN_MASK << PAGE_SHIFT;
- else
- info.align_mask = 0;
+
info.align_offset = pgoff << PAGE_SHIFT;
addr = vm_unmapped_area(&info);
if (offset_in_page(addr))
@@ -116,7 +114,7 @@ unsigned long arch_get_unmapped_area_topdown(struct file *filp, unsigned long ad
{
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/* requested length too big for entire address space */
if (len > TASK_SIZE - mmap_min_addr)
@@ -140,8 +138,7 @@ unsigned long arch_get_unmapped_area_topdown(struct file *filp, unsigned long ad
info.high_limit = mm->mmap_base;
if (filp || (flags & MAP_SHARED))
info.align_mask = MMAP_ALIGN_MASK << PAGE_SHIFT;
- else
- info.align_mask = 0;
+
info.align_offset = pgoff << PAGE_SHIFT;
addr = vm_unmapped_area(&info);
diff --git a/arch/sh/mm/mmap.c b/arch/sh/mm/mmap.c
index b82199878b45..bee329d4149a 100644
--- a/arch/sh/mm/mmap.c
+++ b/arch/sh/mm/mmap.c
@@ -57,7 +57,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
int do_colour_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (flags & MAP_FIXED) {
/* We do not accept a shared mapping if it would violate
@@ -88,7 +88,6 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = TASK_UNMAPPED_BASE;
info.high_limit = TASK_SIZE;
@@ -106,7 +105,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
struct mm_struct *mm = current->mm;
unsigned long addr = addr0;
int do_colour_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (flags & MAP_FIXED) {
/* We do not accept a shared mapping if it would violate
diff --git a/arch/sparc/kernel/sys_sparc_32.c b/arch/sparc/kernel/sys_sparc_32.c
index 082a551897ed..08a19727795c 100644
--- a/arch/sparc/kernel/sys_sparc_32.c
+++ b/arch/sparc/kernel/sys_sparc_32.c
@@ -41,7 +41,7 @@ SYSCALL_DEFINE0(getpagesize)
unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags)
{
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (flags & MAP_FIXED) {
/* We do not accept a shared mapping if it would violate
@@ -59,7 +59,6 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
if (!addr)
addr = TASK_UNMAPPED_BASE;
- info.flags = 0;
info.length = len;
info.low_limit = addr;
info.high_limit = TASK_SIZE;
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index 1dbf7211666e..d9c3b34ca744 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -93,7 +93,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
struct vm_area_struct * vma;
unsigned long task_size = TASK_SIZE;
int do_color_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (flags & MAP_FIXED) {
/* We do not accept a shared mapping if it would violate
@@ -126,7 +126,6 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = TASK_UNMAPPED_BASE;
info.high_limit = min(task_size, VA_EXCLUDE_START);
@@ -154,7 +153,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
unsigned long task_size = STACK_TOP32;
unsigned long addr = addr0;
int do_color_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/* This should only ever run for 32-bit processes. */
BUG_ON(!test_thread_flag(TIF_32BIT));
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 38a1bef47efb..4caf56b32e26 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -31,17 +31,15 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *filp,
{
struct hstate *h = hstate_file(filp);
unsigned long task_size = TASK_SIZE;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (test_thread_flag(TIF_32BIT))
task_size = STACK_TOP32;
- info.flags = 0;
info.length = len;
info.low_limit = TASK_UNMAPPED_BASE;
info.high_limit = min(task_size, VA_EXCLUDE_START);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
if ((addr & ~PAGE_MASK) && task_size > VA_EXCLUDE_END) {
@@ -63,7 +61,7 @@ hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
struct hstate *h = hstate_file(filp);
struct mm_struct *mm = current->mm;
unsigned long addr = addr0;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/* This should only ever run for 32-bit processes. */
BUG_ON(!test_thread_flag(TIF_32BIT));
@@ -73,7 +71,6 @@ hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
info.low_limit = PAGE_SIZE;
info.high_limit = mm->mmap_base;
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
/*
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index c783aeb37dce..b3278e4f7e59 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -125,7 +125,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
unsigned long begin, end;
if (flags & MAP_FIXED)
@@ -144,11 +144,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = begin;
info.high_limit = end;
- info.align_mask = 0;
info.align_offset = pgoff << PAGE_SHIFT;
if (filp) {
info.align_mask = get_align_mask();
@@ -165,7 +163,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
unsigned long addr = addr0;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/* requested length too big for entire address space */
if (len > TASK_SIZE)
@@ -210,7 +208,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
- info.align_mask = 0;
info.align_offset = pgoff << PAGE_SHIFT;
if (filp) {
info.align_mask = get_align_mask();
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 6d77c0039617..fb600949a355 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -51,9 +51,8 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
- info.flags = 0;
info.length = len;
info.low_limit = get_mmap_base(1);
@@ -65,7 +64,6 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
task_size_32bit() : task_size_64bit(addr > DEFAULT_MAP_WINDOW);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
return vm_unmapped_area(&info);
}
@@ -74,7 +72,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
info.flags = VM_UNMAPPED_AREA_TOPDOWN;
info.length = len;
@@ -89,7 +87,6 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
/*
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index cd87ea5944a1..ae833080a146 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,14 +176,12 @@ hugetlb_get_unmapped_area_bottomup(struct file *file, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
- info.flags = 0;
info.length = len;
info.low_limit = current->mm->mmap_base;
info.high_limit = arch_get_mmap_end(addr, len, flags);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
return vm_unmapped_area(&info);
}
@@ -192,14 +190,13 @@ hugetlb_get_unmapped_area_topdown(struct file *file, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
info.flags = VM_UNMAPPED_AREA_TOPDOWN;
info.length = len;
info.low_limit = PAGE_SIZE;
info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
/*
diff --git a/mm/mmap.c b/mm/mmap.c
index 68381b90f906..b889c79d11bd 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1707,7 +1707,7 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
if (len > mmap_end - mmap_min_addr)
@@ -1725,12 +1725,9 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = mm->mmap_base;
info.high_limit = mmap_end;
- info.align_mask = 0;
- info.align_offset = 0;
return vm_unmapped_area(&info);
}
@@ -1755,7 +1752,7 @@ generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
{
struct vm_area_struct *vma, *prev;
struct mm_struct *mm = current->mm;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
/* requested length too big for entire address space */
@@ -1779,8 +1776,6 @@ generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
info.length = len;
info.low_limit = PAGE_SIZE;
info.high_limit = arch_get_mmap_base(addr, mm->mmap_base);
- info.align_mask = 0;
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
/*
--
2.34.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 09/12] mm: Take placement mappings gap into account
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
` (7 preceding siblings ...)
2024-03-12 22:28 ` [PATCH v3 08/12] treewide: " Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
2024-03-13 9:04 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 10/12] x86/mm: Implement HAVE_ARCH_UNMAPPED_AREA_VMFLAGS Rick Edgecombe
` (2 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in any existing mapping’s guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are in *its* guard gaps.
The long-standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and an mmap()
of PAGE_SIZE for a type that has a guard gap is being placed, mmap() may
place the shadow stack in the PAGE_SIZE free area. Then the mapping that is
supposed to have a guard gap will not have a gap to the adjacent VMA.
For MAP_GROWSDOWN/VM_GROWSDOWN and MAP_GROWSUP/VM_GROWSUP this has not
been a problem in practice because applications place these kinds of
mappings very early, when there are not many mappings to find a space
between. But shadow stacks may be placed throughout the lifetime of the
application.
Use the start_gap field to find a space that includes the guard gap for
the new mapping. Take care to not interfere with the alignment.
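To illustrate the gap-then-align step with made-up values (userspace
sketch of the arithmetic in unmapped_area(), not kernel code; assumes a
one-page guard gap and 64K alignment):

	#include <stdio.h>

	int main(void)
	{
		unsigned long mas_index = 0x10000; /* free area found */
		unsigned long start_gap = 0x1000;  /* one page of gap */
		unsigned long align_mask = 0xffff; /* 64K alignment */
		unsigned long align_offset = 0;

		unsigned long gap = mas_index + start_gap;
		gap += (align_offset - gap) & align_mask;

		/* Prints 0x20000: aligned, with a full page of gap below */
		printf("placed at %#lx\n", gap);
		return 0;
	}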
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
v3:
- Spelling fix in comment
v2:
- Remove VM_UNMAPPED_START_GAP_SET and have struct vm_unmapped_area_info
initialized with zeros (in another patch). (Kirill)
- Drop unrelated space change (Kirill)
- Add comment around interactions of alignment and start gap step
(Kirill)
---
include/linux/mm.h | 1 +
mm/mmap.c | 12 +++++++++---
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d91cde79aaee..deade7be00d0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3418,6 +3418,7 @@ struct vm_unmapped_area_info {
unsigned long high_limit;
unsigned long align_mask;
unsigned long align_offset;
+ unsigned long start_gap;
};
extern unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info);
diff --git a/mm/mmap.c b/mm/mmap.c
index b889c79d11bd..634e706fd97e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1582,7 +1582,7 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
MA_STATE(mas, &current->mm->mm_mt, 0, 0);
/* Adjust search length to account for worst case alignment overhead */
- length = info->length + info->align_mask;
+ length = info->length + info->align_mask + info->start_gap;
if (length < info->length)
return -ENOMEM;
@@ -1594,7 +1594,13 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
if (mas_empty_area(&mas, low_limit, high_limit - 1, length))
return -ENOMEM;
- gap = mas.index;
+ /*
+ * Adjust for the gap first so it doesn't interfere with the
+ * later alignment. The first step is the minimum needed to
+	 * fulfill the start gap, the next step is the minimum to align
+	 * that. It is the minimum needed to fulfill both.
+ */
+ gap = mas.index + info->start_gap;
gap += (info->align_offset - gap) & info->align_mask;
tmp = mas_next(&mas, ULONG_MAX);
if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
@@ -1633,7 +1639,7 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
MA_STATE(mas, &current->mm->mm_mt, 0, 0);
/* Adjust search length to account for worst case alignment overhead */
- length = info->length + info->align_mask;
+ length = info->length + info->align_mask + info->start_gap;
if (length < info->length)
return -ENOMEM;
--
2.34.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 10/12] x86/mm: Implement HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
` (8 preceding siblings ...)
2024-03-12 22:28 ` [PATCH v3 09/12] mm: Take placement mappings gap into account Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
2024-03-13 9:04 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 11/12] x86/mm: Care about shadow stack guard gap during placement Rick Edgecombe
2024-03-12 22:28 ` [PATCH v3 12/12] selftests/x86: Add placement guard gap test for shstk Rick Edgecombe
11 siblings, 1 reply; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in any existing mapping’s guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are in *its* guard gaps.
The long-standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and an mmap()
of PAGE_SIZE for a type that has a guard gap is being placed, mmap() may
place the shadow stack in the PAGE_SIZE free area. Then the mapping that is
supposed to have a guard gap will not have a gap to the adjacent VMA.
Add x86 arch implementations of arch_get_unmapped_area_vmflags/_topdown()
so future changes can allow the guard gap of the type of vma being placed
to be taken into account. This will be used for shadow stack memory.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
v3:
- Commit log grammar
v2:
- Remove unnecessary added extern
---
arch/x86/include/asm/pgtable_64.h | 1 +
arch/x86/kernel/sys_x86_64.c | 25 ++++++++++++++++++++-----
2 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 24af25b1551a..13dcaf436efd 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -244,6 +244,7 @@ extern void cleanup_highmap(void);
#define HAVE_ARCH_UNMAPPED_AREA
#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
+#define HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
#define PAGE_AGP PAGE_KERNEL_NOCACHE
#define HAVE_PAGE_AGP 1
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index b3278e4f7e59..d6fbc4dd08ef 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -120,8 +120,8 @@ static void find_start_end(unsigned long addr, unsigned long flags,
}
unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
- unsigned long len, unsigned long pgoff, unsigned long flags)
+arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
@@ -156,9 +156,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
}
unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
- const unsigned long len, const unsigned long pgoff,
- const unsigned long flags)
+arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr0,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags)
{
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
@@ -227,3 +227,18 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
*/
return arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
}
+
+unsigned long
+arch_get_unmapped_area(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+ return arch_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
+}
+
+unsigned long
+arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr,
+ const unsigned long len, const unsigned long pgoff,
+ const unsigned long flags)
+{
+ return arch_get_unmapped_area_topdown_vmflags(filp, addr, len, pgoff, flags, 0);
+}
--
2.34.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 11/12] x86/mm: Care about shadow stack guard gap during placement
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
` (9 preceding siblings ...)
2024-03-12 22:28 ` [PATCH v3 10/12] x86/mm: Implement HAVE_ARCH_UNMAPPED_AREA_VMFLAGS Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
2024-03-12 22:28 ` [PATCH v3 12/12] selftests/x86: Add placement guard gap test for shstk Rick Edgecombe
11 siblings, 0 replies; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in any existing mapping’s guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are in *its* guard gaps.
The long-standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and an mmap()
of PAGE_SIZE for a type that has a guard gap is being placed, mmap() may
place the shadow stack in the PAGE_SIZE free area. Then the mapping that is
supposed to have a guard gap will not have a gap to the adjacent VMA.
Now that vm_flags is passed into the arch get_unmapped_area()
implementations, and vm_unmapped_area() is ready to consider it, have
VM_SHADOW_STACK mappings get guard gap consideration for scenario 2.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
arch/x86/kernel/sys_x86_64.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index d6fbc4dd08ef..964cb435710e 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -119,6 +119,14 @@ static void find_start_end(unsigned long addr, unsigned long flags,
*end = task_size_64bit(addr > DEFAULT_MAP_WINDOW);
}
+static inline unsigned long stack_guard_placement(vm_flags_t vm_flags)
+{
+ if (vm_flags & VM_SHADOW_STACK)
+ return PAGE_SIZE;
+
+ return 0;
+}
+
unsigned long
arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
@@ -148,6 +156,7 @@ arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned l
info.low_limit = begin;
info.high_limit = end;
info.align_offset = pgoff << PAGE_SHIFT;
+ info.start_gap = stack_guard_placement(vm_flags);
if (filp) {
info.align_mask = get_align_mask();
info.align_offset += get_align_bits();
@@ -197,6 +206,7 @@ arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr0,
info.low_limit = PAGE_SIZE;
info.high_limit = get_mmap_base(0);
+ info.start_gap = stack_guard_placement(vm_flags);
/*
* If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
--
2.34.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 12/12] selftests/x86: Add placement guard gap test for shstk
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
` (10 preceding siblings ...)
2024-03-12 22:28 ` [PATCH v3 11/12] x86/mm: Care about shadow stack guard gap during placement Rick Edgecombe
@ 2024-03-12 22:28 ` Rick Edgecombe
11 siblings, 0 replies; 36+ messages in thread
From: Rick Edgecombe @ 2024-03-12 22:28 UTC (permalink / raw)
To: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
keescook, kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy
Cc: linux-kernel, linux-mm, rick.p.edgecombe
The existing shadow stack test for guard gaps just checks that new
mappings are not placed in an existing mapping's guard gap. Add one that
checks that new mappings are not placed such that preexisting mappings are
in the new mapping's guard gap.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
.../testing/selftests/x86/test_shadow_stack.c | 67 +++++++++++++++++--
1 file changed, 63 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c
index 757e6527f67e..ee909a7927f9 100644
--- a/tools/testing/selftests/x86/test_shadow_stack.c
+++ b/tools/testing/selftests/x86/test_shadow_stack.c
@@ -556,7 +556,7 @@ struct node {
* looked at the shadow stack gaps.
* 5. See if it landed in the gap.
*/
-int test_guard_gap(void)
+int test_guard_gap_other_gaps(void)
{
void *free_area, *shstk, *test_map = (void *)0xFFFFFFFFFFFFFFFF;
struct node *head = NULL, *cur;
@@ -593,11 +593,64 @@ int test_guard_gap(void)
if (shstk - test_map - PAGE_SIZE != PAGE_SIZE)
return 1;
- printf("[OK]\tGuard gap test\n");
+ printf("[OK]\tGuard gap test, other mapping's gaps\n");
return 0;
}
+/* Tests respecting the guard gap of the mapping getting placed */
+int test_guard_gap_new_mappings_gaps(void)
+{
+ void *free_area, *shstk_start, *test_map = (void *)0xFFFFFFFFFFFFFFFF;
+ struct node *head = NULL, *cur;
+ int ret = 0;
+
+ free_area = mmap(0, PAGE_SIZE * 4, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ munmap(free_area, PAGE_SIZE * 4);
+
+ /* Test letting map_shadow_stack find a free space */
+ shstk_start = mmap(free_area, PAGE_SIZE, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (shstk_start == MAP_FAILED || shstk_start != free_area)
+ return 1;
+
+ while (test_map > shstk_start) {
+ test_map = (void *)syscall(__NR_map_shadow_stack, 0, PAGE_SIZE, 0);
+ if (test_map == MAP_FAILED) {
+ printf("[INFO]\tmap_shadow_stack MAP_FAILED\n");
+ ret = 1;
+ break;
+ }
+
+ cur = malloc(sizeof(*cur));
+ cur->mapping = test_map;
+
+ cur->next = head;
+ head = cur;
+
+ if (test_map == free_area + PAGE_SIZE) {
+ printf("[INFO]\tNew mapping has other mapping in guard gap!\n");
+ ret = 1;
+ break;
+ }
+ }
+
+ while (head) {
+ cur = head;
+ head = cur->next;
+ munmap(cur->mapping, PAGE_SIZE);
+ free(cur);
+ }
+
+ munmap(shstk_start, PAGE_SIZE);
+
+ if (!ret)
+ printf("[OK]\tGuard gap test, placement mapping's gaps\n");
+
+ return ret;
+}
+
/*
* Too complicated to pull it out of the 32 bit header, but also get the
* 64 bit one needed above. Just define a copy here.
@@ -850,9 +903,15 @@ int main(int argc, char *argv[])
goto out;
}
- if (test_guard_gap()) {
+ if (test_guard_gap_other_gaps()) {
ret = 1;
- printf("[FAIL]\tGuard gap test\n");
+ printf("[FAIL]\tGuard gap test, other mappings' gaps\n");
+ goto out;
+ }
+
+ if (test_guard_gap_new_mappings_gaps()) {
+ ret = 1;
+ printf("[FAIL]\tGuard gap test, placement mapping's gaps\n");
goto out;
}
--
2.34.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 08/12] treewide: Use initializer for struct vm_unmapped_area_info
2024-03-12 22:28 ` [PATCH v3 08/12] treewide: " Rick Edgecombe
@ 2024-03-13 3:18 ` Kees Cook
2024-03-13 15:40 ` Edgecombe, Rick P
0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2024-03-13 3:18 UTC (permalink / raw)
To: Rick Edgecombe
Cc: Liam.Howlett, akpm, bp, broonie, dave.hansen, debug, hpa,
kirill.shutemov, luto, mingo, peterz, tglx, x86,
christophe.leroy, linux-kernel, linux-mm, linux-alpha,
linux-snps-arc, linux-arm-kernel, linux-csky, loongarch,
linux-mips, linux-s390, linux-sh, sparclinux
On Tue, Mar 12, 2024 at 03:28:39PM -0700, Rick Edgecombe wrote:
> So to reduce the chance of bugs via uninitialized fields, perform a
> tree-wide change using the consensus for the best general way to do this
> change. Use C99 static initialization to zero the struct and remove any
> statements that simply set members to zero.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Thanks! This looks to do exactly what it describes. :)
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 07/12] powerpc: Use initializer for struct vm_unmapped_area_info
2024-03-12 22:28 ` [PATCH v3 07/12] powerpc: " Rick Edgecombe
@ 2024-03-13 6:44 ` Christophe Leroy
2024-03-13 14:57 ` Edgecombe, Rick P
0 siblings, 1 reply; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 6:44 UTC (permalink / raw)
To: Rick Edgecombe, Liam.Howlett, akpm, bp, broonie, dave.hansen,
debug, hpa, keescook, kirill.shutemov, luto, mingo, peterz, tglx,
x86
Cc: linux-kernel, linux-mm, Michael Ellerman, Nicholas Piggin,
Aneesh Kumar K . V, Naveen N . Rao, linuxppc-dev
On 12/03/2024 at 23:28, Rick Edgecombe wrote:
> Future changes will need to add a new member to struct
> vm_unmapped_area_info. This would cause trouble for any call site that
> doesn't initialize the struct. Currently every caller sets each member
> manually, so if new members are added they will be uninitialized and the
> core code parsing the struct will see garbage in the new member.
>
> It could be possible to initialize the new member manually to 0 at each
> call site. This and a couple other options were discussed, and a working
> consensus (see links) was that in general the best way to accomplish this
> would be via static initialization with designated member initializers.
> Having some struct vm_unmapped_area_info instances not zero initialized
> will put those sites at risk of feeding garbage into vm_unmapped_area() if
> the convention is to zero initialize the struct and any new member addition
> misses a call site that initializes each member manually.
>
> It could be possible to leave the code mostly untouched, and just change
> the line:
> struct vm_unmapped_area_info info
> to:
> struct vm_unmapped_area_info info = {};
>
> However, that would leave behind the statements that manually set members
> to zero, which would no longer be required.
>
> So to reduce the chance of bugs via uninitialized members, instead
> simply continue the process of initializing the struct this way tree-wide.
> This will zero any unspecified members. Move the member initializers to the
> struct declaration when they are known at that time. Leave out the members
> that were manually initialized to zero, as this would be redundant with
> designated initializers.
I understand from this text that, as agreed, this patch removes the
pointless/redundant zero-init of individual members. But that is not what
is done, see below?
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Acked-by: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
> Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
> ---
> v3:
> - Fixed spelling errors in log
> - Be consistent about field vs member in log
>
> Hi,
>
> This patch was split and refactored out of a tree-wide change [0] to just
> zero-init each struct vm_unmapped_area_info. The overall goal of the
> series is to help shadow stack guard gaps. Currently, there is only one
> arch with shadow stacks, but two more are in progress. It is compile tested
> only.
>
> There was further discussion that this method of initializing the structs,
> while nice in some ways, has a greater risk of introducing bugs in some of
> the more complicated callers. Since this version was already reviewed by the
> arch maintainers, leave it as acknowledged.
>
> Thanks,
>
> Rick
>
> [0] https://lore.kernel.org/lkml/20240226190951.3240433-6-rick.p.edgecombe@intel.com/
> ---
> arch/powerpc/mm/book3s64/slice.c | 23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/slice.c b/arch/powerpc/mm/book3s64/slice.c
> index c0b58afb9a47..6c7ac8c73a6c 100644
> --- a/arch/powerpc/mm/book3s64/slice.c
> +++ b/arch/powerpc/mm/book3s64/slice.c
> @@ -282,12 +282,12 @@ static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
> {
> int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
> unsigned long found, next_end;
> - struct vm_unmapped_area_info info;
> -
> - info.flags = 0;
> - info.length = len;
> - info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
> - info.align_offset = 0;
> + struct vm_unmapped_area_info info = {
> + .flags = 0,
Please remove zero-init as agreed and explained in the commit message
> + .length = len,
> + .align_mask = PAGE_MASK & ((1ul << pshift) - 1),
> + .align_offset = 0
Same here.
> + };
> /*
> * Check till the allow max value for this mmap request
> */
> @@ -326,13 +326,14 @@ static unsigned long slice_find_area_topdown(struct mm_struct *mm,
> {
> int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
> unsigned long found, prev;
> - struct vm_unmapped_area_info info;
> + struct vm_unmapped_area_info info = {
> + .flags = VM_UNMAPPED_AREA_TOPDOWN,
> + .length = len,
> + .align_mask = PAGE_MASK & ((1ul << pshift) - 1),
> + .align_offset = 0
Same here.
> + };
> unsigned long min_addr = max(PAGE_SIZE, mmap_min_addr);
>
> - info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> - info.length = len;
> - info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
> - info.align_offset = 0;
> /*
> * If we are trying to allocate above DEFAULT_MAP_WINDOW
> * Add the different to the mmap_base.
Christophe
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 01/12] mm: Switch mm->get_unmapped_area() to a flag
2024-03-12 22:28 ` [PATCH v3 01/12] mm: Switch mm->get_unmapped_area() to a flag Rick Edgecombe
@ 2024-03-13 7:19 ` Christophe Leroy
2024-03-13 14:48 ` Edgecombe, Rick P
0 siblings, 1 reply; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 7:19 UTC (permalink / raw)
To: Rick Edgecombe, Liam.Howlett, akpm, bp, broonie, dave.hansen,
debug, hpa, keescook, kirill.shutemov, luto, mingo, peterz, tglx,
x86
Cc: linux-kernel, linux-mm
On 12/03/2024 at 23:28, Rick Edgecombe wrote:
> The mm_struct contains a function pointer *get_unmapped_area(), which
> is set to either arch_get_unmapped_area() or
> arch_get_unmapped_area_topdown() during the initialization of the mm.
>
> Since the function pointer only ever points to two functions that are named
> the same across all arch's, a function pointer is not really required. In
> addition future changes will want to add versions of the functions that
> take additional arguments. So to save a pointer's worth of bytes in
> mm_struct, and prevent adding additional function pointers to mm_struct in
> future changes, remove it and keep the information about which
> get_unmapped_area() to use in a flag.
>
> Add the new flag to MMF_INIT_MASK so it doesn't get clobbered on fork by
> mmf_init_flags(). Most MM flags get clobbered on fork. In the pre-existing
> behavior mm->get_unmapped_area() would get copied to the new mm in
> dup_mm(), so not clobbering the flag preserves the existing behavior
> around inheriting the topdown-ness.
>
> Introduce a helper, mm_get_unmapped_area(), to easily convert code that
> refers to the old function pointer to instead select and call either
> arch_get_unmapped_area() or arch_get_unmapped_area_topdown() based on the
> flag. Then drop the mm->get_unmapped_area() function pointer. Leave the
> get_unmapped_area() pointer in struct file_operations alone. The main
> purpose of this change is to reorganize in preparation for future changes,
> but it also converts the calls of mm->get_unmapped_area() from indirect
> branches into direct ones.
>
> The stress-ng bigheap benchmark calls realloc a lot, which calls through
> get_unmapped_area() in the kernel. On x86, the change yielded a ~1%
> improvement there on a retpoline config.
>
> In testing a few x86 configs, removing the pointer unfortunately didn't
> result in any actual size reductions in the compiled layout of mm_struct.
> But depending on compiler or arch alignment requirements, the change could
> shrink the size of mm_struct.
This patch is quite big and not easy to follow. It would be worth splitting
into several patches if possible. Some of the changes seem to go further
than just switching mm->get_unmapped_area() to a flag.
A first patch could add the new flag and necessary helpers, then following
patches could convert the sub-systems one by one, and a last patch would
remove mm->get_unmapped_area() once all users are converted.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
> v3:
> - Fix comment that still referred to mm->get_unmapped_area()
> - Resolve trivial rebase conflicts with "mm: thp_get_unmapped_area must
> honour topdown preference"
> - Spelling fix in log
>
> v2:
> - Fix comment on MMF_TOPDOWN (Kirill, rppt)
> - Move MMF_TOPDOWN to actually unused bit
> - Add MMF_TOPDOWN to MMF_INIT_MASK so it doesn't get clobbered on fork,
> and result in the children using the search up path.
> - New lower performance results after above bug fix
> - Add Reviews and Acks
> ---
> arch/s390/mm/hugetlbpage.c | 2 +-
> arch/s390/mm/mmap.c | 4 ++--
> arch/sparc/kernel/sys_sparc_64.c | 15 ++++++---------
> arch/sparc/mm/hugetlbpage.c | 2 +-
> arch/x86/kernel/cpu/sgx/driver.c | 2 +-
> arch/x86/mm/hugetlbpage.c | 2 +-
> arch/x86/mm/mmap.c | 4 ++--
> drivers/char/mem.c | 2 +-
> drivers/dax/device.c | 6 +++---
> fs/hugetlbfs/inode.c | 4 ++--
> fs/proc/inode.c | 15 ++++++++-------
> fs/ramfs/file-mmu.c | 2 +-
> include/linux/mm_types.h | 6 +-----
> include/linux/sched/coredump.h | 5 ++++-
> include/linux/sched/mm.h | 5 +++++
> io_uring/io_uring.c | 2 +-
> mm/debug.c | 6 ------
> mm/huge_memory.c | 9 ++++-----
> mm/mmap.c | 21 ++++++++++++++++++---
> mm/shmem.c | 11 +++++------
> mm/util.c | 6 +++---
> 21 files changed, 70 insertions(+), 61 deletions(-)
>
> diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
> index 297a6d897d5a..c2d2850ec8d5 100644
> --- a/arch/s390/mm/hugetlbpage.c
> +++ b/arch/s390/mm/hugetlbpage.c
> @@ -328,7 +328,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> goto check_asce_limit;
> }
>
> - if (mm->get_unmapped_area == arch_get_unmapped_area)
> + if (!test_bit(MMF_TOPDOWN, &mm->flags))
> addr = hugetlb_get_unmapped_area_bottomup(file, addr, len,
> pgoff, flags);
> else
> diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
> index fc9a7dc26c5e..cd52d72b59cf 100644
> --- a/arch/s390/mm/mmap.c
> +++ b/arch/s390/mm/mmap.c
> @@ -182,10 +182,10 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
> */
> if (mmap_is_legacy(rlim_stack)) {
> mm->mmap_base = mmap_base_legacy(random_factor);
> - mm->get_unmapped_area = arch_get_unmapped_area;
> + clear_bit(MMF_TOPDOWN, &mm->flags);
> } else {
> mm->mmap_base = mmap_base(random_factor, rlim_stack);
> - mm->get_unmapped_area = arch_get_unmapped_area_topdown;
> + set_bit(MMF_TOPDOWN, &mm->flags);
> }
> }
>
> diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
> index 1e9a9e016237..1dbf7211666e 100644
> --- a/arch/sparc/kernel/sys_sparc_64.c
> +++ b/arch/sparc/kernel/sys_sparc_64.c
> @@ -218,14 +218,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
> unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, unsigned long len, unsigned long pgoff, unsigned long flags)
> {
> unsigned long align_goal, addr = -ENOMEM;
> - unsigned long (*get_area)(struct file *, unsigned long,
> - unsigned long, unsigned long, unsigned long);
> -
> - get_area = current->mm->get_unmapped_area;
>
> if (flags & MAP_FIXED) {
> /* Ok, don't mess with it. */
> - return get_area(NULL, orig_addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, NULL, orig_addr, len, pgoff, flags);
> }
> flags &= ~MAP_SHARED;
>
> @@ -238,7 +234,8 @@ unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, u
> align_goal = (64UL * 1024);
>
> do {
> - addr = get_area(NULL, orig_addr, len + (align_goal - PAGE_SIZE), pgoff, flags);
> + addr = mm_get_unmapped_area(current->mm, NULL, orig_addr,
> + len + (align_goal - PAGE_SIZE), pgoff, flags);
> if (!(addr & ~PAGE_MASK)) {
> addr = (addr + (align_goal - 1UL)) & ~(align_goal - 1UL);
> break;
> @@ -256,7 +253,7 @@ unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, u
> * be obtained.
> */
> if (addr & ~PAGE_MASK)
> - addr = get_area(NULL, orig_addr, len, pgoff, flags);
> + addr = mm_get_unmapped_area(current->mm, NULL, orig_addr, len, pgoff, flags);
>
> return addr;
> }
> @@ -292,7 +289,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
> gap == RLIM_INFINITY ||
> sysctl_legacy_va_layout) {
> mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
> - mm->get_unmapped_area = arch_get_unmapped_area;
> + clear_bit(MMF_TOPDOWN, &mm->flags);
> } else {
> /* We know it's 32-bit */
> unsigned long task_size = STACK_TOP32;
> @@ -303,7 +300,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
> gap = (task_size / 6 * 5);
>
> mm->mmap_base = PAGE_ALIGN(task_size - gap - random_factor);
> - mm->get_unmapped_area = arch_get_unmapped_area_topdown;
> + set_bit(MMF_TOPDOWN, &mm->flags);
> }
> }
>
> diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
> index b432500c13a5..38a1bef47efb 100644
> --- a/arch/sparc/mm/hugetlbpage.c
> +++ b/arch/sparc/mm/hugetlbpage.c
> @@ -123,7 +123,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> (!vma || addr + len <= vm_start_gap(vma)))
> return addr;
> }
> - if (mm->get_unmapped_area == arch_get_unmapped_area)
> + if (!test_bit(MMF_TOPDOWN, &mm->flags))
> return hugetlb_get_unmapped_area_bottomup(file, addr, len,
> pgoff, flags);
> else
> diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
> index 262f5fb18d74..22b65a5f5ec6 100644
> --- a/arch/x86/kernel/cpu/sgx/driver.c
> +++ b/arch/x86/kernel/cpu/sgx/driver.c
> @@ -113,7 +113,7 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
> if (flags & MAP_FIXED)
> return addr;
>
> - return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
> }
>
> #ifdef CONFIG_COMPAT
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index 5804bbae4f01..6d77c0039617 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -141,7 +141,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> }
>
> get_unmapped_area:
> - if (mm->get_unmapped_area == arch_get_unmapped_area)
> + if (!test_bit(MMF_TOPDOWN, &mm->flags))
> return hugetlb_get_unmapped_area_bottomup(file, addr, len,
> pgoff, flags);
> else
> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
> index c90c20904a60..a2cabb1c81e1 100644
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -129,9 +129,9 @@ static void arch_pick_mmap_base(unsigned long *base, unsigned long *legacy_base,
> void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
> {
> if (mmap_is_legacy())
> - mm->get_unmapped_area = arch_get_unmapped_area;
> + clear_bit(MMF_TOPDOWN, &mm->flags);
> else
> - mm->get_unmapped_area = arch_get_unmapped_area_topdown;
> + set_bit(MMF_TOPDOWN, &mm->flags);
>
> arch_pick_mmap_base(&mm->mmap_base, &mm->mmap_legacy_base,
> arch_rnd(mmap64_rnd_bits), task_size_64bit(0),
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index 3c6670cf905f..9b80e622ae80 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -544,7 +544,7 @@ static unsigned long get_unmapped_area_zero(struct file *file,
> }
>
> /* Otherwise flags & MAP_PRIVATE: with no shmem object beneath it */
> - return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
> #else
> return -ENOSYS;
> #endif
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index 93ebedc5ec8c..47c126d37b59 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -329,14 +329,14 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
> if ((off + len_align) < off)
> goto out;
>
> - addr_align = current->mm->get_unmapped_area(filp, addr, len_align,
> - pgoff, flags);
> + addr_align = mm_get_unmapped_area(current->mm, filp, addr, len_align,
> + pgoff, flags);
> if (!IS_ERR_VALUE(addr_align)) {
> addr_align += (off - addr_align) & (align - 1);
> return addr_align;
> }
> out:
> - return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
> }
>
> static const struct address_space_operations dev_dax_aops = {
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index d746866ae3b6..cd87ea5944a1 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -249,11 +249,11 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> }
>
> /*
> - * Use mm->get_unmapped_area value as a hint to use topdown routine.
> + * Use MMF_TOPDOWN flag as a hint to use topdown routine.
> * If architectures have special needs, they should define their own
> * version of hugetlb_get_unmapped_area.
> */
> - if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
> + if (test_bit(MMF_TOPDOWN, &mm->flags))
> return hugetlb_get_unmapped_area_topdown(file, addr, len,
> pgoff, flags);
> return hugetlb_get_unmapped_area_bottomup(file, addr, len,
> diff --git a/fs/proc/inode.c b/fs/proc/inode.c
> index 05350f3c2812..017144a8516c 100644
> --- a/fs/proc/inode.c
> +++ b/fs/proc/inode.c
> @@ -451,15 +451,16 @@ pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned lo
> unsigned long len, unsigned long pgoff,
> unsigned long flags)
> {
> - typeof_member(struct proc_ops, proc_get_unmapped_area) get_area;
> -
> - get_area = pde->proc_ops->proc_get_unmapped_area;
> + if (pde->proc_ops->proc_get_unmapped_area)
> + return pde->proc_ops->proc_get_unmapped_area(file, orig_addr,
> + len, pgoff,
> + flags);
> #ifdef CONFIG_MMU
> - if (!get_area)
> - get_area = current->mm->get_unmapped_area;
> + else
> + return mm_get_unmapped_area(current->mm, file, orig_addr,
> + len, pgoff, flags);
> #endif
> - if (get_area)
> - return get_area(file, orig_addr, len, pgoff, flags);
> +
> return orig_addr;
> }
The change looks unclear at first glance. OK, after looking a second time
it seems to simplify things, but it would be better as a separate patch.
Don't know.
>
> diff --git a/fs/ramfs/file-mmu.c b/fs/ramfs/file-mmu.c
> index c7a1aa3c882b..b45c7edc3225 100644
> --- a/fs/ramfs/file-mmu.c
> +++ b/fs/ramfs/file-mmu.c
> @@ -35,7 +35,7 @@ static unsigned long ramfs_mmu_get_unmapped_area(struct file *file,
> unsigned long addr, unsigned long len, unsigned long pgoff,
> unsigned long flags)
> {
> - return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
> }
>
> const struct file_operations ramfs_file_operations = {
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 8b611e13153e..d20869881214 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -749,11 +749,7 @@ struct mm_struct {
> } ____cacheline_aligned_in_smp;
>
> struct maple_tree mm_mt;
> -#ifdef CONFIG_MMU
> - unsigned long (*get_unmapped_area) (struct file *filp,
> - unsigned long addr, unsigned long len,
> - unsigned long pgoff, unsigned long flags);
> -#endif
> +
> unsigned long mmap_base; /* base of mmap area */
> unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
> #ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 02f5090ffea2..e62ff805cfc9 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -92,9 +92,12 @@ static inline int get_dumpable(struct mm_struct *mm)
> #define MMF_VM_MERGE_ANY 30
> #define MMF_VM_MERGE_ANY_MASK (1 << MMF_VM_MERGE_ANY)
>
> +#define MMF_TOPDOWN 31 /* mm searches top down by default */
> +#define MMF_TOPDOWN_MASK (1 << MMF_TOPDOWN)
> +
> #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
> - MMF_VM_MERGE_ANY_MASK)
> + MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
>
> static inline unsigned long mmf_init_flags(unsigned long flags)
> {
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 9a19f1b42f64..cde946e926d8 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -8,6 +8,7 @@
> #include <linux/mm_types.h>
> #include <linux/gfp.h>
> #include <linux/sync_core.h>
> +#include <linux/sched/coredump.h>
>
> /*
> * Routines for handling mm_structs
> @@ -186,6 +187,10 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
> unsigned long len, unsigned long pgoff,
> unsigned long flags);
>
> +unsigned long mm_get_unmapped_area(struct mm_struct *mm, struct file *filp,
> + unsigned long addr, unsigned long len,
> + unsigned long pgoff, unsigned long flags);
> +
> unsigned long
> generic_get_unmapped_area(struct file *filp, unsigned long addr,
> unsigned long len, unsigned long pgoff,
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index cd9a137ad6ce..9eb3b2587031 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3513,7 +3513,7 @@ static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
> #else
> addr = 0UL;
> #endif
> - return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
> }
>
> #else /* !CONFIG_MMU */
> diff --git a/mm/debug.c b/mm/debug.c
> index ee533a5ceb79..32db5de8e1e7 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -162,9 +162,6 @@ EXPORT_SYMBOL(dump_vma);
> void dump_mm(const struct mm_struct *mm)
> {
> pr_emerg("mm %px task_size %lu\n"
> -#ifdef CONFIG_MMU
> - "get_unmapped_area %px\n"
> -#endif
> "mmap_base %lu mmap_legacy_base %lu\n"
> "pgd %px mm_users %d mm_count %d pgtables_bytes %lu map_count %d\n"
> "hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %lx\n"
> @@ -190,9 +187,6 @@ void dump_mm(const struct mm_struct *mm)
> "def_flags: %#lx(%pGv)\n",
>
> mm, mm->task_size,
> -#ifdef CONFIG_MMU
> - mm->get_unmapped_area,
> -#endif
> mm->mmap_base, mm->mmap_legacy_base,
> mm->pgd, atomic_read(&mm->mm_users),
> atomic_read(&mm->mm_count),
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 94c958f7ebb5..bc3bf441e768 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -822,8 +822,8 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> if (len_pad < len || (off + len_pad) < off)
> return 0;
>
> - ret = current->mm->get_unmapped_area(filp, addr, len_pad,
> - off >> PAGE_SHIFT, flags);
> + ret = mm_get_unmapped_area(current->mm, filp, addr, len_pad,
> + off >> PAGE_SHIFT, flags);
>
> /*
> * The failure might be due to length padding. The caller will retry
> @@ -841,8 +841,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>
> off_sub = (off - ret) & (size - 1);
>
> - if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> - !off_sub)
> + if (test_bit(MMF_TOPDOWN, &current->mm->flags) && !off_sub)
> return ret + size;
>
> ret += off_sub;
> @@ -859,7 +858,7 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> if (ret)
> return ret;
>
> - return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
> }
> EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3281287771c9..39e9a3ae3ca5 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1815,7 +1815,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> unsigned long pgoff, unsigned long flags)
> {
> unsigned long (*get_area)(struct file *, unsigned long,
> - unsigned long, unsigned long, unsigned long);
> + unsigned long, unsigned long, unsigned long)
> + = NULL;
>
> unsigned long error = arch_mmap_check(addr, len, flags);
> if (error)
> @@ -1825,7 +1826,6 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> if (len > TASK_SIZE)
> return -ENOMEM;
>
> - get_area = current->mm->get_unmapped_area;
> if (file) {
> if (file->f_op->get_unmapped_area)
> get_area = file->f_op->get_unmapped_area;
> @@ -1844,7 +1844,11 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> if (!file)
> pgoff = 0;
>
> - addr = get_area(file, addr, len, pgoff, flags);
> + if (get_area)
> + addr = get_area(file, addr, len, pgoff, flags);
> + else
> + addr = mm_get_unmapped_area(current->mm, file, addr, len,
> + pgoff, flags);
> if (IS_ERR_VALUE(addr))
> return addr;
>
> @@ -1859,6 +1863,17 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>
> EXPORT_SYMBOL(get_unmapped_area);
>
> +unsigned long
> +mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
> + unsigned long addr, unsigned long len,
> + unsigned long pgoff, unsigned long flags)
> +{
> + if (test_bit(MMF_TOPDOWN, &mm->flags))
> + return arch_get_unmapped_area_topdown(file, addr, len, pgoff, flags);
> + return arch_get_unmapped_area(file, addr, len, pgoff, flags);
> +}
This function seems quite simple, wouldn't it be better to make it a
static inline?
> +EXPORT_SYMBOL(mm_get_unmapped_area);
> +
> /**
> * find_vma_intersection() - Look up the first VMA which intersects the interval
> * @mm: The process address space.
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d7c84ff62186..5452065faa46 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2240,8 +2240,6 @@ unsigned long shmem_get_unmapped_area(struct file *file,
> unsigned long uaddr, unsigned long len,
> unsigned long pgoff, unsigned long flags)
> {
> - unsigned long (*get_area)(struct file *,
> - unsigned long, unsigned long, unsigned long, unsigned long);
> unsigned long addr;
> unsigned long offset;
> unsigned long inflated_len;
> @@ -2251,8 +2249,8 @@ unsigned long shmem_get_unmapped_area(struct file *file,
> if (len > TASK_SIZE)
> return -ENOMEM;
>
> - get_area = current->mm->get_unmapped_area;
> - addr = get_area(file, uaddr, len, pgoff, flags);
> + addr = mm_get_unmapped_area(current->mm, file, uaddr, len, pgoff,
> + flags);
>
> if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> return addr;
> @@ -2309,7 +2307,8 @@ unsigned long shmem_get_unmapped_area(struct file *file,
> if (inflated_len < len)
> return addr;
>
> - inflated_addr = get_area(NULL, uaddr, inflated_len, 0, flags);
> + inflated_addr = mm_get_unmapped_area(current->mm, NULL, uaddr,
> + inflated_len, 0, flags);
> if (IS_ERR_VALUE(inflated_addr))
> return addr;
> if (inflated_addr & ~PAGE_MASK)
> @@ -4755,7 +4754,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
> unsigned long addr, unsigned long len,
> unsigned long pgoff, unsigned long flags)
> {
> - return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
> }
> #endif
>
> diff --git a/mm/util.c b/mm/util.c
> index 5a6a9802583b..2b959553f9ce 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -452,17 +452,17 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
>
> if (mmap_is_legacy(rlim_stack)) {
> mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
> - mm->get_unmapped_area = arch_get_unmapped_area;
> + clear_bit(MMF_TOPDOWN, &mm->flags);
> } else {
> mm->mmap_base = mmap_base(random_factor, rlim_stack);
> - mm->get_unmapped_area = arch_get_unmapped_area_topdown;
> + set_bit(MMF_TOPDOWN, &mm->flags);
> }
> }
> #elif defined(CONFIG_MMU) && !defined(HAVE_ARCH_PICK_MMAP_LAYOUT)
> void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
> {
> mm->mmap_base = TASK_UNMAPPED_BASE;
> - mm->get_unmapped_area = arch_get_unmapped_area;
> + clear_bit(MMF_TOPDOWN, &mm->flags);
> }
> #endif
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 02/12] mm: Introduce arch_get_unmapped_area_vmflags()
2024-03-12 22:28 ` [PATCH v3 02/12] mm: Introduce arch_get_unmapped_area_vmflags() Rick Edgecombe
@ 2024-03-13 7:22 ` Christophe Leroy
2024-03-13 14:51 ` Edgecombe, Rick P
0 siblings, 1 reply; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 7:22 UTC (permalink / raw)
To: Rick Edgecombe, Liam.Howlett, akpm, bp, broonie, dave.hansen,
debug, hpa, keescook, kirill.shutemov, luto, mingo, peterz, tglx,
x86
Cc: linux-kernel, linux-mm
On 12/03/2024 at 23:28, Rick Edgecombe wrote:
> When memory is being placed, mmap() will take care to respect the guard
> gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
> VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
> needs to consider two things:
> 1. That the new mapping isn’t placed in any existing mapping’s guard
> gaps.
> 2. That the new mapping isn’t placed such that any existing mappings
> end up in *its* guard gaps.
>
> The long standing behavior of mmap() is to ensure 1, but not take any care
> around 2. So for example, if there is a PAGE_SIZE free area, and a
> mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
> placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
> the mapping that is supposed to have a guard gap will not have a gap to
> the adjacent VMA.
>
> In order to take the start gap into account, the maple tree search needs
> to know the size of start gap the new mapping will need. The call chain
> from do_mmap() to the actual maple tree search looks like this:
>
> do_mmap(size, vm_flags, map_flags, ..)
> mm/mmap.c:get_unmapped_area(size, map_flags, ...)
> arch_get_unmapped_area(size, map_flags, ...)
> vm_unmapped_area(struct vm_unmapped_area_info)
>
> One option would be to add another MAP_ flag to mean a one page start gap
> (as is the case for shadow stack), but this consumes a flag unnecessarily. Another
> option could be to simply increase the size passed in do_mmap() by the
> start gap size, and adjust after the fact, but this will interfere with
> the alignment requirements passed in struct vm_unmapped_area_info, which
> are unknown to mmap.c. Instead, introduce variants of
> arch_get_unmapped_area/_topdown() that take vm_flags. In future changes,
> these variants can be used in mmap.c:get_unmapped_area() to allow the
> vm_flags to be passed through to vm_unmapped_area(), while preserving the
> normal arch_get_unmapped_area/_topdown() for the existing callers.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> include/linux/sched/mm.h | 17 +++++++++++++++++
> mm/mmap.c | 28 ++++++++++++++++++++++++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index cde946e926d8..7b44441865c5 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -191,6 +191,23 @@ unsigned long mm_get_unmapped_area(struct mm_struct *mm, struct file *filp,
> unsigned long addr, unsigned long len,
> unsigned long pgoff, unsigned long flags);
>
> +extern unsigned long
> +arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff,
> + unsigned long flags, vm_flags_t vm_flags);
> +extern unsigned long
> +arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff,
> + unsigned long flags, vm_flags_t);
Please, no new "extern", that's pointless for function prototypes and
outdated coding style. Checkpatch --strict is likely protesting about it.
> +
> +unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm,
> + struct file *filp,
> + unsigned long addr,
> + unsigned long len,
> + unsigned long pgoff,
> + unsigned long flags,
> + vm_flags_t vm_flags);
> +
> unsigned long
> generic_get_unmapped_area(struct file *filp, unsigned long addr,
> unsigned long len, unsigned long pgoff,
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 39e9a3ae3ca5..e23ce8ca24c9 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1810,6 +1810,34 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
> }
> #endif
>
> +#ifndef HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
> +extern unsigned long
> +arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len,
> + unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
> +{
> + return arch_get_unmapped_area(filp, addr, len, pgoff, flags);
> +}
Same, what is the point of that "extern" keyword on a function definition?
> +
> +extern unsigned long
> +arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff,
> + unsigned long flags, vm_flags_t vm_flags)
> +{
> + return arch_get_unmapped_area_topdown(filp, addr, len, pgoff, flags);
> +}
> +#endif
> +
> +unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *filp,
> + unsigned long addr, unsigned long len,
> + unsigned long pgoff, unsigned long flags,
> + vm_flags_t vm_flags)
> +{
> + if (test_bit(MMF_TOPDOWN, &mm->flags))
> + return arch_get_unmapped_area_topdown_vmflags(filp, addr, len, pgoff,
> + flags, vm_flags);
> + return arch_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, vm_flags);
> +}
> +
> unsigned long
> get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> unsigned long pgoff, unsigned long flags)
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 10/12] x86/mm: Implement HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
2024-03-12 22:28 ` [PATCH v3 10/12] x86/mm: Implement HAVE_ARCH_UNMAPPED_AREA_VMFLAGS Rick Edgecombe
@ 2024-03-13 9:04 ` Christophe Leroy
2024-03-13 16:00 ` Edgecombe, Rick P
0 siblings, 1 reply; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 9:04 UTC (permalink / raw)
To: Rick Edgecombe, Liam.Howlett, akpm, bp, broonie, dave.hansen,
debug, hpa, keescook, kirill.shutemov, luto, mingo, peterz, tglx,
x86
Cc: linux-kernel, linux-mm
On 12/03/2024 at 23:28, Rick Edgecombe wrote:
> When memory is being placed, mmap() will take care to respect the guard
> gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
> VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
> needs to consider two things:
> 1. That the new mapping isn’t placed in any existing mapping’s guard
> gaps.
> 2. That the new mapping isn’t placed such that any existing mappings
> end up in *its* guard gaps.
>
> The long standing behavior of mmap() is to ensure 1, but not take any care
> around 2. So for example, if there is a PAGE_SIZE free area, and a
> mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
> placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
> the mapping that is supposed to have a guard gap will not have a gap to
> the adjacent VMA.
>
> Add x86 arch implementations of arch_get_unmapped_area_vmflags/_topdown()
> so future changes can allow the guard gap of the type of vma being placed to
> be taken into account. This will be used for shadow stack memory.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> v3:
> - Commit log grammar
>
> v2:
> - Remove unnecessary added extern
> ---
> arch/x86/include/asm/pgtable_64.h | 1 +
> arch/x86/kernel/sys_x86_64.c | 25 ++++++++++++++++++++-----
> 2 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
> index 24af25b1551a..13dcaf436efd 100644
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
> @@ -244,6 +244,7 @@ extern void cleanup_highmap(void);
>
> #define HAVE_ARCH_UNMAPPED_AREA
> #define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
> +#define HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
>
> #define PAGE_AGP PAGE_KERNEL_NOCACHE
> #define HAVE_PAGE_AGP 1
> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
> index b3278e4f7e59..d6fbc4dd08ef 100644
> --- a/arch/x86/kernel/sys_x86_64.c
> +++ b/arch/x86/kernel/sys_x86_64.c
> @@ -120,8 +120,8 @@ static void find_start_end(unsigned long addr, unsigned long flags,
> }
>
> unsigned long
> -arch_get_unmapped_area(struct file *filp, unsigned long addr,
> - unsigned long len, unsigned long pgoff, unsigned long flags)
> +arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len,
> + unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
> {
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma;
> @@ -156,9 +156,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
> }
>
> unsigned long
> -arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
> - const unsigned long len, const unsigned long pgoff,
> - const unsigned long flags)
> +arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr0,
> + unsigned long len, unsigned long pgoff,
> + unsigned long flags, vm_flags_t vm_flags)
> {
> struct vm_area_struct *vma;
> struct mm_struct *mm = current->mm;
> @@ -227,3 +227,18 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
> */
> return arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
> }
> +
> +unsigned long
> +arch_get_unmapped_area(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> + return arch_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
> +}
> +
> +unsigned long
> +arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr,
> + const unsigned long len, const unsigned long pgoff,
> + const unsigned long flags)
> +{
> + return arch_get_unmapped_area_topdown_vmflags(filp, addr, len, pgoff, flags, 0);
> +}
Wouldn't it be better to define those two as static inlines?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 09/12] mm: Take placement mappings gap into account
2024-03-12 22:28 ` [PATCH v3 09/12] mm: Take placement mappings gap into account Rick Edgecombe
@ 2024-03-13 9:04 ` Christophe Leroy
2024-03-13 14:58 ` Edgecombe, Rick P
0 siblings, 1 reply; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 9:04 UTC (permalink / raw)
To: Rick Edgecombe, Liam.Howlett, akpm, bp, broonie, dave.hansen,
debug, hpa, keescook, kirill.shutemov, luto, mingo, peterz, tglx,
x86
Cc: linux-kernel, linux-mm
On 12/03/2024 at 23:28, Rick Edgecombe wrote:
> When memory is being placed, mmap() will take care to respect the guard
> gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
> VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
> needs to consider two things:
> 1. That the new mapping isn’t placed in any existing mapping’s guard
> gaps.
> 2. That the new mapping isn’t placed such that any existing mappings
> end up in *its* guard gaps.
>
> The long standing behavior of mmap() is to ensure 1, but not take any care
> around 2. So for example, if there is a PAGE_SIZE free area, and a
> mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
> placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
> the mapping that is supposed to have a guard gap will not have a gap to
> the adjacent VMA.
>
> For MAP_GROWSDOWN/VM_GROWSDOWN and MAP_GROWSUP/VM_GROWSUP this has not
> been a problem in practice because applications place these kinds of
> mappings very early, when there are not many mappings to find a space
> between. But for shadow stacks, they may be placed throughout the lifetime
> of the application.
>
> Use the start_gap field to find a space that includes the guard gap for
> the new mapping. Take care to not interfere with the alignment.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> v3:
> - Spelling fix in comment
>
> v2:
> - Remove VM_UNMAPPED_START_GAP_SET and have struct vm_unmapped_area_info
> initialized with zeros (in another patch). (Kirill)
> - Drop unrelated space change (Kirill)
> - Add comment around interactions of alignment and start gap step
> (Kirill)
> ---
> include/linux/mm.h | 1 +
> mm/mmap.c | 12 +++++++++---
> 2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d91cde79aaee..deade7be00d0 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3418,6 +3418,7 @@ struct vm_unmapped_area_info {
> unsigned long high_limit;
> unsigned long align_mask;
> unsigned long align_offset;
> + unsigned long start_gap;
Only a start_gap is needed? No need for an end_gap?
> };
>
> extern unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b889c79d11bd..634e706fd97e 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1582,7 +1582,7 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
> MA_STATE(mas, &current->mm->mm_mt, 0, 0);
>
> /* Adjust search length to account for worst case alignment overhead */
> - length = info->length + info->align_mask;
> + length = info->length + info->align_mask + info->start_gap;
> if (length < info->length)
> return -ENOMEM;
>
> @@ -1594,7 +1594,13 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
> if (mas_empty_area(&mas, low_limit, high_limit - 1, length))
> return -ENOMEM;
>
> - gap = mas.index;
> + /*
> + * Adjust for the gap first so it doesn't interfere with the
> + * later alignment. The first step is the minimum needed to
> + * fulfill the start gap, the next step is the minimum to align
> + * that. Together it is the minimum needed to fulfill both.
> + */
> + gap = mas.index + info->start_gap;
> gap += (info->align_offset - gap) & info->align_mask;
> tmp = mas_next(&mas, ULONG_MAX);
> if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
> @@ -1633,7 +1639,7 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
>
> MA_STATE(mas, &current->mm->mm_mt, 0, 0);
> /* Adjust search length to account for worst case alignment overhead */
> - length = info->length + info->align_mask;
> + length = info->length + info->align_mask + info->start_gap;
> if (length < info->length)
> return -ENOMEM;
>
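For intuition, here is that two-step adjustment with hypothetical values
(illustration only, not taken from the patch): assume a one page start gap
and 2MB alignment.

	/* mas.index    = 0x1000   (start of the free area found)
	 * start_gap    = 0x1000   (one guard page)
	 * align_mask   = 0x1fffff (2MB alignment)
	 * align_offset = 0
	 */
	gap = 0x1000 + 0x1000;            /* 0x2000: start gap satisfied */
	gap += (0 - 0x2000) & 0x1fffff;   /* += 0x1fe000: round up to 2MB */
	/* gap == 0x200000: aligned, and at least one free page remains below */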
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 06/12] parisc: Use initializer for struct vm_unmapped_area_info
2024-03-12 22:28 ` [PATCH v3 06/12] parisc: " Rick Edgecombe
@ 2024-03-13 9:04 ` Christophe Leroy
0 siblings, 0 replies; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 9:04 UTC (permalink / raw)
To: Rick Edgecombe, Liam.Howlett, akpm, bp, broonie, dave.hansen,
debug, hpa, keescook, kirill.shutemov, luto, mingo, peterz, tglx,
x86
Cc: linux-kernel, linux-mm, Helge Deller, James E.J. Bottomley, linux-parisc
On 12/03/2024 at 23:28, Rick Edgecombe wrote:
> Future changes will need to add a new member to struct
> vm_unmapped_area_info. This would cause trouble for any call site that
> doesn't initialize the struct. Currently every caller sets each member
> manually, so if new members are added they will be uninitialized and the
> core code parsing the struct will see garbage in the new member.
>
> It could be possible to initialize the new member manually to 0 at each
> call site. This and a couple other options were discussed, and a working
> consensus (see links) was that in general the best way to accomplish this
> would be via static initialization with designated member initializers.
> Having some struct vm_unmapped_area_info instances not zero initialized
> will put those sites at risk of feeding garbage into vm_unmapped_area() if
> the convention is to zero initialize the struct and any new member addition
> misses a call site that initializes each member manually.
>
> It could be possible to leave the code mostly untouched, and just change
> the line:
> struct vm_unmapped_area_info info
> to:
> struct vm_unmapped_area_info info = {};
>
> However, that would leave cleanup for the members that are manually set
> to zero, as it would no longer be required.
>
> So to reduce the chance of bugs via uninitialized members, instead
> simply continue the process to initialize the struct this way tree wide.
> This will zero any unspecified members. Move the member initializers to the
> struct declaration when they are known at that time. Leave the members out
> that were manually initialized to zero, as this would be redundant for
> designated initializers.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> Acked-by: Helge Deller <deller@gmx.de>
> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: linux-parisc@vger.kernel.org
> Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
> Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
> ---
> v3:
> - Fixed spelling errors in log
> - Be consistent about field vs member in log
>
> Hi,
>
> This patch was split and refactored out of a tree-wide change [0] to just
> zero-init each struct vm_unmapped_area_info. The overall goal of the
> series is to help shadow stack guard gaps. Currently, there is only one
> arch with shadow stacks, but two more are in progress. It is compile tested
> only.
>
> There was further discussion that this method of initializing the structs
> while nice in some ways has a greater risk of introducing bugs in some of
> the more complicated callers. Since this version was reviewed by arch
> maintainers already, leave it as it was already acknowledged.
>
> Thanks,
>
> Rick
>
> [0] https://lore.kernel.org/lkml/20240226190951.3240433-6-rick.p.edgecombe@intel.com/
> ---
> arch/parisc/kernel/sys_parisc.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
> index 98af719d5f85..f7722451276e 100644
> --- a/arch/parisc/kernel/sys_parisc.c
> +++ b/arch/parisc/kernel/sys_parisc.c
> @@ -104,7 +104,9 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
> struct vm_area_struct *vma, *prev;
> unsigned long filp_pgoff;
> int do_color_align;
> - struct vm_unmapped_area_info info;
> + struct vm_unmapped_area_info info = {
> + .length = len
> + };
>
> if (unlikely(len > TASK_SIZE))
> return -ENOMEM;
> @@ -139,7 +141,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
> return addr;
> }
>
> - info.length = len;
> info.align_mask = do_color_align ? (PAGE_MASK & (SHM_COLOUR - 1)) : 0;
> info.align_offset = shared_align_offset(filp_pgoff, pgoff);
>
> @@ -160,7 +161,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
> */
> }
>
> - info.flags = 0;
> info.low_limit = mm->mmap_base;
> info.high_limit = mmap_upper_limit(NULL);
> return vm_unmapped_area(&info);
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 05/12] csky: Use initializer for struct vm_unmapped_area_info
2024-03-12 22:28 ` [PATCH v3 05/12] csky: Use initializer for struct vm_unmapped_area_info Rick Edgecombe
@ 2024-03-13 9:04 ` Christophe Leroy
0 siblings, 0 replies; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 9:04 UTC (permalink / raw)
To: Rick Edgecombe, Liam.Howlett, akpm, bp, broonie, dave.hansen,
debug, hpa, keescook, kirill.shutemov, luto, mingo, peterz, tglx,
x86
Cc: linux-kernel, linux-mm, Guo Ren, linux-csky
On 12/03/2024 at 23:28, Rick Edgecombe wrote:
> Future changes will need to add a new member to struct
> vm_unmapped_area_info. This would cause trouble for any call site that
> doesn't initialize the struct. Currently every caller sets each member
> manually, so if new members are added they will be uninitialized and the
> core code parsing the struct will see garbage in the new member.
>
> It could be possible to initialize the new member manually to 0 at each
> call site. This and a couple other options were discussed, and a working
> consensus (see links) was that in general the best way to accomplish this
> would be via static initialization with designated member initializers.
> Having some struct vm_unmapped_area_info instances not zero initialized
> will put those sites at risk of feeding garbage into vm_unmapped_area() if
> the convention is to zero initialize the struct and any new member addition
> misses a call site that initializes each member manually.
>
> It could be possible to leave the code mostly untouched, and just change
> the line:
> struct vm_unmapped_area_info info
> to:
> struct vm_unmapped_area_info info = {};
>
> However, that would leave cleanup for the members that are manually set
> to zero, as it would no longer be required.
>
> So to reduce the chance of bugs via uninitialized members, instead
> simply continue the process to initialize the struct this way tree wide.
> This will zero any unspecified members. Move the member initializers to the
> struct declaration when they are known at that time. Leave the members out
> that were manually initialized to zero, as this would be redundant for
> designated initializers.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: linux-csky@vger.kernel.org
> Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
> Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
> ---
> v3:
> - Fixed spelling errors in log
> - Be consistent about field vs member in log
>
> Hi,
>
> This patch was split and refactored out of a tree-wide change [0] to just
> zero-init each struct vm_unmapped_area_info. The overall goal of the
> series is to help shadow stack guard gaps. Currently, there is only one
> arch with shadow stacks, but two more are in progress. It is compile tested
> only.
>
> There was further discussion that this method of initializing the structs
> while nice in some ways has a greater risk of introducing bugs in some of
> the more complicated callers. Since this version was reviewed by arch
> maintainers already, leave it as it was already acknowledged.
>
> Thanks,
>
> Rick
>
> [0] https://lore.kernel.org/lkml/20240226190951.3240433-6-rick.p.edgecombe@intel.com/
> ---
> arch/csky/abiv1/mmap.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/csky/abiv1/mmap.c b/arch/csky/abiv1/mmap.c
> index 6792aca49999..7f826331d409 100644
> --- a/arch/csky/abiv1/mmap.c
> +++ b/arch/csky/abiv1/mmap.c
> @@ -28,7 +28,12 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma;
> int do_align = 0;
> - struct vm_unmapped_area_info info;
> + struct vm_unmapped_area_info info = {
I see you didn't add .flags = 0, which is good. Wondering why you didn't
do the same for powerpc.
> + .length = len,
> + .low_limit = mm->mmap_base,
> + .high_limit = TASK_SIZE,
> + .align_offset = pgoff << PAGE_SHIFT
Usually we keep the comma after the last element so that, the day you add
a new element, you don't have to change an existing line just to add a
comma.
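For illustration, the initializer above with a trailing comma would read:

	struct vm_unmapped_area_info info = {
		.length = len,
		.low_limit = mm->mmap_base,
		.high_limit = TASK_SIZE,
		.align_offset = pgoff << PAGE_SHIFT,
	};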
> + };
>
> /*
> * We only need to do colour alignment if either the I or D
> @@ -61,11 +66,6 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
> return addr;
> }
>
> - info.flags = 0;
> - info.length = len;
> - info.low_limit = mm->mmap_base;
> - info.high_limit = TASK_SIZE;
> info.align_mask = do_align ? (PAGE_MASK & (SHMLBA - 1)) : 0;
> - info.align_offset = pgoff << PAGE_SHIFT;
> return vm_unmapped_area(&info);
> }
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 04/12] thp: Add thp_get_unmapped_area_vmflags()
2024-03-12 22:28 ` [PATCH v3 04/12] thp: Add thp_get_unmapped_area_vmflags() Rick Edgecombe
@ 2024-03-13 9:04 ` Christophe Leroy
0 siblings, 0 replies; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 9:04 UTC (permalink / raw)
To: Rick Edgecombe, Liam.Howlett, akpm, bp, broonie, dave.hansen,
debug, hpa, keescook, kirill.shutemov, luto, mingo, peterz, tglx,
x86
Cc: linux-kernel, linux-mm
On 12/03/2024 at 23:28, Rick Edgecombe wrote:
> When memory is being placed, mmap() will take care to respect the guard
> gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
> VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
> needs to consider two things:
> 1. That the new mapping isn’t placed in any existing mapping’s guard
> gaps.
> 2. That the new mapping isn’t placed such that any existing mappings
> end up in *its* guard gaps.
>
> The long standing behavior of mmap() is to ensure 1, but not take any care
> around 2. So for example, if there is a PAGE_SIZE free area, and a
> mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
> placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
> the mapping that is supposed to have a guard gap will not have a gap to
> the adjacent VMA.
>
> Add a THP implementation of the vm_flags variant of get_unmapped_area().
> Future changes will call this from mmap.c in the do_mmap() path to allow
> shadow stacks to be placed with consideration taken for the start guard
> gap. Shadow stack memory is always private and anonymous and so special
> guard gap logic is not needed in a lot of cases, but it can be mapped by
> THP, so needs to be handled.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> include/linux/huge_mm.h | 11 +++++++++++
> mm/huge_memory.c | 23 ++++++++++++++++-------
> mm/mmap.c | 12 +++++++-----
> 3 files changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 5adb86af35fc..8744c808d380 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -262,6 +262,9 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>
> unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> unsigned long len, unsigned long pgoff, unsigned long flags);
> +unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff, unsigned long flags,
> + vm_flags_t vm_flags);
>
> void folio_prep_large_rmappable(struct folio *folio);
> bool can_split_folio(struct folio *folio, int *pextra_pins);
> @@ -416,6 +419,14 @@ static inline void folio_prep_large_rmappable(struct folio *folio) {}
>
> #define thp_get_unmapped_area NULL
>
> +static inline unsigned long
> +thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff,
> + unsigned long flags, vm_flags_t vm_flags)
> +{
> + return 0;
> +}
> +
> static inline bool
> can_split_folio(struct folio *folio, int *pextra_pins)
> {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bc3bf441e768..349c93a1a7c3 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -806,7 +806,8 @@ static inline bool is_transparent_hugepage(struct folio *folio)
>
> static unsigned long __thp_get_unmapped_area(struct file *filp,
> unsigned long addr, unsigned long len,
> - loff_t off, unsigned long flags, unsigned long size)
> + loff_t off, unsigned long flags, unsigned long size,
> + vm_flags_t vm_flags)
> {
> loff_t off_end = off + len;
> loff_t off_align = round_up(off, size);
> @@ -822,8 +823,8 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> if (len_pad < len || (off + len_pad) < off)
> return 0;
>
> - ret = mm_get_unmapped_area(current->mm, filp, addr, len_pad,
> - off >> PAGE_SHIFT, flags);
> + ret = mm_get_unmapped_area_vmflags(current->mm, filp, addr, len_pad,
> + off >> PAGE_SHIFT, flags, vm_flags);
>
> /*
> * The failure might be due to length padding. The caller will retry
> @@ -848,17 +849,25 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> return ret;
> }
>
> -unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> - unsigned long len, unsigned long pgoff, unsigned long flags)
> +unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff, unsigned long flags,
> + vm_flags_t vm_flags)
> {
> unsigned long ret;
> loff_t off = (loff_t)pgoff << PAGE_SHIFT;
>
> - ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE);
> + ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags);
> if (ret)
> return ret;
>
> - return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
> + return mm_get_unmapped_area_vmflags(current->mm, filp, addr, len, pgoff, flags,
> + vm_flags);
> +}
> +
> +unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> + return thp_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
> }
> EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index a3128ed26676..68381b90f906 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1863,20 +1863,22 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> * so use shmem's get_unmapped_area in case it can be huge.
> */
> get_area = shmem_get_unmapped_area;
> - } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> - /* Ensures that larger anonymous mappings are THP aligned. */
> - get_area = thp_get_unmapped_area;
> }
>
> /* Always treat pgoff as zero for anonymous memory. */
> if (!file)
> pgoff = 0;
>
> - if (get_area)
> + if (get_area) {
> addr = get_area(file, addr, len, pgoff, flags);
> - else
> + } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> + /* Ensures that larger anonymous mappings are THP aligned. */
> + addr = thp_get_unmapped_area_vmflags(file, addr, len,
> + pgoff, flags, vm_flags);
> + } else {
> addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
> pgoff, flags, vm_flags);
> + }
> if (IS_ERR_VALUE(addr))
> return addr;
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 03/12] mm: Use get_unmapped_area_vmflags()
2024-03-12 22:28 ` [PATCH v3 03/12] mm: Use get_unmapped_area_vmflags() Rick Edgecombe
@ 2024-03-13 9:05 ` Christophe Leroy
2024-03-13 14:55 ` Edgecombe, Rick P
2024-03-13 15:55 ` Edgecombe, Rick P
0 siblings, 2 replies; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 9:05 UTC (permalink / raw)
To: Rick Edgecombe, Liam.Howlett, akpm, bp, broonie, dave.hansen,
debug, hpa, keescook, kirill.shutemov, luto, mingo, peterz, tglx,
x86
Cc: linux-kernel, linux-mm
On 12/03/2024 at 23:28, Rick Edgecombe wrote:
> When memory is being placed, mmap() will take care to respect the guard
> gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
> VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
> needs to consider two things:
> 1. That the new mapping isn’t placed in any existing mapping’s guard
> gaps.
> 2. That the new mapping isn’t placed such that any existing mappings
> end up in *its* guard gaps.
>
> The long standing behavior of mmap() is to ensure 1, but not take any care
> around 2. So for example, if there is a PAGE_SIZE free area, and a
> mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
> placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
> the mapping that is supposed to have a guard gap will not have a gap to
> the adjacent VMA.
>
> Use mm_get_unmapped_area_vmflags() in do_mmap() so future changes
> can cause shadow stack mappings to be placed with a guard gap. Also use
> the THP variant that takes vm_flags, such that THP shadow stack can get the
> same treatment. Adjust the vm_flags calculation to happen earlier so that
> the vm_flags can be passed into __get_unmapped_area().
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> v2:
> - Make get_unmapped_area() a static inline (Kirill)
> ---
> include/linux/mm.h | 11 ++++++++++-
> mm/mmap.c | 34 ++++++++++++++++------------------
> 2 files changed, 26 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f5a97dec5169..d91cde79aaee 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3363,7 +3363,16 @@ extern int install_special_mapping(struct mm_struct *mm,
> unsigned long randomize_stack_top(unsigned long stack_top);
> unsigned long randomize_page(unsigned long start, unsigned long range);
>
> -extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
> +unsigned long
> +__get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> + unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags);
> +
> +static inline unsigned long
> +get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> + unsigned long pgoff, unsigned long flags)
> +{
> + return __get_unmapped_area(file, addr, len, pgoff, flags, 0);
> +}
>
> extern unsigned long mmap_region(struct file *file, unsigned long addr,
> unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> diff --git a/mm/mmap.c b/mm/mmap.c
> index e23ce8ca24c9..a3128ed26676 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1257,18 +1257,6 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> if (mm->map_count > sysctl_max_map_count)
> return -ENOMEM;
>
> - /* Obtain the address to map to. we verify (or select) it and ensure
> - * that it represents a valid section of the address space.
> - */
> - addr = get_unmapped_area(file, addr, len, pgoff, flags);
> - if (IS_ERR_VALUE(addr))
> - return addr;
> -
> - if (flags & MAP_FIXED_NOREPLACE) {
> - if (find_vma_intersection(mm, addr, addr + len))
> - return -EEXIST;
> - }
> -
> if (prot == PROT_EXEC) {
> pkey = execute_only_pkey(mm);
> if (pkey < 0)
> @@ -1282,6 +1270,18 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
> mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
>
> + /* Obtain the address to map to. we verify (or select) it and ensure
> + * that it represents a valid section of the address space.
> + */
> + addr = __get_unmapped_area(file, addr, len, pgoff, flags, vm_flags);
> + if (IS_ERR_VALUE(addr))
> + return addr;
> +
> + if (flags & MAP_FIXED_NOREPLACE) {
> + if (find_vma_intersection(mm, addr, addr + len))
> + return -EEXIST;
> + }
> +
> if (flags & MAP_LOCKED)
> if (!can_do_mlock())
> return -EPERM;
> @@ -1839,8 +1839,8 @@ unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *fi
> }
>
> unsigned long
> -get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> - unsigned long pgoff, unsigned long flags)
> +__get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> + unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
> {
> unsigned long (*get_area)(struct file *, unsigned long,
> unsigned long, unsigned long, unsigned long)
> @@ -1875,8 +1875,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> if (get_area)
> addr = get_area(file, addr, len, pgoff, flags);
What about get_area()? What happens to vm_flags in case that function is
used?
> else
> - addr = mm_get_unmapped_area(current->mm, file, addr, len,
> - pgoff, flags);
> + addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
> + pgoff, flags, vm_flags);
> if (IS_ERR_VALUE(addr))
> return addr;
>
> @@ -1889,8 +1889,6 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> return error ? error : addr;
> }
>
> -EXPORT_SYMBOL(get_unmapped_area);
> -
Don't you have to export __get_unmapped_area() so that the static inline
get_unmapped_area() used in a module has access to __get_unmapped_area()?
If it was pointless to export get_unmapped_area(), its removal should be
a separate patch.
> unsigned long
> mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
> unsigned long addr, unsigned long len,
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 01/12] mm: Switch mm->get_unmapped_area() to a flag
2024-03-13 7:19 ` Christophe Leroy
@ 2024-03-13 14:48 ` Edgecombe, Rick P
2024-03-13 17:20 ` Christophe Leroy
0 siblings, 1 reply; 36+ messages in thread
From: Edgecombe, Rick P @ 2024-03-13 14:48 UTC (permalink / raw)
To: debug, luto, x86, Liam.Howlett, broonie, keescook, dave.hansen,
hpa, christophe.leroy, akpm, mingo, kirill.shutemov, bp, tglx,
peterz
Cc: linux-mm, linux-kernel
On Wed, 2024-03-13 at 07:19 +0000, Christophe Leroy wrote:
> This patch is quite big and not easy to follow. It would be worth
> splitting in several patches if possible. Some of the changes seem to
> go further than just switching mm->get_unmapped_area() to a flag.
>
> First patch could add the new flag and necessary helpers, then
> following patches could convert sub-systems one by one, then a last
> patch would remove mm->get_unmapped_area() once all users are converted.
So you are saying to do the tracking in both the new flag and
mm->get_unmapped_area() during the conversion process and then remove the
pointer at the end? I guess it could be broken out, but most of the
conversions are trivial one line changes. Hmm, I'm not sure.
[snip]
>
> > #ifdef CONFIG_MMU
> > - if (!get_area)
> > - get_area = current->mm->get_unmapped_area;
> > + else
> > + return mm_get_unmapped_area(current->mm, file, orig_addr,
> > + len, pgoff, flags);
> > #endif
> > - if (get_area)
> > - return get_area(file, orig_addr, len, pgoff, flags);
> > +
> > return orig_addr;
> > }
>
> The change looks unclear at first glance. After a second look it does
> seem to simplify things, but it would be better as a separate patch.
> Not sure.
Hmm. I think the only way to do it in smaller chunks is to do both
methods of tracking the direction during the conversion process. And
then the smaller pieces would be really small. So it would probably
help for changes like this, but otherwise would generate a lot of
patches with small changes.
The steps are basically:
1. Introduce flag and helpers
2. convert arch's to use it one by one
3. convert callers to use mm_get_unmapped_area() one by one
4. remove setting get_unmapped_area in each arch
5. remove get_unmapped_area
Step 3 is where the few non-one-line changes would be, but most would
still be one-liners. Steps 1, 2, 4 and 5 seem simpler as a tree-wide
patch because of the one-line changes.
I don't know that any other variations are a ton simpler. Hopefully
others will weigh in.
[snip]
> >
> > +unsigned long
> > +mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
> > + unsigned long addr, unsigned long len,
> > + unsigned long pgoff, unsigned long flags)
> > +{
> > + if (test_bit(MMF_TOPDOWN, &mm->flags))
> > + return arch_get_unmapped_area_topdown(file, addr, len, pgoff, flags);
> > + return arch_get_unmapped_area(file, addr, len, pgoff, flags);
> > +}
>
> This function seems quite simple, wouldn't it be better to make it a
> static inline ?
Then all of the arch_get_unmapped_area() and
arch_get_unmapped_area_topdown() would need to be exported. I think it
is better to only export the higher level functions.
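For reference, a minimal sketch of the static inline variant (hypothetical,
not part of the posted series; the body is copied from the patch). The catch
is that arch_get_unmapped_area() and arch_get_unmapped_area_topdown() would
then each need an EXPORT_SYMBOL() so that module callers of the inline can
link:

	static inline unsigned long
	mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
			     unsigned long addr, unsigned long len,
			     unsigned long pgoff, unsigned long flags)
	{
		if (test_bit(MMF_TOPDOWN, &mm->flags))
			return arch_get_unmapped_area_topdown(file, addr, len,
							      pgoff, flags);
		return arch_get_unmapped_area(file, addr, len, pgoff, flags);
	}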
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 02/12] mm: Introduce arch_get_unmapped_area_vmflags()
2024-03-13 7:22 ` Christophe Leroy
@ 2024-03-13 14:51 ` Edgecombe, Rick P
0 siblings, 0 replies; 36+ messages in thread
From: Edgecombe, Rick P @ 2024-03-13 14:51 UTC (permalink / raw)
To: debug, luto, x86, Liam.Howlett, broonie, keescook, dave.hansen,
hpa, christophe.leroy, akpm, mingo, kirill.shutemov, bp, tglx,
peterz
Cc: linux-mm, linux-kernel
On Wed, 2024-03-13 at 07:22 +0000, Christophe Leroy wrote:
>
> Please, no new "extern", that's pointless for function prototypes and
> outdated coding style. Checkpatch --strict is likely protesting about
> it.
I was trying to match the style of the surrounding code in the headers.
Previous feedback was to not fix some alignment as well.
But the extern in the mmap.c file was just a mistake. I can remove the
externs.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 03/12] mm: Use get_unmapped_area_vmflags()
2024-03-13 9:05 ` Christophe Leroy
@ 2024-03-13 14:55 ` Edgecombe, Rick P
2024-03-13 16:49 ` Christophe Leroy
2024-03-13 15:55 ` Edgecombe, Rick P
1 sibling, 1 reply; 36+ messages in thread
From: Edgecombe, Rick P @ 2024-03-13 14:55 UTC (permalink / raw)
To: debug, luto, x86, Liam.Howlett, broonie, keescook, dave.hansen,
hpa, christophe.leroy, akpm, mingo, kirill.shutemov, bp, tglx,
peterz
Cc: linux-mm, linux-kernel
On Wed, 2024-03-13 at 09:05 +0000, Christophe Leroy wrote:
>
> What about get_area()? What happens to vm_flags in case that function
> is used?
Shadow stack can only be private and anonymous memory. So this way is
to avoid unnecessary plumbing. And that plumbing becomes quite
extensive when you start intersecting with the
f_ops->get_unmapped_area(), so it's pretty fortunate.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 07/12] powerpc: Use initializer for struct vm_unmapped_area_info
2024-03-13 6:44 ` Christophe Leroy
@ 2024-03-13 14:57 ` Edgecombe, Rick P
2024-03-13 21:58 ` Michael Ellerman
0 siblings, 1 reply; 36+ messages in thread
From: Edgecombe, Rick P @ 2024-03-13 14:57 UTC (permalink / raw)
To: debug, luto, x86, Liam.Howlett, broonie, keescook, dave.hansen,
hpa, christophe.leroy, akpm, mingo, kirill.shutemov, bp, tglx,
peterz
Cc: linuxppc-dev, linux-mm, mpe, naveen.n.rao, linux-kernel, npiggin,
aneesh.kumar
On Wed, 2024-03-13 at 06:44 +0000, Christophe Leroy wrote:
> I understand from this text that, as agreed, this patch removes the
> pointless/redundant zero-init of individual members. But it is not
> what is done, see below?
Err, right. I think I decided to leave it because it was already acked
and there wasn't enough discussion on the ack to be sure. I will update
it.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 09/12] mm: Take placement mappings gap into account
2024-03-13 9:04 ` Christophe Leroy
@ 2024-03-13 14:58 ` Edgecombe, Rick P
2024-03-13 16:51 ` Christophe Leroy
0 siblings, 1 reply; 36+ messages in thread
From: Edgecombe, Rick P @ 2024-03-13 14:58 UTC (permalink / raw)
To: debug, luto, x86, Liam.Howlett, broonie, keescook, dave.hansen,
hpa, christophe.leroy, akpm, mingo, kirill.shutemov, bp, tglx,
peterz
Cc: linux-mm, linux-kernel
On Wed, 2024-03-13 at 09:04 +0000, Christophe Leroy wrote:
>
> Only a start_gap is needed? No need for an end_gap?
Yea, shadow stacks only have a start gap. If any end gap is needed, it
will be much easier to add after all of this.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 08/12] treewide: Use initializer for struct vm_unmapped_area_info
2024-03-13 3:18 ` Kees Cook
@ 2024-03-13 15:40 ` Edgecombe, Rick P
0 siblings, 0 replies; 36+ messages in thread
From: Edgecombe, Rick P @ 2024-03-13 15:40 UTC (permalink / raw)
To: keescook
Cc: linux-s390, linux-sh, luto, dave.hansen, debug, akpm,
Liam.Howlett, mingo, linux-csky, linux-kernel, christophe.leroy,
linux-mm, kirill.shutemov, tglx, linux-snps-arc, hpa, peterz,
sparclinux, loongarch, bp, linux-alpha, linux-mips,
linux-arm-kernel, x86, broonie
On Tue, 2024-03-12 at 20:18 -0700, Kees Cook wrote:
>
> Thanks! This looks to do exactly what it describes. :)
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
Thanks!
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 03/12] mm: Use get_unmapped_area_vmflags()
2024-03-13 9:05 ` Christophe Leroy
2024-03-13 14:55 ` Edgecombe, Rick P
@ 2024-03-13 15:55 ` Edgecombe, Rick P
1 sibling, 0 replies; 36+ messages in thread
From: Edgecombe, Rick P @ 2024-03-13 15:55 UTC (permalink / raw)
To: debug, luto, x86, Liam.Howlett, broonie, keescook, dave.hansen,
hpa, christophe.leroy, akpm, mingo, kirill.shutemov, bp, tglx,
peterz
Cc: linux-mm, linux-kernel
On Wed, 2024-03-13 at 09:05 +0000, Christophe Leroy wrote:
> > -EXPORT_SYMBOL(get_unmapped_area);
> > -
>
> Don't you have to export __get_unmapped_area() so that the static
> inline
> get_unmapped_area() used in a module have access to
> __get_unmapped_area() ?
>
> If it was pointless to export get_unmapped_area(), its removal should
> be
> a separate patch.
Yes, it seems to have been pointless. I can split it out. Thanks.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 10/12] x86/mm: Implement HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
2024-03-13 9:04 ` Christophe Leroy
@ 2024-03-13 16:00 ` Edgecombe, Rick P
2024-03-18 1:00 ` Edgecombe, Rick P
0 siblings, 1 reply; 36+ messages in thread
From: Edgecombe, Rick P @ 2024-03-13 16:00 UTC (permalink / raw)
To: debug, luto, x86, Liam.Howlett, broonie, keescook, dave.hansen,
hpa, christophe.leroy, akpm, mingo, kirill.shutemov, bp, tglx,
peterz
Cc: linux-mm, linux-kernel
On Wed, 2024-03-13 at 09:04 +0000, Christophe Leroy wrote:
> > +
> > +unsigned long
> > +arch_get_unmapped_area(struct file *filp, unsigned long addr,
> > + unsigned long len, unsigned long pgoff, unsigned long flags)
> > +{
> > + return arch_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
> > +}
> > +
> > +unsigned long
> > +arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr,
> > + const unsigned long len, const unsigned long pgoff,
> > + const unsigned long flags)
> > +{
> > + return arch_get_unmapped_area_topdown_vmflags(filp, addr, len, pgoff, flags, 0);
> > +}
>
> Wouldn't it be better to define those two as static inlines ?
Yes, I think so. It is generic functionality (though not needed until
the next shadow stack feature), so it doesn't need to be in arch/x86
either.
Thanks for your comments on the series and the RB tags.
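For illustration, a sketch of the static inline form (an assumption about
what a later revision could do, not the posted patch; the bodies are the
same as the v3 wrappers above):

	static inline unsigned long
	arch_get_unmapped_area(struct file *filp, unsigned long addr,
			       unsigned long len, unsigned long pgoff,
			       unsigned long flags)
	{
		return arch_get_unmapped_area_vmflags(filp, addr, len, pgoff,
						      flags, 0);
	}

	static inline unsigned long
	arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
				       unsigned long len, unsigned long pgoff,
				       unsigned long flags)
	{
		return arch_get_unmapped_area_topdown_vmflags(filp, addr, len,
							      pgoff, flags, 0);
	}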
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 03/12] mm: Use get_unmapped_area_vmflags()
2024-03-13 14:55 ` Edgecombe, Rick P
@ 2024-03-13 16:49 ` Christophe Leroy
0 siblings, 0 replies; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 16:49 UTC (permalink / raw)
To: Edgecombe, Rick P, debug, luto, x86, Liam.Howlett, broonie,
keescook, dave.hansen, hpa, akpm, mingo, kirill.shutemov, bp,
tglx, peterz
Cc: linux-mm, linux-kernel
On 13/03/2024 at 15:55, Edgecombe, Rick P wrote:
> On Wed, 2024-03-13 at 09:05 +0000, Christophe Leroy wrote:
>>
>> What about get_area()? What happens to vm_flags in case that function
>> is used?
>
> Shadow stack can only be private and anonymous memory. So this way is
> to avoid unnecessary plumbing. And that plumbing becomes quite
> extensive when you start intersecting with the
> f_ops->get_unmapped_area(), so its pretty fortunate.
Ok,
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 09/12] mm: Take placement mappings gap into account
2024-03-13 14:58 ` Edgecombe, Rick P
@ 2024-03-13 16:51 ` Christophe Leroy
0 siblings, 0 replies; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 16:51 UTC (permalink / raw)
To: Edgecombe, Rick P, debug, luto, x86, Liam.Howlett, broonie,
keescook, dave.hansen, hpa, akpm, mingo, kirill.shutemov, bp,
tglx, peterz
Cc: linux-mm, linux-kernel
On 13/03/2024 at 15:58, Edgecombe, Rick P wrote:
> On Wed, 2024-03-13 at 09:04 +0000, Christophe Leroy wrote:
>>
>> Only a start_gap is needed? No need for an end_gap?
>
> Yea, shadow stacks only have a start gap. If any end gap is needed, it
> will be much easier to add after all of this.
Ok
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
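For reference, with the start_gap field this series adds, a shadow
stack caller would fill the struct roughly like this (illustrative
sketch; members not listed are zero-initialized):

	struct vm_unmapped_area_info info = {
		.length		= len,
		.low_limit	= PAGE_SIZE,
		.high_limit	= TASK_SIZE,
		.start_gap	= PAGE_SIZE,	/* guard gap below the new mapping */
	};

	addr = vm_unmapped_area(&info);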
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 01/12] mm: Switch mm->get_unmapped_area() to a flag
2024-03-13 14:48 ` Edgecombe, Rick P
@ 2024-03-13 17:20 ` Christophe Leroy
0 siblings, 0 replies; 36+ messages in thread
From: Christophe Leroy @ 2024-03-13 17:20 UTC (permalink / raw)
To: Edgecombe, Rick P, debug, luto, x86, Liam.Howlett, broonie,
keescook, dave.hansen, hpa, akpm, mingo, kirill.shutemov, bp,
tglx, peterz
Cc: linux-mm, linux-kernel
On 13/03/2024 at 15:48, Edgecombe, Rick P wrote:
> On Wed, 2024-03-13 at 07:19 +0000, Christophe Leroy wrote:
>> This patch is quite big and not easy to follow. It would be worth
>> splitting it into several patches if possible. Some of the changes
>> seem to go further than just switching mm->get_unmapped_area() to a
>> flag.
>>
>> A first patch could add the new flag and necessary helpers, then
>> following patches could convert sub-systems one by one, and a last
>> patch would remove mm->get_unmapped_area() once all users are
>> converted.
>
> So you are saying to do the tracking in both the new flag and
> mm->get_unmapped_area() during the conversion process and then remove
> the pointer at the end? I guess it could be broken out, but most of
> the conversions are trivial one-line changes. Hmm, I'm not sure.
>
> [snip]
>>
>>>   #ifdef CONFIG_MMU
>>> -	if (!get_area)
>>> -		get_area = current->mm->get_unmapped_area;
>>> +	else
>>> +		return mm_get_unmapped_area(current->mm, file, orig_addr,
>>> +					    len, pgoff, flags);
>>>   #endif
>>> -	if (get_area)
>>> -		return get_area(file, orig_addr, len, pgoff, flags);
>>> +
>>>   	return orig_addr;
>>>   }
>>
>> The change looks unclear at first glance. OK, after a second look it
>> seems to simplify things, but it would be better as a separate
>> patch. I don't know.
>
> Hmm. I think the only way to do it in smaller chunks is to track the
> direction with both methods during the conversion process. The
> individual pieces would then be really small. So it would probably
> help for changes like this, but otherwise it would generate a lot of
> patches with small changes.
Yes. Maybe the best would be to have a preparation patch that churns
this function a bit so that when it comes to the conversion it is
trivial.
Something like:
	if (pde->proc_ops->proc_get_unmapped_area)
		return pde->proc_ops->proc_get_unmapped_area(file, orig_addr, len, pgoff, flags);

#ifdef CONFIG_MMU
	return current->mm->get_unmapped_area(file, orig_addr, len, pgoff, flags);
#endif
	return orig_addr;
Note that a line length of 100 chars is now tolerated when it eases
reading, so the proc_get_unmapped_area() call can stay on one line
instead of being split across 3.
And the else inside #ifdef CONFIG_MMU is not needed because the 'if'
has already returned.
>
> The steps are basically:
> 1. Introduce the flag and helpers
> 2. Convert archs to use it one by one
> 3. Convert callers to use mm_get_unmapped_area() one by one
> 4. Remove the setting of get_unmapped_area in each arch
> 5. Remove get_unmapped_area
>
> Step 3 is where the few non-one-line changes would be, but most would
> still be one-liners. Steps 1, 2, 4 and 5 seem simpler as a treewide
> patch because the changes are one-liners.
I missed the setting of get_unmapped_area by each arch; you are right,
it might get complicated at the end.
>
> I don't know that any other variations are a ton simpler. Hopefully
> others will weigh in.
>
>
>
> [snip]
>>>
>>> +unsigned long
>>> +mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
>>> +		     unsigned long addr, unsigned long len,
>>> +		     unsigned long pgoff, unsigned long flags)
>>> +{
>>> +	if (test_bit(MMF_TOPDOWN, &mm->flags))
>>> +		return arch_get_unmapped_area_topdown(file, addr, len,
>>> +						      pgoff, flags);
>>> +	return arch_get_unmapped_area(file, addr, len, pgoff, flags);
>>> +}
>>
>> This function seems quite simple; wouldn't it be better to make it a
>> static inline?
>
> Then all of the arch_get_unmapped_area() and
> arch_get_unmapped_area_topdown() implementations would need to be
> exported. I think it is better to only export the higher-level
> functions.
Right.
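For illustration, the per-arch side of steps 2 and 4 discussed above
ends up roughly like this (x86 sketch, based on this series):

	/* arch_pick_mmap_layout(), after the conversion: */
	if (mmap_is_legacy(rlim_stack))
		clear_bit(MMF_TOPDOWN, &mm->flags);
	else
		set_bit(MMF_TOPDOWN, &mm->flags);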
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 07/12] powerpc: Use initializer for struct vm_unmapped_area_info
2024-03-13 14:57 ` Edgecombe, Rick P
@ 2024-03-13 21:58 ` Michael Ellerman
0 siblings, 0 replies; 36+ messages in thread
From: Michael Ellerman @ 2024-03-13 21:58 UTC (permalink / raw)
To: Edgecombe, Rick P, debug, luto, x86, Liam.Howlett, broonie,
keescook, dave.hansen, hpa, christophe.leroy, akpm, mingo,
kirill.shutemov, bp, tglx, peterz
Cc: linuxppc-dev, linux-mm, naveen.n.rao, linux-kernel, npiggin,
aneesh.kumar
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> writes:
> On Wed, 2024-03-13 at 06:44 +0000, Christophe Leroy wrote:
>> I understand from this text that, as agreed, this patch removes the
>> pointless/redundant zero-init of individual members. But that is not
>> what is done, see below?
>
> Err, right. I think I decided to leave it because it was already acked
> and there wasn't enough discussion on the ack to be sure. I will update
> it.
That's fine by me, you can keep my ack.
cheers
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 10/12] x86/mm: Implement HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
2024-03-13 16:00 ` Edgecombe, Rick P
@ 2024-03-18 1:00 ` Edgecombe, Rick P
0 siblings, 0 replies; 36+ messages in thread
From: Edgecombe, Rick P @ 2024-03-18 1:00 UTC (permalink / raw)
To: debug, luto, x86, Liam.Howlett, broonie, keescook, dave.hansen,
hpa, christophe.leroy, akpm, mingo, kirill.shutemov, bp, tglx,
peterz
Cc: linux-mm, linux-kernel
On Wed, 2024-03-13 at 09:00 -0700, Rick Edgecombe wrote:
> > Wouldn't it be better to define those two as static inlines?
>
> Yes, I think so.
I gave this a try and it turned out not to fit into any header very
well. I decided putting it into a generic header was not great since
these are actually supposed to be arch implementations. Emphasizing
that point, mmap.c actually defines these unless
HAVE_ARCH_UNMAPPED_AREA is defined. So to make it work, mm.h would have
to assume that if HAVE_ARCH_UNMAPPED_AREA and
HAVE_ARCH_UNMAPPED_AREA_TOPDOWN are defined, the arch actually doesn't
have its own arch_get_unmapped_area() and wants a static inline version
of it instead. That confuses the meaning of HAVE_ARCH_UNMAPPED_AREA a
bit, making it mean the opposite in some cases. Adding an
ARCH_WANTS_UNMAPPED_AREA_INLINE seemed excessive.
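For reference, the existing fallback pattern in mm/mmap.c is roughly:

	/* Only built when the arch doesn't provide its own version: */
	#ifndef HAVE_ARCH_UNMAPPED_AREA
	unsigned long
	arch_get_unmapped_area(struct file *filp, unsigned long addr,
			       unsigned long len, unsigned long pgoff,
			       unsigned long flags)
	{
		return generic_get_unmapped_area(filp, addr, len, pgoff, flags);
	}
	#endif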
As for putting them in an arch/x86 header, I tried asm/mmu.h, but
arch_get_unmapped_area_topdown/_vmflags() had to be forward declared,
and vm_flags_t also couldn't be pulled in properly because of a
circular dependency. A few other headers hit weirdness or breakages.
So in the end, I decided to just leave them as regular (non-inline)
functions in arch/x86. Unless there are any objections, I'm going to
just let 0-day build test all the archs, and I'll post the series with
the rest of the feedback addressed in a few days.
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2024-03-18 1:00 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-12 22:28 [PATCH v3 00/12] Cover a guard gap corner case Rick Edgecombe
2024-03-12 22:28 ` [PATCH v3 01/12] mm: Switch mm->get_unmapped_area() to a flag Rick Edgecombe
2024-03-13 7:19 ` Christophe Leroy
2024-03-13 14:48 ` Edgecombe, Rick P
2024-03-13 17:20 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 02/12] mm: Introduce arch_get_unmapped_area_vmflags() Rick Edgecombe
2024-03-13 7:22 ` Christophe Leroy
2024-03-13 14:51 ` Edgecombe, Rick P
2024-03-12 22:28 ` [PATCH v3 03/12] mm: Use get_unmapped_area_vmflags() Rick Edgecombe
2024-03-13 9:05 ` Christophe Leroy
2024-03-13 14:55 ` Edgecombe, Rick P
2024-03-13 16:49 ` Christophe Leroy
2024-03-13 15:55 ` Edgecombe, Rick P
2024-03-12 22:28 ` [PATCH v3 04/12] thp: Add thp_get_unmapped_area_vmflags() Rick Edgecombe
2024-03-13 9:04 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 05/12] csky: Use initializer for struct vm_unmapped_area_info Rick Edgecombe
2024-03-13 9:04 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 06/12] parisc: " Rick Edgecombe
2024-03-13 9:04 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 07/12] powerpc: " Rick Edgecombe
2024-03-13 6:44 ` Christophe Leroy
2024-03-13 14:57 ` Edgecombe, Rick P
2024-03-13 21:58 ` Michael Ellerman
2024-03-12 22:28 ` [PATCH v3 08/12] treewide: " Rick Edgecombe
2024-03-13 3:18 ` Kees Cook
2024-03-13 15:40 ` Edgecombe, Rick P
2024-03-12 22:28 ` [PATCH v3 09/12] mm: Take placement mappings gap into account Rick Edgecombe
2024-03-13 9:04 ` Christophe Leroy
2024-03-13 14:58 ` Edgecombe, Rick P
2024-03-13 16:51 ` Christophe Leroy
2024-03-12 22:28 ` [PATCH v3 10/12] x86/mm: Implement HAVE_ARCH_UNMAPPED_AREA_VMFLAGS Rick Edgecombe
2024-03-13 9:04 ` Christophe Leroy
2024-03-13 16:00 ` Edgecombe, Rick P
2024-03-18 1:00 ` Edgecombe, Rick P
2024-03-12 22:28 ` [PATCH v3 11/12] x86/mm: Care about shadow stack guard gap during placement Rick Edgecombe
2024-03-12 22:28 ` [PATCH v3 12/12] selftests/x86: Add placement guard gap test for shstk Rick Edgecombe