linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* BUG: vdso changes expose elf mapping issue
@ 2025-04-25 12:41 Ryan Roberts
  2025-04-25 18:37 ` Catalin Marinas
  0 siblings, 1 reply; 4+ messages in thread
From: Ryan Roberts @ 2025-04-25 12:41 UTC (permalink / raw)
  To: Kees Cook, Thomas Weißschuh, Catalin Marinas
  Cc: Linux Kernel Mailing List, Linux-MM, linux-arm-kernel

Hi,

I'm hitting a nasty bug which is preventing VSCode from connecting to my arm64 
VM running v6.15-rc3. Bisection fingers Commit 0b3bc3354eb9 ("arm64: vdso: 
Switch to generic storage implementation") as the point where this started 
failing.

Debugging this, the root cause is due to ldconfig crashing with a segmentation 
fault (I have no idea why VSCode thinks it needs to run this...). The segfault 
happens because ldconfig's attempt to expand the program break fails because 
vvar/vdso are in the way. The above change expands vvar by 2 pages and this 
causes the problem.

But I don't think we can really blame this commit...

ldconfig is a statically linked, PIE executable. The kernel treats this as an 
interpreter and therefore does not map it into low memory but instead maps it 
into high memory using mmap() (mmap is top-down on arm64). Once it's mapped, 
vvar/vdso gets mapped and fills the hole right at the top that is left due to 
ldconfig's alignment requirements. Before the above change, there were 2 pages 
free between the end of the data segment and vvar; this was enough for ldconfig 
to get it's required memory with brk(). But after the change there is no space:

Before:
fffff7f20000-fffff7fde000 r-xp 00000000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
fffff7fee000-fffff7ff5000 rw-p 000be000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
fffff7ff5000-fffff7ffa000 rw-p 00000000 00:00 0 
fffff7ffc000-fffff7ffe000 r--p 00000000 00:00 0                          [vvar]
fffff7ffe000-fffff8000000 r-xp 00000000 00:00 0                          [vdso]
fffffffdf000-1000000000000 rw-p 00000000 00:00 0                         [stack]

After:
fffff7f20000-fffff7fde000 r-xp 00000000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
fffff7fee000-fffff7ff5000 rw-p 000be000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
fffff7ff5000-fffff7ffa000 rw-p 00000000 00:00 0 
fffff7ffa000-fffff7ffe000 r--p 00000000 00:00 0                          [vvar]
fffff7ffe000-fffff8000000 r-xp 00000000 00:00 0                          [vdso]
fffffffdf000-1000000000000 rw-p 00000000 00:00 0                         [stack]

Note that this issue only occurs with ASLR disabled. When ASLR is enabled, the 
brk region is setup in the low memory region that would normally be used by 
primary executable.

So the issue is that when ASLR is disabled, these statically linked, PIE 
programs are mapped with insufficient space to expand the break.

I think in an ideal world, the kernel would notice that this is not an 
interpreter and map it to low memory. But I guess we can't know that for the 
case where the interpreter is invoked directly (as apposed to being referenced 
in the .interp section of the invoked binary)?

Another option would be to always relocate the break to low memory (but without 
the random offset for the ASLR=off case). But it looks like there could be some 
compat issues there? I see CONFIG_COMPAT_BRK...

Or we could just ensure we enforce some dead space after the end of the program 
that nothing else is (initially) mapped into. I think this could be done by 
overallocating the initial MAP_FIXED_NOREPLACE mmap, then munmapping the hole 
after ARCH_SETUP_ADDITIONAL_PAGES(). But it's not really clear what the correct 
reservation size would be, and any mmaps the program does will start to fill 
that space.

I'm hoping someone has some suggestions...

Thanks,
Ryan



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: BUG: vdso changes expose elf mapping issue
  2025-04-25 12:41 BUG: vdso changes expose elf mapping issue Ryan Roberts
@ 2025-04-25 18:37 ` Catalin Marinas
  2025-04-25 19:56   ` Kees Cook
  0 siblings, 1 reply; 4+ messages in thread
From: Catalin Marinas @ 2025-04-25 18:37 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Kees Cook, Thomas Weißschuh, Linux Kernel Mailing List,
	Linux-MM, linux-arm-kernel

On Fri, Apr 25, 2025 at 01:41:31PM +0100, Ryan Roberts wrote:
> ldconfig is a statically linked, PIE executable. The kernel treats this as an 
> interpreter and therefore does not map it into low memory but instead maps it 
> into high memory using mmap() (mmap is top-down on arm64). Once it's mapped, 
> vvar/vdso gets mapped and fills the hole right at the top that is left due to 
> ldconfig's alignment requirements. Before the above change, there were 2 pages 
> free between the end of the data segment and vvar; this was enough for ldconfig 
> to get it's required memory with brk(). But after the change there is no space:
> 
> Before:
> fffff7f20000-fffff7fde000 r-xp 00000000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
> fffff7fee000-fffff7ff5000 rw-p 000be000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
> fffff7ff5000-fffff7ffa000 rw-p 00000000 00:00 0 
> fffff7ffc000-fffff7ffe000 r--p 00000000 00:00 0                          [vvar]
> fffff7ffe000-fffff8000000 r-xp 00000000 00:00 0                          [vdso]
> fffffffdf000-1000000000000 rw-p 00000000 00:00 0                         [stack]
> 
> After:
> fffff7f20000-fffff7fde000 r-xp 00000000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
> fffff7fee000-fffff7ff5000 rw-p 000be000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
> fffff7ff5000-fffff7ffa000 rw-p 00000000 00:00 0 
> fffff7ffa000-fffff7ffe000 r--p 00000000 00:00 0                          [vvar]
> fffff7ffe000-fffff8000000 r-xp 00000000 00:00 0                          [vdso]
> fffffffdf000-1000000000000 rw-p 00000000 00:00 0                         [stack]

It does look like we've just been lucky so far. An ELF file requiring a
slightly larger brk (by two pages), it could fail. FWIW, briefly after
commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for
static PIE"), we got:

          Start Addr           End Addr       Size     Offset  Perms  objfile
      0xaaaaaaaa0000     0xaaaaaab5d000    0xbd000        0x0  r-xp   /usr/sbin/ldconfig
      0xaaaaaab6b000     0xaaaaaab73000     0x8000    0xcb000  rw-p   /usr/sbin/ldconfig
      0xaaaaaab73000     0xaaaaaab78000     0x5000        0x0  rw-p   [heap]
      0xfffff7ffd000     0xfffff7fff000     0x2000        0x0  r--p   [vvar]
      0xfffff7fff000     0xfffff8000000     0x1000        0x0  r-xp   [vdso]
      0xfffffffdf000    0x1000000000000    0x21000        0x0  rw-p   [stack]

This looks like a better layout to me when you load an ET_DYN file
without !PT_INTERP.

When the commit was reverted by aeb7923733d1 ("revert "fs/binfmt_elf:
use PT_LOAD p_align values for static PIE""), we went back to:

          Start Addr           End Addr       Size     Offset  Perms  objfile
      0xfffff7f28000     0xfffff7fe5000    0xbd000        0x0  r-xp   /usr/sbin/ldconfig
      0xfffff7ff0000     0xfffff7ff2000     0x2000        0x0  r--p   [vvar]
      0xfffff7ff2000     0xfffff7ff3000     0x1000        0x0  r-xp   [vdso]
      0xfffff7ff3000     0xfffff7ffb000     0x8000    0xcb000  rw-p   /usr/sbin/ldconfig
      0xfffff7ffb000     0xfffff8000000     0x5000        0x0  rw-p   [heap]
      0xfffffffdf000    0x1000000000000    0x21000        0x0  rw-p   [stack]

With 6.15-rc3 my layout looks like Ryan's but in 5.18 above, the vdso is
small enough and it's squeezed between the two ldconfig sections.

Quick hack forcing the vdso alignment to SZ_64K on arm64 (not a
solution, it can still fail in other ways):

------------------8<----------------------------
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 78ddf6bdecad..9f9085e3e203 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -111,7 +111,7 @@ static int __setup_additional_pages(enum vdso_abi abi,

 	vdso_text_len = vdso_info[abi].vdso_pages << PAGE_SHIFT;
 	/* Be sure to map the data page */
-	vdso_mapping_len = vdso_text_len + VDSO_NR_PAGES * PAGE_SIZE;
+	vdso_mapping_len = ALIGN(vdso_text_len + VDSO_NR_PAGES * PAGE_SIZE, SZ_64K);

 	vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
------------------8<----------------------------

gives:

          Start Addr           End Addr       Size     Offset  Perms  objfile
      0xfffff7f10000     0xfffff7f14000     0x4000        0x0  r--p   [vvar]
      0xfffff7f14000     0xfffff7f16000     0x2000        0x0  r-xp   [vdso]
      0xfffff7f20000     0xfffff7fdd000    0xbd000        0x0  r-xp   /usr/sbin/ldconfig
      0xfffff7feb000     0xfffff7ff3000     0x8000    0xcb000  rw-p   /usr/sbin/ldconfig
      0xfffff7ff3000     0xfffff7ff8000     0x5000        0x0  rw-p
      0xfffffffdf000    0x1000000000000    0x21000        0x0  rw-p   [stack]


Not sure what the best solution. IIUC when ld.so is loaded as an
interpreter, it wouldn't need brk above its data section. However, when
something like ldconfig (or ld.so) is executed directly, it should be
loaded like any other ET_DYN executable at ELF_ET_DYN_BASE, e.g.:

          Start Addr           End Addr       Size     Offset  Perms  objfile
      0xaaaaaaaa0000     0xaaaaaaaab000     0xb000        0x0  r-xp   /usr/bin/wc
      0xaaaaaaabf000     0xaaaaaaac1000     0x2000     0xf000  rw-p   /usr/bin/wc
      0xfffff7fbe000     0xfffff7fe5000    0x27000        0x0  r-xp   /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
      0xfffff7ff6000     0xfffff7ffa000     0x4000        0x0  r--p   [vvar]
      0xfffff7ffa000     0xfffff7ffc000     0x2000        0x0  r-xp   [vdso]
      0xfffff7ffc000     0xfffff8000000     0x4000    0x2e000  rw-p   /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
      0xfffffffdf000    0x1000000000000    0x21000        0x0  rw-p   [stack]

-- 
Catalin


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: BUG: vdso changes expose elf mapping issue
  2025-04-25 18:37 ` Catalin Marinas
@ 2025-04-25 19:56   ` Kees Cook
  2025-04-25 22:48     ` Kees Cook
  0 siblings, 1 reply; 4+ messages in thread
From: Kees Cook @ 2025-04-25 19:56 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Ryan Roberts, Thomas Weißschuh, Linux Kernel Mailing List,
	Linux-MM, linux-arm-kernel

On Fri, Apr 25, 2025 at 07:37:38PM +0100, Catalin Marinas wrote:
> On Fri, Apr 25, 2025 at 01:41:31PM +0100, Ryan Roberts wrote:
> > ldconfig is a statically linked, PIE executable. The kernel treats this as an 
> > interpreter and therefore does not map it into low memory but instead maps it 
> > into high memory using mmap() (mmap is top-down on arm64). Once it's mapped, 
> > vvar/vdso gets mapped and fills the hole right at the top that is left due to 
> > ldconfig's alignment requirements. Before the above change, there were 2 pages 
> > free between the end of the data segment and vvar; this was enough for ldconfig 
> > to get it's required memory with brk(). But after the change there is no space:
> > 
> > Before:
> > fffff7f20000-fffff7fde000 r-xp 00000000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
> > fffff7fee000-fffff7ff5000 rw-p 000be000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
> > fffff7ff5000-fffff7ffa000 rw-p 00000000 00:00 0 
> > fffff7ffc000-fffff7ffe000 r--p 00000000 00:00 0                          [vvar]
> > fffff7ffe000-fffff8000000 r-xp 00000000 00:00 0                          [vdso]
> > fffffffdf000-1000000000000 rw-p 00000000 00:00 0                         [stack]
> > 
> > After:
> > fffff7f20000-fffff7fde000 r-xp 00000000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
> > fffff7fee000-fffff7ff5000 rw-p 000be000 fe:02 8110426                    /home/ubuntu/glibc-2.35/build/elf/ldconfig
> > fffff7ff5000-fffff7ffa000 rw-p 00000000 00:00 0 
> > fffff7ffa000-fffff7ffe000 r--p 00000000 00:00 0                          [vvar]
> > fffff7ffe000-fffff8000000 r-xp 00000000 00:00 0                          [vdso]
> > fffffffdf000-1000000000000 rw-p 00000000 00:00 0                         [stack]
> 
> It does look like we've just been lucky so far. An ELF file requiring a
> slightly larger brk (by two pages), it could fail. FWIW, briefly after
> commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for
> static PIE"), we got:
> 
>           Start Addr           End Addr       Size     Offset  Perms  objfile
>       0xaaaaaaaa0000     0xaaaaaab5d000    0xbd000        0x0  r-xp   /usr/sbin/ldconfig
>       0xaaaaaab6b000     0xaaaaaab73000     0x8000    0xcb000  rw-p   /usr/sbin/ldconfig
>       0xaaaaaab73000     0xaaaaaab78000     0x5000        0x0  rw-p   [heap]
>       0xfffff7ffd000     0xfffff7fff000     0x2000        0x0  r--p   [vvar]
>       0xfffff7fff000     0xfffff8000000     0x1000        0x0  r-xp   [vdso]
>       0xfffffffdf000    0x1000000000000    0x21000        0x0  rw-p   [stack]
> 
> This looks like a better layout to me when you load an ET_DYN file
> without !PT_INTERP.

The trouble is that !PT_INTERP must be loaded out of the way of the
binary it may load, so it cannot be loaded low.

> When the commit was reverted by aeb7923733d1 ("revert "fs/binfmt_elf:
> use PT_LOAD p_align values for static PIE""), we went back to:
> 
>           Start Addr           End Addr       Size     Offset  Perms  objfile
>       0xfffff7f28000     0xfffff7fe5000    0xbd000        0x0  r-xp   /usr/sbin/ldconfig
>       0xfffff7ff0000     0xfffff7ff2000     0x2000        0x0  r--p   [vvar]
>       0xfffff7ff2000     0xfffff7ff3000     0x1000        0x0  r-xp   [vdso]
>       0xfffff7ff3000     0xfffff7ffb000     0x8000    0xcb000  rw-p   /usr/sbin/ldconfig
>       0xfffff7ffb000     0xfffff8000000     0x5000        0x0  rw-p   [heap]
>       0xfffffffdf000    0x1000000000000    0x21000        0x0  rw-p   [stack]

The revert was because, among various additional problems, that this low
load would collide with things. The static PIE alignment was finally
fixed with commit 3545deff0ec7 ("binfmt_elf: Honor PT_LOAD alignment
for static PIE")

The ultimate brk location is determined near the end of load_elf_binary()
(see the code surrounding the comment "Otherwise leave a gap").

> With 6.15-rc3 my layout looks like Ryan's but in 5.18 above, the vdso is
> small enough and it's squeezed between the two ldconfig sections.

I think there are two surprises:

- For loaders (ET_DYN without PT_INTERP, which is also "static PIE") the
  brk location is being moved to ELF_ET_DYN_BASE ... *but only when ASLR
  is enabled*. I think exclusion is the primary bug, with its origin
  in commit bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing
  direct loader exec"). I failed to explain my rationale at the time
  to have it only happen under ASLR, but I think I was trying to be
  conservative and not change things too much.

- vdso can get loaded into _gaps_ in the ELF. I think this is asking for
  trouble, but technically should be okay since neither can grow. But I
  never like seeing immediately adjacent unrelated mappings, since we
  always end up with bugs (see things like commit 2a5eb9995528
  ("binfmt_elf: Leave a gap between .bss and brk").

For fixing the former, the below change might work (totally untested yet,
I just wanted to reply with my thoughts as I start testing this). Pardon
the goofy code style, I wanted a minimal diff here:

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 7e2afe3220f7..9290a29ede28 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1284,7 +1284,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	mm->end_data = end_data;
 	mm->start_stack = bprm->p;
 
-	if ((current->flags & PF_RANDOMIZE) && (snapshot_randomize_va_space > 1)) {
+	{
 		/*
 		 * For architectures with ELF randomization, when executing
 		 * a loader directly (i.e. no interpreter listed in ELF
@@ -1299,7 +1299,9 @@ static int load_elf_binary(struct linux_binprm *bprm)
 			/* Otherwise leave a gap between .bss and brk. */
 			mm->brk = mm->start_brk = mm->brk + PAGE_SIZE;
 		}
+	}
 
+	if ((current->flags & PF_RANDOMIZE) && (snapshot_randomize_va_space > 1)) {
 		mm->brk = mm->start_brk = arch_randomize_brk(mm);
 #ifdef compat_brk_randomized
 		current->brk_randomized = 1;

> > Note that this issue only occurs with ASLR disabled. When ASLR is enabled, the 
> > brk region is setup in the low memory region that would normally be used by 
> > primary executable.

Out of curiosity, why are you running without ASLR?

Thanks for the report! I'll continue testing the above fix. Just for
making sure I am able to exactly reproduce your issue, this is on
a regular arm64 install of Ubuntu 22.04?

-Kees

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: BUG: vdso changes expose elf mapping issue
  2025-04-25 19:56   ` Kees Cook
@ 2025-04-25 22:48     ` Kees Cook
  0 siblings, 0 replies; 4+ messages in thread
From: Kees Cook @ 2025-04-25 22:48 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Ryan Roberts, Thomas Weißschuh, Linux Kernel Mailing List,
	Linux-MM, linux-arm-kernel

On Fri, Apr 25, 2025 at 12:56:21PM -0700, Kees Cook wrote:
> For fixing the former, the below change might work (totally untested yet,
> I just wanted to reply with my thoughts as I start testing this). Pardon
> the goofy code style, I wanted a minimal diff here:
> 
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 7e2afe3220f7..9290a29ede28 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1284,7 +1284,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
>  	mm->end_data = end_data;
>  	mm->start_stack = bprm->p;
>  
> -	if ((current->flags & PF_RANDOMIZE) && (snapshot_randomize_va_space > 1)) {
> +	{
>  		/*
>  		 * For architectures with ELF randomization, when executing
>  		 * a loader directly (i.e. no interpreter listed in ELF
> @@ -1299,7 +1299,9 @@ static int load_elf_binary(struct linux_binprm *bprm)
>  			/* Otherwise leave a gap between .bss and brk. */
>  			mm->brk = mm->start_brk = mm->brk + PAGE_SIZE;
>  		}
> +	}
>  
> +	if ((current->flags & PF_RANDOMIZE) && (snapshot_randomize_va_space > 1)) {
>  		mm->brk = mm->start_brk = arch_randomize_brk(mm);
>  #ifdef compat_brk_randomized
>  		current->brk_randomized = 1;

Unsurprisingly, this patch was broken, but the idea appears to be
valid. This new patch works for me so far, though I haven't finished
getting Ubuntu 22.04 installed in an arm64 VM. Please let me know if
this fixes it:
https://lore.kernel.org/lkml/20250425224502.work.520-kees@kernel.org/

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-04-25 22:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-25 12:41 BUG: vdso changes expose elf mapping issue Ryan Roberts
2025-04-25 18:37 ` Catalin Marinas
2025-04-25 19:56   ` Kees Cook
2025-04-25 22:48     ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox