linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Charlie Jenkins <charlie@rivosinc.com>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Naveen N Rao <naveen@kernel.org>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Huacai Chen <chenhuacai@kernel.org>,
	WANG Xuerui <kernel@xen0n.name>,
	Russell King <linux@armlinux.org.uk>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	Helge Deller <deller@gmx.de>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	Rich Felker <dalias@libc.org>,
	John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>,
	"David S. Miller" <davem@davemloft.net>,
	Andreas Larsson <andreas@gaisler.com>,
	Shuah Khan <shuah@kernel.org>,
	Alexandre Ghiti <alexghiti@rivosinc.com>,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	Palmer Dabbelt <palmer@rivosinc.com>,
	linux-riscv@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
	loongarch@lists.linux.dev, linux-mips@vger.kernel.org,
	linux-parisc@vger.kernel.org, linux-s390@vger.kernel.org,
	linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH 00/16] mm: Introduce MAP_BELOW_HINT
Date: Wed, 28 Aug 2024 14:39:55 -0700	[thread overview]
Message-ID: <Zs+ZK6Q2U9dm19yR@ghost> (raw)
In-Reply-To: <Zs+Ppk0ANaUah7p9@ghost>

On Wed, Aug 28, 2024 at 01:59:18PM -0700, Charlie Jenkins wrote:
> On Wed, Aug 28, 2024 at 02:31:42PM -0400, Liam R. Howlett wrote:
> > * Charlie Jenkins <charlie@rivosinc.com> [240828 01:49]:
> > > Some applications rely on placing data in free bits addresses allocated
> > > by mmap. Various architectures (eg. x86, arm64, powerpc) restrict the
> > > address returned by mmap to be less than the maximum address space,
> > > unless the hint address is greater than this value.
> > 
> > Wait, what arch(s) allows for greater than the max?  The passed hint
> > should be where we start searching, but we go to the lower limit then
> > start at the hint and search up (or vice-versa on the directions).
> > 
> 
> I worded this awkwardly. On arm64 there is a page-table boundary at 48
> bits and at 52 bits. On x86 the boundaries are at 48 bits and 57 bits.
> The max value mmap is able to return on arm64 is 48 bits if the hint
> address uses 48 bits or less, even if the architecture supports 5-level
> paging and thus addresses can be 52 bits. Applications can opt-in to
> using up to 52-bits in an address by using a hint address greater than
> 48 bits. x86 has the same behavior but with 57 bits instead of 52.
> 
> This reason this exists is because some applications arbitrarily replace
> bits in virtual addresses with data with an assumption that the address
> will not be using any of the bits above bit 48 in the virtual address.
> As hardware with larger address spaces was released, x86 decided to
> build safety guards into the kernel to allow the applications that made
> these assumptions to continue to work on this different hardware.
> 
> This causes all application that use a hint address to silently be
> restricted to 48-bit addresses. The goal of this flag is to have a way
> for applications to explicitly request how many bits they want mmap to
> use.
> 
> > I don't understand how unmapping works on a higher address; we would
> > fail to free it on termination of the application.
> > 
> > Also, there are archs that map outside of the VMAs, which are freed by
> > freeing from the prev->vm_end to next->vm_start, so I don't understand
> > what that looks like in this reality as well.
> > 
> > > 
> > > On arm64 this barrier is at 52 bits and on x86 it is at 56 bits. This
> > > flag allows applications a way to specify exactly how many bits they
> > > want to be left unused by mmap. This eliminates the need for
> > > applications to know the page table hierarchy of the system to be able
> > > to reason which addresses mmap will be allowed to return.
> > 
> > But, why do they need to know today?  We have a limit for this don't we?
> 
> The limit is different for different architectures. On x86 the limit is
> 57 bits, and on arm64 it is 52 bits. So in the theoretical case that an
> application requires 10 bits free in a virtual address, the application
> would always work on arm64 regardless of the hint address, but on x86 if
> the hint address is greater than 48 bits then the application will not
> work.
> 
> The goal of this flag is to have consistent and tunable behavior of
> mmap() when it is desired to ensure that mmap() only returns addresses
> that use some number of bits.
> 
> > 
> > Also, these upper limits are how some archs use the upper bits that you
> > are trying to use.
> > 
> 
> It does not eliminate the existing behavior of the architectures to
> place this upper limits, it instead provides a way to have consistent
> behavior across all architectures.
> 
> > > 
> > > ---
> > > riscv made this feature of mmap returning addresses less than the hint
> > > address the default behavior. This was in contrast to the implementation
> > > of x86/arm64 that have a single boundary at the 5-level page table
> > > region. However this restriction proved too great -- the reduced
> > > address space when using a hint address was too small.
> > 
> > Yes, the hint is used to group things close together so it would
> > literally be random chance on if you have enough room or not (aslr and
> > all).
> > 
> > > 
> > > A patch for riscv [1] reverts the behavior that broke userspace. This
> > > series serves to make this feature available to all architectures.
> > 
> > I don't fully understand this statement, you say it broke userspace so
> > now you are porting it to everyone?  This reads as if you are braking
> > the userspace on all architectures :)
> 
> It was the default for mmap on riscv. The difference here is that it is now
> enabled by a flag instead. Instead of making the flag specific to riscv,
> I figured that other architectures might find it useful as well.
> 
> > 
> > If you fail to find room below, then your application fails as there is
> > no way to get the upper bits you need.  It would be better to fix this
> > in userspace - if your application is returned too high an address, then
> > free it and exit because it's going to fail anyways.
> > 
> 
> This flag is trying to define an API that is more robust than the
> current behavior on that x86 and arm64 which implicitly restricts mmap()
> addresses to 48 bits. A solution could be to just write in the docs that
> mmap() will always exhaust all addresses below the hint address before
> returning an address that is above the hint address. However a flag that
> defines this behavior seems more intuitive.
> 
> > > 
> > > I have only tested on riscv and x86.
> > 
> > This should be an RFC then.
> 
> Fair enough.
> 
> > 
> > > There is a tremendous amount of
> > > duplicated code in mmap so the implementations across architectures I
> > > believe should be mostly consistent. I added this feature to all
> > > architectures that implement either
> > > arch_get_mmap_end()/arch_get_mmap_base() or
> > > arch_get_unmapped_area_topdown()/arch_get_unmapped_area(). I also added
> > > it to the default behavior for arch_get_mmap_end()/arch_get_mmap_base().
> > 
> > Way too much duplicate code.  We should be figuring out how to make this
> > all work with the same code.
> > 
> > This is going to make the cloned code problem worse.
> 
> That would require standardizing every architecture with the generic
> mmap() framework that arm64 has developed. That is far outside the scope
> of this patch, but would be a great area to research for each of the
> architectures that do not use the generic framework.

Thinking about this again, I could drop support for all architectures
that do not implement arch_get_mmap_base()/arch_get_mmap_end().

> 
> - Charlie
> 
> > 
> > > 
> > > Link: https://lore.kernel.org/lkml/20240826-riscv_mmap-v1-2-cd8962afe47f@rivosinc.com/T/ [1]
> > > 
> > > To: Arnd Bergmann <arnd@arndb.de>
> > > To: Paul Walmsley <paul.walmsley@sifive.com>
> > > To: Palmer Dabbelt <palmer@dabbelt.com>
> > > To: Albert Ou <aou@eecs.berkeley.edu>
> > > To: Catalin Marinas <catalin.marinas@arm.com>
> > > To: Will Deacon <will@kernel.org>
> > > To: Michael Ellerman <mpe@ellerman.id.au>
> > > To: Nicholas Piggin <npiggin@gmail.com>
> > > To: Christophe Leroy <christophe.leroy@csgroup.eu>
> > > To: Naveen N Rao <naveen@kernel.org>
> > > To: Muchun Song <muchun.song@linux.dev>
> > > To: Andrew Morton <akpm@linux-foundation.org>
> > > To: Liam R. Howlett <Liam.Howlett@oracle.com>
> > > To: Vlastimil Babka <vbabka@suse.cz>
> > > To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > To: Thomas Gleixner <tglx@linutronix.de>
> > > To: Ingo Molnar <mingo@redhat.com>
> > > To: Borislav Petkov <bp@alien8.de>
> > > To: Dave Hansen <dave.hansen@linux.intel.com>
> > > To: x86@kernel.org
> > > To: H. Peter Anvin <hpa@zytor.com>
> > > To: Huacai Chen <chenhuacai@kernel.org>
> > > To: WANG Xuerui <kernel@xen0n.name>
> > > To: Russell King <linux@armlinux.org.uk>
> > > To: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> > > To: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
> > > To: Helge Deller <deller@gmx.de>
> > > To: Alexander Gordeev <agordeev@linux.ibm.com>
> > > To: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> > > To: Heiko Carstens <hca@linux.ibm.com>
> > > To: Vasily Gorbik <gor@linux.ibm.com>
> > > To: Christian Borntraeger <borntraeger@linux.ibm.com>
> > > To: Sven Schnelle <svens@linux.ibm.com>
> > > To: Yoshinori Sato <ysato@users.sourceforge.jp>
> > > To: Rich Felker <dalias@libc.org>
> > > To: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> > > To: David S. Miller <davem@davemloft.net>
> > > To: Andreas Larsson <andreas@gaisler.com>
> > > To: Shuah Khan <shuah@kernel.org>
> > > To: Alexandre Ghiti <alexghiti@rivosinc.com>
> > > Cc: linux-arch@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: Palmer Dabbelt <palmer@rivosinc.com>
> > > Cc: linux-riscv@lists.infradead.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Cc: linux-mm@kvack.org
> > > Cc: loongarch@lists.linux.dev
> > > Cc: linux-mips@vger.kernel.org
> > > Cc: linux-parisc@vger.kernel.org
> > > Cc: linux-s390@vger.kernel.org
> > > Cc: linux-sh@vger.kernel.org
> > > Cc: sparclinux@vger.kernel.org
> > > Cc: linux-kselftest@vger.kernel.org
> > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > 
> > > ---
> > > Charlie Jenkins (16):
> > >       mm: Add MAP_BELOW_HINT
> > >       riscv: mm: Do not restrict mmap address based on hint
> > >       mm: Add flag and len param to arch_get_mmap_base()
> > >       mm: Add generic MAP_BELOW_HINT
> > >       riscv: mm: Support MAP_BELOW_HINT
> > >       arm64: mm: Support MAP_BELOW_HINT
> > >       powerpc: mm: Support MAP_BELOW_HINT
> > >       x86: mm: Support MAP_BELOW_HINT
> > >       loongarch: mm: Support MAP_BELOW_HINT
> > >       arm: mm: Support MAP_BELOW_HINT
> > >       mips: mm: Support MAP_BELOW_HINT
> > >       parisc: mm: Support MAP_BELOW_HINT
> > >       s390: mm: Support MAP_BELOW_HINT
> > >       sh: mm: Support MAP_BELOW_HINT
> > >       sparc: mm: Support MAP_BELOW_HINT
> > >       selftests/mm: Create MAP_BELOW_HINT test
> > > 
> > >  arch/arm/mm/mmap.c                           | 10 ++++++++
> > >  arch/arm64/include/asm/processor.h           | 34 ++++++++++++++++++++++----
> > >  arch/loongarch/mm/mmap.c                     | 11 +++++++++
> > >  arch/mips/mm/mmap.c                          |  9 +++++++
> > >  arch/parisc/include/uapi/asm/mman.h          |  1 +
> > >  arch/parisc/kernel/sys_parisc.c              |  9 +++++++
> > >  arch/powerpc/include/asm/task_size_64.h      | 36 +++++++++++++++++++++++-----
> > >  arch/riscv/include/asm/processor.h           | 32 -------------------------
> > >  arch/s390/mm/mmap.c                          | 10 ++++++++
> > >  arch/sh/mm/mmap.c                            | 10 ++++++++
> > >  arch/sparc/kernel/sys_sparc_64.c             |  8 +++++++
> > >  arch/x86/kernel/sys_x86_64.c                 | 25 ++++++++++++++++---
> > >  fs/hugetlbfs/inode.c                         |  2 +-
> > >  include/linux/sched/mm.h                     | 34 ++++++++++++++++++++++++--
> > >  include/uapi/asm-generic/mman-common.h       |  1 +
> > >  mm/mmap.c                                    |  2 +-
> > >  tools/arch/parisc/include/uapi/asm/mman.h    |  1 +
> > >  tools/include/uapi/asm-generic/mman-common.h |  1 +
> > >  tools/testing/selftests/mm/Makefile          |  1 +
> > >  tools/testing/selftests/mm/map_below_hint.c  | 29 ++++++++++++++++++++++
> > >  20 files changed, 216 insertions(+), 50 deletions(-)
> > > ---
> > > base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
> > > change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
> > > -- 
> > > - Charlie
> > > 


      reply	other threads:[~2024-08-28 21:40 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-28  5:49 Charlie Jenkins
2024-08-28  5:49 ` [PATCH 01/16] mm: Add MAP_BELOW_HINT Charlie Jenkins
2024-08-28  5:49 ` [PATCH 02/16] riscv: mm: Do not restrict mmap address based on hint Charlie Jenkins
2024-08-28  5:49 ` [PATCH 03/16] mm: Add flag and len param to arch_get_mmap_base() Charlie Jenkins
2024-08-28  5:49 ` [PATCH 04/16] mm: Add generic MAP_BELOW_HINT Charlie Jenkins
2024-08-28  5:49 ` [PATCH 05/16] riscv: mm: Support MAP_BELOW_HINT Charlie Jenkins
2024-08-28  5:49 ` [PATCH 06/16] arm64: " Charlie Jenkins
2024-08-28  5:49 ` [PATCH 07/16] powerpc: " Charlie Jenkins
2024-08-28  6:34   ` Christophe Leroy
2024-08-28 17:29     ` Charlie Jenkins
2024-08-31 16:30     ` David Laight
2024-08-28  5:49 ` [PATCH 08/16] x86: " Charlie Jenkins
2024-08-28  5:49 ` [PATCH 09/16] loongarch: " Charlie Jenkins
2024-08-28  5:49 ` [PATCH 10/16] arm: " Charlie Jenkins
2024-08-28  5:49 ` [PATCH 11/16] mips: " Charlie Jenkins
2024-08-28  5:49 ` [PATCH 12/16] parisc: " Charlie Jenkins
2024-08-28  5:49 ` [PATCH 13/16] s390: " Charlie Jenkins
2024-08-28  5:49 ` [PATCH 14/16] sh: " Charlie Jenkins
2024-08-28  5:49 ` [PATCH 15/16] sparc: " Charlie Jenkins
2024-08-28  5:49 ` [PATCH 16/16] selftests/mm: Create MAP_BELOW_HINT test Charlie Jenkins
2024-08-28 17:48   ` Lorenzo Stoakes
2024-08-28 18:13     ` Charlie Jenkins
2024-08-28 18:19 ` [PATCH 00/16] mm: Introduce MAP_BELOW_HINT Lorenzo Stoakes
2024-08-29  7:14   ` Charlie Jenkins
2024-08-28 18:29 ` Dave Hansen
2024-08-28 20:15   ` Charlie Jenkins
2024-08-29 16:54     ` Dave Hansen
2024-08-29 19:36       ` Liam R. Howlett
2024-08-30  1:10         ` Charlie Jenkins
2024-08-30  1:00       ` Charlie Jenkins
2024-08-28 18:31 ` Liam R. Howlett
2024-08-28 20:59   ` Charlie Jenkins
2024-08-28 21:39     ` Charlie Jenkins [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zs+ZK6Q2U9dm19yR@ghost \
    --to=charlie@rivosinc.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexghiti@rivosinc.com \
    --cc=andreas@gaisler.com \
    --cc=aou@eecs.berkeley.edu \
    --cc=arnd@arndb.de \
    --cc=borntraeger@linux.ibm.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=chenhuacai@kernel.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=dalias@libc.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=davem@davemloft.net \
    --cc=deller@gmx.de \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=glaubitz@physik.fu-berlin.de \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=hpa@zytor.com \
    --cc=kernel@xen0n.name \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=loongarch@lists.linux.dev \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=muchun.song@linux.dev \
    --cc=naveen@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=palmer@rivosinc.com \
    --cc=paul.walmsley@sifive.com \
    --cc=shuah@kernel.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=svens@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=tsbogend@alpha.franken.de \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=ysato@users.sourceforge.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox