From: Palmer Dabbelt <palmer@dabbelt.com>
To: Charlie Jenkins <charlie@rivosinc.com>, lorenzo.stoakes@oracle.com
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Liam.Howlett@oracle.com, Arnd Bergmann <arnd@arndb.de>,
guoren@kernel.org,
Richard Henderson <richard.henderson@linaro.org>,
ink@jurassic.park.msu.ru, mattst88@gmail.com, vgupta@kernel.org,
linux@armlinux.org.uk, chenhuacai@kernel.org, kernel@xen0n.name,
tsbogend@alpha.franken.de, James.Bottomley@hansenpartnership.com,
deller@gmx.de, mpe@ellerman.id.au, npiggin@gmail.com,
christophe.leroy@csgroup.eu, naveen@kernel.org,
agordeev@linux.ibm.com, gerald.schaefer@linux.ibm.com,
hca@linux.ibm.com, gor@linux.ibm.com, borntraeger@linux.ibm.com,
svens@linux.ibm.com, ysato@users.sourceforge.jp, dalias@libc.org,
glaubitz@physik.fu-berlin.de, davem@davemloft.net,
andreas@gaisler.com, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
muchun.song@linux.dev, akpm@linux-foundation.org, vbabka@suse.cz,
shuah@kernel.org, Christoph Hellwig <hch@infradead.org>,
mhocko@suse.com, kirill@shutemov.name, chris.torek@gmail.com,
linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org,
linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org,
loongarch@lists.linux.dev, linux-mips@vger.kernel.org,
linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-s390@vger.kernel.org, linux-sh@vger.kernel.org,
sparclinux@vger.kernel.org, linux-mm@kvack.org,
linux-kselftest@vger.kernel.org,
linux-abi-devel@lists.sourceforge.net
Subject: Re: [PATCH RFC v3 1/2] mm: Add personality flag to limit address to 47 bits
Date: Wed, 02 Oct 2024 07:26:32 -0700 (PDT)
Message-ID: <mhng-411f66df-5f86-4aeb-b614-a6f64587549c@palmer-ri-x1c9a>
In-Reply-To: <ZuSoxh5U3Kj1XgGq@ghost>
On Fri, 13 Sep 2024 14:04:06 PDT (-0700), Charlie Jenkins wrote:
> On Fri, Sep 13, 2024 at 08:41:34AM +0100, Lorenzo Stoakes wrote:
>> On Wed, Sep 11, 2024 at 11:18:12PM GMT, Charlie Jenkins wrote:
>> > On Wed, Sep 11, 2024 at 07:21:27PM +0100, Catalin Marinas wrote:
>> > > On Tue, Sep 10, 2024 at 05:45:07PM -0700, Charlie Jenkins wrote:
>> > > > On Tue, Sep 10, 2024 at 03:08:14PM -0400, Liam R. Howlett wrote:
>> > > > > * Catalin Marinas <catalin.marinas@arm.com> [240906 07:44]:
>> > > > > > On Fri, Sep 06, 2024 at 09:55:42AM +0000, Arnd Bergmann wrote:
>> > > > > > > On Fri, Sep 6, 2024, at 09:14, Guo Ren wrote:
>> > > > > > > > On Fri, Sep 6, 2024 at 3:18 PM Arnd Bergmann <arnd@arndb.de> wrote:
>> > > > > > > >> It's also unclear to me how we want this flag to interact with
>> > > > > > > >> the existing logic in arch_get_mmap_end(), which attempts to
>> > > > > > > >> limit the default mapping to a 47-bit address space already.
>> > > > > > > >
>> > > > > > > > To optimize RISC-V progress, I recommend:
>> > > > > > > >
>> > > > > > > > Step 1: Approve the patch.
>> > > > > > > > Step 2: Update Go and OpenJDK's RISC-V backend to utilize it.
>> > > > > > > > Step 3: Wait approximately several iterations for Go & OpenJDK
>> > > > > > > > Step 4: Remove the 47-bit constraint in arch_get_mmap_end()
>> > >
>> > > Point 4 is an ABI change. What guarantees that there isn't still
>> > > software out there that relies on the old behaviour?
>> >
>> > Yeah I don't think it would be desirable to remove the 47 bit
>> > constraint in architectures that already have it.
>> >
>> > >
>> > > > > > > I really want to first see a plausible explanation about why
>> > > > > > > RISC-V can't just implement this using a 47-bit DEFAULT_MAP_WINDOW
>> > > > > > > like all the other major architectures (x86, arm64, powerpc64),
>> > > > > >
>> > > > > > FWIW arm64 actually limits DEFAULT_MAP_WINDOW to 48-bit in the default
>> > > > > > configuration. We end up with a 47-bit with 16K pages but for a
>> > > > > > different reason that has to do with LPA2 support (I doubt we need this
>> > > > > > for the user mapping but we need to untangle some of the macros there;
>> > > > > > that's for a separate discussion).
>> > > > > >
>> > > > > > That said, we haven't encountered any user space problems with a 48-bit
>> > > > > > DEFAULT_MAP_WINDOW. So I also think RISC-V should follow a similar
>> > > > > > approach (47 or 48 bit default limit). Better to have some ABI
>> > > > > > consistency between architectures. One can still ask for addresses above
>> > > > > > this default limit via mmap().
>> > > > >
>> > > > > I think that is best as well.
>> > > > >
>> > > > > Can we please just do what x86 and arm64 do?
>> > > >
>> > > > I responded to Arnd in the other thread, but I am still not convinced
>> > > > that the solution that x86 and arm64 have selected is the best solution.
>> > > > The solution of defaulting to 47 bits does allow applications the
>> > > > ability to get addresses that are below 47 bits. However, due to
>> > > > differences across architectures it doesn't seem possible to have all
>> > > > architectures default to the same value. Additionally, this flag will be
>> > > > able to help users avoid potential bugs where a hint address is passed
>> > > > that causes upper bits of a VA to be used.
>> > >
>> > > The reason we added this limit on arm64 is that we noticed programs
>> > > using the top 8 bits of a 64-bit pointer for additional information.
>> > > IIRC, it wasn't even openJDK but some JavaScript JIT. We could have
>> > > taught those programs of a new flag but since we couldn't tell how many
>> > > are out there, it was the safest to default to a smaller limit and opt
>> > > in to the higher one. Such opt-in is via mmap() but if you prefer a
>> > > prctl() flag, that's fine by me as well (though I think this should be
>> > > opt-in to higher addresses rather than opt-out of the higher addresses).
>> >
>> > The mmap() flag was used in previous versions but was decided against
>> > because this feature is more useful if it is process-wide. A
>> > personality() flag was chosen instead of a prctl() flag because there
>> > existed other flags in personality() that were similar. I am tempted to
>> > use prctl() however because then we could have an additional arg to
>> > select the exact number of bits that should be reserved (rather than
>> > being fixed at 47 bits).
>>
>> I am very much not in favour of a prctl(), it would require us to add state
>> limiting the address space and the timing of it becomes critical. Then we
>> have the same issue we do with the other proposals as to - what happens if
>> this is too low?
>>
>> What is 'too low' varies by architecture, and for 32-bit architectures
>> could get quite... problematic.
>>
>> And again, what is the RoI here - we are introducing maintenance burden
>> and edge cases vs. the x86 solution in order to... accommodate things
>> that need more than 128 TiB of address space? A problem that does not
>> appear to exist in reality?
>>
>> I suggested the personality approach as the least impactful compromise way
>> of this series working, but I think after what Arnd has said (and please
>> forgive me if I've missed further discussion, I have been dipping in and
>> out of this!) - adapting RISC-V to the approach we take elsewhere seems
>> the most sensible solution to me.

There's one wrinkle here: RISC-V started out with 39-bit VAs by default,
and we've had at least one report of userspace breaking when moving to
48-bit addresses. That was just address sanitizer, so maybe nobody
cares, but we're still pretty early in the transition to 48-bit systems
(most of the HW is still 39-bit) so it's not clear if that's going to be
the only bug.

So we're sort of in our own world of backwards compatibility here.
The choice of 39-bit vs 48-bit is arbitrary, but "38 bits are enough
for userspace" doesn't seem as sane as "47 bits are enough for
userspace". Maybe the right answer here is to just say the 38-bit
userspace is broken and that it's a Linux-ism that 64-bit systems have
47-bit user addresses by default.

>> This remains something we can revisit in future if this turns out to be
>> egregious.
>>
>
> I appreciate Arnd's comments, but I do not think that making 47-bit the
> default is the best solution for riscv. On riscv, support for 48-bit
> address spaces was merged in 5.17 and support for 57-bit address spaces
> was merged in 5.18 without changing the default addresses provided by
> mmap(). It could be argued that this was a mistake, however since at the
> time there didn't exist hardware with larger address spaces it wasn't an
> issue. The applications that existed at the time that relied on the
> smaller address spaces have not been able to move to larger address
> spaces. Making a 47-bit user-space address space default solves the
> problem, but that is not arch agnostic, and can't be because of the
> differences in page table sizes across architectures, which is
> the other part of the problem I am trying to solve.
>
>> >
>> > Opting-in to the higher address space is reasonable. However, it is not
>> > my preference, because the purpose of this flag is to ensure that
>> > allocations do not exceed 47-bits, so it is a clearer ABI to have the
>> > applications that want this guarantee to be the ones setting the flag,
>> > rather than the applications that want the higher bits setting the flag.
>>
>> Perfect is the enemy of the good :) and an idealised solution may not end
>> up being something everybody can agree on.
>
> Yes you are totally right! Although this is not my ideal solution, it
> sufficiently accomplishes the goal so I think it is reasonable to
> implement this as a personality flag.
>
>>
>> >
>> > - Charlie
>> >
>> > >
>> > > --
>> > > Catalin
>> >
>> >
>> >