Date: Wed, 02 Oct 2024 07:26:32 -0700 (PDT)
Subject: Re: [PATCH RFC v3 1/2] mm: Add personality flag to limit address to 47 bits
CC: Catalin Marinas, Liam.Howlett@oracle.com, Arnd Bergmann,
 guoren@kernel.org, Richard Henderson, ink@jurassic.park.msu.ru,
 mattst88@gmail.com, vgupta@kernel.org, linux@armlinux.org.uk,
 chenhuacai@kernel.org, kernel@xen0n.name, tsbogend@alpha.franken.de,
 James.Bottomley@hansenpartnership.com, deller@gmx.de, mpe@ellerman.id.au,
 npiggin@gmail.com, christophe.leroy@csgroup.eu, naveen@kernel.org,
 agordeev@linux.ibm.com, gerald.schaefer@linux.ibm.com, hca@linux.ibm.com,
 gor@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com,
 ysato@users.sourceforge.jp, dalias@libc.org, glaubitz@physik.fu-berlin.de,
 davem@davemloft.net, andreas@gaisler.com, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
 x86@kernel.org, hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
 muchun.song@linux.dev, akpm@linux-foundation.org, vbabka@suse.cz,
 shuah@kernel.org, Christoph Hellwig, mhocko@suse.com,
 kirill@shutemov.name, chris.torek@gmail.com, linux-arch@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org,
 linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-csky@vger.kernel.org, loongarch@lists.linux.dev,
 linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
 linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, linux-abi-devel@lists.sourceforge.net
From: Palmer Dabbelt
To: Charlie Jenkins, lorenzo.stoakes@oracle.com

On Fri, 13 Sep 2024 14:04:06 PDT (-0700), Charlie Jenkins wrote:
> On Fri, Sep 13, 2024 at 08:41:34AM +0100, Lorenzo Stoakes wrote:
>> On Wed, Sep 11, 2024 at 11:18:12PM GMT, Charlie Jenkins wrote:
>> > On Wed, Sep 11, 2024 at 07:21:27PM +0100, Catalin Marinas wrote:
>> > > On Tue, Sep 10, 2024 at 05:45:07PM -0700, Charlie Jenkins wrote:
>> > > > On Tue, Sep 10, 2024 at 03:08:14PM -0400, Liam R. Howlett wrote:
>> > > > > * Catalin Marinas [240906 07:44]:
>> > > > > > On Fri, Sep 06, 2024 at 09:55:42AM +0000, Arnd Bergmann wrote:
>> > > > > > > On Fri, Sep 6, 2024, at 09:14, Guo Ren wrote:
>> > > > > > > > On Fri, Sep 6, 2024 at 3:18 PM Arnd Bergmann wrote:
>> > > > > > > >> It's also unclear to me how we want this flag to interact with
>> > > > > > > >> the existing logic in arch_get_mmap_end(), which attempts to
>> > > > > > > >> limit the default mapping to a 47-bit address space already.
>> > > > > > > >
>> > > > > > > > To optimize RISC-V progress, I recommend:
>> > > > > > > >
>> > > > > > > > Step 1: Approve the patch.
>> > > > > > > > Step 2: Update Go and OpenJDK's RISC-V backend to utilize it.
>> > > > > > > > Step 3: Wait approximately several iterations for Go & OpenJDK
>> > > > > > > > Step 4: Remove the 47-bit constraint in arch_get_mmap_end()
>> > >
>> > > Point 4 is an ABI change. What guarantees that there isn't still
>> > > software out there that relies on the old behaviour?
>> >
>> > Yeah I don't think it would be desirable to remove the 47 bit
>> > constraint in architectures that already have it.
>> >
>> > >
>> > > > > > > I really want to first see a plausible explanation about why
>> > > > > > > RISC-V can't just implement this using a 47-bit DEFAULT_MAP_WINDOW
>> > > > > > > like all the other major architectures (x86, arm64, powerpc64),
>> > > > > >
>> > > > > > FWIW arm64 actually limits DEFAULT_MAP_WINDOW to 48-bit in the default
>> > > > > > configuration. We end up with a 47-bit with 16K pages but for a
>> > > > > > different reason that has to do with LPA2 support (I doubt we need this
>> > > > > > for the user mapping but we need to untangle some of the macros there;
>> > > > > > that's for a separate discussion).
>> > > > > >
>> > > > > > That said, we haven't encountered any user space problems with a 48-bit
>> > > > > > DEFAULT_MAP_WINDOW. So I also think RISC-V should follow a similar
>> > > > > > approach (47 or 48 bit default limit). Better to have some ABI
>> > > > > > consistency between architectures. One can still ask for addresses above
>> > > > > > this default limit via mmap().
>> > > > >
>> > > > > I think that is best as well.
>> > > > >
>> > > > > Can we please just do what x86 and arm64 does?
>> > > >
>> > > > I responded to Arnd in the other thread, but I am still not convinced
>> > > > that the solution that x86 and arm64 have selected is the best solution.
>> > > > The solution of defaulting to 47 bits does allow applications the
>> > > > ability to get addresses that are below 47 bits. However, due to
>> > > > differences across architectures it doesn't seem possible to have all
>> > > > architectures default to the same value. Additionally, this flag will be
>> > > > able to help users avoid potential bugs where a hint address is passed
>> > > > that causes upper bits of a VA to be used.
>> > >
>> > > The reason we added this limit on arm64 is that we noticed programs
>> > > using the top 8 bits of a 64-bit pointer for additional information.
>> > > IIRC, it wasn't even openJDK but some JavaScript JIT. We could have
>> > > taught those programs of a new flag but since we couldn't tell how many
>> > > are out there, it was the safest to default to a smaller limit and opt
>> > > in to the higher one. Such opt-in is via mmap() but if you prefer a
>> > > prctl() flag, that's fine by me as well (though I think this should be
>> > > opt-in to higher addresses rather than opt-out of the higher addresses).
>> >
>> > The mmap() flag was used in previous versions but was decided against
>> > because this feature is more useful if it is process-wide. A
>> > personality() flag was chosen instead of a prctl() flag because there
>> > existed other flags in personality() that were similar. I am tempted to
>> > use prctl() however because then we could have an additional arg to
>> > select the exact number of bits that should be reserved (rather than
>> > being fixed at 47 bits).
>>
>> I am very much not in favour of a prctl(), it would require us to add state
>> limiting the address space and the timing of it becomes critical. Then we
>> have the same issue we do with the other proposals as to - what happens if
>> this is too low?
>>
>> What is 'too low' varies by architecture, and for 32-bit architectures
>> could get quite... problematic.
>>
>> And again, what is the RoI here - we're introducing maintenance burden and edge
>> cases vs. the x86 solution in order to... accommodate things that need more
>> than 128 TiB of address space? A problem that does not appear to exist in
>> reality?
>>
>> I suggested the personality approach as the least impactful compromise way
>> of this series working, but I think after what Arnd has said (and please
>> forgive me if I've missed further discussion have been dipping in and out
>> of this!) - adapting risc v to the approach we take elsewhere seems the
>> most sensible solution to me.

There's one wrinkle here: RISC-V started out with 39-bit VAs by default,
and we've had at least one report of userspace breaking when moving to
48-bit addresses. That was just address sanitizer, so maybe nobody
cares, but we're still pretty early in the transition to 48-bit systems
(most of the HW is still 39-bit) so it's not clear if that's going to be
the only bug.

So we're sort of in our own world of backwards compatibility here.
39-bit vs 48-bit is just an arbitrary number, but "38 bits are enough
for userspace" doesn't seem as sane as "47 bits are enough for
userspace". Maybe the right answer here is to just say the 38-bit
userspace is broken and that it's a Linux-ism that 64-bit systems have
47-bit user addresses by default.

>> This remains something we can revisit in future if this turns out to be
>> egregious.
>>
>
> I appreciate Arnd's comments, but I do not think that making 47-bit the
> default is the best solution for riscv. On riscv, support for 48-bit
> address spaces was merged in 5.17 and support for 57-bit address spaces
> was merged in 5.18 without changing the default addresses provided by
> mmap(). It could be argued that this was a mistake, however since at the
> time there didn't exist hardware with larger address spaces it wasn't an
> issue. The applications that existed at the time that relied on the
> smaller address spaces have not been able to move to larger address
> spaces. Making a 47-bit user-space address space default solves the
> problem, but that is not arch agnostic, and can't be because of the
> varying differences in page table sizes across architectures, which is
> the other part of the problem I am trying to solve.
>
>> >
>> > Opting-in to the higher address space is reasonable. However, it is not
>> > my preference, because the purpose of this flag is to ensure that
>> > allocations do not exceed 47-bits, so it is a clearer ABI to have the
>> > applications that want this guarantee to be the ones setting the flag,
>> > rather than the applications that want the higher bits setting the flag.
>>
>> Perfect is the enemy of the good :) and an idealised solution may not end
>> up being something everybody can agree on.
>
> Yes you are totally right! Although this is not my ideal solution, it
> sufficiently accomplishes the goal so I think it is reasonable to
> implement this as a personality flag.
>
>>
>> >
>> > - Charlie
>> >
>> > >
>> > > --
>> > > Catalin
>> >
>> >
>> >
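
For reference, the user address-space sizes that come up in this thread (38-bit, 47-bit, and the "128 TiB" figure) can be worked out with a few lines of arithmetic. This is only a sketch, assuming userspace gets the lower half of the virtual address space, so an N-bit VA leaves N-1 bits for user addresses; the helper names `user_bits` and `human_size` are made up for illustration and are not kernel interfaces.

```python
# Sizes of the user address-space limits mentioned in the thread.
# Assumption: userspace gets the lower half of the virtual address
# space, so an N-bit VA leaves N-1 bits for user addresses.

UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def user_bits(va_bits: int) -> int:
    """Bits available to userspace for a given VA width (hypothetical helper)."""
    return va_bits - 1


def human_size(bits: int) -> str:
    """Format 2**bits bytes using binary units (hypothetical helper)."""
    unit, rem = divmod(bits, 10)  # every 10 bits is one step up in units
    return f"{1 << rem} {UNITS[unit]}"


if __name__ == "__main__":
    # 39-, 48- and 57-bit VAs are the widths named in the thread.
    for va_bits in (39, 48, 57):
        ub = user_bits(va_bits)
        print(f"{va_bits}-bit VA -> {ub}-bit user space = {human_size(ub)}")
```

Under that assumption, a 47-bit user address space is the 128 TiB limit mentioned above, while a 38-bit one is 256 GiB.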