From: Steven Price
Date: Mon, 21 Oct 2024 14:22:56 +0100
Subject: Re: [PATCH RFC v2 0/4] mm: Introduce MAP_BELOW_HINT
To: "Kirill A. Shutemov", Charlie Jenkins

On 09/09/2024 10:46, Kirill A. Shutemov wrote:
> On Thu, Sep 05, 2024 at 10:26:52AM -0700, Charlie Jenkins wrote:
>> On Thu, Sep 05, 2024 at 09:47:47AM +0300, Kirill A. Shutemov wrote:
>>> On Thu, Aug 29, 2024 at 12:15:57AM -0700, Charlie Jenkins wrote:
>>>> Some applications rely on placing data in free bits addresses allocated
>>>> by mmap. Various architectures (eg. x86, arm64, powerpc) restrict the
>>>> address returned by mmap to be less than the 48-bit address space,
>>>> unless the hint address uses more than 47 bits (the 48th bit is reserved
>>>> for the kernel address space).
>>>>
>>>> The riscv architecture needs a way to similarly restrict the virtual
>>>> address space. On the riscv port of OpenJDK an error is thrown if
>>>> attempted to run on the 57-bit address space, called sv57 [1]. golang
>>>> has a comment that sv57 support is not complete, but there are some
>>>> workarounds to get it to mostly work [2].
>
> I also saw libmozjs crashing with 57-bit address space on x86.
>
>>>> These applications work on x86 because x86 does an implicit 47-bit
>>>> restriction of mmap() address that contain a hint address that is less
>>>> than 48 bits.
>>>>
>>>> Instead of implicitly restricting the address space on riscv (or any
>>>> current/future architecture), a flag would allow users to opt-in to this
>>>> behavior rather than opt-out as is done on other architectures. This is
>>>> desirable because it is a small class of applications that do pointer
>>>> masking.
>
> You reiterate the argument about "small class of applications". But it
> makes no sense to me.

Sorry to chime in late on this - I had been considering implementing
something like MAP_BELOW_HINT and found this thread.

While the examples of applications that want to use high VA bits and get
bitten by future upgrades are not very persuasive, it's worth pointing out
that there are a variety of somewhat horrid hacks out there to work around
this feature not existing.

E.g. from my brief research into other code:

  * Box64 seems to have a custom allocator based on reading
    /proc/self/maps to allocate a block of VA space with a low enough
    address [1]

  * PHP has code reading /proc/self/maps - I think this is to find a
    segment which is close enough to the text segment [2]

  * FEX-Emu mmap()s the upper 128TB of VA on Arm to avoid full 48-bit
    addresses [3][4]

  * pmdk has some funky code to find the lowest address that meets
    certain requirements - this does look like an ASLR alternative and
    probably couldn't directly use MAP_BELOW_HINT, although maybe this
    suggests we need a mechanism to map without a VA-range? [5]

  * MIT-Scheme parses /proc/self/maps to find the lowest mapping within
    a range [6]

  * LuaJIT uses an approach to 'probe' to find a suitable low address
    for allocation [7]

The biggest benefit I see in MAP_BELOW_HINT is that it would allow a
library to get low addresses without causing any problems for the rest
of the application. The use case I'm looking at is in a library and
therefore a personality mode wouldn't be appropriate (because I don't
want to affect the rest of the application). Reading /proc/self/maps
is also problematic because other threads could be allocating/freeing
at the same time, so a gap found in the file may already be taken by
the time mmap() is called.
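
To make the /proc/self/maps problem concrete, here is a minimal sketch of that style of workaround. It is not code from any of the projects listed above; `find_gap()` and `map_below()` are hypothetical helpers, and the sketch assumes 64-bit Linux 4.17 or later (for MAP_FIXED_NOREPLACE) and a page-aligned `min_addr` at or above vm.mmap_min_addr. MAP_FIXED_NOREPLACE turns the race into an EEXIST error and a rescan instead of silently replacing another thread's mapping, but the scan-then-map loop is still the kind of dance MAP_BELOW_HINT is proposed to replace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000	/* asm-generic value, Linux 4.17+ */
#endif

/*
 * Hypothetical helper: return the lowest address in [min_addr, limit) with
 * at least `size` free bytes, based on a snapshot of /proc/self/maps, or 0
 * if there is none. The answer can be stale as soon as it is returned.
 */
static uintptr_t find_gap(uintptr_t min_addr, uintptr_t limit, size_t size)
{
	FILE *f = fopen("/proc/self/maps", "r");
	if (!f)
		return 0;

	uintptr_t cursor = min_addr;	/* lowest address not known to be used */
	char line[8192];

	/* Entries are sorted by address, so one pass is enough. */
	while (cursor < limit && fgets(line, sizeof(line), f)) {
		unsigned long start, end;

		if (sscanf(line, "%lx-%lx", &start, &end) != 2)
			continue;
		if (end <= cursor)
			continue;		/* mapping lies entirely below cursor */
		if (start > cursor && start - cursor >= size)
			break;			/* [cursor, start) is big enough */
		cursor = end;			/* overlap or gap too small: skip it */
	}
	fclose(f);

	/* The gap must also end at or below the limit. */
	return (cursor < limit && limit - cursor >= size) ? cursor : 0;
}

/*
 * Hypothetical helper: map `size` bytes of anonymous memory inside
 * [min_addr, limit), rescanning if another thread takes the gap first.
 */
static void *map_below(uintptr_t min_addr, uintptr_t limit, size_t size)
{
	assert(min_addr % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);

	for (int attempt = 0; attempt < 8; attempt++) {
		uintptr_t addr = find_gap(min_addr, limit, size);
		if (!addr)
			return MAP_FAILED;

		void *p = mmap((void *)addr, size, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
			       -1, 0);
		if (p == (void *)addr)
			return p;
		if (p != MAP_FAILED) {
			/* Older kernel ignored the flag and used addr as a hint. */
			munmap(p, size);
			return MAP_FAILED;
		}
		if (errno != EEXIST)
			return MAP_FAILED;
		/* EEXIST: the gap was taken between the scan and mmap(). */
	}
	return MAP_FAILED;
}
```

A caller wanting 1 MiB below 4 GiB might call `map_below(0x10000000, (uintptr_t)1 << 32, 1 << 20)` and check for MAP_FAILED. Note that every mmap() caller in the process, not just this one, has to tolerate the window between reading the file and mapping.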
Thanks,
Steve

[1] https://sources.debian.org/src/box64/0.3.0+dfsg-1/src/custommem.c/
[2] https://sources.debian.org/src/php8.2/8.2.24-1/ext/opcache/shared_alloc_mmap.c/#L62
[3] https://github.com/FEX-Emu/FEX/blob/main/FEXCore/Source/Utils/Allocator.cpp
[4] https://github.com/FEX-Emu/FEX/commit/df2f1ad074e5cdfb19a0bd4639b7604f777fb05c
[5] https://sources.debian.org/src/pmdk/1.13.1-1.1/src/common/mmap_posix.c/?hl=29#L29
[6] https://sources.debian.org/src/mit-scheme/12.1-3/src/microcode/ux.c/#L826
[7] https://sources.debian.org/src/luajit/2.1.0+openresty20240815-1/src/lj_alloc.c/

> With full address space by default, this small class of applications is
> going to *broken* unless they would handle RISC-V case specifically.
>
> On other hand, if you limit VA to 128TiB by default (like many
> architectures do[1]) everything would work without intervention.
> And if an app needs wider address space it would get it with hint opt-in,
> because it is required on x86-64 anyway. Again, no RISC-V-specific code.
>
> I see no upside with your approach. Just worse user experience.
>
> [1] See va_high_addr_switch test case in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/mm/Makefile#n115
>
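
The breakage discussed in this thread comes from applications that store metadata in pointer bits they assume are unused. A minimal sketch of that pattern follows; `tag_ptr()`, `untag_ptr()`, `get_tag()` and `ptr_fits()` are hypothetical names, and the 48-bit split is an assumption matching the 48-bit limit the quoted cover letter describes for x86, arm64 and powerpc, on a 64-bit target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption for illustration: every user address fits in 48 bits, so bits
 * 48..63 of a pointer are free to hold a 16-bit tag. This only holds while
 * the kernel keeps mmap() results below 2^48.
 */
#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)
#define TAG_MAX   ((UINT64_C(1) << (64 - ADDR_BITS)) - 1)

/* False if the address uses bits the tag would clobber, e.g. one handed
 * out from a 57-bit (sv57 or x86 LA57) address space. */
static bool ptr_fits(const void *p)
{
	return ((uint64_t)(uintptr_t)p & ~ADDR_MASK) == 0;
}

/* Pack a tag into the high bits; the precondition is checked only by assert(). */
static uint64_t tag_ptr(const void *p, uint64_t tag)
{
	assert(ptr_fits(p) && tag <= TAG_MAX);
	return (uint64_t)(uintptr_t)p | (tag << ADDR_BITS);
}

/* Read the tag back out of the high bits. */
static uint64_t get_tag(uint64_t v)
{
	return v >> ADDR_BITS;
}

/* Strip the tag to recover a dereferenceable pointer. */
static void *untag_ptr(uint64_t v)
{
	return (void *)(uintptr_t)(v & ADDR_MASK);
}
```

Once the kernel returns an address at or above 2^48, `ptr_fits()` fails; without that check, `tag_ptr()` would silently corrupt the pointer. This is why such programs depend either on the default limit or on an explicit opt-in such as the proposed MAP_BELOW_HINT.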