Date: Thu, 29 Aug 2024 10:00:45 -0700 (PDT)
Subject: Re: [PATCH RFC v2 0/4] mm: Introduce MAP_BELOW_HINT
CC: Charlie Jenkins, Arnd Bergmann, Richard Henderson,
  ink@jurassic.park.msu.ru, mattst88@gmail.com, vgupta@kernel.org,
  linux@armlinux.org.uk, guoren@kernel.org, chenhuacai@kernel.org,
  kernel@xen0n.name, tsbogend@alpha.franken.de,
  James.Bottomley@HansenPartnership.com, deller@gmx.de, mpe@ellerman.id.au,
  npiggin@gmail.com, christophe.leroy@csgroup.eu, naveen@kernel.org,
  agordeev@linux.ibm.com, gerald.schaefer@linux.ibm.com, hca@linux.ibm.com,
  gor@linux.ibm.com, borntraeger@linux.ibm.com, svens@linux.ibm.com,
  ysato@users.sourceforge.jp, dalias@libc.org, glaubitz@physik.fu-berlin.de,
  davem@davemloft.net, andreas@gaisler.com, tglx@linutronix.de,
  mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
  hpa@zytor.com, luto@kernel.org, peterz@infradead.org, muchun.song@linux.dev,
  akpm@linux-foundation.org, Liam.Howlett@oracle.com,
  lorenzo.stoakes@oracle.com, shuah@kernel.org,
  Linus Torvalds, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
  linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org,
  linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org,
  loongarch@lists.linux.dev, linux-mips@vger.kernel.org,
  linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
  linux-s390@vger.kernel.org, linux-sh@vger.kernel.org,
  sparclinux@vger.kernel.org, linux-mm@kvack.org,
  linux-kselftest@vger.kernel.org, linux-api@vger.kernel.org
From: Palmer Dabbelt
To: vbabka@suse.cz

On Thu, 29 Aug 2024 02:02:34 PDT (-0700), vbabka@suse.cz wrote:
> Such a large recipient list and no linux-api. CC'd, please include it on
> future postings.
>
> On 8/29/24 09:15, Charlie Jenkins wrote:
>> Some applications rely on placing data in free bits addresses allocated
>> by mmap. Various architectures (eg.
>> x86, arm64, powerpc) restrict the
>> address returned by mmap to be less than the 48-bit address space,
>> unless the hint address uses more than 47 bits (the 48th bit is reserved
>> for the kernel address space).
>>
>> The riscv architecture needs a way to similarly restrict the virtual
>> address space. On the riscv port of OpenJDK an error is thrown if
>> attempted to run on the 57-bit address space, called sv57 [1]. golang
>> has a comment that sv57 support is not complete, but there are some
>> workarounds to get it to mostly work [2].
>>
>> These applications work on x86 because x86 does an implicit 47-bit
>> restriction of mmap() address that contain a hint address that is less
>> than 48 bits.
>>
>> Instead of implicitly restricting the address space on riscv (or any
>> current/future architecture), a flag would allow users to opt-in to this
>> behavior rather than opt-out as is done on other architectures. This is
>> desirable because it is a small class of applications that do pointer
>> masking.
>
> I doubt it's desirable to have different behavior depending on architecture.
> Also you could say it's a small class of applications that need more than 47
> bits.

We're sort of stuck with the architecture-dependent behavior here: for
the first few years RISC-V only had 39-bit VAs, so the de facto uABI ended
up being that userspace can ignore way more bits. While 48 bits might
be enough for everyone, 39 doesn't seem to be -- or at least IIRC when
we tried restricting the default to that, we broke stuff. There are also
some other wrinkles like arbitrary bit boundaries in pointer masking and
vendor-specific paging formats, but at some point we just end up down a
rabbit hole of insanity there...
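
To make the "userspace ignores VA bits" problem concrete, here is a
minimal sketch of pointer tagging. It is not taken from the series or
from OpenJDK/Go; the function names and the example addresses are made
up for illustration. It only shows that a tag stashed above an assumed
VA width survives as long as the kernel never hands out an address that
uses those bits.

```python
MASK64 = (1 << 64) - 1


def tag(addr, tag_value, assumed_va_bits):
    """Stash tag_value in the bits above assumed_va_bits of a 64-bit pointer."""
    return (addr | (tag_value << assumed_va_bits)) & MASK64


def untag(tagged, assumed_va_bits):
    """Recover the address by keeping only the low assumed_va_bits bits."""
    return tagged & ((1 << assumed_va_bits) - 1)


def survives_tagging(addr, assumed_va_bits, tag_value=0x5):
    """True if addr round-trips through tag() and untag() unchanged."""
    return untag(tag(addr, tag_value, assumed_va_bits), assumed_va_bits) == addr


# Hypothetical addresses: one that fits in 38 bits, and one that only a
# wider address space (for example sv57) could return.
low_addr = 0x3F_FFFF_0000
high_addr = 1 << 50

for bits in (39, 48, 57):
    print(bits, "VA bits:", 64 - bits, "spare bits,",
          "low ok" if survives_tagging(low_addr, bits) else "low broken,",
          "high ok" if survives_tagging(high_addr, bits) else "high broken")
```

An application that assumed 39-bit VAs keeps working only while every
mmap() result stays below that boundary, which is the constraint the
hint/flag discussion is trying to express.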

FWIW, I think that userspace depending on just tossing some VA bits
because some kernels happened to never allocate from them is just
broken, but it seems like other ports worked around the 48->57 bit
transition and we're trying to do something similar for 39->48 (and that
works with 48->57, as we'll have to deal with that eventually).

So that's basically how we ended up with this sort of thing: trying to
do something similar without a flag broke userspace because we were
trying to jam too much into the hints. I couldn't really figure out a
way to satisfy all the userspace constraints by just implicitly
retrofitting behavior based on the hints, so we figured having an
explicit flag to control the behavior would be the sanest way to go.

That said: I'm not opposed to just saying "depending on 39-bit VAs is
broken" and just forcing people to fix it.

>> This flag will also allow seamless compatibility between all
>> architectures, so applications like Go and OpenJDK that use bits in a
>> virtual address can request the exact number of bits they need in a
>> generic way. The flag can be checked inside of vm_unmapped_area() so
>> that this flag does not have to be handled individually by each
>> architecture.
>>
>> Link:
>> https://github.com/openjdk/jdk/blob/f080b4bb8a75284db1b6037f8c00ef3b1ef1add1/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L79
>> [1]
>> Link:
>> https://github.com/golang/go/blob/9e8ea567c838574a0f14538c0bbbd83c3215aa55/src/runtime/tagptr_64bit.go#L47
>> [2]
>>
>> To: Arnd Bergmann
>> To: Richard Henderson
>> To: Ivan Kokshaysky
>> To: Matt Turner
>> To: Vineet Gupta
>> To: Russell King
>> To: Guo Ren
>> To: Huacai Chen
>> To: WANG Xuerui
>> To: Thomas Bogendoerfer
>> To: James E.J. Bottomley
>> To: Helge Deller
>> To: Michael Ellerman
>> To: Nicholas Piggin
>> To: Christophe Leroy
>> To: Naveen N Rao
>> To: Alexander Gordeev
>> To: Gerald Schaefer
>> To: Heiko Carstens
>> To: Vasily Gorbik
>> To: Christian Borntraeger
>> To: Sven Schnelle
>> To: Yoshinori Sato
>> To: Rich Felker
>> To: John Paul Adrian Glaubitz
>> To: David S. Miller
>> To: Andreas Larsson
>> To: Thomas Gleixner
>> To: Ingo Molnar
>> To: Borislav Petkov
>> To: Dave Hansen
>> To: x86@kernel.org
>> To: H. Peter Anvin
>> To: Andy Lutomirski
>> To: Peter Zijlstra
>> To: Muchun Song
>> To: Andrew Morton
>> To: Liam R. Howlett
>> To: Vlastimil Babka
>> To: Lorenzo Stoakes
>> To: Shuah Khan
>> Cc: linux-arch@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: linux-alpha@vger.kernel.org
>> Cc: linux-snps-arc@lists.infradead.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-csky@vger.kernel.org
>> Cc: loongarch@lists.linux.dev
>> Cc: linux-mips@vger.kernel.org
>> Cc: linux-parisc@vger.kernel.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-s390@vger.kernel.org
>> Cc: linux-sh@vger.kernel.org
>> Cc: sparclinux@vger.kernel.org
>> Cc: linux-mm@kvack.org
>> Cc: linux-kselftest@vger.kernel.org
>> Signed-off-by: Charlie Jenkins
>>
>> Changes in v2:
>> - Added much greater detail to cover letter
>> - Removed all code that touched architecture specific code and was able
>>   to factor this out into all generic functions, except for flags that
>>   needed to be added to vm_unmapped_area_info
>> - Made this an RFC since I have only tested it on riscv and x86
>> - Link to v1: https://lore.kernel.org/r/20240827-patches-below_hint_mmap-v1-0-46ff2eb9022d@rivosinc.com
>>
>> ---
>> Charlie Jenkins (4):
>>       mm: Add MAP_BELOW_HINT
>>       mm: Add hint and mmap_flags to struct vm_unmapped_area_info
>>       mm: Support MAP_BELOW_HINT in vm_unmapped_area()
>>       selftests/mm: Create MAP_BELOW_HINT test
>>
>>  arch/alpha/kernel/osf_sys.c                  |  2 ++
>>  arch/arc/mm/mmap.c                           |  3 +++
>>  arch/arm/mm/mmap.c                           |  7 ++++++
>>  arch/csky/abiv1/mmap.c                       |  3 +++
>>  arch/loongarch/mm/mmap.c                     |  3 +++
>>  arch/mips/mm/mmap.c                          |  3 +++
>>  arch/parisc/kernel/sys_parisc.c              |  3 +++
>>  arch/powerpc/mm/book3s64/slice.c             |  7 ++++++
>>  arch/s390/mm/hugetlbpage.c                   |  4 ++++
>>  arch/s390/mm/mmap.c                          |  6 ++++++
>>  arch/sh/mm/mmap.c                            |  6 ++++++
>>  arch/sparc/kernel/sys_sparc_32.c             |  3 +++
>>  arch/sparc/kernel/sys_sparc_64.c             |  6 ++++++
>>  arch/sparc/mm/hugetlbpage.c                  |  4 ++++
>>  arch/x86/kernel/sys_x86_64.c                 |  6 ++++++
>>  arch/x86/mm/hugetlbpage.c                    |  4 ++++
>>  fs/hugetlbfs/inode.c                         |  4 ++++
>>  include/linux/mm.h                           |  2 ++
>>  include/uapi/asm-generic/mman-common.h       |  1 +
>>  mm/mmap.c                                    |  9 ++++++++
>>  tools/include/uapi/asm-generic/mman-common.h |  1 +
>>  tools/testing/selftests/mm/Makefile          |  1 +
>>  tools/testing/selftests/mm/map_below_hint.c  | 32 ++++++++++++++++++++++++++++
>>  23 files changed, 120 insertions(+)
>> ---
>> base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
>> change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
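
As a rough illustration of the idea in the quoted cover letter (checking
the flag once in the generic search rather than per architecture), the
toy model below caps a top-down gap search at the hint when a
"below hint" flag is set. This is a sketch under assumptions, not the
code from the patches: AreaRequest, find_top_down, the 1 << 56 user-space
top for sv57, and the choice to make the whole mapping end at or below
the hint are all hypothetical. The exact boundary semantics are whatever
the series defines.

```python
from dataclasses import dataclass

PAGE = 4096


@dataclass
class AreaRequest:
    """Hypothetical stand-in for the search parameters; not the kernel struct."""
    length: int
    low_limit: int
    high_limit: int
    hint: int = 0
    below_hint: bool = False   # models the proposed MAP_BELOW_HINT flag


def find_top_down(req, mapped):
    """Return the highest free start address for req, or None.

    mapped is a list of non-overlapping (start, end) half-open ranges.
    """
    ceiling = req.high_limit
    if req.below_hint and req.hint:
        # Assumed semantics: the whole mapping must end at or below the hint.
        ceiling = min(ceiling, req.hint)

    for start, end in sorted(mapped, reverse=True):
        if start >= ceiling:
            continue                      # mapping lies above the search window
        gap_low = max(min(end, ceiling), req.low_limit)
        if ceiling - gap_low >= req.length:
            return ceiling - req.length   # fits between this mapping and the ceiling
        ceiling = min(ceiling, start)     # keep searching below this mapping
    if ceiling - req.low_limit >= req.length:
        return ceiling - req.length
    return None


SV57_USER_TOP = 1 << 56   # assumed top of user space with 57-bit VAs
plain = AreaRequest(PAGE, 0x10000, SV57_USER_TOP)
capped = AreaRequest(PAGE, 0x10000, SV57_USER_TOP, hint=1 << 47, below_hint=True)
busy = [((1 << 47) - 4 * PAGE, 1 << 47)]   # something already mapped just under the hint

print(hex(find_top_down(plain, busy)))
print(hex(find_top_down(capped, busy)))
```

Without the flag the model returns an address near the top of the 57-bit
range; with it, the result stays under the 47-bit hint even though the
pages immediately below the hint are already in use.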