From: Yang Shi
Date: Wed, 11 Sep 2024 19:39:31 -0700
Subject: Re: [PATCH] [RFC] mm: mmap: Allow mmap(MAP_STACK) to map growable stack
To: "Liam R. Howlett", Yang Shi, Helge Deller, Helge Deller, linux-kernel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org, linux-parisc@vger.kernel.org
References: <95c4efe9-e92a-46fe-bf41-9141e125332d@gmx.de> <75ihgmlcmou7yatoeva5sezbf6stow4gtdyurwzj5fxghjq7yw@o3u5wudjdnkk>
In-Reply-To: <75ihgmlcmou7yatoeva5sezbf6stow4gtdyurwzj5fxghjq7yw@o3u5wudjdnkk>

On Wed, Sep 11, 2024 at 6:42 PM Liam R. Howlett wrote:
>
> * Yang Shi [240911 21:08]:
> > On Wed, Sep 11, 2024 at 5:50 PM Helge Deller wrote:
> > >
> > > On 9/12/24 01:05, Liam R. Howlett wrote:
> > > > * Yang Shi [240911 18:16]:
> > > >> On Wed, Sep 11, 2024 at 12:49 PM Liam R. Howlett
> > > >> wrote:
> > > >>>
> > > >>> * Helge Deller [240911 15:20]:
> > > >>>> This is a RFC to change the behaviour of mmap(MAP_STACK) to be
> > > >>>> sufficient to map memory for usage as stack on all architectures.
> > > >>>> Currently MAP_STACK is a no-op on Linux, and instead MAP_GROWSDOWN
> > > >>>> has to be used.
> > > >>>> To clarify, here is the relevant info from the mmap() man page:
> > > >>>>
> > > >>>> MAP_GROWSDOWN
> > > >>>>     This flag is used for stacks. It indicates to the kernel virtual
> > > >>>>     memory system that the mapping should extend downward in memory. The
> > > >>>>     return address is one page lower than the memory area that is
> > > >>>>     actually created in the process's virtual address space. Touching an
> > > >>>>     address in the "guard" page below the mapping will cause the mapping
> > > >>>>     to grow by a page. This growth can be repeated until the mapping
> > > >>>>     grows to within a page of the high end of the next lower mapping,
> > > >>>>     at which point touching the "guard" page will result in a SIGSEGV
> > > >>>>     signal.
> > > >>>>
> > > >>>> MAP_STACK (since Linux 2.6.27)
> > > >>>>     Allocate the mapping at an address suitable for a process or thread
> > > >>>>     stack.
> > > >>>>
> > > >>>>     This flag is currently a no-op on Linux. However, by employing this
> > > >>>>     flag, applications can ensure that they transparently obtain support
> > > >>>>     if the flag is implemented in the future.
> > > >>>>     Thus, it is used in the
> > > >>>>     glibc threading implementation to allow for the fact that
> > > >>>>     some architectures may (later) require special treatment for
> > > >>>>     stack allocations. A further reason to employ this flag is
> > > >>>>     portability: MAP_STACK exists (and has an effect) on some
> > > >>>>     other systems (e.g., some of the BSDs).
> > > >>>>
> > > >>>> The reason to suggest this change is, that on the parisc architecture the
> > > >>>> stack grows upwards. As such, using solely the MAP_GROWSDOWN flag will not
> > > >>>> work. Note that there exists no MAP_GROWSUP flag.
> > > >>>> By changing the behaviour of MAP_STACK to mark the memory area with the
> > > >>>> VM_STACK bit (which is VM_GROWSUP or VM_GROWSDOWN depending on the
> > > >>>> architecture) the MAP_STACK flag does exactly what people would expect on
> > > >>>> all platforms.
> > > >>>>
> > > >>>> This change should have no negative side-effect, as all code which
> > > >>>> used mmap(MAP_GROWSDOWN | MAP_STACK) still work as before.
> > > >>>>
> > > >>>> Signed-off-by: Helge Deller
> > > >>>>
> > > >>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > >>>> index bcb201ab7a41..66bc72a0cb19 100644
> > > >>>> --- a/include/linux/mman.h
> > > >>>> +++ b/include/linux/mman.h
> > > >>>> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > > >>>>         return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> > > >>>>                _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> > > >>>>                _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > > >>>> +              _calc_vm_trans(flags, MAP_STACK,      VM_STACK     ) |
> > > >>>
> > > >>> Right now MAP_STACK can be used to set VM_NOHUGEPAGE, but this will
> > > >>> change the user interface to create a vma that will grow.  I'm not
> > > >>> entirely sure this is okay?
> > > >>
> > > >> AFAICT, I don't see this is a problem. Currently huge page also skips
> > > >> the VMAs with VM_GROWS* flags set. See vma_is_temporary_stack().
> > > >> __thp_vma_allowable_orders() returns 0 if the vma is a temporary
> > > >> stack.
> > > >
> > > > If someone is using MAP_STACK to avoid having a huge page, they will
> > > > also get a mapping that grows - which is different than what happens
> > > > today.
> > > >
> > > > I'm not saying that's right, but someone could be abusing the existing
> > > > flag and this will change the behaviour.
> > >
> > > Wouldn't a plain mmap() followed by madvise(MADV_NOHUGEPAGE) do exactly that?
> > > Why abusing MAP_STACK for that?
> >
> > Different sources and reports showed having huge pages for stack
> > mapping hurts performance. A lot of applications, for example, pthread
> > lib, allocate stack with MAP_STACK and they don't call MADV_NOHUGEPAGE
> > on stack mapping.
> >
>
> It makes sense to have a stack with NOHUGEPAGE, but does anyone use
> MAP_STACK to avoid the extra syscall to madv to set it on mappings that
> are NOT stacks which would now become stack-like with this change?

I'm not aware of such a use case. It is definitely not recommended and
is a misuse of MAP_STACK. I don't see how we can prevent this in the
kernel other than by documenting it properly.

>
> ...
>
> > > >>>>                _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> > > >>>>                arch_calc_vm_flag_bits(flags);
> > > >>>>  }
> > >
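
For reference, here is a rough userspace sketch of the alternative Helge
mentions above: a plain mmap() followed by madvise(MADV_NOHUGEPAGE), for
callers that only want to opt a non-stack mapping out of huge pages. The
helper name map_without_thp() and the choice to tolerate EINVAL are my
assumptions for illustration, not something taken from the patch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and madvise() in glibc */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (name and error policy are assumptions):
 * map `len` bytes of private anonymous memory without MAP_STACK and ask
 * the kernel not to back it with transparent huge pages.
 * Returns NULL on failure.
 */
static void *map_without_thp(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /*
     * EINVAL can mean the kernel was built without
     * CONFIG_TRANSPARENT_HUGEPAGE; there is then nothing to opt out of,
     * so this sketch treats it as success.
     */
    if (madvise(p, len, MADV_NOHUGEPAGE) != 0 && errno != EINVAL) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

A caller would use the returned region like any other anonymous mapping
and release it with munmap(p, len). This costs one extra syscall compared
to passing MAP_STACK, but it does not depend on what MAP_STACK means on a
given kernel.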