From: "H. Peter Anvin"
Date: Thu, 14 Mar 2024 20:39:36 -0700
To: Pasha Tatashin, Matthew Wilcox
CC: Kent Overstreet, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 akpm@linux-foundation.org, x86@kernel.org, bp@alien8.de,
 brauner@kernel.org, bristot@redhat.com, bsegall@google.com,
 dave.hansen@linux.intel.com, dianders@chromium.org,
 dietmar.eggemann@arm.com, eric.devolder@oracle.com, hca@linux.ibm.com,
 hch@infradead.org, jacob.jun.pan@linux.intel.com, jgg@ziepe.ca,
 jpoimboe@kernel.org, jroedel@suse.de, juri.lelli@redhat.com,
 kinseyho@google.com, kirill.shutemov@linux.intel.com, lstoakes@gmail.com,
 luto@kernel.org, mgorman@suse.de, mic@digikod.net,
 michael.christie@oracle.com, mingo@redhat.com, mjguzik@gmail.com,
 mst@redhat.com, npiggin@gmail.com, peterz@infradead.org,
 pmladek@suse.com, rick.p.edgecombe@intel.com, rostedt@goodmis.org,
 surenb@google.com, tglx@linutronix.de, urezki@gmail.com,
 vincent.guittot@linaro.org, vschneid@redhat.com
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks
Message-ID: <39F17EC4-7844-4111-BF7D-FFC97B05D9FA@zytor.com>
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com>
 <2cb8f02d-f21e-45d2-afe2-d1c6225240f3@zytor.com>
 <2qp4uegb4kqkryihqyo6v3fzoc2nysuhltc535kxnh6ozpo5ni@isilzw7nth42>

On March 14, 2024 8:13:56 PM PDT, Pasha Tatashin wrote:
>On Thu, Mar 14, 2024 at 3:57 PM Matthew Wilcox wrote:
>>
>> On Thu, Mar 14, 2024 at 03:53:39PM -0400, Kent Overstreet wrote:
>> > On Thu, Mar 14, 2024 at 07:43:06PM +0000, Matthew Wilcox wrote:
>> > > On Tue, Mar 12, 2024 at 10:18:10AM -0700, H. Peter Anvin wrote:
>> > > > Second, non-dynamic kernel memory is one of the core design decisions in
>> > > > Linux from early on. This means there are lot of deeply embedded assumptions
>> > > > which would have to be untangled.
>> > >
>> > > I think there are other ways of getting the benefit that Pasha is seeking
>> > > without moving to dynamically allocated kernel memory. One icky thing
>> > > that XFS does is punt work over to a kernel thread in order to use more
>> > > stack!
>> > > That breaks a number of things including lockdep (because the
>> > > kernel thread doesn't own the lock, the thread waiting for the kernel
>> > > thread owns the lock).
>> > >
>> > > If we had segmented stacks, XFS could say "I need at least 6kB of stack",
>> > > and if less than that was available, we could allocate a temporary
>> > > stack and switch to it. I suspect Google would also be able to use this
>> > > API for their rare cases when they need more than 8kB of kernel stack.
>> > > Who knows, we might all be able to use such a thing.
>> > >
>> > > I'd been thinking about this from the point of view of allocating more
>> > > stack elsewhere in kernel space, but combining what Pasha has done here
>> > > with this idea might lead to a hybrid approach that works better; allocate
>> > > 32kB of vmap space per kernel thread, put 12kB of memory at the top of it,
>> > > rely on people using this "I need more stack" API correctly, and free the
>> > > excess pages on return to userspace. No complicated "switch stacks" API
>> > > needed, just an "ensure we have at least N bytes of stack remaining" API.
>
>I like this approach! I think we could also consider having permanent
>big stacks for some kernel only threads like kvm-vcpu. A cooperative
>stack increase framework could work well and wouldn't negatively
>impact the performance of context switching. However, thorough
>analysis would be necessary to proactively identify potential stack
>overflow situations.
>
>> > Why would we need an "I need more stack" API? Pasha's approach seems
>> > like everything we need for what you're talking about.
>>
>> Because double faults are hard, possibly impossible, and the FRED approach
>> Peter described has extra overhead?
>> This was all described up-thread.
>
>Handling faults in #DF is possible. It requires code inspection to
>handle race conditions such as what was shown by tglx. However, as
>Andy pointed out, this is not supported by SDM as it is an abort
>context (yet we return from it because of ESPFIX64, so return is
>possible).
>
>My question, however, if we ignore memory savings and only consider
>reliability aspect of this feature. What is better unconditionally
>crashing the machine because a guard page was reached, or printing a
>huge warning with a backtracing information about the offending stack,
>handling the fault, and survive? I know that historically Linus
>preferred WARN() to BUG() [1]. But, this is a somewhat different
>scenario compared to simple BUG vs WARN.
>
>Pasha
>
>[1] https://lore.kernel.org/all/Pine.LNX.4.44.0209091832160.1714-100000@home.transmeta.com
>

The real issue with using #DF is that if the event that caused it was
asynchronous, you could lose the event.
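
For reference, the cooperative "ensure we have at least N bytes of stack
remaining" idea quoted above can be modelled in user space roughly as
follows. This is only a sketch of the page arithmetic, assuming 4 kB pages
and the 32 kB / 12 kB figures from the quoted proposal; every name in it
(model_stack, model_ensure_stack, model_return_to_user) is hypothetical and
none of it is kernel code or part of the RFC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names and sizes throughout: a user-space model, not kernel code. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_RESERVED  (32u * 1024u)  /* address space reserved per thread */
#define MODEL_INITIAL   (12u * 1024u)  /* memory populated at the top of it */

struct model_stack {
	size_t populated;  /* bytes backed by memory, measured from the top */
	size_t used;       /* bytes currently in use, measured from the top */
};

/*
 * Make sure at least 'need' more bytes can be used without faulting.
 * Returns the number of pages newly populated, or -1 if the request
 * does not fit in the reservation.
 */
int model_ensure_stack(struct model_stack *s, size_t need)
{
	size_t want, rounded, added;

	assert(s->used <= s->populated);
	if (need > MODEL_RESERVED - s->used)
		return -1;

	want = s->used + need;
	if (want <= s->populated)
		return 0;  /* enough stack is already populated */

	/* Round up to whole pages and extend the populated range downward. */
	rounded = (want + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE * MODEL_PAGE_SIZE;
	added = (rounded - s->populated) / MODEL_PAGE_SIZE;
	s->populated = rounded;
	return (int)added;
}

/* Models "free the excess pages on return to userspace". Returns pages freed. */
int model_return_to_user(struct model_stack *s)
{
	int freed = 0;

	s->used = 0;
	if (s->populated > MODEL_INITIAL) {
		freed = (int)((s->populated - MODEL_INITIAL) / MODEL_PAGE_SIZE);
		s->populated = MODEL_INITIAL;
	}
	return freed;
}
```

In this model a caller with 10 kB in use asking for 6 kB more populates one
extra page, while the same request with only 4 kB in use populates none.
Because the caller asks before it descends, nothing here depends on taking
a fault on the stack itself, which is the case the #DF concern is about.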