From: Pasha Tatashin
Date: Mon, 11 Mar 2024 15:55:06 -0400
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks
To: Mateusz Guzik
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 akpm@linux-foundation.org, x86@kernel.org, bp@alien8.de,
 brauner@kernel.org, bristot@redhat.com, bsegall@google.com,
 dave.hansen@linux.intel.com, dianders@chromium.org,
 dietmar.eggemann@arm.com, hca@linux.ibm.com, hch@infradead.org,
 hpa@zytor.com, jacob.jun.pan@linux.intel.com, jgg@ziepe.ca,
 jpoimboe@kernel.org, jroedel@suse.de, juri.lelli@redhat.com,
 kent.overstreet@linux.dev, kinseyho@google.com,
 kirill.shutemov@linux.intel.com, lstoakes@gmail.com, luto@kernel.org,
 mgorman@suse.de, mic@digikod.net, michael.christie@oracle.com,
 mingo@redhat.com, mst@redhat.com, npiggin@gmail.com,
 peterz@infradead.org, pmladek@suse.com, rick.p.edgecombe@intel.com,
 rostedt@goodmis.org, surenb@google.com, tglx@linutronix.de,
 urezki@gmail.com, vincent.guittot@linaro.org, vschneid@redhat.com
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com>

On Mon, Mar 11, 2024 at 3:21 PM Mateusz Guzik wrote:
>
> On 3/11/24, Pasha Tatashin wrote:
> > On Mon, Mar 11, 2024 at 1:09 PM Mateusz Guzik wrote:
> >> 1. what about faults when the thread holds a bunch of arbitrary locks
> >> or has preemption disabled? is the allocation lockless?
> >
> > Each thread has a stack with 4 pages.
> > Pre-allocated page: This page is always allocated and mapped at thread
> > creation.
> > Dynamic pages (3): These pages are mapped dynamically upon stack faults.
> >
> > A per-CPU data structure holds 3 dynamic pages for each CPU. These
> > pages are used to handle stack faults occurring when a running thread
> > faults (even within interrupt-disabled contexts). Typically, only one
> > page is needed, but in the rare case where the thread accesses beyond
> > that, we might use up to all three pages in a single fault. This
> > structure allows for atomic handling of stack faults, preventing
> > conflicts from other processes. Additionally, the thread's 16K-aligned
> > virtual address (VA) and guaranteed pre-allocated page means no page
> > table allocation is required during the fault.
> >
> > When a thread leaves the CPU in normal kernel mode, we check a flag to
> > see if it has experienced stack faults. If so, we charge the thread
> > for the new stack pages and refill the per-CPU data structure with any
> > missing pages.
> >
>
> So this also has to happen if the thread holds a bunch of arbitrary
> semaphores and goes off cpu with them? Anyhow, see below.

Yes, this is alright: if a thread is allowed to sleep, it should not
hold any alloc_pages() locks.

> >> 2. what happens if there is no memory from which to map extra pages in
> >> the first place? you may be in position where you can't go off cpu
> >
> > When the per-CPU data structure cannot be refilled, and a new thread
> > faults, we issue a message indicating a critical stack fault. This
> > triggers a system-wide panic similar to a guard page access violation
> >
>
> OOM handling is fundamentally what I was worried about. I'm confident
> this failure mode makes the feature unsuitable for general-purpose
> deployments.

The primary goal of this series is to enhance system safety, not to
introduce additional risks. Memory saving is a welcome side effect.
Please see below for explanations.

>
> Now, I have no vote here, it may be this is perfectly fine as an
> optional feature, which it is in your patchset. However, if this is to
> go in, the option description definitely needs a big fat warning about
> possible panics if enabled.
>
> I fully agree something(tm) should be done about stacks and the
> current usage is a massive bummer. I wonder if things would be ok if
> they shrinked to just 12K? Perhaps that would provide big enough

The current split of 1 pre-allocated page and 3 dynamic pages is just
WIP; we could just as well change it to 2 pre-allocated and 2 dynamic
pages, or 3/1, etc.

At Google, we still use 8K stacks (we did not move to 16K when upstream
increased the size in 2014) and are only now encountering extreme cases
where the 8K limit is reached. Consequently, we plan to increase the
limit to 16K. Dynamic Kernel Stacks allow us to keep an 8K pre-allocated
stack while handling page faults only in exceptionally rare
circumstances.

Another example is to increase THREAD_SIZE to 32K and keep 16K
pre-allocated. That pre-allocates the same amount as upstream has today,
but avoids panics on guard pages, thus making systems safer for
everyone.

Pasha
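
To make the accounting described above easier to follow, here is a toy
userspace model of it in Python. It is only a sketch of the bookkeeping
and is not taken from the patch series: the names Cpu, Thread,
handle_stack_fault, switch_out, CriticalStackFault and the alloc_page
callback are all hypothetical, and the 1 + 3 page split is simply the
WIP configuration mentioned in this thread.

```python
from dataclasses import dataclass

STACK_PAGES = 4          # 16K stack made of 4K pages, as described above
PREALLOCATED_PAGES = 1   # always mapped at thread creation (WIP split)
DYNAMIC_PAGES = STACK_PAGES - PREALLOCATED_PAGES  # 3, mapped on fault


class CriticalStackFault(Exception):
    """Hypothetical stand-in for the panic path in this toy model."""


@dataclass
class Thread:
    mapped_pages: int = PREALLOCATED_PAGES
    charged_pages: int = PREALLOCATED_PAGES
    faulted: bool = False  # flag checked when the thread leaves the CPU


@dataclass
class Cpu:
    reserve: int = DYNAMIC_PAGES  # pages held ready for stack faults

    def handle_stack_fault(self, thread, pages_needed=1):
        """Map pages from the per-CPU reserve; nothing is allocated here."""
        if thread.mapped_pages + pages_needed > STACK_PAGES:
            # Past the end of the stack: analogous to a guard page hit.
            raise CriticalStackFault("access beyond the stack")
        if pages_needed > self.reserve:
            # The reserve could not be refilled earlier (e.g. under OOM).
            raise CriticalStackFault("per-CPU reserve exhausted")
        self.reserve -= pages_needed
        thread.mapped_pages += pages_needed
        thread.faulted = True

    def switch_out(self, thread, alloc_page):
        """Run when the thread leaves the CPU and allocation is allowed.

        alloc_page is a hypothetical callback returning True on success.
        """
        if not thread.faulted:
            return
        thread.charged_pages = thread.mapped_pages  # charge the new pages
        thread.faulted = False
        while self.reserve < DYNAMIC_PAGES and alloc_page():
            self.reserve += 1  # refill any missing pages
```

In this model a fault only moves pages out of the reserve, which is why
it can run with interrupts disabled; all allocation is deferred to
switch_out(). If alloc_page() keeps failing, the reserve stays short and
a later fault that needs more pages than remain raises
CriticalStackFault, which corresponds to the panic case discussed above.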