References: <20240311164638.2015063-1-pasha.tatashin@soleen.com>
From: Mateusz Guzik
Date: Mon, 11 Mar 2024 20:21:01 +0100
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks
To: Pasha Tatashin
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    akpm@linux-foundation.org, x86@kernel.org, bp@alien8.de,
    brauner@kernel.org, bristot@redhat.com, bsegall@google.com,
    dave.hansen@linux.intel.com, dianders@chromium.org,
    dietmar.eggemann@arm.com, hca@linux.ibm.com, hch@infradead.org,
    hpa@zytor.com, jacob.jun.pan@linux.intel.com, jgg@ziepe.ca,
    jpoimboe@kernel.org, jroedel@suse.de, juri.lelli@redhat.com,
    kent.overstreet@linux.dev, kinseyho@google.com,
    kirill.shutemov@linux.intel.com, lstoakes@gmail.com, luto@kernel.org,
    mgorman@suse.de, mic@digikod.net, michael.christie@oracle.com,
    mingo@redhat.com, mst@redhat.com, npiggin@gmail.com,
    peterz@infradead.org, pmladek@suse.com, rick.p.edgecombe@intel.com,
    rostedt@goodmis.org, surenb@google.com, tglx@linutronix.de,
    urezki@gmail.com,
    vincent.guittot@linaro.org, vschneid@redhat.com

On 3/11/24, Pasha Tatashin wrote:
> On Mon, Mar 11, 2024 at 1:09 PM Mateusz Guzik wrote:
>> 1. what about faults when the thread holds a bunch of arbitrary locks
>> or has preemption disabled? is the allocation lockless?
>
> Each thread has a stack with 4 pages.
> Pre-allocated page: This page is always allocated and mapped at thread
> creation.
> Dynamic pages (3): These pages are mapped dynamically upon stack faults.
>
> A per-CPU data structure holds 3 dynamic pages for each CPU.
> These pages are used to handle stack faults occurring when a running
> thread faults (even within interrupt-disabled contexts). Typically,
> only one page is needed, but in the rare case where the thread accesses
> beyond that, we might use up to all three pages in a single fault. This
> structure allows for atomic handling of stack faults, preventing
> conflicts from other processes. Additionally, the thread's 16K-aligned
> virtual address (VA) and guaranteed pre-allocated page means no page
> table allocation is required during the fault.
>
> When a thread leaves the CPU in normal kernel mode, we check a flag to
> see if it has experienced stack faults. If so, we charge the thread
> for the new stack pages and refill the per-CPU data structure with any
> missing pages.
>

So this also has to happen if the thread holds a bunch of arbitrary
semaphores and goes off cpu with them? Anyhow, see below.

>> 2. what happens if there is no memory from which to map extra pages in
>> the first place? you may be in position where you can't go off cpu
>
> When the per-CPU data structure cannot be refilled, and a new thread
> faults, we issue a message indicating a critical stack fault. This
> triggers a system-wide panic similar to a guard page access violation
>

OOM handling is fundamentally what I was worried about. I'm confident
this failure mode makes the feature unsuitable for general-purpose
deployments. Now, I have no vote here; it may be that this is perfectly
fine as an optional feature, which it is in your patchset. However, if
this is to go in, the option description definitely needs a big fat
warning about possible panics if enabled.

I fully agree something(tm) should be done about stacks and the
current usage is a massive bummer. I wonder if things would be ok if
they shrank to just 12K? Perhaps that would provide a big enough
saving (of course smaller than the one you are getting now), while
avoiding any of the above.

All that said, it's not my call what to do here.
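
To put rough numbers on the 16K vs 12K vs dynamic trade-off, here is a
back-of-the-envelope sketch in Python. It assumes 4 KiB pages (4 pages
making up the 16K stack). The thread count and the share of threads that
fault in an extra page are made up for illustration, the per-CPU reserve
pages are ignored, and the function names are mine, not anything from the
patchset.

```python
# Back-of-the-envelope kernel stack memory comparison.
# Assumptions (NOT from the patchset): 4 KiB pages, a hypothetical thread
# count, and a hypothetical number of threads that fault in extra pages.
# The 3 reserve pages per CPU are deliberately left out.

PAGE_SIZE = 4096          # assumed page size; 4 pages * 4 KiB = 16K stack
STACK_PAGES_FULL = 4      # current 16K stack
STACK_PAGES_SHRUNK = 3    # the 12K alternative
STACK_PAGES_PREALLOC = 1  # dynamic stacks: one page mapped at thread creation


def static_stack_bytes(threads: int, pages: int) -> int:
    """Memory used when every thread gets a fixed number of stack pages."""
    return threads * pages * PAGE_SIZE


def dynamic_stack_bytes(threads: int, deep_threads: int, extra_pages: int) -> int:
    """Memory used when each thread starts with one page and `deep_threads`
    of them have faulted in `extra_pages` additional pages each."""
    if not 0 <= deep_threads <= threads:
        raise ValueError("deep_threads must be between 0 and threads")
    if not 0 <= extra_pages <= STACK_PAGES_FULL - STACK_PAGES_PREALLOC:
        raise ValueError("extra_pages must be between 0 and 3")
    base = threads * STACK_PAGES_PREALLOC * PAGE_SIZE
    return base + deep_threads * extra_pages * PAGE_SIZE


if __name__ == "__main__":
    threads = 100_000       # hypothetical
    deep_threads = 10_000   # hypothetical: 10% touch one extra page
    full = static_stack_bytes(threads, STACK_PAGES_FULL)
    shrunk = static_stack_bytes(threads, STACK_PAGES_SHRUNK)
    dynamic = dynamic_stack_bytes(threads, deep_threads, extra_pages=1)
    for name, used in (("16K", full), ("12K", shrunk), ("dynamic", dynamic)):
        saved = full - used
        print(f"{name:8s} {used / 2**20:8.1f} MiB  saved {saved / 2**20:8.1f} MiB")
```

Under these assumptions the 12K variant saves exactly one page per thread
regardless of workload, while the dynamic variant's saving depends on how
many threads ever go deep.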
Thank you for the explanation.

-- 
Mateusz Guzik