From: Pasha Tatashin
Date: Mon, 11 Mar 2024 14:58:45 -0400
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks
To: Mateusz Guzik
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 akpm@linux-foundation.org, x86@kernel.org, bp@alien8.de,
 brauner@kernel.org, bristot@redhat.com, bsegall@google.com,
 dave.hansen@linux.intel.com, dianders@chromium.org,
 dietmar.eggemann@arm.com, eric.devolder@oracle.com, hca@linux.ibm.com,
 hch@infradead.org, hpa@zytor.com, jacob.jun.pan@linux.intel.com,
 jgg@ziepe.ca, jpoimboe@kernel.org, jroedel@suse.de,
 juri.lelli@redhat.com, kent.overstreet@linux.dev, kinseyho@google.com,
 kirill.shutemov@linux.intel.com, lstoakes@gmail.com, luto@kernel.org,
 mgorman@suse.de, mic@digikod.net, michael.christie@oracle.com,
 mingo@redhat.com, mst@redhat.com, npiggin@gmail.com,
 peterz@infradead.org, pmladek@suse.com, rick.p.edgecombe@intel.com,
 rostedt@goodmis.org, surenb@google.com, tglx@linutronix.de,
 urezki@gmail.com, vincent.guittot@linaro.org, vschneid@redhat.com
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com>

On Mon, Mar 11, 2024 at 1:09 PM Mateusz Guzik wrote:
>
> On 3/11/24, Pasha Tatashin wrote:
> > This is follow-up to the LSF/MM proposal [1]. Please provide your
> > thoughts and comments about dynamic kernel stacks feature. This is a WIP
> > has not been tested beside booting on some machines, and running LKDTM
> > thread exhaust tests.
> > The series also lacks selftests, and
> > documentations.
> >
> > This feature allows to grow kernel stack dynamically, from 4KiB and up
> > to the THREAD_SIZE. The intend is to save memory on fleet machines. From
> > the initial experiments it shows to save on average 70-75% of the kernel
> > stack memory.
> >
>

Hi Mateusz,

> Can you please elaborate how this works? I have trouble figuring it
> out from cursory reading of the patchset and commit messages, that
> aside I would argue this should have been explained in the cover
> letter.

Sure, I have answered your questions below.

> For example, say a thread takes a bunch of random locks (most notably
> spinlocks) and/or disables preemption, then pushes some stuff onto the
> stack which now faults. That is to say the fault can happen in rather
> arbitrary context.
>
> If any of the conditions described below are prevented in the first
> place it really needs to be described how.
>
> That said, from top of my head:
> 1. what about faults when the thread holds a bunch of arbitrary locks
> or has preemption disabled? is the allocation lockless?

Each thread has a stack with 4 pages:

Pre-allocated page (1): This page is always allocated and mapped at thread
creation.
Dynamic pages (3): These pages are mapped dynamically upon stack faults.

A per-CPU data structure holds 3 dynamic pages for each CPU. These pages
are used to handle stack faults taken by the running thread (even within
interrupt-disabled contexts). Typically, only one page is needed, but in
the rare case where the thread accesses beyond that, we might use up to
all three pages in a single fault. Because the pages are reserved per CPU
in advance, stack faults can be handled atomically, without conflicts with
other processes. Additionally, the thread's 16K-aligned virtual address
(VA) and guaranteed pre-allocated page mean that no page table allocation
is required during the fault.
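
The fault path described above can be modelled in a few lines of userspace C. This is only a sketch of the idea, not code from the series: the names cpu_stack_pool, thread_stack, stack_init() and stack_fault() are hypothetical, and it assumes that a fault maps every still-unmapped page between the faulting page and the pre-allocated top page.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT  12                    /* 4K pages */
#define STACK_PAGES 4                     /* 16K stack = 4 x 4K pages */
#define DYN_PAGES   (STACK_PAGES - 1)     /* pages mapped on demand */

/* Hypothetical per-CPU reserve of pages set aside for stack faults. */
struct cpu_stack_pool {
	int nr_free;                      /* 0..DYN_PAGES */
};

/*
 * Hypothetical per-thread stack state. Index 0 is the lowest page; the
 * stack grows down, so the top page is the pre-allocated one.
 */
struct thread_stack {
	unsigned long base;               /* 16K-aligned start of the stack VA */
	bool mapped[STACK_PAGES];
	bool faulted;                     /* checked when the thread leaves the CPU */
};

static void stack_init(struct thread_stack *s, unsigned long base)
{
	s->base = base;
	for (int i = 0; i < STACK_PAGES; i++)
		s->mapped[i] = false;
	s->mapped[STACK_PAGES - 1] = true;    /* mapped at thread creation */
	s->faulted = false;
}

/*
 * Handle a stack fault at addr using only pages already held in the
 * per-CPU pool, so nothing is allocated in the fault path. Returns the
 * number of pages taken from the pool, or -1 if addr is outside the
 * stack or the pool cannot cover the fault (the case that ends in a
 * panic in the description above).
 */
static int stack_fault(struct cpu_stack_pool *pool, struct thread_stack *s,
		       unsigned long addr)
{
	unsigned long size = (unsigned long)STACK_PAGES << PAGE_SHIFT;
	int first, need = 0;

	if (addr < s->base || addr >= s->base + size)
		return -1;

	first = (int)((addr - s->base) >> PAGE_SHIFT);
	for (int i = first; i < STACK_PAGES; i++)
		if (!s->mapped[i])
			need++;

	if (need > pool->nr_free)
		return -1;

	for (int i = first; i < STACK_PAGES; i++)
		s->mapped[i] = true;
	pool->nr_free -= need;
	if (need)
		s->faulted = true;
	return need;
}
```

In this model a fault one page below the top takes a single page from the pool, while a fault in the lowest page of a fresh stack takes all three at once. The page-table point presumably follows from the alignment: with 4K pages on x86-64 one page-table page maps 2MiB, so a 16K-aligned 16K stack never straddles two of them, and the one it needs already exists because the top page is mapped.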

When a thread leaves the CPU in normal kernel mode, we check a flag to see
whether it has taken any stack faults. If so, we charge the thread for the
new stack pages and refill the per-CPU data structure with any missing
pages.

> 2. what happens if there is no memory from which to map extra pages in
> the first place? you may be in position where you can't go off cpu

When the per-CPU data structure cannot be refilled and a thread then
faults, we issue a message indicating a critical stack fault. This
triggers a system-wide panic, similar to a guard page access violation.

Pasha