From: "H. Peter Anvin"
To: Pasha Tatashin, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, x86@kernel.org, bp@alien8.de, brauner@kernel.org, bristot@redhat.com, bsegall@google.com, dave.hansen@linux.intel.com, dianders@chromium.org, dietmar.eggemann@arm.com, eric.devolder@oracle.com, hca@linux.ibm.com, hch@infradead.org, jacob.jun.pan@linux.intel.com, jgg@ziepe.ca, jpoimboe@kernel.org, jroedel@suse.de, juri.lelli@redhat.com, kent.overstreet@linux.dev, kinseyho@google.com, kirill.shutemov@linux.intel.com, lstoakes@gmail.com, luto@kernel.org, mgorman@suse.de, mic@digikod.net, michael.christie@oracle.com, mingo@redhat.com, mjguzik@gmail.com, mst@redhat.com, npiggin@gmail.com, peterz@infradead.org, pmladek@suse.com, rick.p.edgecombe@intel.com, rostedt@goodmis.org, surenb@google.com, tglx@linutronix.de, urezki@gmail.com, vincent.guittot@linaro.org, vschneid@redhat.com
Date: Tue, 12 Mar 2024 10:18:10 -0700
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks
Message-ID: <2cb8f02d-f21e-45d2-afe2-d1c6225240f3@zytor.com>
In-Reply-To: <20240311164638.2015063-1-pasha.tatashin@soleen.com>

On 3/11/24 09:46, Pasha Tatashin wrote:
> This is follow-up to the LSF/MM proposal [1]. Please provide your
> thoughts and comments about dynamic kernel stacks feature. This is a WIP
> has not been tested beside booting on some machines, and running LKDTM
> thread exhaust tests. The series also lacks selftests, and
> documentations.
>
> This feature allows to grow kernel stack dynamically, from 4KiB and up
> to the THREAD_SIZE. The intend is to save memory on fleet machines. From
> the initial experiments it shows to save on average 70-75% of the kernel
> stack memory.
>
> The average depth of a kernel thread depends on the workload, profiling,
> virtualization, compiler optimizations, and driver implementations.
> However, the table below shows the amount of kernel stack memory before
> vs. after on idling freshly booted machines:
>
> CPU            #Cores  #Stacks  BASE(kb)  Dynamic(kb)  Saving
> AMD Genoa         384     5786     92576        23388  74.74%
> Intel Skylake     112     3182     50912        12860  74.74%
> AMD Rome          128     3401     54416        14784  72.83%
> AMD Rome          256     4908     78528        20876  73.42%
> Intel Haswell      72     2644     42304        10624  74.89%
>
> Some workloads with that have millions of threads would can benefit
> significantly from this feature.
>

OK, first of all, talking about "kernel memory" here is misleading.
Unless your threads spend nearly all their time sleeping, they will
also occupy stack and TLS memory in user space.

Second, non-dynamic kernel memory has been one of the core design
decisions in Linux from early on. This means there are a lot of deeply
embedded assumptions that would have to be untangled.

Linus would, of course, be the real authority on this, but if someone
asked me what the fundamental design philosophies of the Linux kernel
are (the design decisions that make Linux Linux, if you will), I would
say:

1. Non-dynamic kernel memory
2. Permanent mapping of physical memory
3. Kernel API modeled closely after the POSIX API (no complicated user
   space layers)
4. Fast system call entry/exit (a necessity for a kernel API based on
   simple system calls)
5. Monolithic (but modular) kernel environment (not cross-privilege,
   coroutine or message passing)

Third, *IF* this is something that should be done (and I personally
strongly suspect it should not), at least on x86-64 it should probably
be limited to FRED hardware. With FRED, it is possible to set the #PF
event stack level to 1, which will cause an automatic stack switch for
#PF in kernel space only.
However, even in kernel space, #PF can sleep if it references a user
space page, in which case it would have to be demoted back onto the
ring 0 stack (there are multiple ways of doing that, but each of them
entails some overhead).

-hpa
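The columns of the quoted table are tied together by simple arithmetic: BASE appears to be exactly #Stacks × 16 KiB, which is consistent with the usual 16 KiB x86-64 THREAD_SIZE, and Saving appears to be 1 − Dynamic/BASE. The following is a minimal Python sketch that assumes those two relationships and reads "kb" as KiB. It recomputes both columns and the average dynamic stack size per thread. The names TABLE, THREAD_SIZE_KIB, saving_pct and check_row are hypothetical helpers for this check and do not come from the patch series.

```python
# Rows copied from the quoted table:
# (CPU, cores, stacks, base_kib, dynamic_kib, reported saving in %)
TABLE = [
    ("AMD Genoa", 384, 5786, 92576, 23388, 74.74),
    ("Intel Skylake", 112, 3182, 50912, 12860, 74.74),
    ("AMD Rome", 128, 3401, 54416, 14784, 72.83),
    ("AMD Rome", 256, 4908, 78528, 20876, 73.42),
    ("Intel Haswell", 72, 2644, 42304, 10624, 74.89),
]

# Assumption: the fixed-size baseline gives every thread a full 16 KiB
# THREAD_SIZE stack, and the table's "kb" means KiB.
THREAD_SIZE_KIB = 16


def saving_pct(base_kib: int, dynamic_kib: int) -> float:
    """Share of baseline stack memory saved, rounded to 2 decimals like the table."""
    return round(100.0 * (base_kib - dynamic_kib) / base_kib, 2)


def check_row(stacks: int, base_kib: int, dynamic_kib: int, reported_pct: float):
    """Return (BASE == stacks * THREAD_SIZE, Saving matches, avg KiB per dynamic stack)."""
    base_ok = stacks * THREAD_SIZE_KIB == base_kib
    pct_ok = saving_pct(base_kib, dynamic_kib) == reported_pct
    avg_dynamic_kib = dynamic_kib / stacks
    return base_ok, pct_ok, avg_dynamic_kib


if __name__ == "__main__":
    for cpu, cores, stacks, base, dyn, pct in TABLE:
        base_ok, pct_ok, avg = check_row(stacks, base, dyn, pct)
        base_s = "ok" if base_ok else "MISMATCH"
        pct_s = "ok" if pct_ok else "MISMATCH"
        print(f"{cpu:<14}{cores:>4} cores  base {base_s:<8} saving {pct_s:<8} {avg:.2f} KiB/stack")
```

Every row passes both checks. The Dynamic column works out to roughly 4.0 to 4.35 KiB per stack, which is barely above the 4 KiB starting size on these idle machines. The check only covers kernel stacks. It says nothing about user space stack and TLS memory.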