From: Peter Collingbourne <pcc@google.com>
Date: Fri, 23 Feb 2024 13:49:08 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Dynamic Growth of Kernel Stacks
To: Pasha Tatashin, Alexandru Elisei, David Hildenbrand
Cc: lsf-pc@lists.linux-foundation.org, linux-mm
On Thu, Feb 22, 2024, 17:04 Pasha Tatashin wrote:
>
> For a long time, an 8K kernel stack was large enough. However, since
> 2014, the default stack size has increased to 16K [1]. To conserve
> memory at Google, we maintained 8K stacks via a custom patch while
> verifying that our workload could fit within this limit.
>
> As we qualify new workloads and kernels, we find it more difficult to
> keep the stacks at 8K. Therefore, we will increase the stack size to
> the mainline value of 16K. However, this translates to a significant
> increase in memory usage, potentially counted in petabytes.
>
> With virtually mapped stacks [2], it's possible to implement
> auto-growth on faults. Ideally, the vast majority of kernel threads
> could fit into 4K or 8K stacks, with only a small number requiring
> deeper stacks that would expand as needed.
>
> The complication is that new pages must always be available from
> within an interrupt context. To ensure this, pages must be accessible
> to kernel threads in an atomic and lockless manner. This could be
> achieved by using a per-CPU supply of pages dedicated to handling
> kernel-stack faults.
>
> [1] https://lwn.net/Articles/600644
> [2] https://lwn.net/Articles/692608

Hi Pasha,

I wonder if this is another potential use case for bringing back
cleancache, as proposed in [1]? The idea would be that all kernel
stacks have 16KB allocations but only one page accessible and the rest
available as cleancache. We can handle a fault on one of those pages
by discarding the cleancache page and remapping it as r/w.

Peter

[1] https://lore.kernel.org/all/ZdSMbjGf2Fj98diT@raptor/