From: Pasha Tatashin
Date: Mon, 11 Mar 2024 20:08:16 -0400
Subject: Re: [RFC 11/14] x86: add support for Dynamic Kernel Stacks
To: Andy Lutomirski
Cc: Linux Kernel Mailing List, linux-mm@kvack.org, Andrew Morton,
 "the arch/x86 maintainers", Borislav Petkov, Christian Brauner,
 bristot@redhat.com, Ben Segall, Dave Hansen, dianders@chromium.org,
 dietmar.eggemann@arm.com, eric.devolder@oracle.com, hca@linux.ibm.com,
 "hch@infradead.org", "H. Peter Anvin", Jacob Pan, Jason Gunthorpe,
 jpoimboe@kernel.org, Joerg Roedel, juri.lelli@redhat.com,
 Kent Overstreet, kinseyho@google.com, "Kirill A. Shutemov",
 lstoakes@gmail.com, mgorman@suse.de, mic@digikod.net,
 michael.christie@oracle.com, Ingo Molnar, mjguzik@gmail.com,
 "Michael S. Tsirkin", Nicholas Piggin, "Peter Zijlstra (Intel)",
 Petr Mladek, Rick P Edgecombe, Steven Rostedt, Suren Baghdasaryan,
 Thomas Gleixner, Uladzislau Rezki, vincent.guittot@linaro.org,
 vschneid@redhat.com
In-Reply-To: <1ac305b1-d28f-44f6-88e5-c85d9062f9e8@app.fastmail.com>
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com>
 <20240311164638.2015063-12-pasha.tatashin@soleen.com>
 <3e180c07-53db-4acb-a75c-1a33447d81af@app.fastmail.com>
 <1ac305b1-d28f-44f6-88e5-c85d9062f9e8@app.fastmail.com>

> >> There are some other options: you could pre-map
> >
> > Pre-mapping would be expensive. It would mean pre-mapping the dynamic
> > pages for every scheduled thread, and we'd still need to check the
> > access bit every time a thread leaves the CPU.
>
> That's a write to four consecutive words in memory, with no locking required.

You convinced me, this might not be that bad. At thread creation time
we will save the locations of the thread's unmapped PTEs, and set them
on every schedule. There is a slight increase in scheduling cost, but
perhaps it is not as bad as I initially thought. This approach,
however, makes the dynamic stack feature much safer, and can easily be
extended to all arches that support access/dirty bit tracking.

>
> > Dynamic thread faults
> > should be considered rare events and thus shouldn't significantly
> > affect the performance of normal context switch operations. With 8K
> > stacks, we might encounter only 0.00001% of stacks requiring an extra
> > page, and even fewer needing 16K.
>
> Well yes, but if you crash 0.0001% of the time due to the microcode not liking you, you lose. :)
>
> >
> >> Also, I think the whole memory allocation concept in this whole series is a bit odd. Fundamentally, we *can't* block on these stack faults -- we may be in a context where blocking will deadlock. We may be in the page allocator. Panicking due to kernel stack allocation would be very unpleasant.
> >
> > We never block during handling stack faults. There's a per-CPU page
> > pool, guaranteeing availability for the faulting thread. The thread
> > simply takes pages from this per-CPU data structure and refills the
> > pool when leaving the CPU. The faulting routine is efficient,
> > requiring a fixed number of loads without any locks, stalling, or even
> > cmpxchg operations.
>
> You can't block when scheduling, either. What if you can't refill the pool?

Why can't we (I am not a scheduler guy)? IRQs are not yet disabled, so
what prevents us from blocking while the old process has not yet been
removed from the CPU?
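
To make the pre-mapping idea above concrete, here is a rough user-space
model of it. It is only a sketch under assumptions, not code from the
series: every name (struct cpu_pool, struct thread, schedule_in(),
schedule_out(), pool_refill(), the "lent" bitmask) is hypothetical, the
PTEs are plain 64-bit words using the x86 present (bit 0) and accessed
(bit 5) flags, and the "hardware" setting the accessed bit is simulated
by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
#define DYN_PAGES    4            /* the "four consecutive words" per thread */
#define PAGE_SHIFT   12
#define PTE_PRESENT  (1ull << 0)  /* x86 present flag */
#define PTE_ACCESSED (1ull << 5)  /* x86 accessed flag, set by hardware */

struct cpu_pool {
    uint64_t pfn[DYN_PAGES];      /* reserved page frames; 0 means empty slot */
};

struct thread {
    uint64_t pte[DYN_PAGES];      /* stands in for the saved PTE locations */
    unsigned lent;                /* slots currently backed by pool pages */
};

/* Switch-in: pre-map every dynamic slot the thread does not own yet. */
bool schedule_in(struct cpu_pool *pool, struct thread *t)
{
    for (int i = 0; i < DYN_PAGES; i++)
        if (!(t->pte[i] & PTE_PRESENT) && pool->pfn[i] == 0)
            return false;         /* pool was not refilled */

    for (int i = 0; i < DYN_PAGES; i++) {
        if (t->pte[i] & PTE_PRESENT)
            continue;             /* thread already owns this page */
        /* Plain store, no locks or cmpxchg. */
        t->pte[i] = (pool->pfn[i] << PAGE_SHIFT) | PTE_PRESENT;
        t->lent |= 1u << i;
    }
    return true;
}

/* Switch-out: keep touched pages, unmap the rest. Returns pages consumed. */
int schedule_out(struct cpu_pool *pool, struct thread *t)
{
    int consumed = 0;

    for (int i = 0; i < DYN_PAGES; i++) {
        if (!(t->lent & (1u << i)))
            continue;
        if (t->pte[i] & PTE_ACCESSED) {
            pool->pfn[i] = 0;     /* thread keeps the page; slot needs refill */
            consumed++;
        } else {
            t->pte[i] = 0;        /* untouched: unmap, page stays in the pool */
        }
    }
    t->lent = 0;
    return consumed;
}

/* Refill empty slots from n fresh frames. Returns true if the pool is full. */
bool pool_refill(struct cpu_pool *pool, const uint64_t *fresh, size_t n)
{
    size_t used = 0;
    bool full = true;

    for (int i = 0; i < DYN_PAGES; i++) {
        if (pool->pfn[i] != 0)
            continue;
        if (used < n)
            pool->pfn[i] = fresh[used++];
        else
            full = false;         /* ran out of fresh pages */
    }
    return full;
}
```

In this model the switch-in path is a handful of unconditional stores,
and the only place that can fail is pool_refill(), which is exactly the
open question in this thread: what the scheduler should do when that
function cannot return true.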