From: Pasha Tatashin
Date: Thu, 14 Mar 2024 10:03:55 -0400
Subject: Re: [RFC 11/14] x86: add support for Dynamic Kernel Stacks
To: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 akpm@linux-foundation.org, x86@kernel.org, bp@alien8.de,
 brauner@kernel.org, bristot@redhat.com, bsegall@google.com,
 dave.hansen@linux.intel.com, dianders@chromium.org,
 dietmar.eggemann@arm.com, eric.devolder@oracle.com, hca@linux.ibm.com,
 hch@infradead.org, hpa@zytor.com, jacob.jun.pan@linux.intel.com,
 jgg@ziepe.ca, jpoimboe@kernel.org, jroedel@suse.de,
 juri.lelli@redhat.com, kent.overstreet@linux.dev, kinseyho@google.com,
 kirill.shutemov@linux.intel.com, lstoakes@gmail.com, luto@kernel.org,
 mgorman@suse.de, mic@digikod.net, michael.christie@oracle.com,
 mingo@redhat.com, mjguzik@gmail.com, mst@redhat.com, npiggin@gmail.com,
 peterz@infradead.org, pmladek@suse.com, rick.p.edgecombe@intel.com,
 rostedt@goodmis.org, surenb@google.com, urezki@gmail.com,
 vincent.guittot@linaro.org, vschneid@redhat.com
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com>
 <20240311164638.2015063-12-pasha.tatashin@soleen.com>
 <87v85qo2fj.ffs@tglx> <87bk7inmah.ffs@tglx>
In-Reply-To: <87bk7inmah.ffs@tglx>

On Wed, Mar 13, 2024 at 12:12 PM Thomas Gleixner wrote:
>
> On Wed, Mar 13 2024 at 11:28, Pasha Tatashin wrote:
> > On Wed, Mar 13, 2024 at 9:43 AM Pasha Tatashin
> > wrote:
> >> Here's a potential solution that is fast, avoids locking, and
> >> ensures atomicity:
> >>
> >> 1. Kernel Stack VA Space
> >> Dedicate a virtual address range ([KSTACK_START_VA - KSTACK_END_VA])
> >> exclusively for kernel stacks. This simplifies validation of faulting
> >> addresses to be part of a stack.
> >>
> >> 2. Finding the faulty task
> >> - Use ALIGN(fault_address, THREAD_SIZE) to calculate the end of the
> >> topmost stack page (since stack addresses are aligned to THREAD_SIZE).
> >> - Store the task_struct pointer as the last word on this topmost page,
> >> that is always present as it is a pre-allocated stack page.
> >>
> >> 3. Stack Padding
> >> Increase padding to 8 bytes on x86_64 (TOP_OF_KERNEL_STACK_PADDING 8)
> >> to accommodate the task_struct pointer.
> >
> > Alternatively, do not even look-up the task_struct in
> > dynamic_stack_fault(), but only install the mapping to the faulting
> > address, store va in the per-cpu array, and handle the rest in
> > dynamic_stack() during context switching. At that time spin locks can
> > be taken, and we can do a find_vm_area(addr) call.
> >
> > This way, we would not need to modify TOP_OF_KERNEL_STACK_PADDING to
> > keep task_struct in there.
>
> Why not simply doing the 'current' update right next to the stack
> switching in __switch_to_asm() which has no way of faulting.
>
> That needs to validate whether anything uses current between the stack
> switch and the place where current is updated today. I think nothing
> should do so, but I would not be surprised either if it would be the
> case. Such code would already today just work by chance I think,
>
> That should not be hard to analyze and fixup if necessary.
>
> So that's fixable, but I'm not really convinced that all of this is safe
> and correct under all circumstances. That needs a lot more analysis than
> just the trivial one I did for switch_to().

Agreed, if the current task pointer can be switched later, after loads
and stores to the stack, that would be a better solution.
I will incorporate this approach into my next version.

I also concur that this proposal necessitates more rigorous analysis.
This work remains in the investigative phase, where I am seeking a
viable solution to the problem.

The core issue is that kernel stacks consume excessive memory for
certain workloads. However, we cannot simply reduce their size, as this
leads to machine crashes in the infrequent instances where stacks do
run deep.

Thanks,
Pasha
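
For reference, the lookup in step 2 of the quoted proposal can be
modelled in user space roughly as below. This is only a sketch, not
code from the patch series: the THREAD_SIZE value, ALIGN_UP (written to
round the same way as the kernel's ALIGN() for a power-of-two
alignment), struct task, owner_slot(), stack_alloc() and stack_owner()
are all names and values assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative value only; the real THREAD_SIZE depends on the config. */
#define THREAD_SIZE 16384UL

/* Rounds up like the kernel's ALIGN() for a power-of-two alignment. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((uintptr_t)(a) - 1))

/* Hypothetical stand-in for task_struct. */
struct task {
	int pid;
};

/* Slot that holds the owner pointer: the last word of the stack. */
static struct task **owner_slot(uintptr_t stack_base)
{
	return (struct task **)(stack_base + THREAD_SIZE -
				sizeof(struct task *));
}

/* Allocate a THREAD_SIZE-aligned stack and record its owner at the top. */
static void *stack_alloc(struct task *owner)
{
	void *stack = aligned_alloc(THREAD_SIZE, THREAD_SIZE);

	if (stack)
		*owner_slot((uintptr_t)stack) = owner;
	return stack;
}

/* Recover the owner from an address strictly inside the stack. */
static struct task *stack_owner(uintptr_t fault_address)
{
	uintptr_t end = ALIGN_UP(fault_address, THREAD_SIZE);

	/* An address exactly on a THREAD_SIZE boundary rounds to itself. */
	assert(end != fault_address);
	return *(struct task **)(end - sizeof(struct task *));
}
```

With a stack returned by stack_alloc(&t), stack_owner() on any address
strictly above the base and below the end of that stack yields &t. The
assert marks the one case the rounding does not cover in this model: an
address sitting exactly on a THREAD_SIZE boundary is returned unchanged
by the round-up, so it would need separate handling.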