From: Pasha Tatashin
Date: Wed, 13 Mar 2024 09:43:50 -0400
Subject: Re: [RFC 11/14] x86: add support for Dynamic Kernel Stacks
To: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, x86@kernel.org, bp@alien8.de, brauner@kernel.org, bristot@redhat.com, bsegall@google.com, dave.hansen@linux.intel.com, dianders@chromium.org, dietmar.eggemann@arm.com, eric.devolder@oracle.com, hca@linux.ibm.com, hch@infradead.org, hpa@zytor.com, jacob.jun.pan@linux.intel.com, jgg@ziepe.ca, jpoimboe@kernel.org, jroedel@suse.de, juri.lelli@redhat.com, kent.overstreet@linux.dev, kinseyho@google.com, kirill.shutemov@linux.intel.com, lstoakes@gmail.com, luto@kernel.org, mgorman@suse.de, mic@digikod.net, michael.christie@oracle.com, mingo@redhat.com, mjguzik@gmail.com, mst@redhat.com, npiggin@gmail.com, peterz@infradead.org, pmladek@suse.com, rick.p.edgecombe@intel.com, rostedt@goodmis.org, surenb@google.com, urezki@gmail.com, vincent.guittot@linaro.org, vschneid@redhat.com
In-Reply-To: <87v85qo2fj.ffs@tglx>
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com> <20240311164638.2015063-12-pasha.tatashin@soleen.com> <87v85qo2fj.ffs@tglx>
On Wed, Mar 13, 2024 at 6:23 AM Thomas Gleixner wrote:
>
> On Mon, Mar 11 2024 at 16:46, Pasha Tatashin wrote:
> > @@ -413,6 +413,9 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
> >       }
> >  #endif
> >
> > +     if (dynamic_stack_fault(current, address))
> > +             return;
> > +
> >       irqentry_nmi_enter(regs);
> >       instrumentation_begin();
> >       notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index d6375b3c633b..651c558b10eb 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -1198,6 +1198,9 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
> >       if (is_f00f_bug(regs, hw_error_code, address))
> >               return;
> >
> > +     if (dynamic_stack_fault(current, address))
> > +             return;
>
> T1 schedules out with stack used close to the fault boundary.
>
>    switch_to(T2)
>
> Now T1 schedules back in
>
>    switch_to(T1)
>      __switch_to_asm()
>         ...
>         switch_stacks()           <- SP on T1 stack
> !       ...
> !       jmp __switch_to()
> !    __switch_to()
> !       ...
> !       raw_cpu_write(pcpu_hot.current_task, next_p);
>
> After switching SP to T1's stack and up to the point where
> pcpu_hot.current_task (aka current) is updated to T1 a stack fault will
> invoke dynamic_stack_fault(T2, address) which will return false here:
>
>         /* check if address is inside the kernel stack area */
>         stack = (unsigned long)tsk->stack;
>         if (address < stack || address >= stack + THREAD_SIZE)
>                 return false;
>
> because T2's stack does obviously not cover the faulting address on T1's
> stack. As a consequence double fault will panic the machine.

Hi Thomas,

Thank you, you are absolutely right: we can't trust "current" in the
fault handler.

Let's modify dynamic_stack_fault() to accept only the fault_address. It
can then determine the correct task_struct pointer internally.
Here's a potential solution that is fast, avoids locking, and ensures
atomicity:

1. Kernel Stack VA Space
   Dedicate a virtual address range ([KSTACK_START_VA - KSTACK_END_VA])
   exclusively for kernel stacks. This makes it simple to check whether
   a faulting address belongs to a stack.

2. Finding the faulting task
   - Use ALIGN(fault_address, THREAD_SIZE) to calculate the end of the
     topmost stack page (since stack addresses are aligned to
     THREAD_SIZE).
   - Store the task_struct pointer as the last word on this topmost page,
     which is always present because it is a pre-allocated stack page.

3. Stack Padding
   Increase padding to 8 bytes on x86_64 (TOP_OF_KERNEL_STACK_PADDING 8)
   to accommodate the task_struct pointer.

Another issue this race raises is that 3 pages per CPU might not be
enough; we might need up to 6 pages: 3 to cover the outgoing task, and 3
to cover the incoming task.

Pasha