From: Pasha Tatashin
Date: Wed, 13 Mar 2024 11:28:26 -0400
Subject: Re: [RFC 11/14] x86: add support for Dynamic Kernel Stacks
To: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 akpm@linux-foundation.org, x86@kernel.org, bp@alien8.de,
 brauner@kernel.org, bristot@redhat.com, bsegall@google.com,
 dave.hansen@linux.intel.com, dianders@chromium.org,
 dietmar.eggemann@arm.com, eric.devolder@oracle.com, hca@linux.ibm.com,
 hch@infradead.org, hpa@zytor.com, jacob.jun.pan@linux.intel.com,
 jgg@ziepe.ca, jpoimboe@kernel.org, jroedel@suse.de,
 juri.lelli@redhat.com, kent.overstreet@linux.dev, kinseyho@google.com,
 kirill.shutemov@linux.intel.com, lstoakes@gmail.com, luto@kernel.org,
 mgorman@suse.de, mic@digikod.net, michael.christie@oracle.com,
 mingo@redhat.com, mjguzik@gmail.com, mst@redhat.com, npiggin@gmail.com,
 peterz@infradead.org, pmladek@suse.com, rick.p.edgecombe@intel.com,
 rostedt@goodmis.org, surenb@google.com, urezki@gmail.com,
 vincent.guittot@linaro.org, vschneid@redhat.com
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com>
 <20240311164638.2015063-12-pasha.tatashin@soleen.com> <87v85qo2fj.ffs@tglx>

On Wed, Mar 13, 2024 at 9:43 AM Pasha Tatashin wrote:
>
> On Wed, Mar 13, 2024 at 6:23 AM Thomas Gleixner wrote:
> >
> > On Mon, Mar 11 2024 at 16:46, Pasha Tatashin wrote:
> > > @@ -413,6 +413,9 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
> > >  	}
> > >  #endif
> > >
> > > +	if (dynamic_stack_fault(current, address))
> > > +		return;
> > > +
> > >  	irqentry_nmi_enter(regs);
> > >  	instrumentation_begin();
> > >  	notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
> > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > > index d6375b3c633b..651c558b10eb 100644
> > > --- a/arch/x86/mm/fault.c
> > > +++ b/arch/x86/mm/fault.c
> > > @@ -1198,6 +1198,9 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
> > >  	if (is_f00f_bug(regs, hw_error_code, address))
> > >  		return;
> > >
> > > +	if (dynamic_stack_fault(current, address))
> > > +		return;
> >
> > T1 schedules out with stack used close to the fault boundary.
> >
> >    switch_to(T2)
> >
> > Now T1 schedules back in
> >
> >    switch_to(T1)
> >      __switch_to_asm()
> >         ...
> >         switch_stacks()         <- SP on T1 stack
> > !        ...
> > !        jmp __switch_to()
> > !        __switch_to()
> > !           ...
> > !           raw_cpu_write(pcpu_hot.current_task, next_p);
> >
> > After switching SP to T1's stack and up to the point where
> > pcpu_hot.current_task (aka current) is updated to T1 a stack fault will
> > invoke dynamic_stack_fault(T2, address) which will return false here:
> >
> >	/* check if address is inside the kernel stack area */
> >	stack = (unsigned long)tsk->stack;
> >	if (address < stack || address >= stack + THREAD_SIZE)
> >		return false;
> >
> > because T2's stack does obviously not cover the faulting address on T1's
> > stack. As a consequence double fault will panic the machine.
>
> Hi Thomas,
>
> Thank you, you are absolutely right, we can't trust "current" in the
> fault handler.
>
> We can change dynamic_stack_fault() to accept only fault_address as an
> argument and let it determine the right task_struct pointer
> internally.
>
> Here's a potential solution that is fast, avoids locking, and ensures
> atomicity:
>
> 1. Kernel Stack VA Space
> Dedicate a virtual address range ([KSTACK_START_VA - KSTACK_END_VA])
> exclusively for kernel stacks. This makes it simple to check whether a
> faulting address belongs to a kernel stack.
>
> 2. Finding the faulting task
> - Use ALIGN(fault_address, THREAD_SIZE) to calculate the end of the
> topmost stack page (since stack addresses are aligned to THREAD_SIZE).
> - Store the task_struct pointer as the last word on this topmost page,
> which is always present because it is a pre-allocated stack page.
>
> 3. Stack Padding
> Increase padding to 8 bytes on x86_64 (TOP_OF_KERNEL_STACK_PADDING 8)
> to accommodate the task_struct pointer.

Alternatively, do not even look up the task_struct in
dynamic_stack_fault(), but only install the mapping for the faulting
address, store the VA in the per-CPU array, and handle the rest in
dynamic_stack() during context switching. At that time spin locks can
be taken, and we can do a find_vm_area(addr) call.

This way, we would not need to modify TOP_OF_KERNEL_STACK_PADDING to
keep the task_struct pointer in there.

Pasha
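
To make step 2 of the first proposal concrete, here is a minimal
userspace sketch of the lookup. It is not kernel code and makes several
assumptions: THREAD_SIZE is taken to be 16 KiB, "struct task" is a
hypothetical stand-in for task_struct, and stack_set_owner(),
stack_owner() and kstack_demo() are made-up helper names. It also rounds
the address down and adds THREAD_SIZE instead of calling ALIGN()
directly, since a plain round-up of an address that sits exactly on the
stack base would yield the base rather than the end of the stack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 16 KiB, THREAD_SIZE-aligned stacks; the real value is
 * architecture and config dependent. */
#define THREAD_SIZE ((uintptr_t)16 * 1024)

/* Hypothetical stand-in for the kernel's task_struct. */
struct task {
	int pid;
};

/* Last pointer-sized word of the stack that starts at base. */
static struct task **owner_slot(uintptr_t base)
{
	return (struct task **)(base + THREAD_SIZE) - 1;
}

/* Record the owning task at stack setup time (hypothetical helper). */
static void stack_set_owner(void *stack, struct task *t)
{
	*owner_slot((uintptr_t)stack) = t;
}

/*
 * Map a faulting address to the owning task without consulting
 * "current". The base is found by rounding down to THREAD_SIZE.
 */
static struct task *stack_owner(uintptr_t addr)
{
	uintptr_t base = addr & ~(THREAD_SIZE - 1);

	return *owner_slot(base);
}

/*
 * Replays the reported race: the fault hits T1's stack while "current"
 * would still point to T2. Returns the pid found, or -1 on allocation
 * failure.
 */
static int kstack_demo(void)
{
	struct task t1 = { .pid = 1 }, t2 = { .pid = 2 };
	/* Two adjacent, THREAD_SIZE-aligned stacks. */
	char *stacks = aligned_alloc(THREAD_SIZE, 2 * THREAD_SIZE);
	int pid;

	if (!stacks)
		return -1;
	stack_set_owner(stacks, &t1);
	stack_set_owner(stacks + THREAD_SIZE, &t2);

	/* Fault near the low end of T1's stack, i.e. deep stack usage. */
	pid = stack_owner((uintptr_t)stacks + 64)->pid;
	free(stacks);
	return pid;
}
```

In this sketch the whole stack is backed by memory, whereas in the
proposal only the topmost page is guaranteed to be mapped. The lookup
only ever reads that topmost page, which is why it could run from the
fault handler.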