From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pasha Tatashin
Date: Mon, 18 Mar 2024 11:09:47 -0400
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks
To: David Laight
Cc: "H. Peter Anvin", Matthew Wilcox, Kent Overstreet,
 "linux-kernel@vger.kernel.org", "linux-mm@kvack.org",
 "akpm@linux-foundation.org", "x86@kernel.org", "bp@alien8.de",
 "brauner@kernel.org", "bristot@redhat.com", "bsegall@google.com",
 "dave.hansen@linux.intel.com", "dianders@chromium.org",
 "dietmar.eggemann@arm.com", "eric.devolder@oracle.com",
 "hca@linux.ibm.com", "hch@infradead.org",
 "jacob.jun.pan@linux.intel.com", "jgg@ziepe.ca", "jpoimboe@kernel.org",
 "jroedel@suse.de", "juri.lelli@redhat.com", "kinseyho@google.com",
 "kirill.shutemov@linux.intel.com", "lstoakes@gmail.com",
 "luto@kernel.org", "mgorman@suse.de", "mic@digikod.net",
 "michael.christie@oracle.com", "mingo@redhat.com", "mjguzik@gmail.com",
 "mst@redhat.com", "npiggin@gmail.com", "peterz@infradead.org",
 "pmladek@suse.com", "rick.p.edgecombe@intel.com", "rostedt@goodmis.org",
 "surenb@google.com", "tglx@linutronix.de", "urezki@gmail.com",
 "vincent.guittot@linaro.org", "vschneid@redhat.com"
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com>
 <2cb8f02d-f21e-45d2-afe2-d1c6225240f3@zytor.com>
 <2qp4uegb4kqkryihqyo6v3fzoc2nysuhltc535kxnh6ozpo5ni@isilzw7nth42>
 <39F17EC4-7844-4111-BF7D-FFC97B05D9FA@zytor.com>
Sender: owner-linux-mm@kvack.org

On Sun, Mar 17, 2024 at 2:58 PM David Laight wrote:
>
> From: Pasha Tatashin
> > Sent: 16 March 2024 19:18
> ...
> > Expanding on Mathew's idea of an interface for dynamic kernel stack
> > sizes, here's what I'm thinking:
> >
> > - Kernel Threads: Create all kernel threads with a fully populated
> > THREAD_SIZE stack. (i.e. 16K)
> > - User Threads: Create all user threads with THREAD_SIZE kernel stack
> > but only the top page mapped. (i.e. 4K)
> > - In enter_from_user_mode(): Expand the thread stack to 16K by mapping
> > three additional pages from the per-CPU stack cache. This function is
> > called early in kernel entry points.
> > - exit_to_user_mode(): Unmap the extra three pages and return them to
> > the per-CPU cache. This function is called late in the kernel exit
> > path.
>
> Isn't that entirely horrid for TLB use and so will require a lot of IPI?

The TLB load is going to be exactly the same as today: we already use
small pages for VMA mapped stacks. We won't need extra flushing either.
The mappings are in kernel space, and once pages are removed from the
page table, no one is going to access that VA space until that thread
enters the kernel again. We will need to invalidate the VA range only
when the pages are mapped, and only on the local CPU.

> Remember, if a thread sleeps in 'extra stack' and is then resheduled
> on a different cpu the extra pages get 'pumped' from one cpu to
> another.

Yes, the per-CPU cache can get unbalanced this way. We can remember the
original CPU where we acquired the pages and return them to the same
place.

> I also suspect a stack_probe() is likely to end up being a cache miss
> and also slow???

Can you please elaborate on this point? I am not aware of stack_probe()
and how it is used.

> So you wouldn't want one on all calls.
> I'm not sure you'd want a conditional branch either.
>
> The explicit request for 'more stack' can be required to be allowed
> to sleep - removing a lot of issues.
> It would also be portable to all architectures.
> I'd also suspect that any thread that needs extra stack is likely
> to need to again.
> So while the memory could be recovered, I'd bet is isn't worth
> doing except under memory pressure.
> The call could also return 'no' - perhaps useful for (broken) code
> that insists on being recursive.

The approach currently under discussion is somewhat different from an
explicit "more stack" request API. I am investigating how feasible it is
to use kernel stack multiplexing, so that the same pages can be reused
by many threads and are held only while they are actually in use. If the
multiplexing approach doesn't work, I will come back to the explicit
"more stack" API.

> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
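
To make the per-CPU cache bookkeeping discussed above concrete, here is a
rough userspace model in C. It is only a sketch under stated assumptions,
not code from the patch series: every name in it (stack_cache,
thread_stack, init_caches(), expand_stack(), shrink_stack(),
demo_migration(), and the NR_CPUS and CACHE_PAGES values) is hypothetical,
and it only counts pages instead of mapping them into page tables. It
shows the one point raised in the thread: pages taken on kernel entry are
returned to the cache of the CPU they came from, even if the thread was
rescheduled on another CPU in between.

```c
#include <assert.h>

#define NR_CPUS      2  /* hypothetical: tiny machine for the model */
#define EXTRA_PAGES  3  /* 16K stack minus the always-mapped 4K page */
#define CACHE_PAGES  6  /* hypothetical per-CPU cache depth */

/* Per-CPU pool of spare stack pages; only the count is modelled. */
struct stack_cache {
    int free_pages;
};

/* Per-thread state: extra pages currently held and their source CPU. */
struct thread_stack {
    int extra_mapped;
    int origin_cpu;     /* CPU whose cache supplied the pages, -1 if none */
};

struct stack_cache caches[NR_CPUS];

/* Fill every per-CPU cache to its (assumed) full depth. */
void init_caches(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        caches[cpu].free_pages = CACHE_PAGES;
}

/* Models kernel entry: take EXTRA_PAGES from the local CPU's cache. */
int expand_stack(struct thread_stack *ts, int cpu)
{
    if (caches[cpu].free_pages < EXTRA_PAGES)
        return -1;      /* a real design needs a fallback; not modelled */
    caches[cpu].free_pages -= EXTRA_PAGES;
    ts->extra_mapped = EXTRA_PAGES;
    ts->origin_cpu = cpu;
    return 0;
}

/*
 * Models kernel exit: give the pages back to the cache they were taken
 * from, regardless of which CPU the thread is running on now.
 */
void shrink_stack(struct thread_stack *ts)
{
    if (ts->origin_cpu < 0)
        return;
    caches[ts->origin_cpu].free_pages += ts->extra_mapped;
    ts->extra_mapped = 0;
    ts->origin_cpu = -1;
}

/* Returns 1 if both caches are full again after a migrated thread exits. */
int demo_migration(void)
{
    struct thread_stack ts = { 0, -1 };

    init_caches();
    if (expand_stack(&ts, 0) != 0)  /* enters the kernel on CPU 0 */
        return 0;
    /* ... thread sleeps and is rescheduled on CPU 1 ... */
    shrink_stack(&ts);              /* exits to user mode from CPU 1 */
    return caches[0].free_pages == CACHE_PAGES &&
           caches[1].free_pages == CACHE_PAGES;
}
```

Calling demo_migration() from a small main() exercises the migration
case. Without the origin_cpu field, the exit path would have to return
the pages to whichever CPU the thread happens to be on, which is the
"pumping" between caches described in the quoted text.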